AI News Knowledge Base

Hybrid (Wiki + Vector) Personal Knowledge System.

As a Data & AI builder, I need robust tooling to keep up with the influx of AI research and news. The AI News Knowledge Base is a personal, local-first system that combines the interlinked structure of a Markdown wiki (the LLM Wiki pattern) with fast retrieval from a vector database.

System Architecture

The core philosophy of this project is simplicity and speed: instead of relying on heavy machine learning models for embeddings, it uses sparse vectors. The system has three components:

Agent Integration: Python scripts (kb_ingest.py, kb-query.py) act as the orchestration layer, automatically scanning local Markdown files (such as Obsidian vaults) and RSS feeds.
Search Engine: Qdrant running locally via Docker.
Retrieval Mechanism: Uses Qdrant sparse vectors with TF-IDF weighting. Text is tokenized and matched by term overlap, with no need for PyTorch, sentence-transformers, or any other heavy ML model, which keeps the system lightweight and well suited to structured Markdown files.
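To make the sparse-vector idea concrete, here is a minimal sketch of how text could be turned into TF-IDF weights using only the standard library. It is not the project's actual code: the function names, the CRC32 hashing of tokens to integer indices, and the smoothed IDF formula are all assumptions chosen for illustration. The output shape (a list of indices plus a matching list of values) mirrors what a Qdrant sparse vector holds.

```python
import math
import re
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_id(token):
    """Map a token to a stable non-negative integer index (hypothetical scheme: CRC32)."""
    return zlib.crc32(token.encode("utf-8"))


def idf_table(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each token once per document
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def sparse_vector(text, idf):
    """Return (indices, values) with TF-IDF weights; tokens unseen in the corpus are skipped."""
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1
    weights = {}
    for tok, count in counts.items():
        if tok not in idf:
            continue
        idx = token_id(tok)
        # Accumulate so a rare hash collision cannot produce duplicate indices.
        weights[idx] = weights.get(idx, 0.0) + (count / total) * idf[tok]
    indices = sorted(weights)
    return indices, [weights[i] for i in indices]


corpus = ["MoE routing", "dense routing"]
idf = idf_table(corpus)
indices, values = sparse_vector("MoE routing", idf)
```

Rare terms such as "moe" receive a higher IDF than terms that appear in every document, which is why TF-IDF tends to rank pages about a specific technical term first.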

How it Works

The ingestion pipeline processes the knowledge/ directory (which contains entities, concepts, comparisons, and raw articles) and indexes them into Qdrant.
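The ingestion step could look roughly like the sketch below. It is an assumption-laden illustration, not the contents of kb_ingest.py: the record fields, the collection name `ai_news_kb`, the sparse vector name `text`, and the `to_sparse` callback (any function mapping text to a pair of index and value lists) are all hypothetical. The Qdrant calls use the `qdrant-client` package and assume a local instance on the default port 6333.

```python
from pathlib import Path


def scan_markdown(root):
    """Yield one record per Markdown file under `root`, in sorted order."""
    root = Path(root)
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root)
        yield {
            "path": rel.as_posix(),
            # First folder under the root (e.g. "entities", "concepts"); "" for top-level files.
            "category": rel.parts[0] if len(rel.parts) > 1 else "",
            "title": path.stem,
            "text": path.read_text(encoding="utf-8"),
        }


def index_records(records, to_sparse, collection="ai_news_kb", url="http://localhost:6333"):
    """Upsert records into Qdrant using one named sparse vector per point.

    `to_sparse` is a caller-supplied function: text -> (indices, values).
    """
    # Imported lazily so the scanning step works without qdrant-client installed.
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url=url)
    if not client.collection_exists(collection):
        client.create_collection(
            collection_name=collection,
            vectors_config={},  # no dense vectors
            sparse_vectors_config={"text": models.SparseVectorParams()},
        )

    points = []
    for point_id, rec in enumerate(records):
        indices, values = to_sparse(rec["text"])
        points.append(
            models.PointStruct(
                id=point_id,
                vector={"text": models.SparseVector(indices=indices, values=values)},
                payload={key: rec[key] for key in ("path", "category", "title")},
            )
        )
    if not points:
        return 0
    client.upsert(collection_name=collection, points=points)
    return len(points)
```

Calling `index_records(scan_markdown("knowledge"), to_sparse)` would index the whole directory in one pass. Keeping the folder name in the payload makes it possible to filter results by type (entities, concepts, comparisons, raw articles) later.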

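Because it uses TF-IDF, search is lexical rather than semantic, which makes it precise for technical terms:

Searching "reasoning models" finds documents containing “reasoning” and “models”; pages that also discuss “chain-of-thought” surface when they share those terms.
Searching "MoE architecture" finds documents containing “MoE” or “architecture”; wiki pages that also mention “mixture of experts” or “sparse activation” are retrieved through those shared terms, since TF-IDF does not expand synonyms on its own.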
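A small pure-Python sketch can show which documents a term-based query is able to reach. It ranks documents by the dot product of sparse term vectors, the same kind of scoring a sparse-vector index performs. The document names and contents are made up for illustration, and IDF weighting is left out to keep the example short. A document is returned only if it shares at least one token with the query.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_weights(text):
    """Raw term counts as a sparse {token: weight} mapping (IDF omitted for brevity)."""
    return dict(Counter(tokenize(text)))


def sparse_dot(query, doc):
    """Dot product over the tokens that both sparse vectors contain."""
    return sum(weight * doc[tok] for tok, weight in query.items() if tok in doc)


def search(query_text, docs, limit=5):
    """Rank documents by sparse dot product, dropping those with no shared term."""
    query = term_weights(query_text)
    scored = [(sparse_dot(query, term_weights(body)), name) for name, body in docs.items()]
    hits = [(score, name) for score, name in scored if score > 0]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))  # best score first, then by name
    return [name for _, name in hits][:limit]


# Hypothetical wiki pages.
docs = {
    "moe.md": "MoE (mixture of experts) uses sparse activation in its architecture.",
    "experts-only.md": "A mixture of experts routes each token to a few experts.",
    "rss.md": "Weekly RSS digest covering AI news.",
}

moe_hits = search("MoE architecture", docs)
expert_hits = search("mixture of experts", docs)
```

Here `moe_hits` contains only `moe.md`, because `experts-only.md` never uses the token "moe" or "architecture". Pages in a well-linked wiki usually spell out both the acronym and the full name, which is what makes lexical search behave well on this kind of content.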

Key Takeaways

This project showcases my pragmatic approach to AI engineering. While dense embeddings (like OpenAI’s or E5) are popular, I found that for a local Markdown wiki queried mostly by exact technical terms, sparse vectors are a faster and cheaper solution that still retrieves what I need, without the overhead of neural network inference. It’s an example of choosing the right tool for the job.
