AI News Knowledge Base
Hybrid (Wiki + Vectorial) Personal Knowledge System.
As a Data & AI builder, managing the influx of AI research and news requires robust tooling. The AI News Knowledge Base is a personal, local-first system that combines the interconnected structure of a Markdown Wiki (LLM Wiki pattern) with the lightning-fast retrieval capabilities of a vector database.
System Architecture
The core philosophy of this project is simplicity and speed. Instead of relying on heavy machine learning models for embeddings, it uses sparse vectors:
- Agent Integration: Python scripts (kb_ingest.py, kb-query.py) that act as the orchestration layer, automatically scanning local Markdown files (like Obsidian vaults) and RSS feeds.
- Search Engine: Qdrant running locally via Docker.
- Retrieval Mechanism: Implements Qdrant sparse vectors (TF-IDF). This tokenizes text and searches by term similarity without needing PyTorch, sentence-transformers, or any heavy ML models, making it extremely lightweight and perfectly suited for structured Markdown files.
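To make the sparse-vector idea concrete, here is a minimal standard-library sketch of turning text into a TF-IDF sparse vector. It is an illustration under stated assumptions, not the code in kb_ingest.py: the function names, the hyphen-preserving tokenizer, the CRC32 token-to-index mapping, and the smoothed IDF formula are all hypothetical choices. The output is a pair of parallel lists (indices, values), which is the general shape a sparse vector takes in Qdrant.

```python
import math
import re
import zlib
from collections import Counter

# Assumption: keep hyphenated terms such as "chain-of-thought" as one token.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return TOKEN_RE.findall(text.lower())


def token_id(token):
    """Map a token to a stable integer index (CRC32 is an illustrative choice)."""
    return zlib.crc32(token.encode("utf-8"))


def document_frequencies(docs_tokens):
    """Count in how many documents each token appears."""
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return df


def tfidf_sparse(tokens, df, n_docs):
    """Return (indices, values) for one document, L2-normalised."""
    weights = {}
    for tok, count in Counter(tokens).items():
        # Smoothed IDF so unseen or ubiquitous tokens never yield zero or negative weights.
        idf = math.log((1 + n_docs) / (1 + df.get(tok, 0))) + 1.0
        idx = token_id(tok)
        # Summing handles the (rare) case of two tokens hashing to the same index.
        weights[idx] = weights.get(idx, 0.0) + count * idf
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    indices = sorted(weights)
    values = [weights[i] / norm for i in indices]
    return indices, values


corpus = ["MoE uses sparse activation", "Reasoning models use chain-of-thought"]
corpus_tokens = [tokenize(text) for text in corpus]
df = document_frequencies(corpus_tokens)
indices, values = tfidf_sparse(corpus_tokens[0], df, len(corpus))
```

Only the terms that actually occur in a document get an entry, which is why no neural model is needed: the work is counting and a logarithm.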
How it Works
The ingestion pipeline processes the knowledge/ directory (which contains entities, concepts, comparisons, and raw articles) and indexes them into Qdrant.
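The scanning step of that pipeline can be sketched roughly as follows. This is an assumption-laden illustration rather than the actual kb_ingest.py: the function name iter_markdown_docs, the record fields, and the idea of using the first subdirectory as a category label are hypothetical.

```python
from pathlib import Path


def iter_markdown_docs(root):
    """Yield one record per Markdown file found under the given directory."""
    root = Path(root)
    if not root.is_dir():
        return
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root)
        # Assumption: the first folder name (e.g. concepts/, entities/) labels the document type.
        category = rel.parts[0] if len(rel.parts) > 1 else "uncategorized"
        yield {
            "id": rel.as_posix(),
            "category": category,
            "text": path.read_text(encoding="utf-8"),
        }
```

Calling list(iter_markdown_docs("knowledge")) would return the records that a later step tokenizes and indexes into Qdrant; files that are not Markdown are skipped.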
Because it uses TF-IDF, which matches on the terms themselves, searches are precise for technical terms:
- Searching "reasoning models" instantly finds documents containing "reasoning", "models", and "chain-of-thought".
- Searching "MoE architecture" maps to "mixture of experts", "sparse activation", and "MoE".
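The queries above can be reproduced with a small in-memory sketch of term-based ranking. Everything here is illustrative: the three sample documents, the function names build_index and search, and the scoring formula are assumptions, and the real system delegates this work to Qdrant. The point it shows is that a query reaches a page through the terms they share, so a page that spells out "MoE (mixture of experts)" is found by a query containing "MoE".

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return (L2-normalised TF-IDF vectors per document, IDF table)."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for tf in term_counts.values():
        df.update(tf.keys())
    n_docs = len(docs)
    idf = {tok: math.log((1 + n_docs) / (1 + d)) + 1.0 for tok, d in df.items()}
    vectors = {}
    for doc_id, tf in term_counts.items():
        vec = {tok: count * idf[tok] for tok, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[doc_id] = {tok: w / norm for tok, w in vec.items()}
    return vectors, idf


def search(query, vectors, idf, limit=3):
    """Rank documents by the sparse dot product with the query terms."""
    # Query terms that never occur in the corpus get weight 0 and are ignored.
    q = {tok: idf.get(tok, 0.0) for tok in set(tokenize(query))}
    scored = []
    for doc_id, vec in vectors.items():
        score = sum(w * vec.get(tok, 0.0) for tok, w in q.items())
        if score > 0:
            scored.append((doc_id, score))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


# Hypothetical sample pages, standing in for files under knowledge/.
docs = {
    "concepts/moe.md": "MoE (mixture of experts) uses sparse activation to route tokens to experts.",
    "concepts/reasoning.md": "Reasoning models use chain-of-thought to work through problems step by step.",
    "raw/gpu-news.md": "New GPU pricing announced for cloud providers.",
}
vectors, idf = build_index(docs)
moe_hits = search("MoE architecture", vectors, idf)
reasoning_hits = search("reasoning models", vectors, idf)
```

With these sample pages, moe_hits ranks concepts/moe.md first and reasoning_hits ranks concepts/reasoning.md first, while a query with no overlapping terms returns an empty list.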
Key Takeaways
This project showcases my pragmatic approach to AI engineering. While dense embeddings (like OpenAI's or E5) are popular, I recognized that for a local Markdown wiki, sparse vectors provide a faster, cheaper, and equally effective solution without the overhead of neural network inference. It's a perfect example of choosing the right tool for the job.