AI Saber Pro Assistant

Production-ready RAG system designed to answer complex queries about the Colombian state exam using official sources, featuring a custom evaluation pipeline.

Python FastAPI RAG ChromaDB Gemini

SaberPro AI Assistant

A Retrieval-Augmented Generation (RAG) system for Colombian state exam guidance.

As an AI Engineer, I built the SaberPro AI Assistant to solve a real-world problem: Colombian university students spend hours parsing long, complex official manuals for the state exam (Saber Pro) and the PRISMA platform. This project replaces manual searching with a conversational assistant that answers in real time and grounds every answer exclusively in official sources.

Architecture & Approach

The system is a complete RAG application, designed to minimize hallucinations and keep answers factually grounded:

1. Vector Retrieval & Ingestion

  • Uses ChromaDB as the local vector database.
  • Uses the multilingual embedding model intfloat/multilingual-e5-base for semantic search over dense official documents.
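
The retrieval step can be illustrated with a minimal, dependency-free sketch. It uses toy 3-dimensional vectors in place of real embeddings and a plain cosine-similarity ranking in place of ChromaDB's nearest-neighbor search. The `query: ` and `passage: ` prefixes follow the recommendation on the E5 model card; whether this project applies them is an assumption. The helper names (`as_query`, `as_passage`, `top_k`) and the chunk ids are hypothetical.

```python
import math


def as_query(text: str) -> str:
    # E5 models are trained with a "query: " prefix on search queries.
    return f"query: {text}"


def as_passage(text: str) -> str:
    # ...and a "passage: " prefix on the documents being indexed.
    return f"passage: {text}"


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (chunk_id, score) pairs most similar to the query vector."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: hypothetical chunk ids with made-up vectors (not real embeddings).
toy_index = [
    ("registro.pdf#p3", [0.9, 0.1, 0.0]),
    ("resultados.pdf#p1", [0.1, 0.9, 0.0]),
    ("prisma.pdf#p7", [0.7, 0.3, 0.1]),
]

hits = top_k([1.0, 0.0, 0.0], toy_index, k=2)
hit_ids = [chunk_id for chunk_id, _ in hits]
```

In the real pipeline the vectors would come from the embedding model and the ranking from a ChromaDB collection query; only the shape of the data flow is shown here.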

2. Strict Generation

  • Powered by Google Gemini 2.5 Flash.
  • Applies strict prompting to constrain the LLM: it must answer exclusively from the retrieved context, cite its sources, and never expose its internal reasoning (chain of thought) to the user.
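
One way to express these constraints is a prompt template that numbers each retrieved chunk so the model can cite it. The sketch below is illustrative only: the wording of `SYSTEM_RULES`, the `build_prompt` helper, and the chunk fields (`source`, `text`) are assumptions, not the project's actual prompt.

```python
# Hypothetical rule text; the project's real prompt is not shown in this README.
SYSTEM_RULES = (
    "Answer only with facts found in the numbered sources below. "
    "Cite the source number in brackets after each claim, e.g. [1]. "
    "If the sources do not contain the answer, say so. "
    "Return only the final answer, without your reasoning steps."
)


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt: rules, numbered sources, then the question."""
    sources = "\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return f"{SYSTEM_RULES}\n\nSources:\n{sources}\n\nQuestion: {question}\nAnswer:"


# Placeholder chunks standing in for retrieved passages.
example_chunks = [
    {"source": "guia.pdf", "text": "Texto del fragmento A."},
    {"source": "manual_prisma.pdf", "text": "Texto del fragmento B."},
]
prompt = build_prompt("Pregunta de ejemplo", example_chunks)
```

Numbering the sources in the prompt makes citations easy to check afterwards, since each bracketed number maps back to exactly one retrieved chunk.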

3. Real-Time Streaming Backend

  • Built on top of FastAPI.
  • Uses WebSockets to stream tokens from the LLM to the frontend as they are generated, which reduces perceived latency and improves the user experience.
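
The streaming pattern can be sketched with `asyncio` alone: tokens are forwarded one by one through a `send` callable as soon as they arrive. In a FastAPI WebSocket handler, `send` would typically be `websocket.send_text` after `await websocket.accept()`. The functions `fake_llm_stream` and `stream_answer` are hypothetical stand-ins, not the project's code.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def fake_llm_stream(answer: str) -> AsyncIterator[str]:
    """Stand-in for the LLM client: yields the answer one token at a time."""
    for token in answer.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token + " "


async def stream_answer(
    tokens: AsyncIterator[str],
    send: Callable[[str], Awaitable[None]],
) -> str:
    """Forward each token to the client immediately and return the full text."""
    parts: list[str] = []
    async for token in tokens:
        await send(token)  # the user sees this token before the next one exists
        parts.append(token)
    return "".join(parts)


async def demo() -> tuple[list[str], str]:
    sent: list[str] = []

    async def send(message: str) -> None:
        # Records messages instead of writing to a real WebSocket.
        sent.append(message)

    full = await stream_answer(fake_llm_stream("respuesta de ejemplo"), send)
    return sent, full


sent, full = asyncio.run(demo())
```

Passing `send` in as a parameter keeps the streaming loop independent of the transport, so the same function can be exercised in tests without opening a socket.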

Benchmark & Evaluation Pipeline

To measure the system's quality and reliability, I built an automated benchmarking script (benchmark.py) that evaluates both the Retriever and the Generator against a curated test dataset:

  • Retriever Metrics: Measures Precision@K, Recall@K, and MRR@K to validate vector search quality.
  • Generator Metrics: Compares LLM outputs against ground-truth answers using ExactMatch, BLEU, and ROUGE-L.
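
For reference, the metrics above can be written in a few lines of plain Python. This is a sketch under stated assumptions, not the contents of benchmark.py: exact match is computed after lowercasing and trimming, ROUGE-L is the F1 of the longest common subsequence over whitespace tokens, and BLEU is left out because it is usually taken from a library. All function names are hypothetical.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved ids that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant ids that appear in the top-k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """1 / rank of the first relevant id in the top-k, or 0.0 if none."""
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mrr_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Mean reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


def exact_match(prediction: str, reference: str) -> float:
    return float(prediction.strip().lower() == reference.strip().lower())


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L as F1 over the longest common subsequence of tokens."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        return 0.0
    prev = [0] * (len(ref) + 1)  # LCS lengths for the previous prediction token
    for tok in pred:
        cur = [0]
        for j, ref_tok in enumerate(ref, start=1):
            cur.append(prev[j - 1] + 1 if tok == ref_tok else max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Toy query: ids are placeholders, not real benchmark data.
retrieved, relevant = ["d3", "d1", "d7", "d2"], {"d1", "d2"}
p3 = precision_at_k(retrieved, relevant, 3)
r3 = recall_at_k(retrieved, relevant, 3)
rr3 = reciprocal_rank_at_k(retrieved, relevant, 3)
```

Precision@K and Recall@K differ only in the denominator (K versus the number of relevant documents), which is why the two can diverge sharply when a question has few relevant chunks.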

All results are automatically exported to benchmark_results.csv for continuous performance tracking.
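
A CSV export of this kind can be done with the standard library's `csv.DictWriter`. The sketch below assumes one row per test question; the column names and the `export_results` helper are hypothetical, and the values are placeholders rather than real benchmark results.

```python
import csv
import os
import tempfile


def export_results(rows: list[dict], path: str = "benchmark_results.csv") -> str:
    """Write one CSV row per evaluated question and return the file path."""
    if not rows:
        raise ValueError("no benchmark rows to export")
    fieldnames = list(rows[0].keys())  # column order follows the first row
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Placeholder rows with assumed column names; not actual results.
example_rows = [
    {"question_id": "q1", "precision_at_k": 0.0, "recall_at_k": 0.0, "rouge_l": 0.0},
    {"question_id": "q2", "precision_at_k": 1.0, "recall_at_k": 1.0, "rouge_l": 1.0},
]

# Written to a temporary directory here so the demo leaves no file behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = export_results(example_rows, os.path.join(tmp, "benchmark_results.csv"))
    with open(out_path, newline="", encoding="utf-8") as f:
        read_back = list(csv.DictReader(f))
```

Keeping one row per question, rather than only the averages, makes it possible to see which individual questions regressed between runs.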

Engineering Highlights

This project demonstrates my ability to go beyond simple API wrappers. By implementing custom evaluation pipelines, strict context grounding, and WebSocket streaming, I built a reliable AI product ready for real users.