Sky Aviation Analytics Dashboard
End-to-end data pipeline and self-contained analytical dashboard.
Data engineering is about building robust, reproducible pipelines. For the Sky Aviation Club, I developed a complete analytics pipeline that ingests raw operational data, validates it, transforms it into columnar formats, and serves it through a lightweight, self-contained interactive dashboard.
The Data Pipeline
The system is structured into a clear, modular pipeline (src/pipeline):
- Ingestion & Validation: Extracts web analytics data from Rybbit (via CSV exports or API). The ingest.py module enforces data quality using schema validation and checksums to ensure data integrity.
- Transformation: The transform.py script handles data enrichment, KPI calculation, and anomaly detection.
- Analytical Engine: Utilizing DuckDB as an in-process analytical engine (analytics.py), the pipeline performs Spark-like SQL queries directly on local data with exceptional speed.
- Storage: Processed data is stored in Apache Parquet, leveraging columnar compression for fast read access and minimal storage footprint.
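To make the ingestion step concrete, here is a minimal sketch of checksum plus schema validation using only the standard library. It is not the project's actual ingest.py: the column names, the EXPECTED_SCHEMA mapping, and the validate_export helper are hypothetical, and it assumes the expected SHA-256 digest was recorded when the CSV export was produced.

```python
import csv
import hashlib
import io

# Hypothetical schema: column name -> converter that raises ValueError on bad input.
EXPECTED_SCHEMA = {"date": str, "pageviews": int, "sessions": int}


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the raw export bytes."""
    return hashlib.sha256(data).hexdigest()


def validate_export(data: bytes, expected_checksum: str) -> list[dict]:
    """Verify the checksum first, then check the header and every cell type."""
    actual = sha256_of(data)
    if actual != expected_checksum:
        raise ValueError(f"checksum mismatch: {actual} != {expected_checksum}")

    reader = csv.DictReader(io.StringIO(data.decode("utf-8")))
    if reader.fieldnames is None or set(reader.fieldnames) != set(EXPECTED_SCHEMA):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")

    rows = []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append({col: cast(raw[col]) for col, cast in EXPECTED_SCHEMA.items()})
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
    return rows


# Made-up sample data, used only to exercise the function.
sample = b"date,pageviews,sessions\n2024-05-01,120,45\n2024-05-02,98,40\n"
rows = validate_export(sample, sha256_of(sample))
```

Checking the digest before parsing means a truncated or altered export is rejected before any row reaches the transformation step.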
The Delivery Artifact
Instead of relying on heavy BI tools (like Tableau or PowerBI) or requiring a dedicated backend server, the final output is elegantly simple:
The build_dashboard.py script embeds the processed data directly into a self-contained HTML dashboard. The resulting index.html artifact can be hosted on S3 or GitHub Pages, or simply sent by email, and it remains fully interactive without a backend.
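One way such embedding could work is to serialize the records as JSON and inline them in a script tag that the page reads at load time. The sketch below illustrates that idea only; the TEMPLATE markup, the __DATA__ placeholder, and the render_dashboard function are hypothetical and are not taken from build_dashboard.py.

```python
import json

# Hypothetical template; the real dashboard markup is not shown in this write-up.
TEMPLATE = """<!doctype html>
<html>
<head><meta charset="utf-8"><title>Dashboard</title></head>
<body>
<div id="app"></div>
<script id="data" type="application/json">__DATA__</script>
<script>
  const data = JSON.parse(document.getElementById("data").textContent);
  document.getElementById("app").textContent = data.length + " rows loaded";
</script>
</body>
</html>
"""


def render_dashboard(records: list[dict]) -> str:
    """Return a single HTML document with the records inlined as JSON."""
    payload = json.dumps(records)
    # Escape "</" so a string value cannot close the script tag early;
    # "\/" is a valid JSON escape for "/", so JSON.parse still reads it back.
    payload = payload.replace("</", "<\\/")
    return TEMPLATE.replace("__DATA__", payload)


# Made-up record, used only to exercise the function.
html = render_dashboard([{"date": "2024-05-01", "pageviews": 120}])
```

Writing the returned string to index.html yields one file with no external data requests, which is what lets it be opened from a static host or an email attachment.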
Engineering Mindset
This project reflects a "scrappy builder" mindset that still holds to production standards. I implemented automated testing (make test), a pinned Python version requirement (Python 3.11+), and a Makefile-driven execution flow. It demonstrates my ability to design data solutions that are both technically rigorous (checksums, columnar storage) and highly pragmatic (a static HTML artifact that needs no server).