Sky Aviation Analytics

Production-grade data pipeline and interactive analytics dashboard for Sky Aviation Club, using DuckDB and Parquet.

Tags: Python, Data Engineering, DuckDB, Parquet, Pipeline

Sky Aviation Analytics Dashboard

End-to-end data pipeline and self-contained analytical dashboard.

Data engineering is about building robust, reproducible pipelines. For the Sky Aviation Club, I developed a complete analytics pipeline that ingests raw operational data, validates it, transforms it into columnar formats, and serves it through a lightweight, self-contained interactive dashboard.

The Data Pipeline

The system is structured into a clear, modular pipeline (src/pipeline):

  1. Ingestion & Validation: Pulls web analytics data from Rybbit, either from CSV exports or through its API. The ingest.py module validates each dataset against an expected schema and verifies checksums, so malformed or corrupted inputs are caught before they reach later stages.
  2. Transformation: The transform.py script handles data enrichment, KPI calculation, and anomaly detection.
  3. Analytical Engine: The analytics.py module uses DuckDB as an in-process analytical engine, running SQL queries directly against local data without a separate database server or cluster.
  4. Storage: Processed data is stored as Apache Parquet, whose columnar layout and compression keep reads fast and files small.
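
To make the first step more concrete, here is a minimal sketch of checksum and schema validation using only the Python standard library. It is not the actual ingest.py code: the column names, the EXPECTED_SCHEMA mapping, and the validate_export function are hypothetical, and the real module may use a different schema or validation library.

```python
import csv
import hashlib
import io

# Hypothetical schema (an assumption): column name -> converter that raises on bad input.
EXPECTED_SCHEMA = {
    "date": str,
    "page": str,
    "visitors": int,
}


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of the raw export as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def validate_export(payload: bytes, expected_checksum: str) -> list[dict]:
    """Verify the checksum, then check the header and coerce every row's types."""
    actual = sha256_hex(payload)
    if actual != expected_checksum:
        raise ValueError(f"checksum mismatch: expected {expected_checksum}, got {actual}")

    reader = csv.DictReader(io.StringIO(payload.decode("utf-8")))
    if reader.fieldnames is None or set(reader.fieldnames) != set(EXPECTED_SCHEMA):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")

    rows = []
    for line_no, raw in enumerate(reader, start=2):  # the header is line 1
        try:
            rows.append({col: cast(raw[col]) for col, cast in EXPECTED_SCHEMA.items()})
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
    return rows


# Illustrative data only. In practice the expected checksum would come from a
# manifest written when the export was created, not from the payload itself.
sample = b"date,page,visitors\n2024-05-01,/fleet,42\n2024-05-02,/events,17\n"
rows = validate_export(sample, sha256_hex(sample))
```

Checking the checksum before parsing means a truncated or modified file fails fast, and the per-row type coercion reports the offending line number rather than letting a bad value surface later in the transformation step.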

The Delivery Artifact

Rather than depending on heavy BI tools (such as Tableau or Power BI) or a dedicated backend server, the pipeline produces a deliberately simple output: the build_dashboard.py script embeds the processed data directly into a self-contained HTML dashboard. The resulting index.html can be hosted on S3 or GitHub Pages, or sent by email, and it stays fully interactive because everything it needs is inside the file.
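
One way to embed data into a single HTML file is sketched below. It is an illustration of the idea, not the real build_dashboard.py: the TEMPLATE string, the placeholder tokens, the render_dashboard function, and the sample rows are all assumptions, and the actual dashboard is presumably richer than a plain list.

```python
import html
import json

# Hypothetical page template. The data lives in a JSON <script> block that the
# inline JavaScript reads at load time, so no server or external file is needed.
TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>__TITLE__</title></head>
<body>
<h1>__TITLE__</h1>
<ul id="kpis"></ul>
<script id="data" type="application/json">__PAYLOAD__</script>
<script>
  const rows = JSON.parse(document.getElementById("data").textContent);
  const list = document.getElementById("kpis");
  for (const row of rows) {
    const item = document.createElement("li");
    item.textContent = row.page + ": " + row.visitors;
    list.appendChild(item);
  }
</script>
</body>
</html>
"""


def render_dashboard(rows: list[dict], title: str) -> str:
    """Return a standalone HTML page with the rows embedded as JSON."""
    # Escape "</" so a value such as "</script>" cannot close the data block early.
    payload = json.dumps(rows).replace("</", "<\\/")
    page = TEMPLATE.replace("__TITLE__", html.escape(title))
    return page.replace("__PAYLOAD__", payload)


# Illustrative rows only, not real club data.
sample_rows = [
    {"page": "/fleet", "visitors": 42},
    {"page": "/events", "visitors": 17},
]
page_html = render_dashboard(sample_rows, "Sky Aviation Analytics")
```

Writing page_html to index.html yields one file that opens directly in a browser. The trade-off is that the file grows with the data, so this approach suits aggregated KPIs better than raw event logs.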

Engineering Mindset

This project reflects a mindset that is scrappy but still production-ready. I added automated tests (make test), a strict Python environment (Python 3.11+), and a Makefile-driven execution flow. It demonstrates that I can design data solutions that are both technically rigorous (checksums, columnar storage) and pragmatic (a static HTML dashboard with no backend server).