
# Microtypo - Data Engineer Project

This is an end-to-end data engineering project to monitor my keystrokes, with the data flow designed to maximize automation and availability.

Related services are deployed on bare-metal Kubernetes, built with modern open-source (OSS) data tools and frameworks.

## Repositories


## Services

### Typorio

  • This is the Python package installed on a user's machine
  • Its job is to capture keystrokes and mouse clicks using pynput, running as a Click CLI application
  • After receiving 100 records (keystrokes), the timestamps of these records are shuffled and the batch is written as a .csv file under ~/microtypo/records/[timestamp].csv
  • At a configurable interval (10 minutes by default), the accumulated .csv files are uploaded to Amazon S3 using Boto3 under a prefix such as records/2024-06/U000002/
  • This design lets Typorio upload records whenever an internet connection is available, while leveraging S3's strong durability and availability guarantees as the data lake for raw .csv files (see the sketch after this list)
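
The capture-and-upload loop could look roughly like the sketch below, assuming pynput and boto3 as described. `BATCH_SIZE`, `RECORDS_DIR`, `BUCKET`, and the `U000002` default are illustrative placeholders, not Typorio's actual identifiers, and mouse-click handling is omitted for brevity.

```python
import csv
import random
import time
from pathlib import Path

import boto3
from pynput import keyboard

BATCH_SIZE = 100                                    # flush after 100 keystrokes
RECORDS_DIR = Path.home() / "microtypo" / "records"
BUCKET = "microtypo-records"                        # placeholder bucket name

buffer = []

def flush(records):
    """Shuffle timestamps across the batch, then write it as a .csv file."""
    timestamps = [r["ts"] for r in records]
    random.shuffle(timestamps)
    for record, ts in zip(records, timestamps):
        record["ts"] = ts
    RECORDS_DIR.mkdir(parents=True, exist_ok=True)
    path = RECORDS_DIR / f"{int(time.time())}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["ts", "key"])
        writer.writeheader()
        writer.writerows(records)

def on_press(key):
    buffer.append({"ts": time.time(), "key": str(key)})
    if len(buffer) >= BATCH_SIZE:
        flush(buffer.copy())
        buffer.clear()

def upload_pending(user_id="U000002"):
    """Run periodically: push local CSVs to S3, deleting them once uploaded."""
    s3 = boto3.client("s3")
    month = time.strftime("%Y-%m")
    for path in sorted(RECORDS_DIR.glob("*.csv")):
        s3.upload_file(str(path), BUCKET, f"records/{month}/{user_id}/{path.name}")
        path.unlink()  # a failed upload raises first, so the file is kept

with keyboard.Listener(on_press=on_press) as listener:
    listener.join()
```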

### Pipelines

  • Data pipeline orchestration implemented with Dagster, which can be visited at: https://dagster.microtypo.com/
  • A Dagster Sensor subscribes to new files arriving under a particular S3 bucket (dev/stage/prod), using a month-partitioned prefix to stay under the list_objects_v2 API's 1,000-object page limit
  • When a new S3 key is detected, a Dagster run is spun up to download the file, shuffle the timestamps of all records (as above), and then:
    • Rewrite a .parquet file in local MinIO storage, merging in the new records with Polars
    • Append the new rows to the SQL data warehouse, implemented with StackGres as a two-node cluster: one primary and one replica
  • Additionally, a Dagster Schedule runs dbt on an hourly cron expression, materializing the defined models for data visualization in Lightdash (sketches of the sensor and schedule follow this list)
  • Note: the Dagster UI also contains some unrelated pipelines that I built while researching and practicing more complex pipeline patterns
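
A minimal sketch of the sensor pattern, assuming Dagster and boto3; the bucket, job, and op names are placeholders, and the op body only logs where the real download/shuffle/merge work would go.

```python
import time

import boto3
from dagster import RunRequest, job, op, sensor

BUCKET = "microtypo-records"  # placeholder bucket name

@op(config_schema={"key": str})
def process_key(context):
    # The real op would download the object, shuffle timestamps, merge the
    # records into the MinIO parquet file with Polars, and append rows to
    # the StackGres warehouse.
    context.log.info(f"processing {context.op_config['key']}")

@job
def ingest_records():
    process_key()

@sensor(job=ingest_records, minimum_interval_seconds=60)
def new_records_sensor(context):
    s3 = boto3.client("s3")
    # A month-partitioned prefix keeps each listing comfortably below the
    # 1,000-key page that list_objects_v2 returns.
    prefix = f"records/{time.strftime('%Y-%m')}/"
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix)
    last_key = context.cursor or ""
    for obj in resp.get("Contents", []):
        key = obj["Key"]
        if key <= last_key:
            continue  # keys are listed in ascending order; skip seen ones
        # run_key also de-duplicates: Dagster won't launch the same key twice
        yield RunRequest(
            run_key=key,
            run_config={"ops": {"process_key": {"config": {"key": key}}}},
        )
        last_key = key
    context.update_cursor(last_key)
```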
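
And a hedged sketch of the hourly dbt schedule. Shelling out to `dbt run` is one simple pattern (the dagster-dbt integration is the richer alternative); the job and schedule names are again illustrative.

```python
import subprocess

from dagster import job, op, schedule

@op
def run_dbt(context):
    # Materialize the staging -> intermediate -> mart models.
    result = subprocess.run(["dbt", "run"], capture_output=True, text=True)
    context.log.info(result.stdout)
    result.check_returncode()  # fail the Dagster run if dbt failed

@job
def dbt_job():
    run_dbt()

@schedule(cron_schedule="0 * * * *", job=dbt_job)  # top of every hour
def hourly_dbt_schedule():
    return {}  # no extra run_config needed
```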

### dbt

  • A minimal implementation of dbt, following the standard structuring approach with staging, intermediate, and mart models (a sketch of the layout follows)
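
Roughly, the project layout looks like this (the model file names here are hypothetical, shown only to illustrate the three layers):

```
models/
├── staging/        # one model per source table; renaming and type casting
│   └── stg_keystrokes.sql
├── intermediate/   # reusable joins and business-logic building blocks
│   └── int_keystroke_sessions.sql
└── marts/          # final analytics-ready models consumed by Lightdash
    └── fct_typing_speed.sql
```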

### Others

  • Kube: Kubernetes, Docker, Registry, Helm, Helmfile, Kubeadm, K9s
  • Infra: S3, IAM, Cloudflare, Terraform, Terragrunt, Taskfile

## Demo

  • Dagster UI
    • View-only
  • Apache Superset dashboard with prebuilt charts
    • username: admin
    • password: password
  • Lightdash dashboard for custom metrics
    • username: admin
    • password: password$

## Contact