SQL Stories · Ecom Data Lake Extension

Purpose: Hydrate the SQL Stories ecosystem with production-style data lake drops. The ecom-datalake-exten repository wraps the ecom_sales_data_generator engine with a CLI that converts synthetic CSV output into partitioned Parquet, enriches files with lineage metadata, and publishes them to cloud storage.

Tech stack: Python · Click CLI · Parquet · Google Cloud Storage · Pytest · CI/CD

How It Extends SQL Stories

This repo turns the synthetic generator into a lightweight data platform. Pair it with the Skills Builder notebooks or Portfolio Demo dashboards to show how ingestion, analysis, and storytelling connect.

Core CLI Workflow

1. Generate CSV Artifacts

ecomlake run-generator \
  --config gen_config/ecom_sales_gen_quick.yaml \
  --artifact-root artifacts \
  --messiness-level medium \
  --generator-src ../ecom_sales_data_generator/src

2. Export to Partitioned Parquet

ecomlake export-raw \
  --source artifacts/raw_run_20251019T173945Z \
  --target output/raw \
  --ingest-date 2024-02-15 \
  --target-size-mb 10

3. Publish to Your Lake

ecomlake upload-raw \
  --source output/raw \
  --bucket gs://your-project-raw \
  --prefix ecom/raw \
  --ingest-date 2024-02-15
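Conceptually, the upload step mirrors the local Parquet tree into the bucket under `<prefix>/ingest_date=<date>/`. The sketch below shows one way to do that with the `google-cloud-storage` client; the helper names and object-key layout are assumptions, not the CLI's actual behavior, and `upload_raw` requires installed credentials to run.

```python
# Sketch: mirror local Parquet files into a GCS bucket under
# <prefix>/ingest_date=<date>/. Naming scheme is an illustrative assumption.
from pathlib import Path


def object_name(prefix: str, ingest_date: str, local_path: Path, source_root: Path) -> str:
    """Build the destination object key for one local file."""
    rel = local_path.relative_to(source_root)
    return f"{prefix}/ingest_date={ingest_date}/{rel.as_posix()}"


def upload_raw(source_root: str, bucket_name: str, prefix: str, ingest_date: str) -> None:
    """Upload every local Parquet file to the bucket (needs GCS credentials)."""
    from google.cloud import storage  # requires the google-cloud-storage package

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    root = Path(source_root)
    for path in root.rglob("*.parquet"):
        blob = bucket.blob(object_name(prefix, ingest_date, path, root))
        blob.upload_from_filename(str(path))
```

Keeping the key-building logic in a pure function like `object_name` makes the layout easy to unit-test with Pytest without touching real cloud storage.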

Why It Matters for Stakeholders

Resources & Next Steps