🛠️ How SQL Stories Are Made
🎯 Purpose
This document explains how each SQL Story in this repo is created using a mix of synthetic data generation, structured business framing, and AI-assisted scenario design. The goal is to simulate realistic business questions and data challenges that strengthen SQL and analytical fluency.
🧬 Data Generation Strategy
All stories are powered by a dataset produced using the companion project:
ecom_sales_data_generator
That repo provides:
- Modular Python scripts for data simulation
- Scenario-based YAML configurations
- Controlled injection of data messiness
This repository includes database.zip, which contains the pre-built SQLite databases. The output includes:
- Clean CSVs for each table (orders, order_items, returns, etc.)
- A zip archive with loading assets: ecom_data_gen_output/database.zip
Inside that zip:
- *.csv files (one per table)
- load_data.sql to construct the schema and load into SQLite
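For quick exploration, the CSVs in the archive can also be loaded into SQLite from Python. The sketch below is not the repo's loader: the contents of load_data.sql are not shown here, so this bypasses it and creates untyped tables directly from the CSV headers. The function name load_csvs_from_zip is hypothetical, and the code assumes each CSV is UTF-8 with a header row.

```python
import csv
import io
import sqlite3
import zipfile
from pathlib import Path


def load_csvs_from_zip(zip_path, conn):
    """Load every *.csv in the archive into a same-named SQLite table.

    Assumption: each CSV has a header row. Columns are created without
    declared types, so every value is stored as text; the real schema
    lives in load_data.sql, which this sketch does not read.
    Returns a dict mapping table name to the number of rows inserted.
    """
    loaded = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip load_data.sql and any other assets
            table = Path(name).stem
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                reader = csv.reader(text)
                header = next(reader, None)
                if not header:
                    continue  # empty file, nothing to load
                columns = ", ".join(f'"{col}"' for col in header)
                placeholders = ", ".join("?" for _ in header)
                conn.execute(f'DROP TABLE IF EXISTS "{table}"')
                conn.execute(f'CREATE TABLE "{table}" ({columns})')
                # Rows whose width differs from the header are skipped.
                rows = [row for row in reader if len(row) == len(header)]
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', rows
                )
                loaded[table] = len(rows)
    conn.commit()
    return loaded


# Example usage (path taken from the text above):
# conn = sqlite3.connect("ecom.db")
# print(load_csvs_from_zip("ecom_data_gen_output/database.zip", conn))
```

Because the tables are untyped, numeric comparisons need an explicit CAST in queries. For anything beyond a first look, loading through load_data.sql is likely the better route, since it defines the intended schema.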
🧪 Mess Injection (Realism Tuning)
The data generator supports configurable “mess” levels:
- none: perfectly clean, ideal for baselines or learners
- medium: includes nulls, case issues, date shifts, return spikes
- heavy: simulates real-world chaos, including fuzzy joins, data mismatches, and edge-case outliers
This messiness emulates POS systems or early-stage data warehouses where governance is still maturing.
The included database was generated with the medium mess level.
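To make the idea of mess levels concrete, here is a minimal sketch of how level-dependent injection might work. It is not the generator's code: the function inject_mess, the MESS_RATES table, the rates themselves, and the row keys status and order_date are all hypothetical. It covers only nulls, case changes, and date shifts, and it treats heavy as the same kinds of mess at a higher rate, which is a simplification of the levels described above.

```python
import random
from datetime import date, timedelta

# Hypothetical per-level probability that a given field is altered.
# These rates are illustrative, not the generator's actual settings.
MESS_RATES = {"none": 0.0, "medium": 0.1, "heavy": 0.3}


def inject_mess(rows, level="medium", seed=42):
    """Return a messy copy of rows; the input list is left unchanged.

    Assumption: each row is a dict with a non-null string "status" and
    a datetime.date "order_date".
    """
    if level not in MESS_RATES:
        raise ValueError(f"unknown mess level: {level!r}")
    rate = MESS_RATES[level]
    rng = random.Random(seed)  # seeded so runs are reproducible
    messy = []
    for row in rows:
        row = dict(row)  # shallow copy so the clean input stays intact
        if rng.random() < rate:
            row["status"] = None  # missing value
        elif rng.random() < rate:
            row["status"] = row["status"].upper()  # case inconsistency
        if rng.random() < rate:
            # shift the date by up to three days in either direction
            row["order_date"] += timedelta(days=rng.randint(-3, 3))
        messy.append(row)
    return messy


# Example usage with a tiny hand-made sample
clean_rows = [
    {"status": "shipped", "order_date": date(2024, 1, 15)},
    {"status": "returned", "order_date": date(2024, 1, 16)},
]
messy_rows = inject_mess(clean_rows, level="heavy", seed=1)
```

Seeding the random generator keeps a given level reproducible, so the same mess can be regenerated when a story's queries need to be rechecked.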
🤖 AI's Role in Story Design
AI acts as a co-author and validator, helping shape business scenarios around each dataset. Contributions include:
- Business context and stakeholder goals
- Analytical framing and key metrics
- Prompt engineering for SQL challenges
- Narrative tone and documentation
AI helps keep every story grounded, engaging, and useful, from beginner tutorials to portfolio-grade projects.