🛠️ How SQL Stories Are Made

🎯 Purpose

This document explains how each SQL Story in this repo is created using a mix of synthetic data generation, structured business framing, and AI-assisted scenario design. The goal is to simulate realistic business questions and data challenges that strengthen SQL and analytical fluency.

🧬 Data Generation Strategy

All stories are powered by a dataset produced using the companion project:
➡️ ecom_sales_data_generator

That repo provides:

Modular Python scripts for data simulation
Scenario-based YAML configurations
Controlled injection of data messiness

🗂️ This repository includes database.zip, which contains the pre-built SQLite databases. The generator's output includes:

Clean CSVs for each table (orders, order_items, returns, etc.)
A zip archive with loading assets: ecom_data_gen_output/database.zip

Inside that zip:

*.csv files (one per table)
load_data.sql, which builds the schema and loads the CSVs into SQLite
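
For readers who prefer to load the tables from Python, the following is a minimal sketch and not the repo's own loader. It assumes the archive has already been extracted into a folder of CSVs with a header row each, and it bypasses load_data.sql entirely, since that script's contents are not shown here. The function name load_csvs_into_sqlite is hypothetical, and every column is stored as TEXT for simplicity.

```python
import csv
import sqlite3
from pathlib import Path


def load_csvs_into_sqlite(csv_dir, conn):
    """Create one table per CSV file in csv_dir and return the table names."""
    loaded = []
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        table = csv_path.stem  # e.g. orders.csv -> table "orders"
        with csv_path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)
            if not header:
                continue  # skip empty files
            # Assumption: all columns are TEXT; the real schema may use other types.
            columns = ", ".join(f'"{name}" TEXT' for name in header)
            placeholders = ", ".join("?" for _ in header)
            conn.execute(f'DROP TABLE IF EXISTS "{table}"')
            conn.execute(f'CREATE TABLE "{table}" ({columns})')
            # Empty strings become NULL so missing values behave as NULL in SQL.
            rows = (
                [value if value != "" else None for value in row]
                for row in reader
            )
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', rows
            )
        loaded.append(table)
    conn.commit()
    return loaded
```

A typical call would extract the archive with zipfile.ZipFile("database.zip").extractall("data") and then pass "data" together with a sqlite3.connect(...) connection. Because the types are all TEXT here, numeric comparisons in later queries would need an explicit CAST.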

🧪 Mess Injection (Realism Tuning)

The data generator supports configurable “mess” levels:

none: perfectly clean, ideal for baselines or learners
medium: includes nulls, case issues, date shifts, return spikes
heavy: simulates real-world chaos, including fuzzy joins, data mismatches, and edge-case outliers

This messiness emulates POS systems or early-stage data warehouses where governance is still maturing.
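
To make the idea of a mess level concrete, here is an illustrative sketch of how level-dependent noise could be applied to rows. It is not the generator's actual implementation: the function name inject_mess, the per-level probabilities, and the choice of only two defects (nulls and case changes) are all assumptions made for this example.

```python
import random


def inject_mess(rows, level="medium", seed=0):
    """Return a copy of rows with nulls and case noise added (illustrative only)."""
    # Hypothetical per-level probabilities; the real generator's values may differ.
    rates = {"none": 0.0, "medium": 0.1, "heavy": 0.3}
    if level not in rates:
        raise ValueError(f"unknown mess level: {level!r}")
    rate = rates[level]
    rng = random.Random(seed)  # seeded so runs are reproducible
    messy = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are never mutated
        for key, value in row.items():
            if rng.random() >= rate:
                continue  # leave this field untouched
            if isinstance(value, str) and rng.random() < 0.5:
                new_row[key] = value.upper()  # case issue
            else:
                new_row[key] = None  # missing value
        messy.append(new_row)
    return messy
```

With level="none" the rows come back unchanged, which matches the idea of a clean baseline. Seeding the random generator keeps a given mess level reproducible across runs, so a story's expected answers stay stable.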

The included database was generated at the medium mess level.
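
Before starting a story, it can help to measure how much of this mess a column actually carries. The sketch below counts nulls and inconsistent casing for one text column. The helper name profile_column is hypothetical, and any table or column names passed to it (for example orders and status) are assumptions about the schema rather than documented facts.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return (null_count, case_variant_count) for one text column."""
    null_count = conn.execute(
        f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    ).fetchone()[0]
    # Count lowercase forms that appear under more than one spelling,
    # e.g. "shipped" and "SHIPPED" together count as one variant group.
    case_variants = conn.execute(
        f'''
        SELECT COUNT(*) FROM (
            SELECT LOWER("{column}") AS norm
            FROM "{table}"
            WHERE "{column}" IS NOT NULL
            GROUP BY LOWER("{column}")
            HAVING COUNT(DISTINCT "{column}") > 1
        )
        '''
    ).fetchone()[0]
    return null_count, case_variants
```

A non-zero second value suggests normalizing the column with LOWER() or TRIM() before grouping or joining on it.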

🤖 AI's Role in Story Design

AI acts as a co-author and validator, helping shape business scenarios around each dataset. Contributions include:

Business context and stakeholder goals
Analytical framing and key metrics
Prompt engineering for SQL challenges
Narrative tone and documentation

AI helps keep every story grounded, engaging, and useful, from beginner tutorials to portfolio-grade projects.