Medallion lakehouse pipeline from Bronze ingestion through Gold marts, with layered validation and atomic publishing.
Partitioned Parquet + _MANIFEST.json lineage
dbt-duckdb Base Silver + Polars Enriched
BigQuery marts + partition idempotency
Manifest exists, row counts, schema conformance, coverage
PK/FK integrity, quarantine rate, row loss thresholds
Business rules, min rows, null analysis
Freshness gates, parallel snapshots, lightweight validation.
Full Bronze → Gold with validation stages and optional BigQuery load.
Backfill single partitions without replaying entire tables.
Fail fast on contract violations before downstream publish.
Independent Bronze, Base Silver, Enriched Silver, Gold schedules.
Specs define tables, partitions, PK/FK rules, and transform dependencies for dynamic DAG generation.
base_silver:
tables:
- name: orders
partition_key: ingestion_dt
primary_key: [order_id]
foreign_keys:
- column: customer_id
references: customers.customer_id
Per-table JSON audits for SLA dashboards and alerts.
Bronze, Silver, Enriched, and Dims snapshot reports.
Local logs + Airflow task logs with enriched context.
<2GB memory, under 2 minutes for 8 tables.
<6GB memory, under 5 minutes for 10 transforms.
Warehouse execution under 3 minutes for 8 facts.