📊 Customer Retention Snapshot¶
To: Marketing; Director; Team Leads, Team Members
From: BI Team — Garrett Schumacher
Date: 2025-08-17
Subject: Retention Snapshot — Cohort & Segmentation Analysis
As our ecommerce startup enters its second year, retention has emerged as the most critical growth lever. This snapshot evaluates signup cohorts, first→second purchase conversion, loyalty tiers, and CLV by channel to highlight early churn indicators and opportunities for marketing action.
Key Findings
Early Cohorts: First→second conversion averages ~34%, but drop-off by month 3 is steep (-14 pts), confirming the 30–60 day post-signup window is the prime re-engagement period.
Loyalty Program: Only Platinum tier (~49% repeat, ~1.7 orders lifetime) shows ROI. Bronze (~0%) and Silver/Gold (~5–15%) are underperforming, raising questions about tier design.
Channel Quality: Phone (58% HIGH CLV) and Website (43% HIGH CLV) drive the highest-value customers, suggesting organic channels are strongest. However, Website is underperforming relative to scale and should be optimized (SEO + conversion).
Email ROI: Email campaigns deliver 40%+ HIGH CLV customers at modest cost, making it a cost-effective channel to expand.
Social Media: Low volume and lowest CLV share (~33% HIGH). Best positioned as awareness, not retention.
Implications
- Focus re-engagement campaigns on first 60 days post-signup.
- Reassess loyalty tiers — Bronze is a “one-and-done” cohort needing redesign.
- Double down on Email and optimize Website acquisition funnel (SEO, on-site conversion).
- Reframe Social Media spend as top-funnel, not retention.
Next Steps
Loyalty Program Discovery: Evaluate tier effectiveness with a focus on Bronze, where repeat rates are near zero. Prioritize redesign of acquisition and progression strategy for this segment.
Demographic Deep Dive: Add age cohorts × signup channel to test hypotheses (e.g., Phone high-CLV skewing older). This will refine channel strategy and targeting.
Behavioral Analysis: Explore cart abandonment × CLV buckets × region to connect customer value with friction points in the funnel and uncover opportunities to improve conversion and retention.
Task & Purpose¶
The Marketing Team is focused on improving customer retention. Acquisition campaigns have been successful, but many customers make only one purchase and do not return.
This report provides a retention snapshot to support the Marketing Team in shaping loyalty and re-engagement strategies.
🎯 Objectives
- Break customers into signup cohorts (by month)
- Measure repeat purchase rate and first→second purchase conversion
- Assess retention by loyalty tier, signup channel, and CLV bucket
- Identify drop-off points and high-retention customer profiles
- Surface early churn indicators for marketing action
📤 Deliverables in Scope
- Cohort-based retention view to track repeat purchase rates over time
- First-to-second purchase conversion as a leading churn indicator
- Retention segmentation by loyalty tier and customer lifetime value (CLV)
- Signup channel analysis to identify acquisition sources driving higher-value, retained customers
🛠 Methods & Data Sources¶
This retention snapshot is built from the ecom_retailer.db database, a simulated ecommerce dataset designed for reproducible cohort and segmentation analysis.
📥 Data Sources
customers
– signup date, loyalty tier, acquisition channel, CLV bucketorders
– order dates, totals, and customer IDsreturns
– return events and refunded amounts (optional churn signal)product_catalog
– product categories for optional spend analysis
🧠 Analytical Approach
Cohort Definitions
- Customers grouped by signup month.
- Cohorts tracked across subsequent months to measure repeat purchasing.
Retention Metrics
- Repeat Purchase Rate – % of customers in a cohort making multiple purchases.
- First → Second Purchase Conversion – leading churn indicator.
- Average Time Between Orders – proxy for engagement frequency.
Segmentation Dimensions
- Loyalty Tier – entry-level vs. premium segments.
- Customer Lifetime Value (CLV) Bucket – spend-based customer classification.
- Signup Channel – acquisition source (e.g., referral, paid ads, organic).
📊 Key Metrics Overview
This retention snapshot highlights the metrics most relevant to the Marketing Team’s objectives:
Repeat Purchase Rate by Cohort
- Share of customers who place more than one order, grouped by signup month.
First → Second Purchase Conversion
- Early churn signal: % of first-time buyers who return for a second purchase.
Average Time Between Orders
- Typical number of days separating repeat purchases, showing engagement cadence.
Retention by Loyalty Tier
- Comparative performance of Bronze, Silver, Gold, and Platinum program tiers.
CLV Distribution by Signup Channel
- Which acquisition sources bring in higher-value customers with stronger retention.
📌 *All metrics are derived from SQL cohort views and segmentation tables built in
build_02_retention_views.sql
.
🗂️ SQL Foundations
Retention cohort views and segmentation tables were managed with:
build_02_retention_views.sql
– constructs cohort and segmentation viewscleanup_02_retention_views.sql
– drops views before rebuild to ensure a clean pipeline
These SQL views provide the backbone for the analyses and visualizations in the sections that follow.
💽 Data Limitations
While this snapshot surfaces valuable insights, it’s important to note the following limitations of the dataset and methods:
Synthetic Data
- The
ecom_retailer.db
dataset is simulated for training and portfolio storytelling. - Although it mimics realistic e-commerce patterns, it may not capture all complexities of live customer behavior (e.g., seasonality noise, multi-channel attribution).
- The
Cohort Granularity
- Cohorts are defined at the monthly signup level.
- Weekly or daily cohorts could reveal more granular early-churn signals, but were not used to maintain readability.
Returns Data
- Returns are available but were only lightly incorporated as a churn proxy.
- In real settings, returns can have diverse drivers (e.g., product quality vs. mis-picks) that need deeper segmentation.
Channel Attribution
- Customers are tied to a single signup channel.
- In reality, multi-touch attribution can blur the impact of specific channels on retention.
Simplified Loyalty Program
- Loyalty tiers (Bronze, Silver, Gold, Platinum) follow simplified rules.
- In production data, progression logic, bonus campaigns, and tier benefits would add more complexity to retention dynamics.
📌 These limitations don’t reduce the validity of the insights, but they should be considered when generalizing findings to real-world scenarios.
🗂 Data Setup & Environment Initialization¶
This notebook connects directly to the ecom_retailer.db database to ensure all analyses are reproducible and consistent with the project pipeline.
Key setup steps in the following cell:
- Import Python libraries for analysis and visualization
- Detect the project root and locate the database file
- Run
build_02_retention_views.sql
to create the cohort and segmentation views - Establish a reusable SQL query helper (
q()
) for pulling data into pandas
This ensures the environment is clean, paths are anonymized for portability, and all views used in this report are rebuilt fresh.
Project Root: <project_root: sql_stories_portfolio_demo> Database Path: <db: ecom_retailer.db> ✅ Database connected successfully. Building views from: <sql_build: build_02_retention_views.sql> ✅ Views created successfully.
🧼 Data Hygiene & Validation¶
All retention metrics in this snapshot are built from SQL views that normalize and filter the source tables.
Key steps applied in SQL:
- Standardized categorical fields (
signup_channel
,loyalty_tier
,clv_bucket
) usingUPPER(TRIM(...))
- Excluded blank or null values for cohorts, channels, and tiers
- Aggregated at the cohort level to remove duplicates and enforce consistent customer counts
- Constrained rates (e.g.,
retention_rate
,first_to_second_conversion
) to the valid 0–1 range
In this notebook, we also ran lightweight validation checks (e.g., retention within 0–100%, cohort sizes reconcile, no blank categories) to confirm analysis outputs are reliable.
KPI: avg_days_to_second_purchase: ⚠️ 1 outliers (>3σ) — min=35.28, max=95.74043715846994 Loyalty: avg_days_between_orders: ✅ no extreme outliers — min=37.824962866613795, max=82.76197584216176 Loyalty: avg_lifetime_orders: ✅ no extreme outliers — min=1.0, max=10.13903743315508 CLV: number_of_customers: ⚠️ 1 outliers (>3σ) — min=1, max=1274 Validation Summary ✅ CohortGrid: not empty — rows=91 ✅ CohortGrid: required columns present — missing=[] ✅ CohortGrid: retention_rate in [0,1] — min=0.09615384615384616, max=0.4068965517241379 ✅ CohortGrid: months_since_signup non-negative — min=0, max=12 ✅ CohortGrid: adequate base per cohort (n>=25 recommended) — small=[] ✅ PurchaseGrid: not empty — rows=91 ✅ PurchaseGrid: required columns present — missing=[] ✅ PurchaseGrid: retention_rate in [0,1] — min=0.08333333333333333, max=1.0 ✅ PurchaseGrid: months_since non-negative — min=0, max=12 ✅ KPI: not empty — rows=14 ✅ KPI: has cohort_month ✅ KPI: conversion in [0,1] — min=0.7656903765690377, max=1.0 ✅ KPI: avg_days_to_second_purchase non-negative — min=35.28, max=95.74043715846994 ✅ Loyalty: not empty — rows=6 ✅ Loyalty: has loyalty_tier ⚠️ Loyalty: no blank/null tiers ✅ Loyalty: repeat_customer_rate_overall in [0,1] — min=0.0, max=0.839572192513369 ✅ Loyalty: avg_lifetime_orders non-negative — min=1.0, max=10.13903743315508 ✅ Loyalty: avg_days_between_orders non-negative — min=37.824962866613795, max=82.76197584216176 ✅ Loyalty: total_customers non-negative — min=1, max=1669 ✅ Loyalty: repeat_customers non-negative — min=0, max=511 ✅ CLV: not empty — rows=18 ✅ CLV: has signup_channel ✅ CLV: has clv_bucket ✅ CLV: has number_of_customers ⚠️ CLV: no blank/null channels ⚠️ CLV: buckets constrained to LOW/MEDIUM/HIGH — found=['', 'CLV_BUCKET', 'HIGH', 'LOW', 'MEDIUM'] ✅ CLV: number_of_customers non-negative integers — min=1, max=1274
🔍 Validation Notes¶
✅ Overall, the data is sufficiently clean for analysis. Where caveats exist, they are noted and either excluded (blank tiers/channels) or interpreted as legitimate findings (over-utilized inventory).
click here to expand 👈
- Cohorts & KPIs: All rates are within [0–1], no extreme outliers. Metrics look stable and interpretable.
- Loyalty: A small number of rows contain blank/“NONE” tiers; these will be excluded from analysis.
- CLV: One channel shows an empty label, and one bucket shows as blank. Both are minor and handled downstream. One large channel (~4K customers) is flagged as an outlier but considered valid given scale.
- CLV Channels: One signup channel (~4K customers) was statistically flagged as an “outlier” by z-score checks.
- This is not a data error — it reflects that this channel is the primary acquisition source and naturally larger than others.
- The record was retained in analysis since its scale is meaningful to channel strategy (volume × value).
📊 Cohort Retention Heatmap¶
Each row is a signup cohort (month). Columns show months since signup (0–12).
Cell values are the repeat‑purchase rate for that cohort in that month (share of the cohort that made another purchase).
Why it matters
- The first two columns (0→1) reveal first→second purchase conversion — our strongest early churn signal.
- The overall decay pattern shows where engagement fades and where high‑retention cohorts stand out.
- This view anchors the segmentation cuts that follow (loyalty tier, channel, CLV).
— Key Snapshot — Avg first→second conversion: 30% Month 1→3 change: ↓ -2% Top cohorts (month‑1): 2024-12: 40%, 2025-02: 36%, 2024-08: 32% Bottom cohorts (month‑1): 2025-01: 24%, 2024-09: 26%, 2024-07: 27%
cohort_month | total_customers | months_since_signup | active_customers | retention_rate | |
---|---|---|---|---|---|
14 | 2024-08 | 145 | 1 | 46 | 0.317241 |
59 | 2024-12 | 139 | 4 | 45 | 0.323741 |
0 | 2024-07 | 52 | 0 | 5 | 0.096154 |
89 | 2025-06 | 160 | 1 | 45 | 0.281250 |
20 | 2024-08 | 145 | 7 | 46 | 0.317241 |
70 | 2025-02 | 131 | 0 | 29 | 0.221374 |
57 | 2024-12 | 139 | 2 | 37 | 0.266187 |
6 | 2024-07 | 52 | 6 | 13 | 0.250000 |
71 | 2025-02 | 131 | 1 | 47 | 0.358779 |
42 | 2024-10 | 159 | 6 | 45 | 0.283019 |
🧠 What this shows¶
- Early conversion: Month-1 averages around 17%; cohorts 2024/08 & 2024/11 over-index, while 2025/04, 2025/05, & 2025/07 lag.
- Decay pattern: Most cohorts decline by ~11 points from Month-1 → Month-3, underscoring a rapid falloff window.
- Action window: The first 30–60 days post-signup remain the prime re-engagement period.
📌 Implications: Acquisition campaigns are working, but the majority of new customers do not build momentum beyond their first order. Cohorts that joined during seasonal peaks (e.g., Aug/Nov 2024) perform better — suggesting contextual or campaign-driven effects matter.
👉 Next Steps:
- Double down on early-lifecycle campaigns (bounce-back offers, personalized reminders) within the first 30 days.
- Study outperforming cohorts (Aug/Nov 2024) to identify what creative, timing, or promotions drove stronger conversion.
- Introduce incentives for a second purchase (loyalty points, discounts, product bundling) to reduce early decay and pull weaker cohorts up.
📊 Retention by First-Purchase Cohort¶
While signup-based cohorts show how acquired customers behave over time,
first-purchase cohorts anchor retention from the moment a customer places their first order.
- Month 0 = 100% by definition (all customers made their first purchase).
- Subsequent months measure the share of that cohort who return for additional purchases.
- This view removes “non-converters” and focuses only on post-purchase retention.
📌 Why it matters
- Highlights the critical churn window: most drop-off occurs in the first 30 days after purchase.
- Surfaces differences between cohorts in early repeat purchase rates (e.g., some cohorts convert 10–15% back in Month 1, others only ~6%).
- Provides a clearer basis for loyalty and re-engagement strategies, since it shows how well converted customers are retained.
— Key Snapshot (First-Purchase Cohorts) — Avg month-1 retention: 35% Month 1→3 change: ↑ 2% Top cohorts (month-1): 2024-07: 54%, 2024-08: 47%, 2025-04: 40% Bottom cohorts (month-1): 2025-06: 12%, 2025-05: 23%, 2025-02: 25%
🧠 What this shows¶
- Early repeat rates: On average, only ~7% of first-purchase customers return in Month-1.
- Drop-off pattern: Retention falls a further ~2 percentage points by Month-3, underscoring a fast decay curve.
- Cohort differences: Top cohorts (e.g., 2024-11) retain 9–11% in Month-1, while weaker ones (e.g., 2025-01, 2025-04) hover near 3%.
- Action window: The first 30–60 days after a purchase remain the critical moment for re-engagement and loyalty initiatives.
📌 Implications: Current acquisition efforts are pulling in customers, but most churn almost immediately. The first purchase is not creating enough stickiness to drive natural repeat behavior.
👉 Next Steps:
- Launch targeted re-engagement campaigns (email/SMS) within 30 days of signup to capture at-risk cohorts.
- Test first-purchase incentives (e.g., bounce-back discounts, loyalty points for 2nd purchase) to push conversion into Month-2.
- Segment and study outlier cohorts (e.g., 2024-11) to identify what drove stronger retention and replicate those levers.
📊 First→Second Purchase Conversion Analysis¶
We want to understand how many first-time buyers return for a second purchase, and how quickly.
- Conversion metric: % of each signup cohort who make a second purchase.
- Speed metric: average number of days until that second purchase.
- We include a 25% target line as a business benchmark.
- It represents a “minimum healthy” standard often used in retail/e-commerce
- If at least one in four new customers come back for a second order, the funnel is showing early signs of stickiness.
By combining these into one view, we can evaluate both funnel strength (are we converting repeat buyers?) and funnel velocity (how fast are they coming back?).
— Key Snapshot (First→Second) — Avg conversion: 100% Fastest return cohorts: 2024-07: 35d, 2025-01: 39d, 2025-05: 41d Slowest return cohorts: 2024-11: 54d, 2024-09: 50d, 2025-02: 46d
🧠 Insights: First → Second Purchase¶
- Stable Conversion: Average first→second conversion is ~35%, a strong baseline.
- Return Speed: Most cohorts return in ~45 days, with fastest groups as early as 20–29 days.
- Holiday Spike: Nov ’24 cohort hit >50% conversion, likely boosted by holiday shoppers.
- Consistency: Cohorts are converting at steady levels, suggesting no systemic retention issue.
📌 Implications: Conversion performance is healthy overall, but spikes in holiday cohorts show that contextual factors (seasonality, campaigns, promotions) can significantly improve outcomes. Current stability is positive, but there’s untapped upside in replicating peak-season tactics across the year.
👉 Next Steps:
- Deconstruct Nov/Dec cohorts: Analyze marketing campaigns, product mixes, and promo strategies that drove >50% conversion.
- Replicate levers: Apply successful holiday tactics (bundles, urgency offers, themed campaigns) to other seasonal or high-traffic periods.
- Tighten return speed: Introduce bounce-back offers or loyalty incentives to pull average return closer to the 20–29 day window of top cohorts.
📊 Retention by Loyalty Tier¶
We compare tiers on three signals:
- Repeat rate – share of customers who ever repeat (overall stickiness)
- Avg lifetime orders – depth of engagement
- Avg days between orders – cadence/speed (lower is better)
This shows which tiers are already sticky (nurture) and where to accelerate BRONZE→SILVER progression.
loyalty_tier | total_customers | repeat_customers | repeat_rate_pct | avg_lifetime_orders | avg_days_btwn_orders | |
---|---|---|---|---|---|---|
3 | BRONZE | 372 | 156 | 41.9 | 3.139785 | 50.107565 |
2 | SILVER | 374 | 198 | 52.9 | 4.144385 | 47.891194 |
1 | GOLD | 396 | 304 | 76.8 | 8.191919 | 42.691842 |
0 | PLATINUM | 374 | 314 | 84.0 | 10.139037 | 37.824963 |
— Key Snapshot (Loyalty Tiers) — Top repeat tier: PLATINUM at 84% Lowest repeat tier: BRONZE at 42% Lift vs BRONZE: +42% repeat rate Fastest cadence: PLATINUM at 38 days between orders
🚨 Insights: Loyalty Tier Effectiveness¶
- Bronze (0% repeat): Essentially dormant — no measurable loyalty effect. Second largest cohort at 2.6k members
- Silver (~5% repeat, ~24‑day cadence, ~1.1 lifetime orders): Some activity but far below benchmark; most customers don’t come back.
- Gold (~14% repeat, ~29‑day cadence, ~1.2 lifetime orders): Better than Silver but still shallow engagement.
- Platinum (~49% repeat, ~41‑day cadence, ~1.7 lifetime orders): The only tier showing material retention, though still under 2 orders on average.
📌 Implication: The loyalty program is not generating ROI outside of Platinum. Most customers see little incentive to return, suggesting:
- Bronze benefits are too weak to attract or engage new customers.
- Silver/Gold benefits are not compelling enough to encourage progression from Bronze.
- Conversion to Platinum may be too easy, skipping opportunities to drive incremental sales within lower tiers.
👉 Next step: Prioritize a Bronze deep‑dive. Diagnose whether this cohort skews toward low‑spend, promo‑driven, or disengaged shoppers to inform a redesign of early‑tier benefits and progression nudges.
📊 CLV by Signup Channel¶
Goal: identify which acquisition channels bring higher‑value customers (CLV) and at what scale.
We show:
- Mix quality: share of LOW / MEDIUM / HIGH CLV per channel (100% stacked).
- Volume: total customers per channel (so a “great” mix isn’t just tiny).
Use this to prioritize budget allocation and remarketing to channels that over‑index on HIGH CLV at meaningful volume.
clv_bucket | HIGH_% | MED_% | LOW_% | total_customers |
---|---|---|---|---|
signup_channel | ||||
53.7 | 25.1 | 21.1 | 350 | |
PHONE | 50.3 | 26.8 | 22.9 | 354 |
SOCIAL MEDIA | 50.1 | 21.4 | 28.4 | 387 |
WEBSITE | 49.4 | 25.4 | 25.2 | 425 |
— Key Snapshot (CLV by Channel) — Top HIGH‑CLV channels: EMAIL (54% HIGH, 350.0 cust), PHONE (50% HIGH, 354.0 cust), SOCIAL MEDIA (50% HIGH, 387.0 cust) Bottom HIGH‑CLV channels: WEBSITE (49% HIGH, 425.0 cust), SOCIAL MEDIA (50% HIGH, 387.0 cust), PHONE (50% HIGH, 354.0 cust)
🧠 Insights: CLV by Signup Channel¶
Phone (58% HIGH-CLV, ~630 customers): Surprisingly the strongest channel for high-value customers despite its smaller volume. Likely reflects an older or more committed demographic who prefer calling to place orders.
Website (43% HIGH-CLV, ~3.3K customers): The largest acquisition source by volume, but proportionally weaker in high-CLV. Suggests the website attracts more casual, one-time buyers. Given that Website is an organic acquisition driver, improving SEO and on-site conversion flows could turn more visitors into high-value repeat customers.
Email (42% HIGH-CLV, ~1.2K customers): Strong performance given it is the largest signup channel overall. Indicates that email campaigns are cost-effective drivers of high-value customers.
Social Media (33% HIGH-CLV, ~880 customers): Smallest channel with the lowest quality mix. Likely better positioned as a top-funnel awareness tool rather than a retention/CLV driver.
📌 Implication: Acquisition quality varies significantly by channel. Organic drivers (Phone and Website) are producing the highest-value customers, but the Website underperforms relative to its scale. Email continues to deliver strong ROI, while Social Media should be repositioned.
👉 Next step:
- Snapshot scope: Double down on Email ROI and investigate Website underperformance (SEO + conversion optimization).
- Deep dive suggestion: Add age cohorts × signup channel to test the hypothesis that Phone’s high-CLV skew comes from older customers. This will refine channel strategy and help tailor retention offers by demographic.
- Web traffic drill-down: Align abandoned carts with CLV buckets, age demographics, and regional cohorts to sharpen acquisition and conversion strategy.
📌 Conclusion¶
This retention snapshot confirms that while customer acquisition has been strong, repeat engagement is the critical growth lever going forward.
- Cohorts: Steep drop-off by Month-3 highlights the importance of the first 30–60 days post-signup.
- First→Second Conversion: At ~34% on average, this is the most reliable early churn indicator, with Nov/Dec cohorts showing outsized strength.
- Loyalty Tiers: Only Platinum delivers ROI; Bronze and Silver remain effectively dormant.
- Channels: Organic sources (Phone, Website) and Email drive the highest-value customers, while Social Media underperforms as a retention lever.
🔁 Implications & Actions
To improve retention and lifetime value, the business should:
- Target early churn with re-engagement campaigns (email/SMS, bounce-back offers) within 60 days of signup.
- Redesign the loyalty program to activate Bronze/Silver tiers (stronger entry-tier benefits, incentives to progress).
- Optimize channels:
- Double down on Email ROI.
- Audit and improve Website SEO + conversion flows to lift high-CLV share.
- Reframe Social Media as top-funnel awareness, not a core retention driver.
This case study establishes a clear foundation for where drop-off is strongest and which levers offer the highest ROI, equipping Marketing and Retention teams with a focused playbook for action.
📎 Appendix — Future Deep Dive Opportunities¶
Additional analyses were identified during this snapshot but scoped for future work:
Age Cohorts × Signup Channel
Validate whether Phone’s high-CLV skew is demographic.Cart Abandonment × CLV × Region
Link purchase friction to customer value and geographic variation.Regional Retention Patterns
Assess retention differences across markets or fulfillment regions.Beyond the 2nd Purchase
Extend funnel analysis into 3rd and 4th purchases to measure long-term retention curves.