Survival Analysis Case Report - Telecom Customer Churn Prediction

Author: Chen Sijia

Dataset: IBM Telco Customer Churn

Tutorial Link: https://github.com/databricks-industry-solutions/survival-analysis

Environment: PySpark Environment

Introduction
Data Overview
Survival Analysis Methods
Kaplan-Meier Survival Analysis
Cox Proportional Hazards Model
Accelerated Failure Time Model (AFT)
Customer Lifetime Value (CLV)
Conclusion
Business Recommendations
Model Limitations
Appendix

1. Introduction

1.1 What is Survival Analysis?

Survival Analysis is a collection of statistical methods for studying “time-to-event” data. Although originally applied in medicine (studying patient survival time), it is now widely used in:

Telecommunications: Customer churn prediction
Manufacturing: Equipment failure prediction
Finance: Loan default time prediction
E-commerce: User activation time prediction

1.2 Project Overview

This case uses survival analysis to predict churn time for telecom customers, helping enterprises:

Identify customers with high churn risk
Take retention actions at critical time points
Optimize customer retention strategies

2. Data Overview

2.1 Data Source

Dataset: IBM Telco Customer Churn Dataset
Original records: 7,043
Analysis sample size: 3,351 (month-to-month contract + internet service customers)

2.2 Sample Characteristics

Metric	Value
Number of churned customers	1,556
Churn rate	46.4%
Observation time range	0-72 months

3. Survival Analysis Methods

3.1 Kaplan-Meier Estimator

Basic Principle:
Non-parametric method to estimate the survival function S(t) = P(T > t)

Formula:
S(t) = ∏(1 - d_j / n_j)

Where:

d_j: number of events (churns) at time point j
n_j: number at risk just before time point j

Advantages:

No distributional assumptions
Handles censored data
Intuitive and easy to interpret

Application: Visualize survival curves for different groups, compare differences between groups

3.2 Cox Proportional Hazards Model

Basic Principle:
Semi-parametric model to analyze the effect of multiple covariates on survival time

Model Form:
h(t|X) = h₀(t) × exp(β₁X₁ + … + βₚXₚ)

Where:

h(t X): hazard function
h₀(t): baseline hazard
β: covariate coefficients
exp(β): hazard ratio (HR)

Advantages:

Analyzes multiple factors simultaneously
No need to specify baseline hazard
HR < 1 indicates protective factor, HR > 1 indicates risk factor

Application: Identify key factors influencing customer churn

3.3 Accelerated Failure Time (AFT) Model

Basic Principle:
Parametric model assuming covariates accelerate or decelerate survival time

Model Form:
T = exp(β₁X₁ + … + βₚXₚ + σ·ε)

Where:

T: survival time
exp(β): time acceleration factor (>1 extends, <1 shortens)
ε: error term

Advantages:

Directly predicts survival time
Handles multiple distributions (Weibull, LogNormal, LogLogistic)

Application: Predict customer churn probability at specific time points

3.4 Log-Rank Test

Basic Principle:
Chi-square test to test whether multiple survival curves are statistically equivalent

Null Hypothesis: No significant difference among survival curves of groups

Application: Verify whether survival curves differ significantly across groups

3.5 Method Comparison

Method	Type	Purpose	Output
KM	Non-parametric	Estimate survival function	Survival probability
Cox	Semi-parametric	Analyze influencing factors	Hazard ratio (HR)
AFT	Parametric	Predict survival time	Time acceleration factor
Log-rank	Non-parametric test	Compare differences between groups	test_statistic, p, -log2(p)

4. Kaplan-Meier Survival Analysis

4.1 Analysis Workflow

from lifelines import KaplanMeierFitter
from lifelines.statistics import pairwise_logrank_test

# Overall KM fit
kmf = KaplanMeierFitter()
kmf.fit(telco_pd['tenure'], telco_pd['churn'].astype(float))

# Group KM and log-rank test
def plot_km(col):
    for r in telco_pd[col].unique():
        ix = telco_pd[col] == r
        kmf.fit(telco_pd.loc[ix, 'tenure'], telco_pd.loc[ix, 'churn'], label=r)
        kmf.plot()

def print_logrank(col):
    log_rank = pairwise_logrank_test(telco_pd['tenure'], telco_pd[col], telco_pd['churn'])
    print(log_rank.summary)

# Perform group analysis
for col in categorical_cols:
    plot_km(col)
    print_logrank(col)

4.2 Analysis Results

4.2.1 Overall Survival Curve

Median survival time: 34 months
Interpretation: 50% of customers churn within 34 months

Overall survival curve Figure 1: Overall Kaplan-Meier survival curve

4.2.2 DSL Internet Service Survival Probability (first 10 months)

Month	DSL Survival Probability
0	1.000000
1	0.902698
2	0.864380
3	0.834702
4	0.810522
5	0.794352
6	0.783900
7	0.776362
8	0.768486
9	0.750833

4.2.3 Group Survival Analysis Results

Gender

Metric	Value
test_statistic	2.038938
p-value	0.153317
-log2(p)	2.705414

Conclusion: p > 0.05, survival curves for different genders are not significantly different

Gender group survival curve Figure 2: Survival curve by gender

Senior Citizen Status (seniorCitizen)

Metric	Value
test_statistic	0.125471
p-value	0.723174
-log2(p)	0.467584

Conclusion: p > 0.05, senior citizen status has no significant impact on customer retention

Senior citizen group survival curve Figure 3: Survival curve by senior citizen status

Partner Status (partner)

Metric	Value
test_statistic	135.758896
p-value	2.252911e-31
-log2(p)	101.807981

Conclusion: Customers with partners have significantly longer retention time

Partner status group survival curve Figure 4: Survival curve by partner status

Dependents Status (dependents)

Metric	Value
test_statistic	35.031241
p-value	3.244576e-09
-log2(p)	28.199323

Conclusion: Customers with dependents have significantly longer retention time

Dependents status group survival curve Figure 5: Survival curve by dependents status

Phone Service (phoneService)

Metric	Value
test_statistic	1.683709
p-value	0.194432
-log2(p)	2.36266

Conclusion: p > 0.05, having phone service has no significant impact on retention

Phone service group survival curve Figure 6: Survival curve by phone service

Multiple Lines Service (multipleLines)

Group Comparison	test_statistic	p-value
No phone service vs No	12.382712	4.333273e-04
No vs Yes	72.358368	1.794602e-17
No phone service vs Yes	1.500291	0.2206266

Conclusion: Multiple lines service has a significant impact on retention

Multiple lines service group survival curve Figure 7: Survival curve by multiple lines service

Internet Service Type (internetService)

Metric	Value
test_statistic	25.172866
p-value	5.241449e-07
-log2(p)	20.863531

Conclusion: DSL customers retain significantly better than fiber optic customers

Internet service group survival curve Figure 8: Survival curve by internet service type

Streaming TV (streamingTV)

Metric	Value
test_statistic	12.93926
p-value	0.000322
-log2(p)	11.601718

Conclusion: Customers with streaming TV service retain significantly better

Streaming TV group survival curve Figure 9: Survival curve by streaming TV

Streaming Movies (streamingMovies)

Metric	Value
test_statistic	17.941685
p-value	0.000023
-log2(p)	15.422016

Conclusion: Customers with streaming movies service retain significantly better

Streaming movies group survival curve Figure 10: Survival curve by streaming movies

Online Security Service (onlineSecurity)

Metric	Value
test_statistic	141.60316
p-value	1.187554e-32
-log2(p)	106.053706

Conclusion: Customers with online security service have significantly longer retention time

Online security group survival curve Figure 11: Survival curve by online security

Online Backup Service (onlineBackup)

Metric	Value
test_statistic	189.482865
p-value	4.122979e-43
-log2(p)	140.799221

Conclusion: Customers with online backup service have significantly longer retention time

Online backup group survival curve Figure 12: Survival curve by online backup

Device Protection Service (deviceProtection)

Metric	Value
test_statistic	71.496825
p-value	2.777047e-17
-log2(p)	54.999226

Conclusion: Customers with device protection service have significantly longer retention time

Device protection group survival curve Figure 13: Survival curve by device protection

Tech Support Service (techSupport)

Metric	Value
test_statistic	90.430334
p-value	1.916059e-21
-log2(p)	68.822348

Conclusion: Customers with tech support service have significantly longer retention time

Tech support group survival curve Figure 14: Survival curve by tech support

Paperless Billing (paperlessBilling)

Metric	Value
test_statistic	8.340802
p-value	0.003876
-log2(p)	8.011049

Conclusion: Customers using paperless billing have higher churn risk

Paperless billing group survival curve Figure 15: Survival curve by paperless billing

Payment Method (paymentMethod)
- Conclusion: Payment method has a highly significant impact on retention; electronic check is a high-risk payment method

Group Comparison	test_statistic	p-value	-log2(p)
Bank transfer (automatic) vs Credit card (automatic)	0.061543	8.040732e-01	0.314601
Bank transfer (automatic) vs Electronic check	91.191889	1.303937e-21	69.377616
Bank transfer (automatic) vs Mailed check	43.536998	4.160192e-11	34.484559
Credit card (automatic) vs Electronic check	79.991082	3.761035e-19	61.205504
Credit card (automatic) vs Mailed check	39.684613	2.984678e-10	31.641706
Electronic check vs Mailed check	0.898320	3.432326e-01	1.542741

Payment method group survival curve Figure 16: Survival curve by payment method

4.2.4 Key Findings

Overall customer retention level
The target customer segment has a median survival time of 34 months, meaning 50% of customers churn within 34 months of joining the network, indicating a relatively high overall churn risk.
Factors with no significant impact on retention
Gender, senior citizen status, and phone service subscription do not significantly affect customer retention (p > 0.05).
Protective services that significantly extend customer retention
Online backup, online security, tech support, and device protection all significantly reduce churn risk, with:
- Online backup having the strongest effect (log-rank statistic as high as 189.48)
- Online security second
- Tech support also being a core protective factor
Service type differences
- DSL customers retain significantly better than fiber optic customers; fiber optic customers are a key churn concern.
- Customers with streaming TV and streaming movies have significantly better retention.
Impact of customer personal characteristics
Customers with partners or dependents have lower churn risk; family-type customers are more stable.
Billing and payment method risk signals
- Customers using paperless billing have higher churn risk.
- Electronic check payment is the highest-risk payment method; automatic deductions (bank transfer/credit card) yield the best retention.
Summary of high-risk customer profile
Customers without a partner, without dependents, using fiber optic internet service, not purchasing value-added services (security/backup/tech support/device protection), and paying by electronic check are the highest churn risk group in this analysis.

5. Cox Proportional Hazards Model

5.1 Analysis Workflow

from lifelines import CoxPHFitter

# Data preparation and One-Hot encoding
encode_cols = ['dependents', 'internetService', 'onlineBackup', 'techSupport', 'paperlessBilling']
encoded_pd = pd.get_dummies(telco_pd, columns=encode_cols, prefix=encode_cols, drop_first=False)

# Select variables
survival_pd = encoded_pd[['churn', 'tenure', 'dependents_Yes', 
                          'internetService_DSL', 'onlineBackup_Yes', 'techSupport_Yes']]

# Fit Cox model
cph = CoxPHFitter(alpha=0.05)
cph.fit(survival_pd, 'tenure', 'churn')

# Output results
cph.print_summary()
cph.plot(hazard_ratios=True)

# Proportional hazards assumption test
cph.check_assumptions(survival_pd, p_value_threshold=0.05)
cph.check_assumptions(survival_pd, p_value_threshold=0.05, show_plots=True)

5.2 Analysis Results

5.2.1 Model Overview

5.2.2 Model Coefficients and Hazard Ratio Analysis

Figure 17: Cox model hazard ratios (HR<1 indicates protective factor, with 95% CI)

Variable	coef	exp(coef)	se(coef)	coef 95% CI	exp(coef) 95% CI	z	p-value	-log2(p)	Significance
dependents_Yes	-0.33	0.72	0.07	[-0.47, -0.19]	[0.63, 0.83]	-4.64	<0.005	18.12	***
internetService_DSL	-0.22	0.80	0.06	[-0.33, -0.10]	[0.72, 0.90]	-3.68	<0.005	12.07	***
onlineBackup_Yes	-0.78	0.46	0.06	[-0.89, -0.66]	[0.41, 0.52]	-13.13	<0.005	128.37	***
techSupport_Yes	-0.64	0.53	0.08	[-0.79, -0.49]	[0.46, 0.61]	-8.48	<0.005	55.36	***

Significance markers: ** p<0.001, ** p<0.01, * p<0.05

Scaled Schoenfeld residuals plot
Figure 18: Scaled Schoenfeld residual plots for each variable (with both rank and km time transformation methods)

5.2.3 Proportional Hazards Assumption Test Results

Variable	Test Method	Test Statistic	p-value	-log2(p)	Assumption Check
dependents_Yes	km	1.48	0.22	2.16	Pass
dependents_Yes	rank	0.81	0.37	1.44	Pass
internetService_DSL	km	20.98	<0.005	17.72	Violated
internetService_DSL	rank	26.71	<0.005	22.01	Violated
onlineBackup_Yes	km	17.80	<0.005	15.31	Violated
onlineBackup_Yes	rank	17.47	<0.005	15.07	Violated
techSupport_Yes	km	8.09	<0.005	7.81	Violated
techSupport_Yes	rank	13.76	<0.005	12.23	Violated

The following variables violate the proportional hazards assumption:

internetService_DSL: p-value < 5e-05
onlineBackup_Yes: p-value < 5e-05
techSupport_Yes: p-value = 0.0002

Remedial suggestion: When modeling, use strata=['internetService_DSL', 'onlineBackup_Yes', 'techSupport_Yes'] to stratify variables that violate the assumption, improving model reliability.

Log-log KM curves
Figure 19: Log-log Kaplan-Meier curves for each variable group, used to visually verify the proportional hazards assumption

5.2.4 Key Findings

Protective factors (reducing customer churn risk)
All variables included in this model are protective factors against customer churn, ordered by effect strength as follows:
- onlineBackup_Yes: HR=0.46, reduces customer churn risk by 54.0% (p<0.001) – strongest churn inhibition factor
- techSupport_Yes: HR=0.53, reduces customer churn risk by 47.2% (p<0.001)
- dependents_Yes: HR=0.72, reduces customer churn risk by 28.0% (p<0.001)
- internetService_DSL: HR=0.80, reduces customer churn risk by 19.5% (p<0.001)
Risk factors
In the Cox regression model constructed in this study, no risk factors with HR>1.2 and statistical significance were identified. All included features showed a positive effect on customer retention.

6. Accelerated Failure Time Model (AFT)

6.1 Analysis Workflow

from lifelines import LogLogisticAFTFitter

# Data preparation and One-Hot encoding
encode_cols = ['partner', 'multipleLines', 'internetService', 'onlineSecurity', 
               'onlineBackup', 'deviceProtection', 'techSupport', 'paymentMethod']
encoded_pd = pd.get_dummies(telco_pd, columns=encode_cols, prefix=encode_cols, drop_first=False)

# Select variables
survival_pd = encoded_pd[['churn', 'tenure', 'partner_Yes', 'multipleLines_Yes',
                          'internetService_DSL', 'onlineSecurity_Yes', 'onlineBackup_Yes',
                          'deviceProtection_Yes', 'techSupport_Yes',
                          'paymentMethod_Bank transfer (automatic)',
                          'paymentMethod_Credit card (automatic)']]

# Fit LogLogistic AFT model
aft = LogLogisticAFTFitter()
aft.fit(survival_pd, duration_col='tenure', event_col='churn')

# Output results
print(f"Median Survival Time:{np.exp(aft.median_survival_time_):.2f}")
aft.print_summary()
aft.plot()

6.2 Analysis Results

6.2.1 Model Results

Metric	Value
model	lifelines.LogLogisticAFTFitter
duration col	tenure
event col	churn
baseline estimation	breslow
number of observations	3351
number of events observed	1556
log-likelihood	-6838.36
Concordance	0.73
AIC	13698.72
log-likelihood ratio test	877.49 (df=9)
-log2(p) of ll-ratio test	605.78

AFT model coefficient plot Figure 20: AFT model coefficients and confidence intervals

6.2.2 AFT Model Coefficient Table

Variable	coef	exp(coef)	se(coef)	z	p	-log2(p)
deviceProtection_Yes	0.48	1.62	0.07	6.88	<0.005	-
internetService_DSL	0.38	1.47	0.08	4.98	<0.005	-
multipleLines_Yes	0.66	1.94	0.07	9.64	<0.005	-
onlineBackup_Yes	0.81	2.25	0.07	11.63	<0.005	-
onlineSecurity_Yes	0.86	2.37	0.09	10.12	<0.005	-
partner_Yes	0.68	1.97	0.07	10.21	<0.005	-
paymentMethod_Bank transfer	0.74	2.10	0.09	8.05	<0.005	-
paymentMethod_Credit card	0.80	2.22	0.10	8.36	<0.005	-
techSupport_Yes	0.69	1.99	0.09	7.90	<0.005	-
Intercept	1.59	4.91	0.07	24.47	<0.005	-
beta_Intercept	0.12	1.13	0.02	5.71	<0.005	-

6.2.3 Model Assumption Verification - Log-odds Plots

Log-odds plot - partner Figure 21: Log-odds plot (partner)

Log-odds plot - multipleLines Figure 22: Log-odds plot (multipleLines)

Log-odds plot - internetService Figure 23: Log-odds plot (internetService)

Log-odds plot - onlineSecurity Figure 24: Log-odds plot (onlineSecurity)

Log-odds plot - onlineBackup Figure 25: Log-odds plot (onlineBackup)

Log-odds plot - deviceProtection Figure 26: Log-odds plot (deviceProtection)

Log-odds plot - techSupport Figure 27: Log-odds plot (techSupport)

Log-odds plot - paymentMethod Figure 28: Log-odds plot (paymentMethod)

6.2.4 Reliability Warnings

Warning 1: Predicted value (135.5) exceeds 1.5 times the data range (72.0)
Warning 2: Large discrepancy from Kaplan-Meier result (34.0), ratio = 3.99

6.2.5 Recommendation

Do not use AFT model results for business decisions. Use Kaplan-Meier results (34 months) instead.

7. Customer Lifetime Value (CLV)

7.1 Calculation Workflow

def calculate_customer_lifetime_value(cph, monthly_profit=30, discount_rate=0.10):
    # Define baseline customer
    baseline_customer = pd.DataFrame([{
        'dependents_Yes': 0, 'internetService_DSL': 0,
        'onlineBackup_Yes': 0, 'techSupport_Yes': 0
    }])
    
    irr = discount_rate / 12
    survival_func = cph.predict_survival_function(baseline_customer)
    
    # Build cohort table
    cohort_df = pd.concat([pd.DataFrame([1.00]), round(survival_func, 2)])
    cohort_df = cohort_df.rename(columns={0: 'Survival Probability'})
    cohort_df['Contract Month'] = cohort_df.index.astype('int')
    cohort_df['Monthly Profit for the Selected Plan'] = monthly_profit
    cohort_df['Avg Expected Monthly Profit'] = round(cohort_df['Survival Probability'] * monthly_profit, 2)
    cohort_df['NPV of Avg Expected Monthly Profit'] = round(
        cohort_df['Avg Expected Monthly Profit'] / ((1 + irr) ** cohort_df['Contract Month']), 2
    )
    cohort_df['Cumulative NPV'] = cohort_df['NPV of Avg Expected Monthly Profit'].cumsum()
    cohort_df['Contract Month'] = cohort_df['Contract Month'] + 1
    
    return cohort_df.set_index('Contract Month')

7.2 Calculation Results

7.2.1 Calculation Parameters

Monthly profit per customer: $30
Annual discount rate: 10%
Monthly discount rate: 0.83%
Forecast time horizon: 72 months

7.2.2 CLV Key Node Results

Time Horizon	Cumulative NPV (CLV)
12 months	$266.88
24 months	$405.44
36 months	$515.01
Lifetime CLV	$626.69

CLV payback period analysis Figure 29: Payback period analysis

CLV survival probability curve Figure 30: Survival probability curve

7.2.3 CLV Trend Table (complete data for first 25 months)

Contract Month	Survival Probability	Monthly Profit	Avg Expected Monthly Profit	NPV	Cumulative NPV
1	1.00	30	30.00	30.00	30.00
2	0.87	30	26.10	25.88	55.88
3	0.81	30	24.30	23.90	79.78
4	0.77	30	23.10	22.53	102.31
5	0.74	30	22.20	21.48	123.79
6	0.71	30	21.30	20.43	144.22
7	0.69	30	20.70	19.69	163.91
8	0.67	30	20.10	18.97	182.88
9	0.65	30	19.50	18.25	201.13
10	0.63	30	18.90	17.54	218.67
11	0.60	30	18.00	16.57	235.24
12	0.59	30	17.70	16.16	251.40
13	0.57	30	17.10	15.48	266.88
14	0.55	30	16.50	14.81	281.69
15	0.54	30	16.20	14.42	296.11
16	0.52	30	15.60	13.77	309.88
17	0.51	30	15.30	13.40	323.28
18	0.50	30	15.00	13.03	336.31
19	0.48	30	14.40	12.40	348.71
20	0.47	30	14.10	12.04	360.75
21	0.46	30	13.80	11.69	372.44
22	0.45	30	13.50	11.34	383.78
23	0.44	30	13.20	11.00	394.78
24	0.43	30	12.90	10.66	405.44
25	0.42	30	12.60	10.32	415.76

7.2.4 Key Findings

Customer Lifetime Value (CLV): The cumulative net present value (NPV) for the baseline customer over 72 months is $626.69, a core reference metric for setting customer acquisition cost limits.
Revenue growth trend: Customer CLV grows rapidly to $266.88 in the first 12 months, to $405.44 by 24 months, and reaches $515.01 by 36 months, then growth slows, indicating the early period is critical for value contribution.
Survival probability decay: Customer survival probability continuously declines over time, from 1.00 in the first month to 0.43 by 24 months, reflecting the long-term trend of customer churn.
Impact of expected profit and discounting: Due to decaying survival probability and the discount rate, the average expected monthly profit per customer gradually declines from $30.00 in the first month to $12.90 by 24 months, and the growth rate of NPV also slows.
Business decision recommendations: Customer acquisition cost (CAC) should be controlled within 30% of CLV (approximately $188) to ensure profitability of customer relationships; at the same time, focus on implementing customer retention strategies within the first 24 months to maximize long-term customer value.

8. Conclusion

8.1 Model Applicability and Reliability Assessment

Based on the IBM Telco Customer Churn dataset, this study systematically quantifies churn behavior of month-to-month internet service customers using Kaplan-Meier estimation, Cox proportional hazards regression, Accelerated Failure Time (AFT) models, and the Customer Lifetime Value (CLV) framework. Main model evaluation conclusions are as follows:

Model	Reliability	Primary Use	Key Output
Kaplan-Meier estimation	✅ Highly reliable	Non-parametric survival function estimation	Median survival time: 34 months
Cox proportional hazards model	✅ Reliable	Multi-factor hazard ratio analysis	Concordance Index: 0.64; HR(onlineBackup)=0.46
LogLogistic AFT model	❌ Unreliable	Parametric survival time prediction	Predicted median survival time 135.5 months (beyond observation range)
CLV framework	✅ Usable	Long-term customer value quantification	72-month cumulative NPV: $626.69

Overall judgment: The Kaplan-Meier and Cox models provide robust core analytical conclusions for this study; the AFT model is not suitable for business decisions due to extrapolation beyond supported data range; the CLV framework, while informative, depends on the predictive ability of the Cox model.

8.2 Core Empirical Findings

(1) Overall customer retention level

The target customer segment (month-to-month + internet service users) has a median survival time of 34 months. This indicates that 50% of customers in this segment churn within 34 months after joining the network, representing a relatively high overall churn risk.

(2) Identification of key protective factors (based on Cox model)

Four variables were identified as significant protective factors, ordered by effect strength:

Protective Factor	Hazard Ratio (HR)	Reduction in Churn Risk	Statistical Significance
onlineBackup_Yes	0.46	54.0%	p < 0.001
techSupport_Yes	0.53	47.2%	p < 0.001
dependents_Yes	0.72	28.0%	p < 0.001
internetService_DSL	0.80	19.5%	p < 0.001

These results indicate that online backup and tech support services are the two most effective interventions for reducing customer churn risk. The Cox model’s Concordance Index is 0.64, indicating moderate discriminative ability.

(3) High-risk customer profile

Combining KM group analysis and marginal effects from the Cox model, high-risk churn customers exhibit the following typical characteristics:

Demographic characteristics: No partner, no dependents
Service usage characteristics: Use fiber optic internet service, not subscribed to value-added services such as online backup/online security/device protection/tech support
Payment behavior characteristics: Pay by electronic check, use paperless billing

Log-rank test results show that the between-group difference for partner status is 135.76 (p < 2.25e-31), for dependents is 35.03 (p < 3.24e-09), and for fiber vs. DSL users is 25.17 (p < 5.24e-07), all statistically significant.

(4) Customer Lifetime Value (CLV)

The 72-month cumulative NPV for the baseline customer (not subscribed to any value-added services) predicted by the Cox model is $626.69. Of this, the first 12 months contribute $266.88 (42.6% of total value), and the first 24 months contribute $405.44 (64.7% of total value), indicating that customer value is concentrated in the first two years after joining.

8.3 Summary of Methodological Limitations

Unreliability of the AFT model: The LogLogistic AFT model predicted a median survival time (135.5 months) significantly exceeding the actual observed range (0–72 months), with a ratio of 3.99 compared to the KM estimate (34 months). This deviation arises from the combination of high censoring rate and insufficient observation window, limiting the model’s extrapolation ability.
Partial violation of proportional hazards assumption: In the Cox model, the variables internetService_DSL, onlineBackup_Yes, and techSupport_Yes did not pass the proportional hazards test (p < 0.05), suggesting that the effects of these variables may change over time. Stratified Cox or time-varying covariate models are recommended to address this.
Sample selection bias: This study includes only month-to-month contract customers who subscribe to internet services. Conclusions cannot be directly generalized to long-term contract customers or those without internet service.

9. Business Recommendations

9.1 Short-term Operational Strategies (0–6 months)

(1) Value-added service promotion plan

Based on the hazard ratio estimates from the Cox model, online backup (HR=0.46) and tech support (HR=0.53) are the most effective risk mitigation tools. Recommendations:

Implement bundling strategies for new customers, offering online backup and tech support as default add-ons to internet service with a first-month free trial.
Conduct targeted marketing campaigns for existing high-risk customers (fiber users, those without partners/dependents) via email, in-app notifications, etc., to promote these services.
Establish an A/B testing framework to quantify the causal effect of interventions on retention.

(2) Early identification of high-risk customers

Based on median survival time differences from KM group analysis:

Risk Dimension	High-risk Group	Low-risk Group	Median Survival Time Difference
Partner status	No partner (24 months)	With partner (49 months)	25 months
Dependents status	No dependents (25 months)	With dependents (48 months)	23 months
Internet service	Fiber (30 months)	DSL (52 months)	22 months
Tech support	No (29 months)	Yes (56 months)	27 months

It is recommended to embed the above four high-risk labels into the real-time risk scoring engine of the CRM system, setting up automated retention intervention nodes at months 6, 12, and 18 after customer onboarding.

9.2 Medium-term Strategy Optimization (6–12 months)

(1) Customer stratification and refined operations

Based on the risk score (linear predictor = β̂ᵀX) output by the Cox model, divide customers into three risk tiers:

Risk Tier	Risk Score Percentile	Suggested Intervention	Expected Resource Investment
Low risk	< 25%	Routine service maintenance	Low
Medium risk	25%–75%	Quarterly service follow-up, coupon推送	Medium
High risk	> 75%	Dedicated account manager, personalized retention plan	High

(2) Service portfolio optimization

For fiber optic internet service customers (median survival only 30 months), design exclusive service packages including online backup, tech support, and device protection to close the retention gap with DSL users.
Target single-person households (no partner and no dependents) as core intervention subjects; their KM median survival is only 24 months, significantly lower than customers with families.

(3) Customer Acquisition Cost (CAC) control

Based on the CLV estimate ($626.69) and the 10% annual discount rate assumption, it is recommended to:

Control CAC within 30% of CLV, i.e., not exceeding $188.
Adjust CAC limits by channel according to the average risk score of customers acquired from that channel. Channels with higher risk propensity should have a lower CAC ceiling.

9.3 Long-term Strategic Recommendations (12–36 months)

(1) Model lifecycle management

Establish a quarterly model recalibration mechanism, incorporating the latest churn data to update Cox model coefficients.
Expand feature engineering to include behavioral time-series features such as customer service interaction records (number of complaints, call duration), bill payment delay days, and plan change history.
Explore the use of random survival forests or deep survival models (e.g., DeepSurv) as alternatives to the Cox model to capture non-linear effects and interactions.

(2) Retention effectiveness monitoring system

Recommend setting up the following Key Performance Indicators (KPIs) with automated monitoring dashboards:

KPI	Definition	Update Frequency	Alert Threshold
Overall median survival time	50% churn time point estimated by KM	Monthly	Month-over-month decrease > 5%
Proportion of high-risk customers	Percentage of customers with risk score > 75th percentile	Weekly	Proportion > 30%
Value-added service penetration rate	Subscription rate for online backup/tech support	Monthly	Year-over-year growth < 5%
CLV trend	72-month cumulative NPV for baseline customer	Quarterly	Quarter-over-quarter decrease > 10%

(3) Maximizing customer lifetime value

The CLV trend analysis shows that the first 24 months contribute 64.7% of total value. Therefore:

Front-load retention resources in the first two years after onboarding, implementing the highest-intensity interventions during this period.
Set up key touchpoints at months 12 and 24 to enhance renewal rates at those times through personalized offers, service upgrade recommendations, etc.
For long-standing customers who have been active for more than 36 months, reduce retention resource investment and transition them into a low-maintenance “stable period” management.

10. Model Limitations and Improvement Directions

10.1 Reasons for AFT Model Prediction Failure

The LogLogistic AFT model used in this study predicted a median survival time (135.5 months) that significantly deviates from the Kaplan-Meier estimate (34 months). The root causes can be attributed to the following two points:

High censoring rate: The analysis sample (n=3,351) had 1,556 observed churn events, a censoring rate of approximately 53.5%. A large number of customers had not yet churned by the end of the observation period (72 months), leading to severe extrapolation bias in the AFT model’s inference of the tail survival distribution.
Mismatched distributional assumption: The LogLogistic distribution assumes a unimodal hazard function (increasing then decreasing), whereas telecom churn data may be closer to a monotonically decreasing hazard function. We recommend trying a Weibull distribution (allows monotonic hazard changes) or selecting the optimal parametric form via cross‑validation.

Remedial suggestion: When the observation window is insufficient to capture churn events for the majority of customers, prioritize non‑parametric (KM) or semi‑parametric (Cox) methods over fully parametric AFT models for extrapolation.

10.2 Data Limitations

Sample selection bias: This study includes only month-to-month contract customers with internet service (n=3,351), representing 47.6% of the original sample (n=7,043). This selection criterion controls for heterogeneity in contract type and service scope, but it also means conclusions cannot be generalized to:
- Customers with annual/two-year long-term contracts (typically lower churn rates)
- Customers with only phone service (no internet)
Insufficient observation window: The maximum follow‑up time is 72 months. The AFT model’s predicted churn time far exceeds this range, indicating that the available data are insufficient for reliable inference about the churn time of long‑tail customers.
Cross‑sectional data limitation: The data used are cross‑sectional observations, lacking time‑series information on customer behavior (e.g., service usage frequency, billing payment history, customer service interactions), limiting the model’s ability to capture dynamic churn signals.

10.3 External Validity

The empirical findings of this study are based on a simulated telecom dataset provided by IBM. Although designed to reflect real‑world business scenarios, differences exist compared to actual telecom company operations (e.g., service pricing, market competition intensity, customer demographic distributions).
When generalizing the conclusions of this study to other industries (e.g., finance, retail, SaaS), the model needs to be recalibrated and validated for industry‑specific customer lifecycle characteristics.

10.4 Model Assumption Violations and Mitigation Strategies

The proportional hazards assumption test for the Cox model shows that the variables internetService_DSL, onlineBackup_Yes, and techSupport_Yes all fail the test (p < 0.05). This suggests that the effects of these variables may change over customer tenure. For example:

The protective effect of online backup may be more pronounced early in a customer’s tenure and decay over time.
The effect of tech support may manifest at specific times when customers encounter issues, rather than being uniformly distributed.

Model improvement directions:

Strategy	Operational Path	Applicable Scenario
Stratified Cox model	Use `strata=['internetService_DSL', 'onlineBackup_Yes', 'techSupport_Yes']`	Variable effect does not change uniformly over time, but explicit time interaction modeling is not required
Time-varying covariates	Construct interaction terms of the form `X(t) = X × g(t)`	Need to quantify the specific functional form of effect change over time
Extended Cox model	Use `CoxTimeVaryingFitter`	Covariate values themselves change over time (e.g., service subscription status changes)

We recommend trying the stratified Cox model first in model iterations. This method is easy to implement and effectively handles violations of the proportional hazards assumption.

10.5 Future Research Directions

Feature engineering expansion: Incorporate behavioral time‑series features (e.g., average monthly data usage, number of customer service complaints, bill payment delay days) to build dynamic survival analysis models.
Model comparison experiments: Compare the predictive performance of the Cox model, random survival forest, and DeepSurv on the same validation set, using time‑dependent AUC or Brier Score as evaluation metrics.
Causal inference extensions: Use propensity score matching (PSM) or instrumental variables to further validate the causal relationship strength between online backup/tech support services and customer retention, ruling out self‑selection bias.

11. Appendix

Appendix A: Technical Parameters

Analysis tools: PySpark + Lifelines
Spark configuration: Driver memory 4G, Executor memory 2G
Model version: v1.0

Appendix B: File List

File Name	Content
kaplan_meier_summary.csv	KM analysis event table
cox_model_summary.csv	Detailed Cox model results
aft_model_summary.csv	AFT model results
clv_cohort.csv	CLV monthly calculation results
analysis_report.txt	Full analysis report

Appendix C: Code Runtime Environment

Report generation date: 2026-04-26
Python version: 3.x
Dependencies: pyspark, pandas, numpy, lifelines, matplotlib, seaborn

Table of Contents

1. Introduction

1.1 What is Survival Analysis?

1.2 Project Overview

2. Data Overview

2.1 Data Source

2.2 Sample Characteristics

3. Survival Analysis Methods

3.1 Kaplan-Meier Estimator

3.2 Cox Proportional Hazards Model

3.3 Accelerated Failure Time (AFT) Model

3.4 Log-Rank Test

3.5 Method Comparison

4. Kaplan-Meier Survival Analysis

4.1 Analysis Workflow

4.2 Analysis Results

4.2.1 Overall Survival Curve

4.2.2 DSL Internet Service Survival Probability (first 10 months)

4.2.3 Group Survival Analysis Results

4.2.4 Key Findings

5. Cox Proportional Hazards Model

5.1 Analysis Workflow

5.2 Analysis Results

5.2.1 Model Overview

5.2.2 Model Coefficients and Hazard Ratio Analysis

5.2.3 Proportional Hazards Assumption Test Results

5.2.4 Key Findings

6. Accelerated Failure Time Model (AFT)

6.1 Analysis Workflow

6.2 Analysis Results

6.2.1 Model Results

6.2.2 AFT Model Coefficient Table

6.2.3 Model Assumption Verification - Log-odds Plots

6.2.4 Reliability Warnings

6.2.5 Recommendation

7. Customer Lifetime Value (CLV)

7.1 Calculation Workflow

7.2 Calculation Results

7.2.1 Calculation Parameters

7.2.2 CLV Key Node Results

7.2.3 CLV Trend Table (complete data for first 25 months)

7.2.4 Key Findings

8. Conclusion

8.1 Model Applicability and Reliability Assessment

8.2 Core Empirical Findings

(1) Overall customer retention level

(2) Identification of key protective factors (based on Cox model)

(3) High-risk customer profile

(4) Customer Lifetime Value (CLV)

8.3 Summary of Methodological Limitations

9. Business Recommendations

9.1 Short-term Operational Strategies (0–6 months)

(1) Value-added service promotion plan

(2) Early identification of high-risk customers

9.2 Medium-term Strategy Optimization (6–12 months)

(1) Customer stratification and refined operations

(2) Service portfolio optimization

(3) Customer Acquisition Cost (CAC) control

9.3 Long-term Strategic Recommendations (12–36 months)

(1) Model lifecycle management

(2) Retention effectiveness monitoring system

(3) Maximizing customer lifetime value

10. Model Limitations and Improvement Directions

10.1 Reasons for AFT Model Prediction Failure

10.2 Data Limitations

10.3 External Validity

10.4 Model Assumption Violations and Mitigation Strategies

10.5 Future Research Directions

11. Appendix

Appendix A: Technical Parameters

Appendix B: File List

Appendix C: Code Runtime Environment