Gradient Boosting Survival Tree
with Applications in Credit Scoring
Miaojun Bai, Yan Zheng, Yun Shen
360 Finance Inc. (Nasdaq: QFIN)
Credit Scoring and Credit Control XVI, Edinburgh, 29.08.2019
Yun Shen | Gradient Boosting Survival Tree 1/21
Outline
1 Motivation
2 Gradient boosting survival tree
3 Applications in credit scoring
4 Conclusion
Chinese consumer finance market
Rapid growth
market size: $97.1 billion (01.2010) → $1,207.7 billion (10.2018)
Heterogeneous data
PBC report: only 1/3 have credit ratings
personal info, device info, third-party rating agencies
Changing market conditions
regulation
macroeconomic factors
Motivation
Pros of tree ensemble methods (e.g., XGB, LightGBM)
robust to heterogeneous data
fast modeling for credit scoring
utilize numerous “weak” attributes
Pros of survival analysis
predict the probability distribution of the default time
take long-term behavior into consideration
Idea: survival analysis + tree ensemble methods?
Survival analysis
Survival function: \( S(t) = P(T > t) \), with discrete time periods \( \tau_0 < \tau_1 < \tau_2 < \cdots \)

Hazard function:
\[ h(\tau_j) := P(\tau_{j-1} < T \le \tau_j \mid T > \tau_{j-1}), \quad j = 1, 2, \ldots \]
Hence,
\[ S(\tau_j) = \prod_{l=1}^{j} \bigl(1 - h(\tau_l)\bigr) \]
Likelihood:
\[ P(\tau_{j-1} < T \le \tau_j) = h(\tau_j)\, S(\tau_{j-1}) = h(\tau_j) \prod_{l=1}^{j-1} \bigl(1 - h(\tau_l)\bigr) \]
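These product relations can be sketched in a few lines; a minimal illustration with hypothetical hazard values (`survival_from_hazard` and `period_default_prob` are our own helper names, not from the talk):

```python
import numpy as np

def survival_from_hazard(hazard):
    """S(tau_j) = prod_{l=1..j} (1 - h(tau_l)) for discrete periods."""
    return np.cumprod(1.0 - np.asarray(hazard, dtype=float))

def period_default_prob(hazard):
    """P(tau_{j-1} < T <= tau_j) = h(tau_j) * S(tau_{j-1})."""
    hazard = np.asarray(hazard, dtype=float)
    # survival at the *start* of each period: S(tau_0) = 1, then shifted cumprod
    surv_prev = np.concatenate(([1.0], np.cumprod(1.0 - hazard)[:-1]))
    return hazard * surv_prev

h = [0.02, 0.03, 0.01]       # hypothetical monthly hazards
S = survival_from_hazard(h)  # survival probability at each period end
p = period_default_prob(h)   # probability of defaulting in each period
```

A useful sanity check is that the period default probabilities plus the final survival probability sum to one.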
Likelihood
Log hazard function:
\[ f(t) := \log \frac{h(t)}{1 - h(t)} \]
Likelihood:
\[ P(T = t) = \prod_{j=1}^{J(t) \wedge J} \frac{1}{1 + e^{-y_j(t) f(\tau_j)}}, \]
where
\[ J(t) := \begin{cases} j, & \text{if } t \in (\tau_{j-1}, \tau_j] \\ J + 1, & \text{if } t > \tau_J \end{cases} \qquad y_j(t) := \begin{cases} -1, & \text{if } t > \tau_j \\ 1, & \text{if } t \le \tau_j \end{cases} \]
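Each factor above is a logistic sigmoid of \( y_j f(\tau_j) \), so the log-likelihood of one sample is easy to accumulate; a minimal sketch under the definitions above (`log_likelihood_one` is a hypothetical helper name):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood_one(f_vals, J_t, J):
    """log P(T = t) for one sample.
    f_vals[j-1] is the log hazard f(tau_j); J_t = J(t) (pass J + 1 if the
    sample survives beyond tau_J); y_j = +1 at j == J_t, -1 before it."""
    ll = 0.0
    for j in range(1, min(J_t, J) + 1):
        y = 1.0 if j == J_t else -1.0
        ll += math.log(sigmoid(y * f_vals[j - 1]))
    return ll
```

Since \( \mathrm{sigmoid}(f) = h \) and \( \mathrm{sigmoid}(-f) = 1 - h \), this reproduces \( h(\tau_j) \prod_{l<j} (1 - h(\tau_l)) \) for a default in period \( j \).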
Learning objective
For each individual x, f is approximated by a survival tree ensemble
\[ f(t; x) \cong \hat f(t; x) := \sum_{k=1}^{K} f_k(t; x) \]
[Figure: example survival trees splitting on age, sex, education, and salary.]
Learning objective
To minimize the negative log-likelihood
\[ L = \sum_{i=1}^{N} \sum_{j=1}^{J(t_i) \wedge J} \log\Bigl(1 + \exp\bigl\{-y_j(t_i)\, \hat f(\tau_j; x_i)\bigr\}\Bigr) + \frac{\lambda}{2} \|w\|^2 = \sum_{j=1}^{J} \sum_{i \in N_j} \log\Bigl(1 + \exp\bigl(-y_j(t_i)\, \hat f(\tau_j; x_i)\bigr)\Bigr) + \frac{\lambda}{2} \|w\|^2 \]
where \( N_j := \{ i \in \{1, 2, \ldots, N\} \mid J(t_i) \ge j \} \) is the set of samples surviving longer than \( \tau_{j-1} \).

Regularization term:
punish model complexity
avoid over-fitting
overcome numerical problems
Gradient tree boosting
Boosting algorithm: at the m-th iteration, given \( \hat f^{(m-1)} \), solve
\[ \min_f L^{(m)} = \sum_{j,i} \log\Bigl(1 + \exp\bigl\{-y_j(t_i)\bigl(\hat f^{(m-1)}(\tau_j; x_i) + f(\tau_j; x_i)\bigr)\bigr\}\Bigr) + \frac{\lambda}{2} \|w\|^2 \;\Rightarrow\; f_m \]
and update \( \hat f^{(m)}(t; x) = \hat f^{(m-1)}(t; x) + f_m(t; x) \).

Approximate by Taylor expansion up to the 2nd order:
\[ L^{(m)}(f) \cong \sum_{j,i} \Bigl( r^{(m-1)}_{i,j}\, f(\tau_j; x_i) + \frac{1}{2}\, \sigma^{(m-1)}_{i,j}\, f^2(\tau_j; x_i) \Bigr) + \frac{\lambda}{2} \|w\|^2 \]
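Since each per-term loss is the logistic loss \( \log(1 + e^{-y \hat f}) \), the Taylor coefficients \( r \) and \( \sigma \) are its first and second derivatives in \( \hat f \); a minimal sketch of this (our own derivation of the standard logistic gradient and Hessian, with \( y \in \{-1, +1\} \)):

```python
import numpy as np

def grad_hess(y, f_hat):
    """Derivatives of log(1 + exp(-y * f_hat)) w.r.t. f_hat, y in {-1, +1}:
    gradient r = -y * sigmoid(-y * f_hat),
    hessian sigma = sigmoid(-y * f_hat) * (1 - sigmoid(-y * f_hat))."""
    y = np.asarray(y, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    s = 1.0 / (1.0 + np.exp(y * f_hat))   # sigmoid(-y * f_hat)
    return -y * s, s * (1.0 - s)
```

At \( \hat f = 0 \) the gradient is \( \mp 0.5 \) and the Hessian is \( 0.25 \), as expected for the logistic loss.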
Gradient tree boosting
Survival tree with L leaf nodes:
\[ f(\tau_j; x_i) = \sum_{l=1}^{L} w_l(\tau_j)\, 1(i \in I_l) \]
The objective function is strictly convex, with optimal solution
\[ w^{(m)}_l(\tau_j) = -\frac{\sum_{i \in N_j \cap I_l} r^{(m-1)}_{i,j}}{\sum_{i \in N_j \cap I_l} \sigma^{(m-1)}_{i,j} + \lambda} \]
Split rule \( I = I_L \cup I_R \):
\[ \tilde L_{\mathrm{split}} = \frac{1}{2} \sum_j \left[ \frac{\Bigl(\sum_{i \in N_j \cap I_L} r^{(m-1)}_{i,j}\Bigr)^2}{\sum_{i \in N_j \cap I_L} \sigma^{(m-1)}_{i,j} + \lambda} + \frac{\Bigl(\sum_{i \in N_j \cap I_R} r^{(m-1)}_{i,j}\Bigr)^2}{\sum_{i \in N_j \cap I_R} \sigma^{(m-1)}_{i,j} + \lambda} - \frac{\Bigl(\sum_{i \in N_j \cap I} r^{(m-1)}_{i,j}\Bigr)^2}{\sum_{i \in N_j \cap I} \sigma^{(m-1)}_{i,j} + \lambda} \right] \]
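These closed forms mirror the XGBoost-style leaf weight and split gain, summed over time periods j; a minimal sketch over per-period sums (function names are ours, not the authors'):

```python
import numpy as np

def leaf_weight(r_sum, s_sum, lam):
    """Optimal per-period leaf weight: w_l(tau_j) = -sum(r) / (sum(sigma) + lambda).
    r_sum, s_sum: arrays of per-period sums of r and sigma over the leaf."""
    return -np.asarray(r_sum, dtype=float) / (np.asarray(s_sum, dtype=float) + lam)

def split_gain(rL, sL, rR, sR, lam):
    """Gain of splitting node I into I_L and I_R, summed over periods j.
    Inputs are per-period sums of r and sigma over each child node."""
    def score(r, s):
        return r ** 2 / (s + lam)
    # parent sums are the sums over both children
    return 0.5 * np.sum(score(rL, sL) + score(rR, sR) - score(rL + rR, sL + sR))
```

A split is kept when the gain outweighs whatever complexity penalty the implementation applies.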
Summary
The log hazard function is approximated by a survival tree ensemble
maximum likelihood as the objective function
boosting algorithm
at each step, a gradient method optimizes the objective approximated up to 2nd order
Datasets
Installment loans with a 12-month term
Definition of default: the borrower is overdue for at least 10 days on any scheduled repayment due date
Early repayments: regarded as “repaying on time” for the remaining periods
Training and testing datasets:

dataset      | time         | sample size
training set | January 2018 | 200,000
testing set  | March 2018   | 120,000

Default rate:
\[ \text{default rate}(t) = \frac{\#\,\text{default accounts up to month } t}{\#\,\text{total accounts}} \]
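The definition above amounts to a cumulative count; a tiny sketch on toy data (names and values hypothetical):

```python
def default_rate(default_months, t):
    """Fraction of accounts that defaulted in months 1..t.
    default_months: per-account default month, or None if never defaulted."""
    defaults = sum(1 for m in default_months if m is not None and m <= t)
    return defaults / len(default_months)

accounts = [3, None, 7, None, 1]  # toy data: default months for 5 accounts
```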
Default rates on datasets
[Figure: cumulative default rates by month (1–12) for the training and testing data, plotted as multiples of the base rate b.]
Dataset and preprocessing
Over 400 original attributes are collected
exclude attributes with a missing rate higher than 80%
one-hot encoding for categorical attributes
50 features are selected by XGBoost:

source                  | feature
PBC report              | income score; credit score; overdue information of credit cards
personal information    | age; sex; education level
device information      | location
third-party rate agency | no. of loans on other lending platforms; travel intensity
other information       | whether possessing a car; application channel
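The two preprocessing steps can be sketched with pandas (a minimal illustration of the steps above; `preprocess` and its threshold default are our own framing, not the authors' code):

```python
import pandas as pd

def preprocess(df, max_missing=0.8):
    """Drop attributes whose missing rate exceeds max_missing,
    then one-hot encode the remaining categorical attributes."""
    keep = df.columns[df.isna().mean() <= max_missing]
    return pd.get_dummies(df[keep])
```

Feature selection (e.g., by XGBoost importance scores) would then run on the encoded matrix.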
Convergence
1000 runs with λ = 0.001 and a maximum tree depth of 6
[Figure: training loss vs. number of iterations (0–30), showing convergence.]
Performance
[Figure: default rates across 20 survival groups for each of months 1–12.]
Comparison with existing models: C-Index
[Figure: C-Index by month (1–12) for GBST, COX, RSF, and XGB; values range from about 0.77 to 0.81.]
Comparison with existing models: AUC
[Figure: AUC by month (1–12) for GBST, COX, RSF, and XGB; values range from about 0.77 to 0.81.]
Comparison with existing models
[Figure: default rates across 20 survival groups for GBST, COX, RSF, and XGB, plotted as multiples of the base rate b.]
Conclusion
Propose the gradient boosting survival tree (GBST) model
Confirm the convergence of GBST on a real dataset
GBST outperforms existing survival analysis and machine learning models
Thank you!