Gradient Boosting Survival Tree
with Applications in Credit Scoring
Miaojun Bai, Yan Zheng, Yun Shen
360 Finance Inc. (Nasdaq: QFIN)
Credit Scoring and Credit Control XVI, Edinburgh, 29.08.2019
Yun Shen | Gradient Boosting Survival Tree 1/21
Outline
1 Motivation
2 Gradient boosting survival tree
3 Applications in credit scoring
4 Conclusion
Chinese consumer finance market
Rapid growth
market size: $97.1 billion (01.2010) → $1,207.7 billion (10.2018)
Heterogeneous data
PBC report: only 1/3 have credit ratings
personal info, device info, third-party rating agencies
Changing market conditions
regulation
macroeconomic factors
Motivation
Pros of tree ensemble methods (e.g., XGB, LightGBM)
robust to heterogeneous data
fast modeling for credit scoring
utilize numerous “weak” attributes
Pros of survival analysis
predict the probability distribution of the default time
take long-term behavior into consideration
Idea: survival analysis + tree ensemble methods?
Survival analysis
Survival function: \( S(t) = P(T > t) \), with discrete time periods \( \tau_0 < \tau_1 < \tau_2 < \cdots \)

Hazard function:
\[ h(\tau_j) := P(\tau_{j-1} < T \le \tau_j \mid T > \tau_{j-1}), \quad j = 1, 2, \ldots \]
Hence,
\[ S(\tau_j) = \prod_{l=1}^{j} \bigl(1 - h(\tau_l)\bigr) \]
Likelihood:
\[ P(\tau_{j-1} < T \le \tau_j) = h(\tau_j)\, S(\tau_{j-1}) = h(\tau_j) \prod_{l=1}^{j-1} \bigl(1 - h(\tau_l)\bigr) \]
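These product relations can be sketched in a few lines; a minimal illustration with hypothetical hazard values (`survival_from_hazard` and `period_default_prob` are our own helper names, not from the talk):

```python
import numpy as np

def survival_from_hazard(hazard):
    """S(tau_j) = prod_{l=1..j} (1 - h(tau_l)) for discrete periods."""
    return np.cumprod(1.0 - np.asarray(hazard, dtype=float))

def period_default_prob(hazard):
    """P(tau_{j-1} < T <= tau_j) = h(tau_j) * S(tau_{j-1})."""
    hazard = np.asarray(hazard, dtype=float)
    # survival at the *start* of each period: S(tau_0) = 1, then shifted cumprod
    surv_prev = np.concatenate(([1.0], np.cumprod(1.0 - hazard)[:-1]))
    return hazard * surv_prev

h = [0.02, 0.03, 0.01]       # hypothetical monthly hazards
S = survival_from_hazard(h)  # survival probability at each period end
p = period_default_prob(h)   # probability of defaulting in each period
```

A useful sanity check is that the period default probabilities plus the final survival probability sum to one.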
Likelihood
Log hazard function:
\[ f(t) := \log \frac{h(t)}{1 - h(t)} \]
Likelihood:
\[ P(T = t) = \prod_{j=1}^{J(t) \wedge J} \frac{1}{1 + e^{-y_j(t) f(\tau_j)}}, \]
where
\[ J(t) := \begin{cases} j, & \text{if } t \in (\tau_{j-1}, \tau_j] \\ J + 1, & \text{if } t > \tau_J \end{cases} \qquad y_j(t) := \begin{cases} -1, & \text{if } t > \tau_j \\ 1, & \text{if } t \le \tau_j \end{cases} \]
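Each factor above is a logistic sigmoid of \( y_j f(\tau_j) \), so the log-likelihood of one sample is easy to accumulate; a minimal sketch under the definitions above (`log_likelihood_one` is a hypothetical helper name):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood_one(f_vals, J_t, J):
    """log P(T = t) for one sample.
    f_vals[j-1] is the log hazard f(tau_j); J_t = J(t) (pass J + 1 if the
    sample survives beyond tau_J); y_j = +1 at j == J_t, -1 before it."""
    ll = 0.0
    for j in range(1, min(J_t, J) + 1):
        y = 1.0 if j == J_t else -1.0
        ll += math.log(sigmoid(y * f_vals[j - 1]))
    return ll
```

Since \( \mathrm{sigmoid}(f) = h \) and \( \mathrm{sigmoid}(-f) = 1 - h \), this reproduces \( h(\tau_j) \prod_{l<j} (1 - h(\tau_l)) \) for a default in period \( j \).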
Learning objective
For each individual x, f is approximated by a survival tree ensemble
\[ f(t; x) \cong \hat f(t; x) := \sum_{k=1}^{K} f_k(t; x) \]
[Figure: example survival trees splitting on age, sex, education, and salary.]
Learning objective
To minimize the negative log-likelihood
\[ L = \sum_{i=1}^{N} \sum_{j=1}^{J(t_i) \wedge J} \log\Bigl(1 + \exp\bigl\{-y_j(t_i)\, \hat f(\tau_j; x_i)\bigr\}\Bigr) + \frac{\lambda}{2} \|w\|^2 = \sum_{j=1}^{J} \sum_{i \in N_j} \log\Bigl(1 + \exp\bigl(-y_j(t_i)\, \hat f(\tau_j; x_i)\bigr)\Bigr) + \frac{\lambda}{2} \|w\|^2 \]
where \( N_j := \{ i \in \{1, 2, \ldots, N\} \mid J(t_i) \ge j \} \) is the set of samples surviving longer than \( \tau_{j-1} \).

Regularization term:
punish model complexity
avoid over-fitting
overcome numerical problems
Gradient tree boosting
Boosting algorithm: at the m-th iteration, given \( \hat f^{(m-1)} \), solve
\[ \min_f L^{(m)} = \sum_{j,i} \log\Bigl(1 + \exp\bigl\{-y_j(t_i)\bigl(\hat f^{(m-1)}(\tau_j; x_i) + f(\tau_j; x_i)\bigr)\bigr\}\Bigr) + \frac{\lambda}{2} \|w\|^2 \;\Rightarrow\; f_m \]
and update \( \hat f^{(m)}(t; x) = \hat f^{(m-1)}(t; x) + f_m(t; x) \).

Approximate by Taylor expansion up to the 2nd order:
\[ L^{(m)}(f) \cong \sum_{j,i} \Bigl( r^{(m-1)}_{i,j}\, f(\tau_j; x_i) + \frac{1}{2}\, \sigma^{(m-1)}_{i,j}\, f^2(\tau_j; x_i) \Bigr) + \frac{\lambda}{2} \|w\|^2 \]
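Since each per-term loss is the logistic loss \( \log(1 + e^{-y \hat f}) \), the Taylor coefficients \( r \) and \( \sigma \) are its first and second derivatives in \( \hat f \); a minimal sketch of this (our own derivation of the standard logistic gradient and Hessian, with \( y \in \{-1, +1\} \)):

```python
import numpy as np

def grad_hess(y, f_hat):
    """Derivatives of log(1 + exp(-y * f_hat)) w.r.t. f_hat, y in {-1, +1}:
    gradient r = -y * sigmoid(-y * f_hat),
    hessian sigma = sigmoid(-y * f_hat) * (1 - sigmoid(-y * f_hat))."""
    y = np.asarray(y, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    s = 1.0 / (1.0 + np.exp(y * f_hat))   # sigmoid(-y * f_hat)
    return -y * s, s * (1.0 - s)
```

At \( \hat f = 0 \) the gradient is \( \mp 0.5 \) and the Hessian is \( 0.25 \), as expected for the logistic loss.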
Gradient tree boosting
Survival tree with L leaf nodes:
\[ f(\tau_j; x_i) = \sum_{l=1}^{L} w_l(\tau_j)\, 1(i \in I_l) \]
The objective function is strictly convex, with optimal solution
\[ w^{(m)}_l(\tau_j) = -\frac{\sum_{i \in N_j \cap I_l} r^{(m-1)}_{i,j}}{\sum_{i \in N_j \cap I_l} \sigma^{(m-1)}_{i,j} + \lambda} \]
Split rule \( I = I_L \cup I_R \):
\[ \tilde L_{\mathrm{split}} = \frac{1}{2} \sum_j \left[ \frac{\Bigl(\sum_{i \in N_j \cap I_L} r^{(m-1)}_{i,j}\Bigr)^2}{\sum_{i \in N_j \cap I_L} \sigma^{(m-1)}_{i,j} + \lambda} + \frac{\Bigl(\sum_{i \in N_j \cap I_R} r^{(m-1)}_{i,j}\Bigr)^2}{\sum_{i \in N_j \cap I_R} \sigma^{(m-1)}_{i,j} + \lambda} - \frac{\Bigl(\sum_{i \in N_j \cap I} r^{(m-1)}_{i,j}\Bigr)^2}{\sum_{i \in N_j \cap I} \sigma^{(m-1)}_{i,j} + \lambda} \right] \]
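These closed forms mirror the XGBoost-style leaf weight and split gain, summed over time periods j; a minimal sketch over per-period sums (function names are ours, not the authors'):

```python
import numpy as np

def leaf_weight(r_sum, s_sum, lam):
    """Optimal per-period leaf weight: w_l(tau_j) = -sum(r) / (sum(sigma) + lambda).
    r_sum, s_sum: arrays of per-period sums of r and sigma over the leaf."""
    return -np.asarray(r_sum, dtype=float) / (np.asarray(s_sum, dtype=float) + lam)

def split_gain(rL, sL, rR, sR, lam):
    """Gain of splitting node I into I_L and I_R, summed over periods j.
    Inputs are per-period sums of r and sigma over each child node."""
    def score(r, s):
        return r ** 2 / (s + lam)
    # parent sums are the sums over both children
    return 0.5 * np.sum(score(rL, sL) + score(rR, sR) - score(rL + rR, sL + sR))
```

A split is kept when the gain outweighs whatever complexity penalty the implementation applies.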
Summary
The log hazard function is approximated by a survival tree ensemble
maximum likelihood as the objective function
boosting algorithm
at each step, a gradient method optimizes the objective approximated up to 2nd order
Datasets
Installment loans with a 12-month term
Definition of default: the borrower is overdue for at least 10 days on any scheduled repayment due date
Early repayments: regarded as “repaying on time” for the remaining periods
Training and testing datasets:

dataset      | time         | sample size
training set | January 2018 | 200,000
testing set  | March 2018   | 120,000

Default rate:
\[ \text{default rate}(t) = \frac{\#\,\text{default accounts up to month } t}{\#\,\text{total accounts}} \]
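The definition above amounts to a cumulative count; a tiny sketch on toy data (names and values hypothetical):

```python
def default_rate(default_months, t):
    """Fraction of accounts that defaulted in months 1..t.
    default_months: per-account default month, or None if never defaulted."""
    defaults = sum(1 for m in default_months if m is not None and m <= t)
    return defaults / len(default_months)

accounts = [3, None, 7, None, 1]  # toy data: default months for 5 accounts
```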
Default rates on datasets
[Figure: cumulative default rates by month (1–12) for the training and testing data, plotted as multiples of the base rate b.]
Dataset and preprocessing
Over 400 original attributes are collected
exclude attributes with a missing rate higher than 80%
one-hot encoding for categorical attributes
50 features are selected by XGBoost:

source                  | feature
PBC report              | income score; credit score; overdue information of credit cards
personal information    | age; sex; education level
device information      | location
third-party rate agency | no. of loans on other lending platforms; travel intensity
other information       | whether possessing a car; application channel
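The two preprocessing steps can be sketched with pandas (a minimal illustration of the steps above; `preprocess` and its threshold default are our own framing, not the authors' code):

```python
import pandas as pd

def preprocess(df, max_missing=0.8):
    """Drop attributes whose missing rate exceeds max_missing,
    then one-hot encode the remaining categorical attributes."""
    keep = df.columns[df.isna().mean() <= max_missing]
    return pd.get_dummies(df[keep])
```

Feature selection (e.g., by XGBoost importance scores) would then run on the encoded matrix.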
Convergence
1000 runs with λ = 0.001 and a maximum tree depth of 6
[Figure: training loss vs. number of iterations (0–30), showing convergence.]
Performance
[Figure: default rates across 20 survival groups for each of months 1–12.]
Comparison with existing models: C-Index
[Figure: C-Index by month (1–12) for GBST, COX, RSF, and XGB; values range from about 0.77 to 0.81.]
Comparison with existing models: AUC
[Figure: AUC by month (1–12) for GBST, COX, RSF, and XGB; values range from about 0.77 to 0.81.]
Comparison with existing models
[Figure: default rates across 20 survival groups for GBST, COX, RSF, and XGB, plotted as multiples of the base rate b.]
Conclusion
Propose the gradient boosting survival tree (GBST) model
Confirm the convergence of GBST on a real dataset
GBST outperforms existing survival analysis and machine learning models
Thank you!