Demystifying XGBoost (0): Summary

2021-02-17 (updated 2021-02-18) | Machine Learning | 4-minute read (about 535 words)

This post wraps up the Demystifying XGBoost series. The recommended reading order for the series is:

XGBoost's general objective function

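XGBoost approximates the loss function with a second-order Taylor expansion around the previous prediction $F_{m-1}(x)$. The approximation therefore uses both the first-order derivative $g = \partial l(y,\hat{y}) / \partial \hat{y}$ and the second-order derivative $h = \partial^2 l(y,\hat{y}) / \partial \hat{y}^2$, each evaluated at $\hat{y} = F_{m-1}(x)$, which requires the loss to be twice differentiable.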

$\begin{aligned} l(y, F_{m-1}(x) + f_m(x)) \approx l(y, F_{m-1}(x)) + g f_m(x) + \cfrac{1}{2} h f_m(x)^2 \end{aligned}$
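To make $g$ and $h$ concrete, here is a minimal sketch assuming two common losses on the raw score (for logistic loss, the log-odds). The helper names are hypothetical and not part of the XGBoost API. For squared loss the second-order expansion is exact; for logistic loss it is only an approximation that is accurate for small steps $f_m(x)$.

```python
import math

# Hypothetical helpers: loss, g and h with respect to the raw prediction F.

def squared_loss(y, f):
    return 0.5 * (y - f) ** 2

def squared_grad_hess(y, f):
    # l = 1/2 (y - F)^2  ->  g = F - y,  h = 1
    return f - y, 1.0

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

def logistic_loss(y, f):
    # y in {0, 1}; F is the log-odds
    p = sigmoid(f)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def logistic_grad_hess(y, f):
    # g = p - y,  h = p (1 - p)
    p = sigmoid(f)
    return p - y, p * (1.0 - p)

def taylor2(loss, grad_hess, y, f_prev, step):
    """Second-order approximation of loss(y, f_prev + step) around f_prev."""
    g, h = grad_hess(y, f_prev)
    return loss(y, f_prev) + g * step + 0.5 * h * step ** 2

y, f_prev, step = 1, 0.2, 0.1
exact = logistic_loss(y, f_prev + step)
approx = taylor2(logistic_loss, logistic_grad_hess, y, f_prev, step)
print(f"exact={exact:.6f} approx={approx:.6f}")
```

Only `grad_hess` changes from one loss to another, which is why the rest of the algorithm can stay loss-agnostic.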

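Because only $g_i$ and $h_i$ depend on the choice of loss, this general objective function gives XGBoost the same expression for classification, regression, and ranking tasks. Dropping the constant term $l(y_i, F_{m-1}(x_i))$ and grouping samples by the leaf $R_{j,m}$ they fall into ($\gamma_{j,m}$ is the output of leaf $j$ of tree $m$, $T_m$ the number of leaves, $\lambda$ and $\tau$ the regularization weights), the objective for tree $m$ becomes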

$\begin{aligned} \tilde{\mathcal{L}}^{(m)} &= \sum^{T_m}_{j=1}\left[\Big(\sum_{i \in R_{j,m}}g_i\Big) \gamma_{j,m} + \cfrac{1}{2}\Big(\sum_{i \in R_{j,m}}h_i + \lambda\Big)\gamma_{j,m}^2\right] + \tau T_m \end{aligned}$
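Each leaf term above is a quadratic in $\gamma_{j,m}$. Writing $G_j = \sum_{i \in R_{j,m}} g_i$ and $H_j = \sum_{i \in R_{j,m}} h_i$, setting the derivative to zero gives $\gamma_{j,m}^* = -G_j/(H_j + \lambda)$, and the minimized objective is $-\frac{1}{2}\sum_j G_j^2/(H_j+\lambda) + \tau T_m$. The sketch below (hypothetical helper names) computes both; with squared loss and $\lambda = 0$ the leaf weight reduces to the mean residual of the leaf.

```python
def optimal_leaf_weight(G, H, lam):
    """gamma*_{j,m} = -G_j / (H_j + lambda), the minimizer of the leaf quadratic."""
    return -G / (H + lam)

def tree_objective(leaves, lam, tau):
    """Minimized objective of one fixed tree structure.

    leaves: one (g_list, h_list) pair per leaf, holding g_i and h_i of the
    samples in that leaf. Returns sum_j -1/2 G_j^2 / (H_j + lam) + tau * T_m.
    """
    total = 0.0
    for g_list, h_list in leaves:
        G, H = sum(g_list), sum(h_list)
        total += -0.5 * G * G / (H + lam)
    return total + tau * len(leaves)

# Squared loss at F_{m-1} = 0 with targets 2 and 1: g_i = F - y_i, h_i = 1.
g, h = [-2.0, -1.0], [1.0, 1.0]
print(optimal_leaf_weight(sum(g), sum(h), lam=0.0))  # 1.5, the mean residual
print(optimal_leaf_weight(sum(g), sum(h), lam=1.0))  # 1.0, shrunk toward 0
```

The $\lambda$ in the denominator is what shrinks leaf outputs, and $\tau$ adds a fixed cost per leaf.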

See Demystifying XGBoost (3): The Objective Function Under the Apple Tree.

XGBoost vs. GBDT

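A standard GBDT fits each regression tree to the residuals (more precisely, the negative gradients, or pseudo-residuals), which amounts to moving in the negative gradient direction.
Each XGBoost tree $f_m(x)$ also fits the residuals, but it takes into account both the direction of the gradient and how the gradient is changing. This lets it move toward the optimum more effectively, in effect taking a Newton step rather than a plain gradient step.
- Direction of the gradient: the first-order derivative $g_i$
- Rate of change of the gradient: the second-order derivative $h_i$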
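The difference shows up even on a single leaf. This sketch assumes logistic loss, four samples with labels [1, 1, 1, 0] that all start from a raw score of 0, and $\lambda = 0$. The first-order step uses only $g$ (the mean negative gradient, which is what a least-squares tree fitted to the pseudo-residuals would output for a single leaf); the second-order step $-G/H$ also uses $h$. Real GBDT implementations may refine leaf values further, for example with a line search, so treat this as an illustration rather than a model of any particular library.

```python
import math

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

def total_logloss(ys, f):
    # All samples sit in one leaf, so they share the same raw score f.
    return sum(math.log(1.0 + math.exp(-f)) if y == 1 else math.log(1.0 + math.exp(f))
               for y in ys)

ys = [1, 1, 1, 0]
f0 = 0.0
p = sigmoid(f0)
g = [p - y for y in ys]          # first-order derivatives
h = [p * (1.0 - p) for _ in ys]  # second-order derivatives
G, H = sum(g), sum(h)

gradient_step = -G / len(ys)     # uses g only
newton_step = -G / H             # uses g and h (lambda = 0)

print("gradient-only:", gradient_step, total_logloss(ys, f0 + gradient_step))
print("second-order: ", newton_step, total_logloss(ys, f0 + newton_step))
```

The best constant score for these labels is $\log 3 \approx 1.0986$. The second-order step lands at 1.0, while the gradient-only step moves just 0.25.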

See Demystifying XGBoost (3): The Objective Function Under the Apple Tree.

What Makes XGBoost so Effective?

Avoiding Overfitting

A regularization term is added to the objective function to penalize the magnitude of the leaf outputs and the number of leaf nodes:

$\begin{aligned} \mathcal{L} &= \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{m}\Omega(f_m) \\ \Omega(f_m) &= \tau T_m + \cfrac{1}{2}\lambda \sum_{j=1}^{T_m}\gamma_{j,m}^2 \end{aligned}$
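A direct transcription of $\Omega(f_m)$, as a sketch with hypothetical function names. In the XGBoost library, $\tau$ here plays the role of the gamma parameter (alias min_split_loss) and $\lambda$ the role of lambda (alias reg_lambda); the notation in this post uses $\tau$ since $\gamma$ already denotes leaf outputs.

```python
def omega(leaf_weights, tau, lam):
    """Regularization of one tree: tau * T_m + 1/2 * lam * sum_j gamma_{j,m}^2."""
    num_leaves = len(leaf_weights)
    return tau * num_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)

def regularized_objective(sample_losses, trees, tau, lam):
    """sum_i l(y_i, y_hat_i) plus Omega(f_m) for every tree f_m.

    trees: list of trees, each given as its list of leaf outputs gamma_{j,m}.
    """
    return sum(sample_losses) + sum(omega(ws, tau, lam) for ws in trees)

# A tree with more leaves, or larger leaf outputs, pays a larger penalty.
print(omega([1.0, -2.0], tau=0.5, lam=1.0))
print(omega([1.0, -2.0, 0.5, 0.5], tau=0.5, lam=1.0))
```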

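- min_child_weight sets a lower bound on the cover of each leaf node, i.e. the sum of $h_i$ over the samples in that node; a split is rejected if a child would fall below it
- max_depth limits the depth of each XGBoost tree
- Column sampling and row sampling train each tree on a random subset of features and samples, which adds randomness and further reduces overfitting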
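The constraints above can be sketched as follows. The function names are hypothetical; the parameter names mirror XGBoost's min_child_weight, max_depth, subsample and colsample_bytree, and the defaults of 1 and 6 follow the library's documented defaults for min_child_weight and max_depth. The sampling helper draws a fixed-size subset without replacement, which is a simplification; the library's exact sampling procedure may differ.

```python
import random

def split_allowed(h_left, h_right, depth, min_child_weight=1.0, max_depth=6):
    """Return True if a candidate split satisfies both constraints.

    h_left / h_right: h_i of the samples that would go to each child; the
    cover of a child is the sum of its h_i. depth: depth of the node being
    split (root = 0), so its children would sit at depth + 1.
    """
    if depth >= max_depth:
        return False  # children would exceed max_depth
    return sum(h_left) >= min_child_weight and sum(h_right) >= min_child_weight

def sample_rows_and_cols(n_rows, n_cols, subsample, colsample, seed=0):
    """Pick the rows and columns one tree is trained on (simplified sketch)."""
    rng = random.Random(seed)
    n_keep_rows = max(1, int(subsample * n_rows))
    n_keep_cols = max(1, int(colsample * n_cols))
    rows = sorted(rng.sample(range(n_rows), n_keep_rows))
    cols = sorted(rng.sample(range(n_cols), n_keep_cols))
    return rows, cols
```

With logistic loss, confident predictions have small $h_i = p_i(1-p_i)$, so a node full of them has low cover and is less likely to be split further.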

See

Split Finding

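XGBoost uses an approximate greedy algorithm: instead of trying every feature value, it only evaluates candidate split points, which are placed at weighted quantiles of each feature.
The weighted quantile sketch tries to make the sum of weights between adjacent candidate splits roughly equal.
Each sample's weight is its second derivative of the loss, $h_i$, because the second-order objective can be rewritten as a squared loss with target $-g_i/h_i$ weighted by $h_i$.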
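As a simplified stand-in for the weighted quantile sketch, the code below computes exact weighted quantiles on a small in-memory sample: sort by feature value, accumulate $h_i$, and record a candidate each time the cumulative weight reaches the next $k/K$ fraction of the total. The real sketch is an approximate, mergeable data structure for large or distributed data; the function name and the tie handling here are assumptions.

```python
def weighted_quantile_candidates(x, h, num_buckets):
    """Exact weighted quantiles of feature x, using h_i as sample weights.

    Returns up to num_buckets - 1 candidate values such that each bucket
    between consecutive candidates holds roughly 1/num_buckets of sum(h).
    """
    pairs = sorted(zip(x, h))          # sort samples by feature value
    total = sum(h)
    candidates = []
    cum = 0.0
    k = 1
    for value, weight in pairs:
        cum += weight
        # One heavy sample may cross several targets at once.
        while k < num_buckets and cum >= k * total / num_buckets:
            if not candidates or candidates[-1] != value:
                candidates.append(value)
            k += 1
    return candidates

x = [1, 2, 3, 4, 5, 6, 7, 8]
print(weighted_quantile_candidates(x, [1] * 8, 4))
print(weighted_quantile_candidates(x, [4, 1, 1, 1, 1, 1, 1, 6], 4))
```

With uniform weights the candidates are evenly spaced in rank; giving the first and last samples large $h_i$ shifts the candidates so that each bucket still carries about the same total weight.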

See Demystifying XGBoost (4): Where Is the Magic Optimization?

System Design

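When the dataset is large, the data samples are split into multiple blocks to support parallel computation.
- The column values in each block are sorted once before training, so features (columns) do not have to be re-sorted at every split.
- Threads pre-fetch the gradient statistics $g_i$ and $h_i$ into a buffer to improve the cache hit rate.
- Out-of-core blocks: blocks that do not fit in memory are stored on disk and pre-fetched into memory by a separate thread before they are needed.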
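The pre-sorting idea can be sketched in pure Python: sort every column once, then reuse that order for every node by skipping samples that are not in the node, accumulating $G_L$ and $H_L$ on the fly. The gain is the standard second-order split gain $\frac{1}{2}[G_L^2/(H_L+\lambda) + G_R^2/(H_R+\lambda) - G^2/(H+\lambda)] - \tau$. Function names are hypothetical, and this is not how XGBoost stores blocks internally (it uses a compressed column layout and processes columns in parallel).

```python
def presort_columns(X):
    """Sort each column once before training.

    X: list of rows (lists of floats). Returns, for each column, the sample
    indices ordered by that column's value.
    """
    n_rows, n_cols = len(X), len(X[0])
    return [sorted(range(n_rows), key=lambda i: X[i][c]) for c in range(n_cols)]

def leaf_score(G, H, lam):
    return G * G / (H + lam)

def best_split(X, g, h, sorted_idx, node_rows, lam=1.0, tau=0.0):
    """Exact greedy scan over pre-sorted columns for the samples in node_rows.

    Returns (gain, column, threshold); samples with x <= threshold go left.
    Returns (0.0, None, None) if no split has positive gain.
    """
    in_node = set(node_rows)
    G = sum(g[i] for i in node_rows)
    H = sum(h[i] for i in node_rows)
    best = (0.0, None, None)
    for c, order in enumerate(sorted_idx):
        rows = [i for i in order if i in in_node]  # reuse the global sort order
        G_L = H_L = 0.0
        for pos in range(len(rows) - 1):
            i = rows[pos]
            G_L += g[i]
            H_L += h[i]
            if X[i][c] == X[rows[pos + 1]][c]:
                continue  # cannot split between equal feature values
            G_R, H_R = G - G_L, H - H_L
            gain = 0.5 * (leaf_score(G_L, H_L, lam) + leaf_score(G_R, H_R, lam)
                          - leaf_score(G, H, lam)) - tau
            if gain > best[0]:
                best = (gain, c, X[i][c])
    return best

# Squared loss at F = 0, so g_i = -y_i and h_i = 1.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]]
y = [1.0, 1.0, -1.0, -1.0]
g, h = [-v for v in y], [1.0] * len(y)
sorted_idx = presort_columns(X)   # done once, reused for every node
print(best_split(X, g, h, sorted_idx, node_rows=[0, 1, 2, 3]))
```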

See Demystifying XGBoost (4): Where Is the Magic Optimization?

Other

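For missing values (NAs) in the data, XGBoost learns during training where they should go: each split gets a default direction for samples whose feature value is missing.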
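A minimal sketch of this sparsity-aware split on one feature, assuming missing values are encoded as None. Only the present values are sorted and scanned; at each threshold the missing samples are tried on the left and on the right, scored with the second-order gain $\frac{1}{2}[G_L^2/(H_L+\lambda) + G_R^2/(H_R+\lambda) - G^2/(H+\lambda)]$, and the better side becomes the learned default. The XGBoost paper does this with two directional scans; enumerating both sides per threshold, as here, is a simpler but slower way to try both directions. Function names are hypothetical.

```python
def leaf_score(G, H, lam):
    return G * G / (H + lam)

def best_split_with_missing(x, g, h, lam=1.0):
    """Sparsity-aware split on one feature (simplified sketch).

    x may contain None for missing values. Returns (gain, threshold,
    default_left): present samples with x <= threshold go left, and missing
    samples go left if default_left is True, else right.
    """
    present = sorted((xi, gi, hi) for xi, gi, hi in zip(x, g, h) if xi is not None)
    G, H = sum(g), sum(h)
    G_miss = sum(gi for xi, gi in zip(x, g) if xi is None)
    H_miss = sum(hi for xi, hi in zip(x, h) if xi is None)
    best = (0.0, None, None)
    G_L = H_L = 0.0
    for pos in range(len(present) - 1):
        value, gi, hi = present[pos]
        G_L += gi
        H_L += hi
        if value == present[pos + 1][0]:
            continue  # cannot split between equal feature values
        for default_left in (True, False):
            gl = G_L + (G_miss if default_left else 0.0)
            hl = H_L + (H_miss if default_left else 0.0)
            gain = 0.5 * (leaf_score(gl, hl, lam) + leaf_score(G - gl, H - hl, lam)
                          - leaf_score(G, H, lam))
            if gain > best[0]:
                best = (gain, value, default_left)
    return best

# Squared loss at F = 0 (g_i = -y_i, h_i = 1); the missing sample has y = 1,
# like the samples with small x, so it should be sent left by default.
x = [1.0, 2.0, None, 3.0, 4.0]
g = [-1.0, -1.0, -1.0, 1.0, 1.0]
h = [1.0] * 5
print(best_split_with_missing(x, g, h))
```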

See Demystifying XGBoost (4): Where Is the Magic Optimization?


https://seed9d.github.io/what-make-XGBoost-so-effective/

Author: seed9D
Posted on: 2021-02-17
Updated on: 2021-02-18
