😎 What is min_child_weight in XGBoost?
min_child_weight is not intuitive. But who cares...xgboost still rox.
There’s a great SO question, “Explanation of min_child_weight in xgboost algorithm”. When you read the docs you expect to hear that it’s the number of samples in the leaf, but instead it talks about “instance weight”. From the answer by hahdawg we learn the following:
For regression with squared-error loss, min_child_weight really is the minimum number of samples in a leaf: each sample’s hessian is exactly 1, so the sum of instance weights (hessians) in a node is just its sample count. The default is 1, so a regression tree can overfit by splitting all the way down to a single datapoint.
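To make that concrete, here’s a small sketch (not XGBoost internals) showing why the hessian sum equals the sample count for squared error:

```python
# For squared-error loss l(y, yhat) = 0.5 * (y - yhat)**2,
# the per-sample hessian d^2 l / d yhat^2 is exactly 1,
# so the "sum of instance weight (hessian)" in a leaf is
# just the number of samples in it.

def squared_error_hessian(y, yhat):
    # second derivative of 0.5 * (y - yhat)**2 w.r.t. yhat
    return 1.0

leaf_labels = [3.1, -0.5, 2.2, 7.0]
leaf_preds = [2.0, 0.0, 2.0, 5.0]

hess_sum = sum(squared_error_hessian(y, p)
               for y, p in zip(leaf_labels, leaf_preds))
print(hess_sum)  # 4.0 -- equals len(leaf_labels)
```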
For non-regression tasks, it’s not. With binary logistic loss, for example, the per-sample hessian is p(1 − p), which shrinks toward zero as predictions become confident, so the sum behaves more like a measure of the purity of the node.
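A quick sketch of the classification case (again just illustrative, not XGBoost code): with logistic loss, a leaf full of confident predictions has a tiny hessian sum, while a mixed leaf near p = 0.5 has a large one.

```python
import math

# For binary logistic loss, the per-sample hessian is
# p * (1 - p), where p = sigmoid(raw_score). Confident
# predictions (p near 0 or 1) contribute almost nothing,
# which is why min_child_weight acts like a purity
# constraint for classification.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_hessian(raw_score):
    p = sigmoid(raw_score)
    return p * (1.0 - p)

# A "pure" leaf: every sample predicted confidently.
pure = [logistic_hessian(s) for s in (4.0, 5.0, 4.5)]
# A "mixed" leaf: predictions hovering near p = 0.5.
mixed = [logistic_hessian(s) for s in (0.1, -0.2, 0.0)]

print(sum(pure))   # tiny: can fall below min_child_weight
print(sum(mixed))  # near the maximum of 0.25 per sample
```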
The question also asks: why not just expose a min_child_leaf parameter that counts samples directly? That would be a lot more intuitive, and you wouldn’t have to do the mental math for non-regression tasks.
From the XGBoost docs:
min_child_weight [default=1] . Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In linear regression task, this simply corresponds to minimum number of instances needed to be in each node. The larger min_child_weight is, the more conservative the algorithm will be. (source)
From the XGBoost code:
bool IsValid(GradStats const &left, GradStats const &right) const {
  return left.GetHess() >= param_.min_child_weight &&
         right.GetHess() >= param_.min_child_weight;
}
So we can see that if either child’s hessian sum falls below min_child_weight, the split is rejected and that branch stops growing.
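The check above is simple enough to restate in Python. This is just a sketch of the same logic (GradStats and GetHess come from the XGBoost source; here they’re modeled as plain floats):

```python
# Mirror of the C++ IsValid: both children must carry at
# least min_child_weight worth of hessian, or the candidate
# split is rejected.

def is_valid_split(left_hess, right_hess, min_child_weight):
    return (left_hess >= min_child_weight
            and right_hess >= min_child_weight)

print(is_valid_split(3.0, 2.0, min_child_weight=1.0))   # True
print(is_valid_split(0.5, 10.0, min_child_weight=1.0))  # False
```

Note that for regression this reduces to “each child must contain at least min_child_weight samples”, since each sample contributes a hessian of 1.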
QED.