14. A Complete Guide to Building and Optimizing Decision Trees

License: CC 4.0 BY-SA

Tags: #decision-tree #information-gain #entropy

1. Principles of Decision Tree Induction

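Decision tree induction follows a divide-and-conquer strategy: the training set is split repeatedly until each part contains examples of a single class.
Let $T$ be the training set. The function grow(T) proceeds as follows:
1. Find the attribute $at$ that contributes the most information about the class labels.
2. Divide $T$ into subsets $T_i$, one for each distinct value of $at$.
3. For each $T_i$:
- If all examples in $T_i$ belong to the same class, create a leaf node labeled with that class.
- Otherwise, apply the same procedure recursively to that subset: grow(T_i).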

The corresponding pseudocode:

Let T be the training set.
grow(T):
(1) Find the attribute, at, that contributes the maximum information about the class labels.
(2) Divide T into subsets, Ti, each characterized by a different value of at.
(3) For each Ti:
    If all examples in Ti belong to the same class, then create a leaf labeled with this class; 
    otherwise, apply the same procedure recursively to each training subset: grow(Ti).
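
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation. It assumes that "maximum information" in step (1) means information gain computed from Shannon entropy, that all attributes are categorical, and that each example is a (features, label) pair. The helper names entropy, partition and information_gain, the majority-class fallback used when no attributes remain, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def partition(examples, attribute):
    """Step (2): group examples by their value of `attribute`."""
    subsets = {}
    for features, label in examples:
        subsets.setdefault(features[attribute], []).append((features, label))
    return subsets


def information_gain(examples, attribute):
    """Entropy of the labels minus the weighted entropy after the split."""
    labels = [label for _, label in examples]
    remainder = sum(
        len(subset) / len(examples) * entropy([label for _, label in subset])
        for subset in partition(examples, attribute).values()
    )
    return entropy(labels) - remainder


def grow(examples, attributes):
    """Return a class label (leaf) or a nested dict (internal node)."""
    labels = [label for _, label in examples]
    # Leaf case from step (3): every example has the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Assumption (not in the pseudocode): if no attributes are left,
    # fall back to the majority class so the recursion always terminates.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step (1): pick the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(examples, a))
    remaining = [a for a in attributes if a != best]
    # Steps (2) and (3): split on `best` and recurse into each subset.
    return {
        "attribute": best,
        "children": {
            value: grow(subset, remaining)
            for value, subset in partition(examples, best).items()
        },
    }


# Hypothetical toy data, for illustration only.
toy = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
tree = grow(toy, ["outlook", "windy"])
print(tree)
```

On this toy set, splitting on outlook yields a pure "sunny" subset, so it gives more information than windy and becomes the root. The "rainy" subset is still mixed and is split again on windy.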