A decision tree is a classifier with a tree structure.
A decision tree has two types of nodes:
1) Decision nodes
2) Leaf nodes
Example :-
The root node (Employee?) has two outcomes: No leads to a Credit Score? node and Yes leads to an Income? node. Credit Score? splits two ways, High (Approve) and Low (Reject); Income? likewise splits two ways, High and Low.
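To make the two node types concrete, here is a minimal sketch in Python. The dict-based encoding is just one illustrative choice, and the Approve/Reject leaves under Income? are assumed for illustration, since the example above does not specify them:

```python
# Illustrative encoding of the example tree: nested dicts are decision
# nodes, plain strings are leaf nodes. The Approve/Reject leaves under
# Income? are assumed for illustration; the example does not give them.
tree = {
    "Employee?": {
        "No":  {"Credit Score?": {"High": "Approve", "Low": "Reject"}},
        "Yes": {"Income?":       {"High": "Approve", "Low": "Reject"}},
    }
}

def classify(node, example):
    """Walk from the root to a leaf, following the example's attribute values."""
    if isinstance(node, str):                  # leaf node: return its decision
        return node
    (attribute, branches), = node.items()      # decision node: one test, many branches
    return classify(branches[example[attribute]], example)

print(classify(tree, {"Employee?": "No", "Credit Score?": "High"}))  # -> Approve
```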
Issues :-
Given some training examples, what decision tree should be generated ?
One proposal: prefer the smallest tree that is consistent with the data (a bias toward simplicity)
Possible method:
- search the space of decision trees for the smallest decision tree that fits the data
Prefer small trees
- low depth
- small number of nodes
Examples of DECISION TREES
Decision Tree for Play Tennis
Attributes and their values :
- Outlook : Sunny, Overcast, Rain
- Humidity : High, Normal
- Wind : Strong, Weak
- Temperature : Hot, Mild, Cool
Target concept - Play Tennis : Yes, No
Decision trees represent disjunctions of conjunctions
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
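The same disjunction can be written directly as a boolean function; a minimal sketch using the Play Tennis attributes:

```python
def play_tennis(outlook, humidity, wind):
    """The decision tree above, written as a disjunction of conjunctions.
    Each disjunct is one path from the root to a 'Yes' leaf."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Sunny", "Normal", "Strong"))  # True
print(play_tennis("Rain", "High", "Strong"))     # False
```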
Searching for a good tree
The space of decision trees is too big for systematic search, so a good tree is grown greedily and recursively. At each node, either stop and
- return a value for the target feature, or
- return a distribution over target feature values,
or choose a test (e.g. an input feature) to split on:
- for each value of the test, build a subtree for those examples with that value for the test.
(A code sketch of this recursion follows the ID3 steps below.)
Top-Down Induction of Decision Trees (ID3)
1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant
4. Sort the training examples to the leaf nodes according to the attribute value of their branch
5. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes.
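Below is a compact sketch of steps 1–5 (and of the recursive scheme above). It uses information gain, i.e. entropy reduction, as the "best attribute" criterion, which is the standard ID3 choice; representing examples as dicts from attribute names to values is an assumption made here for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of target values."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    """Entropy reduction achieved by splitting the examples on `attribute`."""
    base = entropy([ex[target] for ex in examples])
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

def id3(examples, attributes, target):
    """Steps 1-5: pick the best attribute, branch on its values, recurse."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                 # step 5: perfectly classified
        return labels[0]
    if not attributes:                        # no input features left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes,                    # steps 1-2: choose the "best" attribute
               key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:                 # step 3: one branch per value
        subset = [ex for ex in examples if ex[best] == value]   # step 4: sort examples to branches
        rest = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, rest, target)
    return tree
```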
Choices
When to stop
- no more input features
- all examples are classified the same
- too few examples to make an informative split
Which test to split on
- prefer the split that gives the smallest error (see the sketch after this list).
- With multi-valued features, either
* split on all values, or
* partition the values into two halves.
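To make "smallest error" concrete, here is a small sketch that scores each candidate split by its misclassification error when every branch predicts its majority class. The rows below are a few illustrative examples, not the full Play Tennis table:

```python
from collections import Counter

def split_error(examples, attribute, target):
    """Misclassification error if each branch predicts its majority class."""
    errors = 0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        errors += len(subset) - Counter(subset).most_common(1)[0][1]
    return errors / len(examples)

# A few illustrative rows (not the full textbook table).
rows = [
    {"Outlook": "Sunny",    "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Overcast", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Strong", "Play": "No"},
]
for a in ("Outlook", "Wind"):
    print(a, split_error(rows, a, "Play"))
# Outlook gives the smaller error (~0.17 vs ~0.33), so it is the better split here.
```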