A decision tree is a classifier with a tree structure.
A decision tree has two types of nodes:
1) Decision nodes
2) Leaf nodes
Example :-
The root node (Employee?) has two outcomes: No leads to a Credit Score? node and Yes leads to an Income? node. Credit Score? splits two ways, High (Approve) and Low (Reject); Income? likewise splits two ways, High and Low.
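To make the two node types concrete, here is a minimal sketch in Python. The dict-based encoding is just one illustrative choice, and the Approve/Reject leaves under Income? are assumed for illustration, since the example above does not specify them:

```python
# Illustrative encoding of the example tree: nested dicts are decision
# nodes, plain strings are leaf nodes. The Approve/Reject leaves under
# Income? are assumed for illustration; the example does not give them.
tree = {
    "Employee?": {
        "No":  {"Credit Score?": {"High": "Approve", "Low": "Reject"}},
        "Yes": {"Income?":       {"High": "Approve", "Low": "Reject"}},
    }
}

def classify(node, example):
    """Walk from the root to a leaf, following the example's attribute values."""
    if isinstance(node, str):                  # leaf node: return its decision
        return node
    (attribute, branches), = node.items()      # decision node: one test, many branches
    return classify(branches[example[attribute]], example)

print(classify(tree, {"Employee?": "No", "Credit Score?": "High"}))  # -> Approve
```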
Issues :-
Given some training examples, what decision tree should be generated ?
One proposal: prefer the smallest tree that is consistent with the data (a bias toward simplicity)
Possible method:
- search the space of decision trees for the smallest decision tree that fits the data
Prefer small trees
- low depth
- small number of nodes
Examples of DECISION TREES
Decision Tree for Play Tennis
Attributes and their values :
- Outlook : Sunny, Overcast, Rain
- Humidity : High, Normal
- Wind : Strong, Weak
- Temperature : Hot, Mild, Cool
Target concept - Play Tennis : Yes, No
Decision trees represent disjunctions of conjunctions
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
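The same disjunction can be written directly as a boolean function; a minimal sketch using the Play Tennis attributes:

```python
def play_tennis(outlook, humidity, wind):
    """The decision tree above, written as a disjunction of conjunctions.
    Each disjunct is one path from the root to a 'Yes' leaf."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Sunny", "Normal", "Strong"))  # True
print(play_tennis("Rain", "High", "Strong"))     # False
```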
Searching for a good tree
The space of decision trees is too big for systematic search, so a good tree is grown greedily and recursively. At each node, either stop and
- return a value for the target feature, or
- return a distribution over target feature values,
or choose a test (e.g. an input feature) to split on:
- for each value of the test, build a subtree for those examples with that value for the test.
(A code sketch of this recursion follows the ID3 steps below.)
Top-Down Induction of Decision Trees (ID3)
1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant
4. Sort the training examples to the leaf nodes according to the attribute value of their branch
5. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes.
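Below is a compact sketch of steps 1–5 (and of the recursive scheme above). It uses information gain, i.e. entropy reduction, as the "best attribute" criterion, which is the standard ID3 choice; representing examples as dicts from attribute names to values is an assumption made here for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of target values."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    """Entropy reduction achieved by splitting the examples on `attribute`."""
    base = entropy([ex[target] for ex in examples])
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

def id3(examples, attributes, target):
    """Steps 1-5: pick the best attribute, branch on its values, recurse."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                 # step 5: perfectly classified
        return labels[0]
    if not attributes:                        # no input features left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes,                    # steps 1-2: choose the "best" attribute
               key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:                 # step 3: one branch per value
        subset = [ex for ex in examples if ex[best] == value]   # step 4: sort examples to branches
        rest = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, rest, target)
    return tree
```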
Choices
When to stop
- no more input features
- all examples are classified the same
- too few examples to make an informative split
Which test to split on
- prefer the split that gives the smallest error (see the sketch after this list).
- With multi-valued features, either
* split on all values, or
* partition the values into two halves.
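To make "smallest error" concrete, here is a small sketch that scores each candidate split by its misclassification error when every branch predicts its majority class. The rows below are a few illustrative examples, not the full Play Tennis table:

```python
from collections import Counter

def split_error(examples, attribute, target):
    """Misclassification error if each branch predicts its majority class."""
    errors = 0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        errors += len(subset) - Counter(subset).most_common(1)[0][1]
    return errors / len(examples)

# A few illustrative rows (not the full textbook table).
rows = [
    {"Outlook": "Sunny",    "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Overcast", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Strong", "Play": "No"},
]
for a in ("Outlook", "Wind"):
    print(a, split_error(rows, a, "Play"))
# Outlook gives the smaller error (~0.17 vs ~0.33), so it is the better split here.
```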