Decision Tree Problem, Part 1

Reference: P. H. Winston, Artificial Intelligence, 3rd ed., 1992.

Factors Affecting Sunburn

 

Given Data

Hair, Height, Weight, and Lotion are the independent (condition) attributes; Result is the dependent (decision) attribute.

Name   Hair    Height   Weight   Lotion  Result
Sarah  blonde  average  light    no      sunburned (positive)
Dana   blonde  tall     average  yes     none (negative)
Alex   brown   short    average  yes     none
Annie  blonde  short    average  no      sunburned
Emily  red     average  heavy    no      sunburned
Pete   brown   tall     heavy    no      none
John   brown   average  heavy    no      none
Katie  blonde  short    light    yes     none

 

Phase 1: From Data to Tree

1. Perform average entropy calculations on the complete data set for each of the four attributes. For each attribute, the average entropy is the entropy of the Result labels within each branch, weighted by the fraction of the samples that fall into that branch; the four calculations are summarized below and checked in the code sketch that follows the results.

Hair Color:  b1 = blonde (2 sunburned, 2 none), b2 = red (1 sunburned), b3 = brown (3 none)
             Average Entropy = 0.50

Height:      b1 = short (1 sunburned, 2 none), b2 = average (2 sunburned, 1 none), b3 = tall (2 none)
             Average Entropy = 0.69

Weight:      b1 = light (1 sunburned, 1 none), b2 = average (1 sunburned, 2 none), b3 = heavy (1 sunburned, 2 none)
             Average Entropy = 0.94

Lotion:      b1 = no (3 sunburned, 2 none), b2 = yes (3 none)
             Average Entropy = 0.61
Results

Attribute    Average Entropy
Hair Color   0.50
Height       0.69
Weight       0.94
Lotion       0.61

The attribute "hair color" is selected as the first test because it minimizes the average entropy.
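As a check on the table above, the following Python sketch recomputes the branch-weighted average entropy, avg_entropy = sum over branches b of (n_b / n) * H(b), for each attribute over the full data set. The encoding of the data and the names DATA, ATTRS, entropy, and avg_entropy are mine, not part of the original problem statement.

from collections import Counter
from math import log2

# The eight samples from the Given Data table: (name, hair, height, weight, lotion, result).
DATA = [
    ("Sarah", "blonde", "average", "light",   "no",  "sunburned"),
    ("Dana",  "blonde", "tall",    "average", "yes", "none"),
    ("Alex",  "brown",  "short",   "average", "yes", "none"),
    ("Annie", "blonde", "short",   "average", "no",  "sunburned"),
    ("Emily", "red",    "average", "heavy",   "no",  "sunburned"),
    ("Pete",  "brown",  "tall",    "heavy",   "no",  "none"),
    ("John",  "brown",  "average", "heavy",   "no",  "none"),
    ("Katie", "blonde", "short",   "light",   "yes", "none"),
]
ATTRS = {"hair": 1, "height": 2, "weight": 3, "lotion": 4}   # column indices
RESULT = 5

def entropy(rows):
    """Shannon entropy (in bits) of the Result labels in rows."""
    counts = Counter(r[RESULT] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def avg_entropy(rows, attr):
    """Entropy of each branch of the split, weighted by that branch's share of the samples."""
    n = len(rows)
    branches = Counter(r[attr] for r in rows)
    return sum(k / n * entropy([r for r in rows if r[attr] == v])
               for v, k in branches.items())

for name, col in ATTRS.items():
    print(f"{name:7s}{avg_entropy(DATA, col):.2f}")
# Prints hair 0.50, height 0.69, weight 0.94, lotion 0.61, matching the results table.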


2. Similarly, we now choose another test to separate the sunburned individuals from the inhomogeneous blonde-haired subset {Sarah, Dana, Annie, Katie}. The calculation is repeated on this subset alone, as shown in the sketch after the results below.

Results

Attribute    Average Entropy
Height       0.50
Weight       1.00
Lotion       0.00

The attribute "lotion" is selected because it minimizes the average entropy within the blonde-haired subset.
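Continuing the sketch above (it reuses DATA, ATTRS, and avg_entropy from that block), restricting the rows to the blonde-haired branch reproduces the subset figures:

blonde = [r for r in DATA if r[ATTRS["hair"]] == "blonde"]   # Sarah, Dana, Annie, Katie
for name in ("height", "weight", "lotion"):
    print(f"{name:7s}{avg_entropy(blonde, ATTRS[name]):.2f}")
# Prints height 0.50, weight 1.00, lotion 0.00: splitting on lotion makes both branches pure.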

Thus, using the "hair color" and "lotion" tests together correctly classifies all of the samples.

 

This is the completed decision tree: hair color is tested at the root; red-haired samples are classified as sunburned and brown-haired samples as none, while blonde-haired samples are passed to the lotion test (no lotion: sunburned; lotion: none).
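Read off as code, the finished tree is just two nested tests. A minimal sketch (the function name classify is mine):

def classify(hair, lotion):
    """Completed decision tree: test hair color at the root, then lotion use for blondes."""
    if hair == "blonde":
        return "sunburned" if lotion == "no" else "none"
    if hair == "red":
        return "sunburned"
    return "none"   # brown hair

# The tree classifies all eight training samples correctly, e.g.:
print(classify("blonde", "no"))   # sunburned (Sarah, Annie)
print(classify("brown", "no"))    # none (Pete, John)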