
ECS170

Homework Assignment

Winter 2003

ECS 170 HW #5b: Suggested Solutions

Bayes Learning

Assigned: 4 March 2003

Due: 11 March 2003

(NO LATE SUBMISSIONS will be accepted. I need to post the solutions to this set ASAP after the deadline.)

TENTATIVE

1. Do Problem 15.1.

2. Do Problem 15.2, parts a, b, c, d, and e only.

(I made an error on this page: the second problem should be 15.2, not 15.3. It is now corrected. But if you have already solved 15.3, consider it a good exercise!)

Solutions:

15.1 (a) When you have to extend an existing network by adding new variables, you can do it in a variety of ways. You have to start by asking, "Which variables are 'causes' and which are 'effects'?" This is not always a straightforward process.

Consider IcyWeather. It is not caused by any of the car-related factors such as Battery, Gas, Radio, etc., so it needs no parents. But it does affect the functioning of the battery and the starter motor. So one way to handle this is to put IcyWeather at the root. From there, build two branches: one goes to Battery and the other to StarterMotorWorks (SMW).
From Battery, there are two branches: one goes to Radio and the other to Ignition.
Now the node labelled Starts receives three inputs: one from Ignition (whose parent is Battery), one from StarterMotorWorks, SMW (whose parent is IcyWeather), and one from Gas (Gas itself has no parents).
Finally, the Starts node has only one child, namely Moves.

NOTE: You should do this by drawing a picture, not the way I did here. In fact, you can start from Fig. 15.5 and introduce the two new variables, IcyWeather and SMW: make IcyWeather the parent of both Battery and SMW, and make SMW the third parent of Starts.
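A drawing is best, but the structure can also be written down as parent sets and checked mechanically. The sketch below (Python, using the standard-library graphlib module, available from Python 3.9) records the parents of each node exactly as described above and asks for a topological order. graphlib raises CycleError if the graph has a directed cycle, so getting an order back confirms the structure is a valid Bayes-net DAG. The dictionary name PARENTS is just an illustrative choice.

```python
from graphlib import TopologicalSorter

# Parent sets of the extended car network, as described in part (a).
# SMW = StarterMotorWorks. The name PARENTS is illustrative only.
PARENTS = {
    "IcyWeather": [],
    "Gas": [],
    "Battery": ["IcyWeather"],
    "SMW": ["IcyWeather"],
    "Radio": ["Battery"],
    "Ignition": ["Battery"],
    "Starts": ["Ignition", "SMW", "Gas"],
    "Moves": ["Starts"],
}

# TopologicalSorter takes a mapping node -> predecessors (here, parents).
# static_order() raises graphlib.CycleError if there is a directed cycle,
# so getting an order back confirms the network is a DAG.
order = list(TopologicalSorter(PARENTS).static_order())
print(" -> ".join(order))
```

Any order printed lists every parent before its children, which is also a valid order for applying the chain rule.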

(b) What are reasonable probabilities? This is a tough call. Normally, you use your best common sense and then test the model to see how well it makes predictions. Alternatively, you gather lots of data from actual cases, or you talk to an expert mechanic.

Here are some "reasonable" numbers for the priors and conditional probabilities. These are just intelligent guesses, based on our experience with cars.

P(IcyWeather) = 0.5 (this depends on where you are and the season)

P(Battery|IcyWeather) = 0.95; P(Battery|~IcyWeather) = 0.997 (this assumes that you have a fairly new battery)

P(SMW|IcyWeather) = 0.98; P(SMW|~IcyWeather) = 0.999 (this assumes that you have a fairly reliable starter motor)

P(Radio|Battery) = 0.999; P(Radio|~Battery) = 0.05 (this assumes that you have a fairly reliable radio)

P(Ignition|Battery) = 0.998; P(Ignition|~Battery) = 0.01

P(Gas) = 0.995

P(Starts|Ignition, SMW, Gas) = 0.9999

All other entries of the Starts table are zero.

P(Moves|Starts) = 0.998
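One way to see what these guesses imply is to compute a few marginals by brute-force enumeration: multiply one CPT entry per node (the chain rule) for each of the 2^8 = 256 assignments, and sum the ones that match a query. This is a minimal sketch in Python; the names PARENTS, CPT, joint, and prob are hypothetical, and P(Moves|~Starts) = 0 is an assumption not stated above. Enumeration is fine at this size but grows exponentially with the number of variables.

```python
from itertools import product

# Parent sets of the extended network from part (a); SMW = StarterMotorWorks.
PARENTS = {
    "IcyWeather": (),
    "Gas": (),
    "Battery": ("IcyWeather",),
    "SMW": ("IcyWeather",),
    "Radio": ("Battery",),
    "Ignition": ("Battery",),
    "Starts": ("Ignition", "SMW", "Gas"),
    "Moves": ("Starts",),
}

# P(node = true | parent values); keys are tuples of parent values in the
# order given in PARENTS. Numbers are the part (b) guesses, except
# P(Moves | ~Starts) = 0.0, which is an assumption added here.
CPT = {
    "IcyWeather": {(): 0.5},
    "Gas": {(): 0.995},
    "Battery": {(True,): 0.95, (False,): 0.997},
    "SMW": {(True,): 0.98, (False,): 0.999},
    "Radio": {(True,): 0.999, (False,): 0.05},
    "Ignition": {(True,): 0.998, (False,): 0.01},
    "Starts": {row: (0.9999 if all(row) else 0.0)
               for row in product([True, False], repeat=3)},
    "Moves": {(True,): 0.998, (False,): 0.0},
}
NODES = list(PARENTS)


def joint(assignment):
    """Chain rule: product of one CPT entry per node for a full assignment."""
    p = 1.0
    for node in NODES:
        p_true = CPT[node][tuple(assignment[par] for par in PARENTS[node])]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def prob(query):
    """Sum the joint over all 2**8 assignments that agree with `query`."""
    total = 0.0
    for values in product([True, False], repeat=len(NODES)):
        assignment = dict(zip(NODES, values))
        if all(assignment[k] == v for k, v in query.items()):
            total += joint(assignment)
    return total


print("P(Moves) =", prob({"Moves": True}))
print("P(Starts | IcyWeather) =",
      prob({"Starts": True, "IcyWeather": True}) / prob({"IcyWeather": True}))
```

Comparing P(Starts | IcyWeather) with P(Starts | ~IcyWeather) is a quick sanity check that the guesses make icy weather lower the chance of starting.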

(d) With reference to the new figure:

IcyWeather has one entry, the prior.

Gas has one entry, the prior.

Battery, SMW, Radio, Ignition, and Moves each have two entries.

Starts, which receives inputs from 3 parents, has 8 entries.

The total number of independent CPT entries is 1 + 1 + 5 × 2 + 8 = 20.
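The count follows a simple rule: a Boolean node with k Boolean parents contributes 2^k independent numbers. A small sketch that applies this rule to the parent sets of the extended network (the names PARENTS and independent_cpt_entries are illustrative):

```python
# Parent sets of the extended car network from part (a); SMW = StarterMotorWorks.
PARENTS = {
    "IcyWeather": (),
    "Gas": (),
    "Battery": ("IcyWeather",),
    "SMW": ("IcyWeather",),
    "Radio": ("Battery",),
    "Ignition": ("Battery",),
    "Starts": ("Ignition", "SMW", "Gas"),
    "Moves": ("Starts",),
}


def independent_cpt_entries(parents):
    """A Boolean node with k Boolean parents needs 2**k independent numbers:
    one P(node = true | row) per parent row; P(node = false | row) is 1 minus it."""
    return sum(2 ** len(ps) for ps in parents.values())


print(independent_cpt_entries(PARENTS))  # 20
```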

(e) The CPT of Starts has 8 entries. These entries describe a set of necessary conditions for the engine to start. If you look at the figure, you can see that the engine starts only if ALL three antecedents are satisfied. So, except for that one entry, all others will be zero. The entry for which the engine starts is fairly close to 1, but not quite, because there is always some piece of information we did not think about (maybe a wire is broken, maybe the engine is flooded, etc.). As we learn more about the problem, we can add more and more conditions, and this entry will move closer and closer to 1.
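The eight rows of the Starts table can be generated mechanically, which makes the "only one nonzero entry" structure explicit. A minimal sketch, using the 0.9999 guess from part (b); the function name starts_cpt is hypothetical.

```python
from itertools import product

# Guess from part (b); not a measured value.
P_STARTS_ALL_OK = 0.9999


def starts_cpt(p_all_ok=P_STARTS_ALL_OK):
    """P(Starts = true | Ignition, SMW, Gas) as a near-deterministic AND:
    nonzero only when all three parents are true."""
    return {
        (ignition, smw, gas): (p_all_ok if ignition and smw and gas else 0.0)
        for ignition, smw, gas in product([True, False], repeat=3)
    }


for row, p in starts_cpt().items():
    print(row, p)
```

Raising p_all_ok toward 1 corresponds to modelling more of the failure causes explicitly, as described above.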

15.2 As I told you, this is a fairly involved question. Please read through this answer carefully so that you get a good idea of how to handle this problem.

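(a) A suitable network looks like this. (Please draw it and keep it in front of you; you will need it later.)

Root node: T (core temperature)

T → F_G (gauge faulty), T → G (gauge reading), F_G → G

(That is, there are two parallel paths from T to G: one direct, and one through F_G.)

F_A (alarm faulty): another root node

G → A, F_A → A (A: alarm sounds)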

The point to observe is this: the failure nodes are parents of the sensor nodes. Notice that the temperature node T is directly responsible for the gauge reading and also indirectly responsible for it, via the gauge-faulty node F_G. So if a sensor is unreliable, it becomes very difficult for humans to interpret failure modes. (See what happened with Columbia? When Houston received high-temperature readings, they had to decide whether these were due to faulty sensors or to genuinely high temperatures.)

(b) I did NOT mention the name "polytree" in class, although I discussed the idea. I assumed that you would look it up to answer this question; please see page 448 for a formal definition. Singly connected networks are called polytrees: in such networks, there is at most one undirected path between any two nodes.

So no matter how you draw your Bayes net, it will NOT be a polytree, because the temperature influences the gauge via two different paths (T → G directly, and T → F_G → G).
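The polytree test can also be checked mechanically: a DAG is a polytree exactly when its undirected version has no cycle. The sketch below uses a small union-find (disjoint-set) structure; the function name is_polytree and the node labels are illustrative choices, and the edge list encodes the network from part (a).

```python
def is_polytree(edges):
    """Return True if the undirected skeleton of the DAG `edges` has no cycle,
    i.e. at most one undirected path joins any two nodes."""
    parent = {}

    def find(u):
        # Find the representative of u's component (with path halving).
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # u and v already connected: this edge closes a cycle
            return False
        parent[ru] = rv
    return True


# Directed edges of the 15.2 network (F_G = gauge faulty, F_A = alarm faulty).
EDGES = [("T", "F_G"), ("T", "G"), ("F_G", "G"), ("G", "A"), ("F_A", "A")]
print(is_polytree(EDGES))  # False: T - F_G - G - T is an undirected cycle
```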

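(c) The CPT for G looks as shown below. Notice that the question is worded in terms of the probability of an "incorrect" reading: x when the gauge is working and y when it is faulty, so the correct readings get 1-x and 1-y.

                T=Normal          T=High
                F_G     ~F_G      F_G     ~F_G
G=Normal        1-y     1-x       y       x
G=High          y       x         1-y     1-x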
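The same table can be generated from x and y, which is a handy way to check that every column sums to 1. A minimal sketch; gauge_cpt and the numeric values 0.05 and 0.4 are hypothetical, chosen only for illustration.

```python
def gauge_cpt(x, y):
    """P(G | T, F_G) for two-valued T and G.

    x: P(incorrect reading | gauge working)
    y: P(incorrect reading | gauge faulty)
    Returns {(t, faulty): {reading: probability}}.
    """
    cpt = {}
    for t in ("Normal", "High"):
        other = "High" if t == "Normal" else "Normal"
        for faulty in (True, False):
            p_wrong = y if faulty else x
            cpt[(t, faulty)] = {t: 1 - p_wrong, other: p_wrong}
    return cpt


table = gauge_cpt(x=0.05, y=0.4)  # illustrative values only
for (t, faulty), dist in table.items():
    print(f"T={t}, F_G={faulty}: {dist}")
```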

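(d) Suppose the alarm works correctly unless it is faulty, in which case it never goes off. Give the CPT associated with A.

The alarm is deterministic: when it is working (~F_A), it sounds exactly when the gauge reads High; when it is faulty (F_A), it never sounds.

                G=Normal          G=High
                F_A     ~F_A      F_A     ~F_A
A               0       0         0       1
~A              1       1         1       0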
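Because the alarm is deterministic, its CPT is just an indicator function. A minimal sketch, with the hypothetical helper p_alarm:

```python
def p_alarm(g, alarm_faulty):
    """P(A = sounds | G = g, F_A = alarm_faulty).

    A working alarm sounds exactly when the gauge reads High;
    a faulty alarm never sounds."""
    return 1.0 if (g == "High" and not alarm_faulty) else 0.0


for g in ("Normal", "High"):
    for fa in (True, False):
        print(f"G={g}, F_A={fa}: P(A)={p_alarm(g, fa)}")
```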

Although I did not ask you to do the rest as homework, these parts are really very important, because the algorithm gives a step-by-step, systematic procedure for something that we can sometimes guess intuitively. It is better to follow the algorithm, because that way it can be implemented on a computer, and the danger of making an intuitive error is avoided.

(e) Suppose the alarm and the gauge are working, and the alarm sounds. Calculate the probability that the core temperature is high.

Let us first state the problem in the language of mathematics.

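Let T stand for T = High and G for G = High, and write $\neg F_G$ and $\neg F_A$ for "the gauge is working" and "the alarm is working".

The probability of interest here is

$$P(T \mid A, \neg F_A, \neg F_G)$$

But the alarm's behavior is deterministic: if the alarm is working and sounds, we can conclude that G must be High. Because $F_A$ and $A$ are d-separated from T given G, we need only calculate $P(T \mid G, \neg F_G)$.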

Let us do this problem in two different ways. The first method assumes that a human is solving the problem and so is capable of recognizing an opportunity to simplify it.

Opportunistic Method

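Notice that the CPT entries give us (look at the diagram)

$$P(G \mid T, \neg F_G) = 1 - x, \qquad P(G \mid \neg T, \neg F_G) = x$$

But we want $P(T \mid G, \neg F_G)$. That is, we have to reverse the roles of T and G, leaving $\neg F_G$ in the background. This is done using the generalized Bayes' rule (Bayes' rule conditioned on background evidence):

$$P(T \mid G, \neg F_G) = \alpha \, P(G \mid T, \neg F_G) \, P(T \mid \neg F_G)$$

where $\alpha = 1 / P(G \mid \neg F_G)$ is a normalizing constant. Use Bayes' rule again on the last term, $P(T \mid \neg F_G) = P(\neg F_G \mid T) \, P(T) / P(\neg F_G)$, to get

$$P(T \mid G, \neg F_G) = \alpha' \, (1 - x) \, P(\neg F_G \mid T) \, P(T)$$

with $\alpha' = 1 / P(G, \neg F_G)$. A similar relationship holds for $\neg T$:

$$P(\neg T \mid G, \neg F_G) = \alpha' \, x \, P(\neg F_G \mid \neg T) \, P(\neg T)$$

Normalizing (the two probabilities must sum to 1):

$$P(T \mid G, \neg F_G) = \frac{(1 - x) \, P(\neg F_G \mid T) \, P(T)}{(1 - x) \, P(\neg F_G \mid T) \, P(T) + x \, P(\neg F_G \mid \neg T) \, P(\neg T)}$$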
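Plugging numbers into the normalized expression shows how the answer behaves. The sketch below is a direct transcription of that formula; the function name and all numbers (x = 0.05, P(T) = 0.01, P(~F_G | T) = 0.7, P(~F_G | ~T) = 0.99) are hypothetical, not part of the assignment.

```python
def p_high_given_alarm(x, p_t, p_ok_given_t, p_ok_given_not_t):
    """P(T = High | A, not F_A, not F_G) from the normalized expression.

    x:                P(incorrect gauge reading | gauge working)
    p_t:              prior P(T = High)
    p_ok_given_t:     P(not F_G | T = High)
    p_ok_given_not_t: P(not F_G | T = Normal)
    """
    num = (1 - x) * p_ok_given_t * p_t
    den = num + x * p_ok_given_not_t * (1 - p_t)
    return num / den


# Hypothetical numbers, for illustration only.
print(p_high_given_alarm(x=0.05, p_t=0.01, p_ok_given_t=0.7, p_ok_given_not_t=0.99))
```

With these made-up numbers the result is only about 0.12: even a working gauge occasionally gives a false High reading (probability x), and with a low prior on high temperature those false readings compete with the rare true ones.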

Systematic Method

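Here, rather than invoking Bayes' theorem, we work directly from entries of the joint distribution. Why is the use of the joint distribution justified here? In class I have been saying that the joint table has too many entries. The justification comes from two angles. First, the problem is very small. More importantly, if you look at the graph, you see that T, G, and F_G constitute a completely connected graph, so there is no loss of efficiency in working from the joint. By the definition of conditional probability,

$$P(T \mid G, \neg F_G) = \frac{P(T, G, \neg F_G)}{P(T, G, \neg F_G) + P(\neg T, G, \neg F_G)}$$

Now we use the chain rule to rewrite the joint entries as CPT entries (look at the diagram):

$$P(T, G, \neg F_G) = P(G \mid T, \neg F_G) \, P(\neg F_G \mid T) \, P(T) = (1 - x) \, P(\neg F_G \mid T) \, P(T)$$

$$P(\neg T, G, \neg F_G) = P(G \mid \neg T, \neg F_G) \, P(\neg F_G \mid \neg T) \, P(\neg T) = x \, P(\neg F_G \mid \neg T) \, P(\neg T)$$

Substituting these into the ratio gives

$$P(T \mid G, \neg F_G) = \frac{(1 - x) \, P(\neg F_G \mid T) \, P(T)}{(1 - x) \, P(\neg F_G \mid T) \, P(T) + x \, P(\neg F_G \mid \neg T) \, P(\neg T)}$$

which is the same expression we arrived at earlier.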
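The systematic method is the one a computer can follow, as noted earlier: compute joint entries with the chain rule, keep those consistent with the evidence, and normalize. The sketch below does this over all five variables (T, F_G, G, F_A, A) without using the d-separation shortcut, and compares the result with the closed-form expression. All numbers, including P(F_A) = 0.02 and P(F_G | T), are hypothetical values for illustration.

```python
from itertools import product

# Hypothetical numbers (not from the assignment), for illustration only.
X, Y = 0.05, 0.4                  # P(incorrect reading | gauge working / faulty)
P_T = 0.01                        # P(T = High)
P_FG = {True: 0.3, False: 0.01}   # P(F_G | T = High), P(F_G | T = Normal)
P_FA = 0.02                       # P(F_A)


def joint(t, fg, g, fa, a):
    """Chain rule along the network: P(T) P(F_G|T) P(G|T,F_G) P(F_A) P(A|G,F_A).
    Booleans: t means T is High, g means G reads High."""
    p = P_T if t else 1 - P_T
    p *= P_FG[t] if fg else 1 - P_FG[t]
    p_wrong = Y if fg else X
    p *= (1 - p_wrong) if g == t else p_wrong
    p *= P_FA if fa else 1 - P_FA
    p_a = 1.0 if (g and not fa) else 0.0  # deterministic alarm
    p *= p_a if a else 1 - p_a
    return p


def posterior_t_high():
    """P(T = High | A, not F_A, not F_G), summing joint entries over T and G."""
    num = den = 0.0
    for t, g in product([True, False], repeat=2):
        p = joint(t, fg=False, g=g, fa=False, a=True)
        den += p
        if t:
            num += p
    return num / den


closed_form = ((1 - X) * (1 - P_FG[True]) * P_T) / (
    (1 - X) * (1 - P_FG[True]) * P_T + X * (1 - P_FG[False]) * (1 - P_T))
print(posterior_t_high(), closed_form)
```

The two printed values agree: the G = Normal terms vanish because a working alarm cannot sound on a Normal reading, and the common factor P(~F_A) cancels in the normalization.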