**Machine Learning**

**Problem 1: Naive Bayes Classifiers (20 pts)**

Consider the binary classification problem where the class label $Y \in \{0, 1\}$ and each training example $X$ has two binary attributes $X = [X_1, X_2] \in \{0, 1\}^2$. Assume that the class priors are given as $P(Y = 0) = P(Y = 1) = 0.5$, and that the conditional probabilities $P(X_1 \mid Y)$ and $P(X_2 \mid Y)$ are as follows:

| $P(X_1 \mid Y)$ | $Y = 0$ | $Y = 1$ |
|---|---|---|
| $X_1 = 0$ | 0.7 | 0.2 |
| $X_1 = 1$ | 0.3 | 0.8 |

| $P(X_2 \mid Y)$ | $Y = 0$ | $Y = 1$ |
|---|---|---|
| $X_2 = 0$ | 0.9 | 0.5 |
| $X_2 = 1$ | 0.1 | 0.5 |

(a) [6 pts] What is the naive Bayes prediction $\hat{y}(x)$ for the input $x = [x_1, x_2] = [0, 0]$? Explain your reasoning.
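The computation behind (a) can be sketched in Python. Naive Bayes scores each class by $P(Y = y)\,P(X_1 = x_1 \mid Y = y)\,P(X_2 = x_2 \mid Y = y)$ and predicts the class with the larger score. The dictionaries below only transcribe the two tables. The names `prior`, `cond`, `nb_scores`, and `nb_predict` are hypothetical, chosen for illustration.

```python
# Class priors P(Y = y), from the problem statement.
prior = {0: 0.5, 1: 0.5}

# cond[i][v][y] = P(X_{i+1} = v | Y = y), transcribed from the two tables.
cond = [
    {0: {0: 0.7, 1: 0.2}, 1: {0: 0.3, 1: 0.8}},  # P(X1 | Y)
    {0: {0: 0.9, 1: 0.5}, 1: {0: 0.1, 1: 0.5}},  # P(X2 | Y)
]


def nb_scores(x):
    """Return the unnormalized score P(Y=y) * prod_i P(X_i = x_i | Y=y) for each y."""
    scores = {}
    for y, p_y in prior.items():
        score = p_y
        for i, v in enumerate(x):
            score *= cond[i][v][y]  # naive Bayes: attributes independent given Y
        scores[y] = score
    return scores


def nb_predict(x):
    """Predict the class with the largest naive Bayes score."""
    scores = nb_scores(x)
    return max(scores, key=scores.get)


x = [0, 0]
scores = nb_scores(x)
posterior_y0 = scores[0] / sum(scores.values())  # normalize to get P(Y=0 | x)
print({y: round(s, 4) for y, s in scores.items()})  # {0: 0.315, 1: 0.05}
print(nb_predict(x))                                 # 0
print(round(posterior_y0, 3))                        # 0.863
```

For $x = [0, 0]$ the scores are $0.5 \cdot 0.7 \cdot 0.9 = 0.315$ for $Y = 0$ and $0.5 \cdot 0.2 \cdot 0.5 = 0.05$ for $Y = 1$, so $\hat{y}(x) = 0$. Normalizing gives $P(Y = 0 \mid x) = 0.315 / 0.365 \approx 0.863$. Because the priors are equal, they do not change which class wins.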

(b) Assume you are not given the probability distributions $P(Y)$, $P(X_1 \mid Y)$, or $P(X_2 \mid Y)$, and are instead asked to estimate them from data. How many parameters would you need to estimate?
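One way to check the count in (b) is the Python sketch below. It follows the usual convention that a distribution over $k$ outcomes has $k - 1$ free parameters, since the last probability is fixed by the sum-to-one constraint. The function name and its default arguments are hypothetical.

```python
def naive_bayes_param_count(n_attrs, n_classes=2, n_values=2):
    """Count free parameters of a naive Bayes model with discrete attributes.

    - Prior P(Y): n_classes - 1 free probabilities.
    - Each P(X_i | Y = y): n_values - 1 free probabilities, for every class y.
    """
    prior_params = n_classes - 1
    params_per_attr = n_classes * (n_values - 1)
    return prior_params + n_attrs * params_per_attr


print(naive_bayes_param_count(2))  # 1 + 2 * 2 = 5
```

With a binary $Y$ and binary attributes, the count is $1 + 2n$, which is linear in the number of attributes $n$. For this problem ($n = 2$) the five parameters are $P(Y = 1)$ and $P(X_i = 1 \mid Y = y)$ for each $i \in \{1, 2\}$ and $y \in \{0, 1\}$.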

(c) Assume you want to estimate the conditional probability distribution $P(Y \mid X_1, X_2)$ directly, without making the naive Bayes assumption. How many parameters would you need to estimate from data?
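A similar sketch for (c) uses the same free-parameter convention. Without the naive Bayes assumption, an unrestricted model stores a separate distribution over $Y$ for every joint setting of the attributes. The function name is hypothetical.

```python
def conditional_param_count(n_attrs, n_classes=2, n_values=2):
    """Count free parameters of an unrestricted table for P(Y | X_1, ..., X_n).

    There is one distribution over Y for every joint setting of the attributes,
    and each such distribution has n_classes - 1 free probabilities.
    """
    n_configs = n_values ** n_attrs
    return n_configs * (n_classes - 1)


print(conditional_param_count(2))  # 2**2 configurations * 1 = 4
```

Here that means one number $P(Y = 1 \mid X_1 = x_1, X_2 = x_2)$ for each of the four settings of $(x_1, x_2)$. With binary attributes the count grows as $2^n$, while the naive Bayes count grows as $1 + 2n$.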

(d) Assume you now want to estimate the joint probability distribution $P(Y, X_1, X_2)$ directly. How many parameters would you need to estimate from data?
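For (d), a Python sketch under the same free-parameter convention follows. A full joint table has one cell per setting of all variables, and one cell is determined because the cells sum to one. The function name is hypothetical.

```python
def joint_param_count(n_attrs, n_classes=2, n_values=2):
    """Count free parameters of an unrestricted table for P(Y, X_1, ..., X_n).

    The joint table has one cell per setting of (Y, X_1, ..., X_n); the cells
    must sum to one, so one of them is determined by the others.
    """
    n_cells = n_classes * n_values ** n_attrs
    return n_cells - 1


print(joint_param_count(2))  # 2 * 2**2 - 1 = 7
```

For binary variables this is $2^{n+1} - 1$, which gives 7 here. Equivalently, it is the 4 parameters of $P(Y \mid X_1, X_2)$ plus the $2^2 - 1 = 3$ parameters of $P(X_1, X_2)$.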