
probability – Pattern Recognition and Machine Learning (Bishop): how is this gradient computed, and is it wrong?


I am reading Pattern Recognition and Machine Learning (1st ed.) by Bishop and believe I have found an error. In Section 3.1.1, the gradient used in the maximum likelihood step seems wrong. Equation (3.13) is the gradient equation. First, I think the gradient should have a negative sign on the error term, so that maximizing the likelihood corresponds to minimizing the error. I am also not sure where the $\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm{T}}$ factor comes into play.

Here is the log likelihood function, which I believe is also wrong, since the first term should not have a $\frac{1}{2}$ scalar:

$$\ln p(\mathbf{t}\mid\mathbf{w},\beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf{w}),
\qquad
E_D(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\bigl\{t_n - \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\bigr\}^2$$

Here is the gradient, equation (3.13):

$$\nabla \ln p(\mathbf{t}\mid\mathbf{w},\beta) = \beta\sum_{n=1}^{N}\bigl\{t_n - \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\bigr\}\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm{T}}$$

I have tried to calculate the gradient myself, but I am not able to arrive at the equation they give.
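To make sure I am not just making an algebra mistake, I also tried a quick numerical sanity check. This is just my own NumPy sketch on random data (the names `Phi`, `beta`, and `log_likelihood` are mine, not from the book); it compares the analytic gradient $\beta\sum_n\{t_n - \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\}\boldsymbol{\phi}(\mathbf{x}_n)$ against a finite-difference gradient of the log likelihood:

```python
import numpy as np

# My own toy setup (not from the book): N data points, M basis functions,
# a random design matrix Phi whose n-th row is phi(x_n)^T, and noise precision beta.
rng = np.random.default_rng(0)
N, M, beta = 20, 4, 2.0
Phi = rng.normal(size=(N, M))
t = rng.normal(size=N)
w = rng.normal(size=M)

def log_likelihood(w):
    # ln p(t | w, beta) = N/2 ln(beta) - N/2 ln(2*pi) - beta * E_D(w),
    # with E_D(w) = 1/2 * sum_n (t_n - w^T phi(x_n))^2
    resid = t - Phi @ w
    return N / 2 * np.log(beta) - N / 2 * np.log(2 * np.pi) - beta / 2 * resid @ resid

# Gradient as I read (3.13): beta * sum_n (t_n - w^T phi(x_n)) * phi(x_n)
grad_analytic = beta * Phi.T @ (t - Phi @ w)

# Central finite differences of the log likelihood, one coordinate at a time
eps = 1e-6
grad_numeric = np.array([
    (log_likelihood(w + eps * e) - log_likelihood(w - eps * e)) / (2 * eps)
    for e in np.eye(M)
])

# If the book's plus sign on the error term is correct, these two vectors
# should agree up to finite-difference error; if the sign I expect were
# correct, they would differ in sign.
print(grad_analytic)
print(grad_numeric)
```

The check only tells me whether the sign and the $\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm{T}}$ factor are consistent with the log likelihood above; what I am really after is how the derivation itself goes.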


