probability – Pattern Recognition and Machine Learning (Bishop): how is this gradient computed, and is it wrong?

AIGumbo.crew December 30, 2023

I am reading Pattern Recognition and Machine Learning (1st ed.) by Bishop and believe I have found an error. In Section 3.1.1, the gradient for the maximum likelihood step (Equation 3.13) seems wrong. First, I think the gradient should have a negative sign on the error term, since maximizing the probability means minimizing the error. Second, I am not sure where the transposed vector $\phi^T$ comes from.
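One way to test both doubts is numerically. The sketch below assumes the log likelihood has the usual Gaussian form $\frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \frac{\beta}{2}\sum_n \{t_n - w^T\phi(x_n)\}^2$ and compares a finite-difference gradient against the candidate $\beta\sum_n \{t_n - w^T\phi(x_n)\}\,\phi(x_n)$. The polynomial basis `phi`, the synthetic data, and the values of `w` and `beta` are illustrative assumptions, not taken from the book. The factor `beta` is kept explicit here; it only rescales the gradient and does not change where it vanishes.

```python
import math
import random


def phi(x):
    # Hypothetical basis for illustration: polynomial features [1, x, x^2].
    return [1.0, x, x * x]


def predict(w, x):
    # w^T phi(x)
    return sum(wj * pj for wj, pj in zip(w, phi(x)))


def log_likelihood(w, beta, xs, ts):
    # Assumed Gaussian log likelihood with noise precision beta.
    n = len(xs)
    sse = sum((t - predict(w, x)) ** 2 for x, t in zip(xs, ts))
    return 0.5 * n * math.log(beta) - 0.5 * n * math.log(2 * math.pi) - 0.5 * beta * sse


def analytic_gradient(w, beta, xs, ts):
    # Candidate gradient: beta * sum_n (t_n - w^T phi(x_n)) * phi(x_n).
    # The residual carries a plus sign; phi appears via the chain rule.
    grad = [0.0] * len(w)
    for x, t in zip(xs, ts):
        residual = t - predict(w, x)
        for j, pj in enumerate(phi(x)):
            grad[j] += beta * residual * pj
    return grad


def numeric_gradient(w, beta, xs, ts, h=1e-6):
    # Central finite differences, one coordinate of w at a time.
    grad = []
    for j in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += h
        w_minus[j] -= h
        diff = log_likelihood(w_plus, beta, xs, ts) - log_likelihood(w_minus, beta, xs, ts)
        grad.append(diff / (2 * h))
    return grad


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(20)]
ts = [math.sin(math.pi * x) + random.gauss(0.0, 0.1) for x in xs]
w = [0.3, -0.2, 0.5]
beta = 2.0

g_analytic = analytic_gradient(w, beta, xs, ts)
g_numeric = numeric_gradient(w, beta, xs, ts)
max_diff = max(abs(a - b) for a, b in zip(g_analytic, g_numeric))
print("analytic:", g_analytic)
print("numeric: ", g_numeric)
print("max abs difference:", max_diff)
```

If the two gradients agree, the positive sign on the error term is consistent: differentiating $-\frac{\beta}{2}\{t_n - w^T\phi(x_n)\}^2$ with respect to $w$ brings down a factor $-\phi(x_n)$ from the inner term, and the two minus signs cancel. That same inner derivative is where $\phi$ enters; whether it is written as a row or a column vector is a layout convention.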

Here is the log-likelihood function (which I believe is also wrong, since the first term should not have a factor of $\frac{1}{2}$):