I am reading Pattern Recognition and Machine Learning (1st ed.) by Bishop and believe I have found an error. In section 3.1.1, the gradient used in the maximum likelihood step, equation 3.13, seems wrong. Firstly, I would expect a negative sign on the error term, since maximizing the probability should mean minimizing the error. I am also not sure where the transposed vector $\phi^T$ comes from.
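Here is the log likelihood function, equation 3.11 (which I believe is also wrong, since the first term should not have a factor of $\frac{1}{2}$):

$$\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf{w}), \qquad E_D(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n)\}^2$$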
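Here is the gradient, equation 3.13:

$$\nabla \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta\sum_{n=1}^{N}\{t_n - \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n)\}\,\boldsymbol{\phi}(\mathbf{x}_n)^T$$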
I have tried to calculate the gradient myself but I’m not able to get the equation they get.
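One way to probe the sign question without relying on the algebra is a finite-difference check. The following is only a minimal sketch, not anything taken from the book: the random data, the sizes `N` and `M`, the value of `beta`, and all function names are assumptions made for illustration. It evaluates the Gaussian log likelihood from section 3.1.1 and compares a central-difference gradient with the expression that has a positive sign on the error term.

```python
import math
import random

random.seed(0)

# Hypothetical toy problem: N data points, M basis functions.
N, M = 20, 3
beta = 2.0  # assumed noise precision
Phi = [[random.gauss(0, 1) for _ in range(M)] for _ in range(N)]  # row n holds phi(x_n)
t = [random.gauss(0, 1) for _ in range(N)]
w = [random.gauss(0, 1) for _ in range(M)]


def residuals(w):
    # t_n - w^T phi(x_n) for every n
    return [t[n] - sum(w[j] * Phi[n][j] for j in range(M)) for n in range(N)]


def log_likelihood(w):
    # (N/2) ln(beta) - (N/2) ln(2 pi) - (beta/2) * sum of squared residuals
    sq = sum(r * r for r in residuals(w))
    return 0.5 * N * math.log(beta) - 0.5 * N * math.log(2 * math.pi) - 0.5 * beta * sq


def analytic_gradient(w):
    # beta * sum_n (t_n - w^T phi(x_n)) * phi(x_n), positive sign on the error term.
    # Stored as a flat list, so the row/column (transpose) distinction does not appear.
    r = residuals(w)
    return [beta * sum(r[n] * Phi[n][j] for n in range(N)) for j in range(M)]


def numerical_gradient(f, w, eps=1e-5):
    # Central differences, one coordinate at a time
    grad = []
    for j in range(len(w)):
        up = list(w)
        up[j] += eps
        down = list(w)
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


analytic = analytic_gradient(w)
numeric = numerical_gradient(log_likelihood, w)
max_abs_diff = max(abs(a - b) for a, b in zip(analytic, numeric))
print(max_abs_diff)
```

If the printed difference is close to zero, the positive sign is consistent with the log likelihood as defined, because the residual $t_n - \mathbf{w}^T\phi(x_n)$ already carries a minus sign in front of $\mathbf{w}$. Flipping the sign in `analytic_gradient` should produce a large mismatch instead.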