SGD Classifier and Regressor (page 3 of 9)
One of the limitations of gradient descent is that it is only guaranteed to find a local minimum, which may not be the global minimum. Once the algorithm reaches a local minimum, the gradient there is near zero, so depending on the step size it may never escape.
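To make this concrete, here is a minimal sketch of plain gradient descent on a toy one-dimensional objective (an illustrative assumption, not from the text): f(x) = x^4 - 3x^2 + x has a global minimum near x = -1.30 and a shallower local minimum near x = 1.13, and which one the algorithm finds depends entirely on where it starts.

```python
# Toy objective (illustrative assumption): f(x) = x**4 - 3*x**2 + x
# has a global minimum near x = -1.30 and a local minimum near x = 1.13.

def grad(x):
    """Derivative of f(x) = x**4 - 3*x**2 + x."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x, step_size=0.01, n_steps=1000):
    for _ in range(n_steps):
        x -= step_size * grad(x)  # step downhill along the gradient
    return x

# Same algorithm and step size; only the starting point differs.
print(gradient_descent(x=-2.0))  # ~ -1.30: reaches the global minimum
print(gradient_descent(x=2.0))   # ~ +1.13: stuck in the local minimum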
The key is to pick the right step size. A good step size moves toward the minimum rapidly: each step makes substantial progress, so the algorithm reaches a solution efficiently.
If we pick too large a step size, we may never converge to a local minimum, because every step overshoots the solution; the iterates can even oscillate and diverge.
If we pick too small a step size, we are more likely to converge to a minimum, but doing so will take far more steps than necessary.
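The same loop makes this trade-off easy to see. Below is a minimal sketch on the toy quadratic f(x) = x^2 (again an illustrative assumption), whose only minimum is x = 0: a large step overshoots and diverges, a tiny step barely moves, and a moderate step converges quickly.

```python
def gradient_descent(step_size, x=10.0, n_steps=50):
    """Minimize f(x) = x**2 (gradient 2x) starting from x = 10."""
    for _ in range(n_steps):
        x -= step_size * 2 * x  # gradient of x**2 is 2x
    return x

print(gradient_descent(step_size=1.1))    # too large: overshoots, |x| grows each step
print(gradient_descent(step_size=0.001))  # too small: still near 9 after 50 steps
print(gradient_descent(step_size=0.1))    # well chosen: effectively 0
```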