SGD Classifier and Regressor

One of the limitations of gradient descent is that it is only guaranteed to find a local minimum, not the global minimum. Once the algorithm reaches a point at a local minimum, it may never escape, depending on the step size. The key is to pick the right step size. A good step size moves toward the minimum rapidly: each step makes substantial progress, so the algorithm finds a solution efficiently. If the step size is too large, we may never converge to a local minimum because every step overshoots the solution. If the step size is too small, we are more likely to converge to a minimum, but it will take far more steps than necessary.
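As a rough illustration (not part of the course material), here is a minimal sketch of gradient descent on the one-dimensional function f(x) = x², whose minimum is at x = 0. The function names and the specific step-size values are illustrative assumptions chosen to show the three behaviors described above.

```python
def grad(x):
    """Gradient of the toy objective f(x) = x^2."""
    return 2 * x

def gradient_descent(start, step_size, n_steps=50):
    """Take n_steps gradient steps from `start` and return the final x."""
    x = start
    for _ in range(n_steps):
        x = x - step_size * grad(x)  # step in the direction opposite the gradient
    return x

# The true minimum is at x = 0; we start at x = 10.
print(gradient_descent(10.0, step_size=0.1))    # well-chosen: converges near 0
print(gradient_descent(10.0, step_size=1.5))    # too large: overshoots and diverges
print(gradient_descent(10.0, step_size=0.001))  # too small: barely moves in 50 steps
```

With a step size of 0.1, each update shrinks x by a factor of 0.8, so 50 steps land very close to 0. With 1.5, each update flips the sign and doubles the magnitude, so the iterates diverge. With 0.001, the iterates creep toward the minimum but remain near the starting point after 50 steps.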

[Figure: gradient descent]