SGD Classifier and Regressor
Think of a function f(x, y) that defines some mountainous terrain. The gradient evaluated at any point on that terrain points in the direction of steepest ascent.
To maximize the function, we could start at a random input and repeatedly take a small step in the direction of the gradient, moving uphill to ascend the mountain.
To minimize the function, we can instead follow the negative of the gradient and thus move in the direction of steepest descent. This is gradient descent.
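Written out, one descent update with a small step size looks like this (the learning rate $\eta$ and the iterate $x_t$ are standard notation, not symbols from this page):

$$x_{t+1} = x_t - \eta \, \nabla f(x_t)$$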
Starting from an initial guess, we improve little by little until we reach a local minimum. This can take thousands of iterations, so the process is not practical to carry out by hand.
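As a concrete illustration, here is a minimal sketch of gradient descent on the simple bowl-shaped surface f(x, y) = x² + y². The function, starting point, learning rate, and iteration count are illustrative assumptions, not values from this page.

```python
# Gradient descent on f(x, y) = x**2 + y**2, whose gradient is (2x, 2y).
# All constants below are illustrative choices, not values from the text.

def gradient(x, y):
    # Partial derivatives of f(x, y) = x**2 + y**2.
    return 2 * x, 2 * y

x, y = 3.0, -4.0        # an arbitrary initial guess
learning_rate = 0.1     # size of each downhill step

for step in range(1000):          # many small steps, done by the machine
    gx, gy = gradient(x, y)
    x -= learning_rate * gx       # move against the gradient (downhill)
    y -= learning_rate * gy

print(x, y)  # converges toward the minimum at (0, 0)
```

Each pass through the loop evaluates the gradient at the current point and steps a short distance in the opposite direction, exactly the "improve little by little" procedure described above.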