Curvature-Aware Stochastic Gradient Descent Algorithms for Non-Convex Landscape Navigation
Keywords:
Curvature-Aware SGD, Non-Convex Optimization, Adaptive Learning Rate, Hessian-Vector Approximation, Convergence Acceleration, Deep Learning, Optimization Algorithms
Abstract
Optimization algorithms in machine learning often struggle to navigate highly complex non-convex loss surfaces, where saddle points, sharp minima, and plateaus can cause convergence to poor solutions. This paper develops a new algorithm, Curvature-Aware Stochastic Gradient Descent (CA-SGD), which incorporates curvature estimates obtained from Hessian-vector product approximations to adapt the step size to the geometry of the local landscape. The method balances computational tractability against geometry-aware updates, thereby improving efficiency. CA-SGD was evaluated on both synthetic and real-world benchmarks, namely the Rosenbrock function and the MNIST dataset. The experimental results show that CA-SGD outperforms SGD, RMSProp, and Adam: it attained the lowest Rosenbrock loss (0.82), the highest MNIST accuracy (98.5%), and the lowest iteration count (650). These results suggest that CA-SGD can yield efficient solutions to high-dimensional optimization problems. Future work will extend CA-SGD to deep neural networks and meta-learning and further investigate Hessian approximations.
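To make the core idea concrete, the following is a minimal sketch of one curvature-aware step. It is not the paper's exact update rule, which the abstract does not specify. It assumes a central-difference Hessian-vector product, a curvature estimate taken along the gradient direction, and a step size of the form base_lr / (1 + beta * |curvature|). The names hvp_fd, directional_curvature, curvature_scaled_lr, and ca_sgd_step, along with the parameters eps, base_lr, and beta, are hypothetical and chosen for illustration only.

```python
import numpy as np


def hvp_fd(grad_fn, x, v, eps=1e-5):
    """Approximate the Hessian-vector product H(x) @ v by central
    differences of the gradient, so the full Hessian is never formed."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2.0 * eps)


def directional_curvature(grad_fn, x, d):
    """Rayleigh quotient u^T H(x) u along the unit direction u = d / ||d||."""
    u = d / np.linalg.norm(d)
    return float(u @ hvp_fd(grad_fn, x, u))


def curvature_scaled_lr(base_lr, curvature, beta=1.0):
    """Assumed step-size rule (not taken from the paper): shrink the step
    where the magnitude of the local curvature is large."""
    return base_lr / (1.0 + beta * abs(curvature))


def ca_sgd_step(grad_fn, x, base_lr=1.0, beta=1.0):
    """One curvature-aware update. In a stochastic setting grad_fn would be
    a mini-batch gradient evaluated on the same batch for all calls here."""
    g = grad_fn(x)
    if np.linalg.norm(g) == 0.0:
        # Stationary point: no direction along which to measure curvature.
        return x, base_lr
    c = directional_curvature(grad_fn, x, g)
    lr = curvature_scaled_lr(base_lr, c, beta)
    return x - lr * g, lr
```

Under these assumptions each step costs two extra gradient evaluations rather than a full Hessian, which is the trade-off between tractability and geometry awareness described above. Taking the absolute value of the curvature keeps the step size positive in regions of negative curvature, such as near saddle points.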




