ISEF | Projects Database | Finalist Abstract

DynaGrad: A Novel Gradient Descent With Adaptive Dual Learning Rates & Momenta for Improved Optimization and Accelerated Convergence in Deep Neural Networks

Booth Id:
SOFT046T

Category:
Systems Software

Year:
2024

Finalist Names:
Kamal, Pratishrut (School: William G. Enloe High School)
Vadlamudi, Venkata Varshith (School: William G. Enloe High School)

Abstract:
Machine learning (ML) optimizers are widely used to tune the weights and biases of deep neural networks. However, current methods can suffer from slow convergence and can become trapped in poor local minima. In this work, we propose a novel gradient descent algorithm with dual adaptive learning rates and momenta that aims to improve optimization and achieve faster convergence. The key idea is to dynamically adjust the learning rate and momentum of each weight and bias based on the magnitude and direction of its partial derivative, while maintaining a slow gradient and a fast gradient that move at different paces along the loss function. Weights and biases with larger gradient magnitudes are updated more aggressively, while those with noisy gradients are updated more conservatively. The momentum is likewise adapted to accelerate training along dimensions with consistent gradient directions and to dampen oscillations along the others. We evaluate the proposed algorithm on various deep neural network architectures and datasets. The results show faster decreases in training loss and test error than current state-of-the-art ML optimizers, including other adaptive methods. The improved optimization reduces training time and generally leads to better model performance, which can enable more efficient training of deep neural networks across a variety of applications. Our approach, DynaGrad, is 15.6% faster than state-of-the-art algorithms and achieves a 2% lower loss. This can translate into higher accuracy for deep neural networks in medical imaging tasks and in any research that involves advanced machine learning techniques.
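The abstract does not give DynaGrad's update rule, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It assumes two exponential moving averages of each partial derivative (a "fast" and a "slow" one), and a per-parameter confidence score computed as the ratio of the slow average to the root of a running mean of squared gradients. The function name `dual_rate_step`, the state layout, the blending formula, and all hyperparameter values are hypothetical choices made for illustration.

```python
import math


def dual_rate_step(params, grads, state, lr_fast=0.05, lr_slow=0.01,
                   beta_fast=0.5, beta_slow=0.9, eps=1e-12):
    """One step of an illustrative dual-average update (an assumption, not DynaGrad)."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Fast average reacts quickly; slow average smooths over more steps.
        m_fast = beta_fast * state["fast"][i] + (1.0 - beta_fast) * g
        m_slow = beta_slow * state["slow"][i] + (1.0 - beta_slow) * g
        # Running mean of squared gradients, used to gauge how noisy g is.
        v = beta_slow * state["sq"][i] + (1.0 - beta_slow) * g * g
        state["fast"][i], state["slow"][i], state["sq"][i] = m_fast, m_slow, v

        # Confidence in [0, 1]: near 1 when recent gradients agree in sign,
        # near 0 when they cancel out (noisy or oscillating direction).
        confidence = min(1.0, abs(m_slow) / (math.sqrt(v) + eps))

        # Consistent direction: lean on the fast average at the larger rate.
        # Noisy direction: fall back on the slow average at the smaller rate.
        step = (lr_fast * confidence * m_fast
                + lr_slow * (1.0 - confidence) * m_slow)
        new_params.append(p - step)
    return new_params


def loss(params):
    # Toy ill-conditioned quadratic: f(x, y) = x^2 + 10 * y^2
    x, y = params
    return x * x + 10.0 * y * y


def grad(params):
    x, y = params
    return [2.0 * x, 20.0 * y]


params = [1.0, 1.0]
state = {"fast": [0.0, 0.0], "slow": [0.0, 0.0], "sq": [0.0, 0.0]}
initial_loss = loss(params)
for _ in range(200):
    params = dual_rate_step(params, grad(params), state)
final_loss = loss(params)
```

Because the step is not normalized by the gradient magnitude, parameters with larger gradients move further, while the confidence factor shrinks the step for parameters whose gradients keep changing sign. Other blending rules would fit the abstract's description equally well; this one was chosen only because it is short.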