Optimizing Financial Portfolio Management Using Deep Reinforcement Learning And A3C

Authors

  • R. Gokilavani Associate Professor, CHRIST University, Bengaluru, Karnataka, India.
  • Dr. R. Arivukkodi Assistant Professor, Computer Science, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, India.
  • M. Devi Assistant Professor, Department of EEE, New Prince Shri Bhavani College of Engineering and Technology, Chennai, Tamil Nadu, India.
  • Diwakar Bhardwaj Department of Computer Engineering & Applications, GLA University, Mathura, Uttar Pradesh, India.
  • Ramakrishna Manda Artificial intelligence and Data science, Ramachandra College of Engineering, Eluru, India.
  • Dr. G. Mohana Priya Assistant Professor, Aeronautical Engineering, Mahendra Engineering College, Namakkal, Tamil Nadu, India.

Keywords:

Asynchronous Advantage Actor-Critic (A3C), Deep Reinforcement Learning, Financial Portfolio Optimization, Long Short-Term Memory (LSTM), Sharpe Ratio, Risk-Adjusted Returns, Equity Market Trading.

Abstract

Because global financial markets are dynamic and stochastic, conventional portfolio optimization approaches such as mean-variance optimization (MVO) and rule-based heuristics often fail to sustain risk-adjusted performance. This paper presents an improved Asynchronous Advantage Actor-Critic (A3C) approach integrated with a Long Short-Term Memory (LSTM) network for adaptive, real-time financial portfolio management in equity markets. The proposed model pairs the asynchronous multi-agent parallelism of A3C with the temporal feature extraction capability of LSTM to maximize cumulative portfolio returns while controlling downside risk exposure. The actor network continuously learns the optimal portfolio weight allocations while the critic learns the state-value function; both are trained with a hybrid reward function that combines the Sharpe ratio, a maximum drawdown penalty term, and a risk-adjusted return objective. Experiments are carried out on S&P 500 equity data from 2015 to 2023 and cover a range of market phases, including bull and bear markets and periods of high and low volatility. The proposed A3C-LSTM model outperforms the baseline methods (DQN, PPO, A2C, and classical MVO), achieving an annualized return of 23.7%, a Sharpe ratio of 1.68, and a maximum drawdown of −15.3%. An ablation study confirms the contribution of each of the three components: the LSTM temporal encoder, the modified reward function, and the asynchronous learning architecture. The results also show that the A3C portfolio agents can transfer knowledge learned from one time series to a different one and can produce investment strategies. This paper advances the state of the art in deep reinforcement learning for portfolio management and offers a deployable framework for institutional and algorithmic portfolio managers in quantitative finance.
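
The abstract names the three ingredients of the hybrid reward (Sharpe ratio, maximum drawdown penalty, and a return objective) but does not give its exact form. The following is only a minimal sketch of how such a reward could be computed from a window of per-period portfolio returns. The linear combination, the weights w_sharpe, w_drawdown, and w_return, the zero risk-free rate, and the 252-period annualization are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns (assumed daily by default)."""
    excess = np.asarray(returns, dtype=float) - risk_free / periods_per_year
    std = excess.std(ddof=1)
    if std < 1e-12:  # constant returns: treat the ratio as 0 instead of dividing by ~0
        return 0.0
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the wealth curve, as a value <= 0."""
    wealth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    wealth = np.concatenate(([1.0], wealth))  # start from an initial wealth of 1
    running_peak = np.maximum.accumulate(wealth)
    return float((wealth / running_peak - 1.0).min())


def hybrid_reward(returns, w_sharpe=1.0, w_drawdown=0.5, w_return=1.0):
    """Hypothetical linear mix of the three terms; the weights are assumptions."""
    r = np.asarray(returns, dtype=float)
    # max_drawdown is non-positive, so adding it acts as a penalty.
    return (w_sharpe * sharpe_ratio(r)
            + w_drawdown * max_drawdown(r)
            + w_return * float(r.mean()))
```

For example, max_drawdown([0.10, -0.50, 0.20]) evaluates the wealth path 1.0, 1.1, 0.55, 0.66 and returns a drawdown of about −0.5, so a larger w_drawdown lowers the reward for that window.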

Published

2026-05-24

How to Cite

Gokilavani, R., Arivukkodi, D. R., Devi, M., Bhardwaj, D., Manda, R., & Priya, D. M. (2026). Optimizing Financial Portfolio Management Using Deep Reinforcement Learning And A3C. International Journal of Artificial Intelligence and Machine Learning, 6(3s), 617–632. Retrieved from https://www.svedbergopen.com/index.php/ijaiml/article/view/384