Adaptive Portfolio Optimization via PPO-HER: A Reinforcement Learning Framework for Non-Stationary Markets
Abstract
We propose PPO-HER, a reinforcement learning framework for adaptive portfolio optimization in non-stationary markets that integrates Proximal Policy Optimization (PPO) with Hindsight Experience Replay (HER) to address sparse rewards and shifting market conditions. The method reformulates portfolio optimization as a goal-conditioned Markov Decision Process in which the agent learns to reallocate assets by processing spatiotemporal market data through a Transformer-based actor network. The reward function combines logarithmic returns, risk penalties, and sparse goal-attainment bonuses, while HER relabels suboptimal trajectories with the goals they actually achieved, improving sample efficiency. The architecture employs a TimeSformer for cross-asset attention and a GRU-based critic with spectral normalization to stabilize training. Experimental results show that PPO-HER outperforms conventional baselines in risk-adjusted returns, particularly during regime shifts detected by an auxiliary Changepoint-LSTM module. The framework is implemented in cuDNN-accelerated PyTorch, making it efficient enough for high-frequency trading under liquidity constraints. It achieves state-of-the-art performance by explicitly modeling non-stationary dependencies and dynamically adjusting reward shaping based on realized volatility.
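To make the shaped reward concrete, here is a minimal sketch of one plausible form; the weights λ_risk and λ_goal, the tolerance ε, and the exact functional form are illustrative assumptions, since the abstract does not specify them:

```latex
% One plausible shaped reward: log portfolio return, minus a
% realized-volatility penalty, plus a sparse bonus that fires only when
% the achieved goal g_t lies within tolerance eps of the desired goal g*.
r_t \;=\; \log\frac{V_t}{V_{t-1}}
      \;-\; \lambda_{\mathrm{risk}}\,\hat{\sigma}_t
      \;+\; \lambda_{\mathrm{goal}}\,\mathbb{1}\!\left[\,\lVert g_t - g^{*}\rVert \le \varepsilon\,\right]
```

Here V_t is the portfolio value, σ̂_t a realized-volatility estimate, and the indicator term is the sparse bonus; HER makes this bonus learnable by relabeling g* with goals the agent actually reached.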
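The HER relabeling step itself is easy to sketch. The Python fragment below is a sketch under stated assumptions, not the paper's implementation: the Transition fields, the "final" relabeling strategy, and all constants are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List

LAMBDA_RISK = 0.1   # assumed risk-penalty weight
LAMBDA_GOAL = 1.0   # assumed sparse-bonus weight
EPSILON = 0.01      # assumed goal tolerance

# Hypothetical transition record; field names are illustrative.
@dataclass(frozen=True)
class Transition:
    log_return: float     # log(V_t / V_{t-1}) over the step
    realized_vol: float   # realized-volatility estimate at time t
    achieved_goal: float  # e.g., cumulative log return so far
    desired_goal: float   # goal the policy was conditioned on
    reward: float

def shaped_reward(log_return: float, realized_vol: float,
                  achieved_goal: float, desired_goal: float) -> float:
    """Log return, minus a volatility penalty, plus a sparse goal bonus."""
    bonus = LAMBDA_GOAL if abs(achieved_goal - desired_goal) <= EPSILON else 0.0
    return log_return - LAMBDA_RISK * realized_vol + bonus

def her_relabel(trajectory: List[Transition]) -> List[Transition]:
    """HER 'final' strategy: treat the goal actually achieved at the end
    of the episode as if it had been the desired goal all along, and
    recompute rewards so the sparse bonus can fire in hindsight."""
    hindsight_goal = trajectory[-1].achieved_goal
    return [
        replace(tr,
                desired_goal=hindsight_goal,
                reward=shaped_reward(tr.log_return, tr.realized_vol,
                                     tr.achieved_goal, hindsight_goal))
        for tr in trajectory
    ]
```

In training, both the original and relabeled trajectories would be kept, so an episode that missed its original target still yields a positive sparse-reward signal for the goal it did reach.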
Keywords
Portfolio Optimization, Reinforcement Learning, PPO-HER, Non-Stationary Markets, Sample Efficiency
References
- P Jorion (1992) Portfolio optimization in practice. Financial Analysts Journal.
- J Jang & NY Seong (2023) Deep reinforcement learning for stock portfolio optimization by connecting with modern portfolio theory. Expert Systems with Applications.
- J Schulman, F Wolski, P Dhariwal, A Radford, et al. (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- M Andrychowicz, F Wolski, A Ray, et al. (2017) Hindsight experience replay. In Advances in Neural Information Processing Systems.
- XY Liu, Z Xia, J Rui, J Gao, H Yang, et al. (2022) FinRL-Meta: Market environments and benchmarks for data-driven financial reinforcement learning. In Advances in Neural Information Processing Systems.
- Y Fei, Z Yang & Z Wang (2021) Risk-sensitive reinforcement learning with function approximation: A debiasing approach. In International Conference on Machine Learning.
- MG Bellemare, W Dabney & M Rowland (2023) Distributional reinforcement learning. MIT Press.
- Y Huang, C Zhou, K Cui & X Lu (2024) A multi-agent reinforcement learning framework for optimizing financial trading strategies based on TimesNet. Expert Systems with Applications.
- N Casas (2017) Deep deterministic policy gradient for urban traffic light control. arXiv preprint arXiv:1703.09035.
- T Haarnoja, A Zhou, K Hartikainen, G Tucker, et al. (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905.
- Q Kang, H Zhou & Y Kang (2018) An asynchronous advantage actor-critic reinforcement learning method for stock selection and portfolio management. In Proceedings of the 2nd International Conference on Big Data Engineering.
- L Wei & Z Weiwei (2020) Research on portfolio optimization models using deep deterministic policy gradient. In 2020 International Conference on Robots & Intelligent System (ICRIS).
- F Khemlichi, H Chougrad, S Ben Ali, et al. (2023) Multi-agent proximal policy optimization for portfolio optimization. Journal of Theoretical and Applied Information Technology.
- W Wu & CA Hargreaves (2024) Deep Reinforcement Learning Approach to Portfolio Optimization in the Australian Stock Market. AI, Computer Science and Robotics Technology.
- Z Zhan & SK Kim (2024) Versatile time-window sliding machine learning techniques for stock market forecasting. Artificial Intelligence Review.
- A Sattar, A Sarwar, S Gillani, M Bukhari, S Rho, et al. (2025) A Novel RMS-Driven Deep Reinforcement Learning for Optimized Portfolio Management in Stock Trading. IEEE Access.
- Y Liu, D Mikriukov, OC Tjahyadi, G Li, TR Payne, et al. (2023) Revolutionising Financial Portfolio Management: The Non-Stationary Transformer’s Fusion of Macroeconomic Indicators and Sentiment Analysis in a Deep …. Applied Sciences.
- Z Bing, D Lerch, K Huang, et al. (2022) Meta-reinforcement learning in non-stationary and dynamic environments. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- T Schaul, J Quan, I Antonoglou & D Silver (2015) Prioritized experience replay. arXiv preprint arXiv:1511.05952.
- B Manela & A Biess (2021) Bias-reduced hindsight experience replay with virtual goal prioritization. Neurocomputing.
- E Chan (2013) Algorithmic trading: winning strategies and their rationale. John Wiley & Sons.
- L Xiao, X Wei, Y Xu, X Xu, K Gong, et al. (2023) Truncated Quantile Critics Algorithm for Cryptocurrency Portfolio Optimization. In IEEE International Conference on Systems, Man, and Cybernetics.
- Z Chen, S Wang, D Yan & Y Li (2024) A Spatio-Temporal Deepfake Video Detection Method Based on TimeSformer-CNN. In 2024 Third International Conference on Artificial Intelligence and Smart Energy.
- Y Hou, W Gu, K Yang & L Dang (2023) Deep Reinforcement Learning Recommendation System based on GRU and Attention Mechanism. Engineering Letters.
- J He, C Hua, C Zhou & Z Zheng (2025) Reinforcement-Learning Portfolio Allocation with Dynamic Embedding of Market Information. arXiv preprint arXiv:2501.17992.
- FJ Fabozzi, HM Markowitz & F Gupta (2008) Portfolio selection. Handbook of Finance.
- F Baldovin, D Bovina, F Camana & AL Stella (2011) Modeling the non-Markovian, non-stationary scaling dynamics of financial markets. Online draft.
- J Fu, J Wei & H Yang (2014) Portfolio optimization in a regime-switching market with derivatives. European Journal of Operational Research.
- QYE Lim, Q Cao & C Quek (2022) Dynamic portfolio rebalancing through reinforcement learning. Neural Computing and Applications.
- F Morais, Z Serrasqueiro & JJS Ramalho (2020) The zero-leverage phenomenon: A bivariate probit with partial observability approach. Research in International Business and Finance.
- I Palupi, BA Wahyudi & AP Putra (2021) Implementation of hidden Markov model (HMM) to predict financial market regime. In 2021 9th International Conference on Cyber and IT Service Management (CITSM).
- V Matta, P Braca, S Marano, et al. (2016) Diffusion-based adaptive distributed detection: Steady-state performance in the slow adaptation regime. IEEE Transactions on Signal Processing.
- A Puzanov & K Cohen (2018) Deep reinforcement one-shot learning for change point detection. In 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
- M Martens (2002) Measuring and forecasting S&P 500 index-futures volatility using high-frequency data. Journal of Futures Markets: Futures, Options, and Other Derivative Products.
- S Lahmiri & S Bekiros (2021) Deep learning forecasting in cryptocurrency high-frequency trading. Cognitive Computation.
- B Xiao, H Yu, L Fang & S Ding (2020) Estimating the connectedness of commodity futures using a network approach. Journal of Futures Markets.
- TB Klos & B Nooteboom (2001) Agent-based computational transaction cost economics. Journal of Economic Dynamics and Control.
- Z Shan (2024) Optimal Hedging via Deep Reinforcement Learning with Soft Actor-Critic. cdn.shanghai.nyu.edu.
- A Kalai & S Vempala (2002) Efficient algorithms for universal portfolios. Journal of Machine Learning Research.
- Y Li, W Zheng & Z Zheng (2019) Deep robust reinforcement learning for practical algorithmic trading. IEEE Access.
- N Bjorck, CP Gomes, et al. (2021) Towards deeper deep reinforcement learning with spectral normalization. In Advances in Neural Information Processing Systems.
- TN Rollinger & ST Hoffman (2013) Sortino: a 'sharper' ratio. Chicago, Illinois: Red Rock Capital.
- RS Mariano & D Preve (2012) Statistical tests for multiple forecast comparison. Journal of Econometrics.