Risk-averse policy optimization via risk-neutral policy optimization

Title:	Risk-averse policy optimization via risk-neutral policy optimization
Authors:	Bisi L.; Santambrogio D.; Sandrelli F.; Tirinzoni A.; Ziebart B. D.; Restelli M.
Contributors:	Bisi, L.; Santambrogio, D.; Sandrelli, F.; Tirinzoni, A.; Ziebart, B. D.; Restelli, M.
Publication Year:	2022
Collection:	RE.PUBLIC@POLIMI - Research Publications at Politecnico di Milano
Subject Terms:	Reinforcement learning; Risk-aversion; Risk-sensitivity
Description:	Keeping risk under control is a primary objective in many critical real-world domains, including finance and healthcare. The literature on risk-averse reinforcement learning (RL) has mostly focused on designing ad-hoc algorithms for specific risk measures. As such, most of these algorithms do not easily generalize to measures other than the one they are designed for. Furthermore, it is often unclear whether state-of-the-art risk-neutral RL algorithms can be extended to reduce risk. In this paper, we take a step towards overcoming these limitations, proposing a single framework to optimize some of the most popular risk measures, including conditional value-at-risk, utility functions, and mean-variance. Leveraging recent theoretical results on state augmentation, we transform the decision-making process so that optimizing the chosen risk measure in the original environment is equivalent to optimizing the expected cost in the transformed one. We then present a simple risk-sensitive meta-algorithm that transforms the trajectories it collects from the environment and feeds these into any risk-neutral policy optimization method. Finally, we provide extensive experiments that show the benefits of our approach over existing ad-hoc methodologies in different domains, including the MuJoCo robotic suite and a real-world trading dataset.
Document Type:	article in journal/newspaper
Language:	English
Relation:	info:eu-repo/semantics/altIdentifier/wos/WOS:000835648600003; volume:311; firstpage:1; lastpage:16; numberofpages:16; journal:ARTIFICIAL INTELLIGENCE; https://hdl.handle.net/11311/1231797
DOI:	10.1016/j.artint.2022.103765
Availability:	https://hdl.handle.net/11311/1231797; https://doi.org/10.1016/j.artint.2022.103765
Rights:	info:eu-repo/semantics/openAccess
Accession Number:	edsbas.65F7CAC0
Database:	BASE
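
Illustrative note: the trajectory transformation mentioned in the Description can be sketched for one risk measure, conditional value-at-risk (CVaR). The sketch below is not taken from the article. It assumes undiscounted costs and uses the standard Rockafellar-Uryasev representation, CVaR_alpha(C) = min over eta of eta + E[max(C - eta, 0)] / (1 - alpha). The function name `transform_trajectory` and the parameters `eta` and `alpha` are hypothetical. Each state is augmented with the cost accumulated so far, and the whole transformed cost is placed on the final step, so that the expected total transformed cost equals the expression inside the minimum for the given `eta`.

```python
from typing import Any, List, Sequence, Tuple


def transform_trajectory(
    states: Sequence[Any],
    costs: Sequence[float],
    eta: float,
    alpha: float,
) -> Tuple[List[Tuple[Any, float]], List[float]]:
    """Hypothetical CVaR-style trajectory transformation (illustrative only).

    Returns the augmented states (state, cost accumulated before this step)
    and the transformed per-step costs, which are zero everywhere except
    at the final step.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if len(states) != len(costs):
        raise ValueError("states and costs must have the same length")

    augmented: List[Tuple[Any, float]] = []
    new_costs: List[float] = []
    running = 0.0  # cumulative cost observed before the current step

    for state, cost in zip(states, costs):
        augmented.append((state, running))
        running += cost
        new_costs.append(0.0)

    if new_costs:
        # Assumption: the full Rockafellar-Uryasev term is paid at the last step.
        new_costs[-1] = eta + max(running - eta, 0.0) / (1.0 - alpha)

    return augmented, new_costs


aug_states, aug_costs = transform_trajectory(
    ["s0", "s1", "s2"], [1.0, 2.0, 3.0], eta=4.0, alpha=0.5
)
```

Under these assumptions, the transformed trajectories could be passed to any risk-neutral policy optimizer that minimizes expected total cost, with `eta` tuned or learned separately. How the article handles discounting, the other risk measures, and the choice of `eta` is not stated in this record.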