arxiv:2505.16925

Risk-Averse Reinforcement Learning with Itakura-Saito Loss

Published on May 22

Submitted by i-udovichenko on May 23


Authors: Igor Udovichenko

AI-generated summary: The proposed Itakura-Saito divergence-based loss function enhances numerical stability in risk-averse reinforcement learning with exponential utility functions.

Abstract

Risk-averse reinforcement learning finds application in various high-stakes fields. Unlike classical reinforcement learning, which aims to maximize expected returns, risk-averse agents choose policies that minimize risk, occasionally sacrificing expected value. These preferences can be framed through utility theory. We focus on the specific case of the exponential utility function, where we can derive the Bellman equations and employ various reinforcement learning algorithms with few modifications. However, these methods suffer from numerical instability because exponentials must be computed throughout the process. To address this, we introduce a numerically stable and mathematically sound loss function based on the Itakura-Saito divergence for learning state-value and action-value functions. We evaluate our proposed loss function against established alternatives, both theoretically and empirically. In the experimental section, we explore multiple financial scenarios, some with known analytical solutions, and show that our loss function outperforms the alternatives.
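To make the instability concrete: under exponential utility with risk-aversion parameter lam > 0, a random return X is valued by its standard certainty equivalent -(1/lam) * log E[exp(-lam * X)]. The sketch below (plain Python; the function names and sample returns are illustrative assumptions, not taken from the paper) shows that evaluating this formula directly overflows once returns are large in magnitude, while the usual log-sum-exp rewrite computes the same value safely.

```python
import math


def certainty_equivalent_naive(returns, lam):
    """Direct formula -(1/lam) * log(mean(exp(-lam * x))).

    math.exp raises OverflowError when -lam * x exceeds about 709.8,
    and math.log raises ValueError if every term underflows to 0.0.
    """
    mean_exp = sum(math.exp(-lam * x) for x in returns) / len(returns)
    return -math.log(mean_exp) / lam


def certainty_equivalent_stable(returns, lam):
    """Same quantity via log-sum-exp: subtract the largest exponent
    first, so every argument passed to math.exp is <= 0."""
    z = [-lam * x for x in returns]
    m = max(z)
    log_mean_exp = m + math.log(sum(math.exp(zi - m) for zi in z) / len(z))
    return -log_mean_exp / lam
```

For returns [1.0, 2.0, 3.0] with lam = 0.5, both functions give approximately 1.837, below the mean of 2.0 as expected for a risk-averse agent. After shifting every return by -2000, the naive version raises OverflowError, while the stable one returns the original value minus 2000. Note that log-sum-exp needs the whole sample at once; temporal-difference methods instead fit values to one sampled target at a time, which is presumably where a dedicated loss function becomes necessary.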


Community

i-udovichenko (paper author and submitter), 2 days ago

The paper proposes a new loss function for risk-averse RL with exponential utility. The loss is mathematically justified (no heuristics!) and numerically stable.
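Both claims (a principled loss and numerical stability) can be illustrated with a small sketch. It assumes one natural construction, which may differ from the paper's exact loss, and all function names are hypothetical. The Itakura-Saito divergence D(p, q) = p/q - log(p/q) - 1 is the Bregman divergence generated by -log, so the q minimizing E[D(Y, q)] is E[Y]. Taking p = exp(-lam * y) for a sampled target y and q = exp(-lam * c) for a prediction c, the loss depends only on y - c, and its minimizer is the certainty equivalent c = -(1/lam) * log E[exp(-lam * Y)].

```python
import math


def is_divergence_from_logs(log_p, log_q):
    """Itakura-Saito divergence D(p, q) = p/q - log(p/q) - 1,
    computed from log p and log q so that p and q themselves are
    never formed; only their ratio is exponentiated."""
    d = log_p - log_q
    return math.exp(d) - d - 1.0


def is_loss(prediction, target, lam):
    """Hypothetical per-sample loss: D(exp(-lam*target), exp(-lam*prediction)).
    Equals exp(-lam*(target - prediction)) + lam*(target - prediction) - 1."""
    return is_divergence_from_logs(-lam * target, -lam * prediction)


def is_loss_grad(prediction, target, lam):
    """Derivative of is_loss with respect to prediction."""
    return lam * (math.exp(-lam * (target - prediction)) - 1.0)


def fit_certainty_equivalent(targets, lam, lr=1.0, steps=200):
    """Fit one scalar c by gradient descent on the mean IS loss.

    The mean gradient vanishes when mean(exp(-lam*(y - c))) == 1, i.e. at
    c = -(1/lam) * log(mean(exp(-lam*y))), the certainty equivalent.
    lr=1.0 suits the toy example below; larger lam may need a smaller step.
    """
    c = sum(targets) / len(targets)  # start from the risk-neutral mean
    for _ in range(steps):
        grad = sum(is_loss_grad(c, y, lam) for y in targets) / len(targets)
        c -= lr * grad
    return c
```

For example, fit_certainty_equivalent([1.0, 2.0, 3.0], 0.5) converges to approximately 1.837, below the risk-neutral mean of 2.0. Shifting every target by -2000, where exp(-0.5 * y) alone would overflow, shifts the fitted value by the same amount up to rounding, because only the differences y - c are ever exponentiated.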