Reward-free decision-making, planning and learning
In this project, three PIs will work together to establish a new theory of behavior in the absence of rewards.
The usual approach to analyzing naturalistic behavior is to define its function as some form of reward or utility maximization. However, this is problematic in practice: the internal states of natural agents are unobservable, so their reward functions can only be inferred indirectly, and reward functions designed for artificial agents often induce unintended behavior. Here, we propose abandoning the idea of reward maximization in favor of a principle of behavior based on the intrinsic motivation to maximize the occupancy of future action and state paths. We will reconceptualize `reward' as a means to occupy paths, rather than as the goal itself. We will study measures of path occupancy and provide a rigorous analysis of the optimal policy. We will investigate whether goal-directed behavior emerges from this principle by applying it to a variety of discrete- and continuous-state tasks, aiming to provide a proof of concept that goal-directedness is possible in the complete absence of external reward maximization.
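One way to make the idea concrete (a toy sketch of our own, not the project's actual formulation): an agent that maximizes the expected discounted entropy of its future action paths will steer away from absorbing states, since no further paths can be occupied once one is reached. In the sketch below, the chain environment, the discount factor, and the soft value recursion V(s) = log Σ_a exp(γ V(s'_a)) are all illustrative assumptions.

```python
import math

# Toy chain with states 0..4; state 0 is absorbing (no actions available),
# so paths that reach it stop accumulating action entropy.
N, GAMMA = 5, 0.9

def successors(s):
    # Two deterministic actions; walls clamp the agent at the ends.
    return {"left": max(s - 1, 0), "right": min(s + 1, N - 1)}

# Soft value iteration for the objective E[sum_t gamma^t H(pi(.|s_t))]:
# V(s) = log sum_a exp(gamma * V(s'_a)), with V = 0 at the absorbing state.
V = [0.0] * N
for _ in range(500):
    V = [0.0] + [math.log(sum(math.exp(GAMMA * V[s2])
                              for s2 in successors(s).values()))
                 for s in range(1, N)]

def policy(s):
    # Optimal entropy-maximizing policy: pi(a|s) proportional to exp(gamma * V[s'_a]).
    nxt = successors(s)
    z = sum(math.exp(GAMMA * V[s2]) for s2 in nxt.values())
    return {a: math.exp(GAMMA * V[s2]) / z for a, s2 in nxt.items()}

print(policy(1))  # next to the absorbing state, "right" gets higher probability
```

With no external reward anywhere in the recursion, the agent nonetheless prefers to move away from the absorbing state, a rudimentary form of survival-like, goal-directed behavior of the kind the project aims to characterize rigorously.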
The project will fund a two-year post-doctoral position (expected start in 2024).