LatentObservations/潜在观察

Posts

Jan 4, 2026
Branching Reinforcement Learning
May 16, 2025
Notes on Linear Gaussian Models
Apr 12, 2025
Empowerment: A New Reward-Free Paradigm for Human-AI Collaboration?
Mar 28, 2025
Introducing CleanIL for Imitation and Inverse Reinforcement Learning
Mar 28, 2025
Observations and Implementation Tricks for Imitation and Inverse Reinforcement Learning
Oct 18, 2024
Resource Rational Adaptive Inference Time Compute
Oct 12, 2024
RNNs are Switching State Space Models?
Oct 5, 2024
Simple Alchemy for Meta Reinforcement Learning
Sep 25, 2024
A Tutorial on Dual Reinforcement Learning - Mostly Intuitions
Feb 2, 2024
Do We Need Reward in RLHF? DPO and the Unlikelihood Family Curse
Jan 15, 2024
Why do we need RLHF? Imitation, Inverse RL, and the role of reward
Nov 24, 2023
On the Exploration-Exploitation Tradeoff and Identifying Epistemic Actions in POMDPs
Sep 7, 2023
Another Attempt to Rationalize Expected Free Energy: Insights From Reinforcement Learning
Jul 30, 2023
Making Sense of Active Inference: Optimal Control Without Cost Function
Jun 26, 2023
The Uniqueness of Agent Beliefs in Meta and Bayesian Reinforcement Learning
Jun 23, 2023
The Uniqueness of Meta Learning and Autoregressive Pre-training
May 27, 2023
Bayesian Theory of Mind for RLHF: Towards Richer Human Models for Alignment
Jun 18, 2022
Dummy First Post