LatentObservations/潜在观察

Posts

  • May 16, 2025

    Notes on Linear Gaussian Models

  • Apr 12, 2025

    Empowerment: A New Reward-Free Paradigm for Human-AI Collaboration?

  • Mar 28, 2025

    Introducing CleanIL for Imitation and Inverse Reinforcement Learning

  • Mar 28, 2025

    Observations and Implementation Tricks for Imitation and Inverse Reinforcement Learning

  • Oct 18, 2024

    Resource Rational Adaptive Inference Time Compute

  • Oct 12, 2024

    RNNs are Switching State Space Models?

  • Oct 5, 2024

    Simple Alchemy for Meta Reinforcement Learning

  • Sep 25, 2024

    A Tutorial on Dual Reinforcement Learning - Mostly Intuitions

  • Feb 2, 2024

    Do We Need Reward in RLHF? DPO and the Unlikelihood Family Curse

  • Jan 15, 2024

    Why do we need RLHF? Imitation, Inverse RL, and the role of reward

  • Nov 24, 2023

    On the Exploration-Exploitation Tradeoff and Identifying Epistemic Actions in POMDPs

  • Sep 7, 2023

    Another Attempt to Rationalize Expected Free Energy: Insights From Reinforcement Learning

  • Jul 30, 2023

    Making Sense of Active Inference: Optimal Control Without Cost Function

  • Jun 26, 2023

    The Uniqueness of Agent Beliefs in Meta and Bayesian Reinforcement Learning

  • Jun 23, 2023

    The Uniqueness of Meta Learning and Autoregressive Pre-training

  • May 27, 2023

    Bayesian Theory of Mind for RLHF: Towards Richer Human Models for Alignment

  • Jun 18, 2022

    Dummy First Post


  • ran-weii
  • wei-ran
  • _RanW_