EMMA

EMMA is designed for embodied environments with visual observations and textual actions

Retrospective LLM Expert

The pretrained LLMs are used as a retrospective expert in a parallel text world for EMMA to imitate (see the sketch after the list below):

  1. the textual state $s_l$ is derived from the simulator's metadata via PDDL and the TextWorld engine
  2. the LLM critic outputs a retrospection of the corresponding trial into the long-term memory pool
  3. the LLM actor outputs the expert action based on the stored retrospections and the past trajectory
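
To make the expert loop concrete, here is a minimal Python sketch of one trial in the parallel text world. `env`, `llm_actor`, `llm_critic`, and `to_text_world` are hypothetical stand-ins (not from any released EMMA code), assuming a Gym-style environment whose metadata can be translated into a textual state.

```python
def run_trial(env, llm_actor, llm_critic, to_text_world, memory_pool, max_steps=50):
    """One trial of the retrospective LLM expert in the parallel text world.

    `env`, `llm_actor`, `llm_critic`, and `to_text_world` are hypothetical
    stand-ins; the loop only illustrates the actor/critic/memory interplay.
    """
    trajectory, reward = [], 0.0
    s_l = to_text_world(env.reset())  # textual state from simulator metadata (PDDL + TextWorld)
    for _ in range(max_steps):
        # LLM actor proposes the expert action from stored retrospections + the past trajectory
        x_a_star = llm_actor(s_l, trajectory, memory_pool)
        obs, reward, done, _ = env.step(x_a_star)
        trajectory.append((s_l, x_a_star))
        s_l = to_text_world(obs)
        if done:
            break
    # LLM critic appends a retrospection of the finished trial to long-term memory
    memory_pool.append(llm_critic(trajectory, reward))
    return trajectory
```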

Cross-Modal Imitation

EMMA adopts DPO to maximize the preference for the LLM expert's action (+) $x_a^\star$ over the student agent's action (-) $x_a$:

$$\max_{\theta}\ \mathbb{E}_{(s_v,\, x_a^\star,\, x_a) \sim \mathcal{D}} \left[ \ln \sigma \left( \beta \ln \frac{\pi_{\theta}(x_a^\star \mid s_v)}{\pi_{\mathrm{ref}}(x_a^\star \mid s_v)} - \beta \ln \frac{\pi_{\theta}(x_a \mid s_v)}{\pi_{\mathrm{ref}}(x_a \mid s_v)} \right) \right]$$

where the interaction dataset $\mathcal{D}$ is collected through DAgger to alleviate cumulative error and distribution shift
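
A minimal PyTorch sketch of the objective above, assuming the per-action log-probabilities $\ln \pi(x_a \mid s_v)$ have already been summed over action tokens; the function name and the default $\beta = 0.1$ are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_star, policy_logp, ref_logp_star, ref_logp, beta=0.1):
    """Negated DPO objective (minimized by gradient descent).

    Each input is a tensor of log-probabilities ln pi(x | s_v), summed over the
    action tokens: `*_star` for the LLM expert's preferred action x_a^*, the
    others for the student agent's action x_a.
    """
    # beta * [ ln pi_theta(x*|s_v)/pi_ref(x*|s_v) - ln pi_theta(x|s_v)/pi_ref(x|s_v) ]
    logits = beta * ((policy_logp_star - ref_logp_star) - (policy_logp - ref_logp))
    return -F.logsigmoid(logits).mean()
```

Only $\pi_{\theta}$ should receive gradients; the reference log-probabilities are typically computed under `torch.no_grad()`.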

The reference agent $\pi_{\mathrm{ref}}$ is obtained by behavior cloning from a rule-based expert in the visual world
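
As a sketch of how such a reference agent could be trained, a token-level cross-entropy against the rule-based expert's actions is one standard behavior-cloning loss; the tensor shapes and padding convention below are assumptions, not taken from the paper.

```python
import torch.nn.functional as F

def bc_loss(action_logits, expert_action_tokens, pad_id=-100):
    """Behavior cloning: cross-entropy between the agent's action distribution
    (conditioned on the visual state s_v) and the rule-based expert's action
    tokens. `action_logits`: [batch, seq, vocab]; `expert_action_tokens`:
    [batch, seq] with padded positions set to `pad_id`."""
    return F.cross_entropy(
        action_logits.reshape(-1, action_logits.size(-1)),
        expert_action_tokens.reshape(-1),
        ignore_index=pad_id,
    )
```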

