EMMA
EMMA is designed for embodied environments with visual observations and textual actions

Retrospective LLM Expert
A pretrained LLM serves as a retrospective expert in a parallel text world, providing actions for EMMA to imitate
- the textual state is derived from the simulator's metadata via PDDL and the TextWorld engine
- the LLM critic writes a retrospection on the corresponding trial to the long-term memory pool
- the LLM actor outputs the expert action based on the stored retrospections and the past trajectory
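
The actor and critic roles above can be sketched roughly as follows. This is a minimal illustration, not EMMA's actual implementation: the function names, the prompt layout, the `max_size` memory limit, and the idea that the actor and critic are plain callables mapping a prompt string to a completion string are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt to an LLM completion.
LLM = Callable[[str], str]


def record_retrospection(critic: LLM, memory_pool: List[str],
                         trajectory: List[str], max_size: int = 3) -> None:
    """Ask the critic to reflect on a finished trial and store the result."""
    reflection = critic("\n".join(trajectory)).strip()
    memory_pool.append(reflection)
    # Assumption: the pool keeps only the most recent `max_size` entries.
    if len(memory_pool) > max_size:
        del memory_pool[:len(memory_pool) - max_size]


def retrospective_expert_action(actor: LLM, memory_pool: List[str],
                                trajectory: List[str],
                                textual_state: str) -> str:
    """Ask the actor for an expert action given memory and history."""
    # Assumption: a simple line-based prompt; the real prompt may differ.
    prompt = "\n".join(
        ["Retrospections:"] + memory_pool
        + ["Trajectory:"] + trajectory
        + ["State: " + textual_state, "Action:"]
    )
    return actor(prompt).strip()
```

In this sketch the critic is called once per finished trial, while the actor is called at every step and sees whatever retrospections have accumulated so far.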

Cross-Modal Imitation
EMMA adopts DPO to maximize the preference for the LLM expert's action (+) over the student agent's action (-)
The interaction dataset is collected through DAgger to alleviate cumulative error and distribution shift
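
A DAgger-style collection loop for such preference pairs might look like the sketch below. It is a simplified illustration under stated assumptions: the `env` interface (`reset` and `step` returning a visual observation, the parallel textual state, and a done flag), the choice to execute the student's action, and the choice to skip steps where student and expert agree are all hypothetical and not taken from the source.

```python
def collect_dagger_pairs(env, student, expert, max_steps=30):
    """Roll out the student and label each visited state with the expert.

    Assumed (hypothetical) interfaces:
      env.reset() -> (visual_obs, text_state)
      env.step(action) -> (visual_obs, text_state, done)
      student(visual_obs) -> action string
      expert(text_state) -> action string
    """
    pairs = []
    visual_obs, text_state = env.reset()
    for _ in range(max_steps):
        student_action = student(visual_obs)
        expert_action = expert(text_state)
        # Assumption: identical actions carry no preference signal, so skip.
        if student_action != expert_action:
            pairs.append({
                "obs": visual_obs,
                "chosen": expert_action,     # (+) LLM expert's action
                "rejected": student_action,  # (-) student agent's action
            })
        # The student's own action drives the rollout, so the expert labels
        # states the student actually reaches, including its mistakes.
        visual_obs, text_state, done = env.step(student_action)
        if done:
            break
    return pairs
```

Because the states come from the student's own rollouts, the expert's corrections cover the situations the student drifts into, which is what addresses the distribution shift mentioned above.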

The reference agent is obtained by behavior cloning from a rule-based expert in the visual world
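
For a single preference pair, the standard DPO objective can be written as -log sigmoid(beta * [(log pi(a+|s) - log pi_ref(a+|s)) - (log pi(a-|s) - log pi_ref(a-|s))]), where pi_ref is the reference agent. The sketch below computes this per-pair loss from precomputed log-probabilities; the function name, argument names, and the default `beta` are assumptions for illustration, and EMMA's exact objective may differ in detail.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) action pair.

    The logp_* arguments are log-probabilities under the student policy;
    the ref_logp_* arguments are log-probabilities under the frozen
    reference agent. beta=0.1 is an illustrative default, not from the source.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the student matches the reference on both actions the margin is zero and the loss equals log 2; the loss falls as the student assigns relatively more probability to the expert's action than the reference does.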
http://example.com/2024/09/15/EMMA/