EMMA

EMMA is designed for embodied environments with visual observations and textual actions

Retrospective LLM Expert

The pretrained LLMs are used as a retrospective expert in a parallel text world for EMMA to imitate (see the sketch after the list below):

  1. the textual state $s_l$ is derived from the simulator's metadata via PDDL and the TextWorld engine
  2. the LLM critic outputs a retrospection of the corresponding trial into the long-term memory pool
  3. the LLM actor outputs the expert action based on the stored retrospections and the past trajectory
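
To make the expert loop concrete, here is a minimal Python sketch of one trial in the parallel text world. `env`, `llm_actor`, `llm_critic`, and `to_text_world` are hypothetical stand-ins (not from any released EMMA code), assuming a Gym-style environment whose metadata can be translated into a textual state.

```python
def run_trial(env, llm_actor, llm_critic, to_text_world, memory_pool, max_steps=50):
    """One trial of the retrospective LLM expert in the parallel text world.

    `env`, `llm_actor`, `llm_critic`, and `to_text_world` are hypothetical
    stand-ins; the loop only illustrates the actor/critic/memory interplay.
    """
    trajectory, reward = [], 0.0
    s_l = to_text_world(env.reset())  # textual state from simulator metadata (PDDL + TextWorld)
    for _ in range(max_steps):
        # LLM actor proposes the expert action from stored retrospections + the past trajectory
        x_a_star = llm_actor(s_l, trajectory, memory_pool)
        obs, reward, done, _ = env.step(x_a_star)
        trajectory.append((s_l, x_a_star))
        s_l = to_text_world(obs)
        if done:
            break
    # LLM critic appends a retrospection of the finished trial to long-term memory
    memory_pool.append(llm_critic(trajectory, reward))
    return trajectory
```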

Cross-Modal Imitation

EMMA adopts DPO to maximize the preference for the LLM expert's action (+) $x_a^\star$ over the student agent's action (-) $x_a$:

$$\max_{\theta}\ \mathbb{E}_{(s_v,\, x_a^\star,\, x_a) \sim \mathcal{D}} \left[ \ln \sigma \left( \beta \ln \frac{\pi_{\theta}(x_a^\star \mid s_v)}{\pi_{\mathrm{ref}}(x_a^\star \mid s_v)} - \beta \ln \frac{\pi_{\theta}(x_a \mid s_v)}{\pi_{\mathrm{ref}}(x_a \mid s_v)} \right) \right]$$

where the interaction dataset $\mathcal{D}$ is collected through DAgger to alleviate cumulative error and distribution shift
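
A minimal PyTorch sketch of the objective above, assuming the per-action log-probabilities $\ln \pi(x_a \mid s_v)$ have already been summed over action tokens; the function name and the default $\beta = 0.1$ are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_star, policy_logp, ref_logp_star, ref_logp, beta=0.1):
    """Negated DPO objective (minimized by gradient descent).

    Each input is a tensor of log-probabilities ln pi(x | s_v), summed over the
    action tokens: `*_star` for the LLM expert's preferred action x_a^*, the
    others for the student agent's action x_a.
    """
    # beta * [ ln pi_theta(x*|s_v)/pi_ref(x*|s_v) - ln pi_theta(x|s_v)/pi_ref(x|s_v) ]
    logits = beta * ((policy_logp_star - ref_logp_star) - (policy_logp - ref_logp))
    return -F.logsigmoid(logits).mean()
```

Only $\pi_{\theta}$ should receive gradients; the reference log-probabilities are typically computed under `torch.no_grad()`.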

The reference agent $\pi_{\mathrm{ref}}$ is obtained by behavior cloning from a rule-based expert in the visual world
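
As a sketch of how such a reference agent could be trained, a token-level cross-entropy against the rule-based expert's actions is one standard behavior-cloning loss; the tensor shapes and padding convention below are assumptions, not taken from the paper.

```python
import torch.nn.functional as F

def bc_loss(action_logits, expert_action_tokens, pad_id=-100):
    """Behavior cloning: cross-entropy between the agent's action distribution
    (conditioned on the visual state s_v) and the rule-based expert's action
    tokens. `action_logits`: [batch, seq, vocab]; `expert_action_tokens`:
    [batch, seq] with padded positions set to `pad_id`."""
    return F.cross_entropy(
        action_logits.reshape(-1, action_logits.size(-1)),
        expert_action_tokens.reshape(-1),
        ignore_index=pad_id,
    )
```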

