The MESSENGER environment is a multi-task grid navigation game in which the agent needs to retrieve a message from a specific entity and deliver it to a target entity while avoiding enemy entities.
An instance of the environment contains a task-relevant, free-form textual manual in which each item describes:
the role of an entity (message, target or enemy)
the movement dynamics of an entity (stationary, chasing or fleeing)
Unlike similar environments, the mapping from manual items to entities, M: Z ↦ E, is not available to the agent, so it has to work out which description refers to which entity.
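To make the setup above concrete, here is a minimal Python sketch of a manual and of what the agent gets to see. It is an illustration only, not the environment's actual API: the names `Role`, `Movement`, `ManualItem`, `agent_view`, and the example entities and sentences are all hypothetical.

```python
import random
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    MESSAGE = "message"
    TARGET = "target"
    ENEMY = "enemy"


class Movement(Enum):
    STATIONARY = "stationary"
    CHASING = "chasing"
    FLEEING = "fleeing"


@dataclass(frozen=True)
class ManualItem:
    text: str           # free-form description; the only field the agent reads
    entity: str         # hidden ground truth: the entity the text refers to
    role: Role          # hidden ground truth
    movement: Movement  # hidden ground truth


# Hypothetical manual for one game instance (entities and wording are made up).
MANUAL = [
    ManualItem("The plane holds the secret note and stays put.",
               "airplane", Role.MESSAGE, Movement.STATIONARY),
    ManualItem("Bring the note to the wizard, who runs from you.",
               "mage", Role.TARGET, Movement.FLEEING),
    ManualItem("The hound hunts you down and must be avoided.",
               "dog", Role.ENEMY, Movement.CHASING),
]


def agent_view(manual, rng):
    """Return the texts and the entity symbols separately, each shuffled,
    so that no item-to-entity alignment (the mapping M) is given away."""
    texts = [item.text for item in manual]
    entities = [item.entity for item in manual]
    rng.shuffle(texts)
    rng.shuffle(entities)
    return texts, entities


texts, entities = agent_view(MANUAL, random.Random(0))
```

The agent receives `texts` and observes `entities` on the grid, but the pairing between them, along with each entity's role and movement type, has to be inferred from the language and from interaction.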
EMMA Model
For a MESSENGER game $(S, A, \Omega, P, O, R \mid E, Z, M)$, the policy network $\pi_\theta(a_t \mid o_{t-k:t}, Z)$ consists of
Text Encoder
Each manual item $z \in Z$ is encoded into a sequence of tokens $t_{1:n}$ by a pretrained BERT-based model. The key vector $k_z$ and value vector $v_z$ of manual item $z$ are obtained through the following transformation