FOCAL++

Context Encoder

FOCAL++ adopts two forms of intra-task attention over the offline context for more robust task inference in the context encoder:

| Range | Mechanism | Description |
| --- | --- | --- |
| batch-wise | gated attention | adaptively recalibrates the weights of batch-wise samples |
| sequence-wise | self-attention | captures the correlation along the transition sequence |

The two parallel attention modules are merged by addition to generate the output task embedding $z \sim E_{\phi}(z \mid c)$.
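A minimal PyTorch sketch of this dual-attention encoder (the module names, gating design, and pooling choices here are assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Hypothetical sketch: two parallel intra-task attention branches
    over a context batch of transitions, merged by addition."""
    def __init__(self, transition_dim, hidden_dim, z_dim, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(transition_dim, hidden_dim)
        # Batch-wise gated attention: a small gating net that recalibrates
        # the weight assigned to each sample in the context batch.
        self.gate = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4),
            nn.ReLU(),
            nn.Linear(hidden_dim // 4, 1),
        )
        # Sequence-wise self-attention along the transition sequence.
        self.self_attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.head = nn.Linear(hidden_dim, z_dim)

    def forward(self, context):                       # context: (N, transition_dim)
        h = self.embed(context)                       # (N, hidden)
        # Branch 1: per-sample gates, softmax-normalized over the batch.
        w = torch.softmax(self.gate(h), dim=0)        # (N, 1)
        gated = (w * h).sum(dim=0)                    # (hidden,)
        # Branch 2: self-attention over the transition sequence, then pooled.
        attn, _ = self.self_attn(h[None], h[None], h[None])   # (1, N, hidden)
        seq = attn.mean(dim=1).squeeze(0)             # (hidden,)
        # The two parallel branches are connected by addition.
        return self.head(gated + seq)                 # task embedding z
```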

Contrastive Learning

The context encoder is trained as the query encoder through the InfoNCE objective, alongside a momentum counterpart that serves as the key encoder:

$$\max_{\phi} \sum_{i = 1}^{T} \log p(+ \mid z_{i}^{q},\ z_{i}^{k}) = \sum_{i = 1}^{T} \log \frac{\exp(z_{i}^{q} \cdot z_{i}^{k} / \tau)}{\sum_{j = 1}^{T} \exp(z_{i}^{q} \cdot z_{j}^{k} / \tau)}$$

$$z_{i}^{q} \sim E_{\phi}(z_{i}^{q} \mid c_{i}) \qquad z_{i}^{k} \sim E_{\phi^{-}}(z_{i}^{k} \mid c_{i}) \qquad c_{i} \sim \mathcal{D}_{i}$$

where the momentum (key) encoder $\phi^{-}$ is updated as an exponential moving average of the query encoder $\phi$, and therefore evolves more slowly than the original encoder.
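A hedged sketch of the InfoNCE loss and the momentum update (the temperature $\tau$, momentum $m$, and the batching of one context per task are assumptions about the exact setup):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z_q, z_k, tau=0.1):
    """z_q: (T, d) query embeddings, one per task; z_k: (T, d) key
    embeddings from the momentum encoder. Matching indices are the
    positive pairs; all other keys in the batch serve as negatives."""
    logits = z_q @ z_k.t() / tau                     # (T, T) similarity matrix
    labels = torch.arange(z_q.size(0), device=z_q.device)
    return F.cross_entropy(logits, labels)           # = -sum_i log p(+ | z_i^q, z_i^k)

@torch.no_grad()
def momentum_update(encoder, momentum_encoder, m=0.99):
    """phi^- <- m * phi^- + (1 - m) * phi: the key encoder trails the
    query encoder, so keys change slowly and stay consistent."""
    for p, p_m in zip(encoder.parameters(), momentum_encoder.parameters()):
        p_m.mul_(m).add_(p, alpha=1 - m)
```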

Meta Behavior Learning

Similarly to FOCAL, FOCAL++ trains a behavior-regularized actor and critic, whose training is decoupled from that of the task encoder, as in the sketch below.
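As a rough illustration, a BRAC-style behavior-regularized actor update might look like the following; the divergence estimate (a simple log-likelihood penalty here), the weight `alpha`, and the conditioning on a detached task embedding `z` are all assumptions:

```python
def actor_loss(policy, critic, batch, z, alpha=0.1):
    """Behavior-regularized actor step: maximize Q while penalizing
    divergence from the behavior policy that generated the offline data.
    z is detached so actor/critic gradients do not flow into the
    context encoder (decoupled training)."""
    s, a_beta = batch["obs"], batch["actions"]
    z = z.detach().expand(s.size(0), -1)
    dist = policy(s, z)                    # pi(a | s, z)
    a_pi = dist.rsample()                  # reparameterized action sample
    q = critic(s, a_pi, z)
    # Stand-in for the divergence D(pi, pi_beta): negative log-likelihood
    # of the dataset actions under the current policy.
    bc_penalty = -dist.log_prob(a_beta).mean()
    return -q.mean() + alpha * bc_penalty
```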

The trained policy can be directly deployed to new tasks: a few transition samples suffice to generate the task embedding that conditions the policy.
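At test time, deployment then reduces to a single embedding pass; in this sketch, `collect_transitions` and the gym-style environment API are hypothetical:

```python
import torch

def deploy(policy, encoder, env, n_context=64):
    # Hypothetical helper: gather a few transitions on the new task.
    context = collect_transitions(env, n_context)    # (N, transition_dim)
    with torch.no_grad():
        z = encoder(context)                         # infer the task embedding once
    obs, done = env.reset(), False
    while not done:
        with torch.no_grad():
            dist = policy(torch.as_tensor(obs)[None], z[None])
            action = dist.mean[0]                    # deterministic action at test time
        obs, reward, done, _ = env.step(action.numpy())
```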

