FOCAL++

Context Encoder

FOCAL++ adopts two forms of intra-task attention over the offline context for more robust task inference in the context encoder:

| Range | Mechanism | Description |
| --- | --- | --- |
| batch-wise | gated attention | adaptively recalibrates the weights of batch-wise samples |
| sequence-wise | self-attention | captures the correlation along the transition sequence |

The two parallel attention modules are merged by addition to generate the output task embedding $z \sim E_{\phi}(z \mid c)$.
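A minimal PyTorch sketch of this dual-attention encoder (the module names, gating design, and pooling choices here are assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Hypothetical sketch: two parallel intra-task attention branches
    over a context batch of transitions, merged by addition."""
    def __init__(self, transition_dim, hidden_dim, z_dim, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(transition_dim, hidden_dim)
        # Batch-wise gated attention: a small gating net that recalibrates
        # the weight assigned to each sample in the context batch.
        self.gate = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4),
            nn.ReLU(),
            nn.Linear(hidden_dim // 4, 1),
        )
        # Sequence-wise self-attention along the transition sequence.
        self.self_attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.head = nn.Linear(hidden_dim, z_dim)

    def forward(self, context):                       # context: (N, transition_dim)
        h = self.embed(context)                       # (N, hidden)
        # Branch 1: per-sample gates, softmax-normalized over the batch.
        w = torch.softmax(self.gate(h), dim=0)        # (N, 1)
        gated = (w * h).sum(dim=0)                    # (hidden,)
        # Branch 2: self-attention over the transition sequence, then pooled.
        attn, _ = self.self_attn(h[None], h[None], h[None])   # (1, N, hidden)
        seq = attn.mean(dim=1).squeeze(0)             # (hidden,)
        # The two parallel branches are connected by addition.
        return self.head(gated + seq)                 # task embedding z
```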

Contrastive Learning

The context encoder is trained as the query encoder through the InfoNCE objective, alongside a momentum counterpart that serves as the key encoder:

$$\max_{\phi} \sum_{i = 1}^{T} \log p(+ \mid z_{i}^{q},\ z_{i}^{k}) = \sum_{i = 1}^{T} \log \frac{\exp(z_{i}^{q} \cdot z_{i}^{k} / \tau)}{\sum_{j = 1}^{T} \exp(z_{i}^{q} \cdot z_{j}^{k} / \tau)}$$

$$z_{i}^{q} \sim E_{\phi}(z_{i}^{q} \mid c_{i}) \qquad z_{i}^{k} \sim E_{\phi^{-}}(z_{i}^{k} \mid c_{i}) \qquad c_{i} \sim \mathcal{D}_{i}$$

where the momentum (key) encoder $\phi^{-}$ is updated as an exponential moving average of the query encoder $\phi$, and therefore evolves more slowly than the original encoder.
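A hedged sketch of the InfoNCE loss and the momentum update (the temperature $\tau$, momentum $m$, and the batching of one context per task are assumptions about the exact setup):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z_q, z_k, tau=0.1):
    """z_q: (T, d) query embeddings, one per task; z_k: (T, d) key
    embeddings from the momentum encoder. Matching indices are the
    positive pairs; all other keys in the batch serve as negatives."""
    logits = z_q @ z_k.t() / tau                     # (T, T) similarity matrix
    labels = torch.arange(z_q.size(0), device=z_q.device)
    return F.cross_entropy(logits, labels)           # = -sum_i log p(+ | z_i^q, z_i^k)

@torch.no_grad()
def momentum_update(encoder, momentum_encoder, m=0.99):
    """phi^- <- m * phi^- + (1 - m) * phi: the key encoder trails the
    query encoder, so keys change slowly and stay consistent."""
    for p, p_m in zip(encoder.parameters(), momentum_encoder.parameters()):
        p_m.mul_(m).add_(p, alpha=1 - m)
```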

Meta Behavior Learning

Similarly to FOCAL, FOCAL++ trains a behavior-regularized actor and critic, whose training is decoupled from that of the task encoder, as in the sketch below.
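As a rough illustration, a BRAC-style behavior-regularized actor update might look like the following; the divergence estimate (a simple log-likelihood penalty here), the weight `alpha`, and the conditioning on a detached task embedding `z` are all assumptions:

```python
def actor_loss(policy, critic, batch, z, alpha=0.1):
    """Behavior-regularized actor step: maximize Q while penalizing
    divergence from the behavior policy that generated the offline data.
    z is detached so actor/critic gradients do not flow into the
    context encoder (decoupled training)."""
    s, a_beta = batch["obs"], batch["actions"]
    z = z.detach().expand(s.size(0), -1)
    dist = policy(s, z)                    # pi(a | s, z)
    a_pi = dist.rsample()                  # reparameterized action sample
    q = critic(s, a_pi, z)
    # Stand-in for the divergence D(pi, pi_beta): negative log-likelihood
    # of the dataset actions under the current policy.
    bc_penalty = -dist.log_prob(a_beta).mean()
    return -q.mean() + alpha * bc_penalty
```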

The trained policy can be directly deployed to new tasks: a few transition samples suffice to generate the task embedding that conditions the policy.
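At test time, deployment then reduces to a single embedding pass; in this sketch, `collect_transitions` and the gym-style environment API are hypothetical:

```python
import torch

def deploy(policy, encoder, env, n_context=64):
    # Hypothetical helper: gather a few transitions on the new task.
    context = collect_transitions(env, n_context)    # (N, transition_dim)
    with torch.no_grad():
        z = encoder(context)                         # infer the task embedding once
    obs, done = env.reset(), False
    while not done:
        with torch.no_grad():
            dist = policy(torch.as_tensor(obs)[None], z[None])
            action = dist.mean[0]                    # deterministic action at test time
        obs, reward, done, _ = env.step(action.numpy())
```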

