G2ANet

Graph Extraction

Given the locality of interaction, G2ANet constructs the relationship between agents as an agent-coordination graph

The coordination graph is extracted by a two-stage attention mechanism from the individual observation embeddings $h_{1:N}$

Hard Attention

In the first stage, hard attention samples a binary weight $w_{h}^{i,\ j}$ for each pair of agents through a Bi-LSTM

$$
w_{h}^{i,\ j} \sim \operatorname{Bernoulli}(p^{i,\ j}) \qquad p^{i,\ j} = f(\operatorname{BiLSTM}(h_{i},\ h_{1:N})_{j}) \in [0,\ 1]
$$

To allow gradients to back-propagate, the sampling process above is approximated with the Gumbel-softmax relaxation

$$
w_{h}^{i,\ j} = \operatorname{gum}(p^{i,\ j}) = \frac{\exp \left[ \dfrac{1}{\tau} \Big( g_{+}^{i,\ j} + \log(p^{i,\ j}) \Big) \right]}{\exp \left[ \dfrac{1}{\tau} \Big( g_{+}^{i,\ j} + \log(p^{i,\ j}) \Big) \right] + \exp \left[ \dfrac{1}{\tau} \Big( g_{-}^{i,\ j} + \log(1 - p^{i,\ j}) \Big) \right]}
$$

where $g_{\pm}^{i,\ j} \sim \operatorname{Gumbel}(0,\ 1)$ and $\tau$ is the temperature coefficient, which controls the smoothness of the softmax
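The relaxation above can be sketched in a few lines of standard-library Python. This is only an illustration of the two-class Gumbel-softmax formula, not G2ANet's actual implementation: the function names `sample_gumbel` and `gumbel_hard_weight` are hypothetical, the probability `p` is passed in directly instead of being produced by a Bi-LSTM, and the clamping constants are assumptions added for numerical safety.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample by inverse transform: g = -log(-log(u))."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite (assumed safeguard).
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_hard_weight(p, tau, rng):
    """Relaxed Bernoulli(p) sample, following the two-class Gumbel-softmax formula."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    pos = (sample_gumbel(rng) + math.log(p)) / tau        # "edge kept" logit
    neg = (sample_gumbel(rng) + math.log(1.0 - p)) / tau  # "edge dropped" logit
    m = max(pos, neg)  # subtract the max before exponentiating for stability
    e_pos, e_neg = math.exp(pos - m), math.exp(neg - m)
    return e_pos / (e_pos + e_neg)


if __name__ == "__main__":
    rng = random.Random(0)
    # A small temperature pushes the relaxed weights towards 0 or 1.
    print([round(gumbel_hard_weight(0.8, 0.1, rng), 3) for _ in range(5)])
```

With only two classes the expression reduces to a sigmoid of $\frac{1}{\tau}\big(\log p - \log(1 - p) + g_{+} - g_{-}\big)$, so the weight exceeds 0.5 with probability exactly $p$, and lowering $\tau$ moves the samples closer to binary values.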

Soft Attention

In the second stage, soft attention applies scaled dot-product attention, weighted by the hard attention weights from the first stage

$$
w_{s}^{i,\ j} = \operatorname{softmax}_{j} \Big[ w_{h}^{i,\ j} \cdot (W_{q} h_{i}) \cdot (W_{k} h_{j}) \Big]
$$

These weights form the final edge weights of the coordination graph, which are used by the downstream network or module
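As a rough illustration of the soft attention formula, the sketch below computes the edge weights for a handful of agents using plain Python lists. The helper names (`dot`, `matvec`, `soft_attention_weights`) are hypothetical, embeddings and projection matrices are passed in as nested lists, and the softmax runs over all agents $j$ including $j = i$, which is an assumption since the text does not say whether self-edges are excluded.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in W]


def soft_attention_weights(h, w_hard, W_q, W_k):
    """Return w_s[i][j] = softmax_j( w_hard[i][j] * (W_q h_i) . (W_k h_j) )."""
    queries = [matvec(W_q, h_i) for h_i in h]
    keys = [matvec(W_k, h_j) for h_j in h]
    weights = []
    for i, q in enumerate(queries):
        scores = [w_hard[i][j] * dot(q, k) for j, k in enumerate(keys)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy embeddings for 3 agents
    identity = [[1.0, 0.0], [0.0, 1.0]]        # toy projections W_q = W_k = I
    w_hard = [[1.0] * 3 for _ in range(3)]     # every edge kept by hard attention
    for row in soft_attention_weights(h, w_hard, identity, identity):
        print([round(w, 3) for w in row])
```

Note that, under the formula as written, a pair with $w_{h}^{i,\ j} = 0$ receives a score of 0 and not $-\infty$, so it still gets a nonzero weight after the softmax. Whether such pairs should instead be masked out entirely is a design choice this sketch leaves open.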

Network Architecture

Based on the weighted coordination graph, G2ANet adopts a GNN to integrate the information of neighbouring agents
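The text does not specify the GNN layer, so the following is only a minimal sketch under the assumption that integration is a single weighted-sum aggregation, $x_{i} = \sum_{j} w_{s}^{i,\ j} h_{j}$. The function name `aggregate_neighbours` is hypothetical, and a real layer would typically add learned transformations and nonlinearities.

```python
def aggregate_neighbours(h, w_soft):
    """One weighted message-passing step: x_i = sum_j w_soft[i][j] * h_j."""
    n, dim = len(h), len(h[0])
    return [
        [sum(w_soft[i][j] * h[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0]]         # toy embeddings for 2 agents
    w_soft = [[0.25, 0.75], [1.0, 0.0]]  # toy edge weights, each row sums to 1
    print(aggregate_neighbours(h, w_soft))
```

Because each row of edge weights sums to 1, every aggregated vector is a convex combination of the agents' embeddings, so agents with larger edge weights contribute more to the result.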

Policy and Value

The policy and value networks can then be built on top of the GNN-encoded embeddings and trained end-to-end


http://example.com/2024/10/09/G2ANet/
Author: 木辛
Posted on: October 9, 2024