G2ANet

Graph Extraction

Because interactions between agents are typically local, G2ANet models the relationships between agents as an agent-coordination graph.

The coordination graph is extracted by a two-stage attention mechanism applied to the individual observation embeddings $h_{1:N}$.

In the first stage, hard attention samples a binary weight $w_{h}^{i,\ j}$ for each pair of agents, with the sampling probability produced by a Bi-LSTM:

$$w_{h}^{i,\ j} \sim \operatorname{Bernoulli}(p^{i,\ j}) \qquad p^{i,\ j} = f(\operatorname{BiLSTM}(h_{i},\ h_{1:N})_{j}) \in [0,\ 1]$$

Because sampling is not differentiable, it is approximated with the Gumbel-softmax relaxation so that gradients can be back-propagated:

$$w_{h}^{i,\ j} = \operatorname{gum}(p^{i,\ j}) = \frac{\exp \left[ \dfrac{1}{\tau} \Big( g_{+}^{i,\ j} + \log(p^{i,\ j}) \Big) \right]}{\exp \left[ \dfrac{1}{\tau} \Big( g_{+}^{i,\ j} + \log(p^{i,\ j}) \Big) \right] + \exp \left[ \dfrac{1}{\tau} \Big( g_{-}^{i,\ j} + \log(1 - p^{i,\ j}) \Big) \right]}$$

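where $g_{\pm}^{i,\ j} \sim \operatorname{Gumbel}(0,\ 1)$ are independent noise samples and $\tau$ is the temperature coefficient, which controls the smoothness of the softmax: a small $\tau$ pushes $w_{h}^{i,\ j}$ towards 0 or 1, while a large $\tau$ pulls it towards 0.5.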
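The relaxation above can be sketched in a few lines of plain Python. This is a minimal illustration for a single pair of agents, not the authors' implementation: the function names `sample_gumbel` and `relaxed_hard_weight` are hypothetical, and the probability `p` is passed in directly rather than produced by a Bi-LSTM.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample by inverse transform: -log(-log(U))."""
    u = rng.random()
    # Clamp U away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def relaxed_hard_weight(p, tau, rng):
    """Binary Gumbel-softmax relaxation of a Bernoulli(p) sample.

    Hypothetical helper: p stands in for p^{i,j}, tau is the temperature.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep log(p) and log(1 - p) finite
    pos = (sample_gumbel(rng) + math.log(p)) / tau
    neg = (sample_gumbel(rng) + math.log(1.0 - p)) / tau
    # Two-way softmax, shifted by the maximum for numerical stability.
    m = max(pos, neg)
    e_pos = math.exp(pos - m)
    e_neg = math.exp(neg - m)
    return e_pos / (e_pos + e_neg)


rng = random.Random(0)
for tau in (5.0, 1.0, 0.1):
    draws = [relaxed_hard_weight(0.3, tau, rng) for _ in range(5)]
    print(tau, [round(w, 3) for w in draws])
```

Lowering `tau` in the loop should move the printed weights closer to 0 or 1. Thresholding the relaxed weight at 0.5 recovers a Bernoulli sample with probability `p`, which is the Gumbel-max trick underlying the relaxation.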

In the second stage, soft attention computes a query-key dot product for each pair of agents, scaled by the hard attention weight from the first stage, and normalizes the result over $j$:

$$w_{s}^{i,\ j} = \operatorname{softmax}_{j} \Big[ w_{h}^{i,\ j} \cdot (W_{q} h_{i}) \cdot (W_{k} h_{j}) \Big]$$

which gives the final edge weights of the coordination graph, to be used by the downstream network or module.
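As a rough sketch of the second stage, the snippet below evaluates the soft attention formula literally for small inputs. The helper names (`matvec`, `dot`, `soft_attention`) and the toy matrices are assumptions made for illustration; the softmax here runs over every $j$, including $j = i$, because the formula as written does not exclude it.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def soft_attention(h, w_hard, W_q, W_k):
    """Return w_soft[i][j] = softmax_j(w_hard[i][j] * (W_q h_i) . (W_k h_j))."""
    queries = [matvec(W_q, h_i) for h_i in h]
    keys = [matvec(W_k, h_j) for h_j in h]
    n = len(h)
    w_soft = []
    for i in range(n):
        scores = [w_hard[i][j] * dot(queries[i], keys[j]) for j in range(n)]
        m = max(scores)  # shift by the maximum for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        w_soft.append([e / total for e in exps])
    return w_soft


# Toy example: three agents with 2-dimensional embeddings, identity projections.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
w_hard = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
for row in soft_attention(h, w_hard, identity, identity):
    print([round(w, 3) for w in row])
```

One consequence of taking the formula at face value: a hard weight of 0 sets the score to 0 rather than removing the edge, so that edge still receives some probability mass after the softmax.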

Based on the weighted coordination graph, G2ANet adopts a GNN to aggregate the information of neighbouring agents.
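The text does not specify the GNN layer, so the following is only a sketch under a simple assumption: one round of message passing in which each agent's new embedding is the edge-weighted sum of the agents' embeddings. The function name `aggregate_neighbours` is hypothetical.

```python
def aggregate_neighbours(h, w_soft):
    """Weighted-sum aggregation: out[i] = sum_j w_soft[i][j] * h[j].

    Assumed form of the GNN step; h is a list of embedding vectors and
    w_soft[i][j] is the edge weight from agent i to agent j.
    """
    n = len(h)
    dim = len(h[0])
    return [
        [sum(w_soft[i][j] * h[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]


# Two agents: the first attends mostly to the second, the second only to the first.
h = [[1.0, 0.0], [0.0, 1.0]]
w_soft = [[0.25, 0.75], [1.0, 0.0]]
print(aggregate_neighbours(h, w_soft))
```

In practice the aggregated vector would typically be combined with the agent's own embedding and passed through learned layers; those details are left out here because the source does not describe them.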

Policy and Value

The policy and value networks can then be built on top of the GNN-encoded embeddings, and the whole model is trained end-to-end.

http://example.com/2024/10/09/G2ANet/

Author

木辛

Posted on

October 9, 2024
