DGN
General Structure
DGN views the multi-agent environment as a dynamic coordination graph that varies over time, where adjacency among agents is determined by a specific metric such as distance, and neighbouring agents can communicate with each other
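As a concrete, non-authoritative illustration, the neighbourhood $N(i)$ could be built by taking each agent's $k$ nearest agents under Euclidean distance; the metric, the value of $k$, and the helper below are assumptions for this sketch, not the paper's code.

```python
import numpy as np

def build_neighbourhoods(positions, k=3):
    """Illustrative sketch: N(i) = indices of the k closest agents to agent i.
    Euclidean distance as the adjacency metric and k as a fixed neighbourhood
    size are assumptions; DGN only requires *some* metric such as distance."""
    n = positions.shape[0]
    # pairwise distances, shape (n, n)
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude the agent itself
    return [list(np.argsort(dists[i])[:k]) for i in range(n)]
```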
Agent $i$ fuses its local information with that of its neighbours $N(i)$ through multi-head attention in a convolutional layer
$$h_{i} \leftarrow \sigma \left\{ \operatorname{concat}_{m} \left[ \sum_{j \in N^{+}(i)} \alpha_{ij}^{m} W_{V}^{m} h_{j} \right] \right\} \quad \text{s.t.} \quad \alpha_{ij}^{m} = \operatorname{softmax}_{j \in N^{+}(i)} \Big( \tau\, (W_{Q}^{m} h_{i})^{\top} (W_{K}^{m} h_{j}) \Big), \quad N^{+}(i) = N(i) \cup \{ i \}$$
where $h_{i}$ is initially encoded from the individual observation $o_{i}$ and then processed by multiple convolutional layers
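A minimal PyTorch sketch of one such convolutional layer follows; the head count, the scaling choice $\tau = 1/\sqrt{d_k}$, and ReLU as $\sigma$ are assumptions, and the class name is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionConvLayer(nn.Module):
    """Sketch of one DGN-style convolutional layer: each agent attends over N+(i) = N(i) ∪ {i}."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads, self.d_k = heads, dim // heads
        self.W_Q = nn.Linear(dim, dim, bias=False)
        self.W_K = nn.Linear(dim, dim, bias=False)
        self.W_V = nn.Linear(dim, dim, bias=False)

    def forward(self, h, adj):
        # h:   (n_agents, dim) feature vectors
        # adj: (n_agents, n_agents) 0/1 mask including self-loops, i.e. N+(i)
        n = h.size(0)
        q = self.W_Q(h).view(n, self.heads, self.d_k)
        k = self.W_K(h).view(n, self.heads, self.d_k)
        v = self.W_V(h).view(n, self.heads, self.d_k)
        # per-head scaled dot-product scores, shape (heads, n, n); tau = 1/sqrt(d_k) is assumed
        scores = torch.einsum('imd,jmd->mij', q, k) / self.d_k ** 0.5
        scores = scores.masked_fill(adj.unsqueeze(0) == 0, float('-inf'))
        alpha = F.softmax(scores, dim=-1)             # attention over j ∈ N+(i)
        out = torch.einsum('mij,jmd->imd', alpha, v)  # weighted sum of W_V h_j
        return F.relu(out.reshape(n, -1)), alpha      # concatenate heads; sigma = ReLU (assumed)
```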
The Q network of agent $i$ takes the feature vectors of all preceding layers as input and outputs the individual action values
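Reflecting these dense connections, a hypothetical Q head could simply concatenate the encoder output with every convolutional layer's output before a final linear layer; the layer sizes and names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DGNQHead(nn.Module):
    """Sketch of a Q head over the concatenated features of all preceding layers for one agent."""
    def __init__(self, dim, n_layers, n_actions):
        super().__init__()
        self.q = nn.Linear(dim * (n_layers + 1), n_actions)

    def forward(self, layer_features):
        # layer_features: list of (n_agents, dim) tensors from the encoder and each conv layer
        return self.q(torch.cat(layer_features, dim=-1))  # (n_agents, n_actions)
```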
Learning Objective
The network parameters $\theta$ are shared by all agents and trained end-to-end with the target Q-learning algorithm
$$\mathcal{L}(\theta) = \mathbb{E}_{(o_{1:N},\, a_{1:N},\, r_{1:N},\, o_{1:N}') \in \mathcal{D}} \Bigg\{ \frac{1}{N} \sum_{i = 1}^{N} \bigg[ \underbrace{r_{i} + \gamma \max_{a} Q_{\theta^{-}}(o_{i}',\, o_{j \in N(i)}',\, a)}_{y_{i}} - Q_{\theta}(o_{i},\, o_{j \in N(i)},\, a_{i}) \bigg]^{2} \Bigg\}$$
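A sketch of this TD objective with a target network $\theta^{-}$ might look as follows; `q_net`, `target_net`, and the batch layout are assumed names for illustration, not DGN's actual interface.

```python
import torch

def dgn_td_loss(q_net, target_net, batch, gamma=0.99):
    """Sketch of the per-agent TD loss with a target network (theta^-).
    Each network is assumed to map (observations of agent i and its neighbours,
    adjacency) -> (n_agents, n_actions) Q-values."""
    obs, actions, rewards, next_obs, adj, next_adj = batch
    with torch.no_grad():
        # y_i = r_i + gamma * max_a Q_{theta^-}(o'_i, o'_{j in N(i)}, a)
        y = rewards + gamma * target_net(next_obs, next_adj).max(dim=-1).values
    # Q_theta(o_i, o_{j in N(i)}, a_i) for the actions actually taken
    q = q_net(obs, adj).gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    return ((y - q) ** 2).mean()  # average over the N agents (and the batch)
```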
In addition, to make the coordination relationship more stable and consistent over time, DGN introduces a temporal relation regularization between the attention weight distributions $\alpha_{i}^{m,\kappa}$ of a high-level layer $\kappa$ at the current step and the next step
$$\mathcal{L}(\theta) = \mathbb{E}_{(o_{1:N},\, a_{1:N},\, r_{1:N},\, o_{1:N}') \in \mathcal{D}} \left\{ \frac{1}{N} \sum_{i = 1}^{N} \left( \Big[ y_{i} - Q_{\theta}(o_{i},\, o_{j \in N(i)},\, a_{i}) \Big]^{2} + \lambda\, \frac{1}{M} \sum_{m = 1}^{M} D_{\text{KL}} \Big( \alpha_{i}^{m,\kappa}\ \Big\|\ \tilde{\alpha}_{i}^{m,\kappa} \Big) \right) \right\}$$
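The extra KL term could be computed from the layer-$\kappa$ attention weights of the current and next step as in the sketch below; the `(heads, n_agents, n_agents)` tensor layout is an assumption carried over from the layer sketch above.

```python
import torch

def temporal_relation_loss(alpha_kappa, next_alpha_kappa, eps=1e-8):
    """Sketch of the KL regulariser between attention distributions of layer kappa
    at the current step (alpha_i^{m,kappa}) and the next step (tilde alpha_i^{m,kappa}),
    averaged over the M heads and N agents. Tensors: (heads, n_agents, n_agents)."""
    p = alpha_kappa.clamp_min(eps)       # current-step attention over j in N+(i)
    q = next_alpha_kappa.clamp_min(eps)  # next-step attention from the target pass
    kl = (p * (p.log() - q.log())).sum(dim=-1)  # D_KL per head and per agent
    return kl.mean()                            # mean over heads M and agents N
```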