DGN

General Structure

DGN views the multi-agent environment as a dynamic coordination graph varying over time, where adjacency among agents is determined by a metric such as distance, and neighbouring agents can communicate with each other.

Agent $i$ fuses its local information with that of its neighbours $N(i)$ through multi-head attention in a convolutional layer:

$$h_{i} \leftarrow \sigma \left( \operatorname{concat}_{m} \left[ \sum_{j \in N^{+}(i)} \alpha_{ij}^{m} W_{V}^{m} h_{j} \right] \right) \quad \text{s.t.} \quad \alpha_{ij}^{m} = \operatorname{softmax}_{j \in N^{+}(i)} \Big( \tau\, (W_{Q}^{m} h_{i})^{\top} (W_{K}^{m} h_{j}) \Big), \quad N^{+}(i) = N(i) \cup \{ i \}$$

where $h_{i}$ is initially encoded from the individual observation $o_{i}$ and then processed by multiple convolutional layers.

The Q network of agent $i$ takes the feature vectors from all preceding layers as input and outputs the individual action value.
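The attention-based fusion above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the weight shapes, the ReLU choice for $\sigma$, and the function names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dgn_conv_layer(h, neighbors, Wq, Wk, Wv, tau=1.0):
    """One convolutional layer: multi-head attention over each agent's
    closed neighbourhood N+(i) = N(i) ∪ {i}.

    h:          (N, d) feature vectors, one per agent
    neighbors:  list of neighbour index lists N(i), one per agent
    Wq/Wk/Wv:   (M, d_head, d) per-head projections (illustrative shapes)
    tau:        scaling factor from the softmax logits
    """
    N, _ = h.shape
    M, d_head, _ = Wv.shape
    out = np.zeros((N, M * d_head))
    for i in range(N):
        idx = [i] + list(neighbors[i])                    # N+(i)
        heads = []
        for m in range(M):
            q = Wq[m] @ h[i]                              # query from agent i
            k = np.stack([Wk[m] @ h[j] for j in idx])     # keys over N+(i)
            v = np.stack([Wv[m] @ h[j] for j in idx])     # values over N+(i)
            alpha = softmax(tau * (k @ q))                # attention weights
            heads.append(alpha @ v)                       # weighted value sum
        out[i] = np.concatenate(heads)                    # concat over heads
    return np.maximum(out, 0.0)                           # ReLU as σ (assumed)
```

Stacking two such layers and feeding the concatenated per-layer features into a small MLP head yields the individual Q values described above.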

Learning Objective

The network parameters $\theta$ are shared by all agents and trained end-to-end with the target Q-learning algorithm:

$$\mathcal{L}(\theta) = \mathbb{E}_{(o_{1:N},\ a_{1:N},\ r_{1:N},\ o_{1:N}') \sim \mathcal{D}} \Bigg\{ \frac{1}{N} \sum_{i = 1}^{N} \bigg[ \underbrace{r_{i} + \gamma \max_{a} Q_{\theta^{-}}(o_{i}',\ o_{j \in N(i)}',\ a)}_{y_{i}} - Q_{\theta}(o_{i},\ o_{j \in N(i)},\ a_{i}) \bigg]^{2} \Bigg\}$$
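The TD objective can be sketched as follows. The Q-network interface and batch layout are assumptions for illustration; in practice both networks would be the shared graph network described above.

```python
import numpy as np

def td_loss(q_net, target_q_net, batch, gamma=0.99):
    """Mean squared TD error over a sampled minibatch.

    q_net, target_q_net: callables (obs_i, neighbor_obs) -> (A,) action values;
                         target_q_net uses the frozen parameters θ⁻
    batch: iterable of per-agent transitions (o, nbr_o, a, r, o2, nbr_o2)
    """
    errors = []
    for (o, nbr_o, a, r, o2, nbr_o2) in batch:
        # Target y_i = r_i + γ max_a Q_{θ⁻}(o'_i, o'_{j∈N(i)}, a)
        y = r + gamma * np.max(target_q_net(o2, nbr_o2))
        errors.append((y - q_net(o, nbr_o)[a]) ** 2)
    return float(np.mean(errors))
```

Gradients would flow only through `q_net`; the target network is periodically synchronised with $\theta$, as in standard deep Q-learning.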

In addition, to make the coordination relationship more stable and consistent over time, DGN introduces a temporal relation regularization between the attention weight distributions $\alpha_{i}^{m}$ of a higher-level layer $\kappa$ at the current step and the next step:

$$\mathcal{L}(\theta) = \mathbb{E}_{(o_{1:N},\ a_{1:N},\ r_{1:N},\ o_{1:N}') \sim \mathcal{D}} \left\{ \frac{1}{N} \sum_{i = 1}^{N} \Big[ y_{i} - Q_{\theta}(o_{i},\ o_{j \in N(i)},\ a_{i}) \Big]^{2} + \lambda \frac{1}{M} \sum_{m = 1}^{M} D_{\text{KL}} \Big( \alpha_{i}^{m,\ \kappa}\ \Big\|\ \tilde{\alpha}_{i}^{m,\ \kappa} \Big) \right\}$$
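The regularizer averages, over the $M$ heads, the KL divergence between layer $\kappa$'s attention distribution at the current step and at the next step. A minimal sketch (the λ value and function names are assumptions):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """D_KL(p ‖ q) for discrete attention distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def temporal_relation_penalty(alpha_t, alpha_next, lam=0.03):
    """λ · (1/M) Σ_m D_KL(α_i^{m,κ} ‖ α̃_i^{m,κ}).

    alpha_t, alpha_next: lists of M attention distributions over N+(i),
                         taken from layer κ at step t and t+1
    """
    M = len(alpha_t)
    return lam * sum(kl_divergence(p, q)
                     for p, q in zip(alpha_t, alpha_next)) / M
```

Since the neighbourhood $N^{+}(i)$ may change between steps, the two distributions are compared over the current step's neighbourhood; the penalty is zero exactly when the attention pattern is unchanged.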


DGN
http://example.com/2024/10/03/DGN/
Author
木辛
Posted on
October 3, 2024