DICG

Module Structure

DICG (Deep Implicit Coordination Graphs) uses an attention mechanism to construct the coordination graph among agents implicitly, representing each edge by a soft weight:

\mu_{ij} = \operatorname{softmax}_{j} \Big[ \operatorname{Attention}(e_{i},\ e_{j},\ W_{a}) \Big] = \operatorname{softmax}_{j} \Big[ e_{j}^{\top} W_{a} e_{i} \Big]

where the embedding $e_{i}$ is produced by an observation encoder from agent $i$'s observation $o_{i}$ or its full history $h_{i}$.
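The edge weights can be computed directly from the formula above. The sketch below is a minimal, dependency-free illustration of the bilinear score $e_{j}^{\top} W_{a} e_{i}$ followed by a softmax over $j$. The function names and the toy embeddings and $W_{a}$ are hypothetical; a real implementation would learn $W_{a}$ and batch this with a tensor library.

```python
import math

def bilinear_scores(E, W_a):
    """Return S with S[i][j] = e_j^T W_a e_i for every agent pair (i, j)."""
    n, d = len(E), len(E[0])
    # W_a e_i for every agent i (d x d matrix times d-vector)
    We = [[sum(W_a[r][c] * E[i][c] for c in range(d)) for r in range(d)] for i in range(n)]
    # e_j^T (W_a e_i)
    return [[sum(E[j][r] * We[i][r] for r in range(d)) for j in range(n)] for i in range(n)]

def softmax_rows(S):
    """Row-wise softmax, i.e. softmax over j for each fixed i."""
    out = []
    for row in S:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([x / z for x in exps])
    return out

def attention_adjacency(E, W_a):
    """Soft coordination-graph adjacency M = {mu_ij}."""
    return softmax_rows(bilinear_scores(E, W_a))

# Toy example: n = 3 agents, d = 2 (hypothetical values)
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_a = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the score reduces to e_j . e_i
M = attention_adjacency(E, W_a)
print([round(sum(row), 6) for row in M])  # [1.0, 1.0, 1.0]
```

Each row of $M$ sums to 1 by construction, which is the property the GCN step relies on.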

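The embeddings are then stacked row-wise into a feature matrix $E = \{ e_{i} \} \in \mathbb{R}^{n \times d}$ (one row per agent, $n$ agents, embedding dimension $d$) and passed to a graph convolutional network (GCN), which integrates information across agents: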

E^{(l + 1)} = \sigma \left( D^{-\frac{1}{2}} M D^{-\frac{1}{2}} E^{(l)} W_{c}^{(l)} \right) = \sigma \left( M E^{(l)} W_{c}^{(l)} \right)

where $M = \{ \mu_{ij} \} \in \mathbb{R}^{n \times n}$ is the adjacency matrix of the coordination graph and $D = \operatorname{diag} \left\{ \sum_{j} \mu_{ij} \right\}$ is its degree matrix. Since each row of $M$ is a softmax over $j$, every row sums to 1, so $D = I_{n}$ and the symmetric normalization reduces to $M$ itself. The final embeddings $\tilde{E}$ are obtained by applying $m$ rounds of graph convolution and adding a residual connection between $E^{(0)}$ and $E^{(m)}$.
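A minimal sketch of the propagation step, written in plain Python lists. Several choices here are assumptions, not taken from the post: $\sigma$ is taken to be ReLU, each $W_{c}^{(l)}$ is square ($d \times d$) so that $E^{(0)}$ and $E^{(m)}$ have matching shapes, and the residual connection is an element-wise sum. The helper names are hypothetical.

```python
def matmul(A, B):
    """Plain-list matrix product A (p x q) @ B (q x r)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def relu(X):
    """Element-wise ReLU, used here as the (assumed) activation sigma."""
    return [[max(0.0, x) for x in row] for row in X]

def dicg_gcn(E0, M, weights):
    """m rounds of E <- relu(M E W_c^(l)), then a residual sum with E^(0).

    Every row of M is a softmax output, so D = I and the symmetric
    normalization D^{-1/2} M D^{-1/2} equals M; it is therefore skipped.
    """
    E = E0
    for W_c in weights:  # one d x d weight matrix per layer (assumed square)
        E = relu(matmul(matmul(M, E), W_c))
    # Residual connection between E^(0) and E^(m), taken here as an element-wise sum
    return [[a + b for a, b in zip(r0, rm)] for r0, rm in zip(E0, E)]

# Toy example: uniform attention over 3 agents, m = 2 identity layers (hypothetical)
M = [[1.0 / 3.0] * 3 for _ in range(3)]
E0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
E_tilde = dicg_gcn(E0, M, [I2, I2])
print([[round(x, 3) for x in row] for row in E_tilde])
# [[1.667, 0.667], [0.667, 1.667], [1.667, 1.667]]
```

With uniform weights every agent receives the mean embedding of the group, and the residual term keeps each agent's own embedding distinguishable.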

Module Usage

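The integrated embeddings produced by DICG can be fed to downstream networks under either the centralized training with centralized execution (CTCE) or the centralized training with decentralized execution (CTDE) paradigm: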

Paradigm	Policy Network	Value Network
CTCE	$\pi_{\theta}^{i}(a_{i} \mid \tilde{e}_{i})$	$Q_{w}(\tilde{e}_{1:n},\ a_{1:n})$
CTDE	$\pi_{\theta}^{i}(a_{i} \mid o_{i})$	$Q_{w}(\tilde{e}_{1:n},\ a_{1:n})$
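The only difference between the two rows of the table is what the policy of agent $i$ conditions on; the value network sees the joint embeddings and joint actions in both cases. The sketch below makes that routing explicit. The function name and the choice to concatenate $\tilde{e}_{1:n}$ and $a_{1:n}$ into one flat critic input are assumptions for illustration, not the post's stated design.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def network_inputs(paradigm: str, i: int,
                   obs: Sequence[Vector], e_tilde: Sequence[Vector],
                   actions: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Return (policy input for agent i, critic input) following the table."""
    if paradigm == "CTCE":
        policy_in = list(e_tilde[i])  # pi^i conditions on the DICG embedding e~_i
    elif paradigm == "CTDE":
        policy_in = list(obs[i])      # pi^i sees only its local observation o_i
    else:
        raise ValueError(f"unknown paradigm: {paradigm}")
    # Q_w(e~_{1:n}, a_{1:n}): joint embeddings and joint actions, concatenated (assumed)
    critic_in = [x for e in e_tilde for x in e] + [x for a in actions for x in a]
    return policy_in, critic_in

# Toy example with 2 agents (hypothetical values; actions one-hot encoded)
obs = [[0.1, 0.2], [0.3, 0.4]]
e_tilde = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
actions = [[1.0, 0.0], [0.0, 1.0]]
p, c = network_inputs("CTDE", 0, obs, e_tilde, actions)
print(p, len(c))  # [0.1, 0.2] 10
```

Under CTDE the actors never touch $\tilde{e}_{i}$, so DICG is only needed during training, when the critic is evaluated.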

http://example.com/2024/10/02/DICG/

Author: 木辛

Posted on October 2, 2024
