DICG

Module Structure

DICG uses an attention mechanism to implicitly construct the coordination graph among agents, with soft weighted edges:

$$\mu_{ij} = \operatorname{softmax}_{j} \Big[ \operatorname{Attention}(e_{i},\ e_{j},\ W_{a}) \Big] = \operatorname{softmax}_{j} \Big[ e_{j}^{\top} W_{a} e_{i} \Big]$$

where the observation embedding $e_{i}$ is produced by the observation encoder from the observation $o^{i}$ or the full history $h^{i}$.
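The attention weights above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the function name `attention_adjacency`, the random embeddings, and the sizes `n = 4`, `d = 8` are assumptions made for the example.

```python
import numpy as np

def attention_adjacency(E, W_a):
    """Return M with M[i, j] = softmax_j(e_j^T W_a e_i).

    E   : (n, d) array, one observation embedding per agent
    W_a : (d, d) attention weight matrix
    """
    # scores[i, j] = e_j^T W_a e_i
    scores = np.einsum("jp,pq,iq->ij", E, W_a, E)
    # Subtract the row maximum for numerical stability; softmax is unchanged.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    # Normalize over j so that every row sums to 1.
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical sizes and random inputs, for illustration only.
rng = np.random.default_rng(0)
n, d = 4, 8
E = rng.normal(size=(n, d))
W_a = rng.normal(size=(d, d))
M = attention_adjacency(E, W_a)
```

Each row of `M` is a probability distribution over agents, so `M[i, j]` can be read as how strongly agent `i` attends to agent `j`.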

The embeddings are then stacked into a feature matrix $E = \{ e_{i} \} \in \mathbb{R}^{n \times d}$ and fed to a graph convolutional network (GCN) for information integration:

$$E^{(l + 1)} = \sigma \left( D^{-\frac{1}{2}} M D^{-\frac{1}{2}} E^{(l)} W_{c}^{(l)} \right) = \sigma \left( M E^{(l)} W_{c}^{(l)} \right)$$

where $M = \{ \mu_{ij} \} \in \mathbb{R}^{n \times n}$ is the adjacency matrix of the coordination graph and $D = \operatorname{diag} \left\{ \sum_{j} \mu_{ij} \right\} = I_{n}$ is its degree matrix. Because each row of $M$ is a softmax over $j$, its entries sum to 1, so $D$ is the identity and the normalization drops out. The final embeddings $\tilde{E}$ are obtained by applying $m$ graph convolutions and adding a residual connection between $E^{(0)}$ and $E^{(m)}$.
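The graph-convolution step can be sketched as follows. This is a rough sketch under stated assumptions: the nonlinearity $\sigma$ is taken to be `tanh`, the residual connection is taken to be the sum $\tilde{E} = E^{(0)} + E^{(m)}$, every $W_{c}^{(l)}$ is square so the shapes match, and the name `dicg_embed` and the uniform example graph are hypothetical.

```python
import numpy as np

def dicg_embed(E0, M, weights):
    """Apply len(weights) graph convolutions, then add the residual E0.

    E0      : (n, d) stacked observation embeddings, E^(0)
    M       : (n, n) row-stochastic adjacency matrix
    weights : list of (d, d) matrices W_c^(l), one per layer
    """
    E = E0
    for W_c in weights:
        # D = I for a row-stochastic M, so D^{-1/2} M D^{-1/2} = M.
        E = np.tanh(M @ E @ W_c)  # sigma assumed to be tanh
    return E0 + E  # residual assumed to be a plain sum

# Hypothetical example: 4 agents, uniform attention, m = 2 layers.
rng = np.random.default_rng(0)
n, d, m = 4, 8, 2
E0 = rng.normal(size=(n, d))
M = np.full((n, n), 1.0 / n)
weights = [0.1 * rng.normal(size=(d, d)) for _ in range(m)]

D = np.diag(M.sum(axis=1))  # degree matrix, equal to the identity here
E_tilde = dicg_embed(E0, M, weights)
```

With a uniform `M`, every agent receives the same aggregated message, so the rows of `E_tilde - E0` coincide; a learned `M` would weight neighbours differently per agent.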

Module Usage

The embeddings integrated by DICG can be fed to downstream networks under either the CTCE (centralized training, centralized execution) or the CTDE (centralized training, decentralized execution) paradigm:

| Paradigm | Policy Network | Value Network |
| --- | --- | --- |
| CTCE | $\pi_{\theta}^{i}(a_{i} \mid \tilde{e}_{i})$ | $Q_{w}(\tilde{e}_{1:n},\ a_{1:n})$ |
| CTDE | $\pi_{\theta}^{i}(a_{i} \mid o_{i})$ | $Q_{w}(\tilde{e}_{1:n},\ a_{1:n})$ |
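The difference between the two rows comes down to what each network is allowed to see. The sketch below only selects the inputs and does not model the networks themselves; the helper names `policy_input` and `critic_input`, the one-hot action encoding, and all sizes are assumptions for illustration.

```python
import numpy as np

def policy_input(paradigm, obs, E_tilde, i):
    """Return what agent i's policy conditions on under each paradigm."""
    if paradigm == "CTCE":
        return E_tilde[i]  # pi^i(a_i | e~_i): uses the integrated embedding
    if paradigm == "CTDE":
        return obs[i]      # pi^i(a_i | o_i): local observation only
    raise ValueError(f"unknown paradigm: {paradigm!r}")

def critic_input(E_tilde, actions):
    """Flatten e~_{1:n} and a_{1:n} into one vector for the centralized Q_w."""
    return np.concatenate([E_tilde.ravel(), actions.ravel()])

# Hypothetical sizes: 3 agents, 5-dim observations, 8-dim embeddings, 4 actions.
n, d_obs, d, k = 3, 5, 8, 4
rng = np.random.default_rng(0)
obs = rng.normal(size=(n, d_obs))
E_tilde = rng.normal(size=(n, d))
actions = np.eye(k)[rng.integers(k, size=n)]  # one-hot actions, shape (n, k)

q_in = critic_input(E_tilde, actions)
```

In both paradigms the value network sees every agent's integrated embedding and action. Only the CTDE policy is restricted to the local observation, which is what allows it to run without communication at execution time.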

http://example.com/2024/10/02/DICG/
Author: 木辛
Posted on: October 2, 2024