DICG
Module Structure
DICG uses an attention mechanism to implicitly construct the coordination graph among agents, with soft-weighted edges:
$$\mu_{ij} = \mathrm{softmax}_j\left[\mathrm{Attention}(e_i, e_j, W_a)\right] = \mathrm{softmax}_j\left[e_j^{\top} W_a e_i\right]$$
where the observation embedding $e_i$ is produced by the observation encoder from the observation $o_i$ or the full history $h_i$.
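The attention step above can be sketched in plain Python. This is a minimal illustration rather than the DICG implementation: the function names `softmax` and `attention_weights`, the toy embeddings, and the identity attention matrix are all assumptions made for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]

def attention_weights(E, W_a):
    """Return M = [mu_ij] with mu_ij = softmax_j(e_j^T W_a e_i).

    E   : n embeddings, each a list of d floats
    W_a : d x d attention matrix, given as a list of rows
    (Both names are hypothetical; they mirror the symbols in the text.)
    """
    # Compute W_a e_i once per agent i.
    projected = [[sum(w * x for w, x in zip(row, e_i)) for row in W_a] for e_i in E]
    M = []
    for p_i in projected:
        # score_ij = e_j^T (W_a e_i), one score per candidate neighbour j.
        scores = [sum(a * b for a, b in zip(e_j, p_i)) for e_j in E]
        # Normalise over j, so row i holds agent i's edge weights.
        M.append(softmax(scores))
    return M

# Toy example: 3 agents, 2-dimensional embeddings, identity attention matrix.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_a = [[1.0, 0.0], [0.0, 1.0]]
M = attention_weights(E, W_a)
```

Because the softmax runs over j, every row of `M` sums to 1 and every edge weight is strictly positive, so the resulting graph is fully connected with soft weights.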
The embeddings are then stacked into a feature matrix $E = \{e_i\} \in \mathbb{R}^{n \times d}$ and fed to a GCN for information integration:
$$E^{(l+1)} = \sigma\left(D^{-\frac{1}{2}} M D^{-\frac{1}{2}} E^{(l)} W_c^{(l)}\right) = \sigma\left(M E^{(l)} W_c^{(l)}\right)$$
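where $M = \{\mu_{ij}\} \in \mathbb{R}^{n \times n}$ is the adjacency matrix of the coordination graph and $D = \mathrm{diag}\{\sum_j \mu_{ij}\} = I_n$ is the degree matrix. The degree matrix reduces to the identity because the softmax over $j$ makes every row of $M$ sum to 1. The final embeddings $\tilde{E}$ are obtained by applying $m$ graph convolutions and adding a residual connection between $E^{(0)}$ and $E^{(m)}$.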
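The graph convolution and residual connection can be sketched as follows. Several details are assumptions that the text does not specify: the nonlinearity σ is taken to be ReLU, the residual connection is taken to be an element-wise sum of E(0) and E(m), and every layer weight is d × d so that the shapes match. The names `matmul`, `relu`, and `dicg_gcn` are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def relu(A):
    """Element-wise ReLU (assumed choice for sigma)."""
    return [[max(0.0, x) for x in row] for row in A]

def dicg_gcn(E0, M, weights):
    """Apply len(weights) graph convolutions E <- relu(M E W_c), then add E0.

    Assumptions: sigma is ReLU, the residual is an element-wise sum,
    and every W_c is d x d so that E0 and the last layer output align.
    """
    E = E0
    for W_c in weights:
        # D = I because the rows of M sum to 1, so no normalisation is needed.
        E = relu(matmul(matmul(M, E), W_c))
    # Residual connection between E^(0) and E^(m).
    return [[a + b for a, b in zip(r0, rm)] for r0, rm in zip(E0, E)]

# Toy example: uniform coordination graph over 2 agents, identity layer weights.
E0 = [[1.0, 0.0], [0.0, 1.0]]
M = [[0.5, 0.5], [0.5, 0.5]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
E_tilde = dicg_gcn(E0, M, [I2, I2])
```

With a uniform graph, each convolution averages the agents' embeddings, and the residual term keeps each agent's own features distinguishable in the final output.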
Module Usage
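The embeddings integrated by DICG can be consumed by downstream networks under either the CTCE (centralized training, centralized execution) or the CTDE (centralized training, decentralized execution) paradigm: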
| Paradigm | Policy Network | Value Network |
|---|---|---|
| CTCE | $\pi_{\theta_i}(a_i \mid \tilde{e}_i)$ | $Q_w(\tilde{e}_{1:n}, a_{1:n})$ |
| CTDE | $\pi_{\theta_i}(a_i \mid o_i)$ | $Q_w(\tilde{e}_{1:n}, a_{1:n})$ |
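The difference between the two paradigms lies only in what each policy conditions on. The sketch below illustrates that input selection. The helper names `policy_input` and `critic_input` are hypothetical, and representing each action as a single number is an assumption made to keep the example short.

```python
def policy_input(paradigm, i, observations, embeddings):
    """Return what agent i's policy conditions on under the given paradigm."""
    if paradigm == "CTCE":
        # Centralized execution: use the graph-integrated embedding e~_i.
        return embeddings[i]
    if paradigm == "CTDE":
        # Decentralized execution: use only the local observation o_i.
        return observations[i]
    raise ValueError(f"unknown paradigm: {paradigm!r}")

def critic_input(embeddings, actions):
    """Flatten all integrated embeddings and all actions into one vector for Q_w."""
    flat = [x for e in embeddings for x in e]
    return flat + list(actions)

# Toy example with 2 agents.
observations = [[0.1, 0.2], [0.3, 0.4]]
embeddings = [[1.5, 0.5], [0.5, 1.5]]
actions = [0, 1]

ctce_in = policy_input("CTCE", 0, observations, embeddings)
ctde_in = policy_input("CTDE", 0, observations, embeddings)
q_in = critic_input(embeddings, actions)
```

In both rows of the table the value network sees the integrated embeddings of all agents, so DICG still shapes training under CTDE even though the policies act on local observations alone.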