GAT-MF
Weighted MF Approximation
GAT-MF assumes that the joint action-value function can be factorized as a sum of local pairwise action-value functions:
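$$Q(s, a) \triangleq \sum_j Q_j(s, a) \triangleq \sum_j \frac{1}{W_j} \sum_{k \in \mathcal{N}_j} w_{jk}\, \tilde{Q}_j(s_j, s_k, a_j, a_k), \qquad W_j = \sum_{k \in \mathcal{N}_j} w_{jk}$$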
where $Q_j(s, a)$ can be approximated by the pairwise action value evaluated at the weighted mean of the neighbours’ states and actions:
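$$\tilde{s}_j = \frac{1}{W_j} \sum_{k \in \mathcal{N}_j} w_{jk}\, s_k, \qquad \tilde{a}_j = \frac{1}{W_j} \sum_{k \in \mathcal{N}_j} w_{jk}\, a_k$$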
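The weighted mean above can be sketched in a few lines of plain Python. This is only an illustration: the function name `weighted_mean_field`, the example weights, and the 2-dimensional neighbour states are assumptions made here, not values from GAT-MF.

```python
def weighted_mean_field(weights, neighbour_vectors):
    """Return (1 / W_j) * sum_k w_jk * x_k for a single agent j.

    weights: the w_jk for k in N_j (assumed non-negative, not all zero).
    neighbour_vectors: equally sized vectors x_k (states or actions).
    """
    total = sum(weights)  # W_j, the normaliser
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(neighbour_vectors[0])
    mean = [0.0] * dim
    for w, x in zip(weights, neighbour_vectors):
        for i in range(dim):
            mean[i] += w * x[i] / total
    return mean


# Hypothetical agent j with three neighbours and 2-dimensional states.
w_j = [1.0, 2.0, 1.0]
s_neighbours = [[0.0, 4.0], [2.0, 0.0], [4.0, 4.0]]
s_tilde_j = weighted_mean_field(w_j, s_neighbours)
```

The same function applies unchanged to neighbour actions, since the state mean and the action mean share the weights and the normaliser.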
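The approximation follows from a first-order Taylor expansion of $\tilde{Q}_j$ around the point $(s_j, \tilde{s}_j, a_j, \tilde{a}_j)$. The first-order terms cancel because the weighted deviations from the mean sum to zero, leaving only a second-order error term: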
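$$
\begin{aligned}
Q_j(s, a) &= \frac{1}{W_j} \sum_{k \in \mathcal{N}_j} w_{jk}\, \tilde{Q}_j(s_j, s_k, a_j, a_k) \\
&= \frac{1}{W_j} \sum_{k \in \mathcal{N}_j} w_{jk}\, \tilde{Q}_j\big(s_j, \underbrace{\tilde{s}_j + \delta s_{jk}}_{s_k}, a_j, \underbrace{\tilde{a}_j + \delta a_{jk}}_{a_k}\big) \\
&= \frac{1}{W_j} \sum_{k \in \mathcal{N}_j} w_{jk} \Big[ \underbrace{\tilde{Q}_j(s_j, \tilde{s}_j, a_j, \tilde{a}_j)}_{\tilde{Q}_0} + \nabla_{\tilde{s}_j} \tilde{Q}_0 \cdot \delta s_{jk} + \nabla_{\tilde{a}_j} \tilde{Q}_0 \cdot \delta a_{jk} + \underbrace{o\big(\left\| \delta s_{jk}, \delta a_{jk} \right\|_2^2\big)}_{o_{jk}} \Big] \\
&= \tilde{Q}_0 + \nabla_{\tilde{s}_j} \tilde{Q}_0 \cdot \underbrace{\frac{1}{W_j} \sum_{k \in \mathcal{N}_j} w_{jk} (s_k - \tilde{s}_j)}_{0} + \nabla_{\tilde{a}_j} \tilde{Q}_0 \cdot \underbrace{\frac{1}{W_j} \sum_{k \in \mathcal{N}_j} w_{jk} (a_k - \tilde{a}_j)}_{0} + \frac{1}{W_j} \sum_{k \in \mathcal{N}_j} w_{jk}\, o_{jk} \\
&= \tilde{Q}_0 + \frac{1}{W_j} \sum_{k \in \mathcal{N}_j} w_{jk}\, o_{jk} \\
&\approx \tilde{Q}_0 = \tilde{Q}_j(s_j, \tilde{s}_j, a_j, \tilde{a}_j)
\end{aligned}
$$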
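A quick numerical check can make the second-order claim concrete. The sketch below is not part of GAT-MF: `pairwise_q` is a hypothetical smooth function chosen only for illustration, and the weights and deviations are made up. It compares the exact weighted sum of pairwise values with a single evaluation at the weighted mean, for neighbours that deviate from a base point by `eps` times fixed offsets.

```python
import math


def pairwise_q(s_j, s_k, a_j, a_k):
    # Hypothetical smooth pairwise value, used only for this illustration.
    return math.sin(s_k) + a_k ** 2 + s_j * a_j


def weighted_mean(weights, values):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def approximation_gap(eps):
    """Absolute gap between the exact weighted sum and the mean-field value."""
    s_j, a_j = 0.3, 0.1
    weights = [1.0, 2.0, 1.0]
    # Neighbour states and actions deviate from a base point by eps * offset.
    s_k = [0.5 + eps * d for d in (-1.0, 0.5, 1.0)]
    a_k = [0.2 + eps * d for d in (1.0, -1.0, 0.5)]
    # Exact value: weighted average of the pairwise values.
    q_exact = weighted_mean(
        weights, [pairwise_q(s_j, s, a_j, a) for s, a in zip(s_k, a_k)]
    )
    # Mean-field value: one evaluation at the weighted mean state and action.
    q_mf = pairwise_q(
        s_j, weighted_mean(weights, s_k), a_j, weighted_mean(weights, a_k)
    )
    return abs(q_exact - q_mf)


gap_large = approximation_gap(0.1)
gap_small = approximation_gap(0.05)
ratio = gap_large / gap_small  # close to 4 when the error is second order
```

Halving the deviations shrinks the gap by roughly a factor of four here, which is the behaviour expected when the first-order terms cancel and only a quadratic remainder is left.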
Similarly, the policy of each agent can be approximated through a weighted mean field, with its own set of weights $u_{jk}$:
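$$\pi_j(s) \approx \hat{\pi}_j(s_j, \hat{s}_j) = \hat{\pi}_j\left[ s_j, \frac{1}{U_j} \sum_{k \in \mathcal{N}_j} u_{jk}\, s_k \right], \qquad U_j = \sum_{k \in \mathcal{N}_j} u_{jk}$$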
Graph Construction
GAT-MF uses a graph attention mechanism to construct a coordination graph whose edge weights between agents change dynamically.
The computed weights $w_{jk}$ and $u_{jk}$ are then used to form the weighted mean-field inputs of the critic and actor networks, respectively.
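As a rough illustration of how attention can produce such weights, the sketch below computes softmax-normalised dot-product scores restricted to graph neighbours. This is an assumption made for illustration only: the notes above do not specify the attention form, GAT-MF's actual mechanism uses learned parameters, and the name `attention_weights` and the example states are hypothetical.

```python
import math


def attention_weights(states, adjacency):
    """Softmax-normalised dot-product attention restricted to neighbours.

    states: one state vector per agent.
    adjacency: adjacency[j][k] is truthy when agent k is in N_j.
    Returns weights[j][k]; entries for non-neighbours are 0, and each row
    with at least one neighbour sums to 1.
    """
    n = len(states)
    weights = [[0.0] * n for _ in range(n)]
    for j in range(n):
        neighbours = [k for k in range(n) if adjacency[j][k]]
        if not neighbours:
            continue  # isolated agent: no mean field to form
        # Assumed score: raw dot product between states (no learned projection).
        scores = [
            sum(x * y for x, y in zip(states[j], states[k])) for k in neighbours
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(score - top) for score in scores]
        norm = sum(exps)
        for k, e in zip(neighbours, exps):
            weights[j][k] = e / norm
    return weights


# Hypothetical three-agent example on a fully connected graph without self-loops.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
w = attention_weights(states, adjacency)
```

With softmax normalisation each row already sums to one, so the normaliser in the weighted mean equals 1 for every agent that has neighbours. A separate set of scores would be needed to obtain distinct actor and critic weights.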
The parameters of graph attention, actor and critic networks are shared by all homogeneous agents and trained end-to-end