GAT-MF

Weighted MF Approximation

GAT-MF assumes that the joint action-value function factorizes into a sum of local pairwise action-value functions:

$$
Q(s, a) \triangleq \sum_{j} Q_{j}(s, a) \triangleq \sum_{j} \frac{1}{W_{j}} \sum_{k \in \mathcal{N}_{j}} w_{jk} \tilde{Q}_{j}(s_{j}, s_{k}, a_{j}, a_{k}) \qquad W_{j} = \sum_{k \in \mathcal{N}_{j}} w_{jk}
$$

where $Q_{j}(s, a)$ can be approximated by the pairwise action value evaluated at the weighted mean of the neighbours' states and actions:

$$
\tilde{s}_{j} = \frac{1}{W_{j}} \sum_{k \in \mathcal{N}_{j}} w_{jk} s_{k} \qquad \tilde{a}_{j} = \frac{1}{W_{j}} \sum_{k \in \mathcal{N}_{j}} w_{jk} a_{k}
$$
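The two weighted means above are plain normalized sums, so they can be sketched in a few lines. This is a minimal illustration only: the helper `weighted_mean` and the example weights, states, and actions are hypothetical and not taken from the GAT-MF code.

```python
from typing import List, Sequence


def weighted_mean(weights: Sequence[float], vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return (1 / W) * sum_k w_k * x_k, where W = sum_k w_k."""
    if len(weights) != len(vectors) or not weights:
        raise ValueError("need one weight per neighbour and at least one neighbour")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total for d in range(dim)]


# Hypothetical data: agent j has three neighbours with 2-D states and 1-D actions.
w_j = [1.0, 2.0, 1.0]
neighbour_states = [[0.0, 4.0], [2.0, 0.0], [4.0, 4.0]]
neighbour_actions = [[1.0], [0.0], [-1.0]]

s_tilde_j = weighted_mean(w_j, neighbour_states)   # weighted mean neighbour state
a_tilde_j = weighted_mean(w_j, neighbour_actions)  # weighted mean neighbour action
print(s_tilde_j, a_tilde_j)
```

With uniform weights this reduces to the ordinary (unweighted) mean over the neighbourhood.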

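The approximation follows from a first-order Taylor expansion of $\tilde{Q}_{j}$ about the point $(s_{j}, \tilde{s}_{j}, a_{j}, \tilde{a}_{j})$, which leaves only a second-order error. Writing $\delta s_{jk} = s_{k} - \tilde{s}_{j}$ and $\delta a_{jk} = a_{k} - \tilde{a}_{j}$: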

$$
\begin{aligned}
Q_{j}(s, a)
&= \frac{1}{W_{j}} \sum_{k \in \mathcal{N}_{j}} w_{jk} \tilde{Q}_{j}(s_{j}, s_{k}, a_{j}, a_{k})
 = \frac{1}{W_{j}} \sum_{k \in \mathcal{N}_{j}} w_{jk} \tilde{Q}_{j}(s_{j},\ \underset{s_{k}}{\underbrace{\tilde{s}_{j} + \delta s_{jk}}},\ a_{j},\ \underset{a_{k}}{\underbrace{\tilde{a}_{j} + \delta a_{jk}}}) \\[10mm]
&= \frac{1}{W_{j}} \sum_{k \in \mathcal{N}_{j}} w_{jk} \bigg[ \underset{\tilde{Q}_{j}(s_{j},\ \tilde{s}_{j},\ a_{j},\ \tilde{a}_{j})}{\underbrace{\tilde{Q}_{0}}} + \nabla_{\tilde{s}_{j}} \tilde{Q}_{0} \cdot \delta s_{jk} + \nabla_{\tilde{a}_{j}} \tilde{Q}_{0} \cdot \delta a_{jk} + \underset{o_{jk}}{\underbrace{\mathcal{O} \left( \big\| (\delta s_{jk},\ \delta a_{jk}) \big\|_{2}^{2} \right)}} \bigg] \\[10mm]
&= \tilde{Q}_{0} + \nabla_{\tilde{s}_{j}} \tilde{Q}_{0} \cdot \underset{0}{\underbrace{\frac{1}{W_{j}} \sum_{k \in \mathcal{N}_{j}} w_{jk} (s_{k} - \tilde{s}_{j})}} + \nabla_{\tilde{a}_{j}} \tilde{Q}_{0} \cdot \underset{0}{\underbrace{\frac{1}{W_{j}} \sum_{k \in \mathcal{N}_{j}} w_{jk} (a_{k} - \tilde{a}_{j})}} + \frac{1}{W_{j}} \sum_{k \in \mathcal{N}_{j}} w_{jk} o_{jk} \\[10mm]
&= \tilde{Q}_{0} + \frac{1}{W_{j}} \sum_{k \in \mathcal{N}_{j}} w_{jk} o_{jk} \approx \tilde{Q}_{0} = \tilde{Q}_{j}(s_{j},\ \tilde{s}_{j},\ a_{j},\ \tilde{a}_{j})
\end{aligned}
$$
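The cancellation of the first-order terms can be checked numerically. The sketch below assumes a made-up smooth pairwise value `q_pair` (not the learned critic) that is affine in the neighbour's state and action except for one curvature term, so the gap between the exact weighted average and the mean-field value comes entirely from the second-order remainder.

```python
def q_pair(s_j, s_k, a_j, a_k):
    # Hypothetical pairwise value: affine in (s_k, a_k) plus one curvature term.
    return s_j * a_j + 3.0 * s_k - 2.0 * a_k + 0.5 * s_k ** 2


def mean_field_gap(weights, s_j, a_j, s_nb, a_nb):
    """Exact weighted average of q_pair minus its value at the weighted means."""
    total = sum(weights)
    s_tilde = sum(w * s for w, s in zip(weights, s_nb)) / total
    a_tilde = sum(w * a for w, a in zip(weights, a_nb)) / total
    exact = sum(
        w * q_pair(s_j, s, a_j, a) for w, s, a in zip(weights, s_nb, a_nb)
    ) / total
    return exact - q_pair(s_j, s_tilde, a_j, a_tilde)


weights = [1.0, 2.0, 1.0]
actions = [0.0, 1.0, 2.0]
# Same weighted mean state (2.0), but the second call halves the spread around it.
gap_wide = mean_field_gap(weights, 1.0, 1.0, [1.0, 2.0, 3.0], actions)
gap_narrow = mean_field_gap(weights, 1.0, 1.0, [1.5, 2.0, 2.5], actions)
print(gap_wide, gap_narrow)
```

The affine terms contribute nothing to the gap, and halving the deviations $\delta s_{jk}$ shrinks the gap by a factor of four, which is the quadratic scaling the derivation predicts for this toy function.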

Similarly, each agent's policy can be approximated through a weighted mean field with its own weights $u_{jk}$:

$$
\pi_{j}(s) \approx \hat{\pi}_{j}(s_{j},\ \hat{s}_{j}) = \hat{\pi}_{j} \left[ s_{j},\ \frac{1}{U_{j}} \sum_{k \in \mathcal{N}_{j}} u_{jk} s_{k} \right] \qquad U_{j} = \sum_{k \in \mathcal{N}_{j}} u_{jk}
$$
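One way to picture the policy approximation is as a change of input: each agent's policy sees its own state together with the weighted mean state of its neighbours, rather than the full joint state. The sketch below builds that input for all agents at once from a weight matrix. The function names, the use of zero entries to mark non-neighbours, and the concatenation layout are assumptions made for illustration.

```python
from typing import List, Sequence


def mean_field_states(U: Sequence[Sequence[float]], states: Sequence[Sequence[float]]) -> List[List[float]]:
    """Row j is hat_s_j = (1 / U_j) * sum_k U[j][k] * states[k]; zero entries mark non-neighbours."""
    dim = len(states[0])
    result = []
    for row in U:
        total = sum(row)
        if total <= 0:
            raise ValueError("every agent needs at least one positively weighted neighbour")
        result.append([sum(u * s[d] for u, s in zip(row, states)) / total for d in range(dim)])
    return result


def actor_inputs(U: Sequence[Sequence[float]], states: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each agent's own state with its mean-field state: [s_j, hat_s_j]."""
    return [list(s_j) + hat_s_j for s_j, hat_s_j in zip(states, mean_field_states(U, states))]


# Hypothetical setup: three fully connected agents with 1-D states and uniform weights.
states = [[0.0], [2.0], [4.0]]
U = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
inputs = actor_inputs(U, states)
print(inputs)
```

The input size per agent stays fixed at twice the state dimension regardless of how many neighbours the agent has, which is what makes the mean-field form convenient for a shared policy network.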

Graph Construction

GAT-MF uses a graph attention mechanism to construct a coordination graph whose edge weights between agents are dynamic.

The computed weights $w_{jk}$ and $u_{jk}$ are then used to form the weighted mean fields that are fed to the actor and critic networks.

The parameters of the graph attention, actor, and critic networks are shared across all homogeneous agents and trained end-to-end.
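The text above does not spell out the attention scoring function, so the following is only a sketch under an assumed form: a single shared linear projection followed by scaled dot-product scores and a softmax over each agent's neighbours. The names `attention_weights`, `proj`, and `adjacency` are hypothetical.

```python
import math
from typing import List, Sequence


def attention_weights(
    states: Sequence[Sequence[float]],
    adjacency: Sequence[Sequence[int]],
    proj: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Row j holds softmax-normalized weights over agent j's neighbours (0.0 elsewhere)."""
    # The same projection is applied to every agent, mirroring parameter sharing.
    embed = [[sum(p * x for p, x in zip(row, s)) for row in proj] for s in states]
    scale = math.sqrt(len(proj))
    weights = []
    for j, neighbours in enumerate(adjacency):
        if not neighbours:
            raise ValueError(f"agent {j} has no neighbours")
        scores = {
            k: sum(a * b for a, b in zip(embed[j], embed[k])) / scale for k in neighbours
        }
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {k: math.exp(v - top) for k, v in scores.items()}
        norm = sum(exps.values())
        row = [0.0] * len(states)
        for k, e in exps.items():
            row[k] = e / norm
        weights.append(row)
    return weights


# Hypothetical example: agents 0 and 1 share a state, agent 2 differs.
states = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adjacency = [[1, 2], [0, 2], [0, 1]]   # neighbour indices per agent
identity = [[1.0, 0.0], [0.0, 1.0]]    # stand-in for learned projection parameters
W = attention_weights(states, adjacency, identity)
print(W)
```

Because each row is softmax-normalized, the row sum plays the role of $W_{j}$ (or $U_{j}$) and equals one. Under this assumed design, a second set of projection parameters would produce the other family of weights, and in training the weights would be recomputed from the current states at every step, which is what makes the graph dynamic.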


http://example.com/2024/10/14/GAT-MF/
Author: 木辛
Posted on: October 14, 2024