Adaptive and Learning Agents: International Workshop, ALA

By Edward Robinson, Peter McBurney, Xin Yao (auth.), Peter Vrancx, Matthew Knudson, Marek Grześ (eds.)

This volume constitutes the thoroughly refereed post-conference proceedings of the International Workshop on Adaptive and Learning Agents, ALA 2011, held at the 10th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2011, in Taipei, Taiwan, in May 2011. The 7 revised full papers presented together with 1 invited talk were carefully reviewed and selected from numerous submissions. The papers are organized in topical sections on single and multi-agent reinforcement learning, supervised multiagent learning, adaptation and learning in dynamic environments, learning trust and reputation, minority games and agent coordination.

By solving VGs, agents agree on an optimal Nash equilibrium (NE) for each virtual game (VG), which by construction is also an optimal NE for the corresponding stage game. (Note that throughout this work, a lowercase q denotes the local q-tables calculated by each agent, and a capital Q denotes an ordinary central Q-table.) The considered class of sequential stage games does not require agents to observe the state of their environment. In a sense, this relates the class to models other than SGs that are designed around the partial observability paradigm.
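To make the notion of an optimal Nash equilibrium concrete, the following minimal sketch enumerates the pure-strategy equilibria of a small matrix game and picks the one with the highest payoff. It assumes a common-payoff game (every agent receives the same reward), which the excerpt does not state explicitly; the function names and the example payoffs are hypothetical and are not taken from the paper.

```python
from itertools import product


def pure_nash_equilibria(payoff, n_actions):
    """Return all pure-strategy Nash equilibria of a common-payoff game.

    payoff    -- dict mapping each joint action (a tuple) to the shared payoff
                 (assumption: all agents receive the same payoff)
    n_actions -- number of actions available to each agent, e.g. [2, 2]
    """
    equilibria = []
    for joint in product(*(range(n) for n in n_actions)):
        stable = True
        for agent, n in enumerate(n_actions):
            for alt in range(n):
                # Joint action obtained when only `agent` deviates to `alt`.
                deviated = joint[:agent] + (alt,) + joint[agent + 1:]
                if payoff[deviated] > payoff[joint]:
                    stable = False
        if stable:
            equilibria.append(joint)
    return equilibria


def optimal_equilibrium(payoff, n_actions):
    """Pick the equilibrium with the highest shared payoff."""
    return max(pure_nash_equilibria(payoff, n_actions), key=lambda j: payoff[j])


# Hypothetical 2x2 coordination game with two equilibria of different value.
example_payoff = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 5}
example_equilibria = pure_nash_equilibria(example_payoff, [2, 2])
example_best = optimal_equilibrium(example_payoff, [2, 2])
```

In this example both (0, 0) and (1, 1) are equilibria, but only (0, 0) is optimal, which is the kind of equilibrium selection problem the agents have to agree on.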

This is a common problem in transfer learning (related to the problem of negative transfer [10]), which we cannot solve but work to avoid by considering the distance between successor states. Consider a pattern in the target task, (s_2, s'_2), and a pattern in the source task, (s_1, s'_1). Using Algorithm 2, lines 2 and 4, we find that f_2 and f_1 map each of the successor states into the common subspace as (s_{c,2}, s'_{c,2}) and (s_{c,1}, s'_{c,1}), respectively. This state may not be the best choice for a prior in the target task; only states with small distances are used as inputs and outputs for the supervised learning algorithm.
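The distance-based filtering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Euclidean metric, the fixed threshold, the choice of source state as input and target state as output, and the toy projections standing in for f_1 and f_2 are all hypothetical and are not taken from the paper's Algorithm 2.

```python
import math


def select_close_pairs(target_patterns, source_patterns, f2, f1, threshold):
    """Pair source and target states whose successors land close together
    in the common subspace.

    target_patterns -- list of (state, successor) tuples from the target task
    source_patterns -- list of (state, successor) tuples from the source task
    f2, f1          -- mappings from target/source states into the common
                       subspace (assumed to return equal-length tuples)
    threshold       -- assumed cut-off for a "small" distance
    """
    selected = []
    for s2, s2_next in target_patterns:
        c2_next = f2(s2_next)
        for s1, s1_next in source_patterns:
            c1_next = f1(s1_next)
            # Assumption: Euclidean distance between mapped successor states.
            distance = math.dist(c2_next, c1_next)
            if distance <= threshold:
                # Assumption: source state is the input, target state the output.
                selected.append((s1, s2, distance))
    return selected


# Toy example: both tasks are projected onto their first coordinate.
project_first = lambda state: (state[0],)
target = [((0.0, 1.0, 2.0), (1.0, 1.0, 2.0))]
source = [((0.0, 0.0), (1.1, 0.0)), ((0.0, 0.0), (5.0, 0.0))]
training_pairs = select_close_pairs(target, source, project_first, project_first, 0.5)
```

Only the first source pattern survives the filter here, since its mapped successor lies within the threshold of the target successor; the second is discarded, which is the mechanism meant to reduce the risk of negative transfer.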

In this subclass, several stage games are played one after the other. We also propose a transformation function for that class and prove that transformed and original games have the same set of optimal joint strategies. Under the condition that the played game is obtained through transformation, it will be proven that our approach converges to an optimal joint strategy for the last stage game of the transformed game and thus also for the original game. In addition, the ability to converge to ε-optimal joint strategies for each of the stage games is shown.

