Article title:
XCS with opponent modelling for concurrent reinforcement learners
ScienceDirect - Elsevier - Neurocomputing, 399 (2020) 449-466. doi:10.1016/j.neucom.2020.02.118
Hao Chen, Chang Wang, Jian Huang ∗, Jiangtao Kong, Hanqiang Deng
Reinforcement learning (RL) of optimal policies against an opponent agent that is also capable of learning remains challenging in Markov games. A variety of algorithms have been proposed for this problem, including traditional Q-learning-based RL (QbRL) algorithms as well as state-of-the-art neural-network-based RL (NNbRL) algorithms. However, QbRL approaches generalize poorly on complex problems with non-stationary opponents, while the policies learned by NNbRL algorithms lack explainability and transparency. In this paper, we propose an algorithm X-OMQ(λ) that integrates the eXtended Classifier System (XCS) with opponent modelling for concurrent reinforcement learners in zero-sum Markov games. The algorithm can learn general, accurate, and interpretable action selection rules and allows policy optimization using the genetic algorithm (GA). Moreover, the X-OMQ(λ) agent refines the established opponent model while simultaneously learning to select actions in a goal-directed manner. In addition, we use the eligibility trace mechanism to further speed up the learning process. In the reinforcement component, not only the classifiers in the action set are updated, but other relevant classifiers are also updated in a certain proportion. We demonstrate the performance of the proposed algorithm on the hunter-prey problem and two adversarial soccer scenarios in which the opponent is also allowed to learn, using several benchmark QbRL and NNbRL algorithms. The results show that our method achieves learning performance similar to that of the NNbRL algorithms while requiring no prior knowledge of the opponent or the environment. Moreover, the learned action selection rules are interpretable while retaining generalization capability.
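The abstract combines three ingredients: Q-learning against an opponent, a learned model of the opponent's action distribution, and eligibility traces that propagate TD errors backwards. The sketch below illustrates these ideas in a minimal tabular form; it is not the paper's X-OMQ(λ) algorithm (which operates on XCS classifiers, not a table), and all class and parameter names here are illustrative assumptions.

```python
import random
from collections import defaultdict

class OpponentQLambda:
    """Tabular Q(lambda) learner with a frequency-based opponent model.

    Q-values are indexed by (state, own_action, opponent_action); the
    agent's value for an action is the expectation of Q under the
    modelled opponent action distribution. This is a didactic sketch,
    not the classifier-based X-OMQ(lambda) from the paper."""

    def __init__(self, actions, opp_actions,
                 alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1):
        self.actions = actions
        self.opp_actions = opp_actions
        self.alpha, self.gamma = alpha, gamma
        self.lam, self.epsilon = lam, epsilon
        self.q = defaultdict(float)          # Q[(state, a, o)]
        self.trace = defaultdict(float)      # eligibility traces
        self.opp_counts = defaultdict(lambda: defaultdict(int))

    def opp_prob(self, state, o):
        """Empirical probability the opponent plays o in this state."""
        counts = self.opp_counts[state]
        total = sum(counts.values())
        if total == 0:
            return 1.0 / len(self.opp_actions)   # uniform prior
        return counts[o] / total

    def value(self, state, a):
        """Expected Q-value of action a under the opponent model."""
        return sum(self.opp_prob(state, o) * self.q[(state, a, o)]
                   for o in self.opp_actions)

    def choose(self, state):
        """Epsilon-greedy action selection against the opponent model."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value(state, a))

    def update(self, state, a, o, reward, next_state):
        """Observe the opponent, then do a Q(lambda) backup."""
        self.opp_counts[state][o] += 1
        best_next = max(self.value(next_state, a2) for a2 in self.actions)
        delta = reward + self.gamma * best_next - self.q[(state, a, o)]
        self.trace[(state, a, o)] += 1.0     # accumulating trace
        # The trace spreads the TD error to all recently visited entries,
        # echoing the abstract's point that not only the current action
        # set but other relevant entries are updated in proportion.
        for key in list(self.trace):
            self.q[key] += self.alpha * delta * self.trace[key]
            self.trace[key] *= self.gamma * self.lam
```

After a single transition, the opponent model already biases action values: if the opponent was observed playing action 1 in state `'s'`, only the joint entries involving opponent action 1 contribute to `value('s', a)`.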
Keywords: Opponent modelling | XCS | Markov games | Reinforcement learning