Free English paper download: PALO Bounds for Reinforcement Learning in Partially Observable Stochastic Games - 2020
  • PALO Bounds for Reinforcement Learning in Partially Observable Stochastic Games

    Year of publication:

    2020


    English title of the paper:

    PALO Bounds for Reinforcement Learning in Partially Observable Stochastic Games


    Persian translation of the paper title:

    مرزهای PALO برای یادگیری تقویتی در بازی های تصادفی تا حدی قابل مشاهده


    Source:

    ScienceDirect - Elsevier - Neurocomputing, Journal Pre-proof. doi:10.1016/j.neucom.2020.08.054


    Authors:

    Roi Ceren, Keyang He, Prashant Doshi, Bikramjit Banerjee


    English abstract:

    A partially observable stochastic game (POSG) is a general model for multiagent decision making under uncertainty. Perkins’ Monte Carlo exploring starts for partially observable Markov decision process (POMDP) (MCES-P) integrates Monte Carlo exploring starts (MCES) into a local search of the policy space to offer an elegant template for model-free reinforcement learning in POSGs. However, multiagent reinforcement learning in POSGs is tremendously more complex than in single agent settings due to the heterogeneity of agents and discrepancy of their goals. In this article, we generalize reinforcement learning under partial observability to self-interested and cooperative multiagent settings under the POSG umbrella. We present three new templates for multiagent reinforcement learning in POSGs. MCES for interactive POMDP (MCES-IP) extends MCES-P by maintaining predictions of the other agent’s actions based on dynamic beliefs over models. MCES for multiagent POMDP (MCES-MP) generalizes MCES-P to the canonical multiagent POMDP framework, with a single policy mapping joint observations of all agents to joint actions. Finally, MCES for factored-reward multiagent POMDP (MCES-FMP) has each agent individually mapping joint observations to their own action. We use probabilistic approximate locally optimal (PALO) bounds to analyze sample complexity, thereby instantiating these templates to PALO learning. We promote sample efficiency by including a policy space pruning technique and evaluate the approaches on six benchmark domains as well as compare with the state-of-the-art techniques, which demonstrates that MCES-IP and MCES-FMP yield improved policies with fewer samples compared to the previous baselines.
    Keywords: multiagent systems | reinforcement learning | POMDP | POSG
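
    As a rough illustration only: guarantees of the probabilistic approximate locally optimal (PALO) type mentioned in the abstract are commonly derived from Hoeffding-style concentration inequalities, which bound how many sampled returns are needed before a policy and one of its local neighbours can be compared reliably. The quantities below (sample count n, tolerance ε, failure probability δ, and a return range [V_min, V_max]) are illustrative assumptions for this sketch, not the paper's exact bounds:

    P\big(|\hat{Q}_n - Q| \ge \epsilon\big) \le 2 \exp\!\left(-\frac{2 n \epsilon^2}{(V_{\max} - V_{\min})^2}\right), \qquad n \ge \frac{(V_{\max} - V_{\min})^2}{2 \epsilon^2} \ln \frac{2}{\delta}.

    Comparing a policy against each of its k local neighbours would then typically invoke a union bound (allocating δ/k to each comparison), which is one way the sample complexity of such locally optimal learning grows with the size of the policy neighbourhood.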


    Level: Intermediate
    Number of pages (English PDF): 46
    File size: 1213 KB

    Price: Free


    Additional notes:



