Free English article download: An effective maximum entropy exploration approach for deceptive game in reinforcement learning - 2020
  • An effective maximum entropy exploration approach for deceptive game in reinforcement learning

    Year of publication:

    2020


    English title of the article:

    An effective maximum entropy exploration approach for deceptive game in reinforcement learning


    Persian translation of the title:

    یک رویکرد اکتشاف حداکثر آنتروپی موثر برای بازی فریبنده در یادگیری تقویتی


    Source:

    ScienceDirect - Elsevier - Neurocomputing, 403 (2020) 98-108. doi:10.1016/j.neucom.2020.04.068


    Authors:

    Chunmao Li, Xuanguang Wei, Yinliang Zhao, Xupeng Geng


    English abstract:

    Deceptive games are games that use the reward structure to keep the agent away from the global optimum, and they have grown into a major challenge for intelligent exploration in deep reinforcement learning. Most cutting-edge exploration approaches, such as count-based and curiosity-driven methods, achieve better performance in sparse-reward games through intrinsic motivation, yet still easily fall into local-optimum traps in deceptive games. To address this shortfall, we introduce a further exploration approach called Maximum Entropy Explore (MEE). Based on entropy rewards and an off-policy actor-critic reinforcement learning algorithm, we divide the agent's exploration policy into two independent parts: the target policy and the explorer policy. The explorer policy, which takes the maximum entropy of the target policy as its optimization goal, interacts with the environment and generates trajectories for the target policy. The target policy takes the maximization of external reward as its optimization goal in order to reach the global solution. To alleviate the catastrophic forgetting problem, which destabilizes the agent's training during the off-policy exploration phase, optimal experience replay is applied. An on-policy mode-switch trick is used to prevent the instability and divergence caused by the deadly triad. We conduct experiments comparing our approach with state-of-the-art deep reinforcement learning algorithms and exploration methods in grid-world and StarCraft II environments with deceptive rewards. The experiments indicate that the MEE approach presented in this paper effectively avoids the deceptive reward trap and learns the globally optimal strategy.
    Keywords: Deep reinforcement learning | Deceptive game | Maximum entropy explorer approach | Experience replay | On-policy switch
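    The two-policy idea in the abstract — an entropy-driven explorer that gathers trajectories and a reward-maximizing target policy learned off-policy from them — can be illustrated with a toy sketch. This is not the paper's MEE algorithm (which builds on an off-policy actor-critic with optimal experience replay and an on-policy mode switch); it is a minimal tabular stand-in on a hypothetical deceptive chain MDP, with every name, reward value, and hyperparameter invented for illustration.

    ```python
    import math
    import random

    # Toy deceptive chain MDP (illustrative, not from the paper):
    # states 0..N-1; action 0 = left, 1 = right.
    # Reaching state 0 ends the episode with a small "deceptive" reward;
    # reaching state N-1 ends it with the large global reward.
    N = 8
    DECEPTIVE_R, GLOBAL_R = 0.1, 1.0

    def step(s, a):
        s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
        if s2 == 0:
            return s2, DECEPTIVE_R, True   # nearby deceptive trap
        if s2 == N - 1:
            return s2, GLOBAL_R, True      # distant global optimum
        return s2, 0.0, False

    def explorer_action(q, s, temp=5.0):
        # Explorer policy: a high-entropy softmax over the target
        # policy's action values (high temperature keeps it near-uniform).
        prefs = [q[(s, a)] / temp for a in (0, 1)]
        m = max(prefs)
        w = [math.exp(p - m) for p in prefs]
        return 0 if random.random() * sum(w) < w[0] else 1

    def train(episodes=500, alpha=0.5, gamma=0.95, seed=0):
        random.seed(seed)
        q = {(s, a): 0.0 for s in range(N) for a in (0, 1)}
        for _ in range(episodes):
            s = 3  # start between the two rewards
            for _ in range(50):
                a = explorer_action(q, s)  # explorer gathers the data
                s2, r, done = step(s, a)
                # Target policy: off-policy max backup (pure external
                # reward maximization, i.e. tabular Q-learning).
                best = max(q[(s2, 0)], q[(s2, 1)])
                q[(s, a)] += alpha * (r + gamma * best * (not done) - q[(s, a)])
                s = s2
                if done:
                    break
        return q

    q = train()
    # The greedy target policy at the start state should prefer heading
    # right toward the global reward instead of the closer deceptive one.
    ```

    The separation matters here: a greedy or low-epsilon learner that acts on its own values can lock onto the nearby deceptive reward, while the high-entropy explorer keeps visiting both ends of the chain, so the off-policy target policy can still discover the global optimum.
    
    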


    Level: Intermediate
    Number of pages (English PDF): 11
    File size: 2224 KB

    Price: Free


    Additional notes:



