Download and view articles related to On-policy switch :: Page 1

Search results - On-policy switch

Number of articles found: 1
No. Title Type
1 An effective maximum entropy exploration approach for deceptive game in reinforcement learning
Year: 2020
Deceptive games are games that use the reward structure to steer the agent away from the global optimum, and they have become a major challenge for intelligent exploration in deep reinforcement learning. Most cutting-edge exploration approaches, such as count-based and curiosity-driven methods, perform better in sparse-reward games thanks to intrinsic motivation, yet they still fall easily into locally optimal traps in deceptive games. To address this shortfall, we introduce an exploration approach called Maximum Entropy Explore (MEE). Based on entropy rewards and an off-policy actor-critic reinforcement learning algorithm, we divide the agent's exploration policy into two independent parts: the target policy and the explorer policy. The explorer policy takes maximizing the entropy of the target policy as its optimization goal; it interacts with the environment and generates trajectories for the target policy. The target policy takes maximizing the external reward as its optimization goal in order to reach the global solution. To alleviate catastrophic forgetting, which destabilizes training during the off-policy exploration phase, optimal experience replay is applied. An on-policy mode switch trick is used to prevent the instability and divergence caused by the deadly triad. We conduct experiments comparing our approach with state-of-the-art deep reinforcement learning algorithms and exploration methods in grid world and StarCraft II environments with deceptive rewards. The experiments indicate that the MEE approach proposed in this paper effectively avoids the deceptive reward trap and learns the globally optimal strategy.
Keywords: Deep reinforcement learning | Deceptive game | Maximum entropy explorer approach | Experience replay | On-policy switch
English article
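The abstract above separates two reward signals: the explorer policy is rewarded for the entropy of the target policy, while the target policy is rewarded by the external (environment) reward. The listing gives no formulas, so the following is only a minimal sketch under stated assumptions: a discrete action space, a softmax target policy, and Shannon entropy as the entropy measure. The function names `softmax`, `policy_entropy`, and `explorer_reward` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Turn action preferences into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def explorer_reward(target_logits):
    """Assumed intrinsic reward for the explorer: the entropy of the
    target policy's action distribution at the current state."""
    return policy_entropy(softmax(target_logits))


# The target policy is undecided in state A and nearly deterministic in state B.
state_a_logits = [0.0, 0.0, 0.0, 0.0]
state_b_logits = [10.0, 0.0, 0.0, 0.0]

# Under this assumption the explorer is rewarded more for visiting state A.
print(explorer_reward(state_a_logits) > explorer_reward(state_b_logits))
```

In this sketch the explorer is drawn toward states where the target policy is still uncertain, which is one plausible reading of "taking the maximum entropy of the target policy as the optimization goal"; the paper's actual objective, replay scheme, and on-policy switch are not reproduced here.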