Free English paper download: Generating attentive goals for prioritized hindsight reinforcement learning - 2020

    Publication year:

    2020


    English title of the paper:

    Generating attentive goals for prioritized hindsight reinforcement learning


    Persian translation of the paper title:

    تولید اهداف توجه برای اولویت بندی یادگیری تقویتی بینایی


    Source:

    Sciencedirect - Elsevier - Knowledge-Based Systems, 203 (2020) 106140. doi:10.1016/j.knosys.2020.106140


    Authors:

    Peng Liu, Chenjia Bai, Yingnan Zhao, Chenyao Bai, Wei Zhao, Xianglong Tang


    Abstract:

    Typical reinforcement learning (RL) performs a single task and does not scale to problems in which an agent must perform multiple tasks, such as moving a robot arm to different locations. The multi-goal framework extends typical RL using a goal-conditional value function and policy, whereby the agent pursues different goals in different episodes. By treating a virtual goal as the desired one, and frequently giving the agent rewards, hindsight experience replay has achieved promising results in the sparse-reward setting of multi-goal RL. However, these virtual goals are uniformly sampled after the replay state from experiences, regardless of their significance. We propose a novel prioritized hindsight model for multi-goal RL in which the agent is provided with more valuable goals, as measured by the expected temporal-difference (TD) error. An attentive goals generation (AGG) network, which consists of temporal convolutions, multi-head dot product attentions, and a last-attention network, is structured to generate the virtual goals to replay. The AGG network is trained by following the gradient of TD-error calculated by an actor–critic model, and generates goals to maximize the expected TD-error with replay transitions. The whole network is fully differentiable and can be learned in an end-to-end manner. The proposed method is evaluated on several robotic manipulating tasks and demonstrates improved sample efficiency and performance.
    Keywords: Attentive goals generation | Prioritized hindsight model | Hindsight experience replay | Reinforcement learning
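
The contrast the abstract draws, between uniformly sampled virtual goals and goals chosen by TD-error, can be illustrated with a small sketch. This is not the AGG network from the paper: it swaps the learned, differentiable goal generator for a plain search over goals already achieved later in the episode. The transition layout, the `value` function, and all function names are hypothetical, chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Goal = Tuple[float, ...]
# Hypothetical transition layout: (state, next_state, goal achieved in next_state).
Step = Tuple[State, State, Goal]
# Hypothetical goal-conditioned value estimate V(state, goal).
ValueFn = Callable[[State, Goal], float]


def sparse_reward(achieved: Goal, goal: Goal, tol: float = 1e-6) -> float:
    """Sparse multi-goal reward: 0 if the goal is reached, -1 otherwise."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def future_goals(episode: Sequence[Step], t: int) -> List[Goal]:
    """Candidate virtual goals: goals achieved at step t or later."""
    return [achieved for _, _, achieved in episode[t:]]


def uniform_hindsight_goal(
    episode: Sequence[Step], t: int, rng: random.Random
) -> Goal:
    """Uniform hindsight relabeling: every future achieved goal is equally likely."""
    return rng.choice(future_goals(episode, t))


def td_error(value: ValueFn, step: Step, goal: Goal, gamma: float = 0.98) -> float:
    """One-step TD-error of a transition after relabeling it with `goal`."""
    state, next_state, achieved = step
    reward = sparse_reward(achieved, goal)
    return reward + gamma * value(next_state, goal) - value(state, goal)


def prioritized_hindsight_goal(
    value: ValueFn, episode: Sequence[Step], t: int, gamma: float = 0.98
) -> Goal:
    """Assumed stand-in for goal generation: pick the future achieved goal
    whose relabeled transition has the largest absolute TD-error."""
    return max(
        future_goals(episode, t),
        key=lambda g: abs(td_error(value, episode[t], g, gamma)),
    )
```

Under these assumptions, `uniform_hindsight_goal` corresponds to the uniform sampling the abstract criticizes, while `prioritized_hindsight_goal` favors the relabeled transition the current value estimate finds most surprising. The paper instead trains a network to propose such goals by following the TD-error gradient.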


    Level: Intermediate
    Number of pages in the English PDF file: 17
    File size: 3303 KB

    Price: Free

