Search results – Actor–critic

Row | Title | Type |
---|---|---|

1 |
Fixed-Wing UAVs flocking in continuous spaces: A deep reinforcement learning approach
2020 | Fixed-Wing UAVs (Unmanned Aerial Vehicles) flocking is still a challenging problem due to the
kinematics complexity and environmental dynamics. In this paper, we solve the leader–followers
flocking problem using a novel deep reinforcement learning algorithm that can generate roll angle
and velocity commands by training an end-to-end controller in continuous state and action spaces.
Specifically, we choose CACLA (Continuous Actor–Critic Learning Automaton) as the base algorithm
and we use the multi-layer perceptron to represent both the actor and the critic. Besides, we further
improve the learning efficiency by using the experience replay technique that stores the training
data in the experience memory and samples from the memory as needed. We have compared the
performance of the proposed CACER (Continuous Actor–Critic with Experience Replay) algorithm
with benchmark algorithms such as DDPG and double DQN in numerical simulation, and we have
demonstrated the performance of the learned optimal policy in semi-physical simulation without any
parameter tuning. Keywords: Fixed-wing UAV | Flocking | Reinforcement learning | Actor–critic |
English article |
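The CACER scheme summarized above (CACLA-style updates combined with experience replay) can be sketched in a few lines. This is a minimal illustration assuming a one-dimensional state with linear actor and critic and made-up hyperparameters, not the paper's multi-layer-perceptron implementation:

```python
import random

class ReplayBuffer:
    """Fixed-capacity experience memory; oldest transitions are evicted first."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = []

    def add(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append(transition)

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def cacla_update(actor_w, critic_v, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One CACLA-style update with linear actor/critic over a scalar state:
    the critic does plain TD(0); the actor moves toward the taken action
    only when the TD error is positive."""
    td_error = r + gamma * critic_v * s_next - critic_v * s
    critic_v += alpha * td_error * s
    if td_error > 0:
        actor_w += alpha * (a - actor_w * s) * s
    return actor_w, critic_v, td_error
```

In a full CACER loop, transitions would be pushed into the buffer online and `cacla_update` applied to minibatches drawn with `sample`, which is what decouples data collection from learning.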

2 |
ADRL: An attention-based deep reinforcement learning framework for knowledge graph reasoning
2020 | Knowledge graph reasoning is one of the key technologies for knowledge graph construction, which
plays an important part in application scenarios such as vertical search and intelligent question
answering. It is intended to infer the desired entity from the entities and relations that already exist in
the knowledge graph. Most current methods for reasoning, such as embedding-based methods, globally
embed all entities and relations, and then use the similarity of vectors to infer relations between
entities or whether given triples are true. However, in real application scenarios, we require a clear and
interpretable target entity as the output answer. In this paper, we propose a novel attention-based deep
reinforcement learning framework (ADRL) for learning multi-hop relational paths, which improves
the efficiency, generalization capacity, and interpretability of conventional approaches through the
structured perception of deep learning and relational reasoning of reinforcement learning. We define
the entire process of reasoning as a Markov decision process. First, we employ CNN to map the
knowledge graph to a low-dimensional space, and a message-passing mechanism to sense neighbor
entities at each level, and then employ LSTM to memorize and generate a sequence of historical
trajectories to form the policy and value functions. We design a relational module that includes a self-attention
mechanism that can infer and share the weights of neighborhood entity vectors and relation
vectors. Finally, we employ the actor–critic algorithm to optimize the entire framework. Experiments
confirm the effectiveness and efficiency of our method on several benchmark data sets. Keywords: Knowledge graph | Knowledge reasoning | Reinforcement learning | Deep learning | Attention |
English article |
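The self-attention step described above, weighting neighbour entity and relation vectors against the agent's current representation, can be illustrated roughly as follows. The scaled dot-product form, the toy vectors, and the plain-Python implementation are assumptions for illustration, not the paper's CNN/LSTM pipeline:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_weights(query, neighbour_vecs):
    """Scaled dot-product attention: score each neighbour vector against
    the query and normalize the scores into weights that sum to 1."""
    scale = math.sqrt(len(query))
    scores = [dot(query, n) / scale for n in neighbour_vecs]
    return softmax(scores)
```

Neighbours most aligned with the query receive the largest weights, which is how such a module can "infer and share" the relative importance of neighbourhood vectors when forming the policy input.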

3 |
Generating attentive goals for prioritized hindsight reinforcement learning
2020 | Typical reinforcement learning (RL) performs a single task and does not scale to problems in which
an agent must perform multiple tasks, such as moving a robot arm to different locations. The multi-goal
framework extends typical RL using a goal-conditional value function and policy, whereby the
agent pursues different goals in different episodes. By treating a virtual goal as the desired one, and
frequently giving the agent rewards, hindsight experience replay has achieved promising results in the
sparse-reward setting of multi-goal RL. However, these virtual goals are uniformly sampled after the
replay state from experiences, regardless of their significance. We propose a novel prioritized hindsight
model for multi-goal RL in which the agent is provided with more valuable goals, as measured by the
expected temporal-difference (TD) error. An attentive goals generation (AGG) network, which consists
of temporal convolutions, multi-head dot product attentions, and a last-attention network, is structured
to generate the virtual goals to replay. The AGG network is trained by following the gradient of TD-error
calculated by an actor–critic model, and generates goals to maximize the expected TD-error
with replay transitions. The whole network is fully differentiable and can be learned in an end-to-end
manner. The proposed method is evaluated on several robotic manipulating tasks and demonstrates
improved sample efficiency and performance. Keywords: Attentive goals generation | Prioritized hindsight model | Hindsight experience replay | Reinforcement learning |
English article |

4 |
Interpretable policies for reinforcement learning by empirical fuzzy sets
2020 | This paper proposes a method and an algorithm to implement interpretable fuzzy reinforcement learning
(IFRL). It provides alternative solutions to common problems in RL, like function approximation and continuous
action space. The learning process resembles that of human beings by clustering the encountered states,
developing experiences for each of the typical cases, and making decisions fuzzily. The learned policy can
be expressed as human-intelligible IF-THEN rules, which facilitates further investigation and improvement. It
adopts the actor–critic architecture while differing from mainstream policy gradient methods. The
value function is approximated through the fuzzy system AnYa. The state–action space is discretized into a
static grid with nodes. Each node is treated as one prototype and corresponds to one fuzzy rule, with the value
of the node being the consequent. Values of consequents are updated using the Sarsa(λ) algorithm. Probability
distribution of optimal actions regarding different states is estimated through Empirical Data Analytics (EDA),
Autonomous Learning Multi-Model Systems (ALMMo), and Empirical Fuzzy Sets (εFS). The fuzzy kernel of
IFRL avoids the lack of interpretability in other methods based on neural networks. Simulation results with
four problems, namely Mountain Car, Continuous Gridworld, Pendulum Position, and Tank Level Control, are
presented as a proof of the proposed concept. Keywords: Interpretable fuzzy systems | Reinforcement learning | Probability distribution learning | Autonomous learning systems | AnYa type fuzzy systems | Empirical Fuzzy Sets |
English article |
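A toy version of the rule-based value function described above, one prototype per grid node with membership-weighted consequents, might look like this. The Gaussian membership function, the scalar state, and the fixed width are simplifying assumptions, not the AnYa/EDA machinery used in the paper:

```python
import math

def memberships(state, prototypes, width=1.0):
    """Normalized firing strengths of the fuzzy rules: closeness of the
    state to each prototype, scaled so the memberships sum to 1."""
    raw = [math.exp(-((state - p) ** 2) / (2 * width ** 2)) for p in prototypes]
    s = sum(raw)
    return [r / s for r in raw]

def fuzzy_value(state, prototypes, consequents, width=1.0):
    """Value estimate: membership-weighted sum of the rule consequents,
    each rule readable as IF state is near prototype THEN value is consequent."""
    mu = memberships(state, prototypes, width)
    return sum(m * c for m, c in zip(mu, consequents))
```

Because each rule is tied to one human-readable prototype, inspecting the consequents after training gives the IF-THEN policy description the abstract highlights.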

5 |
Behavior fusion for deep reinforcement learning
2020 | For a deep reinforcement learning (DRL) system, it is difficult to design a reward function for complex
tasks, so this paper proposes a framework of behavior fusion for the actor–critic architecture, which
learns the policy based on an advantage function that consists of two value functions. Firstly, the
proposed method decomposes a complex task into several sub-tasks, and merges the trained policies
for those sub-tasks into a unified policy for the complex task, instead of designing a new reward
function and training for the policy. Each sub-task is trained individually by an actor–critic algorithm
using a simple reward function. These pre-trained sub-tasks are building blocks that are used to
rapidly assemble a prototype of a complicated task. Secondly, the proposed method integrates
modules in the calculation of the policy gradient by calculating the accumulated returns to reduce
variance. Thirdly, two alternative methods to acquire integrated returns for the complicated task are
also proposed. The Atari 2600 pong game and a wafer probe task are used to validate the performance
of the proposed methods by comparison with the method using a gate network. Keywords: Deep reinforcement learning | Actor–critic | Policy gradient | Behavior fusion | Complex task |
English article |
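A minimal sketch of the fusion step, combining two sub-task critics into one advantage estimate before the policy-gradient update, is given below. The simple weighted baseline is an assumption; the paper's two alternative methods for acquiring integrated returns are not reproduced here:

```python
def fused_advantage(ret, v_sub_a, v_sub_b, w_a=0.5, w_b=0.5):
    """Advantage for the complex task: accumulated return minus a weighted
    combination of the two pre-trained sub-task value estimates, so the
    policy gradient is driven by both behaviors without a new reward."""
    baseline = w_a * v_sub_a + w_b * v_sub_b
    return ret - baseline
```

Setting one weight to zero recovers the single-sub-task case, which makes the fused policy degrade gracefully when only one behavior is relevant to the current state.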

6 |
An adaptive deep reinforcement learning approach for MIMO PID control of mobile robots
2020 | Intelligent control systems are being developed for the control of plants with complex dynamics.
However, the simplicity of the PID (proportional–integrative–derivative) controller makes it still widely
used in industrial applications and robotics. This paper proposes an intelligent control system based on
a deep reinforcement learning approach for self-adaptive multiple PID controllers for mobile robots.
The proposed hybrid control strategy uses an actor–critic structure that receives only low-level
dynamic information as input and simultaneously estimates the multiple parameters or gains of the PID
controllers. The proposed approach was tested in several simulated environments and on a real-time
robotic platform, showing the feasibility of the approach for the low-level control of mobile robots.
From the simulation and experimental results, our proposed approach demonstrated that it can
help by providing behavior that compensates for, or even adapts to, changes in uncertain
environments, yielding a model-free, unsupervised solution. Also, a comparative study against other
adaptive methods for multiple PIDs tuning is presented, showing a successful performance of the
approach. Keywords: Reinforcement learning | Adaptive control | Policy gradient | Mobile robots | Multi-platforms |
English article |
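The control loop described above, an actor that outputs PID gains which a conventional controller then applies, can be sketched as follows. The stub policy and its gain values are placeholders for the trained actor network, and the scalar single-loop PID stands in for the multiple MIMO controllers of the paper:

```python
class PID:
    """Standard discrete PID controller with externally supplied gains."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt=0.1):
        # Accumulate the integral term and approximate the derivative
        # with a backward difference, then sum the three contributions.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def actor_gains(state):
    """Placeholder for the actor network: maps low-level dynamic state
    to (kp, ki, kd). A trained policy would adapt these per state."""
    return (1.0, 0.1, 0.05)
```

At each control step the agent would call `actor_gains` on the current low-level measurements, rebuild or retune the PID with those gains, and apply `step` to the tracking error, which is what makes the scheme self-adaptive.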