Fixed-Wing UAVs flocking in continuous spaces: A deep reinforcement learning approach
Fixed-wing UAVs flocking in continuous spaces: a deep reinforcement learning approach (2020)
Fixed-Wing UAVs (Unmanned Aerial Vehicles) flocking is still a challenging problem due to the kinematic complexity and environmental dynamics. In this paper, we solve the leader–followers flocking problem using a novel deep reinforcement learning algorithm that can generate roll angle and velocity commands by training an end-to-end controller in continuous state and action spaces. Specifically, we choose CACLA (Continuous Actor–Critic Learning Automaton) as the base algorithm and use a multi-layer perceptron to represent both the actor and the critic. In addition, we further improve the learning efficiency with an experience replay technique that stores the training data in an experience memory and samples from it as needed. We have compared the performance of the proposed CACER (Continuous Actor–Critic with Experience Replay) algorithm with benchmark algorithms such as DDPG and double DQN in numerical simulation, and we have demonstrated the performance of the learned optimal policy in semi-physical simulation without any parameter tuning.
Keywords: Fixed-wing UAV | Flocking | Reinforcement learning | Actor–critic
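The CACER recipe described above (CACLA-style updates plus an experience memory) can be sketched minimally. The linear actor/critic and the hyperparameters below are illustrative assumptions, not the paper's implementation; the defining CACLA detail is that the actor is updated toward the taken action only when the TD error is positive.

```python
import random
import numpy as np

class ReplayBuffer:
    """Fixed-size experience memory with uniform sampling."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []

    def add(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)            # drop the oldest experience
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))

def cacla_update(actor_w, critic_w, batch, alpha=0.01, beta=0.01, gamma=0.99):
    """CACLA-style update with linear actor/critic over state features.

    Critic: V(s) = critic_w @ s.  Actor: a(s) = actor_w @ s (scalar action).
    """
    for s, a, r, s_next in batch:
        td = r + gamma * critic_w @ s_next - critic_w @ s
        critic_w += beta * td * s        # TD(0) critic step
        if td > 0:                       # CACLA: reinforce only improvements
            actor_w += alpha * (a - actor_w @ s) * s
    return actor_w, critic_w
```

In an actual flocking controller the state features would encode the leader-relative geometry and the action would be a roll-angle or velocity command; here both are abstract vectors.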
ADRL: An attention-based deep reinforcement learning framework for knowledge graph reasoning
ADRL: An attention-based deep reinforcement learning framework for knowledge graph reasoning (2020)
Knowledge graph reasoning is one of the key technologies for knowledge graph construction, and plays an important part in application scenarios such as vertical search and intelligent question answering. It aims to infer the desired entity from the entities and relations that already exist in the knowledge graph. Most current methods for reasoning, such as embedding-based methods, globally embed all entities and relations, and then use the similarity of vectors to infer relations between entities or whether given triples are true. However, in real application scenarios, we require a clear and interpretable target entity as the output answer. In this paper, we propose a novel attention-based deep reinforcement learning framework (ADRL) for learning multi-hop relational paths, which improves the efficiency, generalization capacity, and interpretability of conventional approaches through the structured perception of deep learning and the relational reasoning of reinforcement learning. We define the entire process of reasoning as a Markov decision process. First, we employ a CNN to map the knowledge graph to a low-dimensional space and a message-passing mechanism to sense neighbor entities at each level, and then employ an LSTM to memorize and generate a sequence of historical trajectories to form the policy and value functions. We design a relational module that includes a self-attention mechanism that can infer and share the weights of neighborhood entity vectors and relation vectors. Finally, we employ the actor–critic algorithm to optimize the entire framework. Experiments confirm the effectiveness and efficiency of our method on several benchmark data sets.
Keywords: Knowledge graph | Knowledge reasoning | Reinforcement learning | Deep learning | Attention
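The relational module described above centers on self-attention over neighbor embeddings. A minimal sketch of single-head scaled dot-product self-attention (the core operation; ADRL's learned projection matrices, CNN encoder, and LSTM are omitted here as assumptions beyond this illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention over n neighbor embeddings X (n x d).

    Each entity/relation vector is re-weighted by its similarity to the
    others, so shared structure among neighbors is emphasized.
    """
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)       # pairwise similarity, scaled
    weights = softmax(scores, axis=-1)  # attention weights, rows sum to 1
    return weights @ X                  # attended neighbor representations
```

In the full framework, the attended representations would feed the policy over candidate relations to follow at each hop of the path.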
Generating attentive goals for prioritized hindsight reinforcement learning
Generating attentive goals for prioritized hindsight reinforcement learning (2020)
Typical reinforcement learning (RL) performs a single task and does not scale to problems in which an agent must perform multiple tasks, such as moving a robot arm to different locations. The multi-goal framework extends typical RL using a goal-conditional value function and policy, whereby the agent pursues different goals in different episodes. By treating a virtual goal as the desired one, and frequently giving the agent rewards, hindsight experience replay has achieved promising results in the sparse-reward setting of multi-goal RL. However, these virtual goals are sampled uniformly from the states that follow the replayed state in an experience, regardless of their significance. We propose a novel prioritized hindsight model for multi-goal RL in which the agent is provided with more valuable goals, as measured by the expected temporal-difference (TD) error. An attentive goals generation (AGG) network, which consists of temporal convolutions, multi-head dot-product attention, and a last-attention network, is structured to generate the virtual goals to replay. The AGG network is trained by following the gradient of the TD-error calculated by an actor–critic model, and generates goals to maximize the expected TD-error with replay transitions. The whole network is fully differentiable and can be learned in an end-to-end manner. The proposed method is evaluated on several robotic manipulation tasks and demonstrates improved sample efficiency and performance.
Keywords: Attentive goals generation | Prioritized hindsight model | Hindsight experience replay | Reinforcement learning
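The core idea above, replacing HER's uniform future-state sampling with sampling biased by expected TD error, can be sketched as follows. The `td_error_fn` is a hypothetical stand-in for the AGG/critic estimate; the paper's actual goal generator is a learned network, not a scoring-and-sampling rule.

```python
import numpy as np

def sample_virtual_goals(future_states, td_error_fn, k=4, rng=None):
    """Hindsight goal sampling biased by expected TD error magnitude.

    Standard HER would pick k future states uniformly; here each
    candidate goal is drawn with probability proportional to
    |td_error_fn(goal)|, so "surprising" goals are replayed more often.
    """
    rng = rng or np.random.default_rng(0)
    scores = np.array([abs(td_error_fn(g)) for g in future_states]) + 1e-6
    p = scores / scores.sum()            # normalize to a distribution
    idx = rng.choice(len(future_states), size=min(k, len(future_states)),
                     replace=False, p=p)
    return [future_states[i] for i in idx]
```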
Interpretable policies for reinforcement learning by empirical fuzzy sets
Interpretable policies for reinforcement learning by empirical fuzzy sets (2020)
This paper proposes a method and an algorithm to implement interpretable fuzzy reinforcement learning (IFRL). It provides alternative solutions to common problems in RL, such as function approximation and continuous action spaces. The learning process resembles that of human beings: clustering the encountered states, developing experiences for each of the typical cases, and making decisions fuzzily. The learned policy can be expressed as human-intelligible IF-THEN rules, which facilitates further investigation and improvement. It adopts the actor–critic architecture while differing from mainstream policy gradient methods. The value function is approximated through the fuzzy system AnYa. The state–action space is discretized into a static grid of nodes. Each node is treated as one prototype and corresponds to one fuzzy rule, with the value of the node being the consequent. Values of consequents are updated using the Sarsa(λ) algorithm. The probability distribution of optimal actions over different states is estimated through Empirical Data Analytics (EDA), Autonomous Learning Multi-Model Systems (ALMMo), and Empirical Fuzzy Sets (εFS). The fuzzy kernel of IFRL avoids the lack of interpretability of other methods based on neural networks. Simulation results on four problems, namely Mountain Car, Continuous Gridworld, Pendulum Position, and Tank Level Control, are presented as a proof of the proposed concept.
Keywords: Interpretable fuzzy systems | Reinforcement learning | Probability distribution learning | Autonomous learning systems | AnYa type fuzzy systems | Empirical Fuzzy Sets
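The value-learning step described above, prototype nodes as fuzzy rules whose consequents are updated by Sarsa(λ), can be sketched minimally. The inverse-squared-distance firing strength below is an assumed simplification of AnYa-style memberships, and the action-selection part of IFRL is omitted.

```python
import numpy as np

def memberships(state, prototypes):
    """Normalized firing strengths: inverse squared distance to each
    prototype node (a simplified AnYa-style rule activation)."""
    d2 = np.sum((prototypes - state) ** 2, axis=1)
    w = 1.0 / (d2 + 1e-8)
    return w / w.sum()

def sarsa_lambda_step(consequents, traces, s, s_next, r, prototypes,
                      alpha=0.1, gamma=0.95, lam=0.9):
    """One Sarsa(lambda) update of the rule consequents (node values).

    The state value is the membership-weighted sum of consequents, and
    eligibility traces spread the TD error across recently fired rules.
    """
    mu, mu_next = memberships(s, prototypes), memberships(s_next, prototypes)
    v, v_next = mu @ consequents, mu_next @ consequents
    delta = r + gamma * v_next - v
    traces = gamma * lam * traces + mu   # accumulate eligibility per rule
    consequents += alpha * delta * traces
    return consequents, traces
```

Because each consequent is tied to one prototype, the learned values read directly as IF-state-near-prototype-THEN-value rules, which is the interpretability argument made in the abstract.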
Behavior fusion for deep reinforcement learning
Behavior fusion for deep reinforcement learning (2020)
For a deep reinforcement learning (DRL) system, it is difficult to design a reward function for complex tasks, so this paper proposes a behavior-fusion framework for the actor–critic architecture, which learns the policy based on an advantage function that consists of two value functions. Firstly, the proposed method decomposes a complex task into several sub-tasks, and merges the trained policies for those sub-tasks into a unified policy for the complex task, instead of designing a new reward function and training a new policy. Each sub-task is trained individually by an actor–critic algorithm using a simple reward function. These pre-trained sub-tasks are building blocks used to rapidly assemble a prototype of a complicated task. Secondly, the proposed method integrates modules in the calculation of the policy gradient by calculating the accumulated returns to reduce variance. Thirdly, two alternative methods to acquire integrated returns for the complicated task are also proposed. The Atari 2600 Pong game and a wafer probe task are used to validate the performance of the proposed methods by comparison with a method using a gate network.
Keywords: Deep reinforcement learning | Actor–critic | Policy gradient | Behavior fusion | Complex task
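One way to read the advantage construction above is as a weighted combination of per-sub-task advantages, each computed against its own value function. This is an illustrative sketch under that assumption; the paper's two integration methods and fusion weights are not specified here.

```python
import numpy as np

def fused_advantage(returns_a, returns_b, v_a, v_b, w=(0.5, 0.5)):
    """Fuse two sub-task advantages into one signal for the unified policy.

    returns_*: accumulated returns under each sub-task reward.
    v_*: the corresponding sub-task value-function baselines.
    w: fusion weights (a design choice, assumed fixed here).
    """
    adv_a = returns_a - v_a              # sub-task A advantage
    adv_b = returns_b - v_b              # sub-task B advantage
    return w[0] * adv_a + w[1] * adv_b   # drives the fused policy gradient
```

The fused advantage would then multiply the log-probability gradient of the unified policy, exactly where a single-task advantage would appear in a standard actor–critic update.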
An adaptive deep reinforcement learning approach for MIMO PID control of mobile robots
An adaptive deep reinforcement learning approach for MIMO PID control of mobile robots (2020)
Intelligent control systems are being developed for the control of plants with complex dynamics. However, the simplicity of the PID (proportional–integral–derivative) controller keeps it widely used in industrial applications and robotics. This paper proposes an intelligent control system based on a deep reinforcement learning approach for self-adaptive multiple PID controllers for mobile robots. The proposed hybrid control strategy uses an actor–critic structure; it receives only low-level dynamic information as input and simultaneously estimates the multiple parameters, or gains, of the PID controllers. The proposed approach was tested in several simulated environments and on a real-time robotic platform, showing its feasibility for the low-level control of mobile robots. The simulation and experimental results demonstrate that the approach can compensate for, or even adapt to, changes in uncertain environments, providing a model-free, unsupervised solution. A comparative study against other adaptive methods for tuning multiple PIDs is also presented, showing the successful performance of the approach.
Keywords: Reinforcement learning | Adaptive control | Policy gradient | Mobile robots | Multi-platforms
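The coupling described above, a policy that outputs PID gains each step while a standard PID law computes the control, can be sketched as follows. The `gain_policy` callable is a hypothetical stand-in for the trained actor; a real MIMO setup would run one such loop per controlled channel.

```python
class AdaptivePID:
    """PID controller whose gains are supplied each step by a policy.

    gain_policy: maps state features to (Kp, Ki, Kd); here a stand-in
    for the actor network described in the abstract.
    """
    def __init__(self, gain_policy, dt=0.01):
        self.gain_policy = gain_policy
        self.dt = dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, error, state_features):
        kp, ki, kd = self.gain_policy(state_features)  # adaptive gains
        self.integral += error * self.dt               # integral term
        deriv = (error - self.prev_err) / self.dt      # derivative term
        self.prev_err = error
        return kp * error + ki * self.integral + kd * deriv
```

During training, the critic would score the resulting tracking behavior and the actor (here `gain_policy`) would be updated by policy gradient, leaving the PID law itself fixed and interpretable.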