Download and view articles related to Policy gradient :: Page 1
Download the best ISI articles with Persian translation


Search results - Policy gradient

Number of articles found: 31
No. | Title | Type
1 Resource Allocation in Time Slotted Channel Hopping (TSCH) Networks Based on Phasic Policy Gradient Reinforcement Learning
Resource allocation in time slotted channel hopping (TSCH) networks based on phasic policy gradient reinforcement learning - 2022
The concept of the Industrial Internet of Things (IIoT) is gaining prominence due to its low-cost solutions and improved productivity of manufacturing processes. To address the ultra-high reliability and ultra-low power communication requirements of IIoT networks, the Time Slotted Channel Hopping (TSCH) behavioral mode has been introduced in the IEEE 802.15.4e standard. Scheduling the packet transmissions in IIoT networks is a difficult task owing to the limited resources and dynamic topology. In IEEE 802.15.4e TSCH, the design of the schedule is open to implementation. In this paper, we propose a phasic policy gradient (PPG) based TSCH schedule learning algorithm. We construct a utility function that accounts for the throughput and energy efficiency of the TSCH network. The proposed PPG-based scheduling algorithm overcomes the drawbacks of fully distributed and fully centralized deep reinforcement learning-based scheduling algorithms by employing the actor–critic policy gradient method, which learns the scheduling algorithm in two phases, namely the policy phase and the auxiliary phase. We show that the resulting schedule converges more quickly than with other actor–critic methods and also improves system throughput by 58% compared to the minimal scheduling function, the default TSCH schedule.
Keywords: Industrial internet of things | IEEE 802.15.4e | Time slotted channel hopping | Deep reinforcement learning | Actor–critic policy gradient methods | Phasic policy gradient
English article
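For readers unfamiliar with the two-phase scheme this abstract refers to, below is a minimal sketch of the phasic policy gradient structure in PyTorch: a PPO-style clipped update in the policy phase, then a value-distillation step with a KL constraint in the auxiliary phase. The network sizes, hyperparameters, and discrete-action setting are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the policy/auxiliary two-phase structure of PPG.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.pi_head = nn.Linear(64, n_actions)   # policy logits
        self.v_head = nn.Linear(64, 1)            # value estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.pi_head(h), self.v_head(h).squeeze(-1)

def policy_phase(net, opt, obs, actions, advantages, old_logp, clip=0.2):
    """PPO-style clipped policy-gradient step (the 'policy phase')."""
    logits, _ = net(obs)
    dist = torch.distributions.Categorical(logits=logits)
    ratio = torch.exp(dist.log_prob(actions) - old_logp)
    clipped = torch.clamp(ratio, 1 - clip, 1 + clip)
    loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def auxiliary_phase(net, opt, obs, returns, old_logits, beta=1.0):
    """Distill value targets into the shared body while a KL term keeps
    the policy close to its pre-phase behavior (the 'auxiliary phase')."""
    logits, values = net(obs)
    v_loss = 0.5 * (values - returns).pow(2).mean()
    kl = torch.distributions.kl_divergence(
        torch.distributions.Categorical(logits=old_logits),
        torch.distributions.Categorical(logits=logits)).mean()
    loss = v_loss + beta * kl
    opt.zero_grad(); loss.backward(); opt.step()
```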
2 Rule-interposing deep reinforcement learning based energy management strategy for power-split hybrid electric vehicle
Rule-interposing deep reinforcement learning based energy management strategy for power-split hybrid electric vehicles - 2020
The optimization and training processes of a deep reinforcement learning (DRL) based energy management strategy (EMS) can be very slow and resource-intensive. In this paper, an improved energy management framework that embeds expert knowledge into the deep deterministic policy gradient (DDPG) algorithm is proposed. Incorporating the battery characteristics and the optimal brake specific fuel consumption (BSFC) curve of hybrid electric vehicles (HEVs), we address the multi-objective energy management optimization problem with a large space of control variables. By incorporating this prior knowledge, the proposed framework not only accelerates the learning process but also achieves better fuel economy, making the energy management system relatively stable. The experimental results show that the proposed EMS outperforms both the one without prior knowledge and other state-of-the-art deep reinforcement learning approaches. In addition, the proposed approach can be easily generalized to other types of HEV EMSs.
Keywords: Energy management strategy | Hybrid electric vehicle | Expert knowledge | Deep deterministic policy gradient | Continuous action space
English article
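As a rough illustration of "rule interposing", the sketch below clamps the actor's action based on battery state of charge before it reaches the environment. The thresholds, variable names, and rules are hypothetical stand-ins for the paper's expert knowledge, not its actual constraints.

```python
# Hypothetical rule filter interposed between a DDPG actor and the plant.

def rule_filter(action, soc, soc_min=0.4, soc_max=0.8):
    """Override the engine-power action when the battery state of charge
    (soc) leaves a safe window; values here are illustrative only."""
    if soc < soc_min:
        return max(action, 0.5)   # force the engine on to recharge
    if soc > soc_max:
        return min(action, 0.2)   # favor the battery, limit engine power
    return action

# In the usual DDPG loop, the filtered action is what gets executed and
# stored in the replay buffer:
#   a = actor(state) + exploration_noise
#   a_safe = rule_filter(a, state_soc)
#   next_state, reward = env.step(a_safe)
```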
3 Study on deep reinforcement learning techniques for building energy consumption forecasting
Study of deep reinforcement learning techniques for building energy consumption forecasting - 2020
Reliable and accurate building energy consumption prediction is becoming increasingly pivotal in building energy management. Currently, the data-driven approach has shown promising performance and gained much research attention due to its efficiency and flexibility. As a combination of reinforcement learning and deep learning, deep reinforcement learning (DRL) techniques are expected to solve nonlinear and complex issues. However, very little is known about DRL techniques in forecasting building energy consumption. Therefore, this paper presents a case study of an office building using three commonly used DRL techniques to forecast building energy consumption, namely Asynchronous Advantage Actor-Critic (A3C), Deep Deterministic Policy Gradient (DDPG) and Recurrent Deterministic Policy Gradient (RDPG). The objective is to investigate the potential of DRL techniques in the building energy consumption prediction field. A comprehensive comparison between DRL models and common supervised models is also provided. The results demonstrate that the proposed DDPG and RDPG models have obvious advantages in forecasting building energy consumption compared to common supervised models, while requiring more computation time for model training. Their prediction performance measured by mean absolute error (MAE) can be improved by 16%-24% for single-step ahead prediction, and 19%-32% for multi-step ahead prediction. The results also indicate that A3C delivers poor prediction accuracy and shows much slower convergence than DDPG and RDPG; however, A3C is still the most efficient technique among these three DRL methods. The findings are enlightening, and the proposed DRL methodologies can be extended to other prediction problems, e.g., wind speed prediction and electricity load prediction.
Keywords: Energy consumption prediction | Ground source heat pump | Deep reinforcement learning | Asynchronous advantage Actor-Critic | Deep deterministic Policy gradient | Recurrent deterministic Policy gradient
English article
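The comparison above is scored with mean absolute error (MAE); for reference, a minimal implementation with made-up numbers:

```python
# MAE, the accuracy metric the abstract reports improvements on.
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

print(mae([10.0, 12.0, 11.5], [9.5, 12.4, 11.0]))  # 0.4666...
```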
4 Reinforcement learning based on movement primitives for contact tasks
Reinforcement learning based on movement primitives for contact tasks - 2020
Recently, robot learning through deep reinforcement learning has incorporated various robot tasks through deep neural networks, without using task-specific control or recognition algorithms. However, this learning method is difficult to apply to the contact tasks of a robot, because the random search process of reinforcement learning can exert excessive force. Therefore, when applying reinforcement learning to contact tasks, the contact problem must be handled by an existing force controller. This study proposes a neural-network-based movement primitive (NNMP) that generates a continuous trajectory which can be transmitted to the force controller and learned through a deep deterministic policy gradient (DDPG) algorithm. In addition, an imitation learning algorithm suitable for the NNMP is proposed so that trajectories similar to the demonstration trajectory are stably generated. The performance of the proposed algorithms was verified using a square peg-in-hole assembly task with a tolerance of 0.1 mm. The results confirm that the complicated assembly trajectory can be learned stably through the NNMP with the proposed imitation learning algorithm, and that the assembly trajectory is improved by learning the NNMP through the DDPG algorithm.
Keywords: AI-based methods | Force control | Deep Learning in robotics and automation
English article
5 Comparison of end-to-end and hybrid deep reinforcement learning strategies for controlling cable-driven parallel robots
Comparison of end-to-end and hybrid deep reinforcement learning strategies for controlling cable-driven parallel robots - 2020
Deep reinforcement learning (DRL) has been proven effective in learning policies over high-dimensional states and actions. Recently, a variety of robot manipulation tasks have been accomplished using end-to-end DRL strategies. An end-to-end DRL strategy accomplishes a robot manipulation task as a black box. On the other hand, a robot manipulation task can be divided into multiple subtasks and accomplished by non-learning-based approaches. A hybrid DRL strategy integrates DRL with non-learning-based approaches: it accomplishes some subtasks of a robot manipulation task by DRL and the remaining subtasks by non-learning-based approaches. However, the effects of integrating DRL with non-learning-based approaches on the learning speed and on the robustness of DRL to model uncertainties have not been discussed. In this study, an end-to-end DRL strategy and a hybrid DRL strategy are developed and compared in controlling a cable-driven parallel robot. This study shows that, by integrating DRL with non-learning-based approaches, the hybrid DRL strategy learns faster and is more robust to model uncertainties than the end-to-end DRL strategy. By taking advantage of both learning-based and non-learning-based approaches, the hybrid DRL strategy provides an alternative way to accomplish a robot manipulation task.
Keywords: Deep reinforcement learning | End-to-end DRL strategy | Hybrid DRL strategy | Deep deterministic policy gradient | Cable-driven parallel robot
English article
6 A TD3-based multi-agent deep reinforcement learning method in mixed cooperation-competition environment
A TD3-based multi-agent deep reinforcement learning method in a mixed cooperation-competition environment - 2020
We explore the problems of function approximation error and complex-mission adaptability in multi-agent deep reinforcement learning. This paper proposes a new multi-agent deep reinforcement learning algorithm framework named multi-agent time-delayed deep deterministic policy gradient. Our work reduces the overestimation error of neural network approximation and the variance of the estimation result using a dual-centered critic, group target network smoothing and delayed policy updating. Experimental results show that it ultimately improves the ability to adapt to complex missions. We then discuss the inevitable overestimation issue that arises when existing multi-agent algorithms approximate real action-value functions with neural networks, and we explain the approximation error of the multi-agent deep deterministic policy gradient algorithm both mathematically and experimentally. Finally, the application of our algorithm in a mixed cooperative-competitive experimental environment further demonstrates its effectiveness and generalization, especially in improving the group's ability to adapt to complex missions and to complete more difficult missions.
Keywords: Reinforcement learning | Overestimation error | Dual-critic | MADDPG | MATD3
English article
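The three mechanisms this abstract builds on (clipped double critics, target-policy smoothing, delayed actor updates) come from the single-agent TD3 algorithm. The sketch below shows the standard TD3 critic-target computation; the shapes and hyperparameters are illustrative, and the paper's multi-agent variant is not reproduced here.

```python
# Core TD3 target computation: twin target critics plus smoothed target action.
import torch

def td3_critic_target(r, next_obs, done, actor_t, critic1_t, critic2_t,
                      gamma=0.99, noise_std=0.2, noise_clip=0.5):
    with torch.no_grad():
        a_next = actor_t(next_obs)
        noise = (torch.randn_like(a_next) * noise_std
                 ).clamp(-noise_clip, noise_clip)        # target smoothing
        a_next = (a_next + noise).clamp(-1.0, 1.0)
        q_next = torch.min(critic1_t(next_obs, a_next),
                           critic2_t(next_obs, a_next))  # twin-critic min
        return r + gamma * (1.0 - done) * q_next

# The actor is updated only every `policy_delay` critic steps:
#   if step % policy_delay == 0:
#       actor_loss = -critic1(obs, actor(obs)).mean()
```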
7 Cooperative control for multi-player pursuit-evasion games with reinforcement learning
Cooperative control for multi-player pursuit-evasion games with reinforcement learning - 2020
In this paper, we consider a pursuit-evasion game in which multiple pursuers attempt to capture one superior evader. A distributed cooperative pursuit strategy with communication is developed based on reinforcement learning. The centralized critic and distributed actor structure and the learning-based communication mechanism are adopted to solve the cooperative pursuit control problem. Instead of using broadcast to share information among the pursuers, we construct the ring topology network and the leader-follower line topology network for communication, which could significantly reduce the complexity and save the communication and computation resources. The training algorithms for these two network topologies are developed based on the deep deterministic policy gradient algorithm. Furthermore, the proposed approach is implemented in a simulation environment. The training and evaluation results demonstrate that the pursuit team could learn highly efficient cooperative control and communication policies. The pursuers can capture a superior evader driven by an intelligent escape policy with a high success rate.
Keywords: Pursuit-evasion game | Reinforcement learning | Distributed control | Communication network
English article
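As a small illustration of the communication structures the abstract mentions, the helpers below enumerate each pursuer's neighbors in a ring topology and in a leader-follower line topology; the zero-based indexing and the choice of agent 0 as leader are illustrative conventions, not the paper's.

```python
# Neighbor sets for the two communication topologies named in the abstract.

def ring_neighbors(i, n):
    """In a ring, each pursuer communicates with its two adjacent pursuers."""
    return [(i - 1) % n, (i + 1) % n]

def line_neighbors(i, n):
    """In a leader-follower line, each pursuer communicates with its
    predecessor and successor along the chain; endpoints have one neighbor."""
    return [j for j in (i - 1, i + 1) if 0 <= j < n]

print(ring_neighbors(0, 5))   # [4, 1]
print(line_neighbors(0, 5))   # [1]
```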
8 Interpretable policies for reinforcement learning by empirical fuzzy sets
Interpretable policies for reinforcement learning by empirical fuzzy sets - 2020
This paper proposes a method and an algorithm to implement interpretable fuzzy reinforcement learning (IFRL). It provides alternative solutions to common problems in RL, like function approximation and continuous action spaces. The learning process resembles that of human beings: clustering the encountered states, developing experiences for each of the typical cases, and making decisions fuzzily. The learned policy can be expressed as human-intelligible IF-THEN rules, which facilitates further investigation and improvement. It adopts the actor–critic architecture while differing from mainstream policy gradient methods. The value function is approximated through the fuzzy system AnYa. The state–action space is discretized into a static grid of nodes. Each node is treated as one prototype and corresponds to one fuzzy rule, with the value of the node being the consequent. Values of consequents are updated using the Sarsa(λ) algorithm. The probability distribution of optimal actions for different states is estimated through Empirical Data Analytics (EDA), Autonomous Learning Multi-Model Systems (ALMMo), and Empirical Fuzzy Sets (εFS). The fuzzy kernel of IFRL avoids the lack of interpretability of other methods based on neural networks. Simulation results on four problems, namely Mountain Car, Continuous Gridworld, Pendulum Position, and Tank Level Control, are presented as a proof of the proposed concept.
Keywords: Interpretable fuzzy systems | Reinforcement learning | Probability distribution learning | Autonomous learning systems | AnYa type fuzzy systems | Empirical Fuzzy Sets
English article
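For reference, the consequent-update rule named in the abstract, tabular Sarsa(λ) with accumulating eligibility traces, looks roughly like this; the flat table encoding of states and actions is an illustrative assumption.

```python
# One tabular Sarsa(lambda) backup with accumulating eligibility traces.
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s2, a2, alpha=0.1, gamma=0.99, lam=0.9):
    """Update value table Q and trace table E for transition (s,a,r,s2,a2)."""
    delta = r + gamma * Q[s2, a2] - Q[s, a]   # TD error
    E[s, a] += 1.0                            # accumulate trace on (s, a)
    Q += alpha * delta * E                    # credit all traced pairs
    E *= gamma * lam                          # decay traces
    return Q, E
```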
9 Continuous control with Stacked Deep Dynamic Recurrent Reinforcement Learning for portfolio optimization
Continuous control with Stacked Deep Dynamic Recurrent Reinforcement Learning for portfolio optimization - 2020
Recurrent reinforcement learning (RRL) techniques have been used to optimize asset trading systems and have achieved outstanding results. However, the majority of previous work has been dedicated to systems with discrete action spaces. To address the challenge of continuous action and multi-dimensional state spaces, we propose the so-called Stacked Deep Dynamic Recurrent Reinforcement Learning (SDDRRL) architecture to construct a real-time optimal portfolio. The algorithm captures up-to-date market conditions and rebalances the portfolio accordingly. Under this general vision, the Sharpe ratio, one of the most widely accepted measures of risk-adjusted return, has been used as a performance metric. Additionally, the performance of most machine learning algorithms highly depends on their hyperparameter settings. Therefore, we equipped SDDRRL with the ability to find the best possible architecture topology using an automated Gaussian Process (GP) with Expected Improvement (EI) as an acquisition function. This allows us to select the best architecture that maximizes the total return while respecting the cardinality constraints. Finally, our system was trained and tested in an online manner for 20 successive rounds with data for ten selected stocks from different sectors of the S&P 500 from January 1st, 2013 to July 31st, 2017. The experiments reveal that the proposed SDDRRL achieves superior performance compared to three benchmarks: the rolling-horizon Mean-Variance Optimization (MVO) model, the rolling-horizon risk parity model, and the uniform buy-and-hold (UBAH) index.
Keywords: Reinforcement learning | Policy gradient | Deep learning | Sequential model-based optimization | Financial time series | Portfolio management | Trading systems
English article
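The Sharpe ratio used as the performance metric above can be computed as below; the daily-return input and the 252-trading-day annualization factor are conventional assumptions, not specifics from the paper.

```python
# Annualized Sharpe ratio over a series of periodic returns.
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Mean excess return over its standard deviation, annualized."""
    ex = np.asarray(returns) - risk_free
    return np.sqrt(periods_per_year) * ex.mean() / ex.std(ddof=1)

print(sharpe_ratio([0.001, -0.002, 0.003, 0.0005]))  # toy daily returns
```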
10 Application of deep reinforcement learning to intrusion detection for supervised problems
Application of deep reinforcement learning to intrusion detection for supervised problems - 2020
The application of new techniques to increase the performance of intrusion detection systems is crucial in modern data networks with a growing threat of cyber-attacks. These attacks impose a greater risk on network services that are increasingly important from a social and economic point of view. In this work we present a novel application of several deep reinforcement learning (DRL) algorithms to intrusion detection using a labeled dataset, and we show how to perform supervised learning based on a DRL framework. Implementing a reward function aligned with the detection of intrusions is extremely difficult for Intrusion Detection Systems (IDS), since there is no automatic way to identify intrusions. Usually the identification is performed manually and stored in datasets of network features associated with intrusion events. These datasets are used to train supervised machine learning algorithms for classifying intrusion events. In this paper we apply DRL using two of these datasets: the NSL-KDD and AWID datasets. As a novel approach, we have made a conceptual modification of the classic DRL paradigm (based on interaction with a live environment), replacing the environment with a sampling function over recorded training intrusions. This new pseudo-environment, in addition to sampling the training dataset, generates rewards based on the detection errors found during training. We present the results of applying our technique to four of the most relevant DRL models: Deep Q-Network (DQN), Double Deep Q-Network (DDQN), Policy Gradient (PG) and Actor-Critic (AC). The best results are obtained with the DDQN algorithm. We show that DRL, with our model and some parameter adjustments, can improve the results of intrusion detection in comparison with current machine learning techniques. Moreover, the classifier obtained with DRL is faster than alternative models. A comprehensive comparison with the results of other machine learning models is provided for the AWID and NSL-KDD datasets, together with the lessons learned from applying several design alternatives to the four DRL models.
Keywords: Intrusion detection | Data networks | Deep reinforcement learning
English article
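A minimal sketch of the pseudo-environment idea described above: instead of interacting with a live network, the "environment" samples labeled records and pays a reward for each correct or incorrect classification. The ±1 reward values and the interface are illustrative assumptions, not the paper's exact design.

```python
# Pseudo-environment: sample labeled intrusion records, reward detections.
import numpy as np

class PseudoEnv:
    def __init__(self, features, labels):
        self.features, self.labels = features, labels

    def reset(self):
        """Draw a random labeled record and return its feature vector."""
        self.i = np.random.randint(len(self.labels))
        return self.features[self.i]

    def step(self, predicted_class):
        """Reward the agent for classifying the current record correctly,
        then sample the next record."""
        reward = 1.0 if predicted_class == self.labels[self.i] else -1.0
        return self.reset(), reward
```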