Deep reinforcement learning based energy management for a hybrid electric vehicle
Deep reinforcement learning-based energy management for a hybrid electric vehicle (2020)
This research proposes a reinforcement learning-based algorithm and a deep reinforcement learning-based algorithm for energy management of a series hybrid electric tracked vehicle. First, the powertrain model of the series hybrid electric tracked vehicle (SHETV) is constructed and the corresponding energy management problem is formulated. Subsequently, a new variant of the reinforcement learning (RL) method Dyna, namely Dyna-H, is developed by combining a heuristic planning step with the Dyna agent, and it is applied to energy management control for the SHETV. Its rapidity and optimality are validated by comparison with dynamic programming (DP) and the conventional Dyna method. To address the “curse of dimensionality” in reinforcement learning, a deep reinforcement learning algorithm, deep Q-learning (DQL), is designed for energy management control; it uses a recent optimization method (AMSGrad) to update the weights of the neural network. The proposed deep reinforcement learning control system is then trained and verified on a realistic, high-precision driving cycle and compared with the benchmark DP method and the traditional DQL method. Results show that the proposed deep reinforcement learning method achieves faster training and lower fuel consumption than the traditional DQL policy, and its fuel economy closely approaches the global optimum. Furthermore, the adaptability of the proposed method is confirmed on another driving schedule.
Keywords: Hybrid electric tracked vehicle | Energy management | Dyna-H | Deep reinforcement learning | AMSGrad optimizer
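The optimizer change the abstract highlights is AMSGrad, which replaces Adam's second-moment estimate with its running maximum. A minimal single-weight sketch of that update rule (the hyperparameters below are common defaults, not values from the paper):

```python
import math

def amsgrad_step(w, grad, m, v, vhat, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One AMSGrad update for a single weight. Unlike Adam, the
    denominator uses the running maximum of the second-moment estimate,
    so the effective step size can never grow between iterations."""
    m = b1 * m + (1 - b1) * grad            # first moment (momentum)
    v = b2 * v + (1 - b2) * grad * grad     # second moment
    vhat = max(vhat, v)                     # the AMSGrad modification
    w = w - lr * m / (math.sqrt(vhat) + eps)
    return w, m, v, vhat

# drive w toward the minimum of f(w) = w**2 (gradient 2w)
w, m, v, vhat = 1.0, 0.0, 0.0, 0.0
for _ in range(500):
    w, m, v, vhat = amsgrad_step(w, 2 * w, m, v, vhat)
```

Because `vhat` never decreases, the effective step size is non-increasing, which is the convergence fix AMSGrad was introduced for.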
A new hybrid ensemble deep reinforcement learning model for wind speed short term forecasting
A new hybrid ensemble deep reinforcement learning model for short-term wind speed forecasting (2020)
Wind speed forecasting is a promising way to improve the efficiency of energy utilization. In this study, a novel hybrid wind speed forecasting model is proposed; its modeling process consists of three stages. In stage I, the empirical wavelet transform reduces the non-stationarity of the original wind speed data by decomposing it into several sub-series. In stage II, three kinds of deep networks are used to build the forecasting models and compute predictions for each sub-series. In stage III, a reinforcement learning method combines the three deep networks, and the forecasts of the sub-series are aggregated to obtain the final result. Comparing the predictions over three different types of wind speed series shows that: (a) the proposed reinforcement learning-based ensemble method is effective in integrating the three deep networks and outperforms the traditional optimization-based ensemble method; (b) the proposed ensemble deep reinforcement learning wind speed prediction model obtains accurate results in all cases and provides the best accuracy compared with sixteen alternative models and three state-of-the-art models.
Keywords: Wind speed forecasting | Ensemble deep reinforcement learning | Empirical wavelet transform | Hybrid wind speed forecasting model
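The abstract does not detail how stage III combines the three deep networks, so the following is only an illustrative stand-in: a multiplicative-weights combiner that shifts weight toward whichever forecaster has had the lower recent error (the forecaster outputs and target below are invented):

```python
import math

def update_weights(weights, errors, eta=1.0):
    """Multiplicative-weights update: each forecaster's weight is scaled
    by exp(-eta * squared error), then renormalized."""
    scores = [w * math.exp(-eta * e * e) for w, e in zip(weights, errors)]
    total = sum(scores)
    return [s / total for s in scores]

def combine(weights, forecasts):
    """Weighted ensemble forecast."""
    return sum(w * f for w, f in zip(weights, forecasts))

# three hypothetical sub-series forecasters; the first tracks the target best
weights = [1 / 3, 1 / 3, 1 / 3]
target, forecasts = 5.0, [5.1, 6.0, 3.5]
for _ in range(10):
    errors = [f - target for f in forecasts]
    weights = update_weights(weights, errors)
```

After a few rounds almost all weight concentrates on the most accurate forecaster, so the combined forecast tracks it closely.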
A deep reinforcement learning approach for real-time sensor-driven decision making and predictive analytics
A deep reinforcement learning approach for real-time sensor-driven decision making and predictive analytics (2020)
The increased complexity of sensor-intensive systems with expensive subsystems and costly repairs and failures calls for efficient real-time control and decision-making policies. Deep reinforcement learning has demonstrated great potential in addressing highly complex control and decision-making problems, yet despite its ability to derive real-time policies from real-time data for dynamic systems, it has rarely been used for sensor-driven maintenance problems. In this paper, we propose two novel decision-making methods in which reinforcement learning and particle filtering are utilized for (i) deriving real-time maintenance policies and (ii) estimating the remaining useful life of sensor-monitored degrading systems. The proposed framework introduces a new direction with many potential opportunities for system monitoring. To demonstrate the effectiveness of the proposed methods, numerical experiments are provided on a set of simulated data and a turbofan engine dataset provided by NASA.
Keywords: Particle filters | Deep reinforcement learning | Real-time control | Decision-making | Remaining useful life estimation
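The particle-filtering half of such a framework can be sketched with a toy degradation model: a bootstrap filter tracks a linearly degrading health index from noisy readings, and remaining useful life is the estimated distance to a failure threshold. The model, noise levels, and threshold below are assumptions for illustration, not the paper's setup:

```python
import math
import random

random.seed(0)

def particle_filter_rul(observations, n_particles=2000, drift=1.0,
                        proc_std=0.2, obs_std=0.5, threshold=100.0):
    """Bootstrap particle filter over a linearly degrading health index,
    followed by a plug-in remaining-useful-life (RUL) estimate."""
    particles = [0.0] * n_particles
    for z in observations:
        # propagate each particle through the degradation model
        particles = [x + drift + random.gauss(0, proc_std) for x in particles]
        # weight by the Gaussian observation likelihood (unnormalized)
        weights = [math.exp(-0.5 * ((z - x) / obs_std) ** 2) for x in particles]
        # resample proportionally to the weights
        particles = random.choices(particles, weights=weights, k=n_particles)
    x_est = sum(particles) / n_particles
    return (threshold - x_est) / drift   # expected steps until failure

# a simulated unit degrading at ~1.0 per step, observed for 20 steps
obs = [t * 1.0 + random.gauss(0, 0.5) for t in range(1, 21)]
rul = particle_filter_rul(obs)
```

With the unit at health ~20 and a failure threshold of 100, the filter's RUL estimate lands near the true value of 80 steps.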
Portfolio trading system of digital currencies: A deep reinforcement learning with multidimensional attention gating mechanism
Portfolio trading system of digital currencies: deep reinforcement learning with a multidimensional attention mechanism (2020)
As a hot topic in financial engineering, portfolio optimization aims to increase investors' wealth. In this paper, a portfolio management system based on deep reinforcement learning is proposed. In contrast to inflexible traditional methods, the proposed system learns a better trading strategy through reinforcement learning, whose reward signal is updated by action weights from deep learning networks. The low, high, and close prices constitute the inputs, but the importance of these three features differs considerably. Traditional methods and the classical CNN cannot treat the three features separately, so a specially designed depth convolution is proposed to process each feature on its own. In a virtual currency market, price rises occur only in a flash, and traditional methods and CNN networks cannot accurately identify these critical moments. To solve this problem, a three-dimensional attention gating network is proposed that assigns higher weights to rising moments and assets. Under different market conditions, the proposed system achieves more substantial returns and greatly improves the Sharpe ratio, while its short-term risk index is lower than those of the traditional algorithms. Simulation results show that the traditional algorithms (including Best, CRP, PAMR, CWMR and CNN) are unable to perform as well as the proposed approach.
Keywords: Portfolio | Deep-reinforcement learning | Reinforcement learning | Attention gating mechanism
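The idea of assigning higher weights to rising moments can be illustrated with a one-dimensional toy: score each moment of a close-price series by its return and pass the scores through a softmax, so sharp rises dominate the attention weights. This is a drastic simplification of the paper's three-dimensional gating network, with made-up prices:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_over_returns(closes, temperature=0.05):
    """Toy attention gate: each moment is scored by its return, so
    sharply rising moments receive most of the weight."""
    returns = [(b - a) / a for a, b in zip(closes, closes[1:])]
    return softmax([r / temperature for r in returns])

closes = [100.0, 101.0, 100.5, 103.0, 102.0]
w = attention_over_returns(closes)   # weights over the four transitions
```

The largest weight falls on the 100.5 to 103.0 jump, the sharpest rise in the series; lowering `temperature` makes this concentration stronger.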
Multi-attention deep reinforcement learning and re-ranking for vehicle re-identification
Multi-attention deep reinforcement learning and re-ranking for vehicle re-identification (2020)
Solving the vehicle re-identification (Re-ID) task requires focusing attention on details of arbitrary size in the image, and these details are difficult to locate accurately. In this paper, we propose a Multi-Attention Deep Reinforcement Learning (MADRL) model that focuses on multi-attentional subregions spread randomly across the image and extracts discriminative features for the Re-ID task. First, we obtain multiple attentions from the representative features and group the feature channels into different parts; we then train a deep reinforcement learning model with different losses to learn more accurate positions of these fine-grained details. Unlike existing models that rely on complex strategies to keep the patch-matching constraints, our MADRL model automatically locates the matching patches (multi-attentional subregions) in different vehicle images with the same identification (ID). Furthermore, based on the fine-grained attention and global features, we recalculate the distance between the inter- and intra-classes and obtain better re-ranking results. Compared with state-of-the-art methods on three large-scale vehicle Re-ID datasets, our algorithm greatly improves the performance of vehicle Re-ID.
Keywords: Re-identification | Deep reinforcement learning | Multi-attention | Re-ranking
Energy-aware resource management for uplink non-orthogonal multiple access: Multi-agent deep reinforcement learning
Energy-aware resource management for uplink non-orthogonal multiple access: multi-agent deep reinforcement learning (2020)
Non-orthogonal multiple access (NOMA) is one of the promising technologies for meeting the huge access demand and high data rate requirements of next-generation networks. In this paper, we investigate the joint subchannel assignment and power allocation problem in an uplink multi-user NOMA system to maximize energy efficiency (EE) while ensuring the quality of service (QoS) of all users. Unlike conventional model-based resource allocation methods, we propose two deep reinforcement learning (DRL) based frameworks to solve this non-convex, dynamic optimization problem: a discrete DRL-based resource allocation (DDRA) framework and a continuous DRL-based resource allocation (CDRA) framework. In the DDRA framework, a deep Q network (DQN) outputs the optimum subchannel assignment policy, and a distributed, discretized multi-DQN network allocates the corresponding transmit power of all users. In the CDRA framework, a joint DQN and deep deterministic policy gradient (DDPG) based network generates the optimal subchannel assignment and power allocation policy. The resource allocation policies of both frameworks are adjusted by updating the weights of their neural networks according to system feedback. Numerical results show that the proposed DRL-based resource allocation frameworks significantly improve the EE of the whole NOMA system compared with other approaches, and provide good performance in various moving-speed scenarios by adjusting the learning parameters.
Keywords: Non-orthogonal multiple access | Resource allocation | Energy efficiency | Deep reinforcement learning | Deep deterministic policy gradient
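The EE objective the two frameworks optimize can be made concrete with the textbook uplink NOMA model: on one subchannel, users are decoded in descending received-power order via successive interference cancellation, and EE is the sum rate divided by total consumed power. The gains, noise level, and circuit power below are illustrative numbers, not values from the paper:

```python
import math

def uplink_noma_ee(powers, gains, noise=1e-3, circuit_power=0.1):
    """Sum rate and energy efficiency of one uplink NOMA subchannel with
    successive interference cancellation (SIC): users are decoded in
    descending received-power order, each treating the not-yet-decoded
    (weaker) users as interference."""
    order = sorted(range(len(powers)),
                   key=lambda i: powers[i] * gains[i], reverse=True)
    sum_rate = 0.0
    for k, i in enumerate(order):
        interference = sum(powers[j] * gains[j] for j in order[k + 1:])
        sinr = powers[i] * gains[i] / (interference + noise)
        sum_rate += math.log2(1 + sinr)      # bits/s/Hz
    return sum_rate / (sum(powers) + circuit_power)

ee = uplink_noma_ee(powers=[0.2, 0.1], gains=[1.0, 0.6])
```

A DRL agent searching over `powers` and subchannel assignments is, in effect, maximizing this ratio subject to per-user QoS rate constraints.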
Reinforcement learning in sustainable energy and electric systems: a survey
Reinforcement learning in sustainable energy and electric systems: a survey (2020)
The dynamics of sustainable energy and electric systems can vary significantly with the environment and load, and such systems exhibit the multivariate, highly complex, and uncertain features of nonlinear systems. Moreover, the integration of intermittent renewable energy sources and the energy consumption behaviours of households introduce further uncertainty. Operation, control, and decision-making in such an environment require increasing intelligence and flexibility in control and optimization to ensure quality of service. Reinforcement learning is a wide class of optimal control strategies that estimate value functions from experience, simulation, or search in order to learn in highly dynamic, stochastic environments; this interactive context gives reinforcement learning strong learning ability and high adaptability. Because reinforcement learning does not require a model of the system dynamics, it is well suited to sustainable energy and electric systems with complex nonlinearity and uncertainty, and its use will change the traditional mode of energy utilization and bring more intelligence into these systems. This survey explicitly addresses an overview of reinforcement learning, the demand for it in sustainable energy and electric systems, its applications in these systems, and future challenges and opportunities.
Keywords: Reinforcement learning | Sustainable energy and electric systems | Deep reinforcement learning | Power system | Integrated energy system
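The value-function estimation the survey describes is easiest to see in tabular Q-learning, which learns an optimal policy from sampled transitions without any model of the dynamics. A minimal example on a four-state chain (the environment and hyperparameters are invented for illustration):

```python
import random

random.seed(1)

# Four-state chain: actions 0 (left) and 1 (right); reaching state 3
# pays +1 and ends the episode. Q-learning learns from sampled
# transitions only, with no model of these dynamics.
N_STATES, GOAL = 4, 3

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.2
for _ in range(500):
    s, done = 0, False
    while not done:
        if random.random() < eps:                   # explore
            a = random.randrange(2)
        else:                                       # exploit
            a = max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])       # temporal-difference update
        s = s2

greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)]
```

After training, the greedy policy moves right in every state, and the learned Q-values reflect the discounted distance to the goal.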
SmartFCT: Improving power-efficiency for data center networks with deep reinforcement learning
SmartFCT: improving power efficiency for data center networks with deep reinforcement learning (2020)
Reducing the power consumption of Data Center Networks (DCNs) and guaranteeing the Flow Completion Time (FCT) of applications are two major concerns for data center operators. Existing works cannot achieve both goals together because of two issues: (1) dynamic traffic patterns in DCNs are hard to model accurately; (2) an optimal flow scheduling scheme is computationally expensive. In this paper, we propose SmartFCT, which couples Deep Reinforcement Learning (DRL) with Software-Defined Networking (SDN) to improve the power efficiency of DCNs while guaranteeing FCT. SmartFCT dynamically collects traffic distributions from switches to train its DRL model. The well-trained DRL agent quickly analyzes the complicated traffic characteristics using neural networks and adaptively generates an action for scheduling flows and deliberately configuring margins for different links. Following the generated action, flows are consolidated onto a few active links and switches to save power, and fine-grained margin configuration on active links avoids FCT violations caused by unexpected flow bursts. Simulation results show that SmartFCT can guarantee FCT and save up to 12.2% of power consumption compared with state-of-the-art solutions.
Keywords: Data center networks | Software-Defined networking | Power efficiency | Flow completion time | Deep reinforcement learning
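SmartFCT's learned policy is not reproducible from the abstract, but the effect it targets, consolidating flows onto few links while reserving burst margins, can be sketched as a simple first-fit-decreasing heuristic (capacity, margin, and demands are made-up numbers, and this heuristic stands in for, rather than reproduces, the DRL agent):

```python
def consolidate(flows, link_capacity=10.0, margin=0.2):
    """First-fit-decreasing consolidation: pack flow demands onto as few
    links as possible, keeping a fractional headroom margin on each
    active link to absorb bursts."""
    usable = link_capacity * (1 - margin)
    links = []                          # current load per active link
    for demand in sorted(flows, reverse=True):
        for i, load in enumerate(links):
            if load + demand <= usable:
                links[i] += demand      # fits on an already-active link
                break
        else:
            links.append(demand)        # must power on another link
    return links

active = consolidate([4, 3, 3, 2, 2, 1, 1])   # loads of the active links
```

Here 16 units of demand fit on two links instead of seven, while each link keeps 20% headroom; unused links (and their switches) can be powered down.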
Negotiating team formation using deep reinforcement learning
Negotiating team formation using deep reinforcement learning (2020)
When autonomous agents interact in the same environment, they must often cooperate to achieve their goals. One way for agents to cooperate effectively is to form a team, make a binding agreement on a joint plan, and execute it. However, when agents are self-interested, the gains from team formation must be allocated appropriately to incentivize agreement. Various approaches for multi-agent negotiation have been proposed, but typically only work for particular negotiation protocols. More general methods usually require human input or domain-specific data, and so do not scale. To address this, we propose a framework for training agents to negotiate and form teams using deep reinforcement learning. Importantly, our method makes no assumptions about the specific negotiation protocol, and is instead completely experience driven. We evaluate our approach on both non-spatial and spatially extended team-formation negotiation environments, demonstrating that our agents beat hand-crafted bots and reach negotiation outcomes consistent with fair solutions predicted by cooperative game theory. Additionally, we investigate how the physical location of agents influences negotiation outcomes.
Keywords: Multi-agent systems | Team formation | Coalition formation | Reinforcement learning | Deep learning | Cooperative games | Shapley value
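The fair solutions from cooperative game theory that the agents' negotiation outcomes are compared against include the Shapley value, which can be computed exactly for small games by averaging marginal contributions over all player orderings. The glove game below is a standard toy example, not one of the paper's environments:

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging each player's marginal
    contribution over all orderings (tractable for small games)."""
    phi = {p: 0.0 for p in players}
    n_orders = 0
    for order in permutations(players):
        coalition, prev = set(), 0.0
        for p in order:
            coalition.add(p)
            v = value(frozenset(coalition))
            phi[p] += v - prev
            prev = v
        n_orders += 1
    return {p: phi[p] / n_orders for p in players}

# glove game: 'L' holds a left glove, 'R1' and 'R2' each hold a right
# glove; a coalition is worth 1 if it can form at least one pair
def glove(coalition):
    return 1.0 if 'L' in coalition and {'R1', 'R2'} & coalition else 0.0

sv = shapley_values(['L', 'R1', 'R2'], glove)
```

The scarce left glove earns 2/3 of the surplus and each right-glove holder 1/6, the kind of asymmetric-but-fair split a trained negotiator should approximate.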
A Deep Reinforcement Learning-Based On-Demand Charging Algorithm for Wireless Rechargeable Sensor Networks
A deep reinforcement learning-based on-demand charging algorithm for wireless rechargeable sensor networks (2020)
Wireless rechargeable sensor networks are widely used in many fields, but the limited battery capacity of sensor nodes hinders their development. With the help of wireless energy transfer technology, employing a mobile charger to charge sensor nodes wirelessly has become a promising way to prolong the lifetime of wireless sensor networks. Since the energy consumption rate varies significantly among sensors, a better model of each sensor's charging demand is needed so that sensors can be charged multiple times in one charging tour; we therefore use a time window to represent charging demand. To let the mobile charger respond to these charging demands in time and transfer more energy to the sensors, we introduce a new metric, the charging reward, which measures the quality of sensor charging. We then study how to schedule the mobile charger to replenish the energy supply of sensors so that the sum of charging rewards collected on its charging tour is maximized, subject to the energy capacity constraint of the mobile charger and the charging time windows of all sensor nodes. We first prove that this problem is NP-hard. Given its complexity, a deep reinforcement learning technique is then exploited to obtain the moving path of the mobile charger. Finally, experimental simulations evaluate the performance of the proposed charging algorithm, and the results show that the proposed scheme is very promising.
Keywords: wireless rechargeable sensor networks | time window | mobile charging | deep reinforcement learning technique
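Because the reward-collection problem is NP-hard, a natural baseline before reaching for DRL is a greedy rule: always visit the feasible sensor with the best reward per unit of energy, subject to the time windows and the charger's energy budget. Everything below (1-D positions, costs, windows) is an invented instance, not the paper's model:

```python
def greedy_charging_tour(sensors, energy_capacity=10.0, speed=1.0):
    """Greedy baseline: repeatedly visit the feasible sensor with the
    highest reward per unit of energy, honoring each sensor's charging
    time window and the charger's energy budget. Sensors live on a line;
    travel costs |distance| in both time and energy.

    Each sensor is (position, reward, window_open, window_close, charge_cost).
    """
    pos, t, energy, total_reward = 0.0, 0.0, energy_capacity, 0.0
    remaining = list(sensors)
    while True:
        best, best_score, best_arrive = None, 0.0, None
        for s in remaining:
            x, reward, open_t, close_t, cost = s
            travel = abs(x - pos)
            arrive = max(t + travel / speed, open_t)   # may wait for window
            if arrive <= close_t and travel + cost <= energy:
                score = reward / (travel + cost + 1e-9)
                if score > best_score:
                    best, best_score, best_arrive = s, score, arrive
        if best is None:
            return total_reward
        x, reward, _, _, cost = best
        energy -= abs(x - pos) + cost
        pos, t = x, best_arrive
        total_reward += reward
        remaining.remove(best)

sensors = [(2.0, 5.0, 0.0, 10.0, 1.0),
           (4.0, 3.0, 0.0, 4.0, 1.0),
           (1.0, 2.0, 6.0, 20.0, 0.5)]
collected = greedy_charging_tour(sensors)
```

On this instance the greedy tour collects a reward of 7.0 and misses the tight-window sensor, whereas serving that sensor first allows all three to be charged; gaps like this between myopic heuristics and the optimum are what motivate a learned scheduling policy.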