Optimal carbon storage reservoir management through deep reinforcement learning (2020)
Model-based optimization plays a central role in energy system design and management. The complexity and high-dimensionality of many process-level models, especially those used for geosystem energy exploration and utilization, often lead to formidable computational costs when the dimension of decision space is also large. This work adopts elements of recently advanced deep learning techniques to solve a sequential decision-making problem in applied geosystem management. Specifically, a deep reinforcement learning framework was formed for optimal multiperiod planning, in which a deep Q-learning network (DQN) agent was trained to maximize rewards by learning from high-dimensional inputs and from exploitation of its past experiences. To expedite computation, deep multitask learning was used to approximate high-dimensional, multistate transition functions. Both DQN and deep multitask learning are pattern based. As a demonstration, the framework was applied to optimal carbon sequestration reservoir planning using two different types of management strategies: monitoring only and brine extraction. Both strategies are designed to mitigate potential risks due to pressure buildup. Results show that the DQN agent can identify the optimal policies to maximize the reward for given risk and cost constraints. Experiments also show that knowledge the agent gained from interacting with one environment is largely preserved when deploying the same agent in other similar environments.
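As a rough illustration of the Q-learning loop this abstract describes, the sketch below uses a linear Q-function, an experience-replay buffer, and epsilon-greedy exploration; the state/action sizes and rewards are placeholders, not the paper's reservoir model or network architecture:

```python
import random
from collections import deque

import numpy as np

class MinimalDQN:
    """Sketch of a DQN-style agent: a linear Q-function stands in for
    the paper's deep network, with experience replay and epsilon-greedy
    exploration."""

    def __init__(self, n_states, n_actions, gamma=0.95, lr=0.01, eps=0.1):
        self.w = np.zeros((n_states, n_actions))  # Q(s, a) = s @ w[:, a]
        self.gamma, self.lr, self.eps = gamma, lr, eps
        self.replay = deque(maxlen=1000)          # past experiences

    def act(self, state):
        if random.random() < self.eps:            # explore
            return random.randrange(self.w.shape[1])
        return int(np.argmax(state @ self.w))     # exploit

    def store(self, s, a, r, s_next):
        self.replay.append((s, a, r, s_next))

    def train_step(self, batch_size=8):
        batch = random.sample(list(self.replay), min(batch_size, len(self.replay)))
        for s, a, r, s_next in batch:
            target = r + self.gamma * np.max(s_next @ self.w)  # TD target
            td_error = target - s @ self.w[:, a]
            self.w[:, a] += self.lr * td_error * s             # gradient step
```

In the paper's setting, the reward would encode the risk and cost constraints (e.g. penalties for pressure buildup under the monitoring-only or brine-extraction strategies).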
Keywords: Reinforcement learning | Multistage decision-making | Deep autoregressive model | Deep Q network | Surrogate modeling | Markov decision process | Geological carbon sequestration
Towards optimal control of air handling units using deep reinforcement learning and recurrent neural network (2020)
A new generation of smart stormwater systems promises to reduce the need for new construction by enhancing the performance of the existing infrastructure through real-time control. Smart stormwater systems dynamically adapt their response to individual storms by controlling distributed assets, such as valves, gates, and pumps. This paper introduces a real-time control approach based on Reinforcement Learning (RL), which has emerged as a state-of-the-art methodology for autonomous control in the artificial intelligence community. Using a Deep Neural Network, an RL-based controller learns a control strategy by interacting with the system it controls, effectively trying various control strategies until converging on those that achieve a desired objective. This paper formulates and implements an RL algorithm for the real-time control of urban stormwater systems. This algorithm trains an RL agent to control valves in a distributed stormwater system across thousands of simulated storm scenarios, seeking to achieve water level and flow set-points in the system. The algorithm is first evaluated for the control of an individual stormwater basin, after which it is adapted to the control of multiple basins in a larger watershed (4 km²). The results indicate that RL can very effectively control individual sites. Performance is highly sensitive to the reward formulation of the RL agent. Generally, more explicit guidance led to better control performance, and more rapid and stable convergence of the learning process. While the control of multiple distributed sites also shows promise in reducing flooding and peak flows, the complexity of controlling larger systems comes with a number of caveats. The RL controller’s performance is very sensitive to the formulation of the Deep Neural Network and requires a significant amount of computational resources to achieve a reasonable performance enhancement.
Overall, the controlled system significantly outperforms the uncontrolled system, especially across storms of high intensity and duration. A frank discussion is provided, which should allow the benefits and drawbacks of RL to be considered when implementing it for the real-time control of stormwater systems. An open source implementation of the full simulation environment and control algorithms is also provided.
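The sensitivity to reward formulation noted in this abstract can be made concrete with a small sketch of a set-point-tracking reward for one basin; the target depth, flow limit, and penalty weights below are illustrative assumptions, not the paper's calibrated values:

```python
def stormwater_reward(depth, flow, depth_target=1.0, flow_max=0.5,
                      w_depth=1.0, w_flood=10.0):
    """Sketch of a set-point-tracking reward for a single basin:
    penalise deviation from the target water level and, more heavily,
    any flow above the allowed peak. All constants are illustrative."""
    r = -w_depth * abs(depth - depth_target)   # track the level set-point
    if flow > flow_max:
        r -= w_flood * (flow - flow_max)       # strong penalty on peak flows
    return r
```

More "explicit guidance" in the abstract's sense corresponds to shaping terms like these, as opposed to a sparse reward given only when flooding occurs.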
Keywords: Real-time control | Reinforcement learning | Smart stormwater systems
Rule-interposing deep reinforcement learning based energy management strategy for power-split hybrid electric vehicle (2020)
The optimization and training processes of a deep reinforcement learning (DRL) based energy management strategy (EMS) can be very slow and resource-intensive. In this paper, an improved energy management framework that embeds expert knowledge into the deep deterministic policy gradient (DDPG) algorithm is proposed. Incorporating the battery characteristics and the optimal brake specific fuel consumption (BSFC) curve of hybrid electric vehicles (HEVs), we address the optimization problem of multi-objective energy management with a large space of control variables. By incorporating this prior knowledge, the proposed framework not only accelerates the learning process but also achieves better fuel economy, thus making the energy management system relatively stable. The experimental results show that the proposed EMS outperforms the one without prior knowledge and other state-of-the-art deep reinforcement learning approaches. In addition, the proposed approach can be easily generalized to other types of HEV EMSs.
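As a sketch of the rule-interposing idea (expert rules filter the actor's action before it reaches the powertrain), the function below narrows a normalised continuous engine-power command using battery state-of-charge rules; the thresholds and action range are assumptions for illustration, not the paper's calibrated HEV limits:

```python
def rule_interposed_action(raw_action, soc, soc_min=0.3, soc_max=0.8,
                           a_low=-1.0, a_high=1.0):
    """Clamp the DDPG actor's continuous action (a power command
    normalised to [-1, 1]) into a feasible range derived from expert
    rules on battery state of charge. All thresholds are illustrative."""
    lo, hi = a_low, a_high
    if soc <= soc_min:          # battery low: forbid further discharge
        lo = 0.0
    elif soc >= soc_max:        # battery full: forbid further charging
        hi = 0.0
    return min(max(raw_action, lo), hi)
```

Because infeasible actions are never executed, the agent wastes no exploration on them, which is one way such prior knowledge can accelerate learning.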
Keywords: Energy management strategy | Hybrid electric vehicle | Expert knowledge | Deep deterministic policy gradient | Continuous action space
Deep reinforcement learning based energy management for a hybrid electric vehicle (2020)
This research proposes a reinforcement learning-based algorithm and a deep reinforcement learning-based algorithm for energy management of a series hybrid electric tracked vehicle. Firstly, the powertrain model of the series hybrid electric tracked vehicle (SHETV) is constructed, and the corresponding energy management formulation is established. Subsequently, a new variant of the reinforcement learning (RL) method Dyna, namely Dyna-H, is developed by combining a heuristic planning step with the Dyna agent and is applied to energy management control for the SHETV. Its rapidity and optimality are validated by comparison with dynamic programming (DP) and the conventional Dyna method. Facing the “curse of dimensionality” in the reinforcement learning method, a novel deep reinforcement learning algorithm, deep Q-learning (DQL), is designed for energy management control, which uses a new optimization method (AMSGrad) to update the weights of the neural network. The proposed deep reinforcement learning control system is then trained and verified on a realistic, high-precision driving condition and is compared with the benchmark DP method and the traditional DQL method. Results show that the proposed deep reinforcement learning method achieves faster training speed and lower fuel consumption than the traditional DQL policy, and its fuel economy closely approaches the global optimum. Furthermore, the adaptability of the proposed method is confirmed on another driving schedule.
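The AMSGrad optimizer mentioned in this abstract differs from Adam by keeping the running maximum of the second-moment estimate, which guarantees a non-increasing effective step size. A minimal sketch of one parameter update (hyperparameter values are the commonly used defaults, not necessarily the paper's):

```python
import numpy as np

def amsgrad_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update on parameters `w`. `state` carries the first
    moment, second moment, and running max of the second moment."""
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    v_hat = np.maximum(v_hat, v)                 # AMSGrad: keep the max
    w = w - lr * m / (np.sqrt(v_hat) + eps)
    return w, (m, v, v_hat)
```

In the paper's DQL, updates of this form would be applied to the Q-network weights in place of plain stochastic gradient descent.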
Keywords: Hybrid electric tracked vehicle | Energy management | Dyna-H | Deep reinforcement learning | AMSGrad optimizer
A new hybrid ensemble deep reinforcement learning model for wind speed short term forecasting (2020)
Wind speed forecasting is a promising solution to improve the efficiency of energy utilization. In this study, a novel hybrid wind speed forecasting model is proposed. The modeling process of the proposed model consists of three stages. In stage I, the empirical wavelet transform method reduces the non-stationarity of the original wind speed data by decomposing the original data into several sub-series. In stage II, three kinds of deep networks are utilized to build the forecasting model and calculate the prediction results for each sub-series. In stage III, the reinforcement learning method is used to combine the three kinds of deep networks, and the forecasting results of each sub-series are combined to obtain the final forecasting results. By comparing all the prediction results over three different types of wind speed series, it can be concluded that: (a) the proposed reinforcement learning based ensemble method is effective in integrating the three kinds of deep networks and works better than the traditional optimization-based ensemble method; (b) the proposed ensemble deep reinforcement learning based wind speed prediction model achieves accurate results in all cases and provides the best accuracy compared with sixteen alternative models and three state-of-the-art models.
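The stage III combination step can be sketched as a convex weighting of per-model forecasts; the weights below are fixed placeholders standing in for values the paper's reinforcement-learning agent would learn:

```python
import numpy as np

def combine_forecasts(model_preds, weights):
    """Weighted ensemble of forecasts. `model_preds` has shape
    (n_models, horizon); weights are normalised to sum to one so the
    result is a convex combination of the individual model forecasts."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalise to a convex combination
    return w @ np.asarray(model_preds, dtype=float)
```

The same combination would be applied per sub-series produced by the empirical wavelet transform, and the combined sub-series forecasts summed to reconstruct the final wind speed forecast.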
Keywords: Wind speed forecasting | Ensemble deep reinforcement learning | Empirical wavelet transform | Hybrid wind speed forecasting model
A deep reinforcement learning approach for real-time sensor-driven decision making and predictive analytics (2020)
The increased complexity of sensor-intensive systems with expensive subsystems and costly repairs and failures calls for efficient real-time control and decision making policies. Deep reinforcement learning has demonstrated great potential in addressing highly complex and challenging control and decision making problems. Despite its potential to derive real-time policies using real-time data for dynamic systems, it has rarely been used for sensor-driven maintenance-related problems. In this paper, we propose two novel decision making methods in which reinforcement learning and particle filtering are utilized for (i) deriving real-time maintenance policies and (ii) estimating remaining useful life for sensor-monitored degrading systems. The proposed framework introduces a new direction with many potential opportunities for system monitoring. To demonstrate the effectiveness of the proposed methods, numerical experiments are provided from a set of simulated data and a turbofan engine dataset provided by NASA.
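The particle-filtering component can be sketched as a bootstrap filter tracking a scalar degradation state, with a naive remaining-useful-life estimate from the same drift model; the drift, noise levels, and failure threshold are illustrative assumptions, not a calibrated turbofan model:

```python
import numpy as np

def particle_filter_step(particles, weights, observation, rng,
                         drift=0.1, proc_std=0.05, obs_std=0.1):
    """One bootstrap particle-filter step for a scalar degradation
    state: propagate particles through a linear-drift model, reweight
    by the Gaussian likelihood of the new sensor reading, resample."""
    particles = particles + drift + rng.normal(0.0, proc_std, particles.size)
    lik = np.exp(-0.5 * ((observation - particles) / obs_std) ** 2)
    weights = weights * lik
    weights = weights / weights.sum()
    idx = rng.choice(particles.size, size=particles.size, p=weights)
    return particles[idx], np.full(particles.size, 1.0 / particles.size)

def estimate_rul(particles, failure_threshold, drift=0.1):
    """RUL sketch: expected steps until the mean degradation state
    crosses the failure threshold under the same linear-drift model."""
    return max(0.0, (failure_threshold - particles.mean()) / drift)
```

In the paper's framework, the filtered state estimate would feed both the RL maintenance policy and the RUL prediction.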
Keywords: Particle filters | Deep reinforcement learning | Real-time control | Decision-making | Remaining useful life estimation
Deep reinforcement learning based AGVs real-time scheduling with mixed rule for flexible shop floor in industry 4.0 (2020)
Driven by recent advances in Industry 4.0 and industrial artificial intelligence, Automated Guided Vehicles (AGVs) have been widely used on flexible shop floors for material handling. However, great challenges arising from the high dynamics, complexity, and uncertainty of the shop floor environment still exist for AGV real-time scheduling. To address these challenges, an adaptive deep reinforcement learning (DRL) based AGV real-time scheduling approach with a mixed rule is proposed for the flexible shop floor to minimize the makespan and delay ratio. Firstly, the problem of AGV real-time scheduling is formulated as a Markov Decision Process (MDP) in which the state representation, action representation, reward function, and optimal mixed rule policy are described in detail. Then a novel deep Q-network (DQN) method is developed to achieve the optimal mixed rule policy, with which suitable dispatching rules and AGVs can be selected to execute the scheduling for various states. Finally, a case study based on a real-world flexible shop floor is illustrated and the results validate the feasibility and effectiveness of the proposed approach.
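The mixed-rule idea (the DQN's discrete action selects a dispatching rule, and the rule then selects a concrete task and AGV) can be sketched as below; the rule names and task/AGV fields are assumptions for illustration, not the paper's schema:

```python
def apply_dispatch_rule(rule, tasks, agvs):
    """Map a discrete rule choice to a concrete (task, AGV) pair.
    `tasks` and `agvs` are lists of dicts with assumed fields."""
    if rule == "FIFO":                # earliest-released task first
        task = min(tasks, key=lambda t: t["release"])
    elif rule == "SPT":               # shortest processing time first
        task = min(tasks, key=lambda t: t["proc_time"])
    else:                             # EDD: earliest due date first
        task = min(tasks, key=lambda t: t["due"])
    agv = min(agvs, key=lambda a: a["free_at"])  # earliest-available AGV
    return task, agv
```

Letting the agent pick among rules keeps the action space small and discrete, while the rules themselves handle the combinatorial detail of matching tasks to vehicles.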
Keywords: Automated guided vehicles | Real-time scheduling | Deep reinforcement learning | Industry 4.0
Dynamic selective maintenance optimization for multi-state systems over a finite horizon: A deep reinforcement learning approach (2020)
Selective maintenance, which aims to choose a subset of feasible maintenance actions to be performed for a repairable system with limited maintenance resources, has been extensively studied over the past decade. Most of the reported works on selective maintenance have been dedicated to maximizing the success of a single future mission. Cases of multiple consecutive missions, which are oftentimes encountered in engineering practice, have rarely been investigated to date. In this paper, a new selective maintenance optimization for multi-state systems that can execute multiple consecutive missions over a finite horizon is developed. The selective maintenance strategy can be dynamically optimized to maximize the expected number of future mission successes whenever the states and effective ages of the components become known at the end of the last mission. The dynamic optimization problem, which accounts for imperfect maintenance, is formulated as a discrete-time finite-horizon Markov decision process with a mixed integer-discrete-continuous state space. Based on the framework of actor-critic algorithms, a customized deep reinforcement learning method is put forth to overcome the “curse of dimensionality” and handle the uncountable state space. In our proposed method, a postprocess is developed for the actor to search for the optimal maintenance actions in a large-scale discrete action space, whereas the techniques of experience replay and the target network are utilized to facilitate the agent training. The performance of the proposed method is examined by an illustrative example and an engineering example of a coal transportation system.
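The actor postprocess described in this abstract (recovering a discrete maintenance action from a continuous actor output) can be sketched as a nearest-neighbour projection; the paper's search is more elaborate, and the binary action encoding here is an assumption for illustration:

```python
import numpy as np

def nearest_feasible_action(actor_output, feasible_actions):
    """Project the actor's continuous output vector onto the nearest
    feasible discrete maintenance action by Euclidean distance. Each
    action is encoded as a 0/1 vector over components (assumed)."""
    feasible = np.asarray(feasible_actions, dtype=float)
    dists = np.linalg.norm(feasible - np.asarray(actor_output, dtype=float),
                           axis=1)
    return tuple(int(x) for x in feasible[np.argmin(dists)])
```

In practice the feasible set would be restricted by the maintenance-resource budget, so the projection searches only actions the budget allows.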
Keywords: Maintenance | Dynamic selective maintenance | Deep reinforcement learning | Imperfect maintenance | Multi-state system
Study on deep reinforcement learning techniques for building energy consumption forecasting (2020)
Reliable and accurate building energy consumption prediction is becoming increasingly pivotal in building energy management. Currently, the data-driven approach has shown promising performance and gained much research attention due to its efficiency and flexibility. As a combination of reinforcement learning and deep learning, deep reinforcement learning (DRL) techniques are expected to solve nonlinear and complex issues. However, very little is known about DRL techniques in forecasting building energy consumption. Therefore, this paper presents a case study of an office building using three commonly-used DRL techniques to forecast building energy consumption, namely Asynchronous Advantage Actor-Critic (A3C), Deep Deterministic Policy Gradient (DDPG) and Recurrent Deterministic Policy Gradient (RDPG). The objective is to investigate the potential of DRL techniques in the building energy consumption prediction field. A comprehensive comparison between DRL models and common supervised models is also provided. The results demonstrate that the proposed DDPG and RDPG models have obvious advantages in forecasting building energy consumption compared to common supervised models, at the cost of more computation time for model training. Their prediction performance measured by mean absolute error (MAE) can be improved by 16%-24% for single-step-ahead prediction, and 19%-32% for multi-step-ahead prediction. The results also indicate that A3C delivers poor prediction accuracy and shows much slower convergence than DDPG and RDPG. However, A3C is still the most efficient technique among these three DRL methods. The findings are enlightening and the proposed DRL methodologies can be extended to other prediction problems, e.g., wind speed prediction and electricity load prediction.
Keywords: Energy consumption prediction | Ground source heat pump | Deep reinforcement learning | Asynchronous advantage Actor-Critic | Deep deterministic Policy gradient | Recurrent deterministic Policy gradient
Deep reinforcement learning to optimise indoor temperature control and heating energy consumption in buildings (2020)
In this work, Deep Reinforcement Learning (DRL) is implemented to control the supply water temperature setpoint to the terminal units of a heating system. The experiment was carried out for an office building in an integrated simulation environment. A sensitivity analysis is carried out on relevant hyperparameters to identify their optimal configuration. Moreover, two sets of input variables were considered to assess their impact on the adaptability of the DRL controller. In this context, both a static and a dynamic deployment of the DRL controller are performed. The trained control agent is tested in four different scenarios to determine its adaptability to the variation of forcing variables such as weather conditions, occupant presence patterns and different indoor temperature setpoint requirements. The performance of the agent is evaluated against a reference controller that implements a combination of rule-based and climate-based logics. As a result, when the set of input variables is adequately selected, a heating energy saving ranging between 5% and 12% is obtained, together with enhanced indoor temperature control, for both static and dynamic deployment. Finally, the study shows that if the set of input variables is not carefully selected, a dynamic deployment is strictly required to obtain good performance.
Keywords: Deep reinforcement learning | Building adaptive control | Energy efficiency | Temperature control | HVAC