Download and view articles related to Markov decision process :: Page 1

Search results - Markov decision process

Number of articles found: 38
No. Title Type
1 Deep Q learning based secure routing approach for OppIoT networks
A secure routing approach based on deep Q-learning for OppIoT networks (2022)
Opportunistic IoT (OppIoT) networks are a branch of IoT in which humans and machines collaborate to form a network for sharing data. The broad spectrum of devices and the ad-hoc nature of connections further aggravate the problem of network and data security. Traditional approaches, such as trust-based or cryptographic approaches, fail to preemptively secure these networks. Machine learning (ML) approaches, mainly deep reinforcement learning (DRL) methods, can prove very effective in ensuring the security of the network, as they are profoundly capable of solving complex and dynamic problems. Deep Q-learning (DQL) incorporates a deep neural network into the Q-learning process to deal with high-dimensional data. This paper proposes DQNSec, a routing approach for OppIoT based on DQL, for securing the network against attacks, viz. the sinkhole, hello flood, and distributed denial of service attacks. The actor–critic approach of DQL is utilized and OppIoT is modeled as a Markov decision process (MDP); a toy sketch of such a formulation follows this entry. Extensive simulations prove the efficiency of DQNSec in comparison to other ML-based routing protocols, viz. RFCSec, RLProph, CAML, and MLProph.
Keywords: OppIoT | Reinforcement learning | Deep learning | Deep Q-learning | Markov decision process | Sinkhole attack | Hello flood attack | Distributed denial of service attack
English article
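As a rough illustration of the formulation described above (not DQNSec itself, and with a tabular Q-table standing in for the deep Q-network), the following Python sketch learns to route around a hostile sinkhole node. The node graph, reward values, and hyperparameters are all invented for this example.

```python
import random

# Hypothetical toy network: states are nodes, actions are next-hop choices.
# Forwarding into the malicious "sink" node is penalized; delivery pays +1.
NEXT_HOPS = {"n0": ["n1", "sink"], "n1": ["n2", "sink"], "n2": ["dest"],
             "sink": [], "dest": []}
REWARD = {"dest": 1.0, "sink": -1.0}

Q = {(s, a): 0.0 for s in NEXT_HOPS for a in NEXT_HOPS[s]}
alpha, gamma, eps = 0.1, 0.9, 0.2

def choose(state):
    actions = NEXT_HOPS[state]
    if not actions:
        return None
    if random.random() < eps:                       # epsilon-greedy exploration
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

for _ in range(2000):                               # episodes of packet forwarding
    s = "n0"
    while (a := choose(s)) is not None:
        r = REWARD.get(a, 0.0)
        future = max((Q[(a, b)] for b in NEXT_HOPS[a]), default=0.0)
        Q[(s, a)] += alpha * (r + gamma * future - Q[(s, a)])  # Q-learning update
        s = a

print(max(NEXT_HOPS["n0"], key=lambda a: Q[("n0", a)]))        # safe next hop
```

After enough episodes the greedy next hop from n0 is n1, i.e. the learned policy routes around the sinkhole.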
2 Optimal carbon storage reservoir management through deep reinforcement learning
Optimal management of carbon storage reservoirs through deep reinforcement learning (2020)
Model-based optimization plays a central role in energy system design and management. The complexity and high dimensionality of many process-level models, especially those used for geosystem energy exploration and utilization, often lead to formidable computational costs when the dimension of the decision space is also large. This work adopts elements of recently advanced deep learning techniques to solve a sequential decision-making problem in applied geosystem management. Specifically, a deep reinforcement learning framework was formed for optimal multiperiod planning, in which a deep Q-learning network (DQN) agent was trained to maximize rewards by learning from high-dimensional inputs and from exploitation of its past experiences (a minimal sketch of such an agent's action selection follows this entry). To expedite computation, deep multitask learning was used to approximate high-dimensional, multistate transition functions. Both the DQN and deep multitask learning are pattern-based. As a demonstration, the framework was applied to optimal carbon sequestration reservoir planning using two different types of management strategies: monitoring only and brine extraction. Both strategies are designed to mitigate potential risks due to pressure buildup. Results show that the DQN agent can identify optimal policies that maximize the reward for given risk and cost constraints. Experiments also show that the knowledge the agent gains from interacting with one environment is largely preserved when the same agent is deployed in other, similar environments.
Keywords: Reinforcement learning | Multistage decision-making | Deep autoregressive model | Deep Q network | Surrogate modeling | Markov decision process | Geological carbon sequestration
English article
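Below is a minimal sketch of the kind of DQN action selection the abstract describes: a small network scores the two management strategies (monitor only vs. brine extraction) for a state vector of reservoir observations. The architecture, observation size, and strategy encoding are assumptions; the paper's multitask surrogate model is omitted.

```python
import torch
import torch.nn as nn

# Hypothetical setup: the state is a vector of reservoir observations
# (e.g. pressures at monitoring wells); actions are 0 = monitor only,
# 1 = extract brine.
class QNet(nn.Module):
    def __init__(self, n_obs=8, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_obs, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

qnet = QNet()
state = torch.randn(1, 8)                    # placeholder observation
with torch.no_grad():
    q_values = qnet(state)
action = int(q_values.argmax(dim=1))         # greedy action wrt learned Q
print("chosen strategy:", ["monitor", "extract brine"][action])
```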
3 Deep reinforcement learning based AGVs real-time scheduling with mixed rule for flexible shop floor in industry 4.0
Deep reinforcement learning based real-time scheduling of AGVs with a mixed rule for the flexible shop floor in Industry 4.0 (2020)
Driven by recent advances in Industry 4.0 and industrial artificial intelligence, Automated Guided Vehicles (AGVs) have been widely used on flexible shop floors for material handling. However, great challenges arising from the high dynamics, complexity, and uncertainty of the shop-floor environment still exist in AGV real-time scheduling. To address these challenges, an adaptive deep reinforcement learning (DRL) based real-time AGV scheduling approach with a mixed rule is proposed for the flexible shop floor to minimize the makespan and delay ratio. Firstly, the AGV real-time scheduling problem is formulated as a Markov Decision Process (MDP) in which the state representation, action representation, reward function, and optimal mixed-rule policy are described in detail (a toy encoding of the mixed-rule action space follows this entry). Then a novel deep Q-network (DQN) method is developed to achieve the optimal mixed-rule policy, with which suitable dispatching rules and AGVs can be selected to execute the scheduling in various states. Finally, a case study based on a real-world flexible shop floor is illustrated, and the results validate the feasibility and effectiveness of the proposed approach.
Keywords: Automated guided vehicles | Real-time scheduling | Deep reinforcement learning | Industry 4.0
English article
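The mixed-rule idea described above can be made concrete with a toy encoding: the agent's action is a (dispatching rule, AGV) pair rather than a direct task assignment. The rule names, state fields, and reward shaping below are illustrative placeholders, not the paper's exact design.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical mixed-rule action space: the agent picks a heuristic
# dispatching rule plus the AGV that will execute it.
RULES = ["FIFO", "EDD", "SPT", "NEAREST"]      # candidate dispatching rules
AGVS = ["agv0", "agv1", "agv2"]

ACTIONS = list(product(RULES, AGVS))           # |A| = len(RULES) * len(AGVS)

@dataclass
class State:
    pending_tasks: int
    agv_positions: tuple
    time: float

def reward(makespan_before: float, makespan_after: float, delayed: int) -> float:
    # Shaped to penalize makespan growth and the number of delayed tasks,
    # mirroring the two objectives named in the abstract.
    return -(makespan_after - makespan_before) - 0.5 * delayed

print(f"{len(ACTIONS)} composite actions, e.g. {ACTIONS[0]}")
```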
4 Dynamic selective maintenance optimization for multi-state systems over a finite horizon: A deep reinforcement learning approach
Dynamic selective maintenance optimization for multi-state systems over a finite horizon: a deep reinforcement learning approach (2020)
Selective maintenance, which aims to choose a subset of feasible maintenance actions to be performed on a repairable system with limited maintenance resources, has been extensively studied over the past decade. Most reported works on selective maintenance have been dedicated to maximizing the success of a single future mission. Cases of multiple consecutive missions, which are oftentimes encountered in engineering practice, have rarely been investigated to date. In this paper, a new selective maintenance optimization for multi-state systems that can execute multiple consecutive missions over a finite horizon is developed. The selective maintenance strategy can be dynamically optimized to maximize the expected number of future mission successes whenever the states and effective ages of the components become known at the end of the last mission. The dynamic optimization problem, which accounts for imperfect maintenance, is formulated as a discrete-time finite-horizon Markov decision process with a mixed integer-discrete-continuous state space. Based on the framework of actor-critic algorithms, a customized deep reinforcement learning method is put forth to overcome the "curse of dimensionality" and cope with the uncountable state space. In the proposed method, a postprocess is developed for the actor to search for the optimal maintenance actions in a large-scale discrete action space, whereas the techniques of experience replay and the target network are utilized to facilitate agent training (a minimal sketch of these two stabilization devices follows this entry). The performance of the proposed method is examined on an illustrative example and an engineering example of a coal transportation system.
Keywords: Maintenance | Dynamic selective maintenance | Deep reinforcement learning | Imperfect maintenance | Multi-state system
English article
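The abstract mentions experience replay and a target network as the devices used to stabilize agent training. The following is a generic sketch of both; buffer size, batch size, and the soft-update rate tau are placeholders, and the paper's actor postprocess is not reproduced.

```python
import random
from collections import deque

import numpy as np

# Generic replay buffer: store transitions, sample decorrelated minibatches.
class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buf, batch_size)
        return map(np.array, zip(*batch))

def update_target(online_params, target_params, tau=0.01):
    # Soft update: target <- tau * online + (1 - tau) * target.
    return [tau * w + (1 - tau) * t for w, t in zip(online_params, target_params)]

buf = ReplayBuffer()
for t in range(100):                       # fake transitions for illustration
    buf.push(np.random.rand(4), 0, -1.0, np.random.rand(4), False)
states, actions, rewards, next_states, dones = buf.sample()
print(states.shape, rewards.mean())
```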
5 PALO Bounds for Reinforcement Learning in Partially Observable Stochastic Games
PALO bounds for reinforcement learning in partially observable stochastic games (2020)
A partially observable stochastic game (POSG) is a general model for multiagent decision making under uncertainty. Perkins' Monte Carlo exploring starts for partially observable Markov decision processes (POMDPs) (MCES-P) integrates Monte Carlo exploring starts (MCES) into a local search of the policy space to offer an elegant template for model-free reinforcement learning in POSGs. However, multiagent reinforcement learning in POSGs is tremendously more complex than in single-agent settings due to the heterogeneity of agents and the discrepancy of their goals. In this article, we generalize reinforcement learning under partial observability to self-interested and cooperative multiagent settings under the POSG umbrella. We present three new templates for multiagent reinforcement learning in POSGs. MCES for interactive POMDPs (MCES-IP) extends MCES-P by maintaining predictions of the other agent's actions based on dynamic beliefs over models. MCES for multiagent POMDPs (MCES-MP) generalizes MCES-P to the canonical multiagent POMDP framework, with a single policy mapping joint observations of all agents to joint actions. Finally, MCES for factored-reward multiagent POMDPs (MCES-FMP) has each agent individually map joint observations to its own action. We use probabilistic approximate locally optimal (PALO) bounds to analyze sample complexity, thereby instantiating these templates to PALO learning (a stub of this bounded local policy search follows this entry). We promote sample efficiency by including a policy space pruning technique, and we evaluate the approaches on six benchmark domains against state-of-the-art techniques; the results demonstrate that MCES-IP and MCES-FMP yield improved policies with fewer samples than the previous baselines.
Keywords: multiagent systems | reinforcement learning | POMDP | POSG
English article
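As a loose stub of the MCES-P template that the three new methods extend, the sketch below perturbs a policy one state at a time and accepts a neighbor only when its estimated gain exceeds a Hoeffding-style threshold, in the spirit of PALO acceptance. The stub environment, evaluation function, and constants are invented.

```python
import math
import random

def rollout_return(policy, episodes=200):
    # Hypothetical noisy Monte Carlo evaluation of a policy's expected return.
    base = sum(policy.values())
    return sum(base + random.gauss(0, 1) for _ in range(episodes)) / episodes

def hoeffding_eps(n, delta=0.05, value_range=2.0):
    # Hoeffding-style deviation bound used as the acceptance threshold.
    return value_range * math.sqrt(math.log(2 / delta) / (2 * n))

policy = {"s0": 0, "s1": 0}                     # state -> action index
current = rollout_return(policy)
for state in list(policy):                      # local search over neighbors
    for action in (0, 1):
        neighbour = dict(policy, **{state: action})
        cand = rollout_return(neighbour)
        if cand - current > 2 * hoeffding_eps(200):   # accept only clear gains
            policy, current = neighbour, cand
print(policy, round(current, 3))
```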
6 Adaptive early classification of temporal sequences using deep reinforcement learning
Adaptive early classification of temporal sequences using deep reinforcement learning (2020)
In this article, we address the problem of early classification (EC) of temporal sequences with adaptive prediction times. We frame EC as a sequential decision-making problem and define a partially observable Markov decision process (POMDP) that fits the competing objectives of classification earliness and accuracy (a toy version of this delay-or-classify loop follows this entry). We solve the POMDP by training an agent for EC with deep reinforcement learning (DRL). The agent learns to make adaptive decisions between classifying an incomplete sequence now or delaying its prediction to gather more measurements. We adapt an existing DRL algorithm for batch and online learning of the agent's action value function with a deep neural network. We propose strategies of prioritized sampling, prioritized storing, and random episode initialization to address the fact that the agent's memory is unbalanced because (1) all but one of its actions terminate the process, and thus (2) classification actions are less frequent than the delay action. In experiments, we show improvements in accuracy induced by our specific adaptation of the algorithm used for online learning of the agent's action value function. Moreover, we compare two definitions of the POMDP, based on delay reward shaping versus reward discounting. Finally, we demonstrate that a static naive deep neural network, i.e. one trained to classify at static times, is less efficient in terms of the accuracy-versus-speed trade-off than the equivalent network trained with adaptive decision-making capabilities.
Keywords: Early classification | Adaptive prediction time | Deep reinforcement learning | Temporal sequences | Double DQN | Trade-off between accuracy and speed
English article
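A toy version of the delay-or-classify loop framed above: at each step the agent either pays a small delay cost to see one more measurement, or commits to a class and receives the classification reward. The confidence rule, costs, and classifier here are crude stand-ins for the learned action value function.

```python
import random

DELAY_COST = 0.01          # reward shaping: earliness pressure

def classify_now(prefix):
    # Hypothetical incomplete-sequence classifier: sign of the running mean.
    return 1 if sum(prefix) / len(prefix) > 0 else 0

def episode(sequence, true_label, max_delay=10):
    reward = 0.0
    for t in range(1, min(max_delay, len(sequence)) + 1):
        confident = abs(sum(sequence[:t]) / t) > 0.5   # crude policy stand-in
        if confident or t == max_delay:
            pred = classify_now(sequence[:t])
            return reward + (1.0 if pred == true_label else -1.0), t
        reward -= DELAY_COST                           # cost of waiting longer
    return reward, len(sequence)

seq = [random.gauss(0.3, 1.0) for _ in range(20)]
print(episode(seq, true_label=1))                      # (reward, decision time)
```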
7 ADRL: An attention-based deep reinforcement learning framework for knowledge graph reasoning
ADRL: an attention-based deep reinforcement learning framework for knowledge graph reasoning (2020)
Knowledge graph reasoning is one of the key technologies for knowledge graph construction and plays an important part in application scenarios such as vertical search and intelligent question answering. It is intended to infer the desired entity from the entities and relations that already exist in the knowledge graph. Most current reasoning methods, such as embedding-based methods, globally embed all entities and relations and then use vector similarity to infer relations between entities or whether given triples are true. However, in real application scenarios, we require a clear and interpretable target entity as the output answer. In this paper, we propose a novel attention-based deep reinforcement learning framework (ADRL) for learning multi-hop relational paths, which improves the efficiency, generalization capacity, and interpretability of conventional approaches through the structured perception of deep learning and the relational reasoning of reinforcement learning. We define the entire process of reasoning as a Markov decision process (a skeleton of such a path-walking policy follows this entry). First, we employ a CNN to map the knowledge graph to a low-dimensional space and a message-passing mechanism to sense neighbor entities at each level, and then employ an LSTM to memorize and generate the sequence of historical trajectories that forms the policy and value functions. We design a relational module that includes a self-attention mechanism which can infer and share the weights of neighborhood entity vectors and relation vectors. Finally, we employ the actor–critic algorithm to optimize the entire framework. Experiments confirm the effectiveness and efficiency of our method on several benchmark data sets.
Keywords: Knowledge graph | Knowledge reasoning | Reinforcement learning | Deep learning | Attention
English article
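The following skeleton illustrates the path-walking policy structure the abstract outlines: embedded (entity, relation) steps feed an LSTM, and a linear head scores candidate relations as actions. Dimensions and tensors are illustrative; the CNN embedding stage, message passing, self-attention module, and actor-critic training are omitted.

```python
import torch
import torch.nn as nn

# Illustrative sizes for a tiny knowledge graph.
N_ENT, N_REL, DIM = 100, 20, 32

ent_emb = nn.Embedding(N_ENT, DIM)
rel_emb = nn.Embedding(N_REL, DIM)
lstm = nn.LSTM(input_size=2 * DIM, hidden_size=DIM, batch_first=True)
policy_head = nn.Linear(DIM, N_REL)          # scores each relation as an action

# History of (entity, relation) steps taken so far on the graph walk.
entities = torch.tensor([[3, 17, 42]])
relations = torch.tensor([[0, 5, 5]])
steps = torch.cat([ent_emb(entities), rel_emb(relations)], dim=-1)

out, _ = lstm(steps)                          # (1, seq_len, DIM)
logits = policy_head(out[:, -1])              # act from the last hidden state
action = torch.distributions.Categorical(logits=logits).sample()
print("next relation to follow:", int(action))
```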
8 Deep reinforcement learning for condition-based maintenance planning of multi-component systems under dependent competing risks
Deep reinforcement learning for condition-based maintenance planning of multi-component systems under dependent competing risks (2020)
Condition-based maintenance (CBM) planning for multi-component systems has been receiving increasing attention in recent years. Most existing research on CBM assumes that preventive maintenance should be conducted when the degradation of system components reaches specific threshold levels upon inspection. However, the search for optimal maintenance threshold levels is often efficient for low-dimensional CBM but becomes challenging as the number of components grows large, especially when those components are subject to complex dependencies. To overcome this challenge, in this paper we propose a novel and flexible CBM model based on customized deep reinforcement learning for multi-component systems with dependent competing risks. Both stochastic and economic dependencies among the components are considered. Specifically, unlike the threshold-based decision-making paradigm used in traditional CBM, the proposed model directly maps the multi-component degradation measurements at each inspection epoch to the maintenance decision space with a cost-minimization objective (a brute-force toy version of this mapping follows this entry), and the leverage of deep reinforcement learning enables high computational efficiency, making the proposed model suitable for both low- and high-dimensional CBM. Various numerical studies are conducted for model validation.
Keywords: Maintenance | Markov decision process | Deep Q network | Failure dependency | Cost minimization
English article
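To make the decision mapping concrete, the toy below maps a vector of inspected degradation levels directly to one maintenance action per component by brute-force cost minimization; this enumeration is a stand-in for the paper's deep Q-network, and all costs, thresholds, and repair effects are invented.

```python
from itertools import product

import numpy as np

ACTIONS = ["none", "imperfect_repair", "replace"]
ACTION_COST = {"none": 0.0, "imperfect_repair": 2.0, "replace": 5.0}
FAILURE_COST = 20.0

def expected_cost(degradation, plan):
    # One-period cost: maintenance spend plus a failure penalty that grows
    # with the degradation left after the chosen action.
    cost = 0.0
    for level, act in zip(degradation, plan):
        residual = {"none": level, "imperfect_repair": 0.4 * level, "replace": 0.0}[act]
        cost += ACTION_COST[act] + FAILURE_COST * min(1.0, residual) ** 2
    return cost

degradation = np.array([0.1, 0.7, 0.9])       # inspected component states
plans = list(product(ACTIONS, repeat=len(degradation)))
best = min(plans, key=lambda p: expected_cost(degradation, p))
print(best, round(expected_cost(degradation, best), 2))
```

The enumeration grows as 3^n in the number of components, which is exactly the dimensionality problem that motivates replacing it with a learned Q-function.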
9 A Markov Decision Process approach for balancing intelligence and interdiction operations in city-level drug trafficking enforcement
A Markov decision process approach for balancing intelligence and interdiction operations in city-level drug trafficking enforcement (2020)
We study a resource allocation problem in which law enforcement aims to balance intelligence and interdiction decisions to fight illegal city-level drug trafficking. We propose a Markov decision process framework, apply a column generation technique, and develop a heuristic to solve this problem (a tiny numeric MDP of the target-or-arrest choice follows this entry). Our approaches provide insights into how law enforcement should prioritize its actions when multiple criminals of different types are known to them. We prove that when only one action can be implemented, law enforcement will take action (either target or arrest) on the highest criminal type known to them. Our results demonstrate that: (i) it may be valuable to diversify the actions taken on the same criminal type when more than one action can be implemented; (ii) the marginal improvement in the value of criminals interdicted per unit time from increasing available resources decreases as the resource level increases; and (iii) losses arise from not holistically planning the actions of all available resources across distinct operations against drug trafficking networks.
Keywords: Markov decision process | Intelligence and interdiction operations | Illegal drug supply chains | Column generation
English article
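A tiny numeric MDP in the spirit of the target-or-arrest question studied above, solved by plain value iteration rather than the paper's column generation: states index the highest criminal type currently known (1 to 3), and the two operations trade immediate interdiction value against learning about higher types. All transition probabilities and interdiction values are invented for illustration.

```python
import numpy as np

n_states = 3                                  # highest known criminal type
gamma = 0.95

# P[a, s, s'] transition probabilities; R[a, s] expected interdiction value.
P = np.array([
    [[0.2, 0.5, 0.3], [0.1, 0.4, 0.5], [0.1, 0.2, 0.7]],   # target: learn more
    [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.4, 0.3, 0.3]],   # arrest: act now
])
R = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 4.0]])            # value grows with type

V = np.zeros(n_states)
for _ in range(500):                          # value iteration to a fixed point
    V = np.max(R + gamma * (P @ V), axis=0)
policy = np.argmax(R + gamma * (P @ V), axis=0)
print("value:", V.round(2), "policy (0=target, 1=arrest):", policy)
```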
10 Deep reinforcement learning algorithm for dynamic pricing of express lanes with multiple access locations
A deep reinforcement learning algorithm for dynamic pricing of express lanes with multiple access locations (2020)
This article develops a deep reinforcement learning (Deep-RL) framework for dynamic pricing on managed lanes with multiple access locations and heterogeneity in travelers' value of time, origin, and destination. This framework relaxes assumptions in the literature by considering multiple origins and destinations, multiple access locations to the managed lane, en-route diversion of travelers, partial observability of the sensor readings, and stochastic demand and observations. The problem is formulated as a partially observable Markov decision process (POMDP), and policy gradient methods are used to determine tolls as a function of real-time observations. Tolls are modeled as continuous and stochastic variables and are determined using a feedforward neural network (a sketch of such a Gaussian toll policy follows this entry). The method is compared against a feedback control method used for dynamic pricing. We show that Deep-RL is effective in learning toll policies that maximize revenue, minimize total system travel time, and optimize other jointly weighted objectives when tested on real-world transportation networks. The Deep-RL toll policies outperform the feedback control heuristic on the revenue-maximization objective by generating revenues up to 8.5% higher than the heuristic, and on the objective of minimizing total system travel time (TSTT) by producing TSTT up to 8.4% lower than the heuristic. We also propose reward shaping methods for the POMDP to overcome undesired behavior of toll policies, such as the jam-and-harvest behavior of revenue-maximizing policies. Additionally, we test the transferability of the algorithm trained on one set of inputs to new input distributions and offer recommendations on real-time implementations of Deep-RL algorithms.
Keywords: Managed lanes | Express lanes | High occupancy/toll (HOT) lanes | Dynamic pricing | Deep reinforcement learning | Traffic control | Feedback control heuristic
English article
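A sketch of the continuous stochastic toll policy described above, under assumed shapes: a feedforward network outputs the mean of a Gaussian from which the toll is sampled, and a REINFORCE-style step pushes up the log-probability of tolls that earned high reward. The observation size, toll bounds, and reward value are placeholders.

```python
import torch
import torch.nn as nn

class TollPolicy(nn.Module):
    def __init__(self, n_obs=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_obs, 64), nn.Tanh())
        self.mean = nn.Linear(64, 1)
        self.log_std = nn.Parameter(torch.zeros(1))   # learned toll variability

    def forward(self, obs):
        h = self.body(obs)
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())

policy = TollPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(1, 16)                     # partial sensor observation
dist = policy(obs)
toll = dist.sample().clamp(0.5, 10.0)        # tolls bounded to a feasible range
reward = torch.tensor([[3.2]])               # e.g. revenue net of travel-time cost

loss = -(dist.log_prob(toll) * reward).mean()   # REINFORCE-style gradient step
opt.zero_grad()
loss.backward()
opt.step()
print("sampled toll:", float(toll))
```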