No. | Title | Type |
---|---|---|
1 |
Deep Q learning based secure routing approach for OppIoT networks
Deep Q-learning based secure routing approach for OppIoT networks (2022). Opportunistic IoT (OppIoT) networks are a branch of IoT in which humans and machines
collaborate to form a network for sharing data. The broad spectrum of devices and the ad-hoc
nature of connections further aggravate the problem of network and data security. Traditional
approaches like trust based approaches or cryptographic approaches fail to preemptively secure
these networks. Machine learning (ML) approaches, mainly deep reinforcement learning (DRL)
methods can prove to be very effective in ensuring the security of the network as they
are profoundly capable of solving complex and dynamic problems. Deep Q-learning (DQL)
incorporates a deep neural network into the Q-learning process for dealing with high-dimensional
data. This paper proposes a routing approach for OppIoT, DQNSec, based on DQL for securing the
network against attacks, viz. sinkhole, hello flood, and distributed denial of service attacks. The
actor–critic approach of DQL is utilized and OppIoT is modeled as a Markov decision process
(MDP). Extensive simulations prove the efficiency of DQNSec in comparison to other ML based
routing protocols, viz. RFCSec, RLProph, CAML and MLProph.
Keywords: OppIoT | Reinforcement learning | Deep learning | Deep Q-learning | Markov decision process | Sinkhole attack | Hello flood attack | Distributed denial of service attack |
English article |
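The deep Q-learning at the core of DQNSec extends the classical tabular Q-learning update with a neural-network function approximator. A minimal sketch of that underlying update (the states, relay names, and rewards here are illustrative, not taken from the paper):

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q[state][action] toward the bootstrapped
    target reward + gamma * max_a' Q[next_state][a']."""
    best_next = max(Q[next_state].values()) if Q[next_state] else 0.0
    target = reward + gamma * best_next
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]

# Toy routing step: forwarding via a trusted relay yields reward 1.0.
Q = {"s0": {"relay_a": 0.0, "relay_b": 0.0}, "s1": {}}
new_q = q_update(Q, "s0", "relay_a", reward=1.0, next_state="s1")  # 0.1
```

DQL replaces the table `Q` with a network trained on the same target, which is what makes the approach workable for high-dimensional OppIoT state spaces.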
2 |
Optimal carbon storage reservoir management through deep reinforcement learning
Optimal carbon storage reservoir management through deep reinforcement learning (2020). Model-based optimization plays a central role in energy system design and management. The complexity and
high-dimensionality of many process-level models, especially those used for geosystem energy exploration
and utilization, often lead to formidable computational costs when the dimension of decision space is also
large. This work adopts elements of recently advanced deep learning techniques to solve a sequential decision-making
problem in applied geosystem management. Specifically, a deep reinforcement learning framework was
formed for optimal multiperiod planning, in which a deep Q-learning network (DQN) agent was trained to
maximize rewards by learning from high-dimensional inputs and from exploitation of its past experiences. To
expedite computation, deep multitask learning was used to approximate high-dimensional, multistate transition
functions. Both DQN and deep multitask learning are pattern based. As a demonstration, the framework was
applied to optimal carbon sequestration reservoir planning using two different types of management strategies:
monitoring only and brine extraction. Both strategies are designed to mitigate potential risks due to pressure
buildup. Results show that the DQN agent can identify the optimal policies to maximize the reward for given
risk and cost constraints. Experiments also show that the knowledge the agent gained from interacting with one
environment is largely preserved when deploying the same agent in other similar environments. Keywords: Reinforcement learning | Multistage decision-making | Deep autoregressive model | Deep Q network | Surrogate modeling | Markov decision process | Geological carbon sequestration |
English article |
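The DQN agent's "exploitation of its past experiences" refers to experience replay: storing transitions in a fixed-size buffer and training on random minibatches drawn from it. A minimal sketch under that assumption (the transition contents are illustrative):

```python
from collections import deque
import random

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state) transitions;
    the oldest transition is evicted once capacity is reached."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push((t, "extract_brine", -0.1, t + 1))  # only the last 3 survive
batch = buf.sample(2)
```

Sampling uniformly from past experience breaks the temporal correlation between consecutive reservoir states, which stabilizes DQN training.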
3 |
Deep reinforcement learning based AGVs real-time scheduling with mixed rule for flexible shop floor in industry 4.0
AGV real-time scheduling based on deep reinforcement learning with a mixed rule for the flexible shop floor in Industry 4.0 (2020). Driven by recent advances in Industry 4.0 and industrial artificial intelligence, Automated Guided Vehicles
(AGVs) have been widely used on flexible shop floors for material handling. However, great challenges arising from
the high dynamics, complexity, and uncertainty of the shop floor environment still exist in AGV real-time
scheduling. To address these challenges, an adaptive deep reinforcement learning (DRL) based AGV real-time
scheduling approach with a mixed rule is proposed for the flexible shop floor to minimize the makespan and
delay ratio. Firstly, the problem of AGVs real-time scheduling is formulated as a Markov Decision Process (MDP)
in which the state representation, action representation, reward function, and optimal mixed rule policy are
described in detail. Then a novel deep Q-network (DQN) method is further developed to achieve the optimal
mixed rule policy with which the suitable dispatching rules and AGVs can be selected to execute the scheduling
towards various states. Finally, the case study based on a real-world flexible shop floor is illustrated and the
results validate the feasibility and effectiveness of the proposed approach. Keywords: Automated guided vehicles | Real-time scheduling | Deep reinforcement learning | Industry 4.0 |
English article |
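The "mixed rule" policy described above selects among candidate dispatching rules rather than raw actions; a DQN agent typically makes this choice epsilon-greedily over the rules' Q-value estimates. A hedged sketch (the rule names and values are illustrative, not the paper's):

```python
import random

DISPATCH_RULES = ["FIFO", "shortest_distance", "earliest_due_date"]

def select_rule(q_values, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over dispatching rules: explore with
    probability epsilon, otherwise pick the rule with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.choice(DISPATCH_RULES)
    return max(q_values, key=q_values.get)

# With epsilon=0 the choice is purely greedy on the current estimates.
rule = select_rule(
    {"FIFO": 0.2, "shortest_distance": 0.8, "earliest_due_date": 0.5},
    epsilon=0.0)
```

The selected rule then determines which AGV and job are dispatched for the current shop-floor state, so the DQN only has to rank a small, fixed action set.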
4 |
Dynamic selective maintenance optimization for multi-state systems over a finite horizon: A deep reinforcement learning approach
Dynamic selective maintenance optimization for multi-state systems over a finite horizon: a deep reinforcement learning approach (2020). Selective maintenance, which aims to choose a subset of feasible maintenance actions to be performed for a repairable system with limited maintenance resources, has been extensively studied over the past decade. Most of the reported works on selective maintenance have been dedicated to maximizing the success of a single future mission. Cases of multiple consecutive missions, which are oftentimes encountered in engineering practice, have rarely been investigated to date. In this paper, a new selective maintenance optimization for multi-state systems that can execute multiple consecutive missions over a finite horizon is developed. The selective maintenance strategy can be dynamically optimized to maximize the expected number of future mission successes whenever the states and effective ages of the components become known at the end of the last mission. The dynamic optimization problem, which accounts for imperfect maintenance, is formulated as a discrete-time finite-horizon Markov decision process with a mixed integer-discrete-continuous state space. Based on the framework of actor-critic algorithms, a customized deep reinforcement learning method is put forth to overcome the “curse of dimensionality” and mitigate the uncountable state space. In our proposed method, a postprocess is developed for the actor to search for the optimal maintenance actions in a large-scale discrete action space, whereas the techniques of experience replay and the target network are utilized to facilitate agent training. The performance of the proposed method is examined by an illustrative example and an engineering example of a coal transportation system. Keywords: Maintenance | Dynamic selective maintenance | Deep reinforcement learning | Imperfect maintenance | Multi-state system |
English article |
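In actor-critic methods like the one above, the critic's one-step TD error commonly serves as the advantage signal that scales the actor's policy update. A minimal sketch of that computation (the numbers are illustrative, not from the paper):

```python
def td_advantage(reward, value_s, value_next, gamma=0.99):
    """One-step TD error used as the advantage in actor-critic training:
    positive means the chosen action did better than the critic expected,
    so the actor increases that action's probability."""
    return reward + gamma * value_next - value_s

# A maintenance action that earned reward 1.0 from a state the critic
# valued at 0.5, landing in a terminal state valued at 0.0.
adv = td_advantage(reward=1.0, value_s=0.5, value_next=0.0)  # 0.5
```

Using the critic's value estimate as a baseline in this way reduces the variance of the actor's gradient, which matters in the paper's large mixed state space.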
5 |
PALO Bounds for Reinforcement Learning in Partially Observable Stochastic Games
PALO bounds for reinforcement learning in partially observable stochastic games (2020). A partially observable stochastic game (POSG) is a general model for multiagent decision making under uncertainty. Perkins’ Monte Carlo exploring starts for partially observable Markov decision processes (POMDPs) (MCES-P) integrates Monte Carlo exploring starts (MCES) into a local search of the policy space to offer an elegant template for model-free reinforcement learning in POSGs. However, multiagent reinforcement learning in POSGs is tremendously more complex than in single-agent settings due to the heterogeneity of agents and the discrepancy of their goals. In this article, we generalize reinforcement learning under partial observability to self-interested and cooperative multiagent settings under the POSG umbrella. We present three new templates for multiagent reinforcement learning in POSGs. MCES for interactive POMDP (MCES-IP) extends MCES-P by maintaining predictions of the other agent’s actions based on dynamic beliefs over models. MCES for multiagent POMDP (MCES-MP) generalizes MCES-P to the canonical multiagent POMDP framework, with a single policy mapping joint observations of all agents to joint actions. Finally, MCES for factored-reward multiagent POMDP (MCES-FMP) has each agent individually mapping joint observations to its own actions. We use probabilistic approximate locally optimal (PALO) bounds to analyze sample complexity, thereby instantiating these templates to PALO learning. We promote sample efficiency by including a policy space pruning technique, and we evaluate the approaches on six benchmark domains and compare them with state-of-the-art techniques, which demonstrates that MCES-IP and MCES-FMP yield improved policies with fewer samples than the previous baselines. Keywords: multiagent systems | reinforcement learning | POMDP | POSG |
English article |
6 |
Adaptive early classification of temporal sequences using deep reinforcement learning
Adaptive early classification of temporal sequences using deep reinforcement learning (2020). In this article, we address the problem of early classification (EC) of temporal sequences with adaptive
prediction times. We frame EC as a sequential decision making problem and we define a partially
observable Markov decision process (POMDP) fitting the competitive objectives of classification
earliness and accuracy. We solve the POMDP by training an agent for EC with deep reinforcement
learning (DRL). The agent learns to make adaptive decisions between classifying incomplete sequences
now or delaying its prediction to gather more measurements. We adapt an existing DRL algorithm for
batch and online learning of the agent’s action value function with a deep neural network. We propose
strategies of prioritized sampling, prioritized storing and random episode initialization to address the
fact that the agent’s memory is unbalanced because (1) all but one of its actions terminate the process,
and thus (2) actions of classification are less frequent than the action of delay. In experiments, we
show improvements in accuracy induced by our specific adaptation of the algorithm used for online
learning of the agent’s action value function. Moreover, we compare two definitions of the POMDP
based on delay reward shaping against reward discounting. Finally, we demonstrate that a static naive
deep neural network, i.e. one trained to classify at static times, is less efficient in terms of accuracy versus
speed than the equivalent network trained with adaptive decision-making capabilities. Keywords: Early classification | Adaptive prediction time | Deep reinforcement learning | Temporal sequences | Double DQN | Trade-off between accuracy and speed |
English article |
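The agent's adaptive decision at each time step, committing to a class now versus waiting for more measurements, reduces to comparing action values. A minimal sketch (the action-value numbers and class labels are illustrative, not from the paper):

```python
def act(q_delay, q_classes):
    """Early-classification step: compare the value of the single 'delay'
    action against the best classification action and pick the larger."""
    best_label, best_q = max(q_classes.items(), key=lambda kv: kv[1])
    if q_delay >= best_q:
        return "delay", None       # gather another measurement
    return "classify", best_label  # terminate and emit a label

# The classifier is confident enough that stopping beats waiting.
action, label = act(q_delay=0.3, q_classes={"normal": 0.7, "anomaly": 0.1})
```

Because every classification action terminates the episode while delay does not, terminal transitions are rare in the agent's memory, which is exactly the imbalance the paper's prioritized sampling and storing strategies target.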
7 |
ADRL: An attention-based deep reinforcement learning framework for knowledge graph reasoning
ADRL: an attention-based deep reinforcement learning framework for knowledge graph reasoning (2020). Knowledge graph reasoning is one of the key technologies for knowledge graph construction, which
plays an important part in application scenarios such as vertical search and intelligent question
answering. It is intended to infer the desired entity from the entities and relations that already exist in
the knowledge graph. Most current methods for reasoning, such as embedding-based methods, globally
embed all entities and relations, and then use the similarity of vectors to infer relations between
entities or whether given triples are true. However, in real application scenarios, we require a clear and
interpretable target entity as the output answer. In this paper, we propose a novel attention-based deep
reinforcement learning framework (ADRL) for learning multi-hop relational paths, which improves
the efficiency, generalization capacity, and interpretability of conventional approaches through the
structured perception of deep learning and relational reasoning of reinforcement learning. We define
the entire process of reasoning as a Markov decision process. First, we employ CNN to map the
knowledge graph to a low-dimensional space, and a message-passing mechanism to sense neighbor
entities at each level, and then employ LSTM to memorize and generate a sequence of historical
trajectories to form a policy and value functions. We design a relational module that includes a self-attention
mechanism that can infer and share the weights of neighborhood entity vectors and relation
vectors. Finally, we employ the actor–critic algorithm to optimize the entire framework. Experiments
confirm the effectiveness and efficiency of our method on several benchmark data sets. Keywords: Knowledge graph | Knowledge reasoning | Reinforcement learning | Deep learning | Attention |
English article |
8 |
Deep reinforcement learning for condition-based maintenance planning of multi-component systems under dependent competing risks
Deep reinforcement learning for condition-based maintenance planning of multi-component systems under dependent competing risks (2020). Condition-Based Maintenance (CBM) planning for multi-component systems has been receiving increasing attention
in recent years. Most existing research on CBM assumes that preventive maintenance should be conducted
when the degradations of system components reach specific threshold levels upon inspection. However,
the search for optimal maintenance threshold levels is often efficient for low-dimensional CBM but becomes
challenging if the number of components gets large, especially when those components are subject to complex
dependencies. To overcome the challenge, in this paper we propose a novel and flexible CBM model based on a
customized deep reinforcement learning for multi-component systems with dependent competing risks. Both
stochastic and economic dependencies among the components are considered. Specifically, different from the
threshold-based decision making paradigm used in traditional CBM, the proposed model directly maps the multi-component
degradation measurements at each inspection epoch to the maintenance decision space with a cost
minimization objective, and the leverage of deep reinforcement learning enables high computational efficiencies
and thus makes the proposed model suitable for both low and high dimensional CBM. Various numerical studies
are conducted for model validations. Keywords: Maintenance | Markov decision process | Deep Q network | Failure dependency | Cost minimization |
English article |
9 |
A Markov Decision Process approach for balancing intelligence and interdiction operations in city-level drug trafficking enforcement
A Markov decision process approach for balancing intelligence and interdiction operations in city-level drug trafficking enforcement (2020). We study a resource allocation problem in which law enforcement aims to balance intelligence and interdiction
decisions to fight against illegal city-level drug trafficking. We propose a Markov Decision Process framework,
apply a column generation technique, and develop a heuristic to solve this problem. Our approaches provide
insights into how law enforcement should prioritize its actions when there are multiple criminals of different
types known to them. We prove that when only one action can be implemented, law enforcement will take action
(either target or arrest) on the highest criminal type known to them. Our results demonstrate that: (i) it may be
valuable to diversify the action taken on the same criminal type when more than one action can be implemented;
(ii) the marginal improvement in terms of the value of the criminals interdicted per unit time by increasing
available resources decreases as resource level increases; and (iii) there are losses that arise from not holistically
planning the actions of all available resources across distinct operations against drug trafficking networks. Keywords: Markov decision process | Intelligence and interdiction operations | Illegal drug supply chains | Column generation |
English article |
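For a small Markov decision process like the one studied above, the optimal policy can be computed exactly by value iteration. A self-contained sketch with a toy two-state enforcement example (the states, actions, and rewards are illustrative, not from the paper):

```python
def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """P[s][a] -> list of (prob, next_state) pairs; R[s][a] -> reward.
    Repeatedly applies the Bellman optimality backup until values stop
    changing, then extracts the greedy policy."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                       for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(P[s], key=lambda a, s=s: R[s][a] +
                     gamma * sum(p * V[s2] for p, s2 in P[s][a]))
              for s in P}
    return V, policy

# Toy: 'arrest' interdicts a known dealer now; 'surveil' gathers a small
# amount of intelligence and leaves the dealer active.
P = {"known_dealer": {"arrest": [(1.0, "done")],
                      "surveil": [(1.0, "known_dealer")]},
     "done": {"stay": [(1.0, "done")]}}
R = {"known_dealer": {"arrest": 10.0, "surveil": 0.4},
     "done": {"stay": 0.0}}
V, policy = value_iteration(P, R)
```

Here the one-time interdiction reward outweighs the discounted stream of small intelligence rewards, so the greedy policy arrests; the paper's contribution is handling this trade-off at scale, with column generation and a heuristic in place of exact iteration.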
10 |
Deep reinforcement learning algorithm for dynamic pricing of express lanes with multiple access locations
A deep reinforcement learning algorithm for dynamic pricing of express lanes with multiple access locations (2020). This article develops a deep reinforcement learning (Deep-RL) framework for dynamic pricing on
managed lanes with multiple access locations and heterogeneity in travelers’ value of time,
origin, and destination. This framework relaxes assumptions in the literature by considering
multiple origins and destinations, multiple access locations to the managed lane, en route diversion
of travelers, partial observability of the sensor readings, and stochastic demand and
observations. The problem is formulated as a partially observable Markov decision process
(POMDP) and policy gradient methods are used to determine tolls as a function of real-time
observations. Tolls are modeled as continuous and stochastic variables and are determined using
a feedforward neural network. The method is compared against a feedback control method used
for dynamic pricing. We show that Deep-RL is effective in learning toll policies for maximizing
revenue, minimizing total system travel time, and other joint weighted objectives, when tested on
real-world transportation networks. The Deep-RL toll policies outperform the feedback control
heuristic for the revenue maximization objective by generating revenues up to 8.5% higher than
the heuristic and for the objective minimizing total system travel time (TSTT) by generating TSTT
up to 8.4% lower than the heuristic. We also propose reward shaping methods for the POMDP to
overcome the undesired behavior of toll policies, like the jam-and-harvest behavior of revenue-maximizing
policies. Additionally, we test the transferability of the algorithm trained on one set of
inputs to new input distributions and offer recommendations on real-time implementations of
Deep-RL algorithms. Keywords: Managed lanes | Express lanes | High occupancy/toll (HOT) lanes | Dynamic pricing | Deep reinforcement learning | Traffic control | Feedback control heuristic |
English article |