Download and view articles related to reinforcement learning :: Page 1

Search results: reinforcement learning

Number of articles found: 271
No. Title Type
1 Deep Reinforcement Learning With Quantum-Inspired Experience Replay
Year: 2022
In this article, a novel training paradigm inspired by quantum computation is proposed for deep reinforcement learning (DRL) with experience replay. In contrast to the traditional experience replay mechanism in DRL, the proposed DRL with quantum-inspired experience replay (DRL-QER) adaptively chooses experiences from the replay buffer according to the complexity and the replayed times of each experience (also called transition), to achieve a balance between exploration and exploitation. In DRL-QER, transitions are first formulated in quantum representations and then the preparation operation and depreciation operation are performed on the transitions. In this process, the preparation operation reflects the relationship between the temporal-difference errors (TD-errors) and the importance of the experiences, while the depreciation operation is taken into account to ensure the diversity of the transitions. The experimental results on Atari 2600 games show that DRL-QER outperforms state-of-the-art algorithms, such as DRL-PER and DCRL on most of these games with improved training efficiency and is also applicable to such memory-based DRL approaches as double network and dueling network.
Index Terms: Deep reinforcement learning (DRL) | quantum computation | quantum-inspired experience replay (QER) | quantum reinforcement learning.
English article
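The abstract above describes choosing experiences according to their TD-errors and the number of times they have been replayed. A minimal classical sketch of that idea follows. It does not reproduce the quantum representation or the preparation and depreciation operators of DRL-QER; the names `sampling_weights`, `sample_batch`, `decay`, and `eps` are hypothetical, and the geometric decay is an assumed stand-in for depreciation.

```python
import random


def sampling_weights(td_errors, replay_counts, decay=0.5, eps=1e-3):
    """Weight each transition by |TD-error|, discounted per previous replay.

    Assumption: a geometric discount stands in for the paper's depreciation step.
    """
    return [(abs(td) + eps) * (decay ** n)
            for td, n in zip(td_errors, replay_counts)]


def sample_batch(buffer, td_errors, replay_counts, batch_size, rng=random):
    """Draw a batch with replacement and record how often each item was replayed."""
    weights = sampling_weights(td_errors, replay_counts)
    indices = rng.choices(range(len(buffer)), weights=weights, k=batch_size)
    for i in indices:
        replay_counts[i] += 1  # frequently replayed items become less likely later
    return [buffer[i] for i in indices], indices
```

Transitions with large TD-errors are drawn more often at first, while the replay count gradually pushes the sampler back toward less-visited transitions, which is one simple way to keep the batch diverse.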
2 DQRA: Deep Quantum Routing Agent for Entanglement Routing in Quantum Networks
Year: 2022
Quantum routing plays a key role in the development of the next-generation network system. In particular, an entangled routing path can be constructed with the help of quantum entanglement and swapping among particles (e.g., photons) associated with nodes in the network. From another side of computing, machine learning has achieved numerous breakthrough successes in various application domains, including networking. Despite its advantages and capabilities, machine learning is not utilized as much in quantum networking as in other areas. To bridge this gap, in this article, we propose a novel quantum routing model for quantum networks that employs machine learning architectures to construct the routing path for the maximum number of demands (source–destination pairs) within a time window. Specifically, we present a deep reinforcement routing scheme that is called Deep Quantum Routing Agent (DQRA). In short, DQRA utilizes an empirically designed deep neural network that observes the current network states to accommodate the network’s demands, which are then connected by a qubit-preserved shortest path algorithm. The training process of DQRA is guided by a reward function that aims toward maximizing the number of accommodated requests in each routing window. Our experimental study shows that, on average, DQRA is able to maintain a rate of successfully routed requests at above 80% in a qubit-limited grid network and approximately 60% in extreme conditions, i.e., each node can be a repeater exactly once in a window. Furthermore, we show that the model complexity and the computational time of DQRA are polynomial in terms of the sizes of the quantum networks.
INDEX TERMS: Deep learning | deep reinforcement learning (DRL) | machine learning | next-generation network | quantum network routing | quantum networks.
English article
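The abstract above mentions connecting each accepted demand with a shortest path while nodes have a limited capacity to act as repeaters. The sketch below is one plausible reading of that constraint, not the DQRA algorithm itself: it assumes a per-node `budget` counting how many more times a node may serve as an intermediate repeater in the current window, and the function name `route` is hypothetical.

```python
from collections import deque


def route(adj, budget, src, dst):
    """Breadth-first shortest path using only repeaters with remaining budget.

    Assumption: endpoints are free, each intermediate node costs one budget unit.
    Returns the path as a list of nodes, or None if the demand cannot be served.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v in parent:
                continue
            if v != dst and budget[v] < 1:
                continue  # node cannot act as a repeater again in this window
            parent[v] = u
            queue.append(v)
    if dst not in parent:
        return None
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    for n in path[1:-1]:
        budget[n] -= 1  # consume repeater capacity along the chosen path
    return path
```

With a budget of one per node, this mirrors the extreme condition in the abstract where each node can be a repeater exactly once in a window; the order in which demands are routed then matters, which is the decision the learned agent is described as making.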
3 Resource Management for Edge Intelligence (EI)-Assisted IoV Using Quantum-Inspired Reinforcement Learning
Year: 2022
Recent developments in the Internet of Vehicles (IoV) enable interconnected vehicles to support ubiquitous services. Various emerging service applications are promising to increase the Quality of Experience (QoE) of users. On-board computation tasks generated by these applications have heavily overloaded the resource-constrained vehicles, forcing them to offload on-board tasks to other edge intelligence (EI)-assisted servers. However, excessive task offloading can lead to severe competition for communication and computation resources among vehicles, thereby increasing the processing latency, energy consumption, and system cost. To address these problems, we investigate the transmission-awareness and computing-sense uplink resource management problem and formulate it as a time-varying Markov decision process. Considering the total delay, energy consumption, and cost, quantum-inspired reinforcement learning (QRL) is proposed to develop an intelligence-oriented edge offloading strategy. Specifically, the vehicle can flexibly choose the network access mode and offloading strategy through two different radio interfaces to offload tasks to multiaccess edge computing (MEC) servers through WiFi and cloud servers through 5G. The objective of this joint optimization is to maintain a self-adaptive balance between these two aspects. Simulation results show that the proposed algorithm can significantly reduce the transmission latency and computation delay.
Index Terms: Cloud computing | edge intelligence (EI) | Internet of Vehicles (IoV) | multiaccess edge computing (MEC) | quantum-inspired reinforcement learning (QRL)
English article
4 Resource Allocation in Time Slotted Channel Hopping (TSCH) Networks Based on Phasic Policy Gradient Reinforcement Learning
Year: 2022
The concept of the Industrial Internet of Things (IIoT) is gaining prominence due to its low-cost solutions and improved productivity of manufacturing processes. To address the ultra-high reliability and ultra-low power communication requirements of IIoT networks, the Time Slotted Channel Hopping (TSCH) behavioral mode has been introduced in the IEEE 802.15.4e standard. Scheduling the packet transmissions in IIoT networks is a difficult task owing to the limited resources and dynamic topology. In IEEE 802.15.4e TSCH, the design of the schedule is open to implementation. In this paper, we propose a phasic policy gradient (PPG) based TSCH schedule learning algorithm. We construct the utility function that accounts for the throughput and energy efficiency of the TSCH network. The proposed PPG based scheduling algorithm overcomes the drawbacks of totally distributed and totally centralized deep reinforcement learning-based scheduling algorithms by employing the actor–critic policy gradient method that learns the scheduling algorithm in two phases, namely policy phase and auxiliary phase. In this method, we show that the schedule converges quickly compared to any other actor–critic method and also improves the system throughput performance by 58% compared to the minimal scheduling function, a default TSCH schedule.
Keywords: Industrial internet of things | IEEE 802.15.4e | Time slotted channel hopping | Deep reinforcement learning | Actor–critic policy gradient methods | Phasic policy gradient
English article
5 Attention-based model and deep reinforcement learning for distribution of event processing tasks
Year: 2022
Event processing is the cornerstone of the dynamic and responsive Internet of Things (IoT). Recent approaches in this area are based on representational state transfer (REST) principles, which allow event processing tasks to be placed at any device that follows the same principles. However, the tasks should be properly distributed among edge devices to ensure fair resource utilization and guarantee seamless execution. This article investigates the use of deep learning to fairly distribute the tasks. An attention-based neural network model is proposed to generate efficient load balancing solutions under different scenarios. The proposed model is based on the Transformer and Pointer Network architectures, and is trained by an advantage actor–critic reinforcement learning algorithm. The model is designed to scale to the number of event processing tasks and the number of edge devices, with no need for hyperparameter re-tuning or even retraining. Extensive experimental results show that the proposed model outperforms conventional heuristics in many key performance indicators. The generic design and the obtained results show that the proposed model can potentially be applied to several other load balancing problem variations, which makes the proposal an attractive option to be used in real-world scenarios due to its scalability and efficiency.
Keywords: Web of Things (WoT) | Representational state transfer (REST) | Application programming interfaces (APIs) | Edge computing | Load balancing | Resource placement | Deep reinforcement learning | Transformer model | Pointer networks | Actor critic
English article
6 Deep Q learning based secure routing approach for OppIoT networks
Year: 2022
Opportunistic IoT (OppIoT) networks are a branch of IoT where humans and machines collaborate to form a network for sharing data. The broad spectrum of devices and ad-hoc nature of connections further aggravate the problem of network and data security. Traditional approaches like trust-based approaches or cryptographic approaches fail to preemptively secure these networks. Machine learning (ML) approaches, mainly deep reinforcement learning (DRL) methods, can prove to be very effective in ensuring the security of the network as they are profoundly capable of solving complex and dynamic problems. Deep Q-learning (DQL) incorporates a deep neural network in the Q-learning process for dealing with high-dimensional data. This paper proposes a routing approach for OppIoT, DQNSec, based on DQL for securing the network against attacks viz. sinkhole, hello flood and distributed denial of service attack. The actor–critic approach of DQL is utilized and OppIoT is modeled as a Markov decision process (MDP). Extensive simulations prove the efficiency of DQNSec in comparison to other ML-based routing protocols, viz. RFCSec, RLProph, CAML and MLProph.
Keywords: OppIoT | Reinforcement learning | Deep learning | Deep Q-learning | Markov decision process | Sinkhole attack | Hello flood attack | Distributed denial of service attack
English article
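The abstract above builds on Q-learning, which deep Q-learning extends by replacing the value table with a neural network. As a reminder of the underlying rule, here is a minimal tabular sketch of the standard Q-learning update. It is not DQNSec; the function name `q_update` and the default `alpha` and `gamma` values are illustrative assumptions.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    q is a dict keyed by (state, action); missing entries are treated as 0.0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    td_target = reward + gamma * best_next      # bootstrapped estimate of return
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (td_target - old)
    return q[(state, action)]
```

In a deep variant, the dictionary lookup is replaced by a network forward pass and the same target is used as a regression label.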
7 Curriculum-Based Deep Reinforcement Learning for Quantum Control
Year: 2022
Deep reinforcement learning (DRL) has been recognized as an efficient technique to design optimal strategies for different complex systems without prior knowledge of the control landscape. To achieve a fast and precise control for quantum systems, we propose a novel DRL approach by constructing a curriculum consisting of a set of intermediate tasks defined by fidelity thresholds, where the tasks among a curriculum can be statically determined before the learning process or dynamically generated during the learning process. By transferring knowledge between two successive tasks and sequencing tasks according to their difficulties, the proposed curriculum-based DRL (CDRL) method enables the agent to focus on easy tasks in the early stage, then move on to difficult tasks, and eventually approach the final task. Numerical comparison with the traditional methods [gradient method (GD), genetic algorithm (GA), and several other DRL methods] demonstrates that CDRL exhibits improved control performance for quantum systems and also provides an efficient way to identify optimal strategies with few control pulses.
Index Terms: Curriculum learning | deep reinforcement learning (DRL) | quantum control.
English article
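The abstract above describes a static curriculum: tasks defined by fidelity thresholds, ordered from easy to hard, with knowledge carried from one task to the next. The loop below is a minimal sketch of that structure only. `run_curriculum`, `train_step`, and `toy_step` are hypothetical names, and the toy update is a placeholder rather than a quantum control learner.

```python
def run_curriculum(thresholds, train_step, params, max_steps=1000):
    """Train through tasks ordered by fidelity threshold, reusing params each time.

    train_step(params) must return (new_params, fidelity).
    Returns the final params and a log of (threshold, steps_used, fidelity).
    """
    log = []
    for threshold in sorted(thresholds):          # easy tasks first
        for step in range(1, max_steps + 1):
            params, fidelity = train_step(params)
            if fidelity >= threshold:
                break                             # task solved, move to the next one
        log.append((threshold, step, fidelity))
    return params, log


def toy_step(p):
    """Placeholder learner: fidelity moves halfway toward 1.0 on every step."""
    p = p + 0.5 * (1.0 - p)
    return p, p
```

For example, `run_curriculum([0.9, 0.5, 0.99], toy_step, 0.0)` visits the thresholds in the order 0.5, 0.9, 0.99 and starts each task from the parameters reached on the previous one.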
8 Zero shot augmentation learning in internet of biometric things for health signal processing
Year: 2021
In recent years, the number of Internet of Things (IoT) devices has increased rapidly. The Internet of Biometric Things (IoBT) can process biometrics and health signals, and it will greatly extend the range of biometric applications. The analysis of health signals in the IoBT can use computer-aided diagnosis techniques. However, most of the existing computer-aided diagnosis methods are developed for common diseases and are not suitable for rare diseases. Zero shot learning is a potential method for the computer-aided diagnosis of rare diseases because it can identify objects of unknown categories. However, the existing zero shot learning methods are based on attribute learning and rely on an attribute dataset. There is no attribute dataset for health signal processing. Therefore, the existing zero shot learning methods are not suitable for health signal processing. Based on the above background, we propose a zero shot augmentation learning model (ZSAL) in the IoBT for health signal processing. First, an expert doctor identifies the contour of a lesion and selects a background image without a lesion. Second, the computer automatically generates virtual images using zero shot augmentation technology. Finally, the generated virtual dataset is used to train a convolutional classifier, and then we apply the classifier to the computer-aided diagnosis of actual medical images. The experiment shows the efficiency and effectiveness of our method. © 2021 Elsevier B.V. All rights reserved.
Keywords: Internet of biometric things | Zero shot learning | Data augmentation | Health signal processing
English article
9 Actor–Critic Reinforcement Learning and Application in Developing Computer-Vision-Based Interface Tracking
Year: 2021
This paper synchronizes control theory with computer vision by formalizing object tracking as a sequential decision-making process. A reinforcement learning (RL) agent successfully tracks an interface between two liquids, which is often a critical variable to track in many chemical, petrochemical, metallurgical, and oil industries. This method utilizes less than 100 images for creating an environment, from which the agent generates its own data without the need for expert knowledge. Unlike supervised learning (SL) methods that rely on a huge number of parameters, this approach requires far fewer parameters, which naturally reduces its maintenance cost. Besides its frugal nature, the agent is robust to environmental uncertainties such as occlusion, intensity changes, and excessive noise. From a closed-loop control context, an interface location-based deviation is chosen as the optimization goal during training. The methodology showcases RL for real-time object-tracking applications in the oil sands industry. Along with a presentation of the interface tracking problem, this paper provides a detailed review of one of the most effective RL methodologies: actor–critic policy.
Keywords: Interface tracking | Object tracking | Occlusion | Reinforcement learning | Uniform manifold approximation and projection
English article
10 Optimal carbon storage reservoir management through deep reinforcement learning
Year: 2020
Model-based optimization plays a central role in energy system design and management. The complexity and high-dimensionality of many process-level models, especially those used for geosystem energy exploration and utilization, often lead to formidable computational costs when the dimension of decision space is also large. This work adopts elements of recently advanced deep learning techniques to solve a sequential decision-making problem in applied geosystem management. Specifically, a deep reinforcement learning framework was formed for optimal multiperiod planning, in which a deep Q-learning network (DQN) agent was trained to maximize rewards by learning from high-dimensional inputs and from exploitation of its past experiences. To expedite computation, deep multitask learning was used to approximate high-dimensional, multistate transition functions. Both DQN and deep multitask learning are pattern based. As a demonstration, the framework was applied to optimal carbon sequestration reservoir planning using two different types of management strategies: monitoring only and brine extraction. Both strategies are designed to mitigate potential risks due to pressure buildup. Results show that the DQN agent can identify the optimal policies to maximize the reward for given risk and cost constraints. Experiments also show that knowledge the agent gained from interacting with one environment is largely preserved when deploying the same agent in other similar environments.
Keywords: Reinforcement learning | Multistage decision-making | Deep autoregressive model | Deep Q network | Surrogate modeling | Markov decision process | Geological carbon sequestration
English article