Download and View Articles Related to Hadoop :: Page 4


Search results: Hadoop

Number of articles found: 113
No. | Title | Type
31 MIA: Metric Importance Analysis for Big Data Workload Characterization
Published: 2018
Data analytics is at the foundation of both high-quality products and services in modern economies and societies. Big data workloads run on complex large-scale computing clusters, which implies significant challenges for deeply understanding and characterizing overall system performance. In general, performance is affected by many factors at multiple layers in the system stack, hence it is challenging to identify the key metrics when understanding big data workload performance. In this paper, we propose a novel workload characterization methodology using ensemble learning, called Metric Importance Analysis (MIA), to quantify the respective importance of workload metrics. By focusing on the most important metrics, MIA reduces the complexity of the analysis without losing information. Moreover, we develop the MIA-based Kiviat Plot (MKP) and Benchmark Similarity Matrix (BSM) which provide more insightful information than the traditional linkage clustering based dendrogram to visualize program behavior (dis)similarity. To demonstrate the applicability of MIA, we use it to characterize three big data benchmark suites: HiBench, CloudRank-D and SZTS. The results show that MIA is able to characterize complex big data workloads in a simple, intuitive manner, and reveal interesting insights. Moreover, through a case study, we demonstrate that tuning the configuration parameters related to the important metrics found by MIA results in higher performance improvements than through tuning the parameters related to the less important ones.
Index Terms: Big data, benchmarking, workload characterization, performance measurement, MapReduce/Hadoop
Type: English article
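MIA's ensemble-learning internals are not spelled out in the abstract, but the core idea of scoring metrics by how much a model depends on them can be sketched with generic permutation importance: shuffle one metric's column and measure the accuracy drop. The toy model, metric layout, and data below are hypothetical illustrations, not the paper's algorithm.

```python
import random

def permutation_importance(model, rows, labels, n_metrics, repeats=10, seed=0):
    """Rank each metric by the accuracy drop caused by shuffling its column."""
    rng = random.Random(seed)

    def accuracy(data):
        hits = sum(1 for row, y in zip(data, labels) if model(row) == y)
        return hits / len(labels)

    baseline = accuracy(rows)
    importance = []
    for m in range(n_metrics):
        drops = []
        for _ in range(repeats):
            column = [row[m] for row in rows]
            rng.shuffle(column)  # break the metric's link to the label
            permuted = [row[:m] + (v,) + row[m + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(permuted))
        importance.append(sum(drops) / repeats)
    return importance

# Toy workload: the label depends only on metric 0, so metric 0 should
# dominate the ranking while metric 1 scores ~0.
rows = [(i % 2, i) for i in range(100)]
labels = [row[0] for row in rows]
model = lambda row: row[0]          # stand-in for a trained ensemble
scores = permutation_importance(model, rows, labels, n_metrics=2)
```

By focusing the analysis on the highest-scoring metrics, the dimensionality of the characterization shrinks without discarding the metrics that actually drive performance, which mirrors the motivation stated in the abstract.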
32 Socio-cyber network: The potential of cyber-physical system to define human behaviors using big data analytics
Published: 2018
The growing gap between users and big data analytics requires innovative tools that address the challenges posed by big data volume, variety, and velocity, since it becomes computationally inefficient to analyze such a massive volume of data. Moreover, advancements in the fields of big data applications and data science lead toward a new paradigm of human behavior, where various smart devices integrate with each other and establish relationships. However, the majority of systems are either memoryless or computationally inefficient, and are unable to define or predict human behavior. Keeping these needs in view, there is a requirement for a system that can efficiently analyze a stream of big data within its requirements. Hence, this paper presents a system architecture that integrates the social network with the technical network. We derive a novel notion of a 'Socio-Cyber Network', where a friendship is made based on the geo-location information of the users and a trust index is computed using graph theory. The proposed graph-based approach provides a better understanding of extracting knowledge from the data and finding relationships between different users. To check the efficiency of the algorithms exploited in the proposed system architecture, we have implemented our system using Hadoop and MapReduce. MapReduce for the cyber-physical system (CPS) is supported by a parallel algorithm that efficiently processes huge volumes of data. The system is implemented using the Spark GraphX tool on top of the Hadoop parallel nodes to generate and process graphs in near real time. Moreover, the system is evaluated in terms of efficiency by considering system throughput and processing time. The results show that the proposed system is scalable and efficient.
Keywords: Big data, Socio-cyber network, Human behavior, Graphs, Friendship, Trust index
Type: English article
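The abstract leaves the friendship rule and trust index unspecified beyond "geo-location" and "graph theory". A minimal sketch, assuming friendship means lying within a chosen radius and using common-neighbor overlap as a stand-in trust score (both are assumptions for illustration, not the paper's definitions, and the coordinates are made up):

```python
import math

def build_friendship_graph(locations, radius_km):
    """Link users whose geo-locations lie within radius_km of each other.

    `locations` maps user -> (lat, lon) in degrees; distances come from
    the haversine formula.
    """
    def haversine(a, b):
        lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
        h = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(h))  # Earth radius ~6371 km

    users = list(locations)
    graph = {u: set() for u in users}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if haversine(locations[u], locations[v]) <= radius_km:
                graph[u].add(v)
                graph[v].add(u)
    return graph

def trust_index(graph, u, v):
    # Toy trust score: fraction of u's friends who are also friends of v.
    return len(graph[u] & graph[v]) / len(graph[u]) if graph[u] else 0.0

# Three nearby users and one far away.
locations = {"a": (35.700, 51.400), "b": (35.710, 51.410),
             "d": (35.705, 51.405), "c": (48.850, 2.350)}
g = build_friendship_graph(locations, radius_km=5)
```

In a Spark GraphX setting, as the abstract describes, the same adjacency would be held as an edge RDD and the trust score computed per edge in parallel rather than in a nested Python loop.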
33 Selective I/O Bypass and Load Balancing Method for Write-Through SSD Caching in Big Data Analytics
Published: 2018
Fast network quality analysis in the telecom industry is an important method used to provide quality service. SK Telecom, based in South Korea, built a Hadoop-based analytical system consisting of a hundred nodes, each of which contains only hard disk drives (HDDs). Because the analysis process is a set of parallel I/O-intensive jobs, adding solid state drives (SSDs) with appropriate settings is the most cost-efficient way to improve performance, as shown in previous studies. Therefore, we decided to configure SSDs as a write-through cache instead of increasing the number of HDDs. To improve the cost-per-performance of the SSD cache, we introduced a selective I/O bypass (SIB) method, which redirects an automatically calculated number of read I/O requests from the SSD cache to idle HDDs when the SSDs are I/O over-saturated, meaning their disk utilization is greater than 100 percent. To precisely calculate disk utilization, we also introduced a combinational approach for SSDs, since the current method used for HDDs cannot be applied to SSDs because of their internal parallelism. In our experiments, the proposed approach achieved up to 2x faster performance than other approaches.
Index Terms: I/O load balancing, SQL-on-Hadoop, SSD cache, storage hierarchies
Type: English article
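A rough sketch of the selective-bypass decision described above: when SSD utilization exceeds the saturation point, redirect the overflow reads to HDDs in proportion to their idle headroom. The proportional-split policy and the utilization inputs are illustrative assumptions; the paper computes the redirect count and the SSD utilization with its own combinational method.

```python
def plan_read_routing(num_reads, ssd_util, hdd_utils, target_util=1.0):
    """Decide how many cached reads to bypass from an over-saturated SSD
    to idle HDDs (utilizations are fractions: 1.0 == 100% busy)."""
    if ssd_util <= target_util:
        return 0, [0] * len(hdd_utils)  # cache not saturated: no bypass

    # Reads the SSD cannot absorb while staying at the target utilization.
    overflow = int(num_reads * (ssd_util - target_util) / ssd_util)

    # Split the overflow across HDDs in proportion to their idle headroom.
    headroom = [max(0.0, target_util - u) for u in hdd_utils]
    total = sum(headroom)
    if total == 0:
        return 0, [0] * len(hdd_utils)  # no idle HDDs: keep reads on SSD
    per_hdd = [int(overflow * h / total) for h in headroom]
    return sum(per_hdd), per_hdd

bypassed, per_hdd = plan_read_routing(num_reads=1000, ssd_util=1.5,
                                      hdd_utils=[0.2, 0.6, 1.0])
```

With an SSD at 150% utilization, roughly a third of the reads overflow, and the mostly idle HDD absorbs twice as many of them as the half-busy one, while the fully busy HDD receives none.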
34 Smart health monitoring and management system: Toward autonomous wearable sensing for internet of things using big data analytics
Published: 2018
The current development and growth in the arena of the Internet of Things (IoT) provide great potential on the route to a novel epoch of healthcare. This vision of healthcare is widely favored, as it advances the quality of human life and health and involves several health regulations. The incessant increase of multifaceted IoT devices in health is broadly tested by challenges such as powering the IoT terminal nodes used for health monitoring, real-time data processing, and smart decision and event management. In this paper, we propose a healthcare architecture based on energy harvesting for health-monitoring sensors and the realization of Big Data analytics in healthcare. The rationale of the proposed architecture is twofold: (1) a comprehensive conceptual framework for energy harvesting for health-monitoring sensors, and (2) data processing and decision management for healthcare. The proposed architecture comprises three layers: (1) energy harvesting and data generation, (2) data pre-processing, and (3) data processing and application. We also verified consistent data sets on a Hadoop server to validate the proposed architecture based on a threshold limit value (TLV). The study reveals that the proposed architecture offers valuable insight into the field of smart health.
Keywords: IoT, Energy Harvesting, Big Data Analytics
Type: English article
35 Clustering big IoT data by metaheuristic optimized mini-batch and parallel partition-based DGC in Hadoop
Published: 2018
Clustering algorithms are an important branch of the data mining family that has been applied widely in IoT applications such as finding similar sensing patterns, detecting outliers, and segmenting large behavioral groups in real time. Traditional full-batch k-means for clustering IoT big data is confronted by large-scale storage and high computational complexity problems. In order to overcome the latency inherent in full-batch k-means, two big data processing methods are often used. The first method is to use small batches as the input data to multiple computers to reduce the computation effort. However, depending on the sensed data, which may be heterogeneously fused from different sources in an IoT network, the size of each mini batch may vary in each iteration of the clustering process; when these input data are clustered, their centers can shift drastically, which affects the final clustering results. The second method is parallel computing, which decreases the runtime while the overall computational effort remains the same. Furthermore, centroid-based clustering algorithms such as k-means converge easily into local optima. In light of this, this paper proposes a new partitioned clustering method, optimized by a metaheuristic, for the IoT big data environment. The method has three main activities: first, a sample of the dataset is partitioned into mini batches; second, the centroids of the mini batches are adjusted; third, the mini batches are collated to form clusters so that the quality of the clusters is maximized. How the positions of the centroids are optimally attuned across the mini batches is governed by a metaheuristic called Dynamic Group Optimization. The data are processed in parallel in Hadoop. Extensive experiments are conducted to investigate the performance. The results show that the proposed method is a promising tool for clustering fused IoT data efficiently.
Keywords: Metaheuristic, Partitioning, Clustering, Hadoop, IoT data, Data fusion
Type: English article
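The mini-batch step the paper builds on can be sketched in a few lines: draw a small batch, assign each sample to its nearest centroid, and nudge that centroid with a decaying per-centroid learning rate. This is plain 1-D mini-batch k-means under simplifying assumptions (evenly spread initial centroids, synthetic data); the Dynamic Group Optimization metaheuristic that tunes the centroids is not reproduced here.

```python
import random

def mini_batch_kmeans(points, k, batch_size=10, iters=200, seed=7):
    """Cluster 1-D points with mini-batch k-means."""
    # Spread initial centroids across the data range (a simplification;
    # real implementations sample random points or use k-means++).
    lo, hi = min(points), max(points)
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    counts = [0] * k
    rng = random.Random(seed)
    for _ in range(iters):
        batch = [rng.choice(points) for _ in range(batch_size)]  # mini batch
        for p in batch:
            j = min(range(k), key=lambda c: abs(centroids[c] - p))
            counts[j] += 1
            eta = 1.0 / counts[j]                     # decaying step size
            centroids[j] += eta * (p - centroids[j])  # nudge toward sample
    return sorted(centroids)

# Two well-separated 1-D groups: around 0..9 and around 100..109.
data = [float(x) for x in range(10)] + [float(x) for x in range(100, 110)]
low_c, high_c = mini_batch_kmeans(data, k=2)
```

Because each centroid's step size shrinks as it absorbs more samples, early batches move it quickly and later batches only refine it, which is what lets the method trade a full pass over the data for many cheap partial passes.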
36 In-Mapper combiner based MapReduce algorithm for processing of big climate data
Published: 2018
Big data refers to a collection of massive volumes of data that cannot be processed by conventional data processing tools and technologies. In recent years, data production sources have grown noticeably, including high-end streaming devices, wireless sensor networks, satellites, and wearable Internet of Things (IoT) devices, which generate massive volumes of data in a continuous manner. A large volume of climate data is collected from IoT weather sensor devices and NCEP. In this paper, a big data processing framework is proposed to integrate climate and health data and to find the correlation between climate parameters and the incidence of dengue. The framework is demonstrated with the help of the MapReduce programming model, Hive, HBase, and ArcGIS in a Hadoop Distributed File System (HDFS) environment. Weather parameters such as minimum temperature, maximum temperature, wind, precipitation, solar radiation, and relative humidity are collected for the study area of Tamil Nadu with the help of IoT weather sensor devices and NCEP. The proposed framework focuses on climate data for the 32 districts of Tamil Nadu, where each district contains 157,680 rows, giving 5,045,760 rows in total. Precomputing a batch view of the monthly mean of the various climate parameters would require all 5,045,760 rows and would therefore create more latency in query processing. To overcome this issue, batch views can precompute a smaller number of records, with more computation done at query time. The in-mapper-combiner-based MapReduce framework is used to compute the monthly mean of each climate parameter for each latitude and longitude. The experimental results show that the response time of the in-mapper-based combiner algorithm is lower than that of the existing MapReduce algorithm.
Keywords: Big data, Internet of Things, Weather sensor devices, MapReduce programming model, Hadoop distributed file system
Type: English article
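The in-mapper combiner pattern named above is standard enough to sketch: rather than emitting one key-value pair per record, the mapper accumulates partial (sum, count) aggregates in memory and emits them once per input split, shrinking the intermediate data shuffled to reducers. The (latitude, longitude, month) key and the sample records below are hypothetical, not the paper's exact schema.

```python
from collections import defaultdict

def in_mapper_combine(records):
    """Mapper with an in-mapper combiner: accumulate (sum, count) per key
    in memory and emit each key once at the end of the split."""
    partial = defaultdict(lambda: [0.0, 0])
    for lat, lon, month, value in records:
        acc = partial[(lat, lon, month)]
        acc[0] += value   # running sum
        acc[1] += 1       # running count
    for key, (total, count) in partial.items():
        yield key, (total, count)

def reduce_mean(pairs):
    """Reducer: merge partial sums/counts and emit the monthly mean."""
    merged = defaultdict(lambda: [0.0, 0])
    for key, (total, count) in pairs:
        merged[key][0] += total
        merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}

# Two input splits, as if processed by two mapper tasks.
split1 = [(11.0, 78.0, "Jan", 24.0), (11.0, 78.0, "Jan", 26.0)]
split2 = [(11.0, 78.0, "Jan", 28.0), (11.0, 78.0, "Feb", 30.0)]
pairs = list(in_mapper_combine(split1)) + list(in_mapper_combine(split2))
means = reduce_mean(pairs)
```

Here each mapper emits at most one pair per key instead of one per record, which is exactly the shuffle-volume saving that makes the in-mapper variant respond faster than the plain MapReduce job.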
37 Optimized Big Data Management across Multi-Cloud Data Centers: Software-Defined Network-Based Analysis
Published: 2018
With an exponential increase in smart-device users, there is an increase in the bulk amount of data generated from various smart devices, which varies with respect to all the essential Vs used to categorize it as big data. Most service providers, including Google, Amazon, and Microsoft, have deployed a large number of geographically distributed data centers to process this huge amount of data so that users get quick response times. For this purpose, Hadoop and Spark are widely used by these service providers for processing large datasets. However, less emphasis has been given to the underlying infrastructure (the network through which the data flows), which is one of the most important components for the successful implementation of any designed solution in this environment. In the worst case, due to heavy network traffic from data migrations across different data centers, the underlying network infrastructure may not be able to transfer data packets from source to destination, resulting in performance degradation. Focusing on these issues, in this article we propose a novel SDN-based big data management approach that optimizes network resource consumption such as network bandwidth and data storage units. We analyze various components at both the data and control planes that can enhance optimized big data analytics across multiple cloud data centers. For example, we analyze the performance of the proposed solution using Bloom-filter-based insertion and deletion of elements in the flow table maintained at the OpenFlow controller, which makes most of the decisions for network traffic classification using a rule-and-action-based mechanism. Using the proposed solution, developers can deploy and analyze real-time traffic behavior for future big data applications in MCE.
Keywords: Big Data, cloud computing, computer centres, software-defined networking, telecommunication traffic
Type: English article
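Since the abstract mentions both insertion and deletion in the flow table, a plain Bloom filter would not suffice (it cannot delete); a counting variant keeps per-slot counters instead of bits. Below is a generic counting Bloom filter sketch, with flow-key strings invented for illustration; it is not the authors' implementation.

```python
import hashlib

class CountingBloomFilter:
    """Bloom filter with per-slot counters so entries can be removed.

    Membership tests may return false positives but never false negatives,
    as long as deletes are only issued for items actually inserted.
    """

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.counters = [0] * size

    def _slots(self, item):
        # Derive `hashes` independent slot indices from salted SHA-256.
        for salt in range(self.hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def insert(self, item):
        for s in self._slots(item):
            self.counters[s] += 1

    def delete(self, item):
        if item in self:  # guard against driving counters negative
            for s in self._slots(item):
                self.counters[s] -= 1

    def __contains__(self, item):
        return all(self.counters[s] > 0 for s in self._slots(item))

cbf = CountingBloomFilter()
flow = "10.0.0.1->10.0.0.2:tcp/80"
cbf.insert(flow)
```

For a controller flow table the appeal is constant-time membership checks in a few kilobytes of counters, at the cost of a tunable false-positive rate set by `size` and `hashes`.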
38 A Big Data Analytics Architecture for the Internet of Small Things
Published: 2018
The SK Telecom Company of South Korea recently introduced the concept of the Internet of Small Things (IoST) to its business model. The company deployed the IoST, which constantly generates data via the LoRa wireless platform, and the data rates generated by the IoST are escalating exponentially. After attempting to analyze and store the massive volume of IoST data using existing tools and technologies, the company immediately realized their shortcomings. This article addresses some of those issues and presents a big data analytics architecture for the IoST. A system developed using the proposed architecture will be able to analyze and store IoST data efficiently while enabling better decisions. The proposed architecture is composed of four layers, namely the small things layer, the infrastructure layer, the platform layer, and the application layer. Finally, a detailed analysis of a big data implementation of the IoST used to track humidity and temperature via Hadoop is presented as a proof of concept.
Keywords: Big Data, data analysis, Internet of Things, parallel programming
Type: English article
39 A Distributed Computing Platform for fMRI Big Data Analytics
Published: 2018
Since the BRAIN Initiative and the Human Brain Project began, a few efforts have been made to address the computational challenges of neuroscience Big Data. The promise of these two projects was to model the complex interaction of brain and behavior and to understand and diagnose brain diseases by collecting and analyzing large quantities of data. Archiving, analyzing, and sharing the growing neuroimaging datasets pose major challenges. New computational methods and technologies have emerged in the domain of Big Data but have not been fully adapted for use in neuroimaging. In this work, we introduce the current challenges of neuroimaging in a big data context, review our efforts toward creating a data management system to organize large-scale fMRI datasets, and present our novel algorithms/methods for distributed fMRI data processing that employ Hadoop and Spark. Finally, we demonstrate the significant performance gains of our algorithms/methods in performing distributed dictionary learning.
Index Terms: fMRI, big data analytics, distributed computing, Apache Spark, machine learning
Type: English article
40 An approach for Big Data Security based on Hadoop Distributed File system
Published: 2018
Cloud computing emerged for huge data because of its ability to provide users with on-demand, reliable, flexible, and low-cost services. With the increasing use of cloud applications, data security protection has become an important issue for the cloud. In this work, the proposed approach improves the performance of file encryption/decryption by using AES and OTP algorithms integrated with Hadoop, where files are encrypted within HDFS and decrypted within the map task. In previous works that used the AES algorithm for encryption/decryption, the size of the encrypted file increased by 50% over the original file size; the proposed approach improves this ratio, with the encrypted file increasing by only 20% over the original file size. We implemented this new approach to secure HDFS, compared it with the previously implemented method, and conducted experimental studies to verify its effectiveness.
Keywords: Cloud storage, Hadoop, HDFS, Data Security, Encryption, Decryption
Type: English article
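Of the two algorithms named above, AES needs a crypto library, but the one-time-pad half is a simple XOR with a random pad the same length as the data and can be sketched directly. This is a generic OTP illustration, not the paper's integrated AES/OTP scheme; note that the pad must be kept secret, stored or derived securely, and never reused.

```python
import secrets

def otp_encrypt(plaintext: bytes):
    """XOR the data with a fresh random pad of equal length.

    Returns (ciphertext, pad); both are needed to decrypt. The ciphertext
    is exactly the size of the plaintext, so the OTP step itself adds no
    storage overhead.
    """
    pad = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, pad))
    return ciphertext, pad

def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    # XOR is its own inverse: (p ^ k) ^ k == p.
    return bytes(c ^ k for c, k in zip(ciphertext, pad))

block = b"hdfs block contents"
ciphertext, pad = otp_encrypt(block)
recovered = otp_decrypt(ciphertext, pad)
```

Because the XOR step preserves size, any growth in the stored encrypted file comes from the block cipher's padding and metadata, which is the overhead ratio the abstract reports reducing from 50% to 20%.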