Download and view articles related to MapReduce :: Page 3
Download the best ISI articles with Persian translation
Search results - MapReduce

Number of articles found: 110
No. Title Type
21 A distributed intrusion detection system for cloud environments based on data mining techniques
Publication year: 2018 - English PDF pages: 7 - Persian DOC pages: 16
Nearly two decades after its emergence, cloud computing continues to grow among organizations and individual users. The move to this computing paradigm raises many security issues, including intrusion detection. Attack and intrusion tools have become more sophisticated, defeating traditional intrusion detection systems (IDSs) with large volumes of network traffic data and dynamic behaviors. Existing cloud IDSs suffer from low detection accuracy, a high false positive rate, and long running times. In this paper, we present a distributed, machine learning based intrusion detection system for cloud environments. The proposed system is designed to be deployed on the cloud side, alongside the edge network components of the cloud provider. This allows it to intercept incoming network traffic at the edge network routers of the physical layer. A time-based sliding window algorithm preprocesses the captured network traffic on each cloud router, and the result is passed to an anomaly detection module that uses a Naive Bayes classifier. A set of commodity server nodes based on Hadoop and MapReduce is available to each anomaly detection module for use when network congestion increases. For each time window, the anomalous network traffic data on each router side is synchronized to a central storage server. Next, an ensemble learning classifier based on Random Forest performs a final multi-class classification step to detect the type of each attack.
Keywords: Intrusion detection systems | Cloud computing | Machine learning | Hadoop | MapReduce
Translated article
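The time-based sliding window preprocessing mentioned in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the packet representation `(timestamp, size)`, the function name `sliding_windows`, and the two features (packet count and byte total) are assumptions chosen for the example.

```python
def sliding_windows(packets, width, step):
    """Yield (window_start, features) for each window [start, start + width).

    packets: iterable of (timestamp_seconds, size_bytes) tuples (assumed format).
    width:   window length in seconds.
    step:    how far the window slides each time, in seconds.
    """
    packets = sorted(packets)
    if not packets:
        return
    start, last = packets[0][0], packets[-1][0]
    while start <= last:
        # Collect the sizes of all packets whose timestamp falls in this window.
        sizes = [size for ts, size in packets if start <= ts < start + width]
        yield start, {"packets": len(sizes), "bytes": sum(sizes)}
        start += step


if __name__ == "__main__":
    captured = [(0.0, 100), (1.0, 200), (2.5, 50)]
    for window_start, features in sliding_windows(captured, width=2.0, step=1.0):
        print(window_start, features)
```

Each feature dictionary would then be handed to a classifier (Naive Bayes in the paper); that stage is left out here because the abstract does not specify the feature set.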
22 Spatial cumulative sum algorithm with big data analytics for climate change detection
Publication year: 2018
Big data plays a vital role in the prediction of diseases that occur due to climate change. For such predictions, scalable data storage platforms and efficient change detection algorithms are required to monitor climate change. However, traditional data storage techniques and algorithms cannot process the huge amount of climate data. This paper presents a scalable data processing framework with a novel change detection algorithm. The large volume of climate data is stored on the Hadoop Distributed File System (HDFS), and a MapReduce algorithm is applied to calculate the seasonal average of climate parameters. A spatial autocorrelation based climate change detection algorithm is proposed in this paper to monitor changes in the seasonal climate. The proposed climate change detection algorithm is compared with various existing approaches such as the pruned exact linear time method, the binary segmentation method, and the segment neighborhood method.
Keywords: Hadoop Distributed File System, Big data, Climate change, Data analytics, Weather sensor data
English article
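The seasonal-average step described in the abstract above follows the usual map, shuffle, reduce pattern. The sketch below imitates that pattern in plain Python rather than on Hadoop; the record layout `(year, month, parameter, value)`, the `SEASONS` table, and all function names are assumptions made for illustration. For simplicity, December is grouped with the winter of its own calendar year.

```python
from collections import defaultdict

# Assumed month-to-season table (meteorological seasons, northern hemisphere).
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}


def map_record(record):
    """Map step: emit ((year, season, parameter), value) for one reading."""
    year, month, parameter, value = record
    yield (year, SEASONS[month], parameter), value


def shuffle(pairs):
    """Shuffle step: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_mean(key, values):
    """Reduce step: average the values collected for one key."""
    return key, sum(values) / len(values)


def seasonal_averages(records):
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_mean(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    readings = [(2017, 1, "temp", 2.0), (2017, 2, "temp", 4.0), (2017, 7, "temp", 20.0)]
    print(seasonal_averages(readings))
```

On a real cluster the shuffle is performed by the framework, and only the map and reduce functions are written by the user.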
23 Scalable Distributed Semantic Network for knowledge management in cyber physical system
Publication year: 2018
The remarkable growth of emerging technologies and computing paradigms in cyberspace and cyber physical systems generates a huge mass of data sources. These autonomous and heterogeneous data sources can contain complementary or semantically equivalent information stored in formats that range from structured and semi-structured to unstructured. These heterogeneities influence data semantics and meaning. Therefore, knowledge management has become more and more difficult and sometimes fruitless. In this paper, we propose a new scalable model, named Distributed Semantic Network (DSN), for heterogeneous data representation that can extract more semantic information from different data sources. We use the prior knowledge of WordNet and Wikipedia to scale out DSN horizontally and vertically. Furthermore, we propose a MapReduce based framework to construct the knowledge base more effectively in Parallel and Distributed Computing (PDC). The experimental results show that DSN can better model the semantic information in the text. It can extract a larger amount of information from the text with higher precision, achieving a 34% increase in quantity and a 15% improvement in precision over the best-performing alternative method on the same datasets. On the three datasets, our proposed PDC framework shortens the processing time by 5.8–11.5 times.
Keywords: Parallel and distributed computing, Knowledge management, Distributed semantic network, MapReduce framework, Cyber physical system
English article
24 Intelligent inversion method for pre-stack seismic big data based on MapReduce
Publication year: 2018
Seismic exploration is a method of oil exploration that uses seismic information; that is, according to the inversion of seismic information, the useful information of the reservoir parameters can be obtained to carry out exploration effectively. Pre-stack data are characterised by a large amount of data, abundant information, and so on, and according to its inversion, the abundant information of the reservoir parameters can be obtained. Owing to the large amount of pre-stack seismic data, existing single-machine environments have not been able to meet the computational needs of the huge amount of data; thus, the development of a method with a high efficiency and the speed to solve the inversion problem of pre-stack seismic data is urgently needed. The optimisation of the elastic parameters by using a genetic algorithm easily falls into a local optimum, which results in a non-obvious inversion effect, especially for the optimisation effect of the density. Therefore, an intelligent optimisation algorithm is proposed in this paper and used for the elastic parameter inversion of pre-stack seismic data. This algorithm improves the population initialisation strategy by using the Gardner formula and the genetic operation of the algorithm, and the improved algorithm obtains better inversion results when carrying out a model test with logging data. All of the elastic parameters obtained by inversion and the logging curve of theoretical model are fitted well, which effectively improves the inversion precision of the density. This algorithm was implemented with a MapReduce model to solve the seismic big data inversion problem. The experimental results show that the parallel model can effectively reduce the running time of the algorithm.
Keywords: Intelligent optimisation algorithm, Pre-stack seismic data, Elastic parameter inversion, MapReduce
English article
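The abstract above says the population initialisation uses the Gardner formula but does not say how. One plausible reading, sketched here as an assumption, is that each candidate's density is derived from its P-wave velocity through Gardner's relation (density = 0.31 * Vp^0.25, with Vp in m/s and density in g/cm^3) instead of being drawn at random. The velocity ranges, the Vp/Vs ratio range, and all function names are hypothetical.

```python
import random


def gardner_density(vp_m_per_s, alpha=0.31, beta=0.25):
    """Gardner's relation: bulk density (g/cm^3) from P-wave velocity (m/s)."""
    return alpha * vp_m_per_s ** beta


def init_individual(n_layers, rng, vp_range=(2000.0, 5000.0), vp_vs_ratio=(1.5, 2.2)):
    """Build one candidate model as a list of (Vp, Vs, density) per layer.

    The ranges are illustrative placeholders, not values from the paper.
    """
    individual = []
    for _ in range(n_layers):
        vp = rng.uniform(*vp_range)
        vs = vp / rng.uniform(*vp_vs_ratio)
        rho = gardner_density(vp)  # density tied to Vp instead of sampled freely
        individual.append((vp, vs, rho))
    return individual


def init_population(size, n_layers, seed=0):
    rng = random.Random(seed)
    return [init_individual(n_layers, rng) for _ in range(size)]


if __name__ == "__main__":
    for layer in init_population(size=1, n_layers=3)[0]:
        print(layer)
```

Starting the search with physically consistent densities narrows the region the genetic algorithm has to explore, which is one way such an initialisation could help with the density estimates the abstract singles out.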
25 A Big Data Scale Algorithm for Optimal Scheduling of Integrated Microgrids
Publication year: 2018
The capability of switching into the islanded operation mode of microgrids has been advocated as a viable solution to achieve high system reliability. This paper proposes a new model for the microgrids optimal scheduling and load curtailment problem. The proposed problem determines the optimal schedule for local generators of microgrids to minimize the generation cost of the associated distribution system in the normal operation. Moreover, when microgrids have to switch into the islanded operation mode due to reliability considerations, the optimal generation solution still guarantees for the minimal amount of load curtailment. Due to the large number of constraints in both normal and islanded operations, the formulated problem becomes a large-scale optimization problem and is very challenging to solve using the centralized computational method. Therefore, we propose a decomposition algorithm using the alternating direction method of multipliers that provides a parallel computational framework. The simulation results demonstrate the efficiency of our proposed model in reducing generation cost, as well as guaranteeing the reliable operation of microgrids in the islanded mode. We finally describe the detailed implementation of parallel computation for our proposed algorithm to run on a computer cluster using the Hadoop MapReduce software framework.
Index Terms: Alternating direction method of multipliers (ADMM), big data, Hadoop, integrated microgrid, islanded operation, load curtailment, MapReduce
English article
26 On Distributed Fuzzy Decision Trees for Big Data
Publication year: 2018
Fuzzy decision trees (FDTs) have been shown to be an effective solution in the framework of fuzzy classification. The approaches proposed so far to FDT learning, however, have generally neglected time and space requirements. In this paper, we propose a distributed FDT learning scheme shaped according to the MapReduce programming model for generating both binary and multiway FDTs from big data. The scheme relies on a novel distributed fuzzy discretizer that generates a strong fuzzy partition for each continuous attribute based on fuzzy information entropy. The fuzzy partitions are, therefore, used as an input to the FDT learning algorithm, which employs fuzzy information gain for selecting the attributes at the decision nodes. We have implemented the FDT learning scheme on the Apache Spark framework. We have used ten real-world publicly available big datasets for evaluating the behavior of the scheme along three dimensions: 1) performance in terms of classification accuracy, model complexity, and execution time; 2) scalability varying the number of computing units; and 3) ability to efficiently accommodate an increasing dataset size. We have demonstrated that the proposed scheme turns out to be suitable for managing big datasets even with a modest commodity hardware support. Finally, we have used the distributed decision tree learning algorithm implemented in the MLlib library and the Chi-FRBCS-BigData algorithm, a MapReduce distributed fuzzy rule-based classification system, for comparative analysis.
Index Terms: Apache Spark, big data, fuzzy decision trees (FDTs), fuzzy discretizer, fuzzy entropy, fuzzy partitioning, MapReduce
English article
27 A Parallel Multiclassification Algorithm for Big Data Using an Extreme Learning Machine
Publication year: 2018
As data sets become larger and more complicated, an extreme learning machine (ELM) that runs in a traditional serial environment cannot realize its ability to be fast and effective. Although a parallel ELM (PELM) based on MapReduce to process large-scale data shows more efficient learning speed than identical ELM algorithms in a serial environment, some operations, such as intermediate results stored on disks and multiple copies for each task, are indispensable, and these operations create a large amount of extra overhead and degrade the learning speed and efficiency of the PELMs. In this paper, an efficient ELM based on the Spark framework (SELM), which includes three parallel subalgorithms, is proposed for big data classification. By partitioning the corresponding data sets reasonably, the hidden layer output matrix calculation algorithm, matrix Û decomposition algorithm, and matrix V decomposition algorithm perform most of the computations locally. At the same time, they retain the intermediate results in distributed memory and cache the diagonal matrix as broadcast variables instead of several copies for each task to reduce a large amount of the costs, and these actions strengthen the learning ability of the SELM. Finally, we implement our SELM algorithm to classify large data sets. Extensive experiments have been conducted to validate the effectiveness of the proposed algorithms. As shown, our SELM achieves an 8.71× speedup on a cluster with ten nodes, and reaches a 13.79× speedup with 15 nodes, an 18.74× speedup with 20 nodes, a 23.79× speedup with 25 nodes, a 28.89× speedup with 30 nodes, and a 33.81× speedup with 35 nodes
Index Terms: Big data, classification, extreme learning machine (ELM), matrix, parallel algorithms, Spark
English article
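The speedup figures quoted in the abstract above are easier to compare when converted to parallel efficiency, that is, speedup divided by the number of nodes. The short calculation below uses only the numbers reported in the abstract; the variable and function names are made up for the example.

```python
# Speedups reported in the abstract, keyed by cluster size (number of nodes).
reported_speedup = {10: 8.71, 15: 13.79, 20: 18.74, 25: 23.79, 30: 28.89, 35: 33.81}


def parallel_efficiency(speedups):
    """Efficiency = speedup / number of nodes (1.0 would be ideal linear scaling)."""
    return {nodes: speedup / nodes for nodes, speedup in speedups.items()}


efficiency = parallel_efficiency(reported_speedup)

if __name__ == "__main__":
    for nodes, value in efficiency.items():
        print(f"{nodes} nodes: efficiency {value:.3f}")
```

Every reported configuration stays below the ideal value of 1.0, which is consistent with the abstract's claim of near-linear scaling.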
28 Big Data Based Security Analytics for Protecting Virtualized Infrastructures in Cloud Computing
Publication year: 2018
Virtualized infrastructure in cloud computing has become an attractive target for cyberattackers to launch advanced attacks. This paper proposes a novel big data based security analytics approach to detecting advanced attacks in virtualized infrastructures. Network logs as well as user application logs collected periodically from the guest virtual machines (VMs) are stored in the Hadoop Distributed File System (HDFS). Then, extraction of attack features is performed through graph-based event correlation and MapReduce parser based identification of potential attack paths. Next, determination of attack presence is performed through two-step machine learning, namely logistic regression is applied to calculate attack’s conditional probabilities with respect to the attributes, and belief propagation is applied to calculate the belief in existence of an attack based on them. Experiments are conducted to evaluate the proposed approach using well-known malware as well as in comparison with existing security techniques for virtualized infrastructure. The results show that our proposed approach is effective in detecting attacks with minimal performance overhead.
Index Terms: Virtualized infrastructure, virtualization security, cloud security, malware detection, rootkit detection, security analytics, event correlation, logistic regression, belief propagation
English article
29 Achieving high performance and privacy-preserving query over encrypted multidimensional big metering data
Publication year: 2018
With the proliferation of smart grids, traditional utilities are struggling to handle the increasing amount of metering data. Outsourcing the metering data to heterogeneous distributed systems has the potential to provide efficient data access and processing. In an untrusted heterogeneous distributed system environment, employing data encryption prior to outsourcing can be an effective way to preserve user privacy. However, how to efficiently query encrypted multidimensional metering data stored in an untrusted heterogeneous distributed system environment remains a research challenge. In this paper, we propose a high performance and privacy-preserving query (P2Q) scheme over encrypted multidimensional big metering data to address this challenge. In the proposed scheme, encrypted metering data are stored in the server of an untrusted heterogeneous distributed system environment. A Locality Sensitive Hashing (LSH) based similarity search approach is then used to realize the similarity query. To demonstrate the utility of the proposed LSH-based search approach, we implement a prototype using MapReduce for the Hadoop distributed environment. More specifically, for a given query, the proxy server will return the identifiers of the top K most similar data objects. An enhanced Ciphertext-Policy Attribute-based Encryption (CP-ABE) policy is then used to control access to the search results. Therefore, only the requester with an authorized query attribute can obtain the correct secret keys to retrieve the metering data. We then prove that the P2Q scheme achieves data confidentiality and preserves the data owner’s privacy in a semi-trusted cloud. In addition, our evaluations demonstrate that the P2Q scheme can significantly reduce response time and provide high search efficiency without compromising on search quality (i.e. suitable for multidimensional big data search in heterogeneous distributed system, such as cloud storage system).
Keywords: Smart grid, High performance, Privacy preservation, Similarity query, Multidimensional big metering data
English article
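The LSH-based similarity search named in the abstract above can be illustrated with a small plaintext sketch. The abstract does not say which LSH family the P2Q scheme uses, and the scheme itself works over encrypted data; the example below assumes random-hyperplane hashing (a common LSH family for cosine similarity) on unencrypted vectors, and every function name is hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes in dim dimensions (Gaussian normals)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        int(sum(h * x for h, x in zip(plane, vector)) >= 0.0)
        for plane in hyperplanes
    )


def build_index(items, hyperplanes):
    """Bucket item identifiers by signature; similar vectors tend to collide."""
    index = {}
    for item_id, vector in items.items():
        index.setdefault(signature(vector, hyperplanes), []).append(item_id)
    return index


def query(vector, index, hyperplanes):
    """Return the identifiers stored in the query vector's bucket."""
    return index.get(signature(vector, hyperplanes), [])


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, n_bits=8)
    readings = {"meter-a": [1.0, 2.0, 3.0], "meter-b": [-1.0, -2.0, -3.0]}
    index = build_index(readings, planes)
    print(query([2.0, 4.0, 6.0], index, planes))
```

A top-K search would rank the candidates from the matching bucket (or several hash tables) by exact similarity; that ranking and the access-control layer are outside this sketch.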
30 FASTEN: An FPGA Based Secure System for Big Data Processing
Publication year: 2018
In the cloud computing framework, data security and protection is one of the most important aspects for optimization and concrete implementation. This paper proposes a reliable yet efficient FPGA-based security system via crypto engines and Physical Unclonable Functions (PUFs) for big data applications. Considering that FPGA or GPU-based accelerators are popular in data centers, we believe the proposed approach is a very practical and effective method for data security in cloud computing.
Keywords: FPGA, Security, Big Data, Cloud Computing, Hadoop MapReduce
English article