Big Data Forecasting Using Evolving Multi-layer Perceptron
Published: 2016
Gold is one of the most commonly traded investment commodities; however, its price tends to fluctuate. This paper proposes an Evolving Multi-Layer Perceptron (eMLP) to forecast the gold price accurately by considering its daily price fluctuations and utilizing information from a big actual dataset. The proposed eMLP algorithm combines the concepts of the evolving connectionist system and the multi-layer perceptron neural network. The algorithm can expand its own structure based on the incoming input. An experiment was conducted using an actual dataset from January 3rd, 2011 to April 26th, 2013 for training and from April 29th, 2013 to April 25th, 2014 for testing. The results showed that the proposed eMLP gives excellent accuracy, with a Mean Absolute Percentage Error (MAPE) as low as 0.769% for the selected parameters: sensitivity threshold 0.9, error threshold 0.1, learning rate 1 of 0.9, and learning rate 2 of 0.9.
Keywords: evolving multi-layer perceptron | evolving connectionist system | forecasting | gold price
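The abstract above reports forecasting accuracy as a Mean Absolute Percentage Error (MAPE). As a quick reference, MAPE can be computed as in the sketch below; the gold prices used are made-up illustration values, not the paper's dataset:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(
        abs(a - f) / abs(a) for a, f in zip(actual, forecast)
    ) / len(actual)

# Made-up daily gold prices (USD/oz) purely for illustration.
actual = [1600.0, 1610.0, 1595.0, 1620.0]
forecast = [1590.0, 1615.0, 1600.0, 1612.0]
print(round(mape(actual, forecast), 3))  # → 0.436
```

A MAPE of 0.769%, as reported for eMLP, would mean the forecasts deviate from the actual price by well under 1% on average.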
Big Data Analysis for Monitoring Student Behavior
Publication year: 2016 - English PDF: 6 pages - Persian DOC: 15 pages
The security threat posed by senseless terrorist attacks on unarmed citizens is a major concern in today's society. Recent advances in data technology allow us to capture, store, process, and analyze data in a flexible and scalable way. Using these capabilities can help us deal with problems related to our security. This paper offers a new perspective on behavioral data analysis and new opportunities for analytics in a university setting, using data that is already collected and used in the academic environment. We propose the basic principles of a system based on Big Data technology that can be used to monitor students and predict whether some of them are susceptible to a deviant ideology that may lead to terrorism.
Keywords: business intelligence | business analytics | big data | student behavior monitoring | behavioral analytics
Efficient Embedding of Dynamic Languages in Big-Data Analytics
Published: 2016
Over the last few years, several frameworks have emerged in the field of big-data analytics. Recent frameworks expose a developer-friendly API via dynamic languages such as Python. Unfortunately, the integration of dynamic languages with the parallel and distributed runtimes of such frameworks is cumbersome, as it requires integrating two or more language virtual machines via inter-process communication, introducing communication overheads and reducing the benefits of the shared memory present in modern multicore machines. In this paper we highlight the advantages of hosting multiple language runtimes in a single shared (language) virtual machine, and the possible performance gains of such an approach in the context of the Apache Spark framework.
A Comparative Survey of the HPC and Big Data Paradigms: Analysis and Experiments
Published: 2016
Many scientific data analytics applications need huge amounts of input, often consisting of more than several TBs of data. This emphasizes the high I/O and processing/computational cost requirements of these algorithms. Tasks in these programs can induce more I/O operations than computations, or the opposite. Hardware also includes nodes with large storage devices and/or nodes with sophisticated computational capabilities. To embrace the heterogeneity of hardware systems in non-cloud and cloud environments, the issues of resource and job allocation in these environments need to be revisited. High-Performance Computing (HPC) models, under the leadership of the MPI (plus OpenMP) parallel APIs, have mostly met users' requirements in terms of high computational performance, while Big Data frameworks such as Spark have performed likewise in terms of high-level programming, resiliency, and I/O handling. Therefore, in order to meet the specialized needs of scientists, there is a need for convergence between the HPC and Big Data ecosystems. This paper presents a data-supported, comparative survey of the main current HPC and Big Data programming interfaces, namely MPI, OpenMP, PGAS (OpenSHMEM), Spark, and Hadoop, and their software stacks. A comprehensive experimental study of these interfaces on a set of benchmarks, namely reduction and I/O microbenchmarks, the StackExchange AnswersCount benchmark, and the PageRank benchmark, has been performed on a single platform in order to achieve a fair comparison. These experiments lead to a thorough discussion about whether the envisioned convergence is indeed needed or not, efficient or not, and in particular whether it is the best solution to tackle future computational challenges.
Keywords: Spark | Big data | Programming | Electronics packaging | Data models | Software | Parallel processing
Spark Versus Flink: Understanding Performance in Big Data Analytics Frameworks
Published: 2016
Big Data analytics has recently gained increasing popularity as a tool to process large amounts of data on demand. Spark and Flink are two Apache-hosted data analytics frameworks that facilitate the development of multi-step data pipelines using directed acyclic graph patterns. Making the most of these frameworks is challenging because efficient execution strongly relies on complex parameter configurations and on an in-depth understanding of the underlying architectural choices. Although extensive research has been devoted to improving and evaluating the performance of such analytics frameworks, most studies benchmark the platforms against Hadoop as a baseline, a rather unfair comparison considering the fundamentally different design principles. This paper aims to redress this imbalance by directly evaluating the performance of Spark and Flink. Our goal is to identify and explain the impact of the different architectural choices and parameter configurations on the perceived end-to-end performance. To this end, we develop a methodology for correlating the parameter settings and the operator execution plan with resource usage. We use this methodology to dissect the performance of Spark and Flink with several representative batch and iterative workloads on up to 100 nodes. Our key finding is that neither of the two frameworks outperforms the other for all data types, sizes, and job patterns. This paper provides a fine-grained characterization of the cases in which each framework is superior, and we highlight how this performance correlates with operators, resource usage, and the specifics of the internal framework design.
Index Terms: Big Data | performance evaluation | Spark | Flink
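Both Spark and Flink express jobs as multi-step pipelines over a directed acyclic graph of operators, evaluated lazily only when an action is triggered. The toy class below sketches that idea in plain Python; it is our own illustration of the execution model, not either framework's actual API:

```python
# Minimal sketch of a lazily evaluated, DAG-style pipeline: each
# transformation records an operator node; nothing executes until
# the collect() action walks the graph.

class Dataset:
    def __init__(self, source, op=None, parent=None):
        self._source = source    # only set on the root node
        self._op = op            # ("map", fn) or ("filter", fn)
        self._parent = parent

    def map(self, fn):
        return Dataset(None, ("map", fn), self)

    def filter(self, pred):
        return Dataset(None, ("filter", pred), self)

    def collect(self):  # the action that triggers execution
        if self._parent is None:
            return list(self._source)
        data = self._parent.collect()
        kind, fn = self._op
        if kind == "map":
            return [fn(x) for x in data]
        return [x for x in data if fn(x)]

pipeline = (Dataset(range(10))
            .map(lambda x: x * x)
            .filter(lambda x: x % 2 == 0))
print(pipeline.collect())  # → [0, 4, 16, 36, 64]
```

Real engines go much further: they optimize the graph, pipeline operators, and schedule stages across nodes, which is precisely where the architectural differences the paper studies come from.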
Bridging the I/O Performance Gap for Big Data Workloads: A New NVDIMM-based Approach
Published: 2016
The long I/O latency poses significant challenges for many data-intensive applications, such as the emerging big data workloads. Recently, NVDIMM (Non-Volatile Dual In-line Memory Module) technologies have provided a promising solution to this problem. By employing non-volatile NAND flash memory as the storage medium and connecting it via DIMM (Dual In-line Memory Module) slots, NVDIMM devices are exposed on the memory bus, so the access latencies incurred by going through I/O controllers can be significantly mitigated. However, placing NVDIMM on the memory bus introduces new challenges. For instance, by mixing I/O and memory traffic, NVDIMM can cause severe performance degradation for memory-intensive applications. Besides, there is a speed mismatch between fast memory accesses and slow flash read/write operations. Moreover, garbage collection (GC) in NAND flash may cause up to several milliseconds of latency. This paper presents novel enabling mechanisms that allow NVDIMM to more effectively bridge the I/O performance gap for big data workloads. To address the workload heterogeneity challenge, we develop a scheduling scheme in the memory controller to minimize the interference between native and I/O-derived memory traffic by exploiting both data access criticality and resource utilization. For the NVDIMM controller, several mechanisms are designed to better orchestrate traffic between the memory controller and NAND flash to alleviate the speed discrepancy. To mitigate the lengthy GC period, we propose a proactive GC scheme for the NVDIMM controller and flash controller to intelligently synchronize and transfer data involved in forthcoming GC operations. We present a detailed evaluation and analysis to quantify how well our techniques fit the NVDIMM design. Our experimental results show that, overall, the proposed techniques yield 10%∼35% performance improvements over the state-of-the-art baseline schemes.
Keywords: Nonvolatile memory | Random access memory | Throughput | Performance evaluation | Interference
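A central idea in the abstract is a memory-controller scheduling scheme that keeps latency-critical native traffic from being delayed by I/O-derived NVDIMM traffic. The sketch below is a deliberately simplified priority policy of our own, purely to illustrate the separation of the two traffic classes; the paper's actual mechanism also weighs resource utilization and is implemented in hardware:

```python
import heapq

# Toy memory-controller scheduler: native (latency-critical) requests
# are served before I/O-derived NVDIMM traffic; arrival order breaks ties.
NATIVE, IO_DERIVED = 0, 1  # lower value = higher priority

class MemoryScheduler:
    def __init__(self):
        self._queue = []
        self._seq = 0  # arrival counter, used as a tiebreak

    def submit(self, request_id, kind):
        heapq.heappush(self._queue, (kind, self._seq, request_id))
        self._seq += 1

    def next_request(self):
        if not self._queue:
            return None
        _, _, request_id = heapq.heappop(self._queue)
        return request_id

sched = MemoryScheduler()
sched.submit("io-read-1", IO_DERIVED)
sched.submit("cpu-load-1", NATIVE)
sched.submit("io-write-2", IO_DERIVED)
sched.submit("cpu-store-2", NATIVE)

order = [sched.next_request() for _ in range(4)]
print(order)  # native traffic drains first, then I/O-derived traffic
```

A strict priority like this would starve I/O traffic under sustained memory load, which is why the paper's scheme also tracks utilization rather than using criticality alone.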
An algorithm of apriori based on medical big data and cloud computing
Published: 2016
With the enormous development of the medical industry, the value of medical data is increasingly highlighted, and the concept of medical big data has become a study target for experts and scholars. This paper examines the association rule algorithms in existing medical data mining technology and improves the Apriori algorithm by introducing an interest degree threshold. Based on the Hadoop platform and cloud computing technology, this paper proposes a new association rule algorithm for medical data mining that combines MapReduce with the interest measure, confidence coefficient, and support degree. Finally, a simulation experiment carried out on the Hadoop platform demonstrates the superiority of the improved algorithm.
Keywords: Medical data | Apriori algorithm | Cloud computing | Data mining | Hadoop
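The improvement described above adds an interest-degree threshold on top of the classic support and confidence filters, pruning rules whose antecedent and consequent are not genuinely associated. A minimal single-machine sketch of that idea, using lift as the interest measure (one common choice; the paper's distributed MapReduce version may differ in detail), could look like:

```python
from itertools import combinations

def apriori_rules(transactions, min_support, min_confidence, min_interest):
    """Mine association rules filtered by support, confidence, and
    interest degree (here: lift = confidence / support(consequent))."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    # Frequent 1-itemsets, then grow level by level (classic Apriori).
    items = {i for t in sets for i in t}
    frequent = {frozenset([i]) for i in items
                if support(frozenset([i])) >= min_support}
    all_frequent = set(frequent)
    while frequent:
        size = len(next(iter(frequent))) + 1
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == size}
        frequent = {c for c in candidates if support(c) >= min_support}
        all_frequent |= frequent

    rules = []
    for itemset in all_frequent:
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = support(itemset) / support(antecedent)
                interest = conf / support(consequent)  # lift
                if conf >= min_confidence and interest >= min_interest:
                    rules.append((set(antecedent), set(consequent),
                                  round(conf, 2)))
    return rules

# Made-up toy "prescription" transactions, not real medical data.
tx = [["aspirin", "vitamin-c"], ["aspirin", "vitamin-c", "zinc"],
      ["aspirin", "zinc"], ["vitamin-c"]]
for rule in apriori_rules(tx, min_support=0.5, min_confidence=0.6,
                          min_interest=1.0):
    print(rule)
```

With `min_interest=1.0`, the aspirin→vitamin-c rule is pruned despite passing the confidence test, because its lift is below 1 (a negative association); only the aspirin↔zinc rules survive. This is exactly the kind of spurious rule the interest-degree threshold is meant to remove.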
Applying Big Data Analytics Into Network Security: Challenges, Techniques and Outlooks
Published: 2016
With the tremendous growth of the Internet, large amounts of data are generated, creating big challenges for today's computing technologies and systems. On the other hand, this also sheds new light on the areas of data analytics and mining, enabling the uncovering of patterns and laws beneath the big data. In recent years, big data analytics has been successfully applied to many areas, such as e-commerce, healthcare, and industry. At the same time, security analytics based on big data has also received great attention from both academia and industry. In this paper, we give a comprehensive sketch of techniques for applying big data to network security analytics. The existing research works are classified into three types: supervised, unsupervised, and hybrid approaches. We then elaborate on the technical issues of the three kinds of approaches and compare their advantages and disadvantages. Finally, we outline the potential and research directions for the future.
Index Terms : big data | network security | anomaly detection
A New Transient Voltage Stability Prediction Model Using Big Data Analysis
Published: 2016
A new prediction model for transient stability analysis based on machine learning is proposed in this paper. It extracts features ahead of the time point at which we want to make a prediction, which produces an interval in which to take action. The proposed model also takes network information into consideration and tries to analyze how nodes in the power grid influence each other. Compared to traditional algorithms that only use historical data from a single node, this model has higher prediction accuracy. Logistic regression is chosen as the classifier because the learned parameters can be interpreted as the significance of the variables. Finally, we also develop a practical system called RGAS by combining Hadoop and Storm. It can perform off-line learning with high throughput and make on-line predictions with low delay.
Keywords: voltage stability | big data | machine learning | feature selection | cloud computing | hadoop
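The abstract notes that logistic regression was chosen because its learned parameters can be read as variable significance. The toy sketch below illustrates that property on synthetic, made-up data (not the paper's grid measurements): the label depends only on the first feature, and the learned weight vector reflects this:

```python
import math
import random

def train_logistic(X, y, lr=0.5, epochs=500):
    """Plain per-sample gradient descent for logistic regression."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Synthetic data: the label is determined by feature 0 alone;
# feature 1 is pure noise with no predictive value.
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y = [1 if x0 > 0 else 0 for x0, _ in X]
w, b = train_logistic(X, y)
print(abs(w[0]) > abs(w[1]))  # the weight on feature 0 dominates
```

Reading the magnitudes of `w` as significance is what lets the model point at which grid features drive instability, which is the interpretability argument the paper makes for choosing logistic regression over a black-box classifier.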
Applying Climate Big-Data to Analysis of the Correlation between Regional Wind Speed and Wind Energy Generation
Published: 2016
In an era of growing concern over climate change, several utility companies that originally supplied wholesale and retail power generated mainly by burning coal have started to consider and build clean-energy power systems to address global warming. Wind power is nowadays regarded as one of the predominant alternative sources of clean energy. In this paper, we discuss our work on utilizing climate big data associated with wind power, collected from several wind farms over four years, to explore the correlation between regional wind speed and wind power. Once this huge amount of data is analyzed, it can be used to develop policies for siting wind-power facilities, designing smart charging algorithms, or evaluating the capacity of electrical distribution systems to meet the actual power-load requirements. Our work started with collecting related climate data to build a data model for analytics and experiments using the Support Vector Regression (SVR) method. We also observed the correlations between other wind-speed-related factors and wind energy from our empirical model. The preliminary experimental results demonstrate that our developed system framework is workable, allowing for detailed analysis of the important wind-power-related factors in specific wind-farm regions.
Keywords: Wind speed | Wind power | Wind farm | Weather data | Support Vector Regression | Machine Learning
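The core analysis task described above is quantifying how regional wind speed relates to generated wind power. As a minimal illustration, the Pearson correlation coefficient can be computed as below; the sample numbers are made up for demonstration (the paper itself fits SVR models on real multi-year wind-farm data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up hourly samples, purely illustrative.
wind_speed_ms = [3.1, 4.5, 5.2, 6.8, 7.9, 9.3]   # wind speed, m/s
power_kw      = [40, 110, 180, 390, 620, 900]    # generated power, kW
r = pearson(wind_speed_ms, power_kw)
print(round(r, 3))  # strong positive correlation
```

Since turbine output grows roughly with the cube of wind speed, the relationship is strong but nonlinear, which is one reason the paper reaches for SVR rather than a purely linear correlation model.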