Data Mining Strategies for Real-Time Control in New York City
Year of publication: 2014
The Data Mining System (DMS) at the New York City Department of Transportation (NYCDOT) consists mainly of four database systems, covering traffic and pedestrian/bicycle volumes, crash data, and signal timing plans, as well as the Midtown in Motion (MIM) systems, which are used as part of the NYCDOT Intelligent Transportation System (ITS) infrastructure. These database and control systems are operated by different units at NYCDOT, each as an independent database or operating system. New York City experiences heavy volumes of traffic, pedestrians, and cyclists in each Central Business District (CBD) area and along key arterial systems. There is a consistent and urgent need in New York City for real-time control to improve mobility and safety for all users of the street network and to provide timely response to, and management of, random incidents. Therefore, it is necessary to develop an integrated DMS for effective real-time control and active transportation management (ATM) in New York City. This paper presents new strategies for developing an efficient and cost-effective DMS for New York City, involving: 1) new technology applications, such as tablets and smartphones with Global Positioning System (GPS) and wireless communication features, for data collection and reduction; 2) interface development among the existing database and control systems; and 3) integrated DMS deployment with macroscopic and mesoscopic simulation models in Manhattan. The paper also suggests a complete data mining process for real-time control. This process combines traditional static data, current real-time data from loop detectors, microwave sensors, and video cameras, and new real-time GPS data. GPS data, including taxi and bus GPS information and data from smartphone applications, can be obtained in all weather conditions and at any time of day. The use of GPS data and smartphone applications in the NYCDOT DMS is discussed herein as a new concept. © 2014 The Authors. Published by Elsevier B.V.
Selection and peer-review under responsibility of Elhadi M. Shakshu
Keywords: Data Mining System (DMS), New York City, real-time control, active transportation management (ATM), GPS data
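The abstract proposes taxi, bus, and smartphone GPS as a new real-time data source. The sketch below shows one basic step such a source implies: turning consecutive GPS fixes from one probe vehicle into segment speeds with the haversine great-circle distance. The record layout `(timestamp, latitude, longitude)` and the function names are hypothetical illustrations, not the NYCDOT data format or the paper's method.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

def segment_speeds_kmh(fixes):
    """fixes: hypothetical list of (datetime, lat, lon) tuples sorted by time.
    Returns the average speed in km/h between each pair of consecutive fixes."""
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        dt = (t1 - t0).total_seconds()
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speeds.append(haversine_m(la0, lo0, la1, lo1) / dt * 3.6)
    return speeds
```

In practice, speeds from many vehicles on the same link would still need to be aggregated per time interval (for example, a median) before they feed a control or simulation model. That aggregation step is an assumption here; the abstract does not specify it.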
High-order possibilistic c-means algorithms based on tensor decompositions for big data in IoT
Year of publication: 2018
The Internet of Things (IoT) connects the physical world and the cyber world to offer intelligent services through data mining of big data. Each big data sample typically involves a large number of attributes, which poses a significant challenge for the high-order possibilistic c-means algorithm (HOPCM). Specifically, HOPCM requires high-performance servers with large memory and powerful computing units to cluster big samples. This limits its applicability in IoT systems with low-end devices, such as portable computing units and embedded devices, which have only limited memory space and computing power. In this paper, we propose two high-order possibilistic c-means algorithms for clustering big data: one based on the canonical polyadic decomposition (CP-HOPCM) and one based on the tensor-train network (TT-HOPCM). In detail, we use the canonical polyadic decomposition and the tensor-train network to compress the attributes of each big data sample. To evaluate the performance of our algorithms, we conduct experiments on two representative big data datasets, NUS-WIDE-14 and SNAE2. We compare them with the conventional high-order possibilistic c-means algorithm in terms of attribute reduction, execution time, memory usage, and clustering accuracy. The results imply that CP-HOPCM and TT-HOPCM have potential for big data clustering in IoT systems with low-end devices. They achieve a high compression rate for heterogeneous samples and thus save memory space significantly without a significant drop in clustering accuracy.
Keywords: Big data, IoT, Possibilistic c-means clustering, Canonical polyadic decomposition, Tensor-train network
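The memory savings claimed for CP-HOPCM and TT-HOPCM come from storing each sample in factored form rather than as a full tensor. The sketch below covers only that storage arithmetic, for a sample reshaped into an order-N tensor with mode sizes n_1, ..., n_N:

- A full tensor stores the product of the n_k.
- A rank-R CP model stores R times the sum of the n_k, ignoring the optional length-R weight vector.
- A tensor train with internal ranks r_1, ..., r_{N-1} stores the sum of r_{k-1} n_k r_k, with r_0 = r_N = 1.

The mode sizes and ranks in the test are hypothetical. The clustering step of HOPCM itself is not shown, because the abstract does not give its update rules.

```python
from math import prod

def full_size(dims):
    """Number of entries in the uncompressed tensor."""
    return prod(dims)

def cp_size(dims, rank):
    """CP format: one n_k x R factor matrix per mode (weight vector ignored)."""
    return rank * sum(dims)

def tt_size(dims, ranks):
    """TT format: core k has shape (r_{k-1}, n_k, r_k) with r_0 = r_N = 1."""
    if len(ranks) != len(dims) - 1:
        raise ValueError("need N-1 internal TT ranks for N modes")
    r = [1, *ranks, 1]
    return sum(r[k] * n * r[k + 1] for k, n in enumerate(dims))
```

Take a hypothetical 8 x 8 x 8 x 8 sample (4096 entries). CP rank 4 stores 128 numbers, and TT ranks (4, 4, 4) store 320. This arithmetic shows why the compression rate depends strongly on the chosen ranks.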
Using ontology-based clustering to understand the push and pull factors for British tourists visiting a Mediterranean coastal destination
Year of publication: 2018
This paper studies why British tourists decide to travel to a particular destination in a Catalan region. The analysis is based on a survey that includes open-ended questions. First, we propose an operationalization of the concepts of motivation and meaning as push–pull factors in choosing a destination. Second, we present an ontology-based clustering method that makes it possible to analyse these qualitative factors from a semantic perspective and obtain tourist segments. A benchmark confirms that the segmentation obtained is better than that generated by classic clustering methods. The results show that different meanings can be associated with any single place.
Keywords: Data mining, Tourism motivations, Destination meaning, Ontologies, Qualitative data, Tourism geography
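Ontology-based clustering compares answers through the concepts they map to, not through exact word matches. The abstract does not say which semantic similarity measure the authors use. The sketch below swaps in the standard Wu-Palmer measure: 2 * depth(lowest common subsumer) / (depth(a) + depth(b)). It runs on a small, entirely hypothetical is-a hierarchy of motivation concepts. Pairwise similarities like these could then feed any clustering algorithm that accepts a similarity matrix.

```python
# Hypothetical toy is-a hierarchy: child -> parent (the root's parent is None).
PARENT = {
    "motivation": None,
    "push": "motivation",
    "pull": "motivation",
    "relaxation": "push",
    "escape": "push",
    "beach": "pull",
    "gastronomy": "pull",
}

def ancestors(concept):
    """Path from the concept up to the root, the concept itself first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def depth(concept):
    return len(ancestors(concept))  # the root has depth 1

def wu_palmer(a, b):
    """Wu-Palmer similarity in (0, 1]; 1 means identical concepts."""
    anc_b = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in anc_b)  # lowest common subsumer
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

With this toy hierarchy, "relaxation" and "escape" (both push factors) score 2/3, while "relaxation" and "beach" score only 1/3, because their nearest shared ancestor is the root.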
Stay alert: Forecasting the risks of sexting in Korea using social big data
Year of publication: 2018
Youth sexting, commonly defined as the sharing of intimate images of persons under 18, is an emerging phenomenon that has garnered significant attention in South Korea, particularly from the South Korean government. Widely recognized as a potential source of undue harm, it has led the government to initiate a movement to block the sharing of obscene content among youths under the age of 18. This phenomenon can be examined from different angles, but one approach has been notably absent so far: the use of big data and data mining of the information produced by the spread of the Internet and social media. Using social big data, the study found three things. First, teenagers sext in hopes of obtaining more attention among friends. Second, file sharing is more frequent than image distribution through sexting. Third, transactions without "adult pornography" and with "smishing" were the most influential factors in addressing the risks of sexting in South Korea. Big data and data mining do not make any inferences themselves. The benefit of analyzing social big data lies in its ability to incorporate a much larger volume of data and to confirm the views of a diverse range of participants.
Keywords: Social big data, Data mining, Youth sexting, South Korea, Trends and patterns
Rim: A reusable iterative model for big data
Year of publication: 2018
In the big data environment, iterative computing is widely used in many applications, such as data mining, machine learning, and graph analysis. Many iterative computing models have been proposed to execute iterative algorithms on big data efficiently. However, re-iterating over the entire dataset is inefficient when only part of it has changed, for example when some data are added or removed. This paper presents Rim, a Reusable Iterative computing Model that calculates the new iterative results from the updated dataset and the original iterative results, avoiding re-iteration over the entire dataset. We specify the conditions under which Rim applies and mathematically prove its accuracy and performance advantages. We also describe its application to three typical iterative algorithms: PageRank, K-means, and Descendant-query. Finally, we implement Rim in Spark and evaluate its performance on different test cases and iterative algorithms. For PageRank, K-means, and Descendant-query, experiments show that our approach is on average 1.34×, 2.51×, and 3.17× faster, respectively, than re-iteration on a massive dataset.
Keywords: Big data, Iterative computing, Iterative model, Reusable
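The abstract does not give Rim's update rule. The sketch below shows a simpler, well-known way of reusing earlier iterative results: warm-starting PageRank power iteration from the ranks computed before the graph changed, instead of from the uniform vector. The graph representation and function signature are hypothetical, and this is not Rim itself.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000, init=None):
    """Power-iteration PageRank on a dict {node: [successor, ...]}.
    init: optional previous ranks to reuse as the starting vector.
    Returns (ranks, iterations_used)."""
    nodes = sorted(set(out_links) | {w for ws in out_links.values() for w in ws})
    n = len(nodes)
    if init is None:
        ranks = {v: 1.0 / n for v in nodes}  # cold start: uniform vector
    else:
        # warm start: reuse old ranks, give new nodes 1/n, then renormalise
        ranks = {v: init.get(v, 1.0 / n) for v in nodes}
        total = sum(ranks.values())
        ranks = {v: r / total for v, r in ranks.items()}
    for it in range(1, max_iter + 1):
        new = {v: (1.0 - damping) / n for v in nodes}
        # mass held by nodes without out-links is spread uniformly
        dangling = sum(ranks[v] for v in nodes if not out_links.get(v))
        for v in nodes:
            succ = out_links.get(v, [])
            for w in succ:
                new[w] += damping * ranks[v] / len(succ)
        for v in nodes:
            new[v] += damping * dangling / n
        delta = sum(abs(new[v] - ranks[v]) for v in nodes)  # L1 change
        ranks = new
        if delta < tol:
            return ranks, it
    return ranks, max_iter
```

After a small update, `pagerank(updated_graph, init=old_ranks)` starts close to the new fixed point. It therefore typically, though not always, needs fewer iterations than a cold start. Rim goes further by reusing results without full re-iteration.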
MRQAR: a generic MapReduce framework to discover Quantitative Association Rules in Big Data problems
Year of publication: 2018
Many algorithms have emerged in recent years to discover quantitative association rules from datasets. However, this task is becoming a challenge because the processing power of most existing techniques is not enough to handle the large amounts of data generated nowadays. These vast amounts of data are known as Big Data. A number of previous studies have focused on mining boolean or nominal association rules from Big Data problems. Nevertheless, data in real-world applications usually consist of quantitative values, and designing data mining algorithms able to extract quantitative association rules is a challenge for researchers in this field. The best-known repositories of Big Data algorithms include classical methods to discover boolean or nominal association rules, but they provide no methods to discover quantitative association rules. Indeed, no methodology for Big Data that avoids prior discretization has been proposed in the literature. Hence, this work proposes MRQAR, a new generic parallel framework to discover quantitative association rules in large amounts of data, designed following the MapReduce paradigm using Apache Spark. MRQAR performs incremental learning and can run any sequential quantitative association rule algorithm on Big Data problems without the need to redesign it. As a case study, we have integrated the multiobjective evolutionary algorithm MOPNAR into MRQAR to validate the proposed generic MapReduce framework. The results of the experimental study on five Big Data problems show that MRQAR obtains reduced sets of high-quality rules in reasonable time.
Keywords: Quantitative association rules, Multiobjective evolutionary algorithms, Big Data, MapReduce, Spark
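One reason rule evaluation parallelizes well under MapReduce is that support and confidence decompose into counts that are computed per partition and then summed. The plain-Python sketch below stands in for Spark and shows only that decomposition, for an interval-based (quantitative) rule. The attribute names and data are hypothetical, and MRQAR's incremental learning and MOPNAR's search are not shown. In PySpark, the same map function could plausibly be wrapped for `RDD.mapPartitions` and the reduce function passed to `RDD.reduce`, but that wiring is an untested assumption.

```python
from functools import reduce

def matches(record, items):
    """items: list of (attribute, low, high); true if low <= value <= high for all."""
    return all(lo <= record[attr] <= hi for attr, lo, hi in items)

def map_partition(partition, antecedent, consequent):
    """Map step: per-partition counts (records, antecedent hits, rule hits)."""
    n = n_ante = n_both = 0
    for rec in partition:
        n += 1
        if matches(rec, antecedent):
            n_ante += 1
            if matches(rec, consequent):
                n_both += 1
    return (n, n_ante, n_both)

def reduce_counts(a, b):
    """Reduce step: partial counts are simply summed."""
    return tuple(x + y for x, y in zip(a, b))

def support_confidence(partitions, antecedent, consequent):
    n, n_ante, n_both = reduce(
        reduce_counts,
        (map_partition(p, antecedent, consequent) for p in partitions),
        (0, 0, 0),
    )
    support = n_both / n if n else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence
```

Because the reduce step is an associative, commutative sum, the partitions can be processed in any order and on any number of workers.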
Big data-informed energy efficiency assessment of China industry sectors based on K-means clustering
Year of publication: 2018
The regional energy management body holds a large amount of energy consumption data from regional industrial companies. Based on the full dataset, it can evaluate the energy utilization of the listed regional industrial companies. It can then find the key points for understanding resource usage patterns, identifying problematic companies, and establishing good energy consumption practices. This paper reviews research progress on big data analysis and industrial energy efficiency evaluation. It focuses on energy efficiency evaluation methods based on energy consumption process analysis and a big data mining approach. To analyze the characteristics of regional energy consumption in different industries and companies, we use K-means and a multi-dimensional association rules algorithm: we cluster each industry with K-means to determine its levels of water and energy consumption. This classification provides a reference point for identifying the industries and companies to focus on and for locating bad consumption practices and poor environmental performance. Multi-dimensional association rules are then used to find correlations among processes, companies, and energy efficiency, to guide energy conservation in regional energy monitoring. The output of our research is a working Big Data analytics platform, together with the results generated by advanced analytics techniques applied specifically to regional energy efficiency problems.
Keywords: Big data, Energy efficiency assessment, K-means, Multi-dimensional association rules
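The clustering step groups companies by their water and energy consumption levels. The sketch below is plain Lloyd's K-means on hypothetical two-feature points, for example energy and water use per unit of output. The paper's actual features, their scaling, and the number of clusters are not given in the abstract. Features measured in different units should be standardized before clustering.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means on a list of equal-length float tuples.
    Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point goes to its nearest centroid (Euclidean)
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # update step: centroid = mean of its members (kept if cluster is empty)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, labels
```

The resulting cluster of high-consumption companies is the kind of "reference point" the abstract describes. Association rules would then be mined within or across such groups.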
Improving early OSV design robustness by applying Multivariate Big Data Analytics on a ship's life cycle
Year of publication: 2018
Typically, only a small portion of the monitorable operational data (e.g. from sensors and the environment) from Offshore Support Vessels (OSVs) is used at present. Operational data, together with equipment performance data and design and construction data, create large volumes of data with high veracity and variety. In most cases, it is not well understood how to make better use of this richness of data during design and operation. It is very often too time-consuming and resource-demanding to estimate the final operational performance of a vessel concept design solution in early design by applying simulations and model tests. This paper argues that there is significant potential in integrating ship lifecycle data from different phases of operation into a large data repository for targeted aims and evaluations. The paper further argues that evaluating the performance of real vessels of a similar type during the early stages of the design process substantially helps to improve and fine-tune the performance criteria of the next generations of vessel design solutions. Learning from such a ship lifecycle data repository means finding useful patterns and relationships between design parameters and the real performance data of the existing fleet. This requires modern data mining techniques, such as big data and clustering concepts, which are introduced and applied in this paper. The analytics model introduced suggests and reviews all relevant steps of knowledge discovery from data in this context. These steps include pre-processing (integration, feature selection, and cleaning), processing (data analysis), and post-processing (evaluating and validating results).
Keywords: External data, Internal data, Abnormality, Missing data, Outliers, Randomness, Multivariate analysis, Data integration, Clustering
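The pre-processing stage named in the abstract (cleaning, with missing data and outliers among the keywords) can be illustrated generically. The sketch below runs three steps on numeric vessel records:

1. Mean-impute missing values.
2. Drop rows whose z-score in any column exceeds a threshold.
3. Standardize the remaining columns before clustering.

The record layout, the threshold, and the choice of these specific techniques are assumptions. The paper's own cleaning rules are not described in the abstract, and the sketch assumes every column has at least one observed value.

```python
import statistics

def _zscores(column):
    """Population z-scores; a constant column maps to all zeros."""
    mu = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [0.0 if sd == 0 else (x - mu) / sd for x in column]

def preprocess(rows, z_max=3.0):
    """rows: hypothetical list of equal-length lists of floats, None = missing.
    Returns (standardised rows, indices of the original rows that were kept)."""
    cols = list(zip(*rows))
    # 1) impute each missing value with its column mean
    means = [statistics.fmean([v for v in col if v is not None]) for col in cols]
    filled = [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]
    # 2) drop rows with |z| > z_max in any column
    z_cols = [_zscores(list(col)) for col in zip(*filled)]
    keep = [i for i in range(len(filled)) if all(abs(z[i]) <= z_max for z in z_cols)]
    kept = [filled[i] for i in keep]
    # 3) standardise the surviving columns to zero mean and unit variance
    std_cols = [_zscores(list(col)) for col in zip(*kept)]
    return [list(r) for r in zip(*std_cols)], keep
```

With population standard deviation, no point in a sample of n values can have |z| above the square root of n - 1. The threshold therefore has to suit the sample size; for 10 records, a cutoff of 3 can never fire.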
Compression of smart meter big data: A survey
Year of publication: 2018
In recent years, the smart grid has attracted wide attention around the world. Large-scale data are collected by sensors and measurement devices in a smart grid. Smart meters can record fine-grained information about electricity consumption in near real time, forming smart meter big data. Smart meter big data provide new opportunities for electric load forecasting, anomaly detection, and demand-side management. However, high-dimensional and massive smart meter big data not only put great pressure on data transmission lines but also incur enormous storage costs in data centres. This study therefore presents a comprehensive survey of compression techniques for smart meter big data. The aims are to reduce transmission pressure and storage overhead, improve data mining efficiency, and thus fulfil the potential of smart meter big data. The development of smart grids and the characteristics and application challenges of electric power big data are introduced first, followed by an analysis of the characteristics and benefits of smart meter big data. Finally, the study focuses on potential data compression methods for smart meter big data and discusses evaluation methods for smart meter big data compression.
Keywords: Smart grid, Smart meter, Energy big data, Data compression
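As a concrete instance of the compression and evaluation steps surveyed, the sketch below applies one simple lossless scheme. It delta-encodes integer meter readings, which change slowly, so successive differences are small and repetitive, and then compresses them with DEFLATE from Python's `zlib`. The result is evaluated by compression ratio (original bytes divided by compressed bytes). The 8-byte integer layout and the scheme itself are illustrative choices, not necessarily methods covered in the survey.

```python
import struct
import zlib

def compress_readings(readings):
    """Lossless: delta-encode integer readings (e.g. Wh), then DEFLATE."""
    if not readings:
        return zlib.compress(b"", 9)
    deltas = [readings[0]] + [b - a for a, b in zip(readings, readings[1:])]
    raw = struct.pack(f"<{len(deltas)}q", *deltas)  # 8-byte signed little-endian
    return zlib.compress(raw, 9)

def decompress_readings(blob):
    """Inverse of compress_readings: inflate, unpack, then cumulative sum."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    readings, total = [], 0
    for d in deltas:
        total += d
        readings.append(total)
    return readings

def compression_ratio(readings, blob):
    """Original size (8 bytes per reading) divided by compressed size."""
    return (8 * len(readings)) / len(blob)
```

Lossy methods, such as piecewise approximation or transform coding, trade exact reconstruction for higher ratios. Their evaluation would also need a reconstruction-error measure alongside the ratio.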
Translated article: year of publication 2018; English PDF: 11 pages; Persian DOC translation: 40 pages.