Data Mining Strategies for Real-Time Control in New York City
Data mining strategy for real-time control in New York City (2105)
The Data Mining System (DMS) at the New York City Department of Transportation (NYCDOT) mainly consists of four database systems for traffic and pedestrian/bicycle volumes, crash data, and signal timing plans, as well as the Midtown in Motion (MIM) systems, which are used as part of the NYCDOT Intelligent Transportation System (ITS) infrastructure. These database and control systems are operated by different units at NYCDOT as independent database or operation systems. New York City experiences heavy volumes of traffic, pedestrians, and cyclists in each Central Business District (CBD) area and along key arterial systems. There are consistent and urgent needs in New York City for real-time control to improve mobility and safety for all users of the street networks, and to provide timely response to and management of random incidents. Therefore, it is necessary to develop an integrated DMS for effective real-time control and active transportation management (ATM) in New York City. This paper presents new strategies for New York City, suggesting the development of an efficient and cost-effective DMS involving: 1) use of new technology applications such as tablets and smartphones with Global Positioning System (GPS) and wireless communication features for data collection and reduction; 2) interface development among existing database and control systems; and 3) integrated DMS deployment with macroscopic and mesoscopic simulation models in Manhattan. This paper also suggests a complete data mining process for real-time control with traditional static data, current real-time data from loop detectors, microwave sensors, and video cameras, and new real-time data from GPS. GPS data, including taxi and bus GPS information and smartphone applications, can be obtained in all weather conditions and at any time of day. The use of GPS data and smartphone applications in the NYCDOT DMS is discussed herein as a new concept. © 2014 The Authors. Published by Elsevier B.V.
Selection and peer-review under responsibility of Elhadi M. Shakshu
Keywords: Data Mining System (DMS) | New York City | real-time control | active transportation management (ATM) | GPS data
Physical metallurgy-guided machine learning and artificial intelligent design of ultrahigh-strength stainless steel
Physical metallurgy-guided machine learning and artificial intelligence design of ultrahigh-strength stainless steel (2019)
With the development of the materials genome philosophy and data mining methodologies, machine learning (ML) has been widely applied to discover new materials in various systems, including high-end steels with improved performance. Although some recent attempts have been made to incorporate physical features in the ML process, their effects have not been demonstrated, systematically analysed, or experimentally validated with prototype alloys. To address this issue, a physical metallurgy (PM)-guided ML model was developed, wherein intermediate parameters were generated based on the original inputs and PM principles, e.g., equilibrium volume fraction (Vf) and driving force (Df) for precipitation, and these were added to the original dataset vectors as extra dimensions to participate in and guide the ML process. As a result, the ML process becomes more robust when dealing with small datasets, because data quality is improved and data information is enriched. On this basis, a new material design method is proposed that combines PM-guided ML regression, an ML classifier, and a genetic algorithm (GA). The model was successfully applied to the design of advanced ultrahigh-strength stainless steels using only a small database extracted from the literature. The proposed prototype alloy, with a leaner chemistry but better mechanical properties, has been produced experimentally, and excellent agreement was obtained between the predicted optimal parameter settings and the final properties. In addition, the present work clearly demonstrates that implementing PM parameters can improve design accuracy and efficiency by eliminating intermediate solutions that do not obey PM principles in the ML process. Furthermore, various important factors influencing the generalizability of the ML model are discussed in detail.
Keywords: Alloy design | Machine learning | Physical metallurgy | Small sample problem | Stainless steel
Machine Learning Techniques for Satellite Fault Diagnosis
Machine learning techniques for satellite fault diagnosis (2019)
Satellites are remotely operated systems with a high degree of complexity due to the large number of interconnected devices onboard. Consequently, a satellite has a correspondingly large number of telemetry parameters that allow operators and designers to fully control and monitor its mode of operation. The tremendous amount of telemetry data received from the satellite during its lifetime has to be analyzed in order to monitor and control subsystem health for better decision making and fast response. In this research, we address the use of machine learning techniques to diagnose faults of satellite subsystems using their telemetry parameters. The case study and source of telemetry is the Egyptsat-1 satellite, which was launched in April 2007 and lost communication with the ground station in 2010. We applied machine learning techniques to identify operating modes and the corresponding telemetry parameters. We used Support Vector Machine for Regression to analyze the satellite performance; a fault diagnosis approach was then applied to determine the most probable reason for the satellite's failure. Telemetry data was clustered using the k-means clustering algorithm in combination with t-distributed stochastic neighbor embedding (t-SNE) for dimensionality reduction. We classified data using Logical Analysis of Data (LAD) in order to generate positive patterns for each failure class, which are used to determine the failure-cause probability for each telemetry parameter. These probabilities enable Fault Tree Analysis (FTA) to identify the most probable cause that led to the satellite failure.
Keywords: Machine learning | Telemetry data mining | Satellite fault diagnosis | Logical analysis of data | Fault tree analysis
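The k-means clustering step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the t-SNE dimensionality reduction is omitted, the two-channel telemetry values are invented for illustration, and the function name kmeans and its fixed initial centroids are assumptions made so the example is deterministic.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: returns (centroids, labels) for a list of tuples."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical telemetry samples: (bus voltage in V, panel temperature in C).
telemetry = [
    (28.1, 20.0), (27.9, 21.0), (28.0, 19.0),   # made-up "nominal" mode
    (24.0, 35.0), (23.8, 36.0), (24.2, 34.0),   # made-up "degraded" mode
]
centroids, labels = kmeans(telemetry, centroids=[telemetry[0], telemetry[3]])
print(labels)
print(centroids)
```

In practice the clusters would be computed on the reduced-dimension embedding rather than on raw channels, and the number of clusters would have to be chosen from the data.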
Data analysis of multi-dimensional thermophysical properties of liquid substances based on clustering approach of machine learning
Data analysis of multi-dimensional thermophysical properties of liquid substances based on a machine learning clustering approach (2019)
In order to develop an efficient framework for global screening in materials exploration, we performed a machine learning clustering analysis of the multi-dimensional thermophysical properties of liquid substances. Data mining using a self-organizing map (SOM) based on unsupervised learning was employed to project high-dimensional thermophysical data onto a low-dimensional space. Here we adopted 98 liquid substances with eight thermophysical properties for the SOM training in order to group the liquid substances. The present SOM-clustering approach properly categorized liquid substances according to the chemical species characterized by their functional groups.
Keywords: Self-organizing map | Clustering analysis | Machine learning | Thermophysical properties | Heat medium
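As a rough illustration of how a self-organizing map groups property vectors, the following is a minimal one-dimensional SOM sketch. It is not the configuration used in the paper: the map size, the linearly decaying learning rate and neighborhood width, the function names, and the two-property toy data (scaled to the range 0 to 1) are all assumptions.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_som(data, n_nodes, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D SOM; returns one weight vector per map node."""
    rng = random.Random(seed)
    sigma0 = sigma0 or max(n_nodes / 2.0, 1.0)
    # Initialize each node from a randomly chosen data sample.
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighborhood
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighborhood on the map grid around the BMU.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights

# Hypothetical normalized properties for six liquids (two properties each).
liquids = [
    (0.10, 0.20), (0.15, 0.25), (0.12, 0.22),
    (0.80, 0.90), (0.85, 0.88), (0.82, 0.95),
]
weights = train_som(liquids, n_nodes=3)
groups = [best_matching_unit(weights, x) for x in liquids]
print(groups)
```

Substances that map to the same node (or to neighboring nodes) are treated as similar; with eight properties per substance, as in the abstract, only the length of each input tuple changes.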
Machine learning-based coronary artery disease diagnosis: A comprehensive review
Machine learning-based coronary artery disease diagnosis: A comprehensive review (2019)
Coronary artery disease (CAD) is the most common cardiovascular disease (CVD) and often leads to a heart attack. It annually causes millions of deaths and billions of dollars in financial losses worldwide. Angiography, which is invasive and risky, is the standard procedure for diagnosing CAD. Alternatively, machine learning (ML) techniques have been widely used in the literature as fast, affordable, and noninvasive approaches for CAD detection. The results that have been published on ML-based CAD diagnosis differ substantially in terms of the analyzed datasets, sample sizes, features, location of data collection, performance metrics, and applied ML techniques. Due to these fundamental differences, achievements in the literature cannot be generalized. This paper conducts a comprehensive and multifaceted review of all relevant studies that were published between 1992 and 2019 for ML-based CAD diagnosis. The impacts of various factors, such as dataset characteristics (geographical location, sample size, features, and the stenosis of each coronary artery) and applied ML techniques (feature selection, performance metrics, and method) are investigated in detail. Finally, the important challenges and shortcomings of ML-based CAD diagnosis are discussed.
Keywords: CAD diagnosis | Machine learning | Data mining | Feature selection
Machine learning and data mining frameworks for predicting drug response in cancer: An overview and a novel in silico screening process based on association rule mining
Machine learning and data mining frameworks for predicting drug response in cancer: An overview and a novel in silico screening process based on association rule mining (2019)
A major challenge in cancer treatment is predicting the clinical response to anti-cancer drugs on a personalized basis. The success of such a task largely depends on the ability to develop computational resources that integrate big “omic” data into effective drug-response models. Machine learning is both an expanding and an evolving computational field that holds promise to cover such needs. Here we provide a focused overview of: 1) the various supervised and unsupervised algorithms used specifically in drug response prediction applications, 2) the strategies employed to develop these algorithms into applicable models, 3) data resources that are fed into these frameworks and 4) pitfalls and challenges to maximize model performance. In this context we also describe a novel in silico screening process, based on Association Rule Mining, for identifying genes as candidate drivers of drug response and compare it with relevant data mining frameworks, for which we generated a web application freely available at: https://compbio.nyumc.org/drugs/. This pipeline explores large sample spaces with high efficiency, while being able to detect low-frequency events and evaluate statistical significance even in the multidimensional space, presenting the results in the form of easily interpretable rules. We conclude with future prospects and challenges of applying machine learning-based drug response prediction in precision medicine.
Keywords: Drug Response Prediction | Precision Medicine | Data mining | Machine Learning | Association Rule Mining
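The "easily interpretable rules" that association rule mining produces are usually scored by support, confidence, and lift. The sketch below computes these three standard measures for a single rule; it is a generic illustration, not the screening pipeline described in the abstract, and the gene names, response labels, and the function name rule_metrics are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / n                          # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0     # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift

# Hypothetical toy data: each set lists the features observed for one sample.
samples = [
    frozenset({"GENE_A_mut", "sensitive"}),
    frozenset({"GENE_A_mut", "sensitive"}),
    frozenset({"GENE_A_mut", "resistant"}),
    frozenset({"GENE_B_mut", "resistant"}),
    frozenset({"GENE_B_mut", "sensitive"}),
]
support, confidence, lift = rule_metrics(samples, {"GENE_A_mut"}, {"sensitive"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than expected under independence; whether such a rule is statistically significant needs a separate test, which this sketch does not include.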
Machine Learning based Digital Twin Framework for Production Optimization in Petrochemical Industry
Machine learning-based digital twin framework for production optimization in the petrochemical industry (2019)
Digital twins, along with the internet of things (IoT), data mining, and machine learning technologies, offer great potential in the transformation of today’s manufacturing paradigm toward intelligent manufacturing. Production control in the petrochemical industry involves complex circumstances and a high demand for timeliness; therefore, agile and smart controls are important components of intelligent manufacturing in the petrochemical industry. This paper proposes a framework and approaches for constructing a digital twin based on the petrochemical industrial IoT, machine learning, and a practice loop for information exchange between the physical factory and a virtual digital twin model to realize production control optimization. Unlike traditional production control approaches, this novel approach integrates machine learning and real-time industrial big data to train and optimize digital twin models. It can help petrochemical and other process manufacturing industries dynamically adapt to a changing environment, respond in a timely manner to market changes through production optimization, and improve economic benefits. Accounting for environmental characteristics, this paper provides concrete solutions for machine learning difficulties in the petrochemical industry, e.g., high data dimensions, time lags and alignment between time series data, and high demand for immediacy. The approaches were evaluated by applying them in the production unit of a petrochemical factory, and a model was trained via industrial IoT data and used to realize intelligent production control based on real-time data. A case study shows the effectiveness of this approach in the petrochemical industry.
Keywords: digital twin | machine learning | internet of things | petrochemical industry | production control optimization
Integration of machine learning approaches for accelerated discovery of transition-metal dichalcogenides as Hg0 sensing materials
Integration of machine learning approaches for accelerated discovery of transition-metal dichalcogenides as Hg0 sensing materials (2019)
The detrimental impact of urban airborne Hg0 from fossil fuel utilization has necessitated the discovery and development of Hg0 sensing materials for effective Hg0 detection and mitigation of the pollutant. Earlier studies have hypothetically and experimentally supported 2-dimensional transition-metal dichalcogenides (2D TMDCs), particularly MoS2, as having excellent performance for Hg0 removal. However, the potential of other TMDCs is yet to be investigated for Hg0 sensor application. In this study, a total of 28 transition metals within periods 4–6 of the periodic table, excluding the lanthanide series, were examined. To ensure proper data management flow, a high-throughput data mining approach with integrated machine learning and cheminformatics simulation approaches was developed. The systemic approach integrates the Pymatgen, Factsage, Aflow and density functional theory simulation tools for accelerated discovery of suitable TMDCs from raw data via the chemical vapour reaction route. Predicted results showed that TiS2, NiS2, ZrS2, MoS2, PdS2 and WS2 exhibited TMDC characteristics. Furthermore, first-principles calculations show that Hg-uptake capacity is in the order NiS2 > PdS2 > TiS2 > ZrS2 > WS2 > MoS2, while Hg sensing response is in the order PdS2 > MoS2 > WS2 > ZrS2 > NiS2 > TiS2. Accordingly, PdS2 was found to be the most suitable TMDC for airborne Hg0 sensor application. The proposed systemic approach is an initial platform for materials discovery using integrated machine learning approaches and is well-suited for the screening and discovery of new materials based on component-oriented structures.
Keywords: Atmospheric Hg0 sensor | Data mining | 2D TMDCs | Machine learning | DFT
Introducing a new method for the fusion of fraud evidence in banking transactions with regards to uncertainty
Introducing a new method for the fusion of fraud evidence in banking transactions with regard to uncertainty (2019)
Detection of fraudulent transactions is a vital factor for financial institutions, and finding more effective and accurate methods is of tremendous importance. The use of supervised data mining techniques is not feasible in many cases due to the lack of access to labeled data. Fraud detection is a complex task, and unsupervised methods like clustering and outlier detection techniques employed alone do not yield satisfactory results. Another issue is epistemic uncertainty due to the absence of sufficient information on the behavioral aspects of different customers, which also leads to poorer results for fraud detection and makes the fraud detection system inapplicable in real-world environments. In this paper, using a multi-criteria decision method, intuitionistic fuzzy sets, and evidential reasoning, a new method for fraud detection is introduced, which fuses several pieces of behavioral evidence about a transaction while accounting for the effect of uncertainty on them. Transactional behavior was modeled by considering the trends of different main and aggregated variables at different periods, and the extent of deviation of the newly arrived transaction from each of these trends was considered as behavioral evidence. The final belief, which is the result of combining multiple pieces of evidence using the proposed method, determines the originality of a newly arrived transaction. Finally, using a real-world dataset, the results of the new method were compared with the results of the Dempster–Shafer method in terms of the number of frauds discovered and the number of erroneous alerts issued. The findings showed that the method introduced in this study has higher accuracy and fewer false alarms than the Dempster–Shafer method, while its computational complexity makes its implementation time longer.
Keywords: Fraud detection | IFS | DST | Uncertainty | Evidential reasoning
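For readers unfamiliar with the Dempster–Shafer baseline that the abstract compares against, the sketch below implements Dempster's rule of combination for two mass functions. It shows the baseline only, not the proposed intuitionistic-fuzzy method; the evidence sources (m_amount, m_timing) and their mass values are invented for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                # Agreeing evidence accumulates on the intersection.
                combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
            else:
                # Disjoint hypotheses contribute to the conflict mass K.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Normalize by 1 - K so the combined masses sum to one.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

FRAUD, LEGIT = frozenset({"fraud"}), frozenset({"legit"})
EITHER = FRAUD | LEGIT  # mass left on the whole frame expresses ignorance

# Hypothetical behavioral evidence for one transaction.
m_amount = {FRAUD: 0.6, EITHER: 0.4}             # unusual amount
m_timing = {FRAUD: 0.3, LEGIT: 0.5, EITHER: 0.2}  # fairly typical time of day
belief = dempster_combine(m_amount, m_timing)
print({tuple(sorted(k)): round(v, 4) for k, v in belief.items()})
```

Here the conflict mass is 0.30, so the combined mass on "fraud" is 0.42 / 0.70 = 0.6. The normalization step is what makes the rule sensitive to highly conflicting sources, one of the known limitations of the baseline.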
Direct marketing campaigns in retail banking with the use of deep learning and random forests
Direct marketing campaigns in retail banking using deep learning and random forests (2019)
Credit products are a crucial part of the business of banks and other financial institutions. A novel approach based on a time-series representation of customer data for predicting willingness to take a personal loan is presented. The proposed testing procedure, based on a moving window, allows detection of complex, sequential, time-based dependencies between particular transactions. Moreover, this approach reduces noise by eliminating irrelevant dependencies that would occur without analysis of the time dimension. A system for identifying customers interested in credit products, based on classification with random forests and deep neural networks, is proposed. The promising results of empirical studies show that the system is able to extract significant patterns from customers' historical transfer and transactional data and predict credit purchase likelihood. Our approach, including the testing method, is not limited to the banking sector and can be easily transferred and implemented as a general-purpose direct marketing campaign system.
Keywords: Consumer credit | Retail banking | Direct marketing | Marketing campaigns | Database marketing | Random forest | Deep learning | Deep belief networks | Data mining | Time series | Feature selection | Boruta algorith
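The moving-window idea in the abstract above can be sketched as follows: each training example pairs a fixed-length window of past activity with the outcome observed immediately after it. The abstract does not specify the window length, features, or labeling, so the function name windowed_examples, the monthly aggregation, and the toy values are all assumptions.

```python
def windowed_examples(values, labels, window):
    """Pair each window of `window` past values with the label that follows it."""
    examples = []
    for end in range(window, len(values)):
        features = values[end - window:end]   # only data strictly before `end`
        examples.append((features, labels[end]))
    return examples

# Hypothetical customer history: monthly transaction totals and loan uptake flags.
monthly_spend = [120, 80, 95, 300, 310, 40]
took_loan = [0, 0, 0, 0, 1, 0]

examples = windowed_examples(monthly_spend, took_loan, window=3)
for features, target in examples:
    print(features, "->", target)
```

Because every window uses only values that precede its label, a classifier such as a random forest trained on these pairs cannot see information from the future, which is the main reason to build examples this way for time-ordered data.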