Forecasting third-party mobile payments with implications for customer flow prediction
Forecasting third-party mobile payments with implications for customer flow forecasting (2020)
Forecasting customer flow is key for retailers in making daily operational decisions, but small retailers often lack the resources to obtain such forecasts. Rather than forecasting stores’ total customer flows, this research utilizes emerging third-party mobile payment data to provide participating stores with a value-added service by forecasting their share of daily customer flows. These customer transactions using mobile payments can then be utilized further to derive retailers’ total customer flows indirectly, thereby overcoming the constraints that small retailers face. We propose a third-party mobile-payment-platform-centered daily mobile payments forecasting solution based on an extension of the newly developed Gradient Boosting Regression Tree (GBRT) method, which can generate multi-step forecasts for many stores concurrently. Using empirical forecasting experiments with thousands of time series, we show that GBRT, together with a strategy for multi-period-ahead forecasting, provides more accurate forecasts than established benchmarks. Pooling data from the platform across stores leads to benefits relative to analyzing the data individually, thus demonstrating the value of this machine learning application.
Keywords: Analytics | Big data | Customer flow forecasting | Machine learning | Forecasting many time series | Multi-step-ahead forecasting strategy
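The abstract above mentions two ideas that are easy to illustrate: pooling data across stores and forecasting several periods ahead. The following is a minimal sketch, assuming a direct multi-step strategy (one training set per horizon); the abstract does not say which strategy the authors use, and the function name `make_direct_dataset`, the store IDs, and the payment counts are all hypothetical. Any regressor, such as a gradient boosting model, could then be fitted to each pooled set.

```python
from typing import Dict, List, Tuple


def make_direct_dataset(
    series_by_store: Dict[str, List[float]],
    n_lags: int,
    horizon: int,
) -> Tuple[List[List[float]], List[float]]:
    """Pool (lag window -> value `horizon` steps ahead) pairs across all stores."""
    X: List[List[float]] = []
    y: List[float] = []
    for series in series_by_store.values():
        # The last usable window must leave room for a target `horizon` steps ahead.
        for t in range(n_lags, len(series) - horizon + 1):
            X.append(series[t - n_lags:t])     # the n_lags most recent observations
            y.append(series[t + horizon - 1])  # the value `horizon` steps after the window
    return X, y


# Hypothetical daily mobile-payment counts for two stores (illustrative only).
payments = {
    "store_a": [10, 12, 11, 13, 15, 14],
    "store_b": [3, 4, 4, 5, 6],
}

# Direct strategy (an assumption): one pooled training set per forecast horizon.
datasets = {h: make_direct_dataset(payments, n_lags=3, horizon=h) for h in (1, 2)}
```

Because rows from every store land in the same training set, a single model per horizon can learn patterns shared across stores, which is the kind of pooling benefit the abstract reports.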
Challenges and recommended technologies for the industrial internet of things: A comprehensive review
Challenges and recommended technologies for the industrial Internet of Things: A comprehensive review (2020)
Integrating the physical world with the cyber world opens the opportunity to create smart environments; this new paradigm is called the Internet of Things (IoT). Communication between humans and objects has been extended to communication between objects and objects. Industrial IoT (IIoT) applies IoT communications to business applications, focusing on interoperability between machines (i.e., IIoT is a subset of the IoT). The number of everyday things and objects connected to the Internet keeps increasing, which makes the IoT a dynamic network of networks. Challenges such as the heterogeneity, dynamicity, velocity, and volume of data make IoT services produce inconsistent, inaccurate, incomplete, and incorrect results, which is critical for many applications, especially in IIoT (e.g., health care, smart transportation, wearables, finance, industry, etc.). Discovering, searching, and sharing data and resources account for 40% of IoT benefits and cover almost all industrial applications. Enabling real-time data analysis, knowledge extraction, and search techniques based on Information and Communication Technologies (ICT), such as data fusion, machine learning, big data, cloud computing, and blockchain, can reduce and control these IoT challenges and leverage the value of the IoT. This research presents a comprehensive review of state-of-the-art challenges and recommended technologies for enabling data analysis and search in the future IoT, and presents a framework for ICT integration in the IoT layers. The paper surveys current IoT search engines (IoTSEs) and presents two case studies that reflect promising enhancements in the intelligence and smartness of IoT applications due to ICT integration.
Keywords: Industrial IoT (IIoT) | Searching and indexing | Blockchain | Big data | Data fusion | Machine learning | Cloud and fog computing
Big data analytics in health sector: Theoretical framework, techniques and prospects
Big data analytics in the healthcare sector: Theoretical framework, techniques, and prospects (2020)
Clinicians, healthcare providers-suppliers, policy makers and patients are experiencing exciting opportunities in light of new information deriving from the analysis of big data sets, a capability that has emerged in the last decades. Due to the rapid increase of publications in the healthcare industry, we have conducted a structured review of healthcare big data analytics. With reference to the resource-based view theory, we focus on how big data resources are utilised to create organizational value and capabilities, and through content analysis of the selected publications we discuss: the classification of big data types related to healthcare, the associated analysis techniques, the value created for stakeholders, the platforms and tools for handling big health data, and future aspects in the field. We present a number of pragmatic examples to show how the advances in healthcare were made possible. We believe that the findings of this review are stimulating and provide valuable information to practitioners, policy makers and researchers while presenting them with certain paths for future research.
Keywords: Big data analytics | Health-Medicine | Decision-making | Machine learning | Operations research (OR) techniques
Big Data Everywhere
Big data everywhere (2020)
Big Data and machine-learning approaches to analytics are an important new frontier in laboratory medicine. Direct-to-consumer (DTC) testing raises specific challenges in applying these new tools of data analytics. Because DTC data are not centralized by default, there is a need for data repositories to aggregate these values to develop appropriate predictive models. The lack of a default linkage between DTC results and medical outcomes data also limits the ability to mine these data for predictive modeling of disease risk. Issues of standardization and harmonization, which are significant across all of laboratory medicine, may be particularly difficult to correct in aggregated sets of DTC data.
Keywords: Big Data | Laboratory medicine | Machine learning | Direct-to-consumer testing | DTC | Harmonization
Column generation based heuristic for learning classification trees
Column-generation-based heuristic for learning classification trees (2020)
This paper explores the use of Column Generation (CG) techniques in constructing univariate binary decision trees for classification tasks. We propose a novel Integer Linear Programming (ILP) formulation, based on root-to-leaf paths in decision trees. The model is solved via a Column Generation based heuristic. To speed up the heuristic, we use restricted instance data by considering a subset of decision splits, sampled from the solutions of the well-known CART algorithm. Extensive numerical experiments show that our approach is competitive with the state-of-the-art ILP-based algorithms. In particular, the proposed approach is capable of handling big data sets with tens of thousands of data rows. Moreover, for large data sets, it finds solutions competitive with CART.
Keywords: Machine learning | Decision trees | Column generation | Classification | CART | Integer linear programming
Wake modeling of wind turbines using machine learning
Modeling of wind turbines using machine learning (2020)
This paper proposes a novel framework that combines machine learning and CFD (computational fluid dynamics) simulation to develop new wake velocity and turbulence models with high accuracy and good efficiency, with the aim of improving turbine wake predictions. An ANN (artificial neural network) model based on the backpropagation (BP) algorithm is designed to build the underlying spatial relationship between the inflow conditions and the three-dimensional wake flows. To save computational cost, a reduced-order turbine model, ADM-R (actuator disk model with rotation), is incorporated into RANS (Reynolds-averaged Navier-Stokes equations) simulations coupled with a modified k − ε turbulence model to provide big datasets of wake flow for training, testing, and validation of the ANN model. The numerical framework of RANS/ADM-R simulations is validated against a standalone Vestas V80 2MW wind turbine and an NTNU wind tunnel test of two aligned turbines. In the ANN-based wake model, the inflow wind speed and turbulence intensity at hub height are selected as input variables, while the spatial velocity deficit and added turbulence kinetic energy (TKE) in the wake field are taken as output variables. The ANN-based wake model is first deployed to a standalone turbine, and then the spatial wake characteristics and power generation of an aligned 8-turbine row, representing the Horns Rev wind farm, are validated against Large Eddy Simulations (LES) and field measurements. The results of the ANN-based wake model show good agreement with the numerical simulations and measurement data, indicating that the ANN is capable of establishing the complex spatial relationship between inflow conditions and the wake flows. Machine learning techniques can remarkably improve the accuracy and efficiency of wake predictions.
Keywords: Wind turbine wake | Wake model | Artificial neural network (ANN) | Machine learning | ADM-R (actuator-disk model with rotation) model | Computational fluid dynamics (CFD)
Towards a real-time processing framework based on improved distributed recurrent neural network variants with fastText for social big data analytics
Towards a real-time processing framework based on improved distributed recurrent neural network variants with fastText for social big data analytics (2020)
Big data generated by social media is a valuable source of information and offers an excellent opportunity to mine valuable insights. In particular, user-generated content such as reviews, recommendations, and users’ behavior data is useful for supporting the marketing activities of many companies. Knowing what users say in social media reviews about the products they bought or the services they used is a key factor in decision making. Sentiment analysis is one of the fundamental tasks in Natural Language Processing. Although deep learning for sentiment analysis has achieved great success and has allowed several firms to analyze and extract relevant information from their textual data, a model that runs in a traditional environment cannot remain effective as the volume of data grows, which implies the importance of efficient distributed deep learning models for social big data analytics. Besides, social media analysis is known to be a complex process that involves a set of complex tasks. Therefore, it is important to address the challenges and issues of social big data analytics and to enhance the performance of deep learning techniques in terms of classification accuracy to obtain better decisions. In this paper, we propose an approach for sentiment analysis that adopts fastText with recurrent neural network variants to represent textual data efficiently. It then employs the new representations to perform the classification task. Its main objective is to enhance the performance of well-known Recurrent Neural Network (RNN) variants in terms of classification accuracy and to handle large-scale data. In addition, we propose a distributed intelligent system for real-time social big data analytics. It is designed to ingest, store, process, index, and visualize huge amounts of information in real time. The proposed system adopts distributed machine learning with our proposed method for enhancing decision-making processes. Extensive experiments conducted on two benchmark data sets demonstrate that our proposal for sentiment analysis outperforms well-known distributed recurrent neural network variants (i.e., Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Gated Recurrent Unit (GRU)). Specifically, we tested the efficiency of our approach using the three different deep learning models. The results show that our proposed approach is able to enhance the performance of the three models. The current work can provide several benefits for researchers and practitioners who want to collect, handle, analyze, and visualize several sources of information in real time. It can also contribute to a better understanding of public opinion and user behavior through our proposed system with the improved variants of the most powerful distributed deep learning and machine learning algorithms. Furthermore, it is able to increase the classification accuracy of several existing works based on RNN models for sentiment analysis.
Keywords: Big data | FastText | Recurrent neural networks | LSTM | BiLSTM | GRU | Natural language processing | Sentiment analysis | Social big data analytics
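fastText represents a word through its character n-grams, which is one reason it can embed rare or misspelled words found in social media text. The sketch below only illustrates that subword decomposition; it is not the authors' pipeline, the helper name `char_ngrams` is hypothetical, and the default 3-to-6 n-gram range is taken from the original fastText paper as an assumption.

```python
from typing import List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return the character n-grams of `word`, with < and > marking its boundaries."""
    marked = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams: List[str] = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


# Trigrams of "where": ['<wh', 'whe', 'her', 'ere', 're>']
trigrams = char_ngrams("where", min_n=3, max_n=3)
```

In fastText the word vector is built from the vectors of these n-grams together with a vector for the whole marked word; a recurrent model such as an LSTM or GRU would then consume the resulting sequence of word vectors.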
Analysis of substance use and its outcomes by machine learning I: Childhood evaluation of liability to substance use disorder
Analysis of substance use and its outcomes by machine learning I: Childhood evaluation of liability to substance use disorder (2020)
Background: Substance use disorder (SUD) exacts enormous societal costs in the United States, and it is important to detect high-risk youths for prevention. Machine learning (ML) is a method for finding patterns in data and making predictions from them. We hypothesized that ML identifies the health, psychological, psychiatric, and contextual features that predict SUD, and that the identified features predict which high-risk individuals develop SUD. Method: Male (N=494) and female (N=206) participants and their informant parents were administered a battery of questionnaires across five waves of assessment conducted at 10–12, 12–14, 16, 19, and 22 years of age. Characteristics most strongly associated with SUD were identified using the random forest (RF) algorithm from approximately 1000 variables measured at each assessment. Next, the complement of features was validated, and the best models were selected for predicting SUD using seven ML algorithms. Lastly, the area under the receiver operating characteristic curve (AUROC) was used to evaluate the accuracy of detecting individuals who develop SUD+/- up to thirty years of age. Results: Approximately thirty variables strongly predict SUD. The predictors shift from psychological dysregulation and poor health behavior in late childhood to non-normative socialization in mid to late adolescence. In 10–12-year-old youths, the features predict SUD+/- with 74% accuracy, increasing to 86% at 22 years of age. The RF algorithm optimally detects individuals between 10 and 22 years of age who develop SUD compared to other ML algorithms. Conclusion: These findings inform the items required for inclusion in instruments to accurately identify high-risk youths and young adults requiring SUD prevention.
Keywords: Substance use disorder | Machine learning | Substance abuse prevention | Big data | Screening addiction risk
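The study above evaluates its models with the area under the receiver operating characteristic curve (AUROC). As a minimal sketch of what that metric measures, the function below computes AUROC as the fraction of (positive, negative) pairs in which the positive case receives the higher risk score, counting ties as half. The labels and scores are made-up illustrative values, not data from the study, and the function name `auroc` is hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive case outranks a negative one."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = develops the disorder, 0 = does not.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.8]
result = auroc(labels, scores)  # 5 of the 6 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; rank-based formulas give the same value more efficiently on large samples.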
A new approach for identifying the Kemeny median ranking
A new method for identifying the Kemeny median ranking (2020)
Condorcet consistent rules were originally developed for preference aggregation in the theory of social choice. Nowadays these rules are applied in a variety of fields such as discrete multi-criteria analysis, defence and security decision support, composite indicators, machine learning, artificial intelligence, queries in databases or internet multiple search engines, and theoretical computer science. The cycle issue, also known as Condorcet’s paradox, is the most serious problem inherent in this type of rule. Solutions for dealing with the cycle issue properly already exist in the literature, the most important one being the identification of the median ranking, often called the Kemeny ranking. Unfortunately, its identification is an NP-hard problem. This article has three main objectives: (1) to clarify that the Kemeny median order has to be framed in the context of Condorcet consistent rules; this is important since in current practice sometimes even the Borda count is used as a proxy for the Kemeny ranking; (2) to present a new exact algorithm, which identifies the Kemeny median ranking while providing a search time guarantee; (3) to present a new heuristic algorithm that identifies the Kemeny median ranking with an optimal trade-off between convergence and approximation.
Keywords: Decision analysis | Combinatorial optimisation | Social choice | Multiple criteria | Artificial intelligence | Defence and security | Big data
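To make the Kemeny median ranking concrete, the sketch below finds it by brute force: it scores every possible ordering by its total Kendall tau distance (number of pairwise disagreements) to the input rankings and keeps the minimum. This is only an illustration of the definition, not the exact or heuristic algorithm of the article; it takes factorial time, so it is practical only for a handful of candidates. The function names and the example ballots are hypothetical.

```python
from itertools import combinations, permutations
from typing import List, Sequence, Tuple


def kendall_tau(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the candidate pairs that rankings a and b order differently."""
    pos_a = {c: i for i, c in enumerate(a)}
    pos_b = {c: i for i, c in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_median(rankings: List[Sequence[str]]) -> Tuple[Tuple[str, ...], int]:
    """Return the ordering with minimum total Kendall tau distance, and that distance."""
    candidates = sorted(rankings[0])

    def cost(order: Sequence[str]) -> int:
        return sum(kendall_tau(order, r) for r in rankings)

    # Exhaustive search over all orderings (factorial time, small inputs only).
    best = min(permutations(candidates), key=cost)
    return best, cost(best)


# Hypothetical ballots forming a Condorcet cycle: A beats B, B beats C, C beats A.
ballots = [("A", "B", "C")] * 3 + [("B", "C", "A")] * 2 + [("C", "A", "B")] * 2
median, total_disagreements = kemeny_median(ballots)
```

With these ballots the pairwise majorities are cyclic, yet the search still returns a single ordering, ("A", "B", "C"), with 8 total disagreements, which shows how the median ranking resolves the cycle issue described in the abstract.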
Predicting and explaining corruption across countries: A machine learning approach
Predicting and explaining corruption across countries: A machine learning approach (2020)
In the era of Big Data, Analytics, and Data Science, corruption is still ubiquitous and is perceived as one of the major challenges of modern societies. A large body of academic studies has attempted to identify and explain the potential causes and consequences of corruption, at varying levels of granularity, mostly through theoretical lenses by using correlations and regression-based statistical analyses. The present study approaches the phenomenon from the predictive analytics perspective by employing contemporary machine learning techniques to discover the most important corruption perception predictors based on enriched/enhanced nonlinear models with a high level of predictive accuracy. Specifically, within the multiclass classification modeling setting that is employed herein, the Random Forest (an ensemble-type machine learning algorithm) is found to be the most accurate prediction/classification model, followed by Support Vector Machines and Artificial Neural Networks. From the practical standpoint, the enhanced predictive power of machine learning algorithms coupled with a multi-source database revealed the most relevant corruption-related information, contributing to the related body of knowledge and generating actionable insights for administrators, scholars, citizens, and politicians. The variable importance results indicated that government integrity, property rights, judicial effectiveness, and education index are the most influential factors in defining the corruption level of significance.
Keywords: Corruption perception | Machine learning | Predictive modeling | Random forest | Society policies and regulations | Government integrity | Social development