Forecasting third-party mobile payments with implications for customer flow prediction
Year: 2020
Forecasting customer flow is key for retailers in making daily operational decisions, but small retailers often lack the resources to obtain such forecasts. Rather than forecasting stores’ total customer flows, this research utilizes emerging third-party mobile payment data to provide participating stores with a value-added service by forecasting their share of daily customer flows. These customer transactions using mobile payments can then be used to derive retailers’ total customer flows indirectly, thereby overcoming the constraints that small retailers face. We propose a daily mobile payment forecasting solution centered on the third-party mobile payment platform, based on an extension of the newly developed Gradient Boosting Regression Tree (GBRT) method, which can generate multi-step forecasts for many stores concurrently. Using empirical forecasting experiments with thousands of time series, we show that GBRT, together with a strategy for multi-period-ahead forecasting, provides more accurate forecasts than established benchmarks. Pooling data from the platform across stores yields benefits relative to analyzing each store's data individually, thus demonstrating the value of this machine learning application.
Keywords: Analytics | Big data | Customer flow forecasting | Machine learning | Forecasting many time series | Multi-step-ahead forecasting strategy
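The abstract does not say which multi-step strategy or feature set was used. As a minimal sketch, assuming a "direct" strategy (one regressor per forecast horizon) and lagged daily values as the only features, the pooled training set could be assembled as below. The function name `make_pooled_direct_dataset` and its parameters are hypothetical; a gradient-boosted regressor (for example scikit-learn's `GradientBoostingRegressor`) could then be fitted on `X` against each column of `Y`.

```python
from typing import Dict, List, Tuple


def make_pooled_direct_dataset(
    series_by_store: Dict[str, List[float]], n_lags: int, horizon: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical helper: pool many stores' daily series into one training set.

    Each row of X holds the last `n_lags` observations of one store; the matching
    row of Y holds the next `horizon` values, so a separate regressor can be fitted
    for each step h = 1..horizon (the "direct" multi-step strategy).
    """
    X: List[List[float]] = []
    Y: List[List[float]] = []
    for store_id in sorted(series_by_store):
        series = series_by_store[store_id]
        # Slide a window over this store's series; every store contributes
        # rows to the same pooled matrices instead of getting its own model.
        for t in range(n_lags, len(series) - horizon + 1):
            X.append(series[t - n_lags:t])
            Y.append(series[t:t + horizon])
    return X, Y
```

Pooling is what lets short series borrow strength from others: a store with only a few days of history still contributes rows to a model trained on all stores.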
Rapid discrimination of Salvia miltiorrhiza according to their geographical regions by laser induced breakdown spectroscopy (LIBS) and particle swarm optimization-kernel extreme learning machine (PSO-KELM)
Year: 2020
Laser-induced breakdown spectroscopy (LIBS) coupled with a particle swarm optimization-kernel extreme learning machine (PSO-KELM) method was developed for the classification and identification of six types of Salvia miltiorrhiza samples from different regions. The spectral data of 15 Salvia miltiorrhiza samples were collected with a LIBS spectrometer. An unsupervised classification model based on principal component analysis (PCA) was first employed to classify Salvia miltiorrhiza from different regions. The results showed that only the samples from Gansu and Sichuan Provinces could be easily distinguished, while the samples from other regions were more difficult to classify with PCA. A supervised classification model based on KELM was then developed for the classification of Salvia miltiorrhiza, and two methods, random forest (RF) and PSO, were used for variable selection to eliminate useless information and improve the classification ability of the KELM model. The results showed that the PSO-KELM model gave the better classification result, with a classification accuracy of 94.87%. Compared with the particle swarm optimization-least squares support vector machine (PSO-LSSVM) and PSO-RF models, the PSO-KELM model had the best classification performance. The overall results demonstrate that LIBS combined with PSO-KELM is a promising method for the classification and identification of Salvia miltiorrhiza samples from different regions.
Keywords: Laser-induced breakdown spectroscopy | Particle swarm optimization | Kernel extreme learning machine | Salvia miltiorrhiza | Classification
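The core of a kernel extreme learning machine is a single regularized linear solve: with the kernel matrix $\Omega_{ij} = K(x_i, x_j)$ and one-hot targets $T$, the output weights are $\beta = (I/C + \Omega)^{-1} T$, and a new sample is scored as $[K(x, x_1), \dots, K(x, x_N)]\,\beta$. The sketch below assumes an RBF kernel; the names `rbf`, `solve`, and `KELM` and the default values of `C` and `gamma` are hypothetical. The PSO step described in the abstract (searching over variables and hyperparameters) is not shown.

```python
import math
from typing import List, Sequence


def rbf(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel, assumed here as the KELM kernel."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A: List[List[float]], B: List[List[float]]) -> List[List[float]]:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


class KELM:
    """Hypothetical minimal KELM classifier (no PSO tuning)."""

    def __init__(self, C: float = 10.0, gamma: float = 1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = [list(x) for x in X]
        self.classes = sorted(set(y))
        # One-hot target matrix T (N x n_classes)
        T = [[1.0 if c == label else 0.0 for c in self.classes] for label in y]
        n = len(self.X)
        # Omega_ij = K(x_i, x_j), with I/C added to the diagonal for regularization
        A = [[rbf(self.X[i], self.X[j], self.gamma) + (1.0 / self.C if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.beta = solve(A, T)  # beta = (I/C + Omega)^{-1} T
        return self

    def predict(self, X):
        preds = []
        for x in X:
            k = [rbf(x, xi, self.gamma) for xi in self.X]
            scores = [sum(k[i] * self.beta[i][c] for i in range(len(k)))
                      for c in range(len(self.classes))]
            preds.append(self.classes[scores.index(max(scores))])
        return preds
```

Because training reduces to one linear system, each candidate setting proposed by a search method such as PSO is cheap to evaluate, which is one plausible reason for pairing the two.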
Challenges and recommended technologies for the industrial internet of things: A comprehensive review
Year: 2020
Integrating the physical world with the cyber world opens the opportunity to create smart environments; this new paradigm is called the Internet of Things (IoT). Communication between humans and objects has been extended to communication between objects. The Industrial IoT (IIoT) takes advantage of IoT communications in business applications, focusing on interoperability between machines (i.e., the IIoT is a subset of the IoT). The number of everyday things and objects connected to the Internet has been increasing, which makes the IoT a dynamic network of networks. Challenges such as the heterogeneity, dynamicity, velocity, and volume of data cause IoT services to produce inconsistent, inaccurate, incomplete, and incorrect results, which is critical for many applications, especially in the IIoT (e.g., health care, smart transportation, wearables, finance, and industry). Discovering, searching, and sharing data and resources account for 40% of IoT benefits across almost all industrial applications. Enabling real-time data analysis, knowledge extraction, and search techniques based on Information and Communication Technologies (ICT), such as data fusion, machine learning, big data, cloud computing, and blockchain, can mitigate these challenges and leverage the value of the IoT. This research presents a comprehensive review of state-of-the-art challenges and recommended technologies for enabling data analysis and search in the future IoT, and presents a framework for ICT integration in IoT layers. This paper also surveys current IoT search engines (IoTSEs) and presents two case studies reflecting promising enhancements in the intelligence and smartness of IoT applications due to ICT integration.
Keywords: Industrial IoT (IIoT) | Searching and indexing | Blockchain | Big data | Data fusion | Machine learning | Cloud and fog computing
Big data analytics in health sector: Theoretical framework, techniques and prospects
Year: 2020
Clinicians, healthcare providers-suppliers, policy makers and patients are experiencing exciting opportunities in light of new information deriving from the analysis of big data sets, a capability that has emerged over the last decades. Due to the rapid increase of publications in the healthcare industry, we have conducted a structured review of healthcare big data analytics. With reference to the resource-based view theory, we focus on how big data resources are utilised to create organizational value and capabilities, and through content analysis of the selected publications we discuss: the classification of big data types related to healthcare, the associated analysis techniques, the value created for stakeholders, the platforms and tools for handling big health data, and future directions in the field. We present a number of pragmatic examples to show how the advances in healthcare were made possible. We believe that the findings of this review are stimulating and provide valuable information to practitioners, policy makers and researchers while presenting them with certain paths for future research.
Keywords: Big data analytics | Health-Medicine | Decision-making | Machine learning | Operations research (OR) techniques
Big Data Everywhere
Year: 2020
Big Data and machine-learning approaches to analytics are an important new frontier in laboratory medicine. Direct-to-consumer (DTC) testing raises specific challenges in applying these new data analytics tools. Because DTC data are not centralized by default, data repositories are needed to aggregate these values to develop appropriate predictive models. The lack of a default linkage between DTC results and medical outcomes data also limits the ability to mine these data for predictive modeling of disease risk. Standardization and harmonization, which are significant issues across all of laboratory medicine, may be particularly difficult to address in aggregated sets of DTC data.
Keywords: Big Data | Laboratory medicine | Machine learning | Direct-to-consumer testing | DTC | Harmonization
Column generation based heuristic for learning classification trees
Year: 2020
This paper explores the use of Column Generation (CG) techniques in constructing univariate binary decision trees for classification tasks. We propose a novel Integer Linear Programming (ILP) formulation based on root-to-leaf paths in decision trees. The model is solved via a Column Generation based heuristic. To speed up the heuristic, we restrict the instance data by considering only a subset of decision splits, sampled from the solutions of the well-known CART algorithm. Extensive numerical experiments show that our approach is competitive with the state-of-the-art ILP-based algorithms. In particular, the proposed approach is capable of handling big data sets with tens of thousands of data rows. Moreover, for large data sets, it finds solutions competitive with CART.
Keywords: Machine learning | Decision trees | Column generation | Classification | CART | Integer linear programming
Wake modeling of wind turbines using machine learning
Year: 2020
In this paper, a novel framework that employs machine learning and CFD (computational fluid dynamics) simulation to develop new wake velocity and turbulence models with high accuracy and good efficiency is proposed to improve turbine wake predictions. An ANN (artificial neural network) model based on the backpropagation (BP) algorithm is designed to build the underlying spatial relationship between the inflow conditions and the three-dimensional wake flows. To reduce computational cost, a reduced-order turbine model, ADM-R (actuator disk model with rotation), is incorporated into RANS (Reynolds-averaged Navier-Stokes equations) simulations coupled with a modified k-ε turbulence model to provide big datasets of wake flow for training, testing, and validation of the ANN model. The numerical framework of RANS/ADM-R simulations is validated against a standalone Vestas V80 2MW wind turbine and an NTNU wind tunnel test of two aligned turbines. In the ANN-based wake model, the inflow wind speed and turbulence intensity at hub height are selected as input variables, while the spatial velocity deficit and added turbulence kinetic energy (TKE) in the wake field are taken as output variables. The ANN-based wake model is first deployed for a standalone turbine, and then the spatial wake characteristics and power generation of an aligned 8-turbine row, as a representation of the Horns Rev wind farm, are validated against Large Eddy Simulations (LES) and field measurements. The results of the ANN-based wake model show good agreement with the numerical simulations and measurement data, indicating that the ANN is capable of establishing the complex spatial relationship between inflow conditions and wake flows. Machine learning techniques can remarkably improve the accuracy and efficiency of wake predictions.
Keywords: Wind turbine wake | Wake model | Artificial neural network (ANN) | Machine learning | ADM-R (actuator-disk model with rotation) model | Computational fluid dynamics (CFD)
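To make the backpropagation step concrete, here is a minimal sketch of a one-hidden-layer network trained by plain backpropagation (stochastic gradient descent on squared error). It maps two inputs, standing in for hub-height wind speed and turbulence intensity, to two scalar outputs standing in for velocity deficit and added TKE. The paper's network predicts full spatial fields from RANS/ADM-R data; the class `TinyMLP`, the function `toy_wake_samples`, its synthetic formulas, and all sizes and learning rates are hypothetical illustrations, not the authors' model or data.

```python
import math
import random


class TinyMLP:
    """Hypothetical network: one tanh hidden layer, linear outputs, trained by backpropagation."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(self.W2, self.b2)]
        return h, y

    def mse(self, data):
        total = 0.0
        for x, t in data:
            _, y = self.forward(x)
            total += sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(t)
        return total / len(data)

    def train_step(self, x, t, lr: float):
        h, y = self.forward(x)
        # Output error: gradient of 0.5 * sum((y - t)^2) with respect to y is (y - t)
        d_out = [yi - ti for yi, ti in zip(y, t)]
        # Propagate back through W2 (before updating it) and tanh, whose derivative is 1 - h^2
        d_hid = [(1.0 - h[j] ** 2) * sum(d_out[k] * self.W2[k][j] for k in range(len(d_out)))
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j in range(len(h)):
                self.W2[k][j] -= lr * dk * h[j]
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i in range(len(x)):
                self.W1[j][i] -= lr * dj * x[i]
            self.b1[j] -= lr * dj


def toy_wake_samples(n: int, seed: int = 1):
    """Hypothetical synthetic stand-in for CFD samples: (speed, TI) -> (deficit, added TKE)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u, ti = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
        data.append(([u, ti], [0.3 + 0.2 * ti - 0.1 * u, 0.5 * ti * u]))
    return data
```

Usage: build `net = TinyMLP(2, 8, 2)`, loop `net.train_step(x, t, lr=0.05)` over `toy_wake_samples(50)` for a few hundred epochs, and compare `net.mse(...)` before and after training.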
Analysis of substance use and its outcomes by machine learning I: Childhood evaluation of liability to substance use disorder
Year: 2020
Background: Substance use disorder (SUD) exacts enormous societal costs in the United States, and it is important to detect high-risk youths for prevention. Machine learning (ML) is a method for finding patterns and making predictions from data. We hypothesized that ML identifies the health, psychological, psychiatric, and contextual features that predict SUD, and that the identified features predict which individuals are at high risk of developing SUD. Method: Male (N=494) and female (N=206) participants and their informant parents were administered a battery of questionnaires across five waves of assessment conducted at 10–12, 12–14, 16, 19, and 22 years of age. Characteristics most strongly associated with SUD were identified using the random forest (RF) algorithm from approximately 1000 variables measured at each assessment. Next, the set of features was validated, and the best models for predicting SUD were selected from seven ML algorithms. Lastly, the area under the receiver operating characteristic curve (AUROC) was used to evaluate the accuracy of detecting individuals who do or do not develop SUD (SUD+/-) up to thirty years of age. Results: Approximately thirty variables strongly predict SUD. The predictors shift from psychological dysregulation and poor health behavior in late childhood to non-normative socialization in mid to late adolescence. In 10–12-year-old youths, the features predict SUD+/- with 74% accuracy, increasing to 86% at 22 years of age. Compared with the other ML algorithms, the RF algorithm best detects individuals between 10 and 22 years of age who develop SUD. Conclusion: These findings inform the items required for inclusion in instruments to accurately identify high-risk youths and young adults requiring SUD prevention.
Keywords: Substance use disorder | Machine learning | Substance abuse prevention | Big data | Screening addiction risk
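AUROC, the metric used above, can be read as the probability that a randomly chosen individual who develops SUD receives a higher risk score than a randomly chosen individual who does not, with ties counted as one half. A minimal sketch of that pairwise definition follows; the function name `auroc` is a hypothetical helper, and the quadratic pair loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Positives scored 0.35 and 0.8, negatives 0.1 and 0.4: 3 of 4 pairs are ranked correctly.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why AUROC is often reported as a threshold-free "accuracy" of a screening instrument.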
Predicting and explaining corruption across countries: A machine learning approach
Year: 2020
In the era of Big Data, Analytics, and Data Science, corruption is still ubiquitous and is perceived as one of the major challenges of modern societies. A large body of academic studies has attempted to identify and explain the potential causes and consequences of corruption, at varying levels of granularity, mostly through theoretical lenses by using correlations and regression-based statistical analyses. The present study approaches the phenomenon from the predictive analytics perspective by employing contemporary machine learning techniques to discover the most important corruption perception predictors based on enriched/enhanced nonlinear models with a high level of predictive accuracy. Specifically, within the multiclass classification modeling setting employed herein, Random Forest (an ensemble-type machine learning algorithm) is found to be the most accurate prediction/classification model, followed by Support Vector Machines and Artificial Neural Networks. From a practical standpoint, the enhanced predictive power of machine learning algorithms coupled with a multi-source database revealed the most relevant corruption-related information, contributing to the related body of knowledge and generating actionable insights for administrators, scholars, citizens, and politicians. The variable importance results indicated that government integrity, property rights, judicial effectiveness, and education index are the most influential factors in defining the corruption level of significance.
Keywords: Corruption perception | Machine learning | Predictive modeling | Random forest | Society policies and regulations | Government integrity | Social development
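The abstract reports variable importance from Random Forest but does not say which importance measure was used. As a model-agnostic illustration, here is a sketch of permutation importance: shuffle one feature column, and record how much the model's accuracy drops. This is a swapped-in technique, not necessarily the authors' measure. The helpers `accuracy` and `permutation_importance` are hypothetical, and any callable mapping a feature vector to a class label can stand in for the trained model.

```python
import random
from typing import Callable, List, Sequence


def accuracy(model: Callable[[Sequence[float]], int], X: List[List[float]], y: List[int]) -> float:
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(
    model: Callable[[Sequence[float]], int],
    X: List[List[float]],
    y: List[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            # Replace column j with its shuffled version, leaving the other features intact
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, while a feature the model depends on shows a positive drop; ranking the scores gives a variable importance list comparable in spirit to the one reported above.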
Can Twitter analytics predict election outcome? An insight from 2017 Punjab assembly elections
Year: 2020
Since the beginning of this decade, there has been exponential growth in the number of internet users using social media, especially Twitter, to share their views on various topics of common interest such as sports, products, and politics. Due to the active participation of a large number of people on Twitter, a huge amount of data (i.e., big data) is being generated, which can be put to use (after refining) to analyze real-world problems. This paper considers Twitter data related to the 2017 Punjab (a state of India) assembly elections and applies different social media analytics techniques to the collected tweets to extract hidden but useful information. In addition, we employed a machine learning algorithm to perform polarity analysis and propose a new seat forecasting method to accurately predict the number of seats that a political party is likely to win in the elections. Our results confirmed that the Indian National Congress was likely to emerge as the winner, which was in fact the outcome when the results were declared.
Keywords: Analytics | Election prediction | Social media | Natural language processing | Machine learning | Sentiment analysis | Twitter