Forecasting third-party mobile payments with implications for customer flow prediction
Year: 2020
Forecasting customer flow is key for retailers in making daily operational decisions, but small retailers often lack the resources to obtain such forecasts. Rather than forecasting stores’ total customer flows, this research uses emerging third-party mobile payment data to provide participating stores with a value-added service by forecasting their share of daily customer flows. These mobile payment transactions can then be used to derive retailers’ total customer flows indirectly, thereby overcoming the constraints that small retailers face. We propose a daily mobile payments forecasting solution centered on the third-party mobile payment platform and based on a newly developed extension of the Gradient Boosting Regression Tree (GBRT) method, which can generate multi-step forecasts for many stores concurrently. Using empirical forecasting experiments with thousands of time series, we show that GBRT, together with a strategy for multi-period-ahead forecasting, provides more accurate forecasts than established benchmarks. Pooling the platform's data across stores yields benefits relative to analyzing each store's data individually, demonstrating the value of this machine learning application.
Keywords: Analytics | Big data | Customer flow forecasting | Machine learning | Forecasting many time series | Multi-step-ahead forecasting strategy
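The abstract pools store-level payment series from one platform and produces multi-step forecasts, but does not spell out the data layout. The sketch below shows one way to build a pooled training set, assuming a direct multi-step strategy (one model per horizon), a fixed number of lagged days as features, and an integer store index as an extra feature. The function name `build_pooled_direct_dataset` and this encoding are hypothetical, and fitting the GBRT itself is left out.

```python
from typing import Dict, List, Tuple

Row = Tuple[List[float], float]


def build_pooled_direct_dataset(
    series_by_store: Dict[str, List[float]],
    n_lags: int,
    horizon: int,
) -> Dict[int, List[Row]]:
    """Pool lag windows from every store into one training set per horizon.

    Direct strategy (an assumption here): horizon h gets its own
    (features, target) rows, so a separate regressor can be fitted per h.
    """
    # Hypothetical encoding: each store gets an integer index as an extra feature.
    store_index = {store: i for i, store in enumerate(sorted(series_by_store))}
    datasets: Dict[int, List[Row]] = {h: [] for h in range(1, horizon + 1)}
    for store, values in series_by_store.items():
        # `end` is the position just after the lag window.
        for end in range(n_lags, len(values) - horizon + 1):
            features = [float(v) for v in values[end - n_lags:end]]
            features.append(float(store_index[store]))
            for h in range(1, horizon + 1):
                # Target lies h steps after the last observed value in the window.
                datasets[h].append((list(features), float(values[end + h - 1])))
    return datasets
```

Each `datasets[h]` could then be given to its own gradient-boosted regressor. A recursive strategy, which feeds one-step forecasts back in as inputs, is the usual alternative; the abstract does not say which strategy the authors used.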
Big data analytics for financial market volatility forecast based on support vector machine
Year: 2020
High-frequency data provide abundant material and broad prospects for in-depth research on, and understanding of, financial market behavior. However, far fewer of the problems in high-frequency data research have been solved than have been encountered, and the research value of high-frequency data is greatly reduced if these problems remain unsolved. Volatility is an important measure of market risk, and researching and forecasting the volatility of high-frequency data is of great significance to investors, government regulators, and capital markets. To this end, this study models the jump volatility of high-frequency data and uses the model to predict their short-term volatility.
Keywords: Big data | Financial market | Volatility | Support vector machine
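The abstract models "jump volatility" without naming an estimator. A common way to isolate jumps in high-frequency data is to compare a day's realized variance with its bipower variation; the sketch below assumes that decomposition (without finite-sample corrections or a jump significance test) and is not necessarily the authors' model. The function name is hypothetical. The resulting daily series of RV, BV, and jump values are the kind of inputs an SVM regressor could use for short-term volatility forecasts; that step is omitted.

```python
import math
from typing import List, Tuple


def realized_jump_decomposition(intraday_prices: List[float]) -> Tuple[float, float, float]:
    """Split one day's realized variance into a continuous part and a jump part.

    Returns (RV, BV, J): realized variance, bipower variation, and the
    nonnegative jump component J = max(RV - BV, 0).
    """
    if len(intraday_prices) < 3:
        raise ValueError("need at least three prices (two returns)")
    # Intraday log returns r_i = ln(P_i / P_{i-1}).
    returns = [math.log(b / a) for a, b in zip(intraday_prices, intraday_prices[1:])]
    # Realized variance: sum of squared returns (continuous part plus jumps).
    rv = sum(r * r for r in returns)
    # Bipower variation: (pi / 2) * sum |r_i| |r_{i-1}|, robust to isolated jumps.
    bv = (math.pi / 2) * sum(abs(returns[i]) * abs(returns[i - 1]) for i in range(1, len(returns)))
    # Jump variation, truncated at zero.
    jump = max(rv - bv, 0.0)
    return rv, bv, jump
```

A single isolated price jump surrounded by flat prices contributes fully to RV but not to BV, which is why the difference is read as the jump component.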
A multi-scale method for forecasting oil price with multi-factor search engine data
Year: 2020
With the boom in big data, a promising idea for using search engine data has emerged and improved international oil price prediction, a hot topic in the fields of energy system modelling and analysis. Since different search engine data drive the oil price in different ways at different timescales, a multi-scale forecasting methodology is proposed that carefully explores the multi-scale relationship between the oil price and multi-factor search engine data. In the proposed methodology, three major steps are involved: (1) a multi-factor data process, to collect informative search engine data, reduce dimensionality, and test the predictive power via statistical analyses; (2) multi-scale analysis, to extract matched common modes at similar timescales from the oil price and multi-factor search engine data via multivariate empirical mode decomposition; (3) oil price prediction, including individual prediction at each timescale and ensemble prediction across timescales via a typical forecasting technique. With the Brent oil price as a sample, the empirical results show that the novel methodology significantly outperforms its original form (without multi-factor search engine data and multi-scale analysis), semi-improved versions (with either multi-factor search engine data or multi-scale analysis), and similar counterparts (with other multi-scale analysis), in both the level and directional predictions.
Keywords: Big data | Search engine data | Google Trends | Multivariate empirical mode decomposition | Oil price forecasting
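The decompose, predict, and ensemble idea of step (3) can be sketched as follows. This is a simplified stand-in, not the paper's method: successive trailing moving averages replace multivariate empirical mode decomposition, only the price series is decomposed (the search engine factors are left out), and a naive drift extrapolation replaces the unspecified forecasting technique. All function names are hypothetical. The property it does preserve is that the components add back to the original series, so summing the per-component forecasts yields a price forecast.

```python
from typing import List, Sequence, Tuple


def trailing_mean(x: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; the window shrinks at the start of the series."""
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - window + 1): i + 1]
        out.append(sum(seg) / len(seg))
    return out


def decompose(x: Sequence[float], windows: Sequence[int]) -> List[List[float]]:
    """Split x into components, fast to slow, that sum back to x.

    Stand-in for MEMD: each stage removes a smoother version of the remainder.
    """
    components: List[List[float]] = []
    remainder = list(x)
    for w in windows:
        smooth = trailing_mean(remainder, w)
        components.append([r - s for r, s in zip(remainder, smooth)])
        remainder = smooth
    components.append(remainder)  # slowest, trend-like component
    return components


def drift_forecast(c: Sequence[float], k: int = 3) -> float:
    """Placeholder one-step forecaster: last value plus mean change over the last k points."""
    if len(c) < k:
        return c[-1]
    return c[-1] + (c[-1] - c[-k]) / (k - 1)


def decompose_predict_ensemble(
    x: Sequence[float], windows: Sequence[int] = (3, 9)
) -> Tuple[float, List[List[float]]]:
    comps = decompose(x, windows)
    # Forecast each timescale separately, then sum the component forecasts.
    return sum(drift_forecast(c) for c in comps), comps
```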
A novel intelligent option price forecasting and trading system by multiple kernel adaptive filters
Year: 2020
Derivatives such as options are complex financial instruments. The risk in option trading creates demand for trading support systems that help investors control and hedge their risk. The nonlinearity and non-stationarity of option dynamics are the main challenges in option price forecasting. To address this problem, this study develops a multi-kernel adaptive filter (MKAF) for online option trading. MKAF is an improved version of the adaptive filter that employs multiple kernels to enrich the nonlinear feature representation. MKAF is a fully adaptive online algorithm. Its strength is that the kernel weights are determined simultaneously and optimally within the filter coefficient updates, so they need not be designed separately. MKAF is therefore well suited to tracking nonstationary, nonlinear option dynamics. Moreover, to reduce the computation time of filter updates and prevent over-adaptation, the number of kernels is restricted by coherence-based sparsification, which constructs a dictionary and uses a coherence threshold to limit its size. Compared with traditional methods, the new method yields a significant and robust performance improvement; in particular, cumulative trading profits increase substantially.
Keywords: Artificial intelligence | Adaptive filter | Multiple Kernel Machine | Big data analysis | Data mining | Financial forecasting
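To illustrate how kernel weights can be learned inside the coefficient update, the sketch below implements a kernel normalized LMS filter with several Gaussian widths and a coherence-limited dictionary, in the spirit of the MKAF described above. The class name, the Gaussian kernels, the NLMS update rule, and testing coherence with the widest kernel are all assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence


def gaussian(x: Sequence[float], c: Sequence[float], sigma: float) -> float:
    """Gaussian kernel exp(-||x - c||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-sq / (2.0 * sigma ** 2))


class MultiKernelNLMS:
    """Kernel normalized LMS with several Gaussian widths sharing one dictionary.

    Each (center, width) pair has its own coefficient, so the relative weight
    of each kernel is learned inside the same coefficient update.
    """

    def __init__(self, sigmas: Sequence[float], step: float = 0.5,
                 coherence: float = 0.9, eps: float = 1e-6) -> None:
        self.sigmas = list(sigmas)
        self.step = step            # NLMS step size
        self.coherence = coherence  # threshold for admitting new centers
        self.eps = eps              # regularizer in the normalization
        self.dictionary: List[List[float]] = []
        self.alpha: List[float] = []  # center-major layout, one entry per width

    def _kernel_vector(self, x: Sequence[float]) -> List[float]:
        return [gaussian(x, d, s) for d in self.dictionary for s in self.sigmas]

    def predict(self, x: Sequence[float]) -> float:
        return sum(a * h for a, h in zip(self.alpha, self._kernel_vector(x)))

    def update(self, x: Sequence[float], target: float) -> float:
        """One online step; returns the a-priori error for this sample."""
        point = list(x)
        widest = max(self.sigmas)
        # Coherence test (assumption: measured with the widest Gaussian, which
        # gives the largest kernel value): admit the point as a new center only
        # if it is not too similar to any existing center.
        if not self.dictionary or max(gaussian(point, d, widest) for d in self.dictionary) <= self.coherence:
            self.dictionary.append(point)
            self.alpha.extend([0.0] * len(self.sigmas))  # new coefficients start at zero
        h = self._kernel_vector(point)
        # Zero coefficients leave the prediction unchanged, so this is the a-priori error.
        err = target - sum(a * v for a, v in zip(self.alpha, h))
        norm = self.eps + sum(v * v for v in h)
        # Normalized LMS step on all (center, width) coefficients at once.
        self.alpha = [a + self.step * err * v / norm for a, v in zip(self.alpha, h)]
        return err
```

A width that fits the data poorly simply ends up with small coefficients, and the coherence threshold caps the dictionary size, which bounds the cost of each update.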
Forecasting crude oil price with multilingual search engine data
Year: 2020
In the big data era, search engine data (SED) have presented new opportunities for improving crude oil price prediction; however, existing research has been confined to single-language (mostly English) search keywords in SED collection. To address this language bias and capture worldwide investor attention, this study proposes a novel multilingual SED-driven forecasting methodology from a global perspective. The proposed methodology includes three main steps: (1) multilingual index construction, based on multilingual SED; (2) relationship investigation, between the multilingual index and the crude oil price; and (3) oil price prediction, with the multilingual index as an informative predictor. Using the WTI spot price as the study sample, the empirical results indicate that SED have strong predictive power for the crude oil price, and that multilingual SED perform statistically better than single-language SED in enhancing prediction accuracy and model robustness.
Keywords: Big data | Multilingual search engine index | Crude oil price forecasting | Google Trends | Artificial intelligence
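Steps (1) and (2) might look like the following sketch. The abstract does not say how the multilingual index is built, so the equal-weight average of z-scored search volumes, the language codes in the usage, and the helper names are hypothetical; the lagged Pearson correlation is one simple way to probe whether the index leads the price.

```python
import math
from typing import Dict, List, Optional, Sequence


def zscore(x: Sequence[float]) -> List[float]:
    """Standardize a series so languages with different search volumes are comparable."""
    m = sum(x) / len(x)
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / len(x))
    return [(v - m) / sd if sd > 0 else 0.0 for v in x]


def multilingual_index(volumes: Dict[str, Sequence[float]],
                       weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of z-scored search volumes, one series per language."""
    langs = sorted(volumes)
    if weights is None:
        weights = {lang: 1.0 for lang in langs}  # equal weights (assumption)
    total = sum(weights[lang] for lang in langs)
    scaled = {lang: zscore(volumes[lang]) for lang in langs}
    n = len(scaled[langs[0]])
    return [sum(weights[lang] * scaled[lang][t] for lang in langs) / total for t in range(n)]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(index: Sequence[float], price: Sequence[float], lag: int) -> float:
    """Correlation between the index at time t and the price at time t + lag."""
    if lag < 0:
        raise ValueError("lag must be nonnegative")
    if lag == 0:
        return pearson(index, price)
    return pearson(index[:-lag], price[lag:])
```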
Can twitter analytics predict election outcome? An insight from 2017 Punjab assembly elections
Year: 2020
Since the beginning of this decade, the number of internet users sharing their views on social media, especially Twitter, on topics of common interest such as sports, products, and politics has grown exponentially. Because so many people participate actively on Twitter, huge amounts of data (i.e., big data) are being generated, which, after refining, can be used to analyze real-world problems. This paper considers Twitter data related to the 2017 Punjab (a state of India) assembly elections and applies various social media analytic techniques to the collected tweets to extract hidden but useful information. In addition, we employ a machine learning algorithm to perform polarity analysis and propose a new seat forecasting method to accurately predict the number of seats that a political party is likely to win in the elections. Our results indicated that the Indian National Congress was likely to emerge as the winner, which was indeed the outcome when the results were declared.
Keywords: Analytics | Election prediction | Social media | Natural language processing | Machine learning | Sentiment analysis | Twitter
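The abstract does not name the polarity classifier. A multinomial Naive Bayes model with add-one smoothing is a common baseline for tweet polarity, and the sketch below assumes it. The labels "pos" and "neg", the whitespace tokenizer, and the per-party positive-share aggregate are illustrative only; they are not the paper's seat forecasting method, which the abstract does not describe.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesPolarity:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: Sequence[Tuple[str, str]]) -> "NaiveBayesPolarity":
        self.class_counts = Counter(label for _, label in docs)
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.class_counts}
        for text, label in docs:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][w] + 1)
                                      / (self.totals[label] + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def positive_share(tweets_by_party: Dict[str, List[str]],
                   model: NaiveBayesPolarity) -> Dict[str, float]:
    """Fraction of each party's tweets classified as 'pos' (illustrative aggregate only)."""
    return {party: sum(model.predict(t) == "pos" for t in tweets) / len(tweets)
            for party, tweets in tweets_by_party.items() if tweets}
```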
Comparison of QuikSCAT, WRF and buoy ocean surface wind data off Valparaiso Bay, Chile
Year: 2020
The winds that affect the ocean surface are important for a vast array of operational and scientific activities, hence the importance of being able to predict them adequately. Accordingly, the Weather Research and Forecasting (WRF) model was used to simulate ocean surface winds with different boundary conditions (NCEP-CFSR, ERA-Interim, and NCEP-FNL) and different spatial resolutions (25, 5, and 1 km), with the objective of evaluating which of these configurations delivers the most precise wind simulation at the ocean surface. A comparative analysis was performed between the WRF outputs, QuikSCAT satellite data, and in situ observations from a buoy installed off the central coast of Chile. The results showed that the WRF model overestimates wind magnitudes across all boundary conditions and spatial resolutions. Additionally, for in situ wind magnitudes above 6 m s−1, the model adequately predicts wind magnitude and direction. Spatial comparisons between QuikSCAT and WRF outputs along the Chilean coast were performed to evaluate possible differences. The modeled winds tended to be stronger than those measured by the satellite, and the largest differences appeared closer to the shore. Wavelet coherence and phase analysis confirmed that the model delivers precise wind information for frequencies greater than the daily cycle. Finally, the simulation driven by ERA-Interim showed lower errors in the temporal and spatial variability of surface winds.
Keywords: Winds | Buoy | Chile | QuikSCAT | ERA-Interim | WRF
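The model-versus-buoy comparison rests on standard error statistics. A minimal sketch, assuming co-located, time-matched series of wind speed (m/s) and direction (degrees), computes the speed bias and RMSE and a wrapped direction error, optionally restricted to buoy speeds above 6 m/s as in the abstract. The function names and the exact statistics are assumptions; the paper may use others.

```python
import math
from typing import Sequence, Tuple


def speed_bias_rmse(model: Sequence[float], obs: Sequence[float]) -> Tuple[float, float]:
    """Mean bias (model minus observation) and root-mean-square error of wind speed."""
    diffs = [m - o for m, o in zip(model, obs)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


def direction_mae(model_deg: Sequence[float], obs_deg: Sequence[float]) -> float:
    """Mean absolute angular difference in degrees, wrapped so 350 vs 10 counts as 20."""
    diffs = [abs((m - o + 180.0) % 360.0 - 180.0) for m, o in zip(model_deg, obs_deg)]
    return sum(diffs) / len(diffs)


def stats_above(model_speed: Sequence[float], obs_speed: Sequence[float],
                model_dir: Sequence[float], obs_dir: Sequence[float],
                threshold: float = 6.0) -> Tuple[Tuple[float, float], float]:
    """Restrict the comparison to times when the buoy speed exceeds `threshold` m/s."""
    keep = [i for i, o in enumerate(obs_speed) if o > threshold]
    if not keep:
        raise ValueError("no observations above the threshold")
    speed_stats = speed_bias_rmse([model_speed[i] for i in keep], [obs_speed[i] for i in keep])
    dir_error = direction_mae([model_dir[i] for i in keep], [obs_dir[i] for i in keep])
    return speed_stats, dir_error
```

A positive bias here corresponds to the overestimation the abstract reports for the WRF winds.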
Therapy-driven Deep Glucose Forecasting
Year: 2020
The automatic regulation of blood glucose in Type 1 diabetes patients is the main goal of the artificial pancreas, a closed-loop system that exploits continuous glucose monitoring (CGM) data to define an optimal insulin therapy. One of the most successful approaches to developing the artificial pancreas is model predictive control, which has shown promising results on both virtual and real patients. The performance of such a controller depends heavily on the reliability of the glucose–insulin model used for prediction, which is usually implemented with classic mathematical models. The main limitation of these models lies in the difficulty of modeling the nonlinear physiological dynamics typical of this system. The availability of large amounts of in silico and in vivo data has shifted attention to new data-driven methods that can easily overcome this problem. In this paper we propose Deep Glucose Forecasting, a deep learning approach for forecasting glucose levels based on a novel two-headed Long Short-Term Memory (LSTM) implementation. It takes as input the previous CGM values, the carbohydrate intake, and the suggested insulin therapy, and forecasts the patient's interstitial glucose level. The proposed architecture was trained on 100 virtual adult patients from the UVA/Padova simulator and tested on both virtual and real patients. The proposed solution generalizes to new unseen data, outperforms classical population models, and reaches performance comparable to classical personalized models when fine-tuning is exploited on real patients.
Keywords: Diabetes | Forecasting | Prediction | Deep learning | LSTM
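The model's inputs are past CGM values, carbohydrate intake, and insulin therapy, with a future glucose value as the target. The sketch below shows one way to cut such multichannel records into supervised training windows. The function name, window length, horizon, and the shared sampling grid for all three channels are assumptions, and how the paper's two LSTM heads split these inputs is not described in the abstract.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[Tuple[float, float, float]], float]


def make_glucose_samples(
    cgm: Sequence[float],
    carbs: Sequence[float],
    insulin: Sequence[float],
    history: int,
    horizon: int,
) -> List[Sample]:
    """Turn aligned CGM, carbohydrate, and insulin records into (window, target) pairs.

    Each window holds `history` consecutive (cgm, carbs, insulin) triples; the
    target is the CGM value `horizon` steps after the last step in the window.
    """
    if not (len(cgm) == len(carbs) == len(insulin)):
        raise ValueError("all channels must share the same time grid")
    samples: List[Sample] = []
    for end in range(history, len(cgm) - horizon + 1):
        window = [(cgm[t], carbs[t], insulin[t]) for t in range(end - history, end)]
        samples.append((window, cgm[end + horizon - 1]))
    return samples
```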
Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach
Year: 2020
With the advent of Big Data, databases containing large quantities of similar time series are now available in many applications. Forecasting time series in these domains with traditional univariate forecasting procedures leaves great potential for producing accurate forecasts untapped. Recurrent neural networks (RNNs), and in particular Long Short-Term Memory (LSTM) networks, have recently been shown to outperform state-of-the-art univariate time series forecasting methods in this context when trained across all available time series. However, if the time series database is heterogeneous, accuracy may degrade, so on the way towards fully automatic forecasting methods in this space, a notion of similarity between the time series needs to be built into the methods. To this end, we present a prediction model that can be used with different types of RNN models on subgroups of similar time series, which are identified by time series clustering techniques. We assess our proposed methodology using LSTM networks, a widely popular RNN variant, together with various clustering algorithms, such as kMeans, DBScan, Partition Around Medoids (PAM), and Snob. Our method achieves competitive results on benchmark datasets under competition evaluation procedures. In particular, in terms of mean sMAPE accuracy, it consistently outperforms the baseline LSTM model, and it outperforms all other methods on the CIF2016 forecasting competition dataset.
Keywords: Big data forecasting | RNN | LSTM | Time series clustering | Neural networks
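The cluster-then-train idea can be sketched as follows: z-normalize each series, group the series with a small k-means (farthest-first initialization, so results are deterministic), and score forecasts with sMAPE. Clustering on raw normalized values is an assumption (the paper may cluster on extracted features), the helper names are hypothetical, and training one LSTM per cluster is left out.

```python
import math
from typing import List, Sequence


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize a series so clustering compares shape rather than scale."""
    m = sum(x) / len(x)
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / len(x))
    return [(v - m) / sd if sd > 0 else 0.0 for v in x]


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 100) -> List[int]:
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster becomes empty.
            new_centroids.append([sum(col) / len(members) for col in zip(*members)]
                                 if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent: 100/n * sum of 2|F - A| / (|A| + |F|)."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        terms.append(0.0 if denom == 0 else 2.0 * abs(f - a) / denom)
    return 100.0 * sum(terms) / len(terms)
```

Each cluster label would then select the subset of series used to train one RNN, and sMAPE would be averaged over all series to compare against a single global model.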
A novel spatio-temporal wind power forecasting framework based on multi-output support vector machine and optimization strategy
Year: 2020
The integration of a large number of wind farms poses major challenges to the secure and economical operation of power systems, and ultra-short-term wind power forecasting is an effective solution. However, traditional approaches can predict the power of only one wind farm at a time and ignore the spatio-temporal correlation among wind farms. In this paper, a novel ultra-short-term forecasting framework based on spatio-temporal (ST) analysis, a multi-output support vector machine (MSVM), and the grey wolf optimizer (GWO), termed the ST-GWO-MSVM model, is proposed to predict the output power of multiple wind farms. The ST-GWO-MSVM model comprises a data analysis stage, a parameter optimization stage, and a modeling stage. In the data analysis stage, the Pearson correlation coefficient and the partial autocorrelation function are used to analyze the spatio-temporal correlation of wind power. In the parameter optimization stage, to avoid unreliable forecasting results caused by empirically chosen parameters, the GWO algorithm is used to optimize the kernel function parameters of the MSVM model. In the modeling stage, an innovative forecasting model using the optimal MSVM parameters is proposed to predict the output power of 15 wind farms. Results show that ST-GWO-MSVM outperforms other benchmark models on multiple error metrics, including fractional bias, direction accuracy, and improvement percentages.
Keywords: Wind power forecasting | Spatio-temporal correlation | Multi-output support vector machine | Grey wolf optimizer | Combined forecasting approaches
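The parameter optimization stage can be illustrated with a plain grey wolf optimizer. In practice the objective would be the MSVM validation error as a function of its kernel parameters; here any callable stands in for it, and the function name and defaults are hypothetical. The update rules follow the commonly published GWO form: the coefficient a decreases linearly from 2 to 0, and each new position averages the moves guided by the alpha, beta, and delta wolves.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def grey_wolf_optimizer(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_wolves: int = 20,
    n_iter: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over a box with the grey wolf optimizer."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_x: List[float] = list(wolves[0])
    best_f = math.inf
    for it in range(n_iter):
        scored = sorted(((objective(w), w) for w in wolves), key=lambda fw: fw[0])
        if scored[0][0] < best_f:
            best_f, best_x = scored[0][0], list(scored[0][1])
        # Alpha, beta, delta: the three best wolves guide the rest of the pack.
        leaders = [w for _, w in scored[:3]]
        a = 2.0 * (1.0 - it / n_iter)  # decreases linearly from 2 toward 0
        new_wolves = []
        for w in wolves:
            pos = []
            for d in range(dim):
                guided = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    guided += leader[d] - A * D
                lo, hi = bounds[d]
                # Average the leader-guided moves, then clip to the search box.
                pos.append(min(max(guided / len(leaders), lo), hi))
            new_wolves.append(pos)
        wolves = new_wolves
    # Evaluate the final pack as well.
    for w in wolves:
        fw = objective(w)
        if fw < best_f:
            best_f, best_x = fw, list(w)
    return best_x, best_f
```

To tune an MSVM, the objective would train on a training split and return the validation error for the candidate kernel parameters, with `bounds` set to plausible parameter ranges.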