Big data analytics for financial market volatility forecast based on support vector machine (2020)
High-frequency data provide rich material and broad prospects for in-depth research into financial market behavior, but the problems solved so far in high-frequency data research are far fewer than those still faced, and the research value of high-frequency data will be greatly reduced unless these problems are solved. Volatility is an important measure of market risk, and studying and forecasting the volatility of high-frequency data is of great significance to investors, government regulators and capital markets. To this end, the short-term volatility of high-frequency data is predicted by modelling its jump volatility.
Keywords: Big data | Financial market | Volatility | Support vector machine
Prediction of the ground temperature with ANN, LS-SVM and fuzzy LS-SVM for GSHP application (2020)
Ground source heat pump (GSHP) systems have received increasing attention for their energy-saving and environmentally friendly properties. Acquiring the undisturbed ground temperature is a prerequisite for designing a GSHP system. The conventional way to obtain ground temperature data is to bury temperature sensors underground. However, this approach is usually time-consuming and costly, and it is prone to certain technical difficulties. The rapid development of intelligent computation algorithms provides solutions to many difficult practical problems. Based on a large number of ground temperature measurements from two 100 m deep boreholes located in Chongqing, ground temperature prediction models based on an artificial neural network (ANN) and a least squares support vector machine (LS-SVM) are established. Two kinds of validation, holdout validation and k-fold validation, are then conducted on both models. Furthermore, a new method that combines fuzzy theory with LS-SVM is proposed to relieve the heavy computational burden of the LS-SVM model. Comparison with the two models above shows that the newly proposed model not only markedly improves calculation speed but also improves prediction accuracy, in particular outperforming the single LS-SVM model.
Keywords: Ground temperature | Fuzzy | Support vector machine | Ground source heat pump
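The two validation schemes named in the abstract above differ only in how the samples are partitioned. The following is a minimal sketch of that partitioning, not the authors' code; the function names, the 20% test fraction and the choice of k = 5 are assumptions made for illustration.

```python
import random

def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) for a single random holdout split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

Holdout validation trains once and tests once, while k-fold validation repeats the train/test cycle k times and averages the error, which costs more computation but uses every measurement for testing.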
A grid-quadtree model selection method for support vector machines (2020)
In this paper, a new model selection approach for the Support Vector Machine (SVM), denominated grid-quadtree (GQ), is proposed; it integrates the quadtree technique with the grid search. The developed method is the first in the literature to apply the quadtree to SVM parameter optimization. The SVM is a machine-learning technique for pattern recognition whose performance relies on the determination of its parameters. Thus, the model selection problem for SVM is an important field of study and requires expert and intelligent systems to solve it. Real classification data sets involve a huge number of instances and features, and the larger the training data set, the higher the cost of a recognition system. The grid search (GS) is the most popular and the simplest method to select parameters for SVM. However, it is time-consuming, which limits its application to large problems. With this in mind, the main idea of this research is to apply the quadtree technique to the GS to make it faster. This may lower the computational time cost of solving problems such as bio-identification, bank credit risk and cancer detection. Based on the asymptotic behaviors of the SVM, it was observed that the quadtree is able to avoid evaluating the full GS search space. As a consequence, the GQ analyzes fewer parameter combinations, solving the same problem much more efficiently. To assess the GQ performance, ten classification benchmark data sets were used. The obtained results were compared with those of the traditional GS. The outcomes showed that the GQ is able to find parameters that are as good as the GS ones while executing 78.8124% to 85.8415% fewer operations. This research shows that adopting the quadtree significantly reduces the computational time of the original GS, making it much more efficient for high-dimensional and large data sets.
Keywords: Support vector machine | Parameter determination | Quadtree | Grid search
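To illustrate why a quadtree can evaluate far fewer points than an exhaustive grid, here is a generic sketch of greedy quadrant refinement over a two-dimensional parameter space. It is not the GQ algorithm from the paper, whose pruning rule relies on the asymptotic behaviors of the SVM. The function `toy_score` is a hypothetical stand-in for a cross-validated accuracy, and the search ranges are assumed values.

```python
def quadtree_search(score, x_range, y_range, depth=4):
    """Refine the best quadrant `depth` times.

    Returns (best_score, best_point, n_evals).
    """
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    best = None
    n_evals = 0
    for _ in range(depth):
        x_mid, y_mid = (x_lo + x_hi) / 2, (y_lo + y_hi) / 2
        quadrants = [
            (x_lo, x_mid, y_lo, y_mid),
            (x_mid, x_hi, y_lo, y_mid),
            (x_lo, x_mid, y_mid, y_hi),
            (x_mid, x_hi, y_mid, y_hi),
        ]
        scored = []
        for (a, b, c, d) in quadrants:
            centre = ((a + b) / 2, (c + d) / 2)  # one evaluation per quadrant
            scored.append((score(*centre), centre, (a, b, c, d)))
            n_evals += 1
        top = max(scored, key=lambda item: item[0])
        if best is None or top[0] > best[0]:
            best = (top[0], top[1])
        x_lo, x_hi, y_lo, y_hi = top[2]  # descend into the best quadrant only
    return best[0], best[1], n_evals

def toy_score(log2_c, log2_gamma):
    # Hypothetical smooth score surface with its peak at (3, -5).
    return -((log2_c - 3) ** 2 + (log2_gamma + 5) ** 2)

best_score, best_point, n_evals = quadtree_search(
    toy_score, x_range=(-5, 15), y_range=(-15, 3), depth=4
)
```

With `depth=4` this sketch uses 16 evaluations, whereas an exhaustive grid at the same final cell size would need 16 × 16 = 256. The greedy descent can miss the optimum on a multimodal score surface, which is one reason a real method needs a more careful rule for deciding which cells to subdivide.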
A machine-learning-based prediction model of fistula formation after interstitial brachytherapy for locally advanced gynecological malignancies (2019)
PURPOSE: External beam radiotherapy combined with interstitial brachytherapy is commonly used to treat patients with bulky, advanced gynecologic cancer. However, the high radiation dose needed to control the tumor may result in fistula development. There is a clinical need to identify patients at high risk for fistula formation so that treatment can be managed to prevent this toxic side effect. This work aims to develop a fistula prediction model framework using machine learning based on patient, tumor, and treatment features. METHODS AND MATERIALS: This retrospective study included 35 patients treated at our institution using interstitial brachytherapy for various gynecological malignancies. Five patients developed rectovaginal fistula and two developed both rectovaginal and vesicovaginal fistula. For each patient, 31 clinical features of multiple data types were collected to develop a fistula prediction framework. A nonlinear support vector machine was used to build the prediction model. Sequential backward feature selection and sequential floating backward feature selection methods were used to determine optimal feature sets. To overcome data imbalance issues, the synthetic minority oversampling technique was used to generate synthetic fistula cases for model training. RESULTS: Seven mixed data features were selected by both sequential backward selection and sequential floating backward selection methods. Our prediction model using these features achieved a high prediction accuracy, that is, 0.904 area under the curve, 97.1% sensitivity, and 88.5% specificity. CONCLUSIONS: A machine-learning-based prediction model of fistula formation has been developed for patients with advanced gynecological malignancies treated using interstitial brachytherapy. This model may be clinically impactful pending refinement and validation in a larger series.
Keywords: Machine learning | Support vector machine | Interstitial brachytherapy | Gynecologic cancer
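The synthetic minority oversampling technique mentioned in the abstract above creates new minority-class samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below shows that core idea for purely numeric features. It is an illustration under assumptions, not the study's pipeline: the function name, `k=3`, and the numeric-only feature handling are all hypothetical, and the study itself used mixed data types.

```python
import math
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by interpolating towards neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic
```

Each synthetic point lies on a line segment between two real minority samples, so the oversampled class stays inside the region already occupied by the minority data. Synthetic samples should be generated from the training portion only, so that no information from test cases leaks into training.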
Deep learning facilitates the diagnosis of adult asthma (2019)
Background: We explored whether the use of deep learning to model combinations of symptom-physical signs and objective tests, such as lung function tests and the bronchial challenge test, would improve model performance in predicting the initial diagnosis of adult asthma when compared to the conventional machine learning diagnostic method. Methods: The data were obtained from the clinical records of a prospective study of 566 adult outpatients who visited Kindai University Hospital for the first time with complaints of non-specific respiratory symptoms. Asthma was comprehensively diagnosed by specialists based on symptom-physical signs and objective tests. Model performance metrics were compared among logistic analysis, support vector machine (SVM) learning, and the deep neural network (DNN) model. Results: For the diagnosis of adult asthma based on symptom-physical signs alone, the accuracy of the DNN model was 0.68, whereas that of the SVM was 0.60 and that of the logistic analysis was 0.65. When adult asthma was diagnosed based on symptom-physical signs, biochemical findings, lung function tests, and the bronchial challenge test, the accuracy of the DNN model increased to 0.98 and was significantly higher than the 0.82 accuracy of the SVM and the 0.94 accuracy of the logistic analysis. Conclusions: The DNN is better able to facilitate the diagnosis of adult asthma than classical machine learning methods such as logistic analysis and SVM. The deep learning models based on symptom-physical signs and objective tests appear to improve the performance for diagnosing adult asthma.
Keywords: Artificial intelligence | Asthma | Deep learning | Diagnosis | Support vector machine
A deep feature mining method of electronic nose sensor data for identifying beer olfactory information (2019)
In this work, a deep feature mining method for electronic nose (E-nose) sensor data based on the convolutional neural network (CNN) was proposed in combination with a support vector machine (SVM) to identify beer olfactory information. According to the characteristics of E-nose sensor data, the structure and parameters of the CNN were designed. By means of convolution and pooling operations, the beer olfaction features were extracted automatically. Meanwhile, the SVM replaced the fully connected layer of the CNN to enhance the generalization ability of the model, and two important parameters affecting the classification performance of the SVM were optimized using an improved particle swarm optimization (PSO). The results indicated that the CNN-SVM model achieved automatic extraction of deep features of beer olfactory information, and a good classification performance of 96.67% was obtained on the testing set. This study shows that the CNN-SVM can be used as an effective tool for high-precision intelligent identification of beer olfactory information.
Keywords: Electronic nose | Feature mining | Convolutional neural network | Support vector machine | Beer
A machine learning approach for traffic-noise annoyance assessment (2019)
In this study, models for predicting traffic-noise annoyance based on noise perception, noise exposure levels, and demographics were developed. By applying machine-learning techniques, in particular artificial neural networks (ANN), support vector machines (SVM) and multiple linear regression (MLR), the traffic-noise annoyance models were obtained and their error rates compared. A traffic noise map and an estimate of noise exposure for the case study area were developed. Although it is quite evident that subjective noise perception and predicted noise exposure levels strongly influence traffic-noise annoyance, traditional statistical models fail to produce accurate predictions. Therefore, a machine-learning approach was applied, which showed better performance in terms of error rates and the coefficient of determination (R²). The best results for predicting traffic-noise annoyance were obtained with the ANN model, which achieved 42% and 35% error reductions on training subsets compared to the MLR and SVM models, respectively. For testing subsets, the error reductions were 24% and 19% for the corresponding models. The coefficient of determination R² increased 3.8 and 2.3 times using ANN compared to the MLR and SVM models, respectively, on training subsets, and 1.7 times (relative to both the MLR and SVM models) on testing subsets. In this way, the applied methodology can be used as a reliable and more accurate tool for determining the impact of transportation noise in the urban context, promoting the well-being of the population and the creation of suitable public policy.
Keywords: Noise annoyance | Traffic noise | Machine-learning | Artificial neural networks | Support vector machine
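The two comparison measures used in the abstract above can be written down compactly. The sketch below uses the standard definition of the coefficient of determination and a simple relative error reduction. The abstract does not say which error metric was used, so `relative_error_reduction` takes any non-negative error value, and both function names are assumptions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def relative_error_reduction(baseline_error, model_error):
    """Fraction by which `model_error` is lower than `baseline_error`."""
    return (baseline_error - model_error) / baseline_error
```

Under this definition, a model whose error is 0.58 times the baseline error has a 42% error reduction, and a model that always predicts the mean of the observations has an R² of 0.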
Mining featured biomarkers associated with vascular invasion in HCC by bioinformatics analysis with TCGA RNA sequencing data (2019)
This study aims to identify the feature genes associated with vascular invasion in hepatocellular carcinoma (HCC). Here, the RNA sequencing data related to vascular invasion in The Cancer Genome Atlas (TCGA) database, covering 292 HCC patients with complete clinical data, were included in our study as the training dataset for construction, and E-TABM-36, covering 41 HCC patients with complete clinical data, was used as the validation dataset. Following data normalization, differentially expressed mRNA and copy number (CN) were selected between samples with and without vascular invasion. A support vector machine (SVM) classifier was constructed and validated in the GSE9828 and GSE20017 datasets. A total of 59 feature genes were found by the SVM classifier. Using Cox regression analysis, three clinical features (pathologic T, stage and vascular invasion) and 6 optimal prognostic genes (ANO1, EPHX2, GFRA1, OLFM2, SERPINA10 and TKT) were significantly correlated with prognosis. A risk score formula was developed to assess the prognostic value of the 6 optimal prognostic genes, which were identified as having the most remarkable correlation with overall survival in HCC patients. In in vitro experiments using western blotting analysis, we observed that TKT was significantly increased, but OLFM2 was decreased, in high metastatic potential HCC cell lines (SK-HEP-1 and MHCC-97H) compared with the low metastatic potential cell line Huh7 and the normal human liver cell line LO2. Knockdown of TKT in MHCC-97H or overexpression of OLFM2 in SK-HEP-1 significantly suppressed cell migration and invasion in transwell assays. Our results demonstrated that TKT and OLFM2 might be novel independent biomarkers for predicting survival based on the presence of vascular invasion in patients with HCC.
Keywords: Hepatocellular carcinoma | Vascular invasion | Support vector machine | Prognosis
Analysis of body pressure distribution on car seats by using deep learning (2019)
This study aimed to extract information from body pressure distribution, including comfort, participant body size, and seat characteristics by using supervised deep learning, and body pressure characteristics corresponding to sensory evaluation by using unsupervised deep learning. Body pressure data of 18 participants and 19 kinds of car seats were used for the analysis. Sensory evaluation of 9 items concerning cushion characteristics and seat comfort was conducted. From the analysis, we determined that body size and car seats could be classified with high precision by using body pressure distribution data. For the sensory evaluation items, the correct answer rate was high. By examining the importance of the cells of the mat, the features of the body pressure mat at the seat cushion and backrest, body size, car seat, and parts related to sensory evaluation could be determined in detail. The study findings can be applied in the development of car seats.
Keywords: Body pressure distribution | Car seat | Machine learning | Deep learning | Support vector machine | Characteristics extraction
On the application of machine learning techniques to derive seismic fragility curves (2019)
Deriving the fragility curves is a key step in seismic risk assessment within the performance-based earthquake engineering framework. The objective of this study is to implement machine learning tools (classification-based tools in particular) for predicting the structural responses and the fragility curves. In this regard, ten different classification-based methods are explored using the structural responses resulting from multiple strip analyses: logistic regression, lasso regression, support vector machine, Naïve Bayes, decision tree, random forest, linear and quadratic discriminant analyses, neural networks, and K-nearest neighbors. In addition, this study examines the impact of class imbalance in the training dataset, which is typical of structural response data, when developing classification-based models for predicting structural responses. The statistical results using the implemented dataset demonstrate that, among the applied methods, random forest and quadratic discriminant analysis are preferable with the imbalanced and balanced datasets, respectively, since they show the highest efficiency in predicting the structural responses. Moreover, a detailed procedure is presented on how to derive the fragility curves based on the classification-based tools. Finally, the sensitivity of the applied machine learning methods to the size of the employed dataset is investigated. The results show that logistic regression, lasso regression, and Naïve Bayes are not sensitive to the size of the dataset (i.e., the number of performed time history analyses), while the performance of discriminant analysis depends significantly on the size of the applied dataset.
Keywords: Fragility curve | Machine learning tools | Imbalanced dataset | Random forest | Support vector machine | Multiple strip analysis
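A fragility curve gives the probability that a structure exceeds a limit state at a given ground-motion intensity. Once each analysis in a stripe has been labelled (or predicted by a classifier) as exceeding or not exceeding, an empirical fragility point is simply the exceedance fraction in that stripe. The sketch below shows only this final counting step; it is not the detailed procedure of the paper, and the function name and input layout are assumptions.

```python
def empirical_fragility(stripes):
    """Map each intensity level to its fraction of exceedance labels.

    `stripes` maps an intensity measure value to a list of 0/1 labels,
    one label per time history analysis in that stripe.
    Returns a list of (intensity, probability) pairs sorted by intensity.
    """
    return [
        (intensity, sum(labels) / len(labels))
        for intensity, labels in sorted(stripes.items())
    ]
```

A smooth curve, commonly a lognormal cumulative distribution function, is then usually fitted through these points.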