Special interest tourism is not so special after all: Big data evidence from the 2017 Great American Solar Eclipse
Special interest tourism is not so special after all: Big data evidence from the 2017 Great American Solar Eclipse - 2020
This study puts to empirical test a major typology in the tourism literature, mass versus special interest tourism (SIT), as the once-distinctive boundary between the two has become blurry in modern tourism scholarship. We utilize 41,747 geo-located Instagram photos pertaining to the 2017 Great American Solar Eclipse and Big Data analytics to distinguish tourists based on their choice of observational destinations and spatial movement patterns. Two types of tourists are identified: opportunists and hardcore. The motivational profile of these tourists is validated against external data through hypothesis testing and compared and contrasted with existing motivation-based tourist typologies. The main conclusion is that a large share of tourists involved in what is traditionally understood as SIT activities exhibit the behavior and profile characteristic of mass tourists, who seek novelty but remain conscious of risks and comforts. Practical implications regarding the potential of rural and urban destinations for developing SIT are also discussed.
Keywords: Big data | Instagram photos | Social media | Spatial analysis | Special interest tourism | Astro-tourism
Efficacy and safety of oral and inhalation commercial beta-glucan products: Systematic review of randomized controlled trials
Efficacy and safety of commercial oral and inhalation beta-glucan products: Systematic review of randomized controlled trials - 2020
Background & aims: Beta-glucans are advertised as biologically active compounds, with various health claims. We aimed to summarize results about the efficacy and safety of commercial oral and inhalation beta-glucan products on human health from randomized controlled trials (RCTs). Methods: We conducted a systematic review of RCTs. We searched MEDLINE, CENTRAL and ClinicalTrials.gov. Any commercial product, any type of participant and any health-related outcome were eligible. Two authors independently screened studies and extracted data. The Cochrane risk of bias tool was used. This review did not have any extramural funding. Registration: PROSPERO record no. 42016043539. Results: We included 30 RCTs that were conducted on healthy or ill participants. Most of the trials reported a beneficial effect of beta-glucan, but among the 105 different outcome domains and measures that were used, only three could be considered clinically relevant, while the others were various biomarkers and surrogate outcomes such as complete blood count. Included studies on average had 33 participants per study arm and a high or unclear risk of bias in at least one domain, and only half of them reported safety data. More than half of the trials that reported a source of funding indicated commercial sponsorship from producers of beta-glucan. Only five RCTs reported trial registration. Conclusions: Commercial beta-glucan products were studied in a number of RCTs whose results can be considered only preliminary, as they used small numbers of participants and surrogate outcomes. The quality of many studies was poor, and further research and trials on larger populations should be performed before a final conclusion can be made.
Keywords: Beta-glucan | Systematic review | Evidence | Randomized controlled trial | Research waste
Can the development of a patient’s condition be predicted through intelligent inquiry under the e-health business mode? Sequential feature map-based disease risk prediction upon features selected from cognitive diagnosis big data
Can a patient’s condition be predicted through intelligent inquiry under the e-health business mode? Sequential feature-based disease risk prediction upon features selected from cognitive diagnosis big data - 2020
Data-driven approaches have advanced research in preventive medicine. In disease risk prediction, physicians’ clinical cognitive diagnosis data can be used for early prevention of diseases and therefore to reduce medical costs, improve the accessibility of medical services, and lower medical risk. However, existing studies have not incorporated physicians’ cognition of patients’ conditions in intelligent inquiry under the e-health business mode, have not offered diagnosis big data, have neglected the value of the fused text information generated by the joint use of online and offline medical data, and have failed to thoroughly analyze, from the perspective of online inquiry data, the phenomenon of redundancy-complementarity dispersion caused by a shortage of high-order information. Moreover, risk prediction based solely on offline clinical cognitive diagnosis data reduces prediction precision. Importantly, relevant studies have rarely considered the temporal relationships among different medical events, have not analyzed in detail the practical problem of pattern explosion, have not proposed an intelligent portrayal map, and have not conducted risk prediction based on sub-maps obtained from such a map. Consequently, this paper presents a disease risk prediction method that uses a redundancy-complementarity dispersion-based model to select features from the physicians’ cognitive diagnosis big data of online intelligent inquiry. The selected features are ranked intelligently for subsequent compensation of the high-dimensional information shortage, and the compensated key feature information is fused with the offline electronic medical record (EMR) to form the virtual electronic medical record (VEMR).
The VEMR was then modelled with the sequential feature map method, yielding a sequential feature map-based model for disease risk prediction that captures online users’ medical conditions. A neighborhood-based collaborative prediction model was presented to predict the diseases an online intelligent medical inquiry user may develop in the future and to intelligently rank their risk probabilities. In the simulation experiments, the VEMRs of online intelligent medical inquiry users were used to predict disease risks in chronic obstructive pulmonary disease (COPD) and rheumatic heart disease (RHD) populations. The experiments demonstrated that the presented method achieved relatively good performance metrics on the VEMR and improved disease risk prediction.
Keywords: Cognitive diagnosis big data | Online intelligent inquiry | Sequential feature map | Disease risk prediction | Redundancy and complementarity dispersion
Big data analytics for financial market volatility forecast based on support vector machine
Big data analytics for financial market volatility forecasting based on support vector machine - 2020
High-frequency data provide abundant material and broad prospects for in-depth research on financial market behavior, but the problems solved so far in high-frequency data research are far fewer than those that remain, and the research value of such data will be greatly reduced unless these problems are solved. Volatility is an important measure of market risk, and studying and forecasting the volatility of high-frequency data is of great significance to investors, government regulators, and capital markets. To this end, the short-term volatility of high-frequency data is predicted by modelling its jump volatility.
Keywords: Big data | Financial market | Volatility | Support vector machine
Big Data Everywhere
Big data everywhere - 2020
Big Data and machine-learning approaches to analytics are an important new frontier in laboratory medicine. Direct-to-consumer (DTC) testing raises specific challenges in applying these new tools of data analytics. Because DTC data are not centralized by default, there is a need for data repositories to aggregate these values to develop appropriate predictive models. The lack of a default linkage between DTC results and medical outcomes data also limits the ability to mine these data for predictive modeling of disease risk. Issues of standardization and harmonization, which are significant across all of laboratory medicine, may be particularly difficult to correct in aggregated sets of DTC data.
Keywords: Big Data | Laboratory medicine | Machine learning | Direct-to-consumer testing | DTC | Harmonization
Analysis of substance use and its outcomes by machine learning I: Childhood evaluation of liability to substance use disorder
Analysis of substance use and its outcomes by machine learning I: Childhood evaluation of liability to substance use disorder - 2020
Background: Substance use disorder (SUD) exacts enormous societal costs in the United States, and it is important to detect high-risk youths for prevention. Machine learning (ML) is a method for finding patterns in data and making predictions. We hypothesized that ML can identify the health, psychological, psychiatric, and contextual features that predict SUD, and that the identified features can detect individuals at high risk of developing SUD. Method: Male (N=494) and female (N=206) participants and their informant parents were administered a battery of questionnaires across five waves of assessment conducted at 10–12, 12–14, 16, 19, and 22 years of age. Characteristics most strongly associated with SUD were identified using the random forest (RF) algorithm from approximately 1000 variables measured at each assessment. Next, the complement of features was validated, and the best models for predicting SUD were selected using seven ML algorithms. Lastly, the area under the receiver operating characteristic curve (AUROC) was used to evaluate the accuracy of detecting individuals who do or do not develop SUD (SUD+/-) up to thirty years of age. Results: Approximately thirty variables strongly predict SUD. The predictors shift from psychological dysregulation and poor health behavior in late childhood to non-normative socialization in mid to late adolescence. In 10–12-year-old youths, the features predict SUD+/- with 74% accuracy, increasing to 86% at 22 years of age. Compared with the other ML algorithms, the RF algorithm best detects individuals between 10 and 22 years of age who develop SUD. Conclusion: These findings inform the items required for inclusion in instruments to accurately identify high-risk youths and young adults requiring SUD prevention.
Keywords: Substance use disorder | Machine learning | Substance abuse prevention | Big data | Screening addiction risk
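As an illustration of the AUROC metric named in the abstract above, here is a minimal Python sketch. It uses the rank interpretation of AUROC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = develops the condition).
    scores: iterable of model scores, higher meaning higher predicted risk.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up data): two negatives, two positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a sort-based rank computation would be the usual choice for large cohorts.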
A novel intelligent option price forecasting and trading system by multiple kernel adaptive filters
An option price forecasting and trading system approach with multiple kernel adaptive filters - 2020
Derivatives such as options are complex financial instruments. The risk in option trading creates demand for trading support systems that help investors control and hedge their risk. The nonlinearity and non-stationarity of option dynamics are the main challenges of option price forecasting. To address the problem, this study develops a multi-kernel adaptive filter (MKAF) for online option trading. MKAF is an improved version of the adaptive filter, which employs multiple kernels to enhance the richness of nonlinear feature representation. The MKAF is a fully adaptive online algorithm. Its strength is that the kernel weights are determined simultaneously and optimally during filter coefficient updates, so the weights need not be designed separately. Therefore, MKAF is good at tracking nonstationary nonlinear option dynamics. Moreover, to reduce the computation time of filter updates and to prevent overadaptation, the number of kernels is restricted by coherence-based sparsification, which constructs a dictionary and uses a coherence threshold to restrict the dictionary size. Compared with traditional methods, the new method yields a significant and robust performance improvement; in particular, cumulative trading profits increase substantially.
Keywords: Artificial intelligence | Adaptive filter | Multiple Kernel Machine | Big data analysis | Data mining | Financial forecasting
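The coherence-based sparsification mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the general coherence criterion only, assuming a single unit-norm Gaussian kernel: a new sample joins the dictionary only if its largest kernel value against the current dictionary entries does not exceed a threshold. The function names, the bandwidth, and the threshold value are hypothetical; the abstract does not specify the paper's kernels or settings.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two equal-length tuples; k(x, x) = 1."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def build_dictionary(samples, threshold=0.5, bandwidth=1.0):
    """Keep a sample only if its coherence with the dictionary is low.

    Coherence here is max_j |k(x, d_j)| over the current dictionary entries
    d_j. A sample that is too similar to an existing entry (coherence above
    the threshold) is skipped, which bounds the dictionary size.
    """
    dictionary = []
    for x in samples:
        coherence = max(
            (abs(gaussian_kernel(x, d, bandwidth)) for d in dictionary),
            default=0.0,  # an empty dictionary accepts the first sample
        )
        if coherence <= threshold:
            dictionary.append(x)
    return dictionary


# Made-up one-dimensional inputs: two tight clusters near 0 and near 3.
samples = [(0.0,), (0.1,), (3.0,), (3.05,)]
print(build_dictionary(samples, threshold=0.5))
```

A lower threshold gives a smaller dictionary and cheaper updates at the cost of a coarser representation; a threshold close to 1 admits almost every sample.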
Assessment of mutual fund performance based on Ensemble Empirical Mode Decomposition
Assessment of mutual fund performance based on ensemble empirical mode decomposition - 2020
This study analyzes mutual fund performance on three different time scales. The mutual fund return time series is decomposed by the ensemble empirical mode decomposition method, a data analysis method suited to nonstationary and nonlinear time series, into three time scales, namely short cycle, long cycle, and trend, each of which has a different meaning for mutual fund management. The short cycle represents the temporary volatility of the market, the long cycle represents the operation cycle of the mutual fund, and the trend represents the development tendency of the fund. The mutual funds are also divided into equity, bond, and mixture funds according to portfolio type, and the performance of the three fund types is analyzed. The data set, comprising 2600 mutual funds, is relatively large compared with those in other studies. The results show that the bond and mixture funds follow different management strategies from the equity fund: to seek excess profit, the equity fund focuses on short-cycle management and tends to ignore long-cycle management, whereas the bond and mixture funds focus on long-cycle management and pay less attention to short-cycle management. In the short cycle, all three fund types make excess profit by taking on systematic market risk and show no significant α return; in the long cycle and trend, they seek excess profit by acquiring more α return. The indices used to assess fund performance confirm the differences in the three fund types’ management strategies.
Keywords: Capital asset pricing model | Ensemble Empirical Mode Decomposition | Mutual fund performance | Fund management strategy
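The α return and market risk exposure discussed in the abstract above are the intercept and slope of a capital asset pricing model regression. The sketch below is a standard single-factor ordinary least squares fit, shown for illustration only; the abstract does not spell out the paper's regression, and the function name `capm_alpha_beta` and the return series are made up.

```python
def capm_alpha_beta(fund_excess, market_excess):
    """Fit fund_excess = alpha + beta * market_excess by least squares.

    Both arguments are equal-length lists of excess returns (return minus
    the risk-free rate) over the same periods. beta measures exposure to
    market risk; alpha is the return not explained by that exposure.
    """
    n = len(fund_excess)
    if n != len(market_excess) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_fund = sum(fund_excess) / n
    mean_market = sum(market_excess) / n
    covariance = sum(
        (f - mean_fund) * (m - mean_market)
        for f, m in zip(fund_excess, market_excess)
    )
    variance = sum((m - mean_market) ** 2 for m in market_excess)
    if variance == 0.0:
        raise ValueError("market excess returns must not be constant")
    beta = covariance / variance
    alpha = mean_fund - beta * mean_market
    return alpha, beta


# Made-up series constructed with alpha = 0.002 and beta = 1.5.
market = [0.01, -0.02, 0.03, 0.00]
fund = [0.002 + 1.5 * m for m in market]
alpha, beta = capm_alpha_beta(fund, market)
print(alpha, beta)
```

Under the setup the abstract describes, such a fit would presumably be run separately on each decomposed component (short cycle, long cycle, trend) of the return series; that step is an assumption and is not shown here.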
Predicting academic performance of students from VLE big data using deep learning models
Predicting academic performance of students from VLE big data using deep learning models - 2020
The abundance of accessible educational data, supported by technology-enhanced learning platforms, provides opportunities to mine the learning behavior of students, address their issues, optimize the educational environment, and enable data-driven decision making. Virtual learning environments complement the learning analytics paradigm by effectively providing datasets for analysing and reporting the learning process of students and how it is reflected in, and contributes to, their performance. This study deploys a deep artificial neural network on a set of unique handcrafted features, extracted from virtual learning environment clickstream data, to predict at-risk students and to provide measures for early intervention in such cases. The results show that the proposed model achieves a classification accuracy of 84%–93%. We show that a deep artificial neural network outperforms the baseline logistic regression and support vector machine models: logistic regression achieves an accuracy of 79.82%–85.60%, and the support vector machine achieves 79.95%–89.14%. In line with existing studies, our findings demonstrate that including legacy data and assessment-related data significantly affects the model. Students interested in accessing the content of previous lectures are observed to perform better. The study intends to assist institutes in formulating a necessary framework for pedagogical support, facilitating the higher education decision-making process towards sustainable education.
Keywords: Learning analytics | Predicting success | Educational data | Machine learning | Deep learning | Virtual learning environments (VLE)
Highway crash detection and risk estimation using deep learning
Highway crash detection and risk estimation using deep learning - 2020
Crash detection is essential for providing timely information to traffic management centers and the public so that the adverse effects of crashes can be reduced. Prediction of crash risk is vital for avoiding secondary crashes and safeguarding highway traffic. For many years, researchers have explored several techniques for early and precise detection of crashes to aid in traffic incident management. With recent advancements in data collection techniques, abundant real-time traffic data is available for use. Big data infrastructure and machine learning algorithms can utilize this data to provide suitable solutions for the highway traffic safety system. This paper explores the feasibility of using deep learning models to detect crash occurrence and predict crash risk. Volume, speed, and sensor occupancy data collected from roadside radar sensors along Interstate 235 in Des Moines, IA are used for this study. This real-world traffic data is used to design the feature set for the deep learning models for crash detection and crash risk prediction. The results show that a deep model has better crash detection performance than state-of-the-art shallow models and similar crash prediction performance. Additionally, a sensitivity analysis was conducted for crash risk prediction using data from 1 minute, 5 minutes, and 10 minutes prior to crash occurrence. It was observed that it is hard to predict the crash risk of a traffic condition 10 minutes prior to a crash.
Keywords: Crash detection | Crash prediction | Deep learning