Estimating monthly wet sulfur (S) deposition flux over China using an ensemble model of improved machine learning and geostatistical approach
Estimating the monthly wet sulfur (S) deposition flux over China using an ensemble model of advanced machine learning and a geostatistical method-2019
Wet S deposition is a key environmental issue because it contributes to soil acidification, biodiversity loss, and global climate change. However, the limited number of ground-level monitoring sites makes it difficult to fully clarify the spatiotemporal variations of wet S deposition over China. Therefore, an ensemble model combining improved machine learning with a geostatistical method, named the fruit fly optimization algorithm-random forest-spatiotemporal Kriging (FOA-RF-STK) model, was developed to estimate nationwide S deposition based on the emission inventory, meteorological factors, and other geographical covariates. The ensemble model captured the relationship between the predictors and S deposition flux with better performance (R² = 0.68, root mean square error (RMSE) = 7.51 kg ha−1 yr−1) than the original RF model (R² = 0.52, RMSE = 8.99 kg ha−1 yr−1). The improved model predicted that the highest and lowest S deposition fluxes were mainly concentrated in Southeast China (69.57 kg S ha−1 yr−1) and Inner Mongolia (42.37 kg S ha−1 yr−1), respectively. The estimated wet S deposition flux displayed remarkable seasonal variation, with the highest value in summer (22.22 kg S ha−1 season−1), followed by autumn (18.30 kg S ha−1 season−1), spring (16.27 kg S ha−1 season−1), and winter (14.71 kg S ha−1 season−1), which was closely associated with rainfall amounts. The study provides a novel approach for estimating S deposition at a national scale.
Keywords: Wet S deposition | Machine learning | Geostatistical approach | China
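The wet S deposition abstract above compares models by R² and RMSE. As a minimal sketch of how those two scores can be computed, the snippet below uses the coefficient-of-determination form of R²; the abstract does not say which definition the authors used, so that choice is an assumption, and the observed and predicted values are made up for illustration only.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical deposition fluxes (kg S ha-1 yr-1), not taken from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

A lower RMSE and a higher R² on held-out sites would indicate the better model, which is the sense in which the abstract ranks FOA-RF-STK above the original RF model.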
Physical metallurgy-guided machine learning and artificial intelligent design of ultrahigh-strength stainless steel
Machine learning guided by physical metallurgy and artificial-intelligence design of high-strength stainless steel-2019
With the development of the materials genome philosophy and data mining methodologies, machine learning (ML) has been widely applied to discovering new materials in various systems, including high-end steels with improved performance. Although some recent attempts have been made to incorporate physical features into the ML process, their effects have not been demonstrated, systematically analysed, or experimentally validated with prototype alloys. To address this issue, a physical metallurgy (PM)-guided ML model was developed, wherein intermediate parameters were generated based on the original inputs and PM principles, e.g., the equilibrium volume fraction (Vf) and driving force (Df) for precipitation, and these were added to the original dataset vectors as extra dimensions to participate in and guide the ML process. As a result, the ML process becomes more robust when dealing with small datasets because the data quality is improved and the data information is enriched. On this basis, a new material design method is proposed that combines PM-guided ML regression, an ML classifier, and a genetic algorithm (GA). The model was successfully applied to the design of advanced ultrahigh-strength stainless steels using only a small database extracted from the literature. The proposed prototype alloy, with a leaner chemistry but better mechanical properties, has been produced experimentally, and excellent agreement was obtained between the predicted optimal parameter settings and the final properties. In addition, the present work clearly demonstrates that implementing PM parameters can improve the design accuracy and efficiency by eliminating intermediate solutions that do not obey PM principles in the ML process. Furthermore, various important factors influencing the generalizability of the ML model are discussed in detail.
Keywords: Alloy design | Machine learning | Physical metallurgy | Small sample problem | Stainless steel
First-principles and Machine Learning Predictions of Elasticity in Severely Lattice-distorted High-Entropy Alloys with Experimental Validation
First-principles and machine learning prediction of elasticity in high-entropy alloys with severe lattice distortion, with experimental validation-2019
Stiffness usually increases with the lattice-distortion-induced strain, as observed in many nanostructures. Partly due to the size differences in the component elements, severe lattice distortion naturally exists in high entropy alloys (HEAs). The single-phase face-centered-cubic (FCC) Al0.3CoCrFeNi HEA, which has large size differences among its constituent elements, is an ideal system to study the relationship between the elastic properties and lattice distortion using a combined experimental and computational approach based on in-situ neutron-diffraction (ND) characterizations and first-principles calculations. Analysis of the interatomic distance distributions from calculations of the optimized special quasirandom structure (SQS) found that the HEA has a high degree of lattice distortion. When the lattice distortion is explicitly considered, elastic properties calculated using the SQS are in excellent agreement with experimental measurements for the HEA. The calculated elastic constant values are within 5% of the ND measurements. A comparison of calculations from the optimized SQS and the SQS with ideal lattice sites indicates that the lattice distortion reduces the stiffness. The optimized SQS has a bulk modulus of 177 GPa, compared to 194 GPa for the ideal-lattice SQS. Machine learning (ML) modeling is also implemented to explore the use of fast, computationally efficient models for predicting the elastic moduli of HEAs. ML models trained on a large dataset of inorganic structures are shown to make accurate predictions of elastic properties for the HEA. The ML models also demonstrate the dependence of the bulk and shear moduli on several material features, which can act as guides for tuning elastic properties in HEAs.
Keywords: First-principles calculation | Elastic constants | in situ tension test | Neutron diffraction | Machine learning
Machine learning estimates of plug-in hybrid electric vehicle utility factors
Machine learning estimation of plug-in hybrid electric vehicle utility factors-2019
Plug-in hybrid electric vehicles (PHEVs) combine an electric drive train with a conventional one and can drive on gasoline once the battery is depleted. They can thus electrify many vehicle miles travelled (VMT) without fundamental range limits. The most important variable for the electrification potential is the ratio of electric VMT to total VMT, the so-called utility factor (UF). However, the empirical assessment of UFs is difficult because important factors such as daily driving, recharging behaviour, and the frequency of long-distance travel vary considerably between drivers, so large data collections are required. Here, we apply machine learning techniques (regression tree, random forest, support vector machine, and neural nets) to estimate real-world UFs and compare the estimates to the actual long-term average UFs of 1768 individual Chevrolet Volt PHEVs. Our results show that UFs can be predicted from individual summary statistics with high accuracy, with a mean absolute error of five percentage points. The accuracy of these methods is higher than that of a simple simulation that assumes electric driving until the battery is discharged and one full daily recharge. The most important variables in estimating the UF according to a linear regression model are the variance and skewness of the daily VMT distributions as well as the frequency of long-distance driving. Thus, our findings make UF predictions from existing data sets on the driving of conventional vehicles more accurate.
Keywords: Electric vehicles | Plug-in hybrid electric vehicle | Utility factor | Machine learning
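The utility-factor abstract above defines the UF as electric VMT divided by total VMT and mentions a simple baseline simulation: electric driving until the battery is discharged, with one full recharge per day. A minimal sketch of that baseline, assuming each day starts with a full battery and that the all-electric range is a fixed number of miles (the 38-mile range and the daily distances below are illustrative assumptions, not values from the study):

```python
def simulated_utility_factor(daily_miles, electric_range_miles):
    """Share of total miles driven electrically under the simple baseline.

    Assumes one full recharge per day, so at most `electric_range_miles`
    of each day's driving is electric; the remainder uses gasoline.
    """
    total = sum(daily_miles)
    if total == 0:
        return 0.0
    electric = sum(min(d, electric_range_miles) for d in daily_miles)
    return electric / total


# Hypothetical daily driving distances (miles) for one driver.
daily = [10.0, 25.0, 60.0, 5.0, 120.0]
uf = simulated_utility_factor(daily, electric_range_miles=38.0)
print(f"Simulated UF = {uf:.3f}")
```

The two long-distance days cap the electric share here, which is consistent with the abstract's observation that the frequency of long-distance driving is among the most important predictors of UF.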
Automatic hourly solar forecasting using machine learning models
Automatic hourly solar forecasting using machine learning models-2019
Owing to its recent advances, machine learning has spawned a large collection of solar forecasting works. In particular, machine learning is currently one of the most popular approaches for hourly solar forecasting. Nevertheless, there is evidently a myth about forecast accuracy: virtually all research papers claim superiority over others. Apparently, the “best” model can only be selected with hindsight, i.e., after empirical evaluation. For any new forecasting project, it is irrational for solar forecasters to bet on a single model from the start. In this article, the hourly forecasting performance of 68 machine learning algorithms is evaluated for 3 sky conditions, 7 locations, and 5 climate zones in the continental United States. To ensure a fair comparison, no hybrid model is considered, and only off-the-shelf implementations of these algorithms are used. Moreover, all models are trained using the automatic tuning algorithm available in the R caret package. It is found that tree-based methods consistently perform well in terms of two-year overall results; however, they rarely stand out during daily evaluation. Although no universal model can be found, some preferred ones for each sky and climate condition are advised.
Keywords: Automatic machine learning | Solar forecasting | R caret package
Machine learning to predict occult nodal metastasis in early oral squamous cell carcinoma
Machine learning for predicting occult nodal metastasis in early oral squamous cell carcinoma-2019
Objectives: To develop and validate an algorithm to predict occult nodal metastasis in clinically node-negative oral cavity squamous cell carcinoma (OCSCC) using machine learning, and to compare algorithm performance to a model based on tumor depth of invasion (DOI). Materials and methods: Patients who underwent primary tumor extirpation and elective neck dissection from 2007 to 2013 for clinical T1-2N0 OCSCC were identified from the National Cancer Database (NCDB). Multiple machine learning algorithms were developed to predict pathologic nodal metastasis using clinicopathologic data from 782 patients. The algorithm was internally validated using test data from 654 patients in NCDB and was then externally validated using data from 71 patients treated at a single academic institution. Performance was measured using the area under the receiver operating characteristic (ROC) curve (AUC). Machine learning and DOI model performance were compared using DeLong’s test for two correlated ROC curves. Results: The best classification performance was achieved with a decision forest algorithm (AUC=0.840). When applied to the single-institution data, the predictive performance of machine learning exceeded that of the DOI model (AUC=0.657, p=0.007). Compared to the DOI model, machine learning reduced the number of neck dissections recommended while simultaneously improving sensitivity and specificity. Conclusion: Machine learning improves prediction of pathologic nodal metastasis in patients with clinical T1-2N0 OCSCC compared to methods based on DOI. Improved predictive algorithms are needed to ensure that patients with occult nodal disease are adequately treated while avoiding the cost and morbidity of neck dissection in patients without pathologic nodal disease.
Keywords: Oral cancer | Squamous cell carcinoma | Machine learning | Artificial intelligence
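Several abstracts in this list, including the oral cancer entry above, report performance as the area under the ROC curve (AUC). As a rough illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The labels and scores are invented for illustration, and this pairwise form is only one of several equivalent ways to compute AUC.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for positive cases and 0 for negative cases;
    tied scores count as half a correctly ranked pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model outputs: 1 = nodal metastasis present, 0 = absent.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.90, 0.80, 0.70, 0.30, 0.60, 0.65]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of 0.840 and 0.657 in context.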
Adsorption characteristics of supercritical CO2/CH4 on different types of coal and a machine learning approach
Adsorption characteristics of supercritical CO2/CH4 on different types of coal and a machine learning approach-2019
The injection of CO2 into deep coal beds can not only improve the recovery of CH4 but also contribute to the geological sequestration of CO2. The adsorption characteristics of coal determine the amount of the greenhouse gas that deep coal seams can store in place. Using a self-developed adsorption facility for supercritical fluids, this paper studied the adsorption behavior of supercritical CO2 and CH4 on three types of coal (anthracite, bituminous coal A, bituminous coal B) at temperatures of 35 °C, 45 °C and 55 °C. The influence of temperature, pressure, and coal rank on the Gibbs excess and absolute/real adsorption amounts of supercritical CO2/CH4 on the coal samples was analyzed. Several traditional isotherm models were applied to interpret the experimental data, and Langmuir-related models were found to perform well. However, these models are limited to isothermal conditions and are highly dependent on extensive experiments. To overcome these deficiencies, an innovative adsorption model based on machine learning methods is proposed. This model was applied to the adsorption data of both this paper and four earlier publications, and it proved highly effective in predicting the adsorption behavior of a given type of coal. To remove the restriction to a single coal type, a second, optimized model is provided based on published data. Using the second model, one can predict the adsorption behavior of coal from its fundamental physicochemical parameters. Overall, by working directly with real data, the machine learning technique makes a unified adsorption model possible, avoiding the tedious theoretical assumptions, derivations, and strong limitations of the traditional models.
Keywords: Supercritical CO2 | Supercritical CH4 | Coal | Adsorption model | Machine learning
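The coal adsorption abstract above refers to Langmuir-related isotherm models without giving their form. As a hedged illustration, the sketch below implements only the basic single-site Langmuir isotherm, n(P) = n_max · b · P / (1 + b · P), at a fixed temperature; the paper's actual model variants and fitted parameters are not stated in the abstract, so the function name and the parameter values here are hypothetical.

```python
def langmuir_uptake(pressure_mpa, n_max, b):
    """Basic Langmuir isotherm at a fixed temperature.

    pressure_mpa: gas pressure (MPa)
    n_max:        saturation adsorption capacity (e.g. mmol/g)
    b:            Langmuir affinity constant (1/MPa)
    """
    return n_max * b * pressure_mpa / (1.0 + b * pressure_mpa)


# Hypothetical parameters, chosen only to show the shape of the curve.
N_MAX = 2.0  # mmol/g
B = 0.5      # 1/MPa

for p in (0.5, 2.0, 8.0, 16.0):
    print(f"P = {p:5.1f} MPa -> n = {langmuir_uptake(p, N_MAX, B):.3f} mmol/g")
```

Because n_max and b must be refitted for each temperature and each coal, a model of this kind is tied to isothermal experiments, which is the limitation the abstract says the machine learning model is meant to overcome.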
Identification and analysis of behavioral phenotypes in autism spectrum disorder via unsupervised machine learning
Identification and analysis of behavioral phenotypes in autism spectrum disorder via unsupervised machine learning-2019
Background and objective: Autism spectrum disorder (ASD) is a heterogeneous disorder. Research has explored potential ASD subgroups, with preliminary evidence supporting the existence of behaviorally and genetically distinct subgroups; however, research has yet to leverage machine learning to identify phenotypes on a scale large enough to robustly examine treatment response across such subgroups. The purpose of the present study was to apply Gaussian Mixture Models and Hierarchical Clustering to identify behavioral phenotypes of ASD and examine treatment response across the learned phenotypes. Materials and methods: The present study included a sample of children with ASD (N = 2400), the largest of its kind to date. Unsupervised machine learning was applied to model ASD subgroups as well as their taxonomic relationships. Retrospective treatment data were available for a portion of the sample (n = 1034). Treatment response was examined within each subgroup via regression. Results: The application of a Gaussian Mixture Model revealed 16 subgroups. Further examination of the subgroups through Hierarchical Agglomerative Clustering suggested 2 overlying behavioral phenotypes with unique deficit profiles, each composed of subgroups that differed in the severity of those deficits. Furthermore, differentiated response to treatment was found across subtypes, with a substantially higher amount of variance accounted for due to the homogenization effect of the clustering. Discussion: The high amount of variance explained by the regression models indicates that clustering provides a basis for homogenization, and thus an opportunity to tailor treatment based on cluster memberships. These findings have significant implications for the prognosis and targeted treatment of ASD, and pave the way for personalized intervention based on unsupervised machine learning.
Keywords: Machine learning | Autism spectrum disorder | Behavioral phenotypes | Cluster analysis | Treatment response
Mining patient-specific and contextual data with machine learning technologies to predict cancellation of children’s surgery
Mining patient-specific and contextual data with machine learning technologies to predict cancellation of children’s surgery-2019
Background: Last-minute surgery cancellation represents a major wastage of resources and can cause significant inconvenience to patients. Our objectives in this study were: 1) To develop predictive models of last-minute surgery cancellation, utilizing machine learning technologies, from patient-specific and contextual data from two distinct pediatric surgical sites of a single institution; and 2) to identify specific key predictors that impact children’s risk of day-of-surgery cancellation. Methods and findings: We extracted five-year datasets (2012–2017) from the Electronic Health Record at Cincinnati Children’s Hospital Medical Center. By leveraging patient-specific information and contextual data, machine learning classifiers were developed to predict all patient-related cancellations and the most frequent four cancellation causes individually (patient illness, “no show,” NPO violation and refusal to undergo surgery by either patient or family). Model performance was evaluated by the area under the receiver operating characteristic curve (AUC) using ten-fold cross-validation. The best performance for predicting all-cause surgery cancellation was generated by gradient-boosted logistic regression models, with AUC 0.781 (95% CI: [0.764,0.797]) and 0.740 (95% CI: [0.726,0.771]) for the two campuses. Of the four most frequent individual causes of cancellation, “no show” and NPO violation were predicted better than patient illness or patient/family refusal. Models showed good cross-campus generalizability (AUC: 0.725/0.735, when training on one site and testing on the other). To synthesize a human-oriented conceptualization of pediatric surgery cancellation, an iterative step-forward approach was applied to identify key predictors which may inform the design of future preventive interventions. 
Conclusions: Our study demonstrated the capacity of machine learning models to identify pediatric patients at risk of last-minute surgery cancellation and to provide useful insight into the root causes of cancellation. The approach offers the promise of targeted interventions that significantly decrease both healthcare costs and families’ negative experiences.
Keywords: Pediatric surgery cancellation | Quality improvement | Predictive modeling | Machine learning
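The surgery cancellation abstract above evaluates its models with ten-fold cross-validation. A minimal sketch of how the sample indices can be partitioned for that procedure is shown below; the helper name `k_fold_indices`, the shuffling seed, and the sample count are assumptions for illustration, and the study's actual splitting procedure (for example, any stratification) is not described in the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds of near-equal size,
    and each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical data set of 25 surgical cases.
splits = list(k_fold_indices(25, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number:2d}: {len(train)} training cases, {len(test)} test cases")
```

In practice a model would be fitted on each training split and scored (for example by AUC) on the matching test split, and the ten scores would then be averaged.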
Predicting breast cancer metastasis by using serum biomarkers and clinicopathological data with machine learning technologies
Predicting breast cancer metastasis using serum markers and clinicopathological data with machine learning technologies-2019
Background: Approximately 10%–15% of patients with breast cancer die of cancer metastasis or recurrence, and early diagnosis can improve prognosis. Breast cancer outcomes may be prognosticated on the basis of surface markers of tumor cells and serum tests. However, evaluation of a combination of clinicopathological features may offer a more comprehensive overview for breast cancer prognosis. Materials and methods: We evaluated serum human epidermal growth factor receptor 2 (sHER2) as part of a combination of clinicopathological features used to predict breast cancer metastasis using machine learning algorithms, namely random forest, support vector machine, logistic regression, and Bayesian classification algorithms. The sample cohort comprised 302 patients who were diagnosed with and treated for breast cancer and received at least one sHER2 test at Chang Gung Memorial Hospital at Linkou between 2003 and 2016. Results: The random-forest-based model was determined to be the optimal model to predict breast cancer metastasis at least 3 months in advance; the corresponding area under the receiver operating characteristic curve value was 0.75 (p < 0.001). Conclusion: The random-forest-based model presented in this study may be helpful as part of a follow-up intervention decision support system and may lead to early detection of recurrence, early treatment, and more favorable outcomes.
Keywords: Breast cancer | Machine learning | Prediction model | Cancer prognosis