Estimating monthly wet sulfur (S) deposition flux over China using an ensemble model of improved machine learning and geostatistical approach (2019)
Wet S deposition is a key issue because it contributes to soil acidification, biodiversity loss, and global climate change. However, the limited number of ground-level monitoring sites makes it difficult to fully clarify the spatiotemporal variations of wet S deposition over China. Therefore, an ensemble model combining improved machine learning with a geostatistical method, named the fruit fly optimization algorithm-random forest-spatiotemporal Kriging (FOA-RF-STK) model, was developed to estimate nationwide S deposition from an emission inventory, meteorological factors, and other geographical covariates. The ensemble model captures the relationship between predictors and S deposition flux with better performance (R2=0.68, root mean square error (RMSE)=7.51 kg ha−1 yr−1) than the original RF model (R2=0.52, RMSE=8.99 kg ha−1 yr−1). The improved model predicted that the highest and lowest S deposition fluxes were concentrated in Southeast China (69.57 kg S ha−1 yr−1) and Inner Mongolia (42.37 kg S ha−1 yr−1), respectively. The estimated wet S deposition flux displayed remarkable seasonal variation, with the highest value in summer (22.22 kg S ha−1 season−1), followed by autumn (18.30 kg S ha−1 season−1) and spring (16.27 kg S ha−1 season−1), and the lowest in winter (14.71 kg S ha−1 season−1), a pattern closely associated with rainfall amounts. The study provides a novel approach for S deposition estimation at a national scale.
Keywords: Wet S deposition | Machine learning | Geostatistical approach | China
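The FOA-RF-STK pipeline is not given in code in the abstract; as a rough, hypothetical illustration of the fruit fly optimization (FOA) step, which is commonly used to tune random forest hyperparameters, the sketch below minimizes a toy stand-in for a validation loss. The swarm behaviour, bounds, and loss function are all illustrative assumptions, not the authors' implementation.

```python
import random

def foa_minimize(loss, dim=2, flies=20, iters=60, seed=0):
    """Minimal fruit fly optimization: flies sample ("smell") around a
    swarm location; each iteration the swarm flies to the best spot
    found so far (the "vision" phase)."""
    rng = random.Random(seed)
    swarm = [rng.uniform(-5, 5) for _ in range(dim)]  # initial swarm location
    best_x, best_val = list(swarm), loss(swarm)
    for _ in range(iters):
        for _ in range(flies):
            # random smell-based search step around the swarm
            x = [s + rng.uniform(-1, 1) for s in swarm]
            v = loss(x)
            if v < best_val:
                best_x, best_val = x, v
        swarm = list(best_x)  # vision phase: move swarm to the best fly
    return best_x, best_val

# Toy stand-in for an RF validation error as a function of two
# (continuously relaxed) hyperparameters; its minimum is at (3, 1).
toy_loss = lambda x: (x[0] - 3) ** 2 + (x[1] - 1) ** 2
x, v = foa_minimize(toy_loss)
```

In the actual FOA-RF-STK setting, `toy_loss` would be replaced by a cross-validated RF error, and the Kriging step would then interpolate the RF residuals in space and time.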
Physical metallurgy-guided machine learning and artificial intelligent design of ultrahigh-strength stainless steel (2019)
With the development of the materials genome philosophy and data mining methodologies, machine learning (ML) has been widely applied to discovering new materials in various systems, including high-end steels with improved performance. Although some attempts have recently been made to incorporate physical features into the ML process, their effects have not been systematically analysed nor experimentally validated with prototype alloys. To address this issue, a physical metallurgy (PM)-guided ML model was developed, wherein intermediate parameters were generated from the original inputs and PM principles, e.g., the equilibrium volume fraction (Vf) and driving force (Df) for precipitation, and added to the original dataset vectors as extra dimensions to participate in and guide the ML process. As a result, the ML process becomes more robust when dealing with small datasets, as the data quality is improved and the data information enriched. On this basis, a new material design method is proposed that combines PM-guided ML regression, an ML classifier, and a genetic algorithm (GA). The model was successfully applied to the design of advanced ultrahigh-strength stainless steels using only a small database extracted from the literature. The proposed prototype alloy, with a leaner chemistry but better mechanical properties, was produced experimentally, and excellent agreement was obtained between the predicted optimal parameter settings and the final properties. In addition, the present work clearly demonstrates that implementing PM parameters can improve design accuracy and efficiency by eliminating intermediate solutions that do not obey PM principles in the ML process. Furthermore, various important factors influencing the generalizability of the ML model are discussed in detail.
Keywords: Alloy design | Machine learning | Physical metallurgy | Small sample problem | Stainless steel
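The abstract couples ML surrogates with a genetic algorithm to search alloy chemistries. Below is a minimal GA sketch, with a toy fitness standing in for the PM-guided ML predictor; the target composition, gene encoding, and operator settings are all hypothetical, and a PM-guided penalty (e.g., rejecting candidates with a non-positive precipitation driving force) would be added to the fitness in the real workflow.

```python
import random

def ga_search(fitness, n_genes=3, pop_size=30, gens=40, seed=1):
    """Tiny genetic algorithm: truncation selection, uniform crossover,
    Gaussian mutation; genes are clipped to [0, 1] (scaled compositions)."""
    rng = random.Random(seed)
    clip = lambda g: min(1.0, max(0.0, g))
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(gens):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            # uniform crossover, then Gaussian mutation
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            children.append([clip(g + rng.gauss(0, 0.05)) for g in child])
        pop = elite + children  # elites survive unchanged
    return max(pop, key=fitness)

# Toy surrogate "strength" peaking at a made-up composition (0.2, 0.7, 0.5).
target = (0.2, 0.7, 0.5)
toy_strength = lambda x: -sum((xi - ti) ** 2 for xi, ti in zip(x, target))
best = ga_search(toy_strength)
```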
First-principles and Machine Learning Predictions of Elasticity in Severely Lattice-distorted High-Entropy Alloys with Experimental Validation (2019)
Stiffness usually increases with lattice-distortion-induced strain, as observed in many nanostructures. Partly due to the size differences among the component elements, severe lattice distortion naturally exists in high-entropy alloys (HEAs). The single-phase face-centered-cubic (FCC) Al0.3CoCrFeNi HEA, which has large size differences among its constituent elements, is an ideal system for studying the relationship between elastic properties and lattice distortion using a combined experimental and computational approach based on in-situ neutron-diffraction (ND) characterization and first-principles calculations. Analysis of the interatomic distance distributions from calculations on the optimized special quasirandom structure (SQS) found that the HEA has a high degree of lattice distortion. When the lattice distortion is explicitly considered, the elastic properties calculated using the SQS are in excellent agreement with experimental measurements for the HEA; the calculated elastic constants are within 5% of the ND measurements. A comparison of calculations from the optimized SQS and the SQS with ideal lattice sites indicates that the lattice distortion results in reduced stiffness: the optimized SQS has a bulk modulus of 177 GPa, compared with 194 GPa for the ideal-lattice SQS. Machine learning (ML) modeling is also implemented to explore the use of fast, computationally efficient models for predicting the elastic moduli of HEAs. ML models trained on a large dataset of inorganic structures are shown to make accurate predictions of elastic properties for the HEA. The ML models also reveal the dependence of the bulk and shear moduli on several material features, which can act as guides for tuning elastic properties in HEAs.
Keywords: First-principles calculation | Elastic constants | in situ tension test | Neutron diffraction | Machine learning
Machine learning estimates of plug-in hybrid electric vehicle utility factors (2019)
Plug-in hybrid electric vehicles (PHEVs) combine an electric drive train with a conventional one and can drive on gasoline when the battery is fully depleted. They can thus electrify many vehicle miles travelled (VMT) without fundamental range limits. The most important variable for the electrification potential is the ratio of electric VMT to total VMT, the so-called utility factor (UF). However, the empirical assessment of UFs is difficult, since important factors such as daily driving, recharging behaviour, and the frequency of long-distance travel vary considerably between drivers, and large data collections are required. Here, we apply machine learning techniques (regression tree, random forest, support vector machine, and neural nets) to estimate real-world UFs and compare the estimates to the actual long-term average UFs of 1768 individual Chevrolet Volt PHEVs. Our results show that UFs can be predicted from individual summary statistics with a mean absolute error of five percentage points. The accuracy of these methods is higher than that of a simple simulation assuming electric driving until the battery is discharged and one full daily recharge. According to a linear regression model, the most important variables for estimating UF are the variance and skewness of the daily VMT distribution as well as the frequency of long-distance driving. Thus, our findings make UF predictions from existing data sets of conventional-vehicle driving more accurate.
Keywords: Electric vehicles | Plug-in hybrid electric vehicle | Utility factor | Machine learning
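The baseline simulation the abstract compares against (electric driving until the battery is depleted, with one full daily recharge) and one of the key summary-statistic predictors (skewness of daily VMT) are simple to state in code. A sketch under those assumptions, with made-up mileage data and an arbitrary Volt-like electric range:

```python
from statistics import mean, pvariance

def utility_factor(daily_vmt, electric_range):
    """Baseline UF: each day starts fully charged; miles up to the
    electric range are electric, the remainder runs on gasoline."""
    ev_miles = sum(min(d, electric_range) for d in daily_vmt)
    return ev_miles / sum(daily_vmt)

def skewness(xs):
    """Population skewness of daily VMT, one of the top UF predictors."""
    m, var = mean(xs), pvariance(xs)
    return mean([(x - m) ** 3 for x in xs]) / var ** 1.5

# Hypothetical week of daily miles, including one long-distance day.
daily_vmt = [12, 30, 25, 8, 40, 15, 180]
uf = utility_factor(daily_vmt, electric_range=35)
```

The long-distance day both lowers the UF and skews the daily VMT distribution to the right, which is why the skewness statistic carries so much predictive weight.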
Automatic hourly solar forecasting using machine learning models (2019)
Owing to its recent advances, machine learning has spawned a large collection of solar forecasting works. In particular, machine learning is currently one of the most popular approaches for hourly solar forecasting. Nevertheless, there is evidently a myth about forecast accuracy—virtually all research papers claim superiority over others. Apparently, the “best” model can only be selected with hindsight, i.e., after empirical evaluation. For any new forecasting project, it is irrational for solar forecasters to bet on a single model from the start. In this article, the hourly forecasting performance of 68 machine learning algorithms is evaluated for 3 sky conditions, 7 locations, and 5 climate zones in the continental United States. To ensure a fair comparison, no hybrid model is considered, and only off-the-shelf implementations of these algorithms are used. Moreover, all models are trained using the automatic tuning algorithm available in the R caret package. It is found that tree-based methods consistently perform well in terms of two-year overall results; however, they rarely stand out in daily evaluations. Although no universal model can be found, some preferred ones for each sky condition and climate are recommended.
Keywords: Automatic machine learning | Solar forecasting | R caret package
Machine learning to predict occult nodal metastasis in early oral squamous cell carcinoma (2019)
Objectives: To develop and validate an algorithm to predict occult nodal metastasis in clinically node-negative oral cavity squamous cell carcinoma (OCSCC) using machine learning, and to compare the algorithm's performance to a model based on tumor depth of invasion (DOI). Materials and methods: Patients who underwent primary tumor extirpation and elective neck dissection from 2007 to 2013 for clinical T1-2N0 OCSCC were identified from the National Cancer Database (NCDB). Multiple machine learning algorithms were developed to predict pathologic nodal metastasis using clinicopathologic data from 782 patients. The algorithm was internally validated using test data from 654 patients in the NCDB and then externally validated using data from 71 patients treated at a single academic institution. Performance was measured using the area under the receiver operating characteristic (ROC) curve (AUC). Machine learning and DOI model performance were compared using DeLong's test for two correlated ROC curves. Results: The best classification performance was achieved with a decision forest algorithm (AUC=0.840). When applied to the single-institution data, the predictive performance of machine learning exceeded that of the DOI model (AUC=0.657, p=0.007). Compared to the DOI model, machine learning reduced the number of neck dissections recommended while simultaneously improving sensitivity and specificity. Conclusion: Machine learning improves the prediction of pathologic nodal metastasis in patients with clinical T1-2N0 OCSCC compared to methods based on DOI. Improved predictive algorithms are needed to ensure that patients with occult nodal disease are adequately treated while avoiding the cost and morbidity of neck dissection in patients without pathologic nodal disease.
Keywords: Oral cancer | Squamous cell carcinoma | Machine learning | Artificial intelligence
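The AUC used to compare the machine learning and DOI models can be computed directly from ranks (DeLong's significance test itself is more involved and is not shown here). A self-contained sketch with hypothetical scores:

```python
def auc(scores, labels):
    """Rank-based AUC: the probability that a random positive scores
    higher than a random negative, with ties counted as 1/2 (the
    Mann-Whitney U statistic normalized by n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical nodal-metastasis labels and two sets of risk scores.
labels     = [1, 1, 1, 0, 0, 0, 1, 0]
ml_scores  = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.5]  # toy "ML" model
doi_scores = [0.6, 0.4, 0.9, 0.5, 0.3, 0.8, 0.2, 0.1]  # toy DOI baseline
ml_auc, doi_auc = auc(ml_scores, labels), auc(doi_scores, labels)
```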
Enhancing transportation systems via deep learning: A survey (2019)
Machine learning (ML) plays a central role in making transportation systems intelligent. Recent years have witnessed the advent and prevalence of deep learning, which has provoked a storm in Intelligent Transportation Systems (ITS). Consequently, traditional ML models in many applications have been replaced by the new learning techniques, and the landscape of ITS is being reshaped. From this perspective, we provide a comprehensive survey focusing on the use of deep learning models to enhance the intelligence level of transportation systems. By organizing dozens of relevant works that were originally scattered across the literature, this survey attempts to provide a clear picture of how various deep learning models have been applied in multiple transportation applications.
Keywords: Deep learning | Transportation systems | Survey
On initial population generation in feature subset selection (2019)
The performance of evolutionary algorithms depends on many factors, such as population size, number of generations, and crossover or mutation probability. Generating the initial population is one of the important steps in evolutionary algorithms. A poor initial population may unnecessarily increase the number of searches or cause the algorithm to converge at a local optimum. In this study, we aim to find a promising method for generating the initial population in the Feature Subset Selection (FSS) domain. FSS is not an expert system by itself, yet it constitutes a significant step in many expert systems: it eliminates redundancy in the data, which decreases training time and improves solution quality. To achieve our goal, we compare a total of five initial population generation methods: Information Gain Ranking (IGR), a greedy approach, and three types of random approaches. We evaluate these methods using a specialized Teaching-Learning-Based Optimization search algorithm (MTLBO-MD) and three supervised learning classifiers: Logistic Regression, Support Vector Machines, and Extreme Learning Machine. In our experiments, we employ 12 publicly available datasets, mostly obtained from the well-known UCI Machine Learning Repository. According to their feature sizes and instance counts, we manually classify these datasets as small, medium, or large-sized. Experimental results indicate that all tested methods achieve similar solutions on small-sized datasets. For medium-sized and large-sized datasets, however, the IGR method provides a better starting point in terms of execution time and learning performance. Finally, when compared with other studies in the literature, the IGR method proves to be a viable option for initial population generation.
Keywords: Feature subset selection | Initial population | Multiobjective optimization
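The IGR idea — ranking features by information gain and seeding the initial population with the top-ranked subset — can be sketched as follows. The seeding scheme (one IGR-derived individual plus random bitmasks) is an illustrative reading, not necessarily the paper's exact design:

```python
import math, random

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in {l: labels.count(l) for l in set(labels)}.values())

def info_gain(feature_col, labels):
    """Information gain of one discrete feature w.r.t. the class labels."""
    total, n = entropy(labels), len(labels)
    for v in set(feature_col):
        subset = [y for x, y in zip(feature_col, labels) if x == v]
        total -= len(subset) / n * entropy(subset)
    return total

def igr_initial_population(X_cols, y, pop_size, k, seed=0):
    """Seed an FSS population: one individual keeps only the top-k
    features by information gain; the rest are random bitmasks."""
    rng = random.Random(seed)
    order = sorted(range(len(X_cols)),
                   key=lambda j: info_gain(X_cols[j], y), reverse=True)
    seeded = [1 if j in order[:k] else 0 for j in range(len(X_cols))]
    randoms = [[rng.randint(0, 1) for _ in X_cols]
               for _ in range(pop_size - 1)]
    return [seeded] + randoms

# Toy data: feature 0 perfectly predicts y, feature 1 is noise.
X_cols = [[0, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1]]
y      = [0, 0, 1, 1, 0, 1]
pop = igr_initial_population(X_cols, y, pop_size=4, k=1)
```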
Adsorption characteristics of supercritical CO2/CH4 on different types of coal and a machine learning approach (2019)
The injection of CO2 into deep coal beds can not only improve the recovery of CH4 but also contribute to the geological sequestration of CO2. The adsorption characteristics of coal determine the amount of greenhouse gas that deep coal seams can store in place. Using a self-developed adsorption facility for supercritical fluids, this paper studies the adsorption behavior of supercritical CO2 and CH4 on three types of coal (anthracite, bituminous coal A, bituminous coal B) at temperatures of 35 °C, 45 °C, and 55 °C. The influence of temperature, pressure, and coal rank on the Gibbs excess and absolute/real adsorption amounts of supercritical CO2/CH4 on the coal samples is analyzed. Several traditional isotherm models are applied to interpret the experimental data, and Langmuir-related models are verified to perform well. However, these models are limited to isothermal conditions and depend heavily on extensive experiments. To overcome these deficiencies, an innovative adsorption model is proposed based on machine learning methods. This model is applied to the adsorption data of both this paper and four earlier publications and proved highly effective in predicting the adsorption behavior of a given type of coal. To further remove the restriction to a single coal type, a second optimized model is built on published data, with which one can predict the adsorption behavior of coal from its fundamental physicochemical parameters. Overall, by working directly with the real data, the machine learning technique makes a unified adsorption model possible, avoiding the tedious theoretical assumptions, derivations, and strong limitations of traditional models.
Keywords: Supercritical CO2 | Supercritical CH4 | Coal | Adsorption model | Machine learning
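The Langmuir-type isotherms that the authors verify against their data take a simple closed form. A sketch fitting one by brute-force grid search on synthetic data (a real fit would use nonlinear least squares; all values here are arbitrary, not the paper's measurements):

```python
def langmuir(P, qm, b):
    """Langmuir isotherm: adsorbed amount at pressure P, with maximum
    capacity qm and affinity constant b."""
    return qm * b * P / (1 + b * P)

def fit_langmuir(pressures, amounts, qm_grid, b_grid):
    """Least-squares fit by exhaustive grid search over (qm, b)."""
    best = None
    for qm in qm_grid:
        for b in b_grid:
            sse = sum((langmuir(P, qm, b) - q) ** 2
                      for P, q in zip(pressures, amounts))
            if best is None or sse < best[0]:
                best = (sse, qm, b)
    return best[1], best[2]

# Synthetic isotherm generated with qm=30, b=0.5 (arbitrary units).
pressures = [0.5, 1, 2, 4, 8, 12]
amounts = [langmuir(P, 30, 0.5) for P in pressures]
b_grid = [x / 10 for x in range(1, 11)]  # candidate b values 0.1..1.0
qm_hat, b_hat = fit_langmuir(pressures, amounts,
                             qm_grid=range(10, 51), b_grid=b_grid)
```

The ML models in the abstract replace this isothermal, per-coal fit with a single regressor over temperature, pressure, and coal properties.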
Combining hierarchical clustering approaches using the PCA method (2019)
In expert systems, data mining methods are algorithms that simulate humans' problem-solving capabilities. Clustering methods, as unsupervised machine learning methods, are crucial approaches for categorizing similar samples into the same categories. Applying different clustering algorithms to a given dataset produces clusters of different qualities; hence, many researchers have applied clustering combination methods to reduce the risk of choosing an inappropriate clustering algorithm. In these methods, the outputs of several clustering algorithms are combined: the input hierarchical clusterings are transformed into descriptor matrices, and their combination is achieved by aggregating these matrices. In previous works, only element-wise aggregation operators have been used, and the relations between the elements of each descriptor matrix have been ignored. However, the value of each element of the descriptor matrix is meaningful in comparison with its other elements. The current study proposes a novel method of combining hierarchical clustering approaches based on principal component analysis (PCA). PCA as an aggregator allows all elements of the descriptor matrices to be considered. In the proposed approach, basic clusterings are made and transformed into descriptor matrices. Then, a final matrix is extracted from the descriptor matrices using PCA. Next, a final dendrogram is constructed from this matrix, which summarizes the results of the diverse clusterings. Experimental results on popular available datasets show the superiority of the clustering accuracy of the proposed method over basic clustering methods, such as single, average, and centroid linkage, and over previously combined hierarchical clustering methods. In addition, statistical tests show that the proposed method significantly outperformed hierarchical clustering combination methods with element-wise averaging operators on almost all tested datasets.
Several experiments have also been conducted that confirm the robustness of the proposed method with respect to its parameter settings.
Keywords: Clustering | Hierarchical clustering | Principal component analysis | PCA
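One plausible reading of the PCA-based aggregation: flatten each clustering's descriptor (e.g., cophenetic-distance) matrix into a vector, extract the first principal component across the clusterings, and map it back to matrix form. The sketch below uses hand-made descriptor matrices and power iteration in place of a full PCA library; the back-projection step (adding the per-element mean) is an assumption, not necessarily the paper's construction.

```python
import math

def first_pc(rows, iters=200):
    """First principal component of the row vectors, via power
    iteration on X^T X without forming the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]  # centered
    v = [1.0] * d
    for _ in range(iters):
        Xv = [sum(x[j] * v[j] for j in range(d)) for x in X]
        w = [sum(X[i][j] * Xv[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w)) or 1.0
        v = [c / norm for c in w]
    return v, means

def combine_descriptors(mats):
    """Aggregate several square descriptor matrices: flatten each into
    a row, take the first PC direction, and project back to matrix
    form around the element-wise mean (a sketch)."""
    d = len(mats[0])
    rows = [[x for row in m for x in row] for m in mats]
    v, means = first_pc(rows)
    flat = [means[j] + v[j] for j in range(len(means))]
    return [flat[i * d:(i + 1) * d] for i in range(d)]

# Two toy 3x3 descriptor matrices, as might come from two linkages.
m1 = [[0, 1, 4], [1, 0, 4], [4, 4, 0]]
m2 = [[0, 2, 6], [2, 0, 6], [6, 6, 0]]
combined = combine_descriptors([m1, m2])
```

Because the inputs are symmetric with zero diagonals, the combined matrix keeps both properties, so a final dendrogram can be built from it as from any distance matrix.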