An online tool for predicting fatigue strength of steel alloys based on ensemble data mining
Publication year: 2018
Fatigue strength is one of the most important mechanical properties of steel. Here we describe the development and deployment of data-driven ensemble predictive models for fatigue strength of a given steel alloy represented by its composition and processing information. The forward models for PSPP relationships (predicting property of a material given its composition and processing parameters) are built using over 400 experimental observations from the Japan National Institute of Materials Science (NIMS) steel fatigue dataset. Forty modeling techniques, including ensemble modeling, were explored to identify the set of best performing models for different attribute sets. Data-driven feature selection techniques were also used to find a small non-redundant subset of attributes, and the processing/composition parameters most influential to fatigue strength were identified to inform future design efforts. The developed predictive models are deployed in a user-friendly online
Keywords: Materials informatics, Supervised learning, Ensemble learning, Fatigue strength, Online tool
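As a rough illustration of the ensemble idea in the abstract above, the following sketch averages the predictions of two simple regressors and compares their errors on held-out points. The data are synthetic, the function names (`fit_linear`, `fit_knn`, `fit_ensemble`) are hypothetical, and nothing here reproduces the paper's forty techniques or the NIMS dataset.

```python
from statistics import mean

def fit_linear(xs, ys):
    """Ordinary least squares for a single feature; returns a predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regressor: average the targets of the k closest points."""
    def predict(x):
        nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

def fit_ensemble(xs, ys):
    """Unweighted average of the base models' predictions."""
    models = [fit_linear(xs, ys), fit_knn(xs, ys)]
    return lambda x: mean(m(x) for m in models)

def mae(model, xs, ys):
    """Mean absolute error of a predictor on (xs, ys)."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))

# Synthetic one-feature data (illustrative only, not from the NIMS dataset).
train_x = [0.0, 2.0, 4.0, 6.0, 8.0]
train_y = [1.0, 5.2, 8.9, 13.1, 17.0]
test_x = [1.0, 3.0, 5.0, 7.0]
test_y = [3.0, 7.0, 11.0, 15.0]

for name, fit in [("linear", fit_linear), ("knn", fit_knn), ("ensemble", fit_ensemble)]:
    model = fit(train_x, train_y)
    print(name, round(mae(model, test_x, test_y), 3))
```

Averaging is only one way to combine models; weighted voting or stacking would fit the same interface by changing `fit_ensemble`.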
A survey on deep learning for big data
Publication year: 2018 - Pages in English PDF file: 12 - Pages in Persian DOC file: 44
Deep learning, as one of the most important machine learning techniques, has achieved great success in many applications such as image analysis, speech recognition, and text understanding. It uses supervised and unsupervised strategies to learn multi-level representations and features in hierarchical architectures for classification and pattern recognition tasks. Recent advances in sensor networks and communication technologies have enabled the collection of big data. Although big data offers great opportunities for many fields, including e-commerce, industrial control, and smart medicine, it also poses many challenges for data mining and information processing because of its large volume, large variety, large velocity, and large veracity. In the past few years, deep learning has played an important role in big data analytics solutions. In this paper, we review emerging research on deep learning models for big data feature learning. Furthermore, we point out the remaining challenges of big data deep learning and discuss future topics.
Keywords: Deep learning, Big data, Stacked auto-encoders, Deep belief networks, Convolutional neural networks, Recurrent neural networks
Educational data mining: Predictive analysis of academic performance of public school students in the capital of Brazil
Publication year: 2018
In this article, we present a predictive analysis of the academic performance of students in public schools of the Federal District of Brazil during the school terms of 2015 and 2016. Initially, we performed a descriptive statistical analysis to gain insight from data. Subsequently, two datasets were obtained. The first dataset contains variables obtained prior to the start of the school year, and the second includes academic variables collected two months after the semester began. Classification models based on the Gradient Boosting Machine (GBM) were created to predict academic outcomes of student performance at the end of the school year for each dataset. Results showed that, though the attributes ‘grades’ and ‘absences’ were the most relevant for predicting the end-of-year academic outcomes of student performance, the analysis of demographic attributes reveals that ‘neighborhood’, ‘school’ and ‘age’ are also potential indicators of a student’s academic success or failure.
Keywords: Educational data mining, Academic performance, Predictive analysis, Decision tree, Gradient boosting machine, H2O
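To make the gradient boosting idea in the abstract above concrete, here is a minimal sketch of boosting for binary classification, using one-split decision stumps on a single feature and the logistic loss. It is not the H2O GBM used in the study: the toy data (a hypothetical "absences" feature and a pass/fail label), the function names, and the hyperparameters are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # every unique value except the largest
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def fit_gbm(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting with stumps; ys are 0/1 labels with both classes present."""
    pos = sum(ys) / len(ys)
    f0 = math.log(pos / (1.0 - pos))  # initial score: log-odds of the positive class
    scores = [f0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the current score.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump(x) for s, x in zip(scores, xs)]

    def predict_proba(x):
        return sigmoid(f0 + learning_rate * sum(st(x) for st in stumps))
    return predict_proba

# Toy data: number of absences (hypothetical) and whether the student passed.
absences = [0, 1, 2, 3, 4, 10, 12, 15, 18, 20]
passed = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model = fit_gbm(absences, passed)
print(round(model(2), 3), round(model(15), 3))
```

Each round fits a stump to the residuals of the current model, so later stumps correct the errors of earlier ones; production GBM libraries add deeper trees, regularization, and subsampling on top of this loop.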
Mining data in a dynamic PRA framework
Publication year: 2018
Computational, also known as Dynamic, Probabilistic Risk Assessment (PRA) methods employ system simulation codes coupled with stochastic analysis tools in order to determine probabilities of certain outcomes such as system failure. In contrast to Classical PRA methods (i.e., Event-Tree and Fault-Tree) in which timing and sequencing of events is set by the analyst, accident progression is dictated by the system control logic and its interaction with the system temporal evolution. Due to the nature of the problem, Dynamic PRA methods can be expensive from a computational point of view since a large number of accident scenarios is simulated. Consequently, they also generate a large amount of data (database storage may be on the order of gigabytes or higher). We investigate and apply several methods and algorithms to analyze these large time-dependent data sets. The objective is to present a broad overview of methods and algorithms that can be used to improve data quality and to analyze and extract information from large data sets containing time-dependent data. In this context, “extracting information” means constructing input-output correlations, finding commonalities, and identifying outliers.
Keywords: Data mining, Dynamic PRA, Probabilistic risk assessment, Clustering
Advanced data mining approaches in the assessment of urinary concentrations of bisphenols, chlorophenols, parabens and benzophenones in Brazilian children and their association to DNA damage
Publication year: 2018
Human exposure to endocrine disrupting chemicals (EDCs) has received considerable attention over the last three decades. However, little is known about the influence of co-exposure to multiple EDCs on effect biomarkers such as oxidative stress in Brazilian children. In this study, concentrations of 40 EDCs were determined in urine samples collected from 300 Brazilian children of ages 6–14 years and data were analyzed by advanced data mining techniques. Oxidative DNA damage was evaluated from the urinary concentrations of 8-hydroxy-2′-deoxyguanosine (8OHDG). Fourteen EDCs, including bisphenol A (BPA), methyl paraben (MeP), ethyl paraben (EtP), propyl paraben (PrP), 3,4-dihydroxy benzoic acid (3,4-DHB), methyl-protocatechuic acid (OH-MeP), ethyl-protocatechuic acid (OH-EtP), triclosan (TCS), triclocarban (TCC), 2-hydroxy-4-methoxybenzophenone (BP3), 2,4-dihydroxybenzophenone (BP1), bisphenol A bis(2,3-dihydroxypropyl) glycidyl ether (BADGE·2H2O), 2,4-dichlorophenol (2,4-DCP), and 2,5-dichlorophenol (2,5-DCP) were found in > 50% of the urine samples analyzed. The highest geometric mean concentrations were found for MeP (43.1 ng/mL), PrP (3.12 ng/mL), 3,4-DHB (42.2 ng/mL), TCS (8.26 ng/mL), BP3 (3.71 ng/mL), and BP1 (4.85 ng/mL), and exposures to most of these were associated with personal care product (PCP) use. Statistically significant associations were found between urinary concentrations of 8OHDG and BPA, MeP, 3,4-DHB, OH-MeP, OH-EtP, TCS, BP3, 2,4-DCP, and 2,5-DCP. After clustering the data on the basis of i) 14 EDCs (exposure levels), ii) demography (age, gender and geographic location), and iii) 8OHDG (effect), two distinct clusters of samples were identified. 8OHDG concentration was the most critical parameter that differentiated the two clusters, followed by OH-EtP. When 8OHDG was removed from the dataset, predictability of exposure variables increased in the order of: OH-EtP > OH-MeP > 3,4-DHB > BPA > 2,4-DCP > MeP > TCS > EtP > BP1 > 2,5-DCP.
Our results showed that co-exposure to OH-EtP, OH-MeP, 3,4-DHB, BPA, 2,4-DCP, MeP, TCS, EtP, BP1, and 2,5-DCP was associated with DNA damage in children. This is the first study to report exposure of Brazilian children to a wide range of EDCs and the data mining approach further strengthened our findings of chemical co-exposures and biomarkers of effect.
Keywords: Endocrine disrupting chemicals, Human co-exposure, Children, Data mining, Oxidative stress
Copper mining productivity: Lessons from Chile
Publication year: 2018
Chile represents almost one third of the world’s copper production. Mining is one of the main industries that contributes to our country’s development with resources and is globally recognized. Due to the end of the commodity cycle, improving productivity will be a key variable in mining performance in the coming years. This paper studies mining productivity in Chile by relying on two indicators: a measure of total factor productivity (TFP) using the traditional Solow methodology, and labor productivity. Since 2000, we found a decrease in TFP, explained mainly by the participation of capital as well as diverse factor adjustments to labor and capital inputs. Average labor productivity also decreased by 42% from 1999 to 2010, a decrease explained by four determinants: real mining wages, electricity prices, copper prices and mineral grade. Since 2010, average labor productivity has increased by 30%, and there is also an opportunity for additional improvement by reducing energy costs as well as by aligning productivity and labor performances.
Keywords: Labor productivity; TFP; Chile; Mining; Labor; Energy; Copper price
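The two indicators named in the abstract above can be sketched with textbook growth accounting. The snippet assumes a Cobb-Douglas production function Y = A * K**a * L**(1 - a) with constant returns, so that TFP growth is the Solow residual g_A = g_Y - a * g_K - (1 - a) * g_L in log differences. The paper's exact specification is not given in the abstract, and the index numbers and the capital share of 0.6 below are made up for illustration.

```python
import math

def tfp_growth(output, capital, labor, capital_share):
    """Solow residual between two periods, in log-difference form.

    output, capital and labor are (start, end) pairs. Assumes a Cobb-Douglas
    technology with constant returns to scale (an assumption of this sketch).
    """
    g_y = math.log(output[1] / output[0])
    g_k = math.log(capital[1] / capital[0])
    g_l = math.log(labor[1] / labor[0])
    return g_y - capital_share * g_k - (1.0 - capital_share) * g_l

def labor_productivity_growth(output, labor):
    """Log growth of output per worker between two periods."""
    return math.log(output[1] / output[0]) - math.log(labor[1] / labor[0])

# Hypothetical index numbers (start, end); not taken from the paper.
output = (100.0, 110.0)
capital = (100.0, 130.0)
labor = (100.0, 105.0)

print(round(tfp_growth(output, capital, labor, capital_share=0.6), 4))
print(round(labor_productivity_growth(output, labor), 4))
```

In this made-up case capital grows much faster than output, so the residual is negative even though output per worker rises, which is the kind of divergence between the two indicators that the abstract describes.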
Developing an integrated framework for using data mining techniques and ontology concepts for process improvement
Publication year: 2018
Process, as an important knowledge resource, must be effectively managed and improved. The main problems are the large number of processes, their specific features, and the complicated relationships between them, which all lead to the increase in complexity and create a high-dimensionality problem. Traditional process management systems are unable to manage and improve processes with a high volume of data. Data mining techniques, however, can be employed to identify valuable patterns. With the aid of these patterns, suggestions for process improvement can be presented. Further, process ontology can be applied to share the process patterns between people, facilitate the process understanding, and develop the reusability of the extracted patterns for process improvement. This study presents a combined three-part, five-stage framework of data mining, process improvement, and process ontology. To evaluate the applicability and effectiveness of the proposed framework, a real process dataset is applied. Two clustering and classification techniques are used to discover valuable patterns as the process ontology. The output of these two techniques can be considered as the recommendations for improving the processes. The proposed framework can be exploited to support process improvement methodologies in organizations.
Keywords: Data mining, Process improvement, Ontology, Classification, Clustering
Clustering fMRI data with a robust unsupervised learning algorithm for neuroscience data mining
Publication year: 2018
Background: Clustering approaches used in functional magnetic resonance imaging (fMRI) research use brain activity to divide the brain into various parcels with some degree of homogeneous characteristics, but choosing the appropriate clustering algorithms remains a problem. New method: A novel application of the robust unsupervised learning approach is proposed in the current study. The robust growing neural gas (RGNG) algorithm was applied to fMRI data and compared with the growing neural gas (GNG) algorithm, which has not been used for this purpose or any other medical application. Learning algorithms proposed in the current study are fed with real and free auditory fMRI datasets. Results: The fMRI result obtained by running RGNG was within the expected outcome and is similar to those found with the hypothesis method in detecting active areas within the expected auditory cortices. Comparison with existing method(s): The fMRI application of the presented RGNG approach is clearly superior to other approaches in terms of its insensitivity to different initializations and the presence of outliers, as well as its ability to determine the actual number of clusters successfully, as indicated by its performance measured by minimum description length (MDL) and receiver operating characteristic (ROC) analysis. Conclusions: The RGNG can detect the active zones in the brain, analyze brain function, and determine the optimal number of underlying clusters in fMRI datasets. This algorithm can define the positions of the center of an output cluster corresponding to the minimal MDL value.
Keywords: Clustering technique, Data mining, Growing neural gas (GNG), Robust growing neural gas (RGNG)
Identifying the relative importance of non-suicidal self-injury features in classifying suicidal ideation, plans, and behavior using exploratory data mining
Publication year: 2018
Individuals with a history of non-suicidal self-injury (NSSI) are at alarmingly high risk for suicidal ideation (SI), planning (SP), and attempts (SA). Given these findings, research has begun to evaluate the features of this multifaceted behavior that may be most important to assess when quantifying risk for SI, SP, and SA. However, no studies have examined the wide range of NSSI characteristics simultaneously when determining which NSSI features are most salient to suicide risk. The current study utilized three exploratory data mining techniques (elastic net regression, decision trees, random forests) to address these gaps in the literature. Undergraduates with a history of NSSI (N = 359) were administered measures assessing demographic variables, depression, and 58 NSSI characteristics (e.g., methods, frequency, functions, locations, scarring) as well as current SI, current SP, and SA history. Results suggested that depressive symptoms and the anti-suicide function of NSSI were the most important features for predicting SI and SP. The most important features in predicting SA were the anti-suicide function of NSSI, NSSI-related medical treatment, and NSSI scarring. Overall, results suggest that NSSI functions, scarring, and medical lethality may be more important to assess than commonly regarded NSSI severity indices when ascertaining suicide risk.
Keywords: Non-suicidal self-injury, Suicidal ideation, Suicide plan, Suicide attempt, Exploratory data mining, Elastic net regression, Decision trees
Data mining-assisted short-term wind speed forecasting by wavelet packet decomposition and Elman neural network
Publication year: 2018
On the basis of data-mining technology, a hybrid method of short-term wind speed forecasting is proposed that combines wavelet packet decomposition, density-based spatial clustering of applications with noise, and the Elman neural network (WPD-DBSCAN-ENN). First, the WPD is applied to decompose a raw wind speed series into several sub-series. The gradient boosted regression trees (GBRT) algorithm is then applied to determine the structure of the ENNs for each sub-wind series. Next, the training dataset is clustered by the DBSCAN to select the representative data for the ENNs. A key parameter in the DBSCAN is chosen through a new method. Finally, the wind speed forecast is conducted by the ENNs. Case studies are adopted to validate the accuracy of the hybrid methods. The results are compared with those obtained using the WPD-ENN hybrid method and a single ENN via four general error criteria. The WPD-DBSCAN-ENN hybrid method outperformed the other methods indicated above.
Keywords: Wind speed forecasting, Wavelet packet decomposition (WPD), Gradient boosted regression trees (GBRT), Density-based spatial clustering of applications with noise (DBSCAN), Elman neural network (ENN)
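The clustering step in the abstract above relies on DBSCAN, which can be sketched in a few lines of plain Python. This is a generic textbook version rather than the authors' implementation: the abstract does not say how the representative samples are picked from the clusters or how the key parameter is chosen, so `eps` and `min_pts` are simply set by hand here and the points are made up.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels

# Made-up 2-D samples: two dense groups and one isolated point.
samples = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.1),
           (5.0, 5.0), (5.1, 5.1), (4.9, 5.0),
           (9.0, 0.0)]
print(dbscan(samples, eps=0.5, min_pts=3))
```

Because DBSCAN marks isolated samples as noise instead of forcing them into a cluster, it offers a natural way to separate typical training samples from atypical ones before fitting a forecasting model.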