Data-driven switching modeling for MPC using Regression Trees and Random Forests
Year of publication: 2020
Model Predictive Control is a well-established technique for designing optimal control strategies, leveraging the capability of a mathematical model to predict a system’s behavior over a time horizon. However, building physics-based models for complex large-scale systems can be prohibitively costly and time-consuming. To overcome this problem, we propose a methodology that exploits machine learning techniques (i.e., Regression Trees and Random Forests) to build a Switching Affine dynamical model (deterministic and Markovian) of a large-scale system from historical data, and then apply Model Predictive Control. A comparison with an optimal benchmark and related techniques is provided on an energy management system to validate the performance of the proposed methodology.
Keywords: Regression Trees | Random Forests | Model predictive control | Switching systems | Markov Jump Systems
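The idea of a tree-based switching affine model can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional system in which a regression tree has split the state at two thresholds, and each leaf carries its own affine model x_next = a*x + b*u + c. The thresholds, coefficients, and function names are made up for illustration and are not taken from the paper.

```python
from bisect import bisect_right

# Hypothetical tree splits on the current state x: two thresholds give
# three leaves, and each leaf acts as one mode of the switching model.
THRESHOLDS = [18.0, 22.0]

# One affine model per leaf: x_next = a * x + b * u + c (made-up numbers).
LEAF_MODELS = [
    (0.90, 0.5, 2.0),  # mode 0: x < 18
    (0.95, 0.4, 1.0),  # mode 1: 18 <= x < 22
    (0.98, 0.3, 0.5),  # mode 2: x >= 22
]


def active_mode(x):
    """Return the index of the leaf (mode) that contains state x."""
    return bisect_right(THRESHOLDS, x)


def step(x, u):
    """Advance one time step using the affine model of the active mode."""
    a, b, c = LEAF_MODELS[active_mode(x)]
    return a * x + b * u + c


def rollout(x0, inputs):
    """Predict the state trajectory over a horizon for a given input sequence."""
    states = [x0]
    for u in inputs:
        states.append(step(states[-1], u))
    return states
```

A predictive controller would evaluate `rollout` for candidate input sequences and pick the one with the lowest cost; that optimisation step is omitted here.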
Using gait analysis’ parameters to classify Parkinsonism: A data mining approach
Year of publication: 2019
Introduction: Parkinson’s disease (PD) is the second most common neurodegenerative disorder in the world, while Progressive Supranuclear Palsy (PSP) is an atypical Parkinsonism resembling PD, especially in the early stage. Given that gait dysfunctions represent a major motor symptom of both pathologies, gait analysis can provide clinicians with subclinical information reflecting subtle differences between these diseases. In this scenario, data mining can be exploited to differentiate PD patients at different stages of the disease course from PSP patients, using all the variables acquired through gait analysis. Methods: A cohort of 46 subjects, divided into three groups (PD patients at different stages and PSP patients), underwent gait analysis, and spatial and temporal parameters were analysed. The Synthetic Minority Over-sampling Technique was used to balance the imbalanced dataset, and cross-validation was applied to provide different training and testing sets. Then, Random Forests and Gradient Boosted Trees were implemented. Results: Accuracy, error, precision, recall, specificity and sensitivity were computed for each group and for both algorithms, using 16 features. Random Forests obtained the highest accuracy (86.4%), and specificity and sensitivity were also particularly high, exceeding 90% for the PSP group. Conclusion: The novelty of the study is the use of a data mining approach on the spatial and temporal parameters of gait analysis to classify patients affected by typical (PD) and atypical Parkinsonism (PSP) based on gait patterns. This application would help clinicians distinguish PSP from PD at an early stage, when the differential diagnosis is particularly challenging.
Keywords: Parkinson’s disease | Progressive supranuclear palsy | Gait analysis | Data mining | Random forests | Gradient boosted trees
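The oversampling step mentioned in the Methods can be sketched as follows. This is a simplified, standard-library illustration of the core SMOTE idea (interpolating between a minority-class sample and one of its nearest minority neighbours), not the implementation used in the study. The function names, the value of k, and the toy data are assumptions.

```python
import random


def smote_sample(minority, k, rng):
    """Create one synthetic point from a list of minority-class feature vectors."""
    base = rng.choice(minority)
    # Rank the other minority points by squared Euclidean distance to the base.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    # The new point lies on the segment between the base and its neighbour.
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


def oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples (seeded for repeatability)."""
    rng = random.Random(seed)
    return [smote_sample(minority, k, rng) for _ in range(n_new)]
```

To avoid leaking synthetic points into evaluation, oversampling is normally applied to the training folds only, inside the cross-validation loop.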
Direct marketing campaigns in retail banking with the use of deep learning and random forests
Year of publication: 2019
Credit products are a crucial part of the business of banks and other financial institutions. A novel approach to predicting willingness to take a personal loan, based on a time-series representation of customer data, is presented. The proposed testing procedure, based on a moving window, allows detection of complex, sequential, time-based dependencies between particular transactions. Moreover, this approach reduces noise by eliminating irrelevant dependencies that would occur if the time dimension were not analysed. A system for identifying customers interested in credit products, based on classification with random forests and deep neural networks, is proposed. The promising results of empirical studies show that the system is able to extract significant patterns from customers’ historical transfer and transactional data and to predict credit purchase likelihood. Our approach, including the testing method, is not limited to the banking sector and can be easily transferred and implemented as a general-purpose direct marketing campaign system.
Keywords: Consumer credit | Retail banking | Direct marketing | Marketing campaigns | Database marketing | Random forest | Deep learning | Deep belief networks | Data mining | Time series | Feature selection | Boruta algorithm
Advancing Ensemble Learning Performance through data transformation and classifiers fusion in granular computing context
Year of publication: 2019
Classification is a special type of machine learning task, which is essentially achieved by training a classifier that can be used to classify new instances. In order to train a high-performance classifier, it is crucial to extract representative features from raw data, such as text and images. In reality, instances can be highly diverse even if they belong to the same class, which indicates that different instances of the same class can exhibit very different characteristics. For example, in a facial expression recognition task, some instances may be better described by Histogram of Oriented Gradients features, while others may be better represented by Local Binary Patterns features. From this point of view, it is necessary to adopt ensemble learning to train different classifiers on different feature sets and to fuse these classifiers towards more accurate classification of each instance. On the other hand, different algorithms are likely to show different suitability for training classifiers on different feature sets. This again shows the necessity of adopting ensemble learning to advance classification performance. Furthermore, a multi-class classification task becomes increasingly complex as the number of classes increases, i.e. it becomes more difficult to discriminate between classes. In this paper, we propose an ensemble learning framework that involves transforming a multi-class classification task into a number of binary classification tasks and fusing classifiers trained on different feature sets by using different learning algorithms. We report experimental studies on a UCI data set on Sonar and the CK+ data set on facial expression recognition. The results show that our proposed ensemble learning approach leads to considerable advances in classification performance, in comparison with popular learning approaches including decision tree ensembles and deep neural networks.
In practice, the proposed approach can be used effectively to build an ensemble of ensembles acting as a group of expert systems, which show the capability to achieve more stable performance of pattern recognition, in comparison with building a single classifier that acts as a single expert system.
Keywords: Machine learning | Ensemble learning | Classification | Bagging | Boosting | Random forests
Big data analysis for brain tumor detection: Deep convolutional neural networks
Year of publication: 2018
Brain tumor detection is an active area of research in brain image processing. In this work, a methodology is proposed to segment and classify brain tumors using magnetic resonance images (MRI). A Deep Neural Network (DNN) based architecture is employed for tumor segmentation. In the proposed model, seven layers are used for classification, consisting of three convolutional layers, three ReLU layers and a softmax layer. First, the input MR image is divided into multiple patches, and then the center pixel value of each patch is supplied to the DNN. The DNN assigns labels according to the center pixels and performs segmentation. Extensive experiments are performed using eight large-scale benchmark datasets, including BRATS 2012 (image dataset and synthetic dataset), 2013 (image dataset and synthetic dataset), 2014, 2015 and ISLES (Ischemic stroke lesion segmentation) 2015 and 2017. The results are validated on accuracy (ACC), sensitivity (SE), specificity (SP), Dice Similarity Coefficient (DSC), precision, false positive rate (FPR), true positive rate (TPR) and Jaccard similarity index (JSI).
Keywords: Random Forests | Segmentation | Patches | Filters | Tissues
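Several of the validation measures listed in the abstract can be computed directly from a predicted and a reference binary mask. The sketch below uses the standard definitions of sensitivity, specificity, Dice and Jaccard. The function names and the flattened-list mask representation are assumptions for illustration, and it expects both classes to be present so that no denominator is zero.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two equally long binary masks (flattened)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    return tp, fp, fn, tn


def segmentation_scores(pred, truth):
    """Return common overlap and detection scores for a binary segmentation."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "sensitivity": tp / (tp + fn),       # true positive rate
        "specificity": tn / (tn + fp),       # true negative rate
        "dice": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
    }
```

Dice and Jaccard are monotonically related (Dice = 2J / (1 + J)), so they rank segmentations identically and differ only in scale.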
Assessing irrigated agriculture's surface water and groundwater consumption by combining satellite remote sensing and hydrologic modelling
Year of publication: 2016
Globally, irrigation accounts for more than two thirds of freshwater demand. Recent regional and global assessments indicate that groundwater extraction (GWE) for irrigation has increased more rapidly than surface water extraction (SWE), potentially resulting in groundwater depletion. Irrigated agriculture in semi-arid and arid regions usually draws on a combination of stored surface water and groundwater. This paper assesses the usefulness of remotely sensed (RS) information on both irrigation dynamics and rates of actual evapotranspiration, which are both inputs to a river-reach water balance model, in order to quantify irrigation water use and water provenance (either surface water or groundwater). The assessment is implemented for the water-years 2004/05–2010/11 in five reaches of the Murray–Darling Basin (Australia), a heavily regulated basin with large irrigated areas and periodic droughts and floods. Irrigated area and water use are identified each water-year (from July to June) through a Random Forest model which uses RS vegetation phenology and actual evapotranspiration as predicting variables. Both irrigated areas and actual evapotranspiration from irrigated areas were compared against published estimates of irrigated areas and total water extraction (SWE + GWE). The river-reach model determines the irrigated area that can be serviced with stored surface water (SWE), and the remaining area (as determined by the Random Forest model) is assumed to be supplemented by groundwater (GWE). Model results were evaluated against observed SWE and GWE. The modelled SWE generally captures the observed interannual patterns and to some extent the magnitudes, with Pearson's correlation coefficients > 0.8 and normalised root-mean-square error < 30%. In terms of magnitude, the results were as accurate as or better than those of more traditional (i.e., using areas that fluctuate based on water resource availability and prescribed crop factors) irrigation modelling. The RS irrigated areas and actual evapotranspiration can be used to: (i) understand irrigation dynamics, (ii) constrain irrigation models in data-scarce regions, and (iii) pinpoint areas that require better ground-based monitoring. Crown Copyright © 2015 Published by Elsevier B.V. All rights reserved.
Corresponding author: J.L. Peña-Arancibia, CSIRO Land and Water, CS Christian laboratory, CSIRO Black Mountain site, GPO Box 1666, Canberra ACT 2601, Australia. E-mail address: email@example.com. DOI: http://dx.doi.org/10.1016/j.scitotenv.2015.10.086 (ISSN 0048-9697)
Keywords: Image classification | Random forest | Mapping | Hydrology | Diversions | Evapotranspiration | Murray–Darling basin
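The two agreement measures quoted in the abstract, Pearson's correlation coefficient and the normalised root-mean-square error, can be sketched as below. The abstract does not say how the RMSE is normalised, so dividing by the mean of the observations is an assumption here (normalising by the observed range is another common choice). The function names and sample values are illustrative only.

```python
from math import sqrt


def pearson_r(observed, modelled):
    """Pearson correlation between two equally long, non-constant sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_m = sum(modelled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modelled))
    dev_o = sqrt(sum((o - mean_o) ** 2 for o in observed))
    dev_m = sqrt(sum((m - mean_m) ** 2 for m in modelled))
    return cov / (dev_o * dev_m)


def nrmse_percent(observed, modelled):
    """RMSE as a percentage of the observed mean (assumed normalisation)."""
    n = len(observed)
    rmse = sqrt(sum((m - o) ** 2 for o, m in zip(observed, modelled)) / n)
    return 100.0 * rmse / (sum(observed) / n)
```

With hypothetical annual extractions `[100, 200, 300]` observed and `[110, 190, 310]` modelled, the correlation is above 0.99 and the normalised error is 5%, which would fall inside the ranges the abstract reports.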
Three-way recommender systems based on random forests
Year of publication: 2016 - Pages in English PDF file: 12 - Pages in Persian DOC file: 47
Recommender systems attempt to guide users in decisions about choosing items, based on inferences about their personal opinions. Most existing systems implicitly assume that the underlying classification is binary, that is, a candidate item is either recommended or not recommended. Here we propose an alternative framework that integrates three-way decisions and random forests to build recommender systems. First, we consider both the misclassification cost and the teacher cost. The former is paid for wrong recommender behavior, while the latter is paid for actively consulting the user about his or her preferences. With these costs, a three-way decision model is built, and rational settings for the positive and negative threshold values, α* and β*, are computed. Next, we build a random forest to compute the probability P that a user will like an item. Finally, α*, β* and P are used to determine the recommender's behavior. The performance of the recommender is evaluated on the basis of average cost. Experimental results on the well-known MovieLens data set show that the (α*, β*) pair determined by the three-way decision model is optimal not only on the training set but also on the testing set.
Keywords: Cost sensitivity | Random forests | Recommender systems | Three-way decision
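The role of the two thresholds can be illustrated with a minimal sketch. It assumes a simplified cost model that is not taken from the paper: recommending an item the user dislikes costs `cost_false_rec`, withholding an item the user would like costs `cost_missed_rec`, and consulting the user always costs `cost_ask`. Comparing expected costs under these assumptions gives α = 1 - cost_ask / cost_false_rec and β = cost_ask / cost_missed_rec. All names and numbers are hypothetical.

```python
def thresholds(cost_false_rec, cost_missed_rec, cost_ask):
    """Derive (alpha, beta) by comparing expected costs of the three actions."""
    # Recommending beats asking when (1 - P) * cost_false_rec <= cost_ask.
    alpha = 1.0 - cost_ask / cost_false_rec
    # Not recommending beats asking when P * cost_missed_rec <= cost_ask.
    beta = cost_ask / cost_missed_rec
    if not beta < alpha:
        raise ValueError("asking is too costly: no boundary region exists")
    return alpha, beta


def decide(p_like, alpha, beta):
    """Map the estimated probability that the user likes the item to an action."""
    if p_like >= alpha:
        return "recommend"
    if p_like <= beta:
        return "do not recommend"
    return "ask user"
```

With costs 10, 5 and 2 the thresholds are roughly 0.8 and 0.4, so a probability of 0.9 leads to a recommendation, 0.1 to no recommendation, and 0.5 to consulting the user. In the framework described above, the probability would come from the random forest.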
Voice data mining for laryngeal pathology assessment
Year of publication: 2015
The aim of this study was to evaluate the usefulness of different methods of speech signal analysis in the detection of voice pathologies. First, an initial vector was created consisting of 28 parameters extracted from the time, frequency and cepstral domains, describing the human voice signal based on the analysis of sustained vowels /a/, /i/ and /u/, all at high, low and normal pitch. Afterwards we used a linear feature extraction technique (principal component analysis), which made it possible to reduce the number of parameters and choose the most effective acoustic features describing the speech signal. We also performed a non-linear data transformation, calculated using kernel principal components. The results of the presented methods for normal and pathological cases are presented and discussed in this paper. The initial and extracted feature vectors were classified using k-means clustering and the random forest classifier. We found that reasonably good classification accuracies could be achieved by selecting appropriate features. We obtained accuracies of up to 100% for classification of healthy versus pathological voices using random forest classification for female and male recordings. These results may assist in the feature development of automated detection systems for diagnosis of patients with symptoms of pathological voice. © 2015 Elsevier Ltd. All rights reserved.
Keywords: Voice pathology detection | Feature selection | PCA | kPCA | Random forest | Acoustic analysis
Predicting overall survivability in comorbidity of cancers: A data mining approach
Year of publication: 2015
Cancer and other chronic diseases have constituted (and will do so at an increasing pace) a significant portion of healthcare costs in the United States in recent years. Although prior research has shown that diagnostic and treatment recommendations might be altered based on the severity of comorbidities, chronic diseases are still being investigated in isolation from one another in most cases. To illustrate the significance of concurrent chronic diseases in the course of treatment, this study uses SEER's cancer data to create two comorbid data sets: one for breast and female genital cancers and another for prostate and urinary cancers. Several popular machine learning techniques are then applied to the resultant data sets to build predictive models. Comparison of the results shows that having more information about patients' comorbid conditions can improve the models' predictive power, which in turn can help practitioners make better diagnostic and treatment decisions. Therefore, proper identification, recording, and use of patients' comorbidity status can potentially lower treatment costs and ease healthcare-related economic challenges. © 2015 Elsevier B.V. All rights reserved.
Keywords: Medical decision making | Comorbidity | Concurrent diseases | Concomitant diseases | Predictive modeling | Random forest