Big Data Everywhere
Year: 2020
Big Data and machine-learning approaches to analytics are an important new frontier in laboratory medicine. Direct-to-consumer (DTC) testing raises specific challenges in applying these new tools of data analytics. Because DTC data are not centralized by default, data repositories are needed to aggregate these values so that appropriate predictive models can be developed. The lack of a default linkage between DTC results and medical outcomes data also limits the ability to mine these data for predictive modeling of disease risk. Standardization and harmonization problems, which affect all of laboratory medicine, may be particularly difficult to correct in aggregated sets of DTC data.
Keywords: Big Data | Laboratory medicine | Machine learning | Direct-to-consumer testing | DTC | Harmonization
A grid-quadtree model selection method for support vector machines
Year: 2020
In this paper, a new model selection approach for Support Vector Machines (SVMs), called grid-quadtree (GQ), is proposed; it integrates the quadtree technique with grid search. The developed method is the first in the literature to apply the quadtree to SVM parameter optimization. The SVM is a machine-learning technique for pattern recognition whose performance depends on how its parameters are determined. Model selection for SVMs is therefore an important field of study that calls for expert and intelligent systems. Real classification data sets involve huge numbers of instances and features, and the larger the training set, the higher the cost of a recognition system. Grid search (GS) is the most popular and simplest method for selecting SVM parameters. However, it is time-consuming, which limits its application to large problems. With this in mind, the main idea of this research is to apply the quadtree technique to GS to make it faster. This may lower the computational cost of solving problems such as bio-identification, bank credit risk assessment, and cancer detection. Based on the asymptotic behavior of the SVM, it was observed that the quadtree can avoid evaluating the full GS search space. As a consequence, GQ evaluates fewer parameter combinations and solves the same problem much more efficiently. To assess GQ performance, ten benchmark classification data sets were used, and the results were compared with those of the traditional GS. The outcomes showed that GQ finds parameters as good as those found by GS while executing 78.8124% to 85.8415% fewer operations. This research indicates that adopting the quadtree substantially reduces the computational time of the original GS, making it much more efficient for high-dimensional and large data sets.
Keywords: Support vector machine | Parameter determination | Quadtree | Grid search
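The abstract does not spell out the GQ subdivision rules, so the following is only a minimal sketch of one plausible reading: instead of scoring every point of a (2^k + 1) x (2^k + 1) grid, evaluate the corners and centre of a cell and subdivide only cells whose sampled scores differ, skipping flat regions where SVM accuracy plateaus. The functions `quadtree_search` and `toy_accuracy` are hypothetical; in a real setting `score(i, j)` would be a cross-validated SVM accuracy for the i-th value of C and the j-th value of gamma, whereas here it is a toy surface.

```python
def quadtree_search(score, k, tol=1e-9):
    """Search a grid with 2**k + 1 points per axis, subdividing only non-flat cells.

    Returns (best_index, best_score, number_of_distinct_evaluations).
    """
    n = 2 ** k
    cache = {}  # memoizes evaluations so shared corners are scored only once

    def evaluate(i, j):
        if (i, j) not in cache:
            cache[(i, j)] = score(i, j)
        return cache[(i, j)]

    def visit(i0, j0, size):
        points = [(i0, j0), (i0 + size, j0), (i0, j0 + size), (i0 + size, j0 + size)]
        if size >= 2:
            points.append((i0 + size // 2, j0 + size // 2))  # cell centre
        values = [evaluate(i, j) for i, j in points]
        # Subdivide only if the sampled scores differ, i.e. the cell is not a plateau.
        if size >= 2 and max(values) - min(values) > tol:
            half = size // 2
            for di in (0, half):
                for dj in (0, half):
                    visit(i0 + di, j0 + dj, half)

    visit(0, 0, n)
    best = max(cache, key=cache.get)
    return best, cache[best], len(cache)


def toy_accuracy(i, j):
    # Hypothetical accuracy surface: a plateau of good parameters inside a flat region.
    return 0.9 if 6 <= i <= 12 and 4 <= j <= 10 else 0.6


best, acc, evals = quadtree_search(toy_accuracy, k=4)
print(best, acc, evals, "of", 17 * 17, "grid points")
```

On this toy surface the search reaches the 0.9 plateau without scoring every one of the 289 grid points; whether the savings match the reported 78.8124% to 85.8415% depends on the real accuracy surface and on the authors' actual rules.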
Forecasting client retention — A machine-learning approach
Year: 2020
In the age of big data, companies store practically all data on every client transaction. Machine-learning techniques are commonly used to turn this data into information that can drive business decisions. Our interest lies in using data on prepaid unitary services in a business-to-business setting to forecast client retention: whether a particular client is at risk of being lost before they actually stop being a client. Such a forecast gives the company an opportunity to reach out to these clients in an effort to retain them. We work with monthly records of client transactions: each client is represented as a series of purchases and consumptions. We vary (1) the length of the time period used to make the forecast, (2) the length of the period of inactivity after which a client is assumed to be lost, and (3) how far in advance the forecast is made. Our experiments show that current machine-learning techniques can adequately predict, well in advance, which clients will be lost. This knowledge lets a company focus marketing efforts on such clients as early as three months in advance.
Keywords: Client retention | Sales forecasting | Machine learning | Prepaid unitary services
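The three quantities the authors vary can be made concrete with a small labelling routine. This is a sketch under assumptions: each month is reduced to a single hypothetical activity count (purchases plus consumptions), and a client counts as "lost" when a run of `inactivity` consecutive months has zero activity. The names `is_lost` and `make_examples` are illustrative, not the paper's code.

```python
def is_lost(activity, start, inactivity):
    """True if there is no activity for `inactivity` consecutive months from `start`."""
    window = activity[start:start + inactivity]
    return len(window) == inactivity and not any(window)


def make_examples(activity, history, inactivity, horizon):
    """Build (features, label) pairs from one client's monthly activity series.

    history    -- months of past activity used as features, parameter (1)
    inactivity -- months without activity after which the client counts as lost, parameter (2)
    horizon    -- how many months ahead the forecast is made, parameter (3)
    """
    examples = []
    # Stop early enough that the full inactivity window fits after the horizon.
    for t in range(history, len(activity) - horizon - inactivity + 1):
        features = activity[t - history:t]
        label = is_lost(activity, t + horizon, inactivity)
        examples.append((features, label))
    return examples


# Hypothetical client whose activity stops after month 5.
activity = [3, 2, 4, 1, 2, 0, 0, 0, 0]
for features, label in make_examples(activity, history=3, inactivity=3, horizon=1):
    print(features, "lost" if label else "retained")
```

The resulting pairs could then be fed to any classifier; keeping the labelling separate makes it easy to sweep the three parameters as the study does.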
Machine-learning based error prediction approach for coarse-grid Computational Fluid Dynamics (CG-CFD)
Year: 2020
Computational Fluid Dynamics (CFD) is one of the modeling approaches essential to identifying the parameters that affect Containment Thermal Hydraulics (CTH) phenomena. While the CFD approach can capture the multidimensional behavior of CTH phenomena, its computational cost is high when modeling complex accident scenarios. To mitigate this expense, we propose using coarse-grid CFD (CG-CFD). Coarsening the computational grid increases the grid-induced error, which calls for a novel approach: a surrogate model that predicts the distribution of the CG-CFD local error and corrects the fluid-flow variables. Given sufficiently fine-mesh simulations, a surrogate model can be trained to predict the CG-CFD local errors as a function of the coarse-grid local flow features. The surrogate model is constructed using Machine Learning (ML) regression algorithms. Two widely used ML regression algorithms were tested: Artificial Neural Network (ANN) and Random Forest (RF). The proposed CG-CFD method is illustrated with a three-dimensional turbulent flow inside a lid-driven cavity. We studied a set of scenarios to investigate the ability of the surrogate model to interpolate within, and extrapolate outside, the training data range. The proposed method proved capable of correcting the coarse-grid results and of obtaining reasonable predictions for new cases (with a different Reynolds number, different grid sizes, or larger geometries). Based on the investigated cases, we found that this novel method makes the most of the available data and shows potential for good predictive capability.
Keywords: Coarse grid (mesh) | CFD | Machine learning | Discretization error | Big data | Artificial neural network | Random forest | Data-driven
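The correction step can be sketched as "corrected value = coarse value + predicted local error", where the error model is trained on pairs of coarse-grid local features and (fine minus coarse) differences. The paper uses ANN and RF regressors; to keep the sketch dependency-free, a k-nearest-neighbour regressor is swapped in here. The feature vectors and function names are hypothetical.

```python
import math


def knn_predict(train_x, train_y, x, k=3):
    """Average the targets of the k training points closest to x (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], x))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def correct_coarse(features, coarse_values, train_x, train_err, k=3):
    """Add the surrogate's predicted local error to each coarse-grid value.

    features      -- local flow features at each coarse cell (hypothetical choice)
    coarse_values -- the coarse-grid flow variable at those cells
    train_x       -- features from training cells
    train_err     -- fine-grid value minus coarse-grid value at those training cells
    """
    return [v + knn_predict(train_x, train_err, f, k) for f, v in zip(features, coarse_values)]


# Toy training set: the error grows linearly with a single local feature.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_err = [0.0, 0.1, 0.2, 0.3]
print(correct_coarse([(2.0,), (1.4,)], [5.0, 4.0], train_x, train_err, k=2))
```

A nearest-neighbour model like this only interpolates; the extrapolation to new Reynolds numbers or geometries reported in the abstract relies on the authors' own regressors and features.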
Double Q-PID algorithm for mobile robot control
Year: 2019
Many expert systems have been developed for self-adaptive PID controllers of mobile robots. However, the expert-system layers developed to tune these PID controllers are computationally demanding: they still require prior expert knowledge and highly efficient algorithmic and software execution for real-time applications. To address these problems, in this paper we propose an agent-based expert system, built on a reinforcement learning agent, for self-adapting multiple low-level PID controllers in mobile robots. To formulate the artificial expert agent, we develop an incremental, model-free version of the double Q-learning algorithm for fast on-line adaptation of multiple low-level PID controllers. Fast learning and high on-line adaptability of the artificial expert agent are achieved by a proposed incremental active-learning exploration-exploitation procedure for non-uniform state-space exploration, together with an experience replay mechanism for updating multiple value functions in the double Q-learning algorithm. A comprehensive comparative simulation study and experiments on a real mobile robot demonstrate the high performance of the proposed algorithm for real-time simultaneous tuning of multiple adaptive low-level PID controllers of mobile robots under real-world conditions.
Keywords: Reinforcement learning | Double Q-learning | Incremental learning | Double Q-PID | Mobile robots | Multi-platforms
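The core of the method is the double Q-learning update, which keeps two value tables and uses one to select the greedy next action and the other to evaluate it. Below is a minimal sketch of the standard tabular update, not the authors' incremental variant with active exploration and experience replay. In a PID-tuning setting the state might be a discretized tracking error and the action a gain increment; those encodings, and the function name `double_q_update`, are hypothetical. Terminal states are not handled.

```python
import random


def double_q_update(qa, qb, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, rng=random):
    """One tabular double Q-learning update.

    qa, qb -- dicts mapping (state, action) -> value; missing entries count as 0.
    rng    -- any object with a random() method returning a float in [0, 1).
    """
    # Flip a coin to decide which table is updated on this step.
    if rng.random() < 0.5:
        q_upd, q_other = qa, qb
    else:
        q_upd, q_other = qb, qa
    # Select the greedy next action with the table being updated ...
    a_star = max(actions, key=lambda b: q_upd.get((s_next, b), 0.0))
    # ... but evaluate it with the other table, which reduces overestimation bias.
    target = r + gamma * q_other.get((s_next, a_star), 0.0)
    old = q_upd.get((s, a), 0.0)
    q_upd[(s, a)] = old + alpha * (target - old)


# Hypothetical usage: states are error buckets, actions are gain adjustments.
qa, qb = {}, {}
double_q_update(qa, qb, "err_high", "raise_kp", -1.0, "err_mid", ["raise_kp", "lower_kp"])
print(qa, qb)
```

Decoupling selection from evaluation is what distinguishes double Q-learning from ordinary Q-learning, which uses the same table for both and tends to overestimate action values.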
Quantitative EEG reactivity and machine learning for prognostication in hypoxic-ischemic brain injury
Year: 2019
Objective: Electroencephalogram (EEG) reactivity is a robust predictor of neurological recovery after cardiac arrest; however, inter-rater agreement among electroencephalographers is limited. We sought to evaluate the performance of machine-learning methods that use EEG reactivity data to predict good long-term outcomes in hypoxic-ischemic brain injury. Methods: We retrospectively reviewed clinical and EEG data of comatose cardiac arrest subjects. EEG reactivity was tested within 72 h of cardiac arrest using sound and pain stimuli. A quantitative EEG (QEEG) reactivity method evaluated changes in QEEG features (EEG spectra, entropy, and frequency features) between the 10 s before and the 10 s after each stimulation. Good outcome was defined as a Cerebral Performance Category of 1–2 at six months. The performance of a random forest classifier was compared against that of a penalized general linear model (GLM) and expert electroencephalographer review. Results: Fifty subjects were included, and sixteen (32%) had a good outcome. Both QEEG reactivity methods performed comparably to expert EEG reactivity assessment in predicting good outcome (mean AUC 0.8 for random forest vs. 0.69 for GLM vs. 0.69 for expert review; differences not statistically significant). Conclusions: Machine-learning models using quantitative EEG reactivity data can predict long-term outcome after cardiac arrest. Significance: A quantitative approach to EEG reactivity assessment may support prognostication in cardiac arrest.
Keywords: EEG reactivity | Quantitative EEG | Hypoxic-ischemic encephalopathy | Cardiac arrest | Machine learning
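As an illustration of "changes in QEEG features before and after a stimulus", the sketch below computes one such feature, spectral entropy, in a pre-stimulus and a post-stimulus window and reports the difference. The exact features and windowing used in the study are not given here, so the functions are hypothetical. A naive DFT is used for self-containment; a real 10-s window (sampling rate times 10 samples) would call for an FFT.

```python
import cmath
import math


def power_spectrum(x):
    """Naive DFT power at bins 1..n//2 (the DC bin is dropped)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(1, n // 2 + 1)
    ]


def spectral_entropy(x):
    """Shannon entropy, in bits, of the normalised power spectrum."""
    p = power_spectrum(x)
    total = sum(p)
    probs = [v / total for v in p if v > 0]
    return -sum(q * math.log2(q) for q in probs if q > 0)


def reactivity(pre, post):
    """Change in spectral entropy from the pre-stimulus to the post-stimulus window."""
    return spectral_entropy(post) - spectral_entropy(pre)


# Toy windows: one sinusoid before the stimulus, two equal-power sinusoids after.
n = 64
pre = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
post = [pre[t] + math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
print(round(reactivity(pre, post), 6))
```

Spreading power over two frequency bins instead of one raises the entropy by about one bit, so a response that broadens the spectrum shows up as a positive reactivity value.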
A machine-learning-based prediction model of fistula formation after interstitial brachytherapy for locally advanced gynecological malignancies
Year: 2019
PURPOSE: External beam radiotherapy combined with interstitial brachytherapy is commonly used to treat patients with bulky, advanced gynecologic cancer. However, the high radiation dose needed to control the tumor may result in fistula formation. There is a clinical need to identify patients at high risk of fistula formation so that treatment can be managed to prevent this toxic side effect. This work aims to develop a fistula prediction model framework using machine learning based on patient, tumor, and treatment features. METHODS AND MATERIALS: This retrospective study included 35 patients treated at our institution with interstitial brachytherapy for various gynecological malignancies. Five patients developed rectovaginal fistulas, and two developed both rectovaginal and vesicovaginal fistulas. For each patient, 31 clinical features of multiple data types were collected to develop a fistula prediction framework. A nonlinear support vector machine was used to build the prediction model. Sequential backward feature selection and sequential floating backward feature selection were used to determine optimal feature sets. To overcome data imbalance, the synthetic minority oversampling technique was used to generate synthetic fistula cases for model training. RESULTS: Seven mixed-type features were selected by both the sequential backward selection and the sequential floating backward selection methods. Using these features, our prediction model achieved high predictive performance: an area under the curve of 0.904, 97.1% sensitivity, and 88.5% specificity. CONCLUSIONS: A machine-learning-based prediction model of fistula formation has been developed for patients with advanced gynecological malignancies treated with interstitial brachytherapy. This model may be clinically impactful pending refinement and validation in a larger series.
Keywords: Machine learning | Support vector machine | Interstitial brachytherapy | Gynecologic cancer
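The synthetic minority oversampling technique (SMOTE) mentioned above creates new minority-class samples by placing each one on the line segment between a minority sample and one of its k nearest minority neighbours. The following is a minimal sketch for numeric features only; the study's mixed-type features would need a variant such as SMOTE-NC, and the function name and parameters here are illustrative.

```python
import math
import random


def smote(minority, n_synthetic, k=3, rng=None):
    """Generate synthetic minority samples by SMOTE-style interpolation (numeric features)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base sample, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to partner, in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, partner)))
    return synthetic


# Toy minority class lying on the line y = x; synthetic points stay on that segment.
minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
for point in smote(minority, 5, k=2):
    print(point)
```

Oversampling should be applied only to the training folds; generating synthetic cases before splitting would leak information into the evaluation.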
Improving Workflow Efficiency for Mammography Using Machine Learning
Year: 2019
Objective: The aim of this study was to determine whether machine learning could reduce the number of mammograms the radiologist must read by using a machine-learning classifier to correctly identify normal mammograms and to select uncertain and abnormal examinations for radiological interpretation. Methods: Mammograms from a research data set of over 7,000 women who were recalled for assessment at six UK National Health Service Breast Screening Program centers were used. A convolutional neural network trained with multitask learning was used to extract imaging features from mammograms; the tasks were to mimic the radiologist's assessment, the patient's nonimaging features, and pathology outcomes. A deep neural network was then used to concatenate and fuse multiple mammogram views to predict both a diagnosis and a recommendation of whether additional radiological assessment was needed. Results: Ten-fold cross-validation was used on 2,000 randomly selected patients from the data set; the remainder of the data set was used for convolutional neural network training. While maintaining an acceptable negative predictive value of 0.99, the proposed model identified 34% (95% confidence interval, 25%-43%) and 91% (95% confidence interval, 88%-94%) of the negative mammograms for test sets with a cancer prevalence of 15% and 1%, respectively. Conclusion: Machine learning was leveraged to reduce the number of normal mammograms that radiologists need to read without degrading diagnostic accuracy.
Key Words: Breast cancer | deep learning | machine learning | mammography | radiology
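The trade-off reported in the Results (the share of negative mammograms removed from the reading list while the negative predictive value stays at or above 0.99) can be illustrated with a threshold search. This sketch assumes a hypothetical model output where a higher score means more suspicious and examinations scoring at or below the threshold are not sent to a radiologist; it is not the authors' procedure.

```python
def triage_threshold(scores, labels, min_npv=0.99):
    """Pick the threshold that removes the most negatives while keeping NPV >= min_npv.

    scores -- model outputs, higher = more suspicious
    labels -- 1 = cancer, 0 = normal
    Returns (threshold, fraction_of_negatives_removed); threshold is None if none qualifies.
    """
    n_neg = labels.count(0)
    best_thr, best_frac = None, 0.0
    for thr in sorted(set(scores)):
        triaged = [y for s, y in zip(scores, labels) if s <= thr]  # auto-classified normal
        tn, fn = triaged.count(0), triaged.count(1)
        npv = tn / (tn + fn)
        frac = tn / n_neg
        if npv >= min_npv and frac > best_frac:
            best_thr, best_frac = thr, frac
    return best_thr, best_frac


# Toy example: three normal and two cancer cases.
print(triage_threshold([0.1, 0.2, 0.3, 0.4, 0.9], [0, 0, 0, 1, 1]))
```

Because NPV depends on prevalence, the same model removes a much larger share of negatives at 1% prevalence than at 15%, consistent with the 91% versus 34% figures above.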
The Application of Machine Learning to Quality Improvement Through the Lens of the Radiology Value Network
Year: 2019
Recent advances in machine learning and artificial intelligence offer promising applications for radiology quality improvement initiatives within the radiology value network. Coordination challenges within the interlocking web of systems, events, and stakeholders in the radiology value network may be mitigated through standardization, automation, and a focus on workflow efficiency. In this article, the authors present applications of these strategies through use cases for quality improvement projects at different points in the radiology value network. In addition, the authors discuss opportunities for machine-learning applications in data aggregation, as opposed to the traditional applications in data extraction.
Key Words: Machine learning | artificial intelligence | radiology quality improvement | radiology value network | data aggregation
An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features
Year: 2019
The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for understanding the basic mechanisms of life, and could help identify intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs among selected eukaryotic species and then employed a systematic machine-learning approach, using protein sequence-derived features and feature selection procedures, to investigate essential gene predictions within and among species. We showed that essential gene orthologs make up only a small fraction of the total number of orthologs among the eukaryotic species studied. In addition, we demonstrated that machine-learning models trained with subsets of essentiality-related data performed better than random guessing of gene essentiality for a particular species. Consistent with our gene ortholog analysis, predicting essential genes across multiple (including distantly related) species is possible yet challenging, suggesting that most essential genes are unique to a species. The present work provides a foundation for expanding genome-wide essentiality investigations in eukaryotes using machine-learning approaches.
Keywords: Machine-learning | Essential genes | Essentiality prediction | Eukaryotes
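The abstract does not list the exact sequence-derived features, so as an example of the kind of feature involved, the sketch below computes amino acid composition: the relative frequency of each of the 20 standard amino acids in a protein sequence, giving a fixed-length vector a classifier can use regardless of sequence length. The function name is illustrative, and non-standard residue letters are simply ignored here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def aa_composition(sequence):
    """Return the 20-dimensional amino acid composition of a protein sequence.

    Letters outside the 20 standard amino acids (e.g. X, B, Z) are skipped.
    """
    residues = [c for c in sequence.upper() if c in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


# Hypothetical short peptide; each vector position corresponds to AMINO_ACIDS[i].
comp = aa_composition("MKTAYIAKQR")
print(dict(zip(AMINO_ACIDS, comp)))
```

Richer descriptors (dipeptide composition, physicochemical properties) extend the same idea; which ones the study retained after feature selection is not stated in the abstract.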