Big Data Analysis and Machine Learning in Intensive Care Units
Big Data Analysis and Machine Learning in the Intensive Care Unit (2019)
Intensive care is an ideal environment for the use of Big Data Analysis (BDA) and Machine Learning (ML), due to the huge amount of information processed and stored in electronic format in relation to such care. These tools can improve our clinical research capabilities and clinical decision making in the future. The present study reviews the foundations of BDA and ML, and explores possible applications in our field from a clinical viewpoint. We also suggest potential strategies to optimize these new technologies and describe a new kind of hybrid healthcare-data science professional with a linking role between clinicians and data.
KEYWORDS: Big Data Analysis | Machine Learning | Artificial intelligence | Secondary electronic health record data analysis
Predictive model of cardiac arrest in smokers using machine learning technique based on Heart Rate Variability parameter
Predictive model of cardiac arrest in smokers using a machine learning technique based on the Heart Rate Variability parameter (2019)
Cardiac arrest is a severe cardiac event that causes a vast number of deaths each year. Smoking is a specific risk factor for cardiovascular disease, including coronary heart disease, but data on smoking and cardiac death had not previously been reviewed. In this paper, Heart Rate Variability (HRV) parameters are used to predict cardiac arrest in smokers with machine learning techniques. Machine learning is a computational approach that learns automatically from experience and improves its performance to strengthen prognosis. This study compares the performance of logistic regression, decision tree, and random forest models in predicting cardiac arrest in smokers. The machine learning techniques are applied to a dataset obtained from the data science research group MITU Skillogies, Pune, India. To determine whether a patient is at risk of cardiac arrest, three predictive models were developed with 19 HRV indices as input features and two output classes. The models were evaluated on accuracy, precision, sensitivity, specificity, F1 score, and Area Under the Curve (AUC). The logistic regression model achieved an accuracy of 88.50%, precision of 83.11%, sensitivity of 91.79%, specificity of 86.03%, F1 score of 0.87, and AUC of 0.88. The decision tree model achieved an accuracy of 92.59%, precision of 97.29%, sensitivity of 90.11%, specificity of 97.38%, F1 score of 0.93, and AUC of 0.94. The random forest model achieved an accuracy of 93.61%, precision of 94.59%, sensitivity of 92.11%, specificity of 95.03%, F1 score of 0.93, and AUC of 0.95. The random forest model achieved the best classification accuracy, followed by the decision tree, while logistic regression showed the lowest classification accuracy.
Keywords: Cardiac arrest | Heart Rate Variability | Machine learning | Accuracy | Precision | Area under the curve
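The comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: synthetic data stands in for the MITU Skillogies HRV dataset (19 input features, two classes), and the metrics are computed directly from the confusion matrix.

```python
# Sketch of the three-model comparison above, on synthetic stand-in data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder for the HRV dataset: 19 features, binary outcome.
X, y = make_classification(n_samples=1000, n_features=19, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    results[name] = {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),  # recall / true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),
        "auc": roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]),
    }
for name, m in results.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
```

On real HRV data the ranking reported in the paper (random forest first, then decision tree, then logistic regression) would be read off the same table of metrics.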
Weaving seams with data: Conceptualizing City APIs as elements of infrastructures
Weaving with data: conceptualizing city application programming interfaces (APIs) as infrastructure elements (2019)
This article addresses the role of application programming interfaces (APIs) for integrating data sources in the context of smart cities and communities. On top of the built infrastructures in cities, application programming interfaces make it possible to weave new kinds of seams from static and dynamic data sources into the urban fabric. Contributing to debates about ‘‘urban informatics’’ and the governance of urban information infrastructures, this article provides a technically informed and critically grounded approach to evaluating APIs as crucial but often overlooked elements within these infrastructures. The conceptualization of what we term City APIs is informed by three perspectives: In the first part, we review established criticisms of proprietary social media APIs and their crucial function in current web architectures. In the second part, we discuss how the design process of APIs defines conventions of data exchange that also reflect negotiations between API producers and API consumers about affordances and mental models of the underlying computer systems involved. In the third part, we present recent urban data innovation initiatives, especially CitySDK and OrganiCity, to underline the centrality of API design and governance for new kinds of civic and commercial services developed within and for cities. By bridging the fields of criticism, design, and implementation, we argue that City APIs as elements of infrastructures reveal how urban renewal processes become crucial sites of socio-political contestation between data science, technological development, urban management, and civic participation.
Keywords: Application Programming Interface (API) | infrastructure | Internet of Things (IoT) | interface design | social urban data | smart city
‘‘You Social Scientists Love Mind Games’’: Experimenting in the ‘‘divide’’ between data science and critical algorithm studies
‘‘You Social Scientists Love Mind Games’’: experimenting in the ‘‘divide’’ between data science and critical algorithm studies (2019)
In recent years, many qualitative sociologists, anthropologists, and social theorists have critiqued the use of algorithms and other automated processes involved in data science on both epistemological and political grounds. Yet, it has proven difficult to bring these important insights into the practice of data science itself. We suggest that part of this problem has to do with under-examined or unacknowledged assumptions about the relationship between the two fields—ideas about how data science and its critics can and should relate. Inspired by recent work in Science and Technology Studies on interventions, we attempted to stage an encounter in which practicing data scientists were asked to analyze a corpus of critical social science literature about their work, using tools of textual analysis such as co-word and topic modelling. The idea was to provoke discussion both about the content of these texts and the possible limits of such analyses. In this commentary, we reflect on the planning stages of the experiment and how responses to the exercise, from both data scientists and qualitative social scientists, revealed some of the tensions and interactions between the normative positions of the different fields. We argue for further studies which can help us understand what these interdisciplinary tensions turn on—which do not paper over them but also do not take them as given.
Keywords: Algorithms | data science | intervention | reflexivity | interdisciplinarity | Science and Technology Studies
What’s in the box?! Towards explainable machine learning applied to non-residential building smart meter classification
What's in the box? Toward applying explainable machine learning to non-residential building smart meter classification (2019)
Feature engineering and data-driven classification models are at the forefront of the analysis of large temporal sensor data from the built environment. In previous efforts, temporal features were engineered from whole-building hourly electrical meter data from 507 non-residential buildings. These features fall into three general categories, statistics-, model-, and pattern-based, and can be used to identify various behaviors in the structure of whole-building electrical meter data. In this paper, a deeper investigation is made of exactly what types of behavior are most important in the context of two classification scenarios: the primary use of a building and the level of performance the building has when compared to its peers. The highly comparative time-series analysis (hctsa) toolkit is used to analyze the most important temporal features for the classification of various building performance attributes. In the first analysis, a comparison is made to distinguish the behavior between university dormitories (70 buildings) and laboratories (95 buildings) as an example of interpreting the classification of the primary-use type of a building. In the second analysis, a comparison of buildings with high (165 buildings) versus low (169 buildings) consumption is used to extract and understand the behavior that indicates the level of the energy performance of a building. These two case-study examples provide a foundation for further explainable machine learning techniques in both classification and prediction as applied to buildings. This effort is the first example of machine learning with an explicit focus on the interpretability of classification for smart meter data from non-residential buildings.
Keywords: Interpretable machine learning | Explainable machine learning | Building performance analysis | Performance classification | Energy efficiency | Smart meter | Temporal feature engineering | Load clustering | Data science | Customer segmentation | Time-series analysis
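A hedged sketch of the pipeline shape this abstract describes: engineer simple statistics-based temporal features from hourly meter readings, then classify primary use type. The load profiles, feature set, and labels below are synthetic assumptions, far simpler than the hctsa feature library used in the paper.

```python
# Toy statistics-based temporal features + classification of building type.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def weekly_profile(kind):
    """One week of synthetic hourly load: labs peak by day, dorms at night."""
    hours = np.arange(24 * 7)
    shift = 0.0 if kind == "lab" else np.pi  # invert the daily cycle for dorms
    return 50 + 10 * np.sin(2 * np.pi * (hours % 24) / 24 + shift) \
        + rng.normal(0, 2, hours.size)

def features(series):
    # statistics-based temporal features: mean, std, mean daily range, peak hour
    daily = series.reshape(-1, 24)
    return [series.mean(), series.std(),
            (daily.max(axis=1) - daily.min(axis=1)).mean(),
            float(np.bincount(daily.argmax(axis=1)).argmax())]

X = np.array([features(weekly_profile(k)) for k in ["lab"] * 40 + ["dorm"] * 40])
y = np.array([0] * 40 + [1] * 40)  # 0 = laboratory, 1 = dormitory
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print("cross-validated accuracy:", round(scores.mean(), 3))
```

The interpretability angle of the paper comes from asking which of such features drive the classification, rather than from the classifier itself.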
Setting up standards: A methodological proposal for pediatric Triage machine learning model construction based on clinical outcomes
Setting up standards: a methodological proposal for pediatric triage machine learning model construction based on clinical outcomes (2019)
Triage is a critical process in hospital emergency departments (ED). Specifically, we consider how to achieve fast and accurate patient triage in the ED of a pediatric hospital. The goal of this paper is to establish methodological best practices for the application of machine learning (ML) to triage in the pediatric ED, providing a comprehensive comparison of the performance of ML techniques over a large dataset. Our work is among the first attempts in this direction. Following very recent works in the literature, we use the clinical outcome of a case as its label for supervised ML model training, instead of the more uncertain labels provided by experts. The experimental dataset contains the records from 3 years of operation of the hospital ED, comprising 189,718 patient visits. The clinical outcome of 9271 cases (4.98%) was hospital admission, so our dataset is highly class imbalanced. Our reported performance comparison focuses on four ML models: Deep Learning (DL), Random Forest (RF), Naive Bayes (NB) and Support Vector Machines (SVM). Data preprocessing includes class imbalance correction and case re-labeling. We use different well-known metrics to evaluate the performance of the ML models in three experimental settings: (a) classification of each case into the standard five triage urgency levels, (b) discrimination of high versus low case severity according to its clinical outcome, and (c) comparison of the number of patients assigned to each standard triage urgency level against the rule-based triage expert system currently in use at the hospital. RF achieved greater AUC, accuracy, PPV and specificity than the other models in the dichotomous classification experiments. On the implementation side, our study shows that ML predictive models trained on clinical outcomes provide better triage performance than the rule-based expert system currently in operation at the hospital.
Keywords: Machine learning | Emergency department | Triage | Data science | Clinical decision support systems
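The dichotomous setting above (a roughly 5% admission rate handled with class imbalance correction before fitting a Random Forest) can be sketched as follows. The data are synthetic, and class weighting is used here as one representative imbalance correction; the paper's exact preprocessing may differ.

```python
# Imbalanced admission/discharge classification with class weighting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic visits with ~5% positive ("hospital admission") cases,
# mirroring the 4.98% admission rate reported in the abstract.
X, y = make_classification(n_samples=5000, n_features=12,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights the minority class instead of resampling.
rf = RandomForestClassifier(class_weight="balanced", random_state=0)
rf.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
print("held-out AUC:", round(auc, 3))
```

With such skewed classes, AUC, PPV, and specificity (the metrics the study reports) are far more informative than raw accuracy.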
Toward modeling and optimization of features selection in Big Data based social Internet of Things
Toward modeling and optimization of feature selection in Big Data-based social Internet of Things (2018)
The growing gap between users and Big Data analytics requires innovative tools to address the challenges posed by big data volume, variety, and velocity, since it becomes computationally inefficient to analyze and select features from such massive volumes of data. Moreover, advances in Big Data applications and data science pose additional challenges, where the selection of appropriate features and of a High-Performance Computing (HPC) solution has become a key issue and has attracted attention in recent years. There is therefore a need for a system that can efficiently select features and analyze a stream of Big Data within its requirements. This paper presents a system architecture that selects features using the Artificial Bee Colony (ABC) algorithm. A Kalman filter is used within the Hadoop ecosystem to remove noise, and traditional MapReduce combined with ABC enhances processing efficiency. A complete four-tier architecture is also proposed that efficiently aggregates the data, eliminates unnecessary data, and analyzes the data with the proposed Hadoop-based ABC algorithm. To check the efficiency of the proposed algorithms, we implemented the system using Hadoop and MapReduce with the ABC algorithm: ABC selects features, while MapReduce is supported by a parallel algorithm that efficiently processes huge volumes of data. The system is implemented with MapReduce on top of the Hadoop parallel nodes in near real time. The proposed system is compared with swarm approaches and evaluated in terms of efficiency, accuracy, and throughput on ten different datasets. The results show that the proposed system is more scalable and efficient in selecting features.
Keywords: SIoT | Big Data | ABC algorithm | Feature selection
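The core idea of ABC-based feature selection can be sketched on a single machine. This is a highly simplified illustration of the technique, not the paper's system: the Hadoop/MapReduce tiers and Kalman-filter stage are omitted, the dataset is synthetic, and only a greedy employed-bee step is shown.

```python
# Simplified ABC-style feature selection over binary feature masks.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

def fitness(mask):
    """Cross-validated accuracy of a classifier on the selected features."""
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

n_bees, n_iters = 10, 15
sources = rng.random((n_bees, X.shape[1])) < 0.5  # "food sources" = feature masks
scores = np.array([fitness(m) for m in sources])
for _ in range(n_iters):
    for i in range(n_bees):            # employed-bee phase: local perturbation
        trial = sources[i].copy()
        j = rng.integers(X.shape[1])
        trial[j] = ~trial[j]           # flip one feature in or out
        s = fitness(trial)
        if s > scores[i]:              # greedy selection keeps the better source
            sources[i], scores[i] = trial, s
best = sources[scores.argmax()]
print("selected features:", np.flatnonzero(best), "fitness:", round(scores.max(), 3))
```

In the paper's architecture this fitness evaluation is what gets distributed across Hadoop nodes via MapReduce.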
The Rise of Big Data in Oncology
The Rise of Big Data in Oncology (2018)
OBJECTIVES: To describe big data and data science in the context of oncology nursing care. DATA SOURCES: Peer-reviewed and lay publications. CONCLUSION: The rapid expansion of real-world evidence from sources such as the electronic health record, genomic sequencing, administrative claims and other data sources has outstripped the ability of clinicians and researchers to manually review and analyze it. To promote high-quality, high-value cancer care, big data platforms must be constructed from standardized data sources to support extraction of meaningful, comparable insights. IMPLICATIONS FOR NURSING PRACTICE: Nurses must advocate for the use of standardized vocabularies and common data elements that represent terms and concepts that are meaningful to patient care. The term “big data” first appeared in the literature in 1997 by researchers at NASA as they described the challenges to store the volume of information generated as a result of a new, data-intensive type of computational work.1 In 2008, a white paper entitled “Big-Data Computing: Creating revolutionary breakthroughs in commerce, science and society,” highlighted the rapid integration of data-driven strategies across settings ranging from Wal-Mart’s (then) 4 petabyte (4000 trillion bytes) data warehouse to the 15 petabytes of data projected to be generated annually by the Large Hadron Collider particle accelerator project,2 and is credited with widespread adoption of the term.3
KEY WORDS: electronic health records, meaningful use, artificial intelligence, neoplasms
Insights into Antidepressant Prescribing Using Open Health Data
Insights into antidepressant prescribing using open health data (2018)
The growth of big data is transforming many economic sectors, including the medical and healthcare sector. Despite this, research into the practical application of data analytics to the development of health policy is still limited. In this study we examine how data science and machine learning methods can be applied to a variety of open health datasets, including GP prescribing data, disease prevalence data and economic deprivation data. This paper discusses the context of mental health and antidepressant prescribing in Northern Ireland and highlights its importance as a public policy issue. A hypothesis is proposed, suggesting that the link between antidepressant usage and economic deprivation is mediated by depression prevalence. An analysis of various heterogeneous open datasets is used to test this hypothesis. A description of the methodology is provided, including the open health datasets under investigation and an explanation of the data processing pipeline. Correlations between key variables and several different clustering analyses are presented. Evidence is provided which suggests that the depression prevalence hypothesis is flawed. Clusters of GP practices based on prescribing behaviour and disease prevalence are described and key characteristics are identified and discussed. Possible policy implications are explored and opportunities for future research are identified.
Keywords: Health policy | Data analytics | Big data | Prescribing | Prevalence | Machine learning | Deprivation
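The clustering step described above (grouping GP practices by prescribing behaviour and disease prevalence) can be sketched with k-means. The practice-level figures below are synthetic stand-ins, not the Northern Ireland open datasets, and the variable choices are illustrative assumptions.

```python
# Clustering synthetic GP practices by prescribing, prevalence, deprivation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Columns: antidepressant items per 1,000 patients, depression prevalence (%),
# deprivation score (higher = more deprived). Two made-up practice profiles.
practices = np.vstack([
    rng.normal([120, 10, 40], [15, 1.5, 8], size=(30, 3)),  # higher-prescribing
    rng.normal([60, 6, 15], [10, 1.0, 5], size=(30, 3)),    # lower-prescribing
])
scaled = StandardScaler().fit_transform(practices)  # put variables on one scale
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
for c in range(2):
    centre = practices[km.labels_ == c].mean(axis=0)
    print(f"cluster {c}: prescribing {centre[0]:.0f}/1,000, "
          f"prevalence {centre[1]:.1f}%, deprivation {centre[2]:.0f}")
```

Comparing cluster profiles like these against deprivation is one way to probe the mediation hypothesis the paper tests.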
Power systems big data analytics: An assessment of paradigm shift barriers and prospects
Power systems big data analytics: an assessment of paradigm-shift barriers and prospects (2018)
Electric power systems are making drastic advances in the deployment of information and communication technologies: numerous new measurement devices are being installed in the form of advanced metering infrastructure, distributed energy resources (DER) monitoring systems, and high-frequency synchronized wide-area awareness systems, which are generating immense volumes of energy data at great speed. However, it remains an open question whether today's power system data, structures, and the tools being developed are truly aligned with the pillars of big data science. Further, several requirements and special features of power systems and energy big data call for customized methods and platforms. This paper provides an assessment of the distinguishing aspects of big data analytics developments in the domain of power systems. We present several taxonomies of the existing and missing elements in the structures and methods associated with big data analytics in power systems, and provide a holistic outline, classifications, and concise discussions of the technical approaches, research opportunities, and application areas for energy big data analytics.
Keywords: Energy | Big data analytics | Internet of energy | Smart grid