Predicting and explaining corruption across countries: A machine learning approach
(2020)
In the era of Big Data, Analytics, and Data Science, corruption is still ubiquitous and is perceived as one of the major challenges of modern societies. A large body of academic studies has attempted to identify and explain the potential causes and consequences of corruption, at varying levels of granularity, mostly through theoretical lenses by using correlations and regression-based statistical analyses. The present study approaches the phenomenon from the predictive analytics perspective by employing contemporary machine learning techniques to discover the most important corruption perception predictors based on enriched/enhanced nonlinear models with a high level of predictive accuracy. Specifically, within the multiclass classification modeling setting that is employed herein, the Random Forest (an ensemble-type machine learning algorithm) is found to be the most accurate prediction/classification model, followed by Support Vector Machines and Artificial Neural Networks. From the practical standpoint, the enhanced predictive power of machine learning algorithms coupled with a multi-source database revealed the most relevant corruption-related information, contributing to the related body of knowledge and generating actionable insights for administrators, scholars, citizens, and politicians. The variable importance results indicated that government integrity, property rights, judicial effectiveness, and education index are the most influential factors in determining a country's level of perceived corruption.
Keywords: Corruption perception | Machine learning | Predictive modeling | Random forest | Society policies and regulations |Government integrity | Social development
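The abstract's multiclass Random Forest with variable-importance ranking can be illustrated with a minimal sketch. This is not the authors' actual pipeline: the feature names echo the indicators named above, but the data are synthetic and the three corruption-perception classes are invented for illustration.

```python
# Sketch (not the paper's pipeline): multiclass Random Forest over
# country-level indicators, with a variable-importance ranking.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["government_integrity", "property_rights",
            "judicial_effectiveness", "education_index"]
X = rng.random((300, len(features)))
# Three hypothetical perception classes (low/medium/high), driven here
# by the first indicator so the importance ranking is non-trivial.
y = np.digitize(X[:, 0], [0.33, 0.66])

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(zip(features, model.feature_importances_),
                 key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With real data, the ranking step is where conclusions such as "government integrity is the most influential factor" would be read off.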
Cognitive computing, Big Data Analytics and data driven industrial marketing
(2020)
The integration of cognitive computing and big data analytics leads to a new paradigm that enables the application of the most sophisticated advances in information and communication technology (ICT) in business, including industry, business to business, and related decision-making processes. The same paradigm will lead to several breakthroughs in the subfield of industrial marketing: a field both promising and extremely challenging. This special issue makes a case that cognitive computing and big data are a source of a new competitive advantage that, if properly embraced, will further consolidate industrial marketing management's position at the core of the decision-making process of businesses operating locally and globally. In this vein, the value added of this special issue is twofold. On the one hand, this special issue communicates high quality research on big data analytics and data science as it is applied in industrial marketing management; on the other hand, it proposes a multidisciplinary approach to the study of the design, implementation and provision of sophisticated applications and systems necessary for data-driven industrial marketing decisions.
Big data and the electricity sector in African countries
(2020)
A number of “disruptive” data science and sensor technologies are creating new opportunities for addressing global challenges. The emergence of abundant computing power made possible the generation and storage of “big data,” enabled the explosion of sensors and networked devices, and powered major breakthroughs in the application of Artificial Intelligence and Machine Learning techniques. These developments have led to a new trend best described as the seamless interplay between the physical and the digital world—also known as the Fourth Industrial Revolution (Industry 4.0) (Deloitte, 2015). This has paved the way for potential radical transformation of whole sectors and industries across the globe. Perhaps somewhat hidden behind the hype surrounding these advancements are the opportunities they present for addressing challenges in emerging and frontier markets, and in sub-Saharan African countries in particular.
Setting up standards: A methodological proposal for pediatric Triage machine learning model construction based on clinical outcomes
(2019)
Triage is a critical process in hospital emergency departments (ED). Specifically, we consider how to achieve fast and accurate patient Triage in the ED of a pediatric hospital. The goal of this paper is to establish methodological best practices for the application of machine learning (ML) to Triage in pediatric ED, providing a comprehensive comparison of the performance of ML techniques over a large dataset. Our work is among the first attempts in this direction. Following very recent works in the literature, we use the clinical outcome of a case as its label for supervised ML model training, instead of the more uncertain labels provided by experts. The experimental dataset contains the records from 3 years of operation of the hospital ED. It consists of 189,718 patient visits to the hospital. The clinical outcome of 9271 cases (4.98%) was hospital admission; therefore, our dataset is highly class-imbalanced. Our reported performance comparison results focus on four ML models: Deep Learning (DL), Random Forest (RF), Naive Bayes (NB), and Support Vector Machines (SVM). Data preprocessing includes class imbalance correction and case re-labeling. We use different well known metrics to evaluate performance of ML models in three different experimental settings: (a) classification of each case into the standard five Triage urgency levels, (b) discrimination of high versus low case severity according to its clinical outcome, and (c) comparison of the number of patients assigned to each standard Triage urgency level against the rule-based Triage expert system currently in use at the hospital. RF achieved greater AUC, accuracy, PPV and specificity than the other models in the dichotomous classification experiments. On the implementation side, our study shows that ML predictive models trained according to clinical outcomes provide better Triage performance than the current rule-based expert system in operation at the hospital.
Keywords: Machine learning | Emergency department | Triage | Data science | Clinical decision support systems
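Setting (b) above, predicting the clinical-outcome label (admission) on a heavily imbalanced dataset, can be sketched minimally. The features and data below are synthetic placeholders, and `class_weight="balanced"` stands in for an unspecified imbalance correction; the paper's actual preprocessing may differ (e.g. re-sampling).

```python
# Minimal sketch of the dichotomous triage setting: predict hospital
# admission from triage-time features, with a rare positive class.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 4000
X = rng.random((n, 6))                    # e.g. vitals, age, pain score
p_admit = 0.02 + 0.4 * X[:, 0] * X[:, 1]  # admission is the rare outcome
y = (rng.random(n) < p_admit).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
# class_weight="balanced" is one simple class-imbalance correction.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC: {auc:.3f}")
```

Stratified splitting keeps the small admitted class represented in both train and test sets, which matters at a 4.98% positive rate.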
Big Data Analysis and Machine Learning in Intensive Care Units
(2019)
Intensive care is an ideal environment for the use of Big Data Analysis (BDA) and Machine Learning (ML), due to the huge amount of information processed and stored in electronic format in relation to such care. These tools can improve our clinical research capabilities and clinical decision making in the future. The present study reviews the foundations of BDA and ML, and explores possible applications in our field from a clinical viewpoint. We also suggest potential strategies to optimize these new technologies and describe a new kind of hybrid healthcare-data science professional with a linking role between clinicians and data.
Keywords: Big Data Analysis | Machine Learning | Artificial intelligence | Secondary electronic health record data analysis
Predictive model of cardiac arrest in smokers using machine learning technique based on Heart Rate Variability parameter
(2019)
Cardiac arrest is a severe heart anomaly that causes a large number of deaths every year. Smoking is a specific hazard factor for cardiovascular pathology, including coronary heart disease, but data on smoking and cardiac death have not previously been reviewed. In this paper, Heart Rate Variability (HRV) parameters are used to predict cardiac arrest in smokers with machine learning techniques. Machine learning is a computing method that learns automatically from experience and improves its performance to sharpen prognosis. This study compares the performance of logistic regression, decision tree, and random forest models for predicting cardiac arrest in smokers. The machine learning techniques were implemented on a dataset received from the data science research group MITU Skillogies, Pune, India. To determine whether a patient is at risk of cardiac arrest, three predictive models were developed with 19 HRV indices as input features and two output classes. These models were evaluated based on their accuracy, precision, sensitivity, specificity, F1 score, and Area Under the Curve (AUC). The logistic regression model achieved an accuracy of 88.50%, precision of 83.11%, sensitivity of 91.79%, specificity of 86.03%, F1 score of 0.87, and AUC of 0.88. The decision tree model achieved an accuracy of 92.59%, precision of 97.29%, sensitivity of 90.11%, specificity of 97.38%, F1 score of 0.93, and AUC of 0.94. The random forest model achieved an accuracy of 93.61%, precision of 94.59%, sensitivity of 92.11%, specificity of 95.03%, F1 score of 0.93, and AUC of 0.95. The random forest model achieved the best classification accuracy, followed by the decision tree, while logistic regression showed the lowest classification accuracy.
Keywords: Cardiac arrest | Heart Rate Variability | Machine learning | Accuracy | Precision | Area under the curve
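The three-model comparison can be sketched as below. The MITU Skillogies dataset is not reproduced here, so synthetic stand-in data with 19 "HRV" features and two classes is used; the resulting numbers will differ from the paper's.

```python
# Sketch of comparing logistic regression, decision tree, and random
# forest on the paper's six metrics, using synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 19))           # 19 HRV indices (placeholder)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred),
        "sensitivity": recall_score(y_te, pred),           # recall on class 1
        "specificity": recall_score(y_te, pred, pos_label=0),
        "F1": f1_score(y_te, pred),
        "AUC": roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]),
    }
for name, m in results.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Note that sensitivity is recall on the positive class and specificity is recall on the negative class, which is why both come from `recall_score` with different `pos_label` values.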
Weaving seams with data: Conceptualizing City APIs as elements of infrastructures
(2019)
This article addresses the role of application programming interfaces (APIs) for integrating data sources in the context of smart cities and communities. On top of the built infrastructures in cities, application programming interfaces make it possible to weave new kinds of seams from static and dynamic data sources into the urban fabric. Contributing to debates about ‘‘urban informatics’’ and the governance of urban information infrastructures, this article provides a technically informed and critically grounded approach to evaluating APIs as crucial but often overlooked elements within these infrastructures. The conceptualization of what we term City APIs is informed by three perspectives: In the first part, we review established criticisms of proprietary social media APIs and their crucial function in current web architectures. In the second part, we discuss how the design process of APIs defines conventions of data exchanges that also reflect negotiations between API producers and API consumers about affordances and mental models of the underlying computer systems involved. In the third part, we present recent urban data innovation initiatives, especially CitySDK and OrganiCity, to underline the centrality of API design and governance for new kinds of civic and commercial services developed within and for cities. By bridging the fields of criticism, design, and implementation, we argue that City APIs as elements of infrastructures reveal how urban renewal processes become crucial sites of socio-political contestation between data science, technological development, urban management, and civic participation.
Keywords: Application Programming Interface (API) | infrastructure | Internet of Things (IoT) | interface design | social urban data | smart city
‘‘You Social Scientists Love Mind Games’’: Experimenting in the ‘‘divide’’ between data science and critical algorithm studies
(2019)
In recent years, many qualitative sociologists, anthropologists, and social theorists have critiqued the use of algorithms and other automated processes involved in data science on both epistemological and political grounds. Yet, it has proven difficult to bring these important insights into the practice of data science itself. We suggest that part of this problem has to do with under-examined or unacknowledged assumptions about the relationship between the two fields—ideas about how data science and its critics can and should relate. Inspired by recent work in Science and Technology Studies on interventions, we attempted to stage an encounter in which practicing data scientists were asked to analyze a corpus of critical social science literature about their work, using tools of textual analysis such as co-word and topic modelling. The idea was to provoke discussion both about the content of these texts and the possible limits of such analyses. In this commentary, we reflect on the planning stages of the experiment and how responses to the exercise, from both data scientists and qualitative social scientists, revealed some of the tensions and interactions between the normative positions of the different fields. We argue for further studies which can help us understand what these interdisciplinary tensions turn on: studies that do not paper over the tensions but also do not take them as given.
Keywords: Algorithms | data science | intervention | reflexivity | interdisciplinarity | Science and Technology Studies
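The co-word analysis the data scientists were asked to perform can be shown in miniature: count how often pairs of terms co-occur within the same document; the most frequent pairs form the edges of a co-word map. The three-line corpus below is invented; the actual exercise used critical social science texts.

```python
# Toy co-word analysis: within-document term pair co-occurrence counts.
from collections import Counter
from itertools import combinations

corpus = [
    "algorithms encode power and politics",
    "data science practice meets critique",
    "algorithms and data science shape politics",
]
pair_counts = Counter()
for doc in corpus:
    terms = sorted(set(doc.split()))       # unique terms, sorted so pairs
    pair_counts.update(combinations(terms, 2))  # are counted consistently

for pair, n in pair_counts.most_common(3):
    print(pair, n)
```

A real analysis would add stop-word removal and lemmatization before counting, but the core operation is this pairwise tally.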
A review on big data based parallel and distributed approaches of pattern mining
(2019)
Pattern mining is a fundamental data mining technique for discovering interesting correlations in a data set. There are several variations of pattern mining, such as frequent itemset mining, sequence mining, and high utility itemset mining. High utility itemset mining is an emerging data science task that aims to extract knowledge based on a domain objective. The utility of a pattern reflects its effectiveness or benefit, which can be calculated based on user priority and domain-specific understanding. The sequential pattern mining (SPM) problem has been extensively studied and extended in various directions. Sequential pattern mining enumerates sequential patterns in a sequence data collection. In recent years, researchers have paid increasing attention to frequent pattern mining over uncertain transaction datasets. Mining itemsets in big data has also received extensive attention, based on the Apache Hadoop and Spark frameworks. This paper seeks to give a broad overview of the distinct approaches to pattern mining in the Big Data domain. Initially, we investigate the problems involved in pattern mining approaches and associated techniques such as Apache Hadoop, Apache Spark, and parallel and distributed processing. Then we examine major developments in parallel, distributed, and scalable pattern mining, analyze them from the big data perspective, and identify difficulties in designing such algorithms. In particular, we study four varieties of itemset mining, i.e., parallel frequent itemset mining, high utility itemset mining, sequential pattern mining, and frequent itemset mining in uncertain data. The paper concludes with a discussion of open issues and opportunities, and provides directions for further enhancement of existing approaches.
Keywords: Big data | FIM | HUIM | PSPM | Uncertain data mining
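The core operation that the surveyed frameworks parallelize, frequent itemset mining, fits in a few lines on a single machine: count the support of candidate itemsets and keep those meeting a minimum support threshold. The transactions below are toy data; a Hadoop/Spark implementation distributes exactly this counting step across a cluster.

```python
# Minimal single-machine sketch of frequent itemset mining (FIM):
# support counting for itemsets of size 1 and 2.
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
min_support = 2  # absolute support threshold

support = Counter()
for t in transactions:
    for size in (1, 2):
        support.update(combinations(sorted(t), size))

frequent = {itemset: n for itemset, n in support.items() if n >= min_support}
for itemset, n in sorted(frequent.items()):
    print(itemset, n)
```

High utility itemset mining replaces the raw count with a weighted utility sum, and uncertain-data variants replace it with an expected support, but the enumerate-and-threshold skeleton is the same.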
Data mining new energy materials from structure databases
(2019)
New energy materials that act as clean power sources and data science have developed rapidly over the past decades, and the advancement of the two research areas has significantly benefited the development of each other. Meanwhile, structural information on materials has been obtained and stored in various structure databases, such as the Cambridge Structure Database (CSD) and the Inorganic Crystal Structure Database (ICSD). Researchers have developed various structure-property relationships for energy materials, which can be applied to screen potentially suitable materials from structure databases; this has become an efficient route to explore and design new energy materials. In this article, we review recent progress in data mining studies of new energy materials based on structure databases such as CSD and ICSD, in the context of dye-sensitized solar cells and perovskite solar cells, and also cover other energy systems such as water splitting systems, lithium batteries, thermoelectric devices, and gas adsorbent materials. We focus on the structure descriptors that are fundamental to the data mining procedure employing structure-property relationships; these structural descriptors are complementary to quantum descriptors and are efficient in the materials design process. We believe that with the successful formulation of more advanced and case-by-case structure-property relationships for energy materials, many new energy materials could be efficiently identified at much lower cost and with shorter design periods via the data mining process.
Keywords: Solar cell | Energy materials | Structure database | Data mining | Materials design
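The screening route described above reduces to applying a structure-property rule as a filter over database records. The sketch below is purely illustrative: the descriptor names, thresholds, and entries are invented, whereas a real screen would query CSD/ICSD records and use relationships established in the literature.

```python
# Hypothetical database screen: keep candidate structures whose
# descriptors satisfy a simple structure-property rule.
candidates = [
    {"id": "mat-001", "band_gap_eV": 1.4, "tolerance_factor": 0.95},
    {"id": "mat-002", "band_gap_eV": 3.1, "tolerance_factor": 0.88},
    {"id": "mat-003", "band_gap_eV": 1.6, "tolerance_factor": 1.02},
]

def suitable_for_solar(m):
    # Illustrative rule: near-ideal absorber band gap and a
    # perovskite-like tolerance factor (thresholds are assumptions).
    return 1.1 <= m["band_gap_eV"] <= 1.8 and 0.9 <= m["tolerance_factor"] <= 1.0

hits = [m["id"] for m in candidates if suitable_for_solar(m)]
print(hits)
```

The value of the descriptor-based approach is that such filters run over an entire database far more cheaply than quantum calculations, leaving only the surviving candidates for detailed study.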