Big data analytics in health sector: Theoretical framework, techniques and prospects
Big data analytics in the healthcare sector: theoretical framework, techniques, and outlook (2020)
Clinicians, healthcare providers and suppliers, policy makers, and patients are experiencing exciting opportunities in light of new information derived from the analysis of big data sets, a capability that has emerged in recent decades. Given the rapid increase in publications in the healthcare industry, we conducted a structured review of healthcare big data analytics. With reference to resource-based view theory, we focus on how big data resources are utilised to create organizational value and capabilities. Through content analysis of the selected publications we discuss the classification of big data types related to healthcare, the associated analysis techniques, the value created for stakeholders, the platforms and tools for handling big health data, and future directions in the field. We present a number of pragmatic examples to show how advances in healthcare were made possible. We believe that the findings of this review are stimulating and provide valuable information to practitioners, policy makers, and researchers, while presenting them with certain paths for future research.
Keywords: Big data analytics | Health-Medicine | Decision-making | Machine learning | Operations research (OR) techniques
Big Data Everywhere
Big data everywhere (2020)
Big Data and machine-learning approaches to analytics are an important new frontier in laboratory medicine. Direct-to-consumer (DTC) testing raises specific challenges in applying these new tools of data analytics. Because DTC data are not centralized by default, data repositories are needed to aggregate these values before appropriate predictive models can be developed. The lack of a default linkage between DTC results and medical outcomes data also limits the ability to mine these data for predictive modeling of disease risk. Standardization and harmonization, which are significant issues across all of laboratory medicine, may be particularly difficult to correct in aggregated sets of DTC data.
Keywords: Big Data | Laboratory medicine | Machine learning | Direct-to-consumer testing | DTC | Harmonization
A full-disk image standardization of the chromosphere observation at Huairou Solar Observing Station
Full-disk image standardization of chromosphere observations at Huairou Solar Observing Station (2020)
Observations of local features in the solar chromosphere began in 1992 at Huairou Solar Observing Station, while full-disk chromosphere observations have been carried out since 2000. To make the full-disk chromosphere observations easier for researchers to use, algorithms have been developed to standardize the full-disk images. The algorithms cover determination of the image center, size standardization, geometric correction, and intensity normalization. The solar limb of each image is determined from a histogram analysis of its intensity distribution. The center and radius are then calculated and the image is corrected for geometric distortions. Images are re-scaled to a fixed radius of 500 pixels and centered within the 1024 × 1024 frame. Finally, large-scale variations in intensity, such as limb-darkening, are removed using a median filter. This paper provides a detailed description of these algorithms and a summary of the properties of these chromospheric full-disk observations for use in further scientific investigations.
Keywords: Chromosphere | Data standardization | Physical parameters | Big data
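The size-standardization step in the abstract above (re-scaling to a 500-pixel radius, centered in a 1024 × 1024 frame) can be illustrated with a small coordinate mapping. This is only a sketch under stated assumptions, not the authors' algorithm: the function names `source_coords` and `resample_nearest` are hypothetical, the measured center and radius are taken as given inputs, nearest-neighbor sampling is an arbitrary choice, and the geometric-distortion and median-filter steps are not covered.

```python
from typing import List, Tuple

TARGET_RADIUS = 500.0  # pixels, value stated in the abstract
FRAME_SIZE = 1024      # pixels per side, value stated in the abstract


def source_coords(u: float, v: float, cx: float, cy: float, radius: float,
                  target_radius: float = TARGET_RADIUS,
                  frame_size: int = FRAME_SIZE) -> Tuple[float, float]:
    """Map an output pixel (u, v) back to coordinates in the raw image.

    (cx, cy) and radius are the measured disk center and radius in the raw
    image. Assumption: pixel centers sit at integer coordinates, so the
    frame center is (frame_size - 1) / 2.
    """
    scale = radius / target_radius          # raw pixels per output pixel
    frame_center = (frame_size - 1) / 2.0
    x = cx + (u - frame_center) * scale
    y = cy + (v - frame_center) * scale
    return x, y


def resample_nearest(image: List[List[float]], cx: float, cy: float,
                     radius: float, target_radius: float = TARGET_RADIUS,
                     frame_size: int = FRAME_SIZE,
                     fill: float = 0.0) -> List[List[float]]:
    """Build the standardized frame with nearest-neighbor sampling."""
    height, width = len(image), len(image[0])
    out = []
    for v in range(frame_size):
        row = []
        for u in range(frame_size):
            x, y = source_coords(u, v, cx, cy, radius, target_radius, frame_size)
            xi, yi = int(round(x)), int(round(y))
            inside = 0 <= xi < width and 0 <= yi < height
            row.append(image[yi][xi] if inside else fill)
        out.append(row)
    return out
```

A production pipeline would more likely use an interpolating resampler from an array library; plain lists are used here only to keep the sketch dependency-free.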
Guest satisfaction & dissatisfaction in luxury hotels: An application of big data
Guest satisfaction and dissatisfaction in luxury hotels: using big data (2020)
To understand the pivotal attributes of luxury hotel service in Malaysia, this study analyses big data in the form of online reviews available on TripAdvisor. The content analysis, performed using word frequency analysis, revealed that the main themes of luxury hotel service quality include hotel-related attributes, room-related attributes, staff-related attributes, travel-related attributes, and possible outcomes. The critical incident technique was also applied to examine the antecedents and outcomes of hotel guests’ satisfaction and dissatisfaction. In this study, quality of rooms and interaction with employees were identified as major drivers of customers’ word of mouth and revisit intentions. This study contributes an empirical analysis of particular features of textual context and a discussion of the concept of luxury service in developing countries, which has been largely neglected so far.
Keywords: Luxury hotel service | Online review | Service quality | Satisfaction | Dissatisfaction | Post-purchase behavior
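The word frequency analysis mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, not the study's actual procedure: the function name `word_frequencies`, the tiny stopword list, the tokenization rule, and the three sample reviews are all made up for the example and do not come from the TripAdvisor data.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list for the example.
STOPWORDS = {"the", "a", "an", "and", "was", "were", "is", "to", "of",
             "in", "very", "but", "it"}


def word_frequencies(reviews, stopwords=STOPWORDS):
    """Count how often each non-stopword token occurs across all reviews."""
    counts = Counter()
    for review in reviews:
        # Lowercase and keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", review.lower())
        counts.update(tok for tok in tokens if tok not in stopwords)
    return counts


# Invented sample reviews, used only to exercise the function.
sample_reviews = [
    "The room was spacious and the staff were friendly.",
    "Friendly staff, but the room was noisy.",
    "Great location and very clean room.",
]

counts = word_frequencies(sample_reviews)
top_terms = counts.most_common(3)  # most frequent terms first
```

Frequent terms such as "room" and "staff" would then be grouped by hand into themes like the room-related and staff-related attributes the abstract lists.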
Big data and stream processing platforms for Industry 4.0 requirements mapping for a predictive maintenance use case
Big data and stream processing frameworks for Industry 4.0 requirements mapping for a predictive maintenance use case (2020)
Industry 4.0 is considered to be the fourth industrial revolution, introducing a new paradigm of digital, autonomous, and decentralized control for manufacturing systems. Two key objectives for Industry 4.0 applications are to guarantee maximum uptime throughout the production chain and to increase productivity while reducing production cost. As the data-driven economy evolves, enterprises have started to utilize big data techniques to achieve these objectives. Big data and IoT technologies are playing a pivotal role in building data-oriented applications such as predictive maintenance. In this paper, we use a systematic methodology to review the strengths and weaknesses of existing open-source technologies for big data and stream processing to establish their usage for Industry 4.0 use cases. We identified a set of requirements for the two selected use cases of predictive maintenance in the areas of rail transportation and wind energy. We conducted a breadth-first mapping of predictive maintenance use-case requirements to the capabilities of big data streaming technologies, focusing on open-source tools. Based on our research, we propose some optimal combinations of open-source big data technologies for our selected use cases.
Keywords: Industry 4.0 | Big Data | Stream processing | Predictive maintenance | Railway | Wind turbines
A multi-scale method for forecasting oil price with multi-factor search engine data
A multi-scale method for forecasting oil price with multi-factor search engine data (2020)
With the boom in big data, a promising idea for using search engine data has emerged and improved international oil price prediction, a hot topic in the fields of energy system modelling and analysis. Since different search engine data drive the oil price in different ways at different timescales, a multi-scale forecasting methodology is proposed that carefully explores the multi-scale relationship between the oil price and multi-factor search engine data. In the proposed methodology, three major steps are involved: (1) a multi-factor data process, to collect informative search engine data, reduce dimensionality, and test the predictive power via statistical analyses; (2) multi-scale analysis, to extract matched common modes at similar timescales from the oil price and multi-factor search engine data via multivariate empirical mode decomposition; (3) oil price prediction, including individual prediction at each timescale and ensemble prediction across timescales via a typical forecasting technique. With the Brent oil price as a sample, the empirical results show that the novel methodology significantly outperforms its original form (without multi-factor search engine data and multi-scale analysis), semi-improved versions (with either multi-factor search engine data or multi-scale analysis), and similar counterparts (with other multi-scale analysis), in both the level and directional predictions.
Keywords: Big data | Search engine data | Google trends | Multivariate empirical mode decomposition | Oil price forecasting
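The third step in the abstract above, individual prediction at each timescale followed by an ensemble across timescales, can be sketched on a toy series. Note the substitution: this sketch replaces multivariate empirical mode decomposition with a simple trailing moving average that splits one series into a slow and a fast component, and it uses naive per-component forecasts. The function names and the default window are assumptions, and no search engine data is involved.

```python
def moving_average(series, window):
    """Trailing moving average; early points average over what is available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def decompose(series, window):
    """Split a series into a slow component and a fast remainder.

    Stand-in for a real multi-scale decomposition: slow + fast == series.
    """
    slow = moving_average(series, window)
    fast = [x - s for x, s in zip(series, slow)]
    return slow, fast


def multiscale_forecast(series, window=3):
    """One-step-ahead forecast: predict each component, then sum them.

    Requires at least max(2, window) observations.
    """
    slow, fast = decompose(series, window)
    slow_pred = slow[-1] + (slow[-1] - slow[-2])   # drift forecast for the slow scale
    fast_pred = sum(fast[-window:]) / window       # recent mean for the fast scale
    return slow_pred + fast_pred                   # ensemble across scales
```

For a perfectly linear series the two naive forecasts add up to the next value on the line, which is a convenient sanity check; real price series would call for a proper decomposition and fitted models at each scale.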
Pivot-based approximate k-NN similarity joins for big high-dimensional data
Pivot-based approximate k-NN similarity joins for big high-dimensional data (2020)
Given an appropriate similarity model, the k-nearest neighbor similarity join represents a useful yet costly operator for data mining, data analysis, and data exploration applications. The time to evaluate the operator depends on the size of the datasets, the data distribution, and the dimensionality of the data representations. For vast volumes of high-dimensional data, only distributed and approximate approaches make the joins practically feasible. In this paper, we investigate and evaluate the performance of multiple MapReduce-based approximate k-NN similarity join approaches on two leading Big Data systems, Apache Hadoop and Spark. Focusing on the metric space approach relying on reference dataset objects (pivots), this paper investigates distributed similarity join techniques with and without approximation guarantees and also proposes high-dimensional extensions to previously proposed algorithms. The paper describes the design guidelines, algorithmic details, and key theoretical underpinnings of the compared approaches and also presents the empirical performance evaluation, approximation precision, and scalability properties of the implemented algorithms. Moreover, the Spark source code of all these algorithms has been made publicly available. Key findings of the experimental analysis are that randomly initialized pivot-based methods perform well with big high-dimensional data and that, in general, the selection of the best algorithm depends on the desired levels of approximation guarantee, precision, and execution time.
Keywords: Hadoop | Spark | MapReduce | k-NN | Approximate similarity join | High-dimensional data
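The pivot idea in the abstract above can be illustrated on a single machine. This is a simplified sketch, not one of the paper's algorithms: randomly chosen pivots partition the data into cells (the role a map phase would play), and each query object is joined only against objects in its own cell (the role of a reduce phase), which gives an approximate result with no guarantee. The function names, the Euclidean distance, and the seed parameter are assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def nearest_pivot(point, pivots):
    """Index of the pivot closest to the point (Euclidean distance)."""
    return min(range(len(pivots)), key=lambda i: math.dist(point, pivots[i]))


def approx_knn_join(R, S, k, num_pivots, seed=0):
    """For each r in R, return up to k near neighbors drawn from S.

    Points are tuples of floats. Neighbors are searched only inside the
    cell of r's nearest pivot, so true neighbors in other cells are missed.
    """
    rng = random.Random(seed)
    pivots = rng.sample(S, num_pivots)      # randomly initialized pivots

    # "Map": assign every object of S to the cell of its nearest pivot.
    cells = defaultdict(list)
    for s in S:
        cells[nearest_pivot(s, pivots)].append(s)

    # "Reduce": local k-NN search within the matching cell only.
    result = {}
    for r in R:
        candidates = cells[nearest_pivot(r, pivots)]
        result[r] = sorted(candidates, key=lambda s: math.dist(r, s))[:k]
    return result
```

With a single pivot every object lands in the same cell, so the join degenerates to an exact brute-force k-NN join; more pivots reduce the work per query at the cost of precision.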
Similarity query support in big data management systems
Similarity query support in big data management systems (2020)
Similarity query processing is becoming increasingly important in many applications such as data cleaning, record linkage, Web search, and document analytics. In this paper we study how to provide end-to-end similarity query support natively in a parallel database system. We discuss how to express a similarity predicate in its query language, how to build indexes, how to answer similarity queries (selections and joins) efficiently in the runtime engine, possibly using indexes, and how to optimize similarity queries. One particular challenge is how to incorporate existing similarity join algorithms, which often require a series of steps to achieve a high efficiency, including collecting token frequencies, finding matching record id pairs, and reassembling result records based on id pairs. We present a novel approach that uses existing runtime operators to implement such complex join algorithms without reinventing the wheel; doing so positions the system to automatically benefit from future improvements to those operators. The approach includes a technique to transform a similarity join plan into an efficient operator-based physical plan during query optimization by using a template expressed largely in the system’s user-level query language; this technique greatly simplifies the specification of such a transformation rule. We use Apache AsterixDB, a parallel Big Data management system, to illustrate and validate our techniques. We conduct an experimental study using several large, real datasets on a parallel computing cluster to assess the similarity query support. We also include experiments involving three other parallel systems and report the efficacy and performance results.
Keywords: Similarity query | Parallel database | Optimization
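The three steps the abstract above attributes to existing similarity join algorithms (collecting token frequencies, finding matching record id pairs, and reassembling result records) can be sketched as a small in-memory self-join. This is an illustration under assumptions, not AsterixDB's implementation: it uses Jaccard similarity over whitespace tokens with the well-known prefix-filtering heuristic, and the function names and record layout (a dict from id to text) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def similarity_self_join(records, threshold):
    """Return pairs of record texts whose Jaccard similarity >= threshold.

    records maps a record id to its text; threshold lies in (0, 1].
    """
    token_sets = {rid: set(text.lower().split()) for rid, text in records.items()}

    # Step 1: collect global token frequencies.
    freq = Counter(tok for toks in token_sets.values() for tok in toks)

    # Step 2: find matching record id pairs.
    # Index each record under its rarest tokens (its prefix); two records
    # meeting the threshold must share at least one prefix token.
    index = defaultdict(set)
    for rid, toks in token_sets.items():
        ordered = sorted(toks, key=lambda t: (freq[t], t))
        prefix_len = len(ordered) - math.ceil(threshold * len(ordered)) + 1
        for tok in ordered[:prefix_len]:
            index[tok].add(rid)

    id_pairs = set()
    for rids in index.values():
        for a, b in combinations(sorted(rids), 2):
            # Verify each candidate pair against the full token sets.
            if jaccard(token_sets[a], token_sets[b]) >= threshold:
                id_pairs.add((a, b))

    # Step 3: reassemble full result records from the id pairs.
    return [(records[a], records[b]) for a, b in sorted(id_pairs)]
```

In a parallel system each step would run as its own distributed stage, which is why the abstract describes these algorithms as a series of steps rather than a single operator.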
An extensive study on the evolution of context-aware personalized travel recommender systems
An extensive study on the evolution of context-aware personalized travel recommender systems (2020)
Ever since the beginning of civilization, travel for various purposes has been an essential part of human life, and so have travel recommendations, although the earliest recommendations were the accrued experiences shared by the community. Modern recommender systems evolved along with the growth of Information Technology and now contribute to all industry and service segments, including travel and tourism. The journey started with generic recommender engines, which gave way to personalized recommender systems and further advanced to contextualized personalization with the advent of artificial intelligence. The current era is also witnessing a boom in social media usage, and social media big data is acting as a critical input for various analytics, recommender systems included. This paper details a study of the evolution of travel recommender systems, their features, and their current limitations. We also discuss the key algorithms used for classification and recommendation, and the metrics that can be used to evaluate the performance of the algorithms and thereby of the recommenders.
Keywords: Recommender system | Personalization | Context aware | Big data | Travel and tourism
Towards a real-time processing framework based on improved distributed recurrent neural network variants with fastText for social big data analytics
Towards a real-time processing framework based on improved distributed recurrent neural network variants with fastText for social big data analytics (2020)
Big data generated by social media is a valuable source of information that offers an excellent opportunity to mine insights. In particular, user-generated content such as reviews, recommendations, and users’ behavior data is useful for supporting the marketing activities of many companies. Knowing what users say in social media reviews about the products they bought or the services they used is a key factor in decision-making. Sentiment analysis is one of the fundamental tasks in Natural Language Processing. Although deep learning for sentiment analysis has achieved great success and allowed several firms to analyze and extract relevant information from their textual data, a model that runs in a traditional environment cannot remain effective as the volume of data grows, which underlines the importance of efficient distributed deep learning models for social Big Data analytics. Moreover, social media analysis is a complex process involving a set of complex tasks. Therefore, it is important to address the challenges and issues of social big data analytics and to enhance the classification accuracy of deep learning techniques to support better decisions. In this paper, we propose an approach for sentiment analysis that combines fastText with Recurrent Neural Network variants to represent textual data efficiently. It then uses the new representations to perform the classification task. Its main objective is to enhance the classification accuracy of well-known Recurrent Neural Network (RNN) variants and to handle large-scale data. In addition, we propose a distributed intelligent system for real-time social big data analytics. It is designed to ingest, store, process, index, and visualize huge amounts of information in real time.
The proposed system adopts distributed machine learning with our proposed method for enhancing decision-making processes. Extensive experiments conducted on two benchmark data sets demonstrate that our proposal for sentiment analysis outperforms well-known distributed recurrent neural network variants (i.e., Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Gated Recurrent Unit (GRU)). Specifically, we tested the efficiency of our approach using the three different deep learning models. The results show that our proposed approach is able to enhance the performance of the three models. The current work can provide several benefits for researchers and practitioners who want to collect, handle, analyze and visualize several sources of information in real-time. Also, it can contribute to a better understanding of public opinion and user behaviors using our proposed system with the improved variants of the most powerful distributed deep learning and machine learning algorithms. Furthermore, it is able to increase the classification accuracy of several existing works based on RNN models for sentiment analysis.
Keywords: Big data | FastText | Recurrent neural networks | LSTM | BiLSTM | GRU | Natural language processing | Sentiment analysis | Social big data analytics