PWR pin-homogenized cross-sections analysis using big-data technology
Analysis of PWR pin-homogenized cross-sections using big data technology (2020)
To predict the behavior of the nuclear reactor core accurately and reliably, pin-by-pin fuel management calculation is becoming the most likely next-generation methodology for the Pressurized Water Reactor (PWR). The parameterization of few-group constants, however, is the most challenging problem, owing to the large number of few-group constants and the complicated pin-cell states over the entire operation history of the reactor core. In this study, big-data analysis technologies were employed to find a possible relationship between the pin-cell homogenized few-group constants and the pin-cell homogenized nuclide densities. First, correlation analysis demonstrated strong multicollinearity between different nuclide densities, which implies significant internal structure and interaction among them. Second, factor analysis reduced the large number of nuclide densities to a small group of nuclide density factors that reflect the main information in their variation trends. Third, multiple regression analysis between the nuclide density factors and the few-group constants verified that nuclide densities, in the form of driving factors, can serve as the state parameters for predicting the few-group constants.
Keywords: Pin-cell homogenized few-group constants | Nuclide density | Correlation analysis | Factor analysis | Multiple regression analysis
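As an illustration of the correlation-analysis step described in the abstract above, the following is a minimal sketch of a Pearson correlation matrix over nuclide densities. The nuclide names, the density values, and the function names `pearson` and `correlation_matrix` are made up for illustration and are not taken from the paper; the factor analysis and regression steps are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(columns):
    """Return a nested dict with the pairwise correlation of every column."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical normalized densities at five burnup steps (illustrative only).
densities = {
    "U235": [1.00, 0.80, 0.62, 0.47, 0.35],
    "Pu239": [0.00, 0.10, 0.17, 0.21, 0.23],
}

corr = correlation_matrix(densities)
for name, row in corr.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Off-diagonal coefficients close to +1 or -1, as in this toy data, are the kind of signal that suggests the densities share a few common driving factors.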
Cost-efficient dynamic scheduling of big data applications in apache spark on cloud
Cost-effective dynamic scheduling of big data applications in Apache Spark on the cloud (2020)
Job scheduling is one of the most crucial components in managing resources and executing big data applications efficiently. Scheduling jobs in a cloud-deployed cluster is especially challenging because the cloud offers different types of Virtual Machines (VMs) and jobs can be heterogeneous. The default schedulers of big data processing frameworks fail to reduce the cost of VM usage in the cloud while satisfying the performance constraints of each job. Existing work on cluster scheduling focuses mainly on improving job performance and does not exploit the available VM types to reduce cost. In this paper, we propose efficient scheduling algorithms that reduce the cost of resource usage in a cloud-deployed Apache Spark cluster. The proposed algorithms can also prioritise jobs based on their given deadlines, and they are online and adaptive to cluster changes. We have implemented the proposed algorithms on top of Apache Mesos. Furthermore, we have performed extensive experiments on real datasets and compared the results with existing schedulers to demonstrate the advantages of our algorithms. The results indicate that our algorithms can reduce resource usage cost by up to 34% under different workloads and improve job performance.
Keywords: Cloud | Apache Spark | Scheduling | Cost-efficiency
Model-based vehicular prognostics framework using Big Data architecture
Model-based vehicle prognostics framework using big data architecture (2020)
Continuous technological advances now allow the design of novel Integrated Vehicle Health Management (IVHM) systems that address strict safety regulations in the automotive field, with the aim of improving the efficiency and reliability of automotive components. However, a challenging issue in this domain is handling the huge amount of data that is useful for prognostics. To this end, in this paper we propose a cloud-based infrastructure for prognostic analysis, namely Automotive predicTOr Maintenance In Cloud (ATOMIC). It leverages Big Data technologies and mathematical models of both the nominal and the faulty behaviour of automotive components to estimate on-line the End-Of-Life (EOL) and Remaining Useful Life (RUL) indicators for the automotive systems under investigation. A case study based on the Delphi DFG1596 fuel pump is presented to evaluate the proposed prognostic method. Finally, we perform a benchmark analysis of the deployment configurations of the ATOMIC architecture in terms of scalability and cost.
Keywords: Model-based prognostic analysis | Big Data analysis | Cloud computing services
A new MapReduce solution for associative classification to handle scalability and skewness in vertical data structure
A new MapReduce solution for associative classification to handle scalability and skewness in vertical data structures (2020)
Associative classification is a promising methodology in data mining that uses association rule discovery procedures to build the classifier. Existing approaches have limitations, however: they cannot handle big data because of memory constraints, high time complexity, load imbalance and data skewness. Data skewness invariably arises in big data analytics and reduces the efficiency of an approach. This paper presents a MapReduce solution for associative classification over a vertical data layout. To handle these problems we propose two algorithms, MR-MCAR-F (MapReduce-Multi Class Associative Classifier-MapReduce fast algorithm) and MR-MCAR-L (MapReduce-Multi Class Associative Classifier Load parallel frequent pattern growth algorithm). We also propose MapReduce solutions for Tid List and Database coverage. We use three types of pruning techniques: database coverage, global pruning and distributed pruning. The proposed approaches are compared with the latest approach from the literature in terms of accuracy, computation time and data skewness. Existing scalable approaches cannot handle skewness, whereas our proposed method handles it very effectively. All experiments were performed on the Hadoop framework using six datasets extracted from the UCI repositories. The proposed algorithms are scalable solutions for associative classification that handle big data and data skewness.
Keywords: Associative classification | Scalability | Data skewness | Load balancing | Big data | Hadoop
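The vertical data layout and Tid List mentioned in the abstract above can be illustrated with a small single-machine sketch. This is not the MR-MCAR-F or MR-MCAR-L algorithm and involves no MapReduce; it only shows how transaction-ID lists let the support and confidence of a class association rule be computed by set intersection. The toy transactions and the function names are assumptions made for illustration.

```python
def to_vertical(transactions):
    """Convert horizontal transactions into a vertical layout: item -> set of tids."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def support(itemset, tidlists):
    """Support count of an itemset, obtained by intersecting its tid-lists."""
    sets = [tidlists.get(item, set()) for item in itemset]
    return len(set.intersection(*sets)) if sets else 0


def confidence(antecedent, class_label, tidlists):
    """Confidence of the rule antecedent -> class_label."""
    base = support(antecedent, tidlists)
    if base == 0:
        return 0.0
    return support(set(antecedent) | {class_label}, tidlists) / base


# Toy dataset: each transaction holds attribute items plus a class label item.
transactions = [
    {"a", "b", "class=yes"},
    {"a", "c", "class=yes"},
    {"a", "b", "class=no"},
    {"b", "c", "class=no"},
]

tidlists = to_vertical(transactions)
print(support({"a", "b"}, tidlists))
print(confidence({"a"}, "class=yes", tidlists))
```

Because each item carries its own tid-list, support counting needs no rescans of the full dataset, which is one reason a vertical layout is attractive for distributed rule mining.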
Intelligence and security in big 5G-oriented IoNT: An overview
Intelligence and security in big 5G-oriented Internet of Nano-Things: an overview (2020)
The Internet of Nano-Things (IoNT) overcomes critical difficulties and also opens doors for wearable-sensor-based big data analysis. Conventional computing and communication systems do not offer enough flexibility and adaptability to deal with today's enormous volume of heterogeneous data. This creates the need for suitable components that can efficiently analyse and communicate huge volumes of data while maintaining security and quality of service. In addition, the development of the ultra-wide Heterogeneous Networks (HetNets) associated with the ongoing Big Data project and 5G-based IoNT requires the emerging difficulties to be resolved as well. Accordingly, this survey comprehensively reports these difficulties and other relevant design issues. It focuses mainly on security issues and the associated intelligence to be considered while managing them.
Keywords: IoNT | Security | Big data | Design factors
Semantic-aware data quality assessment for image big data
Semantic-aware data quality assessment for image big data (2020)
Data quality (DQ) assessment is essential for realizing the promise of big data by judging the value of data in advance. Relevance, an indispensable dimension of DQ that focuses on "fitness for requirement", can arouse the user's interest in exploiting the data source. It is evaluated at two levels: (1) the amount of data that meets the user's requirements; (2) the matching degree of these relevant data. However, there is a lack of work on DQ assessment in the relevance dimension, especially for unstructured image data, where the focus is on semantic similarity. Evaluating the semantic relevance between an image data source and a query (requirement) raises three challenges: (1) how to extract semantic information with generalization ability for all image data? (2) how to quantify relevance by comprehensively fusing the quantity of relevant data and the degree of similarity? (3) how to improve the efficiency of relevance assessment in a big data scenario by designing an effective architecture? To overcome these challenges, we propose a semantic-aware data quality assessment (SDQA) architecture that includes off-line analysis and on-line assessment. In off-line analysis, for an image data source, we first transform all images into hash codes using our improved Deep Self-taught Hashing (IDSTH) algorithm, which can extract semantic features with generalization ability. We then construct a graph using the hash codes and a restricted Hamming distance. Next, we use our Semantic Hash Ranking (SHR) algorithm to calculate an importance score (rank) for each node (image), taking both the quantity of relevant images and the degree of semantic similarity into consideration. Finally, we rank all images in descending order of score. During on-line assessment, we first convert the user's query into hash codes using the IDSTH model, then retrieve matched images and collate their importance scores, and finally help the user determine whether the image data source fits their requirement. The results on a public dataset and a real-world dataset show the effectiveness, superiority and on-line efficiency of our SDQA architecture.
Keywords: Semantic-aware | Quality assessment | Image big data | IDSTH | SHR
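The abstract above relies on comparing binary hash codes by Hamming distance. The sketch below shows only that generic step: computing the Hamming distance between integer-encoded codes and retrieving the codes within a radius of a query. It does not implement IDSTH or SHR; the 8-bit codes, the radius value, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def within_radius(query, codes, radius):
    """Indices of stored codes whose Hamming distance to the query is <= radius."""
    return [i for i, code in enumerate(codes) if hamming(query, code) <= radius]


# Hypothetical 8-bit hash codes for four images.
codes = [0b10110010, 0b10110011, 0b01001101, 0b10100010]
query = 0b10110010

matches = within_radius(query, codes, radius=1)
print(matches)
```

A distance threshold of this kind is one plausible reading of a "restricted" Hamming distance when building a graph over images; the paper's exact restriction is not stated in the abstract.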
Fast and effective Big Data exploration by clustering
Fast and effective exploration of big data by clustering (2020)
The rise of the Big Data era calls for more efficient and effective data exploration and analysis tools. In this respect, the need to support advanced analytics on Big Data is driving data scientists' interest toward massively parallel distributed systems and software platforms, such as Map-Reduce and Spark, that make scalable use of such systems possible. However, when complex data mining algorithms are required, their fully scalable deployment on such platforms faces a number of technical challenges that grow with the complexity of the algorithms involved. Thus, algorithms that were originally designed for sequential execution must often be redesigned in order to use distributed computational resources effectively. In this paper, we explore these problems and then propose a solution that has proven to be very effective on the complex hierarchical clustering algorithm CLUBS+. By using four stages of successive refinement, CLUBS+ delivers high-quality clusters of data grouped around their centroids, working in a totally unsupervised fashion. Experimental results confirm the accuracy and scalability of CLUBS+ on platforms tailored for Big Data management.
Keywords: Big Data | Clustering | Data exploration
QoS provisioning for various types of deadline-constrained bulk data transfers between data centers
Quality-of-service provisioning for various types of constrained bulk data transfers between data centers (2020)
Keywords: Big data | Data center | High-performance networks | Software-defined networking | Bandwidth scheduling
Static malware detection and attribution in android byte-code through an end-to-end deep system
Static malware detection and attribution in Android bytecode through an end-to-end deep system (2020)
Android reflects a revolution in handhelds and mobile devices. It is a virtual-machine-based, open-source mobile platform that powers millions of smartphones and devices and an even larger number of applications in its ecosystem. Surprisingly, in its short lifespan Android has also seen a colossal expansion in application malware, with 99% of all smartphone malware being found in the Android ecosystem. Consequently, quite a few techniques have been proposed in the literature for the analysis and detection of malicious applications on the Android platform. The increasing and diversified nature of Android malware has greatly reduced the usefulness of prevailing malware detectors, which leaves Android users susceptible to novel malware. As a remedy to this problem, we propose an anti-malware system that uses customized, sufficiently deep learning models: end-to-end deep learning architectures that detect and attribute Android malware via opcodes extracted from application bytecode. Our results show that Bidirectional Long Short-Term Memory (BiLSTM) neural networks can be used to detect the static behavior of Android malware, beating state-of-the-art models without using handcrafted features. In our experiments, we also work with distinct and independent deep learning models for static malware analysis on Android. These include sequence specialists such as recurrent neural networks, Long Short-Term Memory networks and their bidirectional variant, as well as more conventional neural architectures such as fully connected networks, deep convnets, Diabolo networks (autoencoders) and generative graphical models such as deep belief networks. To test our system, we have also assembled an augmented bytecode dataset from three open and independently maintained state-of-the-art datasets. Our bytecode dataset, which is an order of magnitude larger, is sufficient for our experiments. Our results suggest that the proposed system can lead to better design of malware detectors, as we report an accuracy of 0.999 and an F1-score of 0.996 on a large dataset of more than 1.8 million Android applications.
Keywords: End-to-end architecture | Malware analysis | Deep neural networks | Android and big data
Real-time resource scaling platform for Big Data workloads on serverless environments
Real-time resource scaling platform for big data workloads in serverless environments (2020)
The serverless execution paradigm is becoming an increasingly popular option when workloads are to be deployed in an abstracted way, more specifically, without specifying any infrastructure requirements. Currently, such workloads are typically comprised of small programs or even a series of single functions used as event triggers or to process a data stream. Other applications that may also fit on a serverless scenario are stateless services that may need to seamlessly scale in terms of resources, such as a web server. Although several commercial serverless services are available (e.g., Amazon Lambda), their use cases are mostly limited to the execution of functions or scripts that can be adapted to predefined templates or specifications. However, current research efforts point out that it is interesting for the serverless paradigm to evolve from single functions and support more flexible infrastructure units such as operating-system-level virtualization in the form of containers. In this paper we present a novel platform to automatically scale container resources in real time, while they are running, and without any need for reboots. This platform is evaluated using Big Data workloads, both batch and streaming, as representative examples of applications that could be initially regarded as unsuitable for the serverless paradigm considering the currently available services. The results show how our serverless platform can improve the CPU utilization by up to 77% with an execution time overhead of only 6%, while remaining scalable when using a 32-container cluster.
Keywords: Serverless computing | Big Data | Resource scaling | Operating-system-level virtualization | Container cluster