Globally-biased BIRECT algorithm with local accelerators for expensive global optimization
الگوریتم BIRECT مغرضانه جهانی با شتاب دهنده های محلی برای بهینه سازی جهانی ارزشمند-2019
In this paper, the black-box global optimization problem with expensive function evaluations is considered. This problem is challenging for numerical methods due to the practical limits on the computational budget often required by intelligent systems. For its efficient solution, a new DIRECT-type hybrid technique is proposed. The new algorithm incorporates a novel sampling on diagonals and bisection strategy (instead of a trisection which is commonly used in the existing DIRECT-type algorithms), embedded into the globally-biased framework, and enriched with three different local minimization strategies. The numerical results on a test set of almost 900 problems from the literature and on a real-life application regarding nonlinear regression show that the new approach effectively addresses well-known DIRECT weaknesses, has beneficial effects on the overall performance, and, on average, gives significantly better results compared to several DIRECT-type methods widely used in decision-making expert systems.
Keywords: Nonlinear global optimization | DIRECT-type algorithms | BIRECT algorithm | hybrid optimization algorithms | nonlinear regression
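The abstract above contrasts bisection with the trisection used in other DIRECT-type algorithms and mentions sampling on diagonals. The following is a minimal sketch of those two geometric steps only, not the authors' algorithm: the function names are hypothetical, and placing the samples at 1/3 and 2/3 of the main diagonal is an assumption of this sketch, since the abstract does not say where on the diagonal the samples lie.

```python
from typing import List, Tuple

Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def diagonal_points(lower: List[float], upper: List[float],
                    fractions: Tuple[float, ...] = (1 / 3, 2 / 3)) -> List[List[float]]:
    """Return sample points on the main diagonal of the box.

    The default fractions (1/3 and 2/3) are an assumption of this sketch.
    """
    return [[lo + t * (hi - lo) for lo, hi in zip(lower, upper)]
            for t in fractions]


def bisect_longest_side(lower: List[float], upper: List[float]) -> Tuple[Box, Box]:
    """Split the box into two equal halves across its longest side."""
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    axis = widths.index(max(widths))      # first longest side wins on ties
    mid = lower[axis] + widths[axis] / 2.0

    left_upper = list(upper)
    left_upper[axis] = mid                # left half ends at the midpoint
    right_lower = list(lower)
    right_lower[axis] = mid               # right half starts at the midpoint
    return (list(lower), left_upper), (right_lower, list(upper))


if __name__ == "__main__":
    left, right = bisect_longest_side([0.0, 0.0], [4.0, 2.0])
    print("children:", left, right)
    print("diagonal samples:", diagonal_points([0.0, 0.0], [4.0, 2.0]))
```

A full method would also need a rule for selecting which boxes to split and the local minimization strategies the abstract mentions; both are omitted here.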
A Survey and Taxonomy of FPGA-based Deep Learning Accelerators
مرور و طبقه بندی شتاب دهنده های یادگیری عمیق مبتنی بر FPGA-2019
Deep learning, the fastest growing segment of Artificial Neural Networks (ANNs), has led to the emergence of many machine learning applications and their implementation across multiple platforms such as CPUs, GPUs and reconfigurable hardware (Field-Programmable Gate Arrays or FPGAs). However, inspired by the structure and function of ANNs, large-scale deep learning topologies require a considerable amount of parallel processing, memory resources, high throughput and significant processing power. Consequently, in the context of real-time hardware systems, it is crucial to find the right trade-off between performance, energy efficiency, fast development, and cost. Although FPGAs are limited in size and resources, several approaches have shown that they provide a good starting point for the development of future deep learning implementation architectures. In this paper, we briefly review recent work related to the implementation of deep learning algorithms in FPGAs. We analyze and compare the design requirements and features of existing topologies in order to propose development strategies and implementation architectures for better use of FPGA-based deep learning topologies. In this context, we examine the frameworks used in these studies, which allow many topologies to be tested in order to arrive at the best implementation alternatives in terms of performance and energy efficiency.
Keywords: Deep learning | Framework | Optimized implementation | FPGA
A survey of techniques for optimizing deep learning on GPUs
مروری بر تکنیک های بهینه سازی یادگیری عمیق در GPU-2019
The rise of deep learning (DL) has been fuelled by improvements in accelerators. Due to its unique features, the GPU remains the most widely used accelerator for DL applications. In this paper, we present a survey of architecture and system-level techniques for optimizing DL applications on GPUs. We review techniques for both inference and training and for both single-GPU and distributed systems with multiple GPUs. We bring out the similarities and differences of different works and highlight their key attributes. This survey will be useful for both novices and experts in the fields of machine learning, processor architecture and high-performance computing.
Keywords: Review | GPU | Hardware architecture for deep learning | Accelerator | Distributed training | Parameter server | Allreduce | Pruning | Tiling
Competition and credit procyclicality in European banking
رقابتی بودن و اعتبار در بانکداری اروپا-2019
This paper empirically assesses how competition in the banking sector affects credit procyclicality by estimating both an interacted panel VAR model using macroeconomic data and a single-equation model with bank-level data. These two empirical approaches show that a deviation of actual GDP from potential GDP leads to greater credit fluctuations in economies where bank competition is weak. This suggests that increased market power for banks increases the financial accelerator mechanism, which is consistent with recent macroeconomic models showing that monopolistic banking tends to increase macroeconomic volatility by making credit cheaper during booms and more expensive during recessions.
Keywords: Credit cycle | Business cycle | Bank competition | Interacted panel VAR
The Rise of Big Data in Oncology
ظهور داده های بزرگ در انکولوژی-2018
OBJECTIVES: To describe big data and data science in the context of oncology nursing care. DATA SOURCES: Peer-reviewed and lay publications. CONCLUSION: The rapid expansion of real-world evidence from sources such as the electronic health record, genomic sequencing, administrative claims and other data sources has outstripped the ability of clinicians and researchers to manually review and analyze it. To promote high-quality, high-value cancer care, big data platforms must be constructed from standardized data sources to support extraction of meaningful, comparable insights. IMPLICATIONS FOR NURSING PRACTICE: Nurses must advocate for the use of standardized vocabularies and common data elements that represent terms and concepts that are meaningful to patient care. The term “big data” first appeared in the literature in 1997 by researchers at NASA as they described the challenges to store the volume of information generated as a result of a new, data-intensive type of computational work.1 In 2008, a white paper entitled “Big-Data Computing: Creating revolutionary breakthroughs in commerce, science and society,” highlighted the rapid integration of data-driven strategies across settings ranging from Wal-Mart’s (then) 4 petabyte (4000 trillion bytes) data warehouse to the 15 petabytes of data projected to be generated annually by the Large Hadron Collider particle accelerator project,2 and is credited with widespread adoption of the term.3
KEY WORDS: electronic health records, meaningful use, artificial intelligence, neoplasms
CPU-FPGA Co-scheduling for Big Data Applications
زمانبندی CPU-FPGA برای برنامه های داده های بزرگ-2018
FPGA accelerators integrated with general-purpose CPUs have brought opportunities to improve energy efficiency of data center workloads. This article addresses the problem of coordination between FPGAs and multicore CPUs for big data applications.
Keywords: Reconfigurable hardware, Scheduling and Task partition, Genomic data processing
FASTEN: An FPGA Based Secure System for Big Data Processing
FASTEN: یک سیستم امن بر اساس FPGA برای پردازش داده های بزرگ-2018
In cloud computing frameworks, data security and protection is one of the most important aspects for optimization and concrete implementation. This paper proposes a reliable yet efficient FPGA-based security system via crypto engines and Physical Unclonable Functions (PUFs) for big data applications. Considering that FPGA or GPU-based accelerators are popular in data centers, we believe the proposed approach is a very practical and effective method for data security in cloud computing.
Keywords: FPGA, Security, Big Data, Cloud Computing, Hadoop MapReduce
Big vs little core for energy-efficient Hadoop computing
بزرگ در مقابل هسته برای محاسبات هادوپ انرژی کارآمد-2018
Emerging big data applications require a significant amount of server computational power. However, the rapid growth in data makes it challenging to process them efficiently using current high-performance server architectures. Furthermore, physical design constraints, such as power and density, have become the dominant limiting factor for scaling out servers. Heterogeneous architectures that combine big Xeon cores with little Atom cores have emerged as a promising solution to enhance energy efficiency by allowing each application to run on an architecture that matches resource needs more closely than a one-size-fits-all architecture. Therefore, the question of whether to map the application to big Xeon or little Atom in a heterogeneous server architecture becomes important. In this paper, through a comprehensive system-level analysis, we first characterize Hadoop-based MapReduce applications on big Xeon and little Atom-based server architectures to understand how the choice of big vs little cores is affected by various parameters at application, system and architecture levels and the interplay among these parameters. Second, we study how the choice between big and little cores changes across various phases of MapReduce tasks. Furthermore, we show how the choice of the most efficient core for a particular MapReduce phase changes in the presence of accelerators. The characterization analysis helps guide scheduling decisions in future cloud-computing environments equipped with heterogeneous multicore architectures and accelerators. We have also evaluated the operational and the capital cost to understand how performance, power and area constraints for big data analytics affect the choice of big vs little core server as a more cost- and energy-efficient architecture.
Keywords: Heterogeneous architectures, Hadoop, MapReduce, Energy and cost efficiency, Big and little cores, Scheduling
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations
پتانسیل Tersoff برای تسریع GPU برای شبیه سازی دینامیک مولکولی موازی-2017
The Tersoff potential is one of the empirical many-body potentials that have been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require many more arithmetic operations and data dependencies. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
Keywords: Tersoff | LAMMPS | GPU acceleration | Hybrid MPI/GPU | High-performance computing
Efficient and accurate algorithms for computing matrix trigonometric functions
الگوریتم های کارآمد و دقیق برای محاسبه توابع مثلثاتی ماتریس-2017
Trigonometric matrix functions play a fundamental role in second order differential equations. This work presents an algorithm based on Taylor series for computing the matrix cosine. It uses a backward error analysis with improved bounds. Numerical experiments show that MATLAB implementations of this algorithm have higher accuracy than other state-of-the-art MATLAB implementations in the majority of tests. Furthermore, we have implemented the designed algorithm in C for general purpose processors, and in CUDA for one and two NVIDIA GPUs. We obtained very good performance from these implementations thanks to the high computational power of these hardware accelerators and our effort to avoid as much communication as possible. All the implemented programs are accessible through the MATLAB environment.
Keywords: Matrix cosine | Matrix sine | Scaling and squaring method | Taylor series | Backward error | Parallel implementation
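The abstract and keywords above name a Taylor series and the scaling and squaring method for the matrix cosine. The following is a minimal NumPy sketch of that general technique, not the authors' algorithm: it omits their backward error analysis, and the function name `cosm_taylor`, the fixed number of series terms, and the norm threshold are illustrative assumptions.

```python
import numpy as np


def cosm_taylor(A, terms=8, max_norm=1.0):
    """Approximate cos(A) with a truncated Taylor series plus angle doubling.

    `terms` and `max_norm` are illustrative choices, not values from the paper.
    """
    A = np.asarray(A, dtype=float)
    identity = np.eye(A.shape[0])

    # Scaling: choose s so that the 1-norm of A / 2**s is at most max_norm.
    norm = np.linalg.norm(A, 1)
    s = 0 if norm <= max_norm else int(np.ceil(np.log2(norm / max_norm)))
    B = A / (2 ** s)

    # Truncated Taylor series: cos(B) = sum_k (-1)^k B^(2k) / (2k)!
    B2 = B @ B
    term = identity.copy()
    C = identity.copy()
    for k in range(1, terms + 1):
        # Each term follows from the previous one by one multiplication with B^2.
        term = -(term @ B2) / ((2 * k - 1) * (2 * k))
        C = C + term

    # Undo the scaling with the double-angle formula cos(2X) = 2 cos(X)^2 - I.
    for _ in range(s):
        C = 2.0 * (C @ C) - identity
    return C


if __name__ == "__main__":
    S = np.array([[2.0, 1.0], [1.0, 3.0]])
    # Reference value for a symmetric matrix via its eigendecomposition.
    w, V = np.linalg.eigh(S)
    reference = V @ np.diag(np.cos(w)) @ V.T
    print(np.max(np.abs(cosm_taylor(S) - reference)))
```

The fixed `terms` and `max_norm` stand in for the parameter selection that an error analysis would normally provide; a production implementation would choose them per input matrix.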