No. | Title | Type |
---|---|---|
1 |
Efficient Implementation of Lightweight Hash Functions on GPU and Quantum Computers for IoT Applications
(2022) Secure communication is important for Internet of Things (IoT) applications, to avoid cybersecurity attacks. One of the key security aspects is data integrity, which can be protected by employing cryptographic hash functions. Recently, the US National Institute of Standards and Technology (NIST) announced a competition to standardize lightweight hash functions that can be used in IoT applications. IoT communication involves various hardware platforms, from low-end microcontrollers to high-end cloud servers with GPU accelerators. Since many sensor nodes are connected to gateway devices and cloud servers, performing high-throughput integrity checks is important for securing IoT applications. However, this is a time-consuming task even for high-end servers, which may affect the response time of IoT systems. Moreover, no prior work has evaluated the performance of the NIST candidates on contemporary processors such as GPUs and quantum computers. In this study, we show that with carefully crafted implementation techniques, all the finalist hash function candidates in the NIST standardization competition can achieve high throughput (up to 1,000 Gbps) on an RTX 3080 GPU. This research output can be used by IoT gateway devices and cloud servers to perform data integrity checks at high speed, thus ensuring a timely response. In addition, this is also the first study that showcases the implementation of NIST lightweight hash functions on a quantum computer (ProjectQ). Besides securing IoT communication, these efficient implementations on GPUs and quantum computers can be used to evaluate the strength of the respective hash functions against brute-force attacks.
INDEX TERMS: Graphics processing units (GPU) | hash function | lightweight cryptography | quantum computer. |
English article |
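The data-integrity workflow this abstract describes (hash on send, re-hash and compare on receipt) can be sketched in a few lines. This is a minimal illustration only: Python's standard library does not ship the NIST lightweight finalists (e.g., ASCON), so SHA-256 stands in for the hash function here.

```python
import hashlib

def integrity_tag(data: bytes) -> str:
    """Compute a hash digest to be sent alongside the data."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, tag: str) -> bool:
    """Recompute the digest and compare; any tampering changes it."""
    return integrity_tag(data) == tag

payload = b"sensor reading: 23.5 C"
tag = integrity_tag(payload)
assert verify(payload, tag)            # untouched data passes
assert not verify(payload + b"!", tag)  # a single-byte change fails
```

A gateway performing such checks at scale is exactly the high-throughput scenario the paper targets on the GPU.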
2 |
Deep convolutional neural networks-based Hardware–Software on-chip system for computer vision application
(2022) Embedded vision systems are the best solutions for high-performance and lightning-fast inspection tasks. As everyday life evolves, it becomes almost imperative to harness artificial intelligence (AI) in vision applications, making these systems intelligent and able to make decisions close or similar to those of humans. In this context, integrating AI on embedded systems poses many challenges, given that its performance depends on the volume and quality of the data it assimilates to learn and improve. This runs up against the energy-consumption and cost constraints of FPGA-SoCs, which have limited processing, memory, and communication capacity. Despite this, implementing AI algorithms on embedded systems can drastically reduce energy consumption and processing times, while reducing the costs and risks associated with data transmission. Their efficiency and reliability therefore always depend on the designed prototypes. Within this scope, this work proposes two different designs for a Traffic Sign Recognition (TSR) application based on a convolutional neural network (CNN) model, followed by three implementations on the PYNQ-Z1. First, we implement the CNN-based TSR application on the PYNQ-Z1 processor. Given its runtime of around 3.55 s, there is room for improvement using the programmable logic (PL) and processing system (PS) in a hybrid architecture. We therefore propose a streaming architecture in which the CNN layers are accelerated with a dedicated hardware accelerator per layer, using a direct memory access (DMA) interface. This yields efficient power consumption, decreased hardware cost, and an optimized execution time of 2.13 s, but there is still room for design optimization. Finally, we propose a second co-design, in which the CNN is accelerated as a single computation engine using a BRAM interface. The implementation results show that this embedded TSR design achieves the best performance of the proposed architectures, with an execution time of about 0.03 s, a computation roof of about 36.6 GFLOPS, and a bandwidth roof of about 3.2 GByte/s.
keywords: CNN | FPGA | Acceleration | Co-design | PYNQ-Z1 |
English article |
3 |
Memristor Crossbar Arrays Performing Quantum Algorithms
(2022) There is growing interest in quantum computers and quantum algorithm development. It has been proved that ideal quantum computers, with zero error rates and long decoherence times, can solve problems that are intractable for today's classical computers. Quantum computers use two resources, superposition and entanglement, that have no classical analog. Since currently available quantum computer platforms comprise only a few dozen qubits, quantum simulators are essential for developing and testing new quantum algorithms. We present a novel quantum simulator based on memristor crossbar circuits and use it to simulate well-known quantum algorithms, namely the Deutsch and Grover algorithms. In quantum computing, the dominant algebraic operations are matrix-vector multiplications. The execution time grows exponentially with the simulated number of qubits, causing an exponential slowdown when quantum algorithms are executed on classical computers. In this work, we show that the inherent characteristics of memristor arrays can be used to overcome this problem, and that memristor arrays can serve not only as independent quantum simulators but also as part of a quantum computer stack to which classical accelerators are connected. Our memristive crossbar circuits are re-configurable and can be programmed to simulate any quantum algorithm.
Index Terms— Memristors | memristor crossbars | quantum algorithms | quantum simulators. |
English article |
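The abstract's key point, that classical state-vector simulation reduces to matrix-vector multiplication over a vector that doubles in size with every qubit, can be sketched with plain NumPy. This is an illustrative sketch, not the paper's memristor implementation; the Hadamard layer shown is the opening step of Grover's algorithm.

```python
import numpy as np

# The state of n qubits is a 2**n complex vector; gates are unitary matrices.
n = 3
state = np.zeros(2**n, dtype=complex)
state[0] = 1.0  # start in |000>

# A Hadamard on every qubit builds the uniform superposition (Grover's first step).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
U = H
for _ in range(n - 1):
    U = np.kron(U, H)  # full-register operator: 2**n x 2**n

state = U @ state  # the dominant operation: one matrix-vector multiplication

# Every one of the 2**n basis states now has probability 1/2**n;
# both the vector and the operator grow exponentially with n.
assert np.allclose(np.abs(state) ** 2, 1 / 2**n)
```

Doubling `n` squares the operator size, which is exactly the exponential cost the memristor crossbars are proposed to absorb in analog hardware.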
4 |
Retargetable Optimizing Compilers for Quantum Accelerators via a Multilevel Intermediate Representation
(2022) We present a multilevel quantum–classical intermediate representation (IR) that enables an optimizing, retargetable compiler for available quantum languages. Our work builds upon the multilevel intermediate representation (MLIR) framework and leverages its unique progressive-lowering capabilities to map quantum languages to the low-level virtual machine (LLVM) machine-level IR. We provide both quantum and classical optimizations via the MLIR pattern-rewriting subsystem and standard LLVM optimization passes, and demonstrate the programmability, compilation, and execution of our approach via standard benchmarks and test cases. In comparison to other standalone language and compiler efforts available today, our work yields compile times that are 1,000× faster than standard Pythonic approaches and 5–10× faster than comparable standalone quantum language compilers. Our compiler provides quantum resource optimizations via standard programming patterns that result in a 10× reduction in entangling operations, a common source of program noise. We see this work as a vehicle for rapid quantum compiler prototyping.
|
English article |
5 |
Benchmarking vision kernels and neural network inference accelerators on embedded platforms
(2021) Developing efficient embedded vision applications requires exploring various algorithmic optimization trade-offs and a broad spectrum of hardware architecture choices. This makes navigating the solution space, and finding the design points with optimal performance trade-offs, a challenge for developers. To help provide a fair baseline comparison, we conducted comprehensive benchmarks of the accuracy, run-time, and energy efficiency of a wide range of vision kernels and neural networks on multiple embedded platforms: ARM57 CPU, Nvidia Jetson TX2 GPU, and Xilinx ZCU102 FPGA. Each platform uses its optimized libraries for vision kernels (OpenCV, VisionWorks, and xfOpenCV) and neural networks (OpenCV DNN, TensorRT, and Xilinx DPU). For vision kernels, our results show that the GPU achieves an energy/frame reduction ratio of 1.1–3.2× compared to the others for simple kernels. However, for more complicated kernels and complete vision pipelines, the FPGA outperforms the others with energy/frame reduction ratios of 1.2–22.3×. For neural networks [Inception-v2, ResNet-50, ResNet-18, MobileNet-v2, and SqueezeNet], the FPGA achieves speedups of [2.5, 2.1, 2.6, 2.9, and 2.5]× and EDP reduction ratios of [1.5, 1.1, 1.4, 2.4, and 1.7]× compared to the GPU FP16 implementations, respectively. Keywords: Benchmarks | CPUs | GPUs | FPGAs | Embedded vision | Neural networks |
English article |
6 |
Are social incubators different from other incubators? Evidence from Italy
(2020) This paper defines and analyses incubators that mainly support start-ups with a significant social impact. In 2016, a survey was conducted on the 162 incubators active in Italy, and a total of 88 responses were received. An analysis of the literature and of this dataset led to the identification of three types of incubators: Business, Mixed, and Social. Thirty of the respondents sent information on their tenants. Thanks to the data on 247 tenants, it was possible to analyze the impact of the three different types of incubators (Business, Mixed, and Social) on the tenants' growth through OLS regression analyses. A Social Incubator is here defined as an incubator in which more than 50% of the supported start-ups aim to introduce a positive social impact. The study shows that Social Incubators perceive social impact measurement and training/consulting on business ethics and CSR as more important services than other incubator types do. The regression analyses show that Social Incubators are as efficient as other incubators in terms of tenants' economic growth, notwithstanding their focus on start-ups that do not pursue only economic objectives. Finally, this study indicates that policymakers can foster Social Incubators to support social entrepreneurship. Keywords: Incubators | Accelerators | Social start-up | Social entrepreneurship | Social innovation | Entrepreneurship |
English article |
7 |
Collaborative AI and Laboratory Medicine integration in precision cardiovascular medicine
(2020) Artificial Intelligence (AI) is a broad term that combines computation with sophisticated mathematical models, in turn allowing the development of complex algorithms capable of simulating human intelligence, such as problem solving and learning. It is poised to bring a significant paradigm shift to the most diverse areas of medical knowledge. Cardiology, for its part, is a vast field dealing with diseases of the heart and the circulatory system, including coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other conditions. AI has emerged as a promising tool in cardiovascular medicine, aimed at augmenting the effectiveness of the cardiologist and extending better-quality care to patients. It has the ability to support decision-making and improve diagnostic and prognostic performance. Attempts have also been made to explore novel genotypes and phenotypes in existing cardiovascular diseases, to improve the quality of patient care, and to enable cost-effectiveness with reduced readmission and mortality rates. Our review addresses the integration of AI and laboratory medicine as an accelerator of personalized care, in line with the precision and value-creating services needed in cardiovascular medicine. Keywords: Artificial intelligence | Cardiology | Laboratory | Biomarkers | Data | Machine learning | Personalized |
English article |
8 |
Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training
(2020) Deep learning has become widely used in complex AI applications. Yet, training a deep neural network (DNN) model requires a considerable amount of computation, long running times, and much energy. Nowadays, many-core AI accelerators (e.g., GPUs and TPUs) are designed to improve the performance of AI training. However, processors from different vendors perform dissimilarly in terms of performance and energy consumption. To investigate the differences among several popular off-the-shelf processors (i.e., Intel CPU, NVIDIA GPU, AMD GPU, and Google TPU) in training DNNs, we carry out a comprehensive empirical study on the performance and energy efficiency of these processors by benchmarking a representative set of deep learning workloads, including computation-intensive operations, classical convolutional neural networks (CNNs), recurrent neural networks (LSTM), Deep Speech 2, and Transformer. Unlike existing end-to-end benchmarks, which only present the training time, we try to investigate the impact of the hardware, the vendor's software library, and the deep learning framework on the performance and energy consumption of AI training. Our evaluation methods and results not only provide an informative guide for end users to select proper AI accelerators, but also expose opportunities for hardware vendors to improve their software libraries. Index Terms: AI Accelerator | Deep Learning | CPU | GPU | TPU | Computation-intensive Operations | Convolution Neural Networks | Recurrent Neural Networks | Transformer | Deep Speech 2 |
English article |
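The measurement discipline behind studies like this one (warm up the processor, then average wall-clock time over repeated runs) can be sketched generically. This is a hedged sketch of the methodology only; the `workload` function below is a trivial stand-in, not one of the paper's DNN workloads.

```python
import time

def benchmark(fn, warmup=2, runs=5):
    """Time a workload: discard warm-up iterations, then average timed runs."""
    for _ in range(warmup):
        fn()  # warm caches / JIT / clocks before measuring
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs  # seconds per run

# Stand-in compute-bound workload; a real study would run training steps here.
def workload():
    s = 0.0
    for i in range(100_000):
        s += i * 0.5
    return s

avg_seconds = benchmark(workload)
assert avg_seconds > 0
```

Reporting energy alongside time, as the paper does, would additionally require a power-measurement source (e.g., a vendor telemetry API), which is outside this sketch.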
9 |
Scale quickly or fail fast: An inductive study of acceleration
(2020) Accelerators are a fast-growing form of entrepreneurship support. The literature about them remains descriptive and disjointed. While some consider them new, others believe them to be a next-generation incubator model. Based on a qualitative inductive study in India, with inputs from both accelerator executives and founders of accelerated ventures, we shift the analysis from the form (accelerator) to its underlying mechanism (acceleration). We identify at least three characteristics that make acceleration unique: a focus on ventures with product-market fit; a focus on time-compressed scaling; and a focus on aggressive scalability testing. Our findings call for a shift in entrepreneurship support research (including accelerators) from "form" to "mechanism." Entrepreneurs will find our three characteristics useful in assessing which programs truly accelerate and therefore increase their chances of achieving scale. Accelerator executives can now distinguish their offerings from other support forms (e.g., incubators) by searching for ventures with product-market fit, offering time-compressed scaling services, and testing the ventures' ability to scale rapidly. University administrators and policymakers can use the findings to add acceleration (to support scaling) as a component of their entrepreneurial ecosystems. Implications and future research directions are discussed.
Keywords: Acceleration | Accelerator | Incubator | Entrepreneurship support | Entrepreneurial ecosystem |
English article |
10 |
Procrustes: A Dataflow and Accelerator for Sparse Deep Neural Network Training
Year of publication: 2020 | English PDF pages: 14 | Persian translation (doc) pages: 44. The success of DNN pruning has led to the development of energy-efficient inference accelerators that support pruned models with sparse weight and activation tensors. Because the memory layouts and dataflows in these architectures are optimized for the access patterns during inference, however, they do not efficiently support emerging sparse training techniques. In this paper, we show (a) that accelerating sparse training requires a co-design approach in which algorithms are adapted to the constraints of the hardware, and (b) that hardware for sparse DNN training must cope with constraints that do not arise in inference accelerators. As a proof of concept, we adapt a sparse training algorithm to be amenable to hardware acceleration; we then develop the dataflow, data layout, and load-balancing techniques needed to accelerate it. The resulting system is a sparse training accelerator that produces pruned models with the same accuracy as dense models, without first training, then pruning, and finally retraining a dense model. Compared to training equivalent unpruned models on a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes less energy and offers up to a 4× speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
|
Translated article |
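The order-of-magnitude weight pruning this abstract refers to is commonly realized by magnitude thresholding: zero out the smallest-magnitude weights until a target sparsity is reached. Below is a minimal NumPy sketch of that idea; it is an illustration of the general technique, not Procrustes' actual training algorithm or hardware dataflow.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries zeroed
    so that at least `sparsity` fraction of entries are zero."""
    k = int(np.ceil(weights.size * sparsity))
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
sparse_w = prune_by_magnitude(w, 0.9)   # prune 90% of weights
assert (sparse_w == 0).mean() >= 0.9
```

A hardware accelerator for sparse training must then store and multiply such tensors in compressed form with balanced load across compute units, which is the co-design problem the paper addresses.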