Category:
Big data
Publication year:
2018
English title of the paper:
A multi-factor monitoring fault tolerance model based on a GPU cluster for big data processing
Persian translation of the paper title:
مدل تحمل نظارت بر گسل چند عامل بر اساس یک خوشه GPU برای پردازش داده های بزرگ
Source:
ScienceDirect - Elsevier - Information Sciences, Corrected proof: doi:10.1016/j.ins.2018.04.053
Authors:
Yuling Fang a, Qingkui Chen a,∗, Naixue Xiong a,b
English abstract:
High-performance computing clusters are widely used in large-scale data mining applications, and have higher requirements for persistence, stability and real-time use and are therefore computationally intensive. To support large-scale data processing, we design a multi-factor real-time monitoring fault tolerance (MRMFT) model based on a GPU cluster. However, the higher clock frequency of GPU chips results in excessively high energy consumption in computing systems. Moreover, the ability to support a long-lasting high temperature operation varies greatly between different GPUs owing to the individual differences between the chips. In this paper, we design a GPU cluster energy consumption monitoring system based on wireless sensor networks (WSNs) and propose an energy consumption aware checkpointing (ECAC) for high energy consumption problems with the following two advantages: the system sets checkpoints according to actual energy consumption and the device temperature to improve the utilization of checkpoints and reduce time cost; and it exploits the parallel computing features of CPU and GPU to hide the CPU detection overhead in GPU parallel computation, and further reduce the time and energy consumption overhead in the fault tolerance phase. Using ECAC as the constraint and aiming for a persistent and reliable operation, the dynamic task migration mechanism is designed, and the reliability of the cluster is greatly improved. The theoretical analysis and experiment results show that the model improves the persistence and stability of the computing system while reducing checkpoint overhead.
Keywords: Big data processing, GPU cluster, Persistence computing, Energy consumption, Fault tolerance, Energy consumption aware checkpointing, Task migration