Publication year:
2020
English title:
PowerCoord: Power capping coordination for multi-CPU/GPU servers using reinforcement learning
Persian translation of the title:
PowerCoord: هماهنگی محدود کردن قدرت برای سرورهای چند پردازنده CPU / GPU با استفاده از یادگیری تقویتی
Source:
ScienceDirect - Elsevier - Sustainable Computing: Informatics and Systems, vol. 28 (2020), article 100412. doi:10.1016/j.suscom.2020.100412
Authors:
Reza Azimi, Chao Jing, Sherief Reda
English abstract:
Modern supercomputers and cloud providers rely on server nodes that are equipped with multiple CPU sockets and general purpose GPUs (GPGPUs) to handle the high demand for intensive computations. These nodes consume much higher power than commodity servers, and integrating them with power capping systems used in modern clusters presents new challenges. In this paper, we propose a new power capping controller, PowerCoord, that is specifically designed for servers with multiple CPU and GPU sockets that are running multiple jobs at a time. PowerCoord coordinates among the various power domains (e.g., CPU sockets and GPUs) inside a server node to meet target power caps, while seeking to maximize throughput. Our approach also takes into consideration job deadlines and priorities. Because performance modeling for co-located jobs is error-prone, PowerCoord uses a learning method. PowerCoord has a number of heuristic policies to allocate power among the various CPUs and GPUs, and it uses reinforcement learning for policy selection during runtime. Based on the observed state of the system, PowerCoord shifts the distribution of selected policies. We implement our power cap controller on a real multi-CPU/GPU server with low overhead, and we demonstrate that it is able to meet target power caps while maximizing the throughput, and balancing other demands such as priorities and deadlines. Our results show PowerCoord improves the server throughput on average by 18% compared with the case when power is not coordinated among CPU/GPU domains. Also, PowerCoord improves the server throughput on average by 11% compared with prior work that uses a heuristic approach to coordinate the power among domains.
Keywords: Power capping | GPGPU acceleration | Reinforcement learning
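The abstract does not say which reinforcement learning algorithm, state features, reward, or heuristic policies PowerCoord uses. The following is therefore only a minimal sketch of the general idea under loudly stated assumptions. It uses a hypothetical node with two CPU sockets and two GPUs (the names `DOMAINS`, `POLICIES`, `allocate`, and `toy_throughput` and all wattages are made up for illustration) and three invented policies that each split one node power cap among those domains. A gradient-bandit learner, which keeps softmax preferences per observed state in the style of Sutton and Barto, stands in for the paper's method. The throughput function is synthetic and exists only to give the learner a reward signal.

```python
import math
import random

# Hypothetical node: two CPU sockets and two GPUs with (min W, max W) limits.
DOMAINS = {"cpu0": (50.0, 150.0), "cpu1": (50.0, 150.0),
           "gpu0": (75.0, 250.0), "gpu1": (75.0, 250.0)}


def allocate(cap, weights):
    """Give each domain its minimum, then split the remaining budget in
    proportion to `weights`, re-sharing what saturated domains cannot use."""
    alloc = {d: lo for d, (lo, _) in DOMAINS.items()}
    budget = cap - sum(alloc.values())
    if budget < 0:
        raise ValueError("power cap is below the sum of domain minimums")
    active = [d for d in DOMAINS if weights.get(d, 0.0) > 0.0]
    while budget > 1e-9 and active:
        total_w = sum(weights[d] for d in active)
        spent = 0.0
        for d in active:
            give = min(budget * weights[d] / total_w, DOMAINS[d][1] - alloc[d])
            alloc[d] += give
            spent += give
        budget -= spent
        # Domains at their maximum drop out; the rest share the leftover.
        active = [d for d in active if DOMAINS[d][1] - alloc[d] > 1e-9]
    return alloc


# Hypothetical heuristic policies: each maps a node cap to per-domain caps.
POLICIES = {
    "uniform": lambda cap: allocate(cap, {d: 1.0 for d in DOMAINS}),
    "gpu_first": lambda cap: allocate(
        cap, {d: 4.0 if d.startswith("gpu") else 1.0 for d in DOMAINS}),
    "cpu_first": lambda cap: allocate(
        cap, {d: 4.0 if d.startswith("cpu") else 1.0 for d in DOMAINS}),
}
NAMES = list(POLICIES)


def toy_throughput(alloc, state):
    """Synthetic reward: each domain's rate grows with sqrt(power), weighted
    by whether the current workload phase is GPU-bound or CPU-bound."""
    gpu_weight = 0.8 if state == "gpu_bound" else 0.2
    return sum((gpu_weight if d.startswith("gpu") else 1.0 - gpu_weight)
               * math.sqrt(w) for d, w in alloc.items())


class GradientBanditSelector:
    """Per-state softmax over policies; a policy whose reward beats the
    state's running-average reward gains probability, others lose it."""

    def __init__(self, n_actions, alpha=0.1, seed=0):
        self.n, self.alpha = n_actions, alpha
        self.rng = random.Random(seed)
        self.prefs, self.baseline, self.visits = {}, {}, {}

    def probs(self, state):
        h = self.prefs.setdefault(state, [0.0] * self.n)
        m = max(h)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in h]
        z = sum(exps)
        return [e / z for e in exps]

    def select(self, state):
        return self.rng.choices(range(self.n), weights=self.probs(state))[0]

    def update(self, state, action, reward):
        p = self.probs(state)
        base = self.baseline.get(state, reward)  # first visit: zero advantage
        adv = reward - base
        h = self.prefs[state]
        for a in range(self.n):
            h[a] += self.alpha * adv * ((1.0 if a == action else 0.0) - p[a])
        k = self.visits.get(state, 0) + 1
        self.visits[state] = k
        self.baseline[state] = base + (reward - base) / k  # running mean


CAP = 600.0  # hypothetical node power cap in watts
selector = GradientBanditSelector(len(NAMES), alpha=0.1, seed=42)
for step in range(4000):
    state = "gpu_bound" if step % 2 == 0 else "cpu_bound"  # toy phases
    action = selector.select(state)
    alloc = POLICIES[NAMES[action]](CAP)
    selector.update(state, action, toy_throughput(alloc, state))

for state in ("gpu_bound", "cpu_bound"):
    print(state, {n: round(x, 3) for n, x in zip(NAMES, selector.probs(state))})
```

Because the learner keeps a probability distribution per state rather than committing to one fixed policy, it still samples the other policies occasionally, which loosely mirrors the abstract's description of shifting the distribution of selected policies. A real controller would also have to enforce the cap against measured power and account for job priorities and deadlines, which this sketch omits.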
Price: Free