Abstract:
The more we know about the resource usage patterns of workloads, the better we can allocate resources. Here we present a methodology to discover resource usage behaviors of containers running the training workloads of Deep Learning (DL) models. From monitoring data, we observe repeating patterns and similarities in resource usage across containers training different DL models. These repeating patterns can be leveraged by the scheduler or the resource autoscaler to reduce resource fragmentation and overall resource consumption in a dedicated DL cluster. Specifically, our approach combines Conditional Restricted Boltzmann Machines (CRBMs) and clustering techniques to discover common sequences of behaviors (phases) of containers running model training workloads in clusters providing IBM Deep Learning Services. By studying the resource usage patterns in each phase and the typical sequences of phases across containers, we discover a reduced set of prototypical executions that represent most of the observed executions. We then use statistical information from each phase to refine resource provisioning, dynamically tuning the amount of resources each container requires at each phase of its execution. Evaluation of our method shows that container resource usage follows typical patterns, and that exploiting them can reduce CPU and memory consumption by 30% relative to reactive policies, which is close to having \emph{a priori} knowledge of resource usage, while fulfilling resource demand over 95% of the time.
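To make the phase-discovery idea concrete, the following is a minimal sketch of the pipeline described above: per-container resource traces are split into windows, windows from all containers are clustered into a small set of phases, and a per-phase provisioning target is derived from the statistics of each phase. For brevity it uses simple windowed statistics and k-means in place of the CRBM features and clustering described in the paper; all names, the window length, the number of phases, and the 95th-percentile target are illustrative assumptions rather than the paper's actual configuration.

\begin{verbatim}
# Python sketch (assumes numpy and scikit-learn are available).
import numpy as np
from sklearn.cluster import KMeans

def window_features(trace, window=12):
    """Summarize a 1-D usage trace (e.g. CPU) into per-window statistics."""
    n = len(trace) // window
    wins = trace[: n * window].reshape(n, window)
    # Mean, std and max usage per window; CRBM hidden activations would
    # play this role in the paper's approach.
    return np.stack([wins.mean(1), wins.std(1), wins.max(1)], axis=1)

def discover_phases(traces, n_phases=4, window=12):
    """Cluster windows from all containers into a shared set of phases."""
    feats = np.vstack([window_features(t, window) for t in traces])
    labels = KMeans(n_clusters=n_phases, n_init=10,
                    random_state=0).fit_predict(feats)
    # Per-phase provisioning target, e.g. the 95th percentile of peak usage,
    # which an autoscaler could request while a container is in that phase.
    targets = {p: np.percentile(feats[labels == p, 2], 95)
               for p in range(n_phases)}
    return labels, targets

# Example with synthetic usage traces for three containers (arbitrary units).
rng = np.random.default_rng(0)
traces = [np.abs(rng.normal(50 + 30 * (i % 2), 10, size=600)) for i in range(3)]
phase_labels, phase_targets = discover_phases(traces)
print(phase_targets)
\end{verbatim}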