Orthogonalized SGD and Nested Architectures for Anytime Neural Networks

12/07/2020

Chengcheng Wan, Henry (Hank) Hoffmann, Shan Lu, Michael Maire

Keywords: Deep Learning - General

Abstract: We propose a novel variant of SGD customized for training network architectures that support anytime behavior: such networks produce a series of increasingly accurate outputs over time. Efficient architectural designs for these networks focus on re-using internal state; subnetworks must produce representations relevant for both immediate prediction and refinement by subsequent network stages. We consider traditional branched networks as well as a new class of recursively nested networks. Our new optimizer, Orthogonalized SGD, dynamically re-balances task-specific gradients when training a multitask network. In the context of anytime architectures, this optimizer projects gradients from later outputs onto a parameter subspace that does not interfere with those from earlier outputs. Experiments demonstrate that training with Orthogonalized SGD significantly improves generalization accuracy of anytime networks.
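
The projection idea in the abstract can be illustrated with a small NumPy sketch. This is only one plausible reading, not the authors' implementation: it assumes each output stage contributes one flattened gradient over the shared parameters, ordered from earliest to latest output, and it removes from each later gradient its components along the earlier ones (a Gram-Schmidt style step). The function names `orthogonalize_gradients` and `orthogonalized_sgd_step`, the `eps` threshold, and the plain summation of the projected gradients are all assumptions made for illustration.

```python
import numpy as np


def orthogonalize_gradients(grads, eps=1e-12):
    """Project each gradient onto the orthogonal complement of all earlier ones.

    `grads` is a list of 1-D arrays ordered from the earliest output to the
    latest. Hypothetical helper: the ordering and the Gram-Schmidt scheme are
    assumptions, not taken from the paper.
    """
    basis = []      # orthonormal directions spanned by earlier gradients
    projected = []
    for g in grads:
        r = np.asarray(g, dtype=float).copy()
        for b in basis:
            # Remove the component of r that lies along an earlier direction.
            r -= np.dot(r, b) * b
        projected.append(r)
        norm = np.linalg.norm(r)
        if norm > eps:
            basis.append(r / norm)
    return projected


def orthogonalized_sgd_step(params, grads, lr=0.1):
    """One plain SGD update using the sum of the projected gradients."""
    total = np.sum(orthogonalize_gradients(grads), axis=0)
    return params - lr * total


if __name__ == "__main__":
    g_early = np.array([1.0, 0.0])   # gradient from the earlier output
    g_late = np.array([1.0, 1.0])    # gradient from the later output
    proj = orthogonalize_gradients([g_early, g_late])
    # The projected later gradient has no component along the earlier one.
    print(np.dot(proj[1], g_early))
    print(orthogonalized_sgd_step(np.zeros(2), [g_early, g_late]))
```

In this sketch the earliest output keeps its full gradient, so later outputs can only move the shared parameters in directions that, to first order, leave the earlier loss unchanged.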

The talk and the respective paper are published at the ICML 2020 virtual conference.

Similar Papers

05/04/2021

Pipelined Backpropagation at Scale: Training Large Models without Batches

Atli Kosson, Vitaliy Chiley, Abhi Venigalla, Joel Hestness, Urs Koster

18:00

05/04/2021

Pipelined Backpropagation at Scale: Training Large Models without Batches

Atli Kosson, Vitaliy Chiley, Abhi Venigalla, Joel Hestness, Urs Koster

4:14

14/06/2020

Mnemonics Training: Multi-Class Incremental Learning Without Forgetting

Yaoyao Liu, Yuting Su, An-An Liu, Bernt Schiele, Qianru Sun

Keywords: incremental learning, continual learning, classification, recognition, transfer learning, representation learning, bilevel optimization, online learning, ImageNet, CIFAR-100

5:01

06/12/2020

Ode to an ODE

Krzysztof Choromanski, Jared Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani

3:16

02/02/2021

Towards Reusable Network Components by Learning Compatible Representations

Michael Gygli, Jasper Uijlings, Vittorio Ferrari

19:58

18/07/2021

AdaXpert: Adapting Neural Architecture for Growing Data

Shuaicheng Niu, Jiaxiang Wu, Guanghui Xu, Yifan Zhang, Yong Guo, Peilin Zhao, Peng Wang, Mingkui Tan

Keywords: Deep Learning, Reinforcement Learning and Planning, Reinforcement Learning, Algorithms, AutoML

5:14

03/05/2021

Teaching with Commentaries

Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton

Keywords: hypergradients, metalearning, learning to teach

5:11

26/08/2020

Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness

Antônio H. Ribeiro, Koen Tiels, Luis A. Aguirre, Thomas Schön

15:04

14/06/2020

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Hanrui Wang, Yujun Lin, Song Han

Keywords: efficiency, model compression, joint design, neural architecture search, channel pruning, mixed-precision quantization

1:00

12/07/2020

TaskNorm: Rethinking Batch Normalization for Meta-Learning

John Bronskill, Jonathan Gordon, James Requeima, Sebastian Nowozin, Richard Turner

Keywords: Transfer, Multitask and Meta-learning

13:56

02/02/2021

Self-Progressing Robust Training

Minhao Cheng, Pin-Yu Chen, Sijia Liu, Shiyu Chang, Cho-Jui Hsieh, Payel Das

14:34

26/04/2020

Adversarially robust transfer learning

Ali Shafahi, Parsa Saadatpanah, Chen Zhu, Amin Ghiasi, Christoph Studer, David Jacobs, Tom Goldstein

4:58

18/07/2021

Bayesian Structural Adaptation for Continual Learning

Abhishek Kumar, Sunabha Chatterjee, Piyush Rai

Keywords: Probabilistic Methods, Bayesian Methods

7:39

06/12/2021

Meta-learning to Improve Pre-training

Aniruddh Raghu, Jonathan Lorraine, Simon Kornblith, Matthew McDermott, David Duvenaud

Keywords: deep learning, optimization, graph learning, meta learning

12:57

14/06/2020

Exemplar Normalization for Learning Deep Representation

Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo

Keywords: normalization, learning to normalize, sample-adaptive, deep learning, image classification, semantic segmentation

1:00

08/12/2020

E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks

Nikolaos Stylianou, Ioannis Vlahavas

8:49

14/06/2020

DNU: Deep Non-Local Unrolling for Computational Spectral Imaging

Lizhi Wang, Chen Sun, Maoqing Zhang, Ying Fu, Hua Huang

Keywords: computational spectral imaging, spectral image reconstruction, deep unrolling, non-local similarity, deep prior, image compressive sensing

1:01

18/07/2021

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

Jianfei Chen, Lianmin Zheng, Zhewei Yao, Dequan Wang, Ion Stoica, Michael Mahoney, Joseph E Gonzalez

Keywords: Algorithms, Large Scale Learning

18:54

02/02/2021

Dynamically Grown Generative Adversarial Networks

Lanlan Liu, Yuting Zhang, Jia Deng, Stefano Soatto

14:46

06/12/2021

Topographic VAEs learn Equivariant Capsules

T. Anderson Keller, Max Welling

Keywords: deep learning, generative model, graph learning

9:58

02/02/2021

Toward Robust Long Range Policy Transfer

Wei-Cheng Tseng, Jin-Siang Lin, Yao-Min Feng, Min Sun

14:02

26/10/2020

Integrating Acting, Planning, and Learning in Hierarchical Operational Models

Sunandita Patra, James Mason, Amit Kumar, Malik Ghallab, Paolo Traverso, Dana Nau

Keywords: integrated planning and acting, integrated planning and learning, hierarchical operational models, online planning, dynamic environments

11:35

18/07/2021

Improving Generalization in Meta-learning via Task Augmentation

Huaxiu Yao, Long-Kai Huang, Linjun Zhang, Ying Wei, Li Tian, James Zou, Junzhou Huang, Zhenhui (Jessie) Li

Keywords: Algorithms, Multitask, Transfer, and Meta Learning

8:27

06/12/2020

ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks

Shuxuan Guo, Jose M. Alvarez, Mathieu Salzmann

3:20

18/07/2021

Learning Neural Network Subspaces

Mitchell Wortsman, Maxwell Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari

Keywords: Deep Learning, Applications, Dialog- or Communication-Based Learning, Algorithms, Representation Learning

5:07

06/12/2020

Optimizing Neural Networks via Koopman Operator Theory

Akshunna S. Dogra, Will Redman

3:12

06/12/2021

Heuristic-Guided Reinforcement Learning

Ching-An Cheng, Andrey Kolobov, Adith Swaminathan

Keywords: theory, reinforcement learning and planning

8:39

05/01/2021

Analyzing Deep Neural Network's Transferability via Frechet Distance

Yifan Ding, Liqiang Wang, Boqing Gong

4:59

19/04/2021

Bootstrapping relation extractors using syntactic search by examples

Matan Eyal, Asaf Amrami, Hillel Taub-Tabib, Yoav Goldberg

9:55

03/05/2021

Latent Skill Planning for Exploration and Transfer

Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti

Keywords: Partial Amortization, Model Predictive Control, Planning, Mutual Information, Skill Discovery, World Models, Model-Based Reinforcement Learning

5:10

22/11/2021

Meta-learning the Learning Trends Shared Across Tasks

Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

Keywords: Meta-learning, Few-shot learning

2:38

14/06/2020

AdaBits: Neural Network Quantization With Adaptive Bit-Widths

Qing Jin, Linjie Yang, Zhenyu Liao

Keywords: neural network quantization, adaptive model, model compression

1:01

18/07/2021

Parallelizing Legendre Memory Unit Training

Narsimha Reddy Chilkuri, Chris Eliasmith

Keywords: Deep Learning, Architectures

5:13

12/07/2020

Generative Flows with Matrix Exponential

Changyi Xiao, Ligang Liu

Keywords: Deep Learning - Generative Models and Autoencoders

8:47

06/12/2020

Adaptive Gradient Quantization for Data-Parallel SGD

Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Dan Roy, Ali Ramezani-Kebrya

3:20

05/01/2021

MPRNet: Multi-Path Residual Network for Lightweight Image Super Resolution

Armin Mehri, Parichehr B. Ardakani, Angel D. Sappa

4:57

04/07/2020

Learning Architectures from an Extended Search Space for Language Modeling

Yinqiao Li, Chi Hu, Yuhao Zhang, Nuo Xu, Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu, Changliang Li

Keywords: Language Modeling, intra-cell NAS, recurrent modeling, CoNLL task

10:28

12/07/2020

Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data

Felipe Petroski Such, Aditya Rawal, Joel Lehman, Kenneth Stanley, Jeffrey Clune

Keywords: Transfer, Multitask and Meta-learning

7:25

03/05/2021

Dataset Meta-Learning from Kernel Ridge-Regression

Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Keywords: dataset corruption, infinite-width networks, neural kernels, kernel-ridge regression, dataset compression, dataset distillation, meta-learning

4:59

13/04/2021

On the importance of hyperparameter optimization for model-based reinforcement learning

Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra

2:59