Deep Learning 1

The session Deep Learning 1 will be held on Tuesday, 2019-09-17, from 11:00 to 12:40 in room 0.004 (AOK-HS). The session chair is Elisa Fromont.

Talks

11:00 - 11:20
Importance Weighted Generative Networks (21)
Maurice Diesendruck (The University of Texas at Austin), Ethan R. Elenberg (ASAPP, Inc.), Rajat Sen (Amazon, Inc.), Guy W. Cole (The University of Texas at Austin), Sanjay Shakkottai (The University of Texas at Austin), Sinead A. Williamson (The University of Texas at Austin; CognitiveScale)

While deep generative networks can simulate from complex data distributions, their utility can be hindered by limitations on the data available for training. Specifically, the training data distribution may differ from the target sampling distribution due to sample selection bias, or because the training data comes from a different but related distribution. We present methods to accommodate this difference via importance weighting, which allow us to estimate a loss function with respect to a target distribution even if we cannot access that distribution directly. These estimators, which differentially weight the contribution of data to the loss function, offer theoretical guarantees that heuristic approaches lack, while giving impressive empirical performance in a variety of settings.

Reproducible Research
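
As a rough illustration of the importance-weighting idea sketched in the abstract above (not the authors' implementation), the snippet below computes a self-normalized importance-weighted average of per-sample losses, assuming the log density ratio log p_target(x)/p_train(x) is available for each sample; all names and numbers are invented for the example.

import torch

def importance_weighted_loss(per_sample_loss, log_ratios):
    # Self-normalized importance weights, proportional to p_target(x_i) / p_train(x_i).
    weights = torch.softmax(log_ratios, dim=0)
    # The weighted average estimates the expected loss under the target distribution,
    # even though the samples were drawn from the (biased) training distribution.
    return torch.sum(weights * per_sample_loss)

# Toy usage: samples over-represented in the training data get smaller weights.
losses = torch.tensor([0.9, 0.4, 0.7, 0.2])
log_ratios = torch.tensor([-1.0, 0.5, 0.0, 0.5])  # hypothetical log density ratios
print(importance_weighted_loss(losses, log_ratios))
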
11:20 - 11:40
Linearly Constrained Weights: Reducing Activation Shift for Faster Training of Neural Networks (98)
Takuro Kutsuna (Toyota Central R&D Labs. Inc.)

In this paper, we first identify activation shift, a simple but remarkable phenomenon in a neural network in which the preactivation value of a neuron has non-zero mean that depends on the angle between the weight vector of the neuron and the mean of the activation vector in the previous layer. We then propose linearly constrained weights (LCW) to reduce the activation shift in both fully connected and convolutional layers. The impact of reducing the activation shift in a neural network is studied from the perspective of how the variance of variables in the network changes through layer operations in both forward and backward chains. We also discuss its relationship to the vanishing gradient problem. Experimental results show that LCW enables a deep feedforward network with sigmoid activation functions to be trained efficiently by resolving the vanishing gradient problem. Moreover, combined with batch normalization, LCW improves generalization performance of both feedforward and convolutional networks.
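
A minimal sketch of one way to realize such a linear constraint on the weights (an assumed formulation for illustration, not necessarily the paper's exact one): each neuron's weight vector is projected onto the hyperplane of zero-sum vectors, so a constant offset in the previous layer's activations no longer shifts the preactivation mean. The class and variable names are invented.

import torch
import torch.nn as nn

class ZeroSumLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Project each weight row onto the hyperplane {w : sum_i w_i = 0}.
        w = self.weight - self.weight.mean(dim=1, keepdim=True)
        return x @ w.t() + self.bias

# Adding a constant to every input leaves the preactivation unchanged,
# because the projected weight rows are orthogonal to the all-ones vector.
layer = ZeroSumLinear(4, 3)
x = torch.randn(2, 4)
print(torch.allclose(layer(x), layer(x + 5.0), atol=1e-5))  # True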

11:40 - 12:00
Adversarial Invariant Feature Learning with Accuracy Constraint for Domain Generalization (278)
Kei Akuzawa (University of Tokyo), Yusuke Iwasawa (University of Tokyo), Yutaka Matsuo (University of Tokyo)

Learning domain-invariant representation is a dominant approach for domain generalization (DG), where we need to build a classifier that is robust toward domain shifts. However, previous domain-invariance-based methods overlooked the underlying dependency of classes on domains, which is responsible for the trade-off between classification accuracy and domain invariance. Because the primary purpose of DG is to classify unseen domains rather than the invariance itself, the improvement of the invariance can negatively affect DG performance under this trade-off. To overcome the problem, this study first expands the analysis of the trade-off by Xie et al., and provides the notion of accuracy-constrained domain invariance, which means the maximum domain invariance within a range that does not interfere with accuracy. We then propose a novel method, adversarial feature learning with accuracy constraint (AFLAC), which explicitly leads to that invariance on adversarial training. Empirical validations show that the performance of AFLAC is superior to that of domain-invariance-based methods on both synthetic and three real-world datasets, supporting the importance of considering the dependency and the efficacy of the proposed method.

Reproducible Research
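
For context, the sketch below shows only the generic domain-adversarial building block that invariance-based methods of this kind commonly rely on (a gradient reversal layer); it does not reproduce the accuracy-constrained objective that distinguishes AFLAC, and the names are invented for the example.

import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient sign, so the feature extractor is pushed to
        # hurt the domain classifier while still helping the label head.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage inside a model: features -> label head (normal gradients),
#                       grad_reverse(features) -> domain head (reversed gradients).
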
12:00 - 12:20
Meta-Learning for Black-box Optimization (576)
Vishnu TV (TCS Research, New Delhi), Pankaj Malhotra (TCS Research, New Delhi), Jyoti Narwariya (TCS Research, New Delhi), Lovekesh Vig (TCS Research, New Delhi), Gautam Shroff (TCS Research, New Delhi)

Recently, neural networks trained as optimizers under the "learning to learn" or meta-learning framework have been shown to be effective for a broad range of optimization tasks including derivative-free black-box function optimization. Recurrent neural networks (RNNs) trained to optimize a diverse set of synthetic non-convex differentiable functions via gradient descent have been effective at optimizing derivative-free black-box functions. In this work, we propose RNN-Opt: an approach for learning RNN-based optimizers for optimizing real-parameter single-objective continuous functions under limited budget constraints. Existing approaches utilize an observed improvement based meta-learning loss function for training such models. We propose training RNN-Opt by using synthetic non-convex functions with known (approximate) optimal values by directly using discounted regret as our meta-learning loss function. We hypothesize that a regret-based loss function mimics typical testing scenarios, and would therefore lead to better optimizers compared to optimizers trained only to propose queries that improve over previous queries. Further, RNN-Opt incorporates simple yet effective enhancements during training and inference procedures to deal with the following practical challenges: (i) unknown range of possible values for the black-box function to be optimized, and (ii) practical and domain-knowledge based constraints on the input parameters. We demonstrate the efficacy of RNN-Opt in comparison to existing methods on several synthetic as well as standard benchmark black-box functions along with an anonymized industrial constrained optimization problem.
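
An illustrative way to write a discounted-regret meta-loss for functions whose (approximate) optimum is known during meta-training; the exact form and discounting direction are assumptions for the example, not taken from the paper.

import torch

def discounted_regret_loss(f_values, f_opt, gamma=0.98):
    # f_values: f(x_t) for the T queries proposed by the learned optimizer (minimization).
    T = f_values.shape[0]
    regrets = torch.relu(f_values - f_opt)  # per-step distance above the known optimum
    # Discount earlier steps more, so ending close to the optimum is rewarded most.
    discounts = torch.tensor([gamma ** (T - 1 - t) for t in range(T)])
    return torch.sum(discounts * regrets)

# Toy usage: a trajectory of queried values on a function whose minimum is 0.0.
trajectory = torch.tensor([3.0, 1.5, 0.6, 0.1])
print(discounted_regret_loss(trajectory, f_opt=0.0))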

12:20 - 12:40
Training Discrete-Valued Neural Networks with Sign Activations Using Weight Distributions (596)
Wolfgang Roth (Graz University of Technology), Günther Schindler (Ruprecht Karls University, Heidelberg), Holger Fröning (Ruprecht Karls University, Heidelberg), Franz Pernkopf (Graz University of Technology)

Since resource-constrained devices hardly benefit from the trend towards ever-increasing neural network (NN) structures, there is growing interest in designing more hardware-friendly NNs. In this paper, we consider the training of NNs with discrete-valued weights and sign activation functions that can be implemented more efficiently in terms of inference speed, memory requirements, and power consumption. We build on the framework of probabilistic forward propagations using the local reparameterization trick, where instead of training a single set of NN weights we rather train a distribution over these weights. Using this approach, we can perform gradient-based learning by optimizing the continuous distribution parameters over discrete weights while at the same time performing backpropagation through the sign activation. In our experiments, we investigate the influence of the number of weights on the classification performance on several benchmark datasets, and we show that our method achieves state-of-the-art performance.

Reproducible Research
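
A minimal sketch of the kind of probabilistic forward propagation the abstract refers to (assumed details, not the authors' code): each weight is described by its mean and variance, the preactivation is treated as Gaussian, and the expected sign activation erf(mu / (sqrt(2) * sigma)) is differentiable in the distribution parameters, so they can be trained with ordinary backpropagation.

import torch

def expected_sign_layer(x, w_mean, w_var, eps=1e-8):
    # x: (batch, in), w_mean/w_var: (out, in). Returns E[sign(W x)] per unit.
    a_mean = x @ w_mean.t()                 # mean of the preactivation
    a_var = (x ** 2) @ w_var.t() + eps      # variance of the preactivation
    return torch.erf(a_mean / torch.sqrt(2.0 * a_var))

# Toy usage with ternary weights in {-1, 0, +1} parameterized by probabilities.
probs = torch.softmax(torch.randn(3, 5, 8), dim=0)        # p(-1), p(0), p(+1) per weight
values = torch.tensor([-1.0, 0.0, 1.0]).view(3, 1, 1)
w_mean = (probs * values).sum(dim=0)                       # (5, 8)
w_var = (probs * values ** 2).sum(dim=0) - w_mean ** 2
x = torch.randn(4, 8)
print(expected_sign_layer(x, w_mean, w_var).shape)         # torch.Size([4, 5])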
