Fall 2015

Standard

Sep 16
Bren Hall 4011
1 pm
Hanie Sedghi
Graduate Student
Department of Electrical Engineering
University of Southern California

Learning with big data is a challenging task that requires smart and efficient methods to extract useful information from data. Optimization methods, both convex and nonconvex are promising approaches to do this. In this talk I will review two classes of my work on prominent problems in convex and nonconvex optimization.

Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Method: Neural networks provide a versatile tool for approximating functions of various inputs. Despite exciting achievements in application, a theoretical understanding of them is mostly lacking. Training a neural network is a highly nonconvex problem and backpropagation can get stuck in local optima. For the first time, we have a computationally efficient method for training neural networks that also has guaranteed generalization. This is part of our recently proposed general framework based on method-of-moments and tensor decomposition to efficiently learn different models such as neural networks and mixture of classifiers.

Breaking Curse of Dimensionality: Stochastic Optimization in high dimensions: We have designed an efficient stochastic optimization method based on ADMM that is fast and cheap to implement, can be performed in parallel and can be used for any regularized optimization framework with some mild assumptions. We have proved that our algorithm obtains minimax optimal convergence rates for sparse optimization and robust PCA framework. Experiment results show that in the aforementioned scenarios, our method outperforms state-of-the-art, i.e., yields smaller error with equal time.

Oct 5
Bren Hall 4011
1 pm
Gokcan Karakus
Graduate Student
Department of Civil Engineering
Caltech

We are proposing an algorithm to test the accuracy of the predictions by earthquake early warning systems. Most warning systems predict the location and the magnitude of an ongoing earthquake via the early-arriving seismic wave data. Our algorithm uses logarithm of ratios between observed ground motion envelopes and Virtual Seismologist’s (Cua G. and Heaton T.) predicted envelopes to assess the validity of system predictions. We quantify the uncertainty attached to our parameters using Bayesian probability approach.
Oct 12
Bren Hall 4011
1 pm
Alexander Ihler
Associate Professor
Department of Computer Science
University of California, Irvine

Importance sampling (IS) and its variant, annealed IS (AIS) have been widely used for estimating the partition function in graphical models, such as Markov random fields and deep generative models. However, IS tends to underestimate the partition function and is subject to high variance when the proposal distribution is more peaked than the target distribution. On the other hand, “reverse” versions of IS and AIS tend to overestimate the partition function, and degenerate when the target distribution is more peaked than the proposal distribution. We present a simple, general method that gives much more reliable and robust estimates than either IS (AIS) or reverse IS (AIS). Our method works by converting the estimation problem into a simple classification problem that discriminates between the samples drawn from the target and the proposal. We give both theoretical and empirical justification, and show that an annealed version of our method significantly outperforms both AIS and reverse AIS (Burda et al., 2015), which has been the state-of-the-art for likelihood evaluation in deep generative models. Joint work with Qiang Liu, Jian Peng, and John Fisher.
Oct 19
Bren Hall 4011
1 pm
Zhiying Wang
Assistant Professor
Department of Electrical Engineering
University of California, Irvine

In this talk, we propose the multi-version coding problem for distributed storage. We consider a setting where there are n servers that aim to store v versions of a message, and there is a total ordering on the versions from the earliest to the latest. We assume that each message version has a given number of bits. Each server can receive any subset of the v versions and stores a function of the message versions it receives. The multi-version code we consider ensures that, a decoder that connects to any c out of the n servers can recover the message corresponding to the latest common version stored among those servers, or a message corresponding to a version that is later than the latest common version. We describe a simple and explicit achievable scheme, as well as an information-theoretic converse. Moreover, we apply the multi-version code to one of the problems in distributed algorithms – the emulation of atomic shared memory in a message-passing network – and improve upon previous algorithms up to a half in terms of storage cost.
Oct 26
Bren Hall 4011
1 pm
Soheil Feizi
Graduate Student
CSAIL
MIT

Network models provide a unifying framework for understanding dependencies among variables in medical, biological, and other sciences. Networks can be used to reveal underlying data structures, infer functional modules, and facilitate experiment design. In practice, however, size, uncertainty and complexity of the underlying associations render these applications challenging.

In this talk, we illustrate the use of spectral, combinatorial, and statistical inference techniques in several significant network science problems. First, we consider the problem of network alignment where the goal is to find a bijective mapping between nodes of two networks to maximize their overlapping edges while minimizing mismatches. To solve this combinatorial problem, we present a new scalable spectral algorithm, and establish its efficiency theoretically and experimentally over several synthetic and real networks. Next, we introduce network maximal correlation (NMC) as an essential measure to capture nonlinear associations in networks. We characterize NMC using geometric properties of Hilbert spaces and illustrate its application in learning network topology when variables have unknown nonlinear dependencies. Finally, we discuss the problem of learning low dimensional structures (such as clusters) in large networks, where we introduce logistic Random Dot Product Graphs, a new class of networks which includes most stochastic block models as well as other low dimensional structures. Using this model, we propose a spectral network clustering algorithm that possesses robust performance under different clustering setups. In all of these problems, we examine underlying fundamental limits and present efficient algorithms for solving them. We also highlight applications of the proposed algorithms to data-driven problems such as functional and regulatory genomics of human diseases, and cancer.

Bio: Soheil Feizi is a PhD candidate at Massachusetts Institute of Technology (MIT), co-supervised by Prof. Muriel Médard and Prof. Manolis Kellis. His research interests include analysis of complex networks and the development of inference and learning methods based on Optimization, Information Theory, Machine Learning, Statistics, and Probability, with applications in Computational Biology, and beyond. He completed his B.Sc. at Sharif University of Technology, awarded as the best student of his class. He received the Jacobs Presidential Fellowship and EECS Great Educators Fellowship, both from MIT. He has been a finalist in the Qualcomm Innovation contest. He received an Ernst Guillemin Award for his Master of Science Thesis in the department of Electrical Engineering and Computer Science at MIT.

Nov 2
Bren Hall 4011
1 pm
Surya Ganguli
Assistant Professor
Department of Applied Physics
Stanford University

Neuronal networks have enjoyed a resurgence both in the worlds of neuroscience, where they yield mathematical frameworks for thinking about complex neural datasets, and in machine learning, where they achieve state of the art results on a variety of tasks, including machine vision, speech recognition, and language translation. Despite their empirical success, a mathematical theory of how deep neural circuits, with many layers of cascaded nonlinearities, learn and compute remains elusive. We will discuss three recent vignettes in which ideas from statistical physics can shed light on this issue. In particular, we show how dynamical criticality can help in neural learning, how the non-intuitive geometry of high dimensional error landscapes can be exploited to speed up learning, and how modern ideas from non-equilibrium statistical physics, like the Jarzynski equality, can be extended to yield powerful algorithms for modeling complex probability distributions. Time permitting, we will also discuss the relationship between neural network learning dynamics and the developmental time course of semantic concepts in infants.
Nov 9
Bren Hall 4011
1 pm
Javier Larrosa
Professor
Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya

Weighted Max-SAT is an extension of SAT in which each clause has an associated cost. The goal is to minimize the cost of falsified clauses. Max-SAT has been successfully applied to a number of domains including Bioinformatics, Telecommunications and Scheduling.

In this talk I will introduce the Max-SAT framework and discuss the main solving approaches. In particular, I will present Max-resolution and will show how it can be effectively used in the context of Depth-first Branch-and-Bound.

Nov 16
Bren Hall 4011
1 pm
Golnaz Ghiasi
Graduate Student
Department of Computer Science
University of California, Irvine

Occlusion poses a significant difficulty for detecting and localizing object keypoints and subsequent fine-grained identification. In this talk, I will describe a hierarchical deformable part model for face detection and keypoint localization that explicitly models part occlusion. The proposed model structure makes it possible to augment positive training data with large numbers of synthetically occluded instances. This allows us to easily incorporate the statistics of occlusion patterns in a discriminatively trained model. However, this model does not exploit bottom-up cues such as detection of occluding contours and image segments. I will talk about how to modify the proposed model to utilize bottom-up class-specific segmentation in order to jointly detect and segment out the foreground pixels belonging to the face.
Nov 23
Thankgiving week
(no seminar)

Nov 30
Bren Hall 4011
1 pm
Dimitrios Kotzias
Graduate Student
Department of Computer Science
University of California, Irvine

In many classification problems labels are relatively scarce. One context in which this occurs is where we have labels for groups of instances but not for the instances themselves, as in multi-instance learning. Past work on this problem has typically focused on learning classifiers to make predictions at the group level. In this paper we focus on the problem of learning classifiers to make predictions at the instance level. To achieve this we propose a new objective function that encourages smoothness of inferred instance-level labels based on instance-level similarity, while at the same time respecting group-level label constraints. We apply this approach to the problem of predicting labels for sentences given labels for reviews, using a convolutional neural network to infer sentence similarity. The approach is evaluated using three large review data sets from IMDB, Yelp, and Amazon, and we demonstrate the proposed approach is both accurate and scalable compared to various alternatives.
Dec 7
Finals week
(no seminar)