Sep 16
Bren Hall 4011 1 pm 
Learning with big data is a challenging task that requires smart and efficient methods to extract useful information from data. Optimization methods, both convex and nonconvex are promising approaches to do this. In this talk I will review two classes of my work on prominent problems in convex and nonconvex optimization.
Beating the Perils of NonConvexity: Guaranteed Training of Neural Networks using Tensor Method: Neural networks provide a versatile tool for approximating functions of various inputs. Despite exciting achievements in application, a theoretical understanding of them is mostly lacking. Training a neural network is a highly nonconvex problem and backpropagation can get stuck in local optima. For the first time, we have a computationally efficient method for training neural networks that also has guaranteed generalization. This is part of our recently proposed general framework based on methodofmoments and tensor decomposition to efficiently learn different models such as neural networks and mixture of classifiers. Breaking Curse of Dimensionality: Stochastic Optimization in high dimensions: We have designed an efficient stochastic optimization method based on ADMM that is fast and cheap to implement, can be performed in parallel and can be used for any regularized optimization framework with some mild assumptions. We have proved that our algorithm obtains minimax optimal convergence rates for sparse optimization and robust PCA framework. Experiment results show that in the aforementioned scenarios, our method outperforms stateoftheart, i.e., yields smaller error with equal time. 
Oct 5
Bren Hall 4011 1 pm 
We are proposing an algorithm to test the accuracy of the predictions by earthquake early warning systems. Most warning systems predict the location and the magnitude of an ongoing earthquake via the earlyarriving seismic wave data. Our algorithm uses logarithm of ratios between observed ground motion envelopes and Virtual Seismologist’s (Cua G. and Heaton T.) predicted envelopes to assess the validity of system predictions. We quantify the uncertainty attached to our parameters using Bayesian probability approach. 
Oct 12
Bren Hall 4011 1 pm 
Importance sampling (IS) and its variant, annealed IS (AIS) have been widely used for estimating the partition function in graphical models, such as Markov random fields and deep generative models. However, IS tends to underestimate the partition function and is subject to high variance when the proposal distribution is more peaked than the target distribution. On the other hand, “reverse” versions of IS and AIS tend to overestimate the partition function, and degenerate when the target distribution is more peaked than the proposal distribution. We present a simple, general method that gives much more reliable and robust estimates than either IS (AIS) or reverse IS (AIS). Our method works by converting the estimation problem into a simple classification problem that discriminates between the samples drawn from the target and the proposal. We give both theoretical and empirical justification, and show that an annealed version of our method significantly outperforms both AIS and reverse AIS (Burda et al., 2015), which has been the stateoftheart for likelihood evaluation in deep generative models. Joint work with Qiang Liu, Jian Peng, and John Fisher. 
Oct 19
Bren Hall 4011 1 pm 
In this talk, we propose the multiversion coding problem for distributed storage. We consider a setting where there are n servers that aim to store v versions of a message, and there is a total ordering on the versions from the earliest to the latest. We assume that each message version has a given number of bits. Each server can receive any subset of the v versions and stores a function of the message versions it receives. The multiversion code we consider ensures that, a decoder that connects to any c out of the n servers can recover the message corresponding to the latest common version stored among those servers, or a message corresponding to a version that is later than the latest common version. We describe a simple and explicit achievable scheme, as well as an informationtheoretic converse. Moreover, we apply the multiversion code to one of the problems in distributed algorithms – the emulation of atomic shared memory in a messagepassing network – and improve upon previous algorithms up to a half in terms of storage cost. 
Oct 26
Bren Hall 4011 1 pm 
Network models provide a unifying framework for understanding dependencies among variables in medical, biological, and other sciences. Networks can be used to reveal underlying data structures, infer functional modules, and facilitate experiment design. In practice, however, size, uncertainty and complexity of the underlying associations render these applications challenging.
In this talk, we illustrate the use of spectral, combinatorial, and statistical inference techniques in several significant network science problems. First, we consider the problem of network alignment where the goal is to find a bijective mapping between nodes of two networks to maximize their overlapping edges while minimizing mismatches. To solve this combinatorial problem, we present a new scalable spectral algorithm, and establish its efficiency theoretically and experimentally over several synthetic and real networks. Next, we introduce network maximal correlation (NMC) as an essential measure to capture nonlinear associations in networks. We characterize NMC using geometric properties of Hilbert spaces and illustrate its application in learning network topology when variables have unknown nonlinear dependencies. Finally, we discuss the problem of learning low dimensional structures (such as clusters) in large networks, where we introduce logistic Random Dot Product Graphs, a new class of networks which includes most stochastic block models as well as other low dimensional structures. Using this model, we propose a spectral network clustering algorithm that possesses robust performance under different clustering setups. In all of these problems, we examine underlying fundamental limits and present efficient algorithms for solving them. We also highlight applications of the proposed algorithms to datadriven problems such as functional and regulatory genomics of human diseases, and cancer. Bio: Soheil Feizi is a PhD candidate at Massachusetts Institute of Technology (MIT), cosupervised by Prof. Muriel Médard and Prof. Manolis Kellis. His research interests include analysis of complex networks and the development of inference and learning methods based on Optimization, Information Theory, Machine Learning, Statistics, and Probability, with applications in Computational Biology, and beyond. He completed his B.Sc. at Sharif University of Technology, awarded as the best student of his class. He received the Jacobs Presidential Fellowship and EECS Great Educators Fellowship, both from MIT. He has been a finalist in the Qualcomm Innovation contest. He received an Ernst Guillemin Award for his Master of Science Thesis in the department of Electrical Engineering and Computer Science at MIT. 
Nov 2
Bren Hall 4011 1 pm 
Neuronal networks have enjoyed a resurgence both in the worlds of neuroscience, where they yield mathematical frameworks for thinking about complex neural datasets, and in machine learning, where they achieve state of the art results on a variety of tasks, including machine vision, speech recognition, and language translation. Despite their empirical success, a mathematical theory of how deep neural circuits, with many layers of cascaded nonlinearities, learn and compute remains elusive. We will discuss three recent vignettes in which ideas from statistical physics can shed light on this issue. In particular, we show how dynamical criticality can help in neural learning, how the nonintuitive geometry of high dimensional error landscapes can be exploited to speed up learning, and how modern ideas from nonequilibrium statistical physics, like the Jarzynski equality, can be extended to yield powerful algorithms for modeling complex probability distributions. Time permitting, we will also discuss the relationship between neural network learning dynamics and the developmental time course of semantic concepts in infants. 
Nov 9
Bren Hall 4011 1 pm 
Weighted MaxSAT is an extension of SAT in which each clause has an associated cost. The goal is to minimize the cost of falsified clauses. MaxSAT has been successfully applied to a number of domains including Bioinformatics, Telecommunications and Scheduling.
In this talk I will introduce the MaxSAT framework and discuss the main solving approaches. In particular, I will present Maxresolution and will show how it can be effectively used in the context of Depthfirst BranchandBound. 
Nov 16
Bren Hall 4011 1 pm 
Occlusion poses a significant difficulty for detecting and localizing object keypoints and subsequent finegrained identification. In this talk, I will describe a hierarchical deformable part model for face detection and keypoint localization that explicitly models part occlusion. The proposed model structure makes it possible to augment positive training data with large numbers of synthetically occluded instances. This allows us to easily incorporate the statistics of occlusion patterns in a discriminatively trained model. However, this model does not exploit bottomup cues such as detection of occluding contours and image segments. I will talk about how to modify the proposed model to utilize bottomup classspecific segmentation in order to jointly detect and segment out the foreground pixels belonging to the face. 
Nov 23

Thankgiving week
(no seminar)

Nov 30
Bren Hall 4011 1 pm 
In many classification problems labels are relatively scarce. One context in which this occurs is where we have labels for groups of instances but not for the instances themselves, as in multiinstance learning. Past work on this problem has typically focused on learning classifiers to make predictions at the group level. In this paper we focus on the problem of learning classifiers to make predictions at the instance level. To achieve this we propose a new objective function that encourages smoothness of inferred instancelevel labels based on instancelevel similarity, while at the same time respecting grouplevel label constraints. We apply this approach to the problem of predicting labels for sentences given labels for reviews, using a convolutional neural network to infer sentence similarity. The approach is evaluated using three large review data sets from IMDB, Yelp, and Amazon, and we demonstrate the proposed approach is both accurate and scalable compared to various alternatives. 
Dec 7

Finals week
(no seminar)
