# Winter 2016

## Jan 11: Statistical Latent Variable and Event Models for Network Data

Padhraic Smyth, Professor, Department of Computer Science, University of California, Irvine. Bren Hall 4011, 1 pm.

Social network analysis has a long and successful history in the social sciences, often with a focus on relatively small survey-based data sets. In the past decade, driven by the ease of automatically collecting large-scale network data sets, there has been significant interest in developing new statistical and machine learning techniques for network analysis. In this talk we will focus on two general modeling themes in this context: the use of latent variables for low-dimensional vector-based network representation models, and event-based models for temporal network data. We will review the representational capabilities of these models from a generative perspective, discuss some of the challenges of parameter estimation that arise, and emphasize the role of predictive evaluation. The talk will conclude with a brief discussion of future directions in this general area.

Based on joint work with Zach Butler, Chris DuBois, Jimmy Foulds, and Carter Butts.

## Jan 18: No Seminar (MLK Day)

## Jan 25: Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models

James Foulds, Postdoctoral Fellow, Department of Computer Science, University of California, San Diego. Bren Hall 4011, 1 pm.

Topic models have become increasingly prominent text-analytic machine learning tools for research in the social sciences and the humanities. In particular, custom topic models can be developed to answer specific research questions. The design of these models requires a nontrivial amount of effort and expertise, motivating general-purpose topic modeling frameworks. In this talk I will introduce latent topic networks, a flexible class of richly structured topic models designed to facilitate applied research.
Custom models can straightforwardly be developed in this framework with an intuitive first-order logical probabilistic programming language. Latent topic networks admit scalable training via a parallelizable EM algorithm which leverages ADMM in the M-step. I demonstrate the broad applicability of the models with case studies on modeling influence in citation networks, and U.S. Presidential State of the Union addresses. This talk is based on joint work with Lise Getoor and Shachi Kumar from the University of California, Santa Cruz, published at ICML 2015.

## Feb 1: Discovery of Latent Factors in High-dimensional Data

Furong Huang, PhD Candidate, Department of Electrical Engineering, University of California, Irvine. Bren Hall 4011, 1 pm.

Latent or hidden variable models have applications in almost every domain, e.g., social network analysis, natural language processing, computer vision and computational biology. Training latent variable models is challenging due to non-convexity of the likelihood objective function. An alternative method is based on the spectral decomposition of low order moment matrices and tensors. This versatile framework is guaranteed to estimate the correct model consistently. I will discuss my results on convergence to a globally optimal solution for stochastic gradient descent, despite non-convexity of the objective. I will then discuss large-scale implementations (which are highly parallel and scalable) of spectral methods, carried out on CPU/GPU and Spark platforms. We obtain gains in both accuracy and running time by several orders of magnitude compared to the state-of-the-art variational methods. I will discuss the following applications in detail: (1) learning hidden user commonalities (communities) in social networks, and (2) learning sentence embeddings for paraphrase detection using convolutional models.
More generally, I have applied the methods to a variety of problems such as text and social network analysis, healthcare analytics, and cataloging neuronal cell types in neuroscience.

## Feb 8: Non-convex Optimization in Machine Learning: Provable Guarantees using Spectral Methods

Majid Janzamin, PhD Candidate, Department of Electrical Engineering, University of California, Irvine. Bren Hall 4011, 1 pm.

Optimization lies at the core of machine learning. However, most machine learning problems entail non-convex optimization. In this talk, I will show how spectral and tensor methods can yield guaranteed convergence to globally optimal solutions under transparent conditions for a range of machine learning problems. In the first part, I will explain how tensor methods are useful for learning latent variable models in an unsupervised manner. The focus of my work is on the overcomplete regime, where the hidden dimension is larger than the observed dimensionality. I describe how tensor methods enable us to learn these models in the overcomplete regime with theoretical guarantees in recovering the parameters of the model. I also provide efficient sample complexity results for training these models. Next, I will describe a new method for training neural networks for which we provide theoretical guarantees on the performance of the algorithm. We have developed a computationally efficient algorithm for training a two-layer neural network using method-of-moments and tensor decomposition techniques.

## Feb 10: Subsampling and sketching in machine learning

Yining Wang, PhD Student, Machine Learning Department, CMU. Bren Hall 3011, 3 pm.

I will discuss subsampling and sketching with their applications and analysis in machine learning. They can be viewed not only as tools to improve computational and storage efficiency of existing learning algorithms, but also as settings that characterize data measurement/availability/privacy constraints in modern machine learning applications.
In this talk I will introduce my recent work, which analyzes subsampling and sketching settings in three popular machine learning algorithms: tensor factorization, subspace clustering and linear regression.

## Feb 15: No Seminar (Presidents Day)

## Feb 22: Building rich recommender systems with visual, relational, and temporal information

Julian McAuley, Assistant Professor, Computer Science & Engineering, UC San Diego. Bren Hall 4011, 1 pm.

Understanding the semantics of preferences and behavior is incredibly complicated, especially in settings where the visual appearance of items influences our decisions. Three challenges that I’ll discuss in this talk include (1) how can we uncover the semantics of visual preferences, especially in sparse or long-tailed data, where new items are constantly introduced? (2) How can we use visual data to understand the relationships between items, and in particular what makes two items “visually compatible”? And (3) how can we understand the temporal dynamics of visual preferences, in order to uncover how “fashions” have evolved over time?

## Feb 29: No Seminar (Cancelled)

## Mar 7: Exploiting Compiled Heuristic Errors to Guide AND/OR Search for Graphical Models

William Lam, PhD Candidate, Department of Computer Science, University of California, Irvine. Bren Hall 4011, 1 pm.

We investigate the potential of look-ahead in the context of AND/OR search in graphical models using the mini-bucket heuristic for combinatorial optimization tasks (e.g., MAP/MPE or weighted CSPs). We present and analyze the complexity of computing the residual (a.k.a. Bellman update) of the mini-bucket heuristic, which we call “bucket errors”, and show how this can be used to identify which parts of the search space are more likely to benefit from look-ahead, therefore facilitating a method to bound its overhead. We also rephrase the look-ahead computation as a graphical model to make use of structure-exploiting inference schemes.
In our empirical results, we demonstrate that our methods can be used to cost-effectively increase the power of branch-and-bound search. In the second part of the talk, we show how bucket errors can be used to improve the performance of AND/OR best-first search algorithms for providing lower bounds on the min-sum problem. In our preliminary experiments, we show that when expanding nodes for the AO* algorithm, using bucket errors as a subproblem ordering heuristic can allow us to expand fewer nodes to arrive at the optimal solution compared to the existing ordering approach.

# Fall 2015

## Sep 16: Taming the Wild: Optimization Approaches to Big Data

Hanie Sedghi, Graduate Student, Department of Electrical Engineering, University of Southern California. Bren Hall 4011, 1 pm.

Learning with big data is a challenging task that requires smart and efficient methods to extract useful information from data. Optimization methods, both convex and nonconvex, are promising approaches to do this. In this talk I will review two classes of my work on prominent problems in convex and nonconvex optimization.

Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Method. Neural networks provide a versatile tool for approximating functions of various inputs. Despite exciting achievements in application, a theoretical understanding of them is mostly lacking. Training a neural network is a highly nonconvex problem and backpropagation can get stuck in local optima. For the first time, we have a computationally efficient method for training neural networks that also has guaranteed generalization. This is part of our recently proposed general framework based on method-of-moments and tensor decomposition to efficiently learn different models such as neural networks and mixture of classifiers.

Breaking Curse of Dimensionality: Stochastic Optimization in high dimensions. We have designed an efficient stochastic optimization method based on ADMM that is fast and cheap to implement, can be performed in parallel and can be used for any regularized optimization framework with some mild assumptions. We have proved that our algorithm obtains minimax optimal convergence rates for sparse optimization and robust PCA framework. Experimental results show that in the aforementioned scenarios, our method outperforms the state of the art, i.e., yields smaller error with equal time.

## Oct 5: Using Waveform Envelopes in a Bayesian Framework for Earthquake Early Warning

Gokcan Karakus, Graduate Student, Department of Civil Engineering, Caltech. Bren Hall 4011, 1 pm.

We are proposing an algorithm to test the accuracy of the predictions by earthquake early warning systems. Most warning systems predict the location and the magnitude of an ongoing earthquake via the early-arriving seismic wave data. Our algorithm uses the logarithm of ratios between observed ground motion envelopes and Virtual Seismologist’s (Cua G. and Heaton T.) predicted envelopes to assess the validity of system predictions. We quantify the uncertainty attached to our parameters using a Bayesian probability approach.

## Oct 12: Discriminance Sampling

Alexander Ihler, Associate Professor, Department of Computer Science, University of California, Irvine. Bren Hall 4011, 1 pm.

Importance sampling (IS) and its variant, annealed IS (AIS), have been widely used for estimating the partition function in graphical models, such as Markov random fields and deep generative models. However, IS tends to underestimate the partition function and is subject to high variance when the proposal distribution is more peaked than the target distribution. On the other hand, “reverse” versions of IS and AIS tend to overestimate the partition function, and degenerate when the target distribution is more peaked than the proposal distribution. We present a simple, general method that gives much more reliable and robust estimates than either IS (AIS) or reverse IS (AIS). Our method works by converting the estimation problem into a simple classification problem that discriminates between the samples drawn from the target and the proposal. We give both theoretical and empirical justification, and show that an annealed version of our method significantly outperforms both AIS and reverse AIS (Burda et al., 2015), which has been the state-of-the-art for likelihood evaluation in deep generative models.
Joint work with Qiang Liu, Jian Peng, and John Fisher.

## Oct 19: Multi-version Coding for Consistent Distributed Storage

Zhiying Wang, Assistant Professor, Department of Electrical Engineering, University of California, Irvine. Bren Hall 4011, 1 pm.

In this talk, we propose the multi-version coding problem for distributed storage. We consider a setting where there are n servers that aim to store v versions of a message, and there is a total ordering on the versions from the earliest to the latest. We assume that each message version has a given number of bits. Each server can receive any subset of the v versions and stores a function of the message versions it receives. The multi-version code we consider ensures that a decoder that connects to any c out of the n servers can recover the message corresponding to the latest common version stored among those servers, or a message corresponding to a version that is later than the latest common version. We describe a simple and explicit achievable scheme, as well as an information-theoretic converse. Moreover, we apply the multi-version code to one of the problems in distributed algorithms, the emulation of atomic shared memory in a message-passing network, and improve upon previous algorithms by up to a half in terms of storage cost.

## Oct 26: Learning (from) networks: fundamental limits, algorithms, and applications

Soheil Feizi, Graduate Student, CSAIL, MIT. Bren Hall 4011, 1 pm.

Network models provide a unifying framework for understanding dependencies among variables in medical, biological, and other sciences. Networks can be used to reveal underlying data structures, infer functional modules, and facilitate experiment design. In practice, however, size, uncertainty and complexity of the underlying associations render these applications challenging. In this talk, we illustrate the use of spectral, combinatorial, and statistical inference techniques in several significant network science problems.
First, we consider the problem of network alignment where the goal is to find a bijective mapping between nodes of two networks to maximize their overlapping edges while minimizing mismatches. To solve this combinatorial problem, we present a new scalable spectral algorithm, and establish its efficiency theoretically and experimentally over several synthetic and real networks. Next, we introduce network maximal correlation (NMC) as an essential measure to capture nonlinear associations in networks. We characterize NMC using geometric properties of Hilbert spaces and illustrate its application in learning network topology when variables have unknown nonlinear dependencies. Finally, we discuss the problem of learning low dimensional structures (such as clusters) in large networks, where we introduce logistic Random Dot Product Graphs, a new class of networks which includes most stochastic block models as well as other low dimensional structures. Using this model, we propose a spectral network clustering algorithm that possesses robust performance under different clustering setups. In all of these problems, we examine underlying fundamental limits and present efficient algorithms for solving them. We also highlight applications of the proposed algorithms to data-driven problems such as functional and regulatory genomics of human diseases, and cancer. Bio: Soheil Feizi is a PhD candidate at Massachusetts Institute of Technology (MIT), co-supervised by Prof. Muriel Médard and Prof. Manolis Kellis. His research interests include analysis of complex networks and the development of inference and learning methods based on Optimization, Information Theory, Machine Learning, Statistics, and Probability, with applications in Computational Biology, and beyond. He completed his B.Sc. at Sharif University of Technology, awarded as the best student of his class. He received the Jacobs Presidential Fellowship and EECS Great Educators Fellowship, both from MIT. 
He has been a finalist in the Qualcomm Innovation contest. He received an Ernst Guillemin Award for his Master of Science Thesis in the department of Electrical Engineering and Computer Science at MIT.

## Nov 2: The Statistical Physics of Deep Learning: on the Beneficial Roles of Dynamic Criticality, Random Landscapes, and the Reversal of Time

Surya Ganguli, Assistant Professor, Department of Applied Physics, Stanford University. Bren Hall 4011, 1 pm.

Neuronal networks have enjoyed a resurgence both in the worlds of neuroscience, where they yield mathematical frameworks for thinking about complex neural datasets, and in machine learning, where they achieve state-of-the-art results on a variety of tasks, including machine vision, speech recognition, and language translation. Despite their empirical success, a mathematical theory of how deep neural circuits, with many layers of cascaded nonlinearities, learn and compute remains elusive. We will discuss three recent vignettes in which ideas from statistical physics can shed light on this issue. In particular, we show how dynamical criticality can help in neural learning, how the non-intuitive geometry of high dimensional error landscapes can be exploited to speed up learning, and how modern ideas from non-equilibrium statistical physics, like the Jarzynski equality, can be extended to yield powerful algorithms for modeling complex probability distributions. Time permitting, we will also discuss the relationship between neural network learning dynamics and the developmental time course of semantic concepts in infants.

## Nov 9: On Max-SAT solving

Javier Larrosa, Professor, Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya. Bren Hall 4011, 1 pm.

Weighted Max-SAT is an extension of SAT in which each clause has an associated cost. The goal is to minimize the cost of falsified clauses. Max-SAT has been successfully applied to a number of domains including Bioinformatics, Telecommunications and Scheduling.
In this talk I will introduce the Max-SAT framework and discuss the main solving approaches. In particular, I will present Max-resolution and will show how it can be effectively used in the context of Depth-first Branch-and-Bound.

## Nov 16: Detecting and Localizing Occluded Faces

Golnaz Ghiasi, Graduate Student, Department of Computer Science, University of California, Irvine. Bren Hall 4011, 1 pm.

Occlusion poses a significant difficulty for detecting and localizing object keypoints and subsequent fine-grained identification. In this talk, I will describe a hierarchical deformable part model for face detection and keypoint localization that explicitly models part occlusion. The proposed model structure makes it possible to augment positive training data with large numbers of synthetically occluded instances. This allows us to easily incorporate the statistics of occlusion patterns in a discriminatively trained model. However, this model does not exploit bottom-up cues such as detection of occluding contours and image segments. I will talk about how to modify the proposed model to utilize bottom-up class-specific segmentation in order to jointly detect and segment out the foreground pixels belonging to the face.

## Nov 23: Thanksgiving week (no seminar)

## Nov 30: From Group to Individual Labels using Deep Features

Dimitrios Kotzias, Graduate Student, Department of Computer Science, University of California, Irvine. Bren Hall 4011, 1 pm.

In many classification problems labels are relatively scarce. One context in which this occurs is where we have labels for groups of instances but not for the instances themselves, as in multi-instance learning. Past work on this problem has typically focused on learning classifiers to make predictions at the group level. In this paper we focus on the problem of learning classifiers to make predictions at the instance level.
To achieve this we propose a new objective function that encourages smoothness of inferred instance-level labels based on instance-level similarity, while at the same time respecting group-level label constraints. We apply this approach to the problem of predicting labels for sentences given labels for reviews, using a convolutional neural network to infer sentence similarity. The approach is evaluated using three large review data sets from IMDB, Yelp, and Amazon, and we demonstrate the proposed approach is both accurate and scalable compared to various alternatives.

## Dec 7: Finals week (no seminar)
