Sep 22
Bren Hall 4011 1 pm |
Burr Settles
Duolingo
Duolingo is a language education platform that teaches 20 languages to more than 150 million students worldwide. Our free flagship learning app is the \#1 way to learn a language online, and is the most-downloaded education app for both Android and iOS devices. In this talk, I will describe the Duolingo system and several of our empirical research projects to date, which combine machine learning with computational linguistics and psychometrics to improve learning, engagement, and even language proficiency assessment through our products. |
Sep 26
Bren Hall 4011 1 pm |
Convolutional Neural Net (CNN) architectures have terrific recognition performance but rely on spatial pooling which makes it difficult to adapt them to tasks that require dense, pixel-accurate labeling. We make two contributions to solving this problem: (1) We demonstrate that while the apparent spatial resolution of convolutional feature maps is low, the high-dimensional feature representation contains significant sub-pixel localization information. (2) We describe a multi-resolution reconstruction architecture based on a Laplacian pyramid that uses skip connections from higher resolution feature maps and multiplicative gating to successively refine segment boundaries reconstructed from lower-resolution maps. This approach yields state-of-the-art semantic segmentation results on the PASCAL VOC and Cityscapes segmentation benchmarks without resorting to more complex random-field inference or instance detection driven architectures. |
Oct 3
Bren Hall 4011 1 pm |
Despite the rapid development of computer graphics during the recent years, complex materials such as fabrics, fur, and human hair remain largely lacking in the virtual worlds. This is due to both the lack of high-fidelity data and the inability to efficiently describe these complicated objects via mathematical/statistical models.
In this talk, I will present my research that introduces new means to acquire, model, and render complex materials that are essential to our daily lives with a focus on fabrics. Leveraging detailed geometric information and sophisticated optical model, our work has led to computer generated imagery with a new level of accuracy and fidelity. In particular, we measure real-world samples using volume imaging (e.g., computed micro-tomography) to obtain detailed datasets on their micro-geometries. We then fit sophisticated statistical models to the measured data, yielding highly compact yet realistic representations. Lastly, we show how to recover a sample’s optical properties (e.g., colors) using optimization. |
Oct 10
|
No Seminar (Columbus Day)
|
Oct 17
Bren Hall 4011 1 pm |
Stefano Ermon
Assistant Professor of Computer Science Fellow of the Woods Institute for the Environment Stanford University
Recent technological developments are creating new spatio-temporal data streams that contain a wealth of information relevant to sustainable development goals. Modern AI techniques have the potential to yield accurate, inexpensive, and highly scalable models to inform research and policy. As a first example, I will present a machine learning method we developed to predict and map poverty in developing countries. Our method can reliably predict economic well-being using only high-resolution satellite imagery. Because images are passively collected in every corner of the world, our method can provide timely and accurate measurements in a very scalable end economic way, and could revolutionize efforts towards global poverty eradication. As a second example, I will present some ongoing work on monitoring agricultural and food security outcomes from space. |
Oct 24
|
No Seminar (cancelled)
|
Oct 31
Bren Hall 4011 1 pm |
This talks explores recent uses of machine learning to large proprietary consumer transaction datasets. These are datasets which record barcode level transaction information on individual items purchased grouped by shopping trip and customer. Recent innovations in data collection allow us to go beyond the supermarket scanner to collect such data and include recent efforts to digitize the universe of customers’ receipts across all channels from supermarkets to online purchases. Additionally, passive wifi tracking allows us to record search behavior in stores and model how it translates into sales. It also gives us the opportunity to create real time interventions to nudge consumer shopping behavior. We will explore some of the challenges of modeling consumer behavior using these data and discuss methods such as tensor decompositions for count data, discrete choice modeling with Dirichlet Process Mixtures, and the use of deep autoencoders for producing interpretable statistical hypotheses. |
Nov 7
Bren Hall 4011 1 pm |
This talk investigates the restricted Boltzmann machine (RBM), which is the building block for many deep probabilistic models. We propose an infinite RBM model, whose maximum likelihood estimation corresponds to a constrained convex optimization. We consider the Frank-Wolfe algorithm to solve the program, which provides a sparse solution that can be interpreted as inserting a hidden unit at each iteration. As a side benefit, this can be used to easily and efficiently identify an appropriate number of hidden units during the optimization. We also investigate different learning algorithms for conditional RBMs. There is a pervasive opinion that loopy belief propagation does not work well on RBM-based models, especially for learning. We demonstrate that, in the conditional setting, learning RBM-based models with belief propagation and its variants can provide much better results than the state-of-the-art contrastive divergence algorithms. |
Nov 14
Bren Hall 4011 1 pm |
Traditionally, the field of computational Bayesian statistics has been divided into two main subfields: variational inference and Markov chain Monte Carlo (MCMC). In recent years, however, several methods have been proposed based on combining variational Bayesian inference and MCMC simulation in order to improve their overall accuracy and computational efficiency. This marriage of fast evaluation and flexible approximation provides a promising means of designing scalable Bayesian inference methods. In this work, we explore the possibility of incorporating variational approximation into a state-of-the-art MCMC method, Hamiltonian Monte Carlo (HMC), to reduce the required expensive computation involved in the sampling procedure, which is the bottleneck for many applications of HMC in big data problems. To this end, we exploit the regularity in parameter space to construct a free-form approximation of the target distribution by a fast and flexible surrogate function using an optimized additive model of proper random basis. The surrogate provides sufficiently accurate approximation while allowing for fast computation, resulting in an efficient approximate inference algorithm. We demonstrate the advantages of our method on both synthetic and real data problems. |
Nov 16
Bren Hall 4011 4pm |
Arindam Banerjee
Associate Professor Department of Computer Science and Engineering University of Minnesota
Many machine learning problems, especially scientific problems in areas such as ecology, climate science, and brain sciences, operate in the so-called `low samples, high dimensions’ regime. Such problems typically have numerous possible predictors or features, but the number of training examples is small, often much smaller than the number of features. In this talk, we will discuss recent advances in general formulations and estimators for such problems. These formulations generalize prior work such as the Lasso and the Dantzig selector. We will discuss the geometry underlying such formulations, and how the geometry helps in establishing finite sample properties of the estimators. We will also discuss applications of such results in structure learning in probabilistic graphical models, along with real world applications in ecology and climate science.
This is joint work with Soumyadeep Chatterjee, Sheng Chen, Farideh Fazayeli, Andre Goncalves, Jens Kattge, Igor Melnyk, Peter Reich, Franziska Schrodt, Hanhuai Shan, and Vidyashankar Sivakumar. |
Nov 21
Bren Hall 4011 1 pm |
Stein’s method provides a remarkable theoretical tool in probability theory but has not been widely known or used in practical machine learning. In this talk, we try to bright this gap and show that some of the key ideas of Stein’s method can be naturally combined with practical machine learning and probabilistic inference techniques such as kernel method, variational inference and variance reduction, which together form a new general framework for deriving new algorithms for handling the kind of highly complex, structured probabilistic models widely used in modern (deep) machine learning. The new algorithms derived in this way often have a simple, untraditional form and have significant advantages over the traditional methods. I will show several applications, including goodness-of-fit tests for evaluating models without knowing the normalization constants, scalable Bayesian inference that combines the advantages of variational inference, Monte Carlo and gradient-based optimization, and approximate maximum likelihood training of deep generative models that can generate realistic-looking images. |
Nov 28
Bren Hall 4011 1 pm |
We develop upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach “dissociation” and give an exact characterization of optimal oblivious bounds, i.e. when the new probabilities are chosen independent of the probabilities of all other variables.
Our motivation comes from the weighted model counting problem (or, equivalently, the problem of computing the probability of a Boolean function), which is \#P-hard in general. By performing several dissociations, one can transform a Boolean formula whose probability is difficult to compute, into one whose probability is easy to compute, and which is guaranteed to provide an upper or lower bound on the probability of the original formula by choosing appropriate probabilities for the dissociated variables. Our new bounds shed light on the connection between previous relaxation-based and model-based approximations and unify them as concrete choices in a larger design space. We also show how our theory allows a standard relational database management system to both upper and lower bound hard probabilistic queries in guaranteed polynomial time. (Based on joint work with Dan Suciu from TODS 2014, VLDB 2015, and VLDBJ 2016: http://arxiv.org/pdf/1409.6052,http://arxiv.org/pdf/1412.1069, http://arxiv.org/pdf/1310.6257) |
Dec 5
|
No Seminar
Finals Week
|
AIML
Spring 2016
Standard
Apr 4
|
No Seminar (Cancelled)
|
Apr 11
Bren Hall 4011 1 pm |
Venkat Chandrasekaran
Assistant Professor Computing and Mathematical Sciences & Electrical Engineering California Institute of Technology
Extracting structured planted subgraphs from large graphs is a fundamental question that arises in a range of application domains. We describe a computationally tractable approach based on convex optimization to recover certain families of structured graphs that are embedded in larger graphs containing spurious edges. Our method relies on tractable semidefinite descriptions of majorization inequalities on the spectrum of a matrix, and we give conditions on the eigenstructure of a planted graph in relation to the noise level under which our algorithm succeeds. (Joint work with Utkan Candogan.) |
Apr 18
Bren Hall 4011 1 pm |
Clinical medical data, especially in the intensive care unit (ICU), consist of multivariate time series of observations. For each patient visit (or episode), sensor data and lab test results are recorded in the patient’s Electronic Health Record (EHR). While potentially containing a wealth of insights, the data is difficult to mine effectively, owing to varying length, irregular sampling and missing data. Recurrent Neural Networks (RNNs), particularly those using Long Short-Term Memory (LSTM) hidden units, are powerful and increasingly popular models for learning from sequence data. They effectively model varying length sequences and capture long range dependencies. We present the first study to empirically evaluate the ability of LSTMs to recognize patterns in multivariate time series of clinical measurements. Specifically, we consider multilabel classification of diagnoses, training a model to classify 128 diagnoses given 13 frequently but irregularly sampled clinical measurements. First, we establish the effectiveness of a simple LSTM network for modeling clinical data. Then we demonstrate a straightforward and effective training strategy in which we replicate targets at each sequence step. Trained only on raw time series, our models outperform several strong baselines, including a multilayer perceptron trained on hand-engineered features. |
Apr 25
Bren Hall 4011 1 pm |
Jasper Vrugt
Associate Professor Department of Environmental Engineering University of California, Irvine
Bayesian inference has found widespread application and use in science and engineering to reconcile Earth system models with data, including prediction in space (interpolation), prediction in time (forecasting), assimilation of observations and deterministic/stochastic model output, and inference of the model parameters. In this talk I will review the basic elements of the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm developed by Vrugt et al. (2008,2009) and used for Bayesian inference in fields ranging from physics, chemistry and engineering, to ecology, hydrology, and geophysics. I will also discuss recent developments of DREAM, including the development of a diagnostic model evaluation framework using likelihood free inference, and the use of dimensionality reduction techniques for calibration of CPU-intensive system models. Practical examples are used from many different fields of study. |
May 2
Bren Hall 4011 1 pm |
Jeffrey Mark Siskind
Associate Professor Department of Electrical & Computer Engineering Purdue University
Humans can describe observations and act upon requests. This requires that language be grounded in perception and motor control. I will present several components of my long-term research program to understand the vision-language-motor interface in the human brain and emulate such on computers.
In the first half of the talk, I will present fMRI investigation of the vision-language interface in the human brain. Subjects were presented with stimuli in different modalities—spoken sentences, textual presentation of sentences, and video clips depicting activity that can be described by sentences—while undergoing fMRI. The scan data is analyzed to allow readout of individual constituent concepts and words—people/names, objects/nouns, actions/verbs, and spatial-relations/prepositions—as well as phrases and entire sentences. This can be done across subjects and across modality; we use classifiers trained on scan data for one subject to read out from another subject and use classifiers trained on scan data for one modality, say text, to read out from scans of another modality, say video or speech. Analysis of this indicates that the brain regions involved in processing the different kinds of constituents are largely disjoint but also largely shared across subjects and modality. Further, we can determine the predication relations; when the stimuli depict multiple people, objects, and actions, we can read out which people are performing which actions with which objects. This points to a compositional mental semantic representation common across subjects and modalities. In the second half of the talk, I will use this work to motivate the development of three computational systems. First, I will present a system that can use sentential description of human interaction with previously unseen objects in video to automatically find and track those objects. This is done without any annotation of the objects and without any pretrained object detectors. Second, I will present a system that learns the meanings of nouns and prepositions from video and tracks of a mobile robot navigating through its environment paired with sentential descriptions of such activity. Such a learned language model then supports both generation of sentential description of new paths driven in new environments as well as automatic driving of paths to satisfy navigational instructions specified with new sentences in new environments. Third, I will present a system that can play a physically grounded game of checkers using vision to determine game state and robotic arms to change the game state by reading the game rules from natural-language instructions. Joint work with Andrei Barbu, Daniel Paul Barrett, Charles Roger Bradley, Seth Benjamin Scott Alan Bronikowski, Zachary Burchill, Wei Chen, N. Siddharth, Caiming Xiong, Haonan Yu, Jason J. Corso, Christiane D. Fellbaum, Catherine Hanson, Stephen Jose Hanson, Sebastien Helie, Evguenia Malaia, Barak A. Pearlmutter, Thomas Michael Talavage, and Ronnie B. Wilbur. Bio: Jeffrey M. Siskind received the B.A. degree in computer science from the Technion, Israel Institute of Technology, Haifa, in 1979, the S.M. degree in computer science from the Massachusetts Institute of Technology (M.I.T.), Cambridge, in 1989, and the Ph.D. degree in computer science from M.I.T. in 1992. He did a postdoctoral fellowship at the University of Pennsylvania Institute for Research in Cognitive Science from 1992 to 1993. He was an assistant professor at the University of Toronto Department of Computer Science from 1993 to 1995, a senior lecturer at the Technion Department of Electrical Engineering in 1996, a visiting assistant professor at the University of Vermont Department of Computer Science and Electrical Engineering from 1996 to 1997, and a research scientist at NEC Research Institute, Inc. from 1997 to 2001. He joined the Purdue University School of Electrical and Computer Engineering in 2002 where he is currently an associate professor. His research interests include computer vision, robotics, artificial intelligence, neuroscience, cognitive science, computational linguistics, child language acquisition, automatic differentiation, and programming languages and compilers. |
May 9
Bren Hall 4011 1 pm |
Circadian rhythms date back to the origins of life, are found in virtually every species and every cell, and play fundamental roles in functions ranging from metabolism to cognition. Modern high-throughput technologies allow the measurement of concentrations of transcripts, metabolites, and other species along the circadian cycle creating novel computational challenges and opportunities, including the problems of inferring whether a given species oscillate in circadian fashion or not, and inferring the time at which a set of measurements was taken. Due to the expensive process of taking these measurements, inferring whether a given species oscillate in circadian fashion has proven to be a challenge. The sparse data with only a few replicates makes many existing methods unreliable. In addition, many differential gene expression experiments–such as those contained in the GEO repository, have been carried at single time points without taking into account circadian oscillations which can act as confounding factors. To solve these problems we introduce two deep learning methods: BIO_CYCLE and BIO_CLOCK. BIO_CYCLE takes advantage of synthetic data to determine whether or not a signal oscillates in a circadian fashion, and infer periods, amplitudes, and phases. BIO_CLOCK, using a specialized cost function and real-world data, imputes the time at which a sample was taken, from the corresponding gene expression measurements. These tools are a necessary step forward to better understand circadian rhythms at the molecular level and their applications to precision medicine. |
May 16
Bren Hall 4011 1 pm |
Aparna Chandramowlishwaran
Assistant Professor Department of Electrical Engineering University of California, Irvine
In this talk, I’ll present my group’s work on addressing two key challenges in developing parallel algorithms and software for the class of N-body problems on current and future platforms. The first challenge is reducing the apparent gap in performance between code generated from high-level forms and that of hand-tuned code, which we address using extensive characterization of the optimization space for these computations and automating the process through domain specific code generators. These application-specific compilers provide the domain scientists the ability to productively harness the power of these large machines and to enable large-scale scientific simulations and big data applications. The second challenge is analyzing and designing algorithms. We are entering the era of exascale. The number of cores are growing at a much faster rate than bandwidth per node. What implications does this trend have in designing algorithms for future systems? If we were to model computation and communication costs, what inferences can we derive from such a model for the time to execute an algorithm? Our model suggests a new kind of high level analytical co-design of the algorithm and architecture and similar analysis can be applied in designing algorithms in general. |
May 23
Bren Hall 4011 1 pm |
Divijotham Krishnamurthy
Postdoctoral Fellow Center for Mathematics of Information California Institute of Technology
Several problems arising in the design, analysis and efficient operation of power systems are naturally posed as graph-structured optimization problems. Due to the nonlinear nature of the physical equations describing the power grid, these problems are often nonconvex and NP-hard. However, practical instances of several graph-structured optimization problems have been solved successfully in the graphical models literature by exploiting graph structure and using message-passing or belief propagation techniques. In this work, we show that a similar approach can be successfully applied to power systems, leading to theoretically and practically efficient algorithms. I will discuss two applications in detail: a) Solving mixed-integer optimal power flow problems on distribution networks, and b) Detecting and mitigating market manipulation by aggregators of renewable generation in a distribution-level market. I will also discuss possible extensions of these approaches to other power system/infrastructure network problems. Based on joint work with Misha Chertkov, Sidhant Misra, Marc Vuffray, Pascal Van Hentenryck, Niangjun Chen, Navid Azizan Ruhi and Adam Wierman. |
May 30
|
No Seminar (Memorial Day)
|
Winter 2016
Standard
Jan 11
Bren Hall 4011 1 pm |
Social network analysis has a long and successful history in the social sciences, often with a focus on relatively small survey-based data sets. In the past decade, driven by the ease of automatically collecting large-scale network data sets, there has been significant interest in developing new statistical and machine learning techniques for network analysis. In this talk we will focus on two general modeling themes in this context: the use of latent variables for low-dimensional vector-based network representations models and event-based models for temporal network data. We will review the representational capabilities of these models from a generative perspective, discuss some of the challenges of parameter estimation that arise, and emphasize the role of predictive evaluation. The talk will conclude with a brief discussion of future directions in this general area.
Based on joint work with Zach Butler, Chris DuBois, Jimmy Foulds, and Carter Butts |
Jan 18
|
No Seminar (MLK Day)
|
Jan 25
Bren Hall 4011 1 pm |
Topic models have become increasingly prominent text-analytic machine learning tools for research in the social sciences and the humanities. In particular, custom topic models can be developed to answer specific research questions. The design of these models requires a nontrivial amount of effort and expertise, motivating general-purpose topic modeling frameworks. In this talk I will introduce latent topic networks, a flexible class of richly structured topic models designed to facilitate applied research. Custom models can straightforwardly be developed in this framework with an intuitive first-order logical probabilistic programming language. Latent topic networks admit scalable training via a parallelizable EM algorithm which leverages ADMM in the M-step. I demonstrate the broad applicability of the models with case studies on modeling influence in citation networks, and U.S. Presidential State of the Union addresses. This talk is based on joint work with Lise Getoor and Shachi Kumar from the University of California, Santa Cruz, published at ICML 2015. |
Feb 1
Bren Hall 4011 1 pm |
Latent or hidden variable models have applications in almost every domain, e.g., social network analysis, natural language processing, computer vision and computational biology. Training latent variable models is challenging due to non-convexity of the likelihood objective function. An alternative method is based on the spectral decomposition of low order moment matrices and tensors. This versatile framework is guaranteed to estimate the correct model consistently. I will discuss my results on convergence to globally optimal solution for stochastic gradient descent, despite non-convexity of the objective. I will then discuss large-scale implementations (which are highly parallel and scalable) of spectral methods, carried out on CPU/GPU and Spark platforms. We obtain a gain in both accuracies and in running times by several orders of magnitude compared to the state-of-art variational methods. I will discuss the following applications in detail: (1) learning hidden user commonalities (communities) in social networks, and (2) learning sentence embeddings for paraphrase detection using convolutional models. More generally, I have applied the methods to a variety of problems such as text and social network analysis, healthcare analytics, and cataloging neuronal cell types in neuroscience. |
Feb 8
Bren Hall 4011 1 pm |
Optimization lies at the core of machine learning. However, most machine learning problems entail non-convex optimization. In this talk, I will show how spectral and tensor methods can yield guaranteed convergence to globally optimal solutions under transparent conditions for a range of machine learning problems.
In the first part, I will explain how tensor methods are useful for learning latent variable models in an unsupervised manner. The focus of my work is on overcomplete regime where the hidden dimension is larger than the observed dimensionality. I describe how tensor methods enable us to learn these models in the overcomplete regime with theoretical guarantees in recovering the parameters of the model. I also provide efficient sample complexity results for training these models. Next, I will describe a new method for training neural networks for which we provide theoretical guarantees on the performance of the algorithm. We have developed a computationally efficient algorithm for training a two-layer neural network using method-of-moment and tensor decomposition techniques. |
Feb 10
Bren Hall 3011 3 pm |
I will discuss subsampling and sketching with their applications and analysis in machine learning. They can be viewed not only as tools to improve computational and storage efficiency of existing learning algorithms, but also as settings that characterize data measurement/availability/privacy constraints in modern machine learning applications. In this talk I will introduce my recent work, which analyze subsampling and sketching settings in three popular machine learning algorithms: tensor factorization, subspace clustering and linear regression. |
Feb 15
|
No Seminar (Presidents Day)
|
Feb 22
Bren Hall 4011 1 pm |
Understanding the semantics of preferences and behavior is incredibly complicated, especially in settings where the visual appearance of items influences our decisions. Three challenges that I’ll discuss in this talk include (1) how can we uncover the semantics of visual preferences, especially in sparse or long-tailed data, where new items are constantly introduced? (2) How can we use visual data to understand the relationships between items, and in particular what makes two items “visually compatible”? And (3) how can we understand the temporal dynamics of visual preferences, in order to uncover how “fashions” have evolved over time? |
Feb 29
|
No Seminar (Cancelled)
|
Mar 7
Bren Hall 4011 1 pm |
We investigate the potential of look-ahead in the context of AND/OR search in graphical models using the mini-bucket heuristic for combinatorial optimization tasks (e.g. MAP/MPE or weighted CSPs.) We present and analyze the complexity of computing the residual (a.k.a. Bellman update) of the mini-bucket heuristic, which we call “bucket errors” and show how this can be used to identify which parts of the search space are more likely to benefit from look-ahead, therefore facilitating a method to bound its overhead. We also rephrase the look-ahead computation as a graphical model to make use of structure exploiting inference schemes. In our empirical results, we demonstrate that our methods can be used to cost-effectively increase the power of branch-and-bound search.
In the second part of the talk, we show how bucket errors can be used to improve the performance of AND/OR best-first search algorithms for providing lower bounds on the min-sum problem. In our preliminary experiments, we show that when expanding nodes for the AO* algorithm, using bucket errors as a subproblem ordering heuristic can allow us to expand fewer nodes to arrive at the optimal solution compared to the existing ordering approach. |
Fall 2015
Standard
Sep 16
Bren Hall 4011 1 pm |
Learning with big data is a challenging task that requires smart and efficient methods to extract useful information from data. Optimization methods, both convex and nonconvex are promising approaches to do this. In this talk I will review two classes of my work on prominent problems in convex and nonconvex optimization.
Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Method: Neural networks provide a versatile tool for approximating functions of various inputs. Despite exciting achievements in application, a theoretical understanding of them is mostly lacking. Training a neural network is a highly nonconvex problem and backpropagation can get stuck in local optima. For the first time, we have a computationally efficient method for training neural networks that also has guaranteed generalization. This is part of our recently proposed general framework based on method-of-moments and tensor decomposition to efficiently learn different models such as neural networks and mixture of classifiers. Breaking Curse of Dimensionality: Stochastic Optimization in high dimensions: We have designed an efficient stochastic optimization method based on ADMM that is fast and cheap to implement, can be performed in parallel and can be used for any regularized optimization framework with some mild assumptions. We have proved that our algorithm obtains minimax optimal convergence rates for sparse optimization and robust PCA framework. Experiment results show that in the aforementioned scenarios, our method outperforms state-of-the-art, i.e., yields smaller error with equal time. |
Oct 5
Bren Hall 4011 1 pm |
We are proposing an algorithm to test the accuracy of the predictions by earthquake early warning systems. Most warning systems predict the location and the magnitude of an ongoing earthquake via the early-arriving seismic wave data. Our algorithm uses logarithm of ratios between observed ground motion envelopes and Virtual Seismologist’s (Cua G. and Heaton T.) predicted envelopes to assess the validity of system predictions. We quantify the uncertainty attached to our parameters using Bayesian probability approach. |
Oct 12
Bren Hall 4011 1 pm |
Importance sampling (IS) and its variant, annealed IS (AIS) have been widely used for estimating the partition function in graphical models, such as Markov random fields and deep generative models. However, IS tends to underestimate the partition function and is subject to high variance when the proposal distribution is more peaked than the target distribution. On the other hand, “reverse” versions of IS and AIS tend to overestimate the partition function, and degenerate when the target distribution is more peaked than the proposal distribution. We present a simple, general method that gives much more reliable and robust estimates than either IS (AIS) or reverse IS (AIS). Our method works by converting the estimation problem into a simple classification problem that discriminates between the samples drawn from the target and the proposal. We give both theoretical and empirical justification, and show that an annealed version of our method significantly outperforms both AIS and reverse AIS (Burda et al., 2015), which has been the state-of-the-art for likelihood evaluation in deep generative models. Joint work with Qiang Liu, Jian Peng, and John Fisher. |
Oct 19
Bren Hall 4011 1 pm |
In this talk, we propose the multi-version coding problem for distributed storage. We consider a setting where there are n servers that aim to store v versions of a message, and there is a total ordering on the versions from the earliest to the latest. We assume that each message version has a given number of bits. Each server can receive any subset of the v versions and stores a function of the message versions it receives. The multi-version code we consider ensures that, a decoder that connects to any c out of the n servers can recover the message corresponding to the latest common version stored among those servers, or a message corresponding to a version that is later than the latest common version. We describe a simple and explicit achievable scheme, as well as an information-theoretic converse. Moreover, we apply the multi-version code to one of the problems in distributed algorithms – the emulation of atomic shared memory in a message-passing network – and improve upon previous algorithms up to a half in terms of storage cost. |
Oct 26
Bren Hall 4011 1 pm |
Network models provide a unifying framework for understanding dependencies among variables in medical, biological, and other sciences. Networks can be used to reveal underlying data structures, infer functional modules, and facilitate experiment design. In practice, however, size, uncertainty and complexity of the underlying associations render these applications challenging.
In this talk, we illustrate the use of spectral, combinatorial, and statistical inference techniques in several significant network science problems. First, we consider the problem of network alignment where the goal is to find a bijective mapping between nodes of two networks to maximize their overlapping edges while minimizing mismatches. To solve this combinatorial problem, we present a new scalable spectral algorithm, and establish its efficiency theoretically and experimentally over several synthetic and real networks. Next, we introduce network maximal correlation (NMC) as an essential measure to capture nonlinear associations in networks. We characterize NMC using geometric properties of Hilbert spaces and illustrate its application in learning network topology when variables have unknown nonlinear dependencies. Finally, we discuss the problem of learning low dimensional structures (such as clusters) in large networks, where we introduce logistic Random Dot Product Graphs, a new class of networks which includes most stochastic block models as well as other low dimensional structures. Using this model, we propose a spectral network clustering algorithm that possesses robust performance under different clustering setups. In all of these problems, we examine underlying fundamental limits and present efficient algorithms for solving them. We also highlight applications of the proposed algorithms to data-driven problems such as functional and regulatory genomics of human diseases, and cancer. Bio: Soheil Feizi is a PhD candidate at Massachusetts Institute of Technology (MIT), co-supervised by Prof. Muriel Médard and Prof. Manolis Kellis. His research interests include analysis of complex networks and the development of inference and learning methods based on Optimization, Information Theory, Machine Learning, Statistics, and Probability, with applications in Computational Biology, and beyond. He completed his B.Sc. at Sharif University of Technology, awarded as the best student of his class. He received the Jacobs Presidential Fellowship and EECS Great Educators Fellowship, both from MIT. He has been a finalist in the Qualcomm Innovation contest. He received an Ernst Guillemin Award for his Master of Science Thesis in the department of Electrical Engineering and Computer Science at MIT. |
Nov 2
Bren Hall 4011 1 pm |
Neuronal networks have enjoyed a resurgence both in the worlds of neuroscience, where they yield mathematical frameworks for thinking about complex neural datasets, and in machine learning, where they achieve state of the art results on a variety of tasks, including machine vision, speech recognition, and language translation. Despite their empirical success, a mathematical theory of how deep neural circuits, with many layers of cascaded nonlinearities, learn and compute remains elusive. We will discuss three recent vignettes in which ideas from statistical physics can shed light on this issue. In particular, we show how dynamical criticality can help in neural learning, how the non-intuitive geometry of high dimensional error landscapes can be exploited to speed up learning, and how modern ideas from non-equilibrium statistical physics, like the Jarzynski equality, can be extended to yield powerful algorithms for modeling complex probability distributions. Time permitting, we will also discuss the relationship between neural network learning dynamics and the developmental time course of semantic concepts in infants. |
Nov 9
Bren Hall 4011 1 pm |
Weighted Max-SAT is an extension of SAT in which each clause has an associated cost. The goal is to minimize the cost of falsified clauses. Max-SAT has been successfully applied to a number of domains including Bioinformatics, Telecommunications and Scheduling.
In this talk I will introduce the Max-SAT framework and discuss the main solving approaches. In particular, I will present Max-resolution and will show how it can be effectively used in the context of Depth-first Branch-and-Bound. |
Nov 16
Bren Hall 4011 1 pm |
Occlusion poses a significant difficulty for detecting and localizing object keypoints and subsequent fine-grained identification. In this talk, I will describe a hierarchical deformable part model for face detection and keypoint localization that explicitly models part occlusion. The proposed model structure makes it possible to augment positive training data with large numbers of synthetically occluded instances. This allows us to easily incorporate the statistics of occlusion patterns in a discriminatively trained model. However, this model does not exploit bottom-up cues such as detection of occluding contours and image segments. I will talk about how to modify the proposed model to utilize bottom-up class-specific segmentation in order to jointly detect and segment out the foreground pixels belonging to the face. |
Nov 23
|
Thankgiving week
(no seminar)
|
Nov 30
Bren Hall 4011 1 pm |
In many classification problems labels are relatively scarce. One context in which this occurs is where we have labels for groups of instances but not for the instances themselves, as in multi-instance learning. Past work on this problem has typically focused on learning classifiers to make predictions at the group level. In this paper we focus on the problem of learning classifiers to make predictions at the instance level. To achieve this we propose a new objective function that encourages smoothness of inferred instance-level labels based on instance-level similarity, while at the same time respecting group-level label constraints. We apply this approach to the problem of predicting labels for sentences given labels for reviews, using a convolutional neural network to infer sentence similarity. The approach is evaluated using three large review data sets from IMDB, Yelp, and Amazon, and we demonstrate the proposed approach is both accurate and scalable compared to various alternatives. |
Dec 7
|
Finals week
(no seminar)
|
Spring 2015
Standard
Mar 30
Bren Hall 4011 1 pm |
In a physical neural system, where storage and processing are intertwined, the rules for adjusting the synaptic weights can only depend on variables that are available locally, such as the activity of the pre- and post-synaptic neurons. We propose a systematic framework to define and study the space of local learning rules where one must first define the nature of the local variables, and then the functional form that ties them together into a learning rule. We consider polynomial learning rules and analyze their behavior and capabilities in both linear and non-linear networks. As a byproduct, we also show how this framework enables the discovery of new learning rules and important relationships between learning rules and group symmetries.
Stacking local learning rules in deep feedforward networks leads to deep local learning. While deep local learning can learn interesting representations, we show however that it cannot learn complex input-output functions, even when targets are available for the top layer. Learning complex input-output functions requires local deep learning where target information is propagated to the deep layers. The complexity of the propagated information about the targets and the channel through which this information is propagated partition the space of learning algorithms and highlight the remarkable power of the backpropagation algorithm. The theory clarifies the concept of Hebbian learning, what is learnable by Hebbian learning, and explains the sparsity of the space of learning rules discovered so far. |
Apr 6
Bren Hall 4011 1 pm |
Maryam M. Shanechi
Assistant Professor Department of Electrical Engineering and Computer Science University of Southern California
A brain-machine-interface (BMI) is a system that interacts with the brain either to allow the brain to control an external device or to control the brain’s state. While these two BMI types are for different applications, they can both be viewed as closed-loop control systems. In this talk, I present our work on developing both these types of BMIs, specifically motor BMIs for restoring movement in paralyzed patients and a new BMI for control of the brain state under anesthesia. Motor BMIs have largely used standard signal processing techniques. However, devising novel algorithmic solutions that are tailored to the neural system can significantly improve the performance of these BMIs. Here, I develop a novel BMI paradigm for restoration of motor function that incorporates an optimal feedback-control model of the brain and directly processes the spiking activity using point process modeling. I show that this paradigm significantly outperforms the state-of-the-art in closed-loop primate experiments. In addition to motor BMIs, I construct a new BMI that controls the state of the brain under anesthesia. This is done by designing stochastic controllers that infer the brain’s anesthetic state from non-invasive observations of neural activity and control the real-time rate of drug administration to achieve a target brain state. I show the reliable performance of this BMI in rodent experiments.
Bio: Maryam Shanechi is an assistant professor in the Ming Hsieh Department of Electrical Engineering at the University of Southern California (USC). Prior to joining USC, she was an assistant professor in the School of Electrical and Computer Engineering at Cornell University. She received the B.A.Sc. degree in Engineering Science from the University of Toronto in 2004 and the S.M. and Ph.D. degrees in Electrical Engineering and Computer Science from MIT in 2006 and 2011, respectively. She is the recipient of the NSF CAREER Award and has been named by the MIT Technology Review as one of the world’s top 35 innovators under the age of 35 (TR35) for her work on brain-machine interfaces. |
Apr 13
Bren Hall 4011 1 pm |
AsterixDB is a new BDMS (Big Data Management System) with a feature set that sets it apart from other Big Data platforms in today’s open source ecosystem. Its features make it well-suited to applications including web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries, a scalable runtime; partitioned, LSM-based data storage and indexing (including B+ tree, R tree, and text indexes); support for external as well as native data; a rich set of built-in types, including spatial, temporal, and textual types; support for fuzzy, spatial, and temporal queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store.
Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This talk will provide an overview of the resulting system. Time permitting, the talk will cover the system’s data model, its query language, and its basic architecture. Also included will be a summary of the current status of the project and a discussion of some of the “plug-in points” where AsterixDB can be made to interoperate with ML technologies. The talk will conclude with some thoughts on opportunities for future ML-related collaborations related to AsterixDB. Bio: Michael J. Carey is a Bren Professor of Information and Computer Sciences at UC Irvine. Before joining UCI in 2008, he worked at BEA Systems for seven years and led the development of BEA’s AquaLogic Data Services Platform product for virtual data integration. He also spent a dozen years teaching at the University of Wisconsin-Madison, five years at the IBM Almaden Research Center working on object-relational databases, and a year and a half at e-commerce platform startup Propel Software during the infamous 2000-2001 Internet bubble. Carey is an ACM Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD E.F. Codd Innovations Award. His current interests center around data-intensive computing and scalable data management (a.k.a. Big Data). |
Apr 20
Bren Hall 4011 1 pm |
N-body problems are ubiquitous with applications ranging from linear algebra to scientific computing and machine learning. N-body methods were identified as one of the original 7 dwarves or motifs of computation and are believed to be important in the next decade. These methods include FMMs, Treecodes, H-matrices, Butterfly algorithms, and geometric shattering. The relationship between these approaches is understood, but many of the demonstrated tools for developing and applying these algorithms remain ad-hoc, inaccessible, or inefficient.
We present recent developments towards a codebase that is abstracted over the primary domains of research in this field and is optimized for modern multicore systems. Core components including tree construction, tree traversal, and low-rank operators are developed independently and parallelized for multicore CPUs and GPUs. Applications include dense problems in machine learning and computational geometry (k-nearest neighbors, range search, kernel density estimation, Gaussian processes, and RBF kernels), treecode and fast multipole methods in computational physics (gravitational potentials, screened Coulomb interactions, Stokes flow, and Helmholtz equations), and matrix compression, computation, and inversion (PLR, HODLR, H2, and Butterfly). In this presentation, we will review a high-level perspective of the research domain, the abstraction and parallelization strategies, and how these methods can be made more practical. Bio: Cris received his PhD from Stanford University in Computational and Mathematical Engineering in 2011. As a lecturer and research scientist with the new Institute for Applied Computational Science at Harvard University, he developed core courses on parallel computing and robust software development for scientific computing. In 2014, Cris joined the Mathematics Department at the Massachusetts Institute of Technology as a research associate where he focused on developing and applying generalized N-body methods to dense linear algebra using hierarchical methods. Currently, he works in NVIDIA Research to continue to make these techniques accessible with modern parallel programming models. You can read more about his research on his Harvard web page. |
Apr 27
|
Cancelled
(no seminar)
|
May 4
Bren Hall 4011 1 pm |
Hidden Markov models (HMMs) are a standard tool in the modeling and analysis of time series with a wide variety of applications. Yet, learning their parameters remain a challenging problem. In the first part of the talk I will present a novel approach to learning an HMM whose outputs are distributed according to a parametric family. This is done by decoupling the learning task into two steps: first estimating the output parameters, and then estimating the hidden states transition probabilities. The first step is accomplished by fitting a mixture model to the output stationary distribution. Given the parameters of this mixture model, the second step is formulated as the solution of an easily solvable convex quadratic program. We provide an error analysis for the estimated transition probabilities and show they are robust to small perturbations in the estimates of the mixture parameters.
The above approach (and other recently proposed spectral/tensor methods) strongly depends on the assumption that all states have different output distributions. In various applications, however, some of the hidden states are aliased, having identical output distributions. The minimality, identifiability and learnability of such aliased HMMs have been long standing problems, with only partial solutions provided thus far. In the second part of the talk, as a first step, I will focus on parametric-output HMMs that have exactly two aliased states. For this class, we present a complete characterization of their minimality and identifiability. Furthermore, we derive computationally efficient and statistically consistent algorithms to detect the presence of aliasing and learn the aliased HMM transition parameters. We illustrate our theoretical analysis by several simulations. A joint work with Boaz Nadler and Aryeh Kontorovich. |
May 11
Bren Hall 4011 1 pm |
Many statistical and machine-learning applications call for estimating Gaussian mixtures using a limited number of samples and computational time. PAC (proper) learning estimates a distribution in a class by some distribution in the same class to a desired accuracy. Using spectral projections we show that spherical Gaussian mixtures in d-dimensions can be PAC learned with O*(d) samples, and that the same applies for learning the distribution’s parameters. Our algorithm is information theoretically near-optimal and significantly improves previously known time and sample complexities. |
May 18
Bren Hall 4011 1 pm |
Natural images are scale invariant with structures at all length scales. After a tutorial on critical phenomena and percolation theory, I will talk about formulating a geometric view of scale invariance. In this model, the scale invariance of natural images is understood as a second-order percolation phase transition. It is further quantified by fractal dimensions, and by the scale-free distribution of clusters in natural images. This formulation leads to a method for identifying clusters in images and a starting point for image segmentation.
Bio: Saeed Saremi received the Ph.D. degree in theoretical physics from MIT. He then joined the lab of Terry Sejnowski at the Salk Institute as a postdoctoral fellow. His research blends machine learning, statistical mechanics, and computational neuroscience, with the long-term goal of understanding the principles for achieving artificial intelligence. |
June 1
Bren Hall 4011 1 pm |
Teams of voting agents have great potential in finding optimal solutions, and they have been used in many important domains, such as: machine learning, crowdsourcing, forecasting systems, and even board games. Voting is popular since it is highly parallelizable, easy to implement and provide theoretical guarantees. However, there are three fundamental challenges: (i) Selecting a limited number of agents to compose a team; (ii) Combining the opinions of the team members; (iii) Assessing the performance of a given team. In this talk, I address all these challenges, showing both theoretical and experimental results. I explore three different domains: Computer Go, HIV prevention via influencing social networks and architectural design.
Bio: Leandro Soriano Marcolino is a PhD student at University of Southern California (USC), advised by Milind Tambe. He has published in several prestigious conferences in AI, robotics and machine learning, such as AAAI, AAMAS, IJCAI, NIPS, ICRA and IROS. He received the best research assistant award from the Computer Science Department at USC, had a paper nominated for best paper from the leading multi-agent conference AAMAS, and had his undergraduate work selected as the best one by the Brazilian Computer Science Society. He has been researching continuously about teamwork and cooperation, and obtained his masters degree in Japan, with the highly-competitive Monbukagakusho scholarship. Over his career, Leandro has published about a variety of domains, such as swarm robotics, computer Go, social networks, bioinformatics and architectural design. |
June 8
Bren Hall 4011 1 pm |
Quentin Berthet
CMI Postdoctoral Fellow Computing + Mathematical Sciences, Annenberg Center California Institute of Technology
Statistical estimation in many contemporary settings involves the acquisition, analysis, and aggregation of datasets from multiple sources, which can have significant differences in character and in value. Due to these variations, the effectiveness of employing a given resource – e.g., a sensing device or computing power – for gathering or processing data from a particular source depends on the nature of that source. As a result, the appropriate division and assignment of a collection of resources to a set of data sources can substantially impact the overall performance of an inferential strategy. In this expository article, we adopt a general view of the notion of a resource and its effect on the quality of a data source, and we describe a framework for the allocation of a given set of resources to a collection of sources in order to optimize a specified metric of statistical efficiency. We discuss several stylized examples involving inferential tasks such as parameter estimation and hypothesis testing based on heterogeneous data sources, in which optimal allocations can be computed either in closed form or via efficient numerical procedures based on convex optimization. Joint work with V. Chandrasekaran. |
Winter 2015
Standard
Jan 12
Bren Hall 4011 1 pm |
Mixture models are based on the hypothesis that real data can be viewed as arising from a probabilistic generative model with a small number of parameters. Examples include topic models for documents, hidden Markov models for speech, gaussian mixture models for point data, etc. The model parameters often give interesting semantic insights into the data.
I will first discuss algorithms for parameter estimation in mixture models via the use of tensors, or higher dimensional arrays. The idea is to estimate (using the data) a tensor, whose decomposition allows us to read off the hidden parameters. Thus parameter estimation is reduced to tensor decomposition. Unfortunately, tensor decomposition is NP-hard in general. However, I will show that there exist algorithms that “almost always” work efficiently, in the framework of smoothed analysis, as long as the “rank” is at most a polynomial in the number of dimensions. Next I will consider the case in which we wish to learn mixtures in small (constant number of) dimensions. Tensor methods do not apply to this regime, and indeed, there are lower bounds which say that exponentially many samples are needed to “identify” the parameters. However I will show that under a slightly relaxed objective, we can obtain “PAC” style learning algorithms. This follows from a more general theorem about sparse recovery with \ell_1 error, which I will describe. |
Jan 26
Bren Hall 4011 1 pm |
Currently, deep neural networks are the state of the art on problems suchas speech recognition and computer vision. By using a method called model compression, we show that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models while using the same number of parameters as the original deep models. On the TIMIT phoneme recognition and CIFAR-10 image recognition tasks, shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional architectures. The same model compression trick can also be used to compress impractically large deep models and ensembles of large deep models down to “medium-size” deep models that run more efficiently on servers, and down to “small” models that can run on mobile devices. In machine learning and statistics we used to believe that one of the keys to preventing overfitting was to keep models simple and the number of parameters small to force generalization. We no longer believe this — learning appears to generalize best when training models with excess capacity, but the learned functions can often be represented with far fewer parameters. We do not yet know if this is true just of current learning algorithms, or if it is a fundamental property of learning in general. |
Feb 2
Bren Hall 4011 1 pm |
Machine learning plays a major role in analyzing data from the Large Hadron Collider at CERN, and was used to discover the Higgs boson in 2012. We demonstrate that deep learning increases statistical power of this analysis, and that deep neural networks can automatically learn high-level features that usually need to be engineered by physicists. Furthermore, we describe how to automatically (and cheaply) tune deep neural network hyperparameters using Amazon EC2 GPU servers and free software tools. |
Feb 5
Bren Hall 4011 1 pm |
In recent years, the rapid proliferation of mobile phones in developing countries has provided billions of individuals with novel opportunities for social and economic interaction. Concurrently, the data generated by mobile phone networks is enabling new data-intensive methods for studying the social and economic behavior of individuals in resource-constrained environments. After all, these data reflect much more than simple communications activity: they capture the structure of social networks, decisions about expenditures and consumption, patterns of travel and mobility, and the regularity of daily routines. In this talk, I will discuss the results from two recent projects that derive behavioral insights from mobile phone data. The first study uses data on Mobile Money transfers in Rwanda and microeconomic models to better understand the motives that cause people to send money to friends and family in times of need. The second project combines call data with follow-up phone surveys to investigate the extent to which it is possible to predict an individual’s wealth and happiness based on his or her prior history of phone calls and several supervised learning models. These projects are enabled by generous support from the Institute for Money, Technology, and Financial Inclusion; Intel; the Gates Foundation; and the NSF. |
Feb 9
Bren Hall 4011 1 pm |
Shimon Whiteson
Associate Professor Intelligent Autonomous Systems Group, Informatics Institute University of Amsterdam
In this talk, I will propose a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise probabilities to select a promising arm and applying Upper Confidence Bound with the winner as a benchmark. We prove a sharp finite-time regret bound of order O(K log T) on a very general class of dueling bandit problems that matches a lower bound proven by Yue et al. In addition, our empirical results using real data from an information retrieval application show that it greatly outperforms the state of the art. |
Feb 23
Bren Hall 4011 1 pm |
Cylance
Glenn Chisholm, Chief Technology Officer, Cylance Matt Wolff, Chief Data Scientist, Cylance Michael Wojnowicz, Data Scientists, Cylance
Traditional approaches to detecting malware, namely those used by current antivirus methodologies, are increasingly comprised by sophisticated attackers who may have financial, social, or nationalistic motives. In just the last few weeks, there have been successful hacking attacks against Sony and Anthem, as well as a banking attack by the Carbanak group that stole over $1 billion for various financial institutions. Traditional antivirus approaches utilize manual analysis of files to identify malware; however these techniques simply do not scale to the volume of malware that now exists. At the time of this writing, an estimated million distinct newly suspicious files per day are generated. Clearly, manual inspection of all these files by human analysis is not feasible.
At Cylance, we have developed a machine learning engine to help reduce or remove the need for manual analysis. In this talk, we will dive in into various components of our machine learning infrastructure, with the goal of providing some insight into how one can apply machine learning to problems in industry and against large datasets. A machine learning approach to cybersecurity presents a wide array of interesting challenges in areas of feature extraction, feature engineering, dimensionality reduction, and modeling. Specific topics that will be discussed include designing models with speed in mind, why different optimization methods do or do not perform well against various types of data, and feature engineering based on wavelet analysis and entropy series. |
Mar 9
Bren Hall 4011 1 pm |
Fan-Gang Zeng
Professor of Anatomy and Neurobiology, Biomedical Engineering and Cognitive Sciences, UC Irvine Director, Center for Hearing Research Director of Research, Otolaryngology – Head and Neck Surgery, UC Irvine
Deafness affects not only speech communication but also language development including speaking and reading. The bionic ear or the cochlear implant is a modern medical device that allows the first successful restoration of a human sense. It works well for either adults who have lost hearing postlingually or children who receive the device prelingually. It doesn’t work well for those who never hear during development but get an implant in adulthood. My talk will address two issues. First, I will describe development of the modern cochlear implant from signal processing perspective, which clearly abides by the “KISS” principle. Second, I will emphasize the importance of the brain in the cochlear implant success, speculating on the structure and rules for the neural network to learn how to process speech. |
Fall 2014
Standard
Oct 6
Bren Hall 4011 1 pm |
Self-paced learning for long-term tracking / Efficient Matching of 3D Hand Exemplars in RGB-D Images Self-paced learning for long-term tracking
We address the problem of long-term object tracking, where the object may become occluded or leave-the-view. In this setting, we show that an accurate appearance model is considerably more effective than a strong motion model. We develop simple but effective algorithms that alternate between tracking and learning a good appearance model given a track. We show that it is crucial to learn from the “right” frames, and use the formalism of self-paced curriculum learning to automatically select such frames. We leverage techniques from object detection for learning accurate appearance-based templates, demonstrating the importance of using a large negative training set (typically not used for tracking). We describe both an offline algorithm (that processes frames in batch) and a linear-time online (i.e. causal) algorithm that approaches real-time performance. Our models significantly outperform prior art, reducing the average error on benchmark videos by a factor of 4. Efficient Matching of 3D Hand Exemplars in RGB-D Images We focus on the task of single-image hand detection and pose estimation from RGB-D images. While much past work focuses on estimation from temporal video sequences, we consider the problem of single-image pose estimation, necessary for (re) initialization. The high number of degrees-of-freedom, frequent self-occlusions, and pose ambiguities make this problem rather challenging. While previous approaches tend to rely on articulated hand models or local part classifiers, our models are based on discriminative pose exemplars that can be quickly indexed with parts. We propose novel metric depth features that make the search over exemplars accurate and fast. Importantly, our exemplar models can reason about depth-aware occlusion. Finally, we also provide an extensive evaluation of the state-of-the-art, including academic and commercial systems, on a real-world annotated dataset. We show that our model outperforms such methods, providing promising results even in the presence of occlusions. |
Oct 13
Bren Hall 4011 1 pm |
Probabilistic modeling of ranking data is an extensively studied problem with applications ranging from understanding user preferences in electoral systems and social choice theory, to more modern learning tasks in online web search, crowd-sourcing and recommendation systems. This work concerns learning the Mallows model — one of the most popular probabilistic models for analyzing ranking data. In this model, the user’s preference ranking is generated as a noisy version of an unknown central base ranking. The learning task is to recover the base ranking and the model parameters using access to noisy rankings generated from the model.
Although well understood in the setting of a homogeneous population (a single base ranking), the case of a heterogeneous population (mixture of multiple base rankings) has so far resisted algorithms with guarantees on worst case instances. In this talk I will present the first polynomial time algorithm which provably learns the parameters and the unknown base rankings of a mixture of two Mallows models. A key component of our algorithm is a novel use of tensor decomposition techniques to learn the top-k prefix in both the rankings. Before this work, even the question of identifiability in the case of a mixture of two Mallows models was unresolved. Joint work with Avrim Blum, Or Sheffet and Aravindan Vijayaraghavan. |
Oct 20
Bren Hall 4011 1 pm |
We propose a joint longitudinal-survival model for associating summary measures of a longitudinally collected biomarker with a time-to-event endpoint. The model is robust to common parametric and semi-parametric assumptions in that it avoids simple distributional assumptions on longitudinal measures and allows for non-proportional hazards covariate effects in the survival component. Specifically, we use a Gaussian process model with a parameter that captures within-subject volatility in the longitudinally sampled biomarker, where the unknown distribution of the parameter is assumed to have a Dirichlet process prior. We then estimate the association between within-subject volatility and the risk of mortality using a flexible survival model constructed via a Dirichlet process mixture of Weibull distributions. Fully joint estimation is performed to account for uncertainty in the estimated within-subject volatility measure. Simulation studies are presented to assess the operating characteristics of the proposed model. Finally, the method is applied to data from the United States Renal Data System where we estimate the association between within-subject volatility in serum album and the risk of mortality among patients with end-stage renal disease. |
Oct 27
Bren Hall 4011 1 pm |
Many emerging applications of machine learning involve time series and spatio-temporal data. In this talk, I will discuss a collection of machine learning approaches to effectively analyze and model large-scale time series and spatio-temporal data, including temporal causal models, sparse extreme-value models, and fast tensor-based forecasting models. Experiment results will be shown to demonstrate the effectiveness of our models in practical applications, such as climate science, social media and biology.
Bio: Yan Liu is an assistant professor in Computer Science Department at University of Southern California from 2010. Before that, she was a Research Staff Member at IBM Research. She received her M.Sc and Ph.D. degree from Carnegie Mellon University in 2004 and 2006. Her research interest includes developing scalable machine learning and data mining algorithms with applications to social media analysis, computational biology, climate modeling and business analytics. She has received several awards, including NSF CAREER Award, Okawa Foundation Research Award, ACM Dissertation Award Honorable Mention, Best Paper Award in SIAM Data Mining Conference, Yahoo! Faculty Award and the winner of several data mining competitions, such as KDD Cup and INFORMS data mining competition. |
Nov 3
Bren Hall 4011 1 pm |
inference with uncertainty, and form a main field in Artificial Intelligence today. However, their usual form is restricted to a *propositional* representation, in the same way propositional logic is restricted when compared to relational first-order logic.
For encoding complex probabilistic models, we need richer, relational, quantified representations that yield a form of Probabilistic Logic. While propositionalization is an option for processing such encodings, it is not scalable. The field of *lifted* probabilistic inference seeks to process first-order relational probabilistic models on the relational level, avoiding grounding or propositionalizing as much as possible. I will talk about relational probabilistic models and give the main ideas about lifted probabilistic inference, and also comment on the relationship of all that to Probabilistic Programming, exemplified by probabilistic programming languages such as Church and BLOG. Bio: Rodrigo de Salvo Braz is a Computer Scientist at SRI International. He earned a PhD from the University of Illinois in 2007 with a thesis contributing some of the earliest ideas on Lifted Probabilistic Inference. He did a postdoc at UC Berkeley with Stuart Russell, working on the BLOG language, and is currently the PI of SRI’s project for DARPA’s Probabilistic Programming Languages for Advanced Machine Learning. |
Nov 10
Bren Hall 4011 1 pm |
I will review the ways that machine learning is typically used in particle physics, some recent advancements, and future directions. In particular, I will focus on the integration of machine learning and classical statistical procedures. These considerations motivate a novel construction that is a hybrid of machine learning algorithms and more traditional likelihood methods. |
Nov 17
Bren Hall 4011 1 pm |
Many prediction domains, ranging from content recommendation in a digital system to motion planning in a physical system, require making structured predictions. Broadly speaking, structured prediction refers to any type of prediction performed jointly over multiple input instances, and has been a topic of active research in the machine learning community over the past 10-15 years. However, what has been less studied is how to model structured prediction problems for an interactive system. For example, a recommender system necessarily interacts with users when recommending content, and can learn from the subsequent user feedback on those recommendations. In general, each “prediction” is an interaction where the system not only predicts a structured action to perform, but also receives feedback (i.e., training data) corresponding to the utility of that action.
In this talk, I will describe methods for balancing the tradeoff between exploration (collecting informative feedback) versus exploitation (maximizing system utility) when making structured predictions in an interactive environment. Exploitation corresponds to the standard prediction goal in non-interactive settings, where one predicts the best possible action given the current model. Exploration refers to taking actions that maximize the informativeness of the subsequent feedback, so that one can exploit more reliably in future interactions. I will show how to model and optimize for this tradeoff in two settings: diversified news recommendation (where the feedback comes from users) and adaptive vehicle routing (where the feedback comes from measuring congestion). This is joint work with Carlos Guestrin, Sue Ann Hong, Ramayya Krishnan and Siyuan Liu. |
Nov 24
Bren Hall 4011 1 pm |
Learning several latent variable models including multiview mixtures, mixture of Gaussians, independent component analysis and so on can be done by the decomposition of a low-order moment tensor (e.g., 3rd order tensor) to its rank-1 components. Many earlier studies using tensor methods only consider undercomplete regime where the number of hidden components is smaller than the observed dimension. In this talk, we show that the tensor power iteration (as the key element for tensor decomposition) works well even in the overcomplete regime where the hidden dimension is larger than the observed dimension. We establish that a wide range of overcomplete latent variable models can be learned efficiently with low computational and sample complexity through tensor power iteration. |
Dec 1
Bren Hall 4011 1 pm |
Designing latent variable learning methods, which have guaranteed bounded sample complexity, has become one of the recent research trend in the last few years. I will pick topic modeling as an example and discuss various learning algorithms along with their sample/computational complexity bounds. These bounds has been derived under the so-called topic separability assumption, which requires every topic to have at least a single word unique to it. It could be shown that under separability of topics, \ell_1 normalized rows of the word-word co-occurence probability matrix are embedded inside a convex polytope, whose vertices correspond only to the novel words of different topics. Moreover, these vertices characterize the topic proportion matrix. I will elaborate how these two facts could be used to design provable, highly distributable, and computational efficient algorithms for topic modeling. |
Spring 2014
Standard
Mar 31
|
First day of classes
(no seminar)
|
Apr 7
Bren Hall 4011 1 pm |
Probabilistic graphical models describe a potentially large number of random variables coupled with each other through some dependency mechanism. One of the main problems underlying graphical models is to infer the values of certain variables based on observations of other variables. Rigorous analysis of statistical inference algorithms can be very complicated even for relatively simple models. Instead, methods based on statistical physics of disordered systems provide a viable alternative. Here I will demonstrate the application of those methods on two problems, MAP estimation of Hidden Markov Models and Stochastic Block Models. The inference in both problems can become highly unstable due to a critical phase transition in the corresponding statistical physical system. Those instabilities are caused by frustrated (e.g., conflicting) constraints that are imposed on the inference objective. I will also discuss how one can mitigate this undesirable feature by active inference, i.e., by adaptively acquiring information about the (true) states of the hidden variables. |
Apr 9
Bren Hall 4011 4pm |
Today’s energy systems, such as electric power grids and gas grids, already demonstrate complex nonlinear dynamics where, e.g., collective effects in one exert uncertainty and irregularities on other. These collective dynamics are not well understood and are expected to become more complex tomorrow as the grids are pushed to reliability limits, interdependencies grow, and appliances become more intelligent and autonomous. Tomorrow’s will have to integrate the intermittent power from wind and solar farms whose fluctuating outputs create far more complex stress on power grid operations, often dependent, e.g. in providing fast regulation control, on the gas supply. Conversely, one anticipates significant effect of the wind-following gas fired turbines on reliability of the gas grid. Guarding against the worst of those perturbations will require taking protective measures based on ideas from optimization, control, statistics and physics.
In this talk we introduce a few of the physical, optimization and control principles and phenomena in today’s energy grids and those that are expected to play a major role in tomorrow’s grids. We illustrate the new science of the energy grids on three examples: (a) discussing an efficient and highly scalable Chance Constrained Optimal Power Flow algorithm providing risk-aware control of the power transmission system under uncertainty associated with fluctuating renewables (wind farms); (b) describing effect of the intermittent power generation on reliability and compression control of the gas grid operations; and (c) briefly discussing examples of interdependencies, reliability troubles and solutions in the low level (distribution) grids. Bio:Dr. Chertkov’s areas of interest include statistical and mathematical physics applied to energy and communication networks, machine learning, control theory, information theory, computer science, fluid mechanics and optics. Dr. Chertkov received his Ph.D. in physics from the Weizmann Institute of Science in 1996, and his M.Sc. in physics from Novosibirsk State University in 1990. After his Ph.D., Dr. Chertkov spent three years at Princeton University as a R.H. Dicke Fellow in the Department of Physics. He joined Los Alamos National Lab in 1999, initially as a J.R. Oppenheimer Fellow in the Theoretical Division. He is now a technical staff member in the same division. Dr. Chertkov has published more than 130 papers in these research areas. He is an editor of the Journal of Statistical Mechanics (JSTAT), associate editor of IEEE Transactions on Control of Network Systems, a fellow of the American Physical Society (APS), and a Founding Faculty Fellow of Skoltech – young graduate school built in Moscow (Russia). |
Apr 14
Bren Hall 4011 1 pm |
We will discuss the detection of patterns over graphs from noisy measurements. This problem is relevant to many applications including detecting anomalies in sensor and computer networks, brain activity, co-expressions in gene networks, disease outbreaks, etc. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. We develop from first principles the generalized likelihood ratio test (GLRT) for determining if there is a well connected region of activation over the vertices in the graph with Gaussian noise. Due to the combinatorial nature of this test, the GLRT is computationally infeasible. We discuss two approaches, relaxing the combinatorial GLRT and a wavelet construction over graphs, to overcome this issue.
One such relaxation that we develop is the graph ellipsoid scan statistic, whose statistical performance is characterized by the spectrum of the graph Laplacian. Another relaxation that we have developed is the Lovasz extended scan statistic (LESS), which is based on submodular optimization and the performance is described using electrical network theory. We then introduce the spanning tree wavelet basis over graphs, a localized basis that reflects the topology of the graph. We show that using the uniform spanning tree in the basis construction yields a randomized test with performance guarantees similar to that of the LESS. For each of these tests we compare their statistical guarantees to an information theoretic lower bound. Finally, we consider specific graph models, such as the torus, k-nearest neighbor graphs, and epsilon-random graphs and show that for these graphs we achieve near-optimal risk consistency regimes. |
Apr 28
Bren Hall 4011 1 pm |
Modern data science applications increasingly involve statistical learning on very large datasets, where the data instances are stored in a distributed way across different nodes of the clusters, with expensive communication costs between nodes. We study a simple communication-efficient learning algorithm that first calculates the local maximum likelihood estimates (MLE) based on the subsets of datasets, and then combines the local MLEs to achieve the best possible approximation to the global MLE, based on the whole dataset jointly. A naive and commonly used combination method is to take the linear average of the local ML estimates; this method; however, this has a sub-optimal error rate, and more critically, can easily break down in practical cases where the parameters are either unidentifiable (e.g., in mixture models), non-additive, or have complicated structure. In this work, we propose a KL-divergence-based combination method that achieves the best possible error rate, and avoids weaknesses of linear averaging. Perhaps surprisingly, we show that our algorithm exactly recovers the global MLE under full exponential families, and its error rate on general distributions are related to how nearly “exponential family” they are — formally captured by Efron’s statistical curvature, originally defined by Efron (1975) to extend Fisher and Rao’s theory of information loss and second order efficiency of the MLE. In addition, we show that the statistical curvature equals the lower bound of the asymptotic error rate of arbitrary combination methods, and hence represents an intrinsic difficulty measurement of distributed learning in this setting. |
May 5
Bren Hall 4011 1 pm |
Automatically tracking people and their body poses in unconstrained videos is an important task, as it serves as a foundation for high-level reasoning such as activity recognition. This task is difficult for two reason: a) building an accurate pose detector is hard, and b) dependencies of body parts over space and time is hard to model or causes intractable computation.
This talk consists of two parts. In the first part, I address two key challenges in a common pipeline of pose tracker: 1) detecting small people and 2) extracting diverse set of candidate poses from each frame. I describe novel multiresolutional representation, motion descriptors, and inference algorithms to tackle these challenges. In the second part, I propose the use of synthetic training frames as a mean to “overfit” a single video. Using a simple synthesis engine and detailed annotation of the first frame, we synthesize potential future video frames. We argue that this large customized training serves as an ideal training set relieving the burden of modeling, and provides us with insights on critical components of a working tracker. |
May 12
Bren Hall 4011 1 pm |
Exploratory data mining methods, such as methods for clustering, association analysis, community detection, dimensionality reduction, etc., aim to assist a user in improving their understanding about the data. In this talk I will discuss a simple mathematical model for the exploratory data mining process that makes it possible to quantify how effective any given pattern (in the broad sense) is for this purpose. This quantification is naturally subjective, dependent on any prior beliefs the user may hold about the data.
While the proposed model is abstract and generic, it suggests practical ways for developing specific exploratory data mining methods that present patterns that are subjectively interesting to the user. I will illustrate this by showing how it leads to principled approaches for alternative clustering, for community detection in networks, and for association analysis in simple data tables as well as in relational databases. From May 2014 my research on this topic will be funded by an ERC Consolidator Grant titled “Formalising Subjective Interestingness in Exploratory Data mining” (FORSIED). Relevant references: http://www.tijldebie.net/projects/fiip Bio: Tijl De Bie is currently a Reader (Associate Professor) at the University of Bristol, where he was appointed Lecturer (Assistant Professor) in January 2007. Before that, he was a postdoctoral researcher at the KU Leuven (Belgium) and the University of Southampton. He completed his PhD on machine learning and advanced optimization techniques in 2005 at the KU Leuven. During his PhD he also spent a combined total of about 1 year as a visiting research scholar in U.C. Berkeley and U.C. Davis. He is currently most actively interested in the formalization of subjective interestingness in exploratory data mining, and in the use of machine learning and data mining for music informatics as well as for web and social media mining. He currently holds a prestigious ERC Consolidator Grant titled “Formalising Subjective Interestingness in Exploratory Data Mining” (FORSIED). |
May 19
Bren Hall 4011 1 pm |
We argue that object subcategories follow a long-tail distribution: a few subcategories are common, while many are rare. We describe distributed algorithms for learning large-mixture models that capture long-tail distributions, which are hard to model with current approaches. We introduce a generalized notion of mixtures (or subcategories) that allow for examples to be shared across multiple subcategories. We optimize our models with a discriminative clustering algorithm that searches over mixtures in a distributed, “brute-force” fashion. We used our scalable system to train tens of thousands of deformable mixtures for VOC objects. We demonstrate significant performance improvements, particularly for object classes that are characterized by large appearance variation. |
May 26
|
Memorial Day
(no seminar)
|
June 2
Bren Hall 4011 1 pm |
Neurophysiological studies of the decision-making process commonly involve analyzing and modeling spikes produced by a neuron. Complex behaviors, however, are driven by networks of neurons. We propose a flexible Bayesian model for capturing temporal dependencies between multiple neurons under different types of decisions (e.g., safe vs. risky; good vs. bad). Using our model, we are able to identify a small subset of neurons that are involved in the decision-making process and detect the dynamics of their dependence structure. |
Winter 2014
Standard
Jan 13
Bren Hall 4011 1 pm |
Peter Grunwald
Coordinator Information-theoretic Learning Group Centrum voor Wiskunde en Informatica (CWI) & Leiden University
Bayesian inference can behave badly if the model under consideration is wrong yet useful: the posterior may fail to concentrate even in the large sample limit. We demonstrate this using a simple linear regression example. We then introduce a test that can tell from the data whether we are heading for such a situation. If we are, we adjust the learning rate (equivalently: make the prior lighter-tailed, or penalize the likelihood more) in a data-dependent way. The resulting “safe-Bayesian” estimator continues to achieve good rates with wrong models. In classification, it learns faster in easy settings, i.e. when a Tsybakov condition holds. The safe estimator is based on empirical mixability/exp-concavity, which generalizes an idea from worst-case online prediction. Thus, safe estimation connects three paradigms: Bayesian inference, (frequentist) statistical learning theory and (individual sequence) on-line prediction.
For an informal introduction to the idea, see Larry Wasserman’s blog: http://normaldeviate.wordpress.com/2012/06/26/self-repairing-bayesian-inference/ |
Jan 20
|
Martin Luther King, Jr Day
(no seminar)
|
Jan 27
Bren Hall 4011 1 pm |
Statistical machine learning has become an important driving force behind many application fields. By large, however, its theoretical underpinning has hinged on the stringent assumption that the learning environment is stationary. In particular, the data distribution on which statistical models are optimized is the same as the distribution to which the models are applied.
Real-world applications are far more complex than the pristine condition. For instance, computer vision systems for recognizing objects in images often suffer from significant performance degradation if they are evaluated on image datasets that are different from the dataset on which they are designed. In this talk, I will describe our efforts in addressing this important challenge of building intelligent systems that are robust to distribution disparity. The central theme is to learn invariant features, cast as learning kernel functions and adapt probabilistic models across different distributions (i.e., domains). To this end, our key insight is to discover and exploit hidden structures in the data. These structures, such as manifolds and discriminative clusters, are intrinsic and thus resilient to distribution changes due to exogenous factors. I will present several learning algorithms we have proposed and demonstrate their effectiveness in pattern recognition tasks from computer vision. This talk is based on the joint work with my students (Boqing Gong and Yuan Shi, both from USC) and our collaborator Prof. Kristen Grauman (U. of Texas, Austin). |
Feb 3
Bren Hall 4011 1 pm |
An optimization strategy is a mean to improve program performance, e.g., through architectural, execution and development environment enhancements (including new architectural mechanisms/instructions, the use of specialized libraries, new compiler optimizations or compiler optimization sequences, new algorithms). Constructing complex optimization strategies by composition of simpler optimizations has been shown to provide significant performance improvements to programs. However, composing simpler optimizations is non-trivial because (a) the combination of possibilities that are available is large and (b) the fact that the effect of the interplay of basic optimizations to program performance is difficult to predict. This talk will first briefly survey previous work on the application of machine learning techniques to construct program optimization strategies and second proposes a practical and widely applicable technique to construct program optimization strategies based on recommendation systems. Preliminary results are shown to support that the proposed technique is equally applicable to several compelling performance evaluation studies, including characterization, comparison and tuning of hardware configurations, compilers, run-time environments or any combination thereof. |
Feb 7
Bren Hall 4011 1pm |
Algorithms are exposed to randomness in the input or noise during the computation. How well can they preserve the information in the data w.r.t. the output space? Algorithms especially in Machine Learning are required to generalize over input fluctuations or randomization during execution. This talk elaborates a new framework to measure the “informativeness” of algorithmic procedures and their “stability” against noise. An algorithm is considered to be a noisy channel which is characterized by a generalization capacity (GC). The generalization capacity objectively ranks different algorithms for the same data processing task based on the bit rate of their respective capacities. The problem of grouping data is used to demonstrate this validation principle for clustering algorithms, e.g. k-means, pairwise clustering, normalized cut, adaptive ratio cut and dominant set clustering. Our new validation approach selects the most informative clustering algorithm, which filters out the maximal number of stable, task-related bits relative to the underlying hypothesis class. The concept also enables us to measure how many bit are extracted by sorting, feature selection or minimum spanning tree algorithms when the respective inputs are contaminated by noise.
Bio:Joachim M. Buhmann leads the Machine Learning Laboratory in the Department of Computer Science at ETH Zurich. He has been a full professor of Information Science and Engineering since October 2003. He studied physics at the Technical University Munich and obtained his PhD in Theoretical Physics. As postdoc and research assistant professor, he spent 1988-92 at the University of Southern California, Los Angeles, and the Lawrence Livermore National Laboratory. He held a professorship for applied computer science at the University of Bonn, Germany from 1992 to 2003. His research interests spans the areas of pattern recognition and data analysis, including machine learning, statistical learning theory and information theory. Application areas of his research include image analysis, medical imaging, acoustic processing and bioinformatics. Currently, he serves as president of the German Pattern Recognition Society. Co-sponsored with the Institute for Genomics and Bioinformatics |
Feb 10
Bren Hall 4011 1 pm |
The problem of finding the most probable, or MAP, configuration of a graphical model can also be cast as an integer linear programming (ILP) problem. Formulating the MAP problem as an ILP has led to the development of many approximate inference methods based on linear programming (LP) relaxations, including Tree-Reweighted Belief Propagation, MPLP and Max-Sum Diffusion. Recent work suggests that going in the opposite direction and posing an ILP as a MAP problem may also prove beneficial. In this talk, I’ll focus on a classic combinatorial optimization problem, the weighted matching problem, and show that if it is posed as a MAP inference problem then the Max-Product Belief Propagation (BP) algorithm always converges to the MAP configuration. Remarkably, this is true even though the weighted matching graphical model is loopy and BP is neither guaranteed to converge, nor be correct in such models. I’ll then discuss a cutting plane approach to solving the weighted matching problem that utilizes the BP algorithm to iteratively tighten the LP relaxation of the matching ILP. This line of work improves our understanding of the performance of the loopy BP algorithm in general and further strengthens the theoretical link between message passing algorithms and optimization theory.
This talk is based on joint work with Misha Chertkov (Los Alamos National Lab) and Jinwoo Shin (KAIST University). |
Feb 17
|
President’s Day
(no seminar)
|
Feb 24
Bren Hall 4011 1 pm |
The explosion of the number of devices capable of transmitting and receiving information connected to wireless and wired communication infrastructures, both for human-to-human and machine-to-machine interaction, is forcing a technological transition to a different inter-networking model. The communication infrastructure is merging from a collection of distinct networks to a single super-network deeply integrated with heterogeneous sub-systems. Mobile health-care, smart energy grids and social networks are just examples of complex systems interoperating with the global communication infrastructure. Adaptive learning and control are the key to empower the next generation of communication networks with the ability to operate and interoperate in heterogeneous environments and with heterogeneous sub-systems. However, the high complexity of these systems discourages the use of traditional tools in practice. In this seminar, I will discuss a novel framework for online learning and optimization in complex inter-networked systems based on sparse approximation theory and wavelet analysis. The key observation is that communication protocols, and supposedly natural and human phenomena, induce a structured behavior of the stochastic process tracking the state of the system. This induces a regular structure at different time scales (hop number) of the graph modeling state transitions, and enables dimensionality reduction based on graph wavelet analysis and graph filtering. Sparse approximation algorithms can be then employed to dramatically reduce the number of observations needed to estimate fundamental control functions mapped on the state space of the observed system. |
Mar 3
Bren Hall 4011 1 pm |
Deep learning methods have shown to be powerful approaches for modelling a large variety of data (speech, computer vision, natural language, biological, etc.) and solve a vast range of machine learning tasks (classification, regression, etc.). In this talk, I will present my recent research on using deep neural networks for the task of distribution/density estimation, one of the most fundamental problem in machine learning. Specifically, I will discuss the neural autoregressive distribution estimator (NADE), a state-of-the-art estimator of the probability distribution of data. I will then describe a deep version of NADE, which again illustrates the statistical modelling power of deep models.
Bio:Hugo Larochelle is Assistant Professor at the Université de Sherbrooke (UdeS). Before joining the Computer Science department of UdeS in 2011, he spent two years in the machine learning group at University of Toronto, as a postdoctoral fellow under the supervision of Geoffrey Hinton. He obtained his Ph.D. at Université de Montréal, under the supervision of Yoshua Bengio. He is the recipient of two Google Faculty Awards, acts as associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and is a member of the editorial board of the Journal of Artificial Intelligence Research (JAIR). This talk co-sponsored with the Institute for Genomics and Bioinformatics. |
Mar 10
Bren Hall 4011 1 pm |
Statistical topic models such as latent Dirichlet allocation (LDA) have become a standard tool for analyzing text corpora, with broad applications to political science, the humanities, sociology, conversational dialog, and more. In recent years there has been an explosion in the amount of digital text information available, leading to challenges of scale for traditional inference algorithms for topic models. In this talk, I will present algorithms for learning and evaluating topic models, quickly, accurately and at scale.
In part one of the talk I will describe SCVB0, a stochastic variational Bayesian inference algorithm which exploits the collapsed representation of LDA. The algorithm can fit a topic model to Wikipedia in a matter of hours, and to the NIPS corpus of machine learning articles in seconds. It is also extremely simple and easy to implement. The second part of the talk considers the problem of evaluating topic models by computing the log-likelihood of held-out data, a task which requires approximating an intractable high-dimensional integral. Annealed importance sampling (AIS), a Monte Carlo integration technique which operates by annealing between two distributions, has previously been successfully used to evaluate topic models. We introduce new annealing paths which exhibit much lower empirical variance than the previous AIS approach (albeit at the cost of increased bias when given too few iterations), facilitating reliable per-document comparisons of topic models. We then show how to use these paths to evaluate the predictive performance of topic model learning algorithms on a per-iteration basis. The proposed method achieves better estimates at this task than previous algorithms, in some cases with an order of magnitude less computational effort. |
Mar 17
|
Finals Week
(no seminar)
|
Fall 2013
Standard
Sept 26
Bren Hall 4011 1 pm |
The dominant visual search paradigm for object class detection is sliding windows. Although simple and effective, it is also wasteful, unnatural and rigidly hardwired. We propose strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations. Our strategies adapt to the class being searched and to the content of a particular test image, exploiting context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. In addition to being more elegant than sliding windows, we demonstrate experimentally on the PASCAL VOC 2010 dataset that our strategies evaluate two orders of magnitude fewer windows while achieving higher object detection performance. |
Oct 7
Bren Hall 4011 1 pm |
Massive datasets have imposed new challenges for the scientific community. Data-intensive problems are especially challenging for Bayesian methods, which typically involve intractable models that rely on Markov Chain Monte Carlo (MCMC) algorithms for their implementation. In this talk, I will discuss our recent attempts to develop a new class of scalable computational methods to facilitate the application of Bayesian statistics in data-intensive scientific problems. One approach uses geometrically motivated methods that explore the parameter space more efficiency by exploiting its geometric properties. Another approach uses techniques that are designed to speed up sampling algorithms through faster exploration of the parameter space. I will also discuss a possible integration of geometric methods with proper computational techniques to improve the overall efficiency of sampling algorithms so that they can be used for Big Data analysis. |
Oct 14
Bren Hall 4011 1 pm |
When reviewing scientific literature, it would be useful to have automatic tools that identify the most influential scientific articles as well as how ideas propagate between articles. In this context, this paper introduces topical influence, a quantitative measure of the extent to which an article tends to spread its topics to the articles that cite it. Given the text of the articles and their citation graph, we show how to learn a probabilistic model to recover both the degree of topical influence of each article and the influence relationships between articles. Experimental results on corpora from two well-known computer science conferences are used to illustrate and validate the proposed approach. |
Oct 21
Bren Hall 4011 1 pm |
Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of \textit{sequential transfer in online learning}, notably in the multi-armed bandit framework, where the objective is to minimize the cumulative regret over a sequence of tasks by incrementally transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for the estimation of the possible tasks and derive regret bounds for it.
Bio: Mohammad Gheshlaghi Azar studied Electrical Engineering (control theory) at University of Tehran, Iran from 2003 till 2006. He then moved to Netherlands for Ph.d., where he worked with Professor Bert Kappen and Professor Remi Munos on the subject of statistical machine learning and reinforcement learning. Following finishing his Ph.D. in 2012, he joined the school of computer science at Carnegie Mellon university as a postdoctoral fellow, where he is working with Professor Brunskill on the subject of transfer of knowledge in sequential decision making problems. His research is focused on developing new machine learning algorithms which apply to life-long and real-world learning and decision making problems. |
Oct 28
Bren Hall 4011 1 pm |
This talk will describe Rephil, a system used widely within Google to identify the concepts or topics that underlie a given piece of text. Rephil determines, for example, that “apple pie” relates to some of the same concepts as “chocolate cake”, but has little in common with “apple ipod”. The concepts used by Rephil are not pre-specified; instead, they are derived by an unsupervised learning algorithm running on massive amounts of text. The result of this learning process is a Rephil model — a giant Bayesian network with concepts as nodes. I will discuss the structure of Rephil models, the distributed machine learning algorithm that we use to build these models from terabytes of data, and the Bayesian network inference algorithm that we use to identify concepts in new texts under tight time constraints. I will also discuss how Rephil relates to ongoing academic research on probabilistic topic models.
Bio: Brian Milch is a software engineer at Google’s Los Angeles office. He first joined Google in 2000, after completing a B.S. in Symbolic Systems at Stanford University. A year later, he entered the Computer Science Ph.D. program at U.C. Berkeley. He received his doctorate in 2006, with a thesis focused on the integration of probabilistic and logical approaches to artificial intelligence. He then spent two years as a post-doctoral researcher at MIT before returning to Google in 2008. He has contributed to Google production systems for spelling correction, transliteration, and semantic modeling of text. |
Nov 4
Bren Hall 4011 1 pm |
Survival analysis focuses on modeling and predicting the time to an event of interest. Traditional survival models (e.g., the prevalent proportional hazards model) often impose strong assumptions on hazard functions, which describe how the risk of an event changes over time depending on covariates associated with each individual. In this paper we propose a nonparametric survival model (GBMCI) that does not make explicit assumptions on hazard functions. Our model trains an ensemble of regression trees by the gradient boosting machine to optimize a smoothed approximation of the concordance index, which is one of the most widely used metrics in survival model evaluation. We benchmarked the performance of GBMCI against other popular survival models with a large-scale breast cancer prognosis dataset. Our experiment shows that GBMCI consistently outperforms other methods based on a number of covariate settings. |
Nov 11
|
Veterans Day
(no seminar)
|
Nov 18
Bren Hall 4011 1 pm |
Statistical models with constrained probability distributions are abundant in machine learning. Some examples include regression models with norm constraints (e.g., Lasso), probit models, many copula models, and Latent Dirichlet Allocation (LDA) models. Bayesian inference involving probability distributions confined to constrained domains could be quite challenging for commonly used sampling algorithms. For such problems, we propose a novel Markov Chain Monte Carlo (MCMC) method that provides a general and computationally efficient framework for handling boundary conditions. Our method first maps the $D$-dimensional constrained domain of parameters to the unit ball ${\bf B}_0^D(1)$, then augments it to the $D$-dimensional sphere ${\bf S}^D$ such that the original boundary corresponds to the equator of ${\bf S}^D$. This way, our method handles the constraints implicitly by moving freely on sphere generating proposals that remain within boundaries when mapped back to the original space. To improve the computational efficiency of our algorithm, we divide the dynamics into several parts such that the resulting split dynamics has a partial analytical solution as a geodesic flow on the sphere. We apply our method to several examples including truncated Gaussian, Bayesian Lasso, Bayesian bridge regression, and a copula model for identifying synchrony among multiple neurons. Our results show that the proposed method can provide a natural and efficient framework for handling several types of constraints on target distributions. |
Nov 25
|
Thanksgiving week
(no seminar)
|
Dec 2
Bren Hall 4011 1 pm |
|