Center member Anima Anandkumar, an assistant professor of electrical engineering and computer science, has been awarded a 2014 Sloan Research Fellowship for her work at the interface of theory and practice of large-scale machine learning and high-dimensional statistics. Bestowed annually since 1955 by the Alfred P. Sloan Foundation, the two-year fellowships go to 126 early-career scientists and scholars in the U.S. and Canada whose achievements and potential identify them as the next generation of scientific leaders. Read more here.
Two Center Members Elected as ACM Fellows
The Association for Computing Machinery (ACM), the world’s largest educational and scientific computing society, has announced that two faculty members have earned prestigious honors. Computer science professors Rina Dechter and Padhraic Smyth have been named 2013 ACM Fellows. Read more
Carey and Li receive $1.1 million in funding for AsterixDB research
Professors Michael Carey and Chen Li have received $750,000 from the National Science Foundation and nearly $400,000 from corporations, including Google, Oracle and HTC, to continue the development of their Big Data system, AsterixDB. Read more
ICS grad students take second place in sbv IMPROVER competition
Peter Sadowski and Michael Zeller, both Ph.D. students in the Department of Computer Science, earned a second-place finish in an international data-mining competition. The honor was given by sbv IMPROVER, a collaborative project designed to enable scientists to learn about and contribute to the development of a new crowdsourcing method for verification of scientific data and results. Read more
Winter 2014
Jan 13
Bren Hall 4011, 1 pm
Peter Grunwald
Coordinator, Information-theoretic Learning Group, Centrum voor Wiskunde en Informatica (CWI) & Leiden University
Bayesian inference can behave badly if the model under consideration is wrong yet useful: the posterior may fail to concentrate even in the large sample limit. We demonstrate this using a simple linear regression example. We then introduce a test that can tell from the data whether we are heading for such a situation. If we are, we adjust the learning rate (equivalently: make the prior lighter-tailed, or penalize the likelihood more) in a data-dependent way. The resulting “safe-Bayesian” estimator continues to achieve good rates with wrong models. In classification, it learns faster in easy settings, i.e. when a Tsybakov condition holds. The safe estimator is based on empirical mixability/exp-concavity, which generalizes an idea from worst-case online prediction. Thus, safe estimation connects three paradigms: Bayesian inference, (frequentist) statistical learning theory and (individual sequence) on-line prediction.
For an informal introduction to the idea, see Larry Wasserman’s blog: http://normaldeviate.wordpress.com/2012/06/26/self-repairing-bayesian-inference/
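To make the learning-rate idea concrete, here is a minimal sketch of an eta-tempered posterior for conjugate Bayesian linear regression (posterior proportional to prior times likelihood^eta); eta = 1 recovers standard Bayes, while eta < 1 down-weights a possibly misspecified likelihood. It is purely illustrative: the actual safe-Bayesian procedure selects eta from the data, which is not reproduced here.

import numpy as np

def tempered_posterior(X, y, eta, prior_prec=1.0, noise_var=1.0):
    """Gaussian posterior over weights with the likelihood raised to eta."""
    d = X.shape[1]
    prec = prior_prec * np.eye(d) + (eta / noise_var) * X.T @ X
    cov = np.linalg.inv(prec)
    return cov @ ((eta / noise_var) * X.T @ y), cov

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=50)
for eta in (1.0, 0.5):                  # eta = 1 is standard Bayes
    mean, _ = tempered_posterior(X, y, eta)
    print(eta, np.round(mean, 2))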
Jan 20
Martin Luther King, Jr. Day (no seminar)
Jan 27
Bren Hall 4011, 1 pm
Statistical machine learning has become an important driving force behind many application fields. By and large, however, its theoretical underpinning has hinged on the stringent assumption that the learning environment is stationary. In particular, the data distribution on which statistical models are optimized is assumed to be the same as the distribution to which the models are applied.
Real-world applications are far more complex than this pristine setting. For instance, computer vision systems for recognizing objects in images often suffer significant performance degradation when they are evaluated on image datasets different from the dataset on which they were designed. In this talk, I will describe our efforts in addressing this important challenge of building intelligent systems that are robust to distribution disparity. The central theme is to learn invariant features, cast as learning kernel functions, and to adapt probabilistic models across different distributions (i.e., domains). To this end, our key insight is to discover and exploit hidden structures in the data. These structures, such as manifolds and discriminative clusters, are intrinsic and thus resilient to distribution changes caused by exogenous factors. I will present several learning algorithms we have proposed and demonstrate their effectiveness in pattern recognition tasks from computer vision. This talk is based on joint work with my students (Boqing Gong and Yuan Shi, both from USC) and our collaborator Prof. Kristen Grauman (U. of Texas, Austin).
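As a rough illustration of the invariant-feature idea, the sketch below aligns source and target PCA subspaces before a classifier is trained. It is a generic subspace-alignment baseline under assumed data matrices Xs and Xt, not the speaker’s kernel-based method.

import numpy as np

def subspace_align(Xs, Xt, k=5):
    """Map source features into a basis aligned with the target subspace."""
    _, _, Vs = np.linalg.svd(Xs - Xs.mean(0), full_matrices=False)
    _, _, Vt = np.linalg.svd(Xt - Xt.mean(0), full_matrices=False)
    Ps, Pt = Vs[:k].T, Vt[:k].T                 # d x k principal bases
    Zs = (Xs - Xs.mean(0)) @ Ps @ (Ps.T @ Pt)   # aligned source features
    Zt = (Xt - Xt.mean(0)) @ Pt                 # target features
    return Zs, Zt

rng = np.random.default_rng(0)
Xs = rng.normal(size=(100, 20))             # hypothetical source domain
Xt = rng.normal(loc=0.5, size=(80, 20))     # hypothetical shifted target
Zs, Zt = subspace_align(Xs, Xt)             # train on Zs, evaluate on Zt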
Feb 3
Bren Hall 4011, 1 pm
An optimization strategy is a means to improve program performance, e.g., through architectural, execution and development environment enhancements (including new architectural mechanisms/instructions, the use of specialized libraries, new compiler optimizations or compiler optimization sequences, and new algorithms). Constructing complex optimization strategies by composing simpler optimizations has been shown to provide significant performance improvements. However, composing simpler optimizations is non-trivial because (a) the space of available combinations is large and (b) the effect of the interplay of basic optimizations on program performance is difficult to predict. This talk will first briefly survey previous work on applying machine learning techniques to construct program optimization strategies, and then propose a practical and widely applicable technique for constructing program optimization strategies based on recommendation systems. Preliminary results suggest that the proposed technique is equally applicable to several compelling performance evaluation studies, including characterization, comparison and tuning of hardware configurations, compilers, run-time environments or any combination thereof.
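A hedged sketch of the recommendation-system analogy: treat the program-by-optimization speedup matrix like a user-item rating matrix with mostly missing entries, and factorize it to predict unobserved speedups. All names and dimensions below are illustrative assumptions, not details from the talk.

import numpy as np

def factorize(R, mask, rank=4, lr=0.01, reg=0.1, iters=2000, seed=0):
    """Predict missing speedups via low-rank matrix factorization (SGD)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.normal(size=(R.shape[0], rank))   # program factors
    Q = 0.1 * rng.normal(size=(R.shape[1], rank))   # optimization factors
    rows, cols = np.nonzero(mask)
    for _ in range(iters):
        i = rng.integers(len(rows))
        r, c = rows[i], cols[i]
        err = R[r, c] - P[r] @ Q[c]
        P[r] += lr * (err * Q[c] - reg * P[r])
        Q[c] += lr * (err * P[r] - reg * Q[c])
    return P @ Q.T   # predicted speedup for every (program, optimization) pair

rng = np.random.default_rng(0)
true = rng.random((20, 15)) + 1.0       # hypothetical speedup matrix
mask = rng.random(true.shape) < 0.3     # only 30% of pairs were measured
pred = factorize(true * mask, mask)     # estimates for the unmeasured pairs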
Feb 7
Bren Hall 4011, 1 pm
Algorithms are exposed to randomness in the input or noise during the computation. How well can they preserve the information in the data with respect to the output space? Algorithms, especially in machine learning, are required to generalize over input fluctuations or randomization during execution. This talk presents a new framework to measure the “informativeness” of algorithmic procedures and their “stability” against noise. An algorithm is considered to be a noisy channel which is characterized by a generalization capacity (GC). The generalization capacity objectively ranks different algorithms for the same data processing task based on the bit rate of their respective capacities. The problem of grouping data is used to demonstrate this validation principle for clustering algorithms, e.g., k-means, pairwise clustering, normalized cut, adaptive ratio cut and dominant set clustering. Our new validation approach selects the most informative clustering algorithm, which filters out the maximal number of stable, task-related bits relative to the underlying hypothesis class. The concept also enables us to measure how many bits are extracted by sorting, feature selection or minimum spanning tree algorithms when the respective inputs are contaminated by noise.
Bio: Joachim M. Buhmann leads the Machine Learning Laboratory in the Department of Computer Science at ETH Zurich. He has been a full professor of Information Science and Engineering since October 2003. He studied physics at the Technical University of Munich and obtained his PhD in theoretical physics. As a postdoc and research assistant professor, he spent 1988-92 at the University of Southern California, Los Angeles, and the Lawrence Livermore National Laboratory. He held a professorship in applied computer science at the University of Bonn, Germany from 1992 to 2003. His research interests span the areas of pattern recognition and data analysis, including machine learning, statistical learning theory and information theory. Application areas of his research include image analysis, medical imaging, acoustic processing and bioinformatics. He currently serves as president of the German Pattern Recognition Society. Co-sponsored with the Institute for Genomics and Bioinformatics.
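The stability intuition in the abstract above can be illustrated with a toy proxy: cluster two independently perturbed copies of the same data and measure the mutual information between the labelings. This is only an informal stand-in for the generalization capacity defined in the talk, with made-up data.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(m, 0.5, size=(100, 2)) for m in (0, 3, 6)])

def stability(k, noise=0.3, seed=0):
    """NMI between k-means labelings of two independently perturbed copies."""
    rng = np.random.default_rng(seed)
    labels = [KMeans(n_clusters=k, n_init=10, random_state=s).fit_predict(
                  data + rng.normal(scale=noise, size=data.shape))
              for s in (0, 1)]
    return normalized_mutual_info_score(labels[0], labels[1])

for k in (2, 3, 5, 8):       # the true number of groups (3) should score high
    print(k, round(stability(k), 3))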
Feb 10
Bren Hall 4011, 1 pm
The problem of finding the most probable, or MAP, configuration of a graphical model can also be cast as an integer linear programming (ILP) problem. Formulating the MAP problem as an ILP has led to the development of many approximate inference methods based on linear programming (LP) relaxations, including Tree-Reweighted Belief Propagation, MPLP and Max-Sum Diffusion. Recent work suggests that going in the opposite direction and posing an ILP as a MAP problem may also prove beneficial. In this talk, I’ll focus on a classic combinatorial optimization problem, the weighted matching problem, and show that if it is posed as a MAP inference problem then the Max-Product Belief Propagation (BP) algorithm always converges to the MAP configuration. Remarkably, this is true even though the weighted matching graphical model is loopy and BP is neither guaranteed to converge, nor be correct in such models. I’ll then discuss a cutting plane approach to solving the weighted matching problem that utilizes the BP algorithm to iteratively tighten the LP relaxation of the matching ILP. This line of work improves our understanding of the performance of the loopy BP algorithm in general and further strengthens the theoretical link between message passing algorithms and optimization theory.
This talk is based on joint work with Misha Chertkov (Los Alamos National Lab) and Jinwoo Shin (KAIST).
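For readers who want to see the matching ILP concretely, the sketch below solves the LP relaxation of maximum-weight bipartite matching with scipy; for bipartite graphs the relaxation is integral, which is closely tied to why max-product BP succeeds on this model. The weight matrix is made up for illustration.

import numpy as np
from scipy.optimize import linprog

w = np.array([[4.0, 1.0, 3.0],
              [2.0, 0.0, 5.0],
              [3.0, 2.0, 2.0]])
n = w.shape[0]

# Row/column "at most one match" constraints over the n*n edge variables,
# flattened row-major so edge (i, j) is variable i*n + j.
A = np.zeros((2 * n, n * n))
for i in range(n):
    A[i, i * n:(i + 1) * n] = 1          # row i uses edges (i, *)
    A[n + i, i::n] = 1                   # column i uses edges (*, i)

res = linprog(c=-w.ravel(), A_ub=A, b_ub=np.ones(2 * n), bounds=(0, 1))
print(res.x.reshape(n, n).round(2))      # an integral optimum: a matching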
Feb 17
President’s Day (no seminar)
Feb 24
Bren Hall 4011, 1 pm
The explosion of the number of devices capable of transmitting and receiving information connected to wireless and wired communication infrastructures, both for human-to-human and machine-to-machine interaction, is forcing a technological transition to a different inter-networking model. The communication infrastructure is evolving from a collection of distinct networks into a single super-network deeply integrated with heterogeneous sub-systems. Mobile health care, smart energy grids and social networks are just a few examples of complex systems interoperating with the global communication infrastructure. Adaptive learning and control are the key to empowering the next generation of communication networks with the ability to operate and interoperate in heterogeneous environments and with heterogeneous sub-systems. However, the high complexity of these systems discourages the use of traditional tools in practice. In this seminar, I will discuss a novel framework for online learning and optimization in complex inter-networked systems based on sparse approximation theory and wavelet analysis. The key observation is that communication protocols, and presumably natural and human phenomena, induce a structured behavior of the stochastic process tracking the state of the system. This induces a regular structure at different time scales (hop number) of the graph modeling state transitions, and enables dimensionality reduction based on graph wavelet analysis and graph filtering. Sparse approximation algorithms can then be employed to dramatically reduce the number of observations needed to estimate fundamental control functions mapped on the state space of the observed system.
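As a generic illustration of the sparse-approximation machinery mentioned above, the sketch below recovers a signal that is sparse in a dictionary from few noisy observations via orthogonal matching pursuit. The random dictionary stands in for a graph-wavelet basis and is an assumption of the example.

import numpy as np

def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit: y ≈ D @ x with sparse x."""
    residual, support = y.copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with what is still unexplained.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(1)
D = rng.normal(size=(40, 200))           # 40 observations, 200 atoms
x_true = np.zeros(200); x_true[[3, 77, 150]] = [1.5, -2.0, 0.8]
y = D @ x_true + 0.01 * rng.normal(size=40)
print(np.nonzero(omp(D, y, 3))[0])       # should recover {3, 77, 150}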
Mar 3
Bren Hall 4011, 1 pm
Deep learning methods have been shown to be powerful approaches for modelling a large variety of data (speech, computer vision, natural language, biological, etc.) and for solving a vast range of machine learning tasks (classification, regression, etc.). In this talk, I will present my recent research on using deep neural networks for the task of distribution/density estimation, one of the most fundamental problems in machine learning. Specifically, I will discuss the neural autoregressive distribution estimator (NADE), a state-of-the-art estimator of the probability distribution of data. I will then describe a deep version of NADE, which again illustrates the statistical modelling power of deep models.
Bio: Hugo Larochelle is an Assistant Professor at the Université de Sherbrooke (UdeS). Before joining the Computer Science department of UdeS in 2011, he spent two years in the machine learning group at the University of Toronto as a postdoctoral fellow under the supervision of Geoffrey Hinton. He obtained his Ph.D. at the Université de Montréal under the supervision of Yoshua Bengio. He is the recipient of two Google Faculty Awards, acts as associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and is a member of the editorial board of the Journal of Artificial Intelligence Research (JAIR). This talk is co-sponsored with the Institute for Genomics and Bioinformatics.
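A minimal sketch of NADE's autoregressive factorization for binary vectors, p(x) = prod_d p(x_d | x_{<d}), with each conditional computed from a hidden layer whose pre-activation is updated incrementally, following the standard NADE recurrence. Weights are random here; a trained NADE fits them by maximizing log-likelihood.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nade_log_prob(x, W, V, b, c):
    """Log p(x) under a NADE with input weights W and output weights V."""
    a, logp = c.copy(), 0.0                      # running hidden pre-activation
    for d in range(len(x)):
        h = sigmoid(a)                           # hidden state given x_{<d}
        p = sigmoid(b[d] + V[d] @ h)             # p(x_d = 1 | x_{<d})
        logp += np.log(p if x[d] else 1.0 - p)
        a += W[:, d] * x[d]                      # fold x_d into the history
    return logp

rng = np.random.default_rng(0)
D, H = 8, 16
W, V = rng.normal(0, 0.1, (H, D)), rng.normal(0, 0.1, (D, H))
b, c = np.zeros(D), np.zeros(H)
print(nade_log_prob(rng.integers(0, 2, D), W, V, b, c))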
Mar 10
Bren Hall 4011, 1 pm
Statistical topic models such as latent Dirichlet allocation (LDA) have become a standard tool for analyzing text corpora, with broad applications to political science, the humanities, sociology, conversational dialog, and more. In recent years there has been an explosion in the amount of digital text information available, leading to challenges of scale for traditional inference algorithms for topic models. In this talk, I will present algorithms for learning and evaluating topic models, quickly, accurately and at scale.
In part one of the talk I will describe SCVB0, a stochastic variational Bayesian inference algorithm which exploits the collapsed representation of LDA. The algorithm can fit a topic model to Wikipedia in a matter of hours, and to the NIPS corpus of machine learning articles in seconds. It is also extremely simple and easy to implement. The second part of the talk considers the problem of evaluating topic models by computing the log-likelihood of held-out data, a task which requires approximating an intractable high-dimensional integral. Annealed importance sampling (AIS), a Monte Carlo integration technique which operates by annealing between two distributions, has previously been successfully used to evaluate topic models. We introduce new annealing paths which exhibit much lower empirical variance than the previous AIS approach (albeit at the cost of increased bias when given too few iterations), facilitating reliable per-document comparisons of topic models. We then show how to use these paths to evaluate the predictive performance of topic model learning algorithms on a per-iteration basis. The proposed method achieves better estimates at this task than previous algorithms, in some cases with an order of magnitude less computational effort.
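To give a flavor of the evaluation machinery in part two, here is a compact annealed importance sampling sketch on a one-dimensional Gaussian: anneal from a tractable p0 to the target p1 along a geometric path, accumulating importance weights whose mean estimates the ratio of normalizing constants. For topic models the states would be topic assignments and the transitions Gibbs updates; the toy target here is an assumption of the example.

import numpy as np

rng = np.random.default_rng(0)
log_p0 = lambda x: -0.5 * x**2                      # N(0,1), unnormalized
log_p1 = lambda x: -0.5 * (x - 3.0)**2 / 0.25       # N(3, 0.5^2), unnormalized

def ais(n_chains=500, n_steps=100):
    betas = np.linspace(0, 1, n_steps + 1)
    x = rng.normal(size=n_chains)                   # exact samples from p0
    logw = np.zeros(n_chains)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        logw += (b1 - b0) * (log_p1(x) - log_p0(x)) # weight update
        # One Metropolis step targeting the b1 intermediate distribution.
        prop = x + rng.normal(scale=0.5, size=n_chains)
        lp = lambda z: (1 - b1) * log_p0(z) + b1 * log_p1(z)
        accept = np.log(rng.uniform(size=n_chains)) < lp(prop) - lp(x)
        x = np.where(accept, prop, x)
    # Mean weight estimates Z1/Z0 (computed stably in log space).
    return np.exp(logw - logw.max()).mean() * np.exp(logw.max())

# Z0 = sqrt(2*pi), so this should be near the true Z1 = sqrt(2*pi*0.25) ≈ 1.25.
print(ais() * np.sqrt(2 * np.pi))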
Mar 17
Finals Week (no seminar)
Fall 2013
Sept 26
Bren Hall 4011, 1 pm
The dominant visual search paradigm for object class detection is sliding windows. Although simple and effective, it is also wasteful, unnatural and rigidly hardwired. We propose strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations. Our strategies adapt to the class being searched and to the content of a particular test image, exploiting context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. In addition to being more elegant than sliding windows, we demonstrate experimentally on the PASCAL VOC 2010 dataset that our strategies evaluate two orders of magnitude fewer windows while achieving higher object detection performance.
Oct 7
Bren Hall 4011, 1 pm
Massive datasets have imposed new challenges for the scientific community. Data-intensive problems are especially challenging for Bayesian methods, which typically involve intractable models that rely on Markov Chain Monte Carlo (MCMC) algorithms for their implementation. In this talk, I will discuss our recent attempts to develop a new class of scalable computational methods to facilitate the application of Bayesian statistics in data-intensive scientific problems. One approach uses geometrically motivated methods that explore the parameter space more efficiently by exploiting its geometric properties. Another approach uses techniques that are designed to speed up sampling algorithms through faster exploration of the parameter space. I will also discuss a possible integration of geometric methods with proper computational techniques to improve the overall efficiency of sampling algorithms so that they can be used for Big Data analysis.
Oct 14
Bren Hall 4011, 1 pm
When reviewing scientific literature, it would be useful to have automatic tools that identify the most influential scientific articles as well as how ideas propagate between articles. In this context, this paper introduces topical influence, a quantitative measure of the extent to which an article tends to spread its topics to the articles that cite it. Given the text of the articles and their citation graph, we show how to learn a probabilistic model to recover both the degree of topical influence of each article and the influence relationships between articles. Experimental results on corpora from two well-known computer science conferences are used to illustrate and validate the proposed approach.
Oct 21
Bren Hall 4011, 1 pm
Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of “sequential transfer in online learning”, notably in the multi-armed bandit framework, where the objective is to minimize the cumulative regret over a sequence of tasks by incrementally transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for the estimation of the possible tasks and derive regret bounds for it.
Bio: Mohammad Gheshlaghi Azar studied Electrical Engineering (control theory) at the University of Tehran, Iran, from 2003 until 2006. He then moved to the Netherlands for his Ph.D., where he worked with Professor Bert Kappen and Professor Remi Munos on statistical machine learning and reinforcement learning. After finishing his Ph.D. in 2012, he joined the School of Computer Science at Carnegie Mellon University as a postdoctoral fellow, where he is working with Professor Brunskill on the transfer of knowledge in sequential decision-making problems. His research is focused on developing new machine learning algorithms that apply to life-long and real-world learning and decision-making problems.
Oct 28
Bren Hall 4011, 1 pm
This talk will describe Rephil, a system used widely within Google to identify the concepts or topics that underlie a given piece of text. Rephil determines, for example, that “apple pie” relates to some of the same concepts as “chocolate cake”, but has little in common with “apple ipod”. The concepts used by Rephil are not pre-specified; instead, they are derived by an unsupervised learning algorithm running on massive amounts of text. The result of this learning process is a Rephil model — a giant Bayesian network with concepts as nodes. I will discuss the structure of Rephil models, the distributed machine learning algorithm that we use to build these models from terabytes of data, and the Bayesian network inference algorithm that we use to identify concepts in new texts under tight time constraints. I will also discuss how Rephil relates to ongoing academic research on probabilistic topic models.
Bio: Brian Milch is a software engineer at Google’s Los Angeles office. He first joined Google in 2000, after completing a B.S. in Symbolic Systems at Stanford University. A year later, he entered the Computer Science Ph.D. program at U.C. Berkeley. He received his doctorate in 2006, with a thesis focused on the integration of probabilistic and logical approaches to artificial intelligence. He then spent two years as a post-doctoral researcher at MIT before returning to Google in 2008. He has contributed to Google production systems for spelling correction, transliteration, and semantic modeling of text.
Nov 4
Bren Hall 4011, 1 pm
Survival analysis focuses on modeling and predicting the time to an event of interest. Traditional survival models (e.g., the prevalent proportional hazards model) often impose strong assumptions on hazard functions, which describe how the risk of an event changes over time depending on covariates associated with each individual. In this paper we propose a nonparametric survival model (GBMCI) that does not make explicit assumptions on hazard functions. Our model trains an ensemble of regression trees by the gradient boosting machine to optimize a smoothed approximation of the concordance index, which is one of the most widely used metrics in survival model evaluation. We benchmarked the performance of GBMCI against other popular survival models with a large-scale breast cancer prognosis dataset. Our experiment shows that GBMCI consistently outperforms other methods based on a number of covariate settings.
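A short sketch of the kind of smoothed concordance index the abstract describes: the indicator that the subject with the earlier event also received the higher predicted risk is replaced by a sigmoid, making the metric differentiable and hence usable as a boosting objective. The pair handling and smoothing constant below are illustrative assumptions, not GBMCI’s exact loss.

import numpy as np

def smoothed_cindex(time, event, risk, alpha=10.0):
    """Differentiable concordance: higher risk should mean earlier event."""
    num, den = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue                     # censored subjects cannot anchor a pair
        for j in range(len(time)):
            if time[j] > time[i]:        # j outlived i, so risk[i] should exceed risk[j]
                num += 1.0 / (1.0 + np.exp(-alpha * (risk[i] - risk[j])))
                den += 1
    return num / den

time  = np.array([2.0, 5.0, 3.0, 8.0])
event = np.array([1, 1, 0, 1])
risk  = np.array([2.1, 0.3, 1.0, -0.5])  # model scores: higher = riskier
print(round(smoothed_cindex(time, event, risk), 3))  # near 1 for good ranking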
Nov 11
Veterans Day (no seminar)
Nov 18
Bren Hall 4011, 1 pm
Statistical models with constrained probability distributions are abundant in machine learning. Some examples include regression models with norm constraints (e.g., Lasso), probit models, many copula models, and Latent Dirichlet Allocation (LDA) models. Bayesian inference involving probability distributions confined to constrained domains can be quite challenging for commonly used sampling algorithms. For such problems, we propose a novel Markov Chain Monte Carlo (MCMC) method that provides a general and computationally efficient framework for handling boundary conditions. Our method first maps the $D$-dimensional constrained domain of parameters to the unit ball ${\bf B}_0^D(1)$, then augments it to the $D$-dimensional sphere ${\bf S}^D$ such that the original boundary corresponds to the equator of ${\bf S}^D$. This way, our method handles the constraints implicitly by moving freely on the sphere, generating proposals that remain within the boundaries when mapped back to the original space. To improve the computational efficiency of our algorithm, we divide the dynamics into several parts such that the resulting split dynamics has a partial analytical solution as a geodesic flow on the sphere. We apply our method to several examples including truncated Gaussian, Bayesian Lasso, Bayesian bridge regression, and a copula model for identifying synchrony among multiple neurons. Our results show that the proposed method can provide a natural and efficient framework for handling several types of constraints on target distributions.
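The geometric trick can be seen in a few lines: a point x in the unit ball lifts to the sphere by appending sqrt(1 - ||x||^2), and any move on the sphere projects back inside the ball, so the boundary is enforced implicitly. The random-walk update below only visualizes this mechanism; it is not the paper’s split Hamiltonian dynamics.

import numpy as np

def ball_to_sphere(x):
    """Append the coordinate that lifts x from the unit ball onto the sphere."""
    return np.append(x, np.sqrt(max(1.0 - x @ x, 0.0)))

rng = np.random.default_rng(0)
x = np.zeros(2)                        # start at the center of the unit ball
for _ in range(1000):
    s = ball_to_sphere(x)
    s += 0.1 * rng.normal(size=s.size) # perturb in ambient space...
    s /= np.linalg.norm(s)             # ...and project back onto the sphere
    s[-1] = abs(s[-1])                 # stay on the upper hemisphere
    x = s[:-1]                         # drop the extra coordinate
    assert x @ x <= 1.0 + 1e-12        # the constraint holds implicitly
print(np.round(x, 3))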
Nov 25
Thanksgiving week (no seminar)
Dec 2
Bren Hall 4011, 1 pm
CML Faculty Member Alex Ihler receives NSF CAREER Award
Alexander Ihler, associate professor of computer science, has been awarded the National Science Foundation’s (NSF) Faculty Early Career Development (CAREER) Award for his project, “Estimation and Decisions in Graphical Models.” Ihler will receive $442,000 over five years for his CAREER project, which seeks to develop a new framework for exact and approximate methods for advanced computational reasoning problems. It extends the abilities of intelligent systems in reasoning and decision-making under uncertainty, and it applies and tests these methods on a variety of application domains, including sensor networks and computer vision. Read more
Microsoft Faculty Fellowship awarded to CML Faculty Anima Anandkumar
Anima Anandkumar has been awarded a 2013 Microsoft Research Faculty Fellowship. An assistant professor of electrical engineering and computer science at UC Irvine’s The Henry Samueli School of Engineering, Anandkumar is one of seven from around the world to receive this support. Read more
Professor Pierre Baldi wins ISCB Award
The International Society for Computational Biology has selected Chancellor’s Professor Pierre Baldi as an ISCB Fellow. The ISCB Fellows program honors members who have distinguished themselves through outstanding contributions to the fields of computational biology and bioinformatics. The 2013 fellows were recognized at the Intelligent Systems for Molecular Biology conference, held July 21-23 in Berlin. Read more
Professor Padhraic Smyth serves as Program Chair for UAI 2013 Conference
Computer science professor Padhraic Smyth served as program chair for the 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013), held July 11-15 in Bellevue, Wash. Sponsored by Microsoft Research, Google, Facebook, Amazon, Toyota and IBM, UAI is the leading international conference on the use of probabilistic models and algorithms in artificial intelligence and machine learning. More than 240 papers were submitted to the conference; 73 were accepted for presentation at the meeting, after extensive peer review by a program committee of over 200 researchers in the area. Read more