Center member Prof. Anima Anandkumar gave a half-day course at the 2014 Machine Learning Summer School in Pittsburg PA. MLSS is a long-running series of summer schools around the world for graduate students and researchers who want to apply machine learning methods to their research problems.
By: ihler
Smyth to head new Data Science Initiative
StandardCenter member and Professor of Computer Science Padhraic Smyth has been named the director of a new campus-wide initiative with a focus on coordinating and linking the activities of researchers and students across campus involved in various aspects of data science. UCI’s Data Science Initiative was initiative was started on July 1, 2014 and is sponsored by the Provost through the Office of Academic Initiatives. Find out more here.
Spring 2014
Standard
Mar 31
|
First day of classes
(no seminar)
|
Apr 7
Bren Hall 4011 1 pm |
Probabilistic graphical models describe a potentially large number of random variables coupled with each other through some dependency mechanism. One of the main problems underlying graphical models is to infer the values of certain variables based on observations of other variables. Rigorous analysis of statistical inference algorithms can be very complicated even for relatively simple models. Instead, methods based on statistical physics of disordered systems provide a viable alternative. Here I will demonstrate the application of those methods on two problems, MAP estimation of Hidden Markov Models and Stochastic Block Models. The inference in both problems can become highly unstable due to a critical phase transition in the corresponding statistical physical system. Those instabilities are caused by frustrated (e.g., conflicting) constraints that are imposed on the inference objective. I will also discuss how one can mitigate this undesirable feature by active inference, i.e., by adaptively acquiring information about the (true) states of the hidden variables. |
Apr 9
Bren Hall 4011 4pm |
Today’s energy systems, such as electric power grids and gas grids, already demonstrate complex nonlinear dynamics where, e.g., collective effects in one exert uncertainty and irregularities on other. These collective dynamics are not well understood and are expected to become more complex tomorrow as the grids are pushed to reliability limits, interdependencies grow, and appliances become more intelligent and autonomous. Tomorrow’s will have to integrate the intermittent power from wind and solar farms whose fluctuating outputs create far more complex stress on power grid operations, often dependent, e.g. in providing fast regulation control, on the gas supply. Conversely, one anticipates significant effect of the wind-following gas fired turbines on reliability of the gas grid. Guarding against the worst of those perturbations will require taking protective measures based on ideas from optimization, control, statistics and physics.
In this talk we introduce a few of the physical, optimization and control principles and phenomena in today’s energy grids and those that are expected to play a major role in tomorrow’s grids. We illustrate the new science of the energy grids on three examples: (a) discussing an efficient and highly scalable Chance Constrained Optimal Power Flow algorithm providing risk-aware control of the power transmission system under uncertainty associated with fluctuating renewables (wind farms); (b) describing effect of the intermittent power generation on reliability and compression control of the gas grid operations; and (c) briefly discussing examples of interdependencies, reliability troubles and solutions in the low level (distribution) grids. Bio:Dr. Chertkov’s areas of interest include statistical and mathematical physics applied to energy and communication networks, machine learning, control theory, information theory, computer science, fluid mechanics and optics. Dr. Chertkov received his Ph.D. in physics from the Weizmann Institute of Science in 1996, and his M.Sc. in physics from Novosibirsk State University in 1990. After his Ph.D., Dr. Chertkov spent three years at Princeton University as a R.H. Dicke Fellow in the Department of Physics. He joined Los Alamos National Lab in 1999, initially as a J.R. Oppenheimer Fellow in the Theoretical Division. He is now a technical staff member in the same division. Dr. Chertkov has published more than 130 papers in these research areas. He is an editor of the Journal of Statistical Mechanics (JSTAT), associate editor of IEEE Transactions on Control of Network Systems, a fellow of the American Physical Society (APS), and a Founding Faculty Fellow of Skoltech – young graduate school built in Moscow (Russia). |
Apr 14
Bren Hall 4011 1 pm |
We will discuss the detection of patterns over graphs from noisy measurements. This problem is relevant to many applications including detecting anomalies in sensor and computer networks, brain activity, co-expressions in gene networks, disease outbreaks, etc. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. We develop from first principles the generalized likelihood ratio test (GLRT) for determining if there is a well connected region of activation over the vertices in the graph with Gaussian noise. Due to the combinatorial nature of this test, the GLRT is computationally infeasible. We discuss two approaches, relaxing the combinatorial GLRT and a wavelet construction over graphs, to overcome this issue.
One such relaxation that we develop is the graph ellipsoid scan statistic, whose statistical performance is characterized by the spectrum of the graph Laplacian. Another relaxation that we have developed is the Lovasz extended scan statistic (LESS), which is based on submodular optimization and the performance is described using electrical network theory. We then introduce the spanning tree wavelet basis over graphs, a localized basis that reflects the topology of the graph. We show that using the uniform spanning tree in the basis construction yields a randomized test with performance guarantees similar to that of the LESS. For each of these tests we compare their statistical guarantees to an information theoretic lower bound. Finally, we consider specific graph models, such as the torus, k-nearest neighbor graphs, and epsilon-random graphs and show that for these graphs we achieve near-optimal risk consistency regimes. |
Apr 28
Bren Hall 4011 1 pm |
Modern data science applications increasingly involve statistical learning on very large datasets, where the data instances are stored in a distributed way across different nodes of the clusters, with expensive communication costs between nodes. We study a simple communication-efficient learning algorithm that first calculates the local maximum likelihood estimates (MLE) based on the subsets of datasets, and then combines the local MLEs to achieve the best possible approximation to the global MLE, based on the whole dataset jointly. A naive and commonly used combination method is to take the linear average of the local ML estimates; this method; however, this has a sub-optimal error rate, and more critically, can easily break down in practical cases where the parameters are either unidentifiable (e.g., in mixture models), non-additive, or have complicated structure. In this work, we propose a KL-divergence-based combination method that achieves the best possible error rate, and avoids weaknesses of linear averaging. Perhaps surprisingly, we show that our algorithm exactly recovers the global MLE under full exponential families, and its error rate on general distributions are related to how nearly “exponential family” they are — formally captured by Efron’s statistical curvature, originally defined by Efron (1975) to extend Fisher and Rao’s theory of information loss and second order efficiency of the MLE. In addition, we show that the statistical curvature equals the lower bound of the asymptotic error rate of arbitrary combination methods, and hence represents an intrinsic difficulty measurement of distributed learning in this setting. |
May 5
Bren Hall 4011 1 pm |
Automatically tracking people and their body poses in unconstrained videos is an important task, as it serves as a foundation for high-level reasoning such as activity recognition. This task is difficult for two reason: a) building an accurate pose detector is hard, and b) dependencies of body parts over space and time is hard to model or causes intractable computation.
This talk consists of two parts. In the first part, I address two key challenges in a common pipeline of pose tracker: 1) detecting small people and 2) extracting diverse set of candidate poses from each frame. I describe novel multiresolutional representation, motion descriptors, and inference algorithms to tackle these challenges. In the second part, I propose the use of synthetic training frames as a mean to “overfit” a single video. Using a simple synthesis engine and detailed annotation of the first frame, we synthesize potential future video frames. We argue that this large customized training serves as an ideal training set relieving the burden of modeling, and provides us with insights on critical components of a working tracker. |
May 12
Bren Hall 4011 1 pm |
Exploratory data mining methods, such as methods for clustering, association analysis, community detection, dimensionality reduction, etc., aim to assist a user in improving their understanding about the data. In this talk I will discuss a simple mathematical model for the exploratory data mining process that makes it possible to quantify how effective any given pattern (in the broad sense) is for this purpose. This quantification is naturally subjective, dependent on any prior beliefs the user may hold about the data.
While the proposed model is abstract and generic, it suggests practical ways for developing specific exploratory data mining methods that present patterns that are subjectively interesting to the user. I will illustrate this by showing how it leads to principled approaches for alternative clustering, for community detection in networks, and for association analysis in simple data tables as well as in relational databases. From May 2014 my research on this topic will be funded by an ERC Consolidator Grant titled “Formalising Subjective Interestingness in Exploratory Data mining” (FORSIED). Relevant references: http://www.tijldebie.net/projects/fiip Bio: Tijl De Bie is currently a Reader (Associate Professor) at the University of Bristol, where he was appointed Lecturer (Assistant Professor) in January 2007. Before that, he was a postdoctoral researcher at the KU Leuven (Belgium) and the University of Southampton. He completed his PhD on machine learning and advanced optimization techniques in 2005 at the KU Leuven. During his PhD he also spent a combined total of about 1 year as a visiting research scholar in U.C. Berkeley and U.C. Davis. He is currently most actively interested in the formalization of subjective interestingness in exploratory data mining, and in the use of machine learning and data mining for music informatics as well as for web and social media mining. He currently holds a prestigious ERC Consolidator Grant titled “Formalising Subjective Interestingness in Exploratory Data Mining” (FORSIED). |
May 19
Bren Hall 4011 1 pm |
We argue that object subcategories follow a long-tail distribution: a few subcategories are common, while many are rare. We describe distributed algorithms for learning large-mixture models that capture long-tail distributions, which are hard to model with current approaches. We introduce a generalized notion of mixtures (or subcategories) that allow for examples to be shared across multiple subcategories. We optimize our models with a discriminative clustering algorithm that searches over mixtures in a distributed, “brute-force” fashion. We used our scalable system to train tens of thousands of deformable mixtures for VOC objects. We demonstrate significant performance improvements, particularly for object classes that are characterized by large appearance variation. |
May 26
|
Memorial Day
(no seminar)
|
June 2
Bren Hall 4011 1 pm |
Neurophysiological studies of the decision-making process commonly involve analyzing and modeling spikes produced by a neuron. Complex behaviors, however, are driven by networks of neurons. We propose a flexible Bayesian model for capturing temporal dependencies between multiple neurons under different types of decisions (e.g., safe vs. risky; good vs. bad). Using our model, we are able to identify a small subset of neurons that are involved in the decision-making process and detect the dynamics of their dependence structure. |
Anandkumar Receives ‘Early-Career’ Sloan Research Fellowship
StandardCenter member Anima Anandkumar, an assistant professor of electrical engineering and computer science, has been awarded a 2014 Sloan Research Fellowship for her work at the interface of theory and practice of large-scale machine learning and high-dimensional statistics. Bestowed annually since 1955 by the Alfred P. Sloan Foundation, the two-year fellowships go to 126 early-career scientists and scholars in the U.S. and Canada whose achievements and potential identify them as the next generation of scientific leaders. Read more here.
Two Center Members Elected as ACM Fellows
StandardThe Association for Computing Machinery (ACM), the world’s largest educational and scientific computing society, has announced two faculty members have earned prestigious honors. Computer science professors Rina Dechter and Padhraic Smyth have been named 2013 ACM Fellows. Read more
Carey and Li receive $1.1 million in funding for AsterixDB research
StandardProfessors Michael Carey and Chen Li have received $750,000 from the National Science Foundation and nearly $400,000 from corporations . including Google, Oracle and HTC . to continue the development of their Big Data system, AsterixDB. Read more
ICS grad students take second place in sbv IMPROVER competition
StandardPeter Sadowski and Michael Zeller, both Ph.D. students with the Department of Computer Science, earned a second-place finish in an international data-mining competition. The honor was given by sbv IMPROVER, a collaborative project designed to enable scientists to learn about and contribute to the development of a new crowdsourcing method for verification of scientific data and results. Read more
Winter 2014
Standard
Jan 13
Bren Hall 4011 1 pm |
Peter Grunwald
Coordinator Information-theoretic Learning Group Centrum voor Wiskunde en Informatica (CWI) & Leiden University
Bayesian inference can behave badly if the model under consideration is wrong yet useful: the posterior may fail to concentrate even in the large sample limit. We demonstrate this using a simple linear regression example. We then introduce a test that can tell from the data whether we are heading for such a situation. If we are, we adjust the learning rate (equivalently: make the prior lighter-tailed, or penalize the likelihood more) in a data-dependent way. The resulting “safe-Bayesian” estimator continues to achieve good rates with wrong models. In classification, it learns faster in easy settings, i.e. when a Tsybakov condition holds. The safe estimator is based on empirical mixability/exp-concavity, which generalizes an idea from worst-case online prediction. Thus, safe estimation connects three paradigms: Bayesian inference, (frequentist) statistical learning theory and (individual sequence) on-line prediction.
For an informal introduction to the idea, see Larry Wasserman’s blog: http://normaldeviate.wordpress.com/2012/06/26/self-repairing-bayesian-inference/ |
Jan 20
|
Martin Luther King, Jr Day
(no seminar)
|
Jan 27
Bren Hall 4011 1 pm |
Statistical machine learning has become an important driving force behind many application fields. By large, however, its theoretical underpinning has hinged on the stringent assumption that the learning environment is stationary. In particular, the data distribution on which statistical models are optimized is the same as the distribution to which the models are applied.
Real-world applications are far more complex than the pristine condition. For instance, computer vision systems for recognizing objects in images often suffer from significant performance degradation if they are evaluated on image datasets that are different from the dataset on which they are designed. In this talk, I will describe our efforts in addressing this important challenge of building intelligent systems that are robust to distribution disparity. The central theme is to learn invariant features, cast as learning kernel functions and adapt probabilistic models across different distributions (i.e., domains). To this end, our key insight is to discover and exploit hidden structures in the data. These structures, such as manifolds and discriminative clusters, are intrinsic and thus resilient to distribution changes due to exogenous factors. I will present several learning algorithms we have proposed and demonstrate their effectiveness in pattern recognition tasks from computer vision. This talk is based on the joint work with my students (Boqing Gong and Yuan Shi, both from USC) and our collaborator Prof. Kristen Grauman (U. of Texas, Austin). |
Feb 3
Bren Hall 4011 1 pm |
An optimization strategy is a mean to improve program performance, e.g., through architectural, execution and development environment enhancements (including new architectural mechanisms/instructions, the use of specialized libraries, new compiler optimizations or compiler optimization sequences, new algorithms). Constructing complex optimization strategies by composition of simpler optimizations has been shown to provide significant performance improvements to programs. However, composing simpler optimizations is non-trivial because (a) the combination of possibilities that are available is large and (b) the fact that the effect of the interplay of basic optimizations to program performance is difficult to predict. This talk will first briefly survey previous work on the application of machine learning techniques to construct program optimization strategies and second proposes a practical and widely applicable technique to construct program optimization strategies based on recommendation systems. Preliminary results are shown to support that the proposed technique is equally applicable to several compelling performance evaluation studies, including characterization, comparison and tuning of hardware configurations, compilers, run-time environments or any combination thereof. |
Feb 7
Bren Hall 4011 1pm |
Algorithms are exposed to randomness in the input or noise during the computation. How well can they preserve the information in the data w.r.t. the output space? Algorithms especially in Machine Learning are required to generalize over input fluctuations or randomization during execution. This talk elaborates a new framework to measure the “informativeness” of algorithmic procedures and their “stability” against noise. An algorithm is considered to be a noisy channel which is characterized by a generalization capacity (GC). The generalization capacity objectively ranks different algorithms for the same data processing task based on the bit rate of their respective capacities. The problem of grouping data is used to demonstrate this validation principle for clustering algorithms, e.g. k-means, pairwise clustering, normalized cut, adaptive ratio cut and dominant set clustering. Our new validation approach selects the most informative clustering algorithm, which filters out the maximal number of stable, task-related bits relative to the underlying hypothesis class. The concept also enables us to measure how many bit are extracted by sorting, feature selection or minimum spanning tree algorithms when the respective inputs are contaminated by noise.
Bio:Joachim M. Buhmann leads the Machine Learning Laboratory in the Department of Computer Science at ETH Zurich. He has been a full professor of Information Science and Engineering since October 2003. He studied physics at the Technical University Munich and obtained his PhD in Theoretical Physics. As postdoc and research assistant professor, he spent 1988-92 at the University of Southern California, Los Angeles, and the Lawrence Livermore National Laboratory. He held a professorship for applied computer science at the University of Bonn, Germany from 1992 to 2003. His research interests spans the areas of pattern recognition and data analysis, including machine learning, statistical learning theory and information theory. Application areas of his research include image analysis, medical imaging, acoustic processing and bioinformatics. Currently, he serves as president of the German Pattern Recognition Society. Co-sponsored with the Institute for Genomics and Bioinformatics |
Feb 10
Bren Hall 4011 1 pm |
The problem of finding the most probable, or MAP, configuration of a graphical model can also be cast as an integer linear programming (ILP) problem. Formulating the MAP problem as an ILP has led to the development of many approximate inference methods based on linear programming (LP) relaxations, including Tree-Reweighted Belief Propagation, MPLP and Max-Sum Diffusion. Recent work suggests that going in the opposite direction and posing an ILP as a MAP problem may also prove beneficial. In this talk, I’ll focus on a classic combinatorial optimization problem, the weighted matching problem, and show that if it is posed as a MAP inference problem then the Max-Product Belief Propagation (BP) algorithm always converges to the MAP configuration. Remarkably, this is true even though the weighted matching graphical model is loopy and BP is neither guaranteed to converge, nor be correct in such models. I’ll then discuss a cutting plane approach to solving the weighted matching problem that utilizes the BP algorithm to iteratively tighten the LP relaxation of the matching ILP. This line of work improves our understanding of the performance of the loopy BP algorithm in general and further strengthens the theoretical link between message passing algorithms and optimization theory.
This talk is based on joint work with Misha Chertkov (Los Alamos National Lab) and Jinwoo Shin (KAIST University). |
Feb 17
|
President’s Day
(no seminar)
|
Feb 24
Bren Hall 4011 1 pm |
The explosion of the number of devices capable of transmitting and receiving information connected to wireless and wired communication infrastructures, both for human-to-human and machine-to-machine interaction, is forcing a technological transition to a different inter-networking model. The communication infrastructure is merging from a collection of distinct networks to a single super-network deeply integrated with heterogeneous sub-systems. Mobile health-care, smart energy grids and social networks are just examples of complex systems interoperating with the global communication infrastructure. Adaptive learning and control are the key to empower the next generation of communication networks with the ability to operate and interoperate in heterogeneous environments and with heterogeneous sub-systems. However, the high complexity of these systems discourages the use of traditional tools in practice. In this seminar, I will discuss a novel framework for online learning and optimization in complex inter-networked systems based on sparse approximation theory and wavelet analysis. The key observation is that communication protocols, and supposedly natural and human phenomena, induce a structured behavior of the stochastic process tracking the state of the system. This induces a regular structure at different time scales (hop number) of the graph modeling state transitions, and enables dimensionality reduction based on graph wavelet analysis and graph filtering. Sparse approximation algorithms can be then employed to dramatically reduce the number of observations needed to estimate fundamental control functions mapped on the state space of the observed system. |
Mar 3
Bren Hall 4011 1 pm |
Deep learning methods have shown to be powerful approaches for modelling a large variety of data (speech, computer vision, natural language, biological, etc.) and solve a vast range of machine learning tasks (classification, regression, etc.). In this talk, I will present my recent research on using deep neural networks for the task of distribution/density estimation, one of the most fundamental problem in machine learning. Specifically, I will discuss the neural autoregressive distribution estimator (NADE), a state-of-the-art estimator of the probability distribution of data. I will then describe a deep version of NADE, which again illustrates the statistical modelling power of deep models.
Bio:Hugo Larochelle is Assistant Professor at the Université de Sherbrooke (UdeS). Before joining the Computer Science department of UdeS in 2011, he spent two years in the machine learning group at University of Toronto, as a postdoctoral fellow under the supervision of Geoffrey Hinton. He obtained his Ph.D. at Université de Montréal, under the supervision of Yoshua Bengio. He is the recipient of two Google Faculty Awards, acts as associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and is a member of the editorial board of the Journal of Artificial Intelligence Research (JAIR). This talk co-sponsored with the Institute for Genomics and Bioinformatics. |
Mar 10
Bren Hall 4011 1 pm |
Statistical topic models such as latent Dirichlet allocation (LDA) have become a standard tool for analyzing text corpora, with broad applications to political science, the humanities, sociology, conversational dialog, and more. In recent years there has been an explosion in the amount of digital text information available, leading to challenges of scale for traditional inference algorithms for topic models. In this talk, I will present algorithms for learning and evaluating topic models, quickly, accurately and at scale.
In part one of the talk I will describe SCVB0, a stochastic variational Bayesian inference algorithm which exploits the collapsed representation of LDA. The algorithm can fit a topic model to Wikipedia in a matter of hours, and to the NIPS corpus of machine learning articles in seconds. It is also extremely simple and easy to implement. The second part of the talk considers the problem of evaluating topic models by computing the log-likelihood of held-out data, a task which requires approximating an intractable high-dimensional integral. Annealed importance sampling (AIS), a Monte Carlo integration technique which operates by annealing between two distributions, has previously been successfully used to evaluate topic models. We introduce new annealing paths which exhibit much lower empirical variance than the previous AIS approach (albeit at the cost of increased bias when given too few iterations), facilitating reliable per-document comparisons of topic models. We then show how to use these paths to evaluate the predictive performance of topic model learning algorithms on a per-iteration basis. The proposed method achieves better estimates at this task than previous algorithms, in some cases with an order of magnitude less computational effort. |
Mar 17
|
Finals Week
(no seminar)
|
Fall 2013
Standard
Sept 26
Bren Hall 4011 1 pm |
The dominant visual search paradigm for object class detection is sliding windows. Although simple and effective, it is also wasteful, unnatural and rigidly hardwired. We propose strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations. Our strategies adapt to the class being searched and to the content of a particular test image, exploiting context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. In addition to being more elegant than sliding windows, we demonstrate experimentally on the PASCAL VOC 2010 dataset that our strategies evaluate two orders of magnitude fewer windows while achieving higher object detection performance. |
Oct 7
Bren Hall 4011 1 pm |
Massive datasets have imposed new challenges for the scientific community. Data-intensive problems are especially challenging for Bayesian methods, which typically involve intractable models that rely on Markov Chain Monte Carlo (MCMC) algorithms for their implementation. In this talk, I will discuss our recent attempts to develop a new class of scalable computational methods to facilitate the application of Bayesian statistics in data-intensive scientific problems. One approach uses geometrically motivated methods that explore the parameter space more efficiency by exploiting its geometric properties. Another approach uses techniques that are designed to speed up sampling algorithms through faster exploration of the parameter space. I will also discuss a possible integration of geometric methods with proper computational techniques to improve the overall efficiency of sampling algorithms so that they can be used for Big Data analysis. |
Oct 14
Bren Hall 4011 1 pm |
When reviewing scientific literature, it would be useful to have automatic tools that identify the most influential scientific articles as well as how ideas propagate between articles. In this context, this paper introduces topical influence, a quantitative measure of the extent to which an article tends to spread its topics to the articles that cite it. Given the text of the articles and their citation graph, we show how to learn a probabilistic model to recover both the degree of topical influence of each article and the influence relationships between articles. Experimental results on corpora from two well-known computer science conferences are used to illustrate and validate the proposed approach. |
Oct 21
Bren Hall 4011 1 pm |
Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of \textit{sequential transfer in online learning}, notably in the multi-armed bandit framework, where the objective is to minimize the cumulative regret over a sequence of tasks by incrementally transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for the estimation of the possible tasks and derive regret bounds for it.
Bio: Mohammad Gheshlaghi Azar studied Electrical Engineering (control theory) at University of Tehran, Iran from 2003 till 2006. He then moved to Netherlands for Ph.d., where he worked with Professor Bert Kappen and Professor Remi Munos on the subject of statistical machine learning and reinforcement learning. Following finishing his Ph.D. in 2012, he joined the school of computer science at Carnegie Mellon university as a postdoctoral fellow, where he is working with Professor Brunskill on the subject of transfer of knowledge in sequential decision making problems. His research is focused on developing new machine learning algorithms which apply to life-long and real-world learning and decision making problems. |
Oct 28
Bren Hall 4011 1 pm |
This talk will describe Rephil, a system used widely within Google to identify the concepts or topics that underlie a given piece of text. Rephil determines, for example, that “apple pie” relates to some of the same concepts as “chocolate cake”, but has little in common with “apple ipod”. The concepts used by Rephil are not pre-specified; instead, they are derived by an unsupervised learning algorithm running on massive amounts of text. The result of this learning process is a Rephil model — a giant Bayesian network with concepts as nodes. I will discuss the structure of Rephil models, the distributed machine learning algorithm that we use to build these models from terabytes of data, and the Bayesian network inference algorithm that we use to identify concepts in new texts under tight time constraints. I will also discuss how Rephil relates to ongoing academic research on probabilistic topic models.
Bio: Brian Milch is a software engineer at Google’s Los Angeles office. He first joined Google in 2000, after completing a B.S. in Symbolic Systems at Stanford University. A year later, he entered the Computer Science Ph.D. program at U.C. Berkeley. He received his doctorate in 2006, with a thesis focused on the integration of probabilistic and logical approaches to artificial intelligence. He then spent two years as a post-doctoral researcher at MIT before returning to Google in 2008. He has contributed to Google production systems for spelling correction, transliteration, and semantic modeling of text. |
Nov 4
Bren Hall 4011 1 pm |
Survival analysis focuses on modeling and predicting the time to an event of interest. Traditional survival models (e.g., the prevalent proportional hazards model) often impose strong assumptions on hazard functions, which describe how the risk of an event changes over time depending on covariates associated with each individual. In this paper we propose a nonparametric survival model (GBMCI) that does not make explicit assumptions on hazard functions. Our model trains an ensemble of regression trees by the gradient boosting machine to optimize a smoothed approximation of the concordance index, which is one of the most widely used metrics in survival model evaluation. We benchmarked the performance of GBMCI against other popular survival models with a large-scale breast cancer prognosis dataset. Our experiment shows that GBMCI consistently outperforms other methods based on a number of covariate settings. |
Nov 11
|
Veterans Day
(no seminar)
|
Nov 18
Bren Hall 4011 1 pm |
Statistical models with constrained probability distributions are abundant in machine learning. Some examples include regression models with norm constraints (e.g., Lasso), probit models, many copula models, and Latent Dirichlet Allocation (LDA) models. Bayesian inference involving probability distributions confined to constrained domains could be quite challenging for commonly used sampling algorithms. For such problems, we propose a novel Markov Chain Monte Carlo (MCMC) method that provides a general and computationally efficient framework for handling boundary conditions. Our method first maps the $D$-dimensional constrained domain of parameters to the unit ball ${\bf B}_0^D(1)$, then augments it to the $D$-dimensional sphere ${\bf S}^D$ such that the original boundary corresponds to the equator of ${\bf S}^D$. This way, our method handles the constraints implicitly by moving freely on sphere generating proposals that remain within boundaries when mapped back to the original space. To improve the computational efficiency of our algorithm, we divide the dynamics into several parts such that the resulting split dynamics has a partial analytical solution as a geodesic flow on the sphere. We apply our method to several examples including truncated Gaussian, Bayesian Lasso, Bayesian bridge regression, and a copula model for identifying synchrony among multiple neurons. Our results show that the proposed method can provide a natural and efficient framework for handling several types of constraints on target distributions. |
Nov 25
|
Thanksgiving week
(no seminar)
|
Dec 2
Bren Hall 4011 1 pm |
|
CML Faculty Member Alex Ihler receives NSF CAREER Award
StandardAlexander Ihler, associate professor of computer science, has been awarded the National Science Foundation’s (NSF) Faculty Early Career Development (CAREER) Award for his project, .Estimation and Decisions in Graphical Models.. Ihler will receive $442,000 over five years for his CAREER project, which seeks to develop a new framework for exact and approximate methods for advanced computational reasoning problems. It extends the abilities of intelligent systems to reasoning and decision-making under uncertainty, and it applies and tests these methods on a variety of application domains, including sensor networks and computer vision. Read more