Winter 2017

Standard

Jan 16
No Seminar (MLK Day)

Jan 23
Bren Hall 4011
1 pm
Mohammad Ghavamzadeh
Senior Analytics Researcher
Adobe Research

In online advertisement as well as many other fields such as health informatics and computational finance, we often have to deal with the situation in which we are given a batch of data generated by the current strategy(ies) of the company (hospital, investor), and we are asked to generate a good or an optimal strategy. Although there are many techniques to find a good policy given a batch of data, there are not much results to guarantee that the obtained policy will perform well in the real system without deploying it. On the other hand, deploying a policy might be risky, and thus, requires convincing the product (hospital, investment) manager that it is not going to harm the business. This is why it is extremely important to devise algorithms that generate policies with performance guarantees.

In this talk, we discuss four different approaches to this fundamental problem, we call them model-based, model-free, online, and risk-sensitive. In the model-based approach, we first use the batch of data and build a simulator that mimics the behavior of the dynamical system under studies (online advertisement, hospital’s ER, financial market), and then use this simulator to generate data and learn a policy. The main challenge here is to have guarantees on the performance of the learned policy, given the error in the simulator. This line of research is closely related to the area of robust learning and control. In the model-free approach, we learn a policy directly from the batch of data (without building a simulator), and the main question is whether the learned policy is guaranteed to perform at least as well as a baseline strategy. This line of research is related to off-policy evaluation and control. In the online approach, the goal is to control the exploration of the algorithm in a way that never during its execution the loss of using it instead of the baseline strategy is more than a given margin. In the risk-sensitive approach, the goal is to learn a policy that manages risk by minimizing some measure of variability in the performance in addition to maximizing a standard criterion. We present algorithms based on these approaches and demonstrate their usefulness in real-world applications such as personalized ad recommendation, energy arbitrage, traffic signal control, and American option pricing.

Bio:Mohammad Ghavamzadeh received a Ph.D. degree in Computer Science from the University of Massachusetts Amherst in 2005. From 2005 to 2008, he was a postdoctoral fellow at the University of Alberta. He has been a permanent researcher at INRIA in France since November 2008. He was promoted to first-class researcher in 2010, was the recipient of the “INRIA award for scientific excellence” in 2011, and obtained his Habilitation in 2014. He is currently (from October 2013) on a leave of absence from INRIA working as a senior analytics researcher at Adobe Research in California, on projects related to digital marketing. He has been an area chair and a senior program committee member at NIPS, IJCAI, and AAAI. He has been on the editorial board of Machine Learning Journal (MLJ), has published over 50 refereed papers in major machine learning, AI, and control journals and conferences, and has organized several tutorials and workshops at NIPS, ICML, and AAAI. His research is mainly focused on sequential decision-making under uncertainty, reinforcement learning, and online learning.

Jan 27
Bren Hall 6011
11:00am
Ruslan Salakhutdinov
Associate Professor
Machine Learning Department
Carnegie Mellon University

In this talk, I will first introduce a broad class of unsupervised deep learning models and show that they can learn useful hierarchical representations from large volumes of high-dimensional data with applications in information retrieval, object recognition, and speech perception. I will next introduce deep models that are capable of extracting a unified representation that fuses together multiple data modalities and present the Reverse Annealed Importance Sampling Estimator (RAISE) for evaluating these deep generative models. Finally, I will discuss models that can generate natural language descriptions (captions) of images and generate images from captions using attention, as well as introduce multiplicative and fine-grained gating mechanisms with application to reading comprehension.

Bio: Ruslan Salakhutdinov received his PhD in computer science from the University of Toronto in 2009. After spending two post-doctoral years at the Massachusetts Institute of Technology Artificial Intelligence Lab, he joined the University of Toronto as an Assistant Professor in the Departments of Statistics and Computer Science. In 2016 he joined the Machine Learning Department at Carnegie Mellon University as an Associate Professor. Ruslan’s primary interests lie in deep learning, machine learning, and large-scale optimization. He is an action editor of the Journal of Machine Learning Research and served on the senior programme committee of several learning conferences including NIPS and ICML. He is an Alfred P. Sloan Research Fellow, Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Google Faculty Award, Nvidia’s Pioneers of AI award, and is a Senior Fellow of the Canadian Institute for Advanced Research.

Jan 30
Bren Hall 4011
1 pm
Pierre Baldi & Peter Sadowski
Chancellor’s Professor
Department of Computer Science
University of California, Irvine

Learning in the Machine is a style of machine learning that takes into account the physical constraints of learning machines, from brains to neuromorphic chips. Taking into account these constraints leads to new insights into the foundations of learning systems, and occasionally leads also to improvements for machine learning performed on digital computers. Learning in the Machine is particularly useful when applied to message passing algorithms such as backpropagation and belief propagation, and leads to the concepts of local learning and learning channel. These concepts in turn will be applied to random backpropagation and several new variants. In addition to simulations corroborating the remarkable robustness of these algorithms, we will present new mathematical results establishing interesting connections between machine learning and Hilbert 16th problem.
Feb 6
Bren Hall 4011
1 pm
Miles Stoudenmire
Research Scientist
Department of Physics
University of California, Irvine

Tensor networks are a technique for factorizing tensors with hundreds or thousands of indices into a contracted network of low-order tensors. Originally developed at UCI in the 1990’s, tensor networks have revolutionized major areas of physics are starting to be used in applied math and machine learning. I will show that tensor networks fit naturally into a certain class of non-linear kernel learning models, such that advanced optimization techniques from physics can be applied straightforwardly (arxiv:1605.05775). I will discuss many advantages and future directions of tensor network models, for example adaptive pruning of weights and linear scaling with training set size (compared to at least quadratic scaling when using the kernel trick).
Feb 13
Bren Hall 4011
1 pm
Qi Lou
PhD Candidate
Department of Computer Science
University of California, Irvine

Bounding the partition function is a key inference task in many graphical models. In this paper, we develop an anytime anyspace search algorithm taking advantage of AND/OR tree structure and optimized variational heuristics to tighten deterministic bounds on the partition function. We study how our priority-driven best-first search scheme can improve on state-of-the-art variational bounds in an anytime way within limited memory resources, as well as the effect of the AND/OR framework to exploit conditional independence structure within the search process within the context of summation. We compare our resulting bounds to a number of existing methods, and show that our approach offers a number of advantages on real-world problem instances taken from recent UAI competitions.
Feb 20
No Seminar (Presidents Day)

Feb 27
Bren Hall 4011
1 pm
Eric Nalisnick
PhD Candidate
Department of Computer Science
University of California, Irvine

Deep generative models (such as the Variational Autoencoder) efficiently couple the expressiveness of deep neural networks with the robustness to uncertainty of probabilistic latent variables. This talk will first give an overview of deep generative models, their applications, and approximate inference strategies for them. Then I’ll discuss our work on placing Bayesian Nonparametric priors on their latent space, which allows the hidden representations to grow as the data necessitates.
Mar 6
Bren Hall 4011
1 pm
Omer Levy
Postdoctoral Researcher
Department of Computer Science & Engineering
University of Washington

Neural word embeddings, such as word2vec (Mikolov et al., 2013), have become increasingly popular in both academic and industrial NLP. These methods attempt to capture the semantic meanings of words by processing huge unlabeled corpora with methods inspired by neural networks and the recent onset of Deep Learning. The result is a vectorial representation of every word in a low-dimensional continuous space. These word vectors exhibit interesting arithmetic properties (e.g. king – man + woman = queen) (Mikolov et al., 2013), and seemingly outperform traditional vector-space models of meaning inspired by Harris’s Distributional Hypothesis (Baroni et al., 2014). Our work attempts to demystify word embeddings, and understand what makes them so much better than traditional methods at capturing semantic properties.

Our main result shows that state-of-the-art word embeddings are actually “more of the same”. In particular, we show that skip-grams with negative sampling, the latest algorithm in word2vec, is implicitly factorizing a word-context PMI matrix, which has been thoroughly used and studied in the NLP community for the past 20 years. We also identify that the root of word2vec’s perceived superiority can be attributed to a collection of hyperparameter settings. While these hyperparameters were thought to be unique to neural-network inspired embedding methods, we show that they can, in fact, be ported to traditional distributional methods, significantly improving their performance. Among our qualitative results is a method for interpreting these seemingly-opaque word-vectors, and the answer to why king – man + woman = queen.

Bio: Omer Levy is a post-doc in the Department of Computer Science & Engineering at the University of Washington, working with Prof. Luke Zettlemoyer. Previously, he completed his BSc and MSc at Technion – Israel Institute of Technology with the guidance of Prof. Shaul Markovitch, and got his PhD at Bar-Ilan University with the supervision of Prof. Ido Dagan and Dr. Yoav Goldberg. Omer is interested in realizing high-level semantic applications such as question answering and summarization to help people cope with information overload. At the heart of these applications are challenges in textual entailment, semantic similarity, and reading comprehension, which form the core of my current research. He is also interested in the current advances in deep learning and how they can facilitate semantic applications.