Sep 23 
No Seminar

Sep 30 4011 Bren Hall 1 pm 
Educational environments have become increasingly reliant on computer-mediated communication, using video conferencing, synchronous chats, and asynchronous forums in both small (5-20 learners) and massive (1000+ learners) learning environments. These platforms, which are designed to support or even supplant traditional instruction, have become commonplace across all levels of education and, as a result, have created big data in education. In order to move forward, the learning sciences field is in need of new automated approaches that offer deeper insights into the dynamics of learner interaction and discourse across online learning platforms. This talk will present results from recent work that uses language and discourse to capture social and cognitive dynamics during collaborative interactions. I will introduce group communication analysis (GCA), a novel approach for detecting emergent learner roles from the participants’ contributions and patterns of interaction. This method makes use of automated computational linguistic analysis of the sequential interactions of participants in online group communication to create distinct interaction profiles. We have applied the GCA to several collaborative learning datasets. Cluster analysis, predictive, and hierarchical linear mixed-effects modeling were used to assess the validity of the GCA approach and the practical influence of learner roles on student and overall group performance. The results indicate that learners’ patterns in linguistic coordination and cohesion are representative of the roles that individuals play in collaborative discussions. More broadly, GCA provides a framework for researchers to explore the micro intra- and interpersonal patterns associated with the participants’ roles and the sociocognitive processes related to successful collaboration.
Bio: I am an assistant professor in the School of Education at UCI. My primary interests are in cognitive psychology, discourse processing, group interaction, and learning analytics. In general, my research focuses on using language and discourse to uncover the dynamics of socially significant, cognitive, and affective processes. I am currently applying computational techniques to model discourse and social dynamics in a variety of environments including small group computer-mediated collaborative learning environments, collaborative design networks, and massive open online courses (MOOCs). My research has also extended beyond the educational and learning sciences spaces and highlighted the practical applications of computational discourse science in the clinical, political and social sciences areas. 
Oct 7 4011 Bren Hall 1 pm 
Humans can efficiently learn and communicate new knowledge about the world through natural language (e.g., the concept of important emails may be described through explanations like ‘late night emails from my boss are usually important’). Can machines be similarly taught new tasks and behavior through natural language interactions with their users? In this talk, we’ll explore two approaches towards language-based learning for classification tasks. First, we’ll consider how language can be leveraged for interactive feature space construction for learning tasks. I’ll present a method that jointly learns to understand language and learn classification models, by using explanations in conjunction with a small number of labeled examples of the concept. Secondly, we’ll examine an approach for using language as a substitute for labeled supervision for training machine learning models, which leverages the semantics of quantifier expressions in everyday language (‘definitely’, ‘sometimes’, etc.) to enable learning in scenarios with limited or no labeled data.
Bio: Shashank Srivastava is an assistant professor in the Computer Science department at the University of North Carolina (UNC) Chapel Hill. Shashank received his PhD from the Machine Learning department at CMU in 2018, and was an AI Resident at Microsoft Research in 2018-19. Shashank’s research interests lie in conversational AI, interactive machine learning and grounded language understanding. Shashank has an undergraduate degree in Computer Science from IIT Kanpur, and a Master’s degree in Language Technologies from CMU. He received the Yahoo InMind Fellowship for 2016-17; his research has been covered by popular media outlets including GeekWire and New Scientist. 
Oct 14 4011 Bren Hall 1 pm 
Structured Knowledge Bases (KBs) are extremely useful for applications such as question answering and dialog, but are difficult to populate and maintain. People prefer expressing information in natural language, and hence text corpora, such as Wikipedia, contain more detailed up-to-date information. This raises the question: can we directly treat text corpora as knowledge bases for extracting information on demand?
In this talk I will focus on two problems related to this question. First, I will look at augmenting incomplete KBs with textual knowledge for question answering. I will describe a graph neural network model for processing heterogeneous data from the two sources. Next, I will describe a scalable approach for compositional reasoning over the contents of the text corpus, analogous to following a path of relations in a structured KB to answer multi-hop queries. I will conclude by discussing interesting future research directions in this domain.

Oct 21 4011 Bren Hall 1 pm 
Bayesian inference is often advertised for applications where posterior uncertainties matter. A less appreciated advantage of Bayesian inference is that it allows for highly scalable model selection (“hyperparameter tuning”) via the Expectation Maximization (EM) algorithm and its approximate variant, variational EM.
In this talk, I will present both an application and an improvement of variational EM. The application is for link prediction in knowledge graphs, where a probabilistic approach and variational EM allowed us to train highly flexible models with more than ten thousand hyperparameters, improving predictive performance. In the second part of the talk, I will propose a new family of objective functions for variational EM. We will see that existing versions of variational inference in the literature can be interpreted as various forms of biased importance sampling of the marginal likelihood. Combining this insight with ideas from perturbation theory in statistical physics will lead us to a tighter bound on the true marginal likelihood and to better predictive performance of Variational Autoencoders.
Bio: Robert Bamler is a Postdoc at UCI in the group of Prof. Stephan Mandt. His interests are probabilistic embedding models, variational inference, and probabilistic deep learning methods for data compression. Before joining UCI in December of 2018, Rob worked in the statistical machine learning group at Disney Research in Pittsburgh and Los Angeles. He received his PhD in theoretical statistical and quantum physics from the University of Cologne, Germany. 
Oct 28 4011 Bren Hall 1 pm 
Humans interact with other humans or the world through information from various channels including vision, audio, language, haptics, etc. To simulate intelligence, machines require similar abilities to process and combine information from different channels to acquire better situation awareness, better communication ability, and better decision-making ability. In this talk, we describe three projects. In the first study, we enable a robot to utilize both vision and audio information to achieve better user understanding. Then we use incremental language generation to improve the robot’s communication with a human. In the second study, we utilize multimodal history tracking to optimize policy planning in task-oriented visual dialogs. In the third project, we tackle the well-known tradeoff between dialog response relevance and policy effectiveness in visual dialog generation. We propose a new machine learning procedure that alternates between supervised learning and reinforcement learning to optimize language generation and policy planning jointly in visual dialogs. We will also cover some recent ongoing work on image synthesis through dialogs, and generating social multimodal dialogs with a blend of GIFs and words.
Bio: Zhou Yu is an Assistant Professor at the Computer Science Department at UC Davis. She received her PhD from Carnegie Mellon University in 2017. Zhou is interested in building robust and multipurpose dialog systems using fewer data points and less annotation. She also works on language generation, vision and language tasks. Zhou’s work on persuasive dialog systems received an ACL 2019 best paper nomination recently. Zhou was featured in Forbes as 2018 30 under 30 in Science for her work on multimodal dialog systems. Her team recently won the 2018 Amazon Alexa Prize on building an engaging social bot for a $500,000 cash award. 
Nov 4 
Variational inference provides a general optimization framework to approximate the posterior distributions of latent variables in probabilistic models. Although effective in simple scenarios, it may be inaccurate or infeasible when the data is high-dimensional, the model structure is complicated, or variable relationships are non-conjugate. In this talk, I will present two different strategies to solve these problems. The first one is to derive rigorous variational bounds by leveraging the probabilistic relations and structural dependencies of the given model. One example I will explore is large-scale noisy-OR Bayesian networks popular in IT companies for analyzing the semantic content of massive text datasets. The second strategy is to create flexible algorithms directly applicable to many models, as can be expressed by probabilistic programming systems. I’ll talk about a low-variance Monte Carlo variational inference framework we recently developed for arbitrary models with discrete variables. It has appealing advantages over REINFORCE-style stochastic gradient estimates and model-dependent auxiliary-variable solutions, as demonstrated on real-world models of images, text, and social networks.
Bio: Geng Ji is a PhD candidate in the CS Department of UC Irvine, advised by Professor Erik Sudderth. His research interests are broadly in probabilistic graphical models, large-scale variational inference, as well as their applications in computer vision and natural language processing. He did summer internships at Disney Research in 2017, mentored by Professor Stephan Mandt, and at Facebook AI in 2018, which he will join as a full-time research scientist. 
Nov 11 
Veterans Day

Nov 18 4011 Bren Hall 1 pm 
John T. Halloran
Postdoctoral Researcher
Dept. of Biomedical Engineering
University of California, Davis
In the past few decades, mass spectrometry-based proteomics has dramatically improved our fundamental knowledge of biology, leading to advancements in the understanding of diseases and methods for clinical diagnoses. However, the complexity and sheer volume of typical proteomics datasets make both fast and accurate analysis difficult to accomplish simultaneously; while machine learning methods have proven themselves capable of incredibly accurate proteomic analysis, such methods deter use by requiring extremely long runtimes in practice. In this talk, we will discuss two core problems in computational proteomics and how to accelerate the training of their highly accurate, but slow, machine learning solutions. For the first problem, wherein we seek to infer the protein subsequences (called peptides) present in a biological sample, we will improve the training of graphical models by deriving emission functions which render conditional maximum likelihood learning concave. Used within a dynamic Bayesian network, we show that these emission functions not only allow extremely efficient learning of globally convergent parameters, but also drastically outperform the state-of-the-art in peptide identification accuracy. For the second problem, wherein we seek to further improve peptide identification accuracy by classifying correct versus incorrect identifications, we will speed up the state-of-the-art in discriminative learning using a combination of improved convex optimization and extensive parallelization. We show that on massive datasets containing hundreds of millions of peptide identifications, these speedups reduce discriminative analysis time from several days down to just several hours, without any degradation in analysis quality. 
Nov 25 4011 Bren Hall 1 pm 
TBA 
Dec 2 4011 Bren Hall 1 pm 
TBA 
Dec 9 
Finals week

AIML
Spring 2019
Apr 8 
No Seminar 
Apr 15 Bren Hall 4011 1 pm 
In this talk, I will present our approach to the problem of automatically reconstructing a complete 3D model of a scene from a single RGB image. This challenging task requires inferring the shape of both visible and occluded surfaces. Our approach utilizes a viewer-centered, multi-layer representation of scene geometry adapted from recent methods for single-object shape completion. To improve the accuracy of view-centered representations for complex scenes, we introduce a novel “Epipolar Feature Transformer” that transfers convolutional network features from an input view to other virtual camera viewpoints, and thus better covers the 3D scene geometry. Unlike existing approaches that first detect and localize objects in 3D, and then infer object shape using category-specific models, our approach is fully convolutional, end-to-end differentiable, and avoids the resolution and memory limitations of voxel representations. We demonstrate the advantages of multi-layer depth representations and epipolar feature transformers on the reconstruction of a large database of indoor scenes. Project page: https://www.ics.uci.edu/~daeyuns/layeredepipolarcnn/ 
Apr 22 Bren Hall 4011 1 pm 
I will discuss machine-learning emulation of O(100M) cloud-resolving simulations of moist turbulence for use in multiscale global climate simulation. First, I will present encouraging results from pilot tests on an idealized ocean world, in which a fully connected deep neural network (DNN) is found to be capable of emulating explicit subgrid vertical heat and vapor transports across a globally diverse population of convective regimes. Next, I will demonstrate that O(10k) instances of the DNN emulator spanning the world are able to feed back realistically with a prognostic global host atmospheric model, producing viable ML-powered climate simulations that exhibit realistic space-time variability for convectively coupled weather dynamics and even some limited out-of-sample generalizability to new climate states beyond the training data’s boundaries. I will then discuss a new prototype of the neural network under development that includes the ability to enforce multiple physical constraints within the DNN optimization process, which exhibits potential for further generalizability. Finally, I will conclude with some discussion of the unsolved technical issues and interesting philosophical tensions being raised in the climate modeling community by this disruptive but promising approach for next-generation global simulation. 
Apr 29 Bren Hall 4011 1 pm 
Large problems with repetitive substructure arise in many domains such as social network analysis, collective classification, and database entity resolution. In these instances, individual data is augmented with a small set of rules that uniformly govern the relationship among groups of objects (for example: “the friend of my friend is probably my friend” in a social network). Uncertainty is captured by a probabilistic graphical model structure. While theoretically sound, standard reasoning techniques cannot be applied due to the massive size of the network (often millions of random variables and trillions of factors). Previous work on lifted inference efficiently exploits symmetric structure in graphical models, but breaks down in the presence of unique individual data (contained in all real-world problems). Current methods to address this problem are largely heuristic. In this presentation we describe a coarse-to-fine approximate inference framework that initially treats all individuals identically, gradually relaxing this restriction to finer subgroups. This produces a sequence of inference objective bounds of monotonically increasing cost and accuracy. We then discuss our work on incorporating high-order inference terms (over large subsets of variables) into lifted inference and ongoing challenges in this area. 
May 13 Bren Hall 4011 1 pm 
Reading machines that truly understood what they read would change the world, but our current best reading systems struggle to understand text at anything more than a superficial level. In this talk I try to reason out what it means to “read”, and how reasoning systems might help us get there. I will introduce three reading comprehension datasets that require systems to reason at a deeper level about the text that they read, using numerical, coreferential, and implicative reasoning abilities. I will also describe some early work on models that can perform these kinds of reasoning.
Bio: Matt is a senior research scientist at the Allen Institute for Artificial Intelligence (AI2) on the AllenNLP team, and a visiting scholar at UCI. His research focuses primarily on getting computers to read and answer questions, dealing both with open domain reading comprehension and with understanding question semantics in terms of some formal grounding (semantic parsing). He is particularly interested in cases where these two problems intersect, doing some kind of reasoning over open domain text. He is the original author of the AllenNLP toolkit for NLP research, and he co-hosts the NLP Highlights podcast with Waleed Ammar. 
May 27 
No Seminar (Memorial Day) 
June 3 Bren Hall 4011 12:00 
New technologies for remote sensing and astronomy provide an unprecedented view of Earth, our Sun, and beyond. Traditional data-analysis pipelines in oceanography, atmospheric sciences, and astronomy struggle to take full advantage of the massive amounts of high-dimensional data now available. I will describe opportunities for using deep learning to process satellite and telescope data, and discuss recent work mapping extreme sea states using Synthetic Aperture Radar (SAR), inferring the physics of our Sun’s atmosphere, and detecting anomalous astrophysical events in other systems, such as comets transiting distant stars.
Bio: Peter Sadowski is an Assistant Professor of Information and Computer Sciences at the University of Hawaii Manoa and Co-Director of the AI Precision Health Institute at the University of Hawaii Cancer Center. He completed his Ph.D. and Postdoc at University of California Irvine, and his undergraduate studies at Caltech. His research focuses on deep learning and its applications to the natural sciences, particularly those at the intersection of machine learning and physics. 
June 3 Bren Hall 4011 1 pm 
Deep learning has tremendously boosted the performance of many applications, such as object classification and detection in images, speech recognition and understanding, machine translation, and game play such as chess and Go. However, these all constitute reasonably narrow and well-defined tasks for which it is reasonable to collect very large datasets. For artificial general intelligence (AGI) we will need to learn from a small number of samples, generalize to entirely new domains, and reason about a problem. What do we need in order to make progress to AGI? I will argue that we need to combine the data generating process, such as the physics of the domain and the causal relationships between objects, with the tools of deep learning. In this talk I will present a first attempt to integrate the theory of graphical models, which arguably was the dominant modeling paradigm in machine learning around the turn of the twenty-first century, with deep learning. Graphical models express the relations between random variables in an interpretable way, while probabilistic inference in such networks can be used to reason about these variables. We will propose a new hybrid paradigm where probabilistic message passing in such networks is enhanced with graph convolutional neural networks to improve the ability of such systems to reason and make predictions. 
June 10 
No Seminar (Finals) 
Fall 2018
Oct 1

No Seminar

Oct 8
Bren Hall 4011 1 pm 
The path to natural language understanding goes through increasingly challenging question answering tasks. I will present research that significantly improves performance on two such tasks: answering complex questions over tables, and open-domain factoid question answering. For answering complex questions, I will present a type-constrained encoder-decoder neural semantic parser that learns to map natural language questions to programs. For open-domain factoid QA, I will show that training paragraph-level QA systems to give calibrated confidence scores across paragraphs is crucial when the correct answer-containing paragraph is unknown. I will conclude with some thoughts about how to combine these two disparate QA paradigms, towards the goal of answering complex questions over open-domain text.
Bio: Matt Gardner is a research scientist at the Allen Institute for Artificial Intelligence (AI2), where he has been exploring various kinds of question answering systems. He is the lead designer and maintainer of the AllenNLP toolkit, a platform for doing NLP research on top of PyTorch. Matt is also the co-host of the NLP Highlights podcast, where, with Waleed Ammar, he gets to interview the authors of interesting NLP papers about their work. Prior to joining AI2, Matt earned a PhD from Carnegie Mellon University, working with Tom Mitchell on the Never Ending Language Learning project. 
Oct 22
Bren Hall 4011
1 pm 
Assistant Professor
Dept. of Computer Science, UC Irvine
I will give an overview of some exciting recent developments in deep probabilistic modeling, which combines deep neural networks with probabilistic models for unsupervised learning. Deep probabilistic models are capable of synthesizing artificial data that highly resemble the training data, and are able to fool both machine learning classifiers and humans. These models have numerous applications in creative tasks, such as voice, image, or video synthesis and manipulation. At the same time, combining neural networks with strong priors results in flexible yet highly interpretable models for finding hidden structure in large data sets. I will summarize my group’s activities in this space, including measuring semantic shifts of individual words over hundreds of years, summarizing audience reactions to movies, and predicting the future evolution of video sequences with applications to neural video coding. 
Oct 25
Bren Hall 3011 3 pm 
(Note: different day (Thurs), time (3pm), and location (3011) relative to usual Monday seminars)
Many of the computational problems that arise in data analysis and machine learning can be expressed mathematically as optimization problems. Indeed, much new algorithmic research in optimization is being driven by the need to solve large, complex problems from these areas. In this talk, we review a number of canonical problems in data analysis and their formulations as optimization problems. We will cover support vector machines / kernel learning, logistic regression (including regularized and multiclass variants), matrix completion, deep learning, and several other paradigms. 
Oct 29
Bren Hall 4011 1 pm 
We study the problem of fairly allocating a set of indivisible items among $n$ agents. Typically, the literature has focused on one-shot algorithms. In this talk we depart from this paradigm and allow items to arrive online. When an item arrives we must immediately and irrevocably allocate it to an agent. A paradigmatic example is that of food banks: food donations arrive, and must be delivered to nonprofit organizations such as food pantries and soup kitchens. Items are often perishable, which is why allocation decisions must be made quickly, and donated items are typically leftovers, leading to lack of information about items that will arrive in the future. Which recipient should a new donation go to? We approach this problem from different angles.
In the first part of the talk, we study the problem of minimizing the maximum envy between any two recipients, after all the goods have been allocated. We give a polynomial-time, deterministic and asymptotically optimal algorithm with vanishing envy, i.e. the maximum envy divided by the number of items T goes to zero as T goes to infinity. In the second part of the talk, we adopt and further develop an emerging paradigm called virtual democracy. We will take these ideas all the way to practice. In the last part of the talk I will present some results from ongoing work on automating the decisions faced by a food bank called 412 Food Rescue, an organization in Pittsburgh that matches food donations with nonprofit organizations. 
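As a small illustration of the envy measure mentioned in the first part of the talk, the sketch below computes the maximum envy of a finished allocation. It assumes additive valuations and takes envy to be how much more an agent values another agent's bundle than its own, floored at zero. The function name `max_envy` and the input layout are hypothetical and are not taken from the talk; the allocation algorithm itself is not reproduced here.

```python
from typing import List


def max_envy(values: List[List[float]], allocation: List[int]) -> float:
    """Maximum pairwise envy of a completed allocation (illustrative only).

    values[i][t]  -- value agent i assigns to item t (additive valuations assumed)
    allocation[t] -- index of the agent who received item t
    """
    n = len(values)
    # bundle_value[i][j] = value agent i assigns to the bundle held by agent j
    bundle_value = [[0.0] * n for _ in range(n)]
    for t, owner in enumerate(allocation):
        for i in range(n):
            bundle_value[i][owner] += values[i][t]
    # Including i == j in the maximum floors the result at zero.
    return max(
        bundle_value[i][j] - bundle_value[i][i]
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    values = [[1.0, 1.0], [1.0, 1.0]]   # two agents, two identical items
    print(max_envy(values, [0, 0]))     # both items go to agent 0
    print(max_envy(values, [0, 1]))     # one item each
```

Under these assumptions, "vanishing envy" means that `max_envy(...) / T` tends to zero as the number of items `T` grows.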
Nov 5
Bren Hall 4011 1 pm 
Image Segmentation and Tracking Utilizing a Difference of Convex Regularized Mumford-Shah Functional
In this talk I will give a brief overview of the segmentation and tracking problems and will propose a new model that tackles both of them. This model incorporates a weighted difference of anisotropic and isotropic total variation (TV) norms into a relaxed formulation of the Mumford-Shah (MS) model. We will show results exceeding those obtained by the MS model when using the standard TV norm to regularize partition boundaries. Examples illustrating the qualitative differences between the proposed model and the standard MS one will be shown as well. I will also talk about a fast numerical method that is used to optimize the proposed model utilizing the difference-of-convex algorithm (DCA) and the primal-dual hybrid gradient (PDHG) method. Finally, future directions will be given that could harness the power of convolutional nets for more advanced segmentation tasks. 
Nov 12

No Seminar (Veterans Day)

Nov 19
Bren Hall 4011 1 pm 
Google Accelerated Sciences is a translational research team that brings Google’s technological expertise to the scientific community. Recent advances in machine learning have delivered incredible results in consumer applications (e.g. photo recognition, language translation), and are now beginning to play an important role in life sciences. Taking examples from active collaborations in the biochemical, biological, and biomedical fields, I will focus on how our team transforms science problems into data problems and applies Google’s scaled computation, data-driven engineering, and machine learning to accelerate discovery. See http://g.co/research/gas for our publications and more details.
Bio: 
Nov 26
Bren Hall 4011 1 pm 
Why is natural language the way it is? I propose that human languages can be modeled as solutions to the problem of efficient communication among intelligent agents with certain information processing constraints, in particular constraints on short-term memory. I present an analysis of dependency treebank corpora of over 50 languages showing that word orders across languages are optimized to limit short-term memory demands in parsing. Next I develop a Bayesian, information-theoretic model of human language processing, and show that this model can intuitively explain an apparently paradoxical class of comprehension errors made by both humans and state-of-the-art recurrent neural networks (RNNs). Finally I combine these insights in a model of human languages as information-theoretic codes for latent tree structures, and show that optimization of these codes for expressivity and compressibility results in grammars that resemble human languages. 
Dec 3

No Seminar (NIPS)

Spring 2018
Apr 2

No Seminar

Apr 9
Bren Hall 4011 1 pm 
Sabino Miranda, Ph.D
CONACyT Researcher
Center for Research and Innovation in Information and Communication Technologies
Sentiment Analysis is a research area concerned with the computational analysis of people’s feelings or beliefs, such as emotions, opinions, attitudes, and appraisals, expressed in texts. At the same time, with the growth of social media data (review websites, microblogging sites, etc.) on the Web, Twitter has received particular attention because it is a huge source of opinionated information with potential applications to decision-making tasks from business applications to the analysis of social and political events. In this context, I will present the multilingual and error-robust approaches developed in our group to tackle sentiment analysis as a classification problem, mainly for informal written text such as Twitter. Our approaches have been tested in several benchmark contests such as SemEval (International Workshop on Semantic Evaluation), TASS (Workshop for Sentiment Analysis Focused on Spanish), and PAN (Workshop on Digital Text Forensics). 
Apr 16
Bren Hall 4011 1 pm 
Professor of Mathematics
University of California, Irvine
A simple way to generate a Boolean function in n variables is to take the sign of some polynomial. Such functions are called polynomial threshold functions. How many low-degree polynomial threshold functions are there? This problem was solved for degree d=1 by Zuev in 1989 and has remained open for any higher degrees, including d=2, since then. In a joint work with Pierre Baldi (UCI), we settled the problem for all degrees d>1. The solution explores connections of Boolean functions to additive combinatorics and high-dimensional probability. This leads to a program of extending random matrix theory to random tensors, which is mostly an uncharted territory at present. 
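To make the definition above concrete, here is a minimal sketch that tabulates the Boolean function obtained as the sign of a polynomial. It assumes inputs in {-1, +1}^n and polynomials that never evaluate to zero on those inputs; the abstract does not fix either convention, and the helper name `threshold_table` is hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence


def threshold_table(poly: Callable[[Sequence[int]], float], n: int) -> List[int]:
    """Truth table of sign(poly(x)) over all x in {-1, +1}^n (illustrative only)."""
    table = []
    for x in product((-1, 1), repeat=n):
        value = poly(x)
        if value == 0:
            # The sign is ambiguous at zero, so this sketch rejects such polynomials.
            raise ValueError(f"polynomial vanishes at {x}")
        table.append(1 if value > 0 else -1)
    return table


if __name__ == "__main__":
    # Degree d=1: majority of three variables.
    print(threshold_table(lambda x: x[0] + x[1] + x[2], 3))
    # Degree d=2: the product x0*x1 (parity of two variables).
    print(threshold_table(lambda x: x[0] * x[1], 2))
```

The second example is a degree-2 threshold function that no degree-1 polynomial can produce, since parity of two variables is not linearly separable; this is one way to see that the count grows with the degree.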
Apr 23
Bren Hall 4011 1 pm 
PhD Candidate, Computer Science
Brown University
We develop new representations and algorithms for three-dimensional (3D) scene understanding from images and videos. In cluttered indoor scenes, RGB-D images are typically described by local geometric features of the 3D point cloud. We introduce descriptors that account for 3D camera viewpoint, and use structured learning to perform 3D object detection and room layout prediction. We also extend this work by using latent support surfaces to capture style variations of 3D objects and help detect small objects. Contextual relationships among categories and layout are captured via a cascade of classifiers, leading to holistic scene hypotheses with improved accuracy. In outdoor autonomous driving applications, given two consecutive frames from a pair of stereo cameras, 3D scene flow methods simultaneously estimate the 3D geometry and motion of the observed scene. We incorporate semantic segmentation in a cascaded prediction framework to more accurately model moving objects by iteratively refining segmentation masks, stereo correspondences, 3D rigid motion estimates, and optical flow fields. 
Apr 30

Cancelled

May 7
Bren Hall 4011 1 pm 
Assistant Professor
University of Utah
Natural language processing (NLP) sees potential applicability in a broad array of user-facing applications. To realize this potential, however, we need to address several challenges related to representations, data availability and scalability.
In this talk, I will discuss these concerns and how we may overcome them. First, as a motivating example of NLP’s broad reach, I will present our recent work on using language technology to improve mental health treatment. Then, I will focus on some of the challenges that need to be addressed. The choice of representations can make a big difference in our ability to reason about text; I will discuss recent work on developing rich semantic representations. Finally, I will touch upon the problem of systematically speeding up the entire NLP pipeline without sacrificing accuracy. As a concrete example, I will present a new algebraic characterization of the process of feature extraction, as a direct consequence of which, we can make trained classifiers significantly faster. 
May 14
Bren Hall 4011 1 pm 
PhD Candidate, Computer Science
University of California, Irvine
Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size (by fusing multi-scale pooled features) in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that small details are preserved for distant objects while larger receptive fields are used for those nearby. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We further integrate this depth-aware gating into a recurrent convolutional neural network to refine semantic segmentation, and show state-of-the-art performance on several benchmarks.
Moreover, rather than fusing multi-scale pooled features based on estimated depth, we show the “correct” size of pooling field for each pixel can be decided in an attentional fashion by our Pixel-wise Attentional Gating unit (PAG), which learns to choose the pooling size for each pixel. PAG is a generic, architecture-independent, problem-agnostic mechanism that can be readily “plugged in” to an existing model with fine-tuning. We utilize PAG in two ways: 1) learning spatially varying pooling fields that improve model performance without extra computation cost, and 2) learning a dynamic computation policy for each pixel to decrease total computation while maintaining accuracy. We extensively evaluate PAG on a variety of per-pixel labeling tasks, including semantic segmentation, boundary detection, monocular depth and surface normal estimation. We demonstrate that PAG allows competitive or state-of-the-art performance on these tasks. We also show that PAG learns dynamic spatial allocation of computation over the input image, which provides better performance tradeoffs compared to related approaches (e.g., truncating deep models or dynamically skipping whole layers). Generally, we observe that PAG reduces computation by 10% without noticeable loss in accuracy, and performance degrades gracefully when imposing stronger computational constraints.
May 21
Bren Hall 4011 1 pm 
Principal Researcher
Microsoft Research
In machine learning, a tradeoff must often be made between accuracy and intelligibility: the most accurate models usually are not very intelligible (e.g., deep nets, boosted trees and random forests), and the most intelligible models usually are less accurate (e.g., logistic regression and decision lists). This tradeoff often limits the accuracy of models that can be safely deployed in mission-critical applications such as healthcare, where being able to understand, validate, edit, and ultimately trust a learned model is important. We have been working on a learning method based on generalized additive models (GAMs) that is often as accurate as full complexity models, but even more intelligible than linear models. This makes it easy to understand what a model has learned, and also makes it easier to edit the model when it learns inappropriate things because of unanticipated problems with the data. Making it possible for experts to understand a model and repair it is critical because most data has unanticipated landmines. In the talk I’ll present two healthcare case studies where these high-accuracy GAMs discover surprising patterns in the data that would have made deploying a black-box model risky. I’ll also briefly show how we’re using these models to detect bias in domains where fairness and transparency are paramount.
May 28

Memorial Day

Jun 4
Bren Hall 4011 1 pm 
Stephen McAleer (Pierre Baldi’s group)
Graduate Student, Computer Science
University of California, Irvine
We will present a novel approach to solving the Rubik’s cube effectively without any human knowledge using several ingredients including deep learning, reinforcement learning, and Monte Carlo searches.
At the end, if time permits, we will describe several extensions to the neuronal Boolean complexity results presented by Roman Vershynin a few weeks ago. 
Jun 11

No Seminar (finals week)

Winter 2018
Jan 15

No Seminar (MLK Day)

Jan 22
Bren Hall 4011 1 pm 
Shufeng Kong
PhD Candidate
Centre for Quantum Software and Information, FEIT
University of Technology Sydney, Australia
The Simple Temporal Problem (STP) is a fundamental temporal reasoning problem and has recently been extended to the Multiagent Simple Temporal Problem (MaSTP). In this paper we present a novel approach that is based on enforcing arc-consistency (AC) on the input (multiagent) simple temporal network. We show that the AC-based approach is sufficient for solving both the STP and MaSTP and provide efficient algorithms for them. As our AC-based approach does not impose new constraints between agents, it does not violate the privacy of the agents and is superior to the state-of-the-art approach to MaSTP. Empirical evaluations on diverse benchmark datasets also show that our AC-based algorithms for STP and MaSTP are significantly more efficient than existing approaches.
Jan 29
Bren Hall 4011 1 pm 
Postdoctoral Scholar
Paul Allen School of Computer Science and Engineering
University of Washington
Deep learning is one of the most important techniques used in natural language processing (NLP). A central question in deep learning for NLP is how to design a neural network that can fully utilize the information from training data and make accurate predictions. A key to solving this problem is to design a better network architecture.
In this talk, I will present two examples from my work on how structural information from natural language helps design better neural network models. The first example shows adding coreference structures of entities not only helps different aspects of text modeling, but also improves the performance of language generation; the second example demonstrates structures of organizing sentences into coherent texts can help neural networks build better representations for various text classification tasks. Along the lines of this topic, I will also propose a few ideas for future work and discuss some potential challenges. 
February 5

No Seminar (AAAI)

February 12
Bren Hall 4011 1 pm 
PhD Candidate
Computer Science
University of California, Irvine
Bayesian inference for complex models (the kinds needed to solve complex tasks such as object recognition) is inherently intractable, requiring that analytically difficult integrals be solved in high dimensions. One solution is to turn to variational Bayesian inference (VI): a parametrized family of distributions is proposed, and optimization is carried out to find the member of the family nearest to the true posterior. There is an innate tradeoff within VI between expressive and tractable approximations. We wish the variational family to be as rich as possible so that it might include the true posterior (or something very close), but adding structure to the approximation increases the computational complexity of optimization. As a result, there has been much interest in efficient optimization strategies for mixture model approximations. In this talk, I’ll return to the problem of using mixture models for VI. First, to motivate our approach, I’ll discuss the distinction between averaging and combining variational models. We show that optimization objectives aimed at fitting mixtures (i.e. model combination) are, in practice, relaxed into performing something between model combination and averaging. Our primary contribution is to formulate a novel training algorithm for variational model averaging by adapting Stein variational gradient descent to operate on the parameters of the approximating distribution. Then, through a particular choice of kernel, we show the algorithm can be adapted to perform something closer to model combination, providing a new algorithm for optimizing (finite) mixture approximations.
February 19

No Seminar (President’s Day)

February 26
Bren Hall 4011 1 pm 
Research Scientist
ISI/USC
Knowledge is an essential ingredient in the quest for artificial intelligence, yet scalable and robust approaches to acquiring knowledge have challenged AI researchers for decades. Often, the obstacle to knowledge acquisition is massive, uncertain, and changing data that obscures the underlying knowledge. In such settings, probabilistic models have excelled at exploiting the structure in the domain to overcome ambiguity, revise beliefs and produce interpretable results. In my talk, I will describe recent work using probabilistic models for knowledge graph construction and information extraction, including linking subjects across electronic health records, fusing background knowledge from scientific articles with gene association studies, disambiguating user browsing behavior across platforms and devices, and aligning structured data sources with textual summaries. I will also highlight several areas of ongoing research: fusing embedding approaches with probabilistic modeling, and building models that support dynamic data or human-in-the-loop interactions.
March 5
Bren Hall 4011 1 pm 
Assistant Professor
UC Riverside
Tensors and tensor decompositions have been very popular and effective tools for analyzing multi-aspect data in a wide variety of fields, ranging from Psychology to Chemometrics, and from Signal Processing to Data Mining and Machine Learning. Using tensors in the era of big data presents us with a rich variety of applications, but also poses great challenges such as scalability and efficiency. In this talk I will first motivate the effectiveness of tensor decompositions as data analytic tools in a variety of exciting, real-world applications. Subsequently, I will discuss recent techniques for tackling the scalability and efficiency challenges by parallelizing and speeding up tensor decompositions, especially for very sparse datasets, including the scenario where the data are continuously updated over time. Finally, I will discuss open problems in unsupervised tensor mining and quality assessment of the results, and present work-in-progress addressing that problem with very encouraging results.
March 12
Bren Hall 4011 1 pm 
PhD Student
UC Los Angeles
I will describe the basic elements of the Emergence Theory of Deep Learning, which started as a general theory for representations and is comprised of three parts: (1) We formalize the desirable properties that a representation should possess, based on classical principles of statistical decision and information theory: invariance, sufficiency, minimality, disentanglement. We then show that such an optimal representation of the data can be learned by minimizing a specific loss function which is related to the notion of Information Bottleneck and Variational Inference. (2) We analyze common empirical losses employed in Deep Learning (such as empirical cross-entropy), and implicit or explicit regularizers, including Dropout and Pooling, and show that they bias the network toward recovering such an optimal representation. Finally, (3) we show that minimizing a suitably (implicitly or explicitly) regularized loss with SGD with respect to the weights of the network implies implicit optimization of the loss described in (1), which relates instead to the activations of the network. Therefore, even when we optimize a DNN as a black-box classifier, we are always biased toward learning minimal, sufficient and invariant representations. The link between (implicit or explicit) regularization of the classification loss and learning of optimal representations is specific to the architecture of deep networks, and is not found in a general classifier. The theory is related to a new version of the Information Bottleneck that studies the weights of a network, rather than the activations, and can also be derived using PAC-Bayes or Kolmogorov complexity arguments, providing independent validation.
March 19

No Seminar (Finals Week)

Fall 2017
Oct 9

No Seminar (Columbus Day)

Oct 16
Bren Hall 3011 1 pm 
We investigate the problem of automatically determining what type of shoe left an impression found at a crime scene. This recognition problem is made difficult by the variability in types of crime scene evidence (ranging from traces of dust or oil on hard surfaces to impressions made in soil) and the lack of comprehensive databases of shoe outsole tread patterns. We find that mid-level features extracted by pre-trained convolutional neural nets are surprisingly effective descriptors for these specialized domains. However, the choice of similarity measure for matching exemplars to a query image is essential to good performance. For matching multi-channel deep features, we propose the use of multi-channel normalized cross-correlation and analyze its effectiveness. Finally, we introduce a discriminatively trained variant and fine-tune our system end-to-end, obtaining state-of-the-art performance.
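As a rough illustration of the similarity measure named in this abstract, the sketch below computes a normalized cross-correlation per feature channel and averages the per-channel scores. It is a minimal sketch under stated assumptions, not the speakers' implementation: the function name `mcncc`, the `eps` stabilizer, and the restriction to two already-aligned feature maps of equal size (no sliding window, no learned channel weights) are all illustrative choices.

```python
import numpy as np

def mcncc(query, exemplar, eps=1e-8):
    """Average per-channel normalized cross-correlation of two aligned maps.

    query, exemplar: arrays of shape (channels, height, width).
    Returns a score in roughly [-1, 1]. Hypothetical helper for illustration.
    """
    # Flatten the spatial dimensions so each row is one channel.
    q = query.reshape(query.shape[0], -1).astype(float)
    e = exemplar.reshape(exemplar.shape[0], -1).astype(float)
    # Remove each channel's mean so the score ignores per-channel offsets.
    q = q - q.mean(axis=1, keepdims=True)
    e = e - e.mean(axis=1, keepdims=True)
    # Correlation of each channel pair, normalized by the channel energies.
    numerator = (q * e).sum(axis=1)
    denominator = np.sqrt((q ** 2).sum(axis=1) * (e ** 2).sum(axis=1)) + eps
    # Average the per-channel correlations into a single similarity score.
    return float((numerator / denominator).mean())

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5, 5))
score_same = mcncc(features, features)
score_rescaled = mcncc(features, 3.0 * features + 2.0)
score_negated = mcncc(features, -features)
```

Because each channel is centered and normalized separately, the score is unchanged when a channel of one map is shifted or positively rescaled, which is the property that makes this kind of measure attractive for comparing deep features with differing channel statistics.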
Oct 23
Bren Hall 3011 1 pm 
We propose a hierarchical generative model that captures the self-similar structure of image regions as well as how this structure is shared across image collections. Our model is based on a novel, variational interpretation of the popular expected patch log-likelihood (EPLL) method as a model for randomly positioned grids of image patches. While previous EPLL methods modeled image patches with finite Gaussian mixtures, we use nonparametric Dirichlet process (DP) mixtures to create models whose complexity grows as additional images are observed. An extension based on the hierarchical DP then captures repetitive and self-similar structure via image-specific variations in cluster frequencies. We derive a structured variational inference algorithm that adaptively creates new patch clusters to more accurately model novel image textures. Our denoising performance on standard benchmarks is superior to EPLL and comparable to the state-of-the-art, and we provide novel statistical justifications for common image processing heuristics. We also show accurate image inpainting results.
Oct 30
Bren Hall 4011 1 pm 
Computing the partition function is a key inference task in many graphical models. In this paper, we propose a dynamic importance sampling scheme that provides anytime finite-sample bounds for the partition function. Our algorithm balances the advantages of the three major inference strategies, heuristic search, variational bounds, and Monte Carlo methods, blending sampling with search to refine a variationally defined proposal. Our algorithm combines and generalizes recent work on anytime search and probabilistic bounds of the partition function. By using an intelligently chosen weighted average over the samples, we construct an unbiased estimator of the partition function with strong finite-sample confidence intervals that inherit both the rapid early improvement rate of sampling and the long-term benefits of an improved proposal from search. This gives significantly improved anytime behavior, and more flexible tradeoffs between memory, time, and solution quality. We demonstrate the effectiveness of our approach empirically on real-world problem instances taken from recent UAI competitions.
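For readers unfamiliar with the basic estimator this abstract builds on, the sketch below shows plain importance sampling of a partition function on a toy model. It is only a minimal sketch under stated assumptions: it uses a fixed uniform proposal and a simple unweighted average, not the dynamic, search-refined proposal or the weighted average described in the talk. The four-variable spin chain, `COUPLING`, and all function names are hypothetical choices made so the exact answer can be checked by enumeration.

```python
import itertools
import numpy as np

COUPLING = 0.5  # hypothetical interaction strength of the toy chain
N_VARS = 4      # small enough to enumerate all 2**4 states exactly

def unnormalized(spins):
    """Unnormalized probability of spin configurations on a chain."""
    spins = np.asarray(spins)
    # Each neighboring pair contributes COUPLING * s_i * s_{i+1}.
    return np.exp(COUPLING * np.sum(spins[..., :-1] * spins[..., 1:], axis=-1))

def exact_partition_function():
    """Sum the unnormalized probability over every configuration."""
    states = np.array(list(itertools.product([-1, 1], repeat=N_VARS)))
    return float(unnormalized(states).sum())

def importance_estimate(num_samples, rng):
    """Unbiased estimate of Z: the mean of f(x) / q(x) with x drawn from q."""
    samples = rng.choice([-1, 1], size=(num_samples, N_VARS))
    proposal_prob = 0.5 ** N_VARS  # uniform proposal over all states
    weights = unnormalized(samples) / proposal_prob
    return float(weights.mean())

rng = np.random.default_rng(0)
z_exact = exact_partition_function()
z_hat = importance_estimate(20000, rng)
```

The quality of such an estimate depends heavily on how well the proposal matches the target, which is why refining the proposal over time, as the abstract describes, can improve anytime behavior.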
Nov 6
Bren Hall 3011 1 pm 
Estimating evolutionary trees, called phylogenies or genealogies, is a fundamental task in modern biology. Once phylogenetic reconstruction is accomplished, scientists are faced with the challenging problem of interpreting phylogenetic trees. In certain situations, a coalescent process, a stochastic model that randomly generates evolutionary trees, comes to the rescue by probabilistically connecting phylogenetic reconstruction with the demographic history of the population under study. An important application of the coalescent is phylodynamics, an area that aims at reconstructing past population dynamics from genomic data. Phylodynamic methods have been especially successful in analyses of genetic sequences from viruses circulating in human populations. From a Bayesian hierarchical modeling perspective, the coalescent process can be viewed as a prior for evolutionary trees, parameterized in terms of unknown demographic parameters, such as the population size trajectory. I will review Bayesian nonparametric techniques that can accomplish phylodynamic reconstruction, with particular attention to analysis of genetic data sampled serially through time.
Nov 20

No Seminar (Thanksgiving Week)

Dec 4

No Seminar (NIPS Conference)

Dec 13
Bren Hall 4011 1 pm 
We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
Spring 2017
Apr 10
Bren Hall 4011 1 pm 
I’ll present two algorithms that use divide and conquer techniques to speed up learning. The first algorithm (called OWA) is a communication-efficient distributed learner. OWA uses only two rounds of communication, which is sufficient to achieve optimal learning rates. The second algorithm is a meta-algorithm for fast cross validation. I’ll show that for any divide and conquer learning algorithm, there exists a fast cross validation procedure whose run time is asymptotically independent of the number of cross validation folds.
Apr 17
Bren Hall 4011 1 pm 
Cameras can naturally capture sequences of images, or videos. And when understanding videos, connecting the past with the present requires tracking. Sometimes tracking is easy. We focus on two challenges which make tracking harder: long-term occlusions and appearance variations. To handle total occlusion, a tracker must know when it has lost track and how to reinitialize tracking when the target reappears. Reinitialization requires good appearance models. We build appearance models for humans and hands, with a particular emphasis on robustness and occlusion. For the second challenge, appearance variation, the tracker must know when and how to re-learn (or update) an appearance model. This challenge leads to the classic problem of drift: aggressively learning appearance changes allows small errors to compound, as elements of the background environment pollute the appearance model. We propose two solutions. First, we consider self-paced learning, wherein a tracker begins by learning from frames it finds easy. As the tracker becomes better at recognizing the target, it begins to learn from harder frames. We also develop a data-driven approach: train a tracking policy to decide when and how to update an appearance model. To take this direct approach to “learning when to learn”, we exploit large-scale Internet data through reinforcement learning. We interpret the resulting policy and conclude with a generalization for tracking multiple objects.
Apr 24
Bren Hall 4011 1 pm 
David R Thompson
Jet Propulsion Laboratory
Imaging spectrometers enable quantitative maps of physical and chemical properties at high spatial resolution. They have a long history of deployments for mapping terrestrial and coastal aquatic ecosystems, geology, and atmospheric properties. They are also critical tools for exploring other planetary bodies. These high-dimensional spatio-spectral datasets pose a rich challenge for computer scientists and algorithm designers. This talk will provide an introduction to remote imaging spectroscopy in the Visible and Shortwave Infrared, describing the measurement strategy and data analysis considerations including atmospheric correction. We will describe historical and current instruments, software, and public datasets.
Bio: David R. Thompson is a researcher and Technical Group Lead in the Imaging Spectroscopy group at the NASA Jet Propulsion Laboratory. He is Investigation Scientist for the AVIRIS imaging spectrometer project. Other roles include software lead for the NEA Scout mission, autonomy software lead for the PIXL instrument, and algorithm development for diverse JPL airborne imaging spectrometer campaigns. He is recipient of the NASA Early Career Achievement Medal and the JPL Lew Allen Award.
May 1
Bren Hall 4011 1 pm 
Bayesian nonparametric (BNP) models have been widely used in modern applications. In this talk, I will discuss some recent theoretical results for the commonly used BNP methods from a frequentist asymptotic perspective. I will cover a set of function estimation and testing problems such as density estimation, high-dimensional partial linear regression, independence testing, and independent component analysis. Minimax optimal convergence rates, adaptation and the Bernstein-von Mises theorem will be discussed.
May 8
Bren Hall 4011 1 pm 
During the last two decades the experience of consumers has been undergoing a fundamental and dramatic transformation, giving a rich variety of informed choices, online shopping, consumption of news and entertainment on the go, and personalized shopping experiences. All of this has been powered by the massive amounts of data that are continuously being collected and the application of machine learning, data science and AI techniques to them.
Adobe is a leader in Digital Marketing and is the leading provider of solutions to enterprises that are serving customers both in the B2B and B2C space. In this talk, we will outline the current state of the industry and the technology that is behind it, and how Data Science and Machine Learning are gradually beginning to transform the experiences of the consumer as well as the marketer. We will also speculate on how recent developments in Artificial Intelligence will lead to deep personalization and richer experiences for the consumer as well as more powerful and tailored end-to-end capabilities for the marketer.
Bio: Dr. P. Anandan is Vice President in Adobe Research, responsible for developing research strategy for Adobe, especially in Digital Marketing, and leading the Adobe India Research lab. An emphasis of this lab is on Big Data Experience and Intelligence. At Adobe, he is also leading efforts in applying A.I. to Big Data. Dr. Anandan is an expert in Computer Vision with more than 60 publications that have earned 14,500 citations in Google Scholar. His research areas include visual motion analysis, video surveillance, and 3D scene modeling from images and video. His papers have won multiple awards including the Helmholtz Prize, for long term fundamental contributions to computer vision research. Prior to joining Adobe, Dr. Anandan had a long tenure with Microsoft Research in Redmond, WA, and became a Distinguished Scientist. He was the Managing Director of Microsoft Research India, which he founded. Most recently he was the Managing Director of Microsoft Research’s Worldwide Outreach. He earned a PhD from the University of Massachusetts specializing in Computer Vision and Artificial Intelligence. He started as an assistant professor at Yale University before moving on to work in Video Information Processing at the David Sarnoff Research Center.
His research has been used in DARPA’s Video Surveillance and Monitoring program as well as in creating special effects in the movies “What Dreams May Come”, “Prince of Egypt,” and “The Matrix.” Dr. Anandan is the recipient of Distinguished Alumnus awards from both University of Massachusetts and the Indian Institute of Technology Madras, where he earned a B. Tech. in Electrical Engineering. He was inducted into the Nebraska Hall of Computing by the University of Nebraska, from where he obtained an MS in Computer Science. He is currently a member of the Board of Governors of IIT Madras. 
May 15
Bren Hall 4011 1 pm 
Ndapa Nakashole
Assistant Professor Computer Science and Engineering University of California, San Diego
Zero-shot learning is used in computer vision, natural language, and other domains to induce mapping functions that project vectors from one vector space to another. This is a promising approach to learning when we do not have labeled data for every possible label we want a system to recognize. This setting is common when doing NLP for low-resource languages, where labeled data is very scarce. In this talk, I will present our work on improving zero-shot learning methods for the task of word-level translation.
Bio: Ndapa Nakashole is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, San Diego. Prior to UCSD, she was a Postdoctoral Fellow in the Machine Learning Department at Carnegie Mellon University. She obtained her PhD from Saarland University, Germany, for work done at the Max Planck Institute for Informatics at Saarbrücken. 
May 22
Bren Hall 4011 1 pm 
Batya Kenig
Postdoctoral Scholar Department of Information Systems Engineering Technion – Israel Institute of Technology
We propose a novel framework wherein probabilistic preferences can be naturally represented and analyzed in a probabilistic relational database. The framework augments the relational schema with a special type of a relation symbol, a preference symbol. A deterministic instance of this symbol holds a collection of binary relations. Abstractly, the probabilistic variant is a probability space over databases of the augmented form (i.e., probabilistic database). Effectively, each instance of a preference symbol can be represented as a collection of parametric preference distributions such as Mallows. We establish positive and negative complexity results for evaluating Conjunctive Queries (CQs) over databases where preferences are represented in the Repeated Insertion Model (RIM), Mallows being a special case. We show how CQ evaluation reduces to a novel inference problem (of independent interest) over RIM, and devise a solver with polynomial data complexity. 
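The Repeated Insertion Model mentioned in this abstract can be illustrated by a small sampler. This is a minimal sketch, assuming the standard formulation in which the i-th item of a reference ranking is inserted at position j (1 ≤ j ≤ i) of the partial ranking with some probability, and assuming the usual Mallows special case where that probability is proportional to phi^(i-j) for a dispersion parameter phi. The function names and the example items are hypothetical, and the sketch says nothing about the query evaluation or solver described in the talk.

```python
import random

def mallows_insertion_probs(i, phi):
    """Insertion probabilities for the i-th item under a Mallows model.

    Position j (1-based, 1 <= j <= i) receives weight phi ** (i - j), so
    the last position, which preserves the reference order, is most likely
    whenever phi < 1.
    """
    weights = [phi ** (i - j) for j in range(1, i + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_rim(reference, phi, rng):
    """Draw one ranking by inserting reference items one at a time."""
    ranking = []
    for i, item in enumerate(reference, start=1):
        probs = mallows_insertion_probs(i, phi)
        # Choose a 0-based slot; slot i - 1 appends the item at the end.
        position = rng.choices(range(i), weights=probs, k=1)[0]
        ranking.insert(position, item)
    return ranking

reference = ["a", "b", "c", "d"]
same_as_reference = sample_rim(reference, 0.0, random.Random(0))
noisy_ranking = sample_rim(reference, 0.5, random.Random(1))
```

With phi set to 0 every item is appended, so the sample reproduces the reference ranking exactly; with phi set to 1 all insertion positions are equally likely and the sample is a uniformly random permutation.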
May 29

No Seminar (Memorial Day)

Jun 5
Bren Hall 4011 1 pm 
The future of self-driving cars, personal robots, smart homes, and intelligent assistants hinges on our ability to communicate with computers. The failures and miscommunications of Siri-style systems are untenable and become more problematic as machines become more pervasive and are given more control over our lives. Despite the creation of massive proprietary datasets to train dialogue systems, these systems still fail at the most basic tasks. Further, their reliance on big data is problematic. First, successes in English cannot be replicated in most of the 6,000+ languages of the world. Second, while big data has been a boon for supervised training methods, many of the most interesting tasks will never have enough labeled data to actually achieve our goals. It is therefore important that we build systems which can learn from naturally occurring data and grounded situated interactions.
In this talk, I will discuss work from my thesis on the unsupervised acquisition of syntax which harnesses unlabeled text in over a dozen languages. This exploration leads us to novel insights into the limits of semantics-free language learning. Having isolated these stumbling blocks, I’ll then present my recent work on language grounding where we attempt to learn the meaning of several linguistic constructions via interaction with the world.
Bio: Yonatan Bisk’s research focuses on Natural Language Processing from naturally occurring data (unsupervised and weakly supervised data). He is a postdoc researcher with Daniel Marcu at USC’s Information Sciences Institute. Previously, he received his Ph.D. from the University of Illinois at Urbana-Champaign under Julia Hockenmaier and his BS from the University of Texas at Austin.
Winter 2017
Jan 16

No Seminar (MLK Day)

Jan 23
Bren Hall 4011 1 pm 
In online advertisement as well as many other fields such as health informatics and computational finance, we often have to deal with the situation in which we are given a batch of data generated by the current strategy(ies) of the company (hospital, investor), and we are asked to generate a good or an optimal strategy. Although there are many techniques to find a good policy given a batch of data, there are few results guaranteeing that the obtained policy will perform well in the real system without deploying it. On the other hand, deploying a policy might be risky, and thus requires convincing the product (hospital, investment) manager that it is not going to harm the business. This is why it is extremely important to devise algorithms that generate policies with performance guarantees.
In this talk, we discuss four different approaches to this fundamental problem, which we call model-based, model-free, online, and risk-sensitive. In the model-based approach, we first use the batch of data to build a simulator that mimics the behavior of the dynamical system under study (online advertisement, hospital’s ER, financial market), and then use this simulator to generate data and learn a policy. The main challenge here is to have guarantees on the performance of the learned policy, given the error in the simulator. This line of research is closely related to the area of robust learning and control. In the model-free approach, we learn a policy directly from the batch of data (without building a simulator), and the main question is whether the learned policy is guaranteed to perform at least as well as a baseline strategy. This line of research is related to off-policy evaluation and control. In the online approach, the goal is to control the exploration of the algorithm so that at no point during its execution does the loss from using it instead of the baseline strategy exceed a given margin. In the risk-sensitive approach, the goal is to learn a policy that manages risk by minimizing some measure of variability in the performance in addition to maximizing a standard criterion. We present algorithms based on these approaches and demonstrate their usefulness in real-world applications such as personalized ad recommendation, energy arbitrage, traffic signal control, and American option pricing.
Bio: Mohammad Ghavamzadeh received a Ph.D. degree in Computer Science from the University of Massachusetts Amherst in 2005. From 2005 to 2008, he was a postdoctoral fellow at the University of Alberta. He has been a permanent researcher at INRIA in France since November 2008. He was promoted to first-class researcher in 2010, was the recipient of the “INRIA award for scientific excellence” in 2011, and obtained his Habilitation in 2014.
He is currently (from October 2013) on a leave of absence from INRIA working as a senior analytics researcher at Adobe Research in California, on projects related to digital marketing. He has been an area chair and a senior program committee member at NIPS, IJCAI, and AAAI. He has been on the editorial board of Machine Learning Journal (MLJ), has published over 50 refereed papers in major machine learning, AI, and control journals and conferences, and has organized several tutorials and workshops at NIPS, ICML, and AAAI. His research is mainly focused on sequential decision-making under uncertainty, reinforcement learning, and online learning.
Jan 27
Bren Hall 6011 11:00am 
In this talk, I will first introduce a broad class of unsupervised deep learning models and show that they can learn useful hierarchical representations from large volumes of high-dimensional data with applications in information retrieval, object recognition, and speech perception. I will next introduce deep models that are capable of extracting a unified representation that fuses together multiple data modalities and present the Reverse Annealed Importance Sampling Estimator (RAISE) for evaluating these deep generative models. Finally, I will discuss models that can generate natural language descriptions (captions) of images and generate images from captions using attention, as well as introduce multiplicative and fine-grained gating mechanisms with application to reading comprehension.
Bio: Ruslan Salakhutdinov received his PhD in computer science from the University of Toronto in 2009. After spending two postdoctoral years at the Massachusetts Institute of Technology Artificial Intelligence Lab, he joined the University of Toronto as an Assistant Professor in the Departments of Statistics and Computer Science. In 2016 he joined the Machine Learning Department at Carnegie Mellon University as an Associate Professor. Ruslan’s primary interests lie in deep learning, machine learning, and large-scale optimization. He is an action editor of the Journal of Machine Learning Research and served on the senior programme committee of several learning conferences including NIPS and ICML. He is an Alfred P. Sloan Research Fellow, Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Google Faculty Award, Nvidia’s Pioneers of AI award, and is a Senior Fellow of the Canadian Institute for Advanced Research.
Jan 30
Bren Hall 4011 1 pm 
Pierre Baldi & Peter Sadowski
Chancellor’s Professor Department of Computer Science University of California, Irvine
Learning in the Machine is a style of machine learning that takes into account the physical constraints of learning machines, from brains to neuromorphic chips. Taking into account these constraints leads to new insights into the foundations of learning systems, and occasionally also leads to improvements for machine learning performed on digital computers. Learning in the Machine is particularly useful when applied to message passing algorithms such as backpropagation and belief propagation, and leads to the concepts of local learning and learning channel. These concepts in turn will be applied to random backpropagation and several new variants. In addition to simulations corroborating the remarkable robustness of these algorithms, we will present new mathematical results establishing interesting connections between machine learning and Hilbert’s 16th problem.
Feb 6
Bren Hall 4011 1 pm 
Tensor networks are a technique for factorizing tensors with hundreds or thousands of indices into a contracted network of low-order tensors. Originally developed at UCI in the 1990s, tensor networks have revolutionized major areas of physics and are starting to be used in applied math and machine learning. I will show that tensor networks fit naturally into a certain class of nonlinear kernel learning models, such that advanced optimization techniques from physics can be applied straightforwardly (arxiv:1605.05775). I will discuss many advantages and future directions of tensor network models, for example adaptive pruning of weights and linear scaling with training set size (compared to at least quadratic scaling when using the kernel trick).
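To make the factorization idea concrete, here is a minimal sketch of one common tensor network layout, a chain of order-3 cores (often called a matrix product state or tensor train). The sizes, the random cores, and the helper names `mps_entry` and `mps_to_full` are assumptions chosen for illustration; this is not the model from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
num_sites, phys_dim, bond_dim = 8, 2, 3  # illustrative sizes, not from the talk

# One order-3 core per index: (left bond, physical index, right bond).
# The two boundary bonds have size 1.
bond_dims = [1] + [bond_dim] * (num_sites - 1) + [1]
cores = [
    rng.standard_normal((bond_dims[k], phys_dim, bond_dims[k + 1]))
    for k in range(num_sites)
]

def mps_entry(cores, indices):
    """Return one entry of the represented tensor as a product of small matrices."""
    result = np.ones((1, 1))
    for core, i in zip(cores, indices):
        result = result @ core[:, i, :]
    return float(result[0, 0])

def mps_to_full(cores):
    """Contract every core into the full tensor (only feasible for a few indices)."""
    full = cores[0]
    for core in cores[1:]:
        # Sum over the shared bond: last axis of `full`, first axis of `core`.
        full = np.tensordot(full, core, axes=1)
    return full.reshape(full.shape[1:-1])  # drop the two size-1 boundary bonds

full = mps_to_full(cores)
num_mps_params = sum(core.size for core in cores)
print(full.size, "entries in the full tensor,", num_mps_params, "numbers in the cores")
```

The full tensor grows exponentially with the number of indices, while the number of core parameters grows linearly for a fixed bond dimension, which is what makes very many indices tractable in this layout.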
Feb 13
Bren Hall 4011 1 pm 
Bounding the partition function is a key inference task in many graphical models. In this paper, we develop an anytime anyspace search algorithm taking advantage of AND/OR tree structure and optimized variational heuristics to tighten deterministic bounds on the partition function. We study how our priority-driven best-first search scheme can improve on state-of-the-art variational bounds in an anytime way within limited memory resources, as well as how the AND/OR framework exploits conditional independence structure during search in the context of summation. We compare our resulting bounds to a number of existing methods, and show that our approach offers a number of advantages on real-world problem instances taken from recent UAI competitions.
Feb 20

No Seminar (Presidents Day)

Feb 27
Bren Hall 4011 1 pm 
Deep generative models (such as the Variational Autoencoder) efficiently couple the expressiveness of deep neural networks with the robustness to uncertainty of probabilistic latent variables. This talk will first give an overview of deep generative models, their applications, and approximate inference strategies for them. Then I’ll discuss our work on placing Bayesian Nonparametric priors on their latent space, which allows the hidden representations to grow as the data necessitates. 
Mar 6
Bren Hall 4011 1 pm 
Omer Levy
Postdoctoral Researcher Department of Computer Science & Engineering University of Washington
Neural word embeddings, such as word2vec (Mikolov et al., 2013), have become increasingly popular in both academic and industrial NLP. These methods attempt to capture the semantic meanings of words by processing huge unlabeled corpora with methods inspired by neural networks and the recent onset of Deep Learning. The result is a vectorial representation of every word in a low-dimensional continuous space. These word vectors exhibit interesting arithmetic properties (e.g. king – man + woman = queen) (Mikolov et al., 2013), and seemingly outperform traditional vector-space models of meaning inspired by Harris’s Distributional Hypothesis (Baroni et al., 2014). Our work attempts to demystify word embeddings, and understand what makes them so much better than traditional methods at capturing semantic properties.
Our main result shows that state-of-the-art word embeddings are actually “more of the same”. In particular, we show that skip-grams with negative sampling, the latest algorithm in word2vec, is implicitly factorizing a word-context PMI matrix, which has been thoroughly used and studied in the NLP community for the past 20 years. We also identify that the root of word2vec’s perceived superiority can be attributed to a collection of hyperparameter settings. While these hyperparameters were thought to be unique to neural-network-inspired embedding methods, we show that they can, in fact, be ported to traditional distributional methods, significantly improving their performance. Among our qualitative results is a method for interpreting these seemingly opaque word vectors, and the answer to why king – man + woman = queen.
Bio: Omer Levy is a postdoc in the Department of Computer Science & Engineering at the University of Washington, working with Prof. Luke Zettlemoyer. Previously, he completed his BSc and MSc at Technion – Israel Institute of Technology with the guidance of Prof. Shaul Markovitch, and got his PhD at Bar-Ilan University with the supervision of Prof. Ido Dagan and Dr. Yoav Goldberg. Omer is interested in realizing high-level semantic applications such as question answering and summarization to help people cope with information overload. At the heart of these applications are challenges in textual entailment, semantic similarity, and reading comprehension, which form the core of his current research. He is also interested in the current advances in deep learning and how they can facilitate semantic applications.
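The word-context PMI matrix mentioned in this abstract can be illustrated with a small sketch. The corpus, the window size, the clipping to positive PMI, and the use of truncated SVD as the factorization are all assumptions made for illustration; this is a traditional count-based pipeline, not word2vec training and not the speaker's code.

```python
import numpy as np
from collections import Counter

# Toy corpus, made up for illustration.
corpus = [
    "the king rules the land".split(),
    "the queen rules the land".split(),
    "the man walks the dog".split(),
    "the woman walks the dog".split(),
]
window = 2  # symmetric context window (assumed setting)

# Count (word, context) co-occurrences inside the window.
pair_counts = Counter()
for sentence in corpus:
    for i, word in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_counts[(word, sentence[j])] += 1

vocab = sorted({w for s in corpus for w in s})
index = {w: k for k, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for (w, c), n in pair_counts.items():
    counts[index[w], index[c]] = n

total = counts.sum()
p_wc = counts / total
p_w = counts.sum(axis=1, keepdims=True) / total
p_c = counts.sum(axis=0, keepdims=True) / total

# PMI(w, c) = log(P(w, c) / (P(w) P(c))); unseen pairs give -inf and are clipped to 0.
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0.0)

# Low-rank factorization of the PPMI matrix; rows of `word_vectors` are embeddings.
dim = 3
u, s, _ = np.linalg.svd(ppmi)
word_vectors = u[:, :dim] * np.sqrt(s[:dim])
```

In this toy corpus "king" and "queen" occur in identical contexts, so their PPMI rows coincide, which is the distributional intuition behind both the count-based and the neural methods.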
Fall 2016
Sep 22
Bren Hall 4011 1 pm 
Burr Settles
Duolingo
Duolingo is a language education platform that teaches 20 languages to more than 150 million students worldwide. Our free flagship learning app is the #1 way to learn a language online, and is the most-downloaded education app for both Android and iOS devices. In this talk, I will describe the Duolingo system and several of our empirical research projects to date, which combine machine learning with computational linguistics and psychometrics to improve learning, engagement, and even language proficiency assessment through our products.
Sep 26
Bren Hall 4011 1 pm 
Convolutional Neural Net (CNN) architectures have terrific recognition performance but rely on spatial pooling, which makes it difficult to adapt them to tasks that require dense, pixel-accurate labeling. We make two contributions to solving this problem: (1) We demonstrate that while the apparent spatial resolution of convolutional feature maps is low, the high-dimensional feature representation contains significant sub-pixel localization information. (2) We describe a multi-resolution reconstruction architecture based on a Laplacian pyramid that uses skip connections from higher-resolution feature maps and multiplicative gating to successively refine segment boundaries reconstructed from lower-resolution maps. This approach yields state-of-the-art semantic segmentation results on the PASCAL VOC and Cityscapes segmentation benchmarks without resorting to more complex random-field inference or instance-detection-driven architectures.
Oct 3
Bren Hall 4011 1 pm 
Despite the rapid development of computer graphics in recent years, complex materials such as fabrics, fur, and human hair remain largely lacking in virtual worlds. This is due to both the lack of high-fidelity data and the inability to efficiently describe these complicated objects via mathematical/statistical models.
In this talk, I will present my research that introduces new means to acquire, model, and render complex materials that are essential to our daily lives, with a focus on fabrics. Leveraging detailed geometric information and sophisticated optical models, our work has led to computer-generated imagery with a new level of accuracy and fidelity. In particular, we measure real-world samples using volume imaging (e.g., computed microtomography) to obtain detailed datasets on their micro-geometries. We then fit sophisticated statistical models to the measured data, yielding highly compact yet realistic representations. Lastly, we show how to recover a sample’s optical properties (e.g., colors) using optimization.
Oct 10

No Seminar (Columbus Day)

Oct 17
Bren Hall 4011 1 pm 
Stefano Ermon
Assistant Professor of Computer Science Fellow of the Woods Institute for the Environment Stanford University
Recent technological developments are creating new spatio-temporal data streams that contain a wealth of information relevant to sustainable development goals. Modern AI techniques have the potential to yield accurate, inexpensive, and highly scalable models to inform research and policy. As a first example, I will present a machine learning method we developed to predict and map poverty in developing countries. Our method can reliably predict economic well-being using only high-resolution satellite imagery. Because images are passively collected in every corner of the world, our method can provide timely and accurate measurements in a very scalable and economical way, and could revolutionize efforts towards global poverty eradication. As a second example, I will present some ongoing work on monitoring agricultural and food security outcomes from space.
Oct 24

No Seminar (cancelled)

Oct 31
Bren Hall 4011 1 pm 
This talk explores recent applications of machine learning to large proprietary consumer transaction datasets. These datasets record barcode-level transaction information on individual items purchased, grouped by shopping trip and customer. Recent innovations in data collection allow us to go beyond the supermarket scanner to collect such data, and include recent efforts to digitize the universe of customers’ receipts across all channels, from supermarkets to online purchases. Additionally, passive wifi tracking allows us to record search behavior in stores and model how it translates into sales. It also gives us the opportunity to create real-time interventions to nudge consumer shopping behavior. We will explore some of the challenges of modeling consumer behavior using these data and discuss methods such as tensor decompositions for count data, discrete choice modeling with Dirichlet Process Mixtures, and the use of deep autoencoders for producing interpretable statistical hypotheses.
Nov 7
Bren Hall 4011 1 pm 
This talk investigates the restricted Boltzmann machine (RBM), which is the building block for many deep probabilistic models. We propose an infinite RBM model, whose maximum likelihood estimation corresponds to a constrained convex optimization. We consider the Frank-Wolfe algorithm to solve the program, which provides a sparse solution that can be interpreted as inserting a hidden unit at each iteration. As a side benefit, this can be used to easily and efficiently identify an appropriate number of hidden units during the optimization. We also investigate different learning algorithms for conditional RBMs. There is a pervasive opinion that loopy belief propagation does not work well on RBM-based models, especially for learning. We demonstrate that, in the conditional setting, learning RBM-based models with belief propagation and its variants can provide much better results than the state-of-the-art contrastive divergence algorithms.
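The "one new element per iteration" behavior of Frank-Wolfe mentioned above can be seen on a much simpler problem. The sketch below minimizes a quadratic over the probability simplex; the objective, the dimension, the step-size rule 2/(t+2), and the function name `frank_wolfe_simplex` are assumptions for illustration and have nothing RBM-specific in them.

```python
import numpy as np

def frank_wolfe_simplex(grad, dim, num_iters):
    """Frank-Wolfe over the probability simplex {x >= 0, sum(x) = 1}."""
    x = np.zeros(dim)
    x[0] = 1.0  # start at a vertex of the simplex
    for t in range(num_iters):
        g = grad(x)
        i = int(np.argmin(g))    # linear minimization oracle: best vertex e_i
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        x *= 1.0 - gamma         # move toward e_i: x <- (1 - gamma) x + gamma e_i
        x[i] += gamma
    return x

rng = np.random.default_rng(0)
b = rng.dirichlet(np.ones(20))  # hypothetical target inside the simplex

def objective(x):
    return 0.5 * float(np.sum((x - b) ** 2))

def gradient(x):
    return x - b

x_few = frank_wolfe_simplex(gradient, dim=20, num_iters=5)
x_many = frank_wolfe_simplex(gradient, dim=20, num_iters=200)
print(np.count_nonzero(x_few), objective(x_few), objective(x_many))
```

Each iteration touches a single coordinate, so after a handful of iterations the iterate has only a handful of nonzero entries; that sparsity is the property the abstract interprets as adding one hidden unit at a time.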
Nov 14
Bren Hall 4011 1 pm 
Traditionally, the field of computational Bayesian statistics has been divided into two main subfields: variational inference and Markov chain Monte Carlo (MCMC). In recent years, however, several methods have been proposed based on combining variational Bayesian inference and MCMC simulation in order to improve their overall accuracy and computational efficiency. This marriage of fast evaluation and flexible approximation provides a promising means of designing scalable Bayesian inference methods. In this work, we explore the possibility of incorporating variational approximation into a state-of-the-art MCMC method, Hamiltonian Monte Carlo (HMC), to reduce the required expensive computation involved in the sampling procedure, which is the bottleneck for many applications of HMC in big data problems. To this end, we exploit the regularity in parameter space to construct a free-form approximation of the target distribution by a fast and flexible surrogate function using an optimized additive model of proper random basis. The surrogate provides sufficiently accurate approximation while allowing for fast computation, resulting in an efficient approximate inference algorithm. We demonstrate the advantages of our method on both synthetic and real data problems.
Nov 16
Bren Hall 4011 4pm 
Arindam Banerjee
Associate Professor Department of Computer Science and Engineering University of Minnesota
Many machine learning problems, especially scientific problems in areas such as ecology, climate science, and brain sciences, operate in the so-called ‘low samples, high dimensions’ regime. Such problems typically have numerous possible predictors or features, but the number of training examples is small, often much smaller than the number of features. In this talk, we will discuss recent advances in general formulations and estimators for such problems. These formulations generalize prior work such as the Lasso and the Dantzig selector. We will discuss the geometry underlying such formulations, and how the geometry helps in establishing finite sample properties of the estimators. We will also discuss applications of such results in structure learning in probabilistic graphical models, along with real-world applications in ecology and climate science.
This is joint work with Soumyadeep Chatterjee, Sheng Chen, Farideh Fazayeli, Andre Goncalves, Jens Kattge, Igor Melnyk, Peter Reich, Franziska Schrodt, Hanhuai Shan, and Vidyashankar Sivakumar. 
Nov 21
Bren Hall 4011 1 pm 
Stein’s method provides a remarkable theoretical tool in probability theory but has not been widely known or used in practical machine learning. In this talk, we try to bridge this gap and show that some of the key ideas of Stein’s method can be naturally combined with practical machine learning and probabilistic inference techniques such as kernel methods, variational inference and variance reduction, which together form a new general framework for deriving new algorithms for handling the kind of highly complex, structured probabilistic models widely used in modern (deep) machine learning. The new algorithms derived in this way often have a simple, untraditional form and have significant advantages over the traditional methods. I will show several applications, including goodness-of-fit tests for evaluating models without knowing the normalization constants, scalable Bayesian inference that combines the advantages of variational inference, Monte Carlo and gradient-based optimization, and approximate maximum likelihood training of deep generative models that can generate realistic-looking images.
Nov 28
Bren Hall 4011 1 pm 
We develop upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach “dissociation” and give an exact characterization of optimal oblivious bounds, i.e. when the new probabilities are chosen independent of the probabilities of all other variables.
Our motivation comes from the weighted model counting problem (or, equivalently, the problem of computing the probability of a Boolean function), which is #P-hard in general. By performing several dissociations, one can transform a Boolean formula whose probability is difficult to compute into one whose probability is easy to compute, and which is guaranteed to provide an upper or lower bound on the probability of the original formula by choosing appropriate probabilities for the dissociated variables. Our new bounds shed light on the connection between previous relaxation-based and model-based approximations and unify them as concrete choices in a larger design space. We also show how our theory allows a standard relational database management system to both upper and lower bound hard probabilistic queries in guaranteed polynomial time. (Based on joint work with Dan Suciu from TODS 2014, VLDB 2015, and VLDBJ 2016: http://arxiv.org/pdf/1409.6052, http://arxiv.org/pdf/1412.1069, http://arxiv.org/pdf/1310.6257)
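A tiny numerical illustration of dissociation, under stated assumptions: the formula (x AND a) OR (x AND b), the probabilities, and the two choices for the copies' probabilities are picked for this example only. The code checks the bounds by brute-force enumeration on this one formula; it does not prove or restate the general characterization from the talk.

```python
from itertools import product
from math import sqrt

def probability(formula, probs):
    """Exact probability of a Boolean formula over independent variables,
    computed by enumerating every truth assignment (exponential; toy use only)."""
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name in names:
            weight *= probs[name] if world[name] else 1.0 - probs[name]
        if formula(world):
            total += weight
    return total

def original(w):
    # x occurs twice.
    return (w["x"] and w["a"]) or (w["x"] and w["b"])

def dissociated(w):
    # Each occurrence of x is replaced by its own independent copy.
    return (w["x1"] and w["a"]) or (w["x2"] and w["b"])

p_x, p_a, p_b = 0.5, 0.6, 0.7  # hypothetical probabilities
exact = probability(original, {"x": p_x, "a": p_a, "b": p_b})

# Choice 1: both copies keep the original probability of x.
upper = probability(dissociated, {"x1": p_x, "x2": p_x, "a": p_a, "b": p_b})

# Choice 2: copies get p' with (1 - p')^2 = 1 - p_x.
p_low = 1.0 - sqrt(1.0 - p_x)
lower = probability(dissociated, {"x1": p_low, "x2": p_low, "a": p_a, "b": p_b})

print(lower, exact, upper)
```

On this example the first choice overestimates and the second underestimates the exact probability, and the dissociated formula is read-once, so its probability could also be computed directly without enumeration.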
Dec 5

No Seminar
Finals Week

Spring 2016
Apr 4

No Seminar (Cancelled)

Apr 11
Bren Hall 4011 1 pm 
Venkat Chandrasekaran
Assistant Professor Computing and Mathematical Sciences & Electrical Engineering California Institute of Technology
Extracting structured planted subgraphs from large graphs is a fundamental question that arises in a range of application domains. We describe a computationally tractable approach based on convex optimization to recover certain families of structured graphs that are embedded in larger graphs containing spurious edges. Our method relies on tractable semidefinite descriptions of majorization inequalities on the spectrum of a matrix, and we give conditions on the eigenstructure of a planted graph in relation to the noise level under which our algorithm succeeds. (Joint work with Utkan Candogan.) 
Apr 18
Bren Hall 4011 1 pm 
Clinical medical data, especially in the intensive care unit (ICU), consist of multivariate time series of observations. For each patient visit (or episode), sensor data and lab test results are recorded in the patient’s Electronic Health Record (EHR). While potentially containing a wealth of insights, the data is difficult to mine effectively, owing to varying length, irregular sampling and missing data. Recurrent Neural Networks (RNNs), particularly those using Long Short-Term Memory (LSTM) hidden units, are powerful and increasingly popular models for learning from sequence data. They effectively model varying-length sequences and capture long-range dependencies. We present the first study to empirically evaluate the ability of LSTMs to recognize patterns in multivariate time series of clinical measurements. Specifically, we consider multilabel classification of diagnoses, training a model to classify 128 diagnoses given 13 frequently but irregularly sampled clinical measurements. First, we establish the effectiveness of a simple LSTM network for modeling clinical data. Then we demonstrate a straightforward and effective training strategy in which we replicate targets at each sequence step. Trained only on raw time series, our models outperform several strong baselines, including a multilayer perceptron trained on hand-engineered features.
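One plausible reading of "replicating targets at each sequence step" is a loss that scores the prediction at every step against the same episode-level labels and blends that average with the final-step loss. The sketch below assumes that reading; the mixing weight `alpha`, the binary cross-entropy, the array shapes, and the function names are illustrative assumptions, and the per-step predictions are random placeholders rather than LSTM outputs.

```python
import numpy as np

def binary_cross_entropy(pred, target):
    """Mean binary cross-entropy over labels; `pred` holds probabilities in (0, 1)."""
    eps = 1e-12
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def target_replication_loss(step_preds, target, alpha=0.5):
    """Blend the average per-step loss with the final-step loss.

    step_preds: array of shape (T, L), predicted label probabilities at each step.
    target:     array of shape (L,), the episode's labels, reused at every step.
    alpha:      hypothetical weight on the replicated (per-step) term.
    """
    per_step = [binary_cross_entropy(p, target) for p in step_preds]
    return alpha * float(np.mean(per_step)) + (1.0 - alpha) * per_step[-1]

rng = np.random.default_rng(0)
num_steps, num_labels = 5, 128  # 128 diagnoses as in the abstract; 5 steps is arbitrary
step_preds = rng.uniform(0.05, 0.95, size=(num_steps, num_labels))  # placeholder outputs
target = (rng.uniform(size=num_labels) < 0.1).astype(float)          # placeholder labels

print(target_replication_loss(step_preds, target, alpha=0.5))
```

Setting `alpha` to 0 recovers the usual loss on the final step only, so the replicated term acts as an extra training signal at intermediate steps.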
Apr 25
Bren Hall 4011 1 pm 
Jasper Vrugt
Associate Professor Department of Environmental Engineering University of California, Irvine
Bayesian inference has found widespread application and use in science and engineering to reconcile Earth system models with data, including prediction in space (interpolation), prediction in time (forecasting), assimilation of observations and deterministic/stochastic model output, and inference of the model parameters. In this talk I will review the basic elements of the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm developed by Vrugt et al. (2008, 2009) and used for Bayesian inference in fields ranging from physics, chemistry and engineering, to ecology, hydrology, and geophysics. I will also discuss recent developments of DREAM, including the development of a diagnostic model evaluation framework using likelihood-free inference, and the use of dimensionality reduction techniques for calibration of CPU-intensive system models. Practical examples are used from many different fields of study.
May 2
Bren Hall 4011 1 pm 
Jeffrey Mark Siskind
Associate Professor Department of Electrical & Computer Engineering Purdue University
Humans can describe observations and act upon requests. This requires that language be grounded in perception and motor control. I will present several components of my long-term research program to understand the vision-language-motor interface in the human brain and emulate such on computers.
In the first half of the talk, I will present an fMRI investigation of the vision-language interface in the human brain. Subjects were presented with stimuli in different modalities (spoken sentences, textual presentation of sentences, and video clips depicting activity that can be described by sentences) while undergoing fMRI. The scan data is analyzed to allow readout of individual constituent concepts and words (people/names, objects/nouns, actions/verbs, and spatial-relations/prepositions) as well as phrases and entire sentences. This can be done across subjects and across modality; we use classifiers trained on scan data for one subject to read out from another subject and use classifiers trained on scan data for one modality, say text, to read out from scans of another modality, say video or speech. Analysis of this indicates that the brain regions involved in processing the different kinds of constituents are largely disjoint but also largely shared across subjects and modality. Further, we can determine the predication relations; when the stimuli depict multiple people, objects, and actions, we can read out which people are performing which actions with which objects. This points to a compositional mental semantic representation common across subjects and modalities.
In the second half of the talk, I will use this work to motivate the development of three computational systems. First, I will present a system that can use sentential description of human interaction with previously unseen objects in video to automatically find and track those objects. This is done without any annotation of the objects and without any pretrained object detectors. Second, I will present a system that learns the meanings of nouns and prepositions from video and tracks of a mobile robot navigating through its environment paired with sentential descriptions of such activity. Such a learned language model then supports both generation of sentential description of new paths driven in new environments as well as automatic driving of paths to satisfy navigational instructions specified with new sentences in new environments. Third, I will present a system that can play a physically grounded game of checkers using vision to determine game state and robotic arms to change the game state by reading the game rules from natural-language instructions.
Joint work with Andrei Barbu, Daniel Paul Barrett, Charles Roger Bradley, Seth Benjamin Scott Alan Bronikowski, Zachary Burchill, Wei Chen, N. Siddharth, Caiming Xiong, Haonan Yu, Jason J. Corso, Christiane D. Fellbaum, Catherine Hanson, Stephen Jose Hanson, Sebastien Helie, Evguenia Malaia, Barak A. Pearlmutter, Thomas Michael Talavage, and Ronnie B. Wilbur.
Bio: Jeffrey M. Siskind received the B.A. degree in computer science from the Technion, Israel Institute of Technology, Haifa, in 1979, the S.M. degree in computer science from the Massachusetts Institute of Technology (M.I.T.), Cambridge, in 1989, and the Ph.D. degree in computer science from M.I.T. in 1992. He did a postdoctoral fellowship at the University of Pennsylvania Institute for Research in Cognitive Science from 1992 to 1993. He was an assistant professor at the University of Toronto Department of Computer Science from 1993 to 1995, a senior lecturer at the Technion Department of Electrical Engineering in 1996, a visiting assistant professor at the University of Vermont Department of Computer Science and Electrical Engineering from 1996 to 1997, and a research scientist at NEC Research Institute, Inc. from 1997 to 2001. He joined the Purdue University School of Electrical and Computer Engineering in 2002 where he is currently an associate professor. His research interests include computer vision, robotics, artificial intelligence, neuroscience, cognitive science, computational linguistics, child language acquisition, automatic differentiation, and programming languages and compilers.
May 9
Bren Hall 4011 1 pm 
Circadian rhythms date back to the origins of life, are found in virtually every species and every cell, and play fundamental roles in functions ranging from metabolism to cognition. Modern high-throughput technologies allow the measurement of concentrations of transcripts, metabolites, and other species along the circadian cycle, creating novel computational challenges and opportunities, including the problems of inferring whether a given species oscillates in circadian fashion or not, and inferring the time at which a set of measurements was taken. Due to the expensive process of taking these measurements, inferring whether a given species oscillates in circadian fashion has proven to be a challenge. The sparse data with only a few replicates makes many existing methods unreliable. In addition, many differential gene expression experiments, such as those contained in the GEO repository, have been carried out at single time points without taking into account circadian oscillations, which can act as confounding factors. To solve these problems we introduce two deep learning methods: BIO_CYCLE and BIO_CLOCK. BIO_CYCLE takes advantage of synthetic data to determine whether or not a signal oscillates in a circadian fashion, and infer periods, amplitudes, and phases. BIO_CLOCK, using a specialized cost function and real-world data, imputes the time at which a sample was taken, from the corresponding gene expression measurements. These tools are a necessary step forward to better understand circadian rhythms at the molecular level and their applications to precision medicine.
May 16
Bren Hall 4011 1 pm 
Aparna Chandramowlishwaran
Assistant Professor Department of Electrical Engineering University of California, Irvine
In this talk, I’ll present my group’s work on addressing two key challenges in developing parallel algorithms and software for the class of N-body problems on current and future platforms. The first challenge is reducing the apparent gap in performance between code generated from high-level forms and that of hand-tuned code, which we address using extensive characterization of the optimization space for these computations and automating the process through domain-specific code generators. These application-specific compilers provide the domain scientists the ability to productively harness the power of these large machines and to enable large-scale scientific simulations and big data applications. The second challenge is analyzing and designing algorithms. We are entering the era of exascale. The number of cores is growing at a much faster rate than bandwidth per node. What implications does this trend have in designing algorithms for future systems? If we were to model computation and communication costs, what inferences can we derive from such a model for the time to execute an algorithm? Our model suggests a new kind of high-level analytical co-design of the algorithm and architecture, and similar analysis can be applied in designing algorithms in general.
May 23
Bren Hall 4011 1 pm 
Divijotham Krishnamurthy
Postdoctoral Fellow Center for Mathematics of Information California Institute of Technology
Several problems arising in the design, analysis and efficient operation of power systems are naturally posed as graph-structured optimization problems. Due to the nonlinear nature of the physical equations describing the power grid, these problems are often nonconvex and NP-hard. However, practical instances of several graph-structured optimization problems have been solved successfully in the graphical models literature by exploiting graph structure and using message-passing or belief propagation techniques. In this work, we show that a similar approach can be successfully applied to power systems, leading to theoretically and practically efficient algorithms. I will discuss two applications in detail: a) Solving mixed-integer optimal power flow problems on distribution networks, and b) Detecting and mitigating market manipulation by aggregators of renewable generation in a distribution-level market. I will also discuss possible extensions of these approaches to other power system/infrastructure network problems. Based on joint work with Misha Chertkov, Sidhant Misra, Marc Vuffray, Pascal Van Hentenryck, Niangjun Chen, Navid Azizan Ruhi and Adam Wierman.
May 30

No Seminar (Memorial Day)
