Weekly Seminar in AI & Machine Learning
Sponsored by Cylance
Sep 22
Bren Hall 4011 1 pm 
Burr Settles
Duolingo
Duolingo is a language education platform that teaches 20 languages to more than 150 million students worldwide. Our free flagship learning app is the \#1 way to learn a language online, and is the mostdownloaded education app for both Android and iOS devices. In this talk, I will describe the Duolingo system and several of our empirical research projects to date, which combine machine learning with computational linguistics and psychometrics to improve learning, engagement, and even language proficiency assessment through our products. 
Sep 26
Bren Hall 4011 1 pm 
Convolutional Neural Net (CNN) architectures have terrific recognition performance but rely on spatial pooling which makes it difficult to adapt them to tasks that require dense, pixelaccurate labeling. We make two contributions to solving this problem: (1) We demonstrate that while the apparent spatial resolution of convolutional feature maps is low, the highdimensional feature representation contains significant subpixel localization information. (2) We describe a multiresolution reconstruction architecture based on a Laplacian pyramid that uses skip connections from higher resolution feature maps and multiplicative gating to successively refine segment boundaries reconstructed from lowerresolution maps. This approach yields stateoftheart semantic segmentation results on the PASCAL VOC and Cityscapes segmentation benchmarks without resorting to more complex randomfield inference or instance detection driven architectures. 
Oct 3
Bren Hall 4011 1 pm 
Despite the rapid development of computer graphics during the recent years, complex materials such as fabrics, fur, and human hair remain largely lacking in the virtual worlds. This is due to both the lack of highfidelity data and the inability to efficiently describe these complicated objects via mathematical/statistical models.
In this talk, I will present my research that introduces new means to acquire, model, and render complex materials that are essential to our daily lives with a focus on fabrics. Leveraging detailed geometric information and sophisticated optical model, our work has led to computer generated imagery with a new level of accuracy and fidelity. In particular, we measure realworld samples using volume imaging (e.g., computed microtomography) to obtain detailed datasets on their microgeometries. We then fit sophisticated statistical models to the measured data, yielding highly compact yet realistic representations. Lastly, we show how to recover a sample’s optical properties (e.g., colors) using optimization. 
Oct 10

No Seminar (Columbus Day)

Oct 17
Bren Hall 4011 1 pm 
Stefano Ermon
Assistant Professor of Computer Science Fellow of the Woods Institute for the Environment Stanford University
Recent technological developments are creating new spatiotemporal data streams that contain a wealth of information relevant to sustainable development goals. Modern AI techniques have the potential to yield accurate, inexpensive, and highly scalable models to inform research and policy. As a first example, I will present a machine learning method we developed to predict and map poverty in developing countries. Our method can reliably predict economic wellbeing using only highresolution satellite imagery. Because images are passively collected in every corner of the world, our method can provide timely and accurate measurements in a very scalable end economic way, and could revolutionize efforts towards global poverty eradication. As a second example, I will present some ongoing work on monitoring agricultural and food security outcomes from space. 
Oct 24

No Seminar (cancelled)

Oct 31
Bren Hall 4011 1 pm 
This talks explores recent uses of machine learning to large proprietary consumer transaction datasets. These are datasets which record barcode level transaction information on individual items purchased grouped by shopping trip and customer. Recent innovations in data collection allow us to go beyond the supermarket scanner to collect such data and include recent efforts to digitize the universe of customers’ receipts across all channels from supermarkets to online purchases. Additionally, passive wifi tracking allows us to record search behavior in stores and model how it translates into sales. It also gives us the opportunity to create real time interventions to nudge consumer shopping behavior. We will explore some of the challenges of modeling consumer behavior using these data and discuss methods such as tensor decompositions for count data, discrete choice modeling with Dirichlet Process Mixtures, and the use of deep autoencoders for producing interpretable statistical hypotheses. 
Nov 7
Bren Hall 4011 1 pm 
This talk investigates the restricted Boltzmann machine (RBM), which is the building block for many deep probabilistic models. We propose an infinite RBM model, whose maximum likelihood estimation corresponds to a constrained convex optimization. We consider the FrankWolfe algorithm to solve the program, which provides a sparse solution that can be interpreted as inserting a hidden unit at each iteration. As a side benefit, this can be used to easily and efficiently identify an appropriate number of hidden units during the optimization. We also investigate different learning algorithms for conditional RBMs. There is a pervasive opinion that loopy belief propagation does not work well on RBMbased models, especially for learning. We demonstrate that, in the conditional setting, learning RBMbased models with belief propagation and its variants can provide much better results than the stateoftheart contrastive divergence algorithms. 
Nov 14
Bren Hall 4011 1 pm 
Traditionally, the field of computational Bayesian statistics has been divided into two main subfields: variational inference and Markov chain Monte Carlo (MCMC). In recent years, however, several methods have been proposed based on combining variational Bayesian inference and MCMC simulation in order to improve their overall accuracy and computational efficiency. This marriage of fast evaluation and flexible approximation provides a promising means of designing scalable Bayesian inference methods. In this work, we explore the possibility of incorporating variational approximation into a stateoftheart MCMC method, Hamiltonian Monte Carlo (HMC), to reduce the required expensive computation involved in the sampling procedure, which is the bottleneck for many applications of HMC in big data problems. To this end, we exploit the regularity in parameter space to construct a freeform approximation of the target distribution by a fast and flexible surrogate function using an optimized additive model of proper random basis. The surrogate provides sufficiently accurate approximation while allowing for fast computation, resulting in an efficient approximate inference algorithm. We demonstrate the advantages of our method on both synthetic and real data problems. 
Nov 16
Bren Hall 4011 4pm 
Arindam Banerjee
Associate Professor Department of Computer Science and Engineering University of Minnesota
Many machine learning problems, especially scientific problems in areas such as ecology, climate science, and brain sciences, operate in the socalled `low samples, high dimensions’ regime. Such problems typically have numerous possible predictors or features, but the number of training examples is small, often much smaller than the number of features. In this talk, we will discuss recent advances in general formulations and estimators for such problems. These formulations generalize prior work such as the Lasso and the Dantzig selector. We will discuss the geometry underlying such formulations, and how the geometry helps in establishing finite sample properties of the estimators. We will also discuss applications of such results in structure learning in probabilistic graphical models, along with real world applications in ecology and climate science.
This is joint work with Soumyadeep Chatterjee, Sheng Chen, Farideh Fazayeli, Andre Goncalves, Jens Kattge, Igor Melnyk, Peter Reich, Franziska Schrodt, Hanhuai Shan, and Vidyashankar Sivakumar. 
Nov 21
Bren Hall 4011 1 pm 
Stein’s method provides a remarkable theoretical tool in probability theory but has not been widely known or used in practical machine learning. In this talk, we try to bright this gap and show that some of the key ideas of Stein’s method can be naturally combined with practical machine learning and probabilistic inference techniques such as kernel method, variational inference and variance reduction, which together form a new general framework for deriving new algorithms for handling the kind of highly complex, structured probabilistic models widely used in modern (deep) machine learning. The new algorithms derived in this way often have a simple, untraditional form and have significant advantages over the traditional methods. I will show several applications, including goodnessoffit tests for evaluating models without knowing the normalization constants, scalable Bayesian inference that combines the advantages of variational inference, Monte Carlo and gradientbased optimization, and approximate maximum likelihood training of deep generative models that can generate realisticlooking images. 
Nov 28
Bren Hall 4011 1 pm 
We develop upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach “dissociation” and give an exact characterization of optimal oblivious bounds, i.e. when the new probabilities are chosen independent of the probabilities of all other variables.
Our motivation comes from the weighted model counting problem (or, equivalently, the problem of computing the probability of a Boolean function), which is \#Phard in general. By performing several dissociations, one can transform a Boolean formula whose probability is difficult to compute, into one whose probability is easy to compute, and which is guaranteed to provide an upper or lower bound on the probability of the original formula by choosing appropriate probabilities for the dissociated variables. Our new bounds shed light on the connection between previous relaxationbased and modelbased approximations and unify them as concrete choices in a larger design space. We also show how our theory allows a standard relational database management system to both upper and lower bound hard probabilistic queries in guaranteed polynomial time. (Based on joint work with Dan Suciu from TODS 2014, VLDB 2015, and VLDBJ 2016: http://arxiv.org/pdf/1409.6052,http://arxiv.org/pdf/1412.1069, http://arxiv.org/pdf/1310.6257) 
Dec 5

No Seminar
Finals Week
