Winter 2021

Standard

Live Stream for all Winter 2021 CML Seminars

Jan. 4	No Seminar
Jan. 11 Live Stream 1 pm	Florian Wenzel Postdoctoral Researcher Google Brain Berlin YouTube Stream: https://youtu.be/9n8_5tjt_Lw Towards Reliable Deep Learning Deep learning models are bad at detecting their failure. They tend to make over-confident mistakes, especially, under distribution shift. Making deep learning more reliable is important in safety-critical applications including health care, self-driving cars, and recommender systems. We discuss two approaches to reliable deep learning. First, we will focus on Bayesian neural networks that come with many promises to improved uncertainty estimation. However, why are they rarely used in industrial practice? In this talk, we will cast doubt on the current understanding of Bayes posteriors in deep networks. We show that Bayesian neural networks can be improved significantly through the use of a “cold posterior” that overcounts evidence and hence sharply deviates from the Bayesian paradigm. We will discuss several hypotheses that could explain cold posteriors. In the second part, we will discuss a classical approach to more robust predictions: ensembles. Deep ensembles combine the predictions of models trained from different initializations. We will show that the diversity of predictions can be improved by considering models with different hyperparameters. Finally, we present an efficient method that leverages hyperparameter diversity within a single model. Bio: Florian Wenzel is a machine learning researcher who is currently on the job market. His research has focused on probabilistic deep learning, uncertainty estimation, and scalable inference methods. From October 2019 to October 2020 he was a postdoctoral researcher at Google Brain. He received his PhD from Humboldt University in Berlin and worked with Marius Kloft, Stephan Mandt, and Manfred Opper.
Jan. 18	No Seminar (Martin Luther King, Jr. Holiday)
Jan. 25 Live Stream 1 pm	Yezhou Yang Assistant Professor School of Computing, Informatics, and Decision Systems Engineering Arizona State University YouTube Stream: https://youtu.be/IcSUBZraB3s Visual Recognition beyond Appearances, and its Robotic Applications The goal of Computer Vision, as coined by Marr, is to develop algorithms to answer What are Where at When from visual appearance. The speaker, among others, recognizes the importance of studying underlying entities and relations beyond visual appearance, following an Active Perception paradigm. This talk will present the speaker’s efforts over the last decade, ranging from 1) reasoning beyond appearance for visual question answering, image understanding and video captioning tasks, through 2) temporal knowledge distillation with incremental knowledge transfer, till 3) their roles in a Robotic visual learning framework via a Robotic Indoor Object Search task. The talk will also feature the Active Perception Group (APG)’s ongoing projects (NSF RI, NRI and CPS, DARPA KAIROS, and Arizona IAM) addressing emerging challenges of the nation in autonomous driving, AI security and healthcare domains, at the ASU School of Computing, Informatics, and Decision Systems Engineering (CIDSE). Bio: Yezhou Yang is an Assistant Professor at School of Computing, Informatics, and Decision Systems Engineering, Arizona State University. He is directing the ASU Active Perception Group. His primary interests lie in Cognitive Robotics, Computer Vision, and Robot Vision, especially exploring visual primitives in human action understanding from visual input, grounding them by natural language as well as high-level reasoning over the primitives for intelligent robots. Before joining ASU, Dr. Yang was a Postdoctoral Research Associate at the Computer Vision Lab and the Perception and Robotics Lab, with the University of Maryland Institute for Advanced Computer Studies. He is a recipient of Qualcomm Innovation Fellowship 2011, the NSF CAREER award 2018 and the Amazon AWS Machine Learning Research Award 2019. He receives his Ph.D. from University of Maryland at College Park, and B.E. from Zhejiang University, China.
Feb. 1 Live Stream 1 pm	Joe Marino PhD Student Computation and Neural Systems California Institute of Technology YouTube Stream: https://youtu.be/iVz6uwD7i6A Connecting Variational Autoencoders Back to the Brain Unsupervised machine learning has recently dramatically improved our ability to model and extract structure from data. One such approach is deep latent variable models, which includes variational autoencoders (VAEs) [Kingma & Welling, 2014; Rezende et al., 2014]. These models can be traced back to the Helmholtz machine [Dayan et al., 1995], which, in turn, was inspired by ideas from theoretical neuroscience [Mumford, 1992]. In the intervening years, neuroscientists have further developed these ideas into a popular theory: predictive coding [Rao & Ballard, 1999; Friston, 2005]. Yet, the machine learning community remains largely unaware of these connections. In this talk, I discuss the links between modern deep latent variable models and predictive coding, yielding several striking implications for the correspondences between machine learning and neuroscience. This motivates a more nuanced view in connecting these fields, including the search for backpropagation in the brain. Bio: Joe Marino is a PhD candidate in the Computation & Neural Systems program at Caltech, advised by Yisong Yue. His work focuses on improving probabilistic models and inference techniques, using neuroscience-inspired ideas, within the areas of generative modeling and reinforcement learning.
Feb. 8 Live Stream 1 pm	Junkyu Lee AI Planning Group IBM Research YouTube Stream: https://youtu.be/p7X-L1T9ULk Decomposition Bounds for Influence Diagrams Influence diagrams (IDs) extend Bayesian networks with decision variables and utility functions to model the interaction between an agent and a system to capture the preferences. The standard task in IDs is to compute the maximum expected utility (MEU) over the influence diagram and optimal policies. However, it is the most challenging task in graphical models. Therefore, computing upper bounds on the MEU is desirable because upper bounds can facilitate anytime-solutions by acting as heuristics to guide search or sampling-based methods. In this talk, I will present bounding schemes for solving IDs. The first approach builds on top of the tree decomposition scheme in probabilistic graphical models and extends variational decomposition bounds in marginal MAP. The second approach is a new tree decomposition method called submodel tree decomposition. The empirical evaluation results show that presented bounding schemes generate upper bounds that are orders of magnitude tighter than previous methods. Finally, I will conclude the talk with future directions. Bio: Junkyu Lee received his Ph.D. from the CS department at UC Irvine, where Rina Dechter supervised him. Currently, he is a resident at the IBM Research AI planning group. His research focuses on graphical model inference and heuristic search for sequential decision making under uncertainty. He is also broadly interested in related areas such as planning and reinforcement learning.
Feb. 15	No Seminar (Presidents’ Holiday)
Feb. 22	No Seminar
March 1 Live Stream 1 pm	Robert Logan PhD Student Department of Computer Science University of California, Irvine YouTube Stream: https://youtu.be/Mim1pmEn1UU Fill in the _____: Prompt-Based Solutions for NLP Recent progress in natural language processing (NLP) has been predominantly driven by the advent of large neural language models (e.g., GPT-2 and BERT) that are “pretrained” using a self-supervised learning objective on billions of tokens of text before being “finetuned” (i.e., transferred) to downstream tasks. The exceptional success of these models has motivated many NLP researchers to study what exactly these models are learning during pretraining that causes them to be more successful than their non-self-supervised counterparts. In this talk, we will describe the technique of prompting, an approach that answers this question by reformulating tasks as fill-in-the-blanks questions. We will begin by showing how prompts can be used to measure the amount of factual, linguistic, and task-specific knowledge contained in language models. We will then introduce an approach for automatically constructing prompts based on gradient-guided search that provides a scalable alternative to manually writing prompts by hand. Lastly, we will cover our ongoing work investigating whether prompting can be used as a replacement for finetuning of language models, describing some early results that demonstrate that prompting can indeed be more effective in few-shot learning scenarios while being substantially more parameter efficient. Bio: Robert L. Logan IV is a 4th year PhD Candidate at UC Irvine, co-advised by Sameer Singh and Padhraic Smyth. His research focuses on leveraging external knowledge sources to measure and improve NLP models’ ability to reason with factual and common sense knowledge. He was selected as a Noyce Fellow and has been awarded the 2020 Rose Hills Foundation Scholarship. Robert received his B.A. in mathematics at the University of California, Santa Cruz, and has held research positions at Google and Diffbot.
March 8	No Seminar
March 15	Finals Week

Fall 2020

Standard

Live Stream for all Fall 2020 CML Seminars

Oct 5	No Seminar
Oct 12 Live Stream 1 pm	Forest Agostinelli Assistant Professor Computer Science and Engineering University of South Carolina YouTube Stream: https://youtu.be/shwYW9yEAIQ From Combination Puzzles to the Natural Sciences Combination puzzles, such as the Rubik’s cube, pose unique challenges for artificial intelligence. Furthermore, solutions to such puzzles are directly linked to problems in the natural sciences. In this talk, I will present DeepCubeA, a deep reinforcement learning and search algorithm that can solve the Rubik’s cube, and six other puzzles, without domain specific knowledge. Next, I will discuss how solving combination puzzles opens up new possibilities for solving problems in the natural sciences. Finally, I will show how problems we encounter in the natural sciences motivate future research directions in areas such as theorem proving and education. A demonstration of our work can be seen at http://deepcube.igb.uci.edu/. Bio: Forest Agostinelli is an assistant professor at the University of South Carolina. He received his B.S. from the Ohio State University, his M.S. from the University of Michigan, and his Ph.D. from UC, Irvine under Professor Pierre Baldi. His research interests include deep learning, reinforcement learning, search, bioinformatics, neuroscience, and chemistry.
Oct 19 Live Stream 1 pm	Stephan Mandt Assistant Professor Dept. of Computer Science University of California, Irvine YouTube Stream: https://youtu.be/Z8juQKrCkmk Compressing Variational Bayes Neural image compression algorithms have recently outperformed their classical counterparts in rate-distortion performance and show great potential to also revolutionize video coding. In this talk, I will show how innovations from Bayesian machine learning and generative modeling can lead to dramatic performance improvements in compression. In particular, I will explain how sequential variational autoencoders can be converted into video codecs, how deep latent variable models can be compressed in post-processing with variable bitrates, and how iterative amortized inference can be used to achieve the world record in image compression performance. Bio: Stephan Mandt is an Assistant Professor of Computer Science at the University of California, Irvine. From 2016 until 2018, he was a Senior Researcher and Head of the statistical machine learning group at Disney Research, first in Pittsburgh and later in Los Angeles. He held previous postdoctoral positions at Columbia University and Princeton University. Stephan holds a Ph.D. in Theoretical Physics from the University of Cologne. He is a Fellow of the German National Merit Foundation, a Kavli Fellow of the U.S. National Academy of Sciences, and was a visiting researcher at Google Brain. Stephan regularly serves as an Area Chair for NeurIPS, ICML, AAAI, and ICLR, and is a member of the Editorial Board of JMLR. His research is currently supported by NSF, DARPA, Intel, and Qualcomm.
Oct 26 Live Stream 1 pm	Christoph Lippert Professor Hasso Plattner Institute University of Potsdam YouTube Stream: https://youtu.be/zElgAKf4AhE GWAS of Images using Deep Transfer Learning At the Chair of Digital Health & Machine Learning, we are developing methods for the statistical analysis of large biomedical data. In particular imaging provides a powerful means for measuring phenotypic information at scale. While images are abundantly available in large repositories such as the UK Biobank, the analysis of imaging data poses new challenges for statistical methods development. In this talk, I will give an overview over some of our current efforts in using deep representation learning as a non-parametric way to model imaging phenotypes and for associating images to the genome. References: Kirchler, M., Khorasani, S., Kloft, M., & Lippert, C. (2020, June). Two-sample testing using deep learning. In International Conference on Artificial Intelligence and Statistics (pp. 1387-1398). PMLR. Kirchler, M., Konigroski, S., Schurmann, C., Norden, M., Meltendorf, C., Kloft, M., Lippert, C. transferGWAS: GWAS of images using deep transfer learning. Manuscript in preparation. Bio: Lippert studied bioinformatics from 2001–2008 in Munich and went on to earn his doctorate at the Max Planck Institutes for Intelligent Systems and for Developmental Biology in Tübingen in machine learning bioinformatics, with an emphasis on methods for genome-associated studies. In 2012, he accepted a Researcher position at Microsoft Research in Los Angeles and subsequently carried out work at Human Longevity, Inc. in Mountain View. In 2017, Lippert returned to Germany to head the research group “Statistical Genomics” at the Max Delbrück Center for Molecular Medicine in Berlin. In 2018, Lippert has been appointed Full Professor of “Digital Health & Machine Learning” in the joint Digital Engineering Faculty of the Hasso Plattner Institute and the University of Potsdam.
Nov 2 Live Stream 1 pm	Cory Scott PhD Student Dept. of Computer Science University of California, Irvine YouTube Stream: https://youtu.be/CpGfCA92rMw Efficient Learning of Cytoskeletal Dynamics with Multiscale Machine Learning and Optimized Projection Operators Microtubules are a primary constituent of the dynamic cytoskeleton in living cells, involved in many cellular processes whose study would benefit from scalable dynamic computational models. We define a novel machine learning model which aggregates information across multiple spatial scales to predict energy potentials measured from a simulation of a section of microtubule. Using projection operators which optimize an objective function related to the diffusion kernel of a graph, we sum information from local neighborhoods. This process is repeated recursively until the coarsest scale, and all scales are separately used as the input to a Graph Convolutional Network, forming our novel architecture: the Graph Prolongation Convolutional Network (GPCN). The GPCN outputs a prediction for each spatial scale, and these are combined using the inverse of the optimized projections. This fine-to-coarse mapping, and its inverse, create a model which is able to learn to predict energetic potentials more efficiently than other GCN ensembles which do not leverage multiscale information. We also compare the effect of training this ensemble in a coarse-to-fine fashion, and find that schedules adapted from the Algebraic Multigrid (AMG) literature further increase this efficiency. Since forces are derivatives of energies, we discuss the implications of this type of model for machine learning of multiscale molecular dynamics. Reference: C.B. Scott and Eric Mjolsness. “Graph Prolongation Convolutional Networks: Explicitly Multiscale Machine Learning on Graphs with Applications to Modeling of Cytoskeleton”. In: Machine Learning: Science and Technology (2020). DOI: https://iopscience.iop.org/article/10.1088/2632-2153/abb6d2
Nov 9 Live Stream 1 pm	Lukas Ruff PhD Student Electrical Engineering and Computer Science TU Berlin YouTube Stream: https://youtu.be/Uncc5y7g8Is Recent Advances in Anomaly Detection Anomaly detection is the problem of identifying unusual observations in data. This problem is usually unsupervised and occurs in numerous applications such as industrial fault and damage detection, fraud detection in finance and insurance, intrusion detection in cybersecurity, scientific discovery, or medical diagnosis and disease detection. Many of these applications involve complex data such as images, text, graphs, or biological sequences, that is continually growing in size. This has sparked a great interest in developing deep learning approaches to anomaly detection. In this talk, my aim is to provide a systematic and unifying overview of deep anomaly detection methods. We will discuss methods based on reconstruction, generative modeling, and one-class classification, where we identify common underlying principles and draw connections between traditional ‘shallow’ and novel deep methods. Furthermore, we will cover recent developments that include weakly and self-supervised approaches as well as techniques for explaining models that enable to reveal ‘Clever Hans’ detectors. Finally, I will conclude the talk by highlighting some open challenges and potential paths for future research. Bio: Lukas Ruff is a third year PhD student in the Machine Learning Group headed by Klaus-Robert Müller at TU Berlin. His research covers robust and trustworthy machine learning, with a specific focus on deep anomaly detection. Lukas received a B.Sc. degree in Mathematical Finance from the University of Konstanz in 2015 and a joint M.Sc. degree in Statistics from HU, TU and FU Berlin in 2017.
Nov 16 Live Stream 1 pm	Karem Sakallah Professor Electrical Engineering and Computer Science University of Michigan YouTube Stream: https://youtu.be/5A5dTRo50EQ Accidental Research: Scalable Algorithms for Boolean Satisfiability and Graph Automorphism Accidental research is when you’re an expert in some domain and seek to solve problem A in that domain. You soon discover that to solve A you need to also solve B which, however, comes from a domain in which you have little, or even no, expertise. You, thus, explore existing solutions to B but are disappointed to find that they just aren’t up to the task of solving A. Your options at this point are a) to abandon this futile project, or b) to try and find a solution to B that will help you solve A. While this might seem like a fool’s errand, you have the advantage over B experts of being unencumbered by their experience. You are a novice who does not, yet, appreciate the complexity of B, but are able to explore it from a fresh perspective. You also bring along expertise from your own domain to connect what you know with what you hope to learn. If you’re lucky, you may succeed in finding a solution to B that helps you solve A. I will relate two cases in which this scenario played out: developing the GRASP conflict-driven clause-learning SAT solver in the context of performing timing analysis of very large scale integrated circuits, and developing the saucy graph automorphism program to find and break symmetries in large SAT problems. Ironically, in both cases solving problem B (GRASP, saucy) turned out to be much more impactful than solving problem A (timing analysis, breaking symmetries.) Without the trigger of problem A, however, neither GRASP nor saucy would have been conceived. Bio: Karem A. Sakallah is a Professor of Electrical Engineering and Computer Science at the University of Michigan. He received the B.E. degree in electrical engineering from the American University of Beirut and the M.S. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University. Prior to joining the University of Michigan, he headed the Analysis and Simulation Advanced Development Team at Digital Equipment Corporation. Besides his academic duties, he has served in a variety of professional roles including the establishment of a computing research institute in Qatar for which he took a leave to serve a term of three years as the Chief Scientist. His current research is focused on automating the formal verification of hardware, software, and distributed protocols. He is a fellow of the IEEE and the ACM and a co-recipient of the prestigious Computer-Aided Verification Award for “Fundamental contributions to the development of high-performance Boolean satisfiability solvers.”
Nov 23 Live Stream 1 pm	Ioannis Panageas Assistant Professor Dept. of Computer Science University of California, Irvine YouTube Stream: https://youtu.be/4cepfWDiL3A First-order Methods Almost Always Avoid Saddle Points In this talk we will give an overview of some results on the limiting behavior of first-order methods. In particular we will show that typical instantiations of first-order methods like gradient descent, coordinate descent, etc. avoid saddle points for almost all initializations. Moreover, we will provide applications of these results on Non-negative Matrix Factorization. The takeaway message is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis. Bio: Ioannis is an Assistant Professor of Computer Science at UCI. He is interested in the theory of computation, machine learning and its interface with non-convex optimization, dynamical systems, probability and statistics. Before joining UCI, he was an Assistant Professor at Singapore University of Technology and Design. Prior to that he was a MIT postdoctoral fellow working with Constantinos Daskalakis. He received his PhD in Algorithms, Combinatorics and Optimization from Georgia Tech in 2016, a Diploma in EECS from National Technical University of Athens, and a M.Sc. in Mathematics from Georgia Tech. He is the recipient of the 2019 NRF fellowship for AI.
Nov 30 Live Stream 1 pm	Deqing Sun Senior Research Scientist Google YouTube Stream: https://youtu.be/N3y_K1ewkL0 Learning Optical Flow: From MRFs to CNNs Optical flow provides important motion information about the dynamic world and is of fundamental importance to many tasks. Like other visual inference problems, it is critical to choose the representation to encode both the forward formation process and the prior knowledge of optical flow. In this talk, I will present my work on two different optical flow representations in the past decade. First, I will describe learning Markov random field (MRF) models and defining non-local conditional random field (CRF) models to recover motion boundaries. Second, I will talk about combining domain knowledge of optical flow with convolutional neural networks (CNNs) to develop a compact and effective model and some recent developments. Bio: Deqing Sun is a senior research scientist at Google working on computer vision and machine learning. He received a Ph.D. degree in Computer Science from Brown University. He is a recipient of the PAMI Young Researcher award in 2020, the Longuet-Higgins prize at CVPR 2020, the best paper honorable mention award at CVPR 2018, and the first prize in the robust optical flow competition at CVPR 2018 and ECCV 2020. He served as an area chair for CVPR/ECCV/BMVC, and co-organized several workshops/tutorials at CVPR, ECCV, and SIGGRAPH.
Dec 7	No Seminar (NeurIPS Conference)
Dec 14	Finals week

Winter 2020

Standard

Spring 2020 Seminars Delayed

Following UCI guidance to limit social interactions during the COVID-19 outbreak, our CML seminar series is cancelled for the start of spring quarter. We hope to rejoin you later this year.

Jan. 6	No Seminar
Jan. 13 4011 Bren Hall 1 pm	Michael Campbell Eureka (SAP) Speculative and hedging interaction model in oil and U.S. dollar markets: Long-term investor dynamics and phases We develop the rational dynamics for the long-term investor among boundedly rational speculators in the Carfì-Musolino speculative and hedging model. Numerical evidence is given that indicates there are various phases determined by the degree of non-rational behavior of speculators. The dynamics are shown to be influenced by speculator “noise”. This model has two types of operators: a real economic subject (Air, a long-term trader) and one or more investment banks (Bank, short-term speculators). It also has two markets: oil spot market and U.S. dollar futures. Bank agents react to Air and equilibrate much more quickly than Air, thus we consider rational, best-local-response dynamics for Air based on averaged values of equilibrated Bank variables. The averaged Bank variables are effectively parameters for Air dynamics that depend on deviations-from-rationality (temperature) and Air investment (external field). At zero field, below a critical temperature, there is a phase transition in the speculator system which creates two equilibriums for bank variables, hence in this regime the parameters for the dynamics of the long-term investor Air can undergo a rapid change, which is exactly what happens in the study of quenched dynamics for physical systems. It is also shown that large changes in strategy by the long-term Air investor are always preceded by diverging spatial volatility of Bank speculators. The phases resemble those for unemployment in the “Mark 0” macroeconomic model.
Jan. 20	Martin Luther King Junior Day
Jan. 27	No Seminar
Feb. 3 4011 Bren Hall 1 pm	Phanwadee Sinthong Computer Science University of California, Irvine AFrame: Extending DataFrames for Large-Scale Modern Data Analysis Analyzing the increasingly large volumes of data that are available today, possibly including the application of custom machine learning models, requires the utilization of distributed frameworks. This can result in serious productivity issues for “normal” data scientists. We introduce AFrame, a new scalable data analysis package powered by a Big Data management system that extends the data scientists’ familiar DataFrame operations to efficiently operate on managed data at scale. AFrame is implemented as a layer on top of Apache AsterixDB, transparently scaling out the execution of DataFrame operations and machine learning model invocation through a parallel, shared-nothing big data management system. AFrame allows users to interact with a very large volume of semi-structured data in the same way that Pandas DataFrames work against locally stored tabular data. Our AFrame prototype leverages lazy evaluation. AFrame operations are incrementally translated into AsterixDB SQL++ queries that are executed only when final results are called for. In order to evaluate our proposed approach, we also introduce an extensible micro-benchmark for use in evaluating DataFrame performance in both single-node and distributed settings via a collection of representative analytic operations. Bio: Phanwadee (Gift) Sinthong is a fourth-year Ph.D. student in the CS Department at UC Irvine, advised by Professor Michael Carey. Her research interests are broadly in data management and distributed computation. Her current project is to deliver a scale-independent data science platform by incorporating database management capabilities with existing data science technologies to help support and enhance big data analysis.
Feb. 10 4011 Bren Hall 1 pm	Mingzhang Yin Statistics and Data Sciences University of Texas, Austin Semi-Implicit Variational Inference Uncertainty estimation is one of the most unique features of biological systems, as we have to sense and act in noisy environments. In this talk, I will introduce semi-implicit variational inference (SIVI) as a new machine-learning framework to achieve accurate uncertainty estimation in general latent variable models. Semi-implicit distribution is introduced to expand the commonly used analytic variational family, by mixing the variational parameters with a highly flexible distribution. To cope with this new distribution family, a novel evidence lower bound is derived to achieve the accurate statistical inference. The theoretical properties of the proposed methods will be introduced from an information-theoretic perspective. With a substantially expanded variational family and a novel optimization algorithm, SIVI is shown to closely match the accuracy of MCMC in inferring the posterior while maintaining the merits of variational methods in a variety of Bayesian inference tasks. Bio: Mingzhang Yin is a fifth year Ph.D. student in statistics at UT Austin. His research centers around Bayesian methods and machine learning, with a focus on approximate inference and structured data modeling.
Feb. 17	Presidents’ Day
Feb. 24 4011 Bren Hall 1 pm	Jaan Altosaar Physics Department Princeton University Probabilistic modeling in support of science and decision-making: From statistical physics to recommender systems Applied machine learning relies on translating the structure of a problem into a computational model. This arises in applications as diverse as statistical physics and food recommendation. The pattern of connectivity in an undirected graphical model or the fact that datapoints in food recommendation are unordered collections of features can inform the structure of a model. First, consider undirected graphical models from statistical physics like the ubiquitous Ising model. Basic research in statistical physics requires accurate and scalable simulations for comparing the behavior of these models to their experimental counterparts. The Ising model consists of binary random variables with local connectivity; interactions between neighboring nodes can lead to long-range correlations. Modeling these correlations is necessary to capture physical phenomena such as phase transitions. To mirror the local structure of these models, we use flow-based convolutional generative models that can capture long-range correlations. Combining flow-based models designed for continuous variables with recent work on hierarchical variational approximations enables the modeling of discrete random variables. Compared to existing variational inference methods, this approach scales to statistical physics models with tens of thousands of correlated random variables and uses fewer parameters. Just as computational choices can be made by considering the structure of an undirected graphical model, model construction itself can be guided by the structure of individual datapoints. Consider a recommendation task where datapoints consist of unordered sets, and the objective is to maximize top-K recall, a common recommendation metric. Simple results show that a classifier with zero worst-case error achieves maximum top-K recall. Further, the unordered structure of the data suggests the use of a permutation-invariant classifier for statistical and computational efficiency. We evaluate this recommendation model on a dataset of 55k users logging 16M meals on a food tracking app, where every meal is an unordered collection of ingredients. On this data, permutation-invariant classifiers outperform probabilistic matrix factorization methods. Bio: Jaan Altosaar is a PhD Candidate in the Physics department at Princeton University where he is advised by David Blei and Shivaji Sondhi. He is a visiting academic at the Center for Data Science at New York University, where he works with Kyle Cranmer. His research focuses on machine learning methodology such as developing Bayesian deep learning techniques or variational inference methods for statistical physics. Prior to Princeton, Jaan earned his BSc in Mathematics and Physics from McGill University. He has interned at Google Brain and DeepMind, and his work has been supported by fellowships from the Natural Sciences and Engineering Research Council of Canada.
Mar. 2 6011 Bren Hall 1 pm	Oren Etzioni CEO, Allen Institute for Artificial Intelligence (AI2) Artificial Intelligence and the Future of Humanity Could we wake up one morning to find that AI is poised to take over the world? Is AI the technology of unfairness and bias? My talk will assess these concerns, and sketch a more optimistic view. We will have ample warning before the emergence of superintelligence, and in the meantime we have the opportunity to create Beneficial AI: (1) AI that mitigates bias rather than amplifying it. (2) AI that saves lives rather than taking them. (3) AI that helps us to solve humanity’s thorniest problems. My talk builds on work at the Allen Institute for AI, a non-profit research institute based in Seattle. Bio: Oren Etzioni launched the Allen Institute for AI, and has served as its CEO since 2014. He has been a Professor at the University of Washington’s Computer Science department since 1991, publishing papers that have garnered over 2,300 highly influential citations on Semantic Scholar. He is also the founder of several startups including Farecast (acquired by Microsoft in 2008).
Mar. 9 4011 Bren Hall 12 pm	Ioannis Panageas Singapore University of Technology and Design Depth-width trade-offs for ReLU networks via Sharkovsky's theorem Understanding the representational power of Deep Neural Networks (DNNs) and how their structural properties (e.g., depth, width, type of activation unit) affect the functions they can compute, has been an important yet challenging question in deep learning and approximation theory. In a seminal paper, Telgarsky highlighted the benefits of depth by presenting a family of functions (based on simple triangular waves) for which DNNs achieve zero classification error, whereas shallow networks with fewer than exponentially many nodes incur constant error. Even though Telgarsky’s work reveals the limitations of shallow neural networks, it does not inform us on why these functions are difficult to represent and in fact he states it as a tantalizing open question to characterize those functions that cannot be well-approximated by smaller depths. In this talk, we will point to a new connection between DNNs expressivity and Sharkovsky’s Theorem from dynamical systems, that enables us to characterize the depth-width trade-offs of ReLU networks for representing functions based on the presence of generalized notion of fixed points, called periodic points (a fixed point is a point of period 1). Motivated by our observation that the triangle waves used in Telgarsky’s work contain points of period 3 – a period that is special in that it implies chaotic behavior based on the celebrated result by Li-Yorke – we will give general lower bounds for the width needed to represent periodic functions as a function of the depth. Technically, the crux of our approach is based on an eigenvalue analysis of the dynamical system associated with such functions. Bio: Ioannis Panageas is an Assistant Professor at Information Systems Department of SUTD since September 2018. Prior to that he was a MIT postdoctoral fellow working with Constantinos Daskalakis. He received his PhD in Algorithms, Combinatorics and Optimization from Georgia Institute of Technology in 2016, a Diploma in EECS from National Technical University of Athens (summa cum laude) and a M.Sc. in Mathematics from Georgia Institute of Technology. His work lies on the intersection of optimization, probability, learning theory, dynamical systems and algorithms. He is the recipient of the 2019 NRF fellowship for AI (analogue of NSF CAREER award).
Mar. 16	Finals Week
Mar. 23	Spring Break
TBD 4011 Bren Hall	Qiang Ning Allen Institute for AI Understanding events in natural language: Learning, common sense, annotation, and what's next The era of information explosion has opened up an unprecedented opportunity to study the social, political, financial and medical events described in natural language text. While the past decades have seen significant progress in deep learning and natural language processing (NLP), it is still extremely difficult to analyze textual data at the event-level, e.g., to understand what is going on, what is the cause and impact, and how things will unfold over time. In this talk, I will mainly focus on a key component of event understanding: temporal relations. Understanding temporal relations is challenging due to the lack of explicit timestamps in natural language text, its strong dependence on background knowledge, and the difficulty of collecting high-quality annotations to train models. I will present a series of results addressing these problems from the perspective of structured learning, common sense knowledge acquisition, and data annotation. These efforts culminated in improving the state-of-the-art by approximately 20% in absolute F1. I will also discuss recent results on other aspects of event understanding and the incidental supervision paradigm. I will conclude my talk by describing my vision on future directions towards building next-generation event-based NLP techniques. Bio: Qiang Ning is a research scientist on the AllenNLP team at the Allen Institute for AI (AI2). Qiang received his Ph.D. in Dec. 2019 from the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign (UIUC). He obtained his master’s degree in biomedical imaging from the same department in May 2016. Before coming to the United States, Qiang obtained two bachelor’s degrees from Tsinghua University in 2013, in Electronic Engineering and in Economics, respectively. He was an “Excellent Teacher Ranked by Their Students” across the university in 2017 (UIUC), a recipient of the YEE Fellowship in 2015, a finalist for the best paper in IEEE ISBI’15, and also won the National Scholarship at Tsinghua University in 2012.

Fall 2019

Standard

Sep 23	No Seminar
Sep 30 4011 Bren Hall 1 pm	Nia Dowell Assistant Professor School of Education University of California, Irvine Group Communication Analysis: Applications for Online Learning Environments Educational environments have become increasingly reliant on computer-mediated communication, relying on video conferencing, synchronous chats, and asynchronous forums, in both small (5-20 learners) and massive (1000+ learner) learning environments. These platforms, which are designed to support or even supplant traditional instruction, have become common-place across all levels of education, and as a result created big data in education. In order to move forward, the learning sciences field is in need of new automated approaches that offer deeper insights into the dynamics of learner interaction and discourse across online learning platforms. This talk will present results from recent work that uses language and discourse to capture social and cognitive dynamics during collaborative interactions. I will introduce group communication analysis (GCA), a novel approach for detecting emergent learner roles from the participants’ contributions and patterns of interaction. This method makes use of automated computational linguistic analysis of the sequential interactions of participants in online group communication to create distinct interaction profiles. We have applied the GCA to several collaborative learning datasets. Cluster analysis, predictive, and hierarchical linear mixed-effects modeling were used to assess the validity of the GCA approach, and practical influence of learner roles on student and overall group performance. The results indicate that learners’ patterns in linguistic coordination and cohesion are representative of the roles that individuals play in collaborative discussions. More broadly, GCA provides a framework for researchers to explore the micro intra- and inter-personal patterns associated with the participants’ roles and the sociocognitive processes related to successful collaboration. Bio: I am an assistant professor in the School of Education at UCI. My primary interests are in cognitive psychology, discourse processing, group interaction, and learning analytics. In general, my research focuses on using language and discourse to uncover the dynamics of socially significant, cognitive, and affective processes. I am currently applying computational techniques to model discourse and social dynamics in a variety of environments including small group computer-mediated collaborative learning environments, collaborative design networks, and massive open online courses (MOOCs). My research has also extended beyond the educational and learning sciences spaces and highlighted the practical applications of computational discourse science in the clinical, political and social sciences areas.
Oct 7 4011 Bren Hall 1 pm	Shashank Srivastava Assistant Professor Computer Science UNC Chapel Hill Conversational Machine Learning Humans can efficiently learn and communicate new knowledge about the world through natural language (e.g, the concept of important emails may be described through explanations like ‘late night emails from my boss are usually important’). Can machines be similarly taught new tasks and behavior through natural language interactions with their users? In this talk, we’ll explore two approaches towards language-based learning for classifications tasks. First, we’ll consider how language can be leveraged for interactive feature space construction for learning tasks. I’ll present a method that jointly learns to understand language and learn classification models, by using explanations in conjunction with a small number of labeled examples of the concept. Secondly, we’ll examine an approach for using language as a substitute for labeled supervision for training machine learning models, which leverages the semantics of quantifier expressions in everyday language (`definitely’, `sometimes’, etc.) to enable learning in scenarios with limited or no labeled data. Bio: Shashank Srivastava is an assistant professor in the Computer Science department at the University of North Carolina (UNC) Chapel Hill. Shashank received his PhD from the Machine Learning department at CMU in 2018, and was an AI Resident at Microsoft Research in 2018-19. Shashank’s research interests lie in conversational AI, interactive machine learning and grounded language understanding. Shashank has an undergraduate degree in Computer Science from IIT Kanpur, and a Master’s degree in Language Technologies from CMU. He received the Yahoo InMind Fellowship for 2016-17; his research has been covered by popular media outlets including GeekWire and New Scientist.
Oct 14 4011 Bren Hall 1 pm	Bhuwan Dhingra PhD Student Language Technologies Institute Carnegie Mellon University Text as a Virtual Knowledge Base Structured Knowledge Bases (KBs) are extremely useful for applications such as question answering and dialog, but are difficult to populate and maintain. People prefer expressing information in natural language, and hence text corpora, such as Wikipedia, contain more detailed up-to-date information. This raises the question — can we directly treat text corpora as knowledge bases for extracting information on demand? In this talk I will focus on two problems related to this question. First, I will look at augmenting incomplete KBs with textual knowledge for question answering. I will describe a graph neural network model for processing heterogeneous data from the two sources. Next, I will describe a scalable approach for compositional reasoning over the contents of the text corpus, analogous to following a path of relations in a structured KB to answer multi-hop queries. I will conclude by discussing interesting future research directions in this domain. Bio: Bhuwan Dhingra is a final year PhD student at Carnegie Mellon University, advised by William Cohen and Ruslan Salakhutdinov. His research uses natural language processing and machine learning to build an interface between AI applications and world knowledge (facts about people, places and things). His work is supported by the Siemens FutureMakers PhD fellowship. Prior to joining CMU, Bhuwan completed his undergraduate studies at IIT Kanpur in 2013, and spent two years at Qualcomm Research in the beautiful city of San Diego.
Oct 21 4011 Bren Hall 1 pm	Robert Bamler Postdoctoral Researcher Dept. of Computer Science University of California, Irvine Revisiting Variational Expectation Maximization Bayesian inference is often advertised for applications where posterior uncertainties matter. A less appreciated advantage of Bayesian inference is that it allows for highly scalable model selection (“hyperparameter tuning”) via the Expectation Maximization (EM) algorithm and its approximate variant, variational EM. In this talk, I will present both an application and an improvement of variational EM. The application is for link prediction in knowledge graphs, where a probabilistic approach and variational EM allowed us to train highly flexible models with more than ten thousand hyperparameters, improving predictive performance. In the second part of the talk, I will propose a new family of objective functions for variational EM. We will see that existing versions of variational inference in the literature can be interpreted as various forms of biased importance sampling of the marginal likelihood. Combining this insight with ideas from perturbation theory in statistical physics will lead us to a tighter bound on the true marginal likelihood and to better predictive performance of Variational Autoencoders. Bio: Robert Bamler is a Postdoc at UCI in the group of Prof. Stephan Mandt. His interests are probabilistic embedding models, variational inference, and probabilistic deep learning methods for data compression. Before joining UCI in December of 2018, Rob worked in the statistical machine learning group at Disney Research in Pittsburgh and Los Angeles. He received his PhD in theoretical statistical and quantum physics from University of Cologne, Germany.
Oct 28 4011 Bren Hall 1 pm	Zhou Yu Assistant Professor Dept. of Computer Science University of California, Davis Augment intelligence with multimodal information Humans interact with other humans or the world through information from various channels including vision, audio, language, haptics, etc. To simulate intelligence, machines require similar abilities to process and combine information from different channels to acquire better situation awareness, better communication ability, and better decision-making ability. In this talk, we describe three projects. In the first study, we enable a robot to utilize both vision and audio information to achieve better user understanding. Then we use incremental language generation to improve the robot’s communication with a human. In the second study, we utilize multimodal history tracking to optimize policy planning in task-oriented visual dialogs. In the third project, we tackle the well-known trade-off between dialog response relevance and policy effectiveness in visual dialog generation. We propose a new machine learning procedure that alternates from supervised learning and reinforcement learning to optimum language generation and policy planning jointly in visual dialogs. We will also cover some recent ongoing work on image synthesis through dialogs, and generating social multimodal dialogs with a blend of GIF and words. Bio: Zhou Yu is an Assistant Professor at the Computer Science Department at UC Davis. She received her PhD from Carnegie Mellon University in 2017. Zhou is interested in building robust and multi-purpose dialog systems using fewer data points and less annotation. She also works on language generation, vision and language tasks. Zhou’s work on persuasive dialog systems received an ACL 2019 best paper nomination recently. Zhou was featured in Forbes as 2018 30 under 30 in Science for her work on multimodal dialog systems. Her team recently won the 2018 Amazon Alexa Prize on building an engaging social bot for a $500,000 cash award.
Nov 4	Geng Ji PhD Student Dept of Computer Science University of California, Irvine Variational Inference: To Derive or Not To Derive Variational inference provides a general optimization framework to approximate the posterior distributions of latent variables in probabilistic models. Although effective in simple scenarios, it may be inaccurate or infeasible when the data is high-dimensional, the model structure is complicated, or variable relationships are non-conjugate. In this talk, I will present two different strategies to solve these problems. The first one is to derive rigorous variational bounds by leveraging the probabilistic relations and structural dependencies of the given model. One example I will explore is large-scale noisy-OR Bayesian networks popular in IT companies for analyzing the semantic content of massive text datasets. The second strategy is to create flexible algorithms directly applicable to many models, as can be expressed by probabilistic programming systems. I’ll talk about a low-variance Monte Carlo variational inference framework we recently developed for arbitrary models with discrete variables. It has appealing advantages over REINFORCE-style stochastic gradient estimates and model-dependent auxiliary-variable solutions, as demonstrated on real-world models of images, text, and social networks. Bio: Geng Ji is a PhD candidate in the CS Department of UC Irvine, advised by Professor Erik Sudderth. His research interests are broadly in probabilistic graphical models, large-scale variational inference, as well as their applications in computer vision and natural language processing. He did summer internships at Disney Research in 2017 mentored by Professor Stephan Mandt, and Facebook AI in 2018 which he will join as a full-time research scientist.
Nov 11	Veterans Day
Nov 18 4011 Bren Hall 1 pm	John T. Halloran Postdoctoral Researcher Dept. of Biomedical Engineering University of California, Davis Accelerated Machine Learning for Computational Proteomics In the past few decades, mass spectrometry-based proteomics has dramatically improved our fundamental knowledge of biology, leading to advancements in the understanding of diseases and methods for clinical diagnoses. However, the complexity and sheer volume of typical proteomics datasets make both fast and accurate analysis difficult to accomplish simultaneously; while machine learning methods have proven themselves capable of incredibly accurate proteomic analysis, such methods deter use by requiring extremely long runtimes in practice. In this talk, we will discuss two core problems in computational proteomics and how to accelerate the training of their highly accurate, but slow, machine learning solutions. For the first problem, wherein we seek to infer the protein subsequences (called peptides) present in a biological sample, we will improve the training of graphical models by deriving emission functions which render conditional-maximum likelihood learning concave. Used within a dynamic Bayesian network, we show that these emission functions not only allow extremely efficient learning of globally-convergent parameters, but also drastically outperform the state-of-the-art in peptide identification accuracy. For the second problem, wherein we seek to further improve peptide identification accuracy by classifying correct versus incorrect identifications, we will speed up the state-of-the-art in discriminative learning using a combination of improved convex optimization and extensive parallelization. We show that on massive datasets containing hundreds-of-millions of peptide identifications, these speedups reduce discriminative analysis time from several days down to just several hours, without any degradation in analysis quality. Bio: John Halloran is a Postdoc at UC Davis working with Professor David Rocke. He received his PhD from the University of Washington in 2016. John is interested in developing fast and accurate machine learning solutions for massive-scale problems encountered in computational biology. His work regularly focuses on efficient generative and discriminative training of dynamic graphical models. He is a recipient of the UC Davis Award for Excellence in Postdoctoral Research and a UW Genome Training Grant.
Nov 25 4011 Bren Hall 1 pm	Xanda Schofield Assistant Professor Dept. of Computer Science Harvey Mudd College Towards Practical and Locally Private Topic Models A critical challenge in the large-scale analysis of people’s data is protecting the privacy of the people who generated it. Of particular interest is how to privately infer models over discrete count data, like frequencies of words in a message or the number of times two people have interacted. Recently, I helped to develop locally private Bayesian Poisson factorization, a method for differentially private inference for a large family of models of count data, including topic models, stochastic block models, event models, and beyond. However, in the domain of topic models over text, this method can encounter serious obstacles in both speed and model quality. These arise from the collision of high-dimensional, sparse counts of text features in a bag-of-words representation, and dense noise from a privacy mechanism. In this talk, I address several challenges in the space of private statistical model inference over language data, as well as corresponding approaches to produce interpretable models. Bio: Xanda Schofield is an Assistant Professor in Computer Science at Harvey Mudd College. Her work focuses on practical applications of unsupervised models of text, particularly topic models, to research in the humanities and social sciences. More recently, her work has expanded to the intersection of privacy and text mining. She completed her Ph.D. in 2019 at Cornell University advised by David Mimno. In her graduate career, she was the recipient of an NDSEG Fellowship, the Anita Borg Memorial Scholarship, and the Microsoft Graduate Women’s Scholarship. She is also an avid cookie baker and tweets @XandaSchofield.
Dec 2 4011 Bren Hall 1 pm	Shayan Doroudi Assistant Professor School of Education University of California, Irvine Bias, Variance, and the Intertwined Histories of Artificial Intelligence and Education Research This talk will be divided into two parts. In the first part, I will demonstrate that the bias-variance tradeoff in machine learning and statistics can be generalized to offer insights to debates in other scientific fields. In particular, I will show how it can be applied to situate a variety of debates that appear in the education literature. In the second part of my talk, I will give a brief account of how the early history of artificial intelligence was naturally intertwined with the history of education research and the learning sciences. I will use the generalized bias-variance tradeoff as a lens with which to situate different trends that appeared in this history. Today, AI researchers might see education as just another application area, but historically AI and education were integrated into a broader movement to understand and improve intelligence and learning, in humans and in machines. Bio: Shayan Doroudi is an assistant professor at the UC Irvine School of Education. His research is focused on the learning sciences, educational technology, and the educational data sciences. He is particularly interested in studying the prospects and limitations of data-driven algorithms in learning technologies, including lessons that can be drawn from the rich history of educational technology. He earned his B.S. in Computer Science from the California Institute of Technology, and his M.S. and Ph.D. in Computer Science from Carnegie Mellon.
Dec 9	Finals week
Dec 16 4011 Bren Hall 1 pm	Eric Nalisnick Postdoctoral Researcher University of Cambridge/DeepMind Deep Learning Under Covariate Shift Deep neural networks have demonstrated impressive performance in predictive tasks. However, these models have been shown to be brittle, being easily fooled by even small perturbations of the input features (covariates). In this talk, I describe two approaches for handling covariate shift. The first uses a Bayesian prior derived from data augmentation to make the classifier robust to potential test-time shifts. The second strategy is to directly model the covariates using a ‘hybrid model’: a model of the joint distribution over labels and features. In experiments involving this latter approach, we discovered limitations in some existing methods for detecting distributional shift in high-dimensions. I demonstrate that a simple entropy-based goodness-of-fit test can solve some of these issues but conclude by arguing that more investigation is needed. Bio: Eric Nalisnick is a postdoctoral researcher at the University of Cambridge and a part-time research scientist at DeepMind. His research interests span statistical machine learning, with a current emphasis on Bayesian deep learning, generative modeling, and out-of-distribution detection. He received his PhD from the University of California, Irvine, where he was supervised by Padhraic Smyth. Eric has also spent time interning at DeepMind, Twitter, Microsoft, and Amazon.

Spring 2019

Standard

Apr 8	No Seminar
Apr 15 Bren Hall 4011 1 pm	Daeyun Shin PhD Candidate Dept of Computer Science UC Irvine Multi-layer Depth and Epipolar Feature Transformers for 3D Scene Reconstruction In this presentation, I will present our approach to the problem of automatically reconstructing a complete 3D model of a scene from a single RGB image. This challenging task requires inferring the shape of both visible and occluded surfaces. Our approach utilizes viewer-centered, multi-layer representation of scene geometry adapted from recent methods for single object shape completion. To improve the accuracy of view-centered representations for complex scenes, we introduce a novel “Epipolar Feature Transformer” that transfers convolutional network features from an input view to other virtual camera viewpoints, and thus better covers the 3D scene geometry. Unlike existing approaches that first detect and localize objects in 3D, and then infer object shape using category-specific models, our approach is fully convolutional, end-to-end differentiable, and avoids the resolution and memory limitations of voxel representations. We demonstrate the advantages of multi-layer depth representations and epipolar feature transformers on the reconstruction of a large database of indoor scenes. Project page: https://www.ics.uci.edu/~daeyuns/layered-epipolar-cnn/
Apr 22 Bren Hall 4011 1 pm	Mike Pritchard Assistant Professor Dept. of Earth System Sciences University of California, Irvine Improving global climate simulations using physically constrained deep learning emulators of unresolved moist turbulence processes I will discuss machine-learning emulation of O(100M) cloud-resolving simulations of moist turbulence for use in multi-scale global climate simulation. First, I will present encouraging results from pilot tests on an idealized ocean-world, in which a fully connected deep neural network (DNN) is found to be capable of emulating explicit subgrid vertical heat and vapor transports across a globally diverse population of convective regimes. Next, I will demonstrate that O(10k) instances of the DNN emulator spanning the world are able to feed back realistically with a prognostic global host atmospheric model, producing viable ML-powered climate simulations that exhibit realistic space-time variability for convectively coupled weather dynamics and even some limited out-of-sample generalizability to new climate states beyond the training data’s boundaries. I will then discuss a new prototype of the neural network under development that includes the ability to enforce multiple physical constraints within the DNN optimization process, which exhibits potential for further generalizability. Finally, I will conclude with some discussion of the unsolved technical issues and interesting philosophical tensions being raised in the climate modeling community by this disruptive but promising approach for next-generation global simulation.
Apr 29 Bren Hall 4011 1 pm	Nick Gallo PhD Candidate Department of Computer Science University of California, Irvine Coarse to Fine Lifted Inference Large problems with repetitive sub-structure arise in many domains such as social network analysis, collective classification, and database entity resolution. In these instances, individual data is augmented with a small set of rules that uniformly govern the relationship among groups of objects (for example: “the friend of my friend is probably my friend” in a social network). Uncertainty is captured by a probabilistic graphical model structure. While theoretically sound, standard reasoning techniques cannot be applied due to the massive size of the network (often millions of random variable and trillions of factors). Previous work on lifted inference efficiently exploits symmetric structure in graphical models, but breaks down in the presence of unique individual data (contained in all real-world problems). Current methods to address this problem are largely heuristic. In this presentation we describe a coarse to fine approximate inference framework that initially treats all individuals identically, gradually relaxing this restriction to finer sub-groups. This produces a sequence of inference objective bounds of monotonically increasing cost and accuracy. We then discuss our work on incorporating high-order inference terms (over large subsets of variables) into lifted inference and ongoing challenges in this area.
May 13 Bren Hall 4011 1 pm	Matt Gardner Senior Research Scientist Allen Institute of Artificial Intelligence Reasoning Our Way to Reading Reading machines that truly understood what they read would change the world, but our current best reading systems struggle to understand text at anything more than a superficial level. In this talk I try to reason out what it means to “read”, and how reasoning systems might help us get there. I will introduce three reading comprehension datasets that require systems to reason at a deeper level about the text that they read, using numerical, coreferential, and implicative reasoning abilities. I will also describe some early work on models that can perform these kinds of reasoning. Bio: Matt is a senior research scientist at the Allen Institute for Artificial Intelligence (AI2) on the AllenNLP team, and a visiting scholar at UCI. His research focuses primarily on getting computers to read and answer questions, dealing both with open domain reading comprehension and with understanding question semantics in terms of some formal grounding (semantic parsing). He is particularly interested in cases where these two problems intersect, doing some kind of reasoning over open domain text. He is the original author of the AllenNLP toolkit for NLP research, and he co-hosts the NLP Highlights podcast with Waleed Ammar.
May 27	No Seminar (Memorial Day)
June 3 Bren Hall 4011 12:00	Peter Sadowski Assistant Professor Information and Computer Sciences University of Hawaii Manoa Deep Learning for Extreme Remote Sensing: from Ocean Waves to Exocomets New technologies for remote sensing and astronomy provide an unprecedented view of Earth, our Sun, and beyond. Traditional data-analysis pipelines in oceanography, atmospheric sciences, and astronomy struggle to take full advantage of the massive amounts of high-dimensional data now available. I will describe opportunities for using deep learning to process satellite and telescope data, and discuss recent work mapping extreme sea states using Satellite Aperture Radar (SAR), inferring the physics of our sun’s atmosphere, and detecting anomalous astrophysical events in other systems, such as comets transiting distant stars. Bio: Peter Sadowski is an Assistant Professor of Information and Computer Sciences at the University of Hawaii Manoa and Co-Director of the AI Precision Health Institute at the University of Hawaii Cancer Center. He completed his Ph.D. and Postdoc at University of California Irvine, and his undergraduate studies at Caltech. His research focuses on deep learning and its applications to the natural sciences, particularly those at the intersection of machine learning and physics.
June 3 Bren Hall 4011 1 pm	Max Welling Research Chair, University of Amsterdam VP Technologies, Qualcomm Integrating Generative Modeling into Deep Learning Deep learning has boosted the performance of many applications tremendously, such as object classification and detection in images, speech recognition and understanding, machine translation, game play such as chess and go etc. However, these all constitute reasonably narrowly and well defined tasks for which it is reasonable to collect very large datasets. For artificial general intelligence (AGI) we will need to learn from a small number of samples, generalize to entirely new domains, and reason about a problem. What do we need in order to make progress to AGI? I will argue that we need to combine the data generating process, such as the physics of the domain and the causal relationships between objects, with the tools of deep learning. In this talk I will present a first attempt to integrate the theory of graphical models, which arguably was the dominating modeling machine learning paradigm around the turn of the twenty-first century, with deep learning. Graphical models express the relations between random variables in an interpretable way, while probabilistic inference in such networks can be used to reason about these variables. We will propose a new hybrid paradigm where probabilistic message passing in such networks is enhanced with graph convolutional neural networks to improve the ability of such systems to reason and make predictions.
June 10	No Seminar (Finals)

Fall 2018

Standard

Oct 1	No Seminar
Oct 8 Bren Hall 4011 1 pm	Matt Gardner Research Scientist Allen Institute for AI A Tale of Two Question Answering Systems The path to natural language understanding goes through increasingly challenging question answering tasks. I will present research that significantly improves performance on two such tasks: answering complex questions over tables, and open-domain factoid question answering. For answering complex questions, I will present a type-constrained encoder-decoder neural semantic parser that learns to map natural language questions to programs. For open-domain factoid QA, I will show that training paragraph-level QA systems to give calibrated confidence scores across paragraphs is crucial when the correct answer-containing paragraph is unknown. I will conclude with some thoughts about how to combine these two disparate QA paradigms, towards the goal of answering complex questions over open-domain text. Bio:Matt Gardner is a research scientist at the Allen Institute for Artificial Intelligence (AI2), where he has been exploring various kinds of question answering systems. He is the lead designer and maintainer of the AllenNLP toolkit, a platform for doing NLP research on top of pytorch. Matt is also the co-host of the NLP Highlights podcast, where, with Waleed Ammar, he gets to interview the authors of interesting NLP papers about their work. Prior to joining AI2, Matt earned a PhD from Carnegie Mellon University, working with Tom Mitchell on the Never Ending Language Learning project.
Oct 22 Bren Hall 4011 1 pm	Stephan Mandt Assistant Professor Dept. of Computer Science UC Irvine Deep Probabilistic Modeling I will give an overview of some exciting recent developments in deep probabilistic modeling, which combines deep neural networks with probabilistic models for unsupervised learning. Deep probabilistic models are capable of synthesizing artificial data that highly resemble the training data, and are able fool both machine learning classifiers as well as humans. These models have numerous applications in creative tasks, such as voice, image, or video synthesis and manipulation. At the same time, combining neural networks with strong priors results in flexible yet highly interpretable models for finding hidden structure in large data sets. I will summarize my group’s activities in this space, including measuring semantic shifts of individual words over hundreds of years, summarizing audience reactions to movies, and predicting the future evolution of video sequences with applications to neural video coding.
Oct 25 Bren Hall 3011 3 pm	(Note: different day (Thurs), time (3pm), and location (3011) relative to usual Monday seminars) Steven Wright Professor Department of Computer Sciences University of Wisconsin, Madison Optimization in Data Science Many of the computational problems that arise in data analysis and machine learning can be expressed mathematically as optimization problems. Indeed, much new algorithmic research in optimization is being driven by the need to solve large, complex problems from these areas. In this talk, we review a number of canonical problems in data analysis and their formulations as optimization problems. We will cover support vector machines / kernel learning, logistic regression (including regularized and multiclass variants), matrix completion, deep learning, and several other paradigms.
Oct 29 Bren Hall 4011 1 pm	Alex Psomas Postdoctoral Researcher Computer Science Department Carnegie Mellon University Fair Resource Allocation: From Theory to Practice We study the problem of fairly allocating a set of indivisible items among $n$ agents. Typically, the literature has focused on one-shot algorithms. In this talk we depart from this paradigm and allow items to arrive online. When an item arrives we must immediately and irrevocably allocate it to an agent. A paradigmatic example is that of food banks: food donations arrive, and must be delivered to nonprofit organizations such as food pantries and soup kitchens. Items are often perishable, which is why allocation decisions must be made quickly, and donated items are typically leftovers, leading to lack of information about items that will arrive in the future. Which recipient should a new donation go to? We approach this problem from different angles. In the first part of the talk, we study the problem of minimizing the maximum envy between any two recipients, after all the goods have been allocated. We give a polynomial-time, deterministic and asymptotically optimal algorithm with vanishing envy, i.e. the maximum envy divided by the number of items T goes to zero as T goes to infinity. In the second part of the talk, we adopt and further develop an emerging paradigm called virtual democracy. We will take these ideas all the way to practice. In the last part of the talk I will present some results from an ongoing work on automating the decisions faced by a food bank called 412 Food Rescue, an organization in Pittsburgh that matches food donations with non-profit organizations.
Nov 5 Bren Hall 4011 1 pm	Fred Park Associate Professor Dept of Math & Computer Science Whittier College Image Segmentation and Tracking Utilizing a Difference of Convex Regularized Mumford-Shah Functional In this talk I will give a brief overview of the segmentation and tracking problems and will propose a new model that tackles both of them. This model incorporates a weighted difference of anisotropic and isotropic total variation (TV) norms into a relaxed formulation of the Mumford-Shah (MS) model. We will show results exceeding those obtained by the MS model when using the standard TV norm to regularize partition boundaries. Examples illustrating the qualitative differences between the proposed model and the standard MS one will be shown as well. I will also talk about a fast numerical method that is used to optimize the proposed model utilizing the difference-of-convex algorithm (DCA) and the primal dual hybrid gradient (PDHG) method. Finally, future directions will be given that could harness the power of convolution nets for more advanced segmentation tasks.
Nov 12	No Seminar (Veterans Day)
Nov 19 Bren Hall 4011 1 pm	Philip Nelson Director of Engineering Google Research Accelerating bio discovery with machine learning, the promise and the peril Google Accelerated Sciences is a translational research team that brings Google’s technological expertise to the scientific community. Recent advances in machine learning have delivered incredible results in consumer applications (e.g. photo recognition, language translation), and is now beginning to play an important role in life sciences. Taking examples from active collaborations in the biochemical, biological, and biomedical fields, I will focus on how our team transforms science problems into data problems and applies Google’s scaled computation, data-driven engineering, and machine learning to accelerate discovery. See http://g.co/research/gas for our publications and more details. Bio: Philip Nelson is a Director of Engineering in Google Research. He joined Google in 2008 and was previously responsible for a range of Google applications and geo services. In 2013, he helped found and currently leads the Google Accelerated Science team that collaborates with academic and commercial scientists to apply Google’s knowledge and experience and technologies to important scientific problems. Philip graduated from MIT in 1985 where he did award-winning research on hip prosthetics at Harvard Medical School. Before Google, Philip helped found and lead several Silicon Valley startups in search (Verity), optimization (Impresse), and genome sequencing (Complete Genomics) and was also an Entrepreneur in Residence at Accel Partners.
Nov 26 Bren Hall 4011 1 pm	Richard Futrell Assistant Professor Dept of Language Science UC Irvine Natural language as a code: Modeling human language using information theory Why is natural language the way it is? I propose that human languages can be modeled as solutions to the problem of efficient communication among intelligent agents with certain information processing constraints, in particular constraints on short-term memory. I present an analysis of dependency treebank corpora of over 50 languages showing that word orders across languages are optimized to limit short-term memory demands in parsing. Next I develop a Bayesian, information-theoretic model of human language processing, and show that this model can intuitively explain an apparently paradoxical class of comprehension errors made by both humans and state-of-the-art recurrent neural networks (RNNs). Finally I combine these insights in a model of human languages as information-theoretic codes for latent tree structures, and show that optimization of these codes for expressivity and compressibility results in grammars that resemble human languages.
Dec 3	No Seminar (NIPS)

Spring 2018

Standard

Apr 2	No Seminar
Apr 9 Bren Hall 4011 1 pm	Sabino Miranda, Ph.D CONACyT Researcher Center for Research and Innovation in Information and Communication Technologies Towards a Multilingual and Error-Robust Approach for Sentiment Analysis Sentiment Analysis is a research area concerned with the computational analysis of people’s feelings or beliefs expressed in texts such as emotions, opinions, attitudes, appraisals, etc. At the same time, with the growth of social media data (review websites, microblogging sites, etc.) on the Web, Twitter has received particular attention because it is a huge source of opinionated information with potential applications to decision-making tasks from business applications to the analysis of social and political events. In this context, I will present the multilingual and error-robust approaches developed in our group to tackle sentiment analysis as a classification problem, mainly for informal written text such as Twitter. Our approaches have been tested in several benchmark contests such as SemEval (International Workshop on Semantic Evaluation), TASS (Workshop for Sentiment Analysis Focused on Spanish), and PAN (Workshop on Digital Text Forensics).
Apr 16 Bren Hall 4011 1 pm	Roman Vershynin Professor of Mathematics University of California, Irvine Boolean functions and random tensors A simple way to generate a Boolean function in n variables is to take the sign of some polynomial. Such functions are called polynomial threshold functions. How many low-degree polynomial threshold functions are there? This problem was solved for degree d=1 by Zuev in 1989 and has remained open for any higher degrees, including d=2, since then. In a joint work with Pierre Baldi (UCI), we settled the problem for all degrees d>1. The solution explores connections of Boolean functions to additive combinatorics and high-dimensional probability. This leads to a program of extending random matrix theory to random tensors, which is mostly an uncharted territory at present.
Apr 23 Bren Hall 4011 1 pm	Zhile Ren PhD Candidate, Computer Science Brown University Semantic Three-Dimensional Understanding of Dynamic Scenes We develop new representations and algorithms for three-dimensional (3D) scene understanding from images and videos. In cluttered indoor scenes, RGB-D images are typically described by local geometric features of the 3D point cloud. We introduce descriptors that account for 3D camera viewpoint, and use structured learning to perform 3D object detection and room layout prediction. We also extend this work by using latent support surfaces to capture style variations of 3D objects and help detect small objects. Contextual relationships among categories and layout are captured via a cascade of classifiers, leading to holistic scene hypotheses with improved accuracy. In outdoor autonomous driving applications, given two consecutive frames from a pair of stereo cameras, 3D scene flow methods simultaneously estimate the 3D geometry and motion of the observed scene. We incorporate semantic segmentation in a cascaded prediction framework to more accurately model moving objects by iteratively refining segmentation masks, stereo correspondences, 3D rigid motion estimates, and optical flow fields.
Apr 30	Cancelled
May 7 Bren Hall 4011 1 pm	Vivek Srikumar Assistant Professor University of Utah Natural Language Processing in the Wild: Opportunities & Challenges Natural language processing (NLP) sees potential applicability in a broad array of user-facing applications. To realize this potential, however, we need to address several challenges related to representations, data availability and scalability. In this talk, I will discuss these concerns and how we may overcome them. First, as a motivating example of NLP’s broad reach, I will present our recent work on using language technology to improve mental health treatment. Then, I will focus on some of the challenges that need to be addressed. The choice of representations can make a big difference in our ability to reason about text; I will discuss recent work on developing rich semantic representations. Finally, I will touch upon the problem of systematically speeding up the entire NLP pipeline without sacrificing accuracy. As a concrete example, I will present a new algebraic characterization of the process of feature extraction, as a direct consequence of which, we can make trained classifiers significantly faster.
May 14 Bren Hall 4011 1 pm	Shu Kong PhD Candidate, Computer Science University of California, Irvine Pay Attention to the Pixel, Understand the Scene Better Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size (by fusing multi-scale pooled features) in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that small details are preserved for distant objects while larger receptive fields are used for those nearby. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We further integrate this depth-aware gating into a recurrent convolutional neural network to refine semantic segmentation, and show state-of-the-art performance on several benchmarks. Moreover, rather than fusing mutli-scale pooled features based on estimated depth, we show the “correct” size of pooling field for each pixel can be decided in an attentional fashion by our Pixel-wise Attentional Gating unit (PAG), which learns to choose the pooling size for each pixel. PAG is a generic, architecture-independent, problem-agnostic mechanism that can be readily “plugged in” to an existing model with fine-tuning. We utilize PAG in two ways: 1) learning spatially varying pooling fields that improves model performance without the extra computation cost, and 2) learning a dynamic computation policy for each pixel to decrease total computation while maintaining accuracy. We extensively evaluate PAG on a variety of per-pixel labeling tasks, including semantic segmentation, boundary detection, monocular depth and surface normal estimation. We demonstrate that PAG allows competitive or state-of-the-art performance on these tasks. We also show that PAG learns dynamic spatial allocation of computation over the input image which provides better performance trade-offs compared to related approaches (e.g., truncating deep models or dynamically skipping whole layers). Generally, we observe that PAG reduces computation by 10% without noticeable loss in accuracy, and performance degrades gracefully when imposing stronger computational constraints.
May 21 Bren Hall 4011 1 pm	Rich Caruana Principal Researcher Microsoft Research Friends Don’t Let Friends Deploy Black-Box Models: The Importance of Intelligibility in Machine Learning In machine learning often a tradeoff must be made between accuracy and intelligibility: the most accurate models usually are not very intelligible (e.g., deep nets, boosted trees and random forests), and the most intelligible models usually are less accurate (e.g., logistic regression and decision lists). This tradeoff often limits the accuracy of models that can be safely deployed in mission-critical applications such as healthcare where being able to understand, validate, edit, and ultimately trust a learned model is important. We have been working on a learning method based on generalized additive models (GAMs) that is often as accurate as full complexity models, but even more intelligible than linear models. This makes it easy to understand what a model has learned, and also makes it easier to edit the model when it learns inappropriate things because of unanticipated problems with the data. Making it possible for experts to understand a model and repair it is critical because most data has unanticipated landmines. In the talk I’ll present two healthcare cases studies where these high-accuracy GAMs discover surprising patterns in the data that would have made deploying a black-box model risky. I’ll also briefly show how we’re using these models to detect bias in domains where fairness and transparency are paramount.
May 28	Memorial Day
Jun 4 Bren Hall 4011 1 pm	Stephen McAleer (Pierre Baldi‘s group) Graduate Student, Computer Science University of California, Irvine Learning how to solve the Rubik's Cube with no Human Knowledge We will present a novel approach to solving the Rubik’s cube effectively without any human knowledge using several ingredients including deep learning, reinforcement learning, and Monte Carlo searches. At the end, if time permits, we will describe several extensions to the neuronal Boolean complexity results presented by Roman Vershynin a few weeks ago.
Jun 11	No Seminar (finals week)

Winter 2018

Standard

Jan 15	No Seminar (MLK Day)
Jan 22 Bren Hall 4011 1 pm	Shufeng Kong PhD Candidate Centre for Quantum Software and Information, FEIT University of Technology Sydney, Australia Multiagent Simple Temporal Problem: The Arc-Consistency Approach The Simple Temporal Problem (STP) is a fundamental temporal reasoning problem and has recently been extended to the Multiagent Simple Temporal Problem (MaSTP). In this paper we present a novel approach that is based on enforcing arc-consistency (AC) on the input (multiagent) simple temporal network. We show that the AC-based approach is sufficient for solving both the STP and MaSTP and provide efficient algorithms for them. As our AC-based approach does not impose new constraints between agents, it does not violate the privacy of the agents and is superior to the state-ofthe-art approach to MaSTP. Empirical evaluations on diverse benchmark datasets also show that our AC-based algorithms for STP and MaSTP are significantly more efficient than existing approaches.
Jan 29 Bren Hall 4011 1 pm	Yangfeng Ji Postdoctoral Scholar Paul Allen School of Computer Science and Engineering University of Washington Bringing Structural Information into Neural Network Design Deep learning is one of the most important techniques used in natural language processing (NLP). A central question in deep learning for NLP is how to design a neural network that can fully utilize the information from training data and make accurate predictions. A key to solving this problem is to design a better network architecture. In this talk, I will present two examples from my work on how structural information from natural language helps design better neural network models. The first example shows adding coreference structures of entities not only helps different aspects of text modeling, but also improves the performance of language generation; the second example demonstrates structures of organizing sentences into coherent texts can help neural networks build better representations for various text classification tasks. Along the lines of this topic, I will also propose a few ideas for future work and discuss some potential challenges.
February 5	No Seminar (AAAI)
February 12 Bren Hall 4011 1 pm	Eric Nalisnick PhD Candidate Computer Science University of California, Irvine Averaging and Combining Variational Models with Stein Particle Descent Bayesian inference for complex models—the kinds needed to solve complex tasks such as object recognition—is inherently intractable, requiring analytically difficult integrals be solved in high dimensions. One solution is to turn to variational Bayesian inference: a parametrized family of distributions is proposed, and optimization is carried out to find the member of the family nearest to the true posterior. There is an innate trade-off within VI between expressive vs tractable approximations. We wish the variational family to be as rich as possible so as it might include the true posterior (or something very close), but adding structure to the approximation increases the computational complexity of optimization. As a result, there has been much interest in efficient optimization strategies for mixture model approximations. In this talk, I’ll return to the problem of using mixture models for VI. First, to motivate our approach, I’ll discuss the distinction between averaging vs combining variational models. We show that optimization objectives aimed at fitting mixtures (i.e. model combination), in practice, are relaxed into performing something between model combination and averaging. Our primary contribution is to formulate a novel training algorithm for variational model averaging by adapting Stein variational gradient descent to operate on the parameters of the approximating distribution. Then, through a particular choice of kernel, we show the algorithm can be adapted to perform something closer to model combination, providing a new algorithm for optimizing (finite) mixture approximations.
February 19	No Seminar (President’s Day)
February 26 Bren Hall 4011 1 pm	Jay Pujara Research Scientist ISI/USC What do Probabilistic Models Know? Knowledge is an essential ingredient in the quest for artificial intelligence, yet scalable and robust approaches to acquiring knowledge have challenged AI researchers for decades. Often, the obstacle to knowledge acquisition is massive, uncertain, and changing data that obscures the underlying knowledge. In such settings, probabilistic models have excelled at exploiting the structure in the domain to overcome ambiguity, revise beliefs and produce interpretable results. In my talk, I will describe recent work using probabilistic models for knowledge graph construction and information extraction, including linking subjects across electronic health records, fusing background knowledge from scientific articles with gene association studies, disambiguating user browsing behavior across platforms and devices, and aligning structured data sources with textual summaries. I also highlight several areas of ongoing research, fusing embedding approaches with probabilistic modeling and building models that support dynamic data or human-in-the-loop interactions. Bio: Jay Pujara is a research scientist at the University of Southern California’s Information Sciences Institute whose principal areas of research are machine learning, artificial intelligence, and data science. He completed a postdoc at UC Santa Cruz, earned his PhD at the University of Maryland, College Park and received his MS and BS at Carnegie Mellon University. Prior to his PhD, Jay spent six years at Yahoo! working on mail spam detection, user trust, and contextual mail experiences, and he has also worked at Google, LinkedIn and Oracle. Jay is the author of over thirty peer-reviewed publications and has received three best paper awards for his work. He is a recognized authority on knowledge graphs, and has organized the Automatic Knowledge Base Construction (AKBC) and Statistical Relational AI (StaRAI) workshops, has presented tutorials on knowledge graph construction at AAAI and WSDM, and has had his work featured in AI Magazine.
March 5 Bren Hall 4011 1 pm	Vagelis Papalexakis Assistant Professor UC Riverside Tensor Decompositions for Big Multi-aspect Data Analytics Tensors and tensor decompositions have been very popular and effective tools for analyzing multi-aspect data in a wide variety of fields, ranging from Psychology to Chemometrics, and from Signal Processing to Data Mining and Machine Learning. Using tensors in the era of big data presents us with a rich variety of applications, but also poses great challenges such as the one of scalability and efficiency. In this talk I will first motivate the effectiveness of tensor decompositions as data analytic tools in a variety of exciting, real-world applications. Subsequently, I will discuss recent techniques on tackling the scalability and efficiency challenges by parallelizing and speeding up tensor decompositions, especially for very sparse datasets, including the scenario where the data are continuously updated over time. Finally, I will discuss open problems in unsupervised tensor mining and quality assessment of the results, and present work-in-progress addressing that problem with very encouraging results.
March 12 Bren Hall 4011 1 pm	Alessandro Achille PhD Student UC Los Angeles The Emergence Theory of Deep Learning: Perception, Information Theory and PAC-Bayes I will describe the basic elements of the Emergence Theory of Deep Learning, that started as a general theory for representations, and is comprised of three parts: (1) We formalize the desirable properties that a representation should possess, based on classical principles of statistical decision and information theory: invariance, sufficiency, minimality, disentanglement. We then show that such an optimal representation of the data can be learned by minimizing a specific loss function which is related to the notion of Information Bottleneck and Variational Inference. (2) We analyze common empirical losses employed in Deep Learning (such as empirical cross-entropy), and implicit or explicit regularizers, including Dropout and Pooling, and show that they bias the network toward recovering such an optimal representation. Finally, (3) we show that minimizing a suitably (implicitly or explicitly) regularized loss with SGD with respect to the weights of the network implies implicit optimization of the loss described in (1), with relates instead to the activations of the network. Therefore, even when we optimize a DNN as a black-box classifier, we are always biased toward learning minimal, sufficient and invariant representation. The link between (implicit or explicit) regularization of the classification loss and learning of optimal representations is specific to the architecture of deep networks, and is not found in a general classifier. The theory is related to a new version of the Information Bottleneck that studies the weights of a network, rater than the activation, and can also be derived using PAC-Bayes or Kolmogorov complexity arguments, providing independent validation.
March 19	No Seminar (Finals Week)

Fall 2017

Standard

Oct 9	No Seminar (Columbus Day)
Oct 16 Bren Hall 3011 1 pm	Bailey Kong PhD Candidate Department of Computer Science University of California, Irvine Cross-Domain Forensic Shoeprint Matching We investigate the problem of automatically determining what type of shoe left an impression found at a crime scene. This recognition problem is made difficult by the variability in types of crime scene evidence (ranging from traces of dust or oil on hard surfaces to impressions made in soil) and the lack of comprehensive databases of shoe outsole tread patterns. We find that mid-level features extracted by pre-trained convolutional neural nets are surprisingly effective descriptors for these specialized domains. However, the choice of similarity measure for matching exemplars to a query image is essential to good performance. For matching multi-channel deep features, we propose the use of multi-channel normalized cross-correlation and analyze its effectiveness. Finally, we introduce a discriminatively trained variant and fine-tune our system end-to-end, obtaining state-of-the-art performance.
Oct 23 Bren Hall 3011 1 pm	Geng Ji PhD Candidate Department of Computer Science University of California, Irvine From Patches to Images: A Nonparametric Generative Model We propose a hierarchical generative model that captures the self-similar structure of image regions as well as how this structure is shared across image collections. Our model is based on a novel, variational interpretation of the popular expected patch log-likelihood (EPLL) method as a model for randomly positioned grids of image patches. While previous EPLL methods modeled image patches with finite Gaussian mixtures, we use nonparametric Dirichlet process (DP) mixtures to create models whose complexity grows as additional images are observed. An extension based on the hierarchical DP then captures repetitive and self-similar structure via image-specific variations in cluster frequencies. We derive a structured variational inference algorithm that adaptively creates new patch clusters to more accurately model novel image textures. Our denoising performance on standard benchmarks is superior to EPLL and comparable to the state-of-the-art, and we provide novel statistical justifications for common image processing heuristics. We also show accurate image inpainting results.
Oct 30 Bren Hall 4011 1 pm	Qi Lou PhD Candidate Department of Computer Science University of California, Irvine Dynamic Importance Sampling for Anytime Bounds of the Partition Function Computing the partition function is a key inference task in many graphical models. In this paper, we propose a dynamic importance sampling scheme that provides anytime finite-sample bounds for the partition function. Our algorithm balances the advantages of the three major inference strategies, heuristic search, variational bounds, and Monte Carlo methods, blending sampling with search to refine a variationally defined proposal. Our algorithm combines and generalizes recent work on anytime search and probabilistic bounds of the partition function. By using an intelligently chosen weighted average over the samples, we construct an unbiased estimator of the partition function with strong finite-sample confidence intervals that inherit both the rapid early improvement rate of sampling with the long-term benefits of an improved proposal from search. This gives significantly improved anytime behavior, and more flexible trade-offs between memory, time, and solution quality. We demonstrate the effectiveness of our approach empirically on real-world problem instances taken from recent UAI competitions.
Nov 6 Bren Hall 3011 1 pm	Vladimir Minin Professor Department of Statistics University of California, Irvine Advances of Bayesian nonparametrics in population genetics of infectious diseases Estimating evolutionary trees, called phylogenies or genealogies, is a fundamental task in modern biology. Once phylogenetic reconstruction is accomplished, scientists are faced with a challenging problem of interpreting phylogenetic trees. In certain situations, a coalescent process, a stochastic model that randomly generates evolutionary trees, comes to rescue by probabilistically connecting phylogenetic reconstruction with the demographic history of the population under study. An important application of the coalescent is phylodynamics, an area that aims at reconstructing past population dynamics from genomic data. Phylodynamic methods have been especially successful in analyses of genetic sequences from viruses circulating in human populations. From a Bayesian hierarchal modeling perspective, the coalescent process can be viewed as a prior for evolutionary trees, parameterized in terms of unknown demographic parameters, such as the population size trajectory. I will review Bayesian nonparametric techniques that can accomplish phylodynamic reconstruction, with a particular attention to analysis of genetic data sampled serially through time.
Nov 20	No Seminar (Thanksgiving Week)
Dec 4	No Seminar (NIPS Conference)
Dec 13 Bren Hall 4011 1 pm	Yutian Chen Research Scientist Google DeepMind Learning to Learn without Gradient Descent by Gradient Descent We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade-off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.

Spring 2017

Standard

Apr 10 Bren Hall 4011 1 pm	Mike Izbicki PhD Candidate Department of Computer Science University of California, Riverside Divide & Conquer Techniques for Machine Learning I’ll present two algorithms that use divide and conquer techniques to speed up learning. The first algorithm (called OWA) is a communication efficient distributed learner. OWA uses only two rounds of communication, which is sufficient to achieve optimal learning rates. The second algorithm is a meta-algorithm for fast cross validation. I’ll show that for any divide and conquer learning algorithm, there exists a fast cross validation procedure whose run time is asymptotically independent of the number of cross validation folds.
Apr 17 Bren Hall 4011 1 pm	James Supancic PhD Candidate Department of Computer Science University of California, Irvine Long-Term Tracking by Decision Making Cameras can naturally capture sequences of images, or videos. And when understanding videos, connecting the past with the present requires tracking. Sometimes tracking is easy. We focus on two challenges which make tracking harder: long-term occlusions and appearance variations. To handle total occlusion, a tracker must know when it has lost track and how to reinitialize tracking when the target reappears. Reinitialization requires good appearance models. We build appearance models for humans and hands, with a particular emphasis on robustness and occlusion. For the second challenge, appearance variation, the tracker must know when and how to re-learn (or update) an appearance model. This challenge leads to the classic problem of drift: aggressively learning appearance changes allows small errors to compound, as elements of the background environment pollute the appearance model. We propose two solutions. First, we consider self-paced learning, wherein a tracker begins by learning from frames it finds easy. As the tracker becomes better at recognizing the target, it begins to learn from harder frames. We also develop a data-driven approach: train a tracking policy to decide when and how to update an appearance model. To take this direct approach to “learning when to learn”, we exploit large-scale Internet data through reinforcement learning. We interpret the resulting policy and conclude with a generalization for tracking multiple objects.
Apr 24 Bren Hall 4011 1 pm	David R Thompson Jet Propulsion Laboratory California Institute of Technology Visible / Shortwave Infrared Imaging Spectroscopy at JPL: Instruments and Algorithms Imaging spectrometers enable quantitative maps of physical and chemical properties at high spatial resolution. They have a long history of deployments for mapping terrestrial and coastal aquatic ecosystems, geology, and atmospheric properties. They are also critical tools for exploring other planetary bodies. These high-dimensional spatio-spectral datasets pose a rich challenge for computer scientists and algorithm designers. This talk will provide an introduction to remote imaging spectroscopy in the Visible and Shortwave Infrared, describing the measurement strategy and data analysis considerations including atmospheric correction. We will describe historical and current instruments, software, and public datasets. Bio: David R. Thompson is a researcher and Technical Group Lead in the Imaging Spectroscopy group at the NASA Jet Propulsion Laboratory. He is Investigation Scientist for the AVIRIS imaging spectrometer project. Other roles include software lead for the NEAScout mission, autonomy software lead for the PIXL instrument, and algorithm development for diverse JPL airborne imaging spectrometer campaigns. He is recipient of the NASA Early Career Achievement Medal and the JPL Lew Allen Award.
May 1 Bren Hall 4011 1 pm	Weining Shen Assistant Professor Department of Statistics University of California, Irvine Theory behind Bayesian nonparametrics Bayesian nonparametric (BNP) models have been widely used in modern applications. In this talk, I will discuss some recent theoretical results for the commonly used BNP methods from a frequentist asymptotic perspective. I will cover a set of function estimation and testing problems such as density estimation, high-dimensional partial linear regression, independence testing, and independent component analysis. Minimax optimal convergence rates, adaptation and Bernstein-von Mises theorem will be discussed.
May 8 Bren Hall 4011 1 pm	P. Anandan VP for Research Adobe Systems How Data Science, Machine Learning, and AI are Transforming the Consumer Experience During the last two decades the experience of consumers has been undergoing a fundamental and dramatic transformation – giving a rich variety of informed choices, online shopping, consumption of news and entertainment on the go, and personalized shopping experiences. All of this has been powered by the massive amounts of data that is continuously being collected and the application of machine learning, data science and AI techniques to it. Adobe is a leader the Digital Marketing and is the leading provider of solutions to enterprises that are serving customers both in the B2B and B2C space. In this talk, we will outline the current state of the industry and the technology that is behind it, how Data Science and Machine Learning are gradually beginning to transform the experiences of the consumer as well as the marketer. We will also speculate on how recent developments in Artificial Intelligence will lead to deep personalization and richer experiences for the consumer as well as more powerful and tailored end-to-end capabilities for the marketer. Bio: Dr. P. Anandan is Vice President in Adobe Research, responsible for developing research strategy for Adobe, especially in Digital Marketing, and Leading the Adobe India Research lab. An emphasis of this lab is on Big Data Experience and Intelligence. At Adobe, he is also leading efforts in applying A.I. to Big Data. Dr. Anandan is an expert in Computer Vision with more than 60 publications that have earned 14,500 citations in Google Scholar. His research areas include visual motion analysis, video surveillance, and 3D scene modeling from images and video. His papers have won multiple awards including the Helmholtz Prize, for long term fundamental contributions to computer vision research. Prior to joining Adobe Dr. Anandan had a long tenure with Microsoft Research in Redmond, WA, and became a Distinguished Scientist. He was the Managing Director of Microsoft Research India, which he founded. Most recently he was the Managing Director of Microsoft Research’s Worldwide Outreach. He earned a PhD from the University of Massachusetts specializing in Computer Vision and Artificial Intelligence. He started as an assistant professor at Yale University before moving on to work in Video Information Processing at the David Sarnoff Research Center. His research has been used in DARPA’s Video Surveillance and Monitoring program as well as in creating special effects in the movies “What Dreams May Come”, “Prince of Egypt,” and “The Matrix.” Dr. Anandan is the recipient of Distinguished Alumnus awards from both University of Massachusetts and the Indian Institute of Technology Madras, where he earned a B. Tech. in Electrical Engineering. He was inducted into the Nebraska Hall of Computing by the University of Nebraska, from where he obtained an MS in Computer Science. He is currently a member of the Board of Governors of IIT Madras.
May 15 Bren Hall 4011 1 pm	Ndapa Nakashole Assistant Professor Computer Science and Engineering University of California, San Diego Improving Zero-shot learning for word-level translation Zero-shot learning is used in computer vision, natural language, and other domains to induce mapping functions that project vectors from one vector space to another. This is a promising approach to learning, when we do not have labeled data for every possible label we want a system to recognize. This setting is common when doing NLP for low-resource languages, where labeled data is very scare. In this talk, I will present our work on improving zero-shot learning methods for the task of word-level translation. Bio: Ndapa Nakashole is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, San Diego. Prior to UCSD, she was a Postdoctoral Fellow in the Machine Learning Department at Carnegie Mellon University. She obtained her PhD from Saarland University, Germany, for work done at the Max Planck Institute for Informatics at Saarbrücken.
May 22 Bren Hall 4011 1 pm	Batya Kenig Postdoctoral Scholar Department of Information Systems Engineering Technion – Israel Institute of Technology Querying Probabilistic Preferences in Databases We propose a novel framework wherein probabilistic preferences can be naturally represented and analyzed in a probabilistic relational database. The framework augments the relational schema with a special type of a relation symbol, a preference symbol. A deterministic instance of this symbol holds a collection of binary relations. Abstractly, the probabilistic variant is a probability space over databases of the augmented form (i.e., probabilistic database). Effectively, each instance of a preference symbol can be represented as a collection of parametric preference distributions such as Mallows. We establish positive and negative complexity results for evaluating Conjunctive Queries (CQs) over databases where preferences are represented in the Repeated Insertion Model (RIM), Mallows being a special case. We show how CQ evaluation reduces to a novel inference problem (of independent interest) over RIM, and devise a solver with polynomial data complexity.
May 29	No Seminar (Memorial Day)
Jun 5 Bren Hall 4011 1 pm	Yonatan Bisk Postdoctoral Scholar Information Sciences Institute University of Southern California The Limits of Unsupervised Syntax and the Importance of Grounding in Language Acquisition The future of self-driving cars, personal robots, smart homes, and intelligent assistants hinges on our ability to communicate with computers. The failures and miscommunications of Siri-style systems are untenable and become more problematic as machines become more pervasive and are given more control over our lives. Despite the creation of massive proprietary datasets to train dialogue systems, these systems still fail at the most basic tasks. Further, their reliance on big data is problematic. First, successes in English cannot be replicated in most of the 6,000+ languages of the world. Second, while big data has been a boon for supervised training methods, many of the most interesting tasks will never have enough labeled data to actually achieve our goals. It is therefore important that we build systems which can learn from naturally occurring data and grounded situated interactions. In this talk, I will discuss work from my thesis on the unsupervised acquisition of syntax which harnesses unlabeled text in over a dozen languages. This exploration leads us to novel insights into the limits of semantics-free language learning. Having isolated these stumbling blocks, I’ll then present my recent work on language grounding where we attempt to learn the meaning of several linguistic constructions via interaction with the world. Bio: Yonatan Bisk’s research focuses on Natural Language Processing from naturally occurring data (unsupervised and weakly supervised data). He is a postdoc researcher with Daniel Marcu at USC’s Information Sciences Institute. Previously, he received his Ph.D. from the University of Illinois at Urbana-Champaign under Julia Hockenmaier and his BS from the University of Texas at Austin.

Center for Machine Learning and Intelligent Systems

Bren School of Information and Computer Science

University of California, Irvine

Center for Machine Learning and Intelligent Systems

University of California, Irvine

AIML

Winter 2021

Live Stream for all Winter 2021 CML Seminars

Fall 2020

Live Stream for all Fall 2020 CML Seminars

Winter 2020

Spring 2020 Seminars Delayed

Fall 2019

Spring 2019

Fall 2018

Spring 2018

Winter 2018

Fall 2017

Spring 2017