Spring 2025

April 1 DBH 4011 11 am	Sarah Wiegreffe Postdoctoral Researcher Allen Institute for AI and University of Washington Demystifying the Inner Workings of Language Models Large language models (LLMs) power a rapidly-growing and increasingly impactful suite of AI technologies. However, due to their scale and complexity, we lack a fundamental scientific understanding of much of LLMs’ behavior, even when they are open source. The “black-box” nature of LMs not only complicates model debugging and evaluation, but also limits trust and usability. In this talk, I will describe how my research on interpretability (i.e., understanding models’ inner workings) has answered key scientific questions about how models operate. I will then demonstrate how deeper insights into LLMs’ behavior enable both 1) targeted performance improvements and 2) the production of transparent, trustworthy explanations for human users. Bio: Sarah Wiegreffe is a postdoctoral researcher at the Allen Institute for AI (Ai2) and the Allen School of Computer Science and Engineering at the University of Washington. She has worked on the explainability and interpretability of neural networks for NLP since 2017, with a focus on understanding how language models make predictions in order to make them more transparent to human users. She has been honored as a 3-time Rising Star in EECS, Machine Learning, and Generative AI. She received her PhD in computer science from Georgia Tech in 2022, during which time she interned at Google and Ai2 and won the Ai2 outstanding intern award. She frequently serves on conference program committees, receiving outstanding area chair awards at ACL 2023 and EMNLP 2024.
April 4 DBH 4011 2 pm	XuDong Wang PhD Student Berkeley AI Research Lab, University of California, Berkeley Advancing Multimodal Models Beyond Human Supervision To advance AI toward true artificial general intelligence, it is crucial to incorporate a wider range of sensory inputs, including physical interaction, spatial navigation, and social dynamics. However, achieving the successes of self-supervised Large Language Models (LLMs) across other modalities in our physical and digital environments remains a significant challenge. In this talk, I will discuss how self-supervised learning methods can be harnessed to advance multimodal models beyond the need for human supervision. Firstly, I will highlight a series of research efforts on self-supervised visual scene understanding that leverage the capabilities of self-supervised models to “segment anything” without the need for 1.1 billion labeled segmentation masks, unlike the popular supervised approach, the Segment Anything Model (SAM). Secondly, I will demonstrate how generative and understanding models can work together synergistically, allowing them to complement and enhance each other. Lastly, I will explore the increasingly important techniques for learning from unlabeled or imperfect data within the context of data-centric representation learning. All these research topics are unified by the same core idea: advancing multimodal models beyond human supervision. Bio: XuDong Wang is a final-year Ph.D. student in the Berkeley AI Research (BAIR) lab at UC Berkeley, advised by Prof. Trevor Darrell, and a research scientist on the Llama Research team at GenAI, Meta. He was previously a researcher at Google DeepMind (GDM) and the International Computer Science Institute (ICSI), and a research intern at Meta’s Fundamental AI Research (FAIR) labs and Generative AI (GenAI) Research team. 
His research focuses on self-supervised learning, multimodal models, and machine learning, with an emphasis on developing foundational AI systems that go beyond the constraints of human supervision. By advancing self-supervised learning techniques for multimodal models—minimizing reliance on human-annotated data—he aims to build intelligent systems capable of understanding and interacting with their environment in ways that mirror, and potentially surpass, the complexity, adaptability, and richness of human intelligence. He is a recipient of the William Oldham Fellowship at UC Berkeley, awarded for outstanding graduate research in EECS.
April 21 DBH 4011 1 pm	Felix Draxler Postdoctoral Researcher Department of Computer Science, University of California, Irvine Fast and Flexible Generative Modeling with Free-Form Flows Generative models have achieved remarkable quality and success in a variety of machine learning applications, promising to become the standard paradigm for regression. However, each predominant approach comes with drawbacks in terms of inference speed, sample quality, training stability, or flexibility. In this talk, I will propose Free-Form Flows, a new generative model that offers fast data generation at high quality and flexibility. I will guide you through the fundamentals and showcase a variety of scientific applications. Bio: Felix Draxler is a Postdoctoral Researcher at the University of California, Irvine. His research focuses on the fundamentals of generative models, with the goal of making them not only accurate but also fast and versatile. He received his PhD in 2024 from Heidelberg University, Germany.
April 28 DBH 4011 1 pm	Matúš Dopiriak PhD Student Department of Computers and Informatics, Technical University in Košice Radiance Fields Advancing 3D Scene Understanding for Robotics and Autonomous Driving Since emerging in 2020, neural radiance fields (NeRFs) have marked a transformative breakthrough in representing photorealistic 3D scenes. In the years that followed, numerous variants have evolved, enhancing performance, enabling the capture of dynamic changes over time, and tackling challenges in large-scale environments. Among these, NVIDIA’s Instant-NGP stood out, earning recognition as one of TIME Magazine’s Best Inventions of 2022. Radiance fields now facilitate advanced 3D scene understanding, leveraging large language models (LLMs) and diffusion models to enable sophisticated scene editing and manipulation. Their applications span robotics, where they support planning, navigation, and manipulation. In autonomous driving, they serve as immersive simulation systems or can be used as digital twins for video compression integrated in edge computing architectures. This lecture explores the evolution, capabilities, and practical impact of radiance fields in these cutting-edge domains. Bio: Matúš Dopiriak is a 3rd-year PhD candidate at the Technical University in Košice, Department of Computers and Informatics, advised by Professor Ing. Juraj Gazda, PhD. His research explores the integration of radiance fields in autonomous mobility within edge computing architectures. Additionally, he studies the application of Large Vision-Language Models (LVLMs) to address edge-case scenarios in traffic through simulations that generate and manage these uncommon and hazardous conditions.
May 5 ISEB 1010 4 pm	Davide Corsi Postdoctoral Researcher Department of Computer Science, University of California, Irvine Safe Reinforcement Learning: Building Agents You Can Trust Reinforcement learning is increasingly used to train robots for tasks where safety is critical, such as autonomous surgery and navigation. However, when combined with deep neural networks, these systems can become unpredictable and difficult to trust in contexts where even a single error is often unacceptable. This talk explores two complementary paths toward safer reinforcement learning: making agents more reliable through constrained training, and adding formal guarantees through techniques such as verification and shielding. In the second part of the talk, we will look at the growing role of world modeling in robotics and how this, together with the rise of large foundation models, opens up new challenges for ensuring safety in complex, real-world environments. Bio: Davide Corsi is a postdoctoral researcher at the University of California, Irvine, where he works in the Intelligent Dynamics Lab led by Prof. Roy Fox. His research lies at the intersection of deep reinforcement learning and robotics, with a strong focus on ensuring that intelligent agents behave safely and reliably when deployed in real-world, safety-critical environments. He earned his PhD in Computer Science from the University of Verona under the supervision of Prof. Alessandro Farinelli, with a dissertation on safe deep reinforcement learning that explored both constrained policy optimization and formal verification techniques. During his PhD, he spent time as a visiting researcher at the Hebrew University of Jerusalem with Prof. Guy Katz, where he investigated the integration of neural network verification into learning-based control systems. Davide’s recent work also explores generative world models and causal reasoning to enable autonomous agents to predict long-term outcomes and safely adapt to new situations. 
His research has been published at leading venues such as AAAI, IJCAI, ICLR, IROS, and RLC, where he was recently recognized with an Outstanding Paper Award for his work on autonomous underwater navigation.
May 12 DBH 4011 1 pm	To Be Announced To Be Announced To be announced.
May 19 DBH 4011 1 pm	To Be Announced To Be Announced To be announced.
May 26	No Seminar (Memorial Day Holiday)
June 2 DBH 4011 1 pm	To Be Announced To Be Announced To be announced.

Winter 2025

Jan. 13 DBH 4011 1 pm	Dongxia Wu PhD Student Dept. of Computer Science & Engineering, University of California, San Diego Uncertainty Quantification for Scientific Machine Learning Scientific Machine Learning (SML) is an emerging interdisciplinary field with wide-ranging applications in domains such as public health, climate science, and drug discovery. The primary goal of SML is to develop data-driven surrogate models that can learn spatiotemporal dynamics or predict key system properties, thereby accelerating time-intensive simulations and reducing the need for real-world experiments. To make SML approaches truly reliable for domain experts, Uncertainty Quantification (UQ) plays a critical role in enabling risk assessment and informed decision-making. In this presentation, I will first introduce our recent advancements in UQ for spatiotemporal and multi-fidelity surrogate modeling with Bayesian deep learning, focusing on applications in accelerating computational epidemiology simulations. Following this, I will demonstrate how quantified uncertainties can be leveraged to design sample-efficient algorithms for adaptive experimental design, with a focus on Bayesian Active Learning and Bayesian Optimization. Bio: Dongxia (Allen) Wu is a Ph.D. student in the Department of Computer Science and Engineering at UC San Diego, advised by Rose Yu and Yian Ma. His research focuses on Bayesian Deep Learning, Sequential Decision Making, Scientific Machine Learning, and Spatiotemporal Modeling, with applications in public health, climate science, and drug design. His work has been published in ICML, KDD, AISTATS, and PNAS. He developed DeepGLEAM for COVID-19 incident death forecasting, which achieved the highest coverage ranking in the CDC Forecasting Hub. He is also the recipient of the UCSD HDSI Ph.D. Fellowship.
Jan. 20	No Seminar (Martin Luther King, Jr. Day)
Jan. 27 DBH 4011 1 pm	Satish Kumar Thittamaranahalli Research Associate Professor Department of Computer Science, University of Southern California Revisiting FastMap: New Applications FastMap was first introduced in the Data Mining community for generating Euclidean embeddings of complex objects. In this talk, I will first generalize FastMap to generate Euclidean embeddings of graphs in near-linear time: the pairwise Euclidean distances approximate a desired graph-based distance function on the vertices. I will then apply the graph version of FastMap to efficiently solve various graph-theoretic problems of significant interest in AI, including shortest-path computations, facility location, top-K centrality computations, and community detection and block modeling. I will also present a novel learning framework, called FastMapSVM, by combining FastMap and Support Vector Machines. I will then apply FastMapSVM to predict the satisfiability of Constraint Satisfaction Problems and to classify seismograms in Earthquake Science. Bio: Prof. Satish Kumar Thittamaranahalli (T. K. Satish Kumar) leads the Collaboratory for Algorithmic Techniques and Artificial Intelligence at the Information Sciences Institute of the University of Southern California. He is a Research Associate Professor in USC’s Department of Computer Science, Department of Physics and Astronomy, and Department of Industrial and Systems Engineering. He has published extensively on numerous topics spanning diverse areas such as Constraint Reasoning, Planning and Scheduling, Probabilistic Reasoning, Machine Learning and Data Informatics, Robotics, Combinatorial Optimization, Approximation and Randomization, Heuristic Search, Model-Based Reasoning, Computational Physics, Knowledge Representation, and Spatiotemporal Reasoning. He has served on the program committees of many international conferences and is a winner of three Best Paper Awards. Prof. Kumar received his PhD in Computer Science from Stanford University in March 2005. In the past, he has also been a Visiting Student at the NASA Ames Research Center, a Postdoctoral Research Scholar at the University of California, Berkeley, a Research Scientist at the Institute for Human and Machine Cognition, a Visiting Assistant Professor at the University of West Florida, and a Senior Research and Development Scientist at Mission Critical Technologies.
Feb. 3 DBH 4011 1 pm	Francesco Immorlano Postdoctoral Researcher Department of Computer Science, University of California, Irvine Transferring Climate Change Physical Knowledge Earth system models (ESMs) are the main tools currently used to project global mean temperature rise according to several future greenhouse gas emission scenarios. Accurate and precise climate projections are required for climate adaptation and mitigation, but these models still exhibit great uncertainties that are a major roadblock for policy makers. Several approaches have been developed to reduce the spread of climate projections, yet those methods cannot capture the non-linear complexity inherent in the climate system. Using a Transfer Learning approach, Machine Learning can leverage and combine the knowledge gained from ESM simulations and historical observations to more accurately project global surface air temperature fields in the 21st century. This helps enhance the representation of future projections and their associated spatial patterns, which are critical to climate sensitivity. Bio: Francesco Immorlano is a Postdoctoral Researcher at the University of California, Irvine with a Ph.D. in Engineering of Complex Systems from the University of Salento. Since May 2020 he has been collaborating with the CMCC Foundation and was a visiting researcher at Columbia University in Spring 2022. His main research work is focused on deep learning and generative models with a specific application to the climate science domain.
Feb. 10 DBH 4011 1 pm	Kolby Nottingham PhD Student Department of Computer Science, University of California, Irvine Aligning Language Model Agents to Environment Dynamics Language model agents are tackling challenging tasks from embodied planning to web navigation to programming. These models are a powerful artifact of natural language processing research that are being applied to interactive environments traditionally reserved for reinforcement learning. However, many environments are not natively expressed in language, resulting in poor alignment between language representations and true states and actions. Additionally, while language models are generally capable, their biases from pretraining can be unaligned with specific environment dynamics. In this talk, I cover our research into rectifying these issues through methods such as: (1) mapping high-level language model plans to low-level actions, (2) optimizing language model agent inputs using reinforcement learning, and (3) in-context policy improvement for continual task adaptation. Bio: Kolby Nottingham is a 5th-year CS PhD student at the University of California Irvine co-advised by Roy Fox and Sameer Singh. His research applies algorithms and insights from reinforcement learning to improve the potential of agentic language model applications. He has diverse industry experience from internships at companies such as Nvidia, Unity, and Allen AI. Kolby is also excited by prospective applications of his work in the video game industry and has experience doing research for game studios such as Latitude and Riot Games.
Feb. 17	No Seminar (Presidents’ Day)

Fall 2024

Oct. 7 DBH 4011 1 pm	Harsh Trivedi PhD Student Department of Computer Science, Stony Brook University AppWorld: Reliable Evaluation of Interactive Agents in a World of Apps and People We envision a world where AI agents (assistants) are widely used for complex tasks in our digital and physical worlds and are broadly integrated into our society. To move towards such a future, we need an environment for a robust evaluation of agents’ capability, reliability, and trustworthiness. In this talk, I’ll introduce AppWorld, which is a step towards this goal in the context of day-to-day digital tasks. AppWorld is a high-fidelity simulated world of people and their digital activities on nine apps like Amazon, Gmail, and Venmo. On top of this fully controllable world, we build a benchmark of complex day-to-day tasks such as splitting Venmo bills with roommates, which agents have to solve via interactive coding and API calls. One of the fundamental challenges with complex tasks lies in accounting for different ways in which the tasks can be completed. I will describe how we address this challenge using a reliable and programmatic evaluation framework. Our benchmarking evaluations show that even the best LLMs, like GPT-4o, can only solve ~30% of such tasks, highlighting the challenging nature of the AppWorld benchmark. I will conclude by laying out exciting future research that can be conducted on the foundation of AppWorld, such as benchmarks and playground for developing multimodal, collaborative, safe, socially intelligent, resourceful, and fail-tolerant agents. Bio: Harsh Trivedi is a final year PhD student at Stony Brook University, advised by Niranjan Balasubramanian. He is broadly interested in the development of reliable, explainable AI systems and their rigorous evaluation. Specifically, his research spans the domains of AI agents, multi-step reasoning, AI safety, and efficient NLP. He has interned at AI2 and was a visiting researcher at NYU. 
His recent work, AppWorld, received a Best Resource Paper award at ACL’24, and his work on AI safety via debate received a Best Paper award at the ML Safety workshop at NeurIPS’22.
Oct. 14 DBH 4011 1 pm	Kushagra Pandey PhD Student Department of Computer Science, University of California, Irvine Conjugate Integrators for Fast Sampling in Diffusion Models Diffusion models exhibit excellent sample quality across multiple generation tasks. However, their inference process is iterative and often requires hundreds of function evaluations. Moreover, it is unclear if existing methods for accelerating diffusion model sampling can generalize well across different types of diffusion processes. In the first part of my talk, I will introduce Conjugate Integrators, which project unconditional diffusion dynamics to an alternate space that is more amenable to faster sampling. The resulting framework possesses several interesting theoretical connections with prior work in fast diffusion sampling, enabling their application to a broader class of diffusion processes. In the second part of my talk, I will extend the idea of Conjugate Integrators from unconditional sampling to conditional diffusion sampling in the context of solving inverse problems. Empirically, on challenging inverse problems like 4x super-resolution on the ImageNet-256 dataset, conditional Conjugate Integrators can generate high-quality samples in as few as 5 conditional sampling steps, providing significant speedups over prior work. Bio: Kushagra is a third-year PhD student in Computer Science at UCI, advised by Prof. Stephan Mandt. Previously, he completed his bachelor’s and master’s degrees in Computer Science at the Indian Institute of Technology. He is broadly interested in efficient design of and inference in deep generative models, with a current focus on iterative refinement models like Diffusion Models and Stochastic Interpolants.
Oct. 21 DBH 4011 1 pm	Tianchen Qian Assistant Professor of Statistics University of California, Irvine Causal inference and machine learning in mobile health: Modeling time-varying effects using longitudinal functional data To optimize mobile health interventions and advance domain knowledge on intervention design, it is critical to understand how the intervention effect varies over time and with contextual information. This study aims to assess how a push notification suggesting physical activity influences individuals’ step counts using data from the HeartSteps micro-randomized trial (MRT). The statistical challenges include the time-varying treatments and longitudinal functional step count measurements. We propose the first semiparametric causal excursion effect model with varying coefficients to model the time-varying effects within a decision point and across decision points in an MRT. The proposed model incorporates double time indices to accommodate the longitudinal functional outcome, enabling the assessment of time-varying effect moderation by contextual variables. We propose a two-stage causal effect estimator that uses machine learning and is robust against a misspecified high-dimensional outcome regression nuisance model. We establish asymptotic theory and conduct simulation studies to validate the proposed estimator. Our analysis provides new insights into individuals’ change in response profiles (such as how soon a response occurs) due to the activity suggestions, how such changes differ by the type of suggestions received, and how such changes depend on other contextual information such as being recently sedentary and the day being a weekday. Bio: Tianchen Qian is an Assistant Professor in Statistics at UC Irvine. His research focuses on leveraging data science, mobile technology, and wearable devices to design robust, personalized, and cost-effective interventions that can impact health and well-being at a significant scale.
He also works on causal inference, experimental design, machine learning, semiparametric efficiency theory, and longitudinal data methods. He has a PhD in Biostatistics from Johns Hopkins University. Before joining UCI, he was a postdoctoral fellow in Statistics at Harvard University.
Oct. 28 DBH 4011 1 pm	Jana Lipkova Assistant Professor, Department of Pathology School of Medicine, University of California, Irvine AI-based multimodal data fusion for outcome prediction in oncology In oncology, the patient state is characterized by a spectrum of diverse medical data, each providing unique insights. The vast amount of data, however, makes it difficult for experts to adequately assess patient prognosis under the multimodal context. We present a deep learning-based multimodal framework for integration of radiology, histopathology, and genomics data to improve patient outcome prediction. The framework does not require annotations, tumor segmentation, or hand-crafted features and can be easily applied to larger cohorts and diverse disease models. The feasibility of the model is tested on two external independent cohorts, including glioma and non-small cell lung cancer, indicating benefits of multimodal data integration for patient risk stratification, outcome prediction, and prognostic biomarker exploration. Bio: Jana Lipkova is an Assistant Professor at the University of California, Irvine, in the Department of Pathology and the Department of Biomedical Engineering. She completed her postdoctoral fellowship in the AI for Pathology group under the guidance of Faisal Mahmood at Harvard Medical School. Prior to her postdoc, she earned her PhD in computer-aided medical procedures in the radiology department at the Technical University of Munich. Jana’s research lab, called OctoPath, focuses on developing AI methods for diagnosis, prognosis, and treatment optimization in histopathology and beyond (for more see octopath.org).
Nov. 4 DBH 4011 1 pm	Amir Rahmani Professor of Nursing and Computer Science University of California, Irvine Future Health: Harnessing Multimodal Data and GenAI for Health Promotion “Future Health” emphasizes the importance of recognizing each individual’s uniqueness, which arises from their specific omics, lifestyle, environmental, and socioeconomic conditions. Thanks to advancements in sensors, mobile computing, ubiquitous computing, and artificial intelligence (AI), we can now collect detailed information about individuals. This data serves as the foundation for creating personal models, offering predictive and preventive advice tailored specifically to each person. These models enable us to provide precise recommendations that closely align with the individual’s predicted needs. In my presentation, I will explore how AI, including generative AI, and wearable technology are revolutionizing the collection and analysis of big health data in everyday environments. I will discuss the analytics used to evaluate physical and mental health and how smart recommendations can be made objectively. Moreover, I will illustrate how leveraging Large Language Models (LLMs)-powered conversational health agents (CHAs) can integrate personal data, models, and knowledge into healthcare chatbots. Additionally, I will present our open-source initiative on developing OpenCHA (openCHA.com). This integration allows for creating personalized chatbots, enhancing the delivery of health guidance directly tailored to the individual. Bio: Amir M. Rahmani is the founder of the Health SciTech Group at the University of California, Irvine (UCI) and the co-founder and co-director of the Institute for Future Health, a campus-wide Organized Research Unit at UCI. He is also a lifetime docent (Adjunct Professor) at the University of Turku (UTU), Finland. His research includes AI in healthcare, ubiquitous computing, AI-powered bio-signal processing, health informatics, and big health data analytics. 
He has been leading several NSF, NIH, Academy of Finland, and European Commission-funded projects on Smart Pain Assessment, Community-Centered Care, Family-centered Maternity Care, Stress Management in Adolescents, and Remote Elderly and Family Caregivers Monitoring. He is the co-author of more than 350 peer-reviewed publications, the associate editor-in-chief of the journals ACM Transactions on Computing for Healthcare and Frontiers in Wearable Electronics, and a member of the Editorial Board of Nature Scientific Reports. He is a distinguished member of the ACM and a senior member of the IEEE.
Nov. 11	No Seminar (Veterans Day Holiday)
Nov. 18 DBH 4011 1 pm	Daniel Seita Assistant Professor of Computer Science University of Southern California In Pursuit of Dexterous and Generalizable Robot Manipulation using Reinforcement Learning, Imitation Learning, and Foundation Models The robotics community has seen significant progress in applying machine learning for robot manipulation. However, despite this progress, developing a system capable of generalizable robot manipulation remains fundamentally difficult, especially when manipulating in clutter and adjusting deformable objects such as fabrics, rope, and liquids. Some promising techniques for developing general robot manipulation systems include reinforcement learning, imitation learning, and more recently, leveraging foundation models trained on internet-scale data, such as GPT-4. In this talk, I will discuss our recent work on (1) deep reinforcement learning for dexterous manipulation in clutter, (2) foundation models and imitation learning for bimanual manipulation, and (3) our benchmarks and applications of foundation models for deformable object manipulation. We will discuss the current strengths and limitations of these approaches. I will conclude with an overview of future research directions and my vision for an exciting future in pursuit of a general, whole-body, and dexterous robot system. Bio: Daniel Seita is an Assistant Professor in the Computer Science department at the University of Southern California and the director of the Sensing, Learning, and Understanding for Robotic Manipulation (SLURM) Lab. His research interests are in computer vision, machine learning, and foundation models for robot manipulation, focusing on improving performance in visually and geometrically challenging settings. Daniel was previously a postdoc at Carnegie Mellon University’s Robotics Institute and holds a PhD in computer science from the University of California, Berkeley. 
He received undergraduate degrees in math and computer science from Williams College. Daniel’s research has been supported by a six-year Graduate Fellowship for STEM Diversity and by a two-year Berkeley Fellowship. He received an Honorable Mention for the Best Paper award at UAI 2017, was an RSS 2022 Pioneer, and has presented work at premier robotics conferences such as ICRA, IROS, RSS, and CoRL.
Nov. 25 DBH 4011 1 pm	Yasaman Razeghi PhD Student Department of Computer Science, University of California, Irvine Evaluating Foundational Models Using Insights from Their Pretraining Data Foundational models have demonstrated exceptional performance on established academic benchmarks, often narrowing the gap between human reasoning and artificial intelligence. While the success of these models is widely attributed to their scale—encompassing both their architectural parameters and the vast pretraining data—the critical role of pretraining data in shaping their capabilities and limitations is often acknowledged but rarely studied. However, if we cannot disentangle model behavior from their pretraining data, how can we trust these systems in real-world, high-stakes applications? In this talk, I will argue that understanding the true performance of foundational models requires going beyond conventional benchmark testing. In particular, incorporating insights from their pretraining data is essential for comprehensively evaluating and interpreting the models’ capabilities and limitations. I show that while models –both multimodal and language models– often excel in benchmark settings, they can fail on basic, trivial reasoning tasks, raising concerns about their true robustness. To better understand these limitations, I propose examining the relationship between a model’s successes and failures through the lens of its pretraining data and present methodologies and tools for studying how pretraining data impacts a model’s performance. By revealing failure modes in these models and exposing the impact of pretraining data on their behavior, this work cautions against overly optimistic interpretations of models’ abilities based on canonical evaluation results. Bio: Yasaman Razeghi is a final-year Ph.D. student at UCI, advised by Prof. Sameer Singh. She completed her master’s and undergraduate studies at the University of Tehran in Iran. 
Her research focuses on understanding the relationships between pretraining data characteristics and model behavior. Most recently, she has been investigating foundational models in scenarios involving reasoning and multimodal capabilities.
Dec. 2 DBH 4011 1 pm	Ali Younis PhD Student Department of Computer Science, University of California, Irvine End-to-end Learnable Particle Filters and Smoothers Estimating the temporal state of a system from image sequences is an important task for many vision and robotics applications. A number of classical frameworks for state estimation have been proposed, but often these methods require human experts to specify the system dynamics and measurement model, requiring simplifying assumptions that hurt performance. With the increasing abundance of real-world training data, there is enormous potential to boost accuracy by using deep learning to learn state estimation algorithms, but there are also substantial technical challenges in properly accounting for uncertainty. In this presentation, I will develop end-to-end learnable particle filters and particle smoothers, and show how to bring classic state estimation methods into the age of deep learning. We first create an end-to-end learnable particle filter that uses flexible neural networks to propagate multimodal, particle-based representations of state uncertainty. Our gradient estimators are unbiased and have substantially lower variance than existing, differentiable (but biased) particle filters. We apply our end-to-end learnable particle filter to the difficult task of visual localization in unknown environments, and show large improvements over prior work. We then expand on our particle filtering method to create the first end-to-end learnable particle smoother, which incorporates information from future as well as past observations, and apply this particle smoother to the real-world task of city-scale geo-localization using camera and planimetric map data. We compare to state-of-the-art baselines for visual geo-localization, and again show superior performance. Bio: Ali Younis is a final-year PhD student in Computer Science at UCI, advised by Prof. Erik Sudderth. 
He previously completed his bachelor’s and master’s degrees at UCI and briefly worked on spacecraft systems before returning for a PhD. He is broadly interested in particle-based belief propagation systems for time-varying systems, with applications in computer vision.

Winter and Spring 2024

Standard

Jan. 8 DBH 4011 1 pm	Fuxin Li Associate Professor of Electrical Engineering and Computer Science Oregon State University From Heatmaps to Structural and Counterfactual Explanations This talk will focus on our endeavors in the past few years on explaining deep image models. Realizing that an important missing piece for explaining neural networks is a reliable heatmap visualization tool, we developed I-GOS and iGOS++, which optimize with integrated gradients to avoid local optima in heatmap generation and improve performance in high-resolution heatmaps. In particular, iGOS++ was able to discover that deep classifiers trained on COVID-19 X-ray images wrongly focus on the characters printed on the image and could produce erroneous solutions. This shows the utility of explanation in “debugging” deep classifiers. During the development of those visualizations, we realized that for a significant number of images, the classifier has multiple different paths to reach a confident prediction. This led to our recent development of structural attention graphs (SAGs), an approach that utilizes beam search to locate multiple coarse heatmaps for a single image, and compactly visualizes a set of image masks by capturing how different combinations of image regions impact the confidence of a classifier. A user study shows significantly better capability of users to answer counterfactual questions when presented with SAG versus conventional heatmaps. We will also show our findings from running explanation algorithms on recently popular transformer models, which indicate different decision-making behaviors among CNNs, global attention models and local attention models. 
Finally, as humans prefer visually appealing explanations that will literally “change” one class into another, we present results traversing the latent space of variational autoencoders and generative adversarial networks (GANs), generating high-quality counterfactual explanations that visually show how to change one image so that CNNs predict it as another category, without needing to re-train the autoencoders/GANs. When the classifier relies on wrong information to make classifications, the counterfactual explanations will illustrate the errors clearly. Bio: Fuxin Li is currently an associate professor in the School of Electrical Engineering and Computer Science at Oregon State University. Before that, he held research positions at the University of Bonn and the Georgia Institute of Technology. He obtained his Ph.D. from the Institute of Automation, Chinese Academy of Sciences in 2009. He has won an NSF CAREER award, an Amazon Research Award, (co-)won the PASCAL VOC semantic segmentation challenges from 2009-2012, and led a team to the 4th place finish in the DAVIS Video Segmentation challenge 2017. He has published more than 70 papers in computer vision, machine learning and natural language processing. His main research interests are 3D point cloud deep networks, human understanding of deep learning, video object segmentation and uncertainty estimation in deep learning.
Jan. 15	No Seminar (Martin Luther King, Jr. Holiday)
Jan. 29 DBH 4011 1 pm	Gavin Kerrigan PhD Student Department of Computer Science, UC Irvine Deep Generative Models in Infinite-Dimensional Spaces Deep generative models have seen a meteoric rise in capabilities across a wide array of domains, ranging from natural language and vision to scientific applications such as precipitation forecasting and molecular generation. However, a number of important applications focus on data which is inherently infinite-dimensional, such as time-series, solutions to partial differential equations, and audio signals. This relatively under-explored class of problems poses unique theoretical and practical challenges for generative modeling. In this talk, we will explore recent developments for infinite-dimensional generative models, with a focus on diffusion-based methodologies. Bio: Gavin Kerrigan is a PhD candidate in the Department of Computer Science at UC Irvine, where he is advised by Padhraic Smyth. Prior to joining UC Irvine, he obtained a BSc in mathematics from the Schreyer Honors College at Penn State University. His research is focused on deep generative models and their application to scientific domains. He was awarded an HPI fellowship and currently serves as a workflow chair for AISTATS.
Feb. 12 DBH 4011 1 pm	Shivanshu Gupta PhD Student Department of Computer Science, UC Irvine Informative Example Selection for In-Context Learning In-context Learning (ICL) uses large language models (LLMs) for new tasks by conditioning them on prompts comprising a few task examples. With the rise of LLMs that are intractable to train or hidden behind APIs, the importance of such a training-free interface cannot be overstated. However, ICL is known to be critically sensitive to the choice of in-context examples. Despite this, the standard approach for selecting in-context examples remains to use general-purpose retrievers due to the limited effectiveness and training requirements of prior approaches. In this talk, I’ll posit that good in-context examples demonstrate the salient information necessary to solve a given test input. I’ll present efficient approaches for selecting such examples, with a special focus on preserving the training-free ICL pipeline. Through results with a wide range of tasks and LLMs, I’ll demonstrate that selecting informative examples can indeed yield superior ICL performance. Bio: Shivanshu Gupta is a Computer Science Ph.D. Candidate at the University of California Irvine, advised by Sameer Singh. Prior to this, he was a Research Fellow at LinkedIn and Microsoft Research India, and completed his B.Tech. and M.Tech. in Computer Science at IIT Delhi. His primary research interests are systematic generalization, in-context learning, and multi-step reasoning capabilities of large language models.
Feb. 20 DBH 6011 2 pm	Max Welling Professor and Research Chair in Machine Learning University of Amsterdam The Synergy between Machine Learning and the Natural Sciences Traditionally machine learning has been heavily influenced by neuroscience (hence the name artificial neural networks) and physics (e.g. MCMC, Belief Propagation, and Diffusion based Generative AI). We have recently witnessed that the flow of information has also reversed, with new tools developed in the ML community impacting physics, chemistry and biology. Examples include faster DFT, Force-Field accelerated MD simulations, PDE Neural Surrogate models, generating druglike molecules, and many more. In this talk I will review the exciting opportunities for further cross fertilization between these fields, ranging from faster (classical) DFT calculations and enhanced transition path sampling to traveling waves in artificial neural networks. Bio: Prof. Dr. Max Welling is a research chair in Machine Learning at the University of Amsterdam. He is a fellow at the Canadian Institute for Advanced Research (CIFAR) and the European Lab for Learning and Intelligent Systems (ELLIS) where he also serves on the founding board. His previous appointments include VP at Qualcomm Technologies, professor at UC Irvine, postdoc at U. Toronto and UCL under supervision of Prof. Geoffrey Hinton, and postdoc at Caltech under supervision of Prof. Pietro Perona. He finished his PhD in theoretical high energy physics under supervision of Nobel laureate Prof. Gerard ‘t Hooft.
Mar. 4 DBH 4011 11 am	Bratin Saha Vice President of Machine Learning and AI Services Amazon Web Services Scaling Generative AI in the Enterprise Machine learning (ML) and generative artificial intelligence (AI) are among the most transformational technologies, opening up new opportunities for innovation in every domain across software, finance, health care, manufacturing, media, entertainment and others. This talk will discuss the key trends that are driving AI/ML innovation, how enterprises are using AI/ML today to innovate how they run their businesses, the key technology challenges in scaling out ML and generative AI across the enterprise, some of the key innovations from Amazon, and how this field is likely to evolve in the future. Bio: Dr. Bratin Saha is the Vice President of Machine Learning and AI services at AWS where he leads all the ML and AI services and helped build one of the fastest growing businesses in AWS history. In 2022 Harvard Business School wrote three case studies on how he built the machine learning business at AWS. He is an alumnus of Harvard Business School (General Management Program), Yale University (PhD Computer Science), and Indian Institute of Technology (BS Computer Science). He has more than 70 patents granted (with another 50+ pending) and more than 30 papers in conferences/journals. Prior to Amazon he worked at Nvidia and Intel leading different product groups spanning imaging, analytics, media processing, high performance computing, machine learning, and software infrastructure. Bratin received the Distinguished Alumnus Award from the Indian Institute of Technology and is an Executive Fellow at the Harvard Business School.
Mar. 7 DBH 4011 11 am	Terra Blevins PhD Student School of Computer Science and Engineering, University of Washington Breaking the Curse of Multilinguality in Language Models While language models (or LMs, à la ChatGPT) have become the predominant tool in natural language processing, their performance in non-English languages increasingly lags behind. This gap is due to the curse of multilinguality, which harms individual language performance in multilingual models through inter-language competition for model capacity. In this talk, I examine how current language models do and don’t capture different languages and present new methods for fair modeling of all languages. First, I demonstrate how LMs become multilingual through their data and training dynamics. Specifically, I show how data contamination teaches ostensibly English models cross-lingual information; I then characterize when multilingual models learn (and forget) languages during training to uncover how the curse of multilinguality develops. These analyses provide key insights into developing more equitable multilingual models, and I propose a new language modeling approach for Cross-Lingual Expert Language Models (X-ELM) that explicitly allocates model resources to reduce language competition. Bio: Terra Blevins is a Ph.D. candidate in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. Her research focuses on linguistic analysis of language models and multilingual NLP, with the underlying aim of using analysis to build better, more equitable multilingual systems. She has received the NSF Graduate Research Fellowship for her research and previously worked as a visiting researcher at Facebook AI Research (FAIR).
Mar. 11 DBH 4011 11 am	Paola Cascante-Bonilla Postdoctoral Associate University of Maryland Institute for Advanced Computer Studies (UMIACS) More from Less: Learning with Limited Annotated Data in Vision and Language Despite the impressive results of deep learning models, modern large-scale systems are required to be trained using massive amounts of manually annotated or freely available data on the Internet. But this “data in the wild” is insufficient to learn specific structural patterns of the world, and existing large-scale models still fail on common sense tasks requiring compositional inference. This talk will focus on answering three fundamental questions: (a) How can we create systems that can learn with limited annotated data and adapt to new tasks and novel criteria? (b) How can we create systems able to encode real-world concepts with granularity in a robust manner? (c) Is it possible to create such a system with alternative data, complying with privacy protection principles and avoiding cultural bias? Given my work’s intersection with Computer Vision and Natural Language Processing, my aim is to analyze and apply Machine Learning algorithms to understand how images and text can interact and model complex patterns, reinforcing compositional reasoning without forgetting prior knowledge. Finally, I will conclude with my future plans to continue exploring hyper-realistic synthetic data generation techniques and the expressiveness of generative models to train multimodal systems able to perform well in real-world scenarios, with applications including visual-question answering, cross-modal retrieval, zero-shot classification, and task planning. Bio: Paola Cascante-Bonilla is a Ph.D. Candidate in Computer Science at Rice University, working on Computer Vision, Natural Language Processing, and Machine Learning. 
She has been focusing on multi-modal learning, few-shot learning, semi-supervised learning, representation learning, and synthetic data generation for compositionality and privacy protection. Her work has been published in machine learning, vision, and language conferences (CVPR, ICCV, AAAI, NeurIPS, BMVC, NAACL). She has previously interned at the Mitsubishi Electric Research Laboratories (MERL) and twice at the MIT-IBM Watson AI Lab. She is the recipient of the Ken Kennedy Institute SLB Graduate Fellowship (2022/23), and has been recently selected as a Future Faculty Fellow by Rice’s George R. Brown School of Engineering (2023) and as a Rising Star in EECS (2023).
Mar. 14 DBH 4011 11 am	Xi Ye PhD Student Department of Computer Science, University of Texas at Austin Steering Textual Reasoning with Explanations Large language models (LLMs) have significantly extended the boundaries of NLP’s potential applications, partially because of their increased ability to do complex reasoning. However, LLMs have well-documented reasoning failures, such as hallucinations and inability to systematically generalize. In this talk, I describe my work on enhancing LLMs in reliably performing textual reasoning, with a particular focus on leveraging explanations. I will first introduce a framework for automatically assessing the robustness of black-box models using explanations. The framework first extracts features to describe the “reasoning process” disclosed by the explanations, and then uses a trained verifier to judge the reliability of predictions based on these features. I will then describe how to form effective explanations for better teaching LLMs to reason. My work uses declarative formal specifications as explanations, which enables using an SMT solver to amend the limited planning capabilities of LLMs. Finally, I will describe future directions for further enhancing LLMs to better aid humans in challenging real-world applications demanding deep reasoning. Bio: Xi Ye is a Ph.D. candidate in the Department of Computer Science at the University of Texas at Austin, advised by Greg Durrett. His research is in the area of natural language processing, particularly in leveraging explanations to steer language models for complex textual reasoning tasks. He is also interested in semantic parsing and program synthesis. He is a co-instructor of the tutorial on Explanations in the Era of Large Language Models at NAACL 24 and a co-organizer of the workshop on Natural Language Reasoning and Structured Explanations at ACL 24.
Mar. 19 DBH 4011 11 am	Sai Praneeth Karimireddy Postdoctoral Researcher University of California, Berkeley Building Planetary-Scale Collaborative Intelligence Today, access to high-quality data has become the key bottleneck to deploying machine learning. Often, the data that is most valuable is locked away in inaccessible silos due to unfavorable incentives and ethical or legal restrictions. This is starkly evident in health care, where such barriers have led to highly biased and underperforming tools. Using my collaborations with Doctors Without Borders and the Cancer Registry of Norway as case studies, I will describe how collaborative learning systems, such as federated learning, provide a natural solution; they can remove barriers to data sharing by respecting the privacy and interests of the data providers. Yet for these systems to truly succeed, three fundamental challenges must be confronted: These systems need to 1) be efficient and scale to massive networks, 2) manage the divergent goals of the participants, and 3) provide resilient training and trustworthy predictions. I will discuss how tools from optimization, statistics, and economics can be leveraged to address these challenges. Bio: Sai Praneeth Karimireddy is a postdoctoral researcher at the University of California, Berkeley with Mike I. Jordan. Karimireddy obtained his undergraduate degree from the Indian Institute of Technology Delhi and his PhD at the Swiss Federal Institute of Technology Lausanne (EPFL) with Martin Jaggi. His research builds large-scale machine learning systems for equitable and collaborative intelligence and designs novel algorithms that can robustly and privately learn over distributed data (i.e., edge, federated, and decentralized learning). 
His work has seen widespread real-world adoption through close collaborations with public health organizations (e.g., Doctors Without Borders, the Red Cross, the Cancer Registry of Norway) and with industries such as Meta, Google, OpenAI, and Owkin.
Apr. 2 DBH 4011 11 am	Chen Wei PhD Student Department of Computer Science, Johns Hopkins University Learning Generalized Knowledge for AI with Limited Supervision As babies, we begin to grasp the world through spontaneous observations, gradually developing generalized knowledge about the world. This foundational knowledge enables humans to effortlessly learn new skills without extensive teaching for each task. Can we develop a similar paradigm for AI? This talk describes how learning from limited supervision can address fundamental challenges in AI such as scalability and generalization while embedding generalized knowledge. I will first talk about our research in self-supervised learning, utilizing natural images and videos without human-annotated labels. By deriving supervisory signals from the data itself, our self-supervised models achieve scalable and universal representations. The second part will describe how to leverage non-curated image-text pairs, through which we obtain textual representation of images. This representation comprehensively describes semantic elements in an image and bridges various AI tools such as large language models (LLMs), enabling diverse vision-language applications. Collectively, this talk argues the benefits of harnessing limited supervision for developing more general, adaptable, and efficient AI, which is ultimately better able to serve human needs. Bio: Chen Wei is a PhD candidate at the Computer Science Department of Johns Hopkins University, advised by Alan Yuille. Her research in Artificial Intelligence, Machine Learning, and Computer Vision focuses on developing AI systems that generalize to a wide range of novel tasks and are adaptable to new environments. Her research involves Self-Supervised Learning, Generative Modeling, and Vision-Language Understanding. She has published at many top-tier CV and AI venues. Chen is a recipient of EECS Rising Star in 2023 and ECCV Outstanding Reviewer. 
During her PhD, Chen was a research intern at FAIR at Meta AI and Google DeepMind. Before Johns Hopkins, she obtained her BS in Computer Science from Peking University.
Apr. 5 DBH 2011 3 pm	Kun Zhang Professor and Director, Center for Integrative AI Mohamed bin Zayed University of Artificial Intelligence Causal Representation Learning: Discovery of the Hidden World Causality is a fundamental notion in science, engineering, and even in machine learning. Uncovering the causal process behind observed data can naturally help answer ‘why’ and ‘how’ questions, inform optimal decisions, and achieve adaptive prediction. In many scenarios, observed variables (such as image pixels and questionnaire results) are often reflections of the underlying causal variables rather than being the causal variables themselves. Causal representation learning aims to reveal the underlying high-level hidden causal variables and their relations. It can be seen as a special case of causal discovery, whose goal is to recover the underlying causal structure or causal model from observational data. The modularity property of a causal system implies properties of minimal changes and independent changes of causal representations, and in this talk, we show how such properties make it possible to recover the underlying causal representations from observational data with identifiability guarantees: under appropriate assumptions, the learned representations are consistent with the underlying causal process. Various problem settings are considered, involving independent and identically distributed (i.i.d.) data, temporal data, or data with distribution shift as input. We demonstrate when identifiable causal representation learning can benefit from flexible deep learning and when suitable parametric assumptions have to be imposed on the causal process, complemented with various examples and applications. 
Bio: Kun Zhang is currently on leave from Carnegie Mellon University (CMU), where he is an associate professor of philosophy and an affiliate faculty in the machine learning department; he is working as a professor and the acting chair of the machine learning department and the director of the Center for Integrative AI at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). He develops methods for making causality transparent by torturing various kinds of data and investigates machine learning problems including transfer learning, representation learning, and reinforcement learning from a causal perspective. He has been frequently serving as a senior area chair, area chair, or senior program committee member for major conferences in machine learning or artificial intelligence, including UAI, NeurIPS, ICML, IJCAI, AISTATS, and ICLR. He was a co-founder and general & program co-chair of the first Conference on Causal Learning and Reasoning (CLeaR 2022), a program co-chair of the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022), and is a general co-chair of UAI 2023.
Apr. 8 DBH 4011 11 am	Unnat Jain Postdoctoral Researcher Carnegie Mellon University, Fundamental AI Research (FAIR) at Meta Jump-starting Embodied Intelligence AI has revolutionized the way we interact online. Despite this, it hasn’t quite made the leap when it comes to tasks like cooking dinner or cleaning our desks. Why has AI excelled in automating our digital interactions but not in assisting us with physical tasks? In my talk, I will explore the challenges of applying AI to embodied tasks—those requiring physical interaction with the environment. To address these challenges, I turn to the efficient pathways humans use to achieve embodied intelligence and propose three strategies to ‘jump-start’ the learning process for embodied AI agents: (1) combining learning from both teachers and own experience, (2) leveraging external information or “hints” to simplify learning, such as using maps to learn about physical spaces, and (3) learning intelligent behaviors by simply observing others. These strategies integrate insights from perception and machine learning to bridge the gap between digital AI and embodied intelligence, ultimately enhancing AI’s usefulness and integration into our physical world. Bio: Unnat Jain is a postdoctoral researcher at Carnegie Mellon University and Fundamental AI Research (FAIR) at Meta, where he works with Abhinav Gupta, Deepak Pathak, and Xinlei Chen. He received his PhD in Computer Science from UIUC, working with Alexander Schwing and Svetlana Lazebnik and collaborating with Google DeepMind and Allen Institute for AI. His research focuses on embodied intelligence, bridging computer vision (perception) and robot learning (action). Unnat is committed to fostering a collaborative research community and serves as an area chair at CVPR and NeurIPS, and has co-led workshops such as Adaptive Robotics (CoRL) and ‘Scholars & Big Models: How Can Academics Adapt?’ (CVPR). 
Unnat’s achievements have been recognized with several awards, including the Mavis Future Faculty Fellowship, Director’s Gold Medal at IIT Kanpur, Siebel Scholars, two best thesis awards, and Microsoft and Google Fellowship nominations; he was also a finalist for the Qualcomm Fellowship.
Apr. 10 DBH 4011 11 am	Karen Ullrich Research Scientist Fundamental AI Research (FAIR) at Meta, New York Challenges in Improving and Applying Generative Models The emergence of powerful, ever more universal models such as ChatGPT and Stable Diffusion has made generative modeling (GM) undoubtedly a focal point for modern AI research. In the talk, we will discuss applications of GM and how GM fits into a vision of autonomous machine intelligence. We will critically examine the sustainability of scaling AI models, a prevalent approach driving remarkable advancements in GM. Despite significant successes, I highlight the substantial physical, economic, and environmental limitations of continuous scaling, questioning its long-term feasibility. Furthermore, we will discuss inherent limitations in current high performance models that lead to a lack of tractability of statistical queries necessary to enable reasoning. Bio: My research agenda is centered on developing principled strategies based on information theory to mitigate these limitations. Through my doctoral and postdoctoral work, I have introduced model-agnostic methods for reducing model complexity and computational demands, significantly advancing the field of data compression and efficiency in AI models.
Apr. 11 DBH 4011 11 am	Hila Gonen Postdoctoral Researcher School of Computer Science & Engineering, University of Washington Unlocking Language Models: Controlling LMs to Enable NLP for All Large language models (LLMs) have soared in popularity in recent years, thanks to their ability to generate well-formed natural language answers for a myriad of topics. Despite their astonishing capabilities, they still suffer from various limitations. This talk will focus on two of them: the limited control over LLMs, and their failure to serve users from diverse backgrounds. I will start by presenting my research on controlling and enriching language models through the input (prompting). In the second part, I will introduce a novel algorithmic method to remove protected properties (such as gender and race) from text representations, which is crucial for preserving privacy and promoting fairness. The third part of the talk will focus on my research efforts to develop models that support multiple languages, and the challenges faced when working with languages other than English. These efforts together unlock language technology for different user groups and across languages. I will conclude by presenting my vision for safer and more reliable language modeling going forward. Bio: Hila is a postdoctoral researcher at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Hila’s research lies in the intersection of Natural Language Processing, Machine Learning, and AI. In her research, she works towards two main goals: (1) developing algorithms and methods for controlling the model’s behavior; (2) making cutting-edge language technology available and fair across speakers of different languages and users of different socio-demographic groups.
Apr. 15 DBH 4011 11 am	Peter West PhD Student School of Computer Science & Engineering, University of Washington Hidden Capabilities and Counterintuitive Limits in Large Language Models Massive scale has been a recent winning recipe in natural language processing and AI, with extreme-scale language models like GPT-4 receiving most attention. This is in spite of staggering energy and monetary costs, and further, the continuing struggle of even the largest models with concepts such as compositional problem solving and linguistic ambiguity. In this talk, I will propose my vision for a research landscape where compact language models share the forefront with extreme scale models, working in concert with many pieces besides scale, such as algorithms, knowledge, information theory, and more. The first part of my talk will cover alternative ingredients to scale, including (1) an inference-time algorithm that combines language models with elements of discrete search and information theory and (2) a method for transferring useful knowledge from extreme-scale to compact language models with synthetically generated data. Next, I will discuss counterintuitive disparities in the capabilities of even extreme-scale models, which can meet or exceed human performance in some complex tasks while trailing behind humans in what seem to be much simpler tasks. Finally, I will discuss implications and next steps in scale-alternative methods. Bio: Peter West is a PhD candidate in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, working with Yejin Choi. His research is focused on natural language processing and language models, particularly combining language models with elements of knowledge, search algorithms, and information theory to equip compact models with new capabilities. In parallel, he studies the limits that even extreme-scale models have yet to solve. 
His work has received multiple awards, including best methods paper at NAACL 2022, and outstanding paper awards at ACL and EMNLP in 2023. His work has been supported in part by the NSERC PGS-D fellowship. Previously, Peter received a BSc in computer science from the University of British Columbia.

Fall 2023

Standard

Oct. 9	No Seminar (Southern California AI and Biomedicine Symposium)
Oct. 16 DBH 4011 1 pm	Marius Kloft Professor of Computer Science RPTU Kaiserslautern-Landau, Germany Deep Anomaly Detection Anomaly detection is one of the fundamental topics in machine learning and artificial intelligence. The aim is to find instances deviating from the norm – so-called ‘anomalies’. Anomalies can be observed in various scenarios, from attacks on computer or energy networks to critical faults in a chemical factory or rare tumors in cancer imaging data. In my talk, I will first introduce the field of anomaly detection, with an emphasis on ‘deep anomaly detection’ (anomaly detection based on deep learning). Then, I will present recent algorithms and theory for deep anomaly detection, with images as primary data type. I will demonstrate how these methods can be better understood using explainable AI methods. I will show new algorithms for deep anomaly detection on other data types, such as time series, graphs, tabular data, and contaminated data. Finally, I will close my talk with an outlook on exciting future research directions in anomaly detection and beyond. Bio: Marius Kloft has worked and researched at various institutions in Germany and the US, including TU Berlin (PhD), UC Berkeley (PhD), NYU (Postdoc), Memorial Sloan-Kettering Cancer Center (Postdoc), HU Berlin (Assist. Prof.), and USC (Visiting Assoc. Prof.). Since 2017, he has been a professor of machine learning at RPTU Kaiserslautern-Landau. His research covers a broad spectrum of machine learning, from mathematical theory and fundamental algorithms to applications in medicine and chemical engineering. He received the Google Most Influential Papers 2013 Award, and he is a recipient of the German National Science Foundation’s Emmy-Noether Career Award. In 2022, the paper ‘Deep One-class Classification’ (ICML, 2018) main-authored by Marius Kloft received the ANDEA Test-of-Time Award for the most influential paper in anomaly detection in the last ten years (2012-2022). 
The paper is highly cited, with around 500 citations per year.
Oct. 23 DBH 4011 1 pm	Sarah Wiegreffe Postdoctoral Researcher Allen Institute for AI and University of Washington Towards Transparent Language Models Recently-released language models have attracted a lot of attention for their major successes and (often more subtle, but still plentiful) failures. In this talk, I will motivate why transparency into model operations is needed to rectify these failures and increase model utility in a reliable way. I will highlight how techniques must be developed in this changing NLP landscape for both open-source models and black-box models behind an API. I will provide an example of each from my recent work demonstrating how improved transparency can improve language model performance on downstream tasks. Bio: Sarah Wiegreffe is a young investigator (postdoc) at the Allen Institute for AI (AI2), working on the Aristo project. She also holds a courtesy appointment in the Allen School at the University of Washington. Her research focuses on language model transparency. She received her PhD from Georgia Tech in 2022, during which she interned at Google and AI2. She frequently serves on conference program committees, receiving an outstanding area chair award at ACL 2023.
Oct. 30 DBH 4011 1 pm	Noga Zaslavsky Assistant Professor of Language Science University of California, Irvine Losing bits and finding meaning: Efficient compression underlies meaning in language Our world is extremely complex, and yet we are able to exchange our thoughts and beliefs about it using a relatively small number of words. What computational principles can explain this extraordinary ability? In this talk, I argue that in order to communicate and reason about meaning while operating under limited resources, both humans and machines must efficiently compress their representations of the world. In support of this claim, I present a series of studies showing that: (i) human languages evolve under pressure to efficiently compress meanings into words via the Information Bottleneck (IB) principle; (ii) the same principle can help ground meaning representations in artificial neural networks trained for vision; and (iii) these findings offer a new framework for emergent communication in artificial agents. Taken together, these results suggest that efficient compression underlies meaning in language and offer a new approach to guiding artificial agents toward human-like communication without relying on massive amounts of human-generated training data. Bio: Noga Zaslavsky is an Assistant Professor in UCI’s Language Science department. Before joining UCI this year, she was a postdoctoral fellow at MIT. She holds a Ph.D. (2020) in Computational Neuroscience from the Hebrew University, and during her graduate studies she was also affiliated with UC Berkeley. Her research aims to understand the computational principles that underlie language and cognition by integrating methods from machine learning, information theory, and cognitive science. Her work has been recognized by several awards, including a K. Lisa Yang Integrative Computational Neuroscience Postdoctoral Fellowship, an IBM Ph.D. Fellowship Award, and a 2018 Computational Modeling Prize from the Cognitive Science Society.
Nov. 6 DBH 4011 1 pm	Mariel Werner PhD Student Department of Electrical Engineering and Computer Science, UC Berkeley Provably Personalized and Robust Federated Learning I will be discussing my recent work on personalization in federated learning. Federated learning is a powerful distributed optimization framework in which multiple clients collaboratively train a global model without sharing their raw data. In this work, we tackle the personalized version of the federated learning problem. In particular, we ask: throughout the training process, can clients identify a subset of similar clients and collaboratively train with just those clients? In the affirmative, we propose simple clustering-based methods which are provably optimal for a broad class of loss functions (the first such guarantees), are robust to malicious attackers, and perform well in practice. Bio: Mariel Werner is a 5th-year PhD student in the Department of Electrical Engineering and Computer Science at UC Berkeley advised by Michael I. Jordan. Her research focus is federated learning, with a particular interest in economic applications. Currently, she is working on designing data-sharing mechanisms for firms in oligopolistic markets, motivated by ideas from federated learning. Recently, she has also been studying dynamics of privacy and reputation-building in principal-agent interactions. Mariel holds an undergraduate degree in Applied Mathematics from Harvard University.
Nov. 13 DBH 4011 1 pm	Yian Ma Assistant Professor, Halıcıoğlu Data Science Institute University of California, San Diego MCMC, variational inference, and reverse diffusion Monte Carlo I will introduce some recent progress towards understanding the scalability of Markov chain Monte Carlo (MCMC) methods and their comparative advantage with respect to variational inference. I will fact-check the folklore that “variational inference is fast but biased, MCMC is unbiased but slow”. I will then discuss a combination of the two via reverse diffusion, which holds promise of solving some of the multi-modal problems. This talk will be motivated by the need for Bayesian computation in reinforcement learning problems as well as the differential privacy requirements that we face. Bio: Yian Ma is an assistant professor at the Halıcıoğlu Data Science Institute and an affiliated faculty member at the Computer Science and Engineering Department of UC San Diego. Prior to UCSD, he spent a year as a visiting faculty at Google Research. Before that, he was a post-doctoral fellow at UC Berkeley, hosted by Mike Jordan. Yian completed his Ph.D. at University of Washington. His current research primarily revolves around scalable inference methods for credible machine learning, with application to time series data and sequential decision making tasks. He has received the Facebook research award, and the best paper award at the NeurIPS AABI symposium.
Nov. 20 DBH 4011 1 pm	Yuhua Zhu Assistant Professor, Halıcıoğlu Data Science Institute and Dept. of Mathematics University of California, San Diego Continuous-in-time Limit for Multi-armed Bandit In this talk, I will build the connection between Hamilton-Jacobi-Bellman equations (HJB) and the multi-armed bandit (MAB) problems. HJB is an important equation in solving stochastic optimal control problems. MAB is a widely used paradigm for studying the exploration-exploitation trade-off in sequential decision making under uncertainty. This is the first work that establishes this connection in a general setting. I will present an efficient algorithm for solving MAB problems based on this connection and demonstrate its practical applications. This is joint work with Lexing Ying and Zach Izzo from Stanford University. Bio: Yuhua Zhu is an assistant professor at UC San Diego, where she holds a joint appointment in the Halıcıoğlu Data Science Institute (HDSI) and the Department of Mathematics. Previously, she was a Postdoctoral Fellow at Stanford University mentored by Lexing Ying. She earned her Ph.D. from UW-Madison in 2019 advised by Shi Jin, and she obtained her BS in Mathematics from SJTU in 2014. Her work builds the bridge between differential equations and machine learning, spanning the areas of reinforcement learning, stochastic optimization, sequential decision-making, and uncertainty quantification.
Nov. 21 DBH 4011 11 am	Yejin Choi Wissner-Slivka Professor of Computer Science & Engineering University of Washington and Allen Institute for Artificial Intelligence Possible Impossibilities and Impossible Possibilities In this talk, I will question if there can be possible impossibilities of large language models (i.e., the fundamental limits of transformers, if any) and the impossible possibilities of language models (i.e., seemingly impossible alternative paths beyond scale, if at all). Bio: Yejin Choi is Wissner-Slivka Professor and a MacArthur Fellow at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. She is also a senior director at AI2 overseeing the project Mosaic and a Distinguished Research Fellow at the Institute for Ethics in AI at the University of Oxford. Her research investigates if (and how) AI systems can learn commonsense knowledge and reasoning, if machines can (and should) learn moral reasoning, and various other problems in NLP, AI, and Vision including neuro-symbolic integration, language grounding with vision and interactions, and AI for social good. She is a co-recipient of 2 Test of Time Awards (at ACL 2021 and ICCV 2021), 7 Best/Outstanding Paper Awards (at ACL 2023, NAACL 2022, ICML 2022, NeurIPS 2021, AAAI 2019, and ICCV 2013), the Borg Early Career Award (BECA) in 2018, the inaugural Alexa Prize Challenge in 2017, and IEEE AI’s 10 to Watch in 2016.
Nov. 27 DBH 4011 1 pm	Tryphon Georgiou Distinguished Professor of Mechanical and Aerospace Engineering University of California, Irvine Stochastic thermodynamics: Diffusion models for information and energy transfer The energetic cost of information erasure and of energy transduction can be cast as the stochastic problem to minimize entropy production during thermodynamic transitions. This formalism of Stochastic Thermodynamics allows quantitative assessment of work exchange and entropy production for systems that are far from equilibrium. In the talk we will highlight the cost of Landauer’s bit-erasure in finite time and explain how to obtain bounds on the performance of Carnot-like thermodynamic engines and of processes that are powered by thermal anisotropy. The talk will be largely based on joint work with Olga Movilla Miangolarra, Amir Taghvaei, Rui Fu, and Yongxin Chen. Bio: Tryphon T. Georgiou was educated at the National Technical University of Athens, Greece (1979) and the University of Florida, Gainesville (PhD 1983). He is currently a Distinguished Professor at the Department of Mechanical and Aerospace Engineering, University of California, Irvine. He is a Fellow of IEEE, SIAM, IFAC, AAAS and a Foreign Member of the Royal Swedish Academy of Engineering Sciences (IVA).
Dec. 4 DBH 4011 1 pm	Deying Kong Software Engineer, Google Handformer2T: A Lightweight Regression-based model for Interacting Hands Pose Estimation from a single RGB Image Despite its extensive range of potential applications in virtual reality and augmented reality, 3D interacting hand pose estimation from an RGB image remains a very challenging problem, due to appearance confusions between keypoints of the two hands, and severe hand-hand occlusion. Due to their ability to capture long range relationships between keypoints, transformer-based methods have gained popularity in the research community. However, the existing methods usually deploy tokens at keypoint level, which inevitably results in high computational and memory complexity. In this talk, we will propose a simple yet novel mechanism, i.e., hand-level tokenization, in our transformer-based model, where we deploy only one token for each hand. With this novel design, we will also propose a pose query enhancer module, which can refine the pose prediction iteratively, by focusing on features guided by previous coarse pose predictions. As a result, our proposed model, Handformer2T, can achieve high performance while remaining lightweight. Bio: Deying Kong is currently a software engineer at Google. He earned his PhD in Computer Science from University of California, Irvine in 2022, under the supervision of Professor Xiaohui Xie. His research interests mainly focus on computer vision, especially hand/human pose estimation.
Dec. 11	No Seminar (Finals Week and NeurIPS Conference)

Spring 2023

Standard

Apr. 10 DBH 4011 1 pm	Durk Kingma Research Scientist Google Research Understanding the Diffusion Objective Some believe that maximum likelihood is incompatible with high-quality image generation. We provide counter-evidence: diffusion models with SOTA FIDs (e.g. https://arxiv.org/abs/2301.11093) are actually optimized with the ELBO, with very simple data augmentation (additive noise). First, we show that diffusion models in the literature are optimized with various objectives that are special cases of a weighted loss, where the weighting function specifies the weight per noise level. Uniform weighting corresponds to maximizing the ELBO, a principled approximation of maximum likelihood. In current practice diffusion models are optimized with non-uniform weighting due to better results in terms of sample quality. In this work we expose a direct relationship between the weighted loss (with any weighting) and the ELBO objective. We show that the weighted loss can be written as a weighted integral of ELBOs, with one ELBO per noise level. If the weighting function is monotonic, as in some SOTA models, then the weighted loss is a likelihood-based objective: it maximizes the ELBO under simple data augmentation, namely Gaussian noise perturbation. Our main contribution is a deeper theoretical understanding of the diffusion objective, but we also performed some experiments comparing monotonic with non-monotonic weightings, finding that monotonic weighting performs competitively with the best published results. Bio: I do research on principled and scalable methods for machine learning, with a focus on generative models. My contributions include the Variational Autoencoder (VAE), the Adam optimizer, Glow, and Variational Diffusion Models, but please see Scholar for a more complete list. I obtained a PhD (cum laude) from University of Amsterdam in 2017, and was part of the founding team of OpenAI in 2015. Before that, I co-founded Advanza which got acquired in 2016. 
My formal name is Diederik, but I have the Frysian nickname Durk (pronounced like Dirk). I currently live in the San Francisco Bay area.
Apr. 17 DBH 4011 1 pm	Danish Pruthi Assistant Professor Department of Computational and Data Sciences (CDS) Indian Institute of Science (IISc), Bangalore Evaluating Explanations While large deep learning models have become increasingly accurate, concerns about their (lack of) interpretability have taken center stage. In response, a growing subfield on interpretability and analysis of these models has emerged. While hundreds of techniques have been proposed to “explain” predictions of models, what aims these explanations serve and how they ought to be evaluated are often unstated. In this talk, I will present a framework to quantify the value of explanations, along with specific applications in a variety of contexts. I will end with some of my thoughts on evaluating large language models and the rationales they generate. Bio: Danish Pruthi is an incoming assistant professor at the Indian Institute of Science (IISc), Bangalore. He received his Ph.D. from the School of Computer Science at Carnegie Mellon University, where he was advised by Graham Neubig and Zachary Lipton. He is broadly interested in the areas of natural language processing and deep learning, with a focus on model interpretability. He completed his bachelor’s degree in computer science at BITS Pilani, Pilani. He has spent time doing research at Google AI, Facebook AI Research, Microsoft Research, Amazon AI and IISc. He is also a recipient of the Siebel Scholarship and the CMU Presidential Fellowship. His legal name is only Danish, a cause of airport quagmires and, in equal parts, funny anecdotes.
Apr. 24 DBH 4011 1 pm	Anthony Chen PhD Student Department of Computer Science, UC Irvine Researching and Revising What Language Models Say, Using Language Models As the strengths of large language models (LLMs) have become prominent, so too have their weaknesses. A glaring weakness of LLMs is their penchant for generating false, biased, or misleading claims in a phenomenon broadly referred to as “hallucinations”. Most LLMs also do not ground their generations to any source, exacerbating this weakness. To enable attribution while still preserving all the powerful advantages of LLMs, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically retrieves evidence to support the output of any LLM followed by 2) post-editing the output to fix any information that contradicts the retrieved evidence while preserving the original output as much as possible. When applied to the output of several state-of-the-art LLMs on a diverse set of generation tasks, we find that RARR significantly improves attribution. Bio: Anthony Chen is a final-year doctoral student advised by Sameer Singh. He is broadly interested in how we can evaluate the limits of large language models and design efficient methods to address their deficiencies. Recently, his research has been focused on tackling the pernicious problem of attribution and hallucinations in large language models and making them more reliable to use.
May 1 DBH 4011 1 pm	Hengrui Cai Assistant Professor of Statistics University of California, Irvine Towards Causal Revolution: On Learning Heterogeneity and Non-Spuriousness in Causal Graphs The causal revolution has spurred interest in understanding complex relationships in various fields. Under a general causal graph, the exposure may have a direct effect on the outcome and also an indirect effect regulated by a set of mediators. An analysis of causal effects that interprets the causal mechanism contributed through mediators is hence challenging but in demand. In this talk, we introduce a new statistical framework to comprehensively characterize causal effects with multiple mediators, namely, ANalysis Of Causal Effects (ANOCE). Built upon such causal impact learning, we focus on two emerging challenges in causal relation learning, heterogeneity and spuriousness. To characterize the heterogeneity, we first conceptualize heterogeneous causal graphs (HCGs) by generalizing the causal graphical model with confounder-based interactions and multiple mediators. In practice, only a small number of variables in the graph are relevant for the outcomes of interest. As a result, causal estimation with the full causal graph, especially given limited data, could lead to many falsely discovered, spurious variables that may be highly correlated with but have no causal impact on the target outcome. We propose to learn a class of necessary and sufficient causal graphs (NSCG) that only contain causally relevant variables by utilizing the probabilities of causation. Across empirical studies of simulated and real data applications, we show that the proposed algorithms outperform existing ones and can reveal true heterogeneous and non-spurious causal graphs. Bio: Dr. Hengrui Cai is an Assistant Professor in the Department of Statistics at the University of California Irvine. She obtained her Ph.D. degree in Statistics at North Carolina State University in 2022. 
Cai has broad research interests in methodology and theory in causal inference, reinforcement learning, and graphical modeling, to establish reliable, powerful, and interpretable solutions to real-world problems. Currently, her research focuses on causal inference and causal structure learning, and policy optimization and evaluation in reinforcement/deep learning. Her work has been published in conferences including ICLR, NeurIPS, ICML, and IJCAI, as well as journals including the Journal of Machine Learning Research, Stat, and Statistics in Medicine.
May 8 DBH 4011 1 pm	Pierre Baldi and Alexander Shmakov Department of Computer Science, UC Irvine Deep Learning in Science The Baldi group will present ongoing progress in the theory and applications of deep learning. On the theory side, we will discuss homogeneous activation functions and their important connections to the concept of generalized neural balance. On the application side, we will present applications of neural transformers to physics, in particular for the assignment of observation measurements to the leaves of partial Feynman diagrams in particle physics. In these applications, the permutation invariance properties of transformers are used to capture fundamental symmetries (e.g. matter vs antimatter) in the laws of physics. Bio: Pierre Baldi earned M.S. degrees in mathematics and psychology from the University of Paris, France, in 1980, and the Ph.D. degree in mathematics from Caltech, CA, USA, in 1986. He is currently a Distinguished Professor with the Department of Computer Science, Director with the Institute for Genomics and Bioinformatics, and Associate Director with the Center for Machine Learning and Intelligent Systems at the University of California, Irvine, CA, USA. His research interests include understanding intelligence in brains and machines. He has made several contributions to the theory of deep learning, and developed and applied deep learning methods for problems in the natural sciences. He has written 4 books and over 300 peer-reviewed articles. Dr. Baldi was the recipient of the 1993 Lew Allen Award at JPL, the 2010 E. R. Caianiello Prize for research in machine learning, and a 2014 Google Faculty Research Award. He is an Elected Fellow of the AAAS, AAAI, IEEE, ACM, and ISCB. Alexander Shmakov is a Ph.D. student in the Baldi research group who loves everything deep learning and robotics. He has published papers on applications of deep learning to planning, robotic control, high energy physics, astronomy, chemical synthesis, and biology.
May 15 DBH 4011 1 pm	Guy Van den Broeck Associate Professor of Computer Science University of California, Los Angeles AI can Learn from Data. But can it Learn to Reason? Many expect that AI will go from powering chatbots to providing mental health services. That it will go from advertisement to deciding who is given bail. The expectation is that AI will solve society’s problems by simply being more intelligent than we are. Implicit in this bullish perspective is the assumption that AI will naturally learn to reason from data: that it can form trains of thought that “make sense”, similar to how a mental health professional or judge might reason about a case, or more formally, how a mathematician might prove a theorem. This talk will investigate the question whether this behavior can be learned from data, and how we can design the next generation of AI techniques that can achieve such capabilities, focusing on neuro-symbolic learning and tractable deep generative models. Bio: Guy Van den Broeck is an Associate Professor and Samueli Fellow at UCLA, in the Computer Science Department, where he directs the StarAI lab. His research interests are in Machine Learning, Knowledge Representation and Reasoning, and Artificial Intelligence in general. His papers have been recognized with awards from key conferences such as AAAI, UAI, KR, OOPSLA, and ILP. Guy is the recipient of an NSF CAREER award, a Sloan Fellowship, and the IJCAI-19 Computers and Thought Award.
May 22 DBH 4011 1 pm	Gabe Hope PhD Student, Computer Science University of California, Irvine Semi-Supervised Learning with Prediction-Constrained Variational Autoencoders Variational autoencoders (VAEs) have proven to be an effective approach to modeling complex data distributions while providing compact representations that can be interpretable and useful for downstream prediction tasks. In this work we train variational autoencoders with the dual goals of good likelihood-based generative modeling and good discriminative performance in supervised and semi-supervised prediction tasks. We show that the dominant approach to training semi-supervised VAEs has key weaknesses: it is fragile as model capacity increases; it is slow due to marginalization over labels; and it incoherently decouples into separate discriminative and generative models when all data is labeled. Our novel framework for semi-supervised VAE training uses a more coherent architecture and an objective that maximizes generative likelihood subject to prediction quality constraints. To handle cases when labels are very sparse, we further enforce a consistency constraint, derived naturally from the generative model, that requires predictions on reconstructed data to match those on the original data. Our approach enables advances in generative modeling to be incorporated by semi-supervised classifiers, which we demonstrate by augmenting deep generative models with latent variables corresponding to spatial transformations and by introducing a “very deep” prediction-constrained VAE with many layers of latent variables. Our experiments show that prediction and consistency constraints improve generative samples as well as image classification performance in semi-supervised settings. Bio: Gabe Hope is a final-year PhD student at UC Irvine working with professor Erik Sudderth. His research focuses on deep generative models, interpretable machine learning and semi-supervised learning. 
This fall he will join the faculty at Harvey Mudd College as a visiting assistant professor in computer science.
May 29	No Seminar (Memorial Day)
June 5 DBH 4011 1 pm	Sangeetha Abdu Jyothi Assistant Professor of Computer Science University of California, Irvine CrystalBox: Future-Based Explanations for Deep RL Network Controllers Lack of explainability is a key factor limiting the practical adoption of high-performant Deep Reinforcement Learning (DRL) controllers in systems environments. Explainable RL for networking hitherto used salient input features to interpret a controller’s behavior. However, these feature-based solutions do not completely explain the controller’s decision-making process. Often, operators are interested in understanding the impact of a controller’s actions on performance in the future, which feature-based solutions cannot capture. In this talk, I will present CrystalBox, a framework that explains a controller’s behavior in terms of the future impact on key network performance metrics. CrystalBox employs a novel learning-based approach to generate succinct and expressive explanations. We use reward components of the DRL network controller, which are key performance metrics meaningful to operators, as the basis for explanations. I will finally present three practical use cases of CrystalBox: cross-state explainability, guided reward design, and network observability. Bio: Sangeetha Abdu Jyothi is an Assistant Professor in the Computer Science department at the University of California, Irvine. Her research interests lie at the intersection of computer systems, networking, and machine learning. Prior to UCI, she completed her Ph.D. at the University of Illinois, Urbana-Champaign in 2019 where she was advised by Brighten Godfrey and had a brief stint as a postdoc at VMware Research. She is currently an Affiliated Researcher at VMware Research. She leads the Networking, Systems, and AI Lab (NetSAIL) at UCI. Her current research focus revolves around: Internet and Cloud Resilience, and Systems and Machine Learning.
July 20 DBH 3011 11 am	Vincent Fortuin Research group leader in Machine Learning Helmholtz AI Use Cases for Bayesian Deep Learning in the Age of ChatGPT Many researchers have pondered the same existential questions since the release of ChatGPT: Is scale really all you need? Will the future of machine learning rely exclusively on foundation models? Should we all drop our current research agenda and work on the next large language model instead? In this talk, I will try to make the case that the answer to all these questions should be a convinced “no” and that now, maybe more than ever, should be the time to focus on fundamental questions in machine learning again. I will provide evidence for this by presenting three modern use cases of Bayesian deep learning in the areas of self-supervised learning, interpretable additive modeling, and sequential decision making. Together, these will show that the research field of Bayesian deep learning is very much alive and thriving and that its potential for valuable real-world impact is only just unfolding. Bio: Vincent Fortuin is a tenure-track research group leader at Helmholtz AI in Munich, leading the group for Efficient Learning and Probabilistic Inference for Science (ELPIS). He is also a Branco Weiss Fellow. His research focuses on reliable and data-efficient AI approaches leveraging Bayesian deep learning, deep generative modeling, meta-learning, and PAC-Bayesian theory. Before that, he did his PhD in Machine Learning at ETH Zürich and was a Research Fellow at the University of Cambridge. He is a member of ELLIS, a regular reviewer for all major machine learning conferences, and a co-organizer of the Symposium on Advances in Approximate Bayesian Inference (AABI) and the ICBINB initiative.

Winter 2023

Standard

Jan. 30 DBH 4011 1 pm	Maarten Bos Lead Research Scientist Snap Research Behavioral science research at a corporate research laboratory Corporate research labs aim to push the scientific and technological forefront of innovation outside traditional academia. Snap Inc. combines academia and industry by hiring academic researchers and doing application-driven research. In this talk I will give examples of research projects from my corporate research experience. My goal is to showcase the value of – and hurdles for – working both with and within corporate research labs, and how some of these values and hurdles are different from working in traditional academia. Bio: Maarten Bos is a Lead Research Scientist at Snap Inc. After receiving his PhD in The Netherlands and postdoctoral training at Harvard University, he led a behavioral science group at Disney Research before joining Snap in 2018. His research interests range from decision science, to persuasion, and human-technology interaction. His work has been published in journals such as Science, Psychological Science, and the Journal of Marketing Research, and has been covered by the Wall Street Journal, Harvard Business Review, and The New York Times.
Feb. 6 DBH 4011 1 pm	Kolby Nottingham PhD Student, Department of Computer Science University of California, Irvine Large Language Models as External Knowledge Sources for Sequential Decision Making While it’s common for other machine learning modalities to benefit from model pretraining, reinforcement learning (RL) agents still typically learn tabula rasa. Large language models (LLMs), trained on internet text, have been used as external knowledge sources for RL, but, on their own, they are noisy and lack the grounding necessary to reason in interactive environments. In this talk, we will cover methods for grounding LLMs in environment dynamics and applying extracted knowledge to training RL agents. Finally, we will demonstrate our newly proposed method for applying LLMs to improving RL sample efficiency through guided exploration. By applying LLMs to guiding exploration rather than using them as planners at execution time, our method remains robust to errors in LLM output while also grounding LLM knowledge in environment dynamics. Bio: Kolby Nottingham is a PhD student at the University of California Irvine where he is co-advised by Professors Roy Fox and Sameer Singh. Kolby’s research interests lie at the intersection of reinforcement learning and natural language processing. His research applies recent advances in large language models to improving sequential decision making techniques.
Feb. 13 DBH 4011 1 pm	Noble Kennamer PhD Student, Department of Computer Science University of California, Irvine Variational Methods for Bayesian Optimal Experimental Design Bayesian optimal experimental design is a sub-field of statistics focused on developing methods to make efficient use of experimental resources. Any potential design is evaluated in terms of a utility function, such as the (theoretically well-justified) expected information gain (EIG); unfortunately however, under most circumstances the EIG is intractable to evaluate. In this talk we build off of successful variational approaches, which optimize a parameterized variational model with respect to bounds on the EIG. Past work focused on learning a new variational model from scratch for each new design considered. Here we present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs. To further improve computational efficiency, we also propose to train the variational model on a significantly cheaper-to-evaluate lower bound, and show empirically that the resulting model provides an excellent guide for more accurate, but expensive to evaluate bounds on the EIG. We demonstrate the effectiveness of our technique on generalized linear models, a class of statistical models that is widely used in the analysis of controlled experiments. Experiments show that our method is able to greatly improve accuracy over existing approximation strategies, and achieve these results with far better sample efficiency. Bio: Noble Kennamer recently completed his PhD at UC Irvine under Alexander Ihler, where he worked on variational methods for optimal experimental design and applications of machine learning to the physical sciences. In March he will be starting as a Research Scientist at Netflix.
Feb. 20	No Seminar (Presidents’ Day)
Feb. 27	Seminar Canceled
Mar. 6 DBH 4011 1 pm	Shlomo Zilberstein Professor of Computer Science University of Massachusetts, Amherst Competence-Aware Systems Competence is the ability to do something well. Competence awareness is the ability to represent and learn a model of self-competence and use it to decide how to best use the agent’s own abilities as well as any available human assistance. This capability is critical for the success and safety of autonomous systems that operate in the open world. In this talk, I introduce two types of competence-aware systems (CAS), namely Type I and Type II CAS. The former refers to a stand-alone system that can learn its own competence and use it to fine-tune itself to the characteristics of the problem instance at hand, without human assistance. The latter is a human-aware system that uses a self-competence model to optimize the utilization of costly human assistive actions. I describe recent results that demonstrate the benefits of the two types of competence awareness in different contexts, including autonomous vehicle decision making. Bio: Shlomo Zilberstein is Professor of Computer Science and Associate Dean for Research and Engagement in the Manning College of Information and Computer Sciences at the University of Massachusetts, Amherst. He received a B.A. in Computer Science from the Technion, and a Ph.D. in Computer Science from UC Berkeley. Zilberstein’s research focuses on the foundations and applications of resource-bounded reasoning techniques, which allow complex systems to make decisions while coping with uncertainty, missing information, and limited computational resources. His research interests include decision theory, reasoning under uncertainty, Markov decision processes, design of autonomous agents, heuristic search, real-time problem solving, principles of meta-reasoning, planning and scheduling, multi-agent systems, and reinforcement learning. Zilberstein is a Fellow of AAAI and the ACM. 
He is a recipient of the University of Massachusetts Chancellor’s Medal (2019), the IFAAMAS Influential Paper Award (2019), the AAAI Distinguished Service Award (2019), a National Science Foundation CAREER Award (1996), and the Israel Defense Prize (1992). He has received numerous paper awards from AAAI (2017, 2021), IJCAI (2020), AAMAS (2003), ECAI (1998), ICAPS (2010), and SoCS (2022), among others. He is the past Editor-in-Chief of the Journal of Artificial Intelligence Research, former Chair of the AAAI Conference Committee, former President of ICAPS, a former Councilor of AAAI, and the Chairman of the AI Access Foundation.

Fall 2022

Oct. 10 DBH 4011 1 pm	Furong Huang Assistant Professor of Computer Science University of Maryland Trustworthy Machine Learning in Complex Environments With the burgeoning use of machine learning models in an assortment of applications, there is a need to rapidly and reliably deploy models in a variety of environments. These trustworthy machine learning models must satisfy certain criteria, namely the ability to: (i) adapt and generalize to previously unseen worlds although trained on data that only represent a subset of the world, (ii) allow for non-iid data, (iii) be resilient to (adversarial) perturbations, and (iv) conform to social norms and make ethical decisions. In this talk, towards trustworthy and generally applicable intelligent systems, I will cover some reinforcement learning algorithms that achieve fast adaptation by guaranteed knowledge transfer, principled methods that measure the vulnerability and improve the robustness of reinforcement learning agents, and ethical models that make fair decisions under distribution shifts. Bio: Furong Huang is an Assistant Professor in the Department of Computer Science at the University of Maryland. She works on statistical and trustworthy machine learning, reinforcement learning, graph neural networks, deep learning theory and federated learning with specialization in domain adaptation, algorithmic robustness and fairness. Furong is a recipient of the NSF CRII Award, the MLconf Industry Impact Research Award, the Adobe Faculty Research Award, and three JP Morgan Faculty Research Awards. She is a Finalist of AI in Research – AI researcher of the year for Women in AI Awards North America 2022. She received her Ph.D. in electrical engineering and computer science from UC Irvine in 2016, after which she completed postdoctoral positions at Microsoft Research NYC.
Oct. 17 DBH 4011 1 pm	Bodhi Majumder PhD Student, Department of Computer Science and Engineering University of California, San Diego Effective, explainable, and equitable predictions in NLP models with world knowledge and conversations The use of artificial intelligence in knowledge-seeking applications (e.g., for recommendations and explanations) has shown remarkable effectiveness. However, the increasing demand for more interactions, accessibility and user-friendliness in these systems requires the underlying components (dialog models, LLMs) to be adequately grounded in the up-to-date real-world context. Yet in reality, even powerful generative models often lack commonsense, explanations, and subjectivity, a long-standing goal of artificial general intelligence. In this talk, I will partly address these problems in three parts and hint at future possibilities and social impacts. Mainly, I will discuss: 1) methods to effectively inject up-to-date knowledge in an existing dialog model without any additional training, 2) the role of background knowledge in generating faithful natural language explanations, and 3) a conversational framework to address subjectivity, balancing task performance and bias mitigation for fair interpretable predictions. Bio: Bodhisattwa Prasad Majumder is a final-year PhD student at CSE, UC San Diego, advised by Prof. Julian McAuley. His research goal is to build interactive machines capable of producing knowledge-grounded explanations. He previously interned at the Allen Institute for AI, Google AI, Microsoft Research, and FAIR (Meta AI), and collaborated with U of Oxford, U of British Columbia, and the Alan Turing Institute. He is a recipient of the UCSD CSE Doctoral Award for Research (2022), Adobe Research Fellowship (2022), UCSD Friends Fellowship (2022), and Qualcomm Innovation Fellowship (2020). In 2019, Bodhi led UCSD in the finals of the Amazon Alexa Prize. 
He also co-authored a best-selling NLP book with O’Reilly Media that is being adopted in universities internationally. Website: http://www.majumderb.com/.
Oct. 24 DBH 4011 1 pm	Mark Steyvers Professor of Cognitive Sciences University of California, Irvine Human-AI collaboration Artificial intelligence (AI) and machine learning models are being increasingly deployed in real-world applications. In many of these applications, there is strong motivation to develop hybrid systems in which humans and AI algorithms can work together, leveraging their complementary strengths and weaknesses. In the first part of the presentation, I will discuss results from a Bayesian framework where we statistically combine the predictions from humans and machines while taking into account the unique ways human and algorithmic confidence is expressed. The framework allows us to investigate the factors that influence complementarity, where a hybrid combination of human and machine predictions leads to better performance than combinations of human or machine predictions alone. In the second part of the presentation, I will discuss some recent work on AI-assisted decision making where individuals are presented with recommended predictions from classifiers. Using a cognitive modeling approach, we can estimate the AI reliance policy used by individual participants. The results show that AI advice is more readily adopted if the individual is in a low confidence state, receives high-confidence advice from the AI and when the AI is generally more accurate. In the final part of the presentation, I will discuss the question of “machine theory of mind” and “theory of machine”, how humans and machines can efficiently form mental models of each other. I will show some recent results on theory-of-mind experiments where the goal is for individuals and machine algorithms to predict the performance of other individuals in image classification tasks. The results show performance gaps where human individuals outperform algorithms in mindreading tasks. I will discuss several research directions designed to close the gap. 
Bio: Mark Steyvers is a Professor of Cognitive Science at UC Irvine and Chancellor’s Fellow. He has a joint appointment with the Computer Science department and is affiliated with the Center for Machine Learning and Intelligent Systems. His publications span work in cognitive science as well as machine learning, and his research has been funded by NSF, NIH, IARPA, NAVY, and AFOSR. He received his PhD from Indiana University and was a Postdoctoral Fellow at Stanford University. He is currently serving as Associate Editor of Computational Brain and Behavior and Consulting Editor for Psychological Review and has previously served as the President of the Society of Mathematical Psychology, Associate Editor for Psychonomic Bulletin & Review and the Journal of Mathematical Psychology. In addition, he has served as a consultant for a variety of companies such as eBay, Yahoo, Netflix, Merriam Webster, Rubicon and Gimbal on machine learning problems. Dr. Steyvers received New Investigator Awards from the American Psychological Association as well as the Society of Experimental Psychologists. He also received an award from the Future of Privacy Forum and Alfred P. Sloan Foundation for his collaborative work with Lumosity.
Oct. 31 DBH 4011 1 pm	Alex Boyd PhD Student, Department of Statistics University of California, Irvine Predictive Querying for Autoregressive Neural Sequence Models In reasoning about sequential events it is natural to pose probabilistic queries such as “when will event A occur next” or “what is the probability of A occurring before B”, with applications in areas such as user modeling, medicine, and finance. However, with machine learning shifting towards neural autoregressive models such as RNNs and transformers, probabilistic querying has been largely restricted to simple cases such as next-event prediction. This is in part due to the fact that future querying involves marginalization over large path spaces, which is not straightforward to do efficiently in such models. In this talk, we will describe a novel representation of querying for these discrete sequential models, as well as discuss various approximation and search techniques that can be utilized to help estimate these probabilistic queries. Lastly, we will briefly touch on ongoing work that has extended these techniques into sequential models for continuous time events. Bio: Alex Boyd is a Statistics PhD candidate at UC Irvine, co-advised by Padhraic Smyth and Stephan Mandt. His work focuses on improving probabilistic methods, primarily for deep sequential models. He was selected in 2020 as a National Science Foundation Graduate Fellow.
Nov. 7 DBH 4011 1 pm	Yanning Shen Assistant Professor of Electrical Engineering and Computer Science University of California, Irvine Adaptive Online Scalable Learning with Graph Feedback We live in an era of data deluge, where pervasive media collect massive amounts of data, often in a streaming fashion. Learning from these dynamic and large volumes of data is hence expected to bring significant science and engineering advances along with consequent improvements in quality of life. However, with the blessings come big challenges. The sheer volume of data makes it impossible to run analytics in batch form. Large-scale datasets are noisy, incomplete, and prone to outliers. As many sources continuously generate data in real-time, it is often impossible to store all of it. Thus, analytics must often be performed in real-time, without a chance to revisit past entries. In response to these challenges, this talk will first introduce an online scalable function approximation scheme that is suitable for various machine learning tasks. The novel approach adaptively learns and tracks the sought nonlinear function ‘on the fly’ with quantifiable performance guarantees, even in adversarial environments with unknown dynamics. Building on this robust and scalable function approximation framework, a scalable online learning approach with graph feedback will be outlined next for online learning with possibly related models. The effectiveness of the novel algorithms will be showcased in several real-world datasets. Bio: Yanning Shen is an assistant professor with the EECS department at the University of California, Irvine. She received her Ph.D. degree from the University of Minnesota (UMN) in 2019. She was a finalist for the Best Student Paper Award at the 2017 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, and the 2017 Asilomar Conference on Signals, Systems, and Computers. 
She was selected as a Rising Star in EECS by Stanford University in 2017. She received the Microsoft Academic Grant Award for AI Research in 2021, the Google Research Scholar Award in 2022, and the Hellman Fellowship in 2022. Her research interests span the areas of machine learning, network science, data science, and signal processing.
Nov. 14 DBH 4011 1 pm	Muhao Chen Assistant Research Professor of Computer Science University of Southern California Robust and Indirectly Supervised Information Extraction Information extraction (IE) is the process of automatically inducing structures of concepts and relations described in natural language text. It is a fundamental task for assessing a machine’s ability for natural language understanding, as well as an essential step for acquiring the structural knowledge representation that is integral to any knowledge-driven AI system. Despite this importance, obtaining direct supervision for IE tasks is always very difficult, as it requires expert annotators to read through long documents and identify complex structures. Therefore, a robust and accountable IE model has to be achievable with minimal and imperfect supervision. Towards this mission, this talk covers recent advances in machine learning and inference technologies that (i) grant robustness against noise and perturbation, (ii) prevent systematic errors caused by spurious correlations, and (iii) provide indirect supervision for label-efficient and logically consistent IE. Bio: Muhao Chen is an Assistant Research Professor of Computer Science at USC, and the director of the USC Language Understanding and Knowledge Acquisition (LUKA) Lab. His research focuses on robust and minimally supervised machine learning for natural language understanding, structured data processing, and knowledge acquisition from unstructured data. His work has been recognized with an NSF CRII Award, faculty research awards from Cisco and Amazon, an ACM SIGBio Best Student Paper Award and a best paper nomination at CoNLL. Dr. Chen obtained his Ph.D. degree from the UCLA Department of Computer Science in 2019, and was a postdoctoral researcher at UPenn prior to joining USC.
Nov. 21 DBH 4011 1 pm	Peter Orbanz Professor of Machine Learning Gatsby Computational Neuroscience Unit, University College London Statistical implications of group invariance of distributions Consider a large random structure — a random graph, a stochastic process on the line, a random field on the grid — and a function that depends only on a small part of the structure. Now use a family of transformations to ‘move’ the domain of the function over the structure, collect each function value, and average. Under suitable conditions, the law of large numbers generalizes to such averages; that is one of the deep insights of modern ergodic theory. My own recent work with Morgane Austern (Harvard) shows that central limit theorems and other higher-order properties also hold. Loosely speaking, if the i.i.d. assumption of classical statistics is substituted by suitable properties formulated in terms of groups, the fundamental theorems of inference still hold. Bio: Peter Orbanz is a Professor of Machine Learning in the Gatsby Computational Neuroscience Unit at University College London. He studies large systems of dependent variables in machine learning and inference problems. That involves symmetry and group invariance properties, such as exchangeability and stationarity, random graphs and random structures, hierarchies of latent variables, and the intersection of ergodic theory and statistical physics with statistics and machine learning. In the past, Peter was a PhD student of Joachim M. Buhmann at ETH Zurich, a postdoc with Zoubin Ghahramani at the University of Cambridge, and Assistant and Associate Professor in the Department of Statistics at Columbia University.
Nov. 28	No Seminar (NeurIPS Conference)

Spring 2022

Live Stream for all Spring 2022 CML Seminars

May 2 DBH 4011 & Live Stream 1 pm	Maurizio Filippone Associate Professor, EURECOM and Ba-Hien Tran PhD Student, EURECOM YouTube Stream: https://youtu.be/oZAuh686ipw Functional Priors for Bayesian Deep Learning The Bayesian treatment of neural networks dictates that a prior distribution is specified over their weight and bias parameters. This poses a challenge because modern neural networks are characterized by a huge number of parameters and non-linearities. The choice of these priors has an unpredictable effect on the distribution of the functional output which could represent a hugely limiting aspect of Bayesian deep learning models. In contrast, Gaussian processes offer a rigorous non-parametric framework to define prior distributions over the space of functions. In this talk, we aim to introduce a novel and robust framework to impose such functional priors on modern neural networks for supervised learning tasks through minimizing the Wasserstein distance between samples of stochastic processes. In addition, we extend this framework to carry out model selection for Bayesian autoencoders for unsupervised learning tasks. We provide extensive experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling offers systematically large performance improvements over alternative choices of priors and state-of-the-art approximate Bayesian deep learning approaches. Bio: Maurizio Filippone received a Master’s degree in Physics and a Ph.D. in Computer Science from the University of Genova, Italy, in 2004 and 2008, respectively. In 2007, he was a Research Scholar with George Mason University, Fairfax, VA. From 2008 to 2011, he was a Research Associate with the University of Sheffield, U.K. (2008-2009), with the University of Glasgow, U.K. (2010), and with University College London, U.K. (2011). 
From 2011 to 2015 he was a Lecturer at the University of Glasgow, U.K., and he is currently AXA Chair of Computational Statistics and Associate Professor at EURECOM, Sophia Antipolis, France. His current research interests include the development of tractable and scalable Bayesian inference techniques for Gaussian processes and Deep/Conv Nets with applications in life and environmental sciences. Bio: Ba-Hien Tran is currently a PhD student within the Data Science department of EURECOM, under the supervision of Professor Maurizio Filippone. His research focuses on Accelerating Inference for Deep Probabilistic Modeling. In 2016, he received a Bachelor of Science degree with honors in Computer Science from Vietnam National University, HCMC. His thesis investigated Deep Learning approaches for data-driven image captioning. In 2020, he received a Master of Science in Engineering degree in Data Science from Télécom Paris. His thesis focused on Bayesian Inference for Deep Neural Networks.
May 9 DBH 4011 & Live Stream 1 pm	Ties van Rozendaal Senior Machine Learning Researcher Qualcomm AI Research YouTube Stream: https://youtu.be/LQu-kwpfFg4 Instance-adaptive data compression: Improving Neural Codecs by Training on the Test Set Neural data compression has been shown to outperform classical methods in terms of rate-distortion performance, with results still improving rapidly. These models are fitted to a training dataset and cannot be expected to optimally compress test data in general due to limitations on model capacity, distribution shifts, and imperfect optimization. If the test-time data distribution is known and has relatively low entropy, the model can easily be finetuned or adapted to this distribution. Instance-adaptive methods take this approach to the extreme, adapting the model to a single test instance, and signaling the updated model along in the bitstream. In this talk, we will show the potential of different types of instance-adaptive methods and discuss the tradeoffs that these methods pose. Bio: Ties is a senior machine learning researcher at Qualcomm AI Research. He obtained his master’s degree at the University of Amsterdam with a thesis on personalizing automatic speech recognition systems using unsupervised methods. At Qualcomm AI Research he has been working on neural compression, with a focus on using generative models to compress image and video data. His research includes work on semantic compression and constrained optimization as well as instance-adaptive and neural-implicit compression.
May 16 DBH 4011 & Live Stream 1 pm	Robin Jia Assistant Professor of Computer Science University of Southern California YouTube Stream: https://youtu.be/ALqqlgbzAB0 Out-of-Distribution Evaluation: The How, the Which, and the “What?!” Natural language processing (NLP) models have achieved impressive accuracies on in-distribution benchmarks, but they are unreliable in out-of-distribution (OOD) settings. In this talk, I will give an exclusive preview of my group’s ongoing work on evaluating and improving model performance in OOD settings. First, I will propose likelihood splits, a general-purpose way to create challenging non-i.i.d. benchmarks by measuring generalization to the tail of the data distribution, as identified by a language model. Second, I will describe the advantages of neurosymbolic approaches over end-to-end pretrained models for OOD generalization in visual question answering; these results highlight the importance of measuring OOD generalization when comparing modeling approaches. Finally, I will show how synthesized examples can improve open-set recognition, the task of abstaining on OOD examples that come from classes never seen at training time. Bio: Robin Jia is an Assistant Professor of Computer Science at the University of Southern California. He received his Ph.D. in Computer Science from Stanford University, where he was advised by Percy Liang. He has also spent time as a visiting researcher at Facebook AI Research, working with Luke Zettlemoyer and Douwe Kiela. He is interested broadly in natural language processing and machine learning, with a particular focus on building NLP systems that are robust to distribution shift. Robin’s work has received best paper awards at ACL and EMNLP.
May 23	No Seminar
May 30	No Seminar (Memorial Day Holiday)
June 6 DBH 4011 & Live Stream 1 pm	Bobak Pezeshki PhD Student, Department of Computer Science University of California, Irvine YouTube Stream: https://youtu.be/Yl_aCTieVqc AND/OR Branch-and-Bound for Computational Protein Design Optimizing K* Computational protein design (CPD) is the task of creating new proteins to fulfill a desired function. In this talk, I will share work recently accepted at UAI 2022 based on a new formulation of CPD as a graphical model designed for optimizing subunit binding affinity. These new methods showed promising results when compared with the state-of-the-art algorithm BBK*, which is part of a long-developed software package dedicated to CPD. In the talk, I will first describe CPD in general and for optimizing a quantity called K* (which approximates binding affinity). I will relate this to the well-known task of MMAP, for which many powerful algorithms have been recently developed and by which our methods are inspired. Next I will give a preview of the promising results of our new framework. I will then go on to describe the framework, presenting the formulation of the problem as a graphical model for K* optimization and introducing a weighted mini-bucket heuristic for bounding K* and guiding search. Finally, I will share our algorithm AOBB-K* and modifications that can enhance it, describing some of the empirical benefits and limitations of our scheme. To conclude, I will outline some future directions for advancing the use of this framework. Bio: Bobak Pezeshki is a fifth-year PhD student of Computer Science at the University of California, Irvine, under the advisement of Professor Rina Dechter. His research focus is in automated reasoning over graphical models, with a focus on Abstraction Sampling and on applying automated reasoning over graphical models to computational protein design. He completed his undergraduate studies at UC Berkeley majoring in Molecular and Cell Biology (with an emphasis in Biochemistry) and Integrative Biology. 
Before pursuing his PhD at UCI, he was involved in protein biochemistry research at the Stroud Lab, UCSF, and at Novartis Vaccines and Diagnostics.

Winter 2022

Live Stream for all Winter 2022 CML Seminars

January 3	No Seminar
January 10 Live Stream 1 pm	Roy Fox Assistant Professor Department of Computer Science University of California, Irvine YouTube Stream: https://youtu.be/ImvsK5CFp0w Curiously effective ensemble and double-oracle reinforcement-learning methods Ensemble methods for reinforcement learning have gained attention in recent years, due to their ability to represent model uncertainty and use it to guide exploration and to reduce value estimation bias. We present MeanQ, a very simple ensemble method with improved performance, and show how it reduces estimation variance enough to operate without a stabilizing target network. Curiously, MeanQ is theoretically almost equivalent to a non-ensemble state-of-the-art method that it significantly outperforms, raising questions about the interaction between uncertainty estimation, representation, and resampling. In adversarial environments, where a second agent attempts to minimize the first’s rewards, double-oracle (DO) methods grow a population of policies for both agents by iteratively adding the best response to the current population. DO algorithms are guaranteed to converge when they exhaust all policies, but are only effective when they find a small population sufficient to induce a good agent. We present XDO, a DO algorithm that exploits the game’s sequential structure to exponentially reduce the worst-case population size. Curiously, the small population size that XDO needs to find good agents more than compensates for its increased difficulty to iterate with a given population size. Bio: Roy Fox is an Assistant Professor and director of the Intelligent Dynamics Lab at the Department of Computer Science at UCI. He was previously a postdoc in UC Berkeley’s BAIR, RISELab, and AUTOLAB, where he developed algorithms and systems that interact with humans to learn structured control policies for robotics and program synthesis. 
His research interests include theory and applications of reinforcement learning, algorithmic game theory, information theory, and robotics. His current research focuses on structure, exploration, and optimization in deep reinforcement learning and imitation learning of virtual and physical agents and multi-agent systems.
January 17	No Seminar (Martin Luther King, Jr. Day)
January 24 Live Stream 1 pm	Ransalu Senanayake Postdoctoral Scholar Department of Computer Science Stanford University YouTube Stream: https://youtu.be/3yR8BqBElXw Propagating Uncertainty from Modeling into Decision-Making for Trustworthy Autonomy Autonomous agents such as self-driving cars have already gained the capability to perform individual tasks such as object detection and lane following, especially in simple, static environments. While advancing robots towards full autonomy, it is important to minimize deleterious effects on humans and infrastructure to ensure the trustworthiness of such systems. However, for robots to safely operate in the real world, it is vital for them to quantify the multimodal aleatoric and epistemic uncertainty around them and use that uncertainty for decision-making. In this talk, I will talk about how we can leverage tools from approximate Bayesian inference, kernel methods, and deep neural networks to develop interpretable autonomous systems for high-stakes applications. Bio: Ransalu Senanayake is a postdoctoral scholar in the Statistical Machine Learning Group at the Department of Computer Science, Stanford University. He focuses on making downstream applications of machine learning trustworthy by quantifying uncertainty and explaining the decisions of such systems. Currently, he works with Prof. Emily Fox and Prof. Carlos Guestrin. He also worked on decision-making under uncertainty with Prof. Mykel Kochenderfer. Prior to joining Stanford, Ransalu obtained a PhD in Computer Science from the University of Sydney, Australia, and an MPhil in Industrial Engineering and Decision Analytics from the Hong Kong University of Science and Technology, Hong Kong.
January 31 Live Stream 1 pm	Dylan Slack PhD Student Department of Computer Science University of California, Irvine YouTube Stream: https://youtu.be/71RJvjPhk3U Exposing Shortcomings and Improving the Reliability of Machine Learning Explanations For domain experts to adopt machine learning (ML) models in high-stakes settings such as health care and law, they must understand and trust model predictions. As a result, researchers have proposed numerous ways to explain the predictions of complex ML models. However, these approaches suffer from several critical drawbacks, such as vulnerability to adversarial attacks, instability, inconsistency, and lack of guidance about accuracy and correctness. For practitioners to safely use explanations in the real world, it is vital to properly characterize the limitations of current techniques and develop improved explainability methods. This talk will describe the shortcomings of explanations and introduce current research demonstrating how they are vulnerable to adversarial attacks. I will also discuss promising solutions and present recent work on explanations that leverage uncertainty estimates to overcome several critical explanation shortcomings. Bio: Dylan Slack is a Ph.D. candidate at UC Irvine advised by Sameer Singh and Hima Lakkaraju and associated with UCI NLP, CREATE, and the HPI Research Center. His research focuses on developing techniques that help researchers and practitioners build more robust, reliable, and trustworthy machine learning models. In the past, he has held research internships at GoogleAI and Amazon AWS and was previously an undergraduate at Haverford College advised by Sorelle Friedler where he researched fairness in machine learning.
February 7 Live Stream 1 pm	Maja Rudolph Senior Research Scientist Bosch Center for AI YouTube Stream: https://youtu.be/9fRw74WhRdE Modeling Irregular Time Series with Continuous Recurrent Units Recurrent neural networks (RNNs) are a popular choice for modeling sequential data. Standard RNNs assume constant time-intervals between observations. However, in many datasets (e.g. medical records) observation times are irregular and can carry important information. To address this challenge, we propose continuous recurrent units (CRUs) – a neural architecture that can naturally handle irregular intervals between observations. The CRU assumes a hidden state which evolves according to a linear stochastic differential equation and is integrated into an encoder-decoder framework. The recursive computations of the CRU can be derived using the continuous-discrete Kalman filter and are in closed form. The resulting recurrent architecture has temporal continuity between hidden states and a gating mechanism that can optimally integrate noisy observations. We derive an efficient parametrization scheme for the CRU that leads to a fast implementation (f-CRU). We empirically study the CRU on a number of challenging datasets and find that it can interpolate irregular time series better than methods based on neural ordinary differential equations. Bio: Maja Rudolph is a Senior Research Scientist at the Bosch Center for AI where she works on machine learning research questions derived from engineering problems: for example, how to model driving behavior, how to forecast the operating conditions of a device, or how to find anomalies in the sensor data of an assembly line. In 2018, Maja completed her Ph.D. in Computer Science at Columbia University, advised by David Blei. She holds an MS in Electrical Engineering from Columbia University and a BS in Mathematics from MIT.
February 14 Live Stream 1 pm	Ruiqi Gao Research Scientist Google Brain YouTube Stream: https://youtu.be/eAozs_JKp4o Advanced training of energy-based models Energy-based models (EBMs) are an appealing class of probabilistic models, which can be viewed as generative versions of discriminators, yet can be learned from unlabeled data. Despite a number of desirable properties, two challenges remain for training EBMs on high-dimensional datasets. First, learning EBMs by maximum likelihood requires Markov Chain Monte Carlo (MCMC) to generate samples from the model, which can be extremely expensive. Second, the energy potentials learned with non-convergent MCMC can be highly biased, making it difficult to evaluate the learned energy potentials or apply the learned models to downstream tasks. In this talk, I will present two algorithms to tackle the challenges of training EBMs. (1) Diffusion Recovery Likelihood, where we tractably learn and sample from a sequence of EBMs trained on increasingly noisy versions of a dataset. Each EBM is trained with recovery likelihood, which maximizes the conditional probability of the data at a certain noise level given their noisy versions at a higher noise level. (2) Flow Contrastive Estimation, where we jointly estimate an EBM and a flow-based model, in which the two models are iteratively updated based on a shared adversarial value function. We demonstrate that EBMs can be trained with a small budget of MCMC or completely without MCMC. The learned energy potentials are faithful and can be applied to likelihood evaluation and downstream tasks, such as feature learning and semi-supervised learning. Bio: Ruiqi Gao is a research scientist on the Google Brain team. Her research interests are in statistical modeling and learning, with a focus on generative models and representation learning. She received her Ph.D. degree in statistics from the University of California, Los Angeles (UCLA) in 2021, advised by Song-Chun Zhu and Ying Nian Wu. Prior to that, she received her bachelor’s degree from Peking University. Her recent research themes include scalable training algorithms for deep generative models, variational inference, and representational models with implications in neuroscience.
February 21	No Seminar (Presidents’ Day)
February 28 DBH 4011 & Live Stream 1 pm	Sunipa Dev Research Scientist Ethical AI Team, Google AI YouTube Stream: https://youtu.be/V93uXTBnpFw Towards Inclusive and Socially Aware Language Technologies Large language models are commonly used in different paradigms of natural language processing and machine learning, and are known for their efficiency as well as their overall lack of interpretability. Their data-driven approach to emulating human language often results in human biases being encoded and even amplified, potentially leading to cyclic propagation of representational and allocational harm. In this talk, we discuss some aspects of detecting, evaluating, and mitigating biases and associated harms in a holistic, inclusive, and culturally aware manner. In particular, we discuss the disparate impact on society of common language tools that are not inclusive of all gender identities. Bio: Sunipa Dev is a Research Scientist on the Ethical AI team at Google AI. Previously, she was an NSF Computing Innovation Fellow at UCLA, before which she completed her PhD at the University of Utah. Her ongoing research focuses on various facets of fairness and interpretability in NLP, including robust measurements of bias, cross-cultural understanding of concepts in NLP, and inclusive language representations.
March 7 Zoom 1 pm	Mukund Sundararajan Principal Research Scientist Google YouTube Stream unavailable; please join via Zoom Analyzing deep neural networks using attribution Predicting cancer from XRays seemed great / Until we discovered the true reason. / The model, in its glory, did fixate / On radiologist markings – treason! / We found the issue with attribution: / By blaming pixels for the prediction (1,2,3,4,5,6). / A complement’ry way to attribute, / is to pay training data, a tribute (1). / If you are int’rested in FTC, / counterfactual theory, SGD / Or Shapley values and fine kernel tricks, / Please come attend, unless you have conflicts / Should you build deep models down the road, / Use attributions. Takes ten lines of code! Bio: There once was an RS called MS, / The models he studies are a mess, / A director at Google. / Accurate and frugal, / Explanations are what he likes best.
March 14	No Seminar (Finals Week)