Jan. 8 DBH 4011 1 pm |
This talk will focus on our endeavors in the past few years on explaining deep image models. Realizing that an important missing piece for explaining neural networks is a reliable heatmap visualization tool, we developed I-GOS and iGOS++, which optimize with integrated gradients to avoid local optima in heatmap generation and improve performance on high-resolution heatmaps. In particular, iGOS++ was able to discover that deep classifiers trained on COVID-19 X-ray images wrongly focus on the characters printed on the image and could produce erroneous solutions. This shows the utility of explanation in “debugging” deep classifiers. During the development of those visualizations, we realized that for a significant number of images, the classifier has multiple different paths to reach a confident prediction. This led to our recent development of structural attention graphs (SAG), an approach that utilizes beam search to locate multiple coarse heatmaps for a single image, and compactly visualizes a set of image masks by capturing how different combinations of image regions impact the confidence of a classifier. A user study shows significantly better capability of users to answer counterfactual questions when presented with SAG versus conventional heatmaps. We will also show our findings from running explanation algorithms on recently popular transformer models, which indicate different decision-making behaviors among CNNs, global attention models and local attention models. Finally, as humans prefer visually appealing explanations that will literally “change” one class into another, we present results traversing the latent space of variational autoencoders and generative adversarial networks (GANs), generating high-quality counterfactual explanations that visually show how to change one image so that CNNs predict it as another category, without needing to re-train the autoencoders/GANs.
When the classifier relies on wrong information to make classifications, the counterfactual explanations will illustrate the errors clearly.
Bio: Fuxin Li is currently an associate professor in the School of Electrical Engineering and Computer Science at Oregon State University. Before that, he held research positions at the University of Bonn and the Georgia Institute of Technology. He obtained a Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences in 2009. He has won an NSF CAREER award and an Amazon Research Award, (co-)won the PASCAL VOC semantic segmentation challenges from 2009 to 2012, and led a team to a 4th place finish in the DAVIS Video Segmentation challenge 2017. He has published more than 70 papers in computer vision, machine learning and natural language processing. His main research interests are 3D point cloud deep networks, human understanding of deep learning, video object segmentation and uncertainty estimation in deep learning. |
Jan. 15 |
No Seminar (Martin Luther King, Jr. Holiday)
|
Jan. 29 DBH 4011 1 pm |
Deep generative models have seen a meteoric rise in capabilities across a wide array of domains, ranging from natural language and vision to scientific applications such as precipitation forecasting and molecular generation. However, a number of important applications focus on data which is inherently infinite-dimensional, such as time-series, solutions to partial differential equations, and audio signals. This relatively under-explored class of problems poses unique theoretical and practical challenges for generative modeling. In this talk, we will explore recent developments for infinite-dimensional generative models, with a focus on diffusion-based methodologies.
Bio: Gavin Kerrigan is a PhD candidate in the Department of Computer Science at UC Irvine, where he is advised by Padhraic Smyth. Prior to joining UC Irvine, he obtained a BSc in mathematics from the Schreyer Honors College at Penn State University. His research is focused on deep generative models and their application to scientific domains. He was awarded an HPI fellowship and currently serves as a workflow chair for AISTATS. |
Feb. 12 DBH 4011 1 pm |
In-context Learning (ICL) uses large language models (LLMs) for new tasks by conditioning them on prompts comprising a few task examples. With the rise of LLMs that are intractable to train or hidden behind APIs, the importance of such a training-free interface cannot be overstated. However, ICL is known to be critically sensitive to the choice of in-context examples. Despite this, the standard approach for selecting in-context examples remains to use general-purpose retrievers due to the limited effectiveness and training requirements of prior approaches. In this talk, I’ll posit that good in-context examples demonstrate the salient information necessary to solve a given test input. I’ll present efficient approaches for selecting such examples, with a special focus on preserving the training-free ICL pipeline. Through results with a wide range of tasks and LLMs, I’ll demonstrate that selecting informative examples can indeed yield superior ICL performance.
Bio: Shivanshu Gupta is a Computer Science Ph.D. Candidate at the University of California Irvine, advised by Sameer Singh. Prior to this, he was a Research Fellow at LinkedIn and Microsoft Research India, and completed his B.Tech. and M.Tech. in Computer Science at IIT Delhi. His primary research interests are systematic generalization, in-context learning, and multi-step reasoning capabilities of large language models. |
Feb. 20 DBH 6011 2 pm |
Traditionally, machine learning has been heavily influenced by neuroscience (hence the name artificial neural networks) and physics (e.g., MCMC, Belief Propagation, and Diffusion-based Generative AI). We have recently witnessed that the flow of information has also reversed, with new tools developed in the ML community impacting physics, chemistry and biology. Examples include faster DFT, Force-Field accelerated MD simulations, PDE Neural Surrogate models, generating druglike molecules, and many more. In this talk I will review the exciting opportunities for further cross-fertilization between these fields, ranging from faster (classical) DFT calculations and enhanced transition path sampling to traveling waves in artificial neural networks.
Bio: Prof. Dr. Max Welling is a research chair in Machine Learning at the University of Amsterdam. He is a fellow at the Canadian Institute for Advanced Research (CIFAR) and the European Lab for Learning and Intelligent Systems (ELLIS) where he also serves on the founding board. His previous appointments include VP at Qualcomm Technologies, professor at UC Irvine, postdoc at U. Toronto and UCL under supervision of Prof. Geoffrey Hinton, and postdoc at Caltech under supervision of Prof. Pietro Perona. He finished his PhD in theoretical high energy physics under supervision of Nobel laureate Prof. Gerard ‘t Hooft. |
Mar. 4 DBH 4011 11 am |
Machine learning (ML) and generative artificial intelligence (AI) are among the most transformational technologies, opening up new opportunities for innovation in every domain across software, finance, health care, manufacturing, media, entertainment and others. This talk will discuss the key trends that are driving AI/ML innovation, how enterprises are using AI/ML today to innovate how they run their businesses, the key technology challenges in scaling out ML and generative AI across the enterprise, some of the key innovations from Amazon, and how this field is likely to evolve in the future.
Bio: Dr. Bratin Saha is the Vice President of Machine Learning and AI services at AWS where he leads all the ML and AI services and helped build one of the fastest growing businesses in AWS history. In 2022 Harvard Business School wrote three case studies on how he built the machine learning business at AWS. He is an alumnus of Harvard Business School (General Management Program), Yale University (PhD Computer Science), and Indian Institute of Technology (BS Computer Science). He has more than 70 patents granted (with another 50+ pending) and more than 30 papers in conferences/journals. Prior to Amazon he worked at Nvidia and Intel leading different product groups spanning imaging, analytics, media processing, high performance computing, machine learning, and software infrastructure. Bratin received the Distinguished Alumnus Award from the Indian Institute of Technology and is an Executive Fellow at the Harvard Business School. |
Mar. 7 DBH 4011 11 am |
While language models (or LMs, à la ChatGPT) have become the predominant tool in natural language processing, their performance in non-English languages increasingly lags behind. This gap is due to the curse of multilinguality, which harms individual language performance in multilingual models through inter-language competition for model capacity. In this talk, I examine how current language models do and don’t capture different languages and present new methods for fair modeling of all languages.
First, I demonstrate how LMs become multilingual through their data and training dynamics. Specifically, I show how data contamination teaches ostensibly English models cross-lingual information; I then characterize when multilingual models learn (and forget) languages during training to uncover how the curse of multilinguality develops. These analyses provide key insights into developing more equitable multilingual models, and I propose a new language modeling approach for Cross-Lingual Expert Language Models (X-ELM) that explicitly allocates model resources to reduce language competition.
Bio: Terra Blevins is a Ph.D. candidate in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. Her research focuses on linguistic analysis of language models and multilingual NLP, with the underlying aim of using analysis to build better, more equitable multilingual systems. She has received the NSF Graduate Research Fellowship for her research and previously worked as a visiting researcher at Facebook AI Research (FAIR). |
Mar. 11 DBH 4011 11 am |
Paola Cascante-Bonilla
Postdoctoral Associate
University of Maryland Institute for Advanced Computer Studies (UMIACS)
Despite the impressive results of deep learning models, modern large-scale systems are required to be trained using massive amounts of manually annotated or freely available data on the Internet. But this “data in the wild” is insufficient to learn specific structural patterns of the world, and existing large-scale models still fail on common sense tasks requiring compositional inference.
This talk will focus on answering three fundamental questions: (a) How can we create systems that can learn with limited annotated data and adapt to new tasks and novel criteria? (b) How can we create systems able to encode real-world concepts with granularity in a robust manner? (c) Is it possible to create such a system with alternative data, complying with privacy protection principles and avoiding cultural bias?
Given my work’s intersection with Computer Vision and Natural Language Processing, my aim is to analyze and apply Machine Learning algorithms to understand how images and text can interact and model complex patterns, reinforcing compositional reasoning without forgetting prior knowledge. Finally, I will conclude with my future plans to continue exploring hyper-realistic synthetic data generation techniques and the expressiveness of generative models to train multimodal systems able to perform well in real-world scenarios, with applications including visual-question answering, cross-modal retrieval, zero-shot classification, and task planning.
Bio: Paola Cascante-Bonilla is a Ph.D. Candidate in Computer Science at Rice University, working on Computer Vision, Natural Language Processing, and Machine Learning. She has been focusing on multi-modal learning, few-shot learning, semi-supervised learning, representation learning, and synthetic data generation for compositionality and privacy protection. Her work has been published in machine learning, vision, and language conferences (CVPR, ICCV, AAAI, NeurIPS, BMVC, NAACL). She has previously interned at the Mitsubishi Electric Research Laboratories (MERL) and twice at the MIT-IBM Watson AI Lab. She is the recipient of the Ken Kennedy Institute SLB Graduate Fellowship (2022/23), and has been recently selected as a Future Faculty Fellow by Rice’s George R. Brown School of Engineering (2023) and as a Rising Star in EECS (2023). |
Mar. 14 DBH 4011 11 am |
Large language models (LLMs) have significantly extended the boundaries of NLP’s potential applications, partially because of their increased ability to do complex reasoning. However, LLMs have well-documented reasoning failures, such as hallucinations and inability to systematically generalize. In this talk, I describe my work on enhancing LLMs in reliably performing textual reasoning, with a particular focus on leveraging explanations. I will first introduce a framework for automatically assessing the robustness of black-box models using explanations. The framework first extracts features to describe the “reasoning process” disclosed by the explanations, and then uses a trained verifier to judge the reliability of predictions based on these features. I will then describe how to form effective explanations for better teaching LLMs to reason. My work uses declarative formal specifications as explanations, which enables using an SMT solver to amend the limited planning capabilities of LLMs. Finally, I will describe future directions for further enhancing LLMs to better aid humans in challenging real-world applications demanding deep reasoning.
Bio: Xi Ye is a Ph.D. candidate in the Department of Computer Science at the University of Texas at Austin, advised by Greg Durrett. His research is in the area of natural language processing, particularly in leveraging explanations to steer language models for complex textual reasoning tasks. He is also interested in semantic parsing and program synthesis. He is a co-instructor of the tutorial on Explanations in the Era of Large Language Models at NAACL 24 and a co-organizer of the workshop on Natural Language Reasoning and Structured Explanations at ACL 24. |
Mar. 19 DBH 4011 11 am |
Today, access to high-quality data has become the key bottleneck to deploying machine learning. Often, the data that is most valuable is locked away in inaccessible silos due to unfavorable incentives and ethical or legal restrictions. This is starkly evident in health care, where such barriers have led to highly biased and underperforming tools. Using my collaborations with Doctors Without Borders and the Cancer Registry of Norway as case studies, I will describe how collaborative learning systems, such as federated learning, provide a natural solution; they can remove barriers to data sharing by respecting the privacy and interests of the data providers. Yet for these systems to truly succeed, three fundamental challenges must be confronted: these systems need to 1) be efficient and scale to massive networks, 2) manage the divergent goals of the participants, and 3) provide resilient training and trustworthy predictions. I will discuss how tools from optimization, statistics, and economics can be leveraged to address these challenges.
Bio: Sai Praneeth Karimireddy is a postdoctoral researcher at the University of California, Berkeley with Mike I. Jordan. Karimireddy obtained his undergraduate degree from the Indian Institute of Technology Delhi and his PhD at the Swiss Federal Institute of Technology Lausanne (EPFL) with Martin Jaggi. His research builds large-scale machine learning systems for equitable and collaborative intelligence and designs novel algorithms that can robustly and privately learn over distributed data (i.e., edge, federated, and decentralized learning). His work has seen widespread real-world adoption through close collaborations with public health organizations (e.g., Doctors Without Borders, the Red Cross, the Cancer Registry of Norway) and with industries such as Meta, Google, OpenAI, and Owkin. |
Apr. 2 DBH 4011 11 am |
As babies, we begin to grasp the world through spontaneous observations, gradually developing generalized knowledge about the world. This foundational knowledge enables humans to effortlessly learn new skills without extensive teaching for each task. Can we develop a similar paradigm for AI? This talk describes how learning from limited supervision can address fundamental challenges in AI such as scalability and generalization while embedding generalized knowledge. I will first talk about our research in self-supervised learning, utilizing natural images and videos without human-annotated labels. By deriving supervisory signals from the data itself, our self-supervised models achieve scalable and universal representations. The second part will describe how to leverage non-curated image-text pairs, through which we obtain a textual representation of images. This representation comprehensively describes semantic elements in an image and bridges various AI tools such as large language models (LLMs), enabling diverse vision-language applications. Collectively, this talk argues for the benefits of harnessing limited supervision for developing more general, adaptable, and efficient AI, which is ultimately better able to serve human needs.
Bio: Chen Wei is a PhD candidate at the Computer Science Department of Johns Hopkins University, advised by Alan Yuille. Her research in Artificial Intelligence, Machine Learning, and Computer Vision focuses on developing AI systems that generalize to a wide range of novel tasks and are adaptable to new environments. Her research involves Self-Supervised Learning, Generative Modeling, and Vision-Language Understanding. She has published at many top-tier CV and AI venues. Chen is a recipient of EECS Rising Star in 2023 and ECCV Outstanding Reviewer. During her PhD, Chen was a research intern at FAIR at Meta AI and Google DeepMind. Before Johns Hopkins, she obtained her BS in Computer Science from Peking University. |
Apr. 5 DBH 2011 3 pm |
Kun Zhang
Professor and Director, Center for Integrative AI
Mohamed bin Zayed University of Artificial Intelligence
Causality is a fundamental notion in science, engineering, and even in machine learning. Uncovering the causal process behind observed data can naturally help answer ‘why’ and ‘how’ questions, inform optimal decisions, and achieve adaptive prediction. In many scenarios, observed variables (such as image pixels and questionnaire results) are often reflections of the underlying causal variables rather than being the causal variables themselves. Causal representation learning aims to reveal the underlying high-level hidden causal variables and their relations. It can be seen as a special case of causal discovery, whose goal is to recover the underlying causal structure or causal model from observational data. The modularity property of a causal system implies properties of minimal changes and independent changes of causal representations, and in this talk, we show how such properties make it possible to recover the underlying causal representations from observational data with identifiability guarantees: under appropriate assumptions, the learned representations are consistent with the underlying causal process. Various problem settings are considered, involving independent and identically distributed (i.i.d.) data, temporal data, or data with distribution shift as input. We demonstrate when identifiable causal representation learning can benefit from flexible deep learning and when suitable parametric assumptions have to be imposed on the causal process, complemented with various examples and applications.
Bio: Kun Zhang is currently on leave from Carnegie Mellon University (CMU), where he is an associate professor of philosophy and an affiliate faculty in the machine learning department; he is working as a professor and the acting chair of the machine learning department and the director of the Center for Integrative AI at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). He develops methods for making causality transparent by torturing various kinds of data and investigates machine learning problems including transfer learning, representation learning, and reinforcement learning from a causal perspective. He has been frequently serving as a senior area chair, area chair, or senior program committee member for major conferences in machine learning or artificial intelligence, including UAI, NeurIPS, ICML, IJCAI, AISTATS, and ICLR. He was a co-founder and general & program co-chair of the first Conference on Causal Learning and Reasoning (CLeaR 2022), a program co-chair of the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022), and is a general co-chair of UAI 2023. |
Apr. 8 DBH 4011 11 am |
AI has revolutionized the way we interact online. Despite this, it hasn’t quite made the leap when it comes to tasks like cooking dinner or cleaning our desks. Why has AI excelled in automating our digital interactions but not in assisting us with physical tasks? In my talk, I will explore the challenges of applying AI to embodied tasks, that is, those requiring physical interaction with the environment. To address these challenges, I turn to the efficient pathways humans use to achieve embodied intelligence and propose three strategies to ‘jump-start’ the learning process for embodied AI agents: (1) combining learning from both teachers and one’s own experience, (2) leveraging external information or “hints” to simplify learning, such as using maps to learn about physical spaces, and (3) learning intelligent behaviors by simply observing others. These strategies integrate insights from perception and machine learning to bridge the gap between digital AI and embodied intelligence, ultimately enhancing AI’s usefulness and integration into our physical world.
Bio: Unnat Jain is a postdoctoral researcher at Carnegie Mellon University and Fundamental AI Research (FAIR) at Meta, where he works with Abhinav Gupta, Deepak Pathak, and Xinlei Chen. He received his PhD in Computer Science from UIUC, working with Alexander Schwing and Svetlana Lazebnik and collaborating with Google DeepMind and Allen Institute for AI. His research focuses on embodied intelligence, bridging computer vision (perception) and robot learning (action). Unnat is committed to fostering a collaborative research community and serves as an area chair at CVPR and NeurIPS, and has co-led workshops such as Adaptive Robotics (CoRL) and ‘Scholars & Big Models: How Can Academics Adapt?’ (CVPR). Unnat’s achievements have been recognized with several awards, including the Mavis Future Faculty Fellowship, Director’s Gold Medal at IIT Kanpur, Siebel Scholars, two best thesis awards, and Microsoft and Google Fellowship nominations; he was also a finalist for the Qualcomm Fellowship. |
Apr. 10 DBH 4011 11 am |
The emergence of powerful, ever more universal models such as ChatGPT and Stable Diffusion has made generative modeling (GM) undoubtedly a focal point for modern AI research. In this talk, we will discuss applications of GM and how GM fits into a vision of autonomous machine intelligence. We will critically examine the sustainability of scaling AI models, a prevalent approach driving remarkable advancements in GM. Despite significant successes, I highlight the substantial physical, economic, and environmental limitations of continuous scaling, questioning its long-term feasibility. Furthermore, we will discuss inherent limitations in current high-performance models that lead to a lack of tractability of statistical queries necessary to enable reasoning.
Bio: My research agenda is centered on developing principled strategies based on information theory to mitigate these limitations. Through my doctoral and postdoctoral work, I have introduced model-agnostic methods for reducing model complexity and computational demands, significantly advancing the field of data compression and efficiency in AI models. |
Apr. 11 DBH 4011 11 am |
Large language models (LLMs) have soared in popularity in recent years, thanks to their ability to generate well-formed natural language answers for a myriad of topics. Despite their astonishing capabilities, they still suffer from various limitations. This talk will focus on two of them: the limited control over LLMs, and their failure to serve users from diverse backgrounds. I will start by presenting my research on controlling and enriching language models through the input (prompting). In the second part, I will introduce a novel algorithmic method to remove protected properties (such as gender and race) from text representations, which is crucial for preserving privacy and promoting fairness. The third part of the talk will focus on my research efforts to develop models that support multiple languages, and the challenges faced when working with languages other than English. These efforts together unlock language technology for different user groups and across languages. I will conclude by presenting my vision for safer and more reliable language modeling going forward.
Bio: Hila is a postdoctoral researcher at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Hila’s research lies at the intersection of Natural Language Processing, Machine Learning, and AI. In her research, she works towards two main goals: (1) developing algorithms and methods for controlling the model’s behavior; (2) making cutting-edge language technology available and fair across speakers of different languages and users of different socio-demographic groups. |
Apr. 15 DBH 4011 11 am |
Massive scale has been a recent winning recipe in natural language processing and AI, with extreme-scale language models like GPT-4 receiving the most attention. This is in spite of staggering energy and monetary costs, and further, the continuing struggle of even the largest models with concepts such as compositional problem solving and linguistic ambiguity. In this talk, I will propose my vision for a research landscape where compact language models share the forefront with extreme-scale models, working in concert with many pieces besides scale, such as algorithms, knowledge, information theory, and more.
The first part of my talk will cover alternative ingredients to scale, including (1) an inference-time algorithm that combines language models with elements of discrete search and information theory and (2) a method for transferring useful knowledge from extreme-scale to compact language models with synthetically generated data. Next, I will discuss counterintuitive disparities in the capabilities of even extreme-scale models, which can meet or exceed human performance in some complex tasks while trailing behind humans in what seem to be much simpler tasks. Finally, I will discuss implications and next steps in scale-alternative methods.
Bio: Peter West is a PhD candidate in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, working with Yejin Choi. His research is focused on natural language processing and language models, particularly combining language models with elements of knowledge, search algorithms, and information theory to equip compact models with new capabilities. In parallel, he studies the limits that even extreme-scale models have yet to solve. His work has received multiple awards, including best methods paper at NAACL 2022, and outstanding paper awards at ACL and EMNLP in 2023. His work has been supported in part by the NSERC PGS-D fellowship. Previously, Peter received a BSc in computer science from the University of British Columbia. |