Oct. 7 DBH 4011 1 pm |
We envision a world where AI agents (assistants) are widely used for complex tasks in our digital and physical worlds and are broadly integrated into society. To move towards such a future, we need an environment for robust evaluation of agents' capability, reliability, and trustworthiness. In this talk, I'll introduce AppWorld, a step towards this goal in the context of day-to-day digital tasks. AppWorld is a high-fidelity simulated world of people and their digital activities on nine apps like Amazon, Gmail, and Venmo. On top of this fully controllable world, we build a benchmark of complex day-to-day tasks, such as splitting Venmo bills with roommates, which agents have to solve via interactive coding and API calls. One of the fundamental challenges with complex tasks lies in accounting for the different ways in which a task can be completed. I will describe how we address this challenge using a reliable and programmatic evaluation framework. Our benchmarking evaluations show that even the best LLMs, like GPT-4o, can solve only ~30% of such tasks, highlighting the challenging nature of the AppWorld benchmark. I will conclude by laying out exciting future research that can be conducted on the foundation of AppWorld, such as benchmarks and playgrounds for developing multimodal, collaborative, safe, socially intelligent, resourceful, and fault-tolerant agents.
Bio: Harsh Trivedi is a final-year PhD student at Stony Brook University, advised by Niranjan Balasubramanian. He is broadly interested in the development of reliable, explainable AI systems and their rigorous evaluation. Specifically, his research spans the domains of AI agents, multi-step reasoning, AI safety, and efficient NLP. He has interned at AI2 and was a visiting researcher at NYU. His recent work, AppWorld, received a Best Resource Paper award at ACL'24, and his work on AI safety via debate received a Best Paper award at the ML Safety workshop at NeurIPS'22. |
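For orientation, the sketch below shows a generic generate-execute loop of the kind used by interactive-coding agents; the function names (generate_code, execute) and the completion marker are hypothetical placeholders, not AppWorld's actual API.

```python
# A hypothetical sketch of an interactive-coding agent loop; generate_code,
# execute, and the TASK_COMPLETE marker are placeholders, not AppWorld's API.
def solve_task(task_instruction, generate_code, execute, max_turns=20):
    """Alternate between LLM code generation and execution in a simulated app world."""
    history = [f"Task: {task_instruction}"]
    for _ in range(max_turns):
        # Ask the LLM for the next code snippet, conditioned on everything
        # generated and observed so far.
        code = generate_code("\n".join(history))
        # Run the snippet against the simulated apps' APIs and capture output.
        output = execute(code)
        history.append(f"Code:\n{code}\nOutput:\n{output}")
        if "TASK_COMPLETE" in output:  # hypothetical completion signal
            break
    return history
```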
Oct. 14 DBH 4011 1 pm |
Diffusion models exhibit excellent sample quality across many generation tasks. However, their inference process is iterative and often requires hundreds of function evaluations. Moreover, it is unclear whether existing methods for accelerating diffusion model sampling generalize well across different types of diffusion processes. In the first part of my talk, I will introduce Conjugate Integrators, which project unconditional diffusion dynamics onto an alternate space that is more amenable to faster sampling. The resulting framework has several interesting theoretical connections with prior work in fast diffusion sampling, enabling those techniques to be applied to a broader class of diffusion processes. In the second part of my talk, I will extend the idea of Conjugate Integrators from unconditional sampling to conditional diffusion sampling in the context of solving inverse problems. Empirically, on challenging inverse problems like 4x super-resolution on the ImageNet-256 dataset, conditional Conjugate Integrators can generate high-quality samples in as few as 5 conditional sampling steps, providing significant speedups over prior work.
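As a rough illustration of the projection idea (notation assumed here, not taken from the talk), a time-dependent change of variables can move the sampling ODE into a space where its stiff linear part is absorbed analytically:

```latex
% Schematic only: a generic sampling ODE with a linear drift term and a
% learned network term (notation assumed, not taken from the talk).
\frac{\mathrm{d}x_t}{\mathrm{d}t} = f_t\, x_t + g_t\, \epsilon_\theta(x_t, t)
% Projecting with an invertible, time-dependent map A_t, \bar{x}_t = A_t x_t,
% gives (by the product rule)
\frac{\mathrm{d}\bar{x}_t}{\mathrm{d}t}
  = \big(\dot{A}_t + A_t f_t\big) A_t^{-1}\,\bar{x}_t
  + A_t\, g_t\, \epsilon_\theta\!\big(A_t^{-1}\bar{x}_t,\, t\big)
% Choosing A_t so that \dot{A}_t = -A_t f_t removes the stiff linear part,
% leaving slower dynamics that a coarse numerical integrator can cover in
% far fewer steps.
```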
Bio: Kushagra is a third-year PhD student in Computer Science at UCI, advised by Prof. Stephan Mandt. Previously, he completed his bachelor’s and master’s degrees in Computer Science from the Indian Institute of Technology. He is broadly interested in the efficient design and inference in deep generative models with a current focus on iterative refinement models like Diffusion Models and Stochastic Interpolants. |
Oct. 21 DBH 4011 1 pm |
To optimize mobile health interventions and advance domain knowledge on intervention design, it is critical to understand how the intervention effect varies over time and with contextual information. This study aims to assess how a push notification suggesting physical activity influences individuals' step counts, using data from the HeartSteps micro-randomized trial (MRT). The statistical challenges include the time-varying treatments and the longitudinal functional step-count measurements. We propose the first semiparametric causal excursion effect model with varying coefficients to model the time-varying effects both within a decision point and across decision points in an MRT. The proposed model incorporates double time indices to accommodate the longitudinal functional outcome, enabling the assessment of time-varying effect moderation by contextual variables. We propose a two-stage causal effect estimator that uses machine learning and is robust against a misspecified high-dimensional outcome regression nuisance model. We establish asymptotic theory and conduct simulation studies to validate the proposed estimator. Our analysis provides new insights into individuals' changes in response profiles (such as how soon a response occurs) due to the activity suggestions, how such changes differ by the type of suggestion received, and how such changes depend on other contextual information such as being recently sedentary and whether the day is a weekday.
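As a schematic (the notation below is illustrative and not the exact estimand from the talk), a moderated causal excursion effect with a second, within-window time index for the functional outcome can be written as:

```latex
% Schematic estimand only (notation assumed, not the exact model from the talk).
\beta(t, u; s) =
  \mathbb{E}\!\left[ Y_t(u)\big(\bar{A}_{t-1}, 1\big)
                   - Y_t(u)\big(\bar{A}_{t-1}, 0\big)
      \,\middle|\, S_t = s,\; I_t = 1 \right]
% t   : decision point (e.g., a notification opportunity)
% u   : time elapsed within the decision window (the second, functional index)
% S_t : contextual moderators (e.g., recently sedentary, weekday)
% I_t : availability indicator at decision point t
% A varying-coefficient form such as \beta(t, u; s) = b_0(t, u) + b_1(t, u)^\top s
% lets the effect change both across decision points and within a window.
```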
Bio: Tianchen Qian is an Assistant Professor in Statistics at UC Irvine. His research focuses on leveraging data science, mobile technology, and wearable devices to design robust, personalized, and cost-effective interventions that can impact health and well-being at a significant scale. He also works on causal inference, experimental design, machine learning, semiparametric efficiency theory, and longitudinal data methods. He has a PhD in Biostatistics from Johns Hopkins University. Before joining UCI, he was a postdoctoral fellow in Statistics at Harvard University. |
Oct. 28 DBH 4011 1 pm |
Jana Lipkova, Assistant Professor, Department of Pathology, School of Medicine, University of California, Irvine
In oncology, the patient state is characterized by a spectrum of diverse medical data, each modality providing unique insights. The vast amount of data, however, makes it difficult for experts to adequately assess patient prognosis in this multimodal context. We present a deep learning-based multimodal framework that integrates radiology, histopathology, and genomics data to improve patient outcome prediction. The framework does not require annotations, tumor segmentation, or hand-crafted features and can be easily applied to larger cohorts and diverse disease models. The feasibility of the model is tested on two independent external cohorts, covering glioma and non-small cell lung cancer, indicating the benefits of multimodal data integration for patient risk stratification, outcome prediction, and prognostic biomarker exploration.
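As a generic illustration of this kind of multimodal integration (the module names, feature sizes, and late-fusion design below are assumptions, not the framework presented in the talk):

```python
# A generic late-fusion sketch: encode each modality, concatenate, and predict
# a single risk score. Names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class MultimodalRisk(nn.Module):
    def __init__(self, dims, hidden=256):
        super().__init__()
        # One small encoder per modality, mapping raw features to a shared size.
        self.encoders = nn.ModuleDict(
            {name: nn.Sequential(nn.Linear(d, hidden), nn.ReLU())
             for name, d in dims.items()}
        )
        # Fusion head producing a single risk score from the concatenation.
        self.head = nn.Linear(hidden * len(dims), 1)

    def forward(self, features):
        # features: dict mapping modality name -> (batch, dim) tensor
        fused = torch.cat(
            [enc(features[name]) for name, enc in self.encoders.items()], dim=-1
        )
        return self.head(fused)

# e.g. MultimodalRisk({"radiology": 512, "histopathology": 768, "genomics": 128})
```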
Bio: Jana Lipkova is an Assistant Professor at the University of California, Irvine, in the Department of Pathology and the Department of Biomedical Engineering. She completed her postdoctoral fellowship in the AI for Pathology group under the guidance of Faisal Mahmood at Harvard Medical School. Prior to her postdoc, she earned a PhD in computer-aided medical procedures in the radiology department at the Technical University of Munich. Jana's research lab, called OctoPath, focuses on developing AI methods for diagnosis, prognosis, and treatment optimization in histopathology and beyond (for more, see octopath.org). |
Nov. 4 DBH 4011 1 pm |
“Future Health” emphasizes the importance of recognizing each individual's uniqueness, which arises from their specific omics, lifestyle, environmental, and socioeconomic conditions. Thanks to advancements in sensors, mobile computing, ubiquitous computing, and artificial intelligence (AI), we can now collect detailed information about individuals. This data serves as the foundation for creating personal models that offer predictive and preventive advice tailored specifically to each person. These models enable us to provide precise recommendations that closely align with the individual's predicted needs. In my presentation, I will explore how AI, including generative AI, and wearable technology are revolutionizing the collection and analysis of big health data in everyday environments. I will discuss the analytics used to evaluate physical and mental health and how smart recommendations can be made objectively. Moreover, I will illustrate how Large Language Model (LLM)-powered conversational health agents (CHAs) can integrate personal data, models, and knowledge into healthcare chatbots. Additionally, I will present our open-source initiative on developing OpenCHA (openCHA.com). This integration allows for creating personalized chatbots, enhancing the delivery of health guidance tailored directly to the individual.
Bio: Amir M. Rahmani is the founder of the Health SciTech Group at the University of California, Irvine (UCI) and the co-founder and co-director of the Institute for Future Health, a campus-wide Organized Research Unit at UCI. He is also a lifetime docent (Adjunct Professor) at the University of Turku (UTU), Finland. His research includes AI in healthcare, ubiquitous computing, AI-powered bio-signal processing, health informatics, and big health data analytics. He has been leading several NSF, NIH, Academy of Finland, and European Commission-funded projects on Smart Pain Assessment, Community-Centered Care, Family-Centered Maternity Care, Stress Management in Adolescents, and Remote Monitoring of Elderly and Family Caregivers. He is a co-author of more than 350 peer-reviewed publications, serves as associate editor-in-chief of ACM Transactions on Computing for Healthcare and Frontiers in Wearable Electronics, and is on the editorial board of Nature Scientific Reports. He is a Distinguished Member of the ACM and a Senior Member of the IEEE. |
Nov. 11 |
No Seminar (Veterans Day Holiday) |
Nov. 18 DBH 4011 1 pm |
The robotics community has seen significant progress in applying machine learning to robot manipulation. However, despite this progress, developing a system capable of generalizable robot manipulation remains fundamentally difficult, especially when manipulating in clutter or handling deformable objects such as fabrics, ropes, and liquids. Promising techniques for developing general robot manipulation systems include reinforcement learning, imitation learning, and, more recently, leveraging foundation models trained on internet-scale data, such as GPT-4. In this talk, I will discuss our recent work on (1) deep reinforcement learning for dexterous manipulation in clutter, (2) foundation models and imitation learning for bimanual manipulation, and (3) our benchmarks and applications of foundation models for deformable object manipulation. We will discuss the current strengths and limitations of these approaches. I will conclude with an overview of future research directions and my vision for an exciting future in pursuit of a general, whole-body, and dexterous robot system.
Bio: Daniel Seita is an Assistant Professor in the Computer Science Department at the University of Southern California and the director of the Sensing, Learning, and Understanding for Robotic Manipulation (SLURM) Lab. His research interests are in computer vision, machine learning, and foundation models for robot manipulation, with a focus on improving performance in visually and geometrically challenging settings. Daniel was previously a postdoc at Carnegie Mellon University's Robotics Institute and holds a PhD in computer science from the University of California, Berkeley. He received undergraduate degrees in math and computer science from Williams College. Daniel's research has been supported by a six-year Graduate Fellowship for STEM Diversity and by a two-year Berkeley Fellowship. He received an Honorable Mention for Best Paper award at UAI 2017, was an RSS 2022 Pioneer, and has presented work at premier robotics conferences such as ICRA, IROS, RSS, and CoRL. |
Nov. 25 DBH 4011 1 pm |
Foundation models have demonstrated exceptional performance on established academic benchmarks, often narrowing the gap between human reasoning and artificial intelligence. While the success of these models is widely attributed to their scale, encompassing both their architectural parameters and their vast pretraining data, the critical role of that pretraining data in shaping their capabilities and limitations is often acknowledged but rarely studied.
However, if we cannot disentangle model behavior from pretraining data, how can we trust these systems in real-world, high-stakes applications? In this talk, I will argue that understanding the true performance of foundation models requires going beyond conventional benchmark testing. In particular, incorporating insights from their pretraining data is essential for comprehensively evaluating and interpreting the models' capabilities and limitations. I show that while models, both multimodal and language models, often excel in benchmark settings, they can fail on basic, trivial reasoning tasks, raising concerns about their true robustness. To better understand these limitations, I propose examining the relationship between a model's successes and failures through the lens of its pretraining data, and I present methodologies and tools for studying how pretraining data affects a model's performance. By revealing failure modes in these models and exposing the impact of pretraining data on their behavior, this work cautions against overly optimistic interpretations of models' abilities based on canonical evaluation results.
Bio: Yasaman Razeghi is a final-year PhD student at UCI, advised by Prof. Sameer Singh. She completed her master's and undergraduate studies at the University of Tehran in Iran. Her research focuses on understanding the relationships between pretraining data characteristics and model behavior. Most recently, she has been investigating foundation models in scenarios involving reasoning and multimodal capabilities. |
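One purely illustrative form such a pretraining-data analysis can take is correlating per-instance accuracy with how often the instance's key terms appear in the pretraining corpus; the corpus-lookup helper below is a hypothetical placeholder, not a real library call.

```python
# Illustrative sketch: does benchmark success track pretraining term frequency?
from scipy.stats import spearmanr

def frequency_accuracy_correlation(instances, count_term_occurrences):
    """instances: list of dicts with keys 'terms' (list of str) and 'correct' (bool).
    count_term_occurrences: placeholder for a pretraining-corpus index lookup."""
    frequencies, accuracies = [], []
    for inst in instances:
        # Total pretraining-corpus occurrences of the terms the instance depends on.
        frequencies.append(sum(count_term_occurrences(t) for t in inst["terms"]))
        accuracies.append(1.0 if inst["correct"] else 0.0)
    # A strong positive correlation suggests performance tracks term frequency
    # in the pretraining data rather than general reasoning ability.
    return spearmanr(frequencies, accuracies)
```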
Dec. 2 DBH 4011 1 pm |
Estimating the temporal state of a system from image sequences is an important task for many vision and robotics applications. A number of classical frameworks for state estimation have been proposed, but these methods often require human experts to specify the system dynamics and measurement model, forcing simplifying assumptions that hurt performance. With the increasing abundance of real-world training data, there is enormous potential to boost accuracy by using deep learning to learn state estimation algorithms, but there are also substantial technical challenges in properly accounting for uncertainty. In this presentation, I will develop end-to-end learnable particle filters and particle smoothers, and show how to bring classic state estimation methods into the age of deep learning. We first create an end-to-end learnable particle filter that uses flexible neural networks to propagate multimodal, particle-based representations of state uncertainty. Our gradient estimators are unbiased and have substantially lower variance than existing, differentiable (but biased) particle filters. We apply our end-to-end learnable particle filter to the difficult task of visual localization in unknown environments and show large improvements over prior work. We then expand our particle filtering method into the first end-to-end learnable particle smoother, which incorporates information from future as well as past observations, and apply this particle smoother to the real-world task of city-scale geo-localization using camera and planimetric map data. We compare to state-of-the-art baselines for visual geo-localization and again show superior performance.
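For orientation, the sketch below shows one step of a generic neural particle filter with soft resampling, in the style of earlier differentiable particle filters; the network interfaces are assumptions, and this is not the unbiased estimator developed in the talk.

```python
# Generic propagate-reweight-resample step with soft resampling; dynamics_net
# and measurement_net are assumed interfaces, not the speaker's architecture.
import torch

def particle_filter_step(particles, log_weights, observation,
                         dynamics_net, measurement_net, alpha=0.5):
    """particles:   (N, D) state hypotheses
    log_weights: (N,) normalized log importance weights
    observation: (O,) current observation features
    dynamics_net:    maps (N, 2D) [state, noise] -> (N, D) next states
    measurement_net: maps (N, D + O) [state, obs] -> (N, 1) log-likelihoods
    """
    n = particles.shape[0]
    # 1. Propagate particles through the learned stochastic dynamics.
    noise = torch.randn_like(particles)
    particles = dynamics_net(torch.cat([particles, noise], dim=-1))
    # 2. Reweight by the learned measurement likelihood p(o_t | x_t).
    obs = observation.expand(n, -1)
    log_lik = measurement_net(torch.cat([particles, obs], dim=-1)).squeeze(-1)
    log_weights = log_weights + log_lik
    log_weights = log_weights - torch.logsumexp(log_weights, dim=0)
    # 3. Soft resampling: draw from a mixture of the weights and a uniform
    #    distribution so gradients can still flow through the new weights.
    probs = alpha * log_weights.exp() + (1 - alpha) / n
    idx = torch.multinomial(probs, n, replacement=True)
    new_log_weights = log_weights[idx] - probs[idx].log()
    new_log_weights = new_log_weights - torch.logsumexp(new_log_weights, dim=0)
    return particles[idx], new_log_weights
```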
Bio: Ali Younis is a final-year PhD student in Computer Science at UCI, advised by Prof. Erik Sudderth. He previously completed his bachelor's and master's degrees at UCI and briefly worked on spacecraft systems before returning for a PhD. He is broadly interested in particle-based belief propagation for time-varying systems, with applications in computer vision. |