Center Director Alexander Ihler gave a half-day course on approximate inference in graphical models as part of the 2015 Machine Learning Summer School in Sydney, Australia. MLSS is a long-running series of summer schools around the world for graduate students and researchers who want to apply machine learning methods to their research problems. His lecture is available online here.
By: ihler
Anandkumar receives AFOSR Young Investigator Award
Center member Prof. Anima Anandkumar received a 3-year grant, Learning Mixed Membership Community Models: A Statistical and a Computational Framework, from the Air Force Office of Scientific Research, as part of its Young Investigator Research Program. The YIP is open to scientists and engineers who received their Ph.D. within the last five years, and show exceptional ability and promise for conducting basic research. Its objective is to foster creative basic research in science and engineering, enhance early career development of outstanding young investigators, and increase opportunities for the young investigators to recognize the Air Force mission and the related challenges in science and engineering.
Winter 2015
Jan 12
Bren Hall 4011, 1 pm
Mixture models are based on the hypothesis that real data can be viewed as arising from a probabilistic generative model with a small number of parameters. Examples include topic models for documents, hidden Markov models for speech, Gaussian mixture models for point data, etc. The model parameters often give interesting semantic insights into the data.
I will first discuss algorithms for parameter estimation in mixture models via the use of tensors, or higher dimensional arrays. The idea is to estimate (using the data) a tensor, whose decomposition allows us to read off the hidden parameters. Thus parameter estimation is reduced to tensor decomposition. Unfortunately, tensor decomposition is NP-hard in general. However, I will show that there exist algorithms that “almost always” work efficiently, in the framework of smoothed analysis, as long as the “rank” is at most a polynomial in the number of dimensions. Next I will consider the case in which we wish to learn mixtures in small (constant number of) dimensions. Tensor methods do not apply to this regime, and indeed, there are lower bounds which say that exponentially many samples are needed to “identify” the parameters. However I will show that under a slightly relaxed objective, we can obtain “PAC” style learning algorithms. This follows from a more general theorem about sparse recovery with \ell_1 error, which I will describe.
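The reduction described above can be illustrated on synthetic data (a sketch only: the components `A`, weights `w`, sample size, and noise level below are all invented for the example). The empirical third-order moment of mixture samples approaches a tensor whose rank-1 components encode the hidden parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture: k component vectors (columns of A) with weights w.
k, d, n = 3, 10, 50000
A = rng.normal(size=(d, k))
w = np.array([0.5, 0.3, 0.2])

# Each sample is a component mean plus a little noise.
z = rng.choice(k, size=n, p=w)
X = A[:, z].T + 0.01 * rng.normal(size=(n, d))

# Empirical third-order moment tensor, an estimate of E[x (x) x (x) x].
M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / n

# For small noise this approaches sum_i w_i a_i^{(x)3}, so a tensor
# decomposition of M3 reads off the hidden weights and components.
M3_model = np.einsum('i,ai,bi,ci->abc', w, A, A, A)
err = np.max(np.abs(M3 - M3_model))
```

With enough samples `err` shrinks toward zero, which is what makes the "estimate a tensor, then decompose it" strategy viable.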
Jan 26
Bren Hall 4011, 1 pm
Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. By using a method called model compression, we show that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models, while using the same number of parameters as the original deep models. On the TIMIT phoneme recognition and CIFAR-10 image recognition tasks, shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional architectures. The same model compression trick can also be used to compress impractically large deep models and ensembles of large deep models down to “medium-size” deep models that run more efficiently on servers, and down to “small” models that can run on mobile devices. In machine learning and statistics we used to believe that one of the keys to preventing overfitting was to keep models simple and the number of parameters small to force generalization. We no longer believe this — learning appears to generalize best when training models with excess capacity, but the learned functions can often be represented with far fewer parameters. We do not yet know if this is true just of current learning algorithms, or if it is a fundamental property of learning in general.
Feb 2
Bren Hall 4011, 1 pm
Machine learning plays a major role in analyzing data from the Large Hadron Collider at CERN, and was used to discover the Higgs boson in 2012. We demonstrate that deep learning increases statistical power of this analysis, and that deep neural networks can automatically learn high-level features that usually need to be engineered by physicists. Furthermore, we describe how to automatically (and cheaply) tune deep neural network hyperparameters using Amazon EC2 GPU servers and free software tools. |
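One simple baseline for this kind of automatic tuning is random search over log-scaled ranges (a sketch only: the loss surface, parameter ranges, and budget below are invented, and the talk's actual tooling may use more sophisticated search):

```python
import math
import random

rng = random.Random(0)

# Hypothetical validation loss as a function of two hyperparameters
# (learning rate, hidden-layer width); in practice this would be a full
# train-and-evaluate run on a GPU instance.
def val_loss(lr, width):
    return (math.log10(lr) + 3) ** 2 + (width - 200) ** 2 / 1e4

# Random search: sample configurations, keep the best one seen so far.
best = None
for _ in range(50):
    lr = 10 ** rng.uniform(-6, 0)      # learning rate, log-uniform
    width = rng.randrange(16, 1025)    # hidden-layer width
    loss = val_loss(lr, width)
    if best is None or loss < best[0]:
        best = (loss, lr, width)

best_loss, best_lr, best_width = best
```

Because each trial is independent, this loop parallelizes trivially across cloud instances, which is what makes the "cheaply" part possible.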
Feb 5
Bren Hall 4011, 1 pm
In recent years, the rapid proliferation of mobile phones in developing countries has provided billions of individuals with novel opportunities for social and economic interaction. Concurrently, the data generated by mobile phone networks is enabling new data-intensive methods for studying the social and economic behavior of individuals in resource-constrained environments. After all, these data reflect much more than simple communications activity: they capture the structure of social networks, decisions about expenditures and consumption, patterns of travel and mobility, and the regularity of daily routines. In this talk, I will discuss the results from two recent projects that derive behavioral insights from mobile phone data. The first study uses data on Mobile Money transfers in Rwanda and microeconomic models to better understand the motives that cause people to send money to friends and family in times of need. The second project combines call data with follow-up phone surveys to investigate the extent to which it is possible to predict an individual’s wealth and happiness based on his or her prior history of phone calls and several supervised learning models. These projects are enabled by generous support from the Institute for Money, Technology, and Financial Inclusion; Intel; the Gates Foundation; and the NSF.
Feb 9
Bren Hall 4011, 1 pm
Shimon Whiteson
Associate Professor, Intelligent Autonomous Systems Group, Informatics Institute, University of Amsterdam
In this talk, I will propose a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise probabilities to select a promising arm and applying Upper Confidence Bound with the winner as a benchmark. We prove a sharp finite-time regret bound of order O(K log T) on a very general class of dueling bandit problems that matches a lower bound proven by Yue et al. In addition, our empirical results using real data from an information retrieval application show that it greatly outperforms the state of the art.
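A self-contained sketch of this relative-UCB idea follows (the preference matrix, horizon, and exploration constant are invented for the simulation; the paper's algorithm and analysis are more careful). Each round, an optimistic "champion" is chosen from the pairwise upper confidence bounds, a challenger duels it, and only the relative outcome is recorded:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true preference matrix: P[i, j] = Pr(arm i beats arm j).
# Arm 0 is the best arm.
P = np.array([[0.5, 0.6, 0.7],
              [0.4, 0.5, 0.6],
              [0.3, 0.4, 0.5]])
K = P.shape[0]
wins = np.zeros((K, K))
alpha = 0.51        # exploration constant (assumed value)

for t in range(1, 3001):
    n = wins + wins.T
    with np.errstate(divide='ignore', invalid='ignore'):
        U = wins / n + np.sqrt(alpha * np.log(t) / n)
    U[n == 0] = 1.0                 # optimism for unexplored pairs
    np.fill_diagonal(U, 0.5)
    # Champion: an arm that optimistically beats every other arm.
    cands = np.where((U >= 0.5).all(axis=1))[0]
    c = rng.choice(cands) if len(cands) else rng.integers(K)
    # Challenger: the arm with the highest upper bound against the champion.
    d = int(np.argmax(U[:, c] - (np.arange(K) == c)))
    # Run the duel and record only the relative outcome.
    if rng.random() < P[c, d]:
        wins[c, d] += 1
    else:
        wins[d, c] += 1
```

Over time the duels concentrate on the best arm against its strongest rival, which is where the O(K log T) regret bound comes from.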
Feb 23
Bren Hall 4011, 1 pm
Cylance
Glenn Chisholm, Chief Technology Officer, Cylance; Matt Wolff, Chief Data Scientist, Cylance; Michael Wojnowicz, Data Scientist, Cylance
Traditional approaches to detecting malware, namely those used by current antivirus methodologies, are increasingly compromised by sophisticated attackers who may have financial, social, or nationalistic motives. In just the last few weeks, there have been successful hacking attacks against Sony and Anthem, as well as a banking attack by the Carbanak group that stole over $1 billion from various financial institutions. Traditional antivirus approaches rely on manual analysis of files to identify malware; however, these techniques simply do not scale to the volume of malware that now exists. At the time of this writing, an estimated one million distinct new suspicious files are generated per day. Clearly, manual inspection of all these files by human analysts is not feasible.
At Cylance, we have developed a machine learning engine to help reduce or remove the need for manual analysis. In this talk, we will dive into various components of our machine learning infrastructure, with the goal of providing some insight into how one can apply machine learning to problems in industry and against large datasets. A machine learning approach to cybersecurity presents a wide array of interesting challenges in areas of feature extraction, feature engineering, dimensionality reduction, and modeling. Specific topics that will be discussed include designing models with speed in mind, why different optimization methods do or do not perform well against various types of data, and feature engineering based on wavelet analysis and entropy series.
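To give a flavor of the entropy-series features mentioned above (a sketch: the window and step sizes are invented, and the features Cylance actually uses are not specified here), one can compute sliding-window Shannon entropy over a binary's bytes; packed or encrypted regions show up as high-entropy windows, plain code and text as lower-entropy ones:

```python
import math
from collections import Counter

def entropy_series(data: bytes, window=256, step=128):
    """Shannon entropy (bits per byte) over sliding windows of a byte stream."""
    out = []
    for start in range(0, max(len(data) - window + 1, 1), step):
        chunk = data[start:start + window]
        counts = Counter(chunk)
        n = len(chunk)
        # H = -sum_b p(b) log2 p(b), maximal (8 bits) for uniform bytes.
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        out.append(h)
    return out

low = entropy_series(b"A" * 1024)             # constant bytes: entropy 0
high = entropy_series(bytes(range(256)) * 4)  # uniform bytes: entropy 8
```

The resulting sequence of entropies is itself a signal over file position, which is what makes follow-on tools like wavelet analysis natural.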
Mar 9
Bren Hall 4011, 1 pm
Fan-Gang Zeng
Professor of Anatomy and Neurobiology, Biomedical Engineering and Cognitive Sciences, UC Irvine; Director, Center for Hearing Research; Director of Research, Otolaryngology – Head and Neck Surgery, UC Irvine
Deafness affects not only speech communication but also language development, including speaking and reading. The bionic ear, or cochlear implant, is a modern medical device that achieves the first successful restoration of a human sense. It works well for adults who have lost hearing postlingually and for children who receive the device prelingually. It doesn’t work well for those who never heard during development but receive an implant in adulthood. My talk will address two issues. First, I will describe the development of the modern cochlear implant from a signal processing perspective, which clearly abides by the “KISS” principle. Second, I will emphasize the importance of the brain in cochlear implant success, speculating on the structure and rules that allow the neural network to learn how to process speech.
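The "KISS" signal-processing idea can be illustrated with a toy single-channel envelope extractor (all parameters below are invented, and real implant strategies apply this per channel across a filter bank): band-limited sound is rectified and low-pass filtered, and the slowly varying envelope is what modulates the electrode's pulse train:

```python
import numpy as np

fs = 16000                                   # sample rate (Hz)
t = np.arange(0, 0.05, 1 / fs)

# Toy input for one analysis band: a 1 kHz carrier modulated by a slow
# 20 Hz envelope (standing in for band-passed speech).
envelope_true = 0.5 * (1 + np.sin(2 * np.pi * 20 * t))
x = envelope_true * np.sin(2 * np.pi * 1000 * t)

# Envelope extraction: rectify, then low-pass (here a ~5 ms moving average).
rectified = np.abs(x)
win = int(fs / 200)
envelope_est = np.convolve(rectified, np.ones(win) / win, mode='same')
```

Discarding the fine structure and keeping only a handful of such envelopes is remarkably simple, yet it preserves enough information for speech understanding.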
Mjolsness named American Association for the Advancement of Science fellow
Center member and Professor of Computer Science Eric Mjolsness has been named a fellow of the American Association for the Advancement of Science for his distinguished contributions to the fields of computer science and biology, particularly for new computational models of gene regulation (networks of genes that turn each other on, off or partly on) and resulting technologies. For more details, see here.
Fall 2014
Oct 6
Bren Hall 4011, 1 pm
Self-paced learning for long-term tracking / Efficient Matching of 3D Hand Exemplars in RGB-D Images

Self-paced learning for long-term tracking

We address the problem of long-term object tracking, where the object may become occluded or leave the view. In this setting, we show that an accurate appearance model is considerably more effective than a strong motion model. We develop simple but effective algorithms that alternate between tracking and learning a good appearance model given a track. We show that it is crucial to learn from the “right” frames, and use the formalism of self-paced curriculum learning to automatically select such frames. We leverage techniques from object detection for learning accurate appearance-based templates, demonstrating the importance of using a large negative training set (typically not used for tracking). We describe both an offline algorithm (that processes frames in batch) and a linear-time online (i.e. causal) algorithm that approaches real-time performance. Our models significantly outperform prior art, reducing the average error on benchmark videos by a factor of 4.

Efficient Matching of 3D Hand Exemplars in RGB-D Images

We focus on the task of single-image hand detection and pose estimation from RGB-D images. While much past work focuses on estimation from temporal video sequences, we consider the problem of single-image pose estimation, necessary for (re)initialization. The high number of degrees of freedom, frequent self-occlusions, and pose ambiguities make this problem rather challenging. While previous approaches tend to rely on articulated hand models or local part classifiers, our models are based on discriminative pose exemplars that can be quickly indexed with parts. We propose novel metric depth features that make the search over exemplars accurate and fast. Importantly, our exemplar models can reason about depth-aware occlusion. Finally, we also provide an extensive evaluation of the state of the art, including academic and commercial systems, on a real-world annotated dataset. We show that our model outperforms such methods, providing promising results even in the presence of occlusions.
Oct 13
Bren Hall 4011, 1 pm
Probabilistic modeling of ranking data is an extensively studied problem with applications ranging from understanding user preferences in electoral systems and social choice theory, to more modern learning tasks in online web search, crowd-sourcing and recommendation systems. This work concerns learning the Mallows model — one of the most popular probabilistic models for analyzing ranking data. In this model, the user’s preference ranking is generated as a noisy version of an unknown central base ranking. The learning task is to recover the base ranking and the model parameters using access to noisy rankings generated from the model.
Although well understood in the setting of a homogeneous population (a single base ranking), the case of a heterogeneous population (a mixture of multiple base rankings) has so far resisted algorithms with guarantees on worst-case instances. In this talk I will present the first polynomial-time algorithm which provably learns the parameters and the unknown base rankings of a mixture of two Mallows models. A key component of our algorithm is a novel use of tensor decomposition techniques to learn the top-k prefix of both rankings. Before this work, even the question of identifiability in the case of a mixture of two Mallows models was unresolved. Joint work with Avrim Blum, Or Sheffet and Aravindan Vijayaraghavan.
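For concreteness, the generative side of a single Mallows model can be sketched with the standard repeated-insertion procedure (a sketch of the sampler only, not of the talk's learning algorithm; parameter names are invented):

```python
import random

def sample_mallows(center, phi, rng=None):
    """Draw one ranking from a Mallows model by repeated insertion.

    `center` is the base ranking; `phi` in (0, 1] controls the noise
    (phi -> 0 concentrates samples on `center`; phi = 1 is uniform).
    """
    rng = rng if rng is not None else random.Random(0)
    sample = []
    for i, item in enumerate(center):
        # Insert the i-th item of `center` at position j with probability
        # proportional to phi**(i - j); j = i (the end) preserves the
        # central order and is the most likely choice when phi < 1.
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        sample.insert(j, item)
    return sample
```

The learning problem in the talk is the inverse: given many such noisy samples drawn from a mixture of two centers, recover both centers and the noise parameters.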
Oct 20
Bren Hall 4011, 1 pm
We propose a joint longitudinal-survival model for associating summary measures of a longitudinally collected biomarker with a time-to-event endpoint. The model is robust to common parametric and semi-parametric assumptions in that it avoids simple distributional assumptions on longitudinal measures and allows for non-proportional hazards covariate effects in the survival component. Specifically, we use a Gaussian process model with a parameter that captures within-subject volatility in the longitudinally sampled biomarker, where the unknown distribution of the parameter is assumed to have a Dirichlet process prior. We then estimate the association between within-subject volatility and the risk of mortality using a flexible survival model constructed via a Dirichlet process mixture of Weibull distributions. Fully joint estimation is performed to account for uncertainty in the estimated within-subject volatility measure. Simulation studies are presented to assess the operating characteristics of the proposed model. Finally, the method is applied to data from the United States Renal Data System, where we estimate the association between within-subject volatility in serum albumin and the risk of mortality among patients with end-stage renal disease.
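One ingredient of the survival component can be written down directly (a sketch with invented weights and Weibull parameters): a mixture of Weibulls yields a flexible survival function whose hazards need not be proportional across covariate groups:

```python
import numpy as np

def mixture_weibull_survival(t, weights, shapes, scales):
    """Survival function of a Weibull mixture:
    S(t) = sum_k w_k * exp(-(t / scale_k) ** shape_k)."""
    t = np.asarray(t, dtype=float)
    return sum(w * np.exp(-(t / lam) ** k)
               for w, k, lam in zip(weights, shapes, scales))

# Hypothetical two-component mixture evaluated on a small time grid.
S = mixture_weibull_survival([0.0, 1.0, 2.0],
                             weights=[0.6, 0.4],
                             shapes=[1.5, 0.8],
                             scales=[1.0, 2.0])
```

In the talk's model the mixture is infinite (a Dirichlet process mixture), so the number of components is learned from the data rather than fixed in advance.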
Oct 27
Bren Hall 4011, 1 pm
Many emerging applications of machine learning involve time series and spatio-temporal data. In this talk, I will discuss a collection of machine learning approaches to effectively analyze and model large-scale time series and spatio-temporal data, including temporal causal models, sparse extreme-value models, and fast tensor-based forecasting models. Experiment results will be shown to demonstrate the effectiveness of our models in practical applications, such as climate science, social media and biology.
Bio: Yan Liu has been an assistant professor in the Computer Science Department at the University of Southern California since 2010. Before that, she was a Research Staff Member at IBM Research. She received her M.Sc. and Ph.D. degrees from Carnegie Mellon University in 2004 and 2006, respectively. Her research interests include developing scalable machine learning and data mining algorithms with applications to social media analysis, computational biology, climate modeling and business analytics. She has received several awards, including the NSF CAREER Award, the Okawa Foundation Research Award, an ACM Dissertation Award Honorable Mention, a Best Paper Award at the SIAM Data Mining Conference and a Yahoo! Faculty Award, and has won several data mining competitions, such as the KDD Cup and the INFORMS data mining competition.
Nov 3
Bren Hall 4011, 1 pm
Probabilistic graphical models provide a framework for inference with uncertainty, and form a main field in Artificial Intelligence today. However, their usual form is restricted to a *propositional* representation, in the same way propositional logic is restricted when compared to relational first-order logic.
For encoding complex probabilistic models, we need richer, relational, quantified representations that yield a form of Probabilistic Logic. While propositionalization is an option for processing such encodings, it is not scalable. The field of *lifted* probabilistic inference seeks to process first-order relational probabilistic models on the relational level, avoiding grounding or propositionalizing as much as possible. I will talk about relational probabilistic models and give the main ideas about lifted probabilistic inference, and also comment on the relationship of all that to Probabilistic Programming, exemplified by probabilistic programming languages such as Church and BLOG.

Bio: Rodrigo de Salvo Braz is a Computer Scientist at SRI International. He earned a PhD from the University of Illinois in 2007 with a thesis contributing some of the earliest ideas on Lifted Probabilistic Inference. He did a postdoc at UC Berkeley with Stuart Russell, working on the BLOG language, and is currently the PI of SRI’s project for DARPA’s Probabilistic Programming Languages for Advanced Machine Learning.
Nov 10
Bren Hall 4011, 1 pm
I will review the ways that machine learning is typically used in particle physics, some recent advancements, and future directions. In particular, I will focus on the integration of machine learning and classical statistical procedures. These considerations motivate a novel construction that is a hybrid of machine learning algorithms and more traditional likelihood methods.
Nov 17
Bren Hall 4011, 1 pm
Many prediction domains, ranging from content recommendation in a digital system to motion planning in a physical system, require making structured predictions. Broadly speaking, structured prediction refers to any type of prediction performed jointly over multiple input instances, and has been a topic of active research in the machine learning community over the past 10-15 years. However, what has been less studied is how to model structured prediction problems for an interactive system. For example, a recommender system necessarily interacts with users when recommending content, and can learn from the subsequent user feedback on those recommendations. In general, each “prediction” is an interaction where the system not only predicts a structured action to perform, but also receives feedback (i.e., training data) corresponding to the utility of that action.
In this talk, I will describe methods for balancing the tradeoff between exploration (collecting informative feedback) and exploitation (maximizing system utility) when making structured predictions in an interactive environment. Exploitation corresponds to the standard prediction goal in non-interactive settings, where one predicts the best possible action given the current model. Exploration refers to taking actions that maximize the informativeness of the subsequent feedback, so that one can exploit more reliably in future interactions. I will show how to model and optimize for this tradeoff in two settings: diversified news recommendation (where the feedback comes from users) and adaptive vehicle routing (where the feedback comes from measuring congestion). This is joint work with Carlos Guestrin, Sue Ann Hong, Ramayya Krishnan and Siyuan Liu.
Nov 24
Bren Hall 4011, 1 pm
Learning several latent variable models, including multiview mixtures, mixtures of Gaussians, independent component analysis and so on, can be done by decomposing a low-order moment tensor (e.g., a 3rd-order tensor) into its rank-1 components. Many earlier studies using tensor methods consider only the undercomplete regime, where the number of hidden components is smaller than the observed dimension. In this talk, we show that tensor power iteration (the key element of tensor decomposition) works well even in the overcomplete regime, where the hidden dimension is larger than the observed dimension. We establish that a wide range of overcomplete latent variable models can be learned efficiently, with low computational and sample complexity, through tensor power iteration.
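The key update can be sketched in the simplest (orthogonal, undercomplete) setting; the talk's contribution is analyzing essentially this iteration in the overcomplete regime. All dimensions and weights below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic orthogonally decomposable tensor T = sum_i w_i v_i^{(x)3}.
d, k = 8, 3
w = np.array([3.0, 2.0, 1.0])
V, _ = np.linalg.qr(rng.normal(size=(d, k)))     # orthonormal components
T = np.einsum('i,ai,bi,ci->abc', w, V, V, V)

def tensor_power_iteration(T, iters=100):
    """Repeat u <- T(I, u, u) / ||T(I, u, u)|| from a random start."""
    u = rng.normal(size=T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        u = np.einsum('abc,b,c->a', T, u, u)
        u /= np.linalg.norm(u)
    lam = np.einsum('abc,a,b,c->', T, u, u, u)   # recovered weight
    return lam, u

lam, u = tensor_power_iteration(T)
# (lam, u) converges to one of the pairs (w_i, v_i); deflating T by
# lam * u^{(x)3} and repeating recovers the remaining components.
```

In the overcomplete case the components can no longer be orthogonal, which is why guarantees for this same update are nontrivial there.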
Dec 1
Bren Hall 4011, 1 pm
Designing latent variable learning methods with guaranteed, bounded sample complexity has become one of the major research trends of the last few years. I will pick topic modeling as an example and discuss various learning algorithms along with their sample/computational complexity bounds. These bounds have been derived under the so-called topic separability assumption, which requires every topic to have at least one word unique to it. It can be shown that under separability of topics, the \ell_1-normalized rows of the word-word co-occurrence probability matrix are embedded inside a convex polytope whose vertices correspond only to the novel words of the different topics. Moreover, these vertices characterize the topic proportion matrix. I will elaborate on how these two facts can be used to design provable, highly distributable, and computationally efficient algorithms for topic modeling.
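The geometry above can be made concrete with a toy separable example (the corpus, topic matrix, and the simple successive-projection vertex finder below are all illustrative, not the talk's algorithms):

```python
import numpy as np

rng = np.random.default_rng(0)

# Separable toy topic matrix over V = 6 words and k = 2 topics:
# words 0 and 1 are "novel" (anchor) words, each unique to one topic.
k, V = 2, 6
B = np.zeros((k, V))
B[0, 0], B[1, 1] = 0.4, 0.4
B[:, k:] = 0.6 * rng.dirichlet(np.ones(V - k), size=k)

# Word-word co-occurrence under random topic mixtures, rows l1-normalized.
theta = rng.dirichlet(np.ones(k), size=5000)     # doc topic proportions
D = theta @ B                                    # doc word distributions
Q = D.T @ D
Qn = Q / Q.sum(axis=1, keepdims=True)

# Each row of Qn is a convex combination of the novel-word rows, so the
# polytope's vertices can be found by successive projection: repeatedly
# take the row farthest from the span of the rows chosen so far.
anchors, Res = [], Qn.copy()
for _ in range(k):
    j = int(np.argmax(np.linalg.norm(Res, axis=1)))
    anchors.append(j)
    u = Res[j] / np.linalg.norm(Res[j])
    Res = Res - np.outer(Res @ u, u)
```

Because each step touches only rows of the co-occurrence matrix, this style of vertex finding is easy to distribute, which hints at why separability enables highly distributable algorithms.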
Baldi, Kobsa, Mark receive Google Faculty Research Awards
Three ICS professors have received Google Faculty Research Awards as part of Google’s biannual open call for proposals in computer science, engineering and related fields. Computer science Chancellor’s Professor Pierre Baldi, informatics and computer science professor Alfred Kobsa and informatics professor Gloria Mark join several other ICS faculty who have received the award in recent years. Read more
Tomlinson, Patterson receive $400,000 NSF grant for crowdsourcing and food security project
The National Science Foundation (NSF) has awarded informatics professor Bill Tomlinson $400,000 for his project “Fostering Non-Expert Creation of Sustainable Polycultures through Crowdsourced Data Synthesis.” Associate Professor Donald Patterson and Assistant Professor of Crop Sciences at the University of Illinois Sarah Taylor Lovell serve as co-principal investigators.
The project integrates research in computing and sustainability science with the goal of enabling a new approach to sustainable food security. By combining cyber-human systems and crowdsourcing research with the science of agroecology, the project seeks to develop an understanding of how online design tools may contribute to sustainability through enhanced local food production; to use the process of populating a plant species database as an instance of a class of problems amenable to intelligent crowdsourcing; and to pioneer new knowledge in crowdsourcing optimization.
According to the project abstract, “The work will contribute to long-term food security and offer lessons, concepts, methods, and software tools that may be transferable to other sustainability challenges.”
The award is part of the Cyber-Innovation for Sustainability Science and Engineering (CyberSEES) program at NSF, and is funded through the Division of Computing and Communication Foundations (CCF), which supports research and education projects that explore the foundations of computing and communication devices and their usage. According to the CCF website, “CCF-supported projects also investigate revolutionary computing models and technologies based on emerging scientific ideas and integrate research and education activities to prepare future generations of computer science and engineering workers.”
Lee uses crowdsourcing to predict World Cup outcome
Center member and Professor of Cognitive Science Michael Lee, in partnership with Ranker, is using the wisdom of the crowd to predict the outcome of the World Cup. Lee and his collaborators developed a model that integrates multiple sources of ranking information available from participating individuals, along with bracket information, to make an overall prediction of each country’s likelihood of winning. See their blog post for more information.
Update: After the tournament, Lee and his collaborators analyzed their performance relative to other World Cup prediction models.
Smyth gives keynote talk at top artificial intelligence conference
StandardProfessor Padhraic Smyth gave an invited keynote talk at the recent AAAI Conference on Artificial Intelligence (AAAI-14). Held July 27-31 in Quebec City, Canada, the conference promotes research in artificial intelligence and scientific exchange among AI researchers, practitioners, scientists and engineers in affiliated disciplines. Read more
‘Deep learning’ makes search for exotic particles easier
UCI researchers develop computing techniques that could aid the hunt for Higgs bosons. Fully automated “deep learning” by computers greatly improves the odds of discovering particles such as the Higgs boson, beating even veteran physicists’ abilities, according to findings by UC Irvine researchers published today in the journal Nature Communications. Read more