Winter 2015

Jan 12
Bren Hall 4011
1 pm
Aditya Bhaskara
PostDoc Researcher
Graph Mining Team
Google Research NYC

Mixture models are based on the hypothesis that real data can be viewed as arising from a probabilistic generative model with a small number of parameters. Examples include topic models for documents, hidden Markov models for speech, Gaussian mixture models for point data, etc. The model parameters often give interesting semantic insights into the data.

I will first discuss algorithms for parameter estimation in mixture models via the use of tensors, or higher-dimensional arrays. The idea is to estimate, from the data, a tensor whose decomposition allows us to read off the hidden parameters. Parameter estimation is thus reduced to tensor decomposition. Unfortunately, tensor decomposition is NP-hard in general. However, I will show that there exist algorithms that “almost always” work efficiently, in the framework of smoothed analysis, as long as the “rank” is at most a polynomial in the number of dimensions.
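To make the "decompose the tensor, then read off the parameters" step concrete, here is a minimal sketch. It assumes the simplest favorable case: a third-order tensor T = Σ_i λ_i v_i ⊗ v_i ⊗ v_i whose components v_i are orthonormal (in practice an estimated moment tensor is first whitened to reach this form). It recovers the components with the tensor power method plus deflation, a standard technique swapped in for illustration; it is not the smoothed-analysis algorithm from the talk. The weights, vectors, and starting vector are made up.

```python
import math

def outer3(v):
    """Rank-one third-order tensor v ⊗ v ⊗ v, stored as nested lists."""
    return [[[a * b * c for c in v] for b in v] for a in v]

def contract(T, u):
    """Return the vector T(I, u, u): entry a is sum over b, c of T[a][b][c] * u[b] * u[c]."""
    d = len(u)
    return [sum(T[a][b][c] * u[b] * u[c] for b in range(d) for c in range(d))
            for a in range(d)]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def power_method(T, u0, iters=100):
    """Tensor power iteration u <- T(I, u, u) / ||T(I, u, u)||."""
    u = normalize(u0)
    for _ in range(iters):
        u = normalize(contract(T, u))
    weight = sum(x * y for x, y in zip(contract(T, u), u))  # T(u, u, u)
    return weight, u

def decompose(T, k, u0):
    """Recover k (weight, vector) pairs, subtracting each rank-one term once found."""
    d = len(T)
    comps = []
    for _ in range(k):
        weight, v = power_method(T, u0)
        comps.append((weight, v))
        R = outer3(v)
        T = [[[T[a][b][c] - weight * R[a][b][c] for c in range(d)]
              for b in range(d)] for a in range(d)]
    return comps

# Illustrative ground truth: orthonormal components with made-up weights.
s = 1 / math.sqrt(2)
truth = [(3.0, [s, s, 0.0]), (2.0, [s, -s, 0.0]), (1.0, [0.0, 0.0, 1.0])]
T = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
for w, v in truth:
    R = outer3(v)
    for a in range(3):
        for b in range(3):
            for c in range(3):
                T[a][b][c] += w * R[a][b][c]

comps = decompose(T, 3, [0.6, 0.3, 0.2])
for weight, v in comps:
    print(round(weight, 4), [round(x, 4) for x in v])
```

Each power step maps a component's weighted coefficient λ_i c_i to (λ_i c_i)^2 up to normalization, so the iteration converges to the component with the largest λ_i c_i. Practical implementations use several random starting vectors rather than one fixed vector.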

Next, I will consider the case in which we wish to learn mixtures in a small (constant) number of dimensions. Tensor methods do not apply in this regime, and indeed, there are lower bounds showing that exponentially many samples are needed to “identify” the parameters. However, I will show that under a slightly relaxed objective, we can obtain “PAC”-style learning algorithms. This follows from a more general theorem about sparse recovery with ℓ1 error, which I will describe.

Jan 26
Bren Hall 4011
1 pm
Rich Caruana
Senior Researcher
Microsoft Research in Redmond

Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. By using a method called model compression, we show that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously achievable only with deep models, while using the same number of parameters as the original deep models. On the TIMIT phoneme recognition and CIFAR-10 image recognition tasks, shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional architectures. The same model compression trick can also be used to compress impractically large deep models, and ensembles of large deep models, down to “medium-size” deep models that run more efficiently on servers, and down to “small” models that can run on mobile devices. In machine learning and statistics, we used to believe that one of the keys to preventing overfitting was to keep models simple and the number of parameters small to force generalization. We no longer believe this: learning appears to generalize best when training models with excess capacity, but the learned functions can often be represented with far fewer parameters. We do not yet know whether this is true only of current learning algorithms or is a fundamental property of learning in general.
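The core of model compression is that the small model is trained to reproduce the large model's outputs on a transfer set, rather than the original labels. The sketch below assumes one common formulation, regressing the teacher's logits with a squared-error loss. The teacher is a hypothetical fixed function standing in for a trained deep net, and the "student" is a deliberately tiny linear model standing in for a shallow net; the transfer-set size, learning rate, and step count are illustrative assumptions.

```python
import math
import random

def teacher_logit(x):
    """Hypothetical stand-in for a trained deep teacher: maps an input to a logit."""
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 + 0.3 * math.tanh(x[0] * x[1])

def student_logit(w, x):
    """Tiny student: bias plus one weight per input feature."""
    return w[0] + w[1] * x[0] + w[2] * x[1]

def mse(w, xs, targets):
    """Mean squared error between student logits and teacher logits."""
    return sum((student_logit(w, x) - z) ** 2 for x, z in zip(xs, targets)) / len(xs)

def train_student(xs, targets, lr=0.1, steps=500):
    """Full-batch gradient descent on the logit-matching (L2) loss."""
    w = [0.0, 0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, z in zip(xs, targets):
            err = student_logit(w, x) - z
            grad[0] += 2 * err / n
            grad[1] += 2 * err * x[0] / n
            grad[2] += 2 * err * x[1] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

rng = random.Random(0)
# Transfer set: unlabeled inputs; the teacher supplies the regression targets.
xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
targets = [teacher_logit(x) for x in xs]

initial = mse([0.0, 0.0, 0.0], xs, targets)
w = train_student(xs, targets)
final = mse(w, xs, targets)
print(f"logit MSE before training: {initial:.4f}, after: {final:.4f}")
```

Matching logits rather than hard labels passes along the teacher's relative confidence across inputs, which is one plausible reason a smaller model can learn the teacher's function more easily than it could from the raw labels.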
Feb 2
Bren Hall 4011
1 pm
Peter Sadowski
Graduate Student
Department of Computer Science
UC Irvine

Machine learning plays a major role in analyzing data from the Large Hadron Collider at CERN, and was used in the 2012 discovery of the Higgs boson. We demonstrate that deep learning increases the statistical power of this analysis, and that deep neural networks can automatically learn high-level features that usually need to be engineered by physicists. Furthermore, we describe how to tune deep neural network hyperparameters automatically (and cheaply) using Amazon EC2 GPU servers and free software tools.
Feb 5
Bren Hall 4011
1 pm
Joshua Blumenstock
Assistant Professor
Information School
University of Washington

In recent years, the rapid proliferation of mobile phones in developing countries has provided billions of individuals with novel opportunities for social and economic interaction. Concurrently, the data generated by mobile phone networks is enabling new data-intensive methods for studying the social and economic behavior of individuals in resource-constrained environments. After all, these data reflect much more than simple communications activity: they capture the structure of social networks, decisions about expenditures and consumption, patterns of travel and mobility, and the regularity of daily routines. In this talk, I will discuss the results from two recent projects that derive behavioral insights from mobile phone data. The first study uses data on Mobile Money transfers in Rwanda and microeconomic models to better understand the motives that cause people to send money to friends and family in times of need. The second project combines call data with follow-up phone surveys to investigate the extent to which it is possible to predict an individual’s wealth and happiness from his or her prior history of phone calls, using several supervised learning models. These projects are enabled by generous support from the Institute for Money, Technology, and Financial Inclusion; Intel; the Gates Foundation; and the NSF.
Feb 9
Bren Hall 4011
1 pm
Shimon Whiteson
Associate Professor
Intelligent Autonomous Systems Group, Informatics Institute
University of Amsterdam

In this talk, I will propose a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound (UCB) algorithm to the relative setting: it uses estimates of the pairwise win probabilities to select a promising arm, then applies UCB with that arm as the benchmark. We prove a sharp finite-time regret bound of order O(K log T) on a very general class of dueling bandit problems, matching a lower bound proven by Yue et al. In addition, our empirical results using real data from an information retrieval application show that the method greatly outperforms the state of the art.
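The two-step selection described above can be sketched as follows, in the spirit of Relative UCB: compute an optimistic estimate U[i][j] of the probability that arm i beats arm j, pick a candidate arm that optimistically beats or ties every other arm, then duel it against the arm with the highest optimistic chance of beating it. This is a simplified sketch, not the authors' implementation; the published algorithm has additional rules for choosing among several candidates, which are omitted here. The preference matrix P, the exploration parameter alpha = 0.51, the optimistic value 1 for never-compared pairs, the horizon, and the seed are all assumptions of this example.

```python
import math
import random

def rucb(P, T, alpha=0.51, seed=0):
    """Simplified relative-UCB loop. P[i][j] is the (unknown) probability that arm i beats arm j.
    Returns how often each arm was selected as the candidate."""
    rng = random.Random(seed)
    K = len(P)
    W = [[0] * K for _ in range(K)]  # W[i][j]: number of duels arm i has won against arm j
    counts = [0] * K
    for t in range(1, T + 1):
        # Optimistic estimates; an arm compared with itself counts as a tie (0.5).
        U = [[0.5] * K for _ in range(K)]
        for i in range(K):
            for j in range(K):
                if i == j:
                    continue
                n = W[i][j] + W[j][i]
                # Pairs never compared get the optimistic value 1 (assumption of this sketch).
                U[i][j] = 1.0 if n == 0 else W[i][j] / n + math.sqrt(alpha * math.log(t) / n)
        # Candidates: arms that optimistically beat or tie every other arm.
        C = [c for c in range(K) if all(U[c][j] >= 0.5 for j in range(K))]
        c = rng.choice(C) if C else rng.randrange(K)
        # Benchmark step: the opponent with the best optimistic chance of beating c.
        # Since U[c][c] = 0.5, d == c when no other arm still looks competitive.
        d = max(range(K), key=lambda j: U[j][c])
        counts[c] += 1
        if c != d:
            if rng.random() < P[c][d]:
                W[c][d] += 1
            else:
                W[d][c] += 1
    return counts

# Hypothetical preference matrix: arm 0 beats every other arm with probability > 0.5.
P = [[0.5, 0.8, 0.9],
     [0.2, 0.5, 0.7],
     [0.1, 0.3, 0.5]]
counts = rucb(P, 2000)
print(counts)
```

When the candidate has no plausible challenger, the sketch compares it with itself and learns nothing new, which costs no regret; exploration resumes only once the log t term in the confidence bound lets another arm look competitive again.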
Feb 23
Bren Hall 4011
1 pm
Cylance
Glenn Chisholm, Chief Technology Officer, Cylance
Matt Wolff, Chief Data Scientist, Cylance
Michael Wojnowicz, Data Scientist, Cylance

Traditional approaches to detecting malware, namely those used by current antivirus methodologies, are increasingly compromised by sophisticated attackers who may have financial, social, or nationalistic motives. In just the last few weeks, there have been successful hacking attacks against Sony and Anthem, as well as a banking attack by the Carbanak group that stole over $1 billion from various financial institutions. Traditional antivirus approaches rely on manual analysis of files to identify malware; however, these techniques simply do not scale to the volume of malware that now exists. At the time of this writing, an estimated million new, distinct suspicious files are generated per day. Clearly, manual inspection of all these files by human analysts is not feasible.

At Cylance, we have developed a machine learning engine to help reduce or remove the need for manual analysis. In this talk, we will dive into various components of our machine learning infrastructure, with the goal of providing some insight into how one can apply machine learning to industry problems and large datasets. A machine learning approach to cybersecurity presents a wide array of interesting challenges in feature extraction, feature engineering, dimensionality reduction, and modeling. Specific topics will include designing models with speed in mind, why different optimization methods do or do not perform well on various types of data, and feature engineering based on wavelet analysis and entropy series.
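As a rough illustration of what an entropy series is, the sketch below splits a file's bytes into fixed-size chunks, computes the Shannon entropy (in bits per byte, from 0 to 8) of each chunk, and then takes one level of Haar wavelet detail coefficients, which respond to abrupt entropy changes such as a jump into packed or encrypted content. The chunk size of 256 bytes, the single Haar level, and the synthetic input are assumptions for illustration; this is not a description of Cylance's actual feature pipeline.

```python
import math
from collections import Counter

def chunk_entropy(chunk):
    """Shannon entropy of one chunk of bytes, in bits per byte (0 to 8)."""
    n = len(chunk)
    return sum((k / n) * math.log2(n / k) for k in Counter(chunk).values())

def entropy_series(data, chunk_size=256):
    """Entropy of each full, non-overlapping chunk; a trailing partial chunk is dropped."""
    return [chunk_entropy(data[i:i + chunk_size])
            for i in range(0, len(data) - chunk_size + 1, chunk_size)]

def haar_detail(series):
    """One level of the orthonormal Haar transform: detail coefficients of adjacent pairs."""
    return [(series[i] - series[i + 1]) / math.sqrt(2)
            for i in range(0, len(series) - 1, 2)]

# Synthetic "file": alternating constant regions and maximally varied regions.
data = (bytes(256) + bytes(range(256))) * 2
series = entropy_series(data)
details = haar_detail(series)
print(series)
print(details)
```

A constant chunk has entropy 0 and a chunk containing every byte value exactly once has entropy 8, so the sharp alternation in this toy input shows up as large-magnitude detail coefficients; features for a classifier could then be summaries such as the energy at each wavelet level.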

Mar 9
Bren Hall 4011
1 pm
Fan-Gang Zeng
Professor of Anatomy and Neurobiology, Biomedical Engineering and Cognitive Sciences, UC Irvine
Director, Center for Hearing Research
Director of Research, Otolaryngology – Head and Neck Surgery, UC Irvine

Deafness affects not only speech communication but also language development, including speaking and reading. The bionic ear, or cochlear implant, is a modern medical device that represents the first successful restoration of a human sense. It works well both for adults who lost their hearing postlingually and for children who receive the device prelingually. It does not work well for people who never heard during development and receive an implant only in adulthood. My talk will address two issues. First, I will describe the development of the modern cochlear implant from a signal-processing perspective, which clearly abides by the “KISS” principle. Second, I will emphasize the importance of the brain in cochlear implant success, speculating on the structure of the neural network and the rules by which it learns to process speech.