Southern California Machine Learning Symposium

Standard

On Friday, May 20 2016, UC Irvine will host the Southern California Machine Learning Symposium at CalIT2. The event is co-sponsored by CML, IGB, and the UCI Data Science Initiative. Speakers include:

  • Sanjoy Dasgupta, UC San Diego
  • Michael Jordan, UC Berkeley
  • Tomaso Poggio, MIT
  • Vladimir Vapnik, Columbia University & Facebook
  • Anima Anandkumar, UC Irvine
  • Fei Sha, UCLA
  • Kevin Murphy, Google
  • Pietro Perona, Caltech
  • Pierre Baldi, UC Irvine

You can register online at: www.bit.ly/ML2016.

For additional information, see the event poster.

Anandkumar receives Google Research Award

Standard

Center member Anima Anandkumar was awarded a Google Research award grant for Fall 2015. Google Research Awards are an “annual open call for proposals on computer science and related topics including machine learning, speech recognition, natural language processing, and computational neuroscience.” Awards are highly competitive, with 151 projects funded of 950 proposals. Grants include funding for a graduate student, and provide opportunities for students and faculty to collaborate directly with researchers at Google. For more information, see the Google Research Blog.

Winter 2016

Standard

Jan 11
Bren Hall 4011
1 pm
Padhraic Smyth
Professor
Department of Computer Science
University of California, Irvine

Social network analysis has a long and successful history in the social sciences, often with a focus on relatively small survey-based data sets. In the past decade, driven by the ease of automatically collecting large-scale network data sets, there has been significant interest in developing new statistical and machine learning techniques for network analysis. In this talk we will focus on two general modeling themes in this context: the use of latent variables for low-dimensional vector-based network representations models and event-based models for temporal network data. We will review the representational capabilities of these models from a generative perspective, discuss some of the challenges of parameter estimation that arise, and emphasize the role of predictive evaluation. The talk will conclude with a brief discussion of future directions in this general area.

Based on joint work with Zach Butler, Chris DuBois, Jimmy Foulds, and Carter Butts

Jan 18
No Seminar (MLK Day)

Jan 25
Bren Hall 4011
1 pm
James Foulds
Postdoctoral Fellow
Department of Computer Science
University of California, San Diego

Topic models have become increasingly prominent text-analytic machine learning tools for research in the social sciences and the humanities. In particular, custom topic models can be developed to answer specific research questions. The design of these models requires a nontrivial amount of effort and expertise, motivating general-purpose topic modeling frameworks. In this talk I will introduce latent topic networks, a flexible class of richly structured topic models designed to facilitate applied research. Custom models can straightforwardly be developed in this framework with an intuitive first-order logical probabilistic programming language. Latent topic networks admit scalable training via a parallelizable EM algorithm which leverages ADMM in the M-step. I demonstrate the broad applicability of the models with case studies on modeling influence in citation networks, and U.S. Presidential State of the Union addresses. This talk is based on joint work with Lise Getoor and Shachi Kumar from the University of California, Santa Cruz, published at ICML 2015.
Feb 1
Bren Hall 4011
1 pm
Furong Huang
PhD Candidate
Department of Electrical Engineering
University of California, Irvine

Latent or hidden variable models have applications in almost every domain, e.g., social network analysis, natural language processing, computer vision and computational biology. Training latent variable models is challenging due to non-convexity of the likelihood objective function. An alternative method is based on the spectral decomposition of low order moment matrices and tensors. This versatile framework is guaranteed to estimate the correct model consistently. I will discuss my results on convergence to globally optimal solution for stochastic gradient descent, despite non-convexity of the objective. I will then discuss large-scale implementations (which are highly parallel and scalable) of spectral methods, carried out on CPU/GPU and Spark platforms. We obtain a gain in both accuracies and in running times by several orders of magnitude compared to the state-of-art variational methods. I will discuss the following applications in detail: (1) learning hidden user commonalities (communities) in social networks, and (2) learning sentence embeddings for paraphrase detection using convolutional models. More generally, I have applied the methods to a variety of problems such as text and social network analysis, healthcare analytics, and cataloging neuronal cell types in neuroscience.
Feb 8
Bren Hall 4011
1 pm
Majid Janzamin
PhD Candidate
Department of Electrical Engineering
University of California, Irvine

Optimization lies at the core of machine learning. However, most machine learning problems entail non-convex optimization. In this talk, I will show how spectral and tensor methods can yield guaranteed convergence to globally optimal solutions under transparent conditions for a range of machine learning problems.

In the first part, I will explain how tensor methods are useful for learning latent variable models in an unsupervised manner. The focus of my work is on overcomplete regime where the hidden dimension is larger than the observed dimensionality. I describe how tensor methods enable us to learn these models in the overcomplete regime with theoretical guarantees in recovering the parameters of the model. I also provide efficient sample complexity results for training these models. Next, I will describe a new method for training neural networks for which we provide theoretical guarantees on the performance of the algorithm. We have developed a computationally efficient algorithm for training a two-layer neural network using method-of-moment and tensor decomposition techniques.

Feb 10
Bren Hall 3011
3 pm
Yining Wang
PhD Student
Machine Learning Department
CMU

I will discuss subsampling and sketching with their applications and analysis in machine learning. They can be viewed not only as tools to improve computational and storage efficiency of existing learning algorithms, but also as settings that characterize data measurement/availability/privacy constraints in modern machine learning applications. In this talk I will introduce my recent work, which analyze subsampling and sketching settings in three popular machine learning algorithms: tensor factorization, subspace clustering and linear regression.
Feb 15
No Seminar (Presidents Day)

Feb 22
Bren Hall 4011
1 pm
Julian McAuley
Assistant Professor
Computer Science & Engineering
UC San Diego

Understanding the semantics of preferences and behavior is incredibly complicated, especially in settings where the visual appearance of items influences our decisions. Three challenges that I’ll discuss in this talk include (1) how can we uncover the semantics of visual preferences, especially in sparse or long-tailed data, where new items are constantly introduced? (2) How can we use visual data to understand the relationships between items, and in particular what makes two items “visually compatible”? And (3) how can we understand the temporal dynamics of visual preferences, in order to uncover how “fashions” have evolved over time?
Feb 29
No Seminar (Cancelled)

Mar 7
Bren Hall 4011
1 pm
William Lam
PhD Candidate
Department of Computer Science
University of California, Irvine

We investigate the potential of look-ahead in the context of AND/OR search in graphical models using the mini-bucket heuristic for combinatorial optimization tasks (e.g. MAP/MPE or weighted CSPs.) We present and analyze the complexity of computing the residual (a.k.a. Bellman update) of the mini-bucket heuristic, which we call “bucket errors” and show how this can be used to identify which parts of the search space are more likely to benefit from look-ahead, therefore facilitating a method to bound its overhead. We also rephrase the look-ahead computation as a graphical model to make use of structure exploiting inference schemes. In our empirical results, we demonstrate that our methods can be used to cost-effectively increase the power of branch-and-bound search.

In the second part of the talk, we show how bucket errors can be used to improve the performance of AND/OR best-first search algorithms for providing lower bounds on the min-sum problem. In our preliminary experiments, we show that when expanding nodes for the AO* algorithm, using bucket errors as a subproblem ordering heuristic can allow us to expand fewer nodes to arrive at the optimal solution compared to the existing ordering approach.

Fall 2015

Standard

Sep 16
Bren Hall 4011
1 pm
Hanie Sedghi
Graduate Student
Department of Electrical Engineering
University of Southern California

Learning with big data is a challenging task that requires smart and efficient methods to extract useful information from data. Optimization methods, both convex and nonconvex are promising approaches to do this. In this talk I will review two classes of my work on prominent problems in convex and nonconvex optimization.

Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Method: Neural networks provide a versatile tool for approximating functions of various inputs. Despite exciting achievements in application, a theoretical understanding of them is mostly lacking. Training a neural network is a highly nonconvex problem and backpropagation can get stuck in local optima. For the first time, we have a computationally efficient method for training neural networks that also has guaranteed generalization. This is part of our recently proposed general framework based on method-of-moments and tensor decomposition to efficiently learn different models such as neural networks and mixture of classifiers.

Breaking Curse of Dimensionality: Stochastic Optimization in high dimensions: We have designed an efficient stochastic optimization method based on ADMM that is fast and cheap to implement, can be performed in parallel and can be used for any regularized optimization framework with some mild assumptions. We have proved that our algorithm obtains minimax optimal convergence rates for sparse optimization and robust PCA framework. Experiment results show that in the aforementioned scenarios, our method outperforms state-of-the-art, i.e., yields smaller error with equal time.

Oct 5
Bren Hall 4011
1 pm
Gokcan Karakus
Graduate Student
Department of Civil Engineering
Caltech

We are proposing an algorithm to test the accuracy of the predictions by earthquake early warning systems. Most warning systems predict the location and the magnitude of an ongoing earthquake via the early-arriving seismic wave data. Our algorithm uses logarithm of ratios between observed ground motion envelopes and Virtual Seismologist’s (Cua G. and Heaton T.) predicted envelopes to assess the validity of system predictions. We quantify the uncertainty attached to our parameters using Bayesian probability approach.
Oct 12
Bren Hall 4011
1 pm
Alexander Ihler
Associate Professor
Department of Computer Science
University of California, Irvine

Importance sampling (IS) and its variant, annealed IS (AIS) have been widely used for estimating the partition function in graphical models, such as Markov random fields and deep generative models. However, IS tends to underestimate the partition function and is subject to high variance when the proposal distribution is more peaked than the target distribution. On the other hand, “reverse” versions of IS and AIS tend to overestimate the partition function, and degenerate when the target distribution is more peaked than the proposal distribution. We present a simple, general method that gives much more reliable and robust estimates than either IS (AIS) or reverse IS (AIS). Our method works by converting the estimation problem into a simple classification problem that discriminates between the samples drawn from the target and the proposal. We give both theoretical and empirical justification, and show that an annealed version of our method significantly outperforms both AIS and reverse AIS (Burda et al., 2015), which has been the state-of-the-art for likelihood evaluation in deep generative models. Joint work with Qiang Liu, Jian Peng, and John Fisher.
Oct 19
Bren Hall 4011
1 pm
Zhiying Wang
Assistant Professor
Department of Electrical Engineering
University of California, Irvine

In this talk, we propose the multi-version coding problem for distributed storage. We consider a setting where there are n servers that aim to store v versions of a message, and there is a total ordering on the versions from the earliest to the latest. We assume that each message version has a given number of bits. Each server can receive any subset of the v versions and stores a function of the message versions it receives. The multi-version code we consider ensures that, a decoder that connects to any c out of the n servers can recover the message corresponding to the latest common version stored among those servers, or a message corresponding to a version that is later than the latest common version. We describe a simple and explicit achievable scheme, as well as an information-theoretic converse. Moreover, we apply the multi-version code to one of the problems in distributed algorithms – the emulation of atomic shared memory in a message-passing network – and improve upon previous algorithms up to a half in terms of storage cost.
Oct 26
Bren Hall 4011
1 pm
Soheil Feizi
Graduate Student
CSAIL
MIT

Network models provide a unifying framework for understanding dependencies among variables in medical, biological, and other sciences. Networks can be used to reveal underlying data structures, infer functional modules, and facilitate experiment design. In practice, however, size, uncertainty and complexity of the underlying associations render these applications challenging.

In this talk, we illustrate the use of spectral, combinatorial, and statistical inference techniques in several significant network science problems. First, we consider the problem of network alignment where the goal is to find a bijective mapping between nodes of two networks to maximize their overlapping edges while minimizing mismatches. To solve this combinatorial problem, we present a new scalable spectral algorithm, and establish its efficiency theoretically and experimentally over several synthetic and real networks. Next, we introduce network maximal correlation (NMC) as an essential measure to capture nonlinear associations in networks. We characterize NMC using geometric properties of Hilbert spaces and illustrate its application in learning network topology when variables have unknown nonlinear dependencies. Finally, we discuss the problem of learning low dimensional structures (such as clusters) in large networks, where we introduce logistic Random Dot Product Graphs, a new class of networks which includes most stochastic block models as well as other low dimensional structures. Using this model, we propose a spectral network clustering algorithm that possesses robust performance under different clustering setups. In all of these problems, we examine underlying fundamental limits and present efficient algorithms for solving them. We also highlight applications of the proposed algorithms to data-driven problems such as functional and regulatory genomics of human diseases, and cancer.

Bio: Soheil Feizi is a PhD candidate at Massachusetts Institute of Technology (MIT), co-supervised by Prof. Muriel Médard and Prof. Manolis Kellis. His research interests include analysis of complex networks and the development of inference and learning methods based on Optimization, Information Theory, Machine Learning, Statistics, and Probability, with applications in Computational Biology, and beyond. He completed his B.Sc. at Sharif University of Technology, awarded as the best student of his class. He received the Jacobs Presidential Fellowship and EECS Great Educators Fellowship, both from MIT. He has been a finalist in the Qualcomm Innovation contest. He received an Ernst Guillemin Award for his Master of Science Thesis in the department of Electrical Engineering and Computer Science at MIT.

Nov 2
Bren Hall 4011
1 pm
Surya Ganguli
Assistant Professor
Department of Applied Physics
Stanford University

Neuronal networks have enjoyed a resurgence both in the worlds of neuroscience, where they yield mathematical frameworks for thinking about complex neural datasets, and in machine learning, where they achieve state of the art results on a variety of tasks, including machine vision, speech recognition, and language translation. Despite their empirical success, a mathematical theory of how deep neural circuits, with many layers of cascaded nonlinearities, learn and compute remains elusive. We will discuss three recent vignettes in which ideas from statistical physics can shed light on this issue. In particular, we show how dynamical criticality can help in neural learning, how the non-intuitive geometry of high dimensional error landscapes can be exploited to speed up learning, and how modern ideas from non-equilibrium statistical physics, like the Jarzynski equality, can be extended to yield powerful algorithms for modeling complex probability distributions. Time permitting, we will also discuss the relationship between neural network learning dynamics and the developmental time course of semantic concepts in infants.
Nov 9
Bren Hall 4011
1 pm
Javier Larrosa
Professor
Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya

Weighted Max-SAT is an extension of SAT in which each clause has an associated cost. The goal is to minimize the cost of falsified clauses. Max-SAT has been successfully applied to a number of domains including Bioinformatics, Telecommunications and Scheduling.

In this talk I will introduce the Max-SAT framework and discuss the main solving approaches. In particular, I will present Max-resolution and will show how it can be effectively used in the context of Depth-first Branch-and-Bound.

Nov 16
Bren Hall 4011
1 pm
Golnaz Ghiasi
Graduate Student
Department of Computer Science
University of California, Irvine

Occlusion poses a significant difficulty for detecting and localizing object keypoints and subsequent fine-grained identification. In this talk, I will describe a hierarchical deformable part model for face detection and keypoint localization that explicitly models part occlusion. The proposed model structure makes it possible to augment positive training data with large numbers of synthetically occluded instances. This allows us to easily incorporate the statistics of occlusion patterns in a discriminatively trained model. However, this model does not exploit bottom-up cues such as detection of occluding contours and image segments. I will talk about how to modify the proposed model to utilize bottom-up class-specific segmentation in order to jointly detect and segment out the foreground pixels belonging to the face.
Nov 23
Thankgiving week
(no seminar)

Nov 30
Bren Hall 4011
1 pm
Dimitrios Kotzias
Graduate Student
Department of Computer Science
University of California, Irvine

In many classification problems labels are relatively scarce. One context in which this occurs is where we have labels for groups of instances but not for the instances themselves, as in multi-instance learning. Past work on this problem has typically focused on learning classifiers to make predictions at the group level. In this paper we focus on the problem of learning classifiers to make predictions at the instance level. To achieve this we propose a new objective function that encourages smoothness of inferred instance-level labels based on instance-level similarity, while at the same time respecting group-level label constraints. We apply this approach to the problem of predicting labels for sentences given labels for reviews, using a convolutional neural network to infer sentence similarity. The approach is evaluated using three large review data sets from IMDB, Yelp, and Amazon, and we demonstrate the proposed approach is both accurate and scalable compared to various alternatives.
Dec 7
Finals week
(no seminar)

Stern to help lead national effort to improve criminal evidence analysis, cut wrongful convictions

Standard

Center member Hal Stern, dean of the Donald Bren School of Information & Computer Sciences and professor of statistics, will help lead a new national Forensic Science Center of Excellence. Aimed at improving criminal evidence analysis and reducing wrongful convictions, it will be funded by a five-year, $20 million grant from the National Institute of Standards & Technology. The campus will receive about $4 million, to be used by ICS and social ecology faculty and students. “UC Irvine is honored to be a part of this. There is a critical need to advance the scientific underpinnings for the analysis of forensic evidence – including fingerprints, firearms, marks left by tools, and documents – and to ensure that participants in the law enforcement process have a strong understanding of proper analyses and interpretation,” said Stern, who is principal investigator for UCI. The center, headquartered at Iowa State University, also will partner with Carnegie Mellon University and the University of Virginia. It will incorporate both a research agenda – developing new probabilistic methods and statistical tools – and education to ensure that judges, lawyers and investigators can effectively utilize the results of forensic analyses. For more information, see here.

Spring 2015

Standard

Mar 30
Bren Hall 4011
1 pm
Pierre Baldi
Chancellor’s Professor
Department of Computer Science
UC Irvine

In a physical neural system, where storage and processing are intertwined, the rules for adjusting the synaptic weights can only depend on variables that are available locally, such as the activity of the pre- and post-synaptic neurons. We propose a systematic framework to define and study the space of local learning rules where one must first define the nature of the local variables, and then the functional form that ties them together into a learning rule. We consider polynomial learning rules and analyze their behavior and capabilities in both linear and non-linear networks. As a byproduct, we also show how this framework enables the discovery of new learning rules and important relationships between learning rules and group symmetries.

Stacking local learning rules in deep feedforward networks leads to deep local learning. While deep local learning can learn interesting representations, we show however that it cannot learn complex input-output functions, even when targets are available for the top layer. Learning complex input-output functions requires local deep learning where target information is propagated to the deep layers. The complexity of the propagated information about the targets and the channel through which this information is propagated partition the space of learning algorithms and highlight the remarkable power of the backpropagation algorithm. The theory clarifies the concept of Hebbian learning, what is learnable by Hebbian learning, and explains the sparsity of the space of learning rules discovered so far.

Apr 6
Bren Hall 4011
1 pm
Maryam M. Shanechi
Assistant Professor
Department of Electrical Engineering and Computer Science
University of Southern California

A brain-machine-interface (BMI) is a system that interacts with the brain either to allow the brain to control an external device or to control the brain’s state. While these two BMI types are for different applications, they can both be viewed as closed-loop control systems. In this talk, I present our work on developing both these types of BMIs, specifically motor BMIs for restoring movement in paralyzed patients and a new BMI for control of the brain state under anesthesia. Motor BMIs have largely used standard signal processing techniques. However, devising novel algorithmic solutions that are tailored to the neural system can significantly improve the performance of these BMIs. Here, I develop a novel BMI paradigm for restoration of motor function that incorporates an optimal feedback-control model of the brain and directly processes the spiking activity using point process modeling. I show that this paradigm significantly outperforms the state-of-the-art in closed-loop primate experiments. In addition to motor BMIs, I construct a new BMI that controls the state of the brain under anesthesia. This is done by designing stochastic controllers that infer the brain’s anesthetic state from non-invasive observations of neural activity and control the real-time rate of drug administration to achieve a target brain state. I show the reliable performance of this BMI in rodent experiments.

Bio:

Maryam Shanechi is an assistant professor in the Ming Hsieh Department of Electrical Engineering at the University of Southern California (USC). Prior to joining USC, she was an assistant professor in the School of Electrical and Computer Engineering at Cornell University. She received the B.A.Sc. degree in Engineering Science from the University of Toronto in 2004 and the S.M. and Ph.D. degrees in Electrical Engineering and Computer Science from MIT in 2006 and 2011, respectively. She is the recipient of the NSF CAREER Award and has been named by the MIT Technology Review as one of the world’s top 35 innovators under the age of 35 (TR35) for her work on brain-machine interfaces.

Apr 13
Bren Hall 4011
1 pm
Michael Carey
Professor
Department of Computer Science
UC Irvine

AsterixDB is a new BDMS (Big Data Management System) with a feature set that sets it apart from other Big Data platforms in today’s open source ecosystem. Its features make it well-suited to applications including web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries, a scalable runtime; partitioned, LSM-based data storage and indexing (including B+ tree, R tree, and text indexes); support for external as well as native data; a rich set of built-in types, including spatial, temporal, and textual types; support for fuzzy, spatial, and temporal queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store.

Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This talk will provide an overview of the resulting system. Time permitting, the talk will cover the system’s data model, its query language, and its basic architecture. Also included will be a summary of the current status of the project and a discussion of some of the “plug-in points” where AsterixDB can be made to interoperate with ML technologies. The talk will conclude with some thoughts on opportunities for future ML-related collaborations related to AsterixDB.

Bio:

Michael J. Carey is a Bren Professor of Information and Computer Sciences at UC Irvine. Before joining UCI in 2008, he worked at BEA Systems for seven years and led the development of BEA’s AquaLogic Data Services Platform product for virtual data integration. He also spent a dozen years teaching at the University of Wisconsin-Madison, five years at the IBM Almaden Research Center working on object-relational databases, and a year and a half at e-commerce platform startup Propel Software during the infamous 2000-2001 Internet bubble. Carey is an ACM Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD E.F. Codd Innovations Award. His current interests center around data-intensive computing and scalable data management (a.k.a. Big Data).

Apr 20
Bren Hall 4011
1 pm
Cris Cecka
Research Scientist
NVIDIA Research

N-body problems are ubiquitous with applications ranging from linear algebra to scientific computing and machine learning. N-body methods were identified as one of the original 7 dwarves or motifs of computation and are believed to be important in the next decade. These methods include FMMs, Treecodes, H-matrices, Butterfly algorithms, and geometric shattering. The relationship between these approaches is understood, but many of the demonstrated tools for developing and applying these algorithms remain ad-hoc, inaccessible, or inefficient.

We present recent developments towards a codebase that is abstracted over the primary domains of research in this field and is optimized for modern multicore systems. Core components including tree construction, tree traversal, and low-rank operators are developed independently and parallelized for multicore CPUs and GPUs. Applications include dense problems in machine learning and computational geometry (k-nearest neighbors, range search, kernel density estimation, Gaussian processes, and RBF kernels), treecode and fast multipole methods in computational physics (gravitational potentials, screened Coulomb interactions, Stokes flow, and Helmholtz equations), and matrix compression, computation, and inversion (PLR, HODLR, H2, and Butterfly).

In this presentation, we will review a high-level perspective of the research domain, the abstraction and parallelization strategies, and how these methods can be made more practical.

Bio:

Cris received his PhD from Stanford University in Computational and Mathematical Engineering in 2011. As a lecturer and research scientist with the new Institute for Applied Computational Science at Harvard University, he developed core courses on parallel computing and robust software development for scientific computing. In 2014, Cris joined the Mathematics Department at the Massachusetts Institute of Technology as a research associate where he focused on developing and applying generalized N-body methods to dense linear algebra using hierarchical methods. Currently, he works in NVIDIA Research to continue to make these techniques accessible with modern parallel programming models. You can read more about his research on his Harvard web page.

Apr 27
Cancelled
(no seminar)

May 4
Bren Hall 4011
1 pm
Roi Weiss
PhD student
Department of Computer Science
Ben Gurion University of the Negev

Hidden Markov models (HMMs) are a standard tool in the modeling and analysis of time series with a wide variety of applications. Yet, learning their parameters remain a challenging problem. In the first part of the talk I will present a novel approach to learning an HMM whose outputs are distributed according to a parametric family. This is done by decoupling the learning task into two steps: first estimating the output parameters, and then estimating the hidden states transition probabilities. The first step is accomplished by fitting a mixture model to the output stationary distribution. Given the parameters of this mixture model, the second step is formulated as the solution of an easily solvable convex quadratic program. We provide an error analysis for the estimated transition probabilities and show they are robust to small perturbations in the estimates of the mixture parameters.

The above approach (and other recently proposed spectral/tensor methods) strongly depends on the assumption that all states have different output distributions. In various applications, however, some of the hidden states are aliased, having identical output distributions. The minimality, identifiability and learnability of such aliased HMMs have been long standing problems, with only partial solutions provided thus far. In the second part of the talk, as a first step, I will focus on parametric-output HMMs that have exactly two aliased states. For this class, we present a complete characterization of their minimality and identifiability. Furthermore, we derive computationally efficient and statistically consistent algorithms to detect the presence of aliasing and learn the aliased HMM transition parameters. We illustrate our theoretical analysis by several simulations.

A joint work with Boaz Nadler and Aryeh Kontorovich.

May 11
Bren Hall 4011
1 pm
Ananda Theertha Suresh
PhD student
Department of Electrical Engineering
UC San Diego

Many statistical and machine-learning applications call for estimating Gaussian mixtures using a limited number of samples and computational time. PAC (proper) learning estimates a distribution in a class by some distribution in the same class to a desired accuracy. Using spectral projections we show that spherical Gaussian mixtures in d-dimensions can be PAC learned with O*(d) samples, and that the same applies for learning the distribution’s parameters. Our algorithm is information theoretically near-optimal and significantly improves previously known time and sample complexities.
May 18
Bren Hall 4011
1 pm
Saeed Saremi
Postdotoral Fellow
The Computational Neurobiology Laboratory
Salk Institute

Natural images are scale invariant with structures at all length scales. After a tutorial on critical phenomena and percolation theory, I will talk about formulating a geometric view of scale invariance. In this model, the scale invariance of natural images is understood as a second-order percolation phase transition. It is further quantified by fractal dimensions, and by the scale-free distribution of clusters in natural images. This formulation leads to a method for identifying clusters in images and a starting point for image segmentation.

Bio:

Saeed Saremi received the Ph.D. degree in theoretical physics from MIT. He then joined the lab of Terry Sejnowski at the Salk Institute as a postdoctoral fellow. His research blends machine learning, statistical mechanics, and computational neuroscience, with the long-term goal of understanding the principles for achieving artificial intelligence.

June 1
Bren Hall 4011
1 pm
Leandro Soriano Marcolino
PhD student
Viterbi School of Engineering
University of Southern California

Teams of voting agents have great potential in finding optimal solutions, and they have been used in many important domains, such as: machine learning, crowdsourcing, forecasting systems, and even board games. Voting is popular since it is highly parallelizable, easy to implement and provide theoretical guarantees. However, there are three fundamental challenges: (i) Selecting a limited number of agents to compose a team; (ii) Combining the opinions of the team members; (iii) Assessing the performance of a given team. In this talk, I address all these challenges, showing both theoretical and experimental results. I explore three different domains: Computer Go, HIV prevention via influencing social networks and architectural design.

Bio:

Leandro Soriano Marcolino is a PhD student at University of Southern California (USC), advised by Milind Tambe. He has published in several prestigious conferences in AI, robotics and machine learning, such as AAAI, AAMAS, IJCAI, NIPS, ICRA and IROS. He received the best research assistant award from the Computer Science Department at USC, had a paper nominated for best paper from the leading multi-agent conference AAMAS, and had his undergraduate work selected as the best one by the Brazilian Computer Science Society. He has been researching continuously about teamwork and cooperation, and obtained his masters degree in Japan, with the highly-competitive Monbukagakusho scholarship. Over his career, Leandro has published about a variety of domains, such as swarm robotics, computer Go, social networks, bioinformatics and architectural design.

June 8
Bren Hall 4011
1 pm
Quentin Berthet
CMI Postdoctoral Fellow
Computing + Mathematical Sciences, Annenberg Center
California Institute of Technology

Statistical estimation in many contemporary settings involves the acquisition, analysis, and aggregation of datasets from multiple sources, which can have significant differences in character and in value. Due to these variations, the effectiveness of employing a given resource – e.g., a sensing device or computing power – for gathering or processing data from a particular source depends on the nature of that source. As a result, the appropriate division and assignment of a collection of resources to a set of data sources can substantially impact the overall performance of an inferential strategy. In this expository article, we adopt a general view of the notion of a resource and its effect on the quality of a data source, and we describe a framework for the allocation of a given set of resources to a collection of sources in order to optimize a specified metric of statistical efficiency. We discuss several stylized examples involving inferential tasks such as parameter estimation and hypothesis testing based on heterogeneous data sources, in which optimal allocations can be computed either in closed form or via efficient numerical procedures based on convex optimization. Joint work with V. Chandrasekaran.

Anandkumar receives AFOSR Young Investigator Award

Standard

Center member Prof. Anima Anandkumar received a 3-year grant, Learning Mixed Membership Community Models: A Statistical and a Computational Framework, from the Air Force Office of Scientific Research, as part of its Young Investigator Research Program. The YIP is open to scientists and engineers who received their Ph.D. within the last five years, and show exceptional ability and promise for conducting basic research. Its objective is to foster creative basic research in science and engineering, enhance early career development of outstanding young investigators, and increase opportunities for the young investigators to recognize the Air Force mission and the related challenges in science and engineering.

Winter 2015

Standard

Jan 12
Bren Hall 4011
1 pm
Aditya Bhaskara
PostDoc Researcher
Graph Mining Team
Google Research NYC

Mixture models are based on the hypothesis that real data can be viewed as arising from a probabilistic generative model with a small number of parameters. Examples include topic models for documents, hidden Markov models for speech, gaussian mixture models for point data, etc. The model parameters often give interesting semantic insights into the data.

I will first discuss algorithms for parameter estimation in mixture models via the use of tensors, or higher dimensional arrays. The idea is to estimate (using the data) a tensor, whose decomposition allows us to read off the hidden parameters. Thus parameter estimation is reduced to tensor decomposition. Unfortunately, tensor decomposition is NP-hard in general. However, I will show that there exist algorithms that “almost always” work efficiently, in the framework of smoothed analysis, as long as the “rank” is at most a polynomial in the number of dimensions.

Next I will consider the case in which we wish to learn mixtures in small (constant number of) dimensions. Tensor methods do not apply to this regime, and indeed, there are lower bounds which say that exponentially many samples are needed to “identify” the parameters. However I will show that under a slightly relaxed objective, we can obtain “PAC” style learning algorithms. This follows from a more general theorem about sparse recovery with \ell_1 error, which I will describe.

Jan 26
Bren Hall 4011
1 pm
Rich Caruana
Senior Researcher
Microsoft Research in Redmond

Currently, deep neural networks are the state of the art on problems suchas speech recognition and computer vision. By using a method called model compression, we show that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models while using the same number of parameters as the original deep models. On the TIMIT phoneme recognition and CIFAR-10 image recognition tasks, shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional architectures. The same model compression trick can also be used to compress impractically large deep models and ensembles of large deep models down to “medium-size” deep models that run more efficiently on servers, and down to “small” models that can run on mobile devices. In machine learning and statistics we used to believe that one of the keys to preventing overfitting was to keep models simple and the number of parameters small to force generalization. We no longer believe this — learning appears to generalize best when training models with excess capacity, but the learned functions can often be represented with far fewer parameters. We do not yet know if this is true just of current learning algorithms, or if it is a fundamental property of learning in general.
Feb 2
Bren Hall 4011
1 pm
Peter Sadowski
Graduate Student
Department of Computer Science
UC Irvine

Machine learning plays a major role in analyzing data from the Large Hadron Collider at CERN, and was used to discover the Higgs boson in 2012. We demonstrate that deep learning increases statistical power of this analysis, and that deep neural networks can automatically learn high-level features that usually need to be engineered by physicists. Furthermore, we describe how to automatically (and cheaply) tune deep neural network hyperparameters using Amazon EC2 GPU servers and free software tools.
Feb 5
Bren Hall 4011
1 pm
Joshua Blumenstock
Assistant Professor
Information School
University of Washington

In recent years, the rapid proliferation of mobile phones in developing countries has provided billions of individuals with novel opportunities for social and economic interaction. Concurrently, the data generated by mobile phone networks is enabling new data-intensive methods for studying the social and economic behavior of individuals in resource-constrained environments. After all, these data reflect much more than simple communications activity: they capture the structure of social networks, decisions about expenditures and consumption, patterns of travel and mobility, and the regularity of daily routines. In this talk, I will discuss the results from two recent projects that derive behavioral insights from mobile phone data. The first study uses data on Mobile Money transfers in Rwanda and microeconomic models to better understand the motives that cause people to send money to friends and family in times of need. The second project combines call data with follow-up phone surveys to investigate the extent to which it is possible to predict an individual’s wealth and happiness based on his or her prior history of phone calls and several supervised learning models. These projects are enabled by generous support from the Institute for Money, Technology, and Financial Inclusion; Intel; the Gates Foundation; and the NSF.
Feb 9
Bren Hall 4011
1 pm
Shimon Whiteson
Associate Professor
Intelligent Autonomous Systems Group, Informatics Institute
University of Amsterdam

In this talk, I will propose a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise probabilities to select a promising arm and applying Upper Confidence Bound with the winner as a benchmark. We prove a sharp finite-time regret bound of order O(K log T) on a very general class of dueling bandit problems that matches a lower bound proven by Yue et al. In addition, our empirical results using real data from an information retrieval application show that it greatly outperforms the state of the art.
Feb 23
Bren Hall 4011
1 pm
Cylance
Glenn Chisholm, Chief Technology Officer, Cylance
Matt Wolff, Chief Data Scientist, Cylance
Michael Wojnowicz, Data Scientists, Cylance

Traditional approaches to detecting malware, namely those used by current antivirus methodologies, are increasingly comprised by sophisticated attackers who may have financial, social, or nationalistic motives. In just the last few weeks, there have been successful hacking attacks against Sony and Anthem, as well as a banking attack by the Carbanak group that stole over $1 billion for various financial institutions. Traditional antivirus approaches utilize manual analysis of files to identify malware; however these techniques simply do not scale to the volume of malware that now exists. At the time of this writing, an estimated million distinct newly suspicious files per day are generated. Clearly, manual inspection of all these files by human analysis is not feasible.

At Cylance, we have developed a machine learning engine to help reduce or remove the need for manual analysis. In this talk, we will dive in into various components of our machine learning infrastructure, with the goal of providing some insight into how one can apply machine learning to problems in industry and against large datasets. A machine learning approach to cybersecurity presents a wide array of interesting challenges in areas of feature extraction, feature engineering, dimensionality reduction, and modeling. Specific topics that will be discussed include designing models with speed in mind, why different optimization methods do or do not perform well against various types of data, and feature engineering based on wavelet analysis and entropy series.

Mar 9
Bren Hall 4011
1 pm
Fan-Gang Zeng
Professor of Anatomy and Neurobiology, Biomedical Engineering and Cognitive Sciences, UC Irvine
Director, Center for Hearing Research
Director of Research, Otolaryngology – Head and Neck Surgery, UC Irvine

Deafness affects not only speech communication but also language development including speaking and reading. The bionic ear or the cochlear implant is a modern medical device that allows the first successful restoration of a human sense. It works well for either adults who have lost hearing postlingually or children who receive the device prelingually. It doesn’t work well for those who never hear during development but get an implant in adulthood. My talk will address two issues. First, I will describe development of the modern cochlear implant from signal processing perspective, which clearly abides by the “KISS” principle. Second, I will emphasize the importance of the brain in the cochlear implant success, speculating on the structure and rules for the neural network to learn how to process speech.

Mjolsness named American Association for the Advancement of Science fellow

Standard

Center member and Professor of Computer Science Eric Mjolsness has been been made a fellow of the American Association for the Advancement of Science for his distinguished contributions to the fields of computer science and biology, particularly for new computational models of gene regulation (networks of genes that turn each other on, off or partly on) and resulting technologies. For more details, see here.