Jan 15

No Seminar (MLK Day)

Jan 22
Bren Hall 4011 1 pm 
Shufeng Kong
PhD Candidate
Centre for Quantum Software and Information, FEIT
University of Technology Sydney, Australia
The Simple Temporal Problem (STP) is a fundamental temporal reasoning problem and has recently been extended to the Multiagent Simple Temporal Problem (MaSTP). In this paper we present a novel approach that is based on enforcing arc-consistency (AC) on the input (multiagent) simple temporal network. We show that the AC-based approach is sufficient for solving both the STP and the MaSTP, and provide efficient algorithms for them. As our AC-based approach does not impose new constraints between agents, it does not violate the privacy of the agents and is superior to the state-of-the-art approach to the MaSTP. Empirical evaluations on diverse benchmark datasets also show that our AC-based algorithms for the STP and MaSTP are significantly more efficient than existing approaches.
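For background on the problem setting: the talk's AC-based algorithm is not reproduced here, but a standard fact about STNs is that an STP instance is consistent if and only if its distance graph has no negative cycle. A minimal sketch of that consistency check (Bellman-Ford with an implicit virtual source; variable names and the constraint encoding are illustrative choices):

```python
# Consistency check for a Simple Temporal Network (STN).
# An STP constraint  l <= x_j - x_i <= u  becomes two edges in the
# distance graph: i -> j with weight u, and j -> i with weight -l.
# The STN is consistent iff the distance graph has no negative cycle,
# which Bellman-Ford detects.

def stn_consistent(num_vars, constraints):
    """constraints: list of (i, j, l, u) meaning l <= x_j - x_i <= u."""
    edges = []
    for i, j, l, u in constraints:
        edges.append((i, j, u))    # encodes x_j - x_i <= u
        edges.append((j, i, -l))   # encodes x_i - x_j <= -l
    dist = [0.0] * num_vars        # init 0 = virtual source linked to all nodes
    for _ in range(num_vars - 1):
        for a, b, w in edges:
            if dist[a] + w < dist[b]:
                dist[b] = dist[a] + w
    # One more relaxation pass: any further improvement implies a negative cycle.
    return all(dist[a] + w >= dist[b] for a, b, w in edges)
```

Enforcing tighter levels of consistency on the same network (as the AC-based approach does) additionally yields the feasible bounds on each time-point difference, not just a yes/no answer.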
Jan 29
Bren Hall 4011 1 pm 
Postdoctoral Scholar
Paul Allen School of Computer Science and Engineering
University of Washington
Deep learning is one of the most important techniques used in natural language processing (NLP). A central question in deep learning for NLP is how to design a neural network that can fully utilize the information from training data and make accurate predictions. A key to solving this problem is to design a better network architecture.
In this talk, I will present two examples from my work on how structural information from natural language helps design better neural network models. The first example shows that adding coreference structures of entities not only helps different aspects of text modeling but also improves the performance of language generation; the second example demonstrates that structures organizing sentences into coherent texts can help neural networks build better representations for various text classification tasks. Along the lines of this topic, I will also propose a few ideas for future work and discuss some potential challenges.
February 5

No Seminar (AAAI)

February 12
Bren Hall 4011 1 pm 
PhD Candidate
Computer Science
University of California, Irvine
Bayesian inference for complex models (the kinds needed to solve tasks such as object recognition) is inherently intractable, requiring analytically difficult integrals to be solved in high dimensions. One solution is to turn to variational inference (VI): a parametrized family of distributions is proposed, and optimization is carried out to find the member of the family nearest to the true posterior. There is an innate tradeoff within VI between expressive and tractable approximations. We wish the variational family to be as rich as possible so that it might include the true posterior (or something very close to it), but adding structure to the approximation increases the computational complexity of optimization. As a result, there has been much interest in efficient optimization strategies for mixture model approximations. In this talk, I'll return to the problem of using mixture models for VI. First, to motivate our approach, I'll discuss the distinction between averaging and combining variational models. We show that optimization objectives aimed at fitting mixtures (i.e. model combination) are, in practice, relaxed into performing something between model combination and averaging. Our primary contribution is a novel training algorithm for variational model averaging, obtained by adapting Stein variational gradient descent to operate on the parameters of the approximating distribution. Then, through a particular choice of kernel, we show the algorithm can be adapted to perform something closer to model combination, providing a new algorithm for optimizing (finite) mixture approximations.
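For readers unfamiliar with the building block the talk adapts: standard Stein variational gradient descent (Liu and Wang, 2016) moves a set of particles toward a target density with a kernelized update. A minimal 1-D sketch with an RBF kernel, targeting N(3, 1) (the target, bandwidth, and step size are illustrative choices, not the talk's algorithm, which instead operates on the parameters of the approximating distribution):

```python
import numpy as np

def svgd(particles, grad_log_p, steps=1000, eps=0.1, h=1.0):
    """SVGD update:
    x_i += eps * mean_j [ k(x_j, x_i) * grad_log_p(x_j) + d/dx_j k(x_j, x_i) ]
    The kernel term drives particles uphill on log p; the kernel-gradient
    term repels particles from each other, preventing collapse to the mode.
    """
    x = np.asarray(particles, dtype=float)
    for _ in range(steps):
        diff = x[:, None] - x[None, :]          # diff[j, i] = x_j - x_i
        k = np.exp(-diff**2 / (2.0 * h))        # RBF kernel k(x_j, x_i)
        dk = -diff / h * k                      # derivative w.r.t. x_j
        phi = (k * grad_log_p(x)[:, None] + dk).mean(axis=0)
        x = x + eps * phi
    return x

# Target N(3, 1): grad log p(x) = -(x - 3)
out = svgd(np.linspace(-5.0, -1.0, 30), lambda x: -(x - 3.0))
```

After the updates the particle cloud sits around the target mean with nonzero spread, which is the behavior the repulsive kernel term is responsible for.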
February 19

No Seminar (Presidents’ Day)

February 26
Bren Hall 4011 1 pm 
Research Scientist
ISI/USC
Knowledge is an essential ingredient in the quest for artificial intelligence, yet scalable and robust approaches to acquiring knowledge have challenged AI researchers for decades. Often, the obstacle to knowledge acquisition is massive, uncertain, and changing data that obscures the underlying knowledge. In such settings, probabilistic models have excelled at exploiting the structure in the domain to overcome ambiguity, revise beliefs, and produce interpretable results. In my talk, I will describe recent work using probabilistic models for knowledge graph construction and information extraction, including linking subjects across electronic health records, fusing background knowledge from scientific articles with gene association studies, disambiguating user browsing behavior across platforms and devices, and aligning structured data sources with textual summaries. I will also highlight several areas of ongoing research, fusing embedding approaches with probabilistic modeling and building models that support dynamic data or human-in-the-loop interactions.
March 5
Bren Hall 4011 1 pm 
Assistant Professor
UC Riverside
Tensors and tensor decompositions have been very popular and effective tools for analyzing multi-aspect data in a wide variety of fields, ranging from Psychology to Chemometrics, and from Signal Processing to Data Mining and Machine Learning. Using tensors in the era of big data presents us with a rich variety of applications, but also poses great challenges, such as scalability and efficiency. In this talk I will first motivate the effectiveness of tensor decompositions as data analytic tools in a variety of exciting, real-world applications. Subsequently, I will discuss recent techniques for tackling the scalability and efficiency challenges by parallelizing and speeding up tensor decompositions, especially for very sparse datasets, including the scenario where the data are continuously updated over time. Finally, I will discuss open problems in unsupervised tensor mining and quality assessment of the results, and present work in progress addressing these problems with very encouraging results.
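For context on the decomposition at the center of this line of work: a toy dense sketch of the CP (CANDECOMP/PARAFAC) decomposition fitted via alternating least squares, not the scalable sparse or streaming algorithms discussed in the talk (function names and the `einsum`-based updates are illustrative):

```python
import numpy as np

def cp_als(X, rank, iters=50, seed=0):
    """Fit a rank-`rank` CP model to a 3-way tensor X by alternating
    least squares: each step solves a linear least-squares problem in
    one factor matrix while holding the other two fixed."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(iters):
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def cp_reconstruct(A, B, C):
    # Sum of rank-1 terms a_r (outer) b_r (outer) c_r
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

The dense `einsum` contractions above are exactly the operations (MTTKRP in particular) that the scalability work replaces with sparse, parallel kernels.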
March 12
Bren Hall 4011 1 pm 
PhD Student
UC Los Angeles
I will describe the basic elements of the Emergence Theory of Deep Learning, which started as a general theory of representations and comprises three parts: (1) We formalize the desirable properties that a representation should possess, based on classical principles of statistical decision and information theory: invariance, sufficiency, minimality, and disentanglement. We then show that such an optimal representation of the data can be learned by minimizing a specific loss function related to the notions of the Information Bottleneck and Variational Inference. (2) We analyze common empirical losses employed in Deep Learning (such as empirical cross-entropy) and implicit or explicit regularizers, including Dropout and Pooling, and show that they bias the network toward recovering such an optimal representation. Finally, (3) we show that minimizing a suitably (implicitly or explicitly) regularized loss with SGD with respect to the weights of the network implies implicit optimization of the loss described in (1), which relates instead to the activations of the network. Therefore, even when we optimize a DNN as a black-box classifier, we are always biased toward learning minimal, sufficient, and invariant representations. The link between (implicit or explicit) regularization of the classification loss and the learning of optimal representations is specific to the architecture of deep networks and is not found in a general classifier. The theory is related to a new version of the Information Bottleneck that studies the weights of a network, rather than the activations, and can also be derived using PAC-Bayes or Kolmogorov complexity arguments, providing independent validation.
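For readers unfamiliar with the Information-Bottleneck-style loss mentioned in part (1): a common variational form trades off a prediction term against the complexity of the representation. A minimal sketch, assuming a diagonal-Gaussian encoder q(z|x) = N(mu, sigma^2) and a standard-normal prior (these choices and the `beta` weight are standard variational-IB conventions, not details taken from the talk):

```python
import math

# Variational Information Bottleneck objective:
#   L = E[cross-entropy] + beta * KL( q(z|x) || N(0, I) )
# For diagonal Gaussians vs. a standard normal, the KL has the
# closed form  0.5 * (mu^2 + sigma^2 - log sigma^2 - 1)  per dimension.

def gaussian_kl(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    return sum(0.5 * (m * m + s * s - math.log(s * s) - 1.0)
               for m, s in zip(mu, sigma))

def vib_loss(cross_entropy, mu, sigma, beta=1e-3):
    # beta controls the tradeoff: larger beta -> more minimal representation.
    return cross_entropy + beta * gaussian_kl(mu, sigma)
```

The KL term is the "minimality" pressure: it penalizes representations that carry more information about the input than the prior allows, while the cross-entropy term enforces sufficiency for the prediction task.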
March 19

No Seminar (Finals Week)
