- Amos: Sequential Monte Carlo update. Quick summaries of the salient papers on this SMC page would help get everyone up to speed. I'll probably arrange this for my session on 21 Aug. Charles: I agree, and suggest this paper if it hasn't been covered already: "Sequential Monte Carlo Samplers" (with P. Del Moral & A. Jasra), *J. Royal Statist. Soc. B*, vol. 68, no. 3, pp. 411-436, 2006.
- NIPS 2006 workshop on Dynamical Systems, Stochastic Processes and Bayesian Inference.
- Compressed sensing, see http://www.dsp.ece.rice.edu/cs/
- A further look at deep belief networks, including the work on human motion, which gives a good demo of conditional deep belief network models.
- Submodular functions and optimization (generalizes convexity to functions on sets), see e.g. http://www.mlpedia.org/index.php?title=Submodular_function
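The set-function analogy can be made concrete with a small illustrative sketch (a made-up toy instance; the names `coverage` and `greedy_max` are hypothetical). Set coverage is a standard monotone submodular function: adding a set to a small collection gains at least as much as adding it to a larger one. For such functions under a cardinality constraint, greedy selection is known to reach at least a (1 - 1/e) fraction of the optimum.

```python
def coverage(sets, chosen):
    """f(S): number of distinct elements covered by the sets whose indices are in S."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy_max(sets, k):
    """Add, k times, the unchosen set with the largest marginal gain in coverage."""
    chosen = []
    for _ in range(k):
        base = coverage(sets, chosen)
        candidates = [i for i in range(len(sets)) if i not in chosen]
        best = max(candidates, key=lambda i: coverage(sets, chosen + [i]) - base)
        chosen.append(best)
    return chosen


sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
print(greedy_max(sets, 2))  # picks set 2 first (gain 4), then set 0 (gain 3)
```

Diminishing returns is visible directly here. Adding `{3, 4}` to the empty collection gains 2, but adding it after `{4, 5, 6, 7}` gains only 1.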

How overconfidence slows you down: a learning story.

Abstract: Nowadays, for many tasks such as object recognition or language modeling, data is plentiful. As such, the most important challenge has become finding a way to use all the available data rather than coping with small training sets. In this setting, coined "large-scale learning" by Bottou and Bousquet, learning and optimization become distinct problems, and powerful optimization algorithms can be suboptimal learning algorithms. While many have only considered optimization algorithms (or approximations thereof) to perform learning, I will show how designing a proper learning algorithm and making use of the covariance of the gradients can yield faster, more robust convergence. I will also show that this covariance matrix is not an approximation of the Hessian and that the two matrices can be combined in a principled and efficient way.
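The general idea of preconditioning with the covariance of the gradients can be illustrated with a minimal sketch. It assumes logistic regression with labels in {0, 1} and adds a damping term so the covariance can be inverted. This is a generic illustration, not the speaker's algorithm, which per the abstract also combines the covariance with the Hessian. The helper names are hypothetical.

```python
import numpy as np


def per_example_grads(w, X, y):
    """Gradient of the logistic loss for each example (one row per example)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (p - y)[:, None] * X  # shape (n, d)


def covariance_preconditioned_step(w, X, y, damping=1e-2):
    """Return (direction, mean_gradient), where direction = (C + damping*I)^{-1} g.

    g is the mean gradient and C is the centred covariance of the per-example
    gradients. Directions in which the examples disagree strongly (large
    variance) are scaled down.
    """
    G = per_example_grads(w, X, y)
    g = G.mean(axis=0)
    centred = G - g
    C = centred.T @ centred / len(G)
    direction = np.linalg.solve(C + damping * np.eye(len(g)), g)
    return direction, g
```

A training loop would then update with something like `w -= lr * direction`. As the damping grows, the step reduces to plain gradient descent with step size `lr / damping`.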

- Peter Orbanz and Joachim M. Buhmann, Int J Comput Vis 77: 25–45 (2008): Nonparametric Bayesian Image Segmentation

Information Theoretic Novelty Detection

In this talk, we present a novel approach to online change detection problems when the training sample size is small. The proposed method is based on estimating the expected information content of a new data point under the null hypothesis that it has been generated from the same distribution as the training data. In the case of the Gaussian distribution, our approach is analytically tractable and closely related to classical statistical tests, since the expected information content is independent of the statistics of the generating distribution. Such a test naturally takes into account the variability of the statistics due to the finite sample effect, and thus allows the false positive rate to be controlled even when only a small training set is available. We then discuss two different extensions of the presented method. In the first, we propose an approximation scheme to evaluate the information content of a new data point when the generating distribution is a mixture of Gaussians. In the second, we study the extension to autoregressive time series with Gaussian noise, thus removing the i.i.d. assumption. Experiments on synthetic and real data sets show that our method maintains good overall accuracy while significantly improving control over the false positive rate.

Part of the material covered in the talk can be found here:

- M. Filippone and G. Sanguinetti, To appear in Pattern Recognition: Information Theoretic Novelty Detection
- M. Filippone and G. Sanguinetti, Technical Report: Novelty detection in autoregressive models using information theoretic measures
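A stdlib-only sketch of a finite-sample test in the same spirit, assuming one-dimensional i.i.d. Gaussian data. This swaps in a classical construction and is not the authors' estimator. Under a Gaussian fitted by plug-in estimates, the information content of x is 0.5*log(2*pi*s^2) + (x - m)^2 / (2*s^2). We keep only the second term, which is pivotal: its null distribution does not depend on the true mean or variance. That distribution can therefore be simulated from N(0, 1) at the actual training-set size, which accounts for the finite-sample variability of m and s. The function names are hypothetical.

```python
import random
import statistics


def novelty_stat(train, x):
    """Data-dependent part of -log p(x) under a Gaussian fitted to `train`."""
    m = statistics.fmean(train)
    s = statistics.stdev(train)
    return ((x - m) / s) ** 2 / 2.0


def null_pvalue(train, x, n_sim=5000, seed=0):
    """Monte Carlo p-value for H0: x was drawn from the same Gaussian as `train`."""
    rng = random.Random(seed)
    n = len(train)
    observed = novelty_stat(train, x)
    exceed = 0
    for _ in range(n_sim):
        # The statistic is pivotal, so simulating from N(0, 1) is enough.
        sim_train = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sim_x = rng.gauss(0.0, 1.0)
        if novelty_stat(sim_train, sim_x) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

With only five training points, for example `null_pvalue([0.1, -0.3, 0.2, 0.05, -0.1], 5.0)`, the point is flagged as novel. The threshold still reflects how uncertain the estimated mean and variance are.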

- H Larochelle, Y Bengio, ICML (2008): Classification using Discriminative Restricted Boltzmann Machines
- H Larochelle, D Erhan, P Vincent, AISTATS (2009): Deep Learning using Robust Interdependent Codes

I will begin by describing the tree-structured latent variable model used to generate pyramidally organized multiscale image features and to couple the dependencies between them. I will then describe an extension using Hierarchical Dirichlet Processes to learn data-driven, global statistical image models of unbounded complexity. Finally, I will present effective learning algorithms based on Markov chain Monte Carlo methods and belief propagation, used to categorize images of novel scenes and to denoise them in a transfer-learning-based algorithm.

- J. J. Kivinen, E. B. Sudderth, and M. I. Jordan, ICCV (2007): Learning multiscale representations of natural scenes using Dirichlet processes.
- J. J. Kivinen, E. B. Sudderth, and M. I. Jordan, ICIP (2007): Image denoising with nonparametric hidden Markov trees.

Title: Variational Inference for Large Datasets in Gaussian Processes

Gaussian processes (GPs) are stochastic processes over real-valued functions that can be used for Bayesian non-linear regression and classification problems. GPs also arise naturally as the solutions of linear stochastic differential equations. However, when the amount of observed or training data is large, exact evaluation of the posterior GP becomes computationally prohibitive, because the computations scale as O(n^3), where n is the number of training examples. Therefore, for large datasets we need to consider approximate or sparse inference methods. In this talk we discuss sparse approximations for GPs based on inducing/support variables and the standard variational inference methodology. We apply this to regression, binary classification and large systems of linear stochastic differential equations.
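A sketch of inducing-variable variational inference for GP regression, assuming the collapsed bound of Titsias (2009) is representative of the "standard variational inference methodology" mentioned above. The talk's own formulation may differ. Further assumptions: 1-D inputs, a unit-variance squared-exponential kernel, fixed hyperparameters and hypothetical function names. For readability the bound forms the n x n matrix Q_nn. The predictive mean solves only m x m systems, so it costs O(n m^2) rather than O(n^3).

```python
import numpy as np


def rbf(a, b, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gauss_logpdf(y, cov):
    """log N(y | 0, cov) using a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L, y)
    return -0.5 * alpha @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)


def exact_log_marginal(x, y, noise):
    """Exact GP log marginal likelihood (O(n^3))."""
    return gauss_logpdf(y, rbf(x, x) + noise * np.eye(len(x)))


def titsias_bound(x, y, z, noise, jitter=1e-8):
    """Collapsed variational lower bound with inducing inputs z.

    Practical code would use the matrix inversion lemma instead of forming Q_nn.
    """
    kmm = rbf(z, z) + jitter * np.eye(len(z))
    kmn = rbf(z, x)
    qnn = kmn.T @ np.linalg.solve(kmm, kmn)  # Nystrom approximation of K_nn
    trace_term = np.trace(rbf(x, x) - qnn) / (2 * noise)
    return gauss_logpdf(y, qnn + noise * np.eye(len(x))) - trace_term


def sparse_predictive_mean(x, y, z, xstar, noise, jitter=1e-8):
    """Predictive mean under the optimal variational distribution of the inducing values."""
    kmm = rbf(z, z) + jitter * np.eye(len(z))
    kmn = rbf(z, x)
    a = kmm + kmn @ kmn.T / noise  # m x m system only
    return rbf(xstar, z) @ np.linalg.solve(a, kmn @ y) / noise


def exact_predictive_mean(x, y, xstar, noise):
    """Exact GP predictive mean, for comparison."""
    return rbf(xstar, x) @ np.linalg.solve(rbf(x, x) + noise * np.eye(len(x)), y)
```

Placing the inducing inputs on the training inputs (`z = x`) recovers the exact GP, up to jitter. With fewer inducing points, the bound stays below the exact log marginal likelihood.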

We will discuss Variational Inference in Markov Jump Processes using the following papers:

- M. Opper and G. Sanguinetti, NIPS (2007): Variational Inference for Markov Jump Processes
- G. Sanguinetti, et al., Bioinformatics 25(10): 1280-1286 (2009): Switching regulatory models of cellular stress response

Joint session with the DevCompNeuro journal club.

Short description of the talk:

The main goal of the project I'm working on is to predict which image has been presented to an animal based on the activity profile of a group of cells (~50) obtained via two-photon imaging. The system would learn this prediction from recordings of pairs of images and activity profiles. Instead of directly predicting the image, the goal is to tell, from a large set of images, which one was presented.

I have so far applied several simpler approaches to the problem, including simple linear perceptrons, multi-layer NNs trained with back-propagation and, notably, the 'Gabor wavelet pyramid' model, which worked when applied to an analogous problem with fMRI data in a study by Kay et al. (2008). I have also tried several approaches to directly determine the receptive fields of the neurons.

So far these techniques haven't worked. My main aim with this presentation is to get some brainstorming going and perhaps learn from the real machine learning people about the latest approaches to fitting non-linear models. I would be particularly interested in methods for learning recurrent NNs, as it appears that many of the neural responses are due to lateral interactions rather than to the feed-forward receptive field structure.

- Kay KN, Naselaris T, Prenger RJ and Gallant JL, Nature (2008): Identifying natural images from human brain activity
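The identification protocol described above can be sketched as follows:

1. Fit an encoding model from image features to cell responses.
2. Predict the responses for every candidate image.
3. Pick the candidate whose predicted response correlates best with the observed activity.

This assumes each image is already summarized as a feature vector, and uses a linear ridge model only as a stand-in for whatever (likely non-linear) encoding model is eventually fitted. The function names are hypothetical.

```python
import numpy as np


def fit_encoding_model(features, responses, ridge=1.0):
    """Ridge regression from image features (N x D) to cell responses (N x C)."""
    d = features.shape[1]
    gram = features.T @ features + ridge * np.eye(d)
    return np.linalg.solve(gram, features.T @ responses)  # D x C weights


def identify(weights, candidate_features, observed_response):
    """Index of the candidate whose predicted response best correlates with the observed one."""
    predicted = candidate_features @ weights  # K x C predicted responses
    p = predicted - predicted.mean(axis=1, keepdims=True)
    o = observed_response - observed_response.mean()
    corr = (p @ o) / (np.linalg.norm(p, axis=1) * np.linalg.norm(o))
    return int(np.argmax(corr))
```

Because only the ranking among candidates matters, the encoding model does not need to predict responses perfectly. It only needs to correlate better with the true image's responses than with those of the distractors.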

- Vikash Mansinghka, Daniel Roy, Eric Jonas, Joshua Tenenbaum, AISTATS (2009): Exact and Approximate Sampling by Systematic Stochastic Search
- Ricardo Silva, Zoubin Ghahramani, AISTATS (2009): Factorial Mixture of Gaussians and the Marginal Independence Model

Brief discussions on the following UAI 2009 papers:

- KMC: Group Sparse Priors for Covariance Estimation
- KMC: Multi-Task Feature Learning Via Efficient L2,1-Norm Minimization
- ASP: Products of Hidden Markov Models: It Takes N>1 to Tango
- ASP: Convexifying the Bethe Free Energy
- FD: Modeling Discrete Interventional Data using Directed Cyclic Graphical Models
- AJS: Lower Bound Bayesian Networks - Efficient Inference of Lower Bounds on Probability Distributions, Daniel Andrade, Bernhard Sick
- CKIW: Mean Field Variational Approximation for Continuous-Time Bayesian Networks
- CKIW: Virtual Vector Machine for Bayesian Online Classification
- FA: Optimization of Structured Mean Field Objectives

UAI 2009 proceedings at http://www.cs.mcgill.ca/~uai2009/proceedings.html

We will discuss the following paper:

- E. B. Anderes and M. L. Stein, Annals of Statistics, Vol. 36, No. 2, 719-741, (2008): Estimating deformations of isotropic Gaussian random fields on the plane.

We will discuss Deep Boltzmann Machines using the paper:

- R. Salakhutdinov and G.E. Hinton, To appear in Artificial Intelligence and Statistics (2009): Deep Boltzmann Machines

If there is enough time, Amos will also give a basic introduction to Martingales using:

- Introduction to Martingales by Robert L. Wolpert

Brief discussions on the following ICML 2009 papers:

- CW: Herding Dynamical Weights to Learn
- JAT: Learning Linear Dynamical Systems without Sequence Information
- JAT: Function factorization using warped Gaussian processes
- MD: Learning Nonlinear Dynamic Models
- MD: Dynamic Mixed Membership Block Model for Evolving Networks

Note: ICML 2009 proceedings at http://www.cs.mcgill.ca/~icml2009/abstracts.html

Brief discussions on the following ICML 2009 papers:

- KMC: Sparse Gaussian Graphical Models with Unknown Block Structure
- KMC: Learning with Structured Sparsity
- DR: Deep Learning from Temporal Coherence in Video
- ASP: Curriculum Learning
- ASP: Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style
- FA: Regression by dependence minimization and its application to causal inference
- FA: Split Variational Inference

We will discuss Hierarchical HMMs using the paper:

- K. Murphy and M. Paskin, NIPS (2001): Linear Time Inference in Hierarchical HMMs

Note: There is an extended version of the paper:

- K. Murphy, November 2001: Hierarchical HMMs

We will discuss two variants from the RBM/DBN literature using the papers:

- H. Lee, et al., ICML (2009): Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations

- R. Memisevic, G. E. Hinton, CVPR (2007): Unsupervised Learning of Image Transformations

We will discuss multi-armed bandits and Gittins indices. This is a simple setting in which the exploration-exploitation trade-off appears and for which, with discounted rewards, an optimal Bayesian solution exists.

The papers

J. C. Gittins, Bandit Processes and Dynamic Allocation Indices, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 41, No. 2. (1979), pp. 148-177.

J. C. Gittins, D. M. Jones, A Dynamic Allocation Index for the Discounted Multiarmed Bandit Problem, Biometrika, Vol. 66, No. 3. (1979), pp. 561-565.

are available via http://en.wikipedia.org/wiki/Gittins_index
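The Gittins index can be computed for the simplest case with the calibration view. The index of an arm is the per-step reward of a known "retirement" arm that makes the player indifferent between pulling the arm once (then acting optimally) and retiring for good. The sketch below assumes Bernoulli rewards with a Beta(a, b) posterior, a discount factor gamma and a truncated look-ahead horizon, at which the player is assumed to retire. It uses backward induction plus bisection. The function names are hypothetical, and this is an illustration rather than the construction used in the papers.

```python
def continuation_value(a, b, gamma, lam, horizon):
    """Value of pulling a Beta(a, b) Bernoulli arm once and then acting optimally,
    when retiring pays lam per step forever. Returns (pull_value, retire_value)."""
    retire = lam / (1.0 - gamma)
    # values[s]: value of the state with s successes among n extra pulls.
    # At the truncation horizon we assume the player retires.
    values = [retire] * (horizon + 1)
    for n in range(horizon - 1, -1, -1):
        new_values = []
        for s in range(n + 1):
            p = (a + s) / (a + b + n)  # posterior mean of the success probability
            pull = p * (1.0 + gamma * values[s + 1]) + (1.0 - p) * gamma * values[s]
            # At the root we must pull once; elsewhere we may also retire.
            new_values.append(pull if n == 0 else max(retire, pull))
        values = new_values
    return values[0], retire


def gittins_index(a, b, gamma=0.9, horizon=100, tol=1e-7):
    """Bisect on the retirement reward until pulling and retiring are equally good."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        pull, retire = continuation_value(a, b, gamma, lam, horizon)
        if pull > retire:
            lo = lam
        else:
            hi = lam
    return 0.5 * (lo + hi)
```

With `gamma=0.0` the index reduces to the posterior mean a / (a + b). With `gamma=0.9` the index of an untried arm, `gittins_index(1, 1)`, exceeds its mean of 0.5: that difference is the exploration bonus. The optimal policy simply pulls the arm with the highest index.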

- L. Li et al., Bayesian Analysis 3:171-196 (2008): A Method for Avoiding Bias from Feature Selection with Application to Naive Bayes Classification Models

- A. Pelizzola, J. Phys. A 38:R309 (2005): Cluster Variation Method in Statistical Physics and Probabilistic Graphical Models

- D.M. Roy and Y.W. Teh, NIPS (2009): The Mondrian Process
- P. Rai and H. Daume III, NIPS (2009): The Infinite Hierarchical Factor Regression Model

- General duality between optimal control and estimation (*Emanuel Todorov, IEEE Conference on Decision and Control, 2008*)
- Optimal control as a graphical model inference problem (*B. Kappen, V. Gomez, M. Opper, arXiv:0901.0633v2 [cs.AI], 2009*)

- R. Koenker and K. F. Hallock, The Journal of Economic Perspectives 15:143-156 (2001): Quantile Regression
- I. Takeuchi, et al., Journal of Machine Learning Research 7:1231-1264 (2006): Nonparametric Quantile Estimation

- L. Li and R. M. Neal, Bayesian Analysis 3:793-822 (2008): Compressing parameters in Bayesian high-order models with application to logistic sequence models
- R. Adams et al., NIPS (2009): The Gaussian Process Density Sampler

Note: NIPS 21 preproceedings at http://books.nips.cc/nips21.html

- NH: I. Murray, R. Salakhutdinov: Evaluating probabilities under high-dimensional latent variable models
- NH: I. Sutskever, G. Hinton, G. Taylor: The Recurrent Temporal Restricted Boltzmann Machine
- EB: Reducing statistical dependencies in natural signals using radial Gaussianization. Siwei Lyu, Eero Simoncelli
- EB: Sparse Convolved Gaussian Processes for Multi-output Regression. M. Alvarez, N. Lawrence
- CW: Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes. Erik Sudderth, Michael Jordan
- CW: The Conjoint Effect of Divisive Normalization and Orientation Selectivity on Redundancy Reduction. Fabian H. Sinz, Matthias Bethge
- CW: Bayesian Exponential Family PCA. Shakir Mohamed, Katherine Heller, Zoubin Ghahramani. See also Guo, Y. and Schuurmans, D. (2008): Efficient global optimization for exponential family PCA and low-rank matrix factorization, in Allerton Conference on Communication, Control, and Computing (http://www.cs.ualberta.ca/~dale/papers/allerton08.pdf).
- MD: Using Bayesian Dynamic Systems for Motion Template Libraries. Silvia Chiappa, Jens Kober, Jan Peters
- MD: Nonparametric Bayesian Learning of Switching Linear Dynamical Systems. Emily Fox, Erik Sudderth, Michael Jordan, Alan Willsky
- DR: Cascaded Classification Models: Combining Models for Holistic Scene Understanding. Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller

- The Infinite Factorial Hidden Markov Model, Jurgen Van Gael, Yee Whye Teh, Zoubin Ghahramani
- Deep Learning with Kernel Regularization for Visual Recognition. Kai Yu, Wei Xu, Yihong Gong
- Cascaded Classification Models: Combining Models for Holistic Scene Understanding. Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller

Topic revision: r2 - 11 Aug 2010 - 13:02:42 - AthinaSpiliopoulou

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
