
Probabilistic Inference Group (PIGS) - Archive

Meetings in 2009

Tue 12 December (Nicolas Le Roux)

Talk by Nicolas Le Roux (Microsoft Research). Please see title and abstract below:

How overconfidence slows you down, a learning story.

Abstract: Nowadays, for many tasks such as object recognition or language modeling, data is plentiful. As a result, the main challenge has become finding a way to use all the available data rather than dealing with small training sets. In this setting, termed "large-scale learning" by Bottou and Bousquet, learning and optimization become distinct problems, and powerful optimization algorithms are suboptimal learning algorithms. While many have only considered optimization algorithms (or approximations thereof) to perform learning, I will show how designing a proper learning algorithm and making use of the covariance of the gradients can yield faster and more robust convergence. I will also show that this covariance matrix is not an approximation of the Hessian and that the two matrices can be combined in a principled and efficient way.
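The abstract contrasts the covariance of the gradients with the Hessian. The sketch below is a minimal illustration only and is not the speaker's algorithm. It computes both matrices for an average logistic-regression loss at a fixed weight vector. The covariance comes from the spread of the per-example gradients around their mean. The Hessian comes from the curvature of the average loss. The logistic model, the synthetic data, and all function names are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_covariance_and_hessian(X, y, w):
    """Return (mean gradient, gradient covariance, Hessian) of the average
    logistic-regression negative log-likelihood at weights w (illustrative)."""
    n = X.shape[0]
    s = sigmoid(X @ w)                         # predicted probabilities, shape (n,)
    G = (s - y)[:, None] * X                   # per-example gradients, shape (n, d)
    g_mean = G.mean(axis=0)                    # full-batch gradient
    C = np.cov(G, rowvar=False, bias=True)     # covariance of per-example gradients
    H = (X * (s * (1.0 - s))[:, None]).T @ X / n  # Hessian of the average loss
    return g_mean, C, H

# Synthetic data: labels drawn independently of X, so the model is misspecified.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (rng.random(500) < 0.5).astype(float)
w = np.array([0.5, -1.0, 2.0])

g, C, H = gradient_covariance_and_hessian(X, y, w)
print(np.round(C, 3))
print(np.round(H, 3))
```

Both results are symmetric positive semi-definite 3×3 matrices. However, they are built from different quantities, and in this misspecified example they differ noticeably.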

Tue 17 November (Jakub Piatkowski)

We will discuss the following paper:

Wed 11 November (Maurizio Filippone)

Talk by Maurizio Filippone. Please see title, abstract and related material below:

Information Theoretic Novelty Detection

In this talk, we present a novel approach to online change detection problems when the training sample size is small. The proposed method is based on estimating the expected information content of a new data point under the null hypothesis that it has been generated from the same distribution as the training data. In the case of the Gaussian distribution, our approach is analytically tractable and closely related to classical statistical tests, since the expected information content is independent of the statistics of the generating distribution. Such a test naturally takes into account the variability of the statistics due to the finite sample effect, and thus allows the false positive rate to be controlled even when only a small training set is available. We then discuss two extensions of the presented method. In the first, we propose an approximation scheme to evaluate the information content of a new data point when the generating distribution is a mixture of Gaussians. In the second, we study the extension to autoregressive time series with Gaussian noise, thus removing the i.i.d. assumption. Experiments on synthetic and real data sets show that our method maintains good overall accuracy while significantly improving control over the false positive rate.
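The abstract relates the Gaussian case to classical statistical tests. The sketch below shows one such classical test; it is not the speaker's information-theoretic method. It builds a Student-t prediction interval for a new point, using a Gaussian model whose mean and variance are estimated from a small training set. The sketch assumes univariate i.i.d. Gaussian data and uses SciPy, and the function names are illustrative. A short simulation compares its false positive rate with a naive plug-in threshold that ignores finite-sample variability.

```python
import numpy as np
from scipy import stats

def is_novel(train, x_new, alpha=0.05):
    """Flag x_new if it lies outside the (1 - alpha) Student-t prediction
    interval of a Gaussian fitted to train (illustrative helper)."""
    n = len(train)
    m = np.mean(train)
    s = np.std(train, ddof=1)                        # unbiased sample standard deviation
    t_stat = (x_new - m) / (s * np.sqrt(1.0 + 1.0 / n))
    threshold = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)
    return abs(t_stat) > threshold

def naive_is_novel(train, x_new, alpha=0.05):
    """Plug-in test that treats the estimated mean and variance as exact."""
    m, s = np.mean(train), np.std(train, ddof=1)
    return abs(x_new - m) / s > stats.norm.ppf(1.0 - alpha / 2.0)

# Estimate false positive rates when only 5 training points are available.
rng = np.random.default_rng(1)
trials, n = 10000, 5
fp_t = fp_naive = 0
for _ in range(trials):
    data = rng.normal(size=n + 1)                    # n training points + 1 null test point
    train, x_new = data[:n], data[n]
    fp_t += int(is_novel(train, x_new))
    fp_naive += int(naive_is_novel(train, x_new))
print(f"t-based FPR: {fp_t / trials:.3f}, plug-in FPR: {fp_naive / trials:.3f}")
```

With five training points, the plug-in test flags far more than the nominal 5% of null points. The t-based interval stays close to 5% because it accounts for the uncertainty in the estimated mean and variance.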

Part of the material covered in the talk can be found here:

Tue 3 November (David Reichert)

We will discuss the following papers:

Tue 20 October (Jyri Kivinen)

Jyri will present some of his joint work on statistical modeling of natural images and scenes using a hierarchical nonparametric Bayesian framework (J. J. Kivinen, E. B. Sudderth (Brown), and M. I. Jordan (UC Berkeley)). Please see abstract and background papers below:

I will begin by describing the tree-structured latent variable model that the framework employs to generate pyramidally organized multiscale image features and to couple dependencies between them. I will then describe an extension using Hierarchical Dirichlet Processes to learn data-driven, global statistical image models of unbounded complexity. Finally, I will present effective learning algorithms, based on Markov chain Monte Carlo methods and belief propagation, for categorizing images of novel scenes and for denoising them with a transfer-learning-based algorithm.

Fri 16 October (Michalis Titsias)

Title: Variational Inference for Large Datasets in Gaussian Processes

Gaussian processes (GPs) are stochastic processes over real-valued functions that can be used for Bayesian non-linear regression and classification problems. GPs also naturally arise as the solutions of linear stochastic differential equations. However, when the amount of observed or training data is large, the evaluation of posterior GPs is intractable because the computations scale as O(n^3) where n is the number of training examples. Therefore, for large datasets we need to consider approximate or sparse inference methods. In this talk we discuss sparse approximations for GPs based on inducing/support variables and the standard variational inference methodology. We apply this to regression, binary classification and large systems of linear stochastic differential equations.
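The O(n^3) cost mentioned above comes from factorizing the n×n kernel matrix. The minimal sketch below implements exact GP regression, not the sparse variational approximation discussed in the talk, and makes this bottleneck explicit. It assumes a squared-exponential (RBF) kernel with fixed, illustrative hyperparameters and Gaussian observation noise.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A (n, d) and B (m, d)."""
    sq = (A[:, None, :] - B[None, :, :]) ** 2
    return variance * np.exp(-0.5 * sq.sum(-1) / lengthscale**2)

def gp_posterior(X, y, X_star, noise=0.1):
    """Exact GP regression: posterior mean and variance at test inputs X_star."""
    K = rbf_kernel(X, X) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)                  # O(n^3): the cost that sparse methods avoid
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y via the Cholesky factor
    K_s = rbf_kernel(X, X_star)                # cross-covariances, shape (n, m)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_star, X_star)) - np.sum(v**2, axis=0)
    return mean, var

# Noise-free samples of sin(x); predict inside the data and far outside it.
X = np.linspace(0.0, 5.0, 50)[:, None]
y = np.sin(X).ravel()
X_star = np.array([[2.5], [20.0]])
mean, var = gp_posterior(X, y, X_star)
print(mean, var)
```

Doubling n multiplies the Cholesky cost by roughly eight. Inducing-variable methods replace this factorization with operations on a much smaller m×m matrix.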

Tue 6 October (Michael Dewar)

We will discuss Variational Inference in Markov Jump Processes using the following papers:

Tue 22 September (Jan Antolik)

Joint session with the DevCompNeuro journal club.

Short description of the talk:

The main goal of the project I'm working on is to predict which image has been presented to an animal based on the activity profile of a group of cells (~50) obtained via two-photon imaging. The system would learn this prediction from recorded pairs of images and activity profiles. Rather than predicting the image directly, the goal is to tell, from a large set of images, which one was presented.

So far I have applied several simpler approaches to the problem, including simple linear perceptrons, multi-layer neural networks trained with back-propagation and, notably, the 'Gaussian pyramid model', which worked on an analogous problem with fMRI data (Kay et al., 2008). I have also tried several approaches to directly determine the receptive fields of the neurons.

So far these techniques haven't worked. My main aim with this presentation is to get some brainstorming going and perhaps learn from the real machine learning people about the latest approaches to fitting non-linear models. I would be particularly interested in methods for training recurrent neural networks, as it appears that much of the neural response is due to lateral interactions rather than the feed-forward receptive field structure.

Kay NN, Naselaris T, Prenger RJ and Gallant JL (2008): Identifying natural images from human brain activity

Tue 25 August (Edwin Bonilla)

Tue 28 July (UAI session)

Brief discussions on the following UAI 2009 papers:

UAI 2009 proceedings at

Tue 14 July (Kian Ming Chai)

We will discuss the following paper:

Tue 30 June (Amos Storkey)

We will discuss Deep Boltzmann Machines using the paper:

  • R. Salakhutdinov and G. E. Hinton, Deep Boltzmann Machines, Artificial Intelligence and Statistics (AISTATS), 2009 (to appear).

If there is enough time, Amos will also give a basic introduction to martingales using:

Tue 23 June (2nd ICML session)

Brief discussions on the following ICML 2009 papers:

Note: ICML 2009 proceedings at

Tue 16 June (ICML session)

Brief discussions on the following ICML 2009 papers:

Tue 9 June (Michael Dewar)

We will discuss Hierarchical HMMs using the paper:

Note: There is an extended version of the paper:

Tue 26 May (Athina Spiliopoulou)

We will discuss two variants from the RBM/DBN literature using the papers:

Tue 21 April (Chris Williams)

We will discuss multi-armed bandits and Gittins indices. This is a simple case in which the exploration-exploitation tradeoff arises and an optimal Bayesian solution exists.
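For discounted bandits with independent arms, the Gittins index theorem states that always pulling the arm with the largest index is optimal. The sketch below approximates the index of a single Bernoulli arm with a Beta(a, b) posterior using the retirement (calibration) formulation. The index is the constant reward λ at which pulling the arm and retiring forever on λ per step are equally good. The Bernoulli/Beta model, the discount factor, the truncation horizon, and the function names are all assumptions made for illustration.

```python
from functools import lru_cache

def gittins_index(a, b, gamma=0.9, horizon=150, tol=1e-6):
    """Approximate Gittins index of a Bernoulli arm whose success probability
    has a Beta(a, b) posterior; lookahead is truncated after `horizon` pulls."""

    def advantage(lam):
        # Value of retiring now and receiving the known reward lam forever.
        retire = lam / (1.0 - gamma)

        @lru_cache(maxsize=None)
        def value(s, f, depth):
            # Optimal value in posterior state Beta(s, f): retire or pull once more.
            p = s / (s + f)                        # posterior mean success probability
            if depth == horizon:
                # Truncation: stop learning and compare two constant reward streams.
                return max(retire, p / (1.0 - gamma))
            pull = (p * (1.0 + gamma * value(s + 1, f, depth + 1))
                    + (1.0 - p) * gamma * value(s, f + 1, depth + 1))
            return max(retire, pull)

        # Pull the arm once from the initial state, then act optimally.
        p = a / (a + b)
        pull = p * (1.0 + gamma * value(a + 1, b, 1)) + (1.0 - p) * gamma * value(a, b + 1, 1)
        return pull - retire

    lo, hi = 0.0, 1.0                              # Bernoulli rewards lie in [0, 1]
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if advantage(lam) > 0:
            lo = lam                               # pulling beats retiring: index > lam
        else:
            hi = lam
    return 0.5 * (lo + hi)

# A fresh arm with a uniform Beta(1, 1) prior has posterior mean 0.5.
# Its index is higher because pulling the arm also yields information.
print(round(gittins_index(1, 1), 4))
```

The gap between the index and the posterior mean quantifies the exploration bonus. That gap shrinks as evidence about the arm accumulates.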

The papers

J. C. Gittins, Bandit Processes and Dynamic Allocation Indices, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 41, No. 2 (1979), pp. 148-177.

J. C. Gittins, D. M. Jones, A Dynamic Allocation Index for the Discounted Multiarmed Bandit Problem, Biometrika, Vol. 66, No. 3 (1979), pp. 561-565.

are available via

Tue 7 April 2009 (Jakub Piatkowski)

Tue 24 March 2009 (Nicolas Heess)

Tue 10 March 2009 (Andrew Dai)

Tue 24 February 2009 (Edwin Bonilla)

Tue 10 February 2009 (Kian Ming Chai)

Tue 27 January 2009 (Amos Storkey)

Tue 13 January 2009 (NIPS)

Note: NIPS 21 preproceedings at

Some other NIPS 21 papers CW found to be of interest:
  • The Infinite Factorial Hidden Markov Model. Jurgen Van Gael, Yee Whye Teh, Zoubin Ghahramani
  • Deep Learning with Kernel Regularization for Visual Recognition. Kai Yu, Wei Xu, Yihong Gong
  • Cascaded Classification Models: Combining Models for Holistic Scene Understanding. Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller
Topic revision: r1 - 13 Jul 2010 - 12:15:53 - AthinaSpiliopoulou