Probabilistic Inference Group (PIGS) - Archive

Academic Year 2013-2014

We have moved to a themed, team-based approach to PIGS meetings. These meetings will now happen weekly at 11am in 2.33.

Model Combination methods

Mon 7th July: Applications of model combination in speech recognition: Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems (Partha)

Mon 9 June. Amos Storkey: Tom Dietterich on Ensemble Methods and David Wolpert on Stacked Generalisation.

Visitor Talk

Mon 19 May Neil Lawrence.


Mon 12 May AISTATS Review. Please sign up for papers. Visit and add your paper to the wiki below.

Visitor Talk

Mon 5 May Ruslan Salakhutdinov

Spectral Learning


Mon 31st March Spectral learning for LDS, Byron Boots' thesis (2012), Chapter 2: A Spectral Learning Algorithm for Constant-Covariance Kalman Filters (Konstantinos, Chris W)

Mon 7th April Spectral learning for MoG, Hsu and Kakade (2012), Learning mixtures of spherical Gaussians: moment methods and spectral decompositions (Benigno, Partha)

Mon 14th April Spectral learning for LDA Anandkumar et al. (2013), A Spectral Algorithm for Latent Dirichlet Allocation (Iain, Krzysztof)

Mon 21st April Spectral learning for ( HMM or PCFG or ?) (TBD)

Additional resources


Organiser: Matt (


Mon 3rd March Koller and Friedman (2009) Probabilistic Graphical Models, Chapter 21: Causality, part 1 (Boris, Chris W)

Mon 10th March Koller and Friedman (2009) Probabilistic Graphical Models, Chapter 21: Causality, part 2 (Agamemnon, Amos)

Mon 17th March Schölkopf et al. (2012) On Causal and Anticausal learning (Iain, Matt)

Mon 24th March Winn (2012) Causality with Gates (Amos, Zhanxing).


Possible topics

Active Learning

Mon 25 November: Burr Settles (2010) Active Learning Literature Survey. This is an introductory paper to Active Learning. We are aiming to broadly and unevenly cover the material in the first three chapters. (Guido, Kira)

Mon 2 December: Gaussian Processes tutorial (Iain), slides

Mon 13 January: Agamemnon and Amos will read Information-based objective functions for active data selection, David J.C. MacKay Neural Computation 4, 589--603 (1992)

Mon 20 January: NIPS postcards.

Mon 27 January: Srinivas et al., "Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting", IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 3250-3265, 2012 (possibly also looking at Snoek et al., Practical Bayesian Optimization of Machine Learning Algorithms). (Jari will lead)

Mon 3 February: Self-Paced Learning for Latent Variable Models, by Packer, Kumar and Koller. Relates to curriculum learning rather than active learning per se. (Chris L. and Partha)

Mon 10 February: Ziyu Wang, Masrour Zoghi, Frank Hutter, David Matheson and Nando de Freitas, Bayesian Optimization in High Dimensions via Random Embeddings (Guido and Pavlos).

Online Stochastic Descent and Stochastic Optimization

Mon 7 October: Leon Bottou (2010) Large-Scale Machine Learning with Stochastic Gradient Descent. This is an introductory paper to Stochastic Gradient Descent. For those wanting a little more detail on online learning methods, Sebastian Bubeck's lecture notes may be helpful: Introduction to Online Optimization. (Amos, Konstantinos)

Mon 14 October: Non-stationary loss and adaptive learning rates: No more pesky learning rates (Beni, Jinli). PLEASE NOTE THIS WILL BE IN 2.33. An extension of that work, which will not be presented: Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients.

Mon 21 October: Ahn, Korattikara and Welling (2012) Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring. (Guido, Matt)
A hybrid of stochastic gradient descent and Langevin dynamics based MCMC sampling, for learning and sampling from the posterior over model parameters using only a small mini-batch of the dataset on each update.
Useful resources:
Welling and Teh (2011), Bayesian Learning via Stochastic Gradient Langevin Dynamics - precursor to the paper we'll cover. It explains how a stochastic (mini-batch) estimate of the log-likelihood gradient can be used with a Langevin dynamics based update to construct a Markov chain which will converge to the posterior over parameters (video presentation of paper)
Roberts and Tweedie (1996), Exponential convergence of Langevin distributions and their discrete approximations - describes the Metropolis-adjusted Langevin algorithm (MALA) for unbiased sampling using discretised Langevin dynamics
Video presentation by Max Welling explaining SGLD and SGFS
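
The Langevin update with a mini-batch gradient described in the resources above can be sketched on a toy problem. This is an illustrative sketch only, not code from any of the papers: the model (unit-variance Gaussian likelihood with a Gaussian prior on the mean), the function names and all parameter values are assumptions chosen for illustration, and a fixed step size is used for simplicity, whereas Welling and Teh use a decreasing step-size schedule.

```python
import random


def sgld_gaussian_mean(data, step_size=1e-4, batch_size=50, n_steps=6000,
                       burn_in=1000, prior_var=100.0, seed=0):
    """Draw SGLD samples of the mean theta of a unit-variance Gaussian.

    Assumed toy model: x_i ~ N(theta, 1), prior theta ~ N(0, prior_var).
    """
    rng = random.Random(seed)
    n_data = len(data)
    theta = 0.0
    samples = []
    for step in range(n_steps):
        # Mini-batch drawn with replacement from the full dataset.
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Gradient of the log prior and a rescaled mini-batch estimate
        # of the full-data log-likelihood gradient.
        grad_log_prior = -theta / prior_var
        grad_log_lik = (n_data / batch_size) * sum(x - theta for x in batch)
        # Langevin step: half a step along the gradient plus Gaussian noise
        # whose variance equals the step size.
        noise = rng.gauss(0.0, step_size ** 0.5)
        theta += 0.5 * step_size * (grad_log_prior + grad_log_lik) + noise
        if step >= burn_in:
            samples.append(theta)
    return samples


def exact_posterior_mean(data, prior_var=100.0):
    """Closed-form posterior mean for the same conjugate toy model."""
    return sum(data) / (len(data) + 1.0 / prior_var)


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
    samples = sgld_gaussian_mean(data)
    print("SGLD chain mean:     ", sum(samples) / len(samples))
    print("Exact posterior mean:", exact_posterior_mean(data))
```

Because the toy model is conjugate, the chain average can be compared directly against the closed-form posterior mean. With a fixed step size the chain is only approximately correct: the mini-batch gradient noise slightly inflates the sampled variance.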

Mon 28 October: Le Roux, Manzagol and Bengio (2007) Topmoumoute Online Natural Gradient Algorithm. This combines online learning with the idea of natural gradient. (Jeff, Mihai) PLEASE NOTE THIS WILL BE IN 2.33.

Mon 4 November: Schmidt, Le Roux and Bach (2013) Minimizing Finite Sums with the Stochastic Average Gradient. This provides interesting theoretical results on the right scheduling process for online learning methods. (Zhanxing, Boris)

Mon 11 November: Discussion meeting covering a) potential research directions relating to stochastic gradients, and practical suggestions on how and when to use them; b) choosing people for the next PIGS theme.

Mon 18 November: We will discuss the practical decisions around using stochastic online methods. This will involve a brief review of the suggestions and empirical issues discussed in the following papers. We suggest a cursory look at the papers, as we will not spend much time on any theoretical analyses this time.

For future reference: the meetings on 9 Dec, 23 June 2014 and 30 June 2014 will be in 2.33.

Other papers:

Duchi, Hazan and Singer (2010) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. The subgradient is particularly important in general settings, e.g. where we have potential non-differentiability. This is something that can be combined naturally with online learning. (?,?)

Feng Niu, Benjamin Recht, Christopher Ré and Stephen J. Wright (2011) Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. This covers the issue of large scale parallelisation of the methods, which is pretty important in many settings.
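
As a rough illustration of the adaptive subgradient entry above (Duchi, Hazan and Singer), the sketch below runs one online pass over a non-differentiable absolute loss, scaling each coordinate's step by the root of its accumulated squared subgradients. It is a simplified diagonal variant written for illustration, not the paper's full algorithm; the function name, the linear model, the learning rate and the synthetic data are all assumptions.

```python
import math
import random


def adagrad_abs_loss(examples, n_features, learning_rate=0.5, eps=1e-8):
    """One online pass of a diagonal AdaGrad-style update on |w.x - y|."""
    weights = [0.0] * n_features
    grad_sq_sum = [0.0] * n_features  # running sum of squared subgradients
    for x, y in examples:
        residual = sum(w * xi for w, xi in zip(weights, x)) - y
        # Subgradient of |r| with respect to r: sign(r); 0 is chosen at the
        # kink r == 0, where the loss is not differentiable.
        sign = (residual > 0) - (residual < 0)
        for i in range(n_features):
            g = sign * x[i]
            grad_sq_sum[i] += g * g
            # Per-coordinate step size shrinks as subgradients accumulate.
            weights[i] -= learning_rate * g / (math.sqrt(grad_sq_sum[i]) + eps)
    return weights


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [1.5, -2.0]  # hypothetical target weights for the demo
    examples = []
    for _ in range(5000):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(w * xi for w, xi in zip(true_w, x))
        examples.append((x, y))
    print("Estimated weights:", adagrad_abs_loss(examples, len(true_w)))
```

Each example is seen once and then discarded, which is the online setting the entry refers to. The absolute loss is used only because it has a kink at zero residual, so a subgradient is needed rather than a gradient.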

Topic revision: r1 - 22 Jan 2015 - 12:54:25 - Main.s1058681