This meeting is not currently running.

The idea of these lunch sessions is to create a forum, perhaps primarily for PhD students, but really for everybody who is interested, to discuss ongoing work, get feedback, and exchange ideas related to machine learning. The sessions are very informal, no slides required! In most cases it involves a brief presentation (max 30 min) and a subsequent discussion. A session can deliver a high-level overview of a project that the speaker is currently working on, or it can be about specific results, or about a particular problem with which other people might be able to help, about a new idea, or... i.e. anything that seems interesting enough to tell others and to discuss (well - almost anything ;)). The hope is that these sessions will lead to more interactions - and facilitate our work.

Lunches are generally held every Friday at 1 pm in room IF1.16 in the Informatics Forum. Announcements are made through the PIGS mailing list.

- Simon
- Chris Filo
- James
- Konrad
- Ioan
- Jono
- Ondrej
- Dominik
- Ali
- Alexander
- Luigi
- Krzysztof
- Benigno

I'll be giving a general talk about Reservoir Computing, an approach to modelling time series and dynamical systems related to recurrent neural networks.

What is a Reservoir? Why would I want to compute with it? What does a cortical microcolumn have in common with a bucket of water? All these questions, and perhaps more, will be answered.

We will continue watching Michael Jordan's video about Bayesian vs frequentist methods that we started two weeks ago.

There are two related topics that I'd like to talk about:

1) the Bayesian brain hypothesis - you've probably heard of it a lot before ("people are Bayesian"), but what does it mean exactly, problems, open questions, etc.

2) I am reading this book, "Cognition and Chance: The Psychology of Probabilistic Reasoning", and it's pretty interesting. I might go through a few chapters of it.

Depending on time, I might talk about one or both of these.

We sometimes hear things like "oh no, this is a frequentist procedure" or "if you want to be Bayesian about it..." etc. In my talk I will try to explain what it really means to be Bayesian or frequentist. I will back this up with a talk given by Michael Jordan himself and then I will share some ideas I came across while looking at this topic.

I am going to talk about Radford Neal's blog post on the Harmonic Mean method (HMM) for marginal likelihood estimation (MLE). More generally, I classify the HMM estimator of the marginal likelihood as a member of a special family of estimators, the "worst" estimators one can have.
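As a rough illustration of the estimator discussed in this abstract (not the speaker's code), the sketch below applies the harmonic mean identity, 1/p(x) = E[1/p(x | mu)] under the posterior, to a conjugate Gaussian model where the exact marginal likelihood is available for comparison. The model (x_i ~ N(mu, 1), mu ~ N(0, tau2)) and all function names are assumptions chosen for the example.

```python
import numpy as np


def log_likelihood(x, mu):
    """log p(x | mu) for x_i ~ N(mu, 1), evaluated for an array of mu samples."""
    mu = np.atleast_1d(mu)
    sq = ((x[None, :] - mu[:, None]) ** 2).sum(axis=1)
    return -0.5 * x.size * np.log(2 * np.pi) - 0.5 * sq


def exact_log_marginal(x, tau2):
    """Exact log p(x) under the prior mu ~ N(0, tau2), i.e. x ~ N(0, I + tau2 * 11^T)."""
    n, s = x.size, x.sum()
    quad = (x ** 2).sum() - tau2 * s ** 2 / (1 + n * tau2)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * tau2) - 0.5 * quad


def harmonic_mean_log_marginal(x, tau2, num_samples, rng):
    """Harmonic mean estimate of log p(x) from exact posterior samples of mu."""
    n = x.size
    post_var = tau2 / (1 + n * tau2)
    post_mean = post_var * x.sum()
    mu = rng.normal(post_mean, np.sqrt(post_var), size=num_samples)
    neg_ll = -log_likelihood(x, mu)
    # log of the sample mean of 1 / p(x | mu), computed stably.
    m = neg_ll.max()
    log_mean_inverse = m + np.log(np.mean(np.exp(neg_ll - m)))
    return -log_mean_inverse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=10)
    for tau2 in (0.05, 100.0):  # tight prior vs. vague prior
        print(tau2, exact_log_marginal(x, tau2),
              harmonic_mean_log_marginal(x, tau2, 20000, rng))
```

With a tight prior the two numbers tend to agree. With a vague prior the estimate is dominated by a few samples and can sit far from the exact value, which is the kind of behaviour the "worst estimator" label refers to.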

"By many estimates, 80% of time in machine learning is spent in data cleaning and exploratory data analysis." -- Someone on the internet

My own experience has also been very similar. I'll try to demonstrate this further by considering two published case studies: 1) "A practical framework for optimizing paper matching" (http://bit.ly/IjEQ9v), and 2) "Classifier technology and the illusion of progress" (http://bit.ly/ecmBrP)

'Causality without experiments'

A treat for philosophical minds. When can we make causal conclusions from observational data, without poking things with a stick? We can answer this by drawing graphical models. I will explain an example of a situation when inference of causality is possible. When we understand the impact of genetic variations on human health, some questions about causality can be answered.

Based on a Cambridge MLSS lecture:

http://videolectures.net/mlss09uk_dawid_caus/

Starter:

Microscopy soup with melted points into blurry blobs and resolution seasoning

Main course:

Marinated, finely chopped Fourier Transforms

Dessert:

Fourier frequency range extended cake

Over the past few years, I've fought a number of times with MATLAB, and sometimes, I've won.

I'll be talking about some of the interesting tips and tricks you can use to develop, debug, speed up code, plot data and other tomfoolery in MATLAB. If you're struggling with anything MATLAB related, feel free to bring questions, and we'll see if we can solve them as a group.

Code from the talk can be found here.

Two things I forgot to touch on were long-running jobs on DICE machines and LaTeX tables.

- See my mlb.sh script to ensure AFS access for long jobs (it sucks when you can't save a file after days of computation).
- See ml_lunch_latex_tables.m for the relevant code on automatically generating .tex files containing tables which previously were in code! These tables are easy to include in your main doc with \input

Useful links include:

- Tom Minka's Lightspeed Toolbox
- Iain Murray's "Efficient Matlab and Octave"
- timeit (the function Iain recommended to time a function) is available here

While the benefits of Fourier analysis are beyond doubt, limitations do exist. Observing a signal in time tells you nothing about its spectrum, while the Fourier transform contains no time information. The answer is given by time-frequency analysis, and most notably by the wavelet theory.

In the first part of my talk, I will review the basic concepts of Fourier analysis and illustrate them with image processing examples. I will then introduce the concept of multi-resolution and try to sketch the wavelet transform.

I will talk about one of the following (maybe depending on what people want to hear):

- general introduction to RL and Stochastic Optimal Control and relation to ML methods

- using kernel methods to solve RL/stochastic optimal control problems

- random features for kernel machines
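For the last topic in the list, a minimal sketch of random Fourier features for the RBF kernel may help fix ideas. It assumes the kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)), whose spectral density is Gaussian. The function names are illustrative and not taken from the talk.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """Exact RBF kernel matrix between the rows of X and the rows of Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)


def make_random_fourier_features(dim, num_features, lengthscale, rng):
    """Return a feature map phi such that phi(X) @ phi(Y).T approximates the RBF kernel."""
    # Frequencies are drawn from the kernel's spectral density N(0, I / lengthscale^2).
    W = rng.normal(0.0, 1.0 / lengthscale, size=(dim, num_features))
    b = rng.uniform(0.0, 2 * np.pi, size=num_features)

    def phi(X):
        return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

    return phi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))
    phi = make_random_fourier_features(3, 2000, 1.0, rng)
    print(np.abs(phi(X) @ phi(X).T - rbf_kernel(X, X)).max())
```

The approximation error shrinks roughly as one over the square root of the number of features, so a linear model on `phi(X)` can stand in for a kernel machine when the exact kernel matrix is too large to form.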

I am going to talk about differential geometry (DG) and statistics and how they are used to develop state-of-the-art MCMC methods for Bayesian inference. In the first half, I will *try* to give a very brief introduction to the most important concepts of DG and how they are connected to the space of probability distributions from a Bayesian perspective. The second half will focus on why we care about DG in sampling and how we can take advantage of it to make MCMC better.

NPAIRS

Tomorrow I'm going to present a framework for comparing models and data preprocessing techniques that combines measures of prediction and reproducibility. The method was developed as a replacement for ROC curves on simulated data and is intended for use on datasets in which the ground truth is not exactly available.

I'll give a short introduction to the theory of Hilbert spaces, working my way up from basic linear algebra. I don't mean this to be a comprehensive tutorial: the aim here is for people to leave feeling comfortable with the ideas that are most likely to appear in a machine learning context.

Depending on time, I'll cover some of the following topics: Basic definitions and properties, Concrete examples of Hilbert spaces, Fourier series and function approximation, Mercer's theorem, representation of linear functionals, and the Hilbert space interpretation of the Fourier transform.

Receptive fields in the auditory cortex and speech recognition

If computer vision researchers get excited when their algorithms learn Gabor filters, what should a speech processing researcher get excited about? I will present my one-week foray into reading about the receptive fields in the cochlea and auditory cortex.

One of the standard machine learning assumptions is the same distribution of training and test data. This is often too strict: there may be very little available labelled data - or none at all - in one domain, while plenty of labelled data may be available in another related domain. Since the process of labelling is often very expensive (as it may require human experts), it is sensible to train an algorithm on available data and apply it to data coming from a different distribution.

There is a considerable amount of work in the area of domain adaptation and transfer learning that deals with this kind of scenario. All the models used in these studies are, however, assumed to have fixed complexity. On the other hand, all frequently used model selection criteria assume that training and test distributions are the same. How to combine these two perspectives in order to pick the model that gives the smallest error when the training and test distributions are divergent is an interesting question. Preliminary results have shown that the models favoured in this scenario are simpler than in the standard setting.

I'll talk about my research on probabilistic modelling of human motor-sensory timing and time perception. In particular, my recent work has focussed on inferring the internal representations of temporal statistics (i.e. subjective priors and loss functions) learnt by subjects in lengthy psychophysical experiments. The aim of the project is to probe the upper bounds and limitations of human learning and/or of simple Bayesian accounts of it in the temporal domain.

I will talk about the neocortex, what it is, what it does, and why you might care as an ML researcher.

In particular, I will introduce some of Jeff Hawkins's ideas about a cortically inspired algorithm for prediction, based on his book 'On Intelligence' and work from his group at Numenta (www.numenta.com).

I will give an overview of the work I've done so far since I started the PhD (with emphasis on most recent results). I will also do a mini-brainstorm about how my vision models could potentially be useful for NLP problems.

I will talk about human genetic variation, and how machine learning allows us to use the data. Your genetic code can now be cheaply read - but what benefits does it bring?

This year's NIPS paper describes a Bayesian nonparametric model for modelling genetic variation, and I will explain it. Finally I will explain my own work on genetic variation of humans living on islands.

If you would like to have a look at the paper, here it is: http://www.gatsby.ucl.ac.uk/~ywteh/research/npbayes/TehBluEll2011a.pdf

Although there are many different models in machine learning, there are very few compositional approaches (model averaging, mixtures, products/factors, mixings). Prediction markets are used effectively in real life to predict outcomes for events such as presidential elections. Market based approaches are also known to be robust (with different players entering and leaving the market all the time), scalable (millions of players interact with markets every minute) and flexible (adjusting to new information very quickly).

I will be discussing the use of prediction markets for machine learning goals, namely classifier aggregation and probability estimation. I will give a brief introduction on how prediction markets can be set in probabilistic terms, how we have formalised them from a utility based approach, and how they can be used to recreate the standard compositional structures of mixtures and products. I will introduce my method of solving market equilibria (and its problems), and show the results we have gained so far.

I will also discuss potential future avenues of research in the area.

I will give a short introduction to the theory of stochastic differential equations (SDEs), focusing on diffusion processes. In general, nonlinear SDEs are analytically intractable. This has the effect of making filtering and parameter estimation difficult. I will show how one can introduce a set of auxiliary variables to simplify the problem: conditional on the value of these variables, a diffusion behaves approximately like a Gaussian process.
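To make the notion of a diffusion process concrete, here is a minimal sketch of forward simulation with the Euler-Maruyama scheme for a scalar SDE dX = f(X) dt + sigma dW. It is only a simulator, not the auxiliary-variable inference method mentioned in the abstract. The double-well drift and all names are assumptions picked for illustration.

```python
import numpy as np


def euler_maruyama(drift, sigma, x0, dt, num_steps, rng):
    """Simulate one path of dX = drift(X) dt + sigma dW on a regular time grid."""
    path = np.empty(num_steps + 1)
    path[0] = x0
    for k in range(num_steps):
        # Brownian increment over a step of length dt has variance dt.
        dw = rng.normal(0.0, np.sqrt(dt))
        path[k + 1] = path[k] + drift(path[k]) * dt + sigma * dw
    return path


def double_well_drift(x):
    """Nonlinear drift with stable points at -1 and +1 (hypothetical example)."""
    return x - x ** 3


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    path = euler_maruyama(double_well_drift, 0.5, 0.0, 0.01, 1000, rng)
    print(path[-1])
```

Because the drift is nonlinear, the transition density between two time points has no simple closed form, which is one way to see why filtering and parameter estimation become hard.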

Video lecture: Non-parametric Bayesian Models

http://videolectures.net/mlss09uk_teh_nbm/

I will talk a little about 'fine-grained visual categorisation', and show how I've been applying our 'Factored Shapes and Appearances (FSA)' model to this task. I will conclude by brainstorming about potential future directions for my work.

On Friday I am going to talk about using Gamma-Gaussian Mixture models to select the cluster-forming threshold for false discovery rate corrected random field theory inference in massive univariate statistical maps. The sole motivation and application of this method is thresholding T maps acquired from fMRI data. I will also show simulations comparing it to the state of the art, and results from a reliability study (scan, rescan after three days).

I will talk about my current work on modelling music, using ideas and concepts from the field of topic models.

Although there are many different models in machine learning, there are very few compositional approaches (model averaging, mixtures, products/factors, mixings). Prediction markets are used effectively in real life to predict outcomes for events such as presidential elections. Market based approaches are also known to be robust (with different players entering and leaving the market all the time), scalable (millions of players interact with markets every minute) and flexible (adjusting to new information very quickly). I will be discussing the use of prediction markets for machine learning goals, namely classifier aggregation and probability estimation. I will give a brief introduction on how prediction markets can be set in probabilistic terms, how we have formalised them from a utility based approach, and how they can be used to recreate the standard compositional structures of mixtures and products. The rest of the session will be on how we expect to implement and show their worth.

Precision-Recall (PR) curves and ROC curves are a convenient tool for evaluating the performance of a classifier on labeled binary data, when the classifier produces a ranking of labels and the best threshold is unknown. They can also be applied for evaluating a network that has been reconstructed from data, if the true network structure is known. We can use the area under the curve to summarise the output of a PR or ROC curve. However, there are certain subtleties about interpolation that need to be taken into account. I will discuss some of these, and how they can be addressed.
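One of the interpolation subtleties can be shown with a few lines of code. Between two operating points, true and false positive counts change linearly, but precision does not, so joining two PR points with a straight line overstates the area. The sketch below steps one true positive at a time between two points. The counts used are hypothetical and the function names are illustrative.

```python
def interpolate_pr(tp_a, fp_a, tp_b, fp_b, num_pos):
    """(recall, precision) points between A and B, one extra true positive per step.

    Assumes tp_b > tp_a >= 1 and that num_pos is the total number of positives.
    """
    slope = (fp_b - fp_a) / (tp_b - tp_a)  # false positives gained per true positive
    points = []
    for step in range(tp_b - tp_a + 1):
        tp = tp_a + step
        fp = fp_a + slope * step
        points.append((tp / num_pos, tp / (tp + fp)))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve through (recall, precision) points."""
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += 0.5 * (p0 + p1) * (r1 - r0)
    return area


if __name__ == "__main__":
    # Hypothetical counts: A has 5 TP and 5 FP, B has 10 TP and 30 FP, 20 positives.
    curve = interpolate_pr(5, 5, 10, 30, 20)
    print(trapezoid_area([curve[0], curve[-1]]))  # straight line from A to B
    print(trapezoid_area(curve))                  # count-based interpolation
```

The second area is smaller than the first, because the count-based curve sags below the straight line between the two points.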

I might refer to this paper, but it's not required reading: http://portal.acm.org/citation.cfm?id=1143844.1143874

I will first explain how the Factorial Switching Linear Dynamical System is used for neonatal condition monitoring. Moving on, a current challenge of our application is the treatment of events happening on different time scales. Two possible solutions are resampling discrete-time AR processes or employing continuous-time AR processes.

I'll talk about how embeddings of random variables into reproducing kernel Hilbert spaces can be used for approximate inference and how I intend to use them for my work in stochastic optimal control.

I'll be talking about optical fluorescence microscopy and how quantum dots might be used to increase resolution in biological samples. I'll briefly discuss several machine learning techniques (such as Non-negative Matrix Factorisation, the Gamma-Poisson model and its variational formulation, and the Molgedey-Schuster algorithm for source separation based on time correlations) as different tools for analyzing microscopic data with quantum dots used as fluorescent labels.

I will quickly brainstorm about some issues related to implementing biological object perception. Specifically, the question is how one could learn units whose activity reflects border-ownership of edges, contributing to representations that organize the parts of a visual scene into objects.

We present a novel method to infer combinatorial regulation of gene expression by multiple transcription factors in large-scale transcriptional regulatory networks. The method implements a factorial hidden Markov model with a non-linear likelihood to represent the interactions between the hidden transcription factors. We explore our model’s performance on artificial data sets and demonstrate the applicability of our method on genome-wide scale for three expression data sets. The results obtained using our model are biologically coherent and provide a tool to explore the concealed nature of combinatorial transcriptional regulation.

For more details, look at

In this talk, I will give a short, non-technical introduction to the theory of stochastic differential equations. SDEs are well-suited to modelling natural phenomena that are subject to random effects, and have been used to describe the dynamics of the stock market, patterns of neuronal activity, and population dynamics among many other applications. In practice, it is not possible to observe an SDE at every point in time. One must therefore attempt to infer the behaviour of an SDE based on discrete, potentially noisy observations of the process. I will discuss some approaches to this problem with particular focus on variational methods.

The perceived spatio-temporal relations between local sensorimotor events can be very different from actual distances, durations and even temporal order. In fact, it has been shown that prior expectations, adaptation and other phenomena can warp and shift the subjective metric of space-time, producing observable effects like spatial and temporal recalibration, intentional binding and time reversal illusions. I will briefly describe my PhD project, which aims at characterizing the properties and dynamics of the structure of subjective sensorimotor space-time, with a specific focus on the temporal dimension. The methods of inquiry combine probabilistic and computational modeling, machine learning techniques and psychophysical experiments.

My current research is focused on efficient sampling methods for BCRFs in high dimensions using Hessian approximation by BFGS. Dealing with the Hessian matrix is painful when sampling in high dimensions. However, many optimisation methods achieve the very attractive property of superlinear convergence by using a Hessian approximation on a function with a large number of variables. One well-known optimisation algorithm is BFGS, a member of the quasi-Newton family. I have now found a way to use BFGS to approximate the local curvature from a set of the most recent samples, and such a sampling algorithm based on an M-H kernel has shown dramatic performance on low-dimensional Gaussian distributions. But there are still many open questions when applying such a method to high-dimensional distributions.

I will be talking about my work joint with Chris on modeling natural images with energy-based models.

I'll do a spontaneous brainstorm session about the Synthetic Visual Reasoning Test Challenge. This seems to be an interesting one as it involves very simple images of shapes with 'complex' relationships (symmetry, spatial positioning, etc.). Who knows, maybe we even get a team together?

http://www.idiap.ch/~fleuret/svrt/challenge.html

I am working on learning long-term correlations in Boltzmann-machine based dynamic models. I recently defined this project in my first year review, so my talk is based on that. I will state the project aims, briefly review some problems with existing approaches, and then talk about a model I will soon implement.

Starting from basic principles, I will define the concept of a Bayesian nonparametric model. I will explain how a Gaussian Process can be thought of as an example of such models, and proceed to introduce the Dirichlet Process within this framework. Examining the DP in juxtaposition with the Dirichlet distribution and the GP will hopefully lead to a more intuitive understanding of its details. If time permits, I will introduce a simple application of the DP, the Dirichlet Process Mixture and show (with code) how it is used in practice.
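As a small companion to this abstract (not the code shown in the talk), the sketch below samples a random partition from the Chinese restaurant process, which is the clustering behaviour induced by a Dirichlet Process with concentration parameter alpha. Function and variable names are assumptions made for the example.

```python
import numpy as np


def sample_crp(num_customers, alpha, rng):
    """Sample table assignments and table sizes from a Chinese restaurant process."""
    counts = []        # number of customers at each existing table
    assignments = []   # table index chosen by each customer, in arrival order
    for n in range(num_customers):
        # Existing table k is chosen with probability counts[k] / (n + alpha),
        # a new table with probability alpha / (n + alpha).
        probs = np.array(counts + [alpha], dtype=float) / (n + alpha)
        table = int(rng.choice(len(probs), p=probs))
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    assignments, counts = sample_crp(100, 1.0, rng)
    print(len(counts), counts)
```

The number of occupied tables grows slowly with the number of customers, which is why a Dirichlet Process Mixture can let the data decide how many clusters to use. Larger values of `alpha` tend to produce more tables.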

I'll talk about some recent work on how, by relaxing an exact duality between a KL divergence and the stochastic optimal control problem, we can obtain an iterative solution to the latter. The iterations entail lowering the KL divergence between a marginal with partially fixed structure and a posterior. The minimum can be found in closed form; however, I'm wondering if anyone has any thoughts about possible weaker updates which have a 'nicer' form than the minimum.

I will speak about a part-based Restricted Boltzmann Machine for modelling music melody. I will describe the model, motivate its application to the specific problem and show some first results on toy data. This is joint work with Nicolas.

My project is about devising automatic tools to aid the diagnosis of heart diseases by extracting information from the electrocardiogram. I will start by describing the problem at hand and the motivation behind it. I will then talk about the application of GPs to the problem, in order to motivate my current work on multi-task learning with Gaussian Processes for classification. Specifically, I will describe the model, the approximation methods for the non-Gaussian likelihood that I have applied, and the results they produced. Finally, I will speak about ongoing work on meta-generalizing, or transfer of learning, and how it falls within the multi-task framework. Meta-generalizing is an idea introduced by Jonathan Baxter in the paper 'A model of Inductive Bias', which I want to investigate empirically. In a few words, the idea is how we can make predictions on tasks for which training data are unavailable.

I will speak about my ongoing attempts at utilizing a Deep Boltzmann Machine as a model of processing in the brain. Specifically, I will talk about modelling visual hallucinations induced by homoeostasis (a model of the 'Charles Bonnet syndrome'), a potential relationship to schizophrenia, and other weird stuff.

I will describe a model to infer the transcription factor activity profile from mRNA time-courses of its target genes. So, the problem is closely related to that presented by Frank two weeks ago. But this time you can refer to this paper:

http://bioinformatics.oxfordjournals.org/cgi/content/abstract/25/10/1280

It contains two parts: an exact inference solution and a variational approximating solution. I'll go only through the variational approximation part, showing a new method to solve the inference problem. Basically it's what I've worked on in the last period.

I will give an overview of various approaches to map neural responses to 'stimulus-space' or some other low-dimensional space which can be interpreted meaningfully. I will also present some thoughts of my own which aren't developed very far.

I will present the application of Gaussian processes for modelling transcription factor activity in gene expression. The amount of active protein of a given transcription factor is often hard to measure, making this essentially a latent variable. Gao et al. (2008) have developed a simple ODE model, where the transcription factor activity is modelled using a GP prior. I will discuss the model and some possible extensions.

(Essentially it'll be about this paper:

http://bioinformatics.oxfordjournals.org/cgi/reprint/btn278?ijkey=FauSn114lAUC1Ey&keytype=ref

but people don't have to read it in advance.)

Topic: Subspace Gaussian Mixture Models for HMM-based speech recognition

Abstract: Subspace Gaussian Mixture Models are an alternative approach to building the HMMs for speech recognition, which uses a universal latent subspace to generate the parameters of the GMMs for each state in an HMM, rather than estimating each state's model individually as in conventional HMMs. This approach makes it more flexible to scale the size of the model, and we believe we could get comparable or even better performance with much more compact models. In addition, it is also possible to utilize heterogeneous datasets in such an approach; for example, speech data from different languages can still contribute to the estimation of the latent subspace.

I will be talking about some of my work on statistical modeling of natural images using hierarchical nonparametric Bayesian methods, focusing on our hierarchical Dirichlet process hidden Markov tree (HDP-HMT) framework.

I will begin by describing the tree-structured latent variable model it employs to generate pyramidally organized multiscale image features (such as wavelet coefficients), and to couple dependencies between them. I will then describe an extension using Hierarchical Dirichlet Processes to learn data-driven, global statistical image models that automatically adapt to the varying complexity of different datasets.

Finally, I will describe effective learning algorithms using Markov chain Monte Carlo methods and belief propagation for some computational vision problems.

I will give a recap of some standard nonparametric Bayesian models, Dirichlet Processes and Chinese Restaurant Processes and their hierarchical versions, while clarifying any questions about them if there are still any after Jordan's talk today. I will then relate these to my work on author disambiguation and reviewer allocation covering issues including research group identification and name corruption models. Finally I'll discuss some dynamic problems in the mixture modelling or topic modelling context.

My research project is still in its formative stages - so this talk will describe various issues related to that process. I will say a little about my interests in general, then talk specifically about my research area which will involve Boltzmann machines and dynamic models. I shall describe my understanding of the problems with current dynamic models, and why a model based on the dynamics of a Boltzmann machine might be a good idea. I will talk about one approach that might result in such a model. I am seeking a representative problem in dynamic modelling on which to test future models, and hope for some feedback on that as well as the other ideas presented.

I will describe one challenge the machine learning approach faces in the image segmentation task, namely the combinatorial nature of the latent variable space. I will then go over a number of techniques which have been employed to combat this problem. The question I wish to ask is if generic simplifications can be made to the proposed distributions which make inference and learning feasible, in the same way that the Markov property assumption has been applied to the stochastic process framework.

Recently there have been several attempts to use probabilistic inference approaches for stochastic optimal control problems (e.g. work by Kappen & Opper, Todorov, Toussaint). I'll briefly introduce the model I have been working with (Toussaint's), how the model relates to classical stochastic optimal control and alternative formulations, and more importantly what we have gained by using the probabilistic formulation and what we hope to gain in the future. Most of it will be work in progress or ideas, so comments will be highly encouraged and much appreciated.

There has been an idea floating around (e.g. at NIPS last year) that the brain uses probabilistic models to describe the world, but does inference with only a limited number of samples at a time. In particular, dynamic inference has been described with particle filters using few - or even just one - particles. I would like to brainstorm about this idea and how it could relate to other types of models some of us are working on.

It is a 10-minute talk, titled: "Probabilistic Models for Melodic Sequences".

Basically I will describe the scope of my PhD and the 2 models that I've used so far (Variable Length Markov Model and Convolutional RBM in the time dimension) and present/comment on my results.

Nested Effects Models are a simple probabilistic model for representing gene regulation networks under the assumption that genes are either signaling (S) or effect-reporting (E) genes. Inference can be done exactly, but searching the space of all networks is computationally expensive for non-trivial networks. Approximations exist that are able to overcome this problem. I will present the methodology and discuss whether improvements are possible.

I am trying to model visual processing in the mammalian brain, in particular 'high-level' aspects (object perception) and high-level driven feedback processing, both poorly understood in computational neuroscience. The modelling approach I am taking is based on Boltzmann machines, and the tasks in question are those of computer vision (object classification, segmentation etc.), so naturally my work is closely related to that of several others in our group. Thus I hope to get some feedback on how to develop a distinct model specific to the brain, and on how some novel, biologically inspired mechanisms could be implemented in this framework, such as object-based attention.

I will start by describing a rather simple model, based on the Restricted Boltzmann Machine, which I will use to model the melody sequences of reels. Then, I will discuss different aspects of music that we want a model to be able to capture and suggest possible extensions to the initial model that may serve this purpose. Your input/comments will be highly appreciated.

During the machine learning lunch I will present recent progress on an initialisation for the Gaussian Process Latent Variable Model which has been suggested by Chris. The method takes a shortcut from the optimisation of the model log-likelihood based on the relationship between sample covariance and kernel. After introducing the method I will discuss recent insights following from experiments with synthetic data.

How to improve mixing when training undirected models with persistent chains

CD is a poor approximation to the likelihood gradient. Tieleman (2008) proposed to use persistent Markov chains which can give much better results, but towards the end of learning the Markov chains often mix very slowly with the effect that the model parameters do not converge but begin to oscillate. I'll describe two solutions to this problem: the "fast weights" approach suggested by Tieleman & Hinton (2009) and tempered trajectories (tempered transitions for hybrid Monte Carlo), an approach that I have been experimenting with recently in the hope of improving training of my P.o.E (F.o.E) models.

Last time I talked about using a HDP to perform entity resolution based on author names. This time I'll talk about my extensions to the model including the modelling of research groups and collaborations at document and corpus level and how this can aid the problem. I'll also briefly talk about some of the problems tackled at Google and a summary of the problems I worked on at Google Translate.

I will discuss how task correlations in multi-task Gaussian process (GP) regression affect the generalization error and the learning curve, concentrating on the asymmetric two-task case. Lower and upper bounds to the generalization error and the learning curve will be given. If time permits, I'll also discuss how we may link the learning curve (which is Bayesian) to the average mean square error (which is frequentist).

In computational neuroscience one often is interested in how different stimuli are represented in the responses of single neurons or populations of them. This can also be described as characterizing the mapping from (external) stimulus space to neuronal response space. I will present a recent take on this problem, which uses a distance measure between spike trains and multi-dimensional scaling to 'reconstruct' the response space.

Dynamic Bayesian networks (DBNs) are a subset of Bayesian networks where each network unfolds in time, and directed edges link variables at consecutive time steps. Usually the edges are assumed to be the same no matter which time steps we are considering. We can call this a homogeneous DBN. I will present a couple of recent approaches for inferring heterogeneous DBNs, where the edges between time steps vary over the course of the time series. I will explain certain shortcomings of these approaches, and discuss how they could be addressed.

Last time I talked about my search for a machine learning inspired model of biological vision, describing mostly the biological aspects and constraints. This time I will talk about a concrete model I recently started working with, which is more or less a Deep Belief Net. I will outline what I plan to do with it, and how that relates back to neuroscience phenomena such as attention.

I'll talk about the notion of uncertain evidence, which includes virtual evidence (where evidence is represented as likelihood ratios) and soft evidence (where evidence is represented as probability distributions), and discuss its potential for robust model selection.
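As a rough illustration of the two kinds of uncertain evidence named above, the following sketch assumes the standard textbook definitions (Pearl-style virtual evidence and Jeffrey's rule for soft evidence); the talk itself may have used a different formulation. The symbols $X$, $Y$, $\lambda_i$ and $q$ are introduced here for illustration only: $X$ is the variable the evidence bears on, with states $x_1, \dots, x_n$, and $Y$ is any other variable in the model.

```latex
% Virtual evidence on X, given as likelihood ratios \lambda_1 : \dots : \lambda_n
P(x_i \mid \mathrm{ve}) = \frac{\lambda_i \, P(x_i)}{\sum_{j=1}^{n} \lambda_j \, P(x_j)}

% Soft evidence: X is constrained to a new distribution q (Jeffrey's rule)
P'(y) = \sum_{i=1}^{n} P(y \mid x_i) \, q(x_i)
```

In the first case the prior on $X$ still influences the posterior; in the second the marginal on $X$ is replaced outright by $q$.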

This may change:

Restricted Boltzmann Machine (RBM) Models for Melodic Prediction

I'll briefly present two RBM models for sequential data, the Conditional RBM and the Factored Conditional RBM, and I'll discuss how they can be used for modeling music melodies. I'll also talk about a few ideas and/or tricks that I'm thinking of trying out.

The last time I spoke about the deficiencies of the Gaussian Process Latent Variable Model as a dimensionality reduction method. This time I will put a more positive spin on it by showing that it actually is a good method to capture constraints in the data. This is a property that we are trying to exploit in our current project in which we aim at making reinforcement learning in continuous state and action spaces practical when constraints limit the state space. For several reasons it appears beneficial for the RL to stay in regions of the state space for which the GPLVM gives high confidence and I will look into how predictive confidences from the GPLVM can be incorporated into the RL process.

My machine learning lunch will be about the use of various structures of HMMs in the modelling of behaviour. I will describe segment HMMs, factorial HMMs and I/O HMMs, and how I would like to combine them to form a good model of behaviour. I'll talk about the kind of observations we get and how to use these to perform inference in the model.

I have been reading in my spare time about self-similarity in Ethernet traffic and how the Poisson distribution fails to correctly monitor and model the observations. Thick-tailed distributions are used to sample packet sizes. Such distributions are commonly seen in practice in finance, ecosystems, etc. Extreme value theory is what I thought would apply, but not really so. And it is also used to explain some forms of non-Gaussian noise, which is pink!

I will look at the models for sharing information locally across the image while allowing for global variability. We are aiming to aid estimation of the underlying parameters for the ambiguous pixels by using the fact that neighbouring pixels should be similarly distributed. To the brainstorm people - I expect to get into slightly more mathematical detail and expose some of the complexities I am facing.

Most dynamical systems can be modeled using (ordinary or stochastic) differential equations. There are two questions we can ask when modeling the system: What values should the parameters take, and what should the structure of the equations be? The first question is arguably the simpler one, so I will first describe some classic and modern approaches to parameter inference, and point out limitations and possible extensions. If time permits, I will then talk about structure inference and whether it can be achieved.

Structured learning has been one of the problems that have motivated Statistical Relational Learning. I'll be talking about my efforts in a parallel area, that of record linkage or entity disambiguation or, in a single word, clustering. The model disambiguates author names based on what the authors write about (similar to an author-topic model) and does this in a generative model based on the hierarchical Dirichlet Process. Though nonparametric models are known for reducing the number of assumptions on your data, it turns out that in the DP this is a tradeoff that exists in terms of adjusting the prior concentration parameter, alpha. I'll also talk about how some of this relates to the Pitman-Yor process.
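To make the role of the concentration parameter concrete, here is a minimal sketch under the standard Chinese restaurant process view of a single DP (an assumption; the HDP model from the talk is not reproduced here). Under that view, the prior expected number of clusters among n items is the sum of alpha / (alpha + i) for i = 0, ..., n-1. The function name `expected_clusters` is hypothetical.

```python
def expected_clusters(n, alpha):
    """Prior expected number of clusters for n items under a CRP with concentration alpha.

    Item i (0-based) starts a new cluster with probability alpha / (alpha + i),
    so the expectation is the sum of those probabilities.
    """
    if n < 0 or alpha <= 0:
        raise ValueError("n must be non-negative and alpha must be positive")
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    # A larger alpha puts more prior mass on having many clusters.
    for alpha in (0.1, 1.0, 10.0):
        print(alpha, expected_clusters(1000, alpha))
```

The point of the sketch is only that alpha is itself a modelling assumption: choosing it small or large encodes a prior belief about how many distinct authors to expect.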

In the context of online model selection: how could one exclude a data sample probabilistically without resorting to mixture models?

First I will talk in general about using Markov Chain Monte Carlo methods to solve and infer parameters of stochastic differential equations, and about the main problems with that approach. Then I will talk about how population MCMC methods could be utilized to solve these problems, which is what I will be looking at over the coming months.

I will describe a problem I have been thinking about recently: inferring neural representations of internal models (for motor control) from motor adaptation data. Previous work has simply guessed a neural representation, then shown that this is reasonably consistent with observed adaptation behaviour. However, it would be much nicer to instead infer the neural representations directly from data...

Modelling Spatial Autocorrelation in Regression and in Bayesian Networks

Spatial autocorrelation is an important problem when modelling spatial data, such as population data in ecology. If data points at neighbouring locations are strongly correlated, then relationships based on other variables can be hard to detect. One easy way of taking spatial autocorrelation into account is including values at neighbouring points as additional variables. This is fine for regression, but breaks down when we try to learn the structure of a Bayesian network. I will present a (very) basic scheme that we have developed to deal with the problem in the Bayesian network framework, and discuss whether there are other possible approaches.
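The "neighbouring values as additional variables" idea for regression can be sketched as follows. This is a minimal illustration, assuming data on a regular 2-D grid, a 4-cell neighbourhood and ordinary least squares; none of these choices come from the talk, and the names `neighbour_mean` and `fit_with_neighbours` are hypothetical.

```python
import numpy as np


def neighbour_mean(grid):
    """Mean of the up/down/left/right neighbours of each cell.

    Cells on the border simply average over the neighbours they have.
    Assumes every cell has at least one neighbour (grid larger than 1x1).
    """
    padded = np.pad(grid.astype(float), 1, constant_values=np.nan)
    stacked = np.stack([
        padded[:-2, 1:-1],   # neighbour above
        padded[2:, 1:-1],    # neighbour below
        padded[1:-1, :-2],   # neighbour to the left
        padded[1:-1, 2:],    # neighbour to the right
    ])
    return np.nanmean(stacked, axis=0)


def fit_with_neighbours(x, y):
    """Least-squares fit of y on an intercept, x, and the neighbour mean of y."""
    design = np.column_stack([
        np.ones(y.size),
        x.ravel(),
        neighbour_mean(y).ravel(),  # spatial autocorrelation term
    ])
    coef, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
    return coef  # [intercept, effect of x, effect of neighbouring y]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(20, 20))
    y = 2.0 * x + rng.normal(size=(20, 20))
    print(fit_with_neighbours(x, y))
```

In a Bayesian network the same trick is less straightforward, since each variable would need its neighbouring values as extra parents, which is the difficulty the abstract points to.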

While there's a good local physics-based model for my data (diffusion MRI), full parameter estimation is not feasible on a local basis. We have shown that sharing some parameters between neighbouring locations in the brain can remedy this, and now I'm hoping to set up a nifty hierarchical model that will capture and exploit this spatial coherence.

Topic revision: r115 - 21 Sep 2015 - 09:28:30 - AmosStorkey

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
