Bayesian latent variable models for learning dependencies between multiple data sources

Event type: 
Doctoral dissertation
Respondent: 
Seppo Virtanen
Opponent: 
Cédric Archambeau, Amazon, Berlin, Germany
Custos: 
Samuel Kaski
Event time: 
2014-08-25 12:00 to 16:00
Place: 
Computer Science Building (Konemiehentie 2), lecture hall T2, Otaniemi Campus
Description: 

Machine learning focuses on automated large-scale data analysis that extracts useful information from data collections. The data are frequently high-dimensional and may correspond, for example, to images, text documents, or measurements of neural responses. In many applications, data can be collected from multiple data sources, also called views.

This thesis presents novel machine learning methods for analyzing multiple data sources, especially for understanding the relationships between them. The analysis provides a comprehensive summary of the data-generating process, which may be used for exploring the relationships and for predicting observations of one or more sources. The methods are based on two assumptions: each view provides complementary information about the data-generating process, and each view is corrupted by noise. The methods aim to use all available views, accumulating partly overlapping information and reducing view-specific noise.

In particular, this thesis presents several Bayesian latent variable models that learn a decomposition of latent variables: some of the variables capture information shared by multiple sources, whereas the remaining variables explain noise in each view. The latent variables may be inferred efficiently from the observed data by using sparsity assumptions and Bayesian inference. The models are applied to analyzing neural responses to natural stimulation and to jointly modeling images and text documents.
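
To make the shared versus view-specific decomposition concrete, the following is a minimal illustrative sketch of a generic linear-Gaussian generative process for two views. It is not taken from the thesis: the function name sample_two_views, the dimensions, and the Gaussian assumptions are all hypothetical choices made for illustration, and the sketch only simulates data. It performs no Bayesian inference and includes no sparsity priors.

```python
import numpy as np


def sample_two_views(n_samples=500, d1=10, d2=8, k_shared=2,
                     k_private=3, noise_std=0.1, seed=0):
    """Simulate two views from shared and view-specific latent variables.

    Hypothetical illustration only: a plain linear-Gaussian model,
    not the models developed in the thesis.
    """
    rng = np.random.default_rng(seed)

    # Latent variables: one shared block, plus one private block per view.
    z_shared = rng.standard_normal((n_samples, k_shared))
    z_private_1 = rng.standard_normal((n_samples, k_private))
    z_private_2 = rng.standard_normal((n_samples, k_private))

    # Loading matrices that map latent variables to observed dimensions.
    a1 = rng.standard_normal((k_shared, d1))
    a2 = rng.standard_normal((k_shared, d2))
    b1 = rng.standard_normal((k_private, d1))
    b2 = rng.standard_normal((k_private, d2))

    # Each view = shared part + view-specific part + observation noise.
    x1 = (z_shared @ a1 + z_private_1 @ b1
          + noise_std * rng.standard_normal((n_samples, d1)))
    x2 = (z_shared @ a2 + z_private_2 @ b2
          + noise_std * rng.standard_normal((n_samples, d2)))
    return x1, x2, z_shared


if __name__ == "__main__":
    x1, x2, z_shared = sample_two_views()
    print(x1.shape, x2.shape, z_shared.shape)
```

In this sketch, any statistical dependence between x1 and x2 arises only through z_shared, while the private latent variables and the noise terms affect a single view each. Recovering such a decomposition from observed data alone is the inference problem that the models in the thesis address.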

Last updated on 14 Aug 2014 by Tommi Mononen - Page created on 14 Aug 2014 by Tommi Mononen