I co-lead the Data Mining and Bioinformatics theme in the MRC Integrative Epidemiology Unit, and oversee bioinformatics in the ALSPAC cohort. My specific research interests are in:
- Statistical genetics software (MIDAS, CubeX and CoNVEM; see refs 1, 2 and 3)
- Machine learning for outcome prediction (FSMKL, see ref 4)
- Prediction of functional effects of genetic variants (FATHMM, see refs 5, 6 and 7)
- Pleiotropy analysis (see ref 8)
- Data integration and visualisation
- Data mining
- Microbiomic profiling using 16S-sequencing
As a co-investigator in the BBSRC-funded Accessible Resource for Integrated Epigenomics Studies (ARIES), I have led the bioinformatic components of this large-scale DNA methylation resource. ARIES has generated HM450 methylation data on 5000 blood samples from ALSPAC, in addition to 50 whole-genome BS-Seq datasets and a number of smaller datasets in other tissues. I also manage the methylation data of several other studies.
- The ARIES project generated HM450 data on 5000 ALSPAC samples
- The ARIES Explorer web interface provides open access to data from the ARIES project
- We are performing large-scale meQTL analyses, including in collaboration with the GoDMC consortium
Cardiovascular Genetic Epidemiology
Cardiovascular disease (CVD) is a major cause of mortality in Great Britain. CVD and its risk factors of hypertension, diabetes and obesity are complex traits caused by multiple genetic and environmental components. A relationship is also observed between early growth and adult disease, suggesting that either foetal nutrition influences adult disease or that some genetic factors influence both early growth and adult disease.
In 2007 I was awarded a British Heart Foundation project grant (PG/07/131/24254) to genotype 50,000 single nucleotide polymorphisms in cardiovascular disease candidate genes (the Illumina/IBC HumanCVD Beadchip) within the British Women's Heart and Health Study. In 2010 I was awarded a Medical Research Council Project Grant for developing a systems approach to the classification of genes impacting the cardiovascular phenome. The BHF-funded HumanCVD project in BWHHS has resulted in over 25 publications, and the foundation of the UCLEB consortium (UCL, LSHTM, Edinburgh and Bristol) applying Illumina metabochip and NMR metabolomics data to a wide range of cardiovascular epidemiology projects.
- Gaunt TR, Rodriguez S, Zapata C, Day IN. MIDAS: software for analysis and visualisation of interallelic disequilibrium between multiallelic markers. BMC Bioinformatics. 2006 Apr 27;7:227.
- Gaunt TR, Rodriguez S, Day IN. Cubic exact solutions for the estimation of pairwise haplotype frequencies: implications for linkage disequilibrium analyses and a web tool ‘CubeX’. BMC Bioinformatics. 2007 Nov 2;8:428.
- Gaunt TR, Rodriguez S, Guthrie PA, Day IN. An expectation-maximization program for determining allelic spectrum from CNV data (CoNVEM): insights into population allelic architecture and its mutational history. Hum Mutat. 2010 Apr;31(4):414–20.
- Seoane JA, Day IN, Gaunt TR, Campbell C. A pathway-based data integration framework for prediction of disease progression. Bioinformatics. 2014 Mar 15;30(6):838–45.
- Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GL, Edwards KJ, Day IN, Gaunt TR. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat. 2013 Jan;34(1):57–65.
- Shihab HA, Gough J, Cooper DN, Day IN, Gaunt TR. Predicting the functional consequences of cancer-associated amino acid substitutions. Bioinformatics. 2013 Jun 15;29(12):1504–10. doi: 10.1093/bioinformatics/btt182.
- Shihab HA, Gough J, Mort M, Cooper DN, Day IN, Gaunt TR. Ranking non-synonymous single nucleotide polymorphisms based on disease concepts. Hum Genomics. 2014 Jun 30;8:11.
- Seoane JA, Campbell C, Day IN, Casas JP, Gaunt TR. Canonical correlation analysis for gene-based pleiotropy discovery. PLoS Comput Biol. 2014 Oct 16;10(10):e1003876.