Dr. George Edward Allen

Bioinformatician (30%)

+41 22 379 55 16

Having completed a bachelor's degree and a master's degree in mathematics and statistics at the University of Oxford and University College London, respectively, I have a solid grounding in a broad range of mathematical, statistical and probabilistic methods, as well as in statistical computing in R.

Since then, I have been applying these techniques to biological data and expanding my skill set, beginning with my master's and doctorate in bioinformatics. During these I touched on many areas of computing (R, BASH/Sun Grid Engine, C++, Java, Python, Perl, SQL, HTML, Linux), multivariate statistics and mathematical modelling, and was introduced to high-throughput genomic data.

Moving to the National Cancer Centre in Singapore as a Senior Bioinformatics Specialist, I was mainly tasked with whole exome sequencing (WES) and expression profiling (microarray) of primary human cancer tumour samples, as well as assessing their DNA binding profiles (ChIP-Seq). This introduced me to the complexities of reliable mapping and mutation calling/annotation using samtools, pindel, SnpEff and custom scripts. The resulting variant calls were coupled with expression and binding data to identify functional groups of genes through GO or GSEA. I also gained a broad knowledge of processes in genomics and cancer. The post involved constant communication, presentations and collaboration with clinicians, researchers and drug companies on multiple translational research projects.

Taking up a post in the bioinformatics core at the Gurdon Institute, University of Cambridge, I worked in parallel on multiple projects and was exposed to a very diverse range of data types, model organisms and biological concepts. I gained my first experience in RNA-Seq (bulk and single cell) and continued to call SNVs and indels in WES as well as WGS data. I had the chance to continue clinically related work, comparing liver and bile duct tumours to derived organoids by overlapping RNA-Seq, WES SNP, COSMIC and survival data. Furthermore, working with poorly annotated organisms necessitated assembling scaffolds and homology-based functional annotation. I also provided bioinformatics training, individually and in lectures. Balancing so many collaborations meant organisation, teamwork and meeting deadlines were essential on a daily basis. Clear, concise communication of ideas was crucial in dealing with wet and dry labs and contributing to top-flight journals.

At the Institute of Pathology, Centre hospitalier vaudois, I focussed on diagnostic pipelines (implemented in R/BASH) for various cancer types, identifying true and disease-relevant variants using logistic regression models fed by platform information, variant characteristics and mutation/polymorphism databases (e.g. COSMIC, dbSNP). Challenges included filtering noise variants arising from formalin fixation and artefactual homopolymers introduced by SOLiD sequencing. I also carried out research projects characterising lesser-known lymphomas. This work integrated variant and CNV data with expression data from microarray and GSEA.

In my current post, I research and analyse various data related to the CCR4-Not complex, and work on collaborations interrogating gene expression in cancer. I recently submitted a paper, as first author, on ribosome profiling of translation in the absence of Not4/Not5. We also have an article accepted in Cell Reports on the role of FKBP10 in translation and the resulting regulation of lung cancer growth, which leverages ribosome profiling and RNA-Seq. Another major current project studies Not1 DNA and RNA binding through ChEC-Seq and RIP-Seq, and we are beginning to look at Not4 RNA binding through PAR-CLIP. I enjoy the potential for deep exploratory analysis into the broad role of a molecule regulating gene expression at all stages. Collaboration, presentation of data and advising on mathematics/computation are frequent features of the role.

In all posts I made extensive use of R/Bioconductor and BASH scripting, as well as cluster computing with Sun Grid Engine and some SLURM. I also gained experience of Python and C++.

DNA sequencing:
Whole Exome Sequencing
Whole Genome Sequencing

Variant calling and annotation:
samtools, MuTect, VarScan, GATK, SnpEff, ExAC, dbSNP, 1000 Genomes, COSMIC, phyloP, BLAST/BLAT


DNA Binding sequencing:
ChIP-Seq, ChEC-Seq

MACS2 peak calling (SICER for histones)
MEME/DREME motif search
Custom normalisation of ChEC developed


RNA Sequencing:
RNA-Seq (bulk and single cell)
Ribosome footprinting

GSEA, GO, Panther

bwa, bowtie2, hisat2 


Protein Level:
Swiss Model & CAS


Model Organisms: 
Homo sapiens, S. cerevisiae, Mus musculus, C. elegans, X. laevis, E. coli, D. melanogaster


R/Bioconductor, BASH, Sun Grid, SLURM, Python, C++


Operating Systems:
OSX, Linux, Windows



Bachelor's and master's degrees in mathematics and statistics. Experience in multivariate statistics, probability, stochastic mathematical modelling and various types of regression on real-world data. In addition, I have collaborated in very diverse environments, both clinical and fundamental. I have dealt with most types of high-throughput data and seen a broad range of genomic research fields, giving me a good grounding in relevant biological processes and their analysis.