Rephetio: Repurposing drugs on a hetnet [rephetio]

Assessing the imputation quality of gene expression in LINCS L1000

We recently released version 2.0 of our LINCS L1000 analysis [1, 2]. This release added dysregulation z-scores for 6,489 imputed genes, in addition to the 978 directly measured genes on the L1000 epsilon platform. We only added imputed genes that were part of the best inferred gene set (BING, genes supposedly imputed with high quality).

We've also been looking into the genetic perturbation data in L1000. Here, we will assess the quality of the Broad's gene imputation using genetic perturbation consensus signatures. Specifically, we'll use whether a genetic perturbation dysregulates its target gene in the correct direction as a quality metric.

Below we show the distribution of dysregulation z-scores by imputation status and perturbation type (notebook):

Violin plots of perturbagen-self z-scores

In general, the measured genes responded in the expected direction. For genetic perturbations whose targets were measured, 97% of knockdowns downregulated their targets (negative z-score), and 64% of overexpressions upregulated their targets (positive z-score). Instances where a measured gene responded in the reverse direction could be due to problems with perturbation delivery or expression quantification.

For genetic perturbations whose targets were imputed, 54% of knockdowns downregulated their targets, and 51% of overexpressions upregulated their targets. Both rates are close to the 50% we would expect if the sign of the z-score were unrelated to the perturbation. Using the success rates of measured genes as a baseline, we're led to conclude that the imputation quality of BING genes is poor.
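As a minimal sketch of the quality metric described above, the snippet below computes the fraction of perturbations whose target z-score has the expected sign, grouped by imputation status and perturbation type. The record layout, the labels "knockdown" and "overexpression", and the function name `concordance_rates` are assumptions made for illustration; they are not taken from the notebook, and the toy values are invented.

```python
from collections import defaultdict

# Expected sign of the target's z-score for each perturbation type.
# These labels are hypothetical names chosen for this sketch.
EXPECTED_SIGN = {"knockdown": -1, "overexpression": 1}


def concordance_rates(records):
    """Fraction of perturbations whose target moved in the expected direction.

    `records` is an iterable of (status, pert_type, z_score) tuples, where
    `status` is "measured" or "imputed". Results are keyed by
    (status, pert_type).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for status, pert_type, z_score in records:
        key = (status, pert_type)
        totals[key] += 1
        # A z-score of exactly zero is counted as a miss.
        if z_score * EXPECTED_SIGN[pert_type] > 0:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Toy data, invented for illustration only (not values from the analysis).
toy = [
    ("measured", "knockdown", -3.1),
    ("measured", "knockdown", -0.4),
    ("measured", "overexpression", 2.2),
    ("measured", "overexpression", -0.7),
    ("imputed", "knockdown", 1.5),
    ("imputed", "knockdown", -0.2),
]
rates = concordance_rates(toy)
for key, rate in sorted(rates.items()):
    print(key, f"{rate:.0%}")
```

Counting a zero z-score as a miss is a design choice of this sketch; the analysis itself may treat ties differently.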

If we instead judge the imputation based only on significantly dysregulated genes, the results improve. For significant, imputed perturbagen–target pairs, 75% of knockdowns (18 of 24) downregulated their target, while 80% of overexpressions (4 of 5) upregulated their target. Since these sample sizes are small, I'm hesitant to declare that filtering for significant genes is sufficient to overcome the imputation problems.
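To put the small sample sizes in perspective, here is a rough sketch of a one-sided exact binomial test for the counts quoted above. It assumes a null hypothesis in which each significant imputed target moves in the expected direction with probability 0.5; that null, and the helper name `binomial_upper_tail`, are assumptions of this sketch rather than part of the original analysis.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts quoted in the text for significant, imputed perturbagen-target pairs.
p_knockdown = binomial_upper_tail(18, 24)
p_overexpression = binomial_upper_tail(4, 5)

print(f"knockdown: P(X >= 18 of 24) = {p_knockdown:.4f}")
print(f"overexpression: P(X >= 4 of 5) = {p_overexpression:.4f}")
```

With only five overexpression pairs, even four successes are not unusual under this null, which is consistent with the caution expressed above. The knockdown counts fare better, but a test against chance says nothing about whether the imputed genes approach the success rates of measured genes.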

For reader reference, recent research [3] looked at improved imputation techniques that presumably could be applied to reimpute LINCS L1000 gene expression.

Status: Completed
Cite this as
Daniel Himmelstein (2016) Assessing the imputation quality of gene expression in LINCS L1000. Thinklab. doi:10.15363/thinklab.d185

Creative Commons License