Rephetio: Repurposing drugs on a hetnet [rephetio]

Positive correlations between knockdown and overexpression profiles from LINCS L1000

@alessandrodidonna suggested adding RNA interference data, so we incorporated genetic perturbation relationships from LINCS L1000. The L1000 project measures how the expression of 978 genes (called landmark genes) changes in response to perturbation. Here we are focusing on gene knockdown (shRNA) and gene overexpression perturbations.

We computed consensus transcriptional profiles for knockdown (pert_type = trt_sh) and overexpression (pert_type = trt_oe) perturbations. For each gene perturbation, we end up with a vector of 978 z-scores representing the change in expression of each landmark gene. Using a Bonferroni cutoff to correct for the 978 comparisons, we identify the significantly upregulated and downregulated genes for each perturbation. Using this approach, we generate four relationship types for our network:

  1. Gene → knockdown downregulates → Gene
  2. Gene → knockdown upregulates → Gene
  3. Gene → overexpression downregulates → Gene
  4. Gene → overexpression upregulates → Gene
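
As a rough illustration of the thresholding step, here is a minimal Python sketch. It assumes the z-scores are approximately standard normal under the null and uses a family-wise error rate of 0.05; the post does not state either, and the gene names and z-scores are made up.

```python
from statistics import NormalDist

N_LANDMARK = 978  # number of landmark genes, from the text
ALPHA = 0.05      # assumed family-wise error rate (not stated in the post)


def bonferroni_z_cutoff(alpha=ALPHA, n_tests=N_LANDMARK):
    """Two-sided z-score cutoff after Bonferroni correction."""
    per_test_alpha = alpha / n_tests
    # Split the per-test alpha across both tails of the normal distribution.
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)


def classify(z_scores, cutoff):
    """Split a {gene: z-score} profile into up- and downregulated gene lists."""
    up = sorted(g for g, z in z_scores.items() if z >= cutoff)
    down = sorted(g for g, z in z_scores.items() if z <= -cutoff)
    return up, down


cutoff = bonferroni_z_cutoff()
profile = {"GENE_A": 5.2, "GENE_B": -4.6, "GENE_C": 1.9}  # made-up z-scores
up, down = classify(profile, cutoff)
print(f"cutoff = {cutoff:.2f}, up = {up}, down = {down}")
```

Under these assumptions the cutoff lands a little above |z| = 4, so only strong expression changes become relationships.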

In a separate discussion, @larsjuhljensen commented:

I am not entirely sure how useful the "knockdown downregulates" etc. types are. Usually "knockdown downregulates" would be interpreted to mean "upregulates" etc.

In other words, shouldn't we combine relationship types 1 & 4 above into "Gene → upregulates → Gene" and 2 & 3 into "Gene → downregulates → Gene"? To investigate whether this makes sense, I looked into whether knockdown and overexpression profiles for the same gene were anticorrelated. Does knocking down a gene have the opposite transcriptional effect of overexpressing it?

The results were surprising (notebook). Knockdown and overexpression of the same gene resulted in positively correlated transcriptional profiles 65.0% of the time. And if we correlate the knockdown of a random gene with the overexpression of a different random gene, we see a positive correlation 65.3% of the time. In summary, the transcriptional profiles of knocking down and overexpressing genes are more often than not positively correlated. And profiles for the same gene show no more correlation or anticorrelation than profiles for two different genes. Hmm.
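
The comparison can be sketched roughly as follows. This is not the notebook's code: it assumes Pearson correlation, compares every different-gene pair rather than sampling random pairs, and uses tiny made-up profiles in place of the 978-gene z-score vectors.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def fraction_positive(knockdown, overexpression, same_gene):
    """Fraction of knockdown/overexpression profile pairs with r > 0.

    same_gene=True keeps pairs that perturb the same gene;
    same_gene=False keeps pairs that perturb two different genes.
    """
    correlations = [
        pearson(kd_profile, oe_profile)
        for (kd_gene, kd_profile), (oe_gene, oe_profile)
        in product(knockdown.items(), overexpression.items())
        if (kd_gene == oe_gene) == same_gene
    ]
    return sum(r > 0 for r in correlations) / len(correlations)


# Made-up 4-gene profiles keyed by the perturbed gene (hypothetical names).
knockdown = {"GENE_A": [1, 2, 3, 4], "GENE_B": [4, 3, 2, 1]}
overexpression = {"GENE_A": [2, 3, 4, 6], "GENE_B": [1, 2, 2, 3]}

same = fraction_positive(knockdown, overexpression, same_gene=True)
different = fraction_positive(knockdown, overexpression, same_gene=False)
print(same, different)
```

If knockdown and overexpression had opposite effects, the same-gene fraction should fall well below the different-gene fraction, which is not what we observed.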

Violin plots of correlation distributions

What could cause this counterintuitive finding?

  • We could have a mistake in our code. Does anyone know of a gold standard for genetic perturbations that we could compare to?
  • By looking only at the 978 landmark genes, we may be overlooking the crucial genes and instead picking up on a general perturbation response.
  • Gene regulation is a non-linear process.
  • Our method of analysis or the LINCS L1000 data may be limited.

Does anyone, specifically those with gene expression experience (@caseygreene, @larsjuhljensen, @fbastian), have any insight on what might be happening? I'll also reach out to the L1000 team.

  • Daniel Himmelstein:

    I removed the following sentence, since it is not true:

    Unfortunately, none of the perturbed genes were in the landmark set, so I can't detect whether the perturbations are actually affecting their target genes in the desired direction.

Quick thoughts: Is there a specific set of genes driving the positive correlation? Maybe perturbations in general lead to some change in, for example, growth rate? What, specifically, is your high correlation measuring? Is it possible that highly expressed genes tend to remain highly expressed, or did you transform the data in some way to normalize gene expression across conditions per gene?

  • Lars Juhl Jensen: Exactly - we think alike. What I refer to as "not a happy cell" is usually some mix of stress response and reduced growth rate, which also results in a change in the distribution of cells across cell-cycle phases.

  • Casey Greene: @larsjuhljensen Indeed! My guess right now is either that or a lack of gene-wise normalization (e.g. the most highly expressed genes are still the most highly expressed).

  • Lars Juhl Jensen: I assume that the expression values here are already ratios between perturbed and non-perturbed. Otherwise, I would put my money on it being a normalization artifact, but in that case I would expect a much stronger correlation than what is observed.

Messing about with cells always tends to induce some degree of stress-induced global expression changes. This is the case pretty much no matter which perturbation you do to the cells, including overexpression of some gene, knockdown of some gene, cell-cycle synchronization, centrifugation, increasing temperature, decreasing temperature, etc.

My guess is that the small positive correlation you see is caused by small changes in expression of a large number of genes, which you could summarize as "not a happy cell".

Did you transform the data in some way to normalize gene expression across conditions per gene?

@caseygreene, our profiles contain z-scores measuring the differential expression for 978 genes. The profiles (called consensus signatures in L1000 terminology) are at the CONSENSUS stage in the following pipeline:

Processing of Broad LINCS data

The z-scores compare a gene's expression level in cells given the perturbation to cells without the perturbation (controls). I believe the controls account for the non-specific disturbances caused by delivering the molecular payload, but will confirm.

Now I will look into the following questions:

  • @larsjuhljensen: My guess is that the small positive correlation you see is caused by small changes in expression of a large number of genes.
  • @caseygreene: Is there a specific set of genes driving the positive correlation?

Update with workshop findings

I recently led a Systems Pharmacology workshop for first-year graduate students. We analyzed the L1000 genetic perturbation data with the goal of shedding light on the issues in this discussion. The workshop used the significant dysregulation results for knockdown and overexpression from dhimmel/lincs v2.0 [1, 2]. Compared to v1.0 (which the leadoff post was based on), v2.0 adds dysregulation scores for imputed genes.

See the summary of our findings. In short, certain genes responded in the same direction to a large number of perturbations. For example, RPS4Y1 was frequently downregulated and MCOLN1 was frequently upregulated, regardless of which gene was perturbed in which direction.
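
One way to flag such pervasively dysregulated genes is to count, for each target gene and direction, the share of perturbations that produce it. The sketch below is only an illustration: the record layout, the 50% threshold, and all gene names are hypothetical, and the records are toy data rather than L1000 results.

```python
from collections import Counter

# Toy records: (perturbed_gene, perturbation, target_gene, direction).
records = [
    ("GENE_A", "knockdown", "TARGET_X", "down"),
    ("GENE_A", "overexpression", "TARGET_X", "down"),
    ("GENE_B", "knockdown", "TARGET_X", "down"),
    ("GENE_B", "overexpression", "TARGET_Y", "up"),
    ("GENE_C", "knockdown", "TARGET_X", "down"),
]


def pervasive_targets(records, min_fraction=0.5):
    """Return {(target, direction): fraction of perturbations} for targets
    dysregulated in one direction by at least min_fraction of perturbations."""
    # A perturbation is a (perturbed gene, perturbation type) pair.
    perturbations = {(gene, kind) for gene, kind, _, _ in records}
    # Deduplicate records so each perturbation counts once per target/direction.
    hits = Counter(
        (target, direction) for _, _, target, direction in set(records)
    )
    fractions = {key: n / len(perturbations) for key, n in hits.items()}
    return {key: f for key, f in fractions.items() if f >= min_fraction}


print(pervasive_targets(records))
```

In this toy data, TARGET_X is downregulated by four of the five perturbations regardless of which gene was perturbed or in which direction, so it would be flagged.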

@larsjuhljensen noted:

Reassuring to see that things behave the way I would expect. This should make it fairly easy to derive a scoring scheme that extracts only associations that are specifically associated with a small number of perturbations, as opposed to associated with any perturbation.

I think Lars makes a great suggestion, worthy of investigation. However, due to time constraints, we will have to postpone this analysis to a future undertaking.

Proposed quick fix

Currently, I'm leaning towards collapsing all four types of regulation into a single relationship type (Gene → regulates → Gene), which means perturbation of the source gene significantly dysregulated the target gene. In other words, we'll take the union of the four aforementioned regulation relationships.
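
Taking the union is straightforward if each relationship type is stored as a set of (source, target) pairs. The edge sets and gene names below are hypothetical and only illustrate the collapse.

```python
# Hypothetical edge sets, one per relationship type: (source gene, target gene).
edges_by_type = {
    "knockdown downregulates": {("GENE_A", "GENE_B"), ("GENE_A", "GENE_C")},
    "knockdown upregulates": {("GENE_B", "GENE_C")},
    "overexpression downregulates": {("GENE_A", "GENE_B")},
    "overexpression upregulates": {("GENE_C", "GENE_A")},
}

# A pair is a "regulates" edge if it appears under any of the four types.
# Pairs found under several types collapse into a single edge.
regulates = set().union(*edges_by_type.values())

print(sorted(regulates))
```

Note that the union discards both the perturbation type and the direction of the expression change.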

Our DWPC method for quantifying the connectivity between two nodes downweights paths through high degree nodes [3]. Thus the pervasively dysregulated genes should not be too problematic.
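
To make the downweighting concrete, here is a simplified sketch of a degree-weighted path count on a toy undirected graph with a single edge type. It is not the project's implementation: the damping exponent w = 0.4, the fixed path length, and the node names are assumptions for illustration.

```python
def dwpc(adjacency, source, target, length, w=0.4):
    """Sum of path degree products over simple paths of a fixed length.

    Each edge contributes (degree(u) * degree(v)) ** -w, so paths through
    high-degree nodes add less to the total. With w = 0 this is a path count.
    """
    total = 0.0

    def walk(node, visited, path_degree_product, remaining):
        nonlocal total
        if remaining == 0:
            if node == target:
                total += path_degree_product
            return
        for neighbor in adjacency[node]:
            if neighbor in visited:
                continue  # paths may not revisit a node
            edge_weight = (len(adjacency[node]) * len(adjacency[neighbor])) ** -w
            walk(neighbor, visited | {neighbor},
                 path_degree_product * edge_weight, remaining - 1)

    walk(source, {source}, 1.0, length)
    return total


# Toy graph: S reaches T through a low-degree node M and a high-degree HUB.
adjacency = {
    "S": {"M", "HUB"},
    "T": {"M", "HUB"},
    "M": {"S", "T"},
    "HUB": {"S", "T", "X1", "X2", "X3"},
    "X1": {"HUB"},
    "X2": {"HUB"},
    "X3": {"HUB"},
}

print(dwpc(adjacency, "S", "T", length=2))
```

Here the path through HUB contributes 10 ** -0.8 while the path through M contributes 4 ** -0.8, so the hub path counts for roughly half as much.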

Status: Open
Cite this as
Daniel Himmelstein, Casey Greene, Lars Juhl Jensen (2016) Positive correlations between knockdown and overexpression profiles from LINCS L1000. Thinklab. doi:10.15363/thinklab.d171
