Project:
Rephetio: Repurposing drugs on a hetnet [rephetio]

Exploring the power of Hetionet: a Cypher query depot


Hetionet v1.0 is now available online as a Neo4j database at https://neo4j.het.io. While Project Rephetio focuses primarily on drug repurposing, we'd like to illustrate the versatility of Hetionet for answering a broad range of biomedical questions. We want to show that hetnets aren't vaporware, and that a dozen lines of Cypher code in Hetionet can immediately perform analyses that previously would have taken months to implement.

If you're a biologist and have interesting questions that Hetionet may know about, ask away. If you think you have an interesting query, please share.

GO Process enrichment for migraine genes

Here we'll show a query for identifying prominent GO Processes in a set of disease-associated genes. We'll use migraine as an example disease. First, we can see the 46 genes associated with migraine by querying MATCH (:Disease {name: 'migraine'})-[rel:ASSOCIATES_DaG]-() RETURN rel. Now we'll compute the DWPC (degree-weighted path count) between migraine and each GO Process. We'll restrict our results to processes with at least 5 participating genes (of which two or more are migraine-associated). Here's the Cypher:

// Search for DaGpBP paths starting with migraine
MATCH path = (n0:Disease)-[:ASSOCIATES_DaG]-(n1)-[:PARTICIPATES_GpBP]-(n2:BiologicalProcess)
WHERE n0.name = 'migraine'
// Implement the DWPC to adjust for node degree along paths
WITH
[
  size((n0)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n1)),
  size((n1)-[:PARTICIPATES_GpBP]-()),
  size(()-[:PARTICIPATES_GpBP]-(n2))
] AS degrees, path, n2
WITH
  // Return the GO Process ID and name
  n2.identifier AS go_id,
  n2.name AS go_name,
  count(path) AS PC,
  // Compute the DWPC
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC,
  // Count the number of genes in the GO Process
  size((n2)-[:PARTICIPATES_GpBP]-()) AS n_genes
  WHERE n_genes >= 5 AND PC >= 2
RETURN
  go_id, go_name, PC, DWPC, n_genes
ORDER BY DWPC DESC
LIMIT 5

The query took less than half of a second and performed 60,183 database hits. It returned the following top 5 GO Processes:

go_id | go_name | PC | DWPC | n_genes
GO:0007210 | serotonin receptor signaling pathway | 8 | 0.061 | 18
GO:0050884 | neuromuscular process controlling posture | 2 | 0.048 | 15
GO:0042310 | vasoconstriction | 8 | 0.043 | 28
GO:0006812 | cation transport | 16 | 0.033 | 781
GO:0014821 | phasic smooth muscle contraction | 3 | 0.033 | 17
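
To make the DWPC arithmetic in the query concrete, here is a minimal Python sketch of the same calculation: each path contributes the product of its four degrees raised to the power -0.4, and the DWPC for a GO Process is the sum over its paths. The degree values in `toy_paths` are hypothetical and were not retrieved from Hetionet; the function names are ours.

```python
from math import prod

def path_degree_product(degrees, damping=0.4):
    """Multiply each degree raised to -damping, like the Cypher reduce()."""
    return prod(d ** -damping for d in degrees)

def dwpc(paths, damping=0.4):
    """Sum the path degree products over all paths reaching one target."""
    return sum(path_degree_product(degrees, damping) for degrees in paths)

# Hypothetical degrees for two Disease-Gene-BiologicalProcess paths, ordered as
# [disease ASSOCIATES degree, gene ASSOCIATES degree,
#  gene PARTICIPATES degree, process PARTICIPATES degree]
toy_paths = [
    [46, 3, 20, 18],
    [46, 1, 5, 18],
]

pc = len(toy_paths)        # path count (PC)
score = dwpc(toy_paths)    # degree-weighted path count (DWPC)
print(f"PC = {pc}, DWPC = {score:.4f}")
```

Paths through low-degree nodes contribute more than paths through hubs, which is why a process with a modest PC can still rank above a large process such as cation transport.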

Next, we'll bolster the query by adding protein interaction relationships.

Tissue-specific interactomics: GO Process enrichment for multiple sclerosis

The following query performs a more advanced GO Process enrichment analysis for multiple sclerosis (MS) genes. First, we restrict to GWAS-associated genes, which have the advantage of being less biased by existing knowledge. Second, we add a protein interaction relationship to identify genes in the MS neighborhood of the interactome. Third, we require that these neighboring genes are upregulated in an MS-affected tissue (anatomy). In this way, we hope to capture some of the benefits of tissue-specific gene networks [1] on our anatomy-agnostic interactome.

MATCH path = (n0:Disease)-[e1:ASSOCIATES_DaG]-(n1)-[:INTERACTS_GiG]-(n2)-[:PARTICIPATES_GpBP]-(n3:BiologicalProcess)
WHERE n0.name = 'multiple sclerosis'
  AND 'GWAS Catalog' in e1.sources
  AND exists((n0)-[:LOCALIZES_DlA]-()-[:UPREGULATES_AuG]-(n2))
WITH
[
  size((n0)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n1)),
  size((n1)-[:INTERACTS_GiG]-()),
  size(()-[:INTERACTS_GiG]-(n2)),
  size((n2)-[:PARTICIPATES_GpBP]-()),
  size(()-[:PARTICIPATES_GpBP]-(n3))
] AS degrees, path, n3 as target
WITH
  target.identifier AS go_id,
  target.name AS go_name,
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.7)) AS DWPC,
  size((target)-[:PARTICIPATES_GpBP]-()) AS n_genes
  WHERE 5 <= n_genes <= 100 AND PC >= 2
RETURN
  go_id, go_name, PC, DWPC, n_genes
ORDER BY DWPC DESC
LIMIT 5

The query took under 5 seconds and required 1,420,721 database hits.

go_id | go_name | PC | DWPC | n_genes
GO:0045347 | negative regulation of MHC class II biosynthetic process | 3 | 0.00006 | 6
GO:0010842 | retina layer formation | 3 | 0.00006 | 22
GO:0045346 | regulation of MHC class II biosynthetic process | 3 | 0.00004 | 13
GO:0060042 | retina morphogenesis in camera-type eye | 4 | 0.00003 | 54
GO:0003407 | neural retina development | 4 | 0.00003 | 55

Interestingly, three of the processes involve the retina. Note that the terms are not independent: "retina layer formation" is a subprocess of both "retina morphogenesis in camera-type eye" and "neural retina development". If we're more interested in the MS–"retina layer formation" relationship, we can retrieve the paths behind the DWPC:

MATCH path = (n0:Disease)-[e1:ASSOCIATES_DaG]-(n1)-[:INTERACTS_GiG]-(n2)-[:PARTICIPATES_GpBP]-(n3:BiologicalProcess)
WHERE n0.name = 'multiple sclerosis'
  AND n3.name = 'retina layer formation'
  AND 'GWAS Catalog' in e1.sources
  AND exists((n0)-[:LOCALIZES_DlA]-()-[:UPREGULATES_AuG]-(n2))
RETURN path

[Figure: multiple sclerosis paths to retina layer formation]
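
The effect of the exists(...) clause in the queries above can be sketched on a toy graph in Python. All node names and edges below are hypothetical and do not come from Hetionet; the sketch only illustrates how requiring the interacting gene to be upregulated in a disease-localized anatomy prunes paths.

```python
# Hypothetical toy edges; none of these come from Hetionet.
associates = {("MS", "G1"), ("MS", "G2")}                # Disease-associates-Gene
interacts = {("G1", "G3"), ("G2", "G3"), ("G2", "G4")}   # Gene-interacts-Gene
participates = {("G3", "BP1"), ("G4", "BP1"), ("G4", "BP2")}  # Gene-participates-BP
localizes = {("MS", "A1")}                               # Disease-localizes-Anatomy
upregulates = {("A1", "G3")}                             # Anatomy-upregulates-Gene

def upregulated_in_affected_tissue(disease, gene):
    """Toy version of the exists(...) clause: some anatomy links disease and gene."""
    anatomies = {a for d, a in localizes if d == disease}
    return any((a, gene) in upregulates for a in anatomies)

def find_paths(disease, tissue_filter=True):
    """Enumerate Disease-Gene-Gene-BiologicalProcess paths on the toy graph."""
    paths = []
    for d, g1 in sorted(associates):
        if d != disease:
            continue
        for a, b in sorted(interacts):
            # Interactions are undirected, so try both orientations.
            for src, g2 in ((a, b), (b, a)):
                if src != g1:
                    continue
                if tissue_filter and not upregulated_in_affected_tissue(d, g2):
                    continue
                for g, bp in sorted(participates):
                    if g == g2:
                        paths.append((d, g1, g2, bp))
    return paths

print(len(find_paths("MS", tissue_filter=False)), "paths without the tissue filter")
print(len(find_paths("MS", tissue_filter=True)), "paths with the tissue filter")
```

In this toy graph the filter drops every path through G4, which interacts with an MS gene but is not upregulated in the MS-localized anatomy.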

Which anatomies express migraine-associated genes

The following query looks for anatomies (tissues) which express the genes associated with migraine.

MATCH path = (n0:Disease)-[:ASSOCIATES_DaG]-(n1)-[:EXPRESSES_AeG]-(n2:Anatomy)
WHERE n0.name = 'migraine'
WITH
[
  size((n0)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n1)),
  size((n1)-[:EXPRESSES_AeG]-()),
  size(()-[:EXPRESSES_AeG]-(n2))
] AS degrees, path, n2 as target
RETURN
  target.identifier AS anatomy_id,
  target.name AS anatomy_name,
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.5)) AS DWPC,
  size((target)-[:EXPRESSES_AeG]-()) AS n_genes
ORDER BY DWPC DESC
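LIMIT 5

The query returns the following top 5 anatomies:

anatomy_id | anatomy_name | PC | DWPC | n_genes
UBERON:0001645 | trigeminal nerve | 7 | 0.022 | 36
UBERON:0001785 | cranial nerve | 10 | 0.020 | 66
UBERON:0002363 | dura mater | 2 | 0.017 | 4
UBERON:0002925 | trigeminal nucleus | 4 | 0.016 | 7
UBERON:0002360 | meninx | 1 | 0.012 | 3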

The query does a good job of identifying migraine-relevant tissues. However, notice that the expression profiles for the retrieved tissues are not very comprehensive: only four genes are known to be expressed in the dura mater, two of which are migraine-associated. These results therefore depend on our gene expression catalog [1], whose comprehensiveness varies considerably by tissue.
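
A short Python sketch shows how strongly a sparse expression profile can boost a tissue's score. It compares only the anatomy-degree factor of a single path, using the damping exponent 0.5 from the query and the n_genes values reported for dura mater (4) and cranial nerve (66). The function name is ours, and the other three degrees along the path are ignored for simplicity.

```python
def degree_weight(degree, damping=0.5):
    """Per-node factor applied to a path: degree raised to -damping."""
    return degree ** -damping

dura_mater = degree_weight(4)      # n_genes = 4 in the results
cranial_nerve = degree_weight(66)  # n_genes = 66 in the results

ratio = dura_mater / cranial_nerve
print(f"dura mater factor:    {dura_mater:.3f}")
print(f"cranial nerve factor: {cranial_nerve:.3f}")
print(f"ratio:                {ratio:.2f}")
```

All else being equal, a path ending at dura mater counts roughly four times as much as one ending at cranial nerve, so tissues with few cataloged genes can rank highly on very few paths.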

Compounds that target genes involved in myelination

Demyelination is the cause of much disability in multiple sclerosis patients. Below is a simple query to find all compounds that bind to proteins whose genes are involved in myelination:

MATCH path = (n0:BiologicalProcess)-[:PARTICIPATES_GpBP]-(n1)-[:BINDS_CbG]-(n2:Compound)
WHERE n0.name = 'myelination'
RETURN path

[Figure: myelination-compounds]

The query identifies 8 myelination-involved genes that are targeted by 33 compounds. To retrieve these counts yourself, change the last line of the query to RETURN count(DISTINCT n1) AS targets, count(DISTINCT n2) AS compounds. If you're interested in myelination, see dhimmel/myelinet for additional queries.

We can modify the above query to find compounds that upregulate rather than bind a myelination gene/protein. Compound–upregulates–Gene relationships in Hetionet are from LINCS L1000 [1]. Here we set an extreme z_score threshold to thin the results:

MATCH path = (n0:BiologicalProcess)-[:PARTICIPATES_GpBP]-(n1)-[e2:UPREGULATES_CuG]-(n2:Compound)
WHERE n0.name = 'myelination' AND
e2.z_score > 12
RETURN path
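
For readers less familiar with count(DISTINCT ...), mentioned earlier in this section for counting targets and compounds, the following Python sketch shows the same idea on a handful of hypothetical (gene, compound) pairs. The pairs are placeholders, not Hetionet results.

```python
# Hypothetical (gene, compound) pairs standing in for the returned paths.
paths = [
    ("GENE_A", "compound_1"),
    ("GENE_A", "compound_2"),
    ("GENE_B", "compound_2"),
    ("GENE_B", "compound_3"),
]

def distinct_counts(pairs):
    """Count unique genes and unique compounds across all paths."""
    targets = {gene for gene, _ in pairs}
    compounds = {compound for _, compound in pairs}
    return len(targets), len(compounds)

n_targets, n_compounds = distinct_counts(paths)
print(f"{len(paths)} paths, {n_targets} targets, {n_compounds} compounds")
```

Because one gene can be hit by several compounds and one compound can hit several genes, the number of paths generally exceeds both distinct counts.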

The targets responsible for a side effect

Here we'll investigate a query to identify genes which cause a given side effect when targeted by a compound. Let's look at the side effect Cushingoid (C0332601). The following query identifies the genes that are commonly targeted by Cushingoid-causing compounds:

MATCH path = (n0:SideEffect)-[r1:CAUSES_CcSE]-(n1:Compound)-[r2:BINDS_CbG]-(n2:Gene)
WHERE n0.name = 'Cushingoid'
WITH
[
  size((n0)-[:CAUSES_CcSE]-()),
  size(()-[:CAUSES_CcSE]-(n1)),
  size((n1)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n2))
] AS degrees, path, n2
WITH
  n2,
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC
RETURN
  n2.identifier AS gene_id,
  n2.name AS gene_symbol,
  n2.description AS gene_name,
  PC, DWPC
ORDER BY DWPC DESC, gene_symbol

The query returns 52 genes with at least one GbCcSE path to Cushingoid. Here are the first three rows, ranked by DWPC:

gene_id | gene_symbol | gene_name | PC | DWPC
2908 | NR3C1 | nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor) | 7 | 0.0343
3290 | HSD11B1 | hydroxysteroid (11-beta) dehydrogenase 1 | 1 | 0.0198
5916 | RARG | retinoic acid receptor, gamma | 3 | 0.0196

We can extract the paths contributing to the DWPC of the top hit (NR3C1):

MATCH path = (n0:SideEffect)-[r1:CAUSES_CcSE]-(n1:Compound)-[r2:BINDS_CbG]-(n2:Gene)
WHERE n0.name = 'Cushingoid'
  AND n2.name = 'NR3C1'
RETURN path

[Figure: Cushingoid-NR3C1 paths]

The involvement of NR3C1 in Cushingoid makes biological sense. NR3C1 encodes the glucocorticoid receptor, and chronic elevation of glucocorticoid levels can result in Cushing’s syndrome [1].

Next, we'll compare our findings to the gene targets predicted to cause Cushingoid in a 2012 study [2]. Supplementary Table 5 contains predicted target–ADR (adverse drug reaction) relationships. If we filter for Cushingoid, the predictions contain the following 8 targets: AR, KDR, NR3C1, NR3C2, PDGFRA, PDGFRB, SERPINA6, TEK. Of these 8 targets, NR3C1 was the top prediction with a Chi-square statistic of 1922.5. The targets NR3C2 and SERPINA6 were also present in our 52 Hetionet-derived genes.
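
The overlap described above is a plain set intersection, sketched here in Python. The `hetionet_named` set is deliberately incomplete: it holds only the five Hetionet-derived genes named in this post, not all 52, so the sketch illustrates the comparison rather than reproducing it in full.

```python
# The 8 predicted Cushingoid targets listed above.
predicted = {"AR", "KDR", "NR3C1", "NR3C2", "PDGFRA", "PDGFRB", "SERPINA6", "TEK"}

# Assumption: only the Hetionet-derived genes named in this post.
# The remaining 47 of the 52 genes are not listed here.
hetionet_named = {"NR3C1", "HSD11B1", "RARG", "NR3C2", "SERPINA6"}

shared = sorted(predicted & hetionet_named)
print(shared)  # ['NR3C1', 'NR3C2', 'SERPINA6']
```

With the full list of 52 genes from the Cushingoid query, the same intersection would give the complete overlap.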

Hetionet v1.0 contains 5,734 side effects. The workflow and queries above can be used by researchers to highlight the potential target genes responsible for a side effect of interest.

 
Cite this as
Daniel Himmelstein (2016) Exploring the power of Hetionet: a Cypher query depot. Thinklab. doi:10.15363/thinklab.d220