Exploring the power of Hetionet: a Cypher query depot

Daniel Himmelstein

doi:10.15363/thinklab.d220

Project:

Rephetio: Repurposing drugs on a hetnet [rephetio]

Daniel Himmelstein Researcher June 25, 2016

Hetionet v1.0 is now available online as a Neo4j database at https://neo4j.het.io. While Project Rephetio is focusing primarily on drug repurposing, we'd like to illustrate the versatility of Hetionet for answering a broad range of biomedical questions. We want to exhibit that hetnets aren't vaporware. We want to exhibit that a dozen lines of Cypher code in Hetionet can immediately perform analyses that previously would have taken months to implement.

If you're a biologist and have interesting questions that Hetionet may know about, ask away. If you think you have an interesting query, please share.

Daniel Himmelstein Researcher June 25, 2016

GO Process enrichment for migraine genes

Here we'll show a query for identifying prominent GO Processes in a set of disease-associated genes. We'll use migraine as an example disease. First, we can see the 46 genes associated with migraine by querying MATCH (:Disease {name: 'migraine'})-[rel:ASSOCIATES_DaG]-() RETURN rel. Now we'll compute the DWPC (degree-weighted path count) between migraine and each GO Process. We'll restrict our results to processes with at least 5 participating genes (of which two or more are migraine-associated). Here's the Cypher:

// Search for DaGpBP paths starting with migraine
MATCH path = (n0:Disease)-[:ASSOCIATES_DaG]-(n1)-[:PARTICIPATES_GpBP]-(n2:BiologicalProcess)
WHERE n0.name = 'migraine'
// Implement the DWPC to adjust for node degree along paths
WITH
[
  size((n0)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n1)),
  size((n1)-[:PARTICIPATES_GpBP]-()),
  size(()-[:PARTICIPATES_GpBP]-(n2))
] AS degrees, path, n2
WITH
  // Return the GO Process ID and name
  n2.identifier AS go_id,
  n2.name AS go_name,
  count(path) AS PC,
  // Compute the DWPC
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC,
  // Count the number of genes in the GO Process
  size((n2)-[:PARTICIPATES_GpBP]-()) AS n_genes
  WHERE n_genes >= 5 AND PC >= 2
RETURN
  go_id, go_name, PC, DWPC, n_genes
ORDER BY DWPC DESC
LIMIT 5

The query took less than half of a second and performed 60,183 database hits. It returned the following top 5 GO Processes:

go_id	go_name	PC	DWPC	n_genes
GO:0007210	serotonin receptor signaling pathway	8	0.061	18
GO:0050884	neuromuscular process controlling posture	2	0.048	15
GO:0042310	vasoconstriction	8	0.043	28
GO:0006812	cation transport	16	0.033	781
GO:0014821	phasic smooth muscle contraction	3	0.033	17
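To make the degree weighting in the query above more concrete, here is a minimal sketch of what the reduce expression computes for a single path. The first and last degrees (46 genes associated with migraine, 18 genes in the serotonin receptor signaling pathway) come from the text and table above; the two middle degrees are hypothetical values chosen only for illustration.

```cypher
// Degrees along one DaGpBP path. The values 46 and 18 are taken from the
// results above; 3 and 10 are hypothetical, for illustration only.
WITH [46, 3, 10, 18] AS degrees
// Multiply together each degree raised to the damping exponent (-0.4)
RETURN reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.4) AS path_degree_product
```

The result equals (46 × 3 × 10 × 18) raised to the power −0.4. The DWPC for a disease and GO Process pair is then the sum of these products over all paths between the two nodes, so paths through high-degree nodes contribute less.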

Next, we'll bolster the query by adding protein interaction relationships.

Daniel Himmelstein Researcher June 25, 2016

Tissue-specific interactomics: GO Process enrichment for multiple sclerosis

The following query performs a more advanced GO Process enrichment analysis for multiple sclerosis (MS) genes. First, we restrict to GWAS-associated genes, which have the advantage of being less biased by existing knowledge. Second, we add a protein interaction relationship to identify genes in the MS neighborhood of the interactome. Third, we require that these neighboring genes are upregulated in an MS-affected tissue (anatomy). In this way, we hope to capture some of the benefits of tissue-specific gene networks [1] on our anatomy-agnostic interactome.

MATCH path = (n0:Disease)-[e1:ASSOCIATES_DaG]-(n1)-[:INTERACTS_GiG]-(n2)-[:PARTICIPATES_GpBP]-(n3:BiologicalProcess)
WHERE n0.name = 'multiple sclerosis'
  AND 'GWAS Catalog' in e1.sources
  AND exists((n0)-[:LOCALIZES_DlA]-()-[:UPREGULATES_AuG]-(n2))
WITH
[
  size((n0)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n1)),
  size((n1)-[:INTERACTS_GiG]-()),
  size(()-[:INTERACTS_GiG]-(n2)),
  size((n2)-[:PARTICIPATES_GpBP]-()),
  size(()-[:PARTICIPATES_GpBP]-(n3))
] AS degrees, path, n3 as target
WITH
  target.identifier AS go_id,
  target.name AS go_name,
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.7)) AS DWPC,
  size((target)-[:PARTICIPATES_GpBP]-()) AS n_genes
  WHERE 5 <= n_genes <= 100 AND PC >= 2
RETURN
  go_id, go_name, PC, DWPC, n_genes
ORDER BY DWPC DESC
LIMIT 5

The query took under 5 seconds and required 1,420,721 database hits.

go_id	go_name	PC	DWPC	n_genes
GO:0045347	negative regulation of MHC class II biosynthetic process	3	0.00006	6
GO:0010842	retina layer formation	3	0.00006	22
GO:0045346	regulation of MHC class II biosynthetic process	3	0.00004	13
GO:0060042	retina morphogenesis in camera-type eye	4	0.00003	54
GO:0003407	neural retina development	4	0.00003	55

Interestingly, three of the processes involve the retina. Note that the terms are not independent: "retina layer formation" is a subprocess of both "retina morphogenesis in camera-type eye" and "neural retina development". If we're more interested in the MS–"retina layer formation" relationship, we can retrieve the paths behind the DWPC:

MATCH path = (n0:Disease)-[e1:ASSOCIATES_DaG]-(n1)-[:INTERACTS_GiG]-(n2)-[:PARTICIPATES_GpBP]-(n3:BiologicalProcess)
WHERE n0.name = 'multiple sclerosis'
  AND n3.name = 'retina layer formation'
  AND 'GWAS Catalog' in e1.sources
  AND exists((n0)-[:LOCALIZES_DlA]-()-[:UPREGULATES_AuG]-(n2))
RETURN path

[Figure: multiple sclerosis paths to retina layer formation]
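The path query above only confirms that some MS-localized anatomy upregulates each interacting gene. As a rough sketch, the existence check can be turned into a second MATCH to list which anatomies are responsible. The variable a and the column aliases are names introduced here for illustration, and the sketch assumes the intermediate node in the existence pattern carries the Anatomy label.

```cypher
MATCH (n0:Disease)-[e1:ASSOCIATES_DaG]-(n1)-[:INTERACTS_GiG]-(n2)-[:PARTICIPATES_GpBP]-(n3:BiologicalProcess)
WHERE n0.name = 'multiple sclerosis'
  AND n3.name = 'retina layer formation'
  AND 'GWAS Catalog' IN e1.sources
// Assumption: the node between LOCALIZES_DlA and UPREGULATES_AuG is an Anatomy
MATCH (n0)-[:LOCALIZES_DlA]-(a:Anatomy)-[:UPREGULATES_AuG]-(n2)
RETURN
  n1.name AS ms_gene,
  n2.name AS interacting_gene,
  collect(DISTINCT a.name) AS upregulating_anatomies
ORDER BY ms_gene, interacting_gene
```

Because the second MATCH only succeeds when at least one such anatomy exists, it should keep the same gene pairs as the exists() filter while adding the anatomy names as a tabular column.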

Daniel Himmelstein Researcher June 27, 2016

Which anatomies express migraine-associated genes

The following query looks for anatomies (tissues) which express the genes associated with migraine.

MATCH path = (n0:Disease)-[:ASSOCIATES_DaG]-(n1)-[:EXPRESSES_AeG]-(n2:Anatomy)
WHERE n0.name = 'migraine'
WITH
[
  size((n0)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n1)),
  size((n1)-[:EXPRESSES_AeG]-()),
  size(()-[:EXPRESSES_AeG]-(n2))
] AS degrees, path, n2 as target
RETURN
  target.identifier AS anatomy_id,
  target.name AS anatomy_name,
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.5)) AS DWPC,
  size((target)-[:EXPRESSES_AeG]-()) AS n_genes
ORDER BY DWPC DESC
LIMIT 5

anatomy_id	anatomy_name	PC	DWPC	n_genes
UBERON:0001645	trigeminal nerve	7	0.022	36
UBERON:0001785	cranial nerve	10	0.020	66
UBERON:0002363	dura mater	2	0.017	4
UBERON:0002925	trigeminal nucleus	4	0.016	7
UBERON:0002360	meninx	1	0.012	3

The query does a good job identifying migraine-relevant tissues. However, notice that the expression profiles for the retrieved tissues are not very comprehensive: only four genes are known to be expressed in the dura mater, two of which are migraine-associated. Therefore, these results are dependent on our gene expression catalog [1], which varies considerably in completeness by tissue.
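One possible way to reduce the influence of sparsely profiled tissues is to require a minimum number of expressed genes, mirroring the n_genes filter used in the GO Process queries. The sketch below is a variation on the query above, not a tested recommendation: the threshold of 10 is an arbitrary assumption and would need tuning.

```cypher
MATCH path = (n0:Disease)-[:ASSOCIATES_DaG]-(n1)-[:EXPRESSES_AeG]-(n2:Anatomy)
WHERE n0.name = 'migraine'
WITH
[
  size((n0)-[:ASSOCIATES_DaG]-()),
  size(()-[:ASSOCIATES_DaG]-(n1)),
  size((n1)-[:EXPRESSES_AeG]-()),
  size(()-[:EXPRESSES_AeG]-(n2))
] AS degrees, path, n2 AS target
WITH
  target.identifier AS anatomy_id,
  target.name AS anatomy_name,
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.5)) AS DWPC,
  size((target)-[:EXPRESSES_AeG]-()) AS n_genes
  // Hypothetical cutoff: drop anatomies with fewer than 10 expressed genes
  WHERE n_genes >= 10
RETURN
  anatomy_id, anatomy_name, PC, DWPC, n_genes
ORDER BY DWPC DESC
LIMIT 5
```

With this cutoff, tissues such as the dura mater (4 genes) and meninx (3 genes) would no longer appear, at the cost of discarding tissues that may be relevant but poorly characterized.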

Daniel Himmelstein Researcher June 27, 2016

Compounds that target genes involved in myelination

Demyelination is the cause of much disability in multiple sclerosis patients. Below is a simple query to find all compounds that bind to proteins whose genes are involved in myelination:

MATCH path = (n0:BiologicalProcess)-[:PARTICIPATES_GpBP]-(n1)-[:BINDS_CbG]-(n2:Compound)
WHERE n0.name = 'myelination'
RETURN path

[Figure: myelination-compounds]

The query identifies 8 myelination-involved genes that are targeted by 33 compounds. To retrieve these counts yourself, change the last line of the query to RETURN count(DISTINCT n1) AS targets, count(DISTINCT n2) AS compounds. If you're interested in myelination, see dhimmel/myelinet for additional queries.
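Written out in full, the counting variant described above looks like this. It is the same pattern as the path query, with only the final line replaced:

```cypher
MATCH path = (n0:BiologicalProcess)-[:PARTICIPATES_GpBP]-(n1)-[:BINDS_CbG]-(n2:Compound)
WHERE n0.name = 'myelination'
// DISTINCT is needed because a gene or compound can appear on several paths
RETURN count(DISTINCT n1) AS targets, count(DISTINCT n2) AS compounds
```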

We can modify the above query to find compounds that upregulate rather than bind a myelination gene/protein. Compound–upregulates–Gene relationships in Hetionet are from LINCS L1000 [1]. Here we set an extreme z_score threshold to thin the results:

MATCH path = (n0:BiologicalProcess)-[:PARTICIPATES_GpBP]-(n1)-[e2:UPREGULATES_CuG]-(n2:Compound)
WHERE n0.name = 'myelination'
  AND e2.z_score > 12
RETURN path
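If a table is more convenient than a graph visualization, the same pattern can return one row per compound and gene pair. This is a sketch of one such variant; the column aliases are names chosen here for illustration.

```cypher
MATCH (n0:BiologicalProcess)-[:PARTICIPATES_GpBP]-(n1)-[e2:UPREGULATES_CuG]-(n2:Compound)
WHERE n0.name = 'myelination'
  AND e2.z_score > 12
RETURN
  n2.name AS compound,
  n1.name AS gene_symbol,
  e2.z_score AS z_score
// Strongest upregulation first
ORDER BY z_score DESC
```

Lowering the z_score threshold should return more rows, which may be useful when exploring weaker effects.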

Daniel Himmelstein Researcher Aug. 31, 2016

The targets responsible for a side effect

Here we'll investigate a query to identify genes which cause a given side effect when targeted by a compound. Let's look at the side effect Cushingoid (C0332601). The following query identifies the genes that are commonly targeted by Cushingoid-causing compounds:

MATCH path = (n0:SideEffect)-[r1:CAUSES_CcSE]-(n1:Compound)-[r2:BINDS_CbG]-(n2:Gene)
WHERE n0.name = 'Cushingoid'
WITH
[
  size((n0)-[:CAUSES_CcSE]-()),
  size(()-[:CAUSES_CcSE]-(n1)),
  size((n1)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n2))
] AS degrees, path, n2
WITH
  n2,
  count(path) AS PC,
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC
RETURN
  n2.identifier AS gene_id,
  n2.name AS gene_symbol,
  n2.description AS gene_name,
  PC, DWPC
ORDER BY DWPC DESC, gene_symbol

The query returns 52 genes with at least one GbCcSE path to Cushingoid. Here are the first three rows ranked by DWPC:

gene_id	gene_symbol	gene_name	PC	DWPC
2908	NR3C1	nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor)	7	0.0343
3290	HSD11B1	hydroxysteroid (11-beta) dehydrogenase 1	1	0.0198
5916	RARG	retinoic acid receptor, gamma	3	0.0196

We can extract the paths contributing to the DWPC of the top hit (NR3C1).

MATCH path = (n0:SideEffect)-[r1:CAUSES_CcSE]-(n1:Compound)-[r2:BINDS_CbG]-(n2:Gene)
WHERE n0.name = 'Cushingoid'
  AND n2.name = 'NR3C1'
RETURN path

[Figure: Cushingoid-NR3C1 paths]

The involvement of NR3C1 in Cushingoid makes biological sense. NR3C1 encodes the glucocorticoid receptor, and chronic elevation of glucocorticoid levels can result in Cushing’s disease [1].

Next, we'll compare our findings to the gene targets predicted to cause Cushingoid in a 2012 study [2]. Supplementary Table 5 contains predicted target–ADR (adverse drug reaction) relationships. If we filter for Cushingoid, the predictions contain the following 8 targets: AR, KDR, NR3C1, NR3C2, PDGFRA, PDGFRB, SERPINA6, TEK. Of these 8 targets, NR3C1 was the top prediction with a Chi-square statistic of 1922.5. The targets NR3C2 and SERPINA6 were also present in our 52 Hetionet-derived genes.
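The overlap described above could be checked with a query along the following lines, which restricts the Cushingoid paths to the 8 predicted targets. This is a sketch that assumes the gene symbols listed in the study match the name property of Gene nodes in Hetionet.

```cypher
MATCH path = (n0:SideEffect)-[:CAUSES_CcSE]-(n1:Compound)-[:BINDS_CbG]-(n2:Gene)
WHERE n0.name = 'Cushingoid'
  // The 8 targets predicted for Cushingoid in the 2012 study
  AND n2.name IN ['AR', 'KDR', 'NR3C1', 'NR3C2', 'PDGFRA', 'PDGFRB', 'SERPINA6', 'TEK']
RETURN
  n2.name AS gene_symbol,
  count(path) AS PC
ORDER BY PC DESC, gene_symbol
```

Given the overlap reported above, one would expect rows for NR3C1, NR3C2, and SERPINA6; any predicted target without a GbCcSE path to Cushingoid would simply be absent from the output.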

Hetionet v1.0 contains 5,734 side effects. The workflow and queries above can be used by researchers to highlight the potential target genes responsible for a side effect of interest.

Topics

Network Biology Hetionet Cypher Neo4J

Cite this as

Daniel Himmelstein (2016) Exploring the power of Hetionet: a Cypher query depot. Thinklab. doi:10.15363/thinklab.d220
