Data nomenclature: naming and abbreviating our network types

Daniel Himmelstein, Lars Juhl Jensen, Pouya Khankhanian

doi:10.15363/thinklab.d162

Project:

Rephetio: Repurposing drugs on a hetnet [rephetio]

Daniel Himmelstein Researcher Feb. 17, 2016

We've created a preliminary network with 10 types of nodes (metanodes) and 27 types of edges (metaedges). Now an important detail is naming node and edge types appropriately.

For each metanode and metaedge, we also need abbreviations. We use the abbreviations to make writing out complete paths less cumbersome. For example, in our previous project, we abbreviated the Gene - interaction - Gene - expression - Tissue - localization - Disease path to GiGeTlD [1].
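As a rough illustration of how such an abbreviation can be assembled, here is a minimal Python sketch. The lookup tables and the function name `abbreviate_metapath` are hypothetical; only the letters themselves come from the GiGeTlD example above.

```python
# Hypothetical lookup tables; the letters follow the GiGeTlD example.
NODE_ABBREV = {"Gene": "G", "Tissue": "T", "Disease": "D"}
EDGE_ABBREV = {"interaction": "i", "expression": "e", "localization": "l"}


def abbreviate_metapath(path):
    """Abbreviate an alternating node, edge, node, ... sequence of type names."""
    if len(path) % 2 == 0:
        raise ValueError("a metapath must start and end with a node type")
    parts = []
    for position, name in enumerate(path):
        # Even positions hold node types, odd positions hold edge types.
        table = NODE_ABBREV if position % 2 == 0 else EDGE_ABBREV
        parts.append(table[name])
    return "".join(parts)


metapath = ["Gene", "interaction", "Gene", "expression",
            "Tissue", "localization", "Disease"]
print(abbreviate_metapath(metapath))  # GiGeTlD
```

Uppercase letters for nodes and lowercase letters for edges keep the two kinds of type visually distinct in the compact form.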

We have several conventions for naming and abbreviations, but they haven't been publicly explained or discussed. This discussion is now home to these topics.

Daniel Himmelstein Researcher Feb. 17, 2016

Naming according to parts of speech

According to Chen's rules of thumb, we should use parts of speech as follows [1]:

common nouns for node labels (types)
proper nouns for node names
transitive verbs for relationship (edge) types
intransitive verbs for property (attribute) types
adjectives for node properties
adverbs for relationship properties

I'm not convinced about the last three, since our properties (data attributes for nodes and relationships) are often highly technical. However, I think we should adhere to the first three rules when possible.

Our node labels are already common nouns. Our node names are already proper nouns. However, we were using common nouns for relationship types. Thus, I switched to transitive verbs for relationship types (commit). The table below lists the old (noun) and new (verb) name for each relationship type.

Source	Target	Metaedge (noun)	Metaedge (verb)
compound	gene	binding	binds
compound	side effect	causation	causes
compound	gene	downregulation	downregulates
compound	disease	indication	palliates
compound	compound	similarity	resembles
compound	disease	indication	treats
compound	gene	upregulation	upregulates
disease	gene	association	associates
disease	gene	downregulation	downregulates
disease	anatomy	localization	localizes
disease	symptom	presence	presents
disease	disease	similarity	resembles
disease	gene	upregulation	upregulates
gene	anatomy	downregulation	downregulates
gene	gene	evolution	evolves
gene	anatomy	expression	expresses
gene	gene	interaction	interacts
gene	biological process	participation	participates
gene	cellular component	participation	participates
gene	molecular function	participation	participates
gene	pathway	participation	participates
gene	perturbation	regulation	regulates
gene	anatomy	upregulation	upregulates
gene	gene	knockdown downregulation	knockdown downregulates
gene	gene	knockdown upregulation	knockdown upregulates
gene	gene	overexpression downregulation	overexpression downregulates
gene	gene	overexpression upregulation	overexpression upregulates

In several cases, switching from noun to verb saved a few characters, which is welcome. Switching relationship types to verbs also makes sense as part of our migration to neo4j, where the convention is to use verbs for relationship types. In fact, a neo4j company explains relationships by saying:

Where nodes can be thought of as nouns, relationships can be thought of as verbs.

Lars Juhl Jensen Feb. 18, 2016

The compound-gene associations are not intuitive to me. I assume that when, for example, a compound downregulates a gene, it is supposed to mean that the compound inhibits the protein product encoded by the gene. However, if read at face value, it would mean that the compound binds to something else that through some signaling results in down-regulation of the gene (i.e. less transcription).

The gene-gene association "evolves" is a bit of a misnomer, I think. Unless you are looking at ancestral genes, one gene will not have evolved from another gene. Rather, two genes will share ancestry. In that case, the term "homology" would be much clearer. Also, you probably want to be able to distinguish between orthologs and paralogs in your network.

Are the gene-anatomy relationships not backwards? I can understand what it means that the liver "upregulates" a gene (I assume it means that the gene is more highly expressed in the liver than elsewhere). But I cannot comprehend what it would mean that a gene upregulates the liver.

Same goes for gene-perturbation relationships. I can understand that a perturbation regulates a gene, but how can a gene regulate a perturbation? And why is this type of association not divided into up- and down-regulation like everything else?

I am not entirely sure how useful the "knockdown downregulates" etc. types are. Usually "knockdown downregulates" would be interpreted to mean "upregulates" etc.

Daniel Himmelstein Researcher Feb. 18, 2016

I assume that when, for example, a compound downregulates a gene, it is supposed to mean that the compound inhibits the protein product encoded by the gene. However, if read at face value, it would mean that the compound binds to something else that through some signaling results in down-regulation of the gene (i.e. less transcription).

Your face value interpretation is correct. Compound–downregulates–Gene means the compound decreases the transcriptional expression of the gene. We extracted these relationships from LINCS L1000.

The gene-gene association "evolves" is a bit of a misnomer

I agree, "evolves" is not good. This edge signifies evolutionary rate covariation [1]. It's a mouthful, and I don't know the best way to shorten and verbify it. Perhaps "covaries" is an improvement?

Are the gene-anatomy relationships not backwards? … Same goes for gene-perturbation relationships.

Great point. We should present these edges in subject-verb-object order. I have switched the default orientation of the confusing metaedges (commit). In practice the object-verb-subject order may still arise, for example when representing paths.
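To make the orientation switch concrete, here is a small Python sketch. The `Metaedge` tuple and the `invert` and `render` helpers are hypothetical names used only for illustration, and the dash-separated rendering simply mirrors the way paths are written out earlier in this discussion.

```python
from typing import NamedTuple


class Metaedge(NamedTuple):
    source: str  # read as the subject of the verb
    verb: str
    target: str  # read as the object of the verb


def invert(metaedge):
    """Swap source and target; the verb itself is unchanged."""
    return Metaedge(metaedge.target, metaedge.verb, metaedge.source)


def render(metaedge):
    """Write a metaedge out as 'source - verb - target'."""
    return f"{metaedge.source} - {metaedge.verb} - {metaedge.target}"


# Orientation as first listed in the table, which reads object-verb-subject.
old = Metaedge("gene", "upregulates", "anatomy")
new = invert(old)
print(render(new))  # anatomy - upregulates - gene
```

Inverting twice returns the original orientation, so a path can traverse the edge in either direction without losing track of which orientation is the default.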

I am not entirely sure how useful the "knockdown downregulates" etc. types are. Usually "knockdown downregulates" would be interpreted to mean "upregulates" etc.

I will look into collapsing:

knockdown downregulates with overexpression upregulates to create an upregulates edge
knockdown upregulates with overexpression downregulates to create a downregulates edge
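A minimal Python sketch of the proposed collapse, assuming each edge is stored as a (perturbed gene, relationship type, measured gene) triple. The gene names and the `collapse_edges` helper are hypothetical, and this is only one way the merge could be done.

```python
# Proposed merge from the two items above; nothing here is final.
COLLAPSE = {
    "knockdown downregulates": "upregulates",
    "overexpression upregulates": "upregulates",
    "knockdown upregulates": "downregulates",
    "overexpression downregulates": "downregulates",
}


def collapse_edges(edges):
    """Rename perturbation-specific types and drop resulting duplicates."""
    seen = set()
    collapsed = []
    for source, kind, target in edges:
        # Types outside the mapping pass through unchanged.
        edge = (source, COLLAPSE.get(kind, kind), target)
        if edge not in seen:
            seen.add(edge)
            collapsed.append(edge)
    return collapsed


edges = [
    ("GENE_A", "knockdown downregulates", "GENE_B"),
    ("GENE_A", "overexpression upregulates", "GENE_B"),
    ("GENE_C", "knockdown upregulates", "GENE_D"),
]
print(collapse_edges(edges))
```

In this example the first two edges agree, so they merge into a single GENE_A upregulates GENE_B edge. The sketch does not handle conflicting evidence (for example, a knockdown and an overexpression that both downregulate the same target); that case would need a separate decision.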

Daniel Himmelstein Researcher Feb. 22, 2016

Indication terminology

We've been referring to a drug treating a disease as an "indication". While readers with a medical background understand the term, others find "indication" confusing.

Now we've split our indications into two categories: disease-modifying and symptomatic. Additionally, we've switched to using verbs to describe relationships.

Given these factors, I chose "treats" for disease-modifying indications and "palliates" for symptomatic indications. This terminology aligns with a recent repurposing study [1], which refers to

distinguishing non-causative and palliative from causative and effective treatments

While readers may not be familiar with the term palliates, it has an applicable and precise definition (making lookup easier):

Make (a disease or its symptoms) less severe or unpleasant without removing the cause

@pouyakhankhanian, do you think the treats/palliates terminology makes sense?

Pouya Khankhanian Researcher Feb. 23, 2016

I certainly agree with keeping the terminology consistent with prior studies. I think the terms "indication" and "palliates" are well defined as you describe. My only concern is the use of the word "treat" to mean "disease-modifying" as opposed to symptom management, especially since the phrase "treat symptoms" is very common.

If there are other prior studies that use alternate terminology, it might be best to align with those. Otherwise, I would think the two goals are (1) maintain previous terminology and (2) make sure to define our terminology very clearly.

Daniel Himmelstein Researcher Feb. 24, 2016

I'm not sure the phrase "drug X treats symptom Y" is that problematic, since the sentence's object is symptom Y rather than a disease. I agree that we should maintain existing terminology, but I'm not finding much guidance in the literature.

Potential alternatives to "treats" for representing disease-modifying indications are: modifies, medicates, indicates, remedies, ameliorates, betters, improves, corrects, affects, alleviates, repairs, and cures. @pouyakhankhanian, do you prefer any of these verbs to "treats"?

And regardless of which term we pick, we'll make sure to define each relationship type.

Daniel Himmelstein Researcher April 18, 2016

Hetionet v1.0 type nomenclature

We've settled on a final type nomenclature for Hetionet v1.0 (our hetnet for this project). See the following tables:

Metanodes where metanode is the primary name, abbreviation is the 1–2 letter abbreviation, and label is the Neo4j node label.
Metaedges where metaedge is the primary name, unicode_metaedge is a styled version of the primary name, standard_metaedge is the primary edge orientation, and inverted indicates the non-primary edge orientation. The remaining columns are abbreviation, standard_abbreviation, source, and target.
Neo4j relationship types where metaedge is the primary name, rel_type is the Neo4j relationship type, and direction notes whether edges are bidirectional (both) or directed (forward or backward).

Neo4j type nomenclature

We conform to the Neo4j style of CamelCase labels and ALL_CAPS relationship types. In addition, Neo4j relationship types are appended with metaedge standard abbreviations. This adds source/target-metanode awareness to relationship types and enables optimized queries.
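The convention can be sketched in Python as follows. The helper names `to_label` and `to_rel_type` are hypothetical, the underscore separating the verb from the abbreviation is an assumption, and the abbreviations `CtD` and `CbG` are illustrative values rather than entries copied from the tables.

```python
import re


def to_label(metanode):
    """Turn a metanode name into a CamelCase Neo4j node label."""
    words = re.split(r"[\s_-]+", metanode.strip())
    return "".join(word.capitalize() for word in words if word)


def to_rel_type(metaedge_verb, standard_abbreviation):
    """Build an ALL_CAPS relationship type with the abbreviation appended.

    Assumption: an underscore separates the verb from the abbreviation.
    """
    verb = re.sub(r"[\s-]+", "_", metaedge_verb.strip()).upper()
    return f"{verb}_{standard_abbreviation}"


print(to_label("biological process"))  # BiologicalProcess
print(to_rel_type("treats", "CtD"))    # TREATS_CtD
```

Because the abbreviation encodes the source and target metanodes, two metaedges that share a verb (such as compound-gene and disease-gene "upregulates") end up with distinct relationship types.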

Status: Completed

Cite this as

Daniel Himmelstein, Lars Juhl Jensen, Pouya Khankhanian (2016) Data nomenclature: naming and abbreviating our network types. Thinklab. doi:10.15363/thinklab.d162
