Rephetio: Repurposing drugs on a hetnet [rephetio]

Incorporating DrugCentral data in our network

I spoke with @TIOprea and Oleg Ursu from the University of New Mexico. They are constructing a highly curated yet highly integrative database of pharmacology named DrugCentral. They have not yet published a journal article detailing their database. However, they have posted an alpha webapp and data repository, which provide access to select components of the database.

My impression was that the database is similar in concept to DrugBank but has key advantages in certain areas. First, it has integrated types of data which are not currently part of DrugBank. Second, it takes a more clinical approach to curation compared to DrugBank. For example, drug–target relationships in DrugCentral adhere more to the "three pillars [1]" of pharmacological activity.

I created a repository (dhimmel/drugcentral) to process parts of DrugCentral for inclusion in our network. Details of the integration will follow.

Contributions to our hetnet

I processed DrugCentral data and converted it into the identifier systems used by our network (notebook). I have initially added two relationship types from DrugCentral into the hetnet (commit).

Drug targets

I extracted drug–target relationships from DrugCentral and converted them into the DrugBank and Entrez Gene identifiers in our network (dataset). The table below shows the sources from which DrugCentral compiled drug targets and how many relationships each source contributed.

Source                      Relationships
DrugCentral (ChEMBL)        2,922
DrugCentral (literature)    182
DrugCentral (label)         89
DrugCentral (IUPHAR)        56
DrugCentral (KEGG DRUG)     25

Prior to including DrugCentral, our network contained 10,747 Compound–binds–Gene relationships from DrugBank and BindingDB. Drug targets from DrugCentral added 824 additional binding relationships.

Pharmacologic classes

DrugCentral has compiled the membership of compounds in pharmacologic classes from several sources, which contain the following types of classes:

  • FDA — Mechanism of Action
  • FDA — Physiologic Effect
  • FDA — Chemical/Ingredient
  • FDA — Established Pharmacologic Class
  • MeSH — Pharmacological Action
  • CHEBI — Application

I decided to assign all of these classes to a single node type (Pharmacologic Class). I added a new relationship type for Pharmacologic Class–includes–Compound. DrugCentral contributed 10,959 relationships for 1,262 pharmacologic classes.

Medical indications

In my conversation with DrugCentral team members, we first discussed PharmacotherapyDB, our recently-released physician-curated catalog of indications. One major takeaway was that we needed to more clearly explain that our definition of disease modifying differs from the clinical definition. Also, we need to more clearly state that NOT refers to non-indications.

As part of DrugCentral, they've constructed their own indications catalog. They seeded their catalog from OMOP in 2012 and have since manually added additional indications. OMOP has now become OHDSI and hosts its vocabulary on GitHub at OHDSI/Vocabulary-v5.0. As a side note, we were not aware of OMOP [1] or OHDSI [2] when we assembled our indications for version 1.0 of PharmacotherapyDB.

Aligning indications with PharmacotherapyDB

I converted the DrugCentral indications to the slim sets of DrugBank drugs and Disease Ontology diseases in PharmacotherapyDB 1.0 (notebook, dataset). For each disease, I aggregated direct indications as well as indications for subtypes (referred to as propagation).

In the converted dataset, I included a category column giving the indication's PharmacotherapyDB 1.0 status. Of a total of 671 indications extracted from DrugCentral, 210 were not in PharmacotherapyDB 1.0. Of the 461 indications in PharmacotherapyDB, 359 were classified as disease modifying (78%), 77 were classified as symptomatic (17%), and 25 were classified as non-indications (5%).
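The breakdown above can be checked with a few lines of arithmetic. This is only a sketch that re-derives the quoted percentages from the counts in the paragraph; the variable names are illustrative and not taken from the notebook.

```python
# Counts quoted in the paragraph above.
total_extracted = 671
not_in_pharmacotherapydb = 210
categories = {"DM": 359, "SYM": 77, "NOT": 25}

# Indications shared with PharmacotherapyDB 1.0.
overlap = total_extracted - not_in_pharmacotherapydb

# The three categories should account for every shared indication.
assert overlap == sum(categories.values())

# Percentage of the shared indications in each category, rounded to whole percent.
shares = {name: round(100 * count / overlap) for name, count in categories.items()}
print(overlap, shares)
```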

Six of the non-indications were for anemia and eight were for hypertension, two diseases that we know are defined too generally in our catalog. Compared to the four sources of PharmacotherapyDB indications, DrugCentral appears to have a higher percentage of disease modifying indications. However, we're basing this assessment on indications that appeared in DrugCentral and at least one other resource, so it's potentially biased.

@pouyakhankhanian, if you are up for curating the 210 new indications as DM, SYM, or NOT, we could potentially:

  1. add these indications to a future release of PharmacotherapyDB
  2. use these indications to test our predictions
  • Pouya Khankhanian: I'm up for it. I should have some time either late this week or early next week.

Pharmacologic Classes that are indications

We've noticed that many of the pharmacologic classes are essentially indications. This could be problematic since it could confound our classification approach. Specifically, it could lead to the appearance that our method predicts indications when in reality it just regurgitates indications which were encoded by a pharmacologic class.

Some examples of classes that resemble indications are:

Class ID       Class name             Source    Class type
N0000175482    Antimalarial           FDA       FDA Established Pharmacologic Class
D018501        Antirheumatic Agents   MeSH      Pharmacological Action

@sergiobaranzini and I looked through the 6 sources and found that 3 were less problematic:

  • FDA — Chemical/Ingredient
  • FDA — Mechanism of Action
  • FDA — Physiologic Effect

The other 3 were more problematic:

  • FDA — Established Pharmacologic Class
  • MeSH — Pharmacological Action
  • CHEBI — Application

Therefore, I excluded classes from the 3 more problematic sources. This reduced the number of classes from 1,262 to 345, the number of edges from 10,959 to 1,029, and the number of compounds in a class from 1,423 to 724 (commit).
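As a rough illustration of this filtering step, here is a minimal sketch. The record layout and field names (`class_id`, `source`, `class_type`) are assumptions made for the example, not the schema of the DrugCentral export, and the third record is a made-up placeholder.

```python
# The three (source, class type) pairs judged more problematic above.
EXCLUDED = {
    ("FDA", "Established Pharmacologic Class"),
    ("MeSH", "Pharmacological Action"),
    ("CHEBI", "Application"),
}

# Hypothetical records: the first two mirror the examples in this thread,
# the third is an invented placeholder for a mechanism-of-action class.
classes = [
    {"class_id": "N0000175482", "name": "Antimalarial",
     "source": "FDA", "class_type": "Established Pharmacologic Class"},
    {"class_id": "D018501", "name": "Antirheumatic Agents",
     "source": "MeSH", "class_type": "Pharmacological Action"},
    {"class_id": "EXAMPLE-MOA", "name": "Hypothetical mechanism class",
     "source": "FDA", "class_type": "Mechanism of Action"},
]

# Keep only classes whose (source, class type) pair is not excluded.
kept = [c for c in classes if (c["source"], c["class_type"]) not in EXCLUDED]
print([c["class_id"] for c in kept])
```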

One next step would be to salvage many of the filtered classes by manual curation. The majority of the removed classes did not overlap with DO Slim diseases and thus shouldn't confound our analysis. If we decide to curate, we'll have to decide whether to exclude all indications or just indications in DO Slim.

@dhimmel, does the bosentan indication for hypertension originate from DrugCentral? If so, there might be an error in your pipeline: the files uploaded to GitHub have pulmonary hypertension as an indication.

Greetings @olegursu! I used transitive closure [1] to convert diseases to the level of specificity in Hetionet. This is what I meant by saying:

For each disease, I aggregated direct indications as well as indications for subtypes (referred to as propagation).

I think you've picked up on an issue that came up during our curation. Specifically the Disease Ontology defines pulmonary hypertension (DOID:6432) as a subtype of hypertension (DOID:10763). However, our curator considered the definition of hypertension to be distinct from pulmonary hypertension.

So in conclusion, DrugCentral included a bosentan indication for pulmonary hypertension, which was translated to an indication for hypertension in Hetionet. In the future, I'd like to make transitive closure a query-time decision rather than a builtin, but for now that's not the case.
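To make the bosentan case concrete, here is a minimal sketch of propagating indications up an "is a" hierarchy, with the propagation exposed as a query-time flag. Only the two DOIDs and their relationship come from the discussion above; the function names and data layout are hypothetical and do not reflect the actual pipeline.

```python
# Child -> parents "is a" edges. Only this single edge comes from the thread;
# a real run would load the full Disease Ontology.
IS_A = {
    "DOID:6432": ["DOID:10763"],  # pulmonary hypertension -> hypertension
}

def ancestors(disease, is_a):
    """Return every disease reachable by following is_a edges upward."""
    seen = set()
    stack = [disease]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def indications_for(target, direct, is_a, propagate=True):
    """Drugs indicated for `target`, optionally including its subtypes."""
    drugs = set()
    for drug, disease in direct:
        if disease == target or (propagate and target in ancestors(disease, is_a)):
            drugs.add(drug)
    return drugs

# Direct indication as recorded in DrugCentral, per the thread.
direct = [("bosentan", "DOID:6432")]

with_closure = indications_for("DOID:10763", direct, IS_A, propagate=True)
without_closure = indications_for("DOID:10763", direct, IS_A, propagate=False)
print(with_closure, without_closure)
```

With propagation on, bosentan shows up under hypertension; with it off, it appears only under pulmonary hypertension.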

DrugCentral now published

DrugCentral is now available online and published in Nucleic Acids Research [1].

According to the website, the resource is available under a CC BY-SA 4.0 License.

Looks cool.

I tested it by searching for "lipitor", and was surprised to find hypertension listed as an "indication". I don't think this is right, and I don't immediately see a way to determine how hypertension was assigned as an indication for lipitor.

However, there is a lot of good information here as well. For example, a search for "nifedipine" turned up a contraindication of which I was entirely unaware, and which I could easily confirm by a web search.

  • Pouya Khankhanian: @mkgilson Would you be interested in doing a case study of hypertension predictions by this algorithm? For example, a case study similar to the one done for epilepsy. The algorithm made a great number of high-probability predictions for hypertension (there can be a link to a file in thinklab here), as it did for epilepsy. We chose to do a case study of epilepsy in part because our three physician curators were all neurologists. It would be great to get input from someone who is more experienced in general internal medicine to evaluate the predictions for hypertension.

  • Mike Gilson: I'd love to contribute, but don't have time, and also am probably too far from clinical practice these days. I'll bet someone at Stanford could look at this with you, though!

Hi Mike,

Thank you for the feedback! Regarding indications for atorvastatin: most indications for drugs approved before 2012 come from OMOP v4, which in turn imported data from First Data Bank. While the quality of this dataset, as we assessed it on a few samples, appears to be high, there are still indications that address either a disease symptom or an associated co-morbidity, and where it is not clear what the actual association is. We will amend the data and re-upload.

You're welcome! I just now looked for info on OMOP online, but am not finding any relevant dataset. Is there a link you could provide?

Atorvastatin for hypertension

@mkgilson, thanks for the feedback. According to the DrugCentral publication, here's their method for compiling indications [1]:

Indications (10,707), contra-indications (27,851) and off-label indications (2496) were initially extracted from OMOP data model version 4.4. Since the OMOP project transitioned to OHDSI, updated drug indication and contra-indication data are covered under a revised license agreement that in turn requires subscription licenses (i.e. it is no longer open-access). Therefore, indications for drugs approved after 2012 (322 pairs) were extracted from approved drug labels and mapped onto SNOMED-CT and UMLS concepts.

Note that we also created a catalog of indications called PharmacotherapyDB. When creating this resource, we had three physicians curate all of our indications. Interestingly, all three of our curators classified atorvastatin (lipitor) as a disease-modifying indication for hypertension. Atorvastatin was also considered disease-modifying for coronary artery disease but not for type 2 diabetes mellitus.

It appears that the verdict is still out on whether statins lower blood pressure [2, 3, 4, 5], but perhaps physicians are prescribing atorvastatin as an off-label treatment for hypertension and this is what our curators picked up on. @pouyakhankhanian, do you remember your reasoning here?

Most likely the association is via cholesterol, which is a risk factor for hypertension, and statins lower cholesterol.

  • Mike Gilson: But I don't think hypercholesterolemia is an independent risk factor for hypertension...

@dhimmel Thanks, Daniel. Before posting my original comment, I did a quick web search for atorvastatin and HTN, and found something very equivocal: there seemed to be supportive statistics, but the mean drop in BP observed was paltry, something like 0.5 - 1 mm Hg, on typical systolic and diastolic values of 120 and 70 mmHg. I'd be interested to know if a more robust effect has in fact been observed.

Though I cannot speak for the other curators, my own clinical suspicion was to call atorvastatin NOT for hypertension (HTN) because, for example, the other two statins in our curation database are also listed as NOT for HTN. As the only curator who was not blind to the other two curators' selections, I saw the choice of DM by the other two reviewers and therefore did a cursory round of research, found [1], which is specific to atorvastatin, and therefore agreed with the other reviewers. Upon more detailed review, my thoughts are below.

Hyperlipidemia (HLD) is treated with HMG-CoA reductase inhibitors (the 'statins', such as atorvastatin, simvastatin, lovastatin). The decision to treat HLD with a statin, and the strength of statin to use, is guided by a "risk factor" score which predicts poor cardiovascular outcomes; the ASCVD score is the one that has come into use in the last few years. The ASCVD risk score and many other scoring systems use blood pressure as a major factor in determining whether, and how much, statin you get for your HLD.

HTN is treated with antihypertensives, commonly guided by the JNC8 paradigm [2]. The decision to treat HTN and the aggressiveness of therapy is also guided toward reducing poor cardiovascular outcomes.

Therefore, in clinical practice, the treatment of HTN and HLD is generally thought to be really two parts of the same battle, with the goal being to decrease the number of poor cardiovascular outcomes (death or major disability from MI or stroke or PVD). And the latest trend is combination treatments which include statins and antihypertensives such as [3].

Given that clinically we are moving toward mixing antihypertensive drugs with statins, I'm not sure we will have more evidence in the future as to the efficacy of a statin alone (in the absence of antihypertensive use) on hypertension alone (in the absence of hyperlipidemia). For example, note the possibility of confounding between HTN and HLD in the articles referenced by Daniel above. Therefore, the best evidence we have would be [1]. In that case, I suppose one could say atorvastatin is DM for HTN, but I wouldn't disagree with calling it NOT for HTN. Furthermore, one could make a case that all three statins should be DM if one of them is DM, but I would personally think that's too much of a stretch. Here is the list of how all of the statins were designated by the three curators. You will note the lack of completeness of the PharmacotherapyDB list (not every statin is listed for every indication), and I would say this was quite usual for other drug classes in the database as well.

Drug            Disease                      Designations by the three curators
Atorvastatin    coronary artery disease      DM     DM     DM
Lovastatin      coronary artery disease      DM     DM     DM
Pitavastatin    coronary artery disease      DM     DM     DM
Pravastatin     coronary artery disease      DM     DM     DM
Rosuvastatin    coronary artery disease      DM     DM     DM
Simvastatin     coronary artery disease      DM     DM     DM
Pravastatin     prostate cancer              NOT    DM     NOT
Atorvastatin    type 2 diabetes mellitus     NOT    NOT    NOT
Simvastatin     type 2 diabetes mellitus     NOT    NOT    NOT

Also of interest is how the statins were ranked to help in each disease.

  • Tudor Oprea: See my post below about CADUET and the most likely (Occam's razor) explanation for how atorvastatin got annotated as anti-hypertensive. I remain skeptical that this is the case, speaking from a molecular interactions perspective. The algorithm may be biased by the mixtures, which feed confounding factors into the system.

The algorithm predicts hypertension fairly highly for some of the other statins as well. Here are the top predictions of the algorithm for any statin:

Simvastatin     coronary artery disease      DM     19.0%    99.5%
Lovastatin      coronary artery disease      DM     16.9%    99.4%
Pravastatin     coronary artery disease      DM     14.8%    99.3%
Rosuvastatin    coronary artery disease      DM     13.2%    99.0%
Fluvastatin     coronary artery disease             6.7%     98.4%
Pitavastatin    coronary artery disease      DM     4.8%     97.4%
Atorvastatin    coronary artery disease      DM     3.2%     96.7%
Simvastatin     type 2 diabetes mellitus     NOT    1.6%     95.4%
Pravastatin     type 2 diabetes mellitus            1.3%     94.5%
Lovastatin      type 2 diabetes mellitus            1.3%     94.4%
Rosuvastatin    type 2 diabetes mellitus            1.3%     94.3%
Pitavastatin    type 2 diabetes mellitus            1.2%     93.6%
Atorvastatin    type 2 diabetes mellitus     NOT    1.1%     93.1%

It appears from the above that the algorithm likes some of the other statins more than atorvastatin for hypertension. Here are the top results for atorvastatin:

Atorvastatin    coronary artery disease                 DM     3.23%    99.26%    96.75%
Atorvastatin    type 2 diabetes mellitus                NOT    1.09%    98.53%    93.11%
Atorvastatin    chronic kidney failure                         0.42%    97.06%    95.45%
Atorvastatin    metabolic syndrome X                           0.42%    96.32%    95.77%
Atorvastatin    epilepsy syndrome                              0.36%    95.59%    53.38%
Atorvastatin    Kawasaki disease                               0.36%    94.85%    95.12%
Atorvastatin    focal segmental glomerulosclerosis             0.33%    93.38%    93.89%
Atorvastatin    primary biliary cirrhosis                      0.32%    92.65%    87.78%
Atorvastatin    prostate cancer                                0.32%    91.91%    66.38%
Atorvastatin    acquired immunodeficiency syndrome             0.30%    91.18%    62.74%
Atorvastatin    rheumatoid arthritis                           0.29%    90.44%    79.00%
Atorvastatin    breast cancer                                  0.28%    89.71%    58.13%
Atorvastatin    pancreatic cancer                              0.27%    88.97%    80.17%
Atorvastatin    lung cancer                                    0.27%    88.24%    73.02%

Hypertension is highly ranked for atorvastatin, but atorvastatin is not highly ranked for hypertension.

  • Pouya Khankhanian: note, my filter above also included pentostatin as a statin. I manually removed this in a re-post

Thanks, this is very informative! I agree that treatment of HTN and HLD are two parts of the same battle (reduction of cardiovascular risk), but this in itself would not be a good rationale for saying atorvastatin is indicated for HTN; only that elevated cardiovascular risk may be viewed as an indication for both statins and antihypertensives. If being two parts of the same battle were valid, then one would, by the same token, say that hypercholesterolemia is an indication for antihypertensives!

As to the literature regarding antihypertensive effects of statins: I'm skeptical that any physician would regard HTN as an off-label indication for a statin. I wonder if there is a way to find out...

  • Pouya Khankhanian: Agree that this is not a rationale for saying atorva is indicated in HTN. I present this as a rationale for why we may never know the true answer.

    I also wholly agree that physicians do not regard HTN as an off-label indication for a statin. (Note that this is not how "DM" was defined.)

Thanks again. What exactly does DM mean?

  • Daniel Himmelstein: It's a shorthand for disease-modifying indication. In PharmacotherapyDB, the three physicians classified each indication as disease modifying (DM), symptomatic (SYM), or non-indication (NOT).

  • Pouya Khankhanian: The definition of DM was a subject of great debate, and was finally defined in this discussion

    Abstracted from that discussion:

    disease modifying (DM) — a drug that therapeutically changes the underlying or downstream biology of the disease
    symptomatic (SYM) — a drug that treats a significant symptom of the disease
    non-indication (NOT) — a drug that neither therapeutically changes the underlying or downstream biology nor treats a significant symptom of the disease

    reasonable evidence of efficacy is required to be classified as disease modifying or symptomatic. This includes off-label use.
    if no classification accurately describes an indication, the most appropriate (although imperfect) classification should be chosen

    Amendment 1: if a drug was previously indicated, but is no longer used due to side effects, or because there are better drugs, it is still considered DM
    Amendment 2: it doesn't matter whether it is first line or fifth line, it's still considered DM

    Assumption 1: DM trumps SYM. If a drug is clearly both disease modifying and also treats symptoms, then I will call it disease modifying. This is because most disease modifying drugs also treat symptoms.

    Assumption 2: SYM trumps NOT. If a drug is clearly symptomatic treatment, but can actually exacerbate the downstream biology of disease, then I chose SYM. I made this choice because this was the choice I saw most often made by AJG and CSH

    Expert curation of our indication catalog for disease-modifying treatments
    Daniel Himmelstein, Pouya Khankhanian, Chrissy Hessler (2015) Thinklab. doi:10.15363/thinklab.d95

Interesting classification. So for HTN, are all drugs either DM or NOT, given that HTN is typically asymptomatic?

I can relate to the challenge of arriving at hard definitions for concepts in biology and medicine that turn out to be complicated and case-dependent!

One thing that comes to mind is that, in medicine, a "symptom" is something a patient experiences. Thus, HTN is not a symptom. Instead, it is a "sign", something the physician may observe. There's the further complexity that essential HTN is probably best regarded as its own disease, whereas secondary HTN (e.g. due to renal artery stenosis), might not be best to regard as its own disease.

@mkgilson posts an interesting question about disease classification in hypertension.

The answer is found in this file which documents our decision making for each call (in the first sheet). Also the discussion of the file here.

To answer your specific question, there were two medications (diazoxide and phentolamine) which were classified as SYM for HTN because they were only used in the treatment of hypertensive emergency, which we decided was a sort of symptom of hypertension. Again, this is a very grey line and open to debate regarding what "hypertension" is and what it means to be DM vs SYM (search for the word "hypertension" on this page). It is interesting to note that both of these drugs were predicted to be good treatments for hypertension (the entity of hypertension as we have defined it).


Since other people may have similar questions to mine, how about putting your definitions/usage of DM, SYM and NOT in, e.g., the FAQ? Sorry if it's there and I'm missing it.

Back to the details... my off the cuff thought would have been that essentially none of the common HTN drugs are disease modifying because they don't treat the underlying cause. They only compensate for it, so if you stop taking them, the HTN is back the same as ever. So the disease isn't modified. In contrast, an antibiotic truly eliminates the root cause of an infection.

Regarding your comment about a drug not being designated as DM because withdrawal of the drug causes relapse of the disease, one could make the same argument for many other diseases: anti-epileptic drugs do not cure epilepsy, immuno-suppressants do not cure auto-immune disease, and chemo-therapies do not cure most cases of cancer. But this decision (DM vs SYM) is actually academic.

When looking at the input to the algorithm, as you do here, recall that the data feeds in essentially as binary (and I believe SYM was essentially treated as "on" in the main report but we also ran it as "off"; @dhimmel would have to confirm this). So switching all of a disease's agents from SYM to DM would not really change the output of the algorithm. Also recall that there are abundant false negatives in the input data. This level of false negatives was unfortunately quite necessary but also proved very important in testing the output of the data.

So, assuming a connection is truly DM but we mis-label it as NOT, then that would add to the already abundant false negative rate in the input data and presumably have little effect on the output. Therefore, I would not be strongly against removing any edge (changing DM to NOT) in the input data in general. And I know that @dhimmel tested his algorithm to be robust to such perturbations in the input.

Moving forward, the algorithm is meant to be automatically update-able in the future. I think it would be cool to crowdsource the input, essentially taking a vote as to whether things should be DM or SYM or NOT.

  • Daniel Himmelstein: Great points. Just wanted to clarify that symptomatic treatments were not used as positives to train the model. Only disease-modifying (DM) treatments were. In fact, symptomatic treatments were considered negatives, but excluding them altogether wouldn't have made a big difference (since there were 29,044 negatives, of which only 390 were symptomatic treatments).
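The labeling rule described in this reply can be sketched as follows. The function name is hypothetical and this is not the project's training code; it only restates the rule (DM is positive, everything else is negative) and the share of negatives that were symptomatic.

```python
def training_label(category):
    """Return 1 for disease-modifying pairs, 0 for SYM, NOT, or unlabeled pairs."""
    return 1 if category == "DM" else 0

# Counts quoted in the reply above.
negatives = 29044
symptomatic_negatives = 390

# Percentage of all negatives that are symptomatic treatments.
symptomatic_share = round(100 * symptomatic_negatives / negatives, 1)
print(training_label("DM"), training_label("SYM"), symptomatic_share)
```

At roughly one percent of the negatives, dropping the symptomatic pairs would barely change the training set, which is the point made above.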

I'm perhaps overly influenced by the use of "disease modifying" in the context of rheumatoid arthritis: the specific meaning is that such a drug prevents joint damage, rather than just reducing pain. By analogy, I'd agree that antiepileptics are not disease modifying. Other cases get tougher. I'm impressed in any case by the level of care you guys have put into all of this.

Dumb question: what algorithm? I was viewing this only as a database.

To answer @mkgilson's question, the algorithm (described here) used the database of connections to predict which drugs would treat which diseases. The results are also found in the same discussion.

Should have responded sooner, but got side-tracked with my own work and was told by @olegursu that he had answered this. So here's my two cents as to why atorvastatin got annotated as a treatment for essential hypertension, an indication "bleeding" from OMOP (now rebranded as OHDSI) that probably should have been carefully revised.
First off, I agree with @mkgilson, atorvastatin has no business treating HTN. It simply does not lower blood pressure. Some preliminary results suggested that it does, but systematic analysis did not reproduce this. In particular, the study from Östra Sjukhuset (Gothenburg, Sweden) shows no difference (though the UCSD Statin Study claims a small effect).
I went to STITCH and looked at direct evidence for interacting partners between atorvastatin and proteins, but could not piece together any direct (or even indirect) way for this molecule to lower blood pressure. As an additional qualifier, I did my PhD in molecular physiology and studied catecholamines for 5 years, and am somewhat familiar with mechanisms for lowering blood pressure.
Second, and here's where I hypothesize that FirstDataBank annotators (hence OMOP and now DrugCentral) got this wrong: atorvastatin is formulated not only as LIPITOR but also as CADUET, and CADUET contains amlodipine besylate in addition to atorvastatin calcium. Error understandable, case closed.

Hi Tudor. I can see how the caduet case could have generated this anomaly. Is the lesson to omit data for combination drugs?

@mkgilson: definitely, I would start with 1-active ingredient drugs only, and build my Indications that way. Then go through 2-APIs and match known indications, and look for synergies (e.g., are there new indications for the combo that do not work when taking the 2 drugs separately). And so forth...

Just one additional point of clarification, with respect to the 1-2 mm Hg blood pressure lowering effect of atorvastatin from the UCSD Statin Study group. In the first 4 years of medical school, I measured blood pressure (manually) for more than 100 patients, as well as 20 healthy volunteers. Differences of 5 mm Hg are found just by shifting from left hand to right hand; measuring the same person at the same time the next day can give that variation; measurements done by someone else (recall this was done using a stethoscope under the cuff) can give even more variation; and so forth. This is important enough that it warrants its own error table... Since that Statin Study group Medscape reference is an abstract at a conference (i.e., no follow-up peer reviewed paper), we can most likely attribute those differences to experimental error, and conclude that the effect is not there.

I agree with you, Tudor. That's why I characterized the drop as "paltry" :-) (Though, in principle, if one averages over enough data, one could resolve a shift in the mean of 2 mm Hg using data with a 5 mm Hg standard deviation.)
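The averaging argument can be made concrete with the standard error of the mean, sd / sqrt(n). The sketch below is a deliberately simplified one-sample calculation, assuming independent measurements and a normal approximation with z = 1.96; it is not a power analysis and is not taken from either study.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean of n independent measurements."""
    return sd / math.sqrt(n)

def min_sample_size(shift, sd, z=1.96):
    """Smallest n for which z * sd / sqrt(n) is no larger than the shift."""
    return math.ceil((z * sd / shift) ** 2)

# Values from the comments above: a 2 mm Hg shift against a 5 mm Hg spread.
n_needed = min_sample_size(shift=2, sd=5)
print(n_needed, standard_error(5, 100))
```

Under these assumptions a few dozen measurements already shrink the uncertainty in the mean below 2 mm Hg, although systematic measurement error of the kind Tudor describes would not average out.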

Status: Completed
Cite this as
Daniel Himmelstein, Oleg Ursu, Mike Gilson, Pouya Khankhanian, Tudor Oprea (2016) Incorporating DrugCentral data in our network. Thinklab. doi:10.15363/thinklab.d186
