I'm excited to announce the initial release of our catalog of drug therapies for disease. The catalog contains physician-curated medical indications. It's available on figshare and GitHub and licensed to be maximally reusable.
This initial release contains 97 diseases and 601 drugs. Between these drug–disease pairs, there are 755 disease-modifying therapies, 390 symptomatic therapies, and 243 non-indications. To enable integrative analyses, drugs and diseases are coded using DrugBank and Disease Ontology identifiers.
The catalog puts pathophysiological principles first. As a result, it includes indications with a poor risk–benefit ratio that are rarely used in the modern clinic. Contributions are welcome, as we hope to expand and refine the catalog over time.
History & Methods
One of our priorities from the beginning of this project was to construct a catalog of efficacious pharmacotherapies. Since our approach learns how to repurpose drugs based on the indications we feed it, a high-quality indication catalog was crucial.
Compilation and data integration
We began by looking for existing indication resources. In a discussion that generated 23 comments (the most of any Thinklab discussion to date), we received helpful suggestions from the community. Based on these suggestions and our own research, we proceeded by integrating four resources:
MEDI-HPS — indications from RxNorm, SIDER 2, MedlinePlus, and Wikipedia (discussed).
LabeledIn — indications extracted from drug labels by experts and crowdsourced non-experts (discussed).
ehrlink — indications from electronic health records where physicians linked medications to problems (discussed).
PREDICT — indications from UMLS relationships, drugs.com, and drug labels (discussed).
Next, we decided physician curation was needed to separate disease-modifying from symptomatic indications. We recruited two physician curators (@chrissyhessler & Ari J. Green) to perform a pilot on 50 random indications. Then together, we defined disease modifying as "a drug that therapeutically changes the underlying or downstream biology of the disease" and symptomatic as "a drug that treats a significant symptom of the disease."
The two curators then each reviewed all 1,388 indications and classified them as disease modifying (DM), symptomatic (SYM), or a non-indication (NOT). The initial two curators disagreed 444 times. We recruited a third curator (@pouyakhankhanian) who had access to the prior curations. The third curator developed a detailed methodology that helped us reach consensus for the time being.
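As a quick sanity check on the figures above, the three category counts reported for this release should sum to the 1,388 curated indications, and the 444 disagreements imply an initial disagreement rate. The sketch below uses only numbers stated in this post; the variable names are ours, chosen for illustration.

```python
# Category counts reported for the initial release (figures from this post).
category_counts = {"DM": 755, "SYM": 390, "NOT": 243}

# Total curated indications and the number of initial curator disagreements.
total_indications = sum(category_counts.values())
initial_disagreements = 444

# Share of each category and the initial disagreement rate, in percent.
category_percent = {
    category: round(100 * count / total_indications, 1)
    for category, count in category_counts.items()
}
disagreement_percent = round(100 * initial_disagreements / total_indications, 1)

print(total_indications)
print(category_percent)
print(disagreement_percent)
```

By this arithmetic, the first two curators disagreed on roughly a third of the indications, which is why a third curator was needed.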
We're receptive to feedback on how to improve PharmacotherapyDB. For future releases, we hope to curate the unpropagated indications, include additional sources, and expand our disease and drug vocabularies.
Lars Juhl Jensen: I get an error when trying to follow the figshare link. Looks like the DOI is either wrong or not registered correctly (yet).
Daniel Himmelstein, Pouya Khankhanian, Christine S. Hessler, Ari J. Green, Sergio Baranzini (2016) Figshare. doi:10.6084/m9.figshare.3103054
Category breakdown by resource
Using the consensus curation, we have gone back and calculated the composition of indication category by resource (notebook).
The table indicates that of the 793 indications we extracted from MEDI-HPS, 532 (67.1%) were disease modifying. Overall, MEDI-HPS and LabeledIn contained the highest percentages of disease-modifying indications. ehrlink, which is based on electronic health records, contained the highest percentages of symptomatic indications (35.2%) and non-indications (20.5%).
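To illustrate how such a per-resource breakdown can be computed, here is a minimal sketch. It is not the code from our notebook: the `records` list is toy data invented for the example, and `composition_by_resource` is a hypothetical helper. The only real figure is the MEDI-HPS count quoted above.

```python
from collections import Counter, defaultdict

# Toy records (hypothetical, NOT real catalog data): one row per
# (resource, consensus category) for an indication reported by that resource.
records = [
    ("MEDI-HPS", "DM"),
    ("MEDI-HPS", "DM"),
    ("MEDI-HPS", "SYM"),
    ("ehrlink", "SYM"),
    ("ehrlink", "NOT"),
]


def composition_by_resource(rows):
    """Return {resource: {category: percent}}, rounded to one decimal."""
    counts = defaultdict(Counter)
    for resource, category in rows:
        counts[resource][category] += 1
    result = {}
    for resource, counter in counts.items():
        total = sum(counter.values())
        result[resource] = {
            category: round(100 * n / total, 1)
            for category, n in counter.items()
        }
    return result


composition = composition_by_resource(records)

# The one concrete figure from the post: 532 of 793 MEDI-HPS indications were DM.
medi_hps_dm_percent = round(100 * 532 / 793, 1)
```

Because an indication reported by several resources contributes one row per resource, the per-resource totals can add up to more than the number of unique indications.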
Category breakdown by number of resources
Next, we looked at the category composition based on the number of resources reporting each indication.
The more resources that reported an indication, the more likely it was to be disease modifying: indications found in only a single resource were disease modifying 47.4% of the time, whereas indications found in all four resources were disease modifying 89.1% of the time.
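The same kind of tally can be grouped by how many resources report each indication. The sketch below is again an illustration under assumptions, not our notebook code: the `indications` mapping, its drug and disease names, and the helper `dm_percent_by_resource_count` are all hypothetical.

```python
from collections import Counter

# Toy data (hypothetical): (drug, disease) -> (resources reporting it, category).
indications = {
    ("drug_a", "disease_x"): ({"MEDI-HPS", "LabeledIn", "ehrlink", "PREDICT"}, "DM"),
    ("drug_b", "disease_x"): ({"ehrlink"}, "SYM"),
    ("drug_c", "disease_y"): ({"ehrlink"}, "DM"),
    ("drug_d", "disease_y"): ({"MEDI-HPS", "PREDICT"}, "DM"),
}


def dm_percent_by_resource_count(data):
    """Return {number of resources: percent of indications that are DM}."""
    totals = Counter()
    dm = Counter()
    for resources, category in data.values():
        n = len(resources)
        totals[n] += 1
        if category == "DM":
            dm[n] += 1
    return {n: round(100 * dm[n] / totals[n], 1) for n in sorted(totals)}


dm_percent = dm_percent_by_resource_count(indications)
print(dm_percent)
```

Here each unique indication is counted once, so the percentages describe indications rather than resource reports.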