Processing LabeledIn to extract indications

Daniel Himmelstein, Ritu Khare

doi:10.15363/thinklab.d46

Project:

Rephetio: Repurposing drugs on a hetnet [rephetio]

Publication:

LabeledIn: Cataloging labeled indications for human drugs

Daniel Himmelstein Researcher April 2, 2015

The LabeledIn resource consists of expert-curated [1] and crowdsourced [2] components. Here we discuss parsing these resources to extract indications.

Jesse Spaulding: For future reference it's probably better to wait until you have more to say here before you post this. The project's followers probably don't need to know what you are planning to post here (until you actually post it!) What do you think? And yes, 'draft mode' is coming soon!
Daniel Himmelstein: Hey Jesse, I was making a stub because one project member may be interested in posting and I thought this would simplify the process.
Jesse Spaulding: Oh alright, carry on then :)

Ritu Khare April 3, 2015

Question: I was under the impression that the workers assessed individual indications rather than all indications within a specific label. Therefore each drug–disease (RxNORM–UMLS) pair should have its own majority vote. However, the data release appears to be listed in terms of labels rather than indications. Some labels have multiple UMLS diseases but only report the outcome of a single vote. Majority votes should be in terms of indications rather than labels, right?

Answer: You are right: each drug–disease (RxNORM–UMLS) pair should have its own majority vote, and majority votes should be in terms of indications rather than labels. The data are organized in exactly this manner (one entry = one drug-label/UMLS-CUI pair). Each entry in the text file corresponds to one indication candidate (i.e. one disease UMLS-CUI) in a given drug label. The disease CUI is specified in the third field of the file. Also, as you have already noted, some entries have two CUIs in the third field. These correspond to composite mentions (e.g. "Moderate to severe pain"). Our disease NER module detects two concepts for this phrase ("moderate pain" and "severe pain"), but we present the phrase as a single disease mention to the turkers, and hence a single majority vote was computed for both UMLS-CUIs.
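To make the composite-mention convention concrete, here is a minimal Python sketch of how one entry might be expanded into one record per disease CUI, with the shared majority vote copied to each. Only the position of the disease CUI (third field) comes from the answer above. The separator between two CUIs, the position of the label ID and the vote, and the placeholder identifiers are all assumptions made for illustration and should be checked against the actual data release.

```python
from typing import Iterator, NamedTuple, Sequence


class Vote(NamedTuple):
    label_id: str
    cui: str
    majority_vote: str


def expand_entry(fields: Sequence[str], cui_sep: str = "|") -> Iterator[Vote]:
    """Yield one Vote per CUI found in the third field of an entry.

    Assumptions (not confirmed by the data release): the first field is
    the label ID, the last field is the majority vote, and multiple CUIs
    are separated by `cui_sep`.
    """
    label_id, cui_field, vote = fields[0], fields[2], fields[-1]
    for cui in cui_field.split(cui_sep):
        cui = cui.strip()
        if cui:
            # A composite mention shares a single vote across its CUIs.
            yield Vote(label_id, cui, vote)


# Hypothetical entry with placeholder identifiers, not real CUIs.
entry = ["label_1", "ingredient_1", "C0000001|C0000002", "yes"]
votes = list(expand_entry(entry))
```

Expanding entries this way gives every drug–disease pair its own row, at the cost of recording the same vote twice for composite mentions.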

We are happy to answer more questions! - LabeledIn Team

Daniel Himmelstein Researcher April 3, 2015

Thanks @ritukhare, we've processed your datasets and combined the expert [1] and crowdsourced [2] indications. The resulting .tsv file is available for download. We provide ingredient and disease names here only for convenience, since our simplistic lookup methodology left many identifiers unnamed.

Specifically, we extracted 1,335 indications from the expert data release and 1,516 indications from the crowdsourced data release. The two sets shared one indication, so merging the two resources yielded 1,335 + 1,516 - 1 = 2,850 indications.
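The merged total is an inclusion-exclusion count over two sets of indications. The sketch below illustrates the arithmetic with toy data. Representing an indication as an (ingredient, disease) tuple is an assumption for illustration, and the identifiers are placeholders rather than real RxNorm or UMLS codes.

```python
# Toy identifiers only; placeholders, not real RxNorm or UMLS codes.
expert = {("ingredient_A", "disease_X"), ("ingredient_B", "disease_Y")}
crowd = {("ingredient_B", "disease_Y"), ("ingredient_C", "disease_Z")}


def merged_count(first: set, second: set) -> int:
    """Size of the union via inclusion-exclusion: |A| + |B| - |A & B|."""
    return len(first) + len(second) - len(first & second)


# The inclusion-exclusion count agrees with the size of the set union.
assert merged_count(expert, crowd) == len(expert | crowd)

# The same arithmetic with the counts reported in the post.
total = 1335 + 1516 - 1
```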

We calculated the total number of labels reporting each indication. For this task, we assumed study_drug_label_ID was consistent across the expert and crowdsourced datasets. If this assumption is wrong, the effect would be minimal, since the two releases report almost entirely disjoint sets of indications.
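One way to count labels per indication is sketched below. It is not necessarily how the analysis was implemented. The record layout (study_drug_label_ID, ingredient, disease) and the toy identifiers are assumptions. Collecting label IDs in a set means a label that appeared under the same ID in both releases would be counted once, which is where the consistency assumption matters.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (study_drug_label_ID, ingredient, disease)


def count_labels(records: Iterable[Record]) -> Dict[Tuple[str, str], int]:
    """Return the number of distinct labels reporting each indication."""
    labels = defaultdict(set)
    for label_id, ingredient, disease in records:
        # Using a set de-duplicates repeated (label, indication) rows.
        labels[(ingredient, disease)].add(str(label_id))
    return {indication: len(ids) for indication, ids in labels.items()}


# Hypothetical records pooled from both releases.
records = [
    ("1", "ingredient_A", "disease_X"),
    ("1", "ingredient_A", "disease_X"),  # duplicate row, counted once
    ("2", "ingredient_A", "disease_X"),
    ("3", "ingredient_B", "disease_Y"),
]
n_labels = count_labels(records)
```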

Daniel Himmelstein: @ritukhare, if you have a readily-available and exhaustive mapping of ingredient and disease identifiers to names, I could update my analysis with those names.

Ritu Khare April 6, 2015

This is great. Thanks @dhimmel. There should be no confusion with the study_drug_label_ID between the two datasets: in expert-LabeledIn, the values are numbers, and in crowd-LabeledIn, the values are a concatenation of the drug type and a number.
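Given that ID convention, the source release can be inferred from the identifier alone. The following is a small sketch under that reading. The function name and the example IDs are hypothetical, and the exact drug-type strings used in crowd-LabeledIn are not specified here.

```python
def label_source(study_drug_label_id: str) -> str:
    """Guess which release a label ID comes from, based on its format.

    Purely numeric IDs are treated as expert-LabeledIn. Anything else is
    assumed to be a crowd-LabeledIn ID (drug type followed by a number).
    """
    label_id = study_drug_label_id.strip()
    if not label_id:
        raise ValueError("empty study_drug_label_ID")
    return "expert" if label_id.isdigit() else "crowd"


# Hypothetical IDs for illustration only.
sources = [label_source("123"), label_source("prescription123")]
```

Because the two formats cannot collide, pooling label IDs from both releases should not merge distinct labels by accident.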

I don't have a readily available mapping of ingredient and disease identifiers to names. Please note that it would be more appropriate to use the title of the drug label (SPL) instead of the ingredient name, as the title also contains the dose form information of the drug (and we found that indications may be different between two drugs having the same ingredient but different dose forms). However, it's your decision.

Daniel Himmelstein Researcher April 8, 2015

we found that indications may be different between two drugs having the same ingredient but different dose forms

@ritukhare, interesting to hear that examples of repurposing frequently relied on different dose forms (and perhaps dosage levels as well). I think we would like to ignore this complexity. In other words, our predicted indications will not include dosage or dose form recommendations. I am comfortable leaving these details for the end users to investigate.

Status: Completed

Topics

Data Processing, Indications, LabeledIn, Crowdsourcing

Cite this as

Daniel Himmelstein, Ritu Khare (2015) Processing LabeledIn to extract indications. Thinklab. doi:10.15363/thinklab.d46
