Extracting side effects from SIDER 4

SIDER is a project to extract side effects from drug labels [1], originally motivated by off-target prediction [2]. We evaluated version 2 and produced an online tutorial. We found that side effect similarity was a weak predictor of chemical and indication similarity.

Just two days ago, version 4 was released. Here, we will detail our extraction of side effects from SIDER 4.

Daniel Himmelstein Researcher

Data release formatting

We ran into some issues when parsing the SIDER 4 datasets. In defense of the creators, version 4 is still in beta and hasn't become the default version (the URL was provided to us by a project member).

The remainder of the post refers to this notebook. I ran into the following issues:

• label_mapping.tsv.gz is strangely encoded and/or improperly tab-delimited
• meddra_all_indications.tsv.gz is not documented in the README
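
For readers who want to check a release file for this kind of problem, here is a minimal sketch (not the code from the notebook). It counts tab-separated fields per line and records lines that fail to decode. The function name diagnose_tsv, the UTF-8 assumption, and the toy bytes are all hypothetical stand-ins, not part of SIDER.

```python
import gzip
import io
from collections import Counter


def diagnose_tsv(raw_bytes, encoding="utf-8"):
    """Summarize decoding failures and tab-field counts for gzipped TSV bytes."""
    field_counts = Counter()  # number of fields -> number of lines
    undecodable = []          # 1-based line numbers that fail to decode
    with gzip.open(io.BytesIO(raw_bytes), "rb") as handle:
        for line_number, raw_line in enumerate(handle, start=1):
            try:
                line = raw_line.decode(encoding)
            except UnicodeDecodeError:
                undecodable.append(line_number)
                continue
            field_counts[len(line.rstrip("\r\n").split("\t"))] += 1
    return field_counts, undecodable


# Toy stand-in for a release file (assumption, not real SIDER content):
# one well-formed row, one row with a missing tab, one row with an invalid byte.
toy = gzip.compress(b"a\tb\tc\nd e\tf\ng\th\t\xff\n")
field_counts, undecodable = diagnose_tsv(toy)
print(field_counts, undecodable)
```

More than one distinct field count suggests inconsistent delimiting, and a non-empty list of undecodable lines suggests the file is not in the assumed encoding.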

@larsjuhljensen, are you the right contact for this project?

• Lars Juhl Jensen: No, the right person to contact is Michael Kuhn.

• Daniel Himmelstein: I contacted Michael Kuhn. label_mapping.tsv.gz was not essential and was removed. Documentation was added to the README for meddra_all_indications.tsv.gz.

Daniel Himmelstein Researcher

Initial processing complete

We've completed a first pass of the SIDER 4 data processing (notebook, downloads). Our analysis consisted of mapping STITCH [1, 2] compounds to DrugBank and consolidating duplicate rows.
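
The two steps can be illustrated with a small sketch. This is not the notebook's code: the function name, the dictionary-based mapping, and every identifier below (STITCH_A, DB_1, SE_1, and so on) are hypothetical placeholders. It assumes that compounds without a DrugBank mapping are dropped and that rows collapsing to the same DrugBank and side effect pair are merged.

```python
def map_and_consolidate(rows, stitch_to_drugbank):
    """Map (stitch_id, side_effect_id) rows to unique (drugbank_id, side_effect_id) pairs."""
    pairs = set()
    for stitch_id, side_effect_id in rows:
        # A compound with no DrugBank mapping contributes nothing (assumption).
        for drugbank_id in stitch_to_drugbank.get(stitch_id, ()):
            pairs.add((drugbank_id, side_effect_id))
    return sorted(pairs)


# Hypothetical toy data: two STITCH compounds map to the same DrugBank entry,
# and one compound has no mapping at all.
rows = [
    ("STITCH_A", "SE_1"),
    ("STITCH_B", "SE_1"),
    ("STITCH_A", "SE_2"),
    ("STITCH_C", "SE_3"),
]
stitch_to_drugbank = {"STITCH_A": ["DB_1"], "STITCH_B": ["DB_1"]}

consolidated = map_and_consolidate(rows, stitch_to_drugbank)
print(consolidated)
```

Using a set makes the consolidation explicit: the two SE_1 rows become a single DB_1 relationship.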

We added the side effects extracted from meddra_all_se.tsv.gz to our network. Overall, the resource contributed 139,235 compound-side effect relationships for 5,745 side effects.

Data quality

Compared to version 2, I subjectively noticed a considerable quality improvement. However, many of the problems inherent to label-based NLP extraction remain. I think there are two potential methods for extracting higher confidence side effects:

1. Number of labels approach: Most drugs have multiple labels. Side effects reported by more labels may be of higher quality. Amphetamine is a good example.
2. Frequency approach: Some side effects have associated frequency information. Placebo comparisons are also sometimes present. Thus enrichment in frequency compared to placebo, other drugs, or a cutoff is feasible. Ibuprofen is a good example.
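
Both ideas can be sketched on toy data. This is only an illustration under stated assumptions: the record layouts, the function names, the drug and label names, and the thresholds (at least two labels, at least a twofold frequency over placebo) are hypothetical choices, not values taken from SIDER or its documentation.

```python
from collections import defaultdict


def side_effects_by_label_support(records, min_labels=2):
    """Keep (drug, side_effect) pairs reported by at least min_labels distinct labels.

    records: iterable of (drug, label_id, side_effect) tuples (assumed layout).
    """
    labels = defaultdict(set)
    for drug, label_id, side_effect in records:
        labels[(drug, side_effect)].add(label_id)
    return {pair: len(ids) for pair, ids in labels.items() if len(ids) >= min_labels}


def side_effects_enriched_over_placebo(frequencies, min_ratio=2.0):
    """Keep pairs whose frequency on drug is at least min_ratio times the placebo frequency.

    frequencies: iterable of (drug, side_effect, drug_freq, placebo_freq),
    with frequencies given as fractions between 0 and 1 (assumed layout).
    """
    kept = {}
    for drug, side_effect, drug_freq, placebo_freq in frequencies:
        if drug_freq > 0 and drug_freq >= min_ratio * placebo_freq:
            kept[(drug, side_effect)] = (drug_freq, placebo_freq)
    return kept


# Hypothetical toy records: nausea appears on two distinct labels, rash on one.
records = [
    ("drug_x", "label_1", "nausea"),
    ("drug_x", "label_2", "nausea"),
    ("drug_x", "label_1", "rash"),
    ("drug_x", "label_1", "nausea"),
]
supported = side_effects_by_label_support(records)

# Hypothetical toy frequencies: only dizziness clears the twofold cutoff.
frequencies = [
    ("drug_y", "headache", 0.10, 0.09),
    ("drug_y", "dizziness", 0.08, 0.02),
]
enriched = side_effects_enriched_over_placebo(frequencies)
print(supported, enriched)
```

A simple ratio cutoff ignores sample size, so a statistical test would be a natural refinement if the release exposes the underlying counts.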

The current data release may be insufficient to apply these methods. More documentation is needed. Judging from the webapp, the underlying database would support both methods.

Status: Completed
Cite this as
Daniel Himmelstein (2015) Extracting side effects from SIDER 4. Thinklab. doi:10.15363/thinklab.d97