We would like to validate our predictions by assessing whether we highly rank treatments that are under investigation in clinical trials. Clinical trials are generally supported by a large body of preclinical evidence, including animal studies, which makes them a rich resource on drug efficacy.
The ClinicalTrials.gov database is a "registry and results database of publicly and privately supported clinical studies of human participants conducted around the world." Previous studies that have computationally predicted pharmacotherapies have used ClinicalTrials.gov to evaluate their performance [2, 3].
Several existing studies have investigated the utility of ClinicalTrials.gov data from various angles [4, 5, 6, 7, 8, 9, 10, 11]. In this discussion, we will focus on constructing a catalog of potential indications under investigation in clinical trials. Any advice on using or analyzing ClinicalTrials.gov data will be appreciated.
Initial catalog of drug–disease therapies
I created an initial catalog of drug–disease therapies from ClinicalTrials.gov. The process consisted of downloading all study records and extracting the MeSH-coded interventions and conditions (notebook). Next, I mapped the MeSH diseases to Disease Ontology terms and mapped MeSH compounds to DrugBank via DrugCentral (notebook).
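As a rough illustration of the extraction step, here is a minimal sketch of pulling MeSH-coded conditions and interventions out of a single study record. It is not the code from the notebook. The element names (`condition_browse`, `intervention_browse`, `mesh_term`) are my assumption about the layout of the XML study records, and the sample record, including its NCT identifier, is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical study record, trimmed to the fields used below.
# The element names are an assumption about the XML record layout.
SAMPLE_RECORD = """
<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <condition_browse>
    <mesh_term>Hypertension</mesh_term>
  </condition_browse>
  <intervention_browse>
    <mesh_term>Lisinopril</mesh_term>
    <mesh_term>Hydrochlorothiazide</mesh_term>
  </intervention_browse>
</clinical_study>
"""


def mesh_terms(root, section):
    """Return the non-empty MeSH term names listed under one browse section."""
    return [
        element.text.strip()
        for element in root.findall(f"{section}/mesh_term")
        if element.text and element.text.strip()
    ]


def extract_mesh_terms(xml_text):
    """Parse one study record into its trial ID and MeSH-coded terms."""
    root = ET.fromstring(xml_text.strip())
    return {
        "nct_id": root.findtext("id_info/nct_id"),
        "conditions": mesh_terms(root, "condition_browse"),
        "interventions": mesh_terms(root, "intervention_browse"),
    }


record = extract_mesh_terms(SAMPLE_RECORD)
print(record)
```

The MeSH names returned here would still need to be mapped to Disease Ontology and DrugBank identifiers, which this sketch does not attempt.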
The resulting catalog consists of 158,767 trial–drug–disease relationships between 1,181 drugs and 1,617 diseases from 42,826 trials. Note that trials will often assess multiple drug interventions and even conditions, resulting in multiple drug–disease pairs for a single trial. The total number of distinct drug–disease therapies extracted was 33,095.
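To make the difference between the counts concrete, here is a small sketch of how one trial expands into several trial–drug–disease relationships, and how distinct drug–disease pairs are then counted across trials. The trial, drug, and disease identifiers are placeholders, and pairing every drug with every disease in a trial is my assumption about how the relationships were formed.

```python
from itertools import product

# Toy input: mapped drugs and diseases per trial (placeholder identifiers).
trials = {
    "trial_a": {"drugs": ["drug_1", "drug_2"], "diseases": ["disease_x"]},
    "trial_b": {"drugs": ["drug_1"], "diseases": ["disease_x", "disease_y"]},
}


def expand_relationships(trials):
    """Pair every drug with every disease in each trial (assumed expansion rule)."""
    rows = []
    for trial_id, record in sorted(trials.items()):
        for drug, disease in product(record["drugs"], record["diseases"]):
            rows.append((trial_id, drug, disease))
    return rows


relationships = expand_relationships(trials)

# Collapsing over trials gives the distinct drug–disease therapies.
distinct_pairs = {(drug, disease) for _, drug, disease in relationships}

print(len(relationships), "trial–drug–disease relationships")
print(len(distinct_pairs), "distinct drug–disease pairs")
```

In this toy input, drug_1 paired with disease_x appears in both trials, so it contributes two relationships but only one distinct pair.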
Project Rephetio uses a subset of 137 diseases called DO Slim and a subset of all drugs called DrugBank Slim. Thus, I created a slim trial–drug–disease catalog for use with our project (dataset). Transitive closure was used to propagate therapies to DO Slim diseases from their subtypes. The slim clinical trial catalog contains 6,382 drug–disease pairs for 794 compounds and 130 diseases, drawn from 27,240 trials.
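The propagation step can be sketched as follows. This is a toy version under stated assumptions, not the code used to build the dataset: the ontology is represented as a hypothetical child-to-parents dictionary, all term and drug names are placeholders, and I assume a therapy annotated directly to a slim disease is kept as-is.

```python
def ancestors(term, parents):
    """Return term plus every term reachable by following parent links."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate_to_slim(pairs, parents, slim_terms):
    """Map each drug–disease pair onto the slim terms at or above the disease."""
    slim_pairs = set()
    for drug, disease in pairs:
        for slim_term in ancestors(disease, parents) & slim_terms:
            slim_pairs.add((drug, slim_term))
    return slim_pairs


# Toy ontology: child -> list of parents (placeholder term names).
parents = {
    "subtype_a1": ["subtype_a"],
    "subtype_a": ["slim_a"],
    "subtype_b": ["slim_b"],
    "slim_a": ["root"],
    "slim_b": ["root"],
}
slim_terms = {"slim_a", "slim_b"}
pairs = {
    ("drug_1", "subtype_a1"),  # two levels below slim_a
    ("drug_1", "slim_a"),      # already a slim term
    ("drug_2", "subtype_b"),   # one level below slim_b
    ("drug_3", "root"),        # above every slim term, so dropped
}

slim_pairs = propagate_to_slim(pairs, parents, slim_terms)
print(sorted(slim_pairs))
```

Pairs whose disease sits above every slim term, or outside the slim subtrees entirely, drop out of the result, which is one reason the slim catalog can cover fewer diseases than the slim set contains.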
Danny TY Wu, David A Hanauer, Qiaozhu Mei, Patricia M Clark, Lawrence C An, Joshua Proulx, Qing T Zeng, VG Vinod Vydiswaran, Kevyn Collins-Thompson, Kai Zheng (2015) Journal of the American Medical Informatics Association. doi:10.1093/jamia/ocv062