LINCS L1000 licensing

Daniel Himmelstein

doi:10.15363/thinklab.d110

Project:

Rephetio: Repurposing drugs on a hetnet [rephetio]

Daniel Himmelstein Researcher Sept. 28, 2015

We're currently using LINCS L1000 data for compound–gene and gene–gene edges in our network. Thus far we have developed methods for computing consensus expression signatures and mapping LINCS compounds to other identifier systems. However, @larsjuhljensen pointed out that the license requires permission for redistribution:

If you have a derivative work that is significantly different from what we provide and you would like to distribute it, please contact us with the details. Our goal is to encourage significant improvements while maintaining provenance and reproducible research standards.

Therefore, we have emailed the LINCS L1000 team with the following permission request. We will post any updates regarding licensing or permissions on this discussion.

Greetings LINCS L1000 Team,

I am a graduate student at UCSF, and I have been using LINCS L1000 for my research. My project aims to predict new uses for existing drugs by integrating many different types of biomedical information. Recently, the issue of database copyright and licensing came up, and we are now trying to ensure that we have sufficient permissions for each of the 28 databases we're integrating.

Currently, several resources I have created may be non-compliant with the license.

My GitHub repository (dhimmel/lincs) contains:

Python code from cmap/l1ktools/python/cmap
Data retrieved from the API in an unmodified JSON format and a condensed TSV format.
Consensus signatures for DrugBank compounds, gene overexpressions, gene knockdowns, and perturbations. Our consensus signatures combine z-scores from multiple signatures. We computed our signatures using a method suggested to us during LINCS office hours, with some modifications.
Our .gitignore file prevents the following items from being uploaded to the repository: our private API key, modzs.gctx, and a local database (l1000.db) that is too large for GitHub.
An archived version of this repository is hosted on Zenodo [1].

My GitHub repository (dhimmel/integrate) for integrating many resources into a single network contains:

Consensus signatures for DrugBank compounds and genetic perturbations (gene overexpressions and knockdowns) encoded as network nodes and edges.

@leobrueggeman assisted with the LINCS analysis. His GitHub repository (LABrueggs/L1000) contains elements similar to dhimmel/lincs discussed above. Two files of consensus signatures from his repository are posted to figshare [2].

The public availability of the aforementioned resources is important so that others can reproduce and build on our work. We have attempted to provide sufficient information for provenance and reproducibility but are happy to make any modifications to assist in these regards.

Thus, we request permission for our current usage of LINCS L1000 data. Ideally, we could be granted permission to release the data under a Creative Commons license without a No Derivatives restriction. Applying a CC license would lessen the burden on downstream users.

Thanks for your consideration. Our research is academic in nature, and we suspect it is in line with the intended use of LINCS.

Finally, we're performing our project using an open science platform called Thinklab. I've posted a copy of this email on Thinklab and will update the discussion with our progress. Alternatively, feel free to respond via Thinklab rather than email. By detailing each step of our research process publicly, we're hoping to create a valuable resource and explore a more holistic and collaborative medium of publication.

Sincerely,
Daniel

Daniel Himmelstein Researcher Oct. 19, 2015

On October 14, Aravind Subramanian, a member of the LINCS team at the Broad, replied to our email. He wrote (posted here with permission):

You are free to redistribute your re-processing of the Broad LINCS data. We are working on a manuscript describing L1000 and the dataset.

And continued:

But if you believe your work would be valuable, we don't want our publication needs to hold up access for the field, so kindly proceed as you see fit

Aravind took the position that there is no formal license from the Broad Institute and that the LINCS L1000 licensing is determined by the NIH; the Broad and the L1000 team do not apply any additional restrictions. While the original license from www.lincscloud.org/license/ suggested otherwise, the following update was added:

Update - October 14, 2015
All LINCS Production Phase L1000 data generated by the Broad Institute is posted at the NCBIs Gene Expression Omnibus (GEO). Standard NIH data access rules apply - data is freely accessible by anyone (GEO BioProject ID PRJNA290347)
The website lincscloud.org, a Broad Institute developed resource for analysis of LINCS Phase 1 (2011-2014) data, will be deprecated in 2015 as the NIH has recently funded a separate LINCS Data Coordination and Integration Center (DCIC).
Our historic license is given below for reference, but the official information on access to all LINCS resources via the DCIC is available at lincsproject.org.

The update specifies that LINCS data is posted in GEO. However, GEO availability does not by itself grant usage rights, since:

some submitters may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted.

The update further specifies that the DCIC is the authoritative source for LINCS licensing. Their data release policy states:

LINCS data are released with the sole restriction that they must be correctly cited so that others can establish provenance and access the original data

Conclusion

We have permission to distribute our L1000 datasets. The formal LINCS data policy, which covers the L1000 project, requires attribution. Therefore, we will release our LINCS datasets as CC-BY.

The LINCS project and the L1000 team especially have done a laudable job sharing their data and providing support. Clearly and explicitly specifying the license of all public datasets will help remove any uncertainty and avoid laborious permission requests.

Status: Completed

Topics

Licensing Copyright Permissions L1000 MIT LINCS

Referenced by

Integrating resources with disparate licensing into an open network
Research report: Rephetio: Repurposing drugs on a hetnet

Cite this as

Daniel Himmelstein (2015) LINCS L1000 licensing. Thinklab. doi:10.15363/thinklab.d110
