Project:
Rephetio: Repurposing drugs on a hetnet [rephetio]

Sounding the alarm on DrugBank’s new license and terms of use


Update on 2016-06-21: The OMx team in charge of DrugBank's licensing was highly responsive and thoroughly addressed the issues I raised. See @cknoxrun's description below of the revised DrugBank licensing.


DrugBank is a database of drugs which includes information on drug biochemistry and pharmacology [1, 2, 3, 4]. DrugBank is a publicly-funded resource that is frequently used for biomedical research. Google Scholar returns 8,940 studies which mention "DrugBank" during its first decade of existence (2006–2015).

When we began using DrugBank for Project Rephetio and Hetionet in 2015, DrugBank had a brief legal statement:

DrugBank is offered to the public as a freely available resource. Use and re-distribution of the data, in whole or in part, for commercial purposes requires explicit permission of the authors and explicit acknowledgment of the source material (DrugBank) and the original publication.

Therefore, in accordance with our data-licensing compliance strategy, we applied a CC BY-NC license to DrugBank data. The non-commercial stipulation was regrettable because it discriminates against a class of user and precludes the data from being open knowledge. Additionally, what qualifies as commercial reuse is unclear and potentially broad, making NC stipulations especially toxic [5].

Nonetheless, many publicly-funded resources will specify "academic use only" by habit, without appreciating the full ramifications of their discrimination. Hence, I assumed that DrugBank was effectively an open resource with a poorly-devised license statement.

The release of DrugBank 4.5 on April 20, 2016 dispelled my naiveté. DrugBank has been hijacked by the commercial interests of the University of Alberta and OMx. OMx is a company started in 2012 at the University of Alberta, which currently lists DrugBank as its only product. While the University of Alberta asserts ownership of DrugBank, OMx is responsible for commercial DrugBank licensing.

Downloading DrugBank now requires registration. To register, users must provide their name, contact information, university/institutional affiliation, and intended use of DrugBank. Additionally, registration is contingent upon entering into the following agreement:

I accept DrugBank's Privacy Policy, Terms of Use, and the above End User License Agreement (required)

By checking the above box, you are confirming that your use of DrugBank shall not be for commercial purposes. If you wish to use DrugBank for commercial purposes, contact info@omx.io to inquire about a separate agreement with OMx.

My next post will examine how the new Privacy Policy, Terms of Use, and License Agreement inhibit reuse of DrugBank. In essence, DrugBank has chosen a path that prioritizes licensing revenue over public reuse. Its heavy-handed legal infrastructure presents a grave barrier to reuse. The academic community must now consider parting ways with what has been a foundational resource.

Decoding the legal texts

Researchers looking to use DrugBank now have quite a bit of reading to do. As described in the 1,424-word Privacy Policy:

You are prohibited from using the Platform unless you fully understand and agree to the Terms of Use and the [Privacy] Policy.

Terms of Use

Let's start with the 3,529-word Terms of Use. Remember that these terms of use claim to govern any use of the Platform, which encompasses the drugbank.ca website. First make sure you understand the terms before using DrugBank:

If you do not agree to or understand all the provisions, terms and conditions set out below, then you may not access the Platform or use any of our services.

Second, it appears that using information from DrugBank in a publication is prohibited:

Users are not permitted to take any information from the site and share it with others ... unless they have obtained OMx’s consent.

In fact, you're not allowed to even copy information:

Users are prohibited from copying, reproducing, downloading, sharing, or storing the Platform, the data contained within DrugBank or any components thereof, to any device or server unless previously authorized to do so by OMx.

Third, it appears that showing a DrugBank page in your presentation is prohibited:

Users are prohibited from providing any display or demonstration of the Platform to others, either publicly or in private, for any purpose whatsoever without the prior express written consent of OMx.

Fourth, users under 18 are denied access:

We do not permit any User under the age of 18 to use our Service or Platform.

Finally, any past permissions you've received are no longer any good:

The Agreement, in combination with all policies and guidelines related thereto (including but not limited to the Privacy Policy), incorporated by reference, constitute the entire agreement between you and OMx and supersede all prior communications, agreements and understandings, written or oral, with respect to the subject matter of the Agreement.

License Agreement

To register with DrugBank, which is necessary for downloading the database, users must agree to an 877-word Non-Commercial End User License Agreement (EULA). Specifically, the agreement states:

Your use of DrugBank Database is governed by a legal agreement between you and OMx consisting of this Non-Commercial End User License Agreement, the Terms of Use (the "Terms") and the Privacy Policy, which you must accept by checking the box indicating your acceptance of this License, the Terms and the Privacy Policy.

Next, the EULA states:

1.2 Access to DrugBank Database is subject to this standard Non-Commercial End User License Agreement, which is Creative Common's Attribution-NonCommercial-ShareAlike 4.0 International License. You must accept the license as an integrated part of this Agreement by checking the box indicating your acceptance.

I am confused because Section 1.2 claims that the EULA is a CC BY-NC-SA 4.0 license. However, the EULA also requires agreeing to the Terms of Use and other conditions that impose additional restrictions on top of CC BY-NC-SA 4.0. Is this a modified Creative Commons license? Or does CC BY-NC-SA 4.0 become the only relevant license and terms once the EULA is accepted? @katiefortney, what's your take here?

The EULA reiterates the age discrimination:

In order to use DrugBank Database you must be 18 years of age or older.

Additionally, it's not clear that you can access the DrugBank Database through other channels even though it's supposedly CC BY-NC-SA 4.0:

You agree not to access (or attempt to access) DrugBank Database by any means other than through the interface that is provided by the Platform (as defined in the Terms and Privacy Policy), unless you have been specifically allowed to do so in a separate agreement with OMx.

Posting a disclaimer is required (which is a condition of the CC license):

You warrant that you shall publish the following disclaimer to third parties

Additionally, users are bound by whatever the current License is at the time of use rather than just the one they agreed to:

You understand and agree that if you use DrugBank Database after the date on which this License, the Terms or the Privacy Policy have changed, we will treat your use as acceptance of the updated License, Terms or Privacy Policy

Finally, the EULA includes the CC definition of commercial use:

Commercial use is one primarily intended for commercial advantage or monetary compensation.

Conclusion

The Terms of Use and License Agreement are deeply troubling. Imposing non-commercial share-alike stipulations is already severely limiting and disqualifies reuse in an open resource. However, the Terms of Use and License exceed CC BY-NC-SA. For example, they add age discrimination and the prohibition on displaying the Platform. Finally, the burden to "fully understand" 8,000+ words of legalese is placed on the academic user, before they are supposed to even access the Platform.

Funding

Ultimately science funders have the most leverage regarding data licensing issues. Public funding agencies and philanthropists are generally interested in making sure the research they fund is shared openly. DrugBank appears to have received both public and private funding. DrugBank has pursued commercial partnerships since at least 2008.

Specific grants

The database has received at least $1,069,848 CAD from the Canadian Institutes of Health Research (CIHR) from the following two grants:

Please leave a note if you find specifics on other DrugBank grants.

Funding Statements

As of May 8, 2016, the DrugBank homepage includes the following funding statement:

This project is supported by the Canadian Institutes of Health Research (award #111062), Alberta Innovates - Health Solutions, and by The Metabolomics Innovation Centre (TMIC), a nationally-funded research and core facility that supports a wide range of cutting-edge metabolomic studies. TMIC is funded by Genome Alberta, Genome British Columbia, and Genome Canada, a not-for-profit organization that is leading Canada's national genomics strategy with $900 million in funding from the federal government. Maintenance, support, and commercial licensing is provided by OMx Personal Health Analytics, Inc.

The DrugBank publications include the following funding statements:

From 2006 [1]:

The authors wish to thank Genome Prairie, a division of Genome Canada for financial support. Funding to pay the Open Access publication charges for this article was provided by Genome Canada.

From 2007 [2]:

The authors wish to thank the Canadian Institutes for Health Research (CIHR), as well as Genome Alberta and Genome Canada for financial support. We are also indebted to the many users of DrugBank who have provided valuable feedback and suggestions. Funding to pay the Open Access publication charges was provided by Genome Alberta.

From 2010 [3]:

Canadian Institutes of Health Research (CIHR); Genome Alberta; Genome Canada; GenomeQuest Inc. Funding for open access charge: CIHR.

From 2013 [4]:

The authors wish to thank Genome Alberta (a division of Genome Canada), The Canadian Institutes of Health Research (CIHR), and Alberta Innovates Health Solutions (AIHS) for financial support. Funding for open access charge: CIHR.

NAR Database Issue Compliance

The last three DrugBank papers have been published in Nucleic Acids Research Database Issue [1, 2, 3]. The NAR guidelines for the Database Issue state that:

Databases must be freely available to all via the web without the need to register or login. If any part of the database (e.g. the one that deals with the user-submitted data) needs to be password-protected, only the freely available part will be considered by the reviewers. Authors are encouraged, but not required, to make the contents of their databases freely available as flat or relational files upon request.

Additionally,

All databases published in the NAR Database Issue are expected to be maintained under the same URL for at least 5 years after the publication data. Graduation or retirement of the database developers is not a valid reason for termination of the database.

I interpret the five-year window, counted from the 2013 DrugBank 4.0 paper, to mean that DrugBank "must be freely available to all via the web without the need to register or login" through 2018. The age restrictions and registration requirements would therefore violate NAR's policy.

Since reviewers are directed to consider "only the freely available part", any future submission of DrugBank to the NAR Database Issue should be treated as lacking a bulk database download. Much of the value contributed by DrugBank depends on high-throughput applications that use the bulk download. Hence, it would be interesting to see how reviewers respond to this feature being effectively removed.

  • Lars Juhl Jensen: As far as I know, the "freely available to all" is not about data download. By having their current web interface, which allows anyone to freely search DrugBank, they are in compliance with the NAR Database Issue rules. You are right that the bulk download option should be disregarded by the next reviewers.

  • Daniel Himmelstein: @larsjuhljensen, assuming users abide by the Terms of Use, the current web interface now forbids anyone under 18 from freely searching DrugBank.

  • Daniel Himmelstein: The 2013 paper mentions a Data Extractor [1]:

    The new Data Extractor appears in all of DrugBank’s standard browse and search windows, allowing users to easily extract, download and save their current data view at any point.

    The current about page on the website says:

    The Data Extractor is the most sophisticated search tool for DrugBank. Users may download selected text components and sequence data from DrugBank and track the latest DrugBank statistics by clicking on the Download button.

    I don't really understand what features the Data Extractor is referring to. However, if the Data Extractor includes the download functionality, I interpret the NAR policies to preclude password-protecting the version 4.0 download until 2018.

    1. DrugBank 4.0: shedding new light on drug metabolism. V. Law, C. Knox, Y. Djoumbou, T. Jewison, A. C. Guo, Y. Liu, A. Maciejewski, D. Arndt, M. Wilson, V. Neveu, A. Tang, G. Gabriel, C. Ly, S. Adamjee, Z. T. Dame, B. Han, Y. Zhou, D. S. Wishart (2013) Nucleic Acids Research. doi:10.1093/nar/gkt1068
  • Lars Juhl Jensen: You may be right or you may be wrong. You're only asked to accept the terms of use if you sign up for an account to download the files. It is thus unclear to me if the terms of use apply to the download files only or also to the website.

I would say it sounds like they don't understand CC licenses. The statement "Access to DrugBank Database is subject to this standard Non-Commercial End User License Agreement, which is Creative Common's Attribution-NonCommercial-ShareAlike 4.0 International License" is false. It's bad drafting. In a hypothetical universe where someone relied upon the CC BY-NC-SA terms because OMx said their use was subject to the CC license, and then OMx sued them for sharing the database, in such a way that complied with the terms of the CC license but violated OMx's other terms... I don't know. It'd be messy. I think it's most likely the bad drafting is construed against the drafter, but it's possible a court could find that OMx's restrictions were valid since they're more specific. And spread among multiple documents.

  • Lars Juhl Jensen: I completely agree with you that it seems like the people who drafted this license don't understand CC licenses. The license is either CC-BY-NC-SA 4.0 or it is not. The moment you change it in any way, such as adding additional usage restrictions, it is no longer a CC license.

Daniel,

Thanks for your detailed post describing your criticisms of the new terms/privacy policy and license for DrugBank. I wanted to reply and clear a couple things up here, and explain some of our thinking:

1) You have some good points and I believe we have made some mistakes in our current documents. I apologize and hope that we can work to improve this to make our users happier. It would be great to have a discussion about ways to improve.

2) Half of the founding members of OMx have been working on DrugBank since its inception in 2005. One of the things that has attracted and kept us working on DrugBank over the years is the passionate research community that has used the data to produce interesting and insightful work.

3) The funding situation is not good. Over the past decade we have received significant funding from various levels of the Canadian government. This has allowed us to hire several annotators and programmers, and we have managed to produce several highly cited papers. However, over the past year we have seen several of our colleagues lose their positions (as well as our own) as the funding has disappeared. It seems no funding agency is interested in maintaining highly used databases, regardless of how useful they are.

4) Our goal with DrugBank is to continue to improve and expand the database. The DrugBank database will always be free for non-commercial use, free to use in research, and by the public. At the same time we intend on funding further development by providing companies with affordable licensing, including small startups and enterprise customers. We are also working on some new things that we plan on releasing with fully open licensing.

5) We are trying to give as much away for free as we can. Unlike some of the other databases mentioned on twitter and here (KEGG, etc.), we are not charging non-commercial users. Non-commercial can include corporations, depending on use. I think we need to do more work to explain what entails commercial use.

6) As you pointed out, in the past there was a brief description that included: “Use and re-distribution of the data, in whole or in part, for commercial purposes requires explicit permission of the authors”, some commercial users requested permission, some were granted permission and some were required to get a commercial license, but the majority did not ask for permission. This is one of the things that pushed us to make a signup page for the downloads. Previously it was vague and inconsistently applied, and some groups did not use DrugBank because the licensing was unclear.

7) We attempted to make our license simple, fair, and easy for non-commercial researchers to access. I think we have more work to do here. We also agree that we need to address the requirement that our users are 18 or older; this is far too restrictive.

We are new to this, and we are learning as we go, so it is good to get your feedback.
You have brought up some valid criticism that needs to be addressed. We propose that instead of “sounding the alarm”, we start the discussion on how to solve the issues you have brought up. We will be looking more closely at our terms of use in the next week to see how we can move forward in a way that allows us to secure a future for DrugBank as well as alleviate any fears people in the research community may have. We will be setting up a blog as a medium for this discussion, and look forward to working with you and other members of the research and data community.

Thanks,
Mike and Craig
OMx/DrugBank

Thanks @cknoxrun for your helpful comment. I'm happy that you are open to discussion regarding these issues and I'd love to contribute. I really appreciate your quick response and willingness to address the issues we've raised.

Sorry if I made too many assumptions based on our past experiences. Specifically, I assumed DrugBank's licensing was a done deal and in the hands of an intellectual property department that didn't prioritize data reuse. My experience with MIT's technology licensing (primarily regarding MSigDB and less so OmicsIntegrator) led me to "sound the alarm" and reference "the hijacking". I think the close ties between DrugBank and OMx will help resolve any disconnect between the legal and community considerations DrugBank faces.

I understand the difficult realities your team faces when seeking biocuration funding, a point that's been echoed elsewhere. DrugBank is a fantastic resource. Were I to have to choose between no DrugBank and a highly-restrictive DrugBank, I'd choose the latter. Additionally, your team has done a great job of responding to demand and providing a clean user experience. Given the hard compromises imposed by the funding environment, I completely understand that releasing DrugBank openly may not be the healthiest long-term option for DrugBank's continued upkeep.

That being said I do have a few suggestions, which I'll post next. Once the blog is up and running, we can copy any relevant discussion over.

How to improve the reusability of DrugBank

Here, I'll provide my thoughts on steps DrugBank can take to foster its use in academic research.

A standard license

From a research perspective, having a license that allows for use, redistribution, and modification is of utmost importance. The CC BY-NC-SA 4.0 License mentioned in the EULA would allow for these uses. As pointed out above, some changes will be needed before DrugBank can be clearly considered as having a Creative Commons license.

Redistribution rights are especially important for reproducible science (as some weird situations have shown) [1]. One aspect of redistribution is that users could access DrugBank data from a third party and avoid having to sign up.

I also think it's important to use a standard license, such as a Creative Commons license, rather than a custom one. Especially since most researchers have minimal legal expertise, a standard license helps reduce the burden academic users must endure to use the resource. Given OMx's commercial licensing goals, a CC BY-NC license for the public seems to make sense. I'm not sure what OMx gains by adding the share alike (SA) restriction.

An open vocabulary

A standard non-commercial license like CC BY-NC would enable many academic research uses. For resources such as Hetionet (the network we're creating in Project Rephetio) that aim to integrate knowledge into an open database, the non-commercial stipulation is a disqualifying factor. Here there's no easy solution: one of our main long term goals is to integrate only openly licensed content, which would preclude DrugBank.

In the short term, I have a more pressing concern. Since we use DrugBank to identify compounds in our networks, many of our datasets rely on DrugBank identifiers and drug names. For example, our catalog of medical indications named PharmacotherapyDB [2] and our forthcoming drug repurposing predictions both identify compounds with their DrugBank IDs. We're committed to releasing the results of our research under the CC0 Public Domain Dedication. As it turns out, our results will often include DrugBank identifiers. While I believe that using a portion of the DrugBank vocabulary is likely fair use, reducing uncertainty in this arena would be helpful.

Hence, I think it would make sense to release the DrugBank vocabulary under an open license (preferably CC0). By vocabulary, I'm referring to the identifying information for each compound such as ID, name, and chemical structure. This would allow researchers to use DrugBank-coded compounds in their integrative research without having to worry about any legality issues.
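To make the idea concrete, here is a minimal sketch of how a downstream dataset might depend only on such a vocabulary. The CSV layout, the column names `drugbank_id` and `name`, and the helper functions are all assumptions made for illustration; they do not describe an actual DrugBank file or API. The sketch checks which compound identifiers in a set of results resolve against the vocabulary, so that the released results would rely on nothing beyond IDs and names.

```python
import csv
import io
import re

# Hypothetical vocabulary excerpt. The column names and layout are
# assumptions for illustration, not an actual DrugBank export format.
VOCABULARY_CSV = """drugbank_id,name
DB00001,Lepirudin
DB00316,Acetaminophen
DB00945,Acetylsalicylic acid
"""

# Identifier shape assumed in this sketch: "DB" followed by five digits.
DRUGBANK_ID_PATTERN = re.compile(r"^DB\d{5}$")


def load_vocabulary(text):
    """Parse the vocabulary CSV text into a dict of identifier -> name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["drugbank_id"]: row["name"] for row in reader}


def resolve_compounds(compound_ids, vocabulary):
    """Split identifiers into those found in the vocabulary and the rest.

    Returns a (resolved, unresolved) pair: a dict of identifier -> name
    and a list of identifiers that are malformed or absent.
    """
    resolved = {}
    unresolved = []
    for compound_id in compound_ids:
        well_formed = DRUGBANK_ID_PATTERN.match(compound_id) is not None
        if well_formed and compound_id in vocabulary:
            resolved[compound_id] = vocabulary[compound_id]
        else:
            unresolved.append(compound_id)
    return resolved, unresolved


vocabulary = load_vocabulary(VOCABULARY_CSV)
resolved, unresolved = resolve_compounds(
    ["DB00316", "DB99999", "aspirin"], vocabulary
)
print(resolved)
print(unresolved)
```

With a vocabulary released under CC0, a check like this could ship alongside CC0 results without pulling in any non-commercially licensed content.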

One final point, I don't want users to feel uncomfortable using our predictions because they include DrugBank identifiers. I hope many of our users will be commercial, and I don't want to scare them away.

Website Terms of Use

Finally, I think the website Terms of Use should be made less draconian. Many of the forbidden uses seem legitimate and even desirable from a business perspective. The current Terms of Use make the website an intimidating platform to use.

  • Craig Knox: Hi Daniel, some excellent ideas here. I think the idea of an open vocabulary makes sense, and we are looking at other data sets as well that would make sense to have under an open license. We are working on improving the website terms of use as well, however keep in mind we are bound by local Alberta privacy laws (this is the origin of the requirement that users are over 18; I believe we'll be able to relax this one with some additional terms regarding consent).

    We really appreciate the time and effort you are putting into this. Thanks for that. Just a heads up that it will take us a few more days to get revised versions of our documents up for review.

I'm assuming Mike and Craig at least contemplated the option of an open/Pro version split. So what precludes this? Wouldn't it make things sooooo much simpler? Cheers

Hi Everyone,

We've done a bunch of work on improving DrugBank's terms of use, license, and privacy policy. It's taken us a little longer to release than we had hoped, but has now been released as version 5.0.0 (http://www.drugbank.ca/releases/latest).

If you click on Downloads the first thing you will notice is that you aren't hit with a login screen. Here are some of the additional steps we've taken:

  1. The licensing is now clearer: the downloads that require an account to access are all released under a straightforward Creative Commons Attribution-NonCommercial 4.0 International License. The blurb you now agree to when signing up makes this clear (see http://drugbank.ca/public_users/sign_up). The blurb is now just there to provide a bit of extra context and disclaimers to warn against using DrugBank data as medical advice, as well as providing a better definition of what 'Non-commercial' means.

  2. We have created a new 'Open Data' tab. The data here is released under a Creative Common’s CC0 International License, which is public domain. We have 2 datasets here and are working on more. 1) The DrugBank vocabulary which includes DrugBank IDs, names, UNIIs, CAS, InChI Key; 2) all DrugBank structures in SDF format. You do not have to create an account to download this data. We plan on adding more open datasets soon and on a regular basis.

  3. Our terms of use have been updated to make it clear that the data on the DrugBank website can be freely copied and shared (in publications, presentations, documents, etc.)

  4. Additionally the requirement for users being 18 or older has been changed to permit younger users (with parental consent). However, users are still required to be 13 or older.

  5. We have tried to improve our definitions for 'Commercial Use' - we are expanding our FAQ to include more questions people might have about this. We will probably set up a dedicated page with different use cases very soon.

I hope this addresses everyone's concerns. We have a few more datasets that we will be putting in the Open Data section very soon, and hope to continue to push new datasets there regularly. Thanks for your patience and input (especially Daniel and Egon!). If you have any more questions or issues please don't hesitate to comment on here or email me directly at craig@omx.io.

Thanks,
Craig, Mike, and the rest of the OMx/DrugBank team

Closing remarks

Thanks Craig, Mike, and the rest of the OMx/DrugBank team. I really appreciate your commendable communication, dedication to your userbase, and expediency addressing these issues. You've set a high bar for other resources to strive towards. While the overall process took 44 days, the changes were substantial, and it sounds like University approval took a large chunk of the time. In other words, when dealing with major licensing changes, I expect very few prominent resources will deliver in under 44 days.

I am particularly excited about the CC0 subset that includes basic drug information. From an adoption and marketing standpoint, I think this makes sense. For someone who wants to use DrugBank identifiers and not worry about licensing, the change is huge. I'm hopeful that DrugBank taking this major step will motivate other biodata resources (even CC BY ones) to release essential components under CC0.

Also I'm glad you removed the share-alike stipulation. Our experience suggests that license compatibility is a major obstacle to data integration [1]. Hence, applying share alike should be subject to strict scrutiny.

The new licensing arrangement does a good job of balancing free public usage and commercial revenue generation. While for some applications, we may avoid using data with a non-commercial stipulation, it's nice to have the option of NC usage.

 
Status: Completed
Cite this as
Daniel Himmelstein, Katie Fortney, Craig Knox, Christopher Southan (2016) Sounding the alarm on DrugBank’s new license and terms of use. Thinklab. doi:10.15363/thinklab.d213
License

Creative Commons License
