We're building an open project to predict new uses for existing drugs. The core of our project is a network with multiple types of nodes and relationships (hetnet) . We use neo4j to store, explore, and quantify this hetnet.
With regards to licensing, there are several ways we plan to use the software:
distributing the hetnet — we strive to make our hetnet reusable and extensible. Therefore, we need convenient and reliable formats for distributing the network. One format we'd like to use for distribution is an archive file of the database. For example, a tarball of data/graph.db/. In addition to an archive of just the database location, we would like to distribute an archive of the binary with the database included. So for example, a user could extract a single archive containing the neo4j server with our database and configuration already loaded.
quantifying the hetnet — our method for relationship prediction relies on extracting network features, which quantify the prevalence of specific path types between two nodes. We implemented the algorithm in cypher. We read that the enterprise version's high-performance cache "can provide up to 10x the performance under concurrent load." However, for our application the enterprise edition did not provide a performance improvement over the community edition (notebook). Therefore, we are inclined to stick to the community version, but may consider enterprise to scale up our query throughput via the high-availability cluster.
exploring the hetnet — we have been using the neo4j browser as a GUI to interact with the network. Eventually, we would like to host a publicly-accessible neo4j server which allows anyone to interact with our hetnet from their web browser.
We try to adhere to the following project ground-rules:
releasing all of our original data and source code as CC0.
avoiding dependencies that forbid distribution, to prevent situations where others cannot reproduce our science due to access issues.
Simply, if you are open source, then Neo4j is open source; if you are closed source, then Neo4j is commercial.
Our project is open source, although we use CC0 rather than a copyleft license. We do not modify neo4j's source code, although we do want to distribute the compiled version (see 1 above). Finally, it's unclear exactly which aspects of our project are affected by neo4j's license.
Given these complexities, we will reach out to neo4j for guidance. Specifically, we're interested in what restrictions neo4j's license places on our three use cases. And if there are restriction, could we obtain an educational or research license permitting our use?
Licensing and usage options
Trey Knowles, the Startup Community Manager at Neo Technology, assisted us with our licensing questions.
To clarify the situation, we specified four licensing options:
A. Community edition with the default GPLv3 license B. Community edition with a contractual education license C. Enterprise edition with the default AGPLv3 license D. Enterprise edition with a contractual education license
And then we specified four desired uses of Neo4j for our project:
distribute the neo4j binaries with our network preloaded
run internal database queries
make a publicly-accessible neo4j server instance
release all of our code and data as CC0
Trey provided the following answer, reproduced here with permission, on which of our desired uses are allowed by each license:
A. Community edition with the default GPLv3 license
2 (single server, no clustering / live backups)
3 (single server, no clustering / live backups)
B. Community edition with a contractual education license
Neo Technology offers no contracts with respect to CE
C. Enterprise edition with the default AGPLv3 license
1 provided you license your work as AGPLv3
2 provided you license your work as AGPLv3
3 provided you license your work as AGPLv3
4 provided you license your work as AGPLv3
D. Enterprise edition with a contractual education license
4(pending legal review from Neo team)
Since AGPLv3 would place undesirable restrictions on our work compared to CC0, C is not a good option for us. Therefore, we'll choose between A and D. We will default to A (GPLv3-licensed community edition) unless enterprise features are needed in which case we'll explore D (contractually-licensed enterprise edition). Sticking with the community edition whenever possible will lessen the burden on anyone wanting to replicate or reuse or work.