Project:
Rephetio: Repurposing drugs on a hetnet [rephetio]

Hosting Hetionet in the cloud: creating a public Neo4j instance


One month ago, I asked a Stack Overflow question on how to host a public (read-only) Neo4j instance in the cloud. Now, our Neo4j instance is up and running, serving Hetionet over the World Wide Web at neo4j.het.io. Below is a screenshot of what the Neo4j Browser looks like:

[Screenshot: neo4j-browser]

The public browser lets anyone immediately interact with Hetionet. Since most potential users may not know the Cypher query language, we created a guide to help bring users up to speed. In addition to accessing Hetionet through the Neo4j Browser, users can programmatically query the Neo4j server using one of the many supported languages.
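As a rough illustration of programmatic access, here is a minimal Python sketch that uses only the standard library. It assumes the server exposes Neo4j's transactional HTTP endpoint at `/db/data/transaction/commit` (the path used by Neo4j 3.0) and accepts unauthenticated requests. The function names `build_request` and `run_query` are made up for this example and are not part of any Hetionet tooling.

```python
import json
import urllib.request

# Assumption: the transactional HTTP endpoint lives at this path (as in Neo4j 3.0).
# Adjust the URL if the deployment differs.
ENDPOINT = "https://neo4j.het.io/db/data/transaction/commit"


def build_request(cypher, parameters=None, endpoint=ENDPOINT):
    """Wrap a Cypher statement in the JSON body the endpoint expects."""
    payload = {
        "statements": [
            {"statement": cypher, "parameters": parameters or {}},
        ]
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_query(cypher, parameters=None, endpoint=ENDPOINT):
    """Send the query and return each result row as a list of values."""
    request = build_request(cypher, parameters, endpoint)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # The endpoint reports query failures in an "errors" list rather than via HTTP status.
    if body["errors"]:
        raise RuntimeError(body["errors"])
    return [row["row"] for result in body["results"] for row in result["data"]]
```

Calling `run_query("MATCH (n) RETURN labels(n), count(*) LIMIT 5")` would then return up to five rows of labels and node counts, provided the server is reachable and the endpoint assumption holds. The official Neo4j drivers, which speak Bolt, are an alternative to raw HTTP.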

Technical details

Creating a public Hetionet instance using Neo4j was gratifying because it reinforced the benefits of migrating our hetnet infrastructure to Neo4j [1]. We're promptly leveraging advanced Neo4j capabilities that would have been impractical to develop in-house. In fact, we really pushed Neo4j to its limits and submitted several help/feature requests and bug reports. Many of the features we're using just became available in Neo4j 3.0, released on April 26, 2016. Currently, our server runs Neo4j 3.0.2.

Many of the issues we faced had to do with Neo4j Browser Guides. We use guides to provide documentation, examples, and explanations from within the Neo4j Browser. For example, in the screenshot above, the "Hetionet in Neo4j" frame is a guide. We wrote our guides in AsciiDoc and converted them to HTML using jexp/neo4j-guides.

Hosting the HTML guides in a way that the Neo4j Browser could access was difficult. We ran into CORS issues but were eventually able to use the guide-extension to serve files from within the Neo4j instance (see issues a, b, c) and circumvent the CORS problems.

We also made several tweaks to make our Neo4j server appropriate for public access. We disabled authentication, set a query execution timeout, created a GRASS style, and enabled web analytics. As far as I'm aware, there are not many other groups hosting public Neo4j instances. I did draw inspiration from ryguyrg/panama-neo4j, which makes it easy for users to create a Neo4j server containing the Panama Papers Hetnet. However, users still have to host the server locally. Although public servers are uncommon, there shouldn't be any licensing issues with hosting one [2].

Michael Hunger, Oskar Hane, Christophe Willemsen, and stdob were super helpful — thanks! During the process we suggested several enhancements including multiple post-connect commands, setting AUTO-COMPLETE to OFF by default, play-topic support for URLs with capitalized characters, and relative path support for playing guides.

Cloud hosting

We used DigitalOcean for our cloud hosting. They have an Ubuntu 14.04 Droplet with Docker 1.11.1 preconfigured, which reduced setup to a minimum. We went with a 2 GB Memory / 40 GB Disk Droplet that costs $20 a month and is located in a datacenter near New York City. DigitalOcean compared favorably to AWS in terms of price–performance ratio and user interface.

HTTPS

I added HTTPS/TLS support using Let's Encrypt, which @alizee previously recommended [3]. The certbot Python package made obtaining the certificates easier than I expected (code). The one potential downside to Let's Encrypt is that I had to schedule a cron job to shut down the Neo4j server and attempt certificate renewal on a weekly basis. This will result in short service outages. One advantage of enabling encryption is that users do not have to accept a self-signed certificate to use Bolt from within the Neo4j Browser. Bolt is the new binary protocol that enables more efficient communication with the Neo4j server.
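The weekly renewal job described above could be sketched roughly as follows. This is not the linked code. It assumes the server runs in a Docker container that can be stopped and started by name, and that `certbot renew` is available on the host. The container name `hetionet-container` and the function names are hypothetical.

```python
import subprocess

# Hypothetical container name; the post does not state one.
CONTAINER = "hetionet-container"


def renewal_plan(container=CONTAINER):
    """Return the commands in order: stop the server, renew, start the server."""
    return [
        ["docker", "stop", container],
        ["certbot", "renew"],
        ["docker", "start", container],
    ]


def renew(container=CONTAINER, run=subprocess.run):
    """Run the plan, restarting the server even if renewal fails."""
    stop, renew_cmd, start = renewal_plan(container)
    run(stop, check=True)
    try:
        run(renew_cmd, check=True)
    finally:
        # Always bring the server back so a failed renewal does not prolong the outage.
        run(start, check=True)
```

A weekly cron entry could invoke a script that calls `renew()`. The `try`/`finally` keeps the outage short even when the renewal attempt itself errors out.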

Docker

We created a Docker image (dhimmel/hetionet) for running the Hetionet Neo4j server. Our Docker image extends the official Neo4j Docker image and automatically downloads and configures the Hetionet database and guides. In addition to making deployment automatable and reliable, Docker promises to improve the reproducibility of computation [4, 5, 6, 7].

Suggestions

If you have suggestions on how to improve our Neo4j Browser or questions on how to use it, please don't be shy.

Lysenko et al. 2016

I just came across a study titled "Representing and querying disease networks using graph databases" [1] that was published on July 25, 2016. The study takes a similar approach to Hetionet: a hetnet was created from publicly available data to encode disease biology.

The code to produce the database is on GitHub at ibalaur/ProteinFramework. Additionally, there is a corresponding public Neo4j instance at https://diseaseknowledgebase.etriks.org/protein/browser/.

Since the paper was submitted on December 16, 2015, the availability of their public Neo4j instance likely predates https://neo4j.het.io/browser/. It's reassuring that it's still running, although the instance does not appear to be read-only, which means that anyone can modify it. For example, if a new user tries out the built-in movie graph guide, they will end up creating nodes and relationships.

Update on 2017-07-21: I just came across another study by the same group that created a metabolomics hetnet and Neo4j browser [2].

DigitalOcean Sponsorship

DigitalOcean has sponsored the Hetionet Browser for at least the next year, as part of their program to support open source (see Tweet). As a reminder, our hosting costs are $20 a month for https://neo4j.het.io. Thanks DigitalOcean!

From this and other projects, I've found the user interfaces and prices at DigitalOcean and Google Cloud are favorable compared to Amazon Web Services.

 
Cite this as
Daniel Himmelstein (2016) Hosting Hetionet in the cloud: creating a public Neo4j instance. Thinklab. doi:10.15363/thinklab.d216
License

Creative Commons License
