Edge: a framework for developing collective understanding

Funding opportunity: The Open Science Prize
Awaiting Funder

Idea

We propose to create a mobile application for the Life Sciences, targeting non-technical users, that will provide candidate explanations for relationships between pairs of concepts based on information harvested dynamically from the Web.

Active researchers, curious citizens, and desperate patients have access to a vast collection of associations, both meaningful and imaginary: Zika and microcephaly, NGLY1 deficiency and alacrima, mianserin and life extension, eggs and cholesterol, and so on. For scientists, new candidate associations now routinely emerge from high-throughput screens. For the general public, media-friendly, often vague, and sometimes completely unfounded associations are a daily experience. For patients and their family members, the associations are personal and directly observed: taking a drug and having a reaction, having a rare genetic mutation and a whole constellation of associated symptoms, eating bread and having an upset stomach, and so on. The question that unites all of these associations is “why?”

Currently, people primarily turn to Google searches for answers to this question, yet lists of documents where terms co-occur are a far cry from a direct, well-supported explanation of the relationships that hold or do not hold between two concepts. This is one reason that all of the major search engines are building “knowledge graphs” in their attempts to improve their services (https://www.google.com/intl/bn/insidesearch/features/search/knowledge.html). Knowledge graphs are structured networks of concepts linked together with semantic relationships. These networks, whose roots lie in decades of artificial intelligence research, are the building blocks of true question answering services.

Knowledge graphs, generated by human knowledge engineers, machine reading of the literature, and all manner of increasingly accessible databases, can help provide evidence supporting, refuting, or explaining potential relationships. Yet, to date, there is a substantial divide between this potential and its realization, especially for people who are not data scientists. This divide results from the technical challenges of gathering, integrating, and querying distributed information, and from the difficulty of delivering effective user interfaces.

Today, the maturing work of the Linked Data community is making it possible to dynamically gather the contents of knowledge graphs. SPARQL endpoints, such as those provided by the EBI (https://www.ebi.ac.uk/rdf/), and an increasing number of Web APIs, such as OpenPHACTS (https://dev.openphacts.org/docs/2.0), http://mygene.info, and http://query.wikidata.org, implement the key Linked Data principles of unique concept identifiers and standard data formats such as JSON-LD. With access to these services, the task of assembling knowledge graphs on demand is approachable, yet this is still not enough. In the context of explanation generation, the formalization of knowledge as a graph of concept nodes generates two important requirements:

  • Given an edge connecting two nodes, provide evidence for or against it.
  • When nodes are not directly connected (or if more evidence is desired for a particular edge), use intervening nodes to provide candidate explanations for associations observed or hypothesized to exist between those nodes.
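
As an illustration of the two requirements above, the following is a minimal sketch in Python, not the proposed implementation. It assumes a small in-memory graph whose edges carry a list of evidence items, and it uses plain breadth-first search to find intervening nodes. The class and method names (KnowledgeGraph, evidence_for, shortest_path) and all concept names and sources are hypothetical placeholders.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    target: str
    relation: str
    # Each evidence item is a (stance, source) pair, e.g. ("supports", "...").
    evidence: list = field(default_factory=list)


class KnowledgeGraph:
    """Toy undirected graph; a real system would gather edges from the Web."""

    def __init__(self):
        self._edges = {}      # frozenset({a, b}) -> Edge
        self._neighbors = {}  # node -> set of adjacent nodes

    def add_edge(self, source, target, relation, evidence=()):
        edge = Edge(source, target, relation, list(evidence))
        self._edges[frozenset((source, target))] = edge
        self._neighbors.setdefault(source, set()).add(target)
        self._neighbors.setdefault(target, set()).add(source)
        return edge

    def evidence_for(self, a, b):
        """Requirement 1: evidence for or against a direct edge, if any."""
        edge = self._edges.get(frozenset((a, b)))
        return edge.evidence if edge else []

    def shortest_path(self, start, goal):
        """Requirement 2: intervening nodes via breadth-first search."""
        if start == goal:
            return [start]
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in sorted(self._neighbors.get(node, ())):
                if nxt in previous:
                    continue
                previous[nxt] = node
                if nxt == goal:
                    # Walk back from the goal to reconstruct the path.
                    path = [goal]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(nxt)
        return None  # no connection found


# Placeholder concepts and sources, invented purely for illustration.
graph = KnowledgeGraph()
graph.add_edge("drug_X", "protein_Y", "targets",
               [("supports", "placeholder-source-1")])
graph.add_edge("protein_Y", "disease_Z", "associated_with",
               [("supports", "placeholder-source-2"),
                ("refutes", "placeholder-source-3")])

print(graph.evidence_for("protein_Y", "disease_Z"))
print(graph.shortest_path("drug_X", "disease_Z"))
```

Here drug_X and disease_Z share no direct edge, so evidence_for returns an empty list for that pair, while shortest_path surfaces protein_Y as a candidate intervening node.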

We propose to address the first requirement using the nanopublication model [http://nanopub.org/]. Data providing evidence for or against an edge will be gathered from nanopublication stores or dynamically structured according to this data model, facilitating dynamic integration and display. For the second requirement, we propose to enhance current graph algorithms (e.g. shortest path) using the paradigm of semantic storytelling. Experts will help design 'stories' that form the structure of meaningful explanations for associations between concepts. These stories will be captured as automated workflows, each with two concepts as inputs (e.g. a disease and a drug) and candidate, human-readable explanations as output. We will apply these technical innovations in the context of an application designed to bring the biomedical knowledge of the Web into the hands of any person with access to a mobile phone.
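
To make the idea of a story workflow more concrete, the sketch below shows one possible shape in Python. It is an assumption-laden illustration, not the proposed design: the Nanopub record only mirrors the three parts of the nanopublication model (assertion, provenance, publication info) as plain fields, the drug_disease_story function and the predicate names "targets" and "associated_with" are hypothetical, and every concept and source is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nanopub:
    """Simplified stand-in for a nanopublication (hypothetical structure)."""
    assertion: tuple        # (subject, predicate, object)
    provenance: str         # where the assertion came from
    publication_info: str   # who or what published the record


def drug_disease_story(drug, disease, nanopubs):
    """One hypothetical story: drug -targets-> X -associated_with-> disease.

    Takes two concepts as input and returns candidate explanations
    as human-readable sentences, one per intervening concept found.
    """
    # Index the drug's targets; a later record for the same target
    # replaces an earlier one, which is good enough for a sketch.
    targets = {
        n.assertion[2]: n
        for n in nanopubs
        if n.assertion[0] == drug and n.assertion[1] == "targets"
    }
    explanations = []
    for n in nanopubs:
        subject, predicate, obj = n.assertion
        if predicate == "associated_with" and obj == disease and subject in targets:
            first = targets[subject]
            explanations.append(
                f"{drug} targets {subject} (source: {first.provenance}); "
                f"{subject} is associated with {disease} "
                f"(source: {n.provenance})."
            )
    return explanations


# Placeholder records, invented purely for illustration.
store = [
    Nanopub(("drug_X", "targets", "protein_Y"),
            "placeholder-source-1", "placeholder-publisher"),
    Nanopub(("protein_Y", "associated_with", "disease_Z"),
            "placeholder-source-2", "placeholder-publisher"),
    Nanopub(("protein_W", "associated_with", "disease_Z"),
            "placeholder-source-3", "placeholder-publisher"),
]

for sentence in drug_disease_story("drug_X", "disease_Z", store):
    print(sentence)
```

In this sketch each story is an ordinary function over typed assertions, so different expert-designed stories could sit side by side and be tried in turn for a given pair of concepts.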