DBpedia Life Science Ontologies

This is the first of several blog post regarding the warm up task for my application to GSoC 2017. I have decided to collaborate with DBpedia aligning Life-Science Ontologies.

Let me write this post with an interview style, straight to the point, and facing the same questions that I had at the beginning ;)

What is DBpedia?

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. DBpedia

What is an Ontology?

An ontology compartmentalizes the variables needed for some set of computations and establishes the relationships between them.LINK

It is thus a practical application of philosophical ontology, with a taxonomy.

Taxonomy is the practice and science of classificationLINK

and what does it have to do with DBpedia?

The DBpedia Ontology is a shallow, cross-domain ontology, which has been manually created based on the most commonly used infoboxes within WikipediaLINK

An infobox is a fixed-format table usually added to the top right-hand corner of articles to consistently present a summary of some unifying aspect that the articles share and sometimes to improve navigation to other interrelated articles.LINK

But if DBpedia uses manual input, and automated learning form wikipedia, What else can we do to improve it?

When it comes to specialized fields with a technical jargon, like in Life-Science, we can use ontologies designed by domain experts which are considered to be authoritative for their domains.

Lets say I want to focus on Gene Ontology, What can I do?

For this we have the Gene Ontology Consortium LINK

The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products across databases.LINK

You could download the ontology from the web and use it to improve DBpedia.

What about other fields like Celltype Ontology, Disease Ontology? How can I help with that?

You can have a look at BioPortal and AberOWL and find good ontologies there.

For Celltype we would find an ontology like this LINK and interesting extended linked info from the designers LINK

For Disease Ontology we find several options. We can start with the most general Human Disease Ontology

and then keep going with the others more specific ontologies.

Do you imagine how good can be for researchers and professionals to have a tool like DBpedia trained in those fields?

Once I have my brand new Ontology, How do I handle it? What does it contains exactly?

Well, probably you thought that it’s some kind of tree of concepts related between then. And yes, sort of. You can start paying attention to Axioms

The main component of an OWL 2 ontology is a set of axioms — statements that say what is true in the domain. OWL 2 provides an extensive set of axioms, all of which extend the Axiom class in the structural specification.LINK

It is a very good Idea to have a look to that link because it’s full of very easy examples that helps understand these concepts.

Next image is an example of this.

But I am a software developer, all I know is about code, What can I do with all of this?

No problem, you can become a Bioinformatic :) You only have to get familiar with the OWL API and design a smart pipeline between all those independent ontologies and DBpedia, and… to read the next blog post ;)