Moretto, M.; Cestaro, A.; Blanzieri, E.; Velasco, R. (2011). Graph-based queries of Semantic Web integrated biological data. In: 19th Annual International Conference on Intelligent Systems for Molecular Biology and 10th European Conference on Computational Biology, Wien, 18 July 2011. url: http://www.iscb.org/ismb-mm/media-ismbeccb2011 handle: http://hdl.handle.net/10449/20876
Graph-based queries of Semantic Web integrated biological data
Moretto, Marco; Cestaro, Alessandro; Velasco, Riccardo
2011-01-01
Abstract
In the post-genomic era, life science researchers need to manage and inspect an ever-growing abundance of data and information. Data from different sources, both public and proprietary, are most valuable when considered in the context of one another, because together they provide more information than any single source. Without an integrated approach, a biologist who wants to answer a question spanning multiple fields of biology must visit every data source related to the problem and find the relevant data by hand. In recent years there has been growing interest in using Semantic Web technologies to integrate and query biological data. Semantic Web technologies were designed to address three challenges: reducing the complexity of combining data from multiple sources, resolving the lack of widely accepted standards, and managing highly distributed and mutable resources. However, standard Semantic Web technologies provide no tools for querying integrated knowledge bases from a graph perspective, that is, by defining graph traversal patterns. For example, a query such as "are enzyme A and compound B related?" cannot be posed without knowing the complete structure of the knowledge base. After exploring several alternatives, we adopted a graph traversal programming language on top of a triplestore to perform path traversal queries on an integrated graph. We tested the feasibility of the approach by integrating UniProt, Gene Ontology, ChEBI and KEGG resources and posing queries of varying complexity.
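The question "are enzyme A and compound B related?" amounts to asking whether any chain of triples links two nodes, whatever predicates lie in between. As a minimal sketch of that idea, and not the graph traversal language or triplestore used in this work, the Python below runs a breadth-first search over an in-memory list of (subject, predicate, object) tuples and follows edges in both directions. The triples, node names and predicates (`enzymeA`, `compoundB`, `catalyzes`, and so on) and the helper names `build_adjacency` and `find_path` are hypothetical, chosen only for illustration; they are not real UniProt, Gene Ontology, ChEBI or KEGG records.

```python
from collections import deque

# Hypothetical toy triples merged from several imagined sources.
# Identifiers and predicates are illustrative only, not real database records.
TRIPLES = [
    ("enzymeA", "annotatedWith", "GO:term1"),     # protein-to-ontology link
    ("enzymeA", "catalyzes", "reaction1"),        # pathway link
    ("reaction1", "hasSubstrate", "compoundC"),
    ("compoundC", "sameAs", "chebi:C"),           # cross-reference between sources
    ("chebi:C", "relatedTo", "compoundB"),
    ("enzymeX", "catalyzes", "reaction2"),        # disconnected from compoundB
]


def build_adjacency(triples):
    """Index every triple in both directions so traversal ignores edge direction."""
    adjacency = {}
    for subj, pred, obj in triples:
        adjacency.setdefault(subj, []).append((pred, obj))
        adjacency.setdefault(obj, []).append(("^" + pred, subj))  # "^" marks an inverse step
    return adjacency


def find_path(triples, start, goal, max_depth=6):
    """Breadth-first search for the shortest chain of triples linking start to goal.

    Returns a list of (node, predicate, node) steps, or None if no path of at
    most max_depth edges exists. No predicate has to be named in advance.
    """
    adjacency = build_adjacency(triples)
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_depth:
            continue  # do not expand beyond the depth limit
        for pred, neighbour in adjacency[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [(node, pred, neighbour)]))
    return None


if __name__ == "__main__":
    path = find_path(TRIPLES, "enzymeA", "compoundB")
    if path is None:
        print("not related")
    else:
        for step in path:
            print(step)
```

Here `find_path(TRIPLES, "enzymeA", "compoundB")` returns a four-step chain through the reaction and the cross-reference, while `enzymeX`, which has no connection to `compoundB`, yields `None`. Capping the search with `max_depth` is one simple way to keep such open-ended queries tractable on a large integrated graph; a triplestore-backed traversal could fetch neighbours by querying the store instead of building an in-memory index.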
File | Size | Format | Access | License
---|---|---|---|---
2011 ISCB-ECCB poster_moretto_cestaro.pdf | 2.55 MB | Adobe PDF | Open access | Creative Commons

This article is published under a Creative Commons license.