Table 3 Overview of software systems designed to integrate data from multiple databases

From: Clinical decision support systems for improving diagnostic accuracy and achieving precision medicine

Database integration system

Purpose

License

Update method

Number of databases

Databases integrated

Company

Atlas [ 90 ]

“A biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies”.

Open Source

Manual

13

GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene

University of British Columbia - Vancouver, BC

BioWarehouse [ 91 ]

“An open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. Integrates multiple public bioinformatics databases into a single relational database system within a common bioinformatics schema”.

Open Source

Dependent on the individual databases

12

ENZYME, KEGG, BioPAX, Eco2dbase, MetaCyc, MAGE-ML, BioCyc, UniProt, GenBank, NCBI Taxonomy, CMR databases, and Gene Ontology

Stanford Research Institute – Menlo Park, CA

Columba [ 92 ]

“Facilitates the creation of protein structure data sets for many structure-based studies. It allows combining queries on a number of structure-related databases not covered by other projects at present”.

Free-Use

Dependent on the individual databases

12

PDB, SCOP, CATH, DSSP, ENZYME, Boehringer, KEGG, Swiss-Prot, GO, GOA, Taxonomy, PISCES

Humboldt-Universität zu Berlin – Berlin, Germany

Systomonas [ 93 ]

“To provide an integrated bioinformatics platform for a systems biology approach to the biology of pseudomonads in infection and biotechnology”.

Free-Use

Unknown

4

KEGG, Pseudomonas Genome Database v2, PRODORIC, and BRENDA

Technische Universität Braunschweig - Braunschweig, Germany

Oncomine [ 94 ]

“A cancer microarray database and web-based data-mining platform aimed at facilitating discovery from genome-wide expression analyses”.

Free-Use; subscription-based for expanded functionality

Annually for the free version; regular data updates for subscribers

-

65 gene expression datasets from 4,700 microarray experiments

Life Technologies Corporation

BioMart [ 95 ]

“BioMart enables scientists to perform advanced querying of biological data sources through a single web interface. The power of the system comes from integrated querying of data sources regardless of their geographical locations”.

Open Source

Unknown

25 (as of 2009); 46 (as of 5/2014)

Ensembl Genes, Ensembl Homology, Ensembl Variation, Ensembl Genomic Features, Vega, HTGT, Gramene, Reactome, WormBase, dictyBase, RGD, PRIDE, EURATMart, MSD, UniProt, Pancreatic Expression Database, PepSeeker, ArrayExpress, GermOnLine, DroSpeGe, HapMap, VectorBase, Paramecium, Eurexpress, EuroPhenome

Collaboration among many institutes and universities

Ondex [ 96 ]

“The Ondex data integration platform enables data from diverse biological datasets to be linked, integrated and visualised through graph analysis techniques. Ondex can be used in a number of important application areas such as transcription analysis, protein interaction analysis, data mining and text mining”.

Open Source

Unknown

28

AraCyc, AtRegNet, BioCyc, BioGRID, BRENDA, Cytoscape, EcoCyc, GOA, Gramene, Grassius, KEGG, Medline, MetaCyc, O-GlycBase, OMIM, PDB, Pfam, Prolog (limited functionality), SGD, TAIR, TIGR, TRANSFAC, TRANSPATH, UniProt, WordNet, ChEBI, ChEMBL, GFF3

Rothamsted Research - Harpenden, UK

InterMine [ 97 ]

“InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types”.

Open Source

Unknown

23

GO Annotation, GO OBO, TreeFam, HomoloGene, OrthoDB, PANTHER, Ensembl Compara, BioGRID, IntAct, PSI-MI Ontology, KEGG, Reactome, UniProt, Protein Data Bank, InterPro, PubMed, Ensembl SNP, Chado, Ensembl Core, FASTA, GFF3, OMIM, Uberon

University of Cambridge - Cambridge, United Kingdom

SCan-MarK [ 65 ]

“An integrated, growing biomarker repository of over 2,000 breast, ovarian, colorectal, non-Hodgkin’s lymphoma and melanoma biomarkers mined and manually curated by PhD. scientist from full-text papers. Annotations include 33 critical data elements (CDEs) organized in computable Sophic Cancer Biomarker Objects (SCBOs). SCan-MarK allows researchers to mine, explore and expose complex biomarker, disease, treatment, outcome relationships graphically displayed as knowledge networks”.

Free Trial

Manual

30

Examples: TCGA, dbSNP, Cancer Gene Index, DrugBank, PDB, Sophic’s non-redundant Sanger COSMIC, Medline, Ensembl, ENZYME, GO, InterPro, Pfam, PubChem, UniGene, Taxonomy, UniProt, RefSeq, Entrez Gene, Reactome Pathway

Sophic Alliance

  1. The systems shown are current software solutions whose primary goal is to facilitate research workflows through data-mining algorithms. They range from open-source to paid-subscription licensing and target specific subgroups of scientists. Their common underlying goal is to centralize multiple databases through algorithms and standardization.