Table 3 Overview of software systems designed to integrate data from multiple databases

From: Clinical decision support systems for improving diagnostic accuracy and achieving precision medicine

| Database integration system | Purpose | License | Update method | Number of databases | Databases integrated | Company |
| --- | --- | --- | --- | --- | --- | --- |
| Atlas [90] | “A biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies”. | Open source | Manual | 13 | GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene, and HomoloGene | University of British Columbia, Vancouver, BC |
| BioWarehouse [91] | “An open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. Integrates multiple public bioinformatics databases into a single relational database system within a common bioinformatics schema”. | Open source | Dependent on the individual databases | 12 | ENZYME, KEGG, BioPAX, Eco2dbase, MetaCyc, MAGE-ML, BioCyc, UniProt, GenBank, NCBI Taxonomy, CMR databases, and Gene Ontology | Stanford Research Institute, Menlo Park, CA |
| Columba [92] | “Facilitates the creation of protein structure data sets for many structure-based studies. It allows combining queries on a number of structure-related databases not covered by other projects at present”. | Free use | Dependent on the individual databases | 12 | PDB, SCOP, CATH, DSSP, ENZYME, Boehringer, KEGG, Swiss-Prot, GO, GOA, Taxonomy, PISCES | Humboldt-Universität zu Berlin, Berlin, Germany |
| Systomonas [93] | “To provide an integrated bioinformatics platform for a systems biology approach to the biology of pseudomonads in infection and biotechnology”. | Free use | Unknown | 4 | KEGG, Pseudomonas Genome Database v2, PRODORIC, and BRENDA | Technische Universität Braunschweig, Braunschweig, Germany |
| Oncomine [94] | “A cancer microarray database and web-based data-mining platform aimed at facilitating discovery from genome-wide expression analyses”. | Free use; subscription-based for expanded functionality | Annually for the free version; regular data updates for the subscription version | - | 65 gene expression datasets from 4700 microarray experiments | Life Technologies Corporation |
| BioMart [95] | “BioMart enables scientists to perform advanced querying of biological data sources through a single web interface. The power of the system comes from integrated querying of data sources regardless of their geographical locations”. | Open source | Unknown | 25 (as of 2009), 46 (as of 5/2014) | Ensembl Genes, Ensembl Homology, Ensembl Variation, Ensembl Genomic Features, Vega, HTGT, Gramene, Reactome, Wormbase, Dictybase, RGD, PRIDE, EURATMart, MSD, UniProt, Pancreatic Expression Database, PepSeeker, ArrayExpress, GermOnLine, DroSpeGe, HapMap, VectorBase, Paramecium, Eurexpress, Europhenome | Collaboration between many institutes and universities |
| Ondex [96] | “The Ondex data integration platform enables data from diverse biological datasets to be linked, integrated and visualised through graph analysis techniques. Ondex can be used in a number of important application areas such as transcription analysis, protein interaction analysis, data mining and text mining”. | Open source | Unknown | 28 | AraCyc, AtRegNet, BioCyc, BioGRID, Brenda, Cytoscape, EcoCyc, GOA, Gramene, Grassius, KEGG, Medline, MetaCyc, O-GlycBase, OMIM, PDB, Pfam, Prolog (limited functionality), SGD, TAIR, TIGR, Transfac, Transpath, UniProt, WordNet, ChEBI, ChEMBL, GFF3 | Rothamsted Research, Harpenden, UK |
| InterMine [97] | “InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types”. | Open source | Unknown | 23 | GO Annotation, GO OBO, Treefam, Homologene, OrthoDB, Panther, Ensembl, Compara, BioGRID, IntAct, PSI-MI Ontology, KEGG, Reactome, UniProt, Protein Data Bank, InterPro, PubMed, Ensembl SNP, Chado, Ensembl Core, FASTA, GFF3, OMIM, Uberon | University of Cambridge, Cambridge, United Kingdom |
| SCan-MarK [65] | “An integrated, growing biomarker repository of over 2,000 breast, ovarian, colorectal, non-Hodgkin’s lymphoma and melanoma biomarkers mined and manually curated by PhD. scientist from full-text papers. Annotations include 33 critical data elements (CDEs) organized in computable Sophic Cancer Biomarker Objects (SCBOs). SCan-MarK allows researchers to mine, explore and expose complex biomarker, disease, treatment, outcome relationships graphically displayed as knowledge networks”. | Free trial | Manual | 30 | Examples: TCGA, dbSNP, Cancer Gene Index, DrugBank, PDB, Sophic’s non-redundant Sanger COSMIC, Medline, Ensembl, ENZYME, GO, InterPro, Pfam, PubChem, UniGene, Taxonomy, UniProt, RefSeq, Entrez Gene, Reactome Pathway | Sophic Alliance |
  1. Displayed are current software solutions whose primary goal is to facilitate research workflows through data-mining algorithms. These solutions range from open-source tools to paid subscriptions, and each targets specific subgroups of scientists. A common underlying goal among these examples is the centralization of multiple databases through algorithms and standardization.
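
The idea of centralizing several databases through standardization can be illustrated with a minimal sketch. It is not taken from any of the systems listed above: the two sources, their field names, and the accession values are hypothetical placeholders, and the normalization rule (drop a "prefix:", trim whitespace, uppercase) is an assumption chosen only to show how records that spell the same identifier differently can be brought together under one key.

```python
from collections import defaultdict

# Hypothetical records invented for illustration; not real database exports.
SOURCE_A = [
    {"id": "src:ACC001", "gene": "GENE_A"},
    {"id": "Src:acc002", "gene": "GENE_B"},
]
SOURCE_B = [
    {"accession": "ACC001", "pathway": "PATHWAY_X"},
    {"accession": "acc002 ", "pathway": "PATHWAY_Y"},
]


def normalize_accession(raw: str) -> str:
    """Standardize an identifier: drop any 'prefix:', trim whitespace, uppercase."""
    return raw.split(":")[-1].strip().upper()


def integrate(*sources):
    """Merge (key_field, records) pairs into one dict keyed by normalized accession."""
    warehouse = defaultdict(dict)
    for key_field, records in sources:
        for record in records:
            key = normalize_accession(record[key_field])
            # Copy every field except the source-specific identifier.
            for field, value in record.items():
                if field != key_field:
                    warehouse[key][field] = value
    return dict(warehouse)


merged = integrate(("id", SOURCE_A), ("accession", SOURCE_B))
print(merged["ACC001"])  # {'gene': 'GENE_A', 'pathway': 'PATHWAY_X'}
```

Real warehouse systems face harder problems than this sketch suggests, such as conflicting values, many-to-many identifier mappings, and sources that update on different schedules, which is why the update method is listed for each system in the table.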