K-core decomposition of a protein domain co-occurrence network reveals lower cancer mutation rates for interior cores
© Emerson et al.; licensee BioMed Central. 2015
Received: 18 November 2014
Accepted: 18 February 2015
Published: 3 March 2015
Network biology currently focuses primarily on metabolic pathways, gene regulatory networks and protein-protein interaction networks. While these approaches have yielded critical information, alternative approaches to network analysis will offer new perspectives on biological information. A little-explored area is the interaction between domains, which can be captured using domain co-occurrence networks (DCN). A DCN can be used to study the function and interaction of proteins by representing protein domains and their co-existence in genes, and cancer mutations can be mapped to the individual protein domains to identify signals.
The domain co-occurrence network was constructed for the human proteome based on the Pfam domains in proteins. Highly connected domains in the central cores were identified using the k-core decomposition technique. Here we show that these domains are more evolutionarily conserved than the peripheral domains. Somatic mutations for ovarian, breast and prostate cancer were obtained from the TCGA database. We mapped the somatic mutations to individual protein domains and used the local false discovery rate to identify significantly mutated domains in each cancer type. Significantly mutated domains were found to be enriched in cancer disease pathways. However, the inner cores of the DCN did not contain any of the significantly mutated domains. We observed that the inner core protein domains are highly conserved and co-exist in large numbers with other protein domains.
Mutations and domain co-occurrence networks provide a framework for understanding hierarchical designs in protein function from a network perspective. This study provides evidence that a majority of protein domains in the inner core of the DCN have a lower mutation frequency, and that protein domains in the peripheral regions of the k-core contribute more heavily to the disease. These findings may further contribute to drug development.
Domains are distinct functional or structural units in a protein. Most domains correspond to tertiary structure elements and are able to fold independently. All protein domains exhibit evolutionary conservation, and many either perform specific functions or contribute in a specific way to the structure of their proteins. Domains may exist in a variety of biological contexts, so similar domains can be found in proteins with different functions. Many proteins are composed of one or more domains that can fold independently into a stable core structure [1-3].
Many complex systems have been analyzed as networks by representing the system's components as nodes and the interactions between them as edges. Studies on complex networks, including networks of co-authorships, sexual contacts and the world-wide-web (WWW), reveal that their structure and growth are governed by a set of generic organizing principles [4,5]. Network biology is emerging as a new field in biology owing to the increasing availability of genome-scale data sets of molecular interactions. These data result from new high-throughput technologies yielding information on protein interactions, regulatory networks and the metabolome. Biological systems such as gene interaction networks and protein and metabolite networks have been found to exhibit a scale-free property [6-13].
The development of high-throughput whole-exome/genome DNA sequencing has made it possible to evaluate normal and tumor tissue samples in a single study. These studies have revealed the connection between somatic mutations and cancer susceptibility, initiation and development. A central goal of cancer genome analysis is the identification of cancer genes that, by definition, carry driver mutations. A key challenge is therefore to distinguish driver from passenger mutations. Most studies thus far have attempted to identify driver mutations using gene-centric approaches [15-20]. Unfortunately, this method is limited to a small subset of genes and also leads to mischaracterized mutations. The gene-based approach usually ignores the position of a mutation within the protein and the functional context that this position provides. In contrast, a protein domain network enables the identification of mutations that are rare at the gene level but occur frequently within a specific domain. These highly mutated domains potentially reveal disruptions of protein function necessary for cancer development.
Several studies have been conducted on protein domain co-occurrence networks (DCN). These studies represent domains as nodes and their co-occurrence in a protein as edges. The networks have also been shown to possess a scale-free property [22-24]. Increasing organismal complexity from bacteria to eukaryotes was associated with links involving cell-cell interaction, signal transduction and cell differentiation domains. Studies on DCNs have examined network properties, evolutionary traces among species, the architectural design of protein domain networks and the mapping of somatic mutations to protein domains in colon cancer. More recently, a disease-drug-phenotype matrix was also analyzed using protein domain networks. However, each of these studies has focused either on domain co-occurrence networks or on a specific feature of the DCN, and therefore none provides a generalized view of mutations in the domain co-occurrence network. In this study, we investigated the protein domain co-occurrence network in the context of various cancers and their mutations. We specifically focused on the highly connected protein domains of the DCN core by using k-core decomposition techniques.
The concept of the k-core was first introduced by Seidman to characterize the cohesive regions of graphs. Batagelj et al. developed an efficient algorithm to compute the k-core decomposition of a graph. The k-core technique has been used in many areas, including as an alternative approach to community detection and for identifying dense components in complex networks [32-34].
K-core decomposition is a network analysis approach that captures structural properties not reflected by many other network topological parameters. Its basic principle is to identify particular subsets of the network, called k-cores, each of which is obtained by a recursive pruning method [29,35,36]. This decomposition allows the study of the hierarchical properties of large complex networks by focusing on the centrality and connectedness of nodes. The central cores contain more strongly connected vertices with a large number of possible distinct paths between them, which yields robust routing properties.
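The recursive pruning can be carried out for all orders k at once by assigning every node its core number, which is the largest k for which the node still belongs to the k-core. Below is a minimal Python sketch that follows the bucket-based (bin-sort) idea of Batagelj and Zaversnik. The dict-of-sets adjacency format, the function name `core_numbers` and the toy graph are illustrative assumptions. This is not the code used in this study.

```python
def core_numbers(adj):
    """Return {node: core number} for an undirected graph given as {node: set(neighbours)}.

    Assumes a symmetric adjacency without self-loops (hypothetical input format).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)
    # Bin-sort nodes by current degree so a minimum-degree node is cheap to find.
    buckets = [set() for _ in range(max_deg + 1)]
    for v, deg in degree.items():
        buckets[deg].add(v)
    core = {}
    d = 0  # minimum remaining degree; it never decreases during the sweep
    while len(core) < len(adj):
        while not buckets[d]:
            d += 1
        v = buckets[d].pop()
        core[v] = d  # v is removed now, and its remaining degree is its core number
        for u in adj[v]:
            # Only neighbours that are still present with a higher degree lose a link.
            if u not in core and degree[u] > d:
                buckets[degree[u]].remove(u)
                degree[u] -= 1
                buckets[degree[u]].add(u)
    return core


# Toy network: a triangle (A, B, C) with a pendant node D attached to A.
toy = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
cores = core_numbers(toy)
```

The nodes with a core number of at least k form the k-core. The largest core number identifies the innermost core.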
Materials and methods
Construction of protein domain co-occurrence networks (DCN)
The DCN for Homo sapiens was constructed using the Ensembl database (version 72), which provides a comprehensive source of stable automatic annotation of individual genomes.
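As a rough illustration of how such a network can be assembled, the sketch below links two Pfam domains whenever they occur in the same protein. The input mapping, the function name `build_dcn` and the toy protein IDs are hypothetical. The real pipeline would take protein-domain assignments from Ensembl. Keeping single-domain proteins as isolated nodes is an assumption made here.

```python
from itertools import combinations


def build_dcn(protein_domains):
    """Build an undirected domain co-occurrence network as {domain: set(co-occurring domains)}.

    protein_domains: mapping of protein ID -> iterable of Pfam domain IDs (hypothetical format).
    """
    adj = {}
    for domains in protein_domains.values():
        unique = sorted(set(domains))  # repeated copies of a domain do not create a self-loop
        for dom in unique:
            adj.setdefault(dom, set())  # single-domain proteins still contribute a node
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Toy input: hypothetical proteins annotated with Pfam accessions.
example = {
    "P1": ["PF00017", "PF00018", "PF00017"],
    "P2": ["PF00018", "PF00069"],
    "P3": ["PF00595"],
}
dcn = build_dcn(example)
```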
K-core decomposition of DCN
To identify the core-periphery organization of the domain co-occurrence network, we subjected it to k-core decomposition. The cores of different orders of a network can be obtained by iteratively removing all nodes that have fewer than k connections with other protein domains (k = 1, 2, …). This is done by first identifying and removing all nodes whose degree (i.e., number of connections) is less than k. The network is then re-analyzed to determine whether these removals have caused other nodes (which originally had degree ≥ k) to have fewer than k connections. Any such nodes are removed, and the process continues until no more nodes can be removed. The resulting sub-network is called the k-core of the network.
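The pruning procedure just described can be sketched in a few lines of Python. This is a minimal illustration that assumes the DCN is stored as a dict mapping each domain to the set of domains it co-occurs with. The name `k_core` and the toy graph are hypothetical.

```python
def k_core(adj, k):
    """Return the set of nodes in the k-core by repeatedly pruning nodes of degree < k."""
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    # Start with every node that already falls below the degree threshold.
    to_remove = [v for v in adj if degree[v] < k]
    while to_remove:
        v = to_remove.pop()
        if v not in alive:
            continue  # already pruned via another path
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1  # u loses its link to the pruned node
                if degree[u] < k:
                    to_remove.append(u)  # u may now fall below k, so queue it for pruning
    return alive


toy = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
core1, core2, core3 = k_core(toy, 1), k_core(toy, 2), k_core(toy, 3)
```

In this toy graph, pruning node D for k = 3 lowers A's degree to 2. That removal cascades until the 3-core is empty, which is the re-analysis step described in the text.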
Randomization of k-core
To determine the statistical significance of the properties calculated for members of an empirically determined k-core, we compared them with the mean and variance of the corresponding values obtained for a randomized ensemble. Each randomized k-core in the ensemble is obtained by random selection without replacement of Nk domains from the DCN, where Nk is the size of the empirically determined k-core. The randomized ensemble for every DCN considered was generated by constructing 100 such randomized k-cores.
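A minimal sketch of this randomization scheme follows. It assumes that the property being tested is the mean of a per-domain score, such as the normalized mutation score. The function name, the z-score summary and the add-one empirical p-value are choices made for illustration. They are not details taken from the study.

```python
import random
from statistics import mean, stdev


def randomization_test(scores, core_nodes, n_random=100, seed=0):
    """Compare the mean score of an empirical k-core with size-matched random domain sets.

    scores: {domain: per-domain value}; core_nodes: domains in the empirical k-core.
    """
    rng = random.Random(seed)
    domains = list(scores)
    n_k = len(core_nodes)
    observed = mean(scores[d] for d in core_nodes)
    # Each randomized "k-core" is N_k domains drawn without replacement from the whole DCN.
    null = [mean(scores[d] for d in rng.sample(domains, n_k)) for _ in range(n_random)]
    z = (observed - mean(null)) / stdev(null)
    # One-sided empirical p-value: how often a random set scores at least as low as the core.
    p = (1 + sum(x <= observed for x in null)) / (n_random + 1)
    return observed, z, p


# Hypothetical scores: a core made of the ten lowest-scoring of 100 domains.
scores = {f"D{i}": float(i) for i in range(100)}
low_core = [f"D{i}" for i in range(10)]
observed, z, p = randomization_test(scores, low_core)
```

With 100 random sets, the smallest attainable add-one p-value is 1/101, which is just below 0.01.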
Evolutionary conservation of protein domains
Evolutionarily conserved protein domains were identified using the PANDITplus database. PANDITplus contains Pfam-based alignments and phylogenetic trees for known protein domains and their families. It is built as a relational database that includes information on functional categories, metabolic pathways, protein–protein interactions, disease associations, gene expression and three-dimensional structures. It also includes estimates of selective pressures from evolutionary analyses.
Cancer mutation dataset
Somatic mutation data for ovarian, breast and prostate cancer were obtained from the TCGA data portal (http://tcga-data.nci.nih.gov/tcga/) using mutation files from the hgsc.bcm.edu_COAD.IlluminaGA_DNASeq.1 and hgsc.bcm.edu_COAD.SOLiD_DNASeq.1 directories, downloaded on March 30th, 2013. Silent and RNA mutations were filtered out of the data set, as they were assumed unlikely to affect cancer development. The somatic mutation counts were 20,878 for ovarian cancer, 35,558 for breast cancer and 23,349 for prostate cancer.
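TCGA somatic mutations are distributed as tab-separated Mutation Annotation Format (MAF) files. Their `Variant_Classification` column includes the values `Silent` and `RNA`. Assuming that this column was used for the filtering described above, a standard-library Python sketch could look like this. The function name and the toy rows are hypothetical.

```python
import csv
import io

EXCLUDED = {"Silent", "RNA"}  # Variant_Classification values assumed to be dropped


def filter_maf(handle):
    """Yield MAF rows (as dicts) whose Variant_Classification is not Silent or RNA."""
    # Skip '#' comment/version lines that some MAF files start with.
    lines = (line for line in handle if not line.startswith("#"))
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["Variant_Classification"] not in EXCLUDED:
            yield row


# Toy MAF fragment with only two of the standard columns.
toy = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\n"
    "TP53\tMissense_Mutation\n"
    "BRCA1\tSilent\n"
    "GAPDH\tRNA\n"
    "PTEN\tNonsense_Mutation\n"
)
kept = list(filter_maf(toy))
```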
Mapping cancer SNPs to individual protein domains
Percentage of mapped mutations in three forms of cancer (table columns: No. of mutations, No. of mapped mutations)
Procedure for normalizing the domain mutation frequency
To determine which domains are frequently mutated in the human genome, we first counted the mutations falling within each domain. Because larger domains are generally expected to accumulate more mutations than shorter ones, we normalized each domain's mutation count by its length. The count was divided by the cumulative length of the domain in the genome, which is the summed length of all occurrences of the domain. The normalized scores for all three cancer types are shown in Additional file 2: Table S2.
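In symbols, the score of domain d is the number of mutations falling inside any occurrence of d, divided by the sum of the lengths of all occurrences of d. The sketch below assumes that domain occurrences and mutations are both expressed as residue positions on the same protein sequences, using 1-based inclusive coordinates. The tuple layouts and the name `normalized_domain_scores` are hypothetical.

```python
from collections import defaultdict


def normalized_domain_scores(domain_hits, mutations):
    """Mutation count per domain divided by the domain's cumulative length.

    domain_hits: (protein_id, domain_id, start, end) tuples, 1-based inclusive (hypothetical).
    mutations:   (protein_id, residue_position) tuples for non-silent mutations.
    """
    counts = defaultdict(int)
    total_length = defaultdict(int)
    hits_by_protein = defaultdict(list)
    for protein, domain, start, end in domain_hits:
        # Denominator: summed length over every occurrence of the domain.
        total_length[domain] += end - start + 1
        hits_by_protein[protein].append((domain, start, end))
    for protein, position in mutations:
        # A mutation counts toward every domain occurrence whose interval contains it.
        for domain, start, end in hits_by_protein.get(protein, []):
            if start <= position <= end:
                counts[domain] += 1
    return {domain: counts[domain] / length for domain, length in total_length.items()}


hits = [("P1", "PF00017", 10, 109), ("P2", "PF00017", 1, 100), ("P1", "PF00018", 200, 249)]
muts = [("P1", 50), ("P2", 100), ("P1", 249), ("P1", 150), ("P3", 5)]
scores = normalized_domain_scores(hits, muts)
```

In this toy case, PF00017 receives 2 mutations over 200 residues and PF00018 receives 1 over 50. The shorter domain therefore scores higher despite having fewer raw hits.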
Calculation of significantly mutated domains
The R package locfdr was used to estimate the null distribution, and domains with a local false discovery rate < 0.1 were identified as significantly mutated. The local false discovery rate values for each domain in all three cancer types are shown in Additional file 3: Table S3a (sheet1), Table S3b (sheet2) and Table S3c (sheet3).
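The local false discovery rate of Efron and colleagues is fdr(z) = π0·f0(z)/f(z). Here f0 is the null density, f is the mixture density of all z-values and π0 is the proportion of null cases. The sketch below is a deliberately simplified stand-in for the locfdr package. It uses the theoretical N(0, 1) null, fixes π0 = 1 (a conservative choice) and estimates f with a Gaussian kernel density estimate. By contrast, locfdr can estimate an empirical null and fits f differently. The conversion of domain scores to z-values is not described here, so the input is assumed to be z-values already.

```python
from statistics import NormalDist, stdev


def local_fdr(z, pi0=1.0):
    """Simplified local fdr, pi0 * f0(z) / f(z), with a theoretical N(0, 1) null.

    Not the locfdr algorithm: f is a Gaussian kernel density estimate and pi0 is fixed.
    """
    n = len(z)
    h = 1.06 * stdev(z) * n ** (-1 / 5)  # Silverman's rule-of-thumb bandwidth
    std_normal = NormalDist()  # serves both as the KDE kernel and as the null density f0
    fdr = []
    for zi in z:
        # Mixture density f(zi): average of kernels centred on every observed z-value.
        f = sum(std_normal.pdf((zi - zj) / h) for zj in z) / (n * h)
        fdr.append(min(1.0, pi0 * std_normal.pdf(zi) / f))
    return fdr


# Hypothetical z-values: 200 evenly spaced standard-normal quantiles plus three extreme domains.
std = NormalDist()
z_values = [std.inv_cdf((i + 0.5) / 200) for i in range(200)] + [5.0, 5.2, 5.5]
fdr = local_fdr(z_values)
significant = [i for i, value in enumerate(fdr) if value < 0.1]
```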
Domains in the inner cores are more conserved than those at the periphery
The domain co-occurrence network of Homo sapiens was constructed and its statistical properties were determined. The degree distribution (Additional file 4: Figure S1a) showed that the DCN has scale-free behavior. The shortest path length distribution (Additional file 4: Figure S1b) exhibits a small-world phenomenon. The relationship between the average clustering coefficient and node degree indicates an inherent hierarchical modularity (Additional file 4: Figure S1c). Applying the k-core decomposition algorithm to the Homo sapiens DCN yielded 10 nested k-cores, with k ranging from 1 to 10. Because k-cores are nested, the number of nodes in each core decreases as the core order increases; this property was observed in the Homo sapiens DCN.
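Hierarchical modularity is commonly diagnosed by averaging the local clustering coefficient over all nodes of the same degree k. One then checks whether this average, C(k), falls off roughly as 1/k. Below is a small sketch of the C(k) computation, assuming a dict-of-sets adjacency. The function name and toy graph are hypothetical, and this is not the tool used to produce Figure S1c.

```python
from collections import defaultdict
from itertools import combinations


def clustering_by_degree(adj):
    """Average local clustering coefficient C(k) for each degree k, given {node: set(neighbours)}."""
    by_degree = defaultdict(list)
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # the clustering coefficient is undefined for degree 0 or 1
        # Count edges among the neighbours of v.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        by_degree[k].append(2 * links / (k * (k - 1)))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}


toy = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
c_of_k = clustering_by_degree(toy)
```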
Domains in inner cores have fewer mutations than those at the periphery
Cancer mutations obtained from the TCGA data portal were mapped to individual protein domains, and the normalized mutation score of each protein domain was calculated (for details, see Methods). To study mutations in the domain co-occurrence network, the normalized mutation scores were assigned to the corresponding nodes (i.e., protein domains) of the network. The network was then subjected to core decomposition to understand the nature of mutation in the Homo sapiens DCN. We found that the normalized mutation score per domain in each core gradually decreased with core order, and this held for all three cancer types. To verify whether this finding is statistically significant, we compared the empirical DCN with its random counterparts. The pattern observed in the Homo sapiens DCN was not found in 100 random DCNs (i.e., p < 0.01).
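The per-core trend can be summarized by averaging the normalized mutation score over the domains of each k-core. This sketch assumes that a domain belongs to the k-core when its core number is at least k. Averaging over k-shells, where the core number is exactly k, is an equally plausible reading. The names and numbers are hypothetical.

```python
from statistics import mean


def mean_score_per_core(core_number, scores):
    """Mean score of the domains in each k-core, treating core number >= k as membership."""
    max_k = max(core_number.values())
    return {
        k: mean(scores[v] for v, c in core_number.items() if c >= k)
        for k in range(1, max_k + 1)
    }


core_number = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3}
scores = {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.2, "E": 0.1}
per_core = mean_score_per_core(core_number, scores)
values = list(per_core.values())
# True when the mean score strictly decreases from each core order to the next.
declining = all(a > b for a, b in zip(values, values[1:]))
```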
From the randomized simulations, we observed that the inner core domains had lower mutation rates than the peripheral cores. We wanted to verify whether the inner core differed significantly from the other cores, to investigate the extent of this difference and to identify outliers. We therefore plotted every domain's normalized mutation rate against its k-core value for the three cancer types (Additional file 5: Figure S2; 2a, ovarian cancer; 2b, breast cancer; 2c, prostate cancer). The results suggested that the normalized mutation rates gradually declined with core order, and the R² values for this relationship were greater than zero. Interestingly, the outliers of the inner core stood out because of their lower mutation rates.
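For a simple linear fit, R² equals the square of the Pearson correlation r, so R² cannot be negative. The direction of the relationship between core order and mutation rate is carried by the sign of r, or equivalently by the sign of the fitted slope. Below is a short standard-library sketch using made-up values. The function name is hypothetical.

```python
from statistics import mean


def pearson_r_and_r2(x, y):
    """Pearson correlation r and the R^2 of a simple least-squares line (equal to r squared)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    return r, r * r


core_order = [1, 1, 2, 2, 3]
mutation_rate = [0.9, 0.7, 0.4, 0.2, 0.1]
r, r2 = pearson_r_and_r2(core_order, mutation_rate)
```

Here r is negative because the rate falls with core order, while R² is positive.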
Identification of significantly mutated domains
Significantly mutated domains found in all three cancers studied
Genome analyses have shown that more than 70% of eukaryotic proteins comprise multiple domains. Domain-domain interactions have become a focus of numerous studies [46-51]. Studies of protein-protein and domain-domain interaction networks using graph models have revealed that domains are the most important units of evolutionary selection. In addition, protein structural domains appear to be the most distinct and significant biological entities for interaction, function and evolution. Modeling of domain interaction networks has shown that domains are often involved in propagating signal transduction and help determine the recognition specificity of each domain family member. This is an essential step toward a functional description of the global interactome.
By constructing and analyzing domain co-occurrence networks, we gain new and fundamental insights into the qualitative arrangement and evolutionary utilization of the proteome. Domain databases such as Pfam and InterDom provide comprehensive domain information. However, mapping cancer SNPs to individual domains may help identify the specific protein domains targeted in cancer rather than just the proteins. Domains with high relative mutation rates were identified in three hormonal cancer types, along with the domains common to all three. A recent study by Liu et al. (2014) showed that PDZ and LIM domain protein 1 (PDLIM1) promotes breast cancer cell migration, invasion and metastasis. Both of these Pfam domains, PDZ and LIM, were also among the significantly mutated domains in breast cancer.
Analyzing mutations in the domain co-occurrence network helps identify crucial protein domains that contribute to cancer progression. Highly connected protein domains were found to be evolutionarily conserved in the domain co-occurrence network. This implies that protein domains in the inner core are more conserved than those in the peripheral region. The significantly mutated protein domains identified here further helped determine the disease-target protein domains in all three cancers. Comparing the somatic mutation landscape in the protein domain co-occurrence network with its random counterparts revealed a statistically significant difference between them.
The functional annotations obtained for the significantly mutated domains indicated their involvement in all three cancers. Polymorphisms in inflammation-related genes, including those in the Toll-like receptor (TLR) signaling pathway, are hypothesized to be involved in prostate carcinogenesis. The Toll-like receptor signaling pathway was among the top-ranked enriched KEGG pathways in our results. Similarly, the ribosome pathway was also among the top-ranked enriched KEGG pathways. This pathway is activated in aggressive human breast cancer cells. Comparison with other pathways also showed that ribosome pathway genes were up-regulated in ZR-75-1, a human breast carcinoma cell line.
A recent study by Pasta et al. showed that genes overexpressed in cancer stem cells (CSC) from patients with epithelial ovarian cancer are associated with glucose uptake, oxidative phosphorylation (OXPHOS) and fatty acid beta-oxidation. These overexpressed genes are consistent with a metabolic profile dominated by OXPHOS. In our results, ‘Complement and coagulation cascades’ was the most frequently perturbed pathway, consistent with its reported dysregulation in ovarian cancer. Both this pathway and OXPHOS were enriched among the significantly mutated domains. This suggests that the statistically significant domains occur more commonly in cancer. However, further studies are needed to investigate the functional and structural constraints that lead a protein domain to evolve into an inner-core rather than an outer-core domain of the DCN.
The authors are grateful to Qatar Foundation Biomedical Research Program funding to Weill Cornell Medical College-Qatar (WCMCQ) for supporting this research.
- Cheng J. DOMAC: an accurate, hybrid protein domain prediction server. Nucl Acids Res. 2007;35:W354–6.
- Cheng J, Sweredoski M, Baldi P. DOMpro. Data Mining and Knowledge Discovery. 2006;13:1–10.
- Jaenicke R. Folding and association of proteins. Prog Biophys Mol Biol. 1987;49:117–237.
- Albert R, Barabasi AL. Statistical mechanics of complex networks. Rev Mod Phys. 2002;74(1):47–97.
- Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286(5439):509–12.
- Wagner A, Fell DA. The small world inside large metabolic networks. Proc Roy Soc London Series B. 2001;268(1478):1803–10.
- Jeong H, Tombor B, Albert R, Oltvai Z, Barabasi AL. The large-scale organization of metabolic networks. Nature. 2000;407(6804):651–4.
- Fell D, Wagner A. The small world of metabolism. Nature Biotech. 2000;18(11):1121–2.
- Wuchty S. Small-worlds in RNA. Nucl Acids Res. 2003;31(3):1108–17.
- Ravasz E, Somera A, Mongru D, Oltvai Z, Barabasi A. Hierarchical organization of modularity in metabolic networks. Science. 2002;297(5586):1551–5.
- Holme P, Huss M, Jeong H. Subnetwork hierarchies in biochemical pathways. Bioinformatics. 2003;19:532–8.
- Barabasi A, Oltvai Z. Network biology: understanding the cell’s functional organization. Nature Rev Gen. 2004;5(2):101–13.
- Rives A, Galitski T. Modular organization of cellular networks. Proc Natl Acad Sci. 2003;100(3):1128–33.
- National Cancer Institute. The Cancer Genome Atlas Homepage. http://cancergenome.nih.gov (30 March 2013, date last accessed).
- Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, et al. The consensus coding sequences of human breast and colorectal cancers. Science. 2006;314:268–74.
- Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, Leary RJ, et al. The genomic landscapes of human breast and colorectal cancers. Science. 2007;318:1108–13.
- Ding L, Getz G, Wheeler D, Mardis E, McLellan M, Cibulskis K, et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature. 2008;455:1069–75.
- Parsons DW, Li M, Zhang X, Jones S, Leary RJ, Lin JC, et al. The genetic landscape of the childhood cancer medulloblastoma. Science. 2011;331:435–9.
- Berger MF, Lawrence MS, Demichelis F, Drier Y, Cibulskis K, Sivachenko AY, et al. The genomic complexity of primary human prostate cancer. Nature. 2011;470:214–20.
- Jones S, Zhang X, Parsons DW, Lin JC, Leary RJ, Angenendt P, et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science. 2008;321:1801–6.
- Zhong Q, Simonis N, Li QR, Charloteaux B, Heuze F, Klitgord N, et al. Edgetic perturbation models of human inherited disorders. Mol Syst Biol. 2009;5(1):321.
- Wuchty S. Scale-free behavior in protein domain networks. Mol Biol Evol. 2001;18:1694–702.
- Apic G, Gough J, Teichmann S. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J Mol Biol. 2001;310:311–25.
- Wuchty S. Interaction and domain networks of yeast. Proteomics. 2002;2:1715–23.
- Wuchty S, Almaas E. Evolutionary cores of domain co-occurrence networks. BMC Evol Biol. 2005;5(1):24.
- Hsu CH, Chen CK, Hwang MJ. The architectural design of networks of protein domain architectures. Biol Lett. 2013;9:20130268.
- Nehrt NL, Peterson TA, Park D, Kann MG. Domain landscapes of somatic mutations in cancer. BMC Genomics. 2012;13:S9.
- Fang H, Gough J. A disease-drug-phenotype matrix inferred by walking on a functional domain network. Mol BioSyst. 2013;9:1686–96.
- Seidman SB. Network structure and minimum degree. Soc Networks. 1983;5:269–87.
- Batagelj V, Zaversnik M. An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049. 2003.
- Giatsidis C, Thilikos DM, Vazirgiannis M. D-cores: measuring collaboration of directed graphs based on degeneracy. In: Data Mining (ICDM), IEEE 11th International Conference on. IEEE; 2011. p. 210–10.
- Andersen R, Chellapilla K. Finding dense subgraphs with size bounds. In: Algorithms and Models for the Web-Graph. Berlin Heidelberg: Springer; 2009. p. 25–37.
- Balasundaram B, Butenko S, Hicks IV. Clique relaxations in social network analysis: the maximum k-plex problem. Oper Res. 2011;59:133–42.
- Kortsarz G, Peleg D. Generating sparse 2-spanners. J Algorithms. 1994;17:222–36.
- Bollobas B, Thomason A. Random graphs of small order. North-Holland Mathematics Studies. 1985;118:47–97.
- Batagelj V, Zaversnik M. Generalized cores. arXiv preprint cs/0202039. 2002.
- Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, et al. Ensembl 2013. Nucleic Acids Res. 2012;41:D48–55. gks1236.
- Bateman A, Coin L, Durbin R, Finn R, Hollich V. The Pfam protein families database. Nucleic Acids Res. 2004;32:276–80.
- Shannon P, Markiel A, Ozier O, Baliga N, Wang J. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–504.
- Dimitrieva S, Anisimova M. PANDITplus: toward better integration of evolutionary view on molecular sequences with supplementary bioinformatics resources. Trends Evol Biol. 2009;2:e1.
- Efron B, Turnbull BB, Narasimhan B. Computation of Local False Discovery Rates, R package version 1.1-7. Vienna, Austria: R Foundation for Statistical Computing; 2011.
- Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol. 2002;23:70–86.
- Huang D, Sherman B, Lempicki R. Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nature Protoc. 2009;4:44–57.
- Huang D, Sherman B, Lempicki R. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37:1–13.
- Shoemaker B, Panchenko A, Bryant SH. Finding biologically relevant protein domain interactions: conserved binding mode analysis. Protein Sci. 2006;15:352–61.
- Deng M, Mehta S, Sun F, Chen T. Inferring domain-domain interactions from protein-protein interactions. Genome Res. 2002;12:1540–8.
- Moon HS, Bhak J, Lee KH, Lee D. Architecture of basic building blocks in protein and domain structural interaction networks. Bioinformatics. 2005;21:1479–86.
- Santonico E, Castagnoli L, Cesareni G. Methods to reveal domain networks. Drug Discov Today. 2005;10:1111–7.
- Emig D, Cline MS, Lengauer T, Albrecht M. Integrating expression data with domain interaction networks. Bioinformatics. 2008;24:2546–8.
- Prieto C, De Las Rivas J. Structural domain-domain interactions: assessment and comparison with protein-protein interaction data to improve the interactome. Proteins. 2010;78:109–17.
- Stein A, Ceol A, Aloy P. 3did: identification and classification of domain-based interactions of known three-dimensional structure. Nucleic Acids Res. 2011;39:D718–23.
- Liu Z, Zhan Y, Tu Y, Chen K, Liu Z, Wu C. PDZ and LIM domain protein 1 (PDLIM1)/CLP36 promotes breast cancer cell migration, invasion and metastasis through interaction with α-actinin. Oncogene. 2014. doi:10.1038/onc.2014.64.
- Stark JR, Wiklund F, Gronberg H, Schumacher F, Sinnott JA, Stampfer MJ, et al. Toll-like receptor signaling pathway variants and prostate cancer mortality. Cancer Epidemiol Biomarkers Prev. 2009;18:1859–63.
- Belin S, Beghin A, Solano-Gonzalez E, Bezin L, Brunet-Manquat S, Textoris J, et al. Dysregulation of ribosome biogenesis and translational capacity is associated with tumor progression of human breast cancer cells. PLoS One. 2009;4:e7147.
- Mandal S, Davie JR. An integrated analysis of genes and pathways exhibiting metabolic differences between estrogen receptor positive breast cancer cells. BMC Cancer. 2007;7:181.
- Pasta A, Bellio C, Pilotto G, Ciminale V, Silic-Benussi M, Guzzo G, et al. Cancer stem cells from epithelial ovarian cancer patients privilege oxidative phosphorylation, and resist glucose deprivation. Oncotarget. 2014;5:4305.
- Lin P, Huang Z. Correlation analysis connects cancer subtypes. PLoS One. 2013;8:e69747.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.