- Open Access
Functional repertoire, molecular pathways and diseases associated with 3D domain swapping in the human proteome
Journal of Clinical Bioinformatics volume 2, Article number: 8 (2012)
3D domain swapping is a novel structural phenomenon observed in a diverse set of protein structures in oligomeric conformations. It is characterized by a distinct structural feature: structural segments in a protein dimer or higher oligomer are shared between two or more chains of the protein structure. 3D domain swapping has been observed as a key mediator of numerous functional mechanisms and plays a pathogenic role in various diseases, including conformational diseases such as amyloidosis, Alzheimer's disease, Parkinson's disease and prion diseases. We report the first study with a focus on identifying functional classes, pathways and diseases mediated by 3D domain swapping in the human proteome.
We used a panel of four enrichment tools with two different ontologies and two annotation databases to derive biologically and clinically relevant information associated with 3D domain swapping. Protein domain enrichment analysis followed by Gene Ontology (GO) term enrichment analysis revealed the functional repertoire of proteins involved in swapping. Pathway analysis using KEGG annotations revealed diverse pathway associations of human proteins involved in 3D domain swapping. Disease Ontology was used to find statistically significant associations between proteins in swapped conformation and various disease categories (P-value < 0.05).
We report meta-analysis results of a literature-curated dataset of human gene products involved in 3D domain swapping and discuss new insights about the functional repertoire, pathway associations and disease implications of proteins involved in 3D domain swapping.
Our integrated bioinformatics pipeline, comprising four different enrichment tools, two ontologies and two annotation databases, revealed new insights into the functional and disease correlations of 3D domain swapping. GO term enrichment was used to infer terms associated with the three GO categories. Protein domain enrichment was used to identify conserved domains enriched in swapped proteins. Pathway enrichment analysis using KEGG annotations revealed that proteins with swapped conformations are present in all six classes of the KEGG BRITE hierarchy, and significantly enriched KEGG pathways were observed in five classes. Functional disease ontology based enrichment analysis found five major classes of human disease to be significantly associated with 3D domain swapping: cancer, diseases of the respiratory or pulmonary system, degenerative diseases of the central nervous system, vascular disease and encephalitis. In conclusion, our study shows that bioinformatics-based analytical approaches using curated data can enhance the understanding of the functional and disease implications of 3D domain swapping.
Computationally efficient classification, annotation and prediction algorithms are rapidly improving our understanding of protein sequence-structure-function relationships. Analysis of such relationships often helps us understand the role of novel sequence or structural features in the regulation of a particular function, including molecular pathways and various disease mechanisms. Cells attain their functional integrity with the help of molecular mechanisms including protein-protein interactions [1–7]. Protein folding and subsequent oligomerization of protein chains facilitate such interactions in the cellular environment. Protein-protein interactions play a key role in mediating higher order oligomerization. They are diverse in nature and can be broadly classified as transient interactions, which are weak, and obligatory interactions, which are permanent. Based on sequence homology, two proteins with a high degree of similarity could interact and form a homodimer, whereas two distantly related proteins could form a heterodimer [8, 9]. 3D domain swapping is a unique protein structural mechanism observed in homodimers or higher order oligomers with a specific type of interaction, in which a segment is mutually swapped between two protein chains. 3D domain swapping has also been observed in protein structures in hetero-oligomeric conformations. It has been associated with several proteins involved in diverse functional events and disease pathways. Previous studies of 3D domain swapping using structural properties indicated that swapped structures share structural features with oligomeric protein complexes and are primarily associated with deposition diseases [10–13]. Prior studies on 3D domain swapping focused on small sets of proteins, largely due to the unavailability of a curated database of proteins involved in 3D domain swapping.
In this study, we present results from an analysis of proteins encoded in the human genome and curated in the 3DSwap knowledgebase, using multiple biological enrichment methods. 3DSwap is the first database to catalogue proteins involved in 3D domain swapping. The database was developed using a literature-based protein structural curation strategy that combined manual curation and a structural bioinformatics pipeline to gather data pertaining to 3D domain swapping. We used the complete set of human proteins from the 3DSwap database and examined statistically significant domains, biological processes, cellular components, molecular functions, biological pathways and diseases using enrichment methods. From a bioinformatics perspective, this manuscript is a case study that leverages robust bioinformatics methods to gain new functional and therapeutic insights from a protein structural mechanism.
3D domain swapping: Pathophysiological basis of deposition diseases
3D domain swapping is a unique protein structural phenomenon with implications in function, form and disease (Figure 1). Only two scenarios of 3D domain swapping (the domain swapped dimer and open-ended oligomeric swapping) are shown in the figure. Other scenarios such as double domain swapping, cyclic swapping and entirely swapped structures have been observed in proteins with swapped oligomeric architecture. Protein structures involved in 3D domain swapping are characterized by hinge regions and swapped regions. 3D domain swapping is associated with mutual swapping of a structural segment between two or more chains in a protein oligomer. This mechanism has been observed in a diverse group of proteins that mediate different structural, functional and physiological mechanisms. 3D domain swapping was primarily defined as a mechanism for functional or structural oligomeric assembly; more recently it has been described as a molecular mechanism behind protein aggregation and has thus been implicated as a pathogenic basis of deposition or conformational diseases, amyloidosis, serpinopathies and proteinopathies. Proteins involved in such diseases have higher aggregation propensities and are involved in the formation of highly specific aggregates of a single protein. From a structural perspective, some of these aggregates are generated by the 3D domain swapping mechanism [12–14, 17–33]. From a clinical perspective, such diverse disease manifestations mediated by a single structural mechanism are of great interest. It remains unclear whether 3D domain swapping is exclusively associated with such conformational diseases or whether it may also play a crucial role in mediating complex diseases.
Dataset of human proteins involved in domain swapping
Despite numerous biochemical and computational studies focused on the molecular basis of 3D domain swapping [11, 34–52], a detailed account of the functional repertoire, including protein domains, Gene Ontology (GO) terms, biological pathways and diseases associated with proteins in swapped conformation, has not been reported. The mechanism of 3D domain swapping has been reported in different evolutionary lineages, and structures in swapped conformation have been identified in multiple organisms, with a large proportion characterized from eukaryotes. Hitherto, proteome-wide analysis of this unique structural mechanism was impossible due to the lack of a curated, proteome-level dataset. Recently, we integrated in-depth literature curation and structural bioinformatics analytics to curate proteins involved in 3D domain swapping from the Protein Data Bank (PDB) and reported a knowledgebase of proteins involved in 3D domain swapping. 3DSwap offers a compendium of 293 protein structures with delineated hinge regions and swapped regions, and is an ideal resource to study the functional and structural implications of domain swapping.
Inference from biological and biomedical ontologies using enrichment analysis
Enrichment analysis plays an important role in knowledge-based bioinformatics approaches [54, 55]. In this study, enrichment analysis was performed using annotations derived from Pfam domains, GO [57–59], KEGG pathways and Disease Ontology (DO) [61, 62]. Enrichment analysis in bioinformatics is a collective term for a group of statistical algorithms developed to understand the global trends of a subset of genes or gene products compared to a background population (for example, all genes in the human genome, all proteins encoded in the human genome, all genes tested in a given experiment or all genes included on a gene expression platform). Huang et al. suggested a nomenclature that classifies enrichment tools as singular enrichment analysis (SEA), gene set enrichment analysis (GSEA) and modular enrichment analysis (MEA). The fundamental difference between these three classes of algorithms lies in the manner in which the enrichment P-value is calculated. In the SEA-based approach, the annotation terms of a pre-selected subset of genes are assessed one at a time against a list of background genes. An enrichment P-value is calculated by comparing the observed frequency of an annotation term with the frequency expected by chance, and individual terms that pass the P-value cut-off (P-value ≤ 0.05) are reported as enriched. BiNGO, FuncAssociate and Onto-Express [66, 67] are examples of SEA-based enrichment analysis tools. GSEA approaches are similar, but consider all genes during the enrichment analysis instead of only the genes selected by a pre-defined threshold, as in the SEA approach. MEA approaches additionally take the relationships between annotation terms into account. For example, Gene Ontology terms are connected by relationships, and MEA-based programs like Ontologizer and topGO exploit these relationships. These programs were reported to attain better sensitivity and specificity due to the consideration of GO term relationships.
GSEA is an enrichment-based computational method to determine whether an a priori defined set of genes shows statistically significant differences between two biological states. For example, a set of human genes differentially regulated in a gene expression analysis of a particular type of cancer can be considered as the a priori gene list, and the background can be defined using one or more datasets compiled in the Molecular Signatures Database (MSigDB). A variety of tools are currently available for functional enrichment analysis; a recent review cited 69 such tools, and the list is growing rapidly. The majority of these tools employ statistical methods based on Fisher's exact test [71, 72], the hypergeometric function, the binomial test or χ2 tests, or a combination of such methods as implemented in tools like GFINDER and Onto-Express [66, 67], to test for significant association between GO terms and the gene list with respect to the background distribution. The concept of gene set enrichment analysis has been incorporated into various programs that use biological or functional annotations of genes and gene products to perform enrichment calculations using ontologies and annotations. Gene Ontology enrichment and pathway enrichment analyses employ similar conceptual and statistical methods to understand the functional and molecular roles of a subset of genes or proteins, and have been found to be very efficient in summarizing trends in functional diversity or similarity. Such approaches are routinely employed in gene expression studies, high-throughput screening experiments and genome-wide association studies (GWAS) [75, 76].
Gene Ontology enrichment and pathway enrichment analyses, using ontologies or annotations for a subset of genes characterized in an experimental or computational study, are generally applied to infer new biological insights that would otherwise be impossible with candidate gene-centric approaches. Because the statistical methods used in enrichment analysis are generic, the current set of enrichment algorithms can be used to infer enrichment from any annotation database. Enrichment calculations are currently available for various types of annotations. Annotations of protein domains (Pfam, SMART), pathways (KEGG, GenMAPP) and human gene-disease associations from Online Mendelian Inheritance in Man (OMIM) are currently used for enrichment analysis. Similar to GO, any ontology (for example, Disease Ontology (DO)) maintained by the Open Biological and Biomedical Ontologies (OBO) Foundry, or its mappings or derivatives (for example, Disease Ontology Lite (DOLite)), can be effectively used for enrichment analysis.
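As a point of reference, the one-sided test shared by the Fisher and hypergeometric approaches mentioned above can be written as the upper tail of the hypergeometric distribution. The symbols are notation introduced here for illustration and are not taken from the cited tools: N is the number of background genes, M the number of background genes annotated to a term, n the size of the gene list and k the number of genes in the list annotated to the term.

```latex
P(X \ge k) \;=\; \sum_{i=k}^{\min(n,\,M)} \frac{\binom{M}{i}\,\binom{N-M}{\,n-i\,}}{\binom{N}{n}}
```

A small P(X ≥ k) means that observing k or more annotated genes in the list would be unlikely if the list had been drawn at random from the background.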
Enrichment tools, ontologies, annotation databases and statistical methods
This study utilized four tools, two ontologies and two annotation databases to infer functional and disease insights from the list of human proteins involved in 3D domain swapping. Protein domain enrichment was performed using DAVID 6.7. Protein domain annotations were derived from the Pfam database, a database of evolutionarily conserved protein domains. Ontologizer 2.0, a GO term enrichment tool with a command-line interface and an improved statistical method for deriving GO terms enriched in a given list of proteins, was used for GO term enrichment. SubPathwayMiner, an R package that internally handles KEGG annotations for pathway enrichment analysis, was used to derive statistically significant pathways associated with the dataset. Enriched disease ontology terms were identified using the Functional Disease Ontology server, which consults Disease Ontology and its derivative Disease Ontology Lite to identify significant diseases. Our null hypothesis (H0) was that the curated proteins with swapped conformations are not associated with any class of protein domains, Gene Ontology terms, KEGG pathways or Disease Ontology terms. We tested the null hypothesis individually using the four tools and their associated annotations or ontologies. P-values from the enrichment analyses were obtained using the default statistical settings of each tool. Protein domain enrichment P-values were derived from DAVID using a modified Fisher exact P-value called the EASE score. GO term enrichment P-values were derived using Ontologizer 2.0 and corrected using the Bonferroni method. KEGG pathway enrichment was performed using SubPathwayMiner, which provides False Discovery Rate (FDR) corrected P-values. Disease enrichment analysis was performed using the Functional Disease Ontology server, which uses Fisher's exact test to derive P-values.
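The two kinds of multiple testing correction named above can be illustrated with a minimal Python sketch. It assumes the Benjamini-Hochberg procedure for the FDR correction, which the text does not specify, and the raw P-values are made-up numbers rather than results from this study.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each P-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR adjustment (assumed here; the tool's method may differ)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four annotation terms.
raw = [0.001, 0.01, 0.03, 0.2]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni controls the chance of any false positive and is the more conservative of the two; an FDR procedure tolerates a controlled proportion of false positives among the reported terms.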
Curated dataset of human proteins involved in 3D domain swapping
Classification of proteins in the 3DSwap knowledgebase based on the SOURCE record from PDB, with subsequent mapping using SIFTS annotations, revealed that 75 of the 293 structures reported in 3DSwap were from Homo sapiens. A cursory look at the taxonomic spread of the 3DSwap database indicates that the largest fraction is from humans (25.6%) (Figure 2). We used literature-curated structures from the 3DSwap database with delineated 'hinge' and 'swapped' regions for the analysis (see Additional file 1: Supplementary Table 1 for the list of proteins used in this study). The 75 PDB identifiers were mapped to UNIPROT and KEGG database identifiers using the Protein ID cross-reference (PICR) service and custom Perl scripts. Out of the 75 curated protein structures in swapped conformation retrieved from the 3DSwap knowledgebase, 45 proteins were unique (see Table 1), because several human proteins in the curated dataset were represented by redundant structures. To avoid potential functional bias, only unique human proteins (45/75 structures) were used in this analysis. A graphical summary of the bioinformatics pipeline employed in this study is depicted in Figure 3.
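The redundancy filter described above amounts to grouping structures by the protein they map to and keeping one entry per protein. The following Python sketch illustrates the idea; the identifiers are placeholders invented for the example (not real PDB or UniProt entries), and the function name is hypothetical rather than part of the authors' Perl scripts.

```python
def collapse_to_unique(structure_to_protein):
    """Group structure identifiers by the protein identifier they map to."""
    groups = {}
    for structure_id, protein_id in structure_to_protein:
        groups.setdefault(protein_id, []).append(structure_id)
    return groups


# Placeholder mapping: five structures that resolve to three distinct proteins.
toy_mapping = [
    ("STRUCT_1", "PROT_A"),
    ("STRUCT_2", "PROT_A"),
    ("STRUCT_3", "PROT_B"),
    ("STRUCT_4", "PROT_C"),
    ("STRUCT_5", "PROT_C"),
]
unique = collapse_to_unique(toy_mapping)
print(len(unique), "unique proteins from", len(toy_mapping), "structures")

# The human fraction quoted in the text: 75 of 293 structures.
human_fraction = round(100 * 75 / 293, 1)
print(human_fraction)
```

Keeping one entry per protein prevents a heavily crystallized protein from contributing the same annotations many times over.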
Enrichment analysis of human proteins involved in 3D domain swapping
Protein domain enrichment analysis was performed using DAVID. KEGG pathway analysis was performed using SubPathwayMiner and Disease Ontology analysis was performed using the Functional Disease Ontology server [61, 62].
Protein domain enrichment analysis
To perform protein domain enrichment analysis, domains were identified in proteins involved in 3D domain swapping and a list of protein domains was obtained. This list was compared against a reference dataset of protein domains associated with the complete human proteome. Protein domain enrichment analysis was performed to identify statistically significant, conserved functional modules associated with proteins involved in 3D domain swapping. A dataset of 45 UniProt identifiers was used for protein domain enrichment analysis with Pfam annotations. DAVID version 6.7 with default settings was used for the analysis.
Gene ontology enrichment analysis
GO term enrichment analysis in this study was performed using Ontologizer 2.0, a multifunctional tool for GO term enrichment analysis. Ontologizer was selected because of the improved statistical methods incorporated in it. A brief description of the method is provided here. Generic GO enrichment tools calculate the enrichment of a GO term with respect to the list of genes in the dataset and the background population, using the probability of drawing the same or a higher number of genes annotated to a given term. This basic concept is implemented using a statistical test involving the upper tail of the hypergeometric distribution, or equivalently a one-tailed Fisher's exact test. Such methods do not consider relationships between the annotation terms. GO is defined as a directed acyclic graph (DAG), with various levels of relationships between the terms. Due to the DAG architecture of GO, a gene or gene product annotated with a term x is also annotated to all parent terms of x, and this often leads to false enrichment calls. Such relationships (for example: is a, part of, has part, regulates) are taken into account in Ontologizer 2.0 using parent-child inheritance concepts. A detailed description of the statistical method implemented in Ontologizer 2.0 can be found elsewhere [68, 84]. A dataset consisting of 45 UniProt identifiers was used for species-specific (Homo sapiens) GO enrichment analysis and pathway analysis. GO enrichment analysis was performed in Ontologizer 2.0 with the following parameters: Gene Ontology annotations were derived from human-specific annotation data (gene_association.goa_human), the multiple testing correction was set to the "Bonferroni correction" method, the enrichment calculation was set to Parent-child-Intersection, and the number of re-sampling steps was set to 1000. The Gene Ontology was defined by the 33,738 terms and 59,508 relations recorded in the gene_ontology.obo file (downloaded in February 2011).
Background population for statistical tests was defined using 18,257 proteins encoded in the human genome with Gene Ontology annotations.
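The contrast between a term-by-term test and a parent-child test can be sketched in a few lines of Python. This is a simplified reading of the parent-child idea on a toy three-term DAG with invented gene labels; it is not Ontologizer's code, and every function and variable name below is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(N, M, n, k):
    """P(X >= k) when n genes are drawn from N, of which M carry the term."""
    total = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, min(n, M) + 1))
    return total / comb(N, n)


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links in the DAG."""
    seen, stack = set(), [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def propagate(direct, parents):
    """Annotate each gene to its direct terms plus all ancestors of those terms."""
    return {g: set(ts).union(*(ancestors(t, parents) for t in ts))
            for g, ts in direct.items()}


def term_for_term_p(term, study, population, ann):
    """Classic test: compare the study list against the whole population."""
    M = sum(term in ann[g] for g in population)
    k = sum(term in ann[g] for g in study)
    return hypergeom_upper_tail(len(population), M, len(study), k)


def parent_child_p(term, study, population, ann, parents):
    """Parent-child (intersection) test: condition on genes annotated to all parents."""
    pas = parents.get(term, ())
    pop_pa = {g for g in population if all(p in ann[g] for p in pas)}
    return term_for_term_p(term, study & pop_pa, pop_pa, ann)


# Toy DAG: B is a child of A, and A is a child of ROOT.
parents = {"B": ["A"], "A": ["ROOT"]}
direct = {"g1": ["B"], "g2": ["B"], "g3": ["A"], "g4": ["A"],
          "g5": ["ROOT"], "g6": ["ROOT"], "g7": ["ROOT"], "g8": ["ROOT"]}
ann = propagate(direct, parents)
population = set(direct)
study = {"g1", "g2", "g3"}

print(term_for_term_p("B", study, population, ann))
print(parent_child_p("B", study, population, ann, parents))
```

In this toy case the child term B looks moderately enriched when tested against the whole population, but much less so once the test is restricted to genes already annotated to its parent A, which is the kind of inherited signal the parent-child approach is meant to discount.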
KEGG based pathway enrichment analysis of proteins in human proteome with swapped conformation
Pathway enrichment analysis using KEGG pathway annotations was performed to understand the role of proteins in 3D domain swapped conformation in various biological pathways. UNIPROT identifiers were mapped to Entrez gene identifiers using custom Perl scripts and used as input to the R package SubPathwayMiner for pathway enrichment analysis. Pathways associated with these proteins were obtained from the KEGG pathway database and compared to a reference set comprising the full list of proteins and their corresponding pathways annotated in the KEGG databases.
Disease enrichment analysis of proteins in swapped conformation using disease ontology
The disease ontology term enrichment analysis was performed using the Functional Disease Ontology server. The list of 45 human genes mapped to UNIPROT identifiers was converted to Entrez gene identifiers using custom Perl scripts. The list of Entrez identifiers was used as input for Disease Ontology enrichment to understand the role of human proteins with swapped conformation in various diseases. Out of the 45 genes in the list, 35 were found to be associated with at least one disease. Briefly, the disease association of each gene in the human genome was annotated using the Disease Ontology and peer-reviewed evidence from Gene Reference Into Function (GeneRIF) [61, 62, 85]. A condensed version of the Disease Ontology, Disease Ontology Lite, was used for the statistical analysis. As in the Gene Ontology analysis, the significance of each disease association was evaluated using Fisher's exact test.
3D domain swapping is a structural mechanism employed by a variety of protein structures to form oligomeric assemblies. These oligomers are often associated with aggregation diseases or proteinopathies in humans. Parkinson's disease and Alzheimer's disease are two major neurodegenerative diseases linked to the phenotypic impact of 3D domain swapping. Hitherto, no comprehensive study has analyzed the impact of all proteins involved in 3D domain swapping from a proteome-wide or genome-wide perspective, due to the unavailability of a well-defined, curated dataset. We performed an initial investigation of proteins involved in 3D domain swapping at the level of protein domains, Gene Ontology, KEGG pathways and Disease Ontology. Our approach helped to identify the enriched protein domains, Gene Ontology terms, biological pathways and Disease Ontology terms associated with these proteins and to understand their role in mediating various human diseases.
Statistically significant protein domains (Table 1), GO terms (Tables 2, 3, 4), KEGG pathways (Table 5) and DO terms (Table 6) associated with swapped proteins encoded in the human proteome are provided. Critical aspects of the statistically significant evolutionarily conserved domains, GO terms, KEGG pathways and DO terms associated with human proteins in swapped conformation are summarized in the 'Discussion' section.
Proteins involved in 3D domain swapping represent a large collection of proteins with a variety of functional and regulatory roles in the cell. Due to limitations in crystallizing structures in the swapped conformation, the currently available repertoire of proteins in the swapped conformation may represent only a small fraction of the proteins that perform their molecular roles via 3D domain swapping. Machine learning algorithms and computational approaches may help to predict more proteins with features of 3D domain swapping [11, 52]. Here we discuss the primary insights obtained from the initial investigation of proteins involved in 3D domain swapping. The present results from the human proteome suggest that future drug design studies focusing on the disease categories or pathways associated with 3D domain swapping should consider the structural implications of this mechanism and of associated mechanisms like macromolecular crowding and protein aggregation.
Functional repertoire of proteins involved in 3D domain swapping
Protein domain enrichment analysis revealed that five protein domain families were enriched in the dataset (see Table 1). These include the protein tyrosine kinase domain, a member of the kinase domain family involved in signal transduction; the cystatin domain, a member of the cysteine protease inhibitor family; the leucine-rich repeat C-terminal domain, a unique motif that mediates protein-protein interaction; guanylate kinase, which catalyzes the adenosine triphosphate (ATP)-dependent phosphorylation of guanosine monophosphate (GMP) to guanosine diphosphate (GDP); and the immunoglobulin I-set domain found in several cell adhesion molecules. We noted that the significantly enriched conserved protein domains associated with 3D domain swapping play pivotal roles in various signaling pathways, which points to a role for domain swapping in multiple signal transduction events.
Statistically significant GO terms associated with swapped proteins
GO term enrichment analysis revealed that multiple terms in the three GO categories were associated with swapped proteins encoded in the human proteome. These include 31 GO terms in the biological process category (Table 2), five GO terms in the cellular component category (Table 3) and 12 terms in the molecular function category (Table 4). DAG structures with highlighted GO terms in the biological process (Additional file 1: Figure S1), cellular component (Figure 4) and molecular function (Additional file 1: Figure S2) categories are provided. The biological process category contains several non-specific and specific GO terms that point towards a functional understanding of the proteins involved in 3D domain swapping. Top "Biological Process" terms include viral reproduction and protein amino acid hydroxylation. Two cellular transport related terms under the "Cellular Component" category (membrane raft and trans-Golgi network), along with cytoplasm and cell periphery, were also found to be associated with human proteins involved in 3D domain swapping. Enriched molecular function terms indicate that human proteins involved in 3D domain swapping take part in multiple signaling and binding activities, including chromatin binding, protein kinase activity and protein dimerization activity. This also indicates a specific role for proteins involved in swapping and their association with mechanisms like oligomerization, macromolecular crowding and aggregation, which are considered to be cellular mechanisms linked to 3D domain swapping. GO term enrichment analysis thus provided a cursory view of the biological processes, cellular components and molecular functions associated with 3D domain swapping.
Implications of 3D domain swapping in biochemical pathways
Results from pathway enrichment analysis using the BioConductor-based SubPathwayMiner package indicate that proteins in swapped conformation participate in multiple biological pathways. Results from pathway enrichment analysis using KEGG annotations are provided in Table 5. The KEGG database classifies pathways using a top-level functional classification, the KEGG BRITE hierarchy. According to this hierarchy, human pathways are classified into six categories (Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, Organismal Systems and Human Diseases). The current analysis reveals that proteins with 3D domain swapped conformations are present in all six classes, and that significantly enriched KEGG pathways were observed in all classes except Genetic Information Processing. Proteins involved in 3D domain swapping are observed in multiple subcategories of the KEGG pathway hierarchy (see Figure 5). KEGG pathway analysis indicated that proteins in the swapped conformation are statistically significantly enriched in four subclasses of the Human Diseases class, viz. Cancers, Immune System Diseases, Infectious Diseases and Neurodegenerative Diseases. These proteins are also involved in other disease subclasses of the KEGG BRITE hierarchy, such as Cardiovascular Diseases (see Table 5).
Disease implications of proteins involved in 3D domain swapping
Since KEGG pathways represent biochemical pathways and disease pathways in a single framework, a further detailed analysis of human proteins in swapped conformation was performed using a dedicated ontology that defines human diseases. Functional disease ontology annotation tool that uses Disease Ontology-derived "Disease Ontology-lite" and GeneRIFs were used in this analysis due to the brevity of the terms and availability of significant gene-disease association data. Enrichment analysis using disease ontology provided a detailed overview of the statistically significant association between gene-products in the swapped conformation with various disease categories. Using the current subset of data, five major classes of diseases were observed in the disease Ontology-based enrichment analysis as follows: cancer (prostate cancer, thyroid cancer, breast cancer and neoplasm metastasis), diseases of the respiratory or pulmonary system (asthma, bronchial hyperreactivity, pulmonary alveolar proteinosis), degenerative diseases of the central nervous system (Amyotrophic lateral sclerosis, Parkinson's Disease), vascular disease (atherosclerosis, hypertension) and encephalitis (rabies). Neurodegenerative diseases are well-known to have strong association with 3D domain swapping, but insights into other diseases indicates that there could be more proteins with disease association and 3D domain swapping, beyond the currently well-known group of conformational diseases. Detailed table with Disease Ontology term (disease), genes associated with each disease and P-value for the association is provided in Table 6. Five of the significantly enriched diseases in the dataset and the genes associated with the diseases are provided as a network (Figure 6). Network is defined using genes as nodes and disease shared between the genes are considered as common edge between two genes. Disease ontology is useful to map disease relationships across human genes and diseases. 
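The network definition above (genes as nodes, an edge wherever two genes share a disease) can be illustrated with a short Python sketch. The gene labels and gene-disease assignments below are placeholders invented for the example and do not reproduce Table 6 or Figure 6; the function name is likewise hypothetical.

```python
from itertools import combinations


def shared_disease_network(disease_to_genes):
    """Return {(gene_a, gene_b): set of diseases shared by the two genes}."""
    edges = {}
    for disease, genes in disease_to_genes.items():
        # Every pair of genes annotated to the same disease gets an edge.
        for a, b in combinations(sorted(set(genes)), 2):
            edges.setdefault((a, b), set()).add(disease)
    return edges


# Placeholder associations, not data from this study.
toy_associations = {
    "asthma": ["GENE_A", "GENE_B"],
    "hypertension": ["GENE_B", "GENE_C", "GENE_A"],
    "rabies": ["GENE_D"],
}
network = shared_disease_network(toy_associations)
for (a, b), diseases in sorted(network.items()):
    print(a, b, sorted(diseases))
```

A gene linked to a disease that no other gene shares, such as GENE_D here, contributes no edge and would appear as an isolated node.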
To expand this disease association to clinically relevant information, we curated the disease ontology terms associated with 3D domain swapping to derive the associated International Classification of Diseases - 9 (ICD-9) codes. Diseases under the following ICD-9 codes 001-139 (infectious and parasitic diseases), 140-239: (neoplasms), 320-359 (diseases of the nervous system), 390-459: diseases of the circulatory system, 460-519 (diseases of the respiratory system). This further helped to understand major classes of clinically relevant disease phenotypes mediated by a unique molecular mechanism.
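The ICD-9 ranges listed above lend themselves to a simple lookup, sketched here in Python. The range boundaries and labels are copied from the text; the function name and the handling of non-numeric codes are assumptions made for illustration.

```python
# (first code, last code, chapter label) as listed in the text.
ICD9_RANGES = [
    (1, 139, "infectious and parasitic diseases"),
    (140, 239, "neoplasms"),
    (320, 359, "diseases of the nervous system"),
    (390, 459, "diseases of the circulatory system"),
    (460, 519, "diseases of the respiratory system"),
]


def icd9_chapter(code):
    """Map an ICD-9 code such as '493' or '401.9' to one of the ranges above."""
    try:
        major = int(str(code).split(".")[0])  # keep the three-digit category only
    except ValueError:
        return None  # supplementary codes (for example V or E codes) are not covered
    for low, high, label in ICD9_RANGES:
        if low <= major <= high:
            return label
    return None  # the code falls outside the five ranges reported here


print(icd9_chapter("493"))    # 493 is the ICD-9 category for asthma
print(icd9_chapter("401.9"))  # 401 is the ICD-9 category for essential hypertension
```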
Domain swapping is a key pathophysiological mechanism mediating conformational disease. A detailed account of functional repertoire, molecular pathways and spectrum of diseases affected by this mechanism remains elusive. We used enrichment calculations to understand the aspects using a curated dataset of proteins involved in 3D domain swapping. Our analysis was performed using a dataset of 45 unique human proteins derived from 3DSwap knowledgebase . This dataset will be growing in the future as structural characterization of human proteins involved in domain swapping is rapidly increasing. Numerous structures are being identified and more proteins with swapped conformation may found to be associated with domain swapping. Performing analysis using the approaches we employed in the future may help to identify additional protein domains, Gene Ontology terms, molecular pathways and human diseases.
Due to oligomeric features of swapping, earlier studies have indicated that 3D domain swapping plays a crucial role in conformational diseases or deposition diseases and proteinopathies. There was limited insight on structure-function relationship of proteins involved in domain swapping due to unavailability of a large dataset to objectively analyze functional or disease implications implicated by 3D domain swapping. Proteins encoded in the human genome and reported to be involved in 3D domain swapping were analyzed in detail to understand the role of gene products in various classes of diseases, beyond conformation diseases or proteinopathies. Mapping and enrichment analysis of human proteins involved in 3D domain swapping to KEGG pathways in 'disease' class and Disease Ontology indicates that these proteins play a significant role in various other diseases categories along with well-known neurodegenerative or conformational diseases.
The availability of genome-scale sequence data and annotations is considered an ideal resource for gaining new insights from a plethora of biological data. New insights into the functional aspects of structural mechanisms can be gained by mapping and database-wide enrichment analysis using annotations. In a similar way, new insights into functional mechanisms may be gained using the knowledge-based approaches employed in this study. In summary, the present study reports the application of knowledge-based approaches to gain new functional insights into a structural mechanism. Starting from an initial dataset of protein structures, it shows the importance and impact of data integration and data mining in deriving biologically relevant interpretations of the global trends of a structural mechanism from sequence, functional and disease perspectives. Further insights are obtained from a translational perspective by focusing on proteins involved in 3D domain swapping in the human genome. 3D domain swapping is a unique phenomenon and, depending on the swapped conformation, may affect the availability of the active sites and binding sites required for biological function. Future drug design studies may therefore need to consider these aspects when developing therapeutics for the disease categories in which 3D domain swapping is observed.
Clinical relevance of 3D domain swapping
In the current era of personal genomes and network medicine, clinical and therapeutic research is utilizing integrated approaches to understand disease states and pathophysiological mechanisms. Complex disease states are often triggered by perturbations in multiple pathways by multiple genes [91–94]. Protein structures and structural mechanisms play an important role in the phenotypic impact of various diseases and signaling pathways [95–101]. Protein structural information is routinely utilized to identify drug targets that will help in the development of effective drugs [102–104]. New approaches will be required to target proteins, or biochemical pathways containing proteins, in the swapped conformation. Our study illustrates the application of biological and biomedical enrichment tools, ontologies and annotations to understand the functional role and disease implications of an important structural mechanism from the global perspective of the human proteome.
Insights obtained from our Disease Ontology analysis indicate that 3D domain swapping is not confined to neurodegenerative diseases; proteins in swapped conformation also play a significant role in several other classes of diseases, such as cancer, vascular disease and pulmonary disease. The enrichment results discussed in this paper will be useful for future studies of these diseases from biochemical, functional, structural and therapeutic perspectives. Our analysis also indicates that further genome-specific analysis of proteins involved in 3D domain swapping, using a comparative genome analysis framework, may add to the understanding of the functional, structural and pathophysiological manifestations of 3D domain swapping.
3D domain swapping is an important structural mechanism associated with a diverse set of proteins involved in a multitude of biological processes, molecular functions and diseases, including proteinopathies. This phenomenon is often studied from the perspective of protein structure; its impact on biological pathways, its correlations with biological functions and its association with classes of diseases other than conformational diseases were largely unknown. We performed a knowledge-based analysis of human proteins involved in 3D domain swapping to find the key functions, pathways and diseases associated with this mechanism. Our study was limited to 45 unique proteins involved in 3D domain swapping. 3D domain swapping is a functionally relevant phenomenon due to its primary role in protein oligomerization, and proteins with swapped oligomeric states are being identified on a regular basis using crystallography experiments. Effective algorithms that can predict swapping from structural and sequence information may also help to identify more proteins in swapped conformation. As more proteins are characterized in swapped conformation, performing such knowledge-based analysis using new proteins, improved annotations and enhanced ontologies may reveal additional functional classes, pathways and diseases. In summary, we presented results from an initial investigation to understand the conserved protein domains, functional repertoire, pathways and diseases mediated by 3D domain swapping in the human proteome.
May AC, Johnson MS, Rufino SD, Wako H, Zhu ZY, Sowdhamini R, Srinivasan N, Rodionov MA, Blundell TL: The recognition of protein structure and function from sequence: adding value to genome data. Philos Trans R Soc Lond B Biol Sci. 1994, 344 (1310): 373-381. 10.1098/rstb.1994.0076.
Holm L, Sander C: Mapping the protein universe. Science. 1996, 273 (5275): 595-603. 10.1126/science.273.5275.595.
Grant A, Lee D, Orengo C: Progress towards mapping the universe of protein folds. Genome Biol. 2004, 5 (5): 107. 10.1186/gb-2004-5-5-107.
Reddy Chilamakuri CS, Sekhar SK, Offmann B, Sowdhamini R: PURE: a web server for querying the relationship between pre-existing domains and unassigned regions in proteins. Nature Protocol Exchange. 2007, 10.1038/nprot.2007.486.
Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995, 247 (4): 536-540.
Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008, 36 (Database issue): D419-425.
Nooren IM, Thornton JM: Diversity of protein-protein interactions. EMBO J. 2003, 22 (14): 3486-3492. 10.1093/emboj/cdg359.
Nooren IM, Thornton JM: Structural Characterisation and Functional Significance of Transient Protein-Protein Interactions. J Mol Biol. 2003, 325 (5): 991-1018. 10.1016/S0022-2836(02)01281-0.
Jones S, Thornton JM: Principles of protein-protein interactions. Proc Natl Acad Sci USA. 1996, 93 (1): 13-20. 10.1073/pnas.93.1.13.
Jones S, Marin A, Thornton JM: Protein domain interfaces: characterization and comparison with oligomeric protein interfaces. Protein Eng. 2000, 13 (2): 77-82. 10.1093/protein/13.2.77.
Shameer K, Pugalenthi G, Kandaswamy KK, Sowdhamini R: 3dswap-pred: Prediction of 3D Domain Swapping from Protein Sequence Using Random Forest Approach. Protein Pept Lett. 2011, 18 (10): 1010-20. 10.2174/092986611796378729.
Ding F, Prutzman KC, Campbell SL, Dokholyan NV: Topological determinants of protein domain swapping. Structure. 2006, 14 (1): 5-14. 10.1016/j.str.2005.09.008.
Dehouck Y, Biot C, Gilis D, Kwasigroch JM, Rooman M: Sequence-structure signals of 3D domain swapping in proteins. J Mol Biol. 2003, 330 (5): 1215-1225. 10.1016/S0022-2836(03)00614-4.
Bennett MJ, Sawaya MR, Eisenberg D: Deposition diseases and 3D domain swapping. Structure. 2006, 14 (5): 811-824. 10.1016/j.str.2006.03.011.
Janowski R, Kozak M, Abrahamson M, Grubb A, Jaskolski M: 3D domain-swapped human cystatin C with amyloidlike intermolecular beta-sheets. Proteins. 2005, 61 (3): 570-578. 10.1002/prot.20633.
Yamasaki M, Li W, Johnson DJ, Huntington JA: Crystal structure of a stable dimer reveals the molecular basis of serpin polymerization. Nature. 2008, 455 (7217): 1255-1258. 10.1038/nature07394.
Bennett MJ, Choe S, Eisenberg D: Domain swapping: entangling alliances between proteins. Proc Natl Acad Sci USA. 1994, 91 (8): 3127-3131. 10.1073/pnas.91.8.3127.
Bennett MJ, Schlunegger MP, Eisenberg D: 3D domain swapping: a mechanism for oligomer assembly. Protein Sci. 1995, 4 (12): 2455-2468. 10.1002/pro.5560041202.
Heringa J, Taylor WR: Three-dimensional domain duplication, swapping and stealing. Curr Opin Struct Biol. 1997, 7 (3): 416-421. 10.1016/S0959-440X(97)80060-7.
Gouldson PR, Snell CR, Bywater RP, Higgs C, Reynolds CA: Domain swapping in G-protein coupled receptor dimers. Protein Eng. 1998, 11 (12): 1181-1193. 10.1093/protein/11.12.1181.
Balciunas D, Ronne H: Evidence of domain swapping within the jumonji family of transcription factors. Trends Biochem Sci. 2000, 25 (6): 274-276. 10.1016/S0968-0004(00)01593-0.
Ostermeier M, Benkovic SJ: Evolution of protein function by domain swapping. Adv Protein Chem. 2000, 55: 29-77.
Jaskolski M: 3D domain swapping, protein oligomerization, and amyloid formation. Acta Biochim Pol. 2001, 48 (4): 807-827.
Schymkowitz JW, Rousseau F, Wilkinson HR, Friedler A, Itzhaki LS: Observation of signal transduction in three-dimensional domain swapping. Nat Struct Biol. 2001, 8 (10): 888-892. 10.1038/nsb1001-888.
Hakansson M, Linse S: Protein reconstitution and 3D domain swapping. Curr Protein Pept Sci. 2002, 3 (6): 629-642. 10.2174/1389203023380459.
Liu Y, Eisenberg D: 3D domain swapping: as domains continue to swap. Protein Sci. 2002, 11 (6): 1285-1299. 10.1110/ps.0201402.
Rousseau F, Schymkowitz JW, Itzhaki LS: The unfolding story of three-dimensional domain swapping. Structure. 2003, 11 (3): 243-251. 10.1016/S0969-2126(03)00029-7.
Bennett MJ, Eisenberg D: The evolving role of 3D domain swapping in proteins. Structure. 2004, 12 (8): 1339-1341. 10.1016/j.str.2004.07.004.
Sanejouand YH: Domain swapping of CD4 upon dimerization. Proteins. 2004, 57 (1): 205-212. 10.1002/prot.20197.
Yang S, Cho SS, Levy Y, Cheung MS, Levine H, Wolynes PG, Onuchic JN: Domain swapping is a consequence of minimal frustration. Proc Natl Acad Sci USA. 2004, 101 (38): 13786-13791. 10.1073/pnas.0403724101.
Kingston RL, Vogt VM: Domain swapping and retroviral assembly. Mol Cell. 2005, 17 (2): 166-167. 10.1016/j.molcel.2005.01.002.
Yang S, Levine H, Onuchic JN, Cox DL: Structure of infectious prions: stabilization by domain swapping. FASEB J. 2005, 19 (13): 1778-1782. 10.1096/fj.05-4067hyp.
Gronenborn AM: Protein acrobatics in pairs-dimerization via domain swapping. Curr Opin Struct Biol. 2009, 19 (1): 39-49. 10.1016/j.sbi.2008.12.002.
Chahine J, Cheung MS: Computational studies of the reversible domain swapping of p13suc1. Biophys J. 2005, 89 (4): 2693-2700. 10.1529/biophysj.105.062679.
Cho SS, Levy Y, Onuchic JN, Wolynes PG: Overcoming residual frustration in domain-swapping: the roles of disulfide bonds in dimerization and aggregation. Phys Biol. 2005, 2 (2): S44-S55. 10.1088/1478-3975/2/2/S05.
Esposito L, Daggett V: Insight into ribonuclease A domain swapping by molecular dynamics unfolding simulations. Biochemistry. 2005, 44 (9): 3358-3368. 10.1021/bi0488350.
Picone D, Di Fiore A, Ercole C, Franzese M, Sica F, Tomaselli S, Mazzarella L: The role of the hinge loop in domain swapping. The special case of bovine seminal ribonuclease. J Biol Chem. 2005, 280 (14): 13771-13778. 10.1074/jbc.M413157200.
Seeliger MA, Spichty M, Kelly SE, Bycroft M, Freund SM, Karplus M, Itzhaki LS: Role of conformational heterogeneity in domain swapping and adapter function of the Cks proteins. J Biol Chem. 2005, 280 (34): 30448-30459. 10.1074/jbc.M501450200.
Yang S, Levine H, Onuchic JN: Protein oligomerization through domain swapping: role of inter-molecular interactions and protein concentration. J Mol Biol. 2005, 352 (1): 202-211. 10.1016/j.jmb.2005.06.062.
Guo Z, Eisenberg D: Runaway domain swapping in amyloid-like fibrils of T7 endonuclease I. Proc Natl Acad Sci USA. 2006, 103 (21): 8042-8047. 10.1073/pnas.0602607103.
O'Neill JW, Manion MK, Maguire B, Hockenbery DM: BCL-XL dimerization by three-dimensional domain swapping. J Mol Biol. 2006, 356 (2): 367-381. 10.1016/j.jmb.2005.11.032.
Benfield AP, Whiddon BB, Clements JH, Martin SF: Structural and energetic aspects of Grb2-SH2 domain-swapping. Arch Biochem Biophys. 2007, 462 (1): 47-53. 10.1016/j.abb.2007.03.010.
Wahlbom M, Wang X, Lindstrom V, Carlemalm E, Jaskolski M, Grubb A: Fibrillogenic oligomers of human cystatin C are formed by propagated domain swapping. J Biol Chem. 2007, 282 (25): 18318-18326. 10.1074/jbc.M611368200.
Garcia-Pino A, Martinez-Rodriguez S, Wahni K, Wyns L, Loris R, Messens J: Coupling of Domain Swapping to Kinetic Stability in a Thioredoxin Mutant. J Mol Biol. 2008, 385 (5): 1590-1599.
Malevanets A, Sirota FL, Wodak SJ: Mechanism and energy landscape of domain swapping in the B1 domain of protein G. J Mol Biol. 2008, 382 (1): 223-235. 10.1016/j.jmb.2008.06.025.
Park SH, Park HY, Sohng JK, Lee HC, Liou K, Yoon YJ, Kim BG: Expanding substrate specificity of GT-B fold glycosyltransferase via domain swapping and high-throughput screening. Biotechnol Bioeng. 2009, 102 (4): 988-94. 10.1002/bit.22150.
Sirota FL, Hery-Huynh S, Maurer-Stroh S, Wodak SJ: Role of the amino acid sequence in domain swapping of the B1 domain of protein G. Proteins. 2008, 72 (1): 88-104. 10.1002/prot.21901.
Hansen EH, Osmani SA, Kristensen C, Moller BL, Hansen J: Substrate specificities of family 1 UGTs gained by domain swapping. Phytochemistry. 2009, 70 (4): 473-482. 10.1016/j.phytochem.2009.01.013.
Pesenti ME, Spinelli S, Bezirard V, Briand L, Pernollet JC, Campanacci V, Tegoni M, Cambillau C: Queen bee pheromone binding protein pH-induced domain swapping favors pheromone release. J Mol Biol. 2009, 390 (5): 981-990. 10.1016/j.jmb.2009.05.067.
Miller KH, Karr JR, Marqusee S: A hinge region cis-proline in ribonuclease A acts as a conformational gatekeeper for C-terminal domain swapping. J Mol Biol. 2010, 400 (3): 567-578. 10.1016/j.jmb.2010.05.017.
Orlikowska M, Jankowska E, Kolodziejczyk R, Jaskolski M, Szymanska A: Hinge-loop mutation can be used to control 3D domain swapping and amyloidogenesis of human cystatin C. J Struct Biol. 2010, 173 (2): 406-13.
Shameer K, Pugalenthi G, Kandaswamy KK, Suganthan PN, Archunan G, Sowdhamini R: Insights into Protein Sequence and Structure-Derived Features Mediating 3D Domain Swapping Mechanism using Support Vector Machine Based Approach. Bioinformatics and Biology Insights. 2010, 4 (4): 33-42. [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2901629/]
Shameer K, Shingate PN, Manjunath SC, Karthika M, Pugalenthi G, Sowdhamini R: 3DSwap: curated knowledgebase of proteins involved in 3D domain swapping. Database (Oxford). 2011, 2011: [http://database.oxfordjournals.org/content/2011/bar042.full]
da Huang W, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009, 37 (1): 1-13. 10.1093/nar/gkn923.
Tipney H, Hunter L: An introduction to effective use of enrichment analysis software. Hum Genomics. 2010, 4 (3): 202-206. 10.1186/1479-7364-4-3-202.
Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, et al: The Pfam protein families database. Nucleic Acids Res. 2010, 38 (Database issue): D211-222.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, Apweiler R: The GOA database in 2009--an integrated Gene Ontology Annotation resource. Nucleic Acids Res. 2009, 37 (Database issue): D396-403.
Rhee SY, Wood V, Dolinski K, Draghici S: Use and misuse of the gene ontology annotations. Nat Rev Genet. 2008, 9 (7): 509-515. 10.1038/nrg2363.
Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28 (1): 27-30. 10.1093/nar/28.1.27.
Du P, Feng G, Flatow J, Song J, Holko M, Kibbe WA, Lin SM: From disease ontology to disease-ontology lite: statistical methods to adapt a general-purpose ontology for the test of gene-ontology associations. Bioinformatics. 2009, 25 (12): i63-i68. 10.1093/bioinformatics/btp193.
Osborne JD, Flatow J, Holko M, Lin SM, Kibbe WA, Zhu LJ, Danila MI, Feng G, Chisholm RL: Annotating the human genome with Disease Ontology. BMC Genomics. 2009, 10 (Suppl 1): S6. 10.1186/1471-2164-10-S1-S6.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102 (43): 15545-15550. 10.1073/pnas.0506580102.
Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics. 2005, 21 (16): 3448-3449. 10.1093/bioinformatics/bti551.
Berriz GF, King OD, Bryant B, Sander C, Roth FP: Characterizing gene sets with FuncAssociate. Bioinformatics. 2003, 19 (18): 2502-2504. 10.1093/bioinformatics/btg363.
Draghici S, Khatri P, Bhavsar P, Shah A, Krawetz SA, Tainsky MA: Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res. 2003, 31 (13): 3775-3781. 10.1093/nar/gkg624.
Khatri P, Draghici S, Ostermeier GC, Krawetz SA: Profiling gene expression using onto-express. Genomics. 2002, 79 (2): 266-270. 10.1006/geno.2002.6698.
Bauer S, Grossmann S, Vingron M, Robinson PN: Ontologizer 2.0-a multifunctional tool for GO term enrichment analysis and data exploration. Bioinformatics. 2008, 24 (14): 1650-1651. 10.1093/bioinformatics/btn250.
Alexa A, Rahnenfuhrer J, Lengauer T: Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006, 22 (13): 1600-1607. 10.1093/bioinformatics/btl140.
Molecular Signatures Database. [http://www.broadinstitute.org/gsea/msigdb/index.jsp]
Al-Shahrour F, Diaz-Uriarte R, Dopazo J: FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics. 2004, 20 (4): 578-580. 10.1093/bioinformatics/btg455.
Newman JC, Weiner AM: L2L: a simple tool for discovering the hidden significance in microarray expression data. Genome Biol. 2005, 6 (9): R81.
Zhong S, Storch KF, Lipan O, Kao MC, Weitz CJ, Wong WH: GoSurfer: a graphical interactive tool for comparative analysis of large gene sets in Gene Ontology space. Appl Bioinformatics. 2004, 3 (4): 261-264. 10.2165/00822942-200403040-00009.
Masseroli M, Galati O, Pinciroli F: GFINDer: genetic disease and phenotype location statistical analysis and mining of dynamically annotated gene lists. Nucleic Acids Res. 2005, 33 (Web Server issue): W717-723. [http://nar.oxfordjournals.org/content/33/suppl_2/W717.full]
Cantor RM, Lange K, Sinsheimer JS: Prioritizing GWAS results: A review of statistical methods and recommendations for their application. Am J Hum Genet. 2010, 86 (1): 6-22. 10.1016/j.ajhg.2009.11.017.
Moore JH, Asselbergs FW, Williams SM: Bioinformatics challenges for genome-wide association studies. Bioinformatics. 2010, 26 (4): 445-455. 10.1093/bioinformatics/btp713.
Letunic I, Doerks T, Bork P: SMART 6: recent updates and new developments. Nucleic Acids Res. 2009, 37 (Database issue): D229-D232. [http://ukpmc.ac.uk/articles/PMC2686533/]
Salomonis N, Hanspers K, Zambon AC, Vranizan K, Lawlor SC, Dahlquist KD, Doniger SW, Stuart J, Conklin BR, Pico AR: GenMAPP 2: new features and resources for pathway analysis. BMC Bioinforma. 2007, 8: 217. 10.1186/1471-2105-8-217.
Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA: Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005, 33 (Database issue): D514-D517.
Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, et al: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007, 25 (11): 1251-1255. 10.1038/nbt1346.
da Huang W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4 (1): 44-57.
Cote RG, Jones P, Martens L, Kerrien S, Reisinger F, Lin Q, Leinonen R, Apweiler R, Hermjakob H: The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases. BMC Bioinforma. 2007, 8: 401. 10.1186/1471-2105-8-401.
Li C, Li X, Miao Y, Wang Q, Jiang W, Xu C, Li J, Han J, Zhang F, Gong B, et al: SubpathwayMiner: a software package for flexible identification of pathways. Nucleic Acids Res. 2009, 37 (19): e131. 10.1093/nar/gkp667.
Grossmann S, Bauer S, Robinson PN, Vingron M: Improved detection of overrepresentation of Gene-Ontology annotations with parent child analysis. Bioinformatics. 2007, 23 (22): 3024-3031. 10.1093/bioinformatics/btm440.
GeneRIF -- Gene Reference Into Function. [http://www.ncbi.nlm.nih.gov/projects/GeneRIF/]
Hanks SK, Quinn AM, Hunter T: The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science. 1988, 241 (4861): 42-52. 10.1126/science.3291115.
Rawlings ND, Barrett AJ: Evolution of proteins of the cystatin superfamily. J Mol Evol. 1990, 30 (1): 60-71. 10.1007/BF02102453.
Kobe B, Deisenhofer J: The leucine-rich repeat: a versatile binding motif. Trends Biochem Sci. 1994, 19 (10): 415-421. 10.1016/0968-0004(94)90090-6.
Stehle T, Schulz GE: Refined structure of the complex between guanylate kinase and its substrate GMP at 2.0 A resolution. J Mol Biol. 1992, 224 (4): 1127-1141. 10.1016/0022-2836(92)90474-X.
Smith DK, Xue H: Sequence profiles of immunoglobulin and immunoglobulin-like domains. J Mol Biol. 1997, 274 (4): 530-545. 10.1006/jmbi.1997.1432.
Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL: The human disease network. Proc Natl Acad Sci USA. 2007, 104 (21): 8685-8690. 10.1073/pnas.0701361104.
Barabasi AL, Gulbahce N, Loscalzo J: Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011, 12 (1): 56-68. 10.1038/nrg2918.
Vidal M, Cusick ME, Barabasi AL: Interactome networks and human disease. Cell. 2011, 144 (6): 986-998. 10.1016/j.cell.2011.02.016.
Ashley EA, Butte AJ, Wheeler MT, Chen R, Klein TE, Dewey FE, Dudley JT, Ormond KE, Pavlovic A, Morgan AA, et al: Clinical assessment incorporating a personal genome. Lancet. 2010, 375 (9725): 1525-1535. 10.1016/S0140-6736(10)60452-7.
Thomas PJ, Qu BH, Pedersen PL: Defective protein folding as a basis of human disease. Trends Biochem Sci. 1995, 20 (11): 456-459. 10.1016/S0968-0004(00)89100-8.
Prusiner SB: Molecular biology and pathogenesis of prion diseases. Trends Biochem Sci. 1996, 21 (12): 482-487. 10.1016/S0968-0004(96)10063-3.
Buxbaum JN: Diseases of protein conformation: what do in vitro experiments tell us about in vivo diseases?. Trends Biochem Sci. 2003, 28 (11): 585-592. 10.1016/j.tibs.2003.09.009.
Soto C, Estrada L, Castilla J: Amyloids, prions and the inherent infectious nature of misfolded protein aggregates. Trends Biochem Sci. 2006, 31 (3): 150-155. 10.1016/j.tibs.2006.01.002.
Blundell TL, Jhoti H, Abell C: High-throughput crystallography for lead discovery in drug design. Nat Rev Drug Discov. 2002, 1 (1): 45-54. 10.1038/nrd706.
Blundell TL, Sibanda BL, Montalvao RW, Brewerton S, Chelliah V, Worth CL, Harmer NJ, Davies O, Burke D: Structural biology and bioinformatics in drug design: opportunities and challenges for target identification and lead discovery. Philos Trans R Soc Lond B Biol Sci. 2006, 361 (1467): 413-423. 10.1098/rstb.2005.1800.
Zhang S, Zhong N, Xue F, Kang X, Ren X, Chen J, Jin C, Lou Z, Xia B: Three-dimensional domain swapping as a mechanism to lock the active conformation in a super-active octamer of SARS-CoV main protease. Protein Cell. 2010, 1 (4): 371-383. 10.1007/s13238-010-0044-8.
Mestres J: Representativity of target families in the Protein Data Bank: impact for family-directed structure-based drug discovery. Drug Discov Today. 2005, 10 (23-24): 1629-1637. 10.1016/S1359-6446(05)03593-2.
Stewart L, Clark R, Behnke C: High-throughput crystallization and structure determination in drug discovery. Drug Discov Today. 2002, 7 (3): 187-196. 10.1016/S1359-6446(01)02121-3.
Hillisch A, Pineda LF, Hilgenfeld R: Utility of homology models in the drug discovery process. Drug Discov Today. 2004, 9 (15): 659-669. 10.1016/S1359-6446(04)03196-4.
The authors thank NCBS (TIFR) for financial and infrastructural support. We would like to thank the anonymous reviewers and the editor for constructive criticism and useful suggestions.
The authors declare that they have no competing interests.
KS curated the data, performed the analysis and compiled the first draft of the manuscript. RS conceived the project, designed the curation strategy, discussed the approaches and provided critical comments to the manuscript. All authors read and approved the final manuscript.