comoR: a software for disease comorbidity risk assessment
© Moni and Liò; licensee BioMed Central Ltd. 2014
Received: 14 February 2014
Accepted: 17 April 2014
Published: 23 May 2014
Diagnosing comorbidities, that is, the coexistence of different acute and chronic diseases, is difficult because modern physicians are highly specialised. We envisage that software dedicated to comorbidity diagnosis could be an effective aid to health practice.
We have developed comoR, an R package that computes novel estimators of disease comorbidity associations. Starting from a patient's initial diagnosis and genetic and clinical data, the software identifies the risk of disease comorbidity. It then provides a pipeline to different causal inference packages (e.g. pcalg, qtlnet) to predict causal relationships among diseases, and a pipeline to network regression and survival analysis tools (e.g. Net-Cox, rbsurv) to predict patient survival probability more accurately. The input of the software is the initial diagnosis for a patient, and the output provides evidence of disease comorbidity mapping.
The functions of comoR offer flexibility for diagnostic applications that predict disease comorbidities, and can be easily integrated into high-throughput and clinical data analysis pipelines.
The term “comorbidity” refers to the coexistence or presence of multiple diseases or disorders in relation to a primary disease or disorder in a patient [1]. Multimorbidity is likewise defined as the coexistence of two or more diseases, but without reference to an index disease [2]. A comorbidity relationship between two diseases exists whenever they appear simultaneously in a patient more often than expected by chance alone. It represents the co-occurrence of diseases, or the presence of different medical conditions one after another, in the same patient [3, 4]. Some diseases or infections coexist in one person by coincidence, with no pathological association among them. In most cases, however, multiple diseases (acute or chronic events) occur together in a patient because of associations among the diseases. These associations can be due to direct or indirect causal relationships and to risk factors shared among diseases [5, 6]. For instance, people with HIV-1 appear to have a markedly higher rate of end-stage renal disease (ESRD) than healthy people [7], because some of the risk factors associated with HIV-1 acquisition are the same as those that lead to kidney disease. Patients with chronic kidney disease have an increased risk of cardiovascular mortality [8]. Thus HIV-1 infection is associated with cardiovascular mortality.
One of the most challenging problems in biomedical research is to understand the complex correlation mechanisms of human diseases. Recent research has increasingly demonstrated that many seemingly dissimilar diseases have common molecular mechanisms. Exploring relations between genes and diseases at the molecular level could greatly facilitate our understanding of pathogenesis, and eventually lead to better diagnosis and treatment. Diseases are more likely to be comorbid if they share associated genes. However, some diseases are directly and positively associated, while others are positively associated only indirectly, through biological pathways. The analysis of pathway-disease associations, in addition to gene-disease associations, could be used to clarify the molecular mechanism of a disease. Ashley et al. analysed the personal genome, gene-environment interactions and conditionally dependent risks for clinical assessment [9]. Population-based disease association is also useful in conjunction with molecular and genetic data to discover the molecular origins of disease and disease comorbidity. Patient medical records contain important information about the co-occurrence of diseases affecting the same patient. To estimate the correlation starting from disease co-occurrence, we need to quantify the strength of the comorbidity risk. The Disease Ontology (DO) also helps to promote the investigation of diseases and disease risk factors [10].
Comorbidity is an important factor for better risk stratification of patients and for treatment planning. The more precisely predictions account for comorbidity, the more accurately patients can be managed. Comorbidity has a significant predictive value for overall survival [11], and the survival of older persons is highly dependent on it. Comorbidities influence patients' treatments and confound survival analysis [12]. For instance, comorbidity has a major effect on survival in gynaecological cancer, particularly for cancer of the cervix [13]. Many researchers have developed survival analysis software for predicting disease outcomes [14–23]. However, all of these tools are based on a single disease, whereas patient survival depends on disease comorbidity, environment, patient age and treatment plan. Kan et al. performed a survival analysis of elderly dialysis patients that considered comorbidity risk [24]. They observed that life expectancy decreases as the number of comorbid diseases increases. It is therefore important to consider comorbidity for more accurate survival prediction.
We have developed comoR, an R package that computes statistically significant associations among diseases and predicts disease comorbidity risk by using a diverse set of data. The input of the software is the initial diagnosis for a patient. To compute the comorbidity risk, the software uses clinical, gene expression, pathway and ontology data. It provides different comorbidity assessments; integrating genetic information with the comoR output could be used to infer causal relationships among diseases and to predict patient survival probability more accurately. The goal of the software is to assist medical practitioners in deciding on potential treatments.
Comorbidity based on clinical information
Patient medical records contain important information about the co-occurrence of diseases affecting the same patient. Two diseases are connected if they are co-expressed in a significant number of patients in a population. To estimate the correlation starting from disease co-occurrence, we need to quantify the strength of the comorbidity risk. We used two comorbidity measures to quantify the strength of the comorbidity association between two diseases: (i) the relative risk (the ratio between the number of patients diagnosed with both diseases and the random expectation based on disease prevalence) as a quantified measure of the comorbidity tendency of a disease pair; and (ii) the ϕ-correlation (Pearson's correlation for binary variables) to measure the robustness of the comorbidity association. We write $RR_{ij}$ and $\phi_{ij}$ for the relative risk and ϕ-correlation of observing a pair of diseases $i$ and $j$ affecting the same patient. $RR_{ij}$ allows us to quantify the co-occurrence of a disease pair compared with the random expectation. When two diseases co-occur more frequently than expected by chance, we get $RR_{ij} > 1$ and $\phi_{ij} > 0$. The two comorbidity measures are not completely independent of each other. We included edges between disease pairs for which the co-occurrence is significantly greater than the random expectation based on the population prevalence of the diseases. Clinical information is in the ICD-9-CM format (http://www.icd9data.com) and collected from . The function comorbidityPatients of the comoR package takes as input an OMIM ID, a 3- or 5-digit ICD-9-CM code of a disease, or a list of gene symbols or Entrez IDs, and returns the comorbidity pattern of diseases based on the relative risk and ϕ-correlation between two diseases. comorbidityPatients requires two parameters, an id list and an id type (see details in Additional file 1). An example and its output are shown in Figure 2.
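To make the two measures concrete, the following Python sketch counts prevalences and co-occurrences in a toy set of patient records and derives the relative risk and ϕ-correlation for every disease pair. It is not part of comoR (which is written in R) and does not reproduce the package's own example; the function name `pair_measures` and the records are hypothetical. It assumes the usual definitions, namely the relative risk as the observed co-occurrence divided by the expectation $P_i P_j / N$, and ϕ as Pearson's correlation of two binary indicators.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pair_measures(records):
    """Return {(i, j): (relative_risk, phi)} for every co-occurring disease pair.

    `records` is a list of sets of diagnosis codes, one set per patient.
    """
    n = len(records)                      # N: total number of patients
    prevalence = Counter()                # P_i: patients diagnosed with disease i
    cooccurrence = Counter()              # C_ij: patients diagnosed with both i and j
    for diagnoses in records:
        codes = sorted(set(diagnoses))    # sorted so each pair has one canonical order
        prevalence.update(codes)
        cooccurrence.update(combinations(codes, 2))

    measures = {}
    for (i, j), c_ij in cooccurrence.items():
        p_i, p_j = prevalence[i], prevalence[j]
        rr = c_ij * n / (p_i * p_j)
        denom = sqrt(p_i * p_j * (n - p_i) * (n - p_j))
        # phi is undefined when a disease affects every patient in the sample
        phi = (c_ij * n - p_i * p_j) / denom if denom > 0 else float("nan")
        measures[(i, j)] = (rr, phi)
    return measures


# Toy records using 3-digit ICD-9-CM style codes (illustrative data only).
records = [
    {"250", "585"}, {"250", "585"}, {"250"}, {"585", "401"},
    {"401"}, {"401"}, {"250", "585", "401"}, {"042"},
]
measures = pair_measures(records)
for pair, (rr, phi) in sorted(measures.items()):
    print(pair, round(rr, 3), round(phi, 3))
```

In this toy sample the pair ("250", "585") co-occurs more often than expected (relative risk above 1, positive ϕ), while ("250", "401") co-occurs less often than expected.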
Ontology and causal inference to evaluate comorbidity
The two measures are computed as

$$RR_{ij} = \frac{C_{ij}\,N}{P_i P_j}, \qquad \phi_{ij} = \frac{C_{ij}\,N - P_i P_j}{\sqrt{P_i P_j (N - P_i)(N - P_j)}},$$

where $N$ is the total number of patients in the population, $P_i$ and $P_j$ are the incidences/prevalences of diseases $i$ and $j$ respectively (expressed as patient counts), $C_{ij}$ is the number of patients that have been diagnosed with both diseases $i$ and $j$, and $P_i P_j / N$ is the random expectation based on disease prevalence. The significance of the relative risk $RR_{ij}$ is calculated by using the Katz et al. method to estimate confidence intervals [34]. The bounds of the 99% confidence interval for $RR_{ij}$ are

$$LB = RR_{ij}\,\exp(-2.56\,\sigma_{ij}), \qquad UB = RR_{ij}\,\exp(2.56\,\sigma_{ij}),$$

where $\sigma_{ij}$ is the dispersion term of that method. A disease pair is considered only if $LB > 1$ when $RR_{ij} > 1$, or if $UB < 1$ when $RR_{ij} < 1$. For $\phi_{ij} > 0$ comorbidity is larger than expected by chance, and for $\phi_{ij} < 0$ it is smaller than expected by chance. We can determine the significance of $\phi \neq 0$ by performing a t-test, with

$$t = \frac{\phi\,\sqrt{n - 2}}{\sqrt{1 - \phi^{2}}},$$

where $n$ is the number of observations used to calculate $\phi$.
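The significance checks described above can be sketched in a few lines of Python. This is an illustration rather than comoR's implementation: the function names are hypothetical, and the expression used for $\sigma_{ij}$, namely $1/C_{ij} + 1/(P_i P_j) - 1/N - 1/N^2$, is an assumption that is not stated in the text. The factor 2.56 is the one quoted in the text for the 99% interval.

```python
from math import exp, sqrt


def rr_confidence_interval(n, p_i, p_j, c_ij, z=2.56):
    """Return (RR, LB, UB) for a disease pair.

    The sigma expression below is an ASSUMED form of the Katz-style term;
    replace it if the intended definition differs.
    """
    rr = c_ij * n / (p_i * p_j)
    sigma = 1 / c_ij + 1 / (p_i * p_j) - 1 / n - 1 / n**2
    return rr, rr * exp(-z * sigma), rr * exp(z * sigma)


def rr_is_significant(rr, lb, ub):
    """Keep a pair only if the whole interval lies on one side of 1."""
    return (rr > 1 and lb > 1) or (rr < 1 and ub < 1)


def phi_t_statistic(phi, n_obs):
    """t statistic for testing phi != 0 with n_obs observations."""
    return phi * sqrt(n_obs - 2) / sqrt(1 - phi**2)


# Hypothetical counts: 1000 patients, 100 with disease i, 200 with disease j,
# 40 with both.
rr, lb, ub = rr_confidence_interval(1000, 100, 200, 40)
print(rr, lb, ub, rr_is_significant(rr, lb, ub))
```

Under these assumptions the example pair has a relative risk of 2 with a lower bound above 1, so it would be retained as a comorbidity edge.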
where $N$ is the total number of reference genes, $M$ is the number of genes that are associated with the disease of interest, $n$ is the size of the list of genes of interest, and $k$ is the number of genes within that list that are associated with the disease.
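The symbols defined above match those of a hypergeometric over-representation test. Assuming that this is the test intended (the equation itself is not shown in the text), the p-value is the probability of drawing at least $k$ disease-associated genes in a list of $n$ genes:

$$p = 1 - \sum_{i=0}^{k-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$

A minimal Python sketch, with the hypothetical function name `enrichment_p_value`, evaluates this sum directly:

```python
from math import comb


def enrichment_p_value(N, M, n, k):
    """P(X >= k) for X ~ Hypergeometric(N reference genes, M associated, n drawn)."""
    total = comb(N, n)
    # Probability mass of observing fewer than k associated genes in the list.
    below = sum(comb(M, i) * comb(N - M, n - i) for i in range(k))
    return 1 - below / total


# Hypothetical numbers: 20 reference genes, 5 associated with the disease,
# a list of 4 genes of interest, 2 of which are associated.
print(enrichment_p_value(20, 5, 4, 2))
```

A small p-value indicates that the gene list contains more disease-associated genes than expected from random sampling of the reference set.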
where $D_A(t)$ is the semantic value of DO term (disease) $t$ with respect to DO term (disease) $A$, and $D_B(t)$ is the semantic value of DO term (disease) $t$ with respect to DO term (disease) $B$.
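The semantic values above are those of a graph-based similarity in the style of Wang et al. (listed in the references). Assuming that measure is the one intended (the equation itself is not shown in the text), the similarity of two terms $A$ and $B$ with ancestor sets $T_A$ and $T_B$ (each including the term itself) is

$$sim(A, B) = \frac{\sum_{t \in T_A \cap T_B} \left( D_A(t) + D_B(t) \right)}{\sum_{t \in T_A} D_A(t) + \sum_{t \in T_B} D_B(t)}$$

The Python sketch below illustrates the computation on a toy ontology. The function names, the term labels and the edge weight of 0.8 are all assumptions made for the example, not values taken from comoR or the Disease Ontology.

```python
def semantic_values(term, parents, weight=0.8):
    """Semantic value of every ancestor of `term` (and of `term` itself).

    `parents` maps each term to a list of its parent terms. The value of the
    term itself is 1; each step towards the root multiplies by `weight`, and
    the best (largest) value over all paths is kept.
    """
    values = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent in parents.get(child, ()):
            candidate = weight * values[child]
            if candidate > values.get(parent, 0.0):
                values[parent] = candidate
                stack.append(parent)   # re-propagate the improved value upwards
    return values


def wang_similarity(a, b, parents, weight=0.8):
    """Shared semantic value of a and b divided by their total semantic value."""
    d_a = semantic_values(a, parents, weight)
    d_b = semantic_values(b, parents, weight)
    shared = d_a.keys() & d_b.keys()
    numerator = sum(d_a[t] + d_b[t] for t in shared)
    return numerator / (sum(d_a.values()) + sum(d_b.values()))


# Toy ontology: two sibling diseases under a common parent (hypothetical labels).
parents = {"A": ["X"], "B": ["X"], "X": ["root"], "root": []}
print(wang_similarity("A", "B", parents))
```

Two terms that share most of their ancestry score close to 1, and a term compared with itself scores exactly 1.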
Comparison with similar software
The R package “comorbidities” has functions to categorise comorbidities into the Deyo-Charlson index, the original Elixhauser index of 30 comorbidities, and the AHRQ comorbidity index of 29 diagnoses [35, 36]. It provides the total comorbidity count or the total Charlson score. In contrast, comoR provides the relative risk, ϕ-correlation, associated genes, pathways and p-values between comorbid diseases, and can provide comorbidity associations among all diseases. comoR therefore covers analyses that “comorbidities” does not.
Comparative values of the gene co-expression and functional linkage network based penalised Cox regression coefficient (β) of five significant genes (BRCA1, BRCA2, PTEN, TGFB2 and TP53) in five disease conditions (breast cancer, colon cancer, ovarian cancer, liver cancer and osteosarcoma)
Exploring associations among diseases at the molecular and clinical levels could greatly facilitate our understanding of pathogenesis, and eventually lead to better diagnosis and treatment. If two diseases are comorbid, the occurrence of one of them in a patient may increase the likelihood of developing the other. Methods that integrate genetic and clinical data will assist clinical decision making and represent a large step towards individualised medicine. Hidalgo et al. analysed comorbidity associations using medical records [4]. To our knowledge, there is no available R software package for the prediction of disease comorbidities. The R package “comorbidities” categorises ICD-9-CM codes based on published comorbidity indices, using the Deyo adaptation of the Charlson index and the Elixhauser index of 30 comorbidities [35, 36]. We have developed comoR, an R package that implements different statistical approaches for the prediction of disease comorbidity using a diverse set of data.
Advances in high-throughput molecular assay technologies in genomics, proteomics and other omics fields are expanding diagnostic and therapeutic strategies, including systems-driven strategies for personalised treatment. In particular, the availability of these data sets for many different diseases presents a ripe opportunity to use data-driven approaches to advance our current knowledge of disease relationships in a systematic way. Patients' genetic/genomic data are becoming important for clinical decision making, including disease risk assessment, disease diagnosis and subtyping, drug therapy and dose selection [44]. In the future, clinicians will have to consider the genetic/genomic implications for patient care throughout their clinical workflow, including electronic prescribing of medications. The identified disease patterns can then be further investigated with regard to their diagnostic utility, or used to help predict novel therapeutic targets. comoR could therefore be helpful in a personalised medicine system. The software could help to detect many diseases at the earliest detectable phase, weeks, months, and maybe years before symptoms appear. Thus it could be applicable in personalised medicine and in clinical bioinformatics.
Doctors need to be kept up to date with novel information on the likely comorbidities of diseases. The comoR software provides a robust approach to studying disease comorbidities, which can be easily integrated into pipelines for high-throughput and clinical data analysis and used to infer causal relationships among diseases. The software will help to gain a better understanding of the complex pathogenesis of disease risk phenotypes and of the heterogeneity of disease comorbidity. Thus it could be applicable in personalised medicine and in clinical bioinformatics.
Availability and requirements
The software package comoR is written in the platform-independent R programming language. It requires R version 3.0.1 or newer to run. The software is freely available at http://www.cl.cam.ac.uk/~mam211/comoR/ and will appear in the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/.
This work is supported by the EU Mission T2D project.
1. Capobianco E, Liò P: Comorbidity: a multidimensional approach. Trends Mol Med. 2013, 19 (9): 515-521. 10.1016/j.molmed.2013.07.004.
2. Radner H, Yoshida K, Smolen JS, Solomon DH: Multimorbidity and rheumatic conditions-enhancing the concept of comorbidity. Nat Rev Rheumatol. 2014.
3. Park J, Lee DS, Christakis NA, Barabási AL: The impact of cellular networks on disease comorbidity. Mol Syst Biol. 2009, 5: 1-
4. Hidalgo CA, Blumm N, Barabási AL, Christakis NA: A dynamic network approach for the study of human phenotypes. PLoS Comput Biol. 2009, 5 (4): e1000353. 10.1371/journal.pcbi.1000353.
5. Tong B, Stevenson C: Comorbidity of cardiovascular disease, diabetes and chronic kidney disease in Australia. 2007, Australian Institute of Health & Welfare, Canberra.
6. Liò P, Paoletti N, Moni MA, Atwell K, Merelli E, Viceconti M: Modelling osteomyelitis. BMC Bioinformatics. 2012, 13 (Suppl 14): S12. 10.1186/1471-2105-13-S14-S12.
7. Kumar MSA, Sierka DR, Damask AM, Fyfe B, Mcalack RF, Heifets M, Moritz MJ, Alvarez D, Kumar A: Safety and success of kidney transplantation and concomitant immunosuppression in HIV-positive patients. Kidney Int. 2005, 67 (4): 1622-1629. 10.1111/j.1523-1755.2005.00245.x.
8. de Jager DJ, Vervloet MG, Dekker FW: Noncardiovascular mortality in CKD: an epidemiological perspective. Nat Rev Nephrol. 2014, 10 (4): 208-214. 10.1038/nrneph.2014.8.
9. Ashley EA, Butte AJ, Wheeler MT, Chen R, Klein TE, Dewey FE, Dudley JT, Ormond KE, Pavlovic A, Morgan AA: Clinical assessment incorporating a personal genome. The Lancet. 2010, 375 (9725): 1525-1535. 10.1016/S0140-6736(10)60452-7.
10. Schriml LM, Arze C, Nadendla S, Chang YWW, Mazaitis M, Felix V, Feng G, Kibbe WA: Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2012, 40 (D1): D940-D946. 10.1093/nar/gkr972.
11. Lagro J, Melis RJ, Rikkert MGO: Importance of comorbidity in competing risks analysis in patients with localized renal cell carcinoma. J Clin Oncol. 2010, 28 (18): e298-e298. 10.1200/JCO.2009.27.3987.
12. Hall SF, Rochon PA, Streiner DL, Paszat LF, Groome PA, Rohland SL: Measuring comorbidity in patients with head and neck cancer. The Laryngoscope. 2002, 112 (11): 1988-1996. 10.1097/00005537-200211000-00015.
13. Ferrandina G, Lucidi A, Paglia A, Corrado G, Macchia G, Tagliaferri L, Fanfani F, Morganti AG, Valentini V, Scambia G: Role of comorbidities in locally advanced cervical cancer patients administered preoperative chemoradiation: impact on outcome and treatment-related complications. Eur J Surg Oncol (EJSO). 2012, 38 (3): 238-244. 10.1016/j.ejso.2011.12.001.
14. Lin Y, Wang S, Chappell RJ: Lasso tree for cancer staging with survival data. Biostatistics. 2013, 14 (2): 327-339. 10.1093/biostatistics/kxs044.
15. Annest A, Bumgarner RE, Raftery AE, Yeung KY: The iterative bayesian model averaging algorithm for survival analysis: an improved method for gene selection and survival analysis on microarray data. 2010.
16. Oberthuer A, Kaderali L, Kahlert Y, Hero B, Westermann F, Berthold F, Brors B, Eils R, Fischer M: Subclassification and individual survival time prediction from gene expression data of neuroblastoma patients by using CASPAR. Clin Cancer Res. 2008, 14 (20): 6590-6601. 10.1158/1078-0432.CCR-07-4377.
17. Haibe-Kains B, Schröder M, Olsen C, Sotiriou C, Bontempi G, Quackenbush J, de Montréal RC: Survcomp: a package for performance assessment and comparison for survival analysis. 2013, 27 (22): 3206-3208.
18. Cho H, Yu A, Kim S, Kang J, Hong SM: Robust likelihood-based survival modeling for microarray data. J Stat Softw. 2009, 29 (i01): (American Statistical Association).
19. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS: Random survival forests. Ann Appl Stat. 2008, 841-860. (JSTOR).
20. Therneau T: Package survival. R Project. 2013.
21. Yasrebi H: SurvJamda: an R package to predict patients’ survival and risk assessment using joint analysis of microarray gene expression data. Bioinformatics. 2011, 27 (8): 1168-1169. 10.1093/bioinformatics/btr103.
22. Lopez-de Ullibarri I, Jácome MA: survPresmooth: an R package for presmoothed estimation in survival analysis. J Stat Softw. 2013, 54 (11): 1-26.
23. Colchero F, Jones O, Rebke M, Colchero MF: Package BaSTA. Methods Ecol Evol. 2013, 3 (3): 466-470.
24. Kan WC, Wang JJ, Wang SY, Sun YM, Hung CY, Chu CC, Lu CL, Weng SF, Chio CC, Chien CC: The new Comorbidity Index for predicting survival in elderly dialysis patients: a long-term population-based study. PLoS ONE. 2013, 8 (8): e68748. 10.1371/journal.pone.0068748.
25. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5 (10): R80. 10.1186/gb-2004-5-10-r80.
26. Yu G, Wang LG: Disease ontology semantic and enrichment analysis. 2012.
27. McKusick VA: Mendelian inheritance in man and its online version, OMIM. Am J Human Genet. 2007, 80 (4): 588. 10.1086/514346.
28. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL: The human disease network. Proc Nat Acad Sci. 2007, 104 (21): 8685-8690. 10.1073/pnas.0701361104.
29. Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M: KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010, 38 (suppl 1): D355-D360.
30. Wang JZ, Du Z, Payattakool R, Philip SY, Chen CF: A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007, 23 (10): 1274-1281. 10.1093/bioinformatics/btm087.
31. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Nat Acad Sci USA. 2005, 102 (43): 15545-15550. 10.1073/pnas.0506580102.
32. Du P, Feng G, Flatow J, Song J, Holko M, Kibbe WA, Lin SM: From disease ontology to disease-ontology lite: statistical methods to adapt a general-purpose ontology for the test of gene-ontology associations. Bioinformatics. 2009, 25 (12): i63-i68. 10.1093/bioinformatics/btp193.
33. Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P: Causal inference using graphical models with the R package pcalg. J Stat Softw. 2012, 47 (11): 1-26.
34. Katz D, Baptista J, Azen S, Pike M: Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics. 1978, 469-474. (JSTOR).
35. Deyo RA, Cherkin DC, Ciol MA: Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol. 1992, 45 (6): 613-619. 10.1016/0895-4356(92)90133-8.
36. Elixhauser A, Steiner C, Harris DR, Coffey RM: Comorbidity measures for use with administrative data. Med Care. 1998, 36: 8-27. 10.1097/00005650-199801000-00004.
37. Zhang W, Ota T, Shridhar V, Chien J, Wu B, Kuang R: Network-based Survival Analysis Reveals Subnetwork Signatures for Predicting Outcomes of Ovarian Cancer Treatment. PLoS Comput Biol. 2013, 9 (3): e1002975. 10.1371/journal.pcbi.1002975.
38. Bowker SL, Majumdar SR, Veugelers P, Johnson JA: Increased cancer-related mortality for patients with type 2 diabetes who use sulfonylureas or insulin. Diabetes Care. 2006, 29 (2): 254-258. 10.2337/diacare.29.02.06.dc05-1558.
39. Miller LD, Smeds J, George J, Vega VB, Vergara L, Ploner A, Pawitan Y, Hall P, Klaar S, Liu ET: An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Nat Acad Sci USA. 2005, 102 (38): 13550-13555. 10.1073/pnas.0506230102.
40. Smith JJ, Deane NG, Wu F, Merchant NB, Zhang B, Jiang A, Lu P, Johnson JC, Schmidt C, Bailey CE: Experimentally derived metastasis gene expression profile predicts recurrence and death in patients with colon cancer. Gastroenterology. 2010, 138 (3): 958-968. 10.1053/j.gastro.2009.11.005.
41. Bonome T, Levine DA, Shih J, Randonovich M, Pise-Masison CA, Bogomolniy F, Ozbun L, Brady J, Barrett JC, Boyd J: A gene signature predicting for survival in suboptimally debulked patients with ovarian cancer. Cancer Res. 2008, 68 (13): 5478-5486. 10.1158/0008-5472.CAN-07-6595.
42. Villanueva A, Hoshida Y, Battiston C, Tovar V, Sia D, Alsinet C, Cornella H, Liberzon A, Kobayashi M, Kumada H: Combining clinical, pathology, and gene expression data to predict recurrence of hepatocellular carcinoma. Gastroenterology. 2011, 140 (5): 1501-1512. 10.1053/j.gastro.2011.02.006.
43. Buddingh EP, Kuijjer ML, Duim RA, Bürger H, Agelopoulos K, Myklebost O, Serra M, Mertens F, Hogendoorn PC, Lankester AC: Tumor-infiltrating macrophages are associated with metastasis suppression in high-grade osteosarcoma: a rationale for treatment with macrophage activating agents. Clin Cancer Res. 2011, 17 (8): 2110-2119. 10.1158/1078-0432.CCR-10-2047.
44. Ullman-Cullere MH, Mathew JP: Emerging landscape of genomics in the electronic health record for personalized medicine. Hum Mutat. 2011, 32 (5): 512-516. 10.1002/humu.21456.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.