Modeling autism: a systems biology approach

Autism is the fastest growing developmental disorder in the world today. The prevalence of autism in the US has risen from 1 in 2500 children in 1970 to 1 in 88 children today. People with autism present with repetitive movements and with social and communication impairments, which can range from mild to profound. The estimated total lifetime societal cost of caring for one individual with autism is US$3.2 million. Given the rapid growth in this disorder and the great expense of caring for those with autism, it is imperative for both individuals and society that techniques be developed to model and understand autism. There is increasing evidence that individuals diagnosed with autism present with a highly diverse set of abnormalities affecting multiple systems of the body. To date, little to no work has used a whole-body systems biology approach to model the characteristics of this disorder. Identifying and modeling these systems might lead to new and improved treatment protocols and to better diagnosis and treatment of the affected systems. This could improve quality of life in its own right and, because of the potential interconnections between the brain and nervous system and the other systems being modeled, might also help the core symptoms of autism. This paper first reviews research showing that autism affects many systems in the body, including the metabolic, mitochondrial, immunological, gastrointestinal and neurological systems. These systems interact in complex and highly interdependent ways, and many of these disturbances have effects in most of the systems of the body. In particular, clinical evidence exists for increased oxidative stress, inflammation, and immune and mitochondrial dysfunction, which can affect almost every cell in the body. Three promising research areas are discussed: hierarchical modeling, subgroup analysis and modeling over time.
This paper reviews some of the systems disturbed in autism and suggests several systems biology research areas. Autism poses a rich test bed for systems biology modeling techniques.


Background
Autism is the fastest rising developmental disorder in the world today. In the US the rates of autism have risen from 1 in 2500 in 1970 [1] to 1 in 88 today [2]. Autism is defined behaviorally and is characterized by impairments in social behavior, stereotypic movements and difficulties in communicating [3]. Autism places a burden on both families and society as a whole. The estimated total lifetime societal cost of caring for one individual with autism is US$3.2 million. This includes direct costs such as medical, therapeutic and educational expenses and child and adult care. It also includes indirect costs such as the loss of productivity of both the individual with autism and their caregivers [4].
In the past, autism was considered purely a psychological [5] or neurological disorder [6]. There is increasing evidence that it is a highly diverse disease affecting multiple systems of the body. Systems with strong evidence of involvement include the metabolic, gastrointestinal, immunological, mitochondrial and neurological systems [7,8]. Identifying and modeling these systems may lead to new treatments. It is hard to predict all of the new treatments that would result from a systems approach, but the first benefit would be better targeting of treatments. At present, physicians often rely on therapeutic trials and on psychotropic drugs not approved for autism [9].
One of the difficulties in describing the biology of autism is that it appears to have multiple etiologies. Some children have gastrointestinal disease, while others do not [10]. Some children have frank immune disorders, while others appear healthy [11]. Some show signs of autism from birth, while others appear to have a period of normal development and then regress [12]. Besides complicating the modeling of autism, this complex etiology can be a confounding factor in many autism studies, because the different subgroups are not apparent from the defining behavioral characteristics alone.
Currently, people referred for autism evaluation do not receive a comprehensive workup. Many of these patients cannot articulate their problems or lack the cognitive ability to request an evaluation, so better laboratory workups are needed. Many also present with behavioral challenges, so testing procedures should be comprehensive yet as minimally invasive as possible. A whole-body approach to modeling could therefore generate the parameters for a comprehensive evaluation or intake that would best guide treatment.
Another difficulty in understanding autism is that the various systems involved interact in complex and highly interdependent ways. This complexity points to a new paradigm in autism research based on systems biology. Autism also poses particular difficulties because the scale of information to be modeled varies widely, from the molecular to the anatomical. The diverse systems involved in autism and its complex etiology make the development of new techniques to model autism and mine its data imperative. This paper first reviews the systems that are altered in people with autism, and then presents some of the challenges autism poses for systems biologists.

Genetics, metabolism and oxidative stress
Autism has an established genetic component. Studies of twins show a concordance of 0-10% in dizygotic twins and 70-90% in monozygotic twins [13,14]. However, the search for single autism genes has not been fruitful. It appears instead that autism results from a combination of relatively common genetic variants. The current model predicts that between 10 and 100 genetic variants may be responsible [15]. The rising rates of autism, and the fact that the concordance in identical twins is not 100%, support the theory that autism results from a combination of genetic and environmental factors [16][17][18].
Several genetic variants have been associated with increased risk for autism. Most of the variants found so far are associated with differences in metabolism rather than in brain structure. The MET promoter variant rs1858830 allele "C", found at increased rates in autism, is associated with neuronal growth and development, but the MET gene is also involved in immune function and gastrointestinal repair [19,20]. The fact that this variant is present in 47% of the general population supports the assertion that there is an environmental component to the development of autism. Many of the genetic variants at increased prevalence in autism are associated with the folic acid, transmethylation and transsulfuration metabolic pathways. Some of the genes involved are MTHFR, COMT, GST, RFC and TCN2. As with the MET variant, these variants are common in the general population. They decrease enzyme activity and reduce the efficiency with which the body resolves oxidative stress, methylates genes and detoxifies exogenous and endogenous toxins [21].
Oxidative stress occurs when the production of reactive oxygen species (ROS) and reactive nitrogen species (RNS) exceeds the body's ability to neutralize them. ROS/RNS include free radicals, highly reactive molecules that can damage many parts of the cell. ROS/RNS are generated by the energy production process in the mitochondria and by environmental sources. The mitochondrion is the main source of ROS/RNS and has evolved defences to neutralize these oxidants, the most important of which is glutathione (GSH). If the mitochondrial GSH pool is low, mitochondrial ROS production can increase. GSH is also the main antioxidant for the extra-mitochondrial parts of the cell. GSH is produced by the transsulfuration pathway, as shown in Figure 1. This pathway is linked to the methylation and folic acid pathways, and any perturbation of those pathways will affect the production of GSH.
The methylation pathway provides methyl groups (CH3) for many functions in the body. S-adenosylmethionine (SAM) donates methyl groups to over 150 methyltransferase-dependent methylation reactions in the body [22], most notably the methylation of genes. This transfer yields S-adenosylhomocysteine (SAH). SAH can be reversibly converted into homocysteine and adenosine by SAH hydrolase (SAHH). Homocysteine can then be either remethylated to methionine or diverted into the transsulfuration pathway to create glutathione. The pathway flux is influenced by the relative amounts of the components. If the activity of methionine synthase (MS) is reduced, whether through limited availability of its cofactor cobalamin (vitamin B12) or through other impairment, less homocysteine will be converted to methionine to continue the cycle. This results in an accumulation of homocysteine and SAH, which inhibits SAM-dependent methylation. Methylation serves many important functions in the body. It is used epigenetically to turn genes on and off; methylation of a gene's promoter generally silences its expression [23]. Methylation is also important in the function of neurotransmitters, neurohormones, myelin, membrane phospholipids, proteins and creatine [24].
The activity of MS also determines the proportion of homocysteine shunted into the transsulfuration pathway to make GSH. Because the MS cofactor cobalamin is easily oxidized, oxidative stress lowers MS activity and diverts more homocysteine toward GSH synthesis. In a properly functioning system, this additional GSH would resolve the oxidative stress.
But in autism there is evidence of continued oxidative stress [25].
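The branch point at homocysteine described above can be illustrated with a toy steady-state partition. It assumes, purely for illustration, that each fate consumes homocysteine at a rate proportional to a lumped, unitless rate constant (`k_ms` for remethylation by MS, `k_ts` for entry into transsulfuration), so that the fraction remethylated is k_ms / (k_ms + k_ts). The function name and all parameter values are hypothetical and not fitted to any data.

```python
def homocysteine_partition(k_ms: float, k_ts: float) -> tuple[float, float]:
    """Split homocysteine flux between remethylation and transsulfuration.

    Toy first-order model (hypothetical): each branch consumes homocysteine at
    a rate proportional to its effective rate constant, so the split depends
    only on the ratio of the constants. k_ms lumps methionine synthase activity
    (including cobalamin availability); k_ts lumps the transsulfuration entry.
    """
    if k_ms < 0 or k_ts < 0 or (k_ms + k_ts) == 0:
        raise ValueError("rate constants must be non-negative and not both zero")
    total = k_ms + k_ts
    return k_ms / total, k_ts / total


# Hypothetical, unitless values: oxidized cobalamin is modeled simply as a
# lower effective k_ms, which shifts flux toward glutathione synthesis.
for label, k_ms in [("baseline", 1.0), ("oxidized cobalamin", 0.25)]:
    remethylated, to_gsh = homocysteine_partition(k_ms, k_ts=1.0)
    print(f"{label}: remethylated={remethylated:.2f}, transsulfuration={to_gsh:.2f}")
```

With these made-up constants, cutting `k_ms` from 1.0 to 0.25 moves the transsulfuration share from 0.50 to 0.80, the direction of the shift described in the text. A realistic model would need measured kinetics and the regulatory feedback within the cycle.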
Metabolic markers of oxidative stress have been found to be elevated in children with autism. Levels of glutathione, the main cellular antioxidant, were reduced. In addition, the oxidized disulfide form of glutathione (GSSG) was increased, resulting in a doubling of the GSSG/GSH ratio, and the ratio of plasma S-adenosylmethionine (SAM) to S-adenosylhomocysteine (the SAM/SAH ratio) was reduced [26,27]. Evidence of increased lipid peroxidation, which may indicate oxidative stress, has also been found [28]. Oxidative stress can have a negative effect on many systems in the body and has been implicated in cancer, cardiovascular disease and autoimmune disease [29][30][31]. It is particularly destructive to the brain, which has high energy requirements, a high concentration of polyunsaturated fatty acids and relatively low reserves of GSH. Oxidative stress is also increased in schizophrenia, bipolar disorder and Parkinson's disease [32][33][34][35].
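The ratio markers above are simple quotients of paired measurements. The sketch below computes them for two subjects; the `RedoxPanel` class is a hypothetical container, and the concentrations are placeholder numbers chosen only to show what a doubling of GSSG/GSH looks like, not data from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class RedoxPanel:
    """Plasma redox/methylation markers for one subject (any consistent units)."""
    gsh: float   # reduced glutathione
    gssg: float  # oxidized glutathione disulfide
    sam: float   # S-adenosylmethionine
    sah: float   # S-adenosylhomocysteine

    def gssg_gsh_ratio(self) -> float:
        # Higher values indicate a more oxidized glutathione pool.
        return self.gssg / self.gsh

    def sam_sah_ratio(self) -> float:
        # Lower values indicate reduced methylation capacity.
        return self.sam / self.sah


# Placeholder values for illustration only (not from the cited studies).
control = RedoxPanel(gsh=8.0, gssg=0.2, sam=80.0, sah=20.0)
case = RedoxPanel(gsh=5.0, gssg=0.25, sam=70.0, sah=28.0)

fold = case.gssg_gsh_ratio() / control.gssg_gsh_ratio()  # 0.05 / 0.025
print(f"GSSG/GSH fold change: {fold:.1f}")
print(f"SAM/SAH: control={control.sam_sah_ratio():.1f}, case={case.sam_sah_ratio():.1f}")
```

Note that a lower GSH and a higher GSSG both push the ratio up, which is why the combined ratio can double even when each component changes more modestly.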
These interacting cycles are of great importance in autism because they offer opportunities for therapeutic intervention. Defects in the MTHFR enzyme can be bypassed by supplementing 5-methyltetrahydrofolate (5-CH3THF), the form of folate that MTHFR itself produces. Supplemental cobalamin can increase the efficiency of MS [36]. Supplements of other enzyme cofactors might also be of benefit [7].
Impairment of the metabolic pathways in autism can result from environmental influences as well as genetics. Heavy metals [22] and pesticides [37,38] have been shown to inhibit the enzymes often deficient in autism. This could create a feedback loop in which insufficient activity of these systems allows toxins to persist in the body, where they further impair the detoxification systems.
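The feedback loop described above can be sketched as a pair of discrete-time difference equations. Everything here is hypothetical: toxin load T grows by a constant exposure and is cleared in proportion to detoxification capacity D, while D is reduced in proportion to T and recovers toward a baseline set by genetics. The function name, parameters and values are illustrative assumptions, not a published model.

```python
def simulate_toxin_loop(exposure: float, baseline_capacity: float,
                        inhibition: float, recovery: float = 0.1,
                        steps: int = 500) -> tuple[float, float]:
    """Iterate a toy toxin/detoxification feedback loop; return final (T, D).

    T: toxin load; D: detoxification capacity (arbitrary units).
    Each step, T gains `exposure` and loses D * T through clearance, while D
    relaxes toward `baseline_capacity` and is reduced by `inhibition * T`.
    Both quantities are kept non-negative.
    """
    t, d = 0.0, baseline_capacity
    for _ in range(steps):
        t_next = max(t + exposure - d * t, 0.0)
        d_next = max(d + recovery * (baseline_capacity - d) - inhibition * t, 0.0)
        t, d = t_next, d_next
    return t, d


# Same exposure and baseline capacity; only toxin-mediated inhibition varies.
for inhibition in (0.0, 0.05, 0.2):
    t, d = simulate_toxin_loop(exposure=0.1, baseline_capacity=0.5,
                               inhibition=inhibition)
    print(f"inhibition={inhibition}: toxin load={t:.2f}, capacity={d:.2f}")
```

With these made-up parameters, no inhibition settles at a toxin load of 0.2 with capacity unchanged at 0.5; weak inhibition (0.05) settles near 0.28 with capacity reduced to about 0.36; stronger inhibition (0.2) has no steady state, so capacity collapses to zero and the load keeps climbing. The qualitative point is that a loop of this kind can tip from a stable to a runaway state as inhibition strengthens.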

Mitochondrial system
Mitochondria are the organelles responsible for the energy production in most eukaryotic cells. They convert the energy from carbohydrates and fats into adenosine triphosphate (ATP) through the process of cellular respiration. ATP is used to power most cellular functions. Mitochondria are also involved in signalling, cellular differentiation, and apoptosis, as well as the control of the cell cycle and cell growth [24].
The mitochondrion uses a complex series of chemical reactions to produce ATP. During this process free radicals, including the particularly damaging superoxide, are produced. Because free radicals are so destructive, the mitochondrion has a series of defences to neutralize them. If, owing to genetic defects or acquired dysfunction, more free radicals are produced than these defences can neutralize, oxidative stress can occur [39]. Mitochondrial disease occurs when there are mutations in the mitochondrial DNA. It is associated with a multitude of disorders, including hypotonia, mitochondrial encephalomyopathy, cardiomyopathy, a range of endocrine, hepatic and renal tubular dysfunctions, myoclonic epilepsy, mitochondrial myopathy and developmental delay. Mitochondrial disease has many different presentations because a child can inherit a mixture of normal and mutated mitochondria from the mother [40].
There is clinical evidence of mitochondrial disease and dysfunction in autism [41]. Although only a small minority of people with autism have mitochondrial DNA (mtDNA) mutations, the rate of autism is higher among children with mitochondrial disease [42][43][44][45][46]. In addition, the search for genetic mutations influencing mitochondrial function is complicated by the fact that many mitochondrial functions are encoded by nuclear DNA.
Mitochondrial dysfunction refers to reduced mitochondrial function without underlying genetic changes. Mitochondrial dysfunction and oxidative stress have been implicated in a variety of neurodegenerative diseases such as Alzheimer's disease (AD), Parkinson's disease (PD), amyotrophic lateral sclerosis (ALS) and Huntington's disease (HD). Because the brain has high energy demands, it is more susceptible to damage from faulty mitochondria [47]. Mitochondria can be inhibited by many stressors, chief among them metals such as mercury, arsenic, cadmium and lead [48,49]. Pesticides and industrial chemicals have also been found to inhibit mitochondrial function [50]. In addition, people with autism have been found to have higher levels of Clostridium bacteria in their guts. Clostridium produces propionic acid, which inhibits mitochondrial oxidative phosphorylation [8].
Although most people with autism have no discernible mutation indicating a primary mitochondrial disorder, laboratory findings give evidence of reduced mitochondrial function, namely elevated plasma lactate (hyperlactacidemia) and an increased lactate/pyruvate ratio. Changes in mtDNA have rarely been found in people with autism who show clinical signs of mitochondrial dysfunction [51][52][53]. In addition, levels of enzymes that neutralize mitochondrially produced free radicals have been found to be lower in people with autism [54].
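The two lactate-based markers amount to a comparison and a quotient. The sketch below flags them against cutoffs that are deliberately left as parameters, since reference ranges come from the testing laboratory; the function name, the cutoffs used in the example and the sample values are all hypothetical.

```python
def mitochondrial_lab_flags(lactate: float, pyruvate: float,
                            lactate_upper: float,
                            ratio_upper: float) -> dict[str, object]:
    """Flag elevated lactate and an elevated lactate/pyruvate (L/P) ratio.

    Lactate and pyruvate must be in the same units. The cutoffs must come
    from the laboratory's reference ranges; no default is assumed here.
    """
    if pyruvate <= 0:
        raise ValueError("pyruvate must be positive to form a ratio")
    ratio = lactate / pyruvate
    return {
        "lp_ratio": ratio,
        "elevated_lactate": lactate > lactate_upper,
        "elevated_lp_ratio": ratio > ratio_upper,
    }


# Hypothetical sample (mmol/L) and hypothetical cutoffs, for illustration only.
print(mitochondrial_lab_flags(3.5, 0.125, lactate_upper=2.2, ratio_upper=25.0))
```

Reporting the ratio alongside lactate matters because lactate can rise for reasons unrelated to mitochondrial function, while the ratio reflects the balance between the two metabolites.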
In addition to producing ATP, mitochondria perform the important function of sequestering calcium. Calcium also serves as a biological signal between the mitochondria and the endoplasmic reticulum. Neuronal calcium signalling triggers the release of neurotransmitters and can affect the speed of signals. Diseases with defects in the mitochondrial calcium pathways have a high comorbid occurrence of autism [55]. Post mortem studies of autistic brains show alterations in calcium homeostasis, and one such study also showed a possible connection between ionized calcium levels and the immune system [56].
There are several pathways by which impaired mitochondrial function could affect the brain. The brain has high energy demands and a limited ability to neutralize free radicals, so impaired mitochondria might damage neurons [57]. Mitochondrial dysfunction could also reduce the firing frequency of neurons, particularly inhibitory neurons [58]. Mitochondrial dysfunction could also affect the brain indirectly, through the immune system: mice with mitochondrial deficiency have reduced numbers of immune cells [59], and supplementation with mitochondrial nutrients improves immune function in type 2 diabetic rats [60]. Mitochondrial dysfunction outside the brain could also act on it: impaired mitochondrial fatty acid beta-oxidation could lead to hepatic production of lipids containing very-long-chain fatty acids (VLCFA). These lipids can lead to microglial activation and the release of glutamate, which is neurotoxic in excess [61].

Immune system
There is strong evidence of immune dysfunction in children with autism. Relatives of children with autism have increased rates of autoimmune diseases [62]. Imbalances of immune system cells and cytokines are found in many different parts of the immune system of people with autism.
Total levels of lymphocytes are reduced [63,64]. The balance of T helper cells is also often skewed, with most people with autism showing Th2 predominance [64,65]. Th2 skewing results in increased antibody production, which can induce allergies and autoimmune reactions. Food allergies are common in children with autism [66]. Th2 skewing also makes chronic viral infections more likely. The serum immunoglobulin subtypes show similarly abnormal patterns. Immunoglobulins are antibodies produced by B cells that provide humoral, persistent immunity. IgM, IgA and total IgG are depressed, while the IgG subclasses IgG2 and IgG4 and total IgE are increased [67][68][69][70]. Increases in pro-inflammatory cytokines along with reductions in regulatory cytokines have also been found [71].
The immune system can also affect the mitochondria. Cytokines such as TNFα and IL-6 can facilitate calcium influx and contribute to mitochondrial dysfunction, possibly adding to the deficits of autism through the mitochondrial system [72]. Extracellular mitochondrial DNA and anti-mitochondrial antibodies have been found in the serum of children with autism [73].
There are several avenues by which the immune system could induce autistic behaviors. Immune dysregulation could result in generalized inflammation in the brain [74]. Inflammation in the brain has been linked to a number of psychiatric and neurological diseases, including schizophrenia [75], bipolar disorder [76], Alzheimer's disease [77] and depression [78]. Multiple studies have found a correlation between abnormal levels of immune factors and core autistic deficits in areas such as speech, mood and social behavior [79][80][81][82][83][84]. Another study found that the more the levels of the cytokines IL-1, IL-5, IL-8 and IL-12p40 deviated from the norm, the more severe the stereotypical behaviors [85].
Challenge with nasal allergens during the low-pollen winter months resulted in behavioral regression in 55% of children with autism, as measured by the Aberrant Behavior Checklist [86]. A prospective study reported that children with autism showed fewer aberrant behaviors, particularly in speech, during fever [87]. This gives further support to an immunological component [88].
The interaction between the immune system and the brain can take several forms. Neuropeptides can modulate the immune system through recruitment of the innate immune system and chemotaxis [11]. In mouse models, decreased lymphocytes result in impaired learning and memory [89]. Autoimmunity is present in some cases. Anti-brain antibodies have been found in children with autism, though no evidence of demyelination has been found [11]. A study of 93 children with autism found that 75% had autoantibodies to the folate receptors in the central nervous system (CNS). Impairment of these receptors can lead to reduced levels of folate in the CNS and to cerebral folate deficiency (CFD). The levels of folate receptor antibodies were highly correlated with cerebrospinal fluid 5-methyltetrahydrofolate concentrations, indicating possible CFD in the tested children. There are structural similarities between the folate receptors and proteins found in milk [90]. A milk-free diet, in addition to high-dose folinic acid supplementation, has been found to decrease the autoantibody titer and improve functioning in younger patients [91,92].
These immunological differences point to treatment options. Replacement of deficient lymphocytes in mice resolved the learning and memory difficulties [89]. Treatment of allergies often results in improvement in autistic behaviors such as hyperactivity and irritability [66]. An early study found that treatment with intravenous immune globulin in ten children with autism resulted in better speech, eye contact, focus and awareness of surroundings [93].

Gastrointestinal system
The reported incidence of gastrointestinal (GI) disease among those with autism varies widely, depending on exclusion criteria and on whether the study was prospective or retrospective. A prospective study found GI symptoms in 80% of patients with autism [94]. These symptoms include abdominal pain, chronic diarrhea and/or constipation, and gastroesophageal reflux disease [10]. GI disease has been confirmed via endoscopy in several studies [95][96][97]. Inflammation was found throughout the GI tract, including reflux esophagitis, gastritis and duodenitis, along with abnormal carbohydrate digestive enzyme activity. Other studies have found chronic patchy inflammation and lymphonodular hyperplasia, with infiltration of T cells and plasma cells into the epithelial layers of the mucosa, a pattern that differs from that seen in classical inflammatory bowel disease [68,97]. Lymphocyte infiltration into the epithelial layers of the gut lining and the crypts has also been found on endoscopy. In addition, IgG antibodies were deposited on the epithelium and the complement system was activated. This might be indicative of an autoimmune process [98].
There is evidence of increased intestinal permeability in people with autism [99][100][101][102][103]. Increased intestinal permeability was found in 43% of children with autism who had no clinical signs of bowel dysfunction [101]. Increased permeability allows larger molecules that would normally stay in the gut to cross into the bloodstream. Plasma and urinary concentrations of oxalate were greatly elevated in children with autism, possibly as a result of increased intestinal absorption [104]. Increased permeability can lead to allergic and autoimmune processes. There appear to be multiple reasons for the increased permeability. The dietary protein gluten can bind to the CXCR3 receptor, resulting in increased zonulin levels; zonulin regulates the opening of the tight junctions in the gut [105]. Ingested toxins such as polychlorinated biphenyls (PCBs) can also open the tight junctions in the gut [106].
An increased incidence of dysbiosis, an imbalance of intestinal flora, has been noted in children with autism [99,107]. Dysbiosis can result from the use of antibiotics: as beneficial bacteria are killed, antibiotic-resistant pathogenic organisms can take their place. It has been theorized that toxins produced by pathogenic organisms may be affecting the brains of individuals with autism. In addition, decreased levels of disaccharide digestive enzymes have been noted in children with autism [99].
There are anecdotal reports of improvement of autistic behavior on restricted diets. Some experimental studies have reported improvements in socialization, speech and strange and unusual behavior [108,109], stereotyped behaviors and attention/hyperactivity [110], and physiological symptoms [109]. One study of the casein/gluten-free diet considered children with and without GI symptoms separately and found greater improvement in autistic behaviors in children with gastrointestinal symptoms than in those without [109]. The reported improvements may have several explanations. Removal of allergens may lessen autoimmune reactions [66]. Removal of gluten may reduce intestinal permeability [103,105]. Removal of dietary proteins for which there is insufficient enzymatic activity may reduce dysbiosis [111].
The brain also has the potential to directly affect the functioning of the gut. Stress has been implicated in irritable bowel syndrome (IBS), with alterations of intestinal barrier function, an altered balance of enteric microflora, an exaggerated stress response and visceral hypersensitivity [112]. Antidepressants [113] and therapy [114] have been found to be effective treatments for IBS and inflammatory bowel disease (IBD). The brains of patients with IBS have also been found to have increased hypothalamic gray matter compared with controls, though it is unknown whether these brain changes result from long-term IBS or are preexisting [115].

Neurological system
The brain is, of course, among the body systems involved in autism. Anatomical differences in the cerebellum and amygdala have been noted in multiple studies, and other regions have been inconsistently identified as diverging from the average [116]. Decreases in Purkinje and granule cells have been noted [117]. Macrocephaly is present in about 20% of the people with autism studied, with a general upward trend in brain size in others with autism. The increase appears to come disproportionately from white matter enlargement. The cause of the macrocephaly is not known, though larger brains are prevalent among unaffected first-degree relatives. Neuroinflammation is one postulated cause [118].
Minicolumns in the neocortex have been postulated as the fundamental unit of cognition [119]. Minicolumns in autistic brains appear to be narrower, with tighter spacing and higher neuron density [120]. Whether this is a sign of pathology is unclear, as the same variation occurred in autopsies of three distinguished scientists [121]. Autism does occur more often in families of mathematicians, engineers and physicists [122]. It has been theorized that narrow minicolumns facilitate discrimination and more finely tuned activities, while wider minicolumns facilitate generalization. This is consistent with the behavioral observation of stimulus overselectivity in autism: the neglect of some features and overly focused attention on others, to the detriment of perceiving the whole [123]. Evidence also exists for an increased ratio of excitatory to inhibitory neuronal activity in the autistic brain [119,124].
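The proposed trade-off between discrimination and generalization can be illustrated, by loose analogy only, with Gaussian tuning curves: a unit tuned to stimulus A responds to a nearby stimulus B in proportion to exp(-d²/2σ²), where d is the feature distance and σ the tuning width. Treating σ as a stand-in for minicolumn width is an assumption made here for illustration, not a claim from the cited work, and the function name is hypothetical.

```python
from math import exp


def cross_response(distance: float, width: float) -> float:
    """Response of a Gaussian-tuned unit to a stimulus `distance` from its preferred value."""
    if width <= 0:
        raise ValueError("width must be positive")
    return exp(-distance ** 2 / (2 * width ** 2))


# Two stimuli differing by one (arbitrary) feature unit.
for label, width in [("narrow tuning", 0.5), ("wide tuning", 2.0)]:
    r = cross_response(distance=1.0, width=width)
    # Low cross-response: A and B are kept apart (discrimination).
    # High cross-response: A and B are treated alike (generalization).
    print(f"{label}: response to neighboring stimulus = {r:.3f}")
```

Here narrow tuning gives a cross-response of exp(-2), about 0.135, while wide tuning gives exp(-1/8), about 0.882: the narrow unit separates the two stimuli, and the wide unit lumps them together.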
Functional MRI studies provide evidence of enhanced local connectivity and reduced global connectivity in the autistic brain. This might result in over-analysis of smaller features and impairment in synthesizing information into a coherent whole [125]. It has been suggested that one factor in the development of autistic traits is a low signal-to-noise ratio in neural signals. In murine models, constant undifferentiated noise delays indefinitely the maturation of neurons responsible for processing sound. A similarly low signal-to-noise ratio in multiple systems in the autistic brain may be responsible for the impairments observed [126]. This would be consistent with the underconnectivity theory. Neuronal synchrony may be impaired if presynaptic and postsynaptic neurons do not fire within 100 ms of each other [127].
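The 100 ms coincidence criterion can be made concrete with a short routine that, for each presynaptic spike, checks whether a postsynaptic spike follows within the window. The window length comes from the text; the function name, the spike times and the choice to count only post-after-pre pairs are illustrative assumptions.

```python
from bisect import bisect_right


def fraction_synchronous(pre_ms: list[float], post_ms: list[float],
                         window_ms: float = 100.0) -> float:
    """Fraction of presynaptic spikes followed by a postsynaptic spike within window_ms.

    Only post-after-pre pairs (0 < lag < window_ms) are counted, an
    assumption made here for illustration.
    """
    if not pre_ms:
        return 0.0
    post_sorted = sorted(post_ms)
    hits = 0
    for t in pre_ms:
        # Index of the first postsynaptic spike strictly after t.
        i = bisect_right(post_sorted, t)
        if i < len(post_sorted) and post_sorted[i] - t < window_ms:
            hits += 1
    return hits / len(pre_ms)


pre = [0.0, 200.0, 400.0, 600.0]
post_tight = [20.0, 230.0, 410.0, 650.0]   # lags of 20, 30, 10 and 50 ms
post_loose = [20.0, 350.0, 560.0, 900.0]   # lags of 20, 150, 160 and 300 ms
print(fraction_synchronous(pre, post_tight))
print(fraction_synchronous(pre, post_loose))
```

In the tightly coupled example every presynaptic spike has a partner inside the window (fraction 1.0); in the loosely coupled one only the first does (fraction 0.25). Sorting once and using binary search keeps the check efficient for long spike trains.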
Brain hypoperfusion has been noted in several studies of subjects with autism, although the region affected varies widely. Hypoperfusion can result from structural abnormalities or from global effects such as oxidative stress [7]. Seizures are present in 30% of people with autism [128]. In addition, subclinical seizures are often present, and treatment with anti-epileptics can result in improved mental functioning [129,130].

Modeling autism
All of the systems described above interact in highly complex ways. To date, little research exists on autism modeling outside of the genetic and neurological systems. Finding commonalities between autism and other conditions may lead to new treatments. Rzhetsky and colleagues used statistical models to find genetic overlaps between autism, bipolar disorder and schizophrenia [131]. Individual subsystems of importance in autism have been modeled [132,133], but work remains to be done in modeling combinations of systems. Autism clearly poses a challenging modeling problem because of the high level of interaction between its different elements [134]. Figure 2, which is probably incomplete, shows some potential interactions between the systems discussed in this paper. For example, an analysis of children with both autism and mitochondrial disease found that a high proportion, 70%, regressed during a fever [135]. This is one example of an intersystem interaction between the mitochondrial, immune and neurological systems.
The dotted lines in Figure 2 indicate how even the environment might be affected by the presence of autism. For example, food allergies or special diets would change the environment through different food choices. Fecal incontinence in older children would change the activities the child would be exposed to. Energy deficits from mitochondrial dysfunction could affect school activities. Oversensitivity to sensory input would change activities and family dynamics.
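The kind of interaction map shown in Figure 2 can be encoded as a directed graph and searched for feedback loops, which is where single-system models are most likely to mislead. The edge list below is a small hypothetical subset paraphrased from the preceding sections (for example, cytokines affecting mitochondria, mitochondrial dysfunction affecting immune cells, and autism affecting the environment through diet), not a transcription of Figure 2; the function name is also hypothetical.

```python
from collections import defaultdict

# Hypothetical subset of inter-system influences paraphrased from the text.
EDGES = [
    ("genetics", "metabolism"),
    ("environment", "metabolism"),
    ("environment", "mitochondria"),
    ("environment", "gut"),
    ("metabolism", "mitochondria"),
    ("mitochondria", "immune"),
    ("immune", "mitochondria"),
    ("mitochondria", "brain"),
    ("immune", "brain"),
    ("gut", "immune"),
    ("gut", "mitochondria"),
    ("brain", "gut"),
    ("brain", "environment"),
]


def simple_cycles(edges: list[tuple[str, str]]) -> list[list[str]]:
    """Enumerate elementary cycles by brute-force DFS (fine for small graphs).

    Each cycle is reported once, starting from its alphabetically smallest
    node: from a given start, the search only visits nodes that sort after it.
    """
    graph: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    cycles: list[list[str]] = []

    def dfs(start: str, node: str, path: list[str]) -> None:
        for nxt in graph[node]:
            if nxt == start:
                cycles.append(path[:])
            elif nxt > start and nxt not in path:
                dfs(start, nxt, path + [nxt])

    for start in sorted(graph):
        dfs(start, start, [start])
    return cycles


for cycle in simple_cycles(EDGES):
    print(" -> ".join(cycle + [cycle[0]]))
```

Even this small, partial edge list contains 13 feedback loops, including the direct immune-mitochondria loop and longer loops that pass from the brain through the environment and gut. Brute-force enumeration is adequate at this size; a larger interaction map would call for a dedicated cycle-finding algorithm such as Johnson's.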
Much work has been done investigating the genetic basis of autism. Additional work needs to be done to find and cluster the genes involved in autism. Modeling autism will require an integration of both systems and scales. A few potential research areas are presented below.

Hierarchical modeling
Modeling autism is complex because of the different physiological scales involved. Issues of importance to model range from the organ level to the genetic. Most systems biology to date has emphasized the "lower" levels, with a strong emphasis on direct genetic interactions. Outside of a few systems, such as the cardiovascular [136], less work has been done at the organ scale. To create a true model of the human body, the microscopic and macroscopic need to be integrated. One way to do this is to use a hierarchical system, in which modules are developed to model the scale being considered, with appropriate links between levels. Techniques have been borrowed from the systems engineering and software engineering communities to aid and formalize these connections between modules. An example is BioUML, an open source platform for multilevel biological modeling [137]. Hierarchical modeling using rule-based models has been implemented at a cellular level [138].
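A minimal sketch of the modular, multi-level idea, assuming a hypothetical interface in which every module exposes a `step` method and a parent module aggregates its children's outputs into organ-level quantities. The class names, the linear relationships and all numbers are invented for illustration; this is not BioUML's API or any published model. Oxidative stress is passed in from outside rather than computed inside one module, as one way of representing a global, trans-organ influence.

```python
from dataclasses import dataclass, field


@dataclass
class MitochondrionModule:
    """Cell-scale module (toy): ATP output falls and ROS output rises with damage."""
    damage: float = 0.0  # 0 = healthy, 1 = fully impaired (hypothetical scale)

    def step(self, oxidative_stress: float) -> dict[str, float]:
        # The global oxidative-stress signal slowly adds damage, capped at 1.
        self.damage = min(1.0, self.damage + 0.05 * oxidative_stress)
        return {"atp": 1.0 - self.damage, "ros": 0.1 + 0.9 * self.damage}


@dataclass
class OrganModule:
    """Organ-scale module that aggregates its cell-scale children."""
    name: str
    energy_demand: float
    cells: list[MitochondrionModule] = field(default_factory=list)

    def step(self, oxidative_stress: float) -> dict[str, float]:
        if not self.cells:
            raise ValueError("an organ module needs at least one cell module")
        outputs = [cell.step(oxidative_stress) for cell in self.cells]
        supply = sum(o["atp"] for o in outputs) / len(outputs)
        ros = sum(o["ros"] for o in outputs) / len(outputs)
        # Link between scales: organ-level energy deficit derived from cell outputs.
        return {"energy_deficit": max(0.0, self.energy_demand - supply), "ros": ros}


brain = OrganModule("brain", energy_demand=0.9,
                    cells=[MitochondrionModule() for _ in range(10)])
for day in range(5):
    state = brain.step(oxidative_stress=1.0)
    print(day, {k: round(v, 3) for k, v in state.items()})
```

Under constant stress the organ shows no deficit at first, then an energy deficit appears once average ATP supply falls below demand (0.15 by the fifth step). The design choice is that the organ module sees only its children's outputs, so the cell-scale model could be replaced by a more detailed one without changing the organ-level code.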
A hierarchical approach allows models of subsystems to be developed separately, but the global effects of different substances and conditions need to be considered too. The trans-organ and trans-system effects of substances are a relatively unexplored field of study. For example, oxidative stress affects the mitochondria directly [24], but also larger systems such as the brain [47]. Mitochondrial stress may also affect the brain indirectly. Dysregulation of mitochondrial dynamics has been implicated in Parkinson's disease [139]. Mitochondrial stress may lead to lipid peroxidation, then to reactive aldehyde generation in the liver, and finally to microglial activation and neuronal death [61].
Inflammation can affect many body systems. It can also be part of a feedback mechanism in which inflammation creates conditions that create or perpetuate inflammation [140]. Xenobiotic substances must also be taken into account; many exogenous substances are not typically included in existing models. Toxins such as PCBs, pesticides and heavy metals can affect the efficiency of enzymes often deficient in autism and need to be considered as potential causative elements [18,38]. In addition, the effect of toxins in combination may not be the same as the effect of the toxins in isolation [141][142][143]. The microbiome, the complex ecosystem of intestinal flora, may have an impact on many systems in the body, either through immunological effects or through microbial metabolites such as the propionic acid produced by Clostridium [144]. Special diets and supplements used by many on the autism spectrum may affect the composition of the microbiome, in addition to possibly changing the function of enzymes [145,146].
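One way to test whether toxins act differently in combination than in isolation is to compare the observed combined effect with a reference model of independent action. A common choice is Bliss independence: if two agents alone produce fractional inhibitions E_A and E_B, the expected combined inhibition is E_A + E_B - E_A·E_B; observed inhibition above that suggests synergy, and below it antagonism. Bliss independence is swapped in here as one standard reference model (Loewe additivity is another); it is not taken from the cited studies, and the function names, tolerance and inhibition fractions are hypothetical.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected fractional inhibition of A + B if they act independently (Bliss)."""
    for e in (effect_a, effect_b):
        if not 0.0 <= e <= 1.0:
            raise ValueError("effects must be fractions in [0, 1]")
    return effect_a + effect_b - effect_a * effect_b


def interaction(observed: float, effect_a: float, effect_b: float,
                tolerance: float = 0.05) -> str:
    """Classify a combination relative to Bliss independence.

    `tolerance` is a hypothetical margin for measurement noise.
    """
    excess = observed - bliss_expected(effect_a, effect_b)
    if excess > tolerance:
        return "synergistic"
    if excess < -tolerance:
        return "antagonistic"
    return "independent"


# Hypothetical enzyme-inhibition fractions for two toxins alone and together.
print(bliss_expected(0.2, 0.3))        # 0.2 + 0.3 - 0.2 * 0.3
print(interaction(0.75, 0.2, 0.3))
```

In this made-up example the two toxins alone would be expected to inhibit the enzyme by 0.44 in combination, so an observed 0.75 would be classified as synergistic. Such a comparison needs dose-matched measurements of each toxin alone and together.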

Identifying subgroups
In spite of autism's many common behaviors, it has become evident that autism has a complex etiology and multiple subgroups. The development of autism appears to be a complex interaction of genes and environmental factors. Since most cases of autism are idiopathic, an unknown number of subgroups may be present. Treating autism as homogeneous obscures the differences between subgroups that are needed to guide proper treatment. Identification of subgroups would aid both research and treatment. As David Amaral, President of the International Society for Autism Research, states, "There is not going to be rapid progress in autism research unless we subtype" [147]. This subtyping can be done on the basis of genes or clinical data. Clustering has been tried using behavioral symptoms but has had little success at identifying latent factors [148][149][150].
Subgrouping offers several benefits. Subgrouping the population might yield subgroups whose distinctive symptoms and pathology are already familiar from the medical literature, so that treatments that work in existing treatable conditions can be drawn upon. For example, if one subgroup is a variant of a known syndrome, we may benefit from the treatments known for that syndrome. Subtyping would also reduce the need for therapeutic trials, allowing more targeted treatment. Another benefit of subgrouping is prevention. If we know the sequelae of a similar condition, we can include appropriate preventive measures in the treatment protocol. For example, if seizures are a symptom of the similar known syndrome or condition, periodic EEG evaluation could be included in the treatment protocol.
Biomarkers can be used to cluster patients into subgroups. Many of the metabolic, immunologic, proteomic, genetic and anatomical differences listed above can be used to search for subgroups [151,152]. Biomarkers can also be identified with more advanced methods [16,153]. An important consideration is that the biomarkers used be clinically relevant, chosen to maximize the potential for treatment [154]. For example, the following parameters could be included in a feature vector for a subgroup-finding (clustering) algorithm:
- Genetics. This can include genetic panels, such as mitochondrial panels, or the results of microarray testing.
- Lab test results, such as the metabolic, immunologic and proteomic biomarkers mentioned above.
- Symptoms and severity as a function of time. These could be "hard" symptoms such as the presence and type of epilepsy, or "soft" symptoms such as parent reports of sociability.
- Treatments and their effectiveness. The treatments could include steps to address some of the disease markers discussed above, such as methylcobalamin and folinic acid [36] for methylation issues and carnitine [61] for mitochondrial issues.
The feature vector would contain both measured values and binary markers, such as a 1 for the presence of a polymorphism or other hard symptom and a 0 for its absence. For example, a child with the MTHFR 677 polymorphism, a tGSH:GSSG ratio of 8.6 and no epilepsy could be represented by the feature vector [1, 8.6, 0]. Once the data are in numerical form, a variety of pattern recognition techniques can be applied.
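A minimal sketch of this encoding, assuming a hypothetical `PatientRecord` type whose three fields mirror the example above (the field names are illustrative, not a standard clinical schema):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical record; the fields mirror the example in the text."""
    has_mthfr_677: bool     # carries the MTHFR 677 polymorphism (binary marker)
    tgsh_gssg_ratio: float  # tGSH:GSSG ratio, kept as a measured value
    has_epilepsy: bool      # "hard" symptom, encoded as 1 (present) or 0 (absent)


def to_feature_vector(record: PatientRecord) -> list[float]:
    """Encode binary markers as 1.0/0.0 and keep lab values unchanged."""
    return [
        1.0 if record.has_mthfr_677 else 0.0,
        record.tgsh_gssg_ratio,
        1.0 if record.has_epilepsy else 0.0,
    ]


child = PatientRecord(has_mthfr_677=True, tgsh_gssg_ratio=8.6, has_epilepsy=False)
print(to_feature_vector(child))  # [1.0, 8.6, 0.0]
```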
One popular clustering technique is k-means [155]. The k-means algorithm is essentially a density finder. It uses an indicator function to assign each input vector to a cluster defined by a prototype vector. The algorithm then minimizes the global average squared Euclidean distance from each input vector to its assigned prototype. This optimization moves the prototype vectors to reflect Euclidean density patterns in the data. Each prototype is the average of the input vectors assigned to it and is thus potentially representative of a subgroup.
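The assignment and update steps described above (Lloyd's algorithm, the usual way of fitting k-means) can be sketched in plain Python. The function names and the toy feature vectors are hypothetical; a real analysis would more likely rely on an established library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(data, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate hard assignment and prototype update."""
    rng = random.Random(seed)
    # Initialise the k prototypes with k distinct input vectors.
    prototypes = [list(v) for v in rng.sample(data, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest prototype.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(x, prototypes[j]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: each prototype becomes the mean of its members,
        # which lowers the average squared distance to the prototypes.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its old prototype
                prototypes[j] = mean_vector(members)
    return prototypes, labels


# Hypothetical feature vectors: [MTHFR marker, tGSH:GSSG ratio, epilepsy].
data = [[1, 8.6, 0], [1, 9.1, 0], [0, 2.0, 1], [0, 2.4, 1]]
prototypes, labels = kmeans(data, k=2)
print(labels)
```

Each returned prototype is the mean of its members, which is what makes it a candidate "representative patient" for a subgroup.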
One weakness of k-means is that it performs a hard assignment of each input vector to a cluster: an input vector is either entirely in a cluster or not in it at all. This does not match situations where symptoms overlap. Fuzzy techniques would be of value in these cases. Fuzzy set theory allows intermediate degrees of membership, between 0 and 1, in a set. The fuzzy c-means (FCM) algorithm is a fuzzy generalization of k-means that allows input vectors to belong to more than one prototype [156]. FCM also does not suffer from the instability that sometimes occurs in k-means, where an input vector switches back and forth between two prototypes, changing the prototypes in the process.
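A compact sketch of fuzzy c-means, assuming the common formulation with Euclidean distance and a fuzzifier m = 2 (the exponent that controls how soft the memberships are). In the hypothetical toy data, the fifth vector sits midway between two groups to mimic a patient with overlapping features; FCM should give it a membership close to 0.5 in each cluster, where k-means would force it into one.

```python
import math
import random


def fuzzy_c_means(data, c, m=2.0, iterations=300, tol=1e-6, seed=0):
    """Return (centers, memberships); memberships[i][j] in [0, 1] is the degree
    to which vector i belongs to cluster j, and each row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, normalised so every row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iterations):
        # Center update: average of all vectors weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            centers.append(
                [sum(w[i] * data[i][d] for i in range(n)) / sum(w) for d in range(dim)]
            )
        # Membership update: closer centers receive larger memberships.
        change = 0.0
        for i, x in enumerate(data):
            dists = [math.dist(x, cj) for cj in centers]
            zeros = [j for j, dj in enumerate(dists) if dj == 0.0]
            if zeros:  # the vector sits exactly on one (or more) centers
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                p = 2.0 / (m - 1.0)
                row = [1.0 / sum((dj / dk) ** p for dk in dists) for dj in dists]
            change = max(change, max(abs(a - b) for a, b in zip(row, u[i])))
            u[i] = row
        if change < tol:
            break
    return centers, u


data = [[0, 0], [0, 1], [10, 10], [10, 11], [5, 5.5]]
centers, u = fuzzy_c_means(data, c=2)
for x, row in zip(data, u):
    print(x, [round(v, 2) for v in row])
```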
An important issue with clustering algorithms is the number and validity of clusters. The k-means and FCM algorithms will find the number of clusters specified at initialization, regardless of the actual number of clusters in the data. Some clustering algorithms can produce clusters that are empty or degenerate. Many practitioners heuristically try different numbers of clusters and assess the fit. There are also various methods that attempt to quantify the validity of clusters [157]. The Self-Organizing Map (SOM) maintains a proximity relationship between clusters and can be useful for visualization [158].
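One validity index in common use is the silhouette coefficient, which compares each vector's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b); values near 1 indicate compact, well-separated clusters. The sketch below is an illustration, not necessarily one of the methods surveyed in [157]: it reruns k-means for several candidate cluster counts and keeps the count with the highest mean silhouette. To make the result deterministic it uses a simple farthest-point initialization, and the toy data are hypothetical.

```python
import math


def farthest_point_init(data, k):
    """Deterministic start: the first vector, then repeatedly the vector
    farthest from all prototypes chosen so far."""
    prototypes = [list(data[0])]
    while len(prototypes) < k:
        farthest = max(data, key=lambda x: min(math.dist(x, p) for p in prototypes))
        prototypes.append(list(farthest))
    return prototypes


def kmeans_labels(data, k, iterations=100):
    """Plain k-means (hard assignments), returning only the cluster labels."""
    prototypes = farthest_point_init(data, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: math.dist(x, prototypes[j])) for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                prototypes[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_silhouette(data, labels):
    """Average of s = (b - a) / max(a, b) over all vectors."""
    scores = []
    for i, x in enumerate(data):
        own = [math.dist(x, y) for j, y in enumerate(data) if j != i and labels[j] == labels[i]]
        if not own:  # convention: a vector alone in its cluster scores 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within its own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(x, y) for y, lab in zip(data, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical two-feature data with three well-separated groups.
data = [[0, 0], [0, 1], [1, 0], [10, 0], [10, 1], [11, 0], [0, 10], [1, 10], [0, 11]]
scores = {k: mean_silhouette(data, kmeans_labels(data, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 2) for k, s in scores.items()})
```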
The above techniques are unsupervised. Unsupervised techniques rely wholly on the input data to find clusters or groupings. Supervised techniques incorporate additional knowledge about the expected groupings to guide cluster development. This additional information, if available, can help with complex, high-dimensional problems. Support Vector Machines [159,160] and a variety of neural network algorithms can be used to find patterns in the data [161]. Although supervised algorithms can, in general, outperform unsupervised ones, the additional "ground truth" data they require is often unavailable. This ground truth can be information such as genes already associated with a phenotype, or response to an intervention. It could also be symptoms that might otherwise be used as inputs, such as the aforementioned presence of epilepsy.
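As a deliberately simple stand-in for the Support Vector Machines and neural networks cited above, the sketch below shows what a supervised method adds: labels supplied as ground truth define the groups. It uses a nearest-centroid classifier, chosen here for brevity rather than proposed in the text, and the "responder"/"non-responder" labels for a hypothetical intervention are invented for illustration.

```python
import math
from collections import defaultdict


def fit_centroids(vectors, labels):
    """Supervised step: one centroid per known ("ground truth") label."""
    groups = defaultdict(list)
    for vector, label in zip(vectors, labels):
        groups[label].append(vector)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


def predict(centroids, vector):
    """Assign a new vector to the label whose centroid is nearest."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Hypothetical training data: [MTHFR marker, tGSH:GSSG ratio, epilepsy],
# labelled by an invented response to an intervention.
train = [[1, 8.6, 0], [1, 9.0, 0], [0, 2.1, 1], [0, 2.5, 1]]
response = ["responder", "responder", "non-responder", "non-responder"]
centroids = fit_centroids(train, response)
print(predict(centroids, [1, 8.0, 0]))  # responder
```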
Most of the algorithms mentioned above measure similarity using the Euclidean distance metric. Euclidean distance and other Minkowski L_p norms, such as the "city block" distance, represent compact, hyperspherical patterns well. Other distance measures are possible, such as various correlation measures [162] and measures suited to non-spherical patterns, such as the Mahalanobis distance [161]. Another issue is the scale of the data. Expected values of lab tests may vary by several orders of magnitude, so features with large numerical ranges would dominate the distance. Therefore, it is usually advisable to normalize the data before using it in an algorithm.
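The normalization and distance measures discussed above can be sketched with the Python standard library alone. `zscore_columns` rescales each feature to mean 0 and standard deviation 1, so that a lab value in the hundreds no longer swamps a ratio near 8. The Mahalanobis function is restricted to two features (using the closed-form 2 × 2 inverse) only to keep the example short; all names and values are illustrative.

```python
import math
from statistics import mean, pstdev


def zscore_columns(data):
    """Rescale every feature (column) to mean 0 and standard deviation 1."""
    stats = [(mean(col), pstdev(col)) for col in zip(*data)]
    return [
        [(v - mu) / sd if sd > 0 else 0.0 for v, (mu, sd) in zip(row, stats)]
        for row in data
    ]


def euclidean(a, b):
    """Minkowski L2 distance."""
    return math.dist(a, b)


def city_block(a, b):
    """Minkowski L1 ("city block") distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance for two features, via the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    q = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


# Hypothetical lab values on very different scales.
labs = [[8.6, 120.0], [7.9, 450.0], [5.2, 300.0]]
print(euclidean(labs[0], labs[1]))      # dominated by the second feature
scaled = zscore_columns(labs)
print(euclidean(scaled[0], scaled[1]))  # both features now contribute
```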
Perhaps the most critical issue is the "curse of dimensionality". The curse of dimensionality refers to the somewhat counterintuitive properties of high-dimensional spaces, whereby additional information can reduce discrimination. The simplest implication is that the amount of data required to adequately cover a volume increases exponentially with dimension. It can also be shown that most of the probability mass of a high-dimensional Gaussian lies in a thin shell far from its center, rather than near the center. This has obvious implications for distance-based algorithms. The distances from a cluster center to the data points become concentrated in a narrow interval, so the distances from the various data points to the prototype become essentially the same. Thus discriminatory power can decrease as features are added, even if each added feature has discriminatory power in and of itself [163,164]. This has implications for finding subgroups in a complicated disease such as autism, which might require a large number of features. Feature selection will alleviate the curse of dimensionality but may exclude features needed to find less prevalent subgroups. The curse of dimensionality may also be mitigated by using subspace methods or hierarchical clustering.
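The concentration of distances can be checked numerically. The sketch below is an illustration under assumed conditions (uniformly random points in a unit hypercube, not autism data): it measures the relative contrast (d_max - d_min) / d_min between one query point and 200 random points. As the dimension grows the contrast shrinks toward zero, meaning the nearest and farthest points become hard to tell apart.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(d_max - d_min) / d_min between one random query point and n_points
    random points, all drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```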
Another issue prevalent in autism data is the abundance of missing data. One cause of missing data is that different studies follow different protocols, resulting in similar but not identical feature vectors. In clinical data, physicians will not perform all tests on all patients, so combining patients produces missing values. Therefore, techniques are needed that make the most of the data that are present [165,166].
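Two simple ways of working with incomplete feature vectors are sketched below, with `None` marking a test that was not performed. `mean_impute` fills each gap with the mean of that feature's observed values, and `partial_distance` computes a Euclidean distance over only the features observed in both vectors, scaled up to the full dimension. These are basic baselines assumed for illustration, not necessarily the methods of [165,166].

```python
import math


def mean_impute(data):
    """Replace each missing value (None) with the mean of that feature's observed values."""
    means = []
    for col in zip(*data):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def partial_distance(a, b):
    """Euclidean distance over features observed in both vectors, rescaled by
    (total features / shared features) so that vectors with many gaps are
    not made to look artificially close."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("the two vectors share no observed features")
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


# Hypothetical patients pooled from two studies; the first lacks the tGSH:GSSG test.
patients = [[1, None, 0], [1, 8.6, 1], [0, 2.1, 1]]
print(mean_impute(patients))
print(partial_distance(patients[0], patients[1]))
```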
Numerical data in autism research pose particular challenges. The data can refer to disparate body systems, and can be difficult to integrate across studies and research centers. For example, studies can have different selection criteria, experimental conditions and goals, and research centers can have different testing procedures that lead to varying results. Data are often imprecise. Fuzzy techniques should be incorporated, as many of the data considered, such as parent reports of behavior, will not be easily quantifiable. Also, what might be considered outlier data may in fact be important, since it may be representative of the extreme values that are evident in autism data [167]. There is a wealth of information that might be useful in determining autism phenotypes.
As mentioned before, this information might include genotype information and lab results. It might also include items such as parent ratings of diarrhea odor. A value of '1' would have very different meanings in these three categories, even though numerically it is the same.
Incorporating domain knowledge into the identification of subgroups will alleviate many of the problems noted above. As shown in the preceding sections, the medical literature contains much qualitative information about autism, most of it from single-system studies. Techniques need to be developed to integrate this information. One way to incorporate domain knowledge is to embed causal information into the solution [168].
Some preliminary, simple subgrouping has already shown promise. An analysis of the gluten- and casein-free elimination diet showed greater symptom improvement in children with gastrointestinal symptoms than in those without [109]. This information can help practitioners decide whether to recommend restrictive diets. It has been proposed that there may be mitochondrial [58], intestinal permeability [103] and immune [11] subgroups in autism, but the picture is probably more complex, as many children may belong to multiple subgroups. Thus it is imperative to develop subgroups that have clinical significance for treating the symptoms of autism, not just statistical validity.
For example, one could possibly discover a subtype of autism that presents with clinical or subclinical seizures of a characteristic type. Since the treatment of seizures is a well-studied area in its own right, we could potentially establish a treatment protocol for patients in this subgroup, based on treatment studies of seizure drugs in patients who also present with autism. This would result in a new, evidence-based treatment for this subgroup of people with autism, in contrast to using a seizure medication off-label without clear evidence of efficacy in this population.

Time dependent modeling
Another important issue is the time scales involved. Autism is a developmental, not a static, disorder. Disease progression might start prenatally and extend throughout childhood, and, of course, the child's body is growing and changing. Modeling that incorporates time progression has been done primarily at the genetic or cellular level. Frameworks have been developed for adjusting parameters during phenotype transitions [169]. Molecular connectivity maps incorporating differentially expressed genes have been used to investigate the relationship of aging to neurological and psychiatric diseases [170].
Another time range to be considered is progression across generations. Transgenerational changes have been shown with common toxicants. Low-level bisphenol A exposure during pregnancy in mice resulted in transgenerational alterations in gene expression and behavior [171]. Another possible avenue for children with autism is that impairments in the mother's methylation and sulfur pathways might result in an accumulation of toxins in the mother. She would then pass a greater than normal amount of toxins to her child prenatally [172] and through breastfeeding [173]. This would impair the child's detoxification systems from an early age, resulting in an even greater build-up of toxins. If this child is a girl and later has children, she would pass an even greater toxic load to them. As the effects of toxins are more severe the earlier they are introduced, this might lead to developmental delays, including autism. Thus a non-genetic, non-epigenetic transgenerational inheritance could be occurring. A recent study showed a three-fold increase in the rate of autism among descendants of survivors of mercury-induced Pink disease (infantile acrodynia). The study did not separate out matrilineal descendants, so it is impossible to determine whether toxins were passed in utero or whether the increased incidence resulted from a genetic hypersensitivity to mercury [174]. This sort of inheritance can also happen in other systems [175]. Induction of diabetes in pregnant rats results in an increased prevalence of diabetes and obesity in the offspring. This can lead to gestational diabetes in the offspring and to the perpetuation of diabetes through generations through environmental causes [176,177].
Another source of time dependence is that the brain itself is a state machine, in the sense that its future characteristics depend on its past characteristics, on the interventions employed or not employed at a given time, and so on. Simplified modeling with reasonable assumptions could potentially be employed to answer questions of general value. One example would be: "Are outcomes better for children with regression who were treated with antiepileptic medications before puberty, compared to children who received such treatment after puberty?" Another example would be: "Do children who exhibit conditions such as gastrointestinal abnormalities or seizures generally tend to lose these symptoms after a certain age?" If so, did these children receive a certain therapy, whether medical, educational or behavioral, at a certain age? In summary, a time-dependent model could shed more light on brain plasticity and its contribution to the outcomes seen in this population.
To introduce this complexity, we propose enhancing our models using Dynamic Time Warping (DTW) [178], or a more complex model with state information similar to a hidden Markov model (HMM), in which the body is assumed to be in a state that produces certain symptoms or observations and transitions to other states according to the model. Estimating these models and predicting outcomes would be the most complex of the techniques proposed in this article, and would be the goal for modeling such a complex time-varying system.
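Both ideas can be sketched briefly. `dtw_distance` aligns two severity time series that may progress at different rates, using the classic dynamic-programming recurrence with absolute difference as the local cost. `forward_likelihood` uses the forward algorithm to compute the probability of a sequence of observations under a hidden Markov model. The series, the two hidden states ("quiescent" and "active") and all probabilities in the usage example are hypothetical.

```python
def dtw_distance(s, t):
    """Dynamic time warping distance between two 1-D series, using the
    absolute difference as the local cost."""
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest predecessor: stretch s, stretch t, or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def forward_likelihood(observations, start, trans, emit):
    """P(observation sequence) under a hidden Markov model (forward algorithm).
    start[s]: initial state probability; trans[r][s]: P(next = s | current = r);
    emit[s][o]: P(observation o | state s)."""
    states = range(len(start))
    alpha = [start[s] * emit[s][observations[0]] for s in states]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in states) * emit[s][obs]
            for s in states
        ]
    return sum(alpha)


# Hypothetical severity scores: the second child follows the same course, more slowly.
child_a = [0, 1, 2, 3, 3]
child_b = [0, 0, 1, 1, 2, 3, 3]
print(dtw_distance(child_a, child_b))  # 0.0: the courses match once time is warped

# Hypothetical 2-state model: state 0 = "quiescent", state 1 = "active";
# observation 1 = GI symptoms reported at a visit, 0 = not reported.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood([0, 1], start, trans, emit))
```

Fitting the HMM parameters from patient histories (for example with the Baum-Welch algorithm) would be the harder estimation step the text refers to; the sketch only evaluates a given model.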

Discussion and conclusions
This paper contains, of necessity, an incomplete review of the issues involved in autism. Research is exploding in this area and new findings are being published every month.
It is clear that the complexity of autism presents both a challenge and an opportunity for systems biologists. Modeling autism requires the development of new techniques to harness and tame the complexity of interactions. For example, one possible chain of interactions is that impairment of the detoxification system could allow toxins to accumulate and cause mitochondrial dysfunction [48,49], which could cause immune dysfunction [179,180], which could cause gastrointestinal dysfunction, which could then affect the brain [181]. This is not to imply that this chain is the cause of autism. In fact, the whole chain could run the other way, with stress inducing bowel dysfunction [112]. The bowel dysfunction could, through the opening of tight junctions, induce immune activation [182], which could contribute to mitochondrial dysfunction [72]; finally, the resulting oxidative stress could cause more resources to be used in the production of GSH, perturbing the metabolic pathways [183]. And in fact, the chain is not ordered. Gastrointestinal dysfunction could impair mitochondrial function directly through the clostridial production of propionic acid [8]. These interactions are a purely hypothetical thought experiment and are not to be represented as causes. Even so, it is apparent that the number of possible interactions among systems in autism grows combinatorially with the number of systems involved. This necessitates a systems approach.
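The growth in the number of possible interaction chains can be made concrete with a small count. Assuming, purely for illustration, that any system can influence any other and that a chain visits each system at most once, the number of ordered chains through k of n systems is n!/(n-k)!, and the sketch below sums this over chain lengths 2 to n. For the five systems in the hypothetical chain above this already gives 320 chains, and the total grows faster than exponentially as systems are added, even before feedback loops are considered.

```python
from math import perm


def count_chains(n_systems):
    """Ordered chains A -> B -> ... that visit between 2 and n_systems
    distinct systems: the sum over k of n!/(n-k)! (k-permutations)."""
    return sum(perm(n_systems, k) for k in range(2, n_systems + 1))


# The five systems of the hypothetical chain described in the text.
systems = ["detoxification", "mitochondrial", "immune", "gastrointestinal", "brain"]
print(count_chains(len(systems)))  # 320
for n in range(2, 8):
    print(n, count_chains(n))
```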
Autism could be considered a model for other complex diseases. An interplay between genetic and environmental factors is suspected in many diseases, such as cancer and diabetes. Since many of the genetic variants that predispose children to autism are common in the general population, findings in autism may have much broader implications for the population in general.
Autism is the most rapidly increasing developmental disability, with enormous costs to individuals and to society. The importance of modeling in autism research cannot be overstated. In summary, a systems approach to modeling autism could potentially lead to the following concrete benefits. First, a comprehensive, evolving digital data model of autism gives us a platform to capture ongoing research in an analyzable format. The model itself can "learn" as results are incorporated into the system as training data. Second, immediate tools, such as a detailed hierarchical intake or follow-up questionnaire, could result from the system, based on its knowledge of subtypes and interconnections, leading to better clinical care for this population. Third, the system can be used for hypothesis generation, suggesting possible research topics for clinical trials.
Autism research findings need to be mined, integrated and modeled to help not just future generations, but also to improve the outcomes for the current generation of people with autism.