Systems biology approaches and pathway tools for investigating cardiovascular disease†
Craig E. Wheelock,*ab Åsa M. Wheelock,bcd Shuichi Kawashima,e Diego Diez,b Minoru Kanehisa,be Marjan van Erk,f Robert Kleemann,g Jesper Z. Haeggströma and Susumu Gotob
Received 4th February 2009, Accepted 26th March 2009
First published as an Advance Article on the web 27th April 2009
DOI: 10.1039/b902356a
Systems biology aims to understand the nonlinear interactions of multiple biomolecular components that characterize a living organism. One important aspect of systems biology approaches is to identify the biological pathways or networks that connect the differing elements of a system, and examine how they evolve with temporal and environmental changes. The utility of this method becomes clear when applied to multifactorial diseases with complex etiologies, such as inflammatory-related diseases, herein exemplified by atherosclerosis. In this paper, the initial studies in this discipline are reviewed and examined within the context of the development of the field. In addition, several different software tools are briefly described and a novel application for the KEGG database suite called KegArray is presented. This tool is designed for mapping the results of high-throughput omics studies, including transcriptomics, proteomics and metabolomics data, onto interactive KEGG metabolic pathways. The utility of KegArray is demonstrated using a combined transcriptomics and lipidomics dataset from a published study designed to examine the potential of cholesterol in the diet to influence the inflammatory component in the development of atherosclerosis. These data were mapped onto the KEGG PATHWAY database, with a low cholesterol diet affecting 60 distinct biochemical pathways and a high cholesterol exposure affecting 76 biochemical pathways. A total of 77 pathways were differentially affected between low and high cholesterol diets.
The KEGG pathways "Biosynthesis of unsaturated fatty acids" and "Sphingolipid metabolism" evidenced multiple changes in gene/lipid levels between low and high cholesterol treatment, and are discussed in detail.
Taken together, this paper provides a brief introduction to systems biology and the applications of pathway mapping to the study of cardiovascular disease, as well as a summary of available tools. Current limitations and future visions of this emerging field are discussed, with the conclusion that combining knowledge from biological pathways and high-throughput omics data will move clinical medicine one step further to individualize medical diagnosis and treatment.
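As a rough illustration of what "mapping" omics results onto pathways involves, the sketch below assumes each pathway is represented simply as a set of member gene and lipid identifiers, and reports which pathways contain at least one altered feature under each treatment. The identifiers, the toy data, the `map_to_pathways` helper and the "differs between treatments" rule are all hypothetical; this is not KegArray's implementation, and real pathway membership would have to be retrieved from KEGG itself.

```python
from typing import Dict, Set

def map_to_pathways(altered: Set[str], pathways: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each pathway, return the altered features it contains (pathways without hits are omitted)."""
    hits = {}
    for name, members in pathways.items():
        overlap = members & altered
        if overlap:
            hits[name] = overlap
    return hits

# Hypothetical pathway memberships (placeholder identifiers, not real KEGG data).
pathways = {
    "pathway_1": {"gene_a", "gene_b", "lipid_x"},
    "pathway_2": {"gene_c", "lipid_y"},
    "pathway_3": {"gene_d"},
}

# Hypothetical features found altered under two dietary treatments.
low_diet = {"gene_a", "lipid_y"}
high_diet = {"gene_a", "gene_d"}

low_hits = map_to_pathways(low_diet, pathways)
high_hits = map_to_pathways(high_diet, pathways)

# Assumed rule: a pathway "differs" if its set of altered members is not the same in both treatments.
changed = sorted(p for p in set(low_hits) | set(high_hits)
                 if low_hits.get(p, set()) != high_hits.get(p, set()))

print(sorted(low_hits))   # ['pathway_1', 'pathway_2']
print(sorted(high_hits))  # ['pathway_1', 'pathway_3']
print(changed)            # ['pathway_2', 'pathway_3']
```

A set-based representation keeps the mapping step trivial; any statistical judgement of whether a pathway is meaningfully "affected" would be a separate step on top of this.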
a Department of Medical Biochemistry and Biophysics, Division of Physiological Chemistry II, Karolinska Institutet, S-171 77, Stockholm, Sweden. E-mail: [email protected]; Fax: +46-8-736-0439; Tel: +46-8-5248-7630
b Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan
c Lung Research Lab L4:01, Respiratory Medicine Unit, Department of Medicine, Karolinska Institutet, 171 76, Stockholm, Sweden
d Karolinska Biomics Center Z5:02, Karolinska University Hospital, 171 76, Stockholm, Sweden
e Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Tokyo
f Department of Physiological Genomics, TNO-Quality of Life, BioSciences, Utrechtseweg 48, 3704 HE, Zeist, The Netherlands
g Department of Vascular and Metabolic Disease, TNO-Quality of Life, BioSciences, Gaubius Laboratory, Zernikedreef 9, 2333 CK, Leiden, The Netherlands
† Electronic supplementary information (ESI) available: Complete list of all KEGG biochemical pathways identified by KegArray as being affected by low cholesterol treatment, high cholesterol treatment, and differentially affected between low and high cholesterol treatment. See DOI: 10.1039/b902356a
Mol. BioSyst., 2009, 5, 588–602 © The Royal Society of Chemistry 2009
Associate Professor Craig E. Wheelock heads a research group at the Karolinska Institute that examines the role of bioactive lipid mediators in inflammatory diseases, with a focus on cardiovascular disease. He is broadly interested in the development of bioinformatics tools for probing inflammatory diseases at the systems level.
Assistant Professor Åsa M. Wheelock heads a research group at the Karolinska Institute that investigates pneumotoxicants and inflammatory lung diseases, as well as gel-based quantitative […]
Assistant Professor Shuichi Kawashima is a researcher at the Human Genome Center in the Institute of Medical Science at the University of Tokyo who is broadly interested in the development of genome databases, bioinformatics web services and the biology of eukaryotic genomes.
Dr. Diego Diez is a postdoctoral researcher at the Kyoto University Bioinformatics Center working on applying systems biology approaches to cardiovascular disease.
Professor Minoru Kanehisa is the Director of the Bioinformatics Center in the Institute for Chemical Research at Kyoto University and a professor at the Human Genome Center in the Institute of Medical Science at the University of Tokyo. His research involves deciphering systemic biological functions by integrated analysis of genomic and chemical […]
Dr. Marjan van Erk is a researcher at TNO Quality of Life who is interested in developing bioinformatical systems biology tools for metabolic and cardiovascular diseases.
Dr. Robert Kleemann heads a research unit at TNO Quality of Life that investigates the role of inflammation in cardiovascular disease and metabolic disorders and has particular interest in gene regulation and drug intervention.
Professor Jesper Z. Haeggström heads a research group at the Karolinska Institute that examines the role of bioactive lipid mediators in inflammatory disease.
Associate Professor Susumu Goto is interested in the development of databases for molecular interaction networks and network analysis using the KEGG database suite. His work also involves in silico metabolic reconstruction.
An organism is an individual living system capable of reacting to stimuli, reproducing and maintaining a stable structure over time. Organisms are composed of multiple individual components, e.g. cells and their corresponding genes, proteins, metabolites, etc., which are all governed by an intricate network of interactions. This network is not static, and the various components evolve and adapt dynamically to internal and environmental changes. The study of this complex system as a single entity is a challenge that has traditionally been addressed by studying different components of the system in isolation. Although such approaches have produced a significant amount of knowledge and understanding, they are limited in their ability to predict the effects of alterations in single or multiple components upon the dynamics of the whole system. This limitation may explain why, in some cases, significant research advances do not translate into, for example, improved therapeutics or a "cure" for the disease under study. The discipline of systems biology attempts to shift the way in which an organism is perceived in order to address the complexity of living systems. Multiple definitions for systems biology exist, one of which describes it as a new field of study that aims to understand the living cell as a complete system.1,2 In other words, systems biology seeks to understand how system properties emerge from the nonlinear interactions of multiple components.3,4
The applications of systems biology approaches are increasing dramatically; however, the exact nature of what a "systems approach" entails remains diffuse in the literature. The fundamental theme of systems biology is integration […] disciplines.5 However, it should be noted that systems science is not novel and has been advocated for many years in a number of research fields. At the simplest level, a systems approach signifies a study based upon examining the entire "system" simultaneously, as opposed to a reductionist approach that focuses on a single gene, metabolite, pathway, etc. In other words, a systems biology approach does not focus on identifying a single target or mechanism for an observed phenotype (e.g. disease). Systems biology instead seeks to identify the biological networks or pathways that connect the differing elements of a system, and in the process describe the characteristics that define a shift in equilibrium, such as metabolic fluxes or altered protein activities, which may cause a shift from a healthy to a diseased state. The hypothesis then becomes that those components of the network that are […] and potentially descriptive of the disease, and accordingly represent potential targets for intervention to return the system to its original state (i.e. a healthy state). However, it is important to realize that the concept of equilibrium may not be as static as previously thought. It is more likely that equilibrium is a steady state that represents a range of fluctuations in the biological network that varies on an interindividual basis. The normal or control state is more appropriately categorized as one of dynamic stability in which our concept of homeostasis is more correctly defined as homeodynamics.3 Accordingly, by defining the parameters of the network that distinguish disease from a healthy state, interventions or treatments can be derived that are tailored to the individual variability of the parameters for this steady state; in other words, personalized medicine.
The era of personalized medicine has been heralded for a number of years, and systems biology is a key component of this new paradigm.6–8 The intent is to identify disease before pathogenic manifestation, thereby initiating therapeutic intervention prior to significant adverse effects. Current medical practice is a reductionist approach that involves treating each problem or symptom in isolation. By these standards, the relief of symptoms as determined by clinical evaluations following a treatment regimen embodies the definition of a cured or maintained patient. A corresponding "limited" systems biology approach, where a multitude of clinical and biochemical variables are combined with multivariate statistical analyses, often reveals that the patient indeed has been removed from the disease group following treatment, but not necessarily moved back towards a healthy state as is often assumed. Instead, the treated patient belongs to a novel biological status, distinctly different from both healthy individuals and peers in the disease group. This novel pharmacological state is generally not discernable in classical medicine, as the patient by definition is classified as belonging to the "healthy" group as soon as the symptoms that define the disease are no longer detectable. More importantly, the classical reductionist approach does not reveal the novel pharmaceutical state that the treatment regimen has induced, and consequently the implications for the patient's future health cannot be predicted. In contrast, a true systems biology approach offers the ability to distinguish between multiple disease, healthy, or pharmacological states, as well as causative and adaptive responses and variables. However, in
order to make conclusions regarding causative relationships, it is necessary to have a sufficient number of variables and observations. In addition, the quantitative quality and source of the data, as well as the choice of multivariate statistical tools both in the experimental design and in the post-experimental analyses, are vital for interpretation.
The increase in systems biology applications is a reflection of a "perfect storm" of advances in analytical methodology, computing power and data acquisition. The completion of the human genome sequencing project heralded the age of large-scale biology and data acquisition. This paradigm shift, coupled to commensurate developments in technology and experimental techniques that can simultaneously interrogate many elements of a system (e.g. microarrays, mass spectrometry, computational power and the Internet), has led to a veritable explosion in "omics" science and systems biology related research. The challenge for systems biology is to integrate the disparate disciplines of biology, chemistry, statistics, computer science and engineering into a cohesive science. Towards this end, it is necessary to develop common platforms for the analysis, presentation and archiving of data to ensure inter-laboratory and cross-disciplinary compatibility and accessibility of data sets. Significant steps have already been taken in this direction, and it is not our aim to review the status of the technological platforms or the compatibility of data formats, as these aspects have been covered in detail elsewhere.9–17 In contrast, this review focuses on the integration of different types of data sets, and aims to summarize the current state of systems biology research into cardiovascular disease as well as present a number of different pathway mapping tools that have been developed. In addition, an example of a pathway analysis of atherosclerosis is presented using a novel tool for mapping of omics data to the KEGG database suite.
Networks in a nutshell
One of the recurrent concepts in systems biology is that of the network. Much of the early work in networks focused on simple model organisms including bacteria, yeast and nematodes;18–24 however, this work is expanding to the understanding of human diseases.25–28 A network type of representation formalizes the interaction of different components of a system utilizing the infrastructure of a branch of mathematics called graph theory. In the network paradigm, nodes represent elements of the system while relations are symbolized by edges. For example, in a metabolic network, enzymes and compounds are nodes, and reactions are edges. In a protein–protein interaction network, two nodes connected by an edge represent interacting proteins. This formalism enables the study of living systems in a way never thought possible before. The individual elements are integrated in a network whose properties can be analyzed globally: the number of edges per node (the degree), the degree distribution (the probability that a node has a specific number of edges), the clustering coefficient, etc. Barabasi and Oltvai have reviewed these concepts in detail, and provided a comprehensive review of the terminology and concepts associated with network analysis.2 This new terminology is increasingly prevalent in the biological literature, requiring the life scientist to become familiar with this research field. These technical properties provide information regarding the global behavior of the network and therefore of the biological system under study. For example, one important finding was the scale-free topology of biological networks. In this type of network, most nodes have few links, whereas a few nodes have many links (called hubs or nexus nodes). One of the translations of this characteristic into a biological context is the hypothesis that hub nodes perform key functions in the network. Accordingly, many fundamental genes, proteins, enzymes and compounds have been identified as hubs in their respective biological networks. Another consequence derived from this finding is that because of the sparse nature of scale-free networks (i.e. most nodes having only a few edges), they are very robust to environmental alterations.
However, although network analysis can help us understand the behavior of the system as a whole, the importance of individual elements is not lost in this global view. For example, the study of biological networks shows that complex networks are constructed of recurrent simple motifs.29 Initially described in simple bacteria, these motifs are also found in the regulatory networks of higher eukaryotes and are fundamental to understanding the behavior of complex networks, including biological networks. Moreover, the mathematical models used to generate the network itself can be used to predict the behavior of the network when specific elements are altered. For example, what are the effects if a specific node of a gene regulatory network is removed by a knockout mutation? How does this change affect the global stability and robustness of the network, and eventually, the phenotype of the studied system? Systems biology seeks to answer these and other questions by modeling the relationship between the […]
One critical step is how the network is constructed from the raw data (transcriptomics, proteomics, metabolomics, etc.). This is accomplished by using different mathematical techniques, ranging from simple Pearson correlations to the use of ordinary differential equations, Boolean networks, etc. (reviewed in refs. 31 and 32). Through this modeling, fundamental concepts in the understanding of biological systems, like robustness, modularity, emergence, etc., are incorporated. Unfortunately, not all of these questions are easily answered, even within the context of the systems biology paradigm. Whereas most studies currently focus on individual networks (e.g. a transcription network or a protein–protein interaction network), in reality these different networks function as a connected system. Therefore, a change in the gene regulatory network may have a corresponding effect in the protein–protein interaction network, the metabolic network, etc., which collectively may manifest as changes in the observed phenotype. To understand the whole system, it is critical to integrate knowledge from different studies. However, the crosstalk between different networks is not yet well understood and, although some progress has been made,33,34 the integration of different types of data is still in its infancy.12 Through the generation of mathematical models that integrate different types of data (e.g. transcriptomic, metabolomic, and protein–protein interactions),2 we can explain the observed phenotype, and hopefully make predictions regarding how the phenotype is altered when the network itself is modified through the alteration of internal or environmental factors.
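The two steps described above, building a network from raw data with simple Pearson correlations and then summarizing it by degree distribution, clustering coefficient and hubs, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the abundance profiles, the |r| ≥ 0.8 cut-off, the small hub-shaped example network and all function names are hypothetical, and a real omics network would also need control of false-positive correlations.

```python
import math
from collections import Counter
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_network(profiles, threshold=0.8):
    """Nodes are features; an edge joins two features whose |r| >= threshold (assumed cut-off)."""
    adj = {name: set() for name in profiles}
    for u, v in combinations(profiles, 2):
        if abs(pearson(profiles[u], profiles[v])) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def degree_distribution(adj):
    """P(k): fraction of nodes that have exactly k edges."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}

def clustering_coefficient(adj, node):
    """Fraction of pairs of a node's neighbours that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

# 1) Build a network from hypothetical abundance profiles (5 samples each).
profiles = {
    "gene_a": [1, 2, 3, 4, 5],
    "gene_b": [2, 4, 6, 8, 10],   # perfectly correlated with gene_a
    "lipid_x": [5, 4, 3, 2, 1],   # perfectly anti-correlated with gene_a
    "lipid_y": [2, 1, 2, 1, 2],   # uncorrelated with the others
}
net = correlation_network(profiles)
print({n: sorted(nbrs) for n, nbrs in net.items()})

# 2) Global properties of a small hypothetical network with one hub ("h").
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
toy = {}
for u, v in edges:
    toy.setdefault(u, set()).add(v)
    toy.setdefault(v, set()).add(u)
print(degree_distribution(toy))  # {1: 0.4, 2: 0.4, 4: 0.2}
hub = max(toy, key=lambda n: len(toy[n]))
print(hub, clustering_coefficient(toy, hub))
```

In the toy network, most nodes have one or two edges while "h" has four, which is the hub pattern the text describes; the hub's low clustering coefficient reflects that its neighbours are mostly not linked to each other.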
Data processing and statistical analysis
The pre-processing of data is crucial in network applications, as well as in other systems level analyses. It is important to recognize that the nature of large scale omics data is very different from that of reductionist approaches, and other statistical methods should be utilized. The majority of the univariate methods that have dominated the biological sciences for centuries (e.g. Student's t-test) are not well-suited for a number of reasons. For example, univariate statistical methods employ repeated testing to evaluate whether the null hypothesis for a certain variable can be rejected, i.e. whether it is significantly altered compared to the control group. Given the cumulative nature of the error in repeated testing, these methods are prone to high false positive rates, which become particularly pronounced in omics analyses where a large number of variables are tested simultaneously. Even though a range of approaches have been developed to correct for the resulting large false positive rates, most notably Bonferroni35,36 and false discovery rate (FDR) corrections,37 the use of univariate methods remains a compromise. The fact that univariate methods are very sensitive to missing data points further decreases the robustness of network analyses based solely on traditional statistical pre-processing.
Multivariate analysis (MVA) is a more suitable option for these "short and fat" data sets that are typical for omics studies (i.e. a large number of variables with few observations). Instead of repeated testing of single variables, MVA aims to create a model that reduces the complexity of multidimensional data to a few latent variables that express the majority of the variance of the data set. Exemplified by principal component analysis (PCA), the most widely used unsupervised method in omics applications, the model is structured so that the first principal component (PC1) is oriented so that it describes the largest possible portion of the variance in the data set that can be described by a linear vector. Accordingly, each subsequent PC contains a smaller portion of the variance in the data set than the previous component. Given that MVA is based on all individual variable data points for all observations, the resulting model is robust both against false positives and missing data points. Furthermore, a confidence interval representing all of the variables is obtained, in contrast to univariate methods where each variable is analyzed as a separate unit, and consequently only confidence intervals for individual variables can be obtained. MVA can also be utilized to perform regression analysis between large data sets, most commonly through partial least squares projection to latent structures (PLS). These types of analyses are referred to as supervised methods, since the user defines which variables belong to the X dataset (dictating variables) and which belong to the Y dataset.
While useful, multivariate statistical methods are not without their own weaknesses. A major pitfall in MVA relates to overfitting of the model to the data. If a sufficient number of components is utilized, it is possible to build a model that can describe any data set with a perfect correlation (i.e. R2 = 1.0; Fig. 1). A comparison of the correlation coefficient to the predictive power of the model is therefore essential. The predictive power (Q2) can be calculated through the use of a training set and a test set or, if the data set is too small to allow this, through a cross-validation approach. A good rule of thumb is to remove all components that do not contribute to an increased predictive power of the model. If the data set is sufficiently large, Q2 can be used as a measure to evaluate the robustness of the model in relation to the whole population.
Fig. 1 Overfitting of data represents one of the main pitfalls associated with multivariate analyses. With a sufficient number of components, a model that explains 100% of the variance (R2 = 1.0) can be built for any data set. In the above example, the simplest (linear) model represents the most representative model for the data, demonstrating that the simplest model provides optimal prediction, even though the correlation coefficient is lower.
Another concern when utilizing MVA is that of strong outliers. One should be cautious of any observation that is located at either end of the axis of the first component (strong outliers), as it is likely that characteristics that are unique to this individual are influencing the entire model.
Interpretability represents another concern in MVA. MVA summarizes the entire data set in a few latent variables, which cannot be directly connected to the original measured variables. As such, it can be difficult for the untrained eye to interpret which variables are important or "significant" in driving the separation of the different study groups. This becomes particularly pronounced in more complex analyses such as PLS. A recent addition to this group of analyses, orthogonal PLS (OPLS), greatly simplifies the interpretation by separating the variance in the data set into a predictive component, correlated to the selected Y matrix (e.g. disease group),38 and an "orthogonal" component. The orthogonal component pulls out the variance that is not correlated to the Y-variables of interest, and thus represents internal variance in the X-matrix. While this approach is well-suited for motivating variable selection, it should be used cautiously in this respect, given that the method is based on a supervised selection of the Y-variables that determine the separation. When in doubt, it is generally better to include all of the variables in subsequent analyses.
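To make the Bonferroni and FDR corrections discussed earlier in this section concrete, the sketch below applies both to the same list of p-values. The Benjamini–Hochberg step-up procedure is used here as one common FDR method; that choice, the function names and the p-values are illustrative assumptions, since the text does not specify which FDR procedure is meant.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank  # largest rank whose p-value meets its threshold
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max  # reject every hypothesis up to that rank
    return reject

p = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values for four variables
print(bonferroni(p))           # [True, False, False, True]
print(benjamini_hochberg(p))   # [True, True, True, True]
```

Bonferroni keeps two of the four variables while Benjamini–Hochberg keeps all four, which illustrates why FDR control is usually less conservative when many variables are tested at once.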
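The PCA property described above, that PC1 captures the largest possible share of the variance, can be checked with a small pure-Python sketch that finds the leading eigenvector of the covariance matrix by power iteration. The two-variable data set, the iteration count and the helper names are illustrative assumptions; practical omics work would use a linear algebra library and inspect several components rather than PC1 alone.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of observations (rows) x variables (columns)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_principal_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of a symmetric matrix by power iteration."""
    p = len(cov)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v: the variance along PC1.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigenvalue

# Hypothetical data: 5 observations of two strongly correlated variables.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.1]]
cov = covariance_matrix(data)
pc1, var_pc1 = first_principal_component(cov)
total_var = sum(cov[i][i] for i in range(len(cov)))  # trace = total variance
print(pc1, var_pc1 / total_var)
```

Because the two variables move together, PC1 points roughly along the diagonal and explains well over 99% of the total variance, leaving only a small remainder for the second component.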
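The contrast between R2 and the cross-validated predictive power Q2 that Fig. 1 illustrates can be reproduced with a small sketch. It compares a straight-line fit with a polynomial that passes through every training point, using leave-one-out cross-validation and the common definition Q2 = 1 - PRESS/SS_total. The data (a linear trend plus alternating ±0.2 noise), both model choices and the helper names are illustrative assumptions, not taken from the studies reviewed here.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def interpolating_fit(xs, ys):
    """Lagrange polynomial through every point: as many parameters as observations."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def r2_and_q2(fit, xs, ys):
    """R2 on the training data and leave-one-out Q2 = 1 - PRESS / SS_total."""
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    model = fit(xs, ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    press = 0.0
    for k in range(len(xs)):
        train_x = xs[:k] + xs[k + 1:]
        train_y = ys[:k] + ys[k + 1:]
        press += (ys[k] - fit(train_x, train_y)(xs[k])) ** 2  # predict the left-out point
    return 1 - ss_res / ss_tot, 1 - press / ss_tot

xs = [float(i) for i in range(10)]
ys = [0.5 * x + (0.2 if i % 2 == 0 else -0.2) for i, x in enumerate(xs)]

for name, fit in [("linear", linear_fit), ("interpolating", interpolating_fit)]:
    r2, q2 = r2_and_q2(fit, xs, ys)
    print(f"{name}: R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

The interpolating model reaches R2 = 1.0 on the data it was fitted to but has a strongly negative Q2, whereas the simpler linear model keeps both values high, which is the point made in Fig. 1.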
Taken together, this section emphasizes that it is vital to employ the correct statistical analysis in both experimental design and data processing. These approaches require an in-depth knowledge of MVA in order to correctly interpret the output of statistical models, prevent overfitting of the data, apply multitest corrections, and achieve an appropriate balance between false positives and power.
Systems biology in cardiovascular disease
The utility of systems biology becomes clear when applied to multifactorial diseases whose etiology is complex. For example, the etiology of inflammatory diseases such as atherosclerosis and asthma has proven recalcitrant to elucidation with reductionist approaches. It is possible that part of the difficulty in identifying new therapeutics lies in the inability of current approaches to visualize the complexity of these biological systems.39 The development of lead drug candidates would also benefit from a systems approach. For example, drugs such as torcetrapib, statin + ezetimibe and rimonabant have been withdrawn from the market because of side effects that were not predicted with reductionistic thinking. Diseases and disorders such as cardiovascular disease, diabetes, metabolic syndrome, asthma and chronic […] complicated developments that resist efforts to identify a single gene or pathway responsible for disease onset and progression. Numerous therapeutics have been successfully developed that intervene in different stages of the disease; however, we are still far from developing a true cure for any of these pathologies.
The cellular complexity of many of the affected organs represents a major obstacle in the elucidation of the systems biology behind these pathologies. The lung, for example, consists of more than 40 different cell phenotypes, all of which may elicit different responses to up- or down-regulation of a certain factor. Add to that the spatial and temporal aspects of the cellular response, and we are starting to approach the true complexity of biological systems. Accordingly, while beyond the scope of this review, sampling design and strategy can have significant effects upon experimental observations. Given the heterogeneity of many tissue types, it is challenging to reproducibly sample tissue in such a way as to enable intra- and interlab comparisons. The obstacles involved in this area are not trivial and need to be addressed by the research community.
Cardiovascular disease is the major cause of premature death in Europe, resulting in 44 million deaths in the year 2000.40 In the United States, cardiovascular disease was responsible for one of every five deaths in 2004, with an average of one death every 37 seconds.41 The rapidly increasing incidence of obesity and its commensurate health effects, including atherosclerosis, metabolic syndrome and diabetes, is of epidemic proportions, with the potential for significant increases in developing countries. It is anticipated that the "BRIC" countries (Brazil, Russia, India and China) will significantly contribute to the global cardiovascular disease burden such that by 2020 an additional ~4% of deaths in the world will be due to ischemic heart disease.42 The complexities […] syndrome are recalcitrant to current interventions and challenge the ability of the pharmaceutical industry to produce effective and inexpensive therapies. For example, in cardiovascular disease, each known risk factor is addressed individually, whether it be hyperlipidemia or hypertension.3 However, given the complex etiology of this disease, it is likely that multiple factors are responsible for the observed pathology, resulting in a need for holistic treatment approaches that address the underlying problems. Accordingly, these diseases are logical targets for systems biology approaches to understanding disease mechanism, progression and pathogenesis.
[…] and linked to other systemic disorders,43,44 and the role of inflammation in the development of atherosclerosis and cardiovascular disease is firmly established.45,46 The onset and development of cardiovascular disease has been shown to involve multiple factors including lifestyle, diet, body mass index, (epi)genetics, dyslipidemia, hypertension, and inflammation among others. However, the current paradigm of patient treatment involves addressing these individual risk factors in isolation, even though they are known to contribute concomitantly to disease pathogenesis. While effective in many cases, this approach has not provided a cure or even a full understanding of the disease, which remains a major source of mortality and morbidity worldwide.
A number of studies have begun to address the issues outlined above in a comprehensive fashion, and active research is being performed to develop systems biology approaches to cardiovascular disease.47 We present a few of these studies in chronological order, but stress that this list is not comprehensive. Many of the early studies that performed systems biology-related investigations into cardiovascular disease focused on a single omics profiling method (e.g. transcriptomics or metabolomics) and then included clinical parameters using multivariate statistics to develop models of disease. It is only recently that unifying systems biology models employing multiple analytical platforms linked with bioinformatics analyses have been produced. One of the earliest attempts to bring systems biology to cardiovascular function involved mapping important cardiovascular phenotypes onto the human genome. Stoll et al. studied 239 cardiovascular and renal phenotypes in 113 male rats. They identified and mapped a total of 81 cardiovascular phenotypes from an F2 intercross onto the human genome using correlation patterns ("physiological profiles") and comparative genomics.25 The resulting genomic-systems biology map was applicable for gene hunting and mechanism-based physiological studies of cardiovascular function. For example, the authors presented a correlation matrix with phenotypic ordering of 125 likely determinants of arterial blood pressure, which could be used to assess the impact of allelic substitutions on each of the traits in either the parental or F2 generation of the intercross. The phenotypes were grouped into functionally related clusters (vascular, heart, renal, endocrine […] blood pressure, and ordered within the clusters by known physiological relationships. All of the results of the linkage analyses and the phenotypic physiological profiles for each […] (http://brc.mcw.edu/phyprf/).
A more diagnostic application was presented by Brindle et al., who employed a supervised partial least squares discriminant analysis (PLS-DA) approach to analyze 1H NMR spectra of human serum to diagnose the presence, as well as the severity, of coronary heart disease.48 The PLS-DA model predicted the presence of coronary heart disease with a sensitivity of 92% and a specificity of 93% based on a 99% confidence limit. The major driving factor for the observed separation in severe coronary heart disease patients (triple vessel disease, TVD) was the presence of lipids, particularly LDL and VLDL, whereas the most influential loadings for the angiographically normal coronary arteries (NVA) were HDL-associated (e.g., fatty acid chains and phosphatidylcholine). Of particular importance is the fact that the authors confirmed that the method was able to diagnose coronary heart disease independently of the inevitable associated gender bias. However, work by Kirschenlohr et al. concluded that plasma-based 1H NMR analysis is a weak predictor of coronary heart disease.49 They found that the predictive power was significantly weaker: NVA and coronary heart disease groups were correctly identified in 80.3% of patients not receiving statin therapy and in 61.3% of patients treated with statins. The main reason postulated for the discrepancy between the two studies was the inclusion of additional variables in the Kirschenlohr et al. study, including drug treatment regimen. Statins significantly affect LDL levels, which was a discriminating factor in the PLS-DA model. Accordingly, as the most significant loadings associated with diagnosis in both studies were related to lipid species, it is not surprising that treatments affecting lipid levels influenced the observed separation power of the model. In other words, statin treatment partially resolves the incidence of coronary artery disease, thus reducing the biomarker signal in these patients. It would be interesting to further examine these patients to determine if they were truly moving towards a "healthy" phenotype or were instead representative of a third pharmacological state as discussed above. This point demonstrates one of the main challenges in developing diagnostic markers of complex disease, in that in many cases patients will present distinct genotypes as well as personal therapeutic treatment regimens that can potentially confound the use of biomarkers, as reported by Brindle et al. At the very least, these studies demonstrate the importance of including as much patient metadata in the analyses as possible. The work of both groups supports further research into exploring the potential of applying metabolomics methods to identify plasma (i.e., non-invasive) biomarkers of coronary heart disease.
While useful for identifying potential markers of disease, the previous studies do not represent a systems methodology. One of the first comprehensive systems biology approaches involving the integration of multiple omics platforms (transcriptomics, proteomics and metabolomics) examined […] (ApoE*3Leiden) mouse model (a commonly used model of atherosclerosis50). The authors integrated gene transcripts, protein and lipid data along with their putative relationships to gain insight into the early onset of disease.51,52 As is common with many systems approaches, the authors developed a number of their methods for data processing and network analysis in-house, demonstrating a significant obstacle in the advance of systems biology. It is challenging to integrate bioanalytical results from multiple platforms and between different research groups, making it difficult to standardize results.12 The ApoE knockout mouse was used in another investigation into atherosclerosis mechanisms involving conjugated linoleic acids (CLAs) to determine how individual CLA isomers differently affected pathways involved in atherosclerosis.53 ApoE knockout mice were fed a diet supplemented with 1% cis9, trans11-CLA, 1% trans10, cis12-CLA or 1% linoleic acid for twelve weeks. The effects upon lipid and glucose metabolism were measured, as well as the regulation of hepatic proteins. Correlation analysis between physiological and protein data identified two clusters associated with glucose metabolism. The results showed that cis9, trans11-CLA specifically increased expression of the anti-inflammatory HSP 70, as well as decreased expression of the pro-inflammatory macrophage migration inhibitory factor, suggesting that consumption of cis9, trans11-CLA could protect against the development of atherosclerosis.
A systems biology approach to elucidating biological pathways in coronary atherosclerosis was published by King et al., who performed custom microarray analysis of coronary artery segments.54 A number of clinical variables were examined, and diabetic states provided the most interesting results, with 653 upregulated genes in the no diabetes class and 37 upregulated genes in the diabetes class, with an FDR of 0.08%. The top gene upregulated in the diabetes class was IGF-1, followed by the IL-1 receptor and IL-2 receptor-α, indicating that there were changes in cytokine-induced immune and inflammatory responses. These results suggest that inflammation is more prominent in diabetic than […] expression profiles were then used to construct a novel pathway based upon gene connectivity as determined by language parsing of the published literature, and ranking as
It determined by the signiﬁcance of diﬀerentially regulated genes is possible that biomarkers could be identiﬁed in a study with in the network. The resulting gene subnets were visualized with increased cohort size composed of the myriad of clinical Cytoscape, an open-source bioinformatics resource (discussed and interindividual variables. An important aspect of these in more detail below55), to identify nexus genes in disease metabolomic analyses is that in order to correctly classify severity. Results indicated that the key process in the individuals with coronary heart disease, it is not necessary progression of atherosclerosis relates to smooth muscle cell to fully understand the complex molecular diﬀerences dediﬀerentiation, suggesting a focus on changes in the smooth that underlie disease etiology.48 This methodology is an muscle phenotype as a target for atherosclerosis. The results important ﬁrst step towards being able to identify individuals also provided insight into the severe form of coronary artery at risk of disease development or in the early stages of disease associated with diabetes, reporting an overabundance disease onset.
of immune and inﬂammatory signals in diabetics. This method c The Royal Society of Chemistry 2009 Mol. BioSyst., 2009, 5, 588–602 593 for querying multiple search engines and/or databases biomarker of myopathy. The results showed that the arachi- combined with parsing of the retrieved results (documents) donate 5-lipoxygenase activating protein gene (ALOX5AP) for biological associations is extremely powerful for generating had high positive regression coeﬃcients with plasma levels networks, and is used extensively in multiple software of phosphatidylethanolamine(42:6) and negative regression applications for network generation.
coeﬃcients for cholesterol ester ChoE(18:0). These results Lipopolysaccharide (LPS) is a critical inducer of sepsis, were particularly intriguing as the ALOX5 gene has been which is characterized by systemic inﬂammation, hypotension previously shown to predispose humans to atherosclerosis.64,65 and multiple organ failure.56 Tseng et al.57 examined the This systems biology approach successfully identiﬁed potential molecular eﬀects of late-phase LPS stimulation on primary plasma-based markers of the eﬀects of statin treatment rat endothelial cells in an attempt to develop diagnostic and showed that observed eﬀects upon pathways were markers of inﬂammatory disease. A combination of cDNA statin-speciﬁc. In particular it also provided mechanistic microarray, 2-DE and MALDI-TOF MS/MS, as well as insight into the development of atherosclerosis, demonstrating cytokine protein arrays were analyzed using custom bio- the utility of a systems approach. A similar method was informatics applications. Diﬀerentially expressed genes and employed by Pietila¨inen et al. who examined obesity in proteins were mapped onto their corresponding biological pathways using BioCarta or KEGG, and the results were obesity to be associated with deleterious alterations in lipid ordered using the BGSSJ software (bulk gene search system metabolism pathways known to promote atherogenesis, for Java) followed by analysis with ArrayXPath.58 The results inﬂammation and insulin resistance.66 Intriguingly, they showed signiﬁcant eﬀects (p o 0.05) on the BioCarta path- reported that obesity primarily related to increases in ways ‘‘LDL pathway during atherogenesis'', ‘‘MSP/RON lyso-phosphatidylcholines and decreases in ether phospholipids.
receptor signaling pathway'' (MSP, macrophage-stimulating Nikkila¨ et al.67 used this method to examine the gender- protein; RON, tyrosine kinase/receptor d'origine nantais), dependent progression of systemic metabolic states in early ‘‘signal transduction through IL-1R'', and ‘‘IL-5 signaling childhood. They were able to categorize children in terms of pathway'', demonstrating that inﬂammatory pathways were metabolic state at a very young age (from birth to 4 years old).
signiﬁcantly aﬀected by LPS treatment, as would be expected.
Using lipidomics proﬁling methodology and hidden Markov Overall, this study used a systems biology approach to models, they found that the major developmental state diﬀer- show that NF-kB-associated responses in endothelial cells ences between girls and boys can be attributed to sphingolipids.
aﬀected pathways involved in proliferation, atherogenesis, They also found multiple previously unknown age- and gender- inﬂammation and apoptosis, thereby providing information related metabolome changes of potential medical signiﬁcance.
on multiple pathways simultaneously. However, it should be In addition, they demonstrated the feasibility of state-based stressed that it is necessary to diﬀerentiate protein concentra- alignment of personal metabolic trajectories, which is an tions from protein activities in order to make meaningful important proof-of-principle step for applications of meta- deductions. Several studies using ‘‘focused'' arrays to analyze bolomics towards systems biology and personalized medicine.
Children were shown to have diﬀerent development rates at the conﬁrmed that short-term LPS exposure results in vivid level of the metabolome and thus the state-based approach may upregulation of a spectrum of proinﬂammatory genes be advantageous when applying metabolome proﬁling in search including IL-1b, IL-15, interferon-induced genes, and a series of markers for subtle (patho)physiological changes.
of TNF superfamily members.59–62 Statins are an important therapeutic in the control of plasma lipoproteins upon plaque formation using the hyperlipidemia, with demonstrated eﬃcacy in lowering Ldlr / Apo100/100Mttpﬂox/ﬂoxMx1-Cre mouse model, which cholesterol levels. However, there are concerns regarding the has a plasma lipoprotein proﬁle similar to that of familial development of statin-induced myopathy following aggressive hypercholesterolemia and a genetic switch to block the hepatic treatment. Laaksonen et al. employed a systems biology synthesis of lipoproteins.68 Transcriptional proﬁling of approach to probe the cellular mechanisms leading to atherosclerosis-prone mice with human-like hypercholestero- myopathy and identify potential biomarkers.63 Muscle lemia and reverse engineering of whole-genome expression biopsies were analyzed for whole genome expression and data provided a network of cholesterol-response atherosclerosis plasma samples were proﬁled using a lipidomics approach.
target genes. This regulatory gene network appeared to The microarray analysis revealed modest changes in the control foam cell formation, suggesting that these genes could atorvastatin treatment group (ﬁve altered genes), but 111 potentially serve as drug targets to prevent the transformation genes were aﬀected in the simvastatin group. The diﬀerences of early lesions into advanced, clinically signiﬁcant plaques.
in response are not necessarily unexpected given that the two Kleemann et al. employed a systems approach to examine statins diﬀer in their hydrophobicity/lipophilicity, and thus in the eﬀects of dietary cholesterol upon atherosclerosis.69 Of the extent that they aﬀect the vasculature. The lipidomics particular interest in this study is the focus of the eﬀects of proﬁling identiﬁed 132 unique lipid molecular species dietary cholesterol upon inﬂammation. The role of inﬂamma- (however, this method does not allow for the unequivocal tion in cardiovascular disease and atherosclerosis in particular identiﬁcation of fatty acid substitution position on lipid head has been established;70 however, the source of inﬂammation groups). The gene expression data and the lipidomics data and the exact mechanisms of how inﬂammation is evoked and were combined following gene set enrichment analysis (GSEA) contributes to disease development and progression are still and further analyzed with PLS-DA to look for a plasma-based unclear. The data of Kleeman et al. demonstrated that the liver 594 Mol. BioSyst., 2009, 5, 588–602 c The Royal Society of Chemistry 2009 is capable of absorbing moderate cholesterol-induced stress pSTIING. These types of tools enable the visualization of the (up to about 0.5% w/w in the diet), but a further increase results integrated with the information provided in these evoked the expression of hepatic pro-inﬂammatory genes databases. Other tools enable the generation of networks that including a number of pro-atherosclerotic candidate genes.
are inferred from omics data, such as Cytoscape (through These data also showed that dietary cholesterol can be a several plugins), VANTED, some of the R/Bioconductor trigger of hepatic inﬂammation (as reﬂected by elevated packages79 and many of the commercial software packages.
plasma levels of acute phase genes) and that it may be involved Most of these tools can also be used to analyze and manipulate in the development of the inﬂammatory component of networks. However, to date there is no perfect solution and atherosclerosis by switching on four distinct inﬂammatory substantial eﬀort is needed to integrate multiple datasets in a comprehensive fashion. Herein we provide a brief overview of pathways). Furthermore, the authors used a network some of the diverse options.
analysis approach to demonstrate that lipid metabolism and The Kyoto Encyclopedia of Genes and Genomes (KEGG) inﬂammatory pathways are closely linked via speciﬁc is a web-based resource that contains a series of databases of transcriptional regulators. They conﬁrmed that targeting of biological systems, consisting of genetic building blocks of a prototype transcription factor of the inﬂammatory response genes and proteins (KEGG GENES), chemical building (NF-kB) aﬀected plasma lipid levels and lowered plasma blocks of both endogenous and exogenous substances (KEGG LIGAND), molecular wiring diagrams of interaction and demonstrated the strength of a systems approach in that reaction networks (KEGG PATHWAY), and hierarchies multiple analytical platforms were combined to build an and relationships of various biological objects (KEGG overall model of disease, which provided mechanistic BRITE). KEGG provides a reference knowledge base for information across multiple biological pathways that suggest linking genomes to biological systems, and also to environ- potential new strategies for therapeutic interventions aﬀecting ments, by the processes of PATHWAY mapping and BRITE inﬂammation, as well as plasma lipids, in a beneﬁcial way. The mapping. The visualization objects in the KEGG suite are results of this study are examined in greater detail using the consistent, with the nodes of a pathway map shown as KegArray tool discussed below.
rectangles that represent gene products, usually proteins, andsmall circles representing chemical compounds and othermolecules. A large oval represents a link to another pathway An expanding toolbox map, and a cluster of rectangles represents a protein complex.
An important bottleneck in the development of systems Aoki and Kanehisa provide a comprehensive tutorial on approaches is the need for software capable of analyzing KEGG for interested readers.80 collected omics data from multiple platforms. There are many The Systems Biology Markup Language (SBML) is a software packages and web resources available, all of which are too numerous to describe in this review (see ref. 71 for a biochemical reaction networks in software. It is oriented comprehensive list of 4150 resources for systems biology).
towards describing systems of biochemical reactions, including A few resources worth brieﬂy mentioning here include cell signaling pathways, metabolic pathways, biochemical KEGG,72 PathVisio,73 pSTIING,74 MetaCoret,75 Cytoscape,55 reactions and gene regulation.78 The SBML project has VANTED,76 Pathway-Express,77 Ingenuitys Systems and a produced a KEGG2SBML tool that is useful for converting plethora of SBML applications78 (Table 1). Some of this KEGG-based metabolic pathways into SBML format. The software is designed to map the results from omics experi- pSTIING resource consists of a web-based application ments onto existing pathway databases such as KEGG or containing metabolic pathways, protein–protein, protein–lipid Network and pathway mapping software, including tools for network visualization/manipulation and network inference from high-throughput dataa Various (plugins) Ingenuitys Systems http://www.ingenuity.com/ KEGG (Kyoto Encyclopedia of Genes and Genomes) http://www.genome.jp/ Same as Cytoscape a This list is non-exhaustive and is solely provided to give an example of some of the available resources. See Ng et al. for a more comprehensivelist.71 b Systems biology markup language (see http://sbml.org/). c Aﬃnity puriﬁcation-Mass spectrometry.
c The Royal Society of Chemistry 2009 Mol. BioSyst., 2009, 5, 588–602 595 but interested readers are suggested to examine work by the transcriptional regulatory associations. It is focused on Institute for Systems Biology SBEAMS (Systems Biology regulatory networks relevant to chronic inﬂammation, cell migration and cancer, therefore, making it a useful resource sbeams.org/), a framework for collecting, storing, and for inﬂammatory-related applications. The pSTIING web site accessing data produced by these and other experiments.89 also features a tool for inferring networks (Cladist). VANTED Other eﬀorts in this area include the Biological Networks is a multiplatform tool for the manipulation of graphs that server, which is a systems biology software platform with represent either biological pathways or functional hierarchies.
multiple visualization and analysis functions including: It also allows the mapping of experimental data into visualization of molecular interaction networks, sequence the network and is capable of processing ﬂux data. Graph and 3D structure information, integration with other graph- information is loaded in SBML format, but it also has a structured data such as ontologies (e.g., gene ontology) and KEGG interface.81 Cytoscape is an open source platform for taxonomies (e.g., enzyme classiﬁcation system), integration of visualizing molecular interaction networks and biological interactions with experimental data (e.g., gene expression), pathways. One of its most useful features is the ability to and extraction of biologically meaningful relations, as well as accept custom plugins to perform speciﬁc tasks, extending the number of initial features. A number of useful plugins are Networks server provides querying services and an information already available, including MONET,82 a method for inferring management framework over PathSys, which is a graph-based gene regulatory networks from gene expression data, and system for creating a combined database of biological the AgilentLiteratureSearch plugin,83 which enables the pathways, gene regulatory networks and protein interaction generation of association networks from literature mining maps, which integrates over 14 curated and publicly contributed (see below). R and Bioconductor are a platform extensively data sources for eight representative organisms.91 There is also used for the analysis of high-throughput data.84 In addition, currently a signiﬁcant amount of eﬀort to determine standards there are several free resources available related to the for storing microarray data (MAGE-OM/ML, GeneX, analysis of networks, including packages such as GeneNet,85 apComplex86 and Rgraphivz,87 (for creating and visualizing and metabolomics standards initiatives.93 Data-integration networks). 
The package Gaggle88 enables interaction between techniques for omics data sets have been reviewed in detail Cytoscape and R.
by Joyce and Palsson,12 and references therein.
The two main commercial packages are MetaCoret and One of the long-range goals of systems biology approaches Ingenuitys Systems. MetaCoret (GeneGo, Inc.) is an is to develop models capable of predicting clinical phenotypes, integrated suite of software applications that is designed for as well as patient treatment regimens and associated outcomes.
functional analysis of experimental data, including omics data, However, the complexity of cardiovascular disease and other CGH arrays, SNPs, SAGE gene expression and pathway inﬂammatory-related diseases makes model development analysis. MetaCoret is based on a proprietary manually challenging. A number of diﬀerent groups are working on curated database of human protein–protein, protein–DNA developing in silico models of inﬂammation, with the majority and protein–compound interactions, metabolic and signaling of eﬀorts focused on the acute inﬂammatory response.94–97 pathways, and the eﬀects of bioactive molecules on gene However, it is likely that these models can eventually be expression. GeneGo is also in the process of creating a systems adapted for diseases of chronic inﬂammation. Recent reviews biology and pathway analysis platform speciﬁc for cardio- have addressed the status of cardiac systems biology, with a vascular diseases (MetaMiner Cardiac Consortium). Ingenuity number of promising developments.5,47,98–100 These models Pathways Analysis (IPA) enables researchers to model and represent the logical extension of the systems biology tools analyze biological and chemical systems. The IPA suite discussed above and as the amount of data increases, our contains a series of modules including IPA-Biomarkert ability to develop interactive models of individual pathologies will increase. This translational systems biology approach will Analysis. IPA-Biomarkert identiﬁes the most promising and make it feasible to develop patient-speciﬁc modeling based relevant biomarker candidates within experimental data.
upon known disease mechanisms.97 These models will be IPA-Toxt delivers a focused toxicity and safety assessment useful in clinical settings to predict and optimize the outcome of candidate compounds, elucidates toxicity mechanisms and from surgery and non-interventional therapy.101 identiﬁes potential markers of toxicity, with a focus oncardiovascular toxicity, nephrotoxicity, and hepatotoxicity.
IPA-Metabolomicst analyzes metabolomics data in thecontext of metabolic and signaling pathways. This module To address the need for software capable of analyzing data can integrate transcriptomics, proteomics and metabolomics from multiple omics platforms, KEGG has recently intro- data in a systems biology approach to biomarker discovery, duced a new application called KegArray that is designed to molecular toxicology, and mechanism of action studies.
map omics data onto the KEGG suite of databases. KegArray Multiple eﬀorts are currently under way to synchronize the is a Java application that provides an environment for data being collected by research groups around the world. In analyzing transcriptomics or proteomics (expression proﬁles) order to advance the ﬁeld, it is therefore necessary to develop and metabolomics data (compound proﬁles) individually or databases with deﬁned metrics for evaluating the quality of the simultaneously. The application is tightly integrated with the global data sets. This area is beyond the scope of this review, KEGG database, and maps input data to KEGG resources 596 Mol. BioSyst., 2009, 5, 588–602 c The Royal Society of Chemistry 2009 including PATHWAY, BRITE and genome maps. KegArray genes/proteins/compounds. In this case, the ranking represents is available for running in Mac, Windows or Linux how well the respective pathways have been covered by the environments and can be downloaded freely from the KEGG experimental analyses. Subsequently, by only including the up- and down-regulated entries in the mapping, a ranking The KegArray tool is designed to facilitate integrated based on biological eﬀects on the pathway can be achieved.
mapping of omics results onto a KEGG application of choice.
The statistical evaluation of systems biology data is a complex Metabolic pathways signiﬁcantly aﬀected in high cholesterol and highly debated subject (see Data Processing and Statistical exposure relative to low cholesterol exposurea Analysis). As such, the KegArray tool itself does not imposeany statistical evaluation on the inputted data, but is rather mmu01040 Biosynthesis of unsaturated fatty acidsmmu03320 PPAR signaling pathway intended as a link between processed data and the interactive mmu00564 Glycerophospholipid metabolism KEGG environment. This conceptual solution allows the user mmu00071 Fatty acid metabolism to have full control over the choice of statistical methods, data mmu04920 Adipocytokine signaling pathway transformation and data selection prior to mapping onto the mmu00565 Ether lipid metabolismmmu00590 Arachidonic acid metabolism KEGG tool of choice. KegArray allows full ﬂexibility in mmu00100 Biosynthesis of steroids determining the signiﬁcance or cut-oﬀ levels, as well as the mmu00120 Bile acid biosynthesis corresponding color coding for the mapping. KegArray can mmu00561 Glycerolipid metabolismmmu00600 Sphingolipid metabolism thus be described as a visualization tool, but with the added mmu00591 Linoleic acid metabolism advantage of a sustained interactive environment with the vast mmu00592 alpha-Linolenic acid metabolism KEGG database. It is not necessary to pre-select the pathways a Data are from a KegArray-based analysis of quantiﬁed lipid and of interest and the output is formatted as a list of links transcriptomics data from Kleemann et al.69 Pathways are from to all aﬀected pathways, organized in the order of highest KEGG PATHWAY and are listed with pathway name and KEGG number of mapped genes/proteins/compounds per pathway.
ID number (e.g. mmu for mouse). The pathways are ranked in order of KegArray can be conﬁgured to display any combination greatest number of components signiﬁcantly aﬀected in the pathway.
A total of 77 diﬀerent pathways were aﬀected, of which the top 13 areshown here. A complete list of all 77 aﬀected pathways is provided inTable S3. In addition, those pathways signiﬁcantly aﬀected by low and An example for expression ratios between two channels for high cholesterol exposure are provided in Table S1 and S2, respec- the input of transcriptomics data into KegArraya tively. It is not possible to state whether an entire pathway is positivelyor negatively aﬀected, but these individual pathways can be visualized following mapping to KEGG and inspected for speciﬁc ﬂuctuations in the data. Examples of this are shown in Fig. 3 and Fig. 4.
a Data are the high cholesterol (HC) treatment shown in Fig. 2.
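KegArray orders its output by the number of entries mapped to each pathway. That count can include every input entry (pathway coverage) or only the up- and down-regulated entries (biological effect). The short Python sketch below illustrates those two orderings. It is not KegArray's implementation. The gene IDs, the entry-to-pathway links and the ratio values are hypothetical toy data. The symmetric cut-off is also an assumption made for illustration: a ratio of at least `fold` counts as up-regulated, and a ratio at or below `1/fold` counts as down-regulated. KegArray itself leaves the choice of cut-off to the user.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def affected_entries(ratios: Dict[str, float], fold: float = 1.1) -> Set[str]:
    """Entries whose treated/control ratio passes a symmetric fold cut-off.

    Assumption for illustration: ratio >= fold is up-regulated and
    ratio <= 1/fold is down-regulated.
    """
    return {e for e, r in ratios.items() if r >= fold or r <= 1.0 / fold}


def rank_pathways(
    entry_to_pathways: Dict[str, List[str]], entries: Iterable[str]
) -> List[Tuple[str, int]]:
    """Count the entries mapped to each pathway; most-populated pathways first."""
    counts: Counter = Counter()
    for entry in entries:
        for pathway in entry_to_pathways.get(entry, []):
            counts[pathway] += 1
    # Descending count; ties are broken alphabetically by pathway ID.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: placeholder gene IDs plus one KEGG COMPOUND ID.
# The entry-to-pathway links are invented for illustration only.
ratios = {"geneA": 1.25, "geneB": 0.80, "geneC": 1.02, "C00219": 1.15}
entry_to_pathways = {
    "geneA": ["mmu01040", "mmu00600"],
    "geneB": ["mmu01040"],
    "geneC": ["mmu00600"],
    "C00219": ["mmu01040", "mmu00590"],
}

coverage = rank_pathways(entry_to_pathways, ratios)  # all input entries
effects = rank_pathways(entry_to_pathways, affected_entries(ratios))
print(coverage)
print(effects)
```

With these toy values, `coverage` is `[('mmu01040', 3), ('mmu00600', 2), ('mmu00590', 1)]` and `effects` is `[('mmu01040', 3), ('mmu00590', 1), ('mmu00600', 1)]`. The difference arises because `geneC` (ratio 1.02) is mapped but fails the 1.1-fold cut-off, so it counts towards pathway coverage but not towards the effect-based ranking.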
KegArray input format for metabolomics dataa Venn diagram displaying the number of metabolic pathways signiﬁcantly aﬀected following treatment with either low cholesterol (LC) or high cholesterol (HC) relative to control in n ApoE*3Leiden mouse model of atherosclerosis. In addition, the changes between HC and LC were compared, evidencing ﬁve pathways that were speciﬁcally aﬀected between these two treatments (mmu00010 glycolysis/ gluconeogenesis, mmu00641 3-chloroacrylic acid degradation, mmu00680 methane metabolism, mmu00980 metabolism of xenobiotics by cyto- chrome P450, and mmu00982 drug metabolism-cytochrome P450).
Data are from a KegArray-based analysis of quantiﬁed lipid and transcriptomics data from Kleemann et al.69 A complete list of all Data are the high cholesterol (HC) treatment shown in Fig. 2.
pathways aﬀected is provided in the ESI, Tables S1–S3.w c The Royal Society of Chemistry 2009 Mol. BioSyst., 2009, 5, 588–602 597 The expected mapping format is that of ratios between e.g. a available. Additional information regarding experimental treated and control group, and a speciﬁc tab-delimited format descriptions, reference information, etc., can also be included to facilitate the automatic calculation of ratios from raw data in the input ﬁle by simply adding the ‘#' character at the is available (KEGG EXPRESSION format). However, in beginning of the line, which will result in that line being order to increase the versatility of the tool, an additional skipped by KegArray (other than the ‘#organism:' or generic ﬁle input format has also been constructed (RATIO ‘#source:' line).
format) to allow other aspects of the data to be evaluated The lines in tab-delimited format below the ‘#'-delimited through the KegArray tool (e.g. weighting according to section contain omics proﬁling data. The ﬁrst column must statistical signiﬁcance, ranking etc.). Both formats, described contain the KEGG GENES ID, which is the unique identiﬁer in detail in the ReadMe ﬁle available for download with of the organism-speciﬁc gene. The second and third columns KegArray (http://www.genome.jp/kegg/expression/), can be are aimed for entering X- and Y-coordinates, e.g. those used for the input of transcriptomics or proteomics data.
derived from a microarray experiment, to facilitate a Organism-speciﬁc mapping of the results is facilitated by the schematic view of the microarray through the ‘‘ArrayViewer'' organism information provided on the ﬁrst line of the input application. If the data are from a proteomics experiment, the ﬁle, in the format ‘#organism:' followed by the organism second and third columns can be left blank. Accordingly, it is three- or four-letter organism identiﬁer code used in KEGG.
not necessary to input the microarray coordinate information, (e.g., ‘hsa' for human and ‘mmu' for mouse). If organism- and the KEGG ID and data columns are suﬃcient. If the speciﬁc mapping is not desirable, the abbreviation for the RATIO ﬁle format is utilized, the fourth column contains the all-inclusive generic pathway can be used (‘map'). Since the data value of interest, as exempliﬁed by the ratios between interactive environment of KEGG is maintained, it is easy to control channel and target channel in Table 2. In contrast, if scroll between the many diﬀerent organism-speciﬁc pathways the EXPRESSION ﬁle format is utilized, the fourth through Results of KegArray-based analysis of quantiﬁed lipid and transcriptomics data from Kleemann et al.69 The KEGG metabolic pathway ‘‘Biosynthesis of unsaturated fatty acids'' (map01040) was the pathway that evidenced the greatest number of changes between low and highcholesterol treatment. KegArray was run with a 1.1-fold threshold, with red and orange indicating a 10% and 5% increase, respectively, yellowindicating no change (grey indicates that the enzyme/metabolite is present in the organism), and light green and dark green indicating a 5% and10% decrease, respectively. Table 4 provides a list of the top 13 pathways that diﬀered between low and high cholesterol treatment.
598 Mol. BioSyst., 2009, 5, 588–602 © The Royal Society of Chemistry 2009

seventh columns contain the total signal from the treated/diseased sample, the background signal from the treated sample, the total signal from the control sample, and the background signal from the control sample, in the indicated order. KegArray then performs the background subtraction and calculates the ratio between treated and control sample upon submission of the …

The data format for metabolomics data is similar to the gene/protein data; however, only the ratio format can be used. All metabolites (compounds) must be assigned KEGG COMPOUND ID numbers in order to be recognized by KegArray. In the data file, the first column contains the KEGG COMPOUND ID (e.g., C00219 for arachidonic acid) and the second column contains the pre-processed data value of interest, e.g. ratios of the target compound relative to the control (Table 3).

Because entry IDs must be in KEGG GENES ID format, an ID converter has also been created. Currently, the following external databases are supported: NCBI GI, NCBI Entrez Gene, GenBank, UniGene, UniProt and IPI. When using KegArray, a number of parameters can be customized, including the threshold, normalization and color scheme.

The output can be restricted to significantly up-regulated or down-regulated entries, or can include all data that were input into KegArray. These data are then visualized on interactive KEGG PATHWAY maps, as well as in KEGG BRITE and KEGG DAS, for further analysis. These data can also be mapped onto the KEGG DISEASE pathways.

In order to demonstrate the utility of KegArray, we have applied it to a dataset of gene and metabolite data taken from Kleemann et al.69 This study was designed to examine the potential of increasing doses of dietary cholesterol to evoke the inflammatory component that is necessary for the onset of atherosclerosis. Towards this end, ApoE*3Leiden mice were fed a control (cholesterol-free), low cholesterol (LC, 0.25% w/w) or high cholesterol (HC, 1.0% w/w) diet for ten weeks (to achieve early mild atherosclerotic plaques), with the amount of cholesterol being the only dietary variable in the study. At the end of the study, the mice were sacrificed, scored for atherosclerosis and profiled using microarray analysis (liver) and lipidomics quantification (liver and plasma). The results showed that only the HC diet evoked hepatic inflammation and induced … observed with the LC diet). A total of 264 genes involved in lipid metabolism were measured, with 23 genes differentially expressed in the LC diet and 64 in the HC diet. In addition, a range of intrahepatic fatty acids was quantified, of which 27 free fatty acids were mapped along with the gene data onto the KEGG database using KegArray.

Results of KegArray-based analysis of quantified lipid and transcriptomics data from Kleemann et al.69 The KEGG metabolic pathway “Sphingolipid metabolism” (map00600) showed a number of changes between low and high cholesterol treatment. KegArray was run with a 1.1-fold threshold, with red and orange indicating a 10% and 5% increase, respectively, yellow indicating no change (grey indicates that the enzyme/metabolite is present in the organism), and light green and dark green indicating a 5% and 10% decrease, respectively. Table 4 provides a list of the top 13 pathways that differed between low and high cholesterol treatment.
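The processing steps described above (background subtraction and ratio calculation, a two-column KEGG COMPOUND input file, threshold-based coloring of pathway nodes, and counting which pathways are "affected") can be illustrated with a minimal Python sketch. This is not KegArray's code. The function names, the whitespace-separated file layout, the reading of the caption's 5% and 10% changes as fixed ratio cut-offs, and the rule that a pathway counts as affected when any mapped entry passes the 1.1-fold threshold in either direction are all assumptions made for illustration. The entry and pathway names are invented toy data.

```python
import re
from typing import Dict, Iterable, Set

# KEGG COMPOUND identifiers have the form "C" followed by five digits (e.g. C00219).
KEGG_COMPOUND_ID = re.compile(r"^C\d{5}$")


def background_subtracted_ratio(treated_total: float, treated_bg: float,
                                control_total: float, control_bg: float) -> float:
    """Treated/control ratio after subtracting each sample's background signal."""
    treated = treated_total - treated_bg
    control = control_total - control_bg
    if control <= 0:
        raise ValueError("background-corrected control signal must be positive")
    return treated / control


def parse_compound_ratios(text: str) -> Dict[str, float]:
    """Read 'KEGG COMPOUND ID <whitespace> ratio' lines (assumed layout) into a dict."""
    ratios: Dict[str, float] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        compound_id, value = line.split()[:2]
        if not KEGG_COMPOUND_ID.match(compound_id):
            raise ValueError(f"line {line_no}: {compound_id!r} is not a KEGG COMPOUND ID")
        ratios[compound_id] = float(value)
    return ratios


def color_for_ratio(ratio: float) -> str:
    """Bin a ratio into the five colors named in the figure caption.

    Assumption: a 10%/5% increase is read as ratio >= 1.10/1.05 and a
    5%/10% decrease as ratio <= 0.95/0.90.
    """
    if ratio >= 1.10:
        return "red"
    if ratio >= 1.05:
        return "orange"
    if ratio <= 0.90:
        return "dark green"
    if ratio <= 0.95:
        return "light green"
    return "yellow"


def affected_pathways(ratios: Dict[str, float],
                      pathway_members: Dict[str, Iterable[str]],
                      fold: float = 1.1) -> Set[str]:
    """Pathways with at least one member entry changed by >= `fold` in either direction."""
    changed = {entry for entry, r in ratios.items() if r >= fold or r <= 1.0 / fold}
    return {pathway for pathway, members in pathway_members.items()
            if changed.intersection(members)}


if __name__ == "__main__":
    # Hypothetical toy values, not data from Kleemann et al.
    members = {
        "pathway_1": ["entry_a", "entry_d"],
        "pathway_2": ["entry_b"],
        "pathway_3": ["entry_c", "entry_d"],
    }
    lc = {"entry_a": 1.12, "entry_b": 0.97, "entry_c": 1.01, "entry_d": 0.93}
    hc = {"entry_a": 1.20, "entry_b": 0.80, "entry_c": 1.15, "entry_d": 0.99}

    lc_hits = affected_pathways(lc, members)
    hc_hits = affected_pathways(hc, members)
    print(sorted(lc_hits))            # ['pathway_1']
    print(sorted(hc_hits - lc_hits))  # ['pathway_2', 'pathway_3']
    print(lc_hits <= hc_hits)         # True
```

Treating affected pathways as sets makes the LC/HC comparison easy to express. Containment checks whether every LC-affected pathway is also affected under HC, and a set difference gives the pathways that are affected only at the higher dose.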
The KegArray parameters were set to display a 1.1-fold difference, and non-affected pathways were excluded. For the LC exposure, 60 biochemical pathways were affected (ESI, Table S1†), as opposed to 76 pathways for the HC exposure (ESI, Table S2†), which included all 60 pathways from the LC dosing. This suggests that a very pronounced adaptation of liver lipid metabolism already occurs with LC. With these adaptations, the liver is capable of dealing with cholesterol, as there is very little development of early atherosclerotic lesions and no significant inflammation. However, when the dose of dietary cholesterol is increased (HC condition), 16 additional lipid pathways are activated. These data suggest that a very low dose of cholesterol affects a significant part of the pathways involved in lipid handling. It appears that with HC the quality of the lipids changes, and an increased number of unsaturated or proatherogenic lipids, such as sphingomyelin, are significantly impacted.

Of particular interest was the difference in affected pathways between LC and HC diets. A total of 77 pathways were differentially affected (ESI, Table S3†), of which the top 13 are listed in Table 4. These differences are shown on a treatment-specific basis in Fig. 2. A total of 59 pathways were affected in both LC and HC treatment, as well as between treatments. Of particular interest are the five pathways that differ between LC and HC treatment but did not show changes in LC or HC … mmu00641 3-chloroacrylic acid degradation, mmu00680 methane metabolism, mmu00980 metabolism of xenobiotics by cytochrome P450, and mmu00982 drug metabolism-cytochrome P450). Examples of affected metabolic pathways are shown for the biosynthesis of unsaturated fatty acids (Fig. 3) and sphingolipid metabolism (Fig. 4).

Kleemann et al.69 reported that with increasing cholesterol uptake, the liver switched from an adaptive state to an inflammatory, pro-atherosclerotic state (with LC there is primarily an adaptive response of key metabolic pathways required to cope with lipids). At the gene expression level, there is clearly a further adaptation of the pathways switched on/off with LC when animals receive HC. These effects were in accordance with the metabolite levels, with significant (p < 0.05) decreases in myristic, palmitic, stearic, arachidonic, docosapentaenoic and docosahexaenoic acids. This finding is supported by the observation that the biosynthesis of unsaturated fatty acids was the metabolic pathway with the greatest number of changes between LC and HC treatment. Specific decreases were observed in unsaturated fatty acids in the HC treatment: arachidonic acid decreased at p < 0.05 and docosahexaenoic acid (DHA) at p < 0.07. This pathway is a potential source of the unsaturated fatty acid substrates for many of the pro-inflammatory lipids involved in the development of atherosclerosis (e.g., the observed reductions in arachidonic acid levels).

Accordingly, mapping of these data to KEGG was a rapid method for identifying which pathways were most affected by cholesterol treatment and provided mechanistic insight into the disease process. This new tool for the KEGG suite will be a useful complement to existing strategies for network analysis and pathway …

One of the main current obstacles in systems biology is the heterogeneity of available datasets. The field requires the creation of legacy databases of omics data that are formatted to enable inter-study comparison. Many existing methodologies … manipulation and analysis. In order to increase the utility and availability of these tools, it is necessary to develop simplified web-based applications that are equally usable by cross-disciplinary users and/or to shift the educational paradigm to place increased emphasis on the acquisition of computer skills. Future advances in understanding complex medical problems are highly dependent on methodological advances and on the integration of the computational systems biology community with biologists and clinicians.97

Although commercial tools are more complete in terms of features, they are often closed platforms that do not allow for the development and interchange of analysis tools and data beyond their supported applications. In addition, these tools can be expensive, which can be prohibitive in academic and/or clinical settings. It is desirable that developments in these fields be based upon open standards that allow the easy interchange of multiple types of data and the subsequent analyses. The adoption of standard file formats should reduce the difficulties in integrating data derived from different analysis tools.

The ultimate goal for translational systems biology approaches is to bring forth an understanding of pathogenesis and disease etiology at the organism level that goes beyond what traditional minimalistic approaches have to offer. Such an in-depth understanding of the differences between the healthy and diseased states can help solve crucial clinical issues, and provide markers and insights that aid clinicians in making prognostic and diagnostic evaluations. In terms of atherosclerosis, one of the most important clinical dilemmas is determining if and when a patient is at risk of developing symptomatic disease. A systems biology approach could potentially identify alterations in molecular pathways and targets that precede plaque instability, and thus assist in developing molecular tools that could substitute for imaging modalities such as MRI or PET-CT and allow more accurate identification of vulnerable lesions. Accordingly, systems biology tools can be utilized to develop concrete clinical applications that will help improve patient selection, monitoring of stroke preventive intervention, and other needs of the medical …

The advent of systems biology is bringing forth a change in the philosophy of medicine, and is rapidly changing the way we view the disease process. However, in order to realize the promise of systems biology, i.e. the understanding of the organism as a whole, the next major challenge is to facilitate integrated analysis of data from multiple sources.102 Without the integration of individual networks and biochemical pathways into the entire system, the observed effects of individual components remain without meaning and context, and cannot provide understanding of pathological processes at the systems level. Some steps in the direction of integrated analyses have already been made,33 but increased integration of heterogeneous data and networks is non-trivial. Combining the knowledge from multiple networks with high-throughput data, as exemplified herein by the KegArray tool and the KEGG database, will move us one step further towards a true understanding of the living organism. The rapid advances in computer science and high-throughput technologies, coupled with paradigm shifts in the way clinical and pre-clinical researchers perceive science, hold the key to understanding the intricate systems that dictate the switch from healthy to diseased, and represent the path that will lead us to true personalized medicine.

Acknowledgements

This research was supported by the Åke Wibergs Stiftelse, the Fredrik and Ingrid Thurings Stiftelse, the Royal Swedish Academy of Sciences, the Swedish Heart-Lung Foundation and the Japanese Society for the Promotion of Science (JSPS). C.E.W. was supported by a Center for Allergy Research Fellowship. R.K. and M.v.E. received support from the TNO Research Program VP9 Personalized Health.

References

1 H. Kitano, Science, 2002, 295, 1662–1664.
2 A. L. Barabasi and Z. N. Oltvai, Nat. Rev. Genet., 2004, 5, 101–113.
3 A. C. Ahn, M. Tewari, C. S. Poon and R. S. Phillips, PLoS Med., 2006, 3, e208.
4 A. C. Ahn, M. Tewari, C. S. Poon and R. S. Phillips, PLoS Med., 2006, 3, e209.
5 A. D. McCulloch and G. Paternostro, Ann. N. Y. Acad. Sci., 2005, 1047, 283–295.
6 A. D. Weston and L. Hood, J. Proteome Res., 2004, 3, 179–196.
7 J. van der Greef, T. Hankemeier and R. N. McBurney, Pharmacogenomics, 2006, 7, 1087–1094.
8 J. van der Greef, S. Martin, P. Juhasz, A. Adourian, T. Plasterer, E. R. Verheij and R. N. McBurney, J. Proteome Res., 2007, 6, …
9 D. J. Lockhart and E. A. Winzeler, Nature, 2000, 405, 827–836.
10 R. Aebersold and M. Mann, Nature, 2003, 422, 198–207.
11 B. Domon and R. Aebersold, Science, 2006, 312, 212–217.
12 A. R. Joyce and B. O. Palsson, Nat. Rev. Mol. Cell Biol., 2006, 7, …
13 J. C. Smith and D. Figeys, Mol. BioSyst., 2006, 2, 364–370.
14 B. F. Cravatt, G. M. Simon and J. R. Yates, 3rd, Nature, 2007, 450, 991–1000.
15 K. Dettmer, P. A. Aronov and B. D. Hammock, Mass Spectrom. Rev., 2007, 26, 51–78.
16 X. Han, A. Aslanian and J. R. Yates 3rd, Curr. Opin. Chem. Biol., 2008, 12, 483–490.
17 J. Zaia, Chem. Biol., 2008, 15, 881–892.
18 H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai and A. L. Barabasi, Nature, 2000, 407, 651–654.
19 S. S. Shen-Orr, R. Milo, S. Mangan and U. Alon, Nat. Genet., 2002, 31, 64–68.
20 E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai and A. L. Barabasi, Science, 2002, 297, 1551–1555.
21 R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii and U. Alon, Science, 2002, 298, 824–827.
22 … C. D. Maranas, Genome Res., 2004, 14, 301–312.
23 E. V. Nikolaev, A. P. Burgard and C. D. Maranas, Biophys. J., 2005, 88, 37–49.
24 V. Vermeirssen, M. I. Barrasa, C. A. Hidalgo, J. A. Babon, A. J. Walhout, Genome Res., 2007, 17, 1061–1071.
25 M. Stoll, A. W. Cowley, Jr, P. J. Tonellato, A. S. Greene, M. L. Kaldunski, R. J. Roman, P. Dumas, N. J. Schork, Z. Wang and H. J. Jacob, Science, 2001, 294, 1723–1726.
26 S. E. Calvano, W. Xiao, D. R. Richards, R. M. Felciano, H. V. Baker, R. J. Cho, R. O. Chen, B. H. Brownstein, J. P. Cobb, S. K. Tschoeke, C. Miller-Graziano, L. L. Moldawer, M. N. Mindrinos, R. W. Davis, R. G. Tompkins and S. F. Lowry, Nature, 2005, 437, 1032–1037.
27 K. I. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal and A. L. Barabasi, Proc. Natl. Acad. Sci. U. S. A., 2007, 104, 8685–8690.
28 X. Wu, R. Jiang, M. Q. Zhang and S. Li, Mol. Syst. Biol., 2008, 4, 189.
29 U. Alon, Nat. Rev. Genet., 2007, 8, 450–461.
30 M. Isalan, C. Lemerle, K. Michalodimitrakis, C. Horn, P. Beltrao, E. Raineri, M. Garriga-Canut and L. Serrano, Nature, 2008, 452, 840–845.
31 F. Markowetz and R. Spang, BMC Bioinf., 2007, 8(Suppl 6), S5.
32 T. Schlitt and A. Brazma, BMC Bioinf., 2007, 8(Suppl 6), S9.
33 N. Ishii, K. Nakahigashi, T. Baba, M. Robert, T. Soga, A. Kanai, T. Hirasawa, M. Naba, K. Hirai, A. Hoque, P. Y. Ho, Y. Kakazu, K. Sugawara, S. Igarashi, S. Harada, T. Masuda, N. Sugiyama, T. Togashi, M. Hasegawa, Y. Takai, K. Yugi, K. Arakawa, N. Iwata, Y. Toya, Y. Nakayama, T. Nishioka, K. Shimizu, H. Mori and M. Tomita, Science, 2007, 316, 593–597.
34 J. Zhu, B. Zhang, E. N. Smith, B. Drees, R. B. Brem, L. Kruglyak, R. E. Bumgarner and E. E. Schadt, Nat. Genet., 2008, 40, 854–861.
35 C. Bonferroni, Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 1936, vol. 8, pp. 3–62.
36 R. G. Miller, Simultaneous Statistical Inference, Springer Verlag, New York, 1981, pp. 6–8.
37 Y. Benjamini and Y. Hochberg, J. R. Stat. Soc. Ser. B (Methodological), 1995, 289–300.
38 J. Trygg and S. Wold, J. Chemom., 2002, 16, 119–128.
39 L. Hood and R. M. Perlmutter, Nat. Biotechnol., 2004, 22, 1215–1217.
40 I. Graham, D. Atar, K. Borch-Johnsen, G. Boysen, G. Burell, R. Cifkova, J. Dallongeville, G. De Backer, S. Ebrahim, B. Gjelsvik, C. Herrmann-Lingen, A. Hoes, S. Humphries, M. Knapton, J. Perk, S. G. Priori, K. Pyorala, Z. Reiner, L. Ruilope, S. …, P. Weissberg, D. Wood, J. Yarnell, J. L. Zamorano, E. Walma, T. Fitzgerald, M. T. Cooney, A. Dudina, A. Vahanian, J. Camm, R. De Caterina, V. Dean, K. Dickstein, C. Funck-Brentano, G. Filippatos, I. Hellemans, S. D. Kristensen, K. McGregor, U. Sechtem, S. Silber, M. Tendera, P. Widimsky, J. L. Zamorano, I. Hellemans, A. Altiner, E. Bonora, P. N. Durrington, R. Fagard, S. Giampaoli, H. Hemingway, J. Hakansson, S. E. Kjeldsen, M. L. Larsen, G. Mancia, A. J. Manolis, L. Tokgozoglu, O. Wiklund and A. Zampelas, Eur. Heart J., 2007, 28, 2375–2414.
41 W. Rosamond, K. Flegal, K. Furie, A. Go, K. Greenlund, N. Haase, S. M. Hailpern, M. Ho, V. Howard, B. Kissela, S. Kittner, D. Lloyd-Jones, M. McDermott, J. Meigs, C. Moy, G. Nichol, C. O'Donnell, V. Roger, P. Sorlie, J. Steinberger, T. Thom, M. Wilson and Y. Hong, Circulation, 2008, 117, …
42 D. B. Mark, F. J. Van de Werf, R. J. Simes, H. D. White, L. C. Wallentin, R. M. Califf and P. W. Armstrong, Eur. Heart J., 2007, 28, 2678–2684.
43 A. J. Lusis, J. Lipid Res., 2006, 47, 1887–1890.
44 A. J. Lusis, Nature, 2000, 407, 233–241.
45 G. K. Hansson, N. Engl. J. Med., 2005, 352, 1685–1695.
46 G. K. Hansson and J. Nilsson, J. Intern. Med., 2008, 263, …
47 P. K. Shreenivasaiah, S. H. Rho, T. Kim and H. Kim do, J. Mol. Cell. Cardiol., 2008, 44, 460–469.
48 J. T. Brindle, H. Antti, E. Holmes, G. Tranter, J. K. Nicholson, H. W. Bethell, S. Clarke, P. M. Schofield, E. McKilligin, D. E. Mosedale and D. J. Grainger, Nat. Med., 2002, 8, …
49 H. L. Kirschenlohr, J. L. Griffin, S. C. Clarke, R. Rhydwen, A. A. Grace, P. M. Schofield, K. M. Brindle and J. C. Metcalfe, Nat. Med., 2006, 12, 705–710.
50 A. M. van den Maagdenberg, M. H. Hofker, P. J. Krimpenfort, I. de Bruijn, B. van Vlijmen, H. van der Boom, L. M. Havekes and R. R. Frants, J. Biol. Chem., 1993, 268, 10540–10545.
51 C. B. Clish, E. Davidov, M. Oresic, T. N. Plasterer, G. Lavine, T. Londo, M. Meys, P. Snell, W. Stochaj, A. Adourian, X. Zhang, N. Morel, E. Neumann, E. Verheij, J. T. Vogels, L. M. Havekes, N. Afeyan, F. Regnier, J. van der Greef and S. Naylor, Omics, 2004, 8, 3–13.
52 M. Oresic, C. B. Clish, E. J. Davidov, E. Verheij, J. Vogels, L. M. Havekes, E. Neumann, A. Adourian, S. Naylor, J. van der Greef and T. Plasterer, Appl. Bioinf., 2004, 3, 205–217.
53 B. de Roos, G. Rucklidge, M. Reid, K. Ross, G. Duncan, M. A. Navarro, J. M. Arbones-Mainar, M. A. Guzman-Garcia, J. Osada, J. Browne, C. E. Loscher and H. M. Roche, FASEB J., 2005, 19, 1746–1748.
54 J. Y. King, R. Ferrara, R. Tabibiazar, J. M. Spin, M. M. Chen, A. Kuchinsky, A. Vailaya, R. Kincaid, A. Tsalenko, D. X. Deng, A. Connolly, P. Zhang, E. Yang, C. Watt, Z. Yakhini, A. Ben-Dor, A. Adler, L. Bruhn, P. Tsao, T. Quertermous and E. A. Ashley, Physiol. Genomics, 2005, 23, 103–118.
55 P. Shannon, A. Markiel, O. Ozier, N. S. Baliga, J. T. Wang, D. Ramage, N. Amin, B. Schwikowski and T. Ideker, Genome Res., 2003, 13, 2498–2504.
56 J. Cohen, Nature, 2002, 420, 885–891.
57 H. W. Tseng, H. F. Juan, H. C. Huang, J. Y. Lin, S. Sinchaikul, T. C. Lai, C. F. Chen, S. T. Chen and G. J. Wang, Proteomics, 2006, 6, 5915–5928.
58 H. J. Chung, M. Kim, C. H. Park, J. Kim and J. H. Kim, Nucleic Acids Res., 2004, 32, W460–464.
59 D. M. Wuttge, A. Sirsjo, P. Eriksson and S. Stemme, Mol. Med., 2001, 7, 383–392.
60 K. Jatta, D. Wagsater, L. Norgren, B. Stenberg and A. Sirsjo, J. Vasc. Res., 2005, 42, 266–271.
61 P. S. Olofsson, K. Jatta, D. Wagsater, S. Gredmark, U. Hedin, G. Paulsson-Berne, C. Soderberg-Naucler, G. K. Hansson and A. Sirsjo, Arterioscler. Thromb. Vasc. Biol., 2005, 25, e113–116.
62 P. S. Olofsson, L. A. Soderstrom, D. Wagsater, Y. Sheikine, P. Ocaya, F. Lang, C. Rabu, L. Chen, M. Rudling, P. Aukrust, U. Hedin, G. Paulsson-Berne, A. Sirsjo and G. K. Hansson, Circulation, 2008, 117, 1292–1301.
63 R. Laaksonen, M. Katajamaa, H. Paiva, M. Sysi-Aho, L. Saarinen, P. Junni, D. Lutjohann, J. Smet, R. Van Coster, T. Seppanen-Laakso, T. Lehtimaki, J. Soini and M. Oresic, PLoS One, 2006, 1, e97.
64 J. H. Dwyer, H. Allayee, K. M. Dwyer, J. Fan, H. Wu, R. Mar, A. J. Lusis and M. Mehrabian, N. Engl. J. Med., 2004, 350, 29–37.
65 H. Qiu, A. Gabrielsen, H. E. Agardh, M. Wan, A. Wetterholm, C. H. Wong, U. Hedin, J. Swedenborg, G. K. Hansson, B. Samuelsson, G. Paulsson-Berne and J. Z. Haeggstrom, Proc. Natl. Acad. Sci. U. S. A., 2006, 103, 8161–8166.
66 K. H. Pietiläinen, M. Sysi-Aho, A. Rissanen, T. Seppänen-Laakso, H. Yki-Järvinen, J. Kaprio and M. Oresic, PLoS One, 2007, 2, e218.
67 J. Nikkila, M. Sysi-Aho, A. Ermolov, T. Seppanen-Laakso, O. Simell, S. Kaski and M. Oresic, Mol. Syst. Biol., 2008, 4, 197.
68 J. Skogsberg, J. Lundström, A. Kovacs, R. Nilsson, P. Noori, S. Maleki, M. Köhler, A. Hamsten, J. Tegner and J. Björkegren, PLoS Genetics, 2008, 4, e1000036.
69 R. Kleemann, L. Verschuren, M. J. van Erk, Y. Nikolsky, N. H. Cnubben, E. R. Verheij, A. K. Smilde, H. F. Hendriks, S. Zadelaar, G. J. Smith, V. Kaznacheev, T. Nikolskaya, A. Melnikov, E. Hurt-Camejo, J. van der Greef, B. van Ommen and T. Kooistra, Genome Biol., 2007, 8, R200.
70 P. Libby, Nature, 2002, 420, 868–874.
71 A. Ng, B. Bursteinas, Q. Gao, E. Mollison and M. Zvelebil, Briefings Bioinf., 2006, 7, 318–330.
72 M. Kanehisa, M. Araki, S. Goto, M. Hattori, M. Hirakawa, M. Itoh, T. Katayama, S. Kawashima, S. Okuda, T. Tokimatsu and Y. Yamanishi, Nucleic Acids Res., 2008, 36, D480–484.
73 M. P. van Iersel, T. Kelder, A. R. Pico, K. Hanspers, S. Coort, B. R. Conklin and C. Evelo, BMC Bioinf., 2008, 9, 399.
74 A. Ng, B. Bursteinas, Q. Gao, E. Mollison and M. Zvelebil, Nucleic Acids Res., 2006, 34, D527–534.
75 S. Ekins, Y. Nikolsky, A. Bugrim, E. Kirillov and T. Nikolskaya, Methods Mol. Biol., 2007, 356, 319–350.
76 B. H. Junker, C. Klukas and F. Schreiber, BMC Bioinf., 2006, 7, …
77 S. Draghici, P. Khatri, A. L. Tarca, K. Amin, A. Done, C. Voichita, C. Georgescu and R. Romero, Genome Res., 2007, …
78 M. Hucka, A. Finney, H. M. Sauro, H. Bolouri, J. C. Doyle, H. Kitano, A. P. Arkin, B. J. Bornstein, D. Bray, A. Cornish-Bowden, A. A. Cuellar, S. Dronov, E. D. Gilles, M. Ginkel, V. Gor, I. I. Goryanin, W. J. Hedley, T. C. Hodgman, J. H. Hofmeyr, P. J. Hunter, N. S. Juty, J. L. Kasberger, A. Kremling, U. Kummer, N. Le Novere, L. M. Loew, Y. Nakayama, M. R. Nelson, P. F. Nielsen, T. Sakurada, J. C. Schaff, B. E. Shapiro, T. S. Shimizu, H. D. Spence, J. Stelling, K. Takahashi, M. Tomita, J. Wagner and J. Wang, Bioinformatics, 2003, 19, 524–531.
79 R. C. Gentleman, V. J. Carey, D. M. Bates, B. Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, J. Gentry, K. Hornik, T. Hothorn, W. Huber, S. Iacus, R. Irizarry, F. Leisch, C. Li, M. Maechler, A. J. Rossini, G. Sawitzki, C. Smith, G. Smyth, L. Tierney, J. Y. Yang and J. Zhang, Genome Biol., 2004, 5, R80.
80 K. Aoki and M. Kanehisa, Current Protocols in Bioinformatics, 2005, chapter 1, unit 1.12.
81 C. Klukas and F. Schreiber, Bioinformatics, 2007, 23, 344–350.
82 P. H. Lee and D. Lee, Bioinformatics, 2005, 21, 2739–2747.
83 A. Vailaya, P. Bluvas, R. Kincaid, A. Kuchinsky, M. Creech and A. Adler, Bioinformatics, 2005, 21, 430–438.
84 M. Reimers and V. J. Carey, Methods Enzymol., 2006, 411, 119–134.
85 J. Schafer and K. Strimmer, Bioinformatics, 2005, 21, 754–764.
86 D. Scholtens, M. Vidal and R. Gentleman, Bioinformatics, 2005, 21, 3548–3557.
87 V. J. Carey, J. Gentry, E. Whalen and R. Gentleman, Bioinformatics, 2005, 21, 135–136.
88 P. T. Shannon, D. J. Reiss, R. Bonneau and N. S. Baliga, BMC Bioinf., 2006, 7, 176.
89 … M. H. Johnson and T. Galitski, BMC Bioinf., 2006, 7, 286.
90 M. Baitaluk, M. Sedova, A. Ray and A. Gupta, Nucleic Acids Res., 2006, 34, W466–471.
91 M. Baitaluk, X. Qian, S. Godbole, A. Raval, A. Ray and A. Gupta, BMC Bioinf., 2006, 7, 55.
92 C. F. Taylor, N. W. Paton, K. S. Lilley, P. A. Binz, R. K. Julian Jr, A. R. Jones, W. Zhu, R. Apweiler, R. Aebersold, E. W. Deutsch, M. J. Dunn, A. J. Heck, A. Leitner, M. Macht, M. Mann, L. Martens, T. A. Neubert, S. D. Patterson, P. Ping, S. L. Seymour, P. Souda, A. Tsugita, J. Vandekerckhove, T. M. Vondriska, J. P. Whitelegge, M. R. Wilkins, I. Xenarios, J. R. Yates, 3rd and H. Hermjakob, Nat. Biotechnol., 2007, 25, …
93 S. A. Sansone, T. Fan, R. Goodacre, J. L. Griffin, N. W. Hardy, R. Kaddurah-Daouk, B. S. Kristal, J. Lindon, P. Mendes, N. Morrison, B. Nikolau, D. Robertson, L. W. Sumner, C. Taylor, M. van der Werf, B. van Ommen and O. Fiehn, Nat. Biotechnol., 2007, 25, 846–848.
94 G. An, J. Crit. Care, 2006, 21, 105–110; discussion 110–111.
95 Y. Vodovotz, Immunol. Res., 2006, 36, 237–245.
96 Y. Vodovotz, C. C. Chow, J. Bartels, C. Lagoa, J. M. Prince, R. M. Levy, R. Kumar, J. Day, J. Rubin, G. Constantine, T. R. Billiar, M. P. Fink and G. Clermont, Shock, 2006, 26, …
97 Y. Vodovotz, M. Csete, J. Bartels, S. Chang and G. An, PLoS Comput. Biol., 2008, 4, e1000014.
98 D. Noble, Science, 2002, 295, 1678–1682.
99 B. J. Bennett, C. E. Romanoski and A. J. Lusis, Expert Rev. Cardiovasc. Ther., 2007, 5, 1095–1103.
100 S. Y. Shin, S. M. Choo, S. H. Woo and K. H. Cho, Adv. Biochem. Eng. Biotechnol., 2008, 110, 25–45.
101 R. C. Kerckhoffs, S. M. Narayan, J. H. Omens, L. J. Mulligan and A. D. McCulloch, Heart Fail. Clin., 2008, 4, 371–378.
102 U. Sauer, M. Heinemann and N. Zamboni, Science, 2007, 316, …
Micro-Level Value Creation Under Managerial Short-termism∗
Jonathan B. Cohn†
University of Texas at Austin
University of Texas at Dallas
Wharton Research Data Services
We present evidence that managers facing short-termist incentives set a lower threshold for accepting projects. Using novel data on new client and product announcements in both the U.S. and international markets, we find that the market responds less positively to a new project announcement when the firm's managers have incentives to focus on short-term stock price performance. Furthermore, textual analysis of project announcements shows that firms with short-termist CEOs use more vague and generically positive language when introducing new projects to the marketplace.
Keywords: CEO Short-termism, Corporate Investment, CEO Compensation, Career Concerns, Corporate Governance