Systems biology approaches and pathway tools for investigating cardiovascular disease†
Craig E. Wheelock,*ab Åsa M. Wheelock,bcd Shuichi Kawashima,e Diego Diez,b Minoru Kanehisa,be Marjan van Erk,f Robert Kleemann,g Jesper Z. Haeggströma and Susumu Gotob
Received 4th February 2009, Accepted 26th March 2009
First published as an Advance Article on the web 27th April 2009
DOI: 10.1039/b902356a
Systems biology aims to understand the nonlinear interactions of multiple biomolecular components that characterize a living organism. One important aspect of systems biology approaches is to identify the biological pathways or networks that connect the differing elements of a system, and examine how they evolve with temporal and environmental changes. The utility of this method becomes clear when applied to multifactorial diseases with complex etiologies, such as inflammatory-related diseases, herein exemplified by atherosclerosis. In this paper, the initial studies in this discipline are reviewed and examined within the context of the development of the field. In addition, several different software tools are briefly described and a novel application for the KEGG database suite called KegArray is presented. This tool is designed for mapping the results of high-throughput omics studies, including transcriptomics, proteomics and metabolomics data, onto interactive KEGG metabolic pathways. The utility of KegArray is demonstrated using a combined transcriptomics and lipidomics dataset from a published study designed to examine the potential of cholesterol in the diet to influence the inflammatory component in the development of atherosclerosis. These data were mapped onto the KEGG PATHWAY database, with a low cholesterol diet affecting 60 distinct biochemical pathways and a high cholesterol exposure affecting 76 biochemical pathways. A total of 77 pathways were differentially affected between low and high cholesterol diets. The KEGG pathways "Biosynthesis of unsaturated fatty acids" and "Sphingolipid metabolism" evidenced multiple changes in gene/lipid levels between low and high cholesterol treatment, and are discussed in detail.
Taken together, this paper provides a brief introduction to systems biology and the applications of pathway mapping to the study of cardiovascular disease, as well as a summary of available tools. Current limitations and future visions of this emerging field are discussed, with the conclusion that combining knowledge from biological pathways and high-throughput omics data will move clinical medicine one step further to individualize medical diagnosis and treatment.
a Department of Medical Biochemistry and Biophysics, Division of Physiological Chemistry II, Karolinska Institutet, S-171 77, Stockholm, Sweden. E-mail: [email protected]; Fax: +46-8-736-0439; Tel: +46-8-5248-7630
b Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan
c Lung Research Lab L4:01, Respiratory Medicine Unit, Department of Medicine, Karolinska Institutet, 171 76, Stockholm, Sweden
d Karolinska Biomics Center Z5:02, Karolinska University Hospital, 171 76, Stockholm, Sweden
e Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan
f Department of Physiological Genomics, TNO-Quality of Life, BioSciences, Utrechtseweg 48, 3704 HE, Zeist, The Netherlands
g Department of Vascular and Metabolic Disease, TNO-Quality of Life, BioSciences, Gaubius Laboratory, Zernikedreef 9, 2333 CK, Leiden, The Netherlands
† Electronic supplementary information (ESI) available: Complete list of all KEGG biochemical pathways identified by KegArray as being affected by low cholesterol treatment, high cholesterol treatment, and differentially affected between low and high cholesterol treatment. See DOI: 10.1039/b902356a
588 Mol. BioSyst., 2009, 5, 588–602
© The Royal Society of Chemistry 2009
An organism is an individual living system capable of reacting to stimuli, reproducing and maintaining a stable structure over time. Organisms are composed of multiple individual components, e.g. cells and their corresponding genes, proteins, metabolites, etc., which are all governed by an intricate network of interactions. This network is not static, and the various components evolve and adapt dynamically to internal and environmental changes. The study of this complex system as a single entity is a challenge that has traditionally been addressed by studying different components of the system in isolation. Although such approaches have produced a significant amount of knowledge and understanding, they are limited in their ability to predict the effects of alterations in single or multiple components upon the dynamics of the whole system. This limitation may explain why, in some cases, significant research advances do not translate, for example,
into improved therapeutics or a "cure" for the disease under study. The discipline of systems biology attempts to shift the way in which an organism is perceived in order to address the complexity of living systems. Multiple definitions of systems biology exist, one of which describes it as a new field of study that aims to understand the living cell as a complete system.1,2 In other words, systems biology seeks to understand how system properties emerge from the nonlinear interactions of multiple components.3,4
The applications of systems biology approaches are increasing dramatically; however, the exact nature of what a "systems approach" entails remains diffuse in the literature. The fundamental theme of systems biology is integration […] disciplines.5 However, it should be noted that systems science is not novel and has been advocated for many years in a number of research fields. At the simplest level, a systems approach signifies a study based upon examining the entire "system" simultaneously, as opposed to a reductionist approach that focuses on a single gene, metabolite, pathway, etc. In other words, a systems biology approach does not focus on identifying a single target or mechanism for an observed phenotype (e.g. disease). Systems biology instead seeks to identify the biological networks or pathways that connect the differing elements of a system, and in the process describe the characteristics that define a shift in equilibrium, such as metabolic fluxes or altered protein activities, which may cause a shift from a healthy to a diseased state. The hypothesis then becomes that those components of the network that are […] and potentially descriptive of the disease, and accordingly represent potential targets for intervention to return the system to its original state (i.e. a healthy state). However, it is important to realize that the concept of equilibrium may not be as static as previously thought. It is more likely that equilibrium is a steady state that represents a range of fluctuations in the biological network that vary on an interindividual basis. The normal or control state is more appropriately categorized as one of dynamic stability, in which our concept of homeostasis is more correctly defined as homeodynamics.3 Accordingly, by defining the parameters of the network that distinguish the diseased from the healthy state, interventions or treatments can be derived that are tailored to the individual variability of the parameters for this steady state; in other words, personalized medicine.
Associate Professor Craig E. Wheelock heads a research group at the Karolinska Institute that examines the role of bioactive lipid mediators in inflammatory diseases, with a focus on cardiovascular disease. He is broadly interested in the development of bioinformatics tools for probing inflammatory diseases at the systems level.
Assistant Professor Åsa M. Wheelock heads a research group at the Karolinska Institute that investigates pneumotoxicants and inflammatory lung diseases, as well as gel-based quantitative […]
Assistant Professor Shuichi Kawashima is a researcher at the Human Genome Center in the Institute of Medical Science at the University of Tokyo who is broadly interested in the development of genome databases, bioinformatics web services and the biology of eukaryotic genomes.
Dr. Diego Diez is a postdoctoral researcher at the Kyoto University Bioinformatics Center working on applying systems biology approaches to cardiovascular disease.
Professor Minoru Kanehisa is the Director of the Bioinformatics Center in the Institute for Chemical Research at Kyoto University and a professor at the Human Genome Center in the Institute of Medical Science at the University of Tokyo. His research involves deciphering systemic biological functions by integrated analysis of genomic and chemical […]
Dr. Marjan van Erk is a researcher at TNO Quality of Life who is interested in developing bioinformatic systems biology tools for metabolic and cardiovascular diseases.
Dr. Robert Kleemann heads a research unit at TNO Quality of Life that investigates the role of inflammation in cardiovascular disease and metabolic disorders and has a particular interest in gene regulation and drug intervention.
Professor Jesper Z. Haeggström heads a research group at the Karolinska Institute that examines the role of bioactive lipid mediators in inflammatory disease.
Associate Professor Susumu Goto is interested in the development of databases for molecular interaction networks and network analysis using the KEGG database suite. His work also involves in silico metabolic reconstruction.
The era of personalized medicine has been heralded for a number of years, and systems biology is a key component of this new paradigm.6–8 The intent is to identify disease before pathogenic manifestation, thereby initiating therapeutic intervention prior to significant adverse effects. Current medical practice is a reductionist approach that involves treating each problem or symptom in isolation. By these standards, the relief of symptoms as determined by clinical evaluations following a treatment regimen embodies the definition of a cured or maintained patient. A corresponding "limited" systems biology approach, in which a multitude of clinical and biochemical variables are combined with multivariate statistical analyses, often reveals that the patient has indeed been removed from the disease group following treatment, but not necessarily moved back towards a healthy state as is often assumed. Instead, the treated patient belongs to a novel biological status, distinctly different from both healthy individuals and peers in the disease group. This novel pharmacological state is generally not discernible in classical medicine, as the patient is by definition classified as belonging to the "healthy" group as soon as the symptoms that define the disease are no longer detectable. More importantly, the classical reductionist approach does not reveal the novel pharmacological state that the treatment regimen has induced, and consequently its implications for the patient's future health cannot be predicted. In contrast, a true systems biology approach offers the ability to distinguish between multiple disease, healthy, or pharmacological states, as well as between causative and adaptive responses and variables. However, in
order to make conclusions regarding causative relationships, it is necessary to have a sufficient number of variables and observations. In addition, the quantitative quality and source of the data, as well as the choice of multivariate statistical tools in both the experimental design and the post-experimental analyses, are vital for interpretation.
The increase in systems biology applications reflects a "perfect storm" of advances in analytical methodology, computing power and data acquisition. The completion of the human genome sequencing project heralded the age of large-scale biology and data acquisition. This paradigm shift, coupled to commensurate developments in technology and experimental techniques that can simultaneously interrogate many elements of a system (e.g., microarrays, mass spectrometry, computational power and the Internet), has led to a veritable explosion in "omics" science and systems biology-related research. The challenge for systems biology is to integrate the disparate disciplines of biology, chemistry, statistics, computer science and engineering into a cohesive science. Towards this end, it is necessary to develop common platforms for the analysis, presentation and archiving of data to ensure inter-laboratory and cross-disciplinary compatibility and accessibility of data sets. Significant steps have already been taken in this direction, and it is not our aim to review the status of the technological platforms or the compatibility of data formats, as these aspects have been covered in detail elsewhere.9–17 In contrast, this review focuses on the integration of different types of data sets, and aims to summarize the current state of systems biology research into cardiovascular disease as well as present a number of different pathway mapping tools that have been developed. In addition, an example of a pathway analysis of atherosclerosis is presented using a novel tool for mapping omics data to the KEGG database suite.
Networks in a nutshell
One of the recurrent concepts in systems biology is that of the network. Much of the early work on networks focused on simple model organisms including bacteria, yeast and nematodes;18–24 however, this work is expanding to the understanding of human diseases.25–28 A network representation formalizes the interaction of different components of a system utilizing the infrastructure of a branch of mathematics called graph theory. In the network paradigm, nodes represent elements of the system while relations are symbolized by edges. For example, in a metabolic network, enzymes and compounds are nodes, and reactions are edges. In a protein–protein interaction network, two nodes connected by an edge represent interacting proteins. This formalism enables the study of living systems in a way never thought possible before. The individual elements are integrated in a network whose properties can be analyzed globally: the number of edges per node, the degree distribution (the probability that a node has a specific number of edges), the clustering coefficient, etc. Barabási and Oltvai have reviewed these concepts in detail, and provided a comprehensive review of the terminology and concepts associated with network analysis.2 This new terminology is increasingly prevalent in the biological literature, requiring the life scientist to become familiar with this research field. These topological properties provide information regarding the global behavior of the network and therefore of the biological system under study. For example, one important finding was the scale-free topology of biological networks. In this type of network, most nodes have few links, whereas a few nodes have many links (called hubs or nexus nodes). One translation of this characteristic into a biological context is the hypothesis that hub nodes perform key functions in the network. Accordingly, many fundamental genes, proteins, enzymes and compounds have been identified as hubs in their respective biological networks. Another consequence derived from this finding is that, because of the sparse nature of scale-free networks (i.e. most nodes having few edges), they are very robust to environmental alterations. However, although network analysis can help us understand the behavior of the system as a whole, the importance of individual elements is not lost in this global view. For example, the study of biological networks shows that complex networks are constructed of recurrent simple motifs.29 Initially described in simple bacteria, these motifs are also found in the regulatory networks of higher eukaryotes and are fundamental to understanding the behavior of complex networks, including biological networks. Moreover, the mathematical models used to generate the network itself can be used to predict the behavior of the network when specific elements are altered. For example, what are the effects if a specific node of a gene regulatory network is removed by a knockout mutation? How does this change affect the global stability and robustness of the network and, eventually, the phenotype of the studied system? Systems biology seeks to answer these and other questions by modeling the relationship between the […]
One critical step is how the network is constructed from the raw data (transcriptomics, proteomics, metabolomics, etc.). This is accomplished by using different mathematical techniques, ranging from simple Pearson correlations to ordinary differential equations, Boolean networks, etc. (reviewed in refs. 31 and 32). Through this modeling, fundamental concepts in the understanding of biological systems, such as robustness, modularity and emergence, are incorporated. Unfortunately, not all of these questions are easily answered, even within the context of the systems biology paradigm. Whereas most studies currently focus on individual networks (e.g. a transcription network or a protein–protein interaction network), in reality these different networks function as a connected system. Therefore, a change in the gene regulatory network may have a corresponding effect in the protein–protein interaction network, the metabolic network, etc., which collectively may manifest as changes in the observed phenotype. To understand the whole system, it is critical to integrate knowledge from different studies. However, the crosstalk between different networks is not yet well understood and, although some progress has been made,33,34 the integration of different types of data is still in its infancy.12 Through the generation of mathematical models that integrate different types of data (e.g. transcriptomic, metabolomic, and protein–protein interactions),2 we can explain the observed phenotype, and hopefully make predictions regarding how the
phenotype is altered when the network itself is modified through the alteration of internal or environmental factors.
Data processing and statistical analysis
The pre-processing of data is crucial in network applications, as well as in other systems-level analyses. It is important to recognize that the nature of large-scale omics data is very different from that of reductionist approaches, and different statistical methods should be utilized. The majority of the univariate methods that have dominated the biological sciences for over a century (e.g. Student's t-test) are not well suited to such data for a number of reasons. For example, univariate statistical methods employ repeated testing to evaluate whether the null hypothesis for a certain variable can be rejected, i.e. whether that variable is significantly altered compared to the control group. Given the cumulative nature of the error in repeated testing, these methods are prone to high false positive rates, which become particularly pronounced in omics analyses where a large number of variables are tested simultaneously. Even though a range of approaches have been developed to correct for the resulting large false positive rates, most notably Bonferroni35,36 and false discovery rate (FDR) corrections,37 the use of univariate methods remains a compromise. The fact that univariate methods are very sensitive to missing data points further decreases the robustness of network analyses based solely on traditional statistical pre-processing of […]
Multivariate analysis (MVA) is a more suitable option for the "short and fat" data sets that are typical of omics studies (i.e. a large number of variables with few observations). Instead of repeated testing of single variables, MVA aims to create a model that reduces the complexity of multidimensional data to a few latent variables that express the majority of the variance of the data set. In principal component analysis (PCA), the most utilized unsupervised method in omics applications, the model is structured so that the first principal component (PC1) describes the largest possible portion of the variance in the data set that can be described by a linear vector. Accordingly, each subsequent PC contains a smaller portion of the variance in the data set than the previous component. Given that MVA is based on all individual variable data points for all observations, the resulting model is robust both against false positives and against missing data points. Furthermore, a confidence interval representing all of the variables is obtained, in contrast to univariate methods, where each variable is analyzed as a separate unit and consequently only confidence intervals for individual variables can be obtained. MVA can also be utilized to perform regression analysis between large data sets, most commonly through partial least squares projection to latent structures (PLS). These types of analyses are referred to as supervised methods, since the user defines which variables belong to the X dataset (dictating variables) and which belong to the Y dataset […]
While useful, multivariate statistical methods are not without their own weaknesses. A major pitfall in MVA relates to overfitting of the model to the data. If a sufficient number of components is utilized, it is possible to build a model that can describe any data set with a perfect correlation (i.e. R² = 1.0; Fig. 1). A comparison of the correlation coefficient to the predictive power of the model is therefore essential. The predictive power (Q²) can be calculated through the use of a training set and a test set or, if the data set is too small to allow this, through a cross-validation approach. A good rule of thumb is to remove all components that do not contribute to an increased predictive power of the model. If the data set is sufficiently large, Q² can be used as a measure to evaluate the robustness of the model in relation to the whole population.
Fig. 1 Overfitting of data represents one of the main pitfalls associated with multivariate analyses. With a sufficient number of components, a model that explains 100% of the variance (R² = 1.0) can be built for any data set. In the above example, the simplest (linear) model is the most representative model for the data, demonstrating that the simplest model provides optimal prediction, even though its correlation coefficient is lower.
Another concern when utilizing MVA is that of strong outliers. One should be cautious of any observation that is located at either extreme of the first component axis (strong outliers), as it is likely that characteristics unique to this individual are influencing the entire model.
Interpretability represents another concern in MVA. MVA summarizes the entire data set in a few latent variables, which cannot be directly connected to the original measured variables. As such, it can be difficult for the untrained eye to interpret which variables are important or "significant" in driving the separation of the different study groups. This becomes particularly pronounced in more complex analyses such as PLS. A recent addition to this group of analyses, orthogonal PLS (OPLS), greatly simplifies interpretation by separating the variance in the data set according to its correlation to the selected Y matrix (e.g. disease group).38 In contrast, the "orthogonal" component pulls out the variance that is not correlated to the Y-variables of interest, and thus represents internal variance in the X-matrix. While this approach is well suited for motivating variable selection, it should be used cautiously in this respect, given that the method is built on a supervised selection of the Y-variables that determine the separation. When in doubt, it is generally better to include all of the variables in subsequent
analyses. Taken together, this section emphasizes that it is vital to employ the correct statistical analysis in both experimental design and data processing. These approaches require an in-depth knowledge of MVA in order to correctly interpret the output of statistical models, prevent overfitting of the data, apply multiple-testing corrections, and achieve an appropriate balance between false positives and statistical power.
Systems biology in cardiovascular disease
The utility of systems biology becomes clear when applied to multifactorial diseases whose etiology is complex. For example, the etiology of inflammatory diseases such as atherosclerosis and asthma has proven recalcitrant to elucidation with reductionist approaches. It is possible that part of the difficulty in identifying new therapeutics lies in the inability of current approaches to visualize the complexity of these biological systems.39 The development of lead drug candidates would also benefit from a systems approach. For example, drugs such as torcetrapib, statin + ezetimibe and rimonabant have been withdrawn from the market because of side effects that were not predicted with reductionist thinking. Diseases and disorders such as cardiovascular disease, diabetes, metabolic syndrome, asthma and chronic […] complicated developments that resist efforts to identify a single gene or pathway responsible for disease onset and progression. Numerous therapeutics have been successfully developed that intervene in different stages of the disease; however, we are still far from developing a true cure for any of these pathologies.
The cellular complexity of many of the affected organs represents a major obstacle in the elucidation of the systems biology behind these pathologies. The lung, for example, consists of more than 40 different cell phenotypes, all of which may elicit different responses to up- or down-regulation of a certain factor. Add to that the spatial and temporal aspects of the cellular response, and we are starting to approach the true complexity of biological systems. Accordingly, while beyond the scope of this review, sampling design and strategy can have significant effects upon experimental observations. Given the heterogeneity of many tissue types, it is challenging to sample tissue reproducibly in such a way as to enable intra- and interlaboratory comparisons. The obstacles involved in this area are not trivial and need to be addressed by the research community.
Cardiovascular disease is the major cause of premature death in Europe, resulting in 44 million deaths in the year 2000.40 In the United States, cardiovascular disease was responsible for one of every five deaths in 2004, with an average of one death every 37 seconds.41 The rapidly increasing incidence of obesity and its commensurate health effects, including atherosclerosis, metabolic syndrome and diabetes, is of epidemic proportions, with the potential for significant increases in developing countries. It is anticipated that the "BRIC" countries (Brazil, Russia, India and China) will significantly contribute to the global cardiovascular disease burden such that by 2020 an additional ~4% of deaths in the world will be due to ischemic heart disease.42 The complexities […] syndrome are recalcitrant to current interventions and challenge the ability of the pharmaceutical industry to produce effective and inexpensive therapies. For example, in cardiovascular disease, each known risk factor is addressed individually, whether it be hyperlipidemia or hypertension.3 However, given the complex etiology of this disease, it is likely that multiple factors are responsible for the observed pathology, resulting in a need for holistic treatment approaches that address the underlying problems. Accordingly, these diseases are logical targets for systems biology approaches to understanding disease mechanism, progression and pathogenesis.
[…] and linked to other systemic disorders,43,44 and the role of inflammation in the development of atherosclerosis and cardiovascular disease is firmly established.45,46 The onset and development of cardiovascular disease have been shown to involve multiple factors including lifestyle, diet, body mass index, (epi)genetics, dyslipidemia, hypertension, and inflammation, among others. However, the current paradigm of patient treatment involves addressing these individual risk factors in isolation, even though they are known to contribute concomitantly to disease pathogenesis. While effective in many cases, this approach has not provided a cure or even a full understanding of the disease, which remains a major source of mortality and morbidity worldwide.
A number of studies have begun to address the issues outlined above in a comprehensive fashion, and active research is being performed to develop systems biology approaches to cardiovascular disease.47 We present a few of these studies in chronological order, but stress that this list is not comprehensive. Many of the early studies that performed systems biology-related investigations into cardiovascular disease focused on a single omics profiling method (e.g., transcriptomics or metabolomics) and then included clinical parameters using multivariate statistics to develop models of disease. Only recently have unifying systems biology models employing multiple analytical platforms linked with bioinformatics analyses been produced. One of the earliest attempts to bring systems biology to cardiovascular function involved mapping important cardiovascular phenotypes onto the human genome. Stoll et al. studied 239 cardiovascular and renal phenotypes in 113 male rats. They identified and mapped a total of 81 cardiovascular phenotypes from an F2 intercross onto the human genome using correlation patterns ("physiological profiles") and comparative genomics.25 The resulting genomic-systems biology map was applicable to gene hunting and to mechanism-based physiological studies of cardiovascular function. For example, the authors presented a correlation matrix with phenotypic ordering of 125 likely determinants of arterial blood pressure, which could be used to assess the impact of allelic substitutions on each of the traits in either the parental or F2 generation of the intercross. The phenotypes were grouped into functionally related clusters (vascular, heart, renal, endocrine […] blood pressure, and ordered within the clusters by known physiological relationships. All of the results of the linkage analyses and the phenotypic physiological profiles for each […]
(http://brc.mcw.edu/phyprf/). A more diagnostic application was presented by Brindle et al., who employed a supervised partial least squares discriminant analysis (PLS-DA) approach to analyze ¹H NMR spectra of human serum to diagnose the presence, as well as the severity, of coronary heart disease.48 The PLS-DA model predicted the presence of coronary heart disease with a sensitivity of 92% and a specificity of 93% based on a 99% confidence limit. The major driving factor for the observed separation in severe coronary heart disease patients (triple vessel disease, TVD) was the presence of lipids, particularly LDL and VLDL, whereas the most influential loadings for the angiographically normal coronary arteries (NVA) group were HDL-associated (e.g., fatty acid chains and phosphatidylcholine). Of particular importance, the authors confirmed that the method was able to diagnose coronary heart disease independently of the inevitable associated gender bias. However, work by Kirschenlohr et al. concluded that plasma-based ¹H NMR analysis is a weak predictor of coronary heart disease.49 They found that the predictive power was significantly weaker, with 80.3% of patients not receiving statin therapy and 61.3% of patients treated with statins correctly assigned to the NVA or coronary heart disease groups. The main reason postulated for the discrepancy between the studies was the inclusion of additional variables in the Kirschenlohr et al. study, including drug treatment regimen. Statins significantly affect LDL levels, which was a discriminating factor in the PLS-DA model. Accordingly, as the most significant loadings associated with diagnosis in both studies were related to lipid species, it is not surprising that treatments affecting lipid levels influenced the separation power of the model. In other words, statin treatment partially resolves the incidence of coronary artery disease, thus reducing the biomarker signal in these patients. It would be interesting to further examine these patients to determine whether they were truly moving towards a “healthy” phenotype or were instead representative of a third, pharmacological state, as discussed above. This point illustrates one of the main challenges in developing diagnostic markers of complex disease: in many cases patients will present distinct genotypes as well as personal therapeutic treatment regimens that can potentially confound the use of biomarkers, as reported by Brindle et al. At the very least, these studies demonstrate the importance of including as much patient metadata in the analyses as possible. The work of both groups supports further research into applying metabolomics methods to identify plasma (i.e., non-invasive) biomarkers of coronary heart disease. It is possible that biomarkers could be identified in a study with an increased cohort size that captures the myriad of clinical and interindividual variables. An important aspect of these metabolomic analyses is that, in order to correctly classify individuals with coronary heart disease, it is not necessary to fully understand the complex molecular differences that underlie disease etiology.48 This methodology is an important first step towards being able to identify individuals at risk of disease development or in the early stages of disease onset.

While useful for identifying potential markers of disease, the previous studies do not represent a systems methodology. One of the first comprehensive systems biology approaches involving the integration of multiple omics platforms (transcriptomics, proteomics and metabolomics) examined the ApoE*3Leiden mouse model (a commonly used model of atherosclerosis50). The authors integrated gene transcript, protein and lipid data along with their putative relationships to gain insight into the early onset of disease.51,52 As is common with many systems approaches, the authors developed a number of their methods for data processing and network analysis in-house, which illustrates a significant obstacle to the advance of systems biology: it is challenging to integrate bioanalytical results from multiple platforms and between different research groups, making it difficult to standardize results.12

The ApoE knockout mouse was used in another investigation into atherosclerosis mechanisms involving conjugated linoleic acids (CLAs) to determine how individual CLA isomers differentially affected pathways involved in atherosclerosis.53 ApoE knockout mice were fed a diet supplemented with 1% cis9,trans11-CLA, 1% trans10,cis12-CLA or 1% linoleic acid for twelve weeks. The effects upon lipid and glucose metabolism were measured, as well as the regulation of hepatic proteins. Correlation analysis between physiological and protein data identified two clusters associated with glucose metabolism. The results showed that cis9,trans11-CLA specifically increased expression of the anti-inflammatory HSP 70 and decreased expression of the pro-inflammatory macrophage migration inhibitory factor, suggesting that consumption of cis9,trans11-CLA could protect against the development of atherosclerosis.

A systems biology approach to elucidating biological pathways in coronary atherosclerosis was published by King et al., who performed custom microarray analysis of coronary artery segments.54 A number of clinical variables were examined, and diabetic states provided the most interesting results, with 653 upregulated genes in the no-diabetes class and 37 upregulated genes in the diabetes class, at an FDR of 0.08%. The top gene upregulated in the diabetes class was IGF-1, followed by the IL-1 receptor and IL-2 receptor-α, indicating that there were changes in cytokine-induced immune and inflammatory responses. These results suggest that inflammation is more prominent in diabetic than [...] expression profiles were then used to construct a novel pathway based upon gene connectivity, as determined by language parsing of the published literature, and ranking, as determined by the significance of differentially regulated genes in the network. The resulting gene subnets were visualized with Cytoscape, an open-source bioinformatics resource (discussed in more detail below55), to identify nexus genes in disease severity. Results indicated that the key process in the progression of atherosclerosis relates to smooth muscle cell dedifferentiation, suggesting a focus on changes in the smooth muscle phenotype as a target for atherosclerosis. The results also provided insight into the severe form of coronary artery disease associated with diabetes, reporting an overabundance of immune and inflammatory signals in diabetics. This method
for querying multiple search engines and/or databases, combined with parsing of the retrieved results (documents) for biological associations, is extremely powerful for generating networks, and is used extensively in multiple software applications for network generation.

Lipopolysaccharide (LPS) is a critical inducer of sepsis, which is characterized by systemic inflammation, hypotension and multiple organ failure.56 Tseng et al.57 examined the molecular effects of late-phase LPS stimulation on primary rat endothelial cells in an attempt to develop diagnostic markers of inflammatory disease. A combination of cDNA microarray, 2-DE and MALDI-TOF MS/MS, as well as cytokine protein array data, were analyzed using custom bioinformatics applications. Differentially expressed genes and proteins were mapped onto their corresponding biological pathways using BioCarta or KEGG, and the results were ordered using the BGSSJ software (bulk gene search system for Java) followed by analysis with ArrayXPath.58 The results showed significant effects (p < 0.05) on the BioCarta pathways “LDL pathway during atherogenesis”, “MSP/RON receptor signaling pathway” (MSP, macrophage-stimulating protein; RON, tyrosine kinase/receptor d'origine nantais), “signal transduction through IL-1R”, and “IL-5 signaling pathway”, demonstrating that inflammatory pathways were significantly affected by LPS treatment, as would be expected. Overall, this study used a systems biology approach to show that NF-κB-associated responses in endothelial cells affected pathways involved in proliferation, atherogenesis, inflammation and apoptosis, thereby providing information on multiple pathways simultaneously. However, it should be stressed that it is necessary to differentiate protein concentrations from protein activities in order to make meaningful deductions. Several studies using “focused” arrays to analyze [...] confirmed that short-term LPS exposure results in vivid upregulation of a spectrum of proinflammatory genes, including IL-1β, IL-15, interferon-induced genes, and a series of TNF superfamily members.59–62

Statins are an important therapeutic in the control of hyperlipidemia, with demonstrated efficacy in lowering cholesterol levels. However, there are concerns regarding the development of statin-induced myopathy following aggressive treatment. Laaksonen et al. employed a systems biology approach to probe the cellular mechanisms leading to myopathy and identify potential biomarkers.63 Muscle biopsies were analyzed for whole-genome expression and plasma samples were profiled using a lipidomics approach. The microarray analysis revealed modest changes in the atorvastatin treatment group (five altered genes), but 111 genes were affected in the simvastatin group. The differences in response are not necessarily unexpected given that the two statins differ in their hydrophobicity/lipophilicity, and thus in the extent to which they affect the vasculature. The lipidomics profiling identified 132 unique lipid molecular species (however, this method does not allow for the unequivocal identification of the fatty acid substitution position on lipid head groups). The gene expression data and the lipidomics data were combined following gene set enrichment analysis (GSEA) and further analyzed with PLS-DA to look for a plasma-based biomarker of myopathy. The results showed that the arachidonate 5-lipoxygenase activating protein gene (ALOX5AP) had high positive regression coefficients with plasma levels of phosphatidylethanolamine(42:6) and negative regression coefficients with cholesterol ester ChoE(18:0). These results were particularly intriguing as the ALOX5 gene has been previously shown to predispose humans to atherosclerosis.64,65 This systems biology approach successfully identified potential plasma-based markers of the effects of statin treatment and showed that the observed effects upon pathways were statin-specific. It also provided mechanistic insight into the development of atherosclerosis, demonstrating the utility of a systems approach.

A similar method was employed by Pietiläinen et al., who examined obesity in [...] obesity to be associated with deleterious alterations in lipid metabolism pathways known to promote atherogenesis, inflammation and insulin resistance.66 Intriguingly, they reported that obesity was primarily related to increases in lyso-phosphatidylcholines and decreases in ether phospholipids. Nikkilä et al.67 used this method to examine the gender-dependent progression of systemic metabolic states in early childhood. They were able to categorize children in terms of metabolic state at a very young age (from birth to 4 years old). Using lipidomics profiling methodology and hidden Markov models, they found that the major developmental state differences between girls and boys can be attributed to sphingolipids. They also found multiple previously unknown age- and gender-related metabolome changes of potential medical significance. In addition, they demonstrated the feasibility of state-based alignment of personal metabolic trajectories, which is an important proof-of-principle step for applications of metabolomics towards systems biology and personalized medicine. Children were shown to have different development rates at the level of the metabolome, and thus the state-based approach may be advantageous when applying metabolome profiling in the search for markers of subtle (patho)physiological changes.

[...] plasma lipoproteins upon plaque formation using the Ldlr−/−Apob100/100Mttpflox/floxMx1-Cre mouse model, which has a plasma lipoprotein profile similar to that of familial hypercholesterolemia and a genetic switch to block the hepatic synthesis of lipoproteins.68 Transcriptional profiling of atherosclerosis-prone mice with human-like hypercholesterolemia and reverse engineering of whole-genome expression data provided a network of cholesterol-responsive atherosclerosis target genes. This regulatory gene network appeared to control foam cell formation, suggesting that these genes could potentially serve as drug targets to prevent the transformation of early lesions into advanced, clinically significant plaques.

Kleemann et al. employed a systems approach to examine the effects of dietary cholesterol upon atherosclerosis.69 Of particular interest in this study is the focus on the effects of dietary cholesterol upon inflammation. The role of inflammation in cardiovascular disease, and in atherosclerosis in particular, has been established;70 however, the source of inflammation and the exact mechanisms by which inflammation is evoked and contributes to disease development and progression are still unclear. The data of Kleemann et al. demonstrated that the liver
is capable of absorbing moderate cholesterol-induced stress (up to about 0.5% w/w in the diet), but a further increase evoked the expression of hepatic pro-inflammatory genes, including a number of pro-atherosclerotic candidate genes. These data also showed that dietary cholesterol can be a trigger of hepatic inflammation (as reflected by elevated plasma levels of acute phase genes) and that it may be involved in the development of the inflammatory component of atherosclerosis by switching on four distinct inflammatory pathways. Furthermore, the authors used a network analysis approach to demonstrate that lipid metabolism and inflammatory pathways are closely linked via specific transcriptional regulators. They confirmed that targeting of a prototype transcription factor of the inflammatory response (NF-κB) affected plasma lipid levels and lowered plasma [...] demonstrated the strength of a systems approach in that multiple analytical platforms were combined to build an overall model of disease, which provided mechanistic information across multiple biological pathways that suggests potential new strategies for therapeutic interventions affecting inflammation, as well as plasma lipids, in a beneficial way. The results of this study are examined in greater detail using the KegArray tool discussed below.

An expanding toolbox

An important bottleneck in the development of systems approaches is the need for software capable of analyzing omics data collected from multiple platforms. There are many software packages and web resources available, too numerous to describe in this review (see ref. 71 for a comprehensive list of >150 resources for systems biology). A few resources worth briefly mentioning here include KEGG,72 PathVisio,73 pSTIING,74 MetaCore™,75 Cytoscape,55 VANTED,76 Pathway-Express,77 Ingenuity® Systems and a plethora of SBML applications78 (Table 1). Some of this software is designed to map the results from omics experiments onto existing pathway databases such as KEGG or pSTIING. These types of tools enable the visualization of the results integrated with the information provided in these databases. Other tools enable the generation of networks that are inferred from omics data, such as Cytoscape (through several plugins), VANTED, some of the R/Bioconductor packages79 and many of the commercial software packages. Most of these tools can also be used to analyze and manipulate networks. However, to date there is no perfect solution, and substantial effort is needed to integrate multiple datasets in a comprehensive fashion. Herein we provide a brief overview of some of the diverse options.

The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a web-based resource that contains a series of databases of biological systems, consisting of genetic building blocks of genes and proteins (KEGG GENES), chemical building blocks of both endogenous and exogenous substances (KEGG LIGAND), molecular wiring diagrams of interaction and reaction networks (KEGG PATHWAY), and hierarchies and relationships of various biological objects (KEGG BRITE). KEGG provides a reference knowledge base for linking genomes to biological systems, and also to environments, by the processes of PATHWAY mapping and BRITE mapping. The visualization objects in the KEGG suite are consistent: the nodes of a pathway map are shown as rectangles that represent gene products, usually proteins, and as small circles that represent chemical compounds and other molecules. A large oval represents a link to another pathway map, and a cluster of rectangles represents a protein complex. Aoki and Kanehisa provide a comprehensive tutorial on KEGG for interested readers.80

The Systems Biology Markup Language (SBML) is a [...] biochemical reaction networks in software. It is oriented towards describing systems of biochemical reactions, including cell signaling pathways, metabolic pathways, biochemical reactions and gene regulation.78 The SBML project has produced a KEGG2SBML tool that is useful for converting KEGG-based metabolic pathways into SBML format. The pSTIING resource consists of a web-based application containing metabolic pathways, protein–protein, protein–lipid
Table 1 Network and pathway mapping software, including tools for network visualization/manipulation and network inference from high-throughput dataᵃ
Ingenuity® Systems  http://www.ingenuity.com/
KEGG (Kyoto Encyclopedia of Genes and Genomes)  http://www.genome.jp/
ᵃ This list is non-exhaustive and is solely provided to give an example of some of the available resources. See Ng et al. for a more comprehensive list.71 ᵇ Systems biology markup language (see http://sbml.org/). ᶜ Affinity purification-mass spectrometry.
transcriptional regulatory associations. It is focused on regulatory networks relevant to chronic inflammation, cell migration and cancer, making it a useful resource for inflammatory-related applications. The pSTIING web site also features a tool for inferring networks (Cladist). VANTED is a multiplatform tool for the manipulation of graphs that represent either biological pathways or functional hierarchies. It also allows the mapping of experimental data onto the network and is capable of processing flux data. Graph information is loaded in SBML format, but it also has a KEGG interface.81 Cytoscape is an open-source platform for visualizing molecular interaction networks and biological pathways. One of its most useful features is the ability to accept custom plugins that perform specific tasks, extending its initial feature set. A number of useful plugins are already available, including MONET,82 a method for inferring gene regulatory networks from gene expression data, and the AgilentLiteratureSearch plugin,83 which enables the generation of association networks from literature mining (see below). R and Bioconductor form a platform that is used extensively for the analysis of high-throughput data.84 In addition, there are several free resources available for the analysis of networks, including packages such as GeneNet,85 apComplex86 and Rgraphviz87 (for creating and visualizing networks). The package Gaggle88 enables interaction between Cytoscape and R.

The two main commercial packages are MetaCore™ and Ingenuity® Systems. MetaCore™ (GeneGo, Inc.) is an integrated suite of software applications designed for functional analysis of experimental data, including omics data, CGH arrays, SNPs, SAGE gene expression and pathway analysis. MetaCore™ is based on a proprietary, manually curated database of human protein–protein, protein–DNA and protein–compound interactions, metabolic and signaling pathways, and the effects of bioactive molecules on gene expression. GeneGo is also in the process of creating a systems biology and pathway analysis platform specific for cardiovascular diseases (MetaMiner Cardiac Consortium). Ingenuity Pathways Analysis (IPA) enables researchers to model and analyze biological and chemical systems. The IPA suite contains a series of modules, including IPA-Biomarker™ Analysis. IPA-Biomarker™ identifies the most promising and relevant biomarker candidates within experimental data. IPA-Tox™ delivers a focused toxicity and safety assessment of candidate compounds, elucidates toxicity mechanisms and identifies potential markers of toxicity, with a focus on cardiovascular toxicity, nephrotoxicity, and hepatotoxicity. IPA-Metabolomics™ analyzes metabolomics data in the context of metabolic and signaling pathways. This module can integrate transcriptomics, proteomics and metabolomics data in a systems biology approach to biomarker discovery, molecular toxicology, and mechanism of action studies.

Multiple efforts are currently under way to synchronize the data being collected by research groups around the world. In order to advance the field, it is therefore necessary to develop databases with defined metrics for evaluating the quality of the global data sets. This area is beyond the scope of this review, but interested readers are encouraged to examine the work of the Institute for Systems Biology on SBEAMS (Systems Biology [...] sbeams.org/), a framework for collecting, storing, and accessing data produced by these and other experiments.89 Other efforts in this area include the Biological Networks server, a systems biology software platform with multiple visualization and analysis functions, including: visualization of molecular interaction networks, sequence and 3D structure information; integration with other graph-structured data such as ontologies (e.g., gene ontology) and taxonomies (e.g., enzyme classification system); integration of interactions with experimental data (e.g., gene expression); and extraction of biologically meaningful relations, as well as [...] Networks server provides querying services and an information management framework over PathSys, a graph-based system for creating a combined database of biological pathways, gene regulatory networks and protein interaction maps, which integrates over 14 curated and publicly contributed data sources for eight representative organisms.91 There is also currently a significant amount of effort to determine standards for storing microarray data (MAGE-OM/ML, GeneX, [...] and metabolomics standards initiatives.93 Data-integration techniques for omics data sets have been reviewed in detail by Joyce and Palsson12 (and references therein).

One of the long-range goals of systems biology approaches is to develop models capable of predicting clinical phenotypes, as well as patient treatment regimens and associated outcomes. However, the complexity of cardiovascular disease and other inflammatory-related diseases makes model development challenging. A number of different groups are working on developing in silico models of inflammation, with the majority of efforts focused on the acute inflammatory response.94–97 However, it is likely that these models can eventually be adapted for diseases of chronic inflammation. Recent reviews have addressed the status of cardiac systems biology, with a number of promising developments.5,47,98–100 These models represent the logical extension of the systems biology tools discussed above, and as the amount of data increases, our ability to develop interactive models of individual pathologies will increase. This translational systems biology approach will make it feasible to develop patient-specific modeling based upon known disease mechanisms.97 These models will be useful in clinical settings to predict and optimize the outcome of surgery and non-interventional therapy.101

To address the need for software capable of analyzing data from multiple omics platforms, KEGG has recently introduced a new application called KegArray that is designed to map omics data onto the KEGG suite of databases. KegArray is a Java application that provides an environment for analyzing transcriptomics or proteomics data (expression profiles) and metabolomics data (compound profiles), individually or simultaneously. The application is tightly integrated with the KEGG database, and maps input data to KEGG resources
including PATHWAY, BRITE and genome maps. KegArray is available for Mac, Windows and Linux environments and can be downloaded freely from the KEGG [...] The KegArray tool is designed to facilitate integrated mapping of omics results onto a KEGG application of choice.

The statistical evaluation of systems biology data is a complex and highly debated subject (see Data Processing and Statistical Analysis). As such, the KegArray tool itself does not impose any statistical evaluation on the input data, but is rather intended as a link between processed data and the interactive KEGG environment. This design gives the user full control over the choice of statistical methods, data transformation and data selection prior to mapping onto the KEGG tool of choice. KegArray allows full flexibility in determining the significance or cut-off levels, as well as the corresponding color coding for the mapping. KegArray can thus be described as a visualization tool, but with the added advantage of a sustained interactive environment with the vast KEGG database. It is not necessary to pre-select the pathways of interest, and the output is formatted as a list of links to all affected pathways, ordered from the highest to the lowest number of mapped genes/proteins/compounds per pathway. KegArray can be configured to display any combination [...] genes/proteins/compounds. In this case, the ranking represents how well the respective pathways have been covered by the experimental analyses. Subsequently, by only including the up- and down-regulated entries in the mapping, a ranking based on biological effects on the pathway can be achieved.

Table 4 Metabolic pathways significantly affected in high cholesterol exposure relative to low cholesterol exposureᵃ
KEGG ID  Pathway
mmu01040  Biosynthesis of unsaturated fatty acids
mmu03320  PPAR signaling pathway
mmu00564  Glycerophospholipid metabolism
mmu00071  Fatty acid metabolism
mmu04920  Adipocytokine signaling pathway
mmu00565  Ether lipid metabolism
mmu00590  Arachidonic acid metabolism
mmu00100  Biosynthesis of steroids
mmu00120  Bile acid biosynthesis
mmu00561  Glycerolipid metabolism
mmu00600  Sphingolipid metabolism
mmu00591  Linoleic acid metabolism
mmu00592  alpha-Linolenic acid metabolism
ᵃ Data are from a KegArray-based analysis of quantified lipid and transcriptomics data from Kleemann et al.69 Pathways are from KEGG PATHWAY and are listed with pathway name and KEGG ID number (e.g., mmu for mouse). The pathways are ranked in order of the greatest number of components significantly affected in the pathway. A total of 77 different pathways were affected, of which the top 13 are shown here. A complete list of all 77 affected pathways is provided in Table S3. In addition, the pathways significantly affected by low and high cholesterol exposure are provided in Tables S1 and S2, respectively. It is not possible to state whether an entire pathway is positively or negatively affected, but these individual pathways can be visualized following mapping to KEGG and inspected for specific fluctuations in the data. Examples of this are shown in Fig. 3 and Fig. 4.

Table 2 An example of expression ratios between two channels for the input of transcriptomics data into KegArrayᵃ
ᵃ Data are the high cholesterol (HC) treatment shown in Fig. 2.

Table 3 KegArray input format for metabolomics dataᵃ
ᵃ Data are the high cholesterol (HC) treatment shown in Fig. 2.

Fig. 2 Venn diagram displaying the number of metabolic pathways significantly affected following treatment with either low cholesterol (LC) or high cholesterol (HC) relative to control in an ApoE*3Leiden mouse model of atherosclerosis. In addition, the changes between HC and LC were compared, evidencing five pathways that were specifically affected between these two treatments (mmu00010 glycolysis/gluconeogenesis, mmu00641 3-chloroacrylic acid degradation, mmu00680 methane metabolism, mmu00980 metabolism of xenobiotics by cytochrome P450, and mmu00982 drug metabolism-cytochrome P450). Data are from a KegArray-based analysis of quantified lipid and transcriptomics data from Kleemann et al.69 A complete list of all pathways affected is provided in the ESI, Tables S1–S3.†
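The cut-off and ranking behaviour described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not KegArray's actual implementation: the entry IDs (other than C00219, the KEGG COMPOUND ID for arachidonic acid), the pathway memberships and the ratio values are invented toy data, the symmetric 1.1-fold cut-off (with 1/1.1 for down-regulation) is an assumption chosen to echo the figure captions, and `classify` and `rank_pathways` are illustrative names. Setting `changed_only=False` ranks pathways by coverage (all mapped entries), whereas `changed_only=True` ranks them by the number of up- or down-regulated entries.

```python
from collections import Counter

# Toy treated/control ratios keyed by hypothetical entry IDs.
# C00219 is the KEGG COMPOUND ID for arachidonic acid; all values are invented.
ratios = {"gene_a": 1.25, "gene_b": 0.80, "gene_c": 1.02, "C00219": 1.15}

# Hypothetical pathway memberships (not real KEGG annotations).
pathway_members = {
    "gene_a": ["path_1", "path_2"],
    "gene_b": ["path_1"],
    "gene_c": ["path_2", "path_3"],
    "C00219": ["path_1", "path_3"],
}


def classify(ratio, threshold=1.1):
    """Label a ratio 'up', 'down' or 'unchanged' with a symmetric fold-change cut-off (assumed)."""
    if ratio >= threshold:
        return "up"
    if ratio <= 1.0 / threshold:
        return "down"
    return "unchanged"


def rank_pathways(ratios, pathway_members, threshold=1.1, changed_only=True):
    """Count mapped entries per pathway; return (pathway, count) pairs, highest count first.

    changed_only=False counts every mapped entry (pathway coverage);
    changed_only=True counts only entries classified as up- or down-regulated.
    """
    counts = Counter()
    for entry, ratio in ratios.items():
        if changed_only and classify(ratio, threshold) == "unchanged":
            continue  # skip entries inside the cut-off band
        for pathway in pathway_members.get(entry, []):
            counts[pathway] += 1
    # most_common() sorts by count; ties keep first-encountered order
    return counts.most_common()


if __name__ == "__main__":
    print("coverage:", rank_pathways(ratios, pathway_members, changed_only=False))
    print("effects: ", rank_pathways(ratios, pathway_members))
```

With these toy values, the coverage ranking ties path_2 and path_3 at two entries each, while the effect-based ranking drops gene_c (ratio 1.02 lies inside the cut-off band) and leaves path_1 on top with three changed entries. Keeping the statistics outside the ranking function mirrors the design choice described above, in which the user decides on cut-offs before mapping.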
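The pathway-level comparison summarized in the Venn diagram caption above comes down to set operations on lists of affected pathway IDs. The sketch below assumes that each comparison has already produced a set of significantly affected pathways. The comparison names and pathway IDs are hypothetical toy data rather than the study's results, and `venn_regions` is an illustrative helper, not part of KegArray. The direct HC-versus-LC comparison is a separate ratio analysis in its own right; the sketch only shows the bookkeeping once its affected pathways are known.

```python
def venn_regions(named_sets):
    """For each named set, return the members found in no other set,
    plus the members shared by all sets (the centre of the Venn diagram)."""
    regions = {}
    for name, members in named_sets.items():
        others = set()
        for other_name, other_members in named_sets.items():
            if other_name != name:
                others |= other_members  # union of every other comparison
        regions[name + " only"] = members - others
    regions["shared by all"] = set.intersection(*named_sets.values())
    return regions


# Hypothetical affected-pathway sets for three comparisons (toy IDs).
affected = {
    "LC vs control": {"path_1", "path_2", "path_3"},
    "HC vs control": {"path_2", "path_3", "path_4"},
    "HC vs LC": {"path_3", "path_4", "path_5"},
}

if __name__ == "__main__":
    for region, members in venn_regions(affected).items():
        print(f"{region}: {len(members)} {sorted(members)}")
```

Taking `len()` of each region gives the counts that would be drawn in the corresponding parts of a Venn diagram; pairwise overlaps can be added in the same way with the `&` operator.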
The expected mapping format is that of ratios between, e.g., a treated and a control group, and a specific tab-delimited format that allows ratios to be calculated automatically from raw data is available (KEGG EXPRESSION format). However, in order to increase the versatility of the tool, an additional generic file input format has also been constructed (RATIO format) to allow other aspects of the data to be evaluated through the KegArray tool (e.g., weighting according to statistical significance, ranking, etc.). Both formats, described in detail in the ReadMe file available for download with KegArray (http://www.genome.jp/kegg/expression/), can be used for the input of transcriptomics or proteomics data. Organism-specific mapping of the results is enabled by the organism information provided on the first line of the input file, in the format ‘#organism:’ followed by the three- or four-letter organism identifier code used in KEGG (e.g., ‘hsa’ for human and ‘mmu’ for mouse). If organism-specific mapping is not desired, the abbreviation for the all-inclusive generic pathway (‘map’) can be used. Since the interactive environment of KEGG is maintained, it is easy to scroll between the many different organism-specific pathways available. Additional information, such as experimental descriptions and reference information, can also be included in the input file by adding the ‘#’ character at the beginning of a line, which causes KegArray to skip that line (other than the ‘#organism:’ or ‘#source:’ line).

Fig. 3 Results of KegArray-based analysis of quantified lipid and transcriptomics data from Kleemann et al.69 The KEGG metabolic pathway “Biosynthesis of unsaturated fatty acids” (map01040) was the pathway that evidenced the greatest number of changes between low and high cholesterol treatment. KegArray was run with a 1.1-fold threshold, with red and orange indicating a 10% and 5% increase, respectively, yellow indicating no change (grey indicates that the enzyme/metabolite is present in the organism), and light green and dark green indicating a 5% and 10% decrease, respectively. Table 4 provides a list of the top 13 pathways that differed between low and high cholesterol treatment.

The tab-delimited lines below the ‘#’-prefixed section contain the omics profiling data. The first column must contain the KEGG GENES ID, which is the unique identifier of the organism-specific gene. The second and third columns are intended for X- and Y-coordinates, e.g. those derived from a microarray experiment, to provide a schematic view of the microarray through the “ArrayViewer” application. If the data are from a proteomics experiment, the second and third columns can be left blank; it is therefore not necessary to input the microarray coordinate information, and the KEGG ID and data columns are sufficient. If the RATIO file format is used, the fourth column contains the data value of interest, as exemplified by the ratios between control channel and target channel in Table 2. In contrast, if the EXPRESSION file format is used, the fourth through
seventh columns contain the total signal from the treated/
KEGG PATHWAY maps as well as KEGG BRITE and
diseased sample, background signal from the treated sample,
KEGG DAS for further analysis. These data can also be
total signal from control sample, and background signal from
mapped onto the KEGG DISEASE pathways.
the control sample in the indicated order. KegArray then
In order to demonstrate the utility of KegArray, we
performs the background subtraction and calculates the ratio
have applied it to a dataset of gene and metabolite data
between treated and control sample upon submission of the
taken from Kleemann et al.69 This study was designed to
examine the potential of increasing doses of dietary cholesterol
The data format for metabolomics data is similar to the
to evoke the inflammatory component that is necessary for the
gene/protein data; however, only the ratio format can be used.
onset of atherosclerosis. Towards this end, ApoE*3Leiden
All metabolites (compounds) must be assigned KEGG
mice were fed either a control diet (cholesterol-free),
COMPOUND ID numbers in order to be recognized by
low cholesterol (LC, 0.25% w/w) or high cholesterol
KegArray. In the data file, the first column contains the
(HC, 1.0% w/w) diet for ten weeks (to achieve early mild
KEGG COMPOUND ID (e.g., C00219 for arachidonic acid)
atherosclerotic plaques), with the amount of cholesterol being
and the second column contains the pre-processed data value
the only dietary variable in the study. At the end of the study,
of interest, e.g. ratios of the target compound relative to the
the mice were sacrificed, scored for atherosclerosis and
control (Table 3).
profiled using microarray analysis (livers) and lipidomics
Because entry IDs must be in KEGG GENES ID format,
quantification (liver and plasma). The results showed that
an ID converter has also been created. Currently, the
only the HC diet evoked hepatic inflammation and induced
following external databases are supported: NCBI GI, NCBI
Entrez Gene, GenBank, UniGene, UniProt and IPI. When
observed with the LC diet). A total of 264 genes involved in
using KegArray, a number of parameters can be customized,
lipid metabolism were measured, with 23 genes differentially
including the threshold, normalization and color scheme.
expressed in the LC diet, and 64 in the HC diet. In addition,
The output can be viewed as significantly either up-
a range of intrahepatic fatty acids were quantified, of which
regulated, down-regulated or all data that were input into
27 free fatty acids were mapped along with the gene data
KegArray. These data are then visualized onto interactive
onto the KEGG database using KegArray. The KegArray
parameters were set to display a 1.1-fold difference, and
non-affected pathways were excluded. For the LC exposure,
60 biochemical pathways were affected (ESI, Table S1†) as
opposed to 76 pathways for the HC exposure (ESI, Table S2†),
which included all 60 pathways from the LC dosing. This
suggests that a very pronounced adaptation of liver lipid
metabolism already occurs with LC. With these adaptations,
the liver is capable of dealing with cholesterol, as there is
very little development of early atherosclerotic lesions and
no significant inflammation. However, when the
dose of dietary cholesterol is increased (HC condition),
16 additional lipid pathways are activated. These data suggest
that a very low dose of cholesterol affects a significant part of
the pathways involved in lipid handling. It appears that with
HC, the quality of the lipids changes and an increased number of
unsaturated or proatherogenic lipids such as sphingomyelin
are significantly impacted. Of particular interest was the
difference in affected pathways between LC and HC diets.
A total of 77 pathways were differentially affected (ESI,
Table S3†), of which the top 13 pathways affected are provided
in Table 4. These differences are shown on a treatment-specific
basis in Fig. 2. A total of 59 pathways were affected in both
LC and HC treatment, as well as between treatments. Of
particular interest are the five pathways that differ between LC
and HC treatment, but did not evidence changes in LC or HC […]
mmu00641 3-chloroacrylic acid degradation, mmu00680
methane metabolism, mmu00980 metabolism of xenobiotics
by cytochrome P450, and mmu00982 drug metabolism-cytochrome
P450). Examples of affected metabolic pathways
are shown for the biosynthesis of unsaturated fatty acids
(Fig. 3) and sphingolipid metabolism (Fig. 4). Kleemann
et al.69 reported that with increasing cholesterol uptake, the
liver switched from an adaptive state to an inflammatory
pro-atherosclerotic state (with LC there is primarily an
adaptive response of key metabolic pathways required to
cope with lipids). At the gene expression level, there is
clearly a further adaptation of the pathways switched on/off
with LC when animals receive HC. These effects were
in accordance with the metabolite levels, with significant
(p < 0.05) decreases in myristic, palmitic, stearic, arachidonic,
docosapentaenoic and docosahexaenoic acids. This finding
is supported by the observation that the biosynthesis
of unsaturated fatty acids was the metabolic pathway with
the greatest number of changes between LC and HC
treatment. Specific decreases in unsaturated fatty acids were
observed with the HC treatment (arachidonic acid at p < 0.05
and docosahexaenoic acid (DHA) at p < 0.07). This pathway is
a potential source of the unsaturated fatty acid substrates for
many of the pro-inflammatory lipids involved in the development
of atherosclerosis (e.g., observed reductions in arachidonic acid
levels). Accordingly, mapping these data to KEGG provided
a rapid means of identifying which
pathways were most affected by cholesterol treatment and
provided mechanistic insight into the disease process. This
new tool for the KEGG suite will be a useful complement
to existing strategies for network analysis and pathway […]
Fig. 4 Results of KegArray-based analysis of quantified lipid and transcriptomics data from Kleemann et al.69 The KEGG metabolic pathway “Sphingolipid metabolism” (map00600) showed a number of changes between low and high cholesterol treatment. KegArray was run with a 1.1-fold threshold, with red and orange indicating a 10% and 5% increase, respectively, yellow indicating no change (grey indicates that the enzyme/metabolite is present in the organism), and light green and dark green indicating a 5% and 10% decrease, respectively. Table 4 provides a list of the top 13 pathways that differed between low and high cholesterol treatment.
One of the main current obstacles in systems biology is the
heterogeneity of available datasets. The field requires the
creation of legacy databases of omics data that are formatted
to enable inter-study comparison. Many existing methodologies […]
manipulation and analysis. In order to increase the utility
and availability of these tools, it is necessary either to develop
simplified web-based applications that are equally usable by
cross-disciplinary users and/or to shift the educational paradigm
to place increased emphasis on the acquisition of computer
skills. Future advances in understanding complex medical
problems are highly dependent on methodological advances
and on the integration of the computational systems biology
community with biologists and clinicians.97
Although commercial tools are more complete in terms of
features, they are often closed platforms that do not allow for
the development and interchange of analysis tools and data
beyond their supported applications. In addition, these tools
can be expensive, which can be prohibitive in academic
and/or clinical settings. It is desirable that developments in
these fields be based upon open standards that allow the easy
interchange of multiple types of data and the subsequent
analyses. The adoption of standard file formats should reduce
the difficulties in the integration of data derived from different
analysis tools.
The ultimate goal for translational systems biology
approaches is to bring forth an understanding of
pathogenesis and disease etiology at the organism level that
goes beyond what traditional minimalistic approaches have to
offer. Such an in-depth understanding of the differences
between the healthy and diseased states can help solve crucial
clinical issues, and provide markers and insights that aid
clinicians in making prognostic and diagnostic evaluations.
In terms of atherosclerosis, one of the most important clinical
dilemmas is determining if and when a patient is at risk of
developing symptomatic disease. A systems biology approach
could potentially identify alterations in molecular pathways
and targets that precede plaque instability, and thus assist in
developing molecular tools that can substitute for imaging
modalities such as MRI or PET-CT in the more accurate
identification of vulnerable lesions. Accordingly, systems biology
tools can be utilized to develop concrete clinical applications
that will help improve patient selection, monitoring of
stroke preventive intervention, and other needs of the medical […]
The advent of systems biology is bringing forth a change in
the philosophy of medicine, and is rapidly changing the way
we view the disease process. However, in order to realize the
promise of systems biology, i.e. the understanding of the
organism as a whole, the next major challenge is to facilitate
integrated analysis of data from multiple sources.102 Without
the integration of individual networks and biochemical
pathways into the entire system, the observed effects of
individual components remain without meaning and context,
and cannot provide understanding of pathological processes at
the systems level. Some steps in the direction of integrated
analyses have already been made,33 but increased integration
of heterogeneous data and networks is non-trivial. The
potential of combining the knowledge from multiple networks
with high-throughput data, as exemplified herein by the
KegArray tool and the KEGG database, will move us one
step further towards a true understanding of the living
organism. The rapid advances in computer science and
high-throughput technologies, coupled with paradigm shifts
in the way clinical and pre-clinical researchers perceive science,
hold the key to understanding the intricate systems that
dictate the switch from healthy to diseased, and represent
the path that will lead us to true personalized medicine.
This research was supported by the Åke Wibergs Stiftelse, the
Fredrik and Ingrid Thurings Stiftelse, The Royal Swedish
Academy of Sciences, the Swedish Heart-Lung Foundation
and the Japanese Society for the Promotion of Science (JSPS).
C.E.W. was supported by a Center for Allergy Research
Fellowship. R.K. and M.v.E. received support from the
TNO Research Program VP9 Personalized Health.
1 H. Kitano, Science, 2002, 295, 1662–1664.
2 A. L. Barabasi and Z. N. Oltvai, Nat. Rev. Genet., 2004, 5, 101–113.
3 A. C. Ahn, M. Tewari, C. S. Poon and R. S. Phillips, PLoS Med., 2006, 3, e208.
4 A. C. Ahn, M. Tewari, C. S. Poon and R. S. Phillips, PLoS Med., 2006, 3, e209.
5 A. D. McCulloch and G. Paternostro, Ann. N. Y. Acad. Sci., 2005, 1047, 283–295.
6 A. D. Weston and L. Hood, J. Proteome Res., 2004, 3, 179–196.
7 J. van der Greef, T. Hankemeier and R. N. McBurney, Pharmacogenomics, 2006, 7, 1087–1094.
8 J. van der Greef, S. Martin, P. Juhasz, A. Adourian, T. Plasterer, E. R. Verheij and R. N. McBurney, J. Proteome Res., 2007, 6, […]
9 D. J. Lockhart and E. A. Winzeler, Nature, 2000, 405, 827–836.
10 R. Aebersold and M. Mann, Nature, 2003, 422, 198–207.
11 B. Domon and R. Aebersold, Science, 2006, 312, 212–217.
12 A. R. Joyce and B. O. Palsson, Nat. Rev. Mol. Cell Biol., 2006, 7, […]
13 J. C. Smith and D. Figeys, Mol. BioSyst., 2006, 2, 364–370.
14 B. F. Cravatt, G. M. Simon and J. R. Yates, 3rd, Nature, 2007, 450, 991–1000.
15 K. Dettmer, P. A. Aronov and B. D. Hammock, Mass Spectrom. Rev., 2007, 26, 51–78.
16 X. Han, A. Aslanian and J. R. Yates, 3rd, Curr. Opin. Chem. Biol., 2008, 12, 483–490.
17 J. Zaia, Chem. Biol., 2008, 15, 881–892.
18 H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai and A. L. Barabasi, Nature, 2000, 407, 651–654.
19 S. S. Shen-Orr, R. Milo, S. Mangan and U. Alon, Nat. Genet., 2002, 31, 64–68.
20 E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai and A. L. Barabasi, Science, 2002, 297, 1551–1555.
21 R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii and U. Alon, Science, 2002, 298, 824–827.
22 […] C. D. Maranas, Genome Res., 2004, 14, 301–312.
23 E. V. Nikolaev, A. P. Burgard and C. D. Maranas, Biophys. J., 2005, 88, 37–49.
24 V. Vermeirssen, M. I. Barrasa, C. A. Hidalgo, J. A. Babon, […] A. J. Walhout, Genome Res., 2007, 17, 1061–1071.
25 M. Stoll, A. W. Cowley, Jr, P. J. Tonellato, A. S. Greene, M. L. Kaldunski, R. J. Roman, P. Dumas, N. J. Schork, Z. Wang and H. J. Jacob, Science, 2001, 294, 1723–1726.
26 S. E. Calvano, W. Xiao, D. R. Richards, R. M. Felciano, H. V. Baker, R. J. Cho, R. O. Chen, B. H. Brownstein, J. P. Cobb, S. K. Tschoeke, C. Miller-Graziano, L. L. Moldawer, M. N. Mindrinos, R. W. Davis, R. G. Tompkins and S. F. Lowry, Nature, 2005, 437, 1032–1037.
27 K. I. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal and A. L. Barabasi, Proc. Natl. Acad. Sci. U. S. A., 2007, 104, 8685–8690.
28 X. Wu, R. Jiang, M. Q. Zhang and S. Li, Mol. Syst. Biol., 2008, 4, 189.
29 U. Alon, Nat. Rev. Genet., 2007, 8, 450–461.
30 M. Isalan, C. Lemerle, K. Michalodimitrakis, C. Horn, P. Beltrao, E. Raineri, M. Garriga-Canut and L. Serrano, Nature, 2008, 452, 840–845.
31 F. Markowetz and R. Spang, BMC Bioinf., 2007, 8(Suppl 6), S5.
32 T. Schlitt and A. Brazma, BMC Bioinf., 2007, 8(Suppl 6), S9.
33 N. Ishii, K. Nakahigashi, T. Baba, M. Robert, T. Soga, A. Kanai, T. Hirasawa, M. Naba, K. Hirai, A. Hoque, P. Y. Ho, Y. Kakazu, K. Sugawara, S. Igarashi, S. Harada, T. Masuda, N. Sugiyama, T. Togashi, M. Hasegawa, Y. Takai, K. Yugi, K. Arakawa, N. Iwata, Y. Toya, Y. Nakayama, T. Nishioka, K. Shimizu, H. Mori and M. Tomita, Science, 2007, 316, 593–597.
34 J. Zhu, B. Zhang, E. N. Smith, B. Drees, R. B. Brem, L. Kruglyak, R. E. Bumgarner and E. E. Schadt, Nat. Genet., 2008, 40, 854–861.
35 C. Bonferroni, Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 1936, vol. 8, pp. 3–62.
36 R. G. Miller, Simultaneous Statistical Inference, Springer Verlag, New York, 1981, pp. 6–8.
37 Y. Benjamini and Y. Hochberg, J. R. Stat. Soc. Ser. B (Methodological), 1995, 289–300.
38 J. Trygg and S. Wold, J. Chemom., 2002, 16, 119–128.
39 L. Hood and R. M. Perlmutter, Nat. Biotechnol., 2004, 22, 1215–1217.
40 I. Graham, D. Atar, K. Borch-Johnsen, G. Boysen, G. Burell, R. Cifkova, J. Dallongeville, G. De Backer, S. Ebrahim, B. Gjelsvik, C. Herrmann-Lingen, A. Hoes, S. Humphries, M. Knapton, J. Perk, S. G. Priori, K. Pyorala, Z. Reiner, L. Ruilope, S. […] P. Weissberg, D. Wood, J. Yarnell, J. L. Zamorano, E. Walma, T. Fitzgerald, M. T. Cooney, A. Dudina, A. Vahanian, J. Camm, R. De Caterina, V. Dean, K. Dickstein, C. Funck-Brentano, G. Filippatos, I. Hellemans, S. D. Kristensen, K. McGregor, U. Sechtem, S. Silber, M. Tendera, P. Widimsky, J. L. Zamorano, I. Hellemans, A. Altiner, E. Bonora, P. N. Durrington, R. Fagard, S. Giampaoli, H. Hemingway, J. Hakansson, S. E. Kjeldsen, M. L. Larsen, G. Mancia, A. J. Manolis, […] L. Tokgozoglu, O. Wiklund and A. Zampelas, Eur. Heart J., 2007, 28, 2375–2414.
41 W. Rosamond, K. Flegal, K. Furie, A. Go, K. Greenlund, N. Haase, S. M. Hailpern, M. Ho, V. Howard, B. Kissela, S. Kittner, D. Lloyd-Jones, M. McDermott, J. Meigs, C. Moy, G. Nichol, C. O'Donnell, V. Roger, P. Sorlie, J. Steinberger, T. Thom, M. Wilson and Y. Hong, Circulation, 2008, 117, […]
42 D. B. Mark, F. J. Van de Werf, R. J. Simes, H. D. White, L. C. Wallentin, R. M. Califf and P. W. Armstrong, Eur. Heart J., 2007, 28, 2678–2684.
43 A. J. Lusis, J. Lipid Res., 2006, 47, 1887–1890.
44 A. J. Lusis, Nature, 2000, 407, 233–241.
45 G. K. Hansson, N. Engl. J. Med., 2005, 352, 1685–1695.
46 G. K. Hansson and J. Nilsson, J. Intern. Med., 2008, 263, […]
47 P. K. Shreenivasaiah, S. H. Rho, T. Kim and H. Kim do, J. Mol. Cell. Cardiol., 2008, 44, 460–469.
48 J. T. Brindle, H. Antti, E. Holmes, G. Tranter, J. K. Nicholson, H. W. Bethell, S. Clarke, P. M. Schofield, E. McKilligin, D. E. Mosedale and D. J. Grainger, Nat. Med., 2002, 8, […]
49 H. L. Kirschenlohr, J. L. Griffin, S. C. Clarke, R. Rhydwen, A. A. Grace, P. M. Schofield, K. M. Brindle and J. C. Metcalfe, Nat. Med., 2006, 12, 705–710.
50 A. M. van den Maagdenberg, M. H. Hofker, P. J. Krimpenfort, I. de Bruijn, B. van Vlijmen, H. van der Boom, L. M. Havekes and R. R. Frants, J. Biol. Chem., 1993, 268, 10540–10545.
51 C. B. Clish, E. Davidov, M. Oresic, T. N. Plasterer, G. Lavine, T. Londo, M. Meys, P. Snell, W. Stochaj, A. Adourian, X. Zhang, N. Morel, E. Neumann, E. Verheij, J. T. Vogels, L. M. Havekes, N. Afeyan, F. Regnier, J. van der Greef and S. Naylor, Omics, 2004, 8, 3–13.
52 M. Oresic, C. B. Clish, E. J. Davidov, E. Verheij, J. Vogels, L. M. Havekes, E. Neumann, A. Adourian, S. Naylor, J. van der Greef and T. Plasterer, Appl. Bioinf., 2004, 3, 205–217.
53 B. de Roos, G. Rucklidge, M. Reid, K. Ross, G. Duncan, M. A. Navarro, J. M. Arbones-Mainar, M. A. Guzman-Garcia, J. Osada, J. Browne, C. E. Loscher and H. M. Roche, FASEB J., 2005, 19, 1746–1748.
54 J. Y. King, R. Ferrara, R. Tabibiazar, J. M. Spin, M. M. Chen, A. Kuchinsky, A. Vailaya, R. Kincaid, A. Tsalenko, D. X. Deng, A. Connolly, P. Zhang, E. Yang, C. Watt, Z. Yakhini, A. Ben-Dor, A. Adler, L. Bruhn, P. Tsao, T. Quertermous and E. A. Ashley, Physiol. Genomics, 2005, 23, 103–118.
55 P. Shannon, A. Markiel, O. Ozier, N. S. Baliga, J. T. Wang, D. Ramage, N. Amin, B. Schwikowski and T. Ideker, Genome Res., 2003, 13, 2498–2504.
56 J. Cohen, Nature, 2002, 420, 885–891.
57 H. W. Tseng, H. F. Juan, H. C. Huang, J. Y. Lin, S. Sinchaikul, T. C. Lai, C. F. Chen, S. T. Chen and G. J. Wang, Proteomics, 2006, 6, 5915–5928.
58 H. J. Chung, M. Kim, C. H. Park, J. Kim and J. H. Kim, Nucleic Acids Res., 2004, 32, W460–464.
59 D. M. Wuttge, A. Sirsjo, P. Eriksson and S. Stemme, Mol. Med., 2001, 7, 383–392.
60 K. Jatta, D. Wagsater, L. Norgren, B. Stenberg and A. Sirsjo, J. Vasc. Res., 2005, 42, 266–271.
61 P. S. Olofsson, K. Jatta, D. Wagsater, S. Gredmark, U. Hedin, G. Paulsson-Berne, C. Soderberg-Naucler, G. K. Hansson and A. Sirsjo, Arterioscler. Thromb. Vasc. Biol., 2005, 25, e113–116.
62 P. S. Olofsson, L. A. Soderstrom, D. Wagsater, Y. Sheikine, P. Ocaya, F. Lang, C. Rabu, L. Chen, M. Rudling, P. Aukrust, U. Hedin, G. Paulsson-Berne, A. Sirsjo and G. K. Hansson, Circulation, 2008, 117, 1292–1301.
63 R. Laaksonen, M. Katajamaa, H. Paiva, M. Sysi-Aho, L. Saarinen, P. Junni, D. Lutjohann, J. Smet, R. Van Coster, T. Seppanen-Laakso, T. Lehtimaki, J. Soini and M. Oresic, PLoS One, 2006, 1, e97.
64 J. H. Dwyer, H. Allayee, K. M. Dwyer, J. Fan, H. Wu, R. Mar, A. J. Lusis and M. Mehrabian, N. Engl. J. Med., 2004, 350, 29–37.
65 H. Qiu, A. Gabrielsen, H. E. Agardh, M. Wan, A. Wetterholm, C. H. Wong, U. Hedin, J. Swedenborg, G. K. Hansson, B. Samuelsson, G. Paulsson-Berne and J. Z. Haeggstrom, Proc. Natl. Acad. Sci. U. S. A., 2006, 103, 8161–8166.
66 K. H. Pietiläinen, M. Sysi-Aho, A. Rissanen, T. Seppänen-Laakso, H. Yki-Järvinen, J. Kaprio and M. Oresic, PLoS One, 2007, 2, e218.
67 J. Nikkila, M. Sysi-Aho, A. Ermolov, T. Seppanen-Laakso, O. Simell, S. Kaski and M. Oresic, Mol. Syst. Biol., 2008, 4, 197.
68 J. Skogsberg, J. Lundström, A. Kovacs, R. Nilsson, P. Noori, S. Maleki, M. Köhler, A. Hamsten, J. Tegner and J. Björkegren, PLoS Genet., 2008, 4, e1000036.
69 R. Kleemann, L. Verschuren, M. J. van Erk, Y. Nikolsky, N. H. Cnubben, E. R. Verheij, A. K. Smilde, H. F. Hendriks, S. Zadelaar, G. J. Smith, V. Kaznacheev, T. Nikolskaya, A. Melnikov, E. Hurt-Camejo, J. van der Greef, B. van Ommen and T. Kooistra, Genome Biol., 2007, 8, R200.
70 P. Libby, Nature, 2002, 420, 868–874.
71 A. Ng, B. Bursteinas, Q. Gao, E. Mollison and M. Zvelebil, Briefings Bioinf., 2006, 7, 318–330.
72 M. Kanehisa, M. Araki, S. Goto, M. Hattori, M. Hirakawa, M. Itoh, T. Katayama, S. Kawashima, S. Okuda, T. Tokimatsu and Y. Yamanishi, Nucleic Acids Res., 2008, 36, D480–484.
73 M. P. van Iersel, T. Kelder, A. R. Pico, K. Hanspers, S. Coort, B. R. Conklin and C. Evelo, BMC Bioinf., 2008, 9, 399.
74 A. Ng, B. Bursteinas, Q. Gao, E. Mollison and M. Zvelebil, Nucleic Acids Res., 2006, 34, D527–534.
75 S. Ekins, Y. Nikolsky, A. Bugrim, E. Kirillov and T. Nikolskaya, Methods Mol. Biol., 2007, 356, 319–350.
76 B. H. Junker, C. Klukas and F. Schreiber, BMC Bioinf., 2006, 7, […]
77 S. Draghici, P. Khatri, A. L. Tarca, K. Amin, A. Done, C. Voichita, C. Georgescu and R. Romero, Genome Res., 2007, 17, 1537–1545.
78 M. Hucka, A. Finney, H. M. Sauro, H. Bolouri, J. C. Doyle, H. Kitano, A. P. Arkin, B. J. Bornstein, D. Bray, A. Cornish-Bowden, A. A. Cuellar, S. Dronov, E. D. Gilles, M. Ginkel, V. Gor, I. I. Goryanin, W. J. Hedley, T. C. Hodgman, J. H. Hofmeyr, P. J. Hunter, N. S. Juty, J. L. Kasberger, A. Kremling, U. Kummer, N. Le Novere, L. M. Loew, […] Y. Nakayama, M. R. Nelson, P. F. Nielsen, T. Sakurada, J. C. Schaff, B. E. Shapiro, T. S. Shimizu, H. D. Spence, J. Stelling, K. Takahashi, M. Tomita, J. Wagner and J. Wang, Bioinformatics, 2003, 19, 524–531.
79 R. C. Gentleman, V. J. Carey, D. M. Bates, B. Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, J. Gentry, K. Hornik, T. Hothorn, W. Huber, S. Iacus, R. Irizarry, F. Leisch, C. Li, M. Maechler, A. J. Rossini, G. Sawitzki, C. Smith, G. Smyth, L. Tierney, J. Y. Yang and J. Zhang, Genome Biol., 2004, 5, R80.
80 K. Aoki and M. Kanehisa, Current Protocols in Bioinformatics, 2005, chapter 1, unit 1.12.
81 C. Klukas and F. Schreiber, Bioinformatics, 2007, 23, 344–350.
82 P. H. Lee and D. Lee, Bioinformatics, 2005, 21, 2739–2747.
83 A. Vailaya, P. Bluvas, R. Kincaid, A. Kuchinsky, M. Creech and A. Adler, Bioinformatics, 2005, 21, 430–438.
84 M. Reimers and V. J. Carey, Methods Enzymol., 2006, 411, 119–134.
85 J. Schafer and K. Strimmer, Bioinformatics, 2005, 21, 754–764.
86 D. Scholtens, M. Vidal and R. Gentleman, Bioinformatics, 2005, 21, 3548–3557.
87 V. J. Carey, J. Gentry, E. Whalen and R. Gentleman, Bioinformatics, 2005, 21, 135–136.
88 P. T. Shannon, D. J. Reiss, R. Bonneau and N. S. Baliga, BMC Bioinf., 2006, 7, 176.
89 […] M. H. Johnson and T. Galitski, BMC Bioinf., 2006, 7, 286.
90 M. Baitaluk, M. Sedova, A. Ray and A. Gupta, Nucleic Acids Res., 2006, 34, W466–471.
91 M. Baitaluk, X. Qian, S. Godbole, A. Raval, A. Ray and A. Gupta, BMC Bioinf., 2006, 7, 55.
92 C. F. Taylor, N. W. Paton, K. S. Lilley, P. A. Binz, R. K. Julian, Jr, A. R. Jones, W. Zhu, R. Apweiler, R. Aebersold, E. W. Deutsch, M. J. Dunn, A. J. Heck, A. Leitner, M. Macht, M. Mann, L. Martens, T. A. Neubert, S. D. Patterson, P. Ping, S. L. Seymour, P. Souda, A. Tsugita, J. Vandekerckhove, T. M. Vondriska, J. P. Whitelegge, M. R. Wilkins, I. Xenarios, J. R. Yates, 3rd and H. Hermjakob, Nat. Biotechnol., 2007, 25, […]
93 S. A. Sansone, T. Fan, R. Goodacre, J. L. Griffin, N. W. Hardy, R. Kaddurah-Daouk, B. S. Kristal, J. Lindon, P. Mendes, N. Morrison, B. Nikolau, D. Robertson, L. W. Sumner, C. Taylor, M. van der Werf, B. van Ommen and O. Fiehn, Nat. Biotechnol., 2007, 25, 846–848.
94 G. An, J. Crit. Care, 2006, 21, 105–110; discussion 110–111.
95 Y. Vodovotz, Immunol. Res., 2006, 36, 237–245.
96 Y. Vodovotz, C. C. Chow, J. Bartels, C. Lagoa, J. M. Prince, R. M. Levy, R. Kumar, J. Day, J. Rubin, G. Constantine, T. R. Billiar, M. P. Fink and G. Clermont, Shock, 2006, 26, […]
97 Y. Vodovotz, M. Csete, J. Bartels, S. Chang and G. An, PLoS Comput. Biol., 2008, 4, e1000014.
98 D. Noble, Science, 2002, 295, 1678–1682.
99 B. J. Bennett, C. E. Romanoski and A. J. Lusis, Expert Rev. Cardiovasc. Ther., 2007, 5, 1095–1103.
100 S. Y. Shin, S. M. Choo, S. H. Woo and K. H. Cho, Adv. Biochem. Eng. Biotechnol., 2008, 110, 25–45.
101 R. C. Kerckhoffs, S. M. Narayan, J. H. Omens, L. J. Mulligan and A. D. McCulloch, Heart Fail. Clin., 2008, 4, 371–378.
102 U. Sauer, M. Heinemann and N. Zamboni, Science, 2007, 316, […]
Source: http://web.kuicr.kyoto-u.ac.jp/~diez/preprints/Mol%20Biosyst%202009%20Wheelock.pdf