Large Scale Biomolecular Simulation: Blue Matter Molecular Dynamics on Blue Gene/L
Frank Suits
Biomolecular Dynamics & Scalable Modeling
http://www.research.ibm.com/bluegene
High Performance Computing for Large Scale Biomolecular Simulation
December 8, 2004
© 2004 IBM Corporation



 Blue Gene protein science goals and history
 Overview of molecular dynamics
 Ways to use power of BG/L for protein science
 Our current simulation efforts and results
 Blue Matter design goals
 Optimization efforts

Large team effort
 Blue Gene Hardware
 System Software
 Blue Matter development
– Biomolecular Dynamics and Scalable Modeling Group:
• Bob Germain, Blake Fitch, Mike Pitman, Yuriy Zhestkov, Alex Rayshubskiy, Maria Eleftheriou, Alan Grossfield
• Almaden science team: William Swope, Jed Pitera, Hans Horn
 Protein Science collaborators
 My own background:
– Physics (there are a lot of us)
– Current role: mostly analysis of scientific results
– Touched much of the code base, but specialists assigned to key code

IBM Announces $100 Million Research Initiative to
build World's Fastest Supercomputer
"Blue Gene" to Tackle Protein Folding Grand Challenge

YORKTOWN HEIGHTS, NY, December 6, 1999 -- IBM today announced a new $100 million exploratory research initiative to build a supercomputer 500 times more powerful than the world's fastest computers today. The new computer -- nicknamed "Blue Gene" by IBM researchers -- will be capable of more than one quadrillion operations per second (one petaflop). This level of performance will make Blue Gene 1,000 times more powerful than the Deep Blue machine that beat world chess champion Garry Kasparov in 1997, and about 2 million times more powerful than today's top desktop PCs.

Blue Gene's massive computing power will initially be used to model the folding of human proteins, making this fundamental study of biology the company's first computing "grand challenge" since the Deep Blue experiment. Learning more about how proteins fold is expected to give medical researchers better understanding of diseases, as well as potential cures.

Blue Gene program
December 1999: Blue Gene project announcement
November 2001: Research partnership with Lawrence Livermore National Laboratory (LLNL)
June 2003: First chips completed
November 2003: BG/L half-rack prototype (512 nodes) ranked #73 on the 22nd Top500 list announced at SC2003 (1.435 TFlop/s)
– 32-node system folding proteins live on the demo floor at SC2003
February 2, 2004: Second-pass BG/L chips delivered to Research
March 2, 2004: 1024-node prototype achieves 2.8 TFlop/s on Linpack – would qualify as #23
April 16, 2004: 2048-node prototype achieves 5.6 TFlop/s on Linpack – would qualify as #10
May 11, 2004: 4096-node prototype (500 MHz) achieves 11.68 TFlop/s on Linpack – #4 on Top500
May 18, 2004: First production Blue Matter runs on membrane systems
June 2, 2004: 2048-node prototype (pass-2 chips, 700 MHz) achieves 8.655 TFlop/s on Linpack – #8 on Top500
September 29, 2004: 8192-node system (pass-2 chips) achieves 36.01 TFlop/s on Linpack (passes Earth Simulator)
October 2004: 120 ns on rhodopsin in membrane (NVE)
November 2004: #1 in Top500 at 70 TFlop/s (1/4 of completed system)

Blue Gene Science Mission
 Advance our understanding of biologically important processes via simulation, in particular the mechanisms behind protein folding

 Current Activities include:
– Thermodynamic & kinetic studies of model systems
– Structural and dynamical studies of membrane and membrane/protein systems

Scaling Directions
[Figure: directions for scaling up simulations, including statistical certainty]

Time Scales: Biopolymers and Membranes
[Figure: characteristic time scales, including the helix-coil transition, lipid exchange via diffusion, ligand-protein binding, torsional correlation in lipid headgroups, and electron transfer; adapted from "The Protein Folding Problem", Chan and Dill, Physics Today, Feb. 1993]

The science plan – a spectrum of projects
 Systematically cover a range of system sizes and topological complexity
– Discovering the "rules" of folding
– Applying those rules to have impact on disease
 Address a broad range of scientific questions and impact areas:
– Thermodynamics
– Folding kinetics
– Folding-related disease (CF, Alzheimer's, GPCRs)
 Improve our understanding not just of protein folding but of protein function

β-hairpin Simulation

Free Energy Landscape of Beta Hairpin (PNAS 2001)

Free energy surface with trajectories: Kinetics
Each color is a separate trajectory
Some overlap, others are distinct
Can they be chained together?
J. Phys. Chem. B, 2004 (2 papers)

"trp-cage" folding (PNAS 2003)
 Small, 20 amino acid protein
 Simulations started from a completely unfolded state
 Simulations could reproduce & explain sequence-dependent folding

Membrane Proteins
 Membrane processes enable:
– cell signal detection, ion and nutrient transport
– infection processes target specific membranes
– over 50% of drug discovery research targets are membrane proteins

Experiment and simulation play a concerted role in understanding membrane biophysics.
Simulation can be validated by experiment.
Simulation can then help to interpret experiment.

Lipid Membrane Simulation

Overview of Blue Gene Membrane Protein Studies
– Extensive hydrogen bonding network with headgroups
– Excellent agreement with experiment for both structural and dynamic properties
– Cholesterol induces dramatic lateral organization
– Cholesterol shows a preference for STEA over DHA
– Significant angular anisotropy of the cholesterol environment

GPCR in a membrane environment
– Rhodopsin with 2:2:1 SDPC/SDPE/CHOL
– 100 ns cis-retinal, 200+ ns trans-retinal
– Current production rate: 15 hrs/ns on 512 BG/L nodes

Most Recent Publication (yesterday)
Molecular-Level Organization of Saturated and Polyunsaturated Fatty Acids
in a Phosphatidylcholine Bilayer Containing Cholesterol
Pitman, Suits, MacKerell, Feller, Biochemistry 2004

Rhodopsin and the Eye

GPCR-based drugs among the 200 best-selling prescriptions, and their GPCR targets
[Table: drug, GPCR target, company, and 2000 sales (US $m); surviving entries include Johnson & Johnson, Eli Lilly, congestive heart failure, Serevent, and Atrovent]

Current Simulations of Rhodopsin in Membrane

Some analysis examples: Lipid neighborhood around a cholesterol
Each lipid has two different "chains," shown red and blue

2D contours give some idea of the neighborhood, but only in a slice. 3D possibilities?

3D isosurfaces of density show lipid distributed symmetrically, while cholesterols show strong orientation preference…
Red: lipid; Blue: other cholesterols

Also see water pulled in from above and cholesterols preferentially oriented to each other…
Blue: other cholesterols
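
To make the analysis above concrete, here is a minimal sketch (not the actual Blue Matter analysis tooling) of the kind of 3D density binning such isosurfaces are built from: neighbor positions, already expressed in a frame fixed on the reference cholesterol, are accumulated into a frame-averaged 3D histogram. The class, grid size, and extent (DensityGrid, 20 bins, ±10 Å) are illustrative assumptions.

// Hedged sketch of 3D density binning for isosurface analysis; not the
// production analysis code. Neighbor positions are assumed to already be in
// a local frame centered on the reference molecule.
#include <vector>
#include <cstdio>

struct Vec3 { double x, y, z; };

struct DensityGrid {
    int nbins; double half_extent;                 // grid spans [-half_extent, +half_extent]^3
    std::vector<double> counts;
    long frames = 0;

    DensityGrid(int n, double h) : nbins(n), half_extent(h), counts(n * n * n, 0.0) {}

    void add_frame(const std::vector<Vec3>& neighbors_in_local_frame) {
        for (const Vec3& p : neighbors_in_local_frame) {
            auto bin = [&](double u) { return (int)((u + half_extent) / (2.0 * half_extent) * nbins); };
            int ix = bin(p.x), iy = bin(p.y), iz = bin(p.z);
            if (ix < 0 || iy < 0 || iz < 0 || ix >= nbins || iy >= nbins || iz >= nbins) continue;
            counts[(ix * nbins + iy) * nbins + iz] += 1.0;
        }
        ++frames;
    }
    double density(int ix, int iy, int iz) const {  // average count per frame per voxel
        return frames ? counts[(ix * nbins + iy) * nbins + iz] / frames : 0.0;
    }
};

int main() {
    DensityGrid grid(/*nbins=*/20, /*half_extent=*/10.0);        // e.g. 1 A voxels over +/-10 A
    grid.add_frame({{1.0, 2.0, 0.5}, {-3.0, 0.0, 4.0}});         // toy "frame" of neighbor positions
    std::printf("density in one voxel: %f\n", grid.density(11, 12, 10));
    return 0;
}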

Selected Publications
 Molecular-Level Organization of Saturated and Polyunsaturated Fatty Acids in a Phosphatidylcholine Bilayer Containing Cholesterol; Biochemistry, in press, 2004
 Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 1. Theory; J. Phys. Chem. B 2004, 108(21), 6571-6581
 Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 2. Example Applications to Alanine Dipeptide and a beta-Hairpin Peptide; J. Phys. Chem. B 2004, 108(21), 6582-6594
 Understanding folding and design: Replica-exchange simulations of "Trp-cage" miniproteins; Proc. Natl. Acad. Sci. USA 2003, 100(13), 7587-7592
 Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in water?; Proc. Natl. Acad. Sci. USA 2002, 99(20), 12777-12782
 The free energy landscape for beta-hairpin folding in explicit water; Proc. Natl. Acad. Sci. USA 2001, 98(26), 14931-14936

BG/L communication network

Ocean view with Torus

Why another MD program?

Blue Matter "Porting" issues
 Written from scratch
 Small memory footprint and state (megabytes)
 Low i/o needs
– Still, can accumulate a large amount of data
– Staged reduction with archive/spinning disk
– "streaming" demo
 Strong scaling needs (small # of atoms per node)
 For large node count, communication bound
– Novel strategies for decomposition

 Design a scalable MD environment for large node counts
 Address strong scalability problems
 Research novel modular programming techniques
 Build reusable framework components

 C++ with templates
 Database oriented
 Scientific functions "registered" as User-Defined Functions
 Molecular system represented as xml and stored in database
 Each system generates unique C++
– No single executable
– Java pulls the system from the database and generates code based on run-time parameters
– Optimization due to compile-time constants and reduced code
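
A minimal sketch of why generating system-specific C++ pays off, assuming nothing about Blue Matter's actual generated code: when a per-system quantity (here a hypothetical kWaterSites count) is baked in as a compile-time constant, loop trip counts are known to the compiler and can be unrolled and specialized.

// Illustrative only: a generated translation unit would hard-code values
// like kWaterSites per simulated system, turning them into compile-time constants.
#include <array>
#include <cstdio>

template <int NumSites>                       // e.g. number of sites in a water model
double sum_site_charges(const std::array<double, NumSites>& q) {
    double total = 0.0;
    for (int i = 0; i < NumSites; ++i)        // trip count known at compile time
        total += q[i];
    return total;
}

int main() {
    constexpr int kWaterSites = 3;            // hypothetical: a 3-site water model
    std::array<double, kWaterSites> q = {-0.834, 0.417, 0.417};   // illustrative TIP3P-like charges
    std::printf("net charge = %f\n", sum_site_charges<kWaterSites>(q));
    return 0;
}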

Blue Matter Overview
 Separate MD program into multiple subpackages (offload function to the host where possible)
– MD core engine (massively parallel, minimal in size)
– Setup programs to set up force field assignments, etc.
– Monitoring and analysis tools to analyze MD trajectories, etc.
 Run-time parameters have already been built in

Blue Matter Overview
[Diagram: components including the Blue Matter Runtime, regression test and driver scripts, the parallel application, datagrams, and management]

Blue Matter Molecular Dynamics code
 Multiple Force Field Support
– CHARMM, OPLS-AA, AMBER, GROMOS (in progress), Polarizable
 Explicit water models
– TIP3P, SPC, SPCE, rigid or floppy
 Integrators, time reversible (see the velocity-Verlet sketch after this list)
– Verlet, rRespa
 Temperature control
– Andersen, Nose-Hoover
 Pressure control
– Andersen (time reversible)
 Methods for long-range electrostatics
– Implemented: Ewald, P3ME (FFT-based), Lekner (pairwise)
– Tentative: Fast Multipole
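
As a small illustration of the time-reversible Verlet family listed above, here is a self-contained velocity-Verlet step. It is a sketch under generic assumptions (identical masses, a toy harmonic force), not Blue Matter's integrator.

// Minimal velocity-Verlet step: half-kick, drift, recompute forces, half-kick.
#include <vector>
#include <cstdio>

struct Vec3 { double x, y, z; };

template <typename ForceFn>
void velocity_verlet_step(std::vector<Vec3>& r, std::vector<Vec3>& v,
                          std::vector<Vec3>& f, double mass, double dt,
                          ForceFn force) {
    const double half = 0.5 * dt / mass;
    for (size_t i = 0; i < r.size(); ++i) {             // half-kick + drift
        v[i].x += half * f[i].x;  v[i].y += half * f[i].y;  v[i].z += half * f[i].z;
        r[i].x += dt * v[i].x;    r[i].y += dt * v[i].y;    r[i].z += dt * v[i].z;
    }
    force(r, f);                                         // forces at the new positions
    for (size_t i = 0; i < v.size(); ++i) {              // second half-kick
        v[i].x += half * f[i].x;  v[i].y += half * f[i].y;  v[i].z += half * f[i].z;
    }
}

int main() {
    // Toy usage: a single particle in a harmonic well, F = -k r.
    std::vector<Vec3> r = {{1.0, 0.0, 0.0}}, v = {{0.0, 0.0, 0.0}}, f(1);
    auto force = [](const std::vector<Vec3>& rr, std::vector<Vec3>& ff) {
        const double k = 1.0;
        for (size_t i = 0; i < rr.size(); ++i)
            ff[i] = {-k * rr[i].x, -k * rr[i].y, -k * rr[i].z};
    };
    force(r, f);
    for (int step = 0; step < 1000; ++step)
        velocity_verlet_step(r, v, f, /*mass=*/1.0, /*dt=*/0.01, force);
    std::printf("x after 1000 steps: %f\n", r[0].x);
    return 0;
}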

What is being calculated?
 Bonded atoms 1-2, 1-3, 1-4 (quick, list based)
 Non-bond (N² – but switch-truncated range)
– Lennard-Jones (see the sketch below)
 Periodic imaging
– Ewald (DFT) or
– P3ME (3D FFT)
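
A hedged sketch of a switch-truncated Lennard-Jones pair energy. The switching polynomial below is one common form (unity at the inner cutoff, zero value and slope at the outer cutoff); the cutoffs and parameters are illustrative, not Blue Matter's production settings.

// Switch-truncated Lennard-Jones pair energy (sketch).
#include <cmath>
#include <cstdio>

double lj_switched(double r, double epsilon, double sigma,
                   double r_on, double r_off) {
    if (r >= r_off) return 0.0;
    const double sr6  = std::pow(sigma / r, 6);
    const double e_lj = 4.0 * epsilon * (sr6 * sr6 - sr6);
    if (r <= r_on) return e_lj;
    // Polynomial switch: 1 at r_on, 0 at r_off, zero slope at both ends.
    const double r2 = r * r, on2 = r_on * r_on, off2 = r_off * r_off;
    const double s = (off2 - r2) * (off2 - r2) * (off2 + 2.0 * r2 - 3.0 * on2)
                   / ((off2 - on2) * (off2 - on2) * (off2 - on2));
    return e_lj * s;
}

int main() {
    // Illustrative parameters only (reduced units), not a real force-field entry.
    for (double r = 0.9; r <= 1.3; r += 0.1)
        std::printf("r=%.2f  E=%+.6f\n", r, lj_switched(r, 1.0, 1.0, 1.0, 1.2));
    return 0;
}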

How is the problem partitioned?
 CURRENTLY (very good to 1024 nodes):
– Atoms in fragments of 1-5 or so
– Fragments distributed across nodes
– Load balancing occurs based on measured times
– All nodes know positions of all atoms
– Each node calculates forces on its atoms
– Parallel 3D FFT across all nodes, leaving a piece of the result on each node
– Each node applies the force due to its FFT piece to all atoms
– All forces are combined (all-reduce); each node knows forces on all atoms
– All nodes update positions
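
The replicated-data pattern in the list above can be sketched in a few lines of MPI. This is an assumption-laden illustration (placeholder force work, static block assignment, no FFT), not Blue Matter's communication code, but it shows the "compute local forces, all-reduce, identical update everywhere" structure.

// Replicated-data MD step skeleton (sketch).
#include <mpi.h>
#include <algorithm>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int n_atoms = 1024;                        // illustrative size
    std::vector<double> pos(3 * n_atoms, 0.0);       // replicated on every rank
    std::vector<double> frc(3 * n_atoms, 0.0);       // local partial forces

    // Static block assignment of atoms to ranks (load balancing not shown).
    const int per_rank = (n_atoms + nranks - 1) / nranks;
    const int lo = rank * per_rank;
    const int hi = std::min(n_atoms, lo + per_rank);

    for (int step = 0; step < 10; ++step) {
        std::fill(frc.begin(), frc.end(), 0.0);
        for (int i = lo; i < hi; ++i)                // placeholder "force" work
            frc[3 * i + 0] = 1.0;                    // real code: bonded + non-bond + FFT piece
        // Combine partial forces so every rank sees forces on all atoms.
        MPI_Allreduce(MPI_IN_PLACE, frc.data(), 3 * n_atoms,
                      MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        for (int i = 0; i < 3 * n_atoms; ++i)        // identical update everywhere
            pos[i] += 1.0e-3 * frc[i];
    }
    if (rank == 0) std::printf("pos[0] = %f\n", pos[0]);
    MPI_Finalize();
    return 0;
}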
 FUTURE (many-K nodes):
– Interaction decomposition
– "N²" interactions are distributed rather than N atoms

Excellent energy conservation – validation of code

Optimization and Scalability
With empirical results

Timing, 512-way, for rhodopsin, lipids, water, 43k atoms
[Chart: per-time-step timing breakdown, with labels including assign charge (4.6 ms), update positions (2 ms), convolution (1.25 ms), MPI call in globalizing positions (5.1 ms), setting up globalized positions, bonded force computation, pairwise non-bonded, floating point, and MPI]

Optimization of non-bond interactions
 Verlet lists
– Check only O(N) interactions with particles on the list
– Lists are recalculated only when particles cross the guard zone
– Dynamic tuning of the guard-zone size for optimization
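
A minimal sketch of a Verlet list with a guard zone, illustrating the rebuild criterion behind the bullets above: the list covers r_cut + r_guard, and is rebuilt once any particle has moved more than half the guard distance since the last build. The dynamic tuning of the guard-zone size is not shown, and the class and parameter names are invented for illustration.

// Verlet list with a guard zone (sketch).
#include <cmath>
#include <utility>
#include <vector>
#include <cstdio>

struct P { double x, y, z; };

static double dist2(const P& a, const P& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

struct VerletList {
    double r_cut, r_guard;
    std::vector<std::pair<int, int>> pairs;
    std::vector<P> ref;                        // positions at last rebuild

    void build(const std::vector<P>& p) {
        pairs.clear();
        const double r_list2 = (r_cut + r_guard) * (r_cut + r_guard);
        for (size_t i = 0; i < p.size(); ++i)
            for (size_t j = i + 1; j < p.size(); ++j)
                if (dist2(p[i], p[j]) < r_list2) pairs.push_back({(int)i, (int)j});
        ref = p;
    }
    // Rebuild once any particle has moved farther than r_guard / 2, since two
    // such particles could together have closed the full guard distance.
    bool needs_rebuild(const std::vector<P>& p) const {
        const double lim2 = 0.25 * r_guard * r_guard;
        for (size_t i = 0; i < p.size(); ++i)
            if (dist2(p[i], ref[i]) > lim2) return true;
        return false;
    }
};

int main() {
    std::vector<P> p = {{0, 0, 0}, {1.0, 0, 0}, {3.0, 0, 0}};
    VerletList vl{ /*r_cut=*/1.2, /*r_guard=*/0.4, {}, {} };
    vl.build(p);
    p[1].x += 0.25;                            // move one particle past half the guard distance
    std::printf("pairs=%zu rebuild=%d\n", vl.pairs.size(), (int)vl.needs_rebuild(p));
    return 0;
}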

Verlet list tuning to find optimum

Verlet list tuning
 Before tuning
– Short step: 0.25 ms/time step (most steps)
– Long step: 0.39 ms/time step (infrequent)
– Average: about 0.25 ms/time step
 After tuning
– Short step: 0.2 ms/time step (5 out of 6 steps)
– Long step: 0.3 ms/time step (1 out of 6 steps)
– Average: about 0.22 ms/time step

What Limits the Scalability of MD?
 Inherent limitations on concurrency:
– Bonded force evaluation
• Represents only a small fraction of computation; can be distributed moderately well
– Real-space non-bond force evaluation
• Large fraction of computation, but good distribution can be achieved using volume or interaction decomposition
– Reciprocal-space contribution to force evaluation for Ewald
• P3ME uses 3D FFT with global communication
• Ewald with direct evaluation uses floating-point reduction
 Load balancing
 System software

Long range electrostatics
 Ewald method
– Replaces a slowly, conditionally converging infinite sum for the electrostatic force with two fast-converging sums, one in real space and another in reciprocal (Fourier-transformed) space
 Real-space term is computed together with the other pairwise terms
 Reciprocal-space term is computed either directly or using an FFT (in the P3ME method)
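
For reference, one standard way to write the Ewald splitting of the total electrostatic energy (Gaussian units, charge-neutral periodic cell, sums over periodic images implied in the real-space term; conventions for the screening parameter alpha vary):

E_{\mathrm{elec}} = E_{\mathrm{real}} + E_{\mathrm{recip}} + E_{\mathrm{self}}

E_{\mathrm{real}} = \frac{1}{2}\sum_{i\neq j} q_i q_j\,\frac{\operatorname{erfc}(\alpha r_{ij})}{r_{ij}}

E_{\mathrm{recip}} = \frac{2\pi}{V}\sum_{\mathbf{k}\neq 0}\frac{e^{-k^{2}/(4\alpha^{2})}}{k^{2}}\,\Bigl|\sum_{j} q_j\, e^{i\mathbf{k}\cdot\mathbf{r}_j}\Bigr|^{2}

E_{\mathrm{self}} = -\frac{\alpha}{\sqrt{\pi}}\sum_i q_i^{2}

The real-space sum converges quickly because of the erfc screening and is evaluated with the other pairwise terms; the reciprocal-space sum converges quickly because of the Gaussian factor and is the part that P3ME evaluates on a mesh with FFTs.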

Parallel decomposition of P3ME
 Reciprocal term in the P3ME algorithm using FFT:
– Charges redistributed over points on a mesh
– Fourier transformation takes the mesh into reciprocal space (FFT)
– Convolution in reciprocal space with Green functions
– Inverse Fourier transformation to get electric potentials on the mesh
– Interpolation of these potentials to particle locations
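
A hedged, serial sketch of that reciprocal-space pipeline using FFTW for the 3D transforms. The charge assignment here is nearest-grid-point, the influence ("Green") function is left as a caller-supplied array, and none of Blue Matter's parallel FFT or higher-order assignment is shown; the function and type names are invented for illustration.

// P3ME-style reciprocal-space pipeline: assign, FFT, convolve, inverse FFT, interpolate.
#include <fftw3.h>
#include <vector>
#include <cstdio>

struct Charge { double x, y, z, q; };    // positions in a cubic box of side L

void p3me_reciprocal_potential(const std::vector<Charge>& charges, double L, int n,
                               const std::vector<double>& greens,    // size n*n*n
                               std::vector<double>& phi_at_charges) {
    const int n3 = n * n * n;
    fftw_complex* mesh = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * n3);
    for (int i = 0; i < n3; ++i) { mesh[i][0] = 0.0; mesh[i][1] = 0.0; }

    auto cell = [&](double u) { int c = (int)(u / L * n) % n; return c < 0 ? c + n : c; };
    // 1. Assign charges to the mesh (nearest grid point).
    for (const Charge& c : charges)
        mesh[(cell(c.x) * n + cell(c.y)) * n + cell(c.z)][0] += c.q;

    // 2. Forward 3D FFT of the charge mesh.
    fftw_plan fwd = fftw_plan_dft_3d(n, n, n, mesh, mesh, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(fwd);

    // 3. Convolution in reciprocal space: multiply by the influence function.
    for (int i = 0; i < n3; ++i) { mesh[i][0] *= greens[i]; mesh[i][1] *= greens[i]; }

    // 4. Inverse FFT gives the potential on the mesh (FFTW leaves it unnormalized).
    fftw_plan bwd = fftw_plan_dft_3d(n, n, n, mesh, mesh, FFTW_BACKWARD, FFTW_ESTIMATE);
    fftw_execute(bwd);

    // 5. Interpolate mesh potentials back to the charge locations.
    phi_at_charges.clear();
    for (const Charge& c : charges)
        phi_at_charges.push_back(mesh[(cell(c.x) * n + cell(c.y)) * n + cell(c.z)][0] / n3);

    fftw_destroy_plan(fwd); fftw_destroy_plan(bwd); fftw_free(mesh);
}

int main() {
    const int n = 16; const double L = 10.0;
    std::vector<double> greens(n * n * n, 1.0);       // placeholder influence function
    std::vector<Charge> charges = {{1.0, 2.0, 3.0, +1.0}, {6.0, 2.0, 3.0, -1.0}};
    std::vector<double> phi;
    p3me_reciprocal_potential(charges, L, n, greens, phi);
    std::printf("phi at first charge: %f\n", phi[0]);  // with greens == 1 this round-trips the mesh
    return 0;
}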

 Multi-dimensional FFTs are important kernels for the molecular dynamics algorithm used in the Blue Gene science program

– We use the P3ME (Particle-Particle-Particle-Mesh Ewald) method to compute long-range interactions between charges in the simulated system
– P3ME requires computation of a 3D FFT of the charge distribution in every time step of the simulation
– Target simulation sizes on BG/L: 5K-200K atoms
• Typical sizes of 3D FFT needed are 64³ to 256³
 Because of their importance, we need a 3D FFT solution for BG/L that scales to very large node counts.

Existing implementations and challenge for BG/L
 Typical parallel 3D FFT implementations (e.g., FFTW) use slab decomposition to minimize communication
– In principle the scalability is limited to N processors (for an N x N x N FFT)
– Typical sizes of FFT used for MD are 64³ to 256³
 Our application on Blue Gene/L must scale to 2048 nodes or more
 In theory, row-column decomposition can scale to N² nodes without parallelizing individual 1D FFTs
 In volumetric decomposition, each computation phase is separated by data movement (transposition)
 Important to perform the transposes efficiently, because they can become very expensive

The 3D FFT Algorithm
 Volumetric decomposition divides the 3D FFT computation into three stages, each computing N² 1D FFTs of length N
 Each 1D FFT is independent and can be computed in parallel
– N x N 1D FFTs along the z-dimension
– N x N 1D FFTs along the y-dimension
– N x N 1D FFTs along the x-dimension
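
A self-contained, serial sketch of that three-stage structure: a 3D DFT performed as N x N independent 1D transforms along z, then y, then x. A naive O(N^2) 1D DFT stands in for a real FFT, and the parallel distribution of the lines and the transposes between stages are omitted.

// 3D DFT as three stages of independent 1D DFTs (sketch).
#include <cmath>
#include <complex>
#include <vector>
#include <cstdio>

using cplx = std::complex<double>;
static const double PI = 3.14159265358979323846;

// Naive 1D DFT of 'n' elements starting at 'data', spaced 'stride' apart.
static void dft1d(cplx* data, int n, int stride) {
    std::vector<cplx> out(n);
    for (int k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (int j = 0; j < n; ++j)
            acc += data[j * stride] * std::polar(1.0, -2.0 * PI * k * j / n);
        out[k] = acc;
    }
    for (int k = 0; k < n; ++k) data[k * stride] = out[k];
}

void dft3d(std::vector<cplx>& a, int n) {       // a has n*n*n elements, x-major layout
    for (int x = 0; x < n; ++x)                 // stage 1: N*N lines along z
        for (int y = 0; y < n; ++y)
            dft1d(&a[(x * n + y) * n], n, 1);
    for (int x = 0; x < n; ++x)                 // stage 2: N*N lines along y
        for (int z = 0; z < n; ++z)
            dft1d(&a[x * n * n + z], n, n);
    for (int y = 0; y < n; ++y)                 // stage 3: N*N lines along x
        for (int z = 0; z < n; ++z)
            dft1d(&a[y * n + z], n, n * n);
}

int main() {
    const int n = 8;
    std::vector<cplx> a(n * n * n, cplx(0.0, 0.0));
    a[0] = cplx(1.0, 0.0);                      // a delta transforms to all ones
    dft3d(a, n);
    std::printf("a[0]=%.1f  a[last]=%.1f\n", a[0].real(), a[n * n * n - 1].real());
    return 0;
}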

3D-FFT for Blue Gene/L
 Requirement: Scalable 3D FFT for meshes ranging from 32³ to 256³ as part of particle-mesh molecular dynamics
 Design goal: 3D FFT decomposition with strong scaling characteristics for mesh sizes of interest (as an alternative to "slab"-based 3D FFT decomposition)

 Prototyping: "Active Packet" and MPI programming-model versions of the volumetric 3D FFT have been implemented
 Results: the MPI version shows scaling on SP (Power4) superior to that of FFTW; BG/L versions show continued speedups through 1024 nodes

 Conclusion: the volumetric 3D FFT will scale well enough to support many biomolecular simulation experiments, including mesh sizes around 128³


Volumetric 3D-FFT on Power4 Cluster, 128x128x128 FFT
[Plot: BG 3D-FFT time (seconds); the MPI version shows scaling on SP (Power4) superior to that of FFTW]

 Original goals of Blue Gene science program
 Blue Matter provides a scalable MD environment with innovative design approaches
 Large simulations are running right now, and will get bigger as nodes arrive
 Stay tuned

 Alex Balaeff
 Mike Pitman
 Bruce Berne
 Alex Rayshubskiy
 Maria Eleftheriou
 Yuk Sham
 Scott Feller
 Frank Suits
 Blake Fitch
 Bill Swope
 Klaus Gawrisch
 Chris Ward
 Alan Grossfield
 Yuri Zhestkov
 Jed Pitera
 Ruhong Zhou
 Blue Gene Hardware and System Software teams

Source: http://www.scc.acad.bg/ncsa/articles/library/Library2014_Supercomputers-at-Work/Molecular%20Dynamics%20Blue%20Matter/Largre_Scale_Biomoleculare_Simulation.pdf
