## Molecular Modelling: Principles and Applications
## Andrew R. Leach
This text provides a detailed description of the techniques employed in molecular modelling and computational chemistry. The first part of the book covers the two major methods used to describe the interactions within a system (quantum mechanics and molecular mechanics). The second part then deals with techniques that use such energy models, including energy minimisation, molecular dynamics, Monte Carlo simulations and conformational analysis. The author also discusses the use of more advanced modelling techniques such as the calculation of free energies and the simulation of chemical reactions. In addition he considers methods such as database searching that can be used to design new molecules with specific properties. Many of the topics are treated in considerable depth but the author assumes that the reader has only a basic knowledge of the relevant physical and chemical principles. Most of the theoretical sections are accompanied by simple calculations together with examples drawn from the literature. The book is well illustrated and a colour plate section highlights the impact of computer molecular graphics. The book will prove a valuable text for postgraduate students and professionals and many sections will be useful to final-year undergraduates taking courses in molecular modelling or computational chemistry.

## Preface
Molecular modelling used to be restricted to a small number of scientists who had access to the necessary computer hardware and software. Its practitioners wrote their own programs, managed their own computer systems and mended them when they broke down. Today's computer workstations are much more powerful than the mainframe computers of even a few years ago and can be purchased relatively cheaply. It is no longer necessary for the modeller to write computer programs as software can be obtained from commercial software companies and academic laboratories.
Molecular modelling can now be performed in any laboratory or classroom.

This book is intended to provide an introduction to some of the techniques used in molecular modelling and computational chemistry, and to illustrate how these techniques can be used to study physical, chemical and biological phenomena. A major objective is to provide, in one volume, some of the theoretical background to the vast array of methods available to the molecular modeller. I also hope that the book will help the reader to select the most appropriate method for a problem and so make the most of his or her modelling hardware and software. Many modelling programs are extremely simple to use and are often supplied with seductive graphical interfaces, which obviously help to make modelling techniques more accessible, but it can also be very easy to select a wholly inappropriate technique or method.

Most molecular modelling studies involve three stages. In the first stage a model is selected to describe the intra- and intermolecular interactions in the system. The two most common models that are used in molecular modelling are quantum mechanics and molecular mechanics. These models enable the energy of any arrangement of the atoms and molecules in the system to be calculated, and allow the modeller to determine how the energy of the system varies as the positions of the atoms and molecules change. The second stage of a molecular modelling study is the calculation itself, such as an energy minimisation, a molecular dynamics or Monte Carlo simulation, or a conformational search. Finally, the calculation must be analysed, not only to calculate properties but also to check that it has been performed properly.

The book is organised so that some of the techniques discussed in later chapters refer to material discussed earlier, though I have tried to make each chapter as independent of the others as possible.
Some readers may therefore be pleased to know that it is not essential to completely digest the chapters on quantum mechanics and molecular mechanics in order to read about methods for searching conformational space! Readers with experience in one or more areas may of course wish to be more selective.

I have tried to provide as much of the underlying theory as seems appropriate to enable the reader to understand the fundamentals of each method. In doing so I have assumed some background knowledge of quantum mechanics, statistical mechanics, conformational analysis and mathematics. A reader with an undergraduate degree in chemistry should have covered this material, which should also be familiar to many undergraduates in the final year of their degree course. Full discussions can be found in the suggestions for further reading at the end of each chapter. I have also attempted to provide a reasonable selection of original references, though in a book of this scope it is obviously impossible to provide a comprehensive coverage of the literature. In this context, I apologise in advance if any technique is inappropriately attributed.

In Chapter 1 we consider some of the historical background to molecular modelling and discuss a number of important general principles that are common to many modelling methods. We also examine the use of computer graphics, the Internet and the World-Wide Web and the molecular modelling literature. Chapter 1 concludes with a brief summary of some relevant mathematical concepts. Chapters 2 and 3 describe quantum mechanics and molecular mechanics, which are the two major methods used to model the interactions within a molecular system. These methods can be used to calculate the energy of a given arrangement of the atoms as well as certain other properties. In Chapters 4-8 we examine energy minimisation, molecular dynamics, Monte Carlo simulations and conformational analysis. These techniques use an appropriate energy model to determine a wide range of structural and thermodynamic properties. The final two chapters describe various techniques that combine concepts from previous chapters. In Chapter 9 we discuss the calculation of free energies using computer simulation, continuum solvent models, and methods for simulating chemical reactions. Chapter 10 is concerned with computational methods for discovering and designing new molecules, such as database searching.

The range of systems that can be considered in molecular modelling is extremely broad, from isolated molecules through simple atomic and molecular liquids to polymers, biological macromolecules such as proteins and DNA and solids. Many of the techniques are illustrated with examples chosen to reflect the breadth of applications. It is inevitable that for reasons of space some techniques must be dealt with in a rudimentary fashion (or not at all), and that many interesting and important applications cannot be described.

Molecular modelling is a rapidly developing discipline, and has benefited from the dramatic improvements in computer hardware and software of recent years. Calculations that were major undertakings only a few years ago can now be performed using personal computing facilities. Thus, examples used to indicate the 'state of the art' at the time of writing will invariably be routine within a short time.
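
The three stages of a modelling study described in the preface (selecting an energy model, running the calculation, analysing the result) can be illustrated with a toy example. The following is only a minimal sketch and is not taken from the book: it assumes a Lennard-Jones 12-6 pair potential in reduced units as the energy model, a fixed-step steepest descents minimisation as the calculation, and a comparison with the analytic minimum as the analysis. All function names, the step size and the tolerance are illustrative choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Stage 1 (model): Lennard-Jones 12-6 energy of two atoms a distance r apart."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Analytic derivative dE/dr of the energy model above."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (6.0 * sr6 - 12.0 * sr6 * sr6) / r


def steepest_descent(r0, step=0.01, tol=1e-8, max_iter=10_000):
    """Stage 2 (calculation): move downhill along -dE/dr with a fixed step."""
    r = r0
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break  # gradient is effectively zero: stationary point reached
        r -= step * gradient
    return r


def analyse(r_found, sigma=1.0):
    """Stage 3 (analysis): report properties and check the calculation itself."""
    r_exact = 2.0 ** (1.0 / 6.0) * sigma  # analytic minimum of the 12-6 form
    return {
        "r_min": r_found,
        "energy": lj_energy(r_found, sigma=sigma),
        "residual_gradient": abs(lj_gradient(r_found, sigma=sigma)),
        "error_vs_analytic": abs(r_found - r_exact),
    }


if __name__ == "__main__":
    report = analyse(steepest_descent(r0=1.5))
    for name, value in report.items():
        print(f"{name}: {value:.6g}")
```

The fixed step is chosen small enough for this one-dimensional problem started at a separation of 1.5; a realistic minimiser would normally use a line search or an adaptive step, and a realistic system would have many coordinates rather than one.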
## Contents
- Preface
- Acknowledgements
## Chapter 1. Useful Concepts in Molecular Modelling
- 1.1 Introduction
- 1.2 Coordinate systems
- 1.3 Potential Energy surfaces
- 1.4 Molecular Graphics
- 1.5 Surfaces
- 1.6 Computer hardware and software
- 1.7 Units of length and energy
- 1.8 The molecular modelling literature
- 1.9 The Internet
- 1.10 Mathematical Concepts
  - 1.10.1 Series expansions
  - 1.10.2 Vectors
  - 1.10.3 Matrices, eigenvalues and eigenvectors
  - 1.10.4 Complex numbers
  - 1.10.5 Lagrange multipliers
  - 1.10.6 Multiple integrals
  - 1.10.7 Some basic elements of statistics
## Chapter 2. Quantum mechanical models
- 2.1 Introduction
  - 2.1.1 Operators
  - 2.1.2 Atomic units
  - 2.1.3 Exact solutions to the Schrödinger equation
- 2.2 One-electron atoms
- 2.3 Polyelectronic atoms and molecules
  - 2.3.1 The Born-Oppenheimer approximation
  - 2.3.2 The helium atom
  - 2.3.3 General polyelectronic systems and Slater Determinants
- 2.4 Molecular orbital calculations
  - 2.4.1 Calculating the energy from the wavefunction: the hydrogen molecule
  - 2.4.2 The energy of a general polyelectronic system
  - 2.4.3 Shorthand representations of the 1 and 2 electron integrals
  - 2.4.4 The energy of a closed-shell system
- 2.5 The Hartree-Fock equations
  - 2.5.1 The Hartree-Fock equations for atoms and Slater's rules
  - 2.5.2 The linear combination of atomic orbitals (LCAO) in Hartree-Fock theory
  - 2.5.3 Closed-shell systems and the Roothaan-Hall equations
  - 2.5.4 Solving the Roothaan-Hall equations
  - 2.5.5 A simple illustration of the Roothaan-Hall approach
  - 2.5.6 The application of the Hartree-Fock equations to molecular systems
- 2.6 Basis sets
  - 2.6.1 Creating a basis set
- 2.7 Open-shell systems
- 2.8 Electron correlation
  - 2.8.1 Configuration interaction
  - 2.8.2 Many body perturbation theory
- 2.9 Practical considerations when running
## Chapter 3. Empirical Force field models: molecular mechanics
- 3.1 Introduction
  - 3.1.1 A simple molecular mechanics force field
- 3.2 Some general features of molecular mechanics force fields
- 3.3 Bond stretching
- 3.4 Angle bending
- 3.5 Torsional terms
- 3.6 Improper torsions and out-of-plane bending motions
- 3.7 Cross terms
- Non-bonded interactions
- 3.8 Electrostatic interactions
  - 3.8.1 The central multipole expansion
  - 3.8.2 Point-charge electrostatic models
  - 3.8.3 Calculating partial atomic charges
  - 3.8.4 Charges derived from the molecular electrostatic potential
  - 3.8.5 Deriving charge models for large systems
  - 3.8.6 Rapid methods for calculating atomic charges
  - 3.8.7 Beyond partial atomic charge models
  - 3.8.8 Distributed multipole models
  - 3.8.9 Applications of charge schemes to the study of aromatic-aromatic interactions
  - 3.8.10 Polarization
  - 3.8.11 Solvent dielectric models
- 3.9 van der Waals interactions
  - 3.9.1 Dispersive interactions
  - 3.9.2 The repulsive contribution
  - 3.9.3 Modelling van der Waals interactions
  - 3.9.4 van der Waals interactions in polyatomic systems
  - 3.9.5 Reduced units
- 3.10 Many body effects in empirical potentials
- 3.11 Effective pair potentials
- 3.12 Hydrogen bonding in molecular mechanics
- 3.13 Force field models for the simulation of liquid water
  - 3.13.1 Simple water models
  - 3.13.2 Polarisable water models
  - 3.13.3
## Chapter 4. Energy minimisation and other methods for exploring the energy surface
- 4.1 Introduction
  - 4.1.1 Energy minimisation: statement of the problem
  - 4.1.2 Derivatives
- 4.2 Non-derivative minimisation methods
  - 4.2.1 The Simplex method
  - 4.2.2 The sequential univariate method
- 4.3 Introduction to derivative minimisation methods
- 4.4 First-order minimisation methods
  - 4.4.1 The steepest descents method
  - 4.4.2 Line search in one dimension
  - 4.4.3 Arbitrary step approach
  - 4.4.4 Conjugate gradients minimisation
- 4.5 Second derivative methods: the Newton-Raphson method
  - 4.5.1 Variants on the Newton-Raphson method
- 4.6 Quasi-Newton methods
- 4.7 Which minimisation method should I use?
  - 4.7.1 Distinguishing between minima, maxima and transition points
  - 4.7.2 Convergence criteria
- 4.8 Applications of energy minimisation
  - 4.8.1 Normal mode analysis
  - 4.8.2 The study of intermolecular processes using energy minimisation and normal mode analysis
- 4.9 The determination of transition points and reaction pathways
  - 4.9.1 Methods to locate saddle points
  - 4.9.2 Reaction path following
  - 4.9.3 Locating transition points and elucidating reaction pathways for large systems
  - 4.9.4 The transition structures of pericyclic reactions
## Chapter 5. An introduction to computer simulation methods
- 5.1 Introduction
  - 5.1.1 Time averages, ensemble averages and some historical background
  - 5.1.2 A brief description of the molecular dynamics method
  - 5.1.3 The basic elements of the Monte Carlo method
  - 5.1.4 Differences between the molecular dynamics and Monte Carlo methods
- 5.2 Calculation of simple thermodynamic properties
  - 5.2.1 Energy
  - 5.2.2 Heat capacity
  - 5.2.3 Pressure
  - 5.2.4 Temperature
  - 5.2.5 Radial distribution functions
- 5.3 The concept of phase space
- 5.4 Practical aspects of computer simulation
  - 5.4.1 Setting up and running a simulation
  - 5.4.2 Choosing the initial configuration
- 5.5 Boundaries
  - 5.5.1 Periodic boundary conditions
  - 5.5.2 Non-periodic boundary conditions
- 5.6 Monitoring the equilibration
- 5.7 Truncating the potential and the minimum image convention
  - 5.7.1 Non-bonded neighbour lists
  - 5.7.2 Group-based cutoffs
  - 5.7.3 Problems with cutoffs and how to avoid them
- 5.8 Long-range forces
  - 5.8.1 The Ewald summation method
  - 5.8.2 The reaction field and image charge methods
- 5.9 The cell-multipole method for calculating non-bonded interactions
- 5.10 Analysing the results of a simulation and estimating errors
- Appendix 5.1 Basic statistical mechanics
- Appendix 5.2 Relationship between heat capacity and energy fluctuations
- Appendix 5.3 Calculation of the real gas contribution to the virial
- Appendix 5.4 Formulae to translate particle back into central box for various periodic shapes
## Chapter 6. Molecular Dynamics
- 6.1 Introduction
- 6.2 Molecular dynamics using simple models
- 6.3 Molecular dynamics with continuous potentials
  - 6.3.1 Finite difference methods
  - 6.3.2 Predictor-corrector integration methods
  - 6.3.3 Which integration algorithm is most appropriate?
  - 6.3.4 Choosing the time step
- 6.4 Setting up and running a molecular dynamics simulation
  - 6.4.1 Calculating the temperature
- 6.5 Constraint dynamics
- 6.6 Time-dependent properties
  - 6.6.1 Correlation functions
  - 6.6.2 Orientational correlation functions
  - 6.6.3 Transport properties
- 6.7 Constant temperature and constant pressure molecular dynamics
  - 6.7.1 Constant temperature dynamics
  - 6.7.2 Constant pressure dynamics
- 6.8 Incorporating solvent effects into molecular dynamics: Potentials of Mean Force and Stochastic dynamics
  - 6.8.1 Practical aspects of stochastic dynamics simulations
- 6.9 Conformational changes from molecular dynamics simulations
- 6.10 Molecular Dynamics simulations of chain amphiphiles
  - 6.10.1 Simulations of lipids
  - 6.10.2 Simulations of Langmuir-Blodgett films
- Appendix 6.1 Energy conservation in molecular dynamics
- Appendix 6.2 Fourier series and Fourier analysis
## Chapter 7. Monte Carlo simulation methods
- 7.1 Introduction
- 7.2 Calculating properties by integration
- 7.3 Some theoretical background to the Metropolis method
- 7.4 Implementation of the Metropolis Monte Carlo method
  - 7.4.1 Random number generators
- 7.5 Monte Carlo simulation of molecules
  - 7.5.1 Rigid molecules
  - 7.5.2 Monte Carlo simulations of flexible molecules
- 7.6 Models used in Monte Carlo simulations of polymers
  - 7.6.1 Lattice models of polymers
  - 7.6.2 'Continuous' polymer models
- 7.7 'Biased' Monte Carlo methods
- 7.8 Monte Carlo sampling from different ensembles
  - 7.8.1 Grand-canonical Monte Carlo simulations
  - 7.8.2 Grand-canonical Monte Carlo simulations of adsorption processes
- 7.9 Calculating the Chemical Potential
- 7.10 The Configurational Bias Monte Carlo method
  - 7.10.1 Applications of the Configurational bias Monte Carlo method
- 7.11 Simulating phase equilibria by the Gibbs Ensemble Monte Carlo method
- 7.12 Monte Carlo or molecular dynamics?
- Appendix 7.1 The Marsaglia random number generator
## Chapter 8. Conformational analysis
- 8.1 Introduction
- 8.2 Systematic methods for exploring conformational space
- 8.3 Model-building approaches
- 8.4 Random search methods
- 8.5 Genetic algorithms
- 8.6 Distance geometry
  - 8.6.1 The use of distance geometry in NMR
- 8.7 Exploring conformational space using simulation methods
  - 8.7.1 Simulated annealing
  - 8.7.2 Solving protein structures by restrained molecular dynamics refinement
  - 8.7.3 X-ray crystallographic refinement
  - 8.7.4 Molecular dynamics refinement of NMR data
  - 8.7.5 Time-averaged NMR refinement
- 8.8 Which conformational search method should I use?
- 8.9 Structural databases
- 8.10 Molecular fitting
- 8.11 Clustering algorithms and pattern recognition techniques in molecular modelling
- 8.12 Reducing the dimensionality of a data set
  - 8.12.1 Principal components analysis
- 8.13 The role of conformational analysis in predicting the structure of peptides and proteins
  - 8.13.1 Some basic principles of protein structure
  - 8.13.2 The hydrophobic effect
  - 8.13.3 First-principles methods for predicting protein structure
  - 8.13.4 Lattice models for investigating protein structure
  - 8.13.5 Rule-based approaches
  - 8.13.6 Homology and comparative modelling methods
  - 8.13.7 Aligning protein sequences
  - 8.13.8 Constructing and evaluating an homology model
  - 8.13.9 Predicting protein structures by 'threading'
  - 8.13.10 A comparison of comparative modelling strategies
## Chapter 9. Three challenges in molecular modelling: free energies, solvation and simulating reactions
- 9.1 The difficulty of calculating free energies from a computer simulation
- 9.2 The calculation of free energy differences
  - 9.2.1 Thermodynamic perturbation
  - 9.2.2 Implementation of free-energy perturbation
  - 9.2.3 Thermodynamic integration
  - 9.2.4 The 'Slow growth' method
- 9.3 Applications of methods for calculating free energy differences
  - 9.3.1 Thermodynamic cycles
  - 9.3.2 Applications of the thermodynamic cycle perturbation method
  - 9.3.3 The calculation of absolute free energies
- 9.4 The calculation of enthalpy and entropy differences
- 9.5 Partitioning the free energy
- 9.6 Potential pitfalls with free energy calculations
  - 9.6.1 Implementation aspects
- 9.7 Potentials of mean force
  - 9.7.1 Umbrella sampling
  - 9.7.2 Calculating the potential of mean force for flexible molecules
- 9.8 Continuum representations of the solvent
  - 9.8.1 Thermodynamic background
- 9.9 The electrostatic contribution to the free energy of solvation: the Born and Onsager models
  - 9.9.1 Calculating the electrostatic contribution via quantum mechanics
  - 9.9.2 Continuum models for molecular mechanics
  - 9.9.3 The Langevin dipole model
  - 9.9.4 Methods based upon the Poisson-Boltzmann equation
  - 9.9.5 Applications of finite difference Poisson-Boltzmann calculations
- 9.6 Non-electrostatic contributions to the solvation free energy
- 9.7 Very simple solvation models
- 9.8 Modelling chemical reactions
  - 9.8.1 Empirical approaches to simulating reactions
  - 9.8.2 The potential of mean force of a reaction
  - 9.8.3 Combined quantum mechanical/molecular mechanical approaches
- 9.9 Density functional theory
  - 9.9.1 Density Functional Methods in the study of processes on solids
  - 9.9.2 The Car-Parrinello method
  - 9.9.3 The Application of
## Chapter 10. The use of molecular modelling to discover and design new molecules
- 10.1 Molecular modelling in drug discovery
- 10.2 Deriving and using three-dimensional pharmacophores
  - 10.2.1 Constrained systematic search
  - 10.2.2 Ensemble distance geometry and ensemble molecular dynamics
  - 10.2.3 Clique detection methods for finding pharmacophores
  - 10.2.4 Incorporating geometric features in a 3D pharmacophore
- 10.3 Molecular docking
- 10.4 Structure-based methods to identify lead compounds
  - 10.4.1 Finding lead compounds by searching 3D databases
  - 10.4.2 Sources of 3D data
- 10.5
## Pictures
- 1.4 Some of the common molecular graphics representations of molecules, illustrated using the crystal structure of nicotinamide adenine dinucleotide phosphate (NADPH) [Reddy et al 1981]. Clockwise, from top left: stick, CPK/space filling, 'ball and stick' and 'tube'. Image generated using InsightII.
- 1.5 Graphical representations of proteins illustrated using the enzyme dihydrofolate reductase [Bolin et al 1982]. Clockwise from top left: stick, CPK, 'cartoon' and 'ribbon'. Image generated using InsightII.
- 1.7 Graphical representations of the molecular surface of tryptophan. Clockwise from top left: dots, opaque solid, mesh, translucent solid. Image generated using InsightII.
- 1.8 A sample WWW page. Items highlighted in blue are hyperlinks to other documents, data, graphics, sound etc. The text in purple ('MidasPlus') indicates a hyperlink that has previously been read.
- 2.17 Surface representation of electron density around formamide at a contour of 0.0001 au (electrons/bohr³). Image generated using Spartan.
- 2.18 HOMO of formamide. The red contour indicates a negative part of the wavefunction and blue a positive part of the wavefunction. The formamide molecule is oriented with the oxygen atom on the left pointing towards the viewer as in Figure 2.17. Image generated using Spartan.
- 2.19 LUMO of formamide. Image generated using Spartan.
- 2.24 Electrostatic potential mapped onto the electron density surface of formamide. The orientation of the molecule is as in Figure 2.17. Red indicates negative electrostatic potential and blue is positive potential. Image generated using Spartan.
- 6.21 Snapshot from molecular dynamics simulation of a solvated lipid bilayer [Robinson and Richards 1995]. The disorder of the alkyl chains can be clearly seen. Image generated using InsightII from data generated by Alan Robinson.
- 7.19 Final configuration obtained from a configurational bias Monte Carlo simulation of thioalkanes adsorbed on a gold surface [Siepmann and MacDonald 1993a]. The system contains 224 molecules which are colour coded according to the number of gauche defects, with red chains being all trans, yellow chains containing three gauche bonds and green chains containing five gauche bonds. Data and figure supplied by J. Ilja Siepmann.
- 8.19 Twelve conformations of the chemokine RANTES generated from NMR data using distance geometry [Chung et al 1995]. Image generated using InsightII.
- 8.22 Fitting a polypeptide chain to the electron density when determining the structure of a protein using X-ray crystallography. The figure shows part of the structure of rat ADP-ribosylation factor-1 (ARF-1) [Greasley et al 1995]. Image generated using Quanta.
- 8.39 Trypsin [Turk et al 1991], chymotrypsin [Birktoft and Blow 1972] and thrombin [Turk et al 1992] have similar three-dimensional structures. Image generated using InsightII.
- 8.40 The 'TIM barrel' [Noble et al 1991]. Image generated using InsightII.
- 8.41 A superposition of the aspartic acid, histidine and serine amino acids in the active sites of trypsin (yellow), chymotrypsin (red) and thrombin (green). Image generated using InsightII.
- 9.22 3D Electrostatic isopotential contours around trypsin [Marquart et al 1983]. Contours are drawn at -1kT (red) and +1kT (blue). The trypsin inhibitor is also shown with its electrostatic potential mapped onto the molecular surface. Figure generated using GRASP.
- 9.23 Electrostatic potential around Cu-Zn superoxide dismutase [McRee et al 1990]. Red contours indicate negative electrostatic potential and blue contours indicate positive electrostatic potential. Two active sites are present in the dimer, and are located at the top left and bottom right of the Figure where there is a significant concentration of positive electrostatic potential. Figure generated using GRASP.
- 10.1 Surface complementarity of avidin (purple) and biotin (white) [Pugliese et al 1993]. Image generated using InsightII.
- 10.21 The result of a GRID calculation using carboxylate and amidine probes in the binding site of neuraminidase. The regions of minimum energy are contoured (carboxylate red; amidine blue). Also shown is the inhibitor 4-guanidino-Neu5Ac2en which contains these two functional groups [von Itzstein et al 1993]. Image generated using InsightII.
- 10.22 The representation of potential hydrogen bonding regions in tyrosyl-tRNA synthetase [Brick et al 1989]. Regions where a hydrogen bond donor could be positioned are indicated by red disks and acceptor regions are represented by pink hemispheres. The ligand (tyrosyl adenylate) is also shown. Figure generated using HIPPO.
- 10.24 The HIV-1 protease with the inhibitor CGP53820 bound [Priestle et al 1995]. The water molecule that hydrogen bonds to the inhibitor and to the 'flaps' is drawn as a white sphere, and the catalytic aspartate groups of the enzyme are also represented. Image generated using InsightII.
## Hyperlinks referenced in Appendix 1.1
- Computational Chemistry List (a discussion forum for all those interested in computational chemistry)
- An overview of chemistry mailing lists
- Imperial College Chemistry home page (contains links to a host of useful modelling-related pages)
- National Institutes of Health molecular modelling home page
- Swiss-Model (automated protein modelling server)
- Brookhaven databank (protein structures)
- Cambridge Crystallographic Data Centre
- Network Science (an on-line journal of science and computers)
- Royal Society of Chemistry (U.K.)
## Contacts
Please send feedback to: arl22958@ggr.co.uk