Published by Pearson Education EMA, January 2001

Overview

Preface to the second edition

Preface to the first edition

Comprehensive contents listing

Colour figures

Appendices (Acronyms in Quantum Chemistry, Bioinformatics abbreviations and acronyms, Sequence/structure databases)


This book provides a detailed description of the techniques employed in molecular modelling and computational chemistry. The first part of the book covers the two major methods used to describe the interactions within a system (quantum mechanics and molecular mechanics). The second part then deals with techniques that use such energy models, including energy minimisation, molecular dynamics, Monte Carlo simulations and conformational analysis. The author also discusses the use of more advanced modelling techniques such as the calculation of free energies and the simulation of chemical reactions. In addition he considers aspects of both chemoinformatics and bioinformatics and techniques that can be used to design new molecules with specific properties. Many of the topics are treated in considerable depth, but the reader is assumed to have only a basic knowledge of the relevant physical and chemical principles.

Most of the theoretical sections are accompanied by simple calculations together with examples drawn from the literature. The book is well illustrated and a colour plate section highlights the impact of computer molecular graphics. The book will prove a valuable text for postgraduate students and professionals and many sections will be useful to final-year undergraduates taking courses in molecular modelling or computational chemistry.

The impetus for this second edition is a desire to include some of the new techniques that have emerged in recent years and also extend the scope of the book to cover certain areas that were under-represented (even neglected) in the first edition. In this second volume there are three topics that fall into the first category (density functional theory, bioinformatics/protein structure analysis and chemoinformatics) and one main area in the second category (modelling of the solid-state). In addition, of course, a new edition provides an opportunity to take a critical view of the text and to re-organise and update the material. Thus whilst much remains from the first edition, and this second book follows much the same path through the subject, readers familiar with the first edition will find some changes which I hope they will agree are for the better.

As with the first edition we initially consider quantum mechanics, but this is now split into two chapters. Thus Chapter 2 provides an introduction to the *ab initio* and semi-empirical approaches together with some examples of the uses of quantum mechanics. Chapter 3 covers more advanced aspects of the *ab initio* approach, density functional theory and the particular problems of the solid-state. Molecular mechanics is the subject of Chapter 4 and then in Chapter 5 we consider energy minimisation and other "static" techniques. Chapters 6, 7 and 8 deal with the two main simulation methods (molecular dynamics and Monte Carlo). Chapter 9 is devoted to the conformational analysis of "small" molecules but also includes some topics (e.g. cluster analysis, principal components analysis) that are widely used in informatics. In Chapter 10 the problems of protein structure prediction and protein folding are considered; this chapter also contains an introduction to some of the more widely used methods in bioinformatics. In Chapter 11 we draw upon material from the previous chapters in a discussion of free energy calculations, continuum solvent models, methods for simulating chemical reactions and defects in solids. Finally, Chapter 12 is concerned with modelling and chemoinformatics techniques for discovering and designing new molecules, including database searching, docking, *de novo* design, quantitative structure-activity relationships and combinatorial library design.

As in the first edition, the inexorable pace of change means that what is currently considered "cutting edge" will soon become routine. The examples are thus chosen primarily because they illuminate the underlying theory rather than because they are the first application of a particular technique or are the most recent available. In a similar vein, it is impossible in a volume such as this to even attempt to cover everything and so there are undoubtedly areas which are under-represented. This is not intended to be a definitive historical account nor a review of the current state-of-the-art. Thus, whilst I have tried to include many literature references it is possible that the invention of some technique may appear to be incorrectly attributed or a "classic" application may be missing. A general guiding principle has been to focus on those techniques that are in widespread use rather than those which are the province of one particular research group. Despite these caveats I hope that the coverage is sufficient to provide a solid introduction to the main areas and also that those readers who are "experts" will find something new to interest them.

Molecular modelling used to be restricted to a small number of scientists who had access to the necessary computer hardware and software. Its practitioners wrote their own programs, managed their own computer systems and mended them when they broke down. Today's computer workstations are much more powerful than the mainframe computers of even a few years ago and can be purchased relatively cheaply. It is no longer necessary for the modeller to write computer programs as software can be obtained from commercial software companies and academic laboratories. Molecular modelling can now be performed in any laboratory or classroom.

This book is intended to provide an introduction to some of the techniques used in molecular modelling and computational chemistry, and to illustrate how these techniques can be used to study physical, chemical and biological phenomena. A major objective is to provide, in one volume, some of the theoretical background to the vast array of methods available to the molecular modeller. I also hope that the book will help the reader to select the most appropriate method for a problem and so make the most of his or her modelling hardware and software. Many modelling programs are extremely simple to use and are often supplied with seductive graphical interfaces, which obviously help to make modelling techniques more accessible, but it can also be very easy to select a wholly inappropriate technique or method.

Most molecular modelling studies involve three stages. In the first stage a model is selected to describe the intra- and intermolecular interactions in the system. The two most common models that are used in molecular modelling are quantum mechanics and molecular mechanics. These models enable the energy of any arrangement of the atoms and molecules in the system to be calculated, and allow the modeller to determine how the energy of the system varies as the positions of the atoms and molecules change. The second stage of a molecular modelling study is the calculation itself, such as an energy minimisation, a molecular dynamics or Monte Carlo simulation, or a conformational search. Finally, the calculation must be analysed, not only to calculate properties but also to check that it has been performed properly.
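The three stages can be illustrated with a toy system. The following is a minimal sketch under stated assumptions, not an example from the book: the energy model is assumed to be a Lennard-Jones 12-6 pair potential in reduced units, the calculation is a fixed-step steepest-descent minimisation of the separation between two atoms, and the analysis compares the result with the known analytical minimum at r = 2^(1/6) sigma. All function and parameter names (`lj_energy`, `lj_gradient`, `minimise`, `step`, `tol`) are hypothetical.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Stage 1 (energy model): Lennard-Jones 12-6 energy of one atom pair."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimise(r0, step=0.01, tol=1e-8, max_iter=10000):
    """Stage 2 (calculation): steepest descent with a fixed step size.

    The step size is an assumption chosen to be stable for this
    one-dimensional problem; it is not a general recommendation.
    """
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g  # move downhill along the negative gradient
    return r


# Run the calculation from an arbitrary starting separation.
r_min = minimise(1.5)
e_min = lj_energy(r_min)

# Stage 3 (analysis): compute a property and check the calculation.
r_exact = 2.0 ** (1.0 / 6.0)               # analytical minimum for sigma = 1
converged = abs(lj_gradient(r_min)) < 1e-6  # has the gradient vanished?

print("separation at minimum:", r_min)
print("energy at minimum:", e_min)
print("deviation from analytical minimum:", abs(r_min - r_exact))
print("converged:", converged)
```

In a real study each stage would be far more elaborate (a full force field or quantum mechanical model, a robust minimiser or a simulation, and a statistical analysis), but the division of labour is the same.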

The book is organised so that some of the techniques discussed in later chapters refer to material discussed earlier, though I have tried to make each chapter as independent of the others as possible. Some readers may therefore be pleased to know that it is not essential to completely digest the chapters on quantum mechanics and molecular mechanics in order to read about methods for searching conformational space! Readers with experience in one or more areas may of course wish to be more selective.

I have tried to provide as much of the underlying theory as seems appropriate to enable the reader to understand the fundamentals of each method. In doing so I have assumed some background knowledge of quantum mechanics, statistical mechanics, conformational analysis and mathematics. A reader with an undergraduate degree in chemistry should have covered this material, which should also be familiar to many undergraduates in the final year of their degree course. Full discussions can be found in the suggestions for further reading at the end of each chapter. I have also attempted to provide a reasonable selection of original references, though in a book of this scope it is obviously impossible to provide a comprehensive coverage of the literature. In this context, I apologise in advance if any technique is inappropriately attributed.

In Chapter 1 we consider some of the historical background to molecular modelling and discuss a number of important general principles that are common to many modelling methods. We also examine the use of computer graphics, the Internet and the World-Wide Web and the molecular modelling literature. Chapter 1 concludes with a brief summary of some relevant mathematical concepts. Chapters 2 and 3 describe quantum mechanics and molecular mechanics, which are the two major methods used to model the interactions within a molecular system. These methods can be used to calculate the energy of a given arrangement of the atoms as well as certain other properties. In Chapters 4-8 we examine energy minimisation, molecular dynamics, Monte Carlo simulations and conformational analysis. These techniques use an appropriate energy model to determine a wide range of structural and thermodynamic properties. The final two chapters describe various techniques that combine concepts from previous chapters. In Chapter 8 we discuss the calculation of free energies using computer simulation, continuum solvent models, and methods for simulating chemical reactions. Chapter 9 is concerned with computational methods for discovering and designing new molecules, such as database searching, *de novo* design and quantitative structure-activity relationships.

The range of systems that can be considered in molecular modelling is extremely broad, from isolated molecules through simple atomic and molecular liquids to polymers, biological macromolecules such as proteins and DNA and solids. Many of the techniques are illustrated with examples chosen to reflect the breadth of applications. It is inevitable that for reasons of space some techniques must be dealt with in a rudimentary fashion (or not at all), and that many interesting and important applications cannot be described. Molecular modelling is a rapidly developing discipline, and has benefitted from the dramatic improvements in computer hardware and software of recent years. Calculations that were major undertakings only a few years ago can now be performed using personal computing facilities. Thus, examples used to indicate the 'state of the art' at the time of writing will invariably be routine within a short time.

- Preface to the second edition
- Preface to the first edition
- Symbols and physical constants
- Acknowledgements
- 1. USEFUL CONCEPTS IN MOLECULAR MODELLING
  - 1.1 Introduction
  - 1.2 Coordinate systems
  - 1.3 Potential energy surfaces
  - 1.4 Molecular graphics
  - 1.5 Surfaces
  - 1.6 Computer hardware and software
  - 1.7 Units of length and energy
  - 1.8 The molecular modelling literature
  - 1.9 The Internet
  - 1.10 Mathematical concepts
    - 1.10.1 Series expansions
    - 1.10.2 Vectors
    - 1.10.3 Matrices, eigenvectors and eigenvalues
    - 1.10.4 Complex numbers
    - 1.10.5 Lagrange multipliers
    - 1.10.6 Multiple integrals
    - 1.10.7 Some basic elements of statistics
    - 1.10.8 The Fourier series, Fourier transform and fast-Fourier transform
- 2. AN INTRODUCTION TO COMPUTATIONAL QUANTUM MECHANICS
  - 2.1 Introduction
    - 2.1.1 Operators
    - 2.1.2 Atomic units
    - 2.1.3 Exact solutions to the Schrödinger equation
  - 2.2 One-electron atoms
  - 2.3 Polyelectronic atoms and molecules
    - 2.3.1 The Born-Oppenheimer approximation
    - 2.3.2 The helium atom
    - 2.3.3 General polyelectronic systems and Slater determinants
  - 2.4 Molecular orbital calculations
    - 2.4.1 Calculating the energy from the wavefunction: the hydrogen molecule
    - 2.4.2 The energy of a general polyelectronic system
    - 2.4.3 Shorthand representations of the one- and two-electron integrals
    - 2.4.4 The energy of a closed-shell system
  - 2.5 The Hartree-Fock equations
    - 2.5.1 Hartree-Fock calculations for atoms and Slater's rules
    - 2.5.2 Linear combination of atomic orbitals (LCAO) in Hartree-Fock theory
    - 2.5.3 Closed-shell systems and the Roothaan-Hall equations
    - 2.5.4 Solving the Roothaan-Hall equations
    - 2.5.5 A simple illustration of the Roothaan-Hall approach
    - 2.5.6 Application of the Hartree-Fock equations to molecular systems
  - 2.6 Basis sets
    - 2.6.1 Creating a basis set
  - 2.7 Calculating molecular properties using ab initio quantum mechanics
    - 2.7.1 Setting up the calculation and the choice of coordinates
    - 2.7.2 Energies, Koopmans' theorem and ionisation potentials
    - 2.7.3 Calculation of electric multipoles
    - 2.7.4 The total electron density distribution and molecular orbitals
    - 2.7.5 Population analysis
    - 2.7.6 Mulliken and Löwdin population analysis
    - 2.7.7 Partitioning electron density: the theory of atoms in molecules
    - 2.7.8 Bond orders
    - 2.7.9 Electrostatic potentials
    - 2.7.10 Thermodynamic and structural properties
  - 2.8 Approximate molecular orbital theories
  - 2.9 Semi-empirical methods
    - 2.9.1 Zero-differential overlap
    - 2.9.2 CNDO
    - 2.9.3 INDO
    - 2.9.4 NDDO
    - 2.9.5 MINDO/3
    - 2.9.6 MNDO
    - 2.9.7 AM1
    - 2.9.8 PM3
    - 2.9.9 SAM1
    - 2.9.10 Programs for semi-empirical quantum mechanical calculations
  - 2.10 Hückel theory
    - 2.10.1 Extended Hückel theory
  - 2.11 Performance of semi-empirical methods
  - Appendix 2.1 Some Common Acronyms Used in Computational Quantum Chemistry
- 3. ADVANCED AB INITIO METHODS, DENSITY FUNCTIONAL THEORY AND SOLID-STATE QUANTUM MECHANICS
  - 3.1 Introduction
  - 3.2 Open-shell systems
  - 3.3 Electron correlation
    - 3.3.1 Configuration interaction
    - 3.3.2 Many body perturbation theory
  - 3.4 Practical considerations when performing ab initio calculations
    - 3.4.1 Convergence of self-consistent field calculations
    - 3.4.2 The direct SCF method
    - 3.4.3 Calculating derivatives of the energy
    - 3.4.4 Basis set superposition error
  - 3.5 Energy component analysis
    - 3.5.1 Morokuma analysis of the water dimer
  - 3.6 Valence bond theories
  - 3.7 Density functional theory
    - 3.7.1 Spin-polarised density functional theory
    - 3.7.2 The exchange-correlation functional
    - 3.7.3 Beyond the local density approximation: gradient-corrected functionals
    - 3.7.4 Hybrid Hartree-Fock/Density Functional Methods
    - 3.7.5 Performance and applications of density functional theory
  - 3.8 Quantum mechanical methods for studying the solid-state
    - 3.8.1 Introduction
    - 3.8.2 Band theory and orbital-based approaches
    - 3.8.3 The periodic Hartree-Fock approach to studying the solid state
    - 3.8.4 The nearly-free electron approximation
    - 3.8.5 The Fermi surface and density of states
    - 3.8.6 Density Functional Methods for studying the solid state: plane waves and pseudopotentials
    - 3.8.7 Application of solid-state quantum mechanics to the group 14 elements
  - 3.9 The future role of quantum mechanics: theory and experiment working together
  - Appendix 3.1 Alternative Expression for a Wavefunction Satisfying Bloch's Function
- 4. EMPIRICAL FORCE FIELD MODELS: MOLECULAR MECHANICS
  - 4.1 Introduction
    - 4.1.1 A simple molecular mechanics force field
  - 4.2 Some general features of molecular mechanics force fields
  - 4.3 Bond stretching
  - 4.4 Angle bending
  - 4.5 Torsional terms
  - 4.6 Improper torsions and out-of-plane bending motions
  - 4.7 Cross terms: Class 1, 2 and 3 force fields
  - 4.8 Introduction to non-bonded interactions
  - 4.9 Electrostatic interactions
    - 4.9.1 The central multipole expansion
    - 4.9.2 Point-charge electrostatic models
    - 4.9.3 Calculating partial atomic charges
    - 4.9.4 Charges derived from the molecular electrostatic potential
    - 4.9.5 Deriving charge models for large systems
    - 4.9.6 Rapid methods for calculating atomic charges
    - 4.9.7 Beyond partial atomic charge models
    - 4.9.8 Distributed multipole models
    - 4.9.9 Using charge schemes to study aromatic-aromatic interactions
    - 4.9.10 Polarisation
    - 4.9.11 Solvent dielectric models
  - 4.10 van der Waals interactions
    - 4.10.1 Dispersive interactions
    - 4.10.2 The repulsive contribution
    - 4.10.3 Modelling van der Waals interactions
    - 4.10.4 van der Waals interactions in polyatomic systems
    - 4.10.5 Reduced units
  - 4.11 Many-body effects in empirical potentials
  - 4.12 Effective pair potentials
  - 4.13 Hydrogen bonding in molecular mechanics
  - 4.14 Force field models for the simulation of liquid water
    - 4.14.1 Simple water models
    - 4.14.2 Polarisable water models
    - 4.14.3 Ab initio potentials for water
  - 4.15 United atom force fields and reduced representations
    - 4.15.1 Other simplified models
  - 4.16 Derivatives of the molecular mechanics energy function
  - 4.17 Calculating thermodynamic properties using a force field
  - 4.18 Force field parametrisation
  - 4.19 Transferability of force field parameters
  - 4.20 The treatment of delocalised π-systems
  - 4.21 Force fields for inorganic molecules
  - 4.22 Force fields for solid-state systems
    - 4.22.1 Covalent solids: zeolites
    - 4.22.2 Ionic solids
  - 4.23 Empirical potentials for metals and semiconductors
  - Appendix 4.1 The Interaction Between Two Drude Molecules
- 5. ENERGY MINIMISATION AND RELATED METHODS FOR EXPLORING THE ENERGY SURFACE
  - 5.1 Introduction
    - 5.1.1 Energy minimisation: statement of the problem
    - 5.1.2 Derivatives
  - 5.2 Non-derivative minimisation methods
    - 5.2.1 The simplex method
    - 5.2.2 The sequential univariate method
  - 5.3 Introduction to derivative minimisation methods
  - 5.4 First-order minimisation methods
    - 5.4.1 The steepest descents method
    - 5.4.2 Line search in one dimension
    - 5.4.3 Arbitrary step approach
    - 5.4.4 Conjugate gradients minimisation
  - 5.5 Second derivative methods: the Newton-Raphson method
    - 5.5.1 Variants on the Newton-Raphson method
  - 5.6 Quasi-Newton methods
  - 5.7 Which minimisation method should I use?
    - 5.7.1 Distinguishing between minima, maxima and saddle points
    - 5.7.2 Convergence criteria
  - 5.8 Applications of energy minimisation
    - 5.8.1 Normal mode analysis
    - 5.8.2 The study of intermolecular processes
  - 5.9 Determination of transition structures and reaction pathways
    - 5.9.1 Methods to locate saddle points
    - 5.9.2 Reaction path following
    - 5.9.3 Transition structures and reaction pathways for large systems
    - 5.9.4 The transition structures of pericyclic reactions
  - 5.10 Solid-state systems: lattice statics and lattice dynamics
- 6. COMPUTER SIMULATION METHODS
  - 6.1 Introduction
    - 6.1.1 Time averages, ensemble averages and some historical background
    - 6.1.2 A brief description of the molecular dynamics method
    - 6.1.3 The basic elements of the Monte Carlo method
    - 6.1.4 Differences between the molecular dynamics and Monte Carlo methods
  - 6.2 Calculation of simple thermodynamic properties
    - 6.2.1 Energy
    - 6.2.2 Heat capacity
    - 6.2.3 Pressure
    - 6.2.4 Temperature
    - 6.2.5 Radial distribution functions
  - 6.3 Phase space
  - 6.4 Practical aspects of computer simulation
    - 6.4.1 Setting up and running a simulation
    - 6.4.2 Choosing the initial configuration
  - 6.5 Boundaries
    - 6.5.1 Periodic boundary conditions
    - 6.5.2 Non-periodic boundary methods
  - 6.6 Monitoring the equilibration
  - 6.7 Truncating the potential and the minimum image convention
    - 6.7.1 Non-bonded neighbour lists
    - 6.7.2 Group-based cutoffs
    - 6.7.3 Problems with cutoffs and how to avoid them
  - 6.8 Long-range forces
    - 6.8.1 The Ewald summation method
    - 6.8.2 The reaction field and image charge methods
    - 6.8.3 The cell multipole method for non-bonded interactions
  - 6.9 Analysing the results of a simulation and estimating errors
  - Appendix 6.1 Basic Statistical Mechanics
  - Appendix 6.2 Heat Capacity and Energy Fluctuations
  - Appendix 6.3 The Real Gas Contribution to the Virial
  - Appendix 6.4 Translating Particle Back into Central Box
- 7. MOLECULAR DYNAMICS SIMULATION METHODS
  - 7.1 Introduction
  - 7.2 Molecular dynamics using simple models
  - 7.3 Molecular dynamics with continuous potentials
    - 7.3.1 Finite difference methods
    - 7.3.2 Predictor-corrector integration methods
    - 7.3.3 Which integration algorithm is most appropriate?
    - 7.3.4 Choosing the time step
    - 7.3.5 Multiple time step dynamics
  - 7.4 Setting up and running a molecular dynamics simulation
    - 7.4.1 Calculating the temperature
  - 7.5 Constraint dynamics
  - 7.6 Time-dependent properties
    - 7.6.1 Correlation functions
    - 7.6.2 Orientational correlation functions
    - 7.6.3 Transport properties
  - 7.7 Molecular dynamics at constant temperature and pressure
    - 7.7.1 Constant temperature dynamics
    - 7.7.2 Constant pressure dynamics
  - 7.8 Incorporating solvent effects into molecular dynamics: potentials of mean force and stochastic dynamics
    - 7.8.1 Practical aspects of stochastic dynamics simulations
  - 7.9 Conformational changes from molecular dynamics simulations
  - 7.10 Molecular dynamics simulations of chain amphiphiles
    - 7.10.1 Simulation of lipids
    - 7.10.2 Simulations of Langmuir-Blodgett films
    - 7.10.3 Mesoscale modelling: Dissipative Particle Dynamics
  - Appendix 7.1 Energy Conservation in Molecular Dynamics
- 8. MONTE CARLO SIMULATION METHODS
  - 8.1 Introduction
  - 8.2 Calculating properties by integration
  - 8.3 Some theoretical background to the Metropolis method
  - 8.4 Implementation of the Metropolis Monte Carlo method
    - 8.4.1 Random number generators
  - 8.5 Monte Carlo simulation of molecules
    - 8.5.1 Rigid molecules
    - 8.5.2 Monte Carlo simulations of flexible molecules
  - 8.6 Models used in Monte Carlo simulations of polymers
    - 8.6.1 Lattice models of polymers
    - 8.6.2 'Continuous' polymer models
  - 8.7 'Biased' Monte Carlo methods
  - 8.8 Tackling the problem of quasi-ergodicity: J-walking and multicanonical Monte Carlo
    - 8.8.1 J-walking
    - 8.8.2 The multicanonical Monte Carlo method
  - 8.9 Monte Carlo sampling from different ensembles
    - 8.9.1 Grand canonical Monte Carlo simulations
    - 8.9.2 Grand canonical Monte Carlo simulations of adsorption processes
  - 8.10 Calculating the chemical potential
  - 8.11 The configurational bias Monte Carlo method
    - 8.11.1 Applications of the configurational bias Monte Carlo method
  - 8.12 Simulating phase equilibria by the Gibbs ensemble Monte Carlo method
  - 8.13 Monte Carlo or molecular dynamics?
  - Appendix 8.1 The Marsaglia Random Number Generator
- 9. CONFORMATIONAL ANALYSIS
  - 9.1 Introduction
  - 9.2 Systematic methods for exploring conformational space
  - 9.3 Model-building approaches
  - 9.4 Random search methods
  - 9.5 Distance geometry
    - 9.5.1 The use of distance geometry in NMR
  - 9.6 Exploring conformational space using simulation methods
  - 9.7 Which conformational search method should I use? A comparison of different approaches
  - 9.8 Variations upon the standard methods
    - 9.8.1 The systematic unbounded multiple minimum method (SUMM)
    - 9.8.2 Low-Mode Search
  - 9.9 Finding the global energy minimum: Evolutionary algorithms and simulated annealing
    - 9.9.1 Genetic and evolutionary algorithms
    - 9.9.2 Simulated annealing
  - 9.10 Solving protein structures using restrained molecular dynamics and simulated annealing
    - 9.10.1 X-ray crystallographic refinement
    - 9.10.2 Molecular dynamics refinement of NMR data
    - 9.10.3 Time-averaged NMR refinement
  - 9.11 Structural databases
  - 9.12 Molecular fitting
  - 9.13 Clustering algorithms and pattern recognition techniques
  - 9.14 Reducing the dimensionality of a data set
    - 9.14.1 Principal components analysis
  - 9.15 Covering conformational space: poling
  - 9.16 A "classic" optimisation problem: predicting crystal structures
- 10. PROTEIN STRUCTURE PREDICTION, SEQUENCE ANALYSIS AND PROTEIN FOLDING
  - 10.1 Introduction
  - 10.2 Some basic principles of protein structure
    - 10.2.1 The hydrophobic effect
  - 10.3 First-principles methods for predicting protein structure
    - 10.3.1 Lattice models for investigating protein structure
    - 10.3.2 Rule-based approaches using secondary structure prediction
  - 10.4 Introduction to comparative modelling
  - 10.5 Sequence alignment
    - 10.5.1 Dynamic programming and the Needleman-Wunsch algorithm
    - 10.5.2 The Smith-Waterman algorithm
    - 10.5.3 Heuristic search methods: FASTA and BLAST
    - 10.5.4 Multiple sequence alignment
    - 10.5.5 Protein structure alignment and structural databases
  - 10.6 Constructing and evaluating a comparative model
  - 10.7 Predicting protein structures by 'threading'
  - 10.8 A comparison of protein structure prediction methods: CASP
    - 10.8.1 Automated protein modelling
  - 10.9 Protein folding and unfolding
  - Appendix 10.1 Some Common Abbreviations and Acronyms Used in Bioinformatics
  - Appendix 10.2 Some of the Most Common Sequence and Structural Databases Used in Bioinformatics
  - Appendix 10.3 Mutation Probability Matrix for 1 PAM
  - Appendix 10.4 Mutation Probability Matrix for 250 PAM
- 11. FOUR CHALLENGES IN MOLECULAR MODELLING: FREE ENERGIES, SOLVATION, REACTIONS AND SOLID-STATE DEFECTS
  - 11.1 Free energy calculations
    - 11.1.1 The difficulty of calculating free energies by computer
  - 11.2 The calculation of free energy differences
    - 11.2.1 Thermodynamic perturbation
    - 11.2.2 Implementation of free energy perturbation
    - 11.2.3 Thermodynamic integration
    - 11.2.4 The 'slow growth' method
  - 11.3 Applications of methods for calculating free energy differences
    - 11.3.1 Thermodynamic cycles
    - 11.3.2 Applications of the thermodynamic cycle perturbation method
    - 11.3.3 The calculation of absolute free energies
  - 11.4 The calculation of enthalpy and entropy differences
  - 11.5 Partitioning the free energy
  - 11.6 Potential pitfalls with free energy calculations
    - 11.6.1 Implementation aspects
  - 11.7 Potentials of mean force
    - 11.7.1 Umbrella sampling
    - 11.7.2 Calculating the potential of mean force for flexible molecules
  - 11.8 Approximate/"rapid" free energy methods
  - 11.9 Continuum representations of the solvent
    - 11.9.1 Thermodynamic background
  - 11.10 The electrostatic contribution to the free energy of solvation: the Born and Onsager models
    - 11.10.1 Calculating the electrostatic contribution via quantum mechanics
    - 11.10.2 Continuum models for molecular mechanics
    - 11.10.3 The Langevin dipole model
    - 11.10.4 Methods based upon the Poisson-Boltzmann equation
    - 11.10.5 Applications of finite difference Poisson-Boltzmann calculations
  - 11.11 Non-electrostatic contributions to the solvation free energy
  - 11.12 Very simple solvation models
  - 11.13 Modelling chemical reactions
    - 11.13.1 Empirical approaches to simulating reactions
    - 11.13.2 The potential of mean force of a reaction
    - 11.13.3 Combined quantum mechanical/molecular mechanical approaches
    - 11.13.4 Ab initio molecular dynamics and the Car-Parrinello method
    - 11.13.5 Examples of ab initio molecular dynamics simulations
  - 11.14 Modelling solid-state defects
    - 11.14.1 Defect studies of the high-Tc superconductor YBa2Cu3O7-x
  - Appendix 11.1 Calculating Free Energy Differences Using Thermodynamic Integration
  - Appendix 11.2 Using the Slow Growth Method for Calculating Free Energy Differences
  - Appendix 11.3 Expansion of Zwanzig Expression for the Free Energy Difference for the Linear Response Method
- 12. THE USE OF MOLECULAR MODELLING AND CHEMOINFORMATICS TO DISCOVER AND DESIGN NEW MOLECULES
  - 12.1 Molecular modelling in drug discovery
  - 12.2 Computer representations of molecules, chemical databases and 2D substructure searching
  - 12.3 3D database searching
  - 12.4 Deriving and using three-dimensional pharmacophores
    - 12.4.1 Constrained systematic search
    - 12.4.2 Ensemble distance geometry, ensemble molecular dynamics and genetic algorithms
    - 12.4.3 Clique detection methods for finding pharmacophores
    - 12.4.4 Maximum likelihood method
    - 12.4.5 Incorporating additional geometric features into a 3D pharmacophore
  - 12.5 Sources of data for 3D databases
  - 12.6 Molecular docking
    - 12.6.1 Scoring functions for molecular docking
  - 12.7 Applications of 3D database searching and docking
  - 12.8 Molecular similarity and similarity searching
  - 12.9 Molecular Descriptors
    - 12.9.1 Partition coefficients
    - 12.9.2 Molar refractivity
    - 12.9.3 Topological indices
    - 12.9.4 Pharmacophore keys
    - 12.9.5 Calculating the similarity
    - 12.9.6 Similarity based on 3D properties
  - 12.10 Selecting "diverse" sets of compounds
    - 12.10.1 Data manipulation
    - 12.10.2 Selection of diverse sets using cluster analysis
    - 12.10.3 Dissimilarity-based selection methods
    - 12.10.4 Partition-based methods for compound selection
  - 12.11 Structure-based de novo ligand design
    - 12.11.1 Locating favourable positions of molecular fragments within a binding site
    - 12.11.2 Connecting molecular fragments in a binding site
    - 12.11.3 Structure-based design methods to design HIV-1 protease inhibitors
    - 12.11.4 Structure-based design of templates for zeolite synthesis
  - 12.12 Quantitative structure-activity relationships
    - 12.12.1 Selecting the compounds for a QSAR analysis
    - 12.12.2 Deriving the QSAR equation
    - 12.12.3 Cross-validation
    - 12.12.4 Interpreting a QSAR equation
    - 12.12.5 Alternatives to multiple linear regression: discriminant analysis, neural networks and classification methods
    - 12.12.6 Principal Components Regression
  - 12.13 Partial least squares
    - 12.13.1 Partial least squares and molecular field analysis
  - 12.14 Combinatorial libraries
    - 12.14.1 The design of "drug-like" libraries
    - 12.14.2 Library enumeration
    - 12.14.3 Combinatorial subset selection
    - 12.14.4 The future
- INDEX

Colour figures, listed by figure number:

- 1.4
Some of the common molecular graphics representations of molecules,
illustrated using the crystal structure of nicotinamide adenine dinucleotide phosphate
(NADPH) [Reddy et al 1981]. Clockwise, from top left: stick, CPK/space filling,
'ball and stick' and 'tube'. Image generated using InsightII.
- 1.5
Graphical representations of proteins illustrated using the enzyme
dihydrofolate reductase [Bolin et al 1982]. Clockwise from top left: stick, CPK,
'cartoon' and 'ribbon'. Image generated using InsightII.
- 1.7
Graphical representations of the molecular surface of tryptophan.
Clockwise from top left: dots, opaque solid, mesh, translucent solid. Image generated using InsightII.
- 2.11
Surface representation
of electron density around formamide at a contour of 0.0001 au (electrons/bohr³). Image generated using Spartan.
- 2.12
HOMO of formamide. The red contour indicates a
negative part of the wavefunction
and blue a positive part of the wavefunction.
The formamide molecule is oriented with the oxygen
atom on the left pointing towards the viewer as in Figure 2.17. Image generated using Spartan.
- 2.13
LUMO of formamide. Image generated using Spartan.
- 2.18
Electrostatic potential mapped onto the electron density surface of
formamide. The orientation of the molecule is as in Figure 2.17.
Red indicates negative electrostatic potential and blue is positive potential. Image generated using Spartan.
- 5.36
The zeolite NU-87. Image generated using InsightII.
- 7.21
Snapshot from molecular dynamics simulation of a solvated lipid bilayer
[Robinson et al 1995]. The disorder
of the alkyl chains can be clearly seen. Image generated using InsightII with
data from Alan Robinson.
- 7.24
Graphical representation of final configurations obtained from dissipative particle dynamics simulations on block copolymers. Figures redrawn from [Groot and Madden 1998]. Images generated using Cerius2.
  - 7.24(a) shows the lamellar phase obtained for the A5B5 system.
  - 7.24(b) shows the hexagonal phase from A3B7.
  - 7.24(c) shows the body-centred-cubic phase for A2B8.
- 8.21
Final configuration obtained from a configurational bias Monte Carlo
simulation of thioalkanes adsorbed on a gold surface [Siepmann and MacDonald
1993a]. The system contains 224 molecules which are colour coded according to the
number of gauche defects, with red chains being all trans, yellow chains containing
three gauche bonds and green chains containing five gauche bonds. Data
and figure supplied by J. Ilja Siepmann.
- 9.18
Twelve conformations of the chemokine RANTES generated from NMR data
using distance geometry.
[Chung et al 1995]. Image generated using InsightII.
- 9.23
Fitting a polypeptide chain to the electron density when
determining the structure of a protein using X-ray crystallography.
The figure shows part of the structure of rat ADP-ribosylation
factor-1 (ARF-1) [Greasley et al 1995]. Image generated using Quanta.
- 9.27
Distribution of hydroxyl groups around thiazole ring systems as extracted from the Cambridge Structural Database, illustrating the greater
propensity of the nitrogen atom to act as a hydrogen-bond acceptor. Image generated using InsightII with data from Isostar/Cambridge Structural Database.
- 10.9
Trypsin [Turk et al 1991],
chymotrypsin [Birktoft and Blow 1972] and thrombin [Turk et al
1992] have similar three-dimensional structures. Image generated using InsightII.
- 10.11
A superposition of the aspartic acid, histidine and serine
amino acids in the active sites of trypsin
(yellow), chymotrypsin (red) and thrombin (green). Image generated using InsightII.
- 11.29
3D Electrostatic isopotential contours around trypsin [Marquart et al 1983].
Contours are drawn at -1kT (red) and +1kT (blue).
The trypsin inhibitor is also
shown with its electrostatic potential mapped onto the molecular surface.
Figure generated using GRASP.
- 11.30
Electrostatic potential around Cu-Zn superoxide dismutase [McRee et al
1990]. Red contours indicate negative electrostatic potential and blue contours indicate
positive electrostatic potential. Two active sites are present in the dimer, and
are located at the top left and bottom right of the Figure where there is
a significant concentration of positive electrostatic potential. Figure generated
using GRASP.
- 11.12
Surface complementarity of streptavidin (purple) and biotin (white)
[Freitag et al 1997]. Image generated using InsightII.
12.16 3D pharmacophore derived for a series of molecules with activity at the 5HT3 receptor. The spheres indicate location constraints where an appropriate pharmacophore group should be located (red: positively ionisable, green: hydrogen-bond acceptor, blue: hydrophobic region).

- 12.16(a) A very active molecule, JMC-35-903-10, superimposed on the pharmacophore.
- 12.16(b)
A much less potent molecule, 2Me-5HT. The inactive molecule is not able to match all of the points in the pharmacophore in a low-energy conformation. Images generated using Catalyst.
- 12.32
The result of a GRID calculation using carboxylate and amidine probes
in the binding site of neuraminidase.
The regions of minimum energy are contoured
(carboxylate red; amidine blue).
Also shown is the inhibitor 4-guanidino-Neu5Ac2en
which contains these two functional groups [von Itzstein et al 1993]. Image generated using InsightII.
- 12.34
The HIV-1 protease with the inhibitor CGP53820 bound [Priestle et al 1995].
The water molecule that hydrogen bonds to the inhibitor and to the 'flaps'
is drawn as a white sphere, and the catalytic aspartate groups of the enzyme
are also represented. Image generated using InsightII.
- 12.41 Contour representation of key features from a CoMFA analysis of a series of coumarin substrates and inhibitors of cytochrome P4502A5 [Poso et al 1995]. The red and blue regions indicate positions where it would be favourable and unfavourable respectively to place a negative charge and the green/yellow regions where it would be favourable/unfavourable to locate steric bulk. Image generated using Sybyl.

AM1 | Austin Model 1 |

AO | Atomic Orbital |

BSSE | Basis-Set Superposition Error |

CI | Configuration Interaction |

CIS | Configuration Interaction Singles |

CISD | Configuration Interaction Singles and Doubles |

CNDO | Complete Neglect of Differential Overlap |

DFT | Density Functional Theory |

DIIS | Direct Inversion in the Iterative Subspace |

DZP | Double Zeta with Polarisation |

DZ | Double Zeta |

EHT | Extended Hückel Theory |

GVB | Generalised Valence Bond model |

HF | Hartree-Fock |

HOMO | Highest Occupied Molecular Orbital |

INDO | Intermediate Neglect of Differential Overlap |

LCAO | Linear Combination of Atomic Orbitals |

LUMO | Lowest Unoccupied Molecular Orbital |

MBPT | Many-body Perturbation Theory |

MINDO/3 | Modified INDO version 3 |

MNDO | Modified Neglect of Diatomic Overlap |

MO | Molecular Orbital |

MP2, MP3 etc | Møller-Plesset theory at second order, third order etc. |

NDDO | Neglect of Diatomic Differential Overlap |

PM3 | Parameterisation 3 of MNDO |

QCISD | Quadratic Configuration Interaction Singles and Doubles |

RHF | Restricted Hartree Fock |

SAM1 | Semi-Ab-initio Model 1 |

SCF | Self-Consistent Field |

STO | Slater Type Orbital |

STO-3G, STO-4G, etc. | Minimal basis sets in which 3, 4 etc, Gaussian functions are used to represent the atomic orbitals on an atom |

UHF | Unrestricted Hartree Fock |

ZDO | Zero Differential Overlap |

CASSCF | Complete Active Space Self-Consistent Field |

QCISD(T) | Configuration interaction method involving single, double and quadratic excitations with an estimated triple excitation |

LSDFT | Local Spin Density Functional Theory |

LDA | Local Density Approximation |

BLYP | Becke-Lee-Yang-Parr gradient-corrected functional for use with density functional theory |

VWN | Correlation functional due to Vosko, Wilk and Nusair |

B3LYP | Scheme for hybrid Hartree-Fock/Density functional theory introduced by Becke |

Appendix 10.1 Some common abbreviations and acronyms used in bioinformatics

A, G, C, T (U) | Adenine, Guanine, Cytosine, Thymine - the four bases present in DNA. Uracil replaces thymine in RNA |

Bp | Base pair |

cDNA | Complementary DNA, synthesised from messenger RNA |

Chromosome | Discrete unit of the genome consisting of a single molecule of DNA that carries many genes. |

Clone | Genetically identical copy (of a gene, cell or organism) |

Codon | Sequence of three nucleotides that codes for a single amino acid (or a termination signal) |

Contig | A group of pieces of DNA, derived from a cloning experiment (often a series of ESTs, see below), that represent overlapping regions of a chromosome. |

Deletion | One or more nucleotides that are not copied during DNA replication |

DNA | Deoxyribonucleic acid |

Domain | Sequence of polypeptide chain that can independently fold into a stable three-dimensional structure |

Dynamic programming | Technique widely used in sequence alignment |

EST | Expressed Sequence Tag. An EST is a partial sequence (typically less than 400 bases) selected from cDNA and used to identify genes expressed in a particular tissue. |

Eukaryote | Organism whose cells have a discrete nucleus and other subcellular compartments (cf. prokaryote) |

Exon | Translated sequence of DNA |

Gap | A break in DNA or protein sequence which enables two or more sequences to be aligned |

Gene | A sequence of DNA at a particular position on a specific chromosome that encodes a precise functional product (usually protein) |

Genome | All of the genetic material in the chromosomes of an organism |

Indel | Insertion or deletion required to optimise sequence alignment |

Intron | Non-translated sequence of DNA |

Kb | Kilobase - one thousand nucleotide bases |

ktup | k-tuple. Parameter used in FASTA and FASTP sequence alignment methods |

Mbp | Megabase - one million nucleotide bases |

mRNA | Messenger RNA |

Mutation | A change in the DNA sequence |

Nucleotide | Three components that make up the basic building block in DNA and RNA: a nitrogenous base (A, T, G, C, U), a phosphate and a sugar |

Oligonucleotide | A molecule composed of a small number of nucleotides |

Orthologue | Homologous proteins that perform the same function within different organisms |

ORF | Open Reading Frame - region of DNA that is transcribed into RNA. Delineated by an initiator codon at one end and a stop codon at the other end. |

PAM | Point Accepted Mutation per 100 residues |

Paralogue | Homologous proteins that perform different but related functions within one organism |

PCR | Polymerase Chain Reaction. Widely used method for amplifying a DNA base sequence |

Polymorphism | Differences in DNA sequence among individuals |

Prokaryote | Organism lacking a nucleus and subcellular compartments (cf. eukaryote). Includes bacteria and archaea |

RNA | Ribonucleic acid |

SNP | Single Nucleotide Polymorphism - single base-pair variations in DNA |

STS | Sequence tagged site. A short DNA sequence that occurs just once in the human genome and whose location and base sequence are known. |

Transcription | First step in gene expression, corresponding to the generation of mRNA from the original DNA |

Translation | Second step in gene expression, the synthesis of proteins from mRNA |

tRNA | Transfer RNA |

Appendix 10.2. Some of the most common sequence and structural databases used in bioinformatics

GenBank (NCBI, USA)
EMBL Nucleotide Sequence Database (Europe)
DDBJ (Japan) | The three main nucleotide sequence databases which are synchronised daily |

PIR-International Protein Sequence Database | Redundant protein sequence database |

Swiss-Prot, TrEMBL | Annotated non-redundant protein sequence database. TrEMBL is a computer-annotated supplement to Swiss-Prot. TrEMBL contains the translations of all coding sequences present in the EMBL Nucleotide Sequence Database which are not yet integrated into Swiss-Prot |

GenPept | Compendium of amino acid translations derived from GenBank |

PDB, NRL3D | Protein Data Bank - protein structures (mostly from X-ray crystallography). NRL3D is a derived sequence database in PIR format. |

SCOP | Structural Classification of Proteins. Hierarchical protein structure database |

CATH, FSSP | Sequence-structure classification databases |

Prosite | Motif database. |

References for colour figures

Reddy B S, W Saenger, K Muehlegger and G Weimann 1981. Crystal and molecular structure of the lithium salt of nicotinamide adenine dinucleotide dihydrate NAD, DPN, cozymase, codehydrase I. Journal of the American Chemical Society 103:907-914.

Bolin J T, D J Filman, D A Matthews, R C Hamlin and J Kraut 1982. Crystal Structures of Escherichia coli and Lactobacillus casei Dihydrofolate Reductase Refined at 1.7 Angstroms Resolution. I. Features and Binding of Methotrexate. Journal of Biological Chemistry 257:13650-13662.

Robinson A J, W G Richards, P J Thomas and M M Hann 1994. Head Group and Chain Behaviour in Biological Membranes-A Molecular Dynamics Simulation. Biophysical Journal 67:2345-2354.

Groot R D and T J Madden 1998. Dynamics simulation of diblock copolymer microphase separation. The Journal of Chemical Physics 108:8713-8724.

Siepmann J I and I R McDonald 1993b. Monte Carlo Study of the Properties of Self-Assembled Monolayers Formed by Adsorption of CH3(CH2)15SH on the (111) Surface of Gold. Molecular Physics 79:457-473.

Chung C-W, R M Cooke, A E I Proudfoot and T N C Wells 1995. The Three-Dimensional Structure of RANTES. Biochemistry 34:9307-9314.

Greasley S E, H Jhoti, C Teahan, R Solari, A Fensom, G M H Thomas, S Cockroft and B Bax 1995. The Structure of Rat ADP-Ribosylation Factor-1 (ARF-1) Complexed to GDP Determined from Two Different Crystal Forms. Nature Structural Biology 2:797-806.

Turk D, J Sturzebecher and W Bode 1991. Geometry of Binding of the N-Alpha-Tosylated Piperidides of meta-Amidino-Phenylalanine, para-Amidino-Phenylalanine and para-Guanidino-Phenylalanine to Thrombin and Trypsin-X-ray Crystal Structures of Their Trypsin Complexes and Modeling of their Thrombin Complexes. FEBS Letters 287:133-138.

Birktoft J J and D M Blow 1972. The Structure of Crystalline Alpha-Chymotrypsin V. The Atomic Structure of Tosyl-Alpha-Chymotrypsin at 2 Angstroms Resolution. Journal of Molecular Biology 68:187-240.

Turk D, H W Hoeffken, D Grosse, J Stuerzebecher, P D Martin, B F P Edwards and W Bode 1992. Refined 2.3 Angstroms X-Ray Crystal Structure of Bovine Thrombin Complexes Formed with the 3 Benzamidine and Arginine-Based Thrombin Inhibitors NAPAP, 4-TAPAP and MQPA: A Starting Point for Improving Antithrombotics. Journal of Molecular Biology 226:1085-1099.

Bruno I J, J C Cole, J P M Lommerse, R S Rowland, R Taylor and M L Verdonk 1997. Isostar: a library of information about nonbonded interactions. The Journal of Computer-Aided Molecular Design 11:525-537.

Marquart M, J Walter, J Deisenhofer, W Bode and R Huber 1983. The Geometry of the Reactive Site and of the Peptide Groups in Trypsin, Trypsinogen and its Complexes with Inhibitors. Acta Crystallographica B39:480-490.

McRee D E, S M Redford, E D Getzoff, J R Lepock, R A Hallewell and J A Tainer 1990. Changes in Crystallographic Structure and Thermostability of a Cu, Zn Superoxide Dismutase Mutant Resulting from the Removal of Buried Cysteine. Journal of Biological Chemistry 265:14234-14241.

Freitag S, I Le Trong, P S Stayton and R E Stenkamp 1997. Structural Studies of the Streptavidin Binding Loop. Protein Science 6:1157-

Priestle J P, A Fassler, J Rosel, M Tintelnog-Blomley, P Strop and M G Gruetter 1995. Comparative Analysis of the X-Ray Structures of HIV-1 and HIV-2 Proteases in Complex with a Novel Pseudosymmetric Inhibitor. Structure London 3:381-389.

Poso A, R Juvonen and J Gynther 1995. Comparative molecular field analysis of compounds with CYP2A5 binding affinity. Quantitative Structure-Activity Relationships 14:507-511.

Von Itzstein M, W Y Wu, G B Kok, M S Pegg, J C Dyason, B Jin, T V Phan, M L Smythe, H F Whites, S W Oliver, P M Colman, J N Varghese, D M Ryan, J M Woods, R C Bethell, V J Hotham, J M Cameron and C R Penn 1993. Rational Design of Potent Sialidase-Based Inhibitors of Influenza-Virus Replication. Nature 363:418-423.
