Molecular Modelling: Principles and Applications, second edition

Andrew R. Leach

Published by Pearson Education EMA, January 2001

This book provides a detailed description of the techniques employed in molecular modelling and computational chemistry. The first part of the book covers the two major methods used to describe the interactions within a system (quantum mechanics and molecular mechanics). The second part then deals with techniques that use such energy models, including energy minimisation, molecular dynamics, Monte Carlo simulations and conformational analysis. The author also discusses the use of more advanced modelling techniques such as the calculation of free energies and the simulation of chemical reactions. In addition he considers aspects of both chemoinformatics and bioinformatics and techniques that can be used to design new molecules with specific properties. Many of the topics are treated in considerable depth, but the reader is assumed to have only a basic knowledge of the relevant physical and chemical principles.

Most of the theoretical sections are accompanied by simple calculations together with examples drawn from the literature. The book is well illustrated and a colour plate section highlights the impact of computer molecular graphics. The book will prove a valuable text for postgraduate students and professionals and many sections will be useful to final-year undergraduates taking courses in molecular modelling or computational chemistry.

Preface to the second edition

The impetus for this second edition is a desire to include some of the new techniques that have emerged in recent years and also extend the scope of the book to cover certain areas that were under-represented (even neglected) in the first edition. In this second volume there are three topics that fall into the first category (density functional theory, bioinformatics/protein structure analysis and chemoinformatics) and one main area in the second category (modelling of the solid-state). In addition, of course, a new edition provides an opportunity to take a critical view of the text and to re-organise and update the material. Thus whilst much remains from the first edition, and this second book follows much the same path through the subject, readers familiar with the first edition will find some changes which I hope they will agree are for the better.

As with the first edition we initially consider quantum mechanics, but this is now split into two chapters. Thus Chapter 2 provides an introduction to the ab initio and semi-empirical approaches together with some examples of the uses of quantum mechanics. Chapter 3 covers more advanced aspects of the ab initio approach, density functional theory and the particular problems of the solid-state. Molecular mechanics is the subject of Chapter 4 and then in Chapter 5 we consider energy minimisation and other "static" techniques. Chapters 6, 7 and 8 deal with the two main simulation methods (molecular dynamics and Monte Carlo). Chapter 9 is devoted to the conformational analysis of "small" molecules but also includes some topics (e.g. cluster analysis, principal components analysis) that are widely used in informatics. In Chapter 10 the problems of protein structure prediction and protein folding are considered; this chapter also contains an introduction to some of the more widely used methods in bioinformatics. In Chapter 11 we draw upon material from the previous chapters in a discussion of free energy calculations, continuum solvent models, methods for simulating chemical reactions and defects in solids. Finally, Chapter 12 is concerned with modelling and chemoinformatics techniques for discovering and designing new molecules, including database searching, docking, de novo design, quantitative structure-activity relationships and combinatorial library design.

As in the first edition, the inexorable pace of change means that what is currently considered "cutting edge" will soon become routine. The examples are thus chosen primarily because they illuminate the underlying theory rather than because they are the first application of a particular technique or are the most recent available. In a similar vein, it is impossible in a volume such as this to even attempt to cover everything and so there are undoubtedly areas which are under-represented. This is not intended to be a definitive historical account nor a review of the current state-of-the-art. Thus, whilst I have tried to include many literature references it is possible that the invention of some technique may appear to be incorrectly attributed or a "classic" application may be missing. A general guiding principle has been to focus on those techniques that are in widespread use rather than those which are the province of one particular research group. Despite these caveats I hope that the coverage is sufficient to provide a solid introduction to the main areas and also that those readers who are "experts" will find something new to interest them.

Preface to the first edition

Molecular modelling used to be restricted to a small number of scientists who had access to the necessary computer hardware and software. Its practitioners wrote their own programs, managed their own computer systems and mended them when they broke down. Today's computer workstations are much more powerful than the mainframe computers of even a few years ago and can be purchased relatively cheaply. It is no longer necessary for the modeller to write computer programs as software can be obtained from commercial software companies and academic laboratories. Molecular modelling can now be performed in any laboratory or classroom.

This book is intended to provide an introduction to some of the techniques used in molecular modelling and computational chemistry, and to illustrate how these techniques can be used to study physical, chemical and biological phenomena. A major objective is to provide, in one volume, some of the theoretical background to the vast array of methods available to the molecular modeller. I also hope that the book will help the reader to select the most appropriate method for a problem and so make the most of his or her modelling hardware and software. Many modelling programs are extremely simple to use and are often supplied with seductive graphical interfaces, which obviously help to make modelling techniques more accessible, but it can also be very easy to select a wholly inappropriate technique or method.

Most molecular modelling studies involve three stages. In the first stage a model is selected to describe the intra- and inter- molecular interactions in the system. The two most common models that are used in molecular modelling are quantum mechanics and molecular mechanics. These models enable the energy of any arrangement of the atoms and molecules in the system to be calculated, and allow the modeller to determine how the energy of the system varies as the positions of the atoms and molecules change. The second stage of a molecular modelling study is the calculation itself, such as an energy minimisation, a molecular dynamics or Monte Carlo simulation, or a conformational search. Finally, the calculation must be analysed, not only to calculate properties but also to check that it has been performed properly.
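The three stages can be illustrated with a deliberately tiny example. The sketch below is not taken from the book: it assumes a Lennard-Jones pair potential in reduced units as the energy model, a fixed-step steepest descents minimisation as the calculation, and a comparison with the known analytical minimum as the analysis. The function names and the step size are illustrative choices only.

```python
# Stage 1: select an energy model.
# Assumption: two atoms interacting through a Lennard-Jones potential,
# in reduced units (epsilon = sigma = 1).
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Energy of the pair at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """First derivative of the energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


# Stage 2: perform the calculation.
# Here: a fixed-step steepest descents minimisation of the separation.
def minimise(r_start, step=0.01, tolerance=1e-8, max_iterations=100_000):
    r = r_start
    for _ in range(max_iterations):
        gradient = lj_gradient(r)
        if abs(gradient) < tolerance:
            break
        r -= step * gradient  # move downhill along the negative gradient
    return r


# Stage 3: analyse the result and check that the calculation behaved properly.
r_min = minimise(1.5)
r_exact = 2.0 ** (1.0 / 6.0)  # analytical minimum of the Lennard-Jones potential

print(f"minimised separation : {r_min:.6f}")
print(f"analytical minimum   : {r_exact:.6f}")
print(f"energy at the minimum: {lj_energy(r_min):.6f}")
print(f"residual gradient    : {lj_gradient(r_min):.2e}")
```

In a real study the energy model would be a quantum mechanical method or a full force field and the calculation might be a simulation rather than a minimisation, but the same pattern of model, calculation and analysis applies.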

The book is organised so that some of the techniques discussed in later chapters refer to material discussed earlier, though I have tried to make each chapter as independent of the others as possible. Some readers may therefore be pleased to know that it is not essential to completely digest the chapters on quantum mechanics and molecular mechanics in order to read about methods for searching conformational space! Readers with experience in one or more areas may of course wish to be more selective.

I have tried to provide as much of the underlying theory as seems appropriate to enable the reader to understand the fundamentals of each method. In doing so I have assumed some background knowledge of quantum mechanics, statistical mechanics, conformational analysis and mathematics. A reader with an undergraduate degree in chemistry should have covered this material, which should also be familiar to many undergraduates in the final year of their degree course. Full discussions can be found in the suggestions for further reading at the end of each chapter. I have also attempted to provide a reasonable selection of original references, though in a book of this scope it is obviously impossible to provide a comprehensive coverage of the literature. In this context, I apologise in advance if any technique is inappropriately attributed.

In Chapter 1 we consider some of the historical background to molecular modelling and discuss a number of important general principles that are common to many modelling methods. We also examine the use of computer graphics, the Internet and the World-Wide Web, and the molecular modelling literature. Chapter 1 concludes with a brief summary of some relevant mathematical concepts. Chapters 2 and 3 describe quantum mechanics and molecular mechanics, which are the two major methods used to model the interactions within a molecular system. These methods can be used to calculate the energy of a given arrangement of the atoms as well as certain other properties. In Chapters 4-8 we examine energy minimisation, molecular dynamics, Monte Carlo simulations and conformational analysis. These techniques use an appropriate energy model to determine a wide range of structural and thermodynamic properties. The final two chapters describe various techniques that combine concepts from previous chapters. In Chapter 8 we discuss the calculation of free energies using computer simulation, continuum solvent models, and methods for simulating chemical reactions. Chapter 9 is concerned with computational methods for discovering and designing new molecules, such as database searching, de novo design and quantitative structure-activity relationships.

The range of systems that can be considered in molecular modelling is extremely broad, from isolated molecules through simple atomic and molecular liquids to polymers, biological macromolecules such as proteins and DNA and solids. Many of the techniques are illustrated with examples chosen to reflect the breadth of applications. It is inevitable that for reasons of space some techniques must be dealt with in a rudimentary fashion (or not at all), and that many interesting and important applications cannot be described. Molecular modelling is a rapidly developing discipline, and has benefitted from the dramatic improvements in computer hardware and software of recent years. Calculations that were major undertakings only a few years ago can now be performed using personal computing facilities. Thus, examples used to indicate the 'state of the art' at the time of writing will invariably be routine within a short time.


Preface to the second edition 

Preface to the first edition

Symbols and physical constants 



1.1 Introduction 

1.2 Coordinate systems 

1.3 Potential energy surfaces 

1.4 Molecular graphics 

1.5 Surfaces 

1.6 Computer hardware and software 

1.7 Units of length and energy 

1.8 The molecular modelling literature 

1.9 The Internet 

1.10 Mathematical concepts 

  1.10.1 Series expansions 

  1.10.2 Vectors 

  1.10.3 Matrices, eigenvectors and eigenvalues 

  1.10.4 Complex numbers 

  1.10.5 Lagrange multipliers 

  1.10.6 Multiple integrals 

  1.10.7 Some basic elements of statistics 

  1.10.8 The Fourier series, Fourier transform and fast-Fourier transform 


2.1 Introduction 

  2.1.1 Operators 

  2.1.2 Atomic units 

  2.1.3 Exact solutions to the Schrödinger equation 

2.2 One-electron atoms 

2.3 Polyelectronic atoms and molecules 

  2.3.1 The Born-Oppenheimer approximation 

  2.3.2 The helium atom 

  2.3.3 General polyelectronic systems and Slater determinants 

2.4 Molecular orbital calculations 

  2.4.1 Calculating the energy from the wavefunction: the hydrogen molecule 

  2.4.2 The energy of a general polyelectronic system 

  2.4.3 Shorthand representations of the one- and two-electron integrals 

  2.4.4 The energy of a closed-shell system 

2.5 The Hartree-Fock equations 

  2.5.1 Hartree-Fock calculations for atoms and Slater's rules 

  2.5.2 Linear combination of atomic orbitals (LCAO) in Hartree-Fock theory 

  2.5.3 Closed-shell systems and the Roothaan-Hall equations 

  2.5.4 Solving the Roothaan-Hall equations 

  2.5.5 A simple illustration of the Roothaan-Hall approach 

  2.5.6 Application of the Hartree-Fock equations to molecular systems 

2.6 Basis sets 

  2.6.1 Creating a basis set 

2.7 Calculating molecular properties using ab initio quantum mechanics 

  2.7.1 Setting up the calculation and the choice of coordinates 

  2.7.2 Energies, Koopmans' theorem and ionisation potentials

  2.7.3 Calculation of electric multipoles 

  2.7.4 The total electron density distribution and molecular orbitals 

  2.7.5 Population analysis 

  2.7.6 Mulliken and Löwdin population analysis 

  2.7.7 Partitioning electron density: the theory of atoms in molecules 

  2.7.8 Bond orders 

  2.7.9 Electrostatic potentials 

  2.7.10 Thermodynamic and structural properties 

2.8 Approximate molecular orbital theories 

2.9 Semi-empirical methods 

  2.9.1 Zero-differential overlap 

  2.9.2 CNDO 

  2.9.3 INDO 

  2.9.4 NDDO 

  2.9.5 MINDO/3 

  2.9.6 MNDO 

  2.9.7 AM1 

  2.9.8 PM3 

  2.9.9 SAM1 

  2.9.10 Programs for semi-empirical quantum mechanical calculations 

2.10 Hückel theory 

  2.10.1 Extended Hückel theory 

2.11 Performance of semi-empirical methods 

Appendix 2.1 Some Common Acronyms Used in Computational Quantum Chemistry 


3.1 Introduction 

3.2 Open-shell systems 

3.3 Electron correlation 

  3.3.1 Configuration interaction 

  3.3.2 Many body perturbation theory 

3.4 Practical considerations when performing ab initio calculations 

  3.4.1 Convergence of self-consistent field calculations 

  3.4.2 The direct SCF method 

  3.4.3 Calculating derivatives of the energy 

  3.4.4 Basis set superposition error 

3.5 Energy component analysis 

  3.5.1 Morokuma analysis of the water dimer

3.6 Valence bond theories 

3.7 Density functional theory 

  3.7.1 Spin-polarised density functional theory 

  3.7.2 The exchange-correlation functional 

  3.7.3 Beyond the local density approximation: gradient-corrected functionals 

  3.7.4 Hybrid Hartree-Fock/Density Functional Methods 

  3.7.5 Performance and applications of density functional theory 

3.8 Quantum mechanical methods for studying the solid-state 

  3.8.1 Introduction 

  3.8.2 Band theory and orbital-based approaches 

  3.8.3 The periodic Hartree-Fock approach to studying the solid state 

  3.8.4 The nearly-free electron approximation 

  3.8.5 The Fermi surface and density of states 

  3.8.6 Density Functional Methods for studying the solid state: plane waves and pseudopotentials 

  3.8.7 Application of solid-state quantum mechanics to the group 14 elements 

3.9 The future role of quantum mechanics: theory and experiment working together 

Appendix 3.1 Alternative Expression for a Wavefunction Satisfying Bloch's Function 


4.1 Introduction 

  4.1.1 A simple molecular mechanics force field 

4.2 Some general features of molecular mechanics force fields 

4.3 Bond stretching 

4.4 Angle bending 

4.5 Torsional terms 

4.6 Improper torsions and out-of-plane bending motions 

4.7 Cross terms: Class 1, 2 and 3 force fields

4.8 Introduction to non-bonded interactions 

4.9 Electrostatic interactions 

  4.9.1 The central multipole expansion 

  4.9.2 Point-charge electrostatic models 

  4.9.3 Calculating partial atomic charges 

  4.9.4 Charges derived from the molecular electrostatic potential 

  4.9.5 Deriving charge models for large systems 

  4.9.6 Rapid methods for calculating atomic charges 

  4.9.7 Beyond partial atomic charge models 

  4.9.8 Distributed multipole models 

  4.9.9 Using charge schemes to study aromatic-aromatic interactions 

  4.9.10 Polarisation 

  4.9.11 Solvent dielectric models 

4.10 van der Waals interactions 

  4.10.1 Dispersive interactions 

  4.10.2 The repulsive contribution 

  4.10.3 Modelling van der Waals interactions 

  4.10.4 van der Waals interactions in polyatomic systems 

  4.10.5 Reduced units 

4.11 Many-body effects in empirical potentials 

4.12 Effective pair potentials 

4.13 Hydrogen bonding in molecular mechanics 

4.14 Force field models for the simulation of liquid water 

  4.14.1 Simple water models 

  4.14.2 Polarisable water models 

  4.14.3 Ab initio potentials for water 

4.15 United atom force fields and reduced representations 

  4.15.1 Other simplified models 

4.16 Derivatives of the molecular mechanics energy function 

4.17 Calculating thermodynamic properties using a force field 

4.18 Force field parametrisation 

4.19 Transferability of force field parameters 

4.20 The treatment of delocalised π-systems

4.21 Force fields for inorganic molecules 

4.22 Force fields for solid-state systems 

  4.22.1 Covalent solids: zeolites 

  4.22.2 Ionic solids 

4.23 Empirical potentials for metals and semiconductors 

Appendix 4.1 The Interaction Between Two Drude Molecules 


5.1 Introduction 

  5.1.1 Energy minimisation: statement of the problem 

  5.1.2 Derivatives 

5.2 Non-derivative minimisation methods 

  5.2.1 The simplex method 

  5.2.2 The sequential univariate method 

5.3 Introduction to derivative minimisation methods 

5.4 First-order minimisation methods 

  5.4.1 The steepest descents method 

  5.4.2 Line search in one dimension 

  5.4.3 Arbitrary step approach 

  5.4.4 Conjugate gradients minimisation 

5.5 Second derivative methods: the Newton-Raphson method 

  5.5.1 Variants on the Newton-Raphson method 

5.6 Quasi-Newton methods 

5.7 Which minimisation method should I use? 

  5.7.1 Distinguishing between minima, maxima and saddle points 

  5.7.2 Convergence criteria 

5.8 Applications of energy minimisation 

  5.8.1 Normal mode analysis 

  5.8.2 The study of intermolecular processes 

5.9 Determination of transition structures and reaction pathways 

  5.9.1 Methods to locate saddle points 

  5.9.2 Reaction path following 

  5.9.3 Transition structures and reaction pathways for large systems 

  5.9.4 The transition structures of pericyclic reactions 

5.10 Solid-state systems: lattice statics and lattice dynamics 


6.1 Introduction 

  6.1.1 Time averages, ensemble averages and some historical background 

  6.1.2 A brief description of the molecular dynamics method 

  6.1.3 The basic elements of the Monte Carlo method 

  6.1.4 Differences between the molecular dynamics and Monte Carlo methods 

6.2 Calculation of simple thermodynamic properties 

  6.2.1 Energy 

  6.2.2 Heat capacity 

  6.2.3 Pressure 

  6.2.4 Temperature 

  6.2.5 Radial distribution functions 

6.3 Phase space 

6.4 Practical aspects of computer simulation 

  6.4.1 Setting up and running a simulation 

  6.4.2 Choosing the initial configuration 

6.5 Boundaries 

  6.5.1 Periodic boundary conditions 

  6.5.2 Non-periodic boundary methods 

6.6 Monitoring the equilibration 

6.7 Truncating the potential and the minimum image convention 

  6.7.1 Non-bonded neighbour lists 

  6.7.2 Group-based cutoffs 

  6.7.3 Problems with cutoffs and how to avoid them 

6.8 Long-range forces 

  6.8.1 The Ewald summation method 

  6.8.2 The reaction field and image charge methods 

  6.8.3 The cell multipole method for non-bonded interactions 

6.9 Analysing the results of a simulation and estimating errors 

Appendix 6.1 Basic Statistical Mechanics 

Appendix 6.2 Heat Capacity and Energy Fluctuations

Appendix 6.3 The Real Gas Contribution to the Virial 

Appendix 6.4 Translating Particle Back into Central Box 


7.1 Introduction 

7.2 Molecular dynamics using simple models 

7.3 Molecular dynamics with continuous potentials 

  7.3.1 Finite difference methods 

  7.3.2 Predictor-corrector integration methods 

  7.3.3 Which integration algorithm is most appropriate? 

  7.3.4 Choosing the time step 

  7.3.5 Multiple time step dynamics 

7.4 Setting up and running a molecular dynamics simulation 

  7.4.1 Calculating the temperature 

7.5 Constraint dynamics 

7.6 Time-dependent properties 

  7.6.1 Correlation functions 

  7.6.2 Orientational correlation functions 

  7.6.3 Transport properties 

7.7 Molecular dynamics at constant temperature and pressure 

  7.7.1 Constant temperature dynamics 

  7.7.2 Constant pressure dynamics 

7.8 Incorporating solvent effects into molecular dynamics: potentials of mean force and stochastic dynamics 

  7.8.1 Practical aspects of stochastic dynamics simulations 

7.9 Conformational changes from molecular dynamics simulations 

7.10 Molecular dynamics simulations of chain amphiphiles 

  7.10.1 Simulation of lipids 

  7.10.2 Simulations of Langmuir-Blodgett films 

  7.10.3 Mesoscale modelling: Dissipative Particle Dynamics 

Appendix 7.1 Energy Conservation in Molecular Dynamics 


8.1 Introduction 

8.2 Calculating properties by integration 

8.3 Some theoretical background to the Metropolis method 

8.4 Implementation of the Metropolis Monte Carlo method 

  8.4.1 Random number generators 

8.5 Monte Carlo simulation of molecules 

  8.5.1 Rigid molecules 

  8.5.2 Monte Carlo simulations of flexible molecules 

8.6 Models used in Monte Carlo simulations of polymers 

  8.6.1 Lattice models of polymers 

  8.6.2 'Continuous' polymer models 

8.7 'Biased' Monte Carlo methods 

8.8 Tackling the problem of quasi-ergodicity: J-walking and multicanonical Monte Carlo 

  8.8.1 J-walking 

  8.8.2 The multicanonical Monte Carlo method 

8.9 Monte Carlo sampling from different ensembles 

  8.9.1 Grand canonical Monte Carlo simulations 

  8.9.2 Grand canonical Monte Carlo simulations of adsorption processes 

8.10 Calculating the chemical potential 

8.11 The configurational bias Monte Carlo method 

  8.11.1 Applications of the configurational bias Monte Carlo method 

8.12 Simulating phase equilibria by the Gibbs ensemble Monte Carlo method 

8.13 Monte Carlo or molecular dynamics? 

Appendix 8.1 The Marsaglia Random Number Generator 


9.1 Introduction 

9.2 Systematic methods for exploring conformational space 

9.3 Model-building approaches 

9.4 Random search methods 

9.5 Distance geometry 

  9.5.1 The use of distance geometry in NMR 

9.6 Exploring conformational space using simulation methods 

9.7 Which conformational search method should I use? A comparison of different approaches 

9.8 Variations upon the standard methods 

  9.8.1 The systematic unbounded multiple minimum method (SUMM) 

  9.8.2 Low-Mode Search 

9.9 Finding the global energy minimum: Evolutionary algorithms and simulated annealing 

  9.9.1 Genetic and evolutionary algorithms 

  9.9.2 Simulated annealing 

9.10 Solving protein structures using restrained molecular dynamics and simulated annealing 

  9.10.1 X-ray crystallographic refinement 

  9.10.2 Molecular dynamics refinement of NMR data 

  9.10.3 Time-averaged NMR refinement 

9.11 Structural databases 

9.12 Molecular fitting 

9.13 Clustering algorithms and pattern recognition techniques 

9.14 Reducing the dimensionality of a data set 

  9.14.1 Principal components analysis 

9.15 Covering conformational space: poling 

9.16 A "classic" optimisation problem: predicting crystal structures 


10.1 Introduction 

10.2 Some basic principles of protein structure 

  10.2.1 The hydrophobic effect 

10.3 First-principles methods for predicting protein structure 

  10.3.1 Lattice models for investigating protein structure 

  10.3.2 Rule-based approaches using secondary structure prediction 

10.4 Introduction to comparative modelling 

10.5 Sequence alignment 

  10.5.1 Dynamic programming and the Needleman-Wunsch algorithm 

  10.5.2 The Smith-Waterman algorithm 

  10.5.3 Heuristic search methods: FASTA and BLAST 

  10.5.4 Multiple sequence alignment 

  10.5.5 Protein structure alignment and structural databases 

10.6 Constructing and evaluating a comparative model 

10.7 Predicting protein structures by 'threading' 

10.8 A comparison of protein structure prediction methods: CASP 

  10.8.1 Automated protein modelling 

10.9 Protein folding and unfolding 

Appendix 10.1 Some Common Abbreviations and Acronyms Used in Bioinformatics 

Appendix 10.2 Some of the Most Common Sequence and Structural Databases Used in Bioinformatics 

Appendix 10.3 Mutation Probability Matrix for 1 PAM 

Appendix 10.4 Mutation Probability Matrix for 250 PAM 


11.1 Free energy calculations 

  11.1.1 The difficulty of calculating free energies by computer 

11.2 The calculation of free energy differences 

  11.2.1 Thermodynamic perturbation 

  11.2.2 Implementation of free energy perturbation 

  11.2.3 Thermodynamic integration 

  11.2.4 The 'slow growth' method 

11.3 Applications of methods for calculating free energy differences 

  11.3.1 Thermodynamic cycles 

  11.3.2 Applications of the thermodynamic cycle perturbation method 

  11.3.3 The calculation of absolute free energies 

11.4 The calculation of enthalpy and entropy differences 

11.5 Partitioning the free energy 

11.6 Potential pitfalls with free energy calculations 

  11.6.1 Implementation aspects 

11.7 Potentials of mean force 

  11.7.1 Umbrella sampling 

  11.7.2 Calculating the potential of mean force for flexible molecules 

11.8 Approximate/"rapid" free energy methods 

11.9 Continuum representations of the solvent 

  11.9.1 Thermodynamic background 

11.10 The electrostatic contribution to the free energy of solvation: the Born and Onsager models 

  11.10.1 Calculating the electrostatic contribution via quantum mechanics 

  11.10.2 Continuum models for molecular mechanics 

  11.10.3 The Langevin dipole model 

  11.10.4 Methods based upon the Poisson-Boltzmann equation 

  11.10.5 Applications of finite difference Poisson-Boltzmann calculations 

11.11 Non-electrostatic contributions to the solvation free energy 

11.12 Very simple solvation models 

11.13 Modelling chemical reactions 

  11.13.1 Empirical approaches to simulating reactions 

  11.13.2 The potential of mean force of a reaction 

  11.13.3 Combined quantum mechanical/molecular mechanical approaches 

  11.13.4 Ab initio molecular dynamics and the Car-Parrinello method 

  11.13.5 Examples of ab initio molecular dynamics simulations 

11.14 Modelling solid-state defects 

  11.14.1 Defect studies of the high-Tc superconductor YBa2Cu3O7-x 

Appendix 11.1 Calculating Free Energy Differences Using Thermodynamic Integration 

Appendix 11.2 Using the Slow Growth Method for Calculating Free Energy Differences 

Appendix 11.3 Expansion of Zwanzig Expression for the Free Energy Difference for the Linear Response Method 


12.1 Molecular modelling in drug discovery 

12.2 Computer representations of molecules, chemical databases and 2D substructure searching 

12.3 3D database searching 

12.4 Deriving and using three-dimensional pharmacophores 

  12.4.1 Constrained systematic search 

  12.4.2 Ensemble distance geometry, ensemble molecular dynamics and genetic algorithms 

  12.4.3 Clique detection methods for finding pharmacophores 

  12.4.4 Maximum likelihood method 

  12.4.5 Incorporating additional geometric features into a 3D pharmacophore 

12.5 Sources of data for 3D databases 

12.6 Molecular docking 

  12.6.1 Scoring functions for molecular docking 

12.7 Applications of 3D database searching and docking 

12.8 Molecular similarity and similarity searching 

12.9 Molecular Descriptors 

  12.9.1 Partition coefficients 

  12.9.2 Molar refractivity 

  12.9.3 Topological indices 

  12.9.4 Pharmacophore keys 

  12.9.5 Calculating the similarity 

  12.9.6 Similarity based on 3D properties 

12.10 Selecting "diverse" sets of compounds 

  12.10.1 Data manipulation 

  12.10.2 Selection of diverse sets using cluster analysis 

  12.10.3 Dissimilarity-based selection methods

  12.10.4 Partition-based methods for compound selection 

12.11 Structure-based de novo ligand design 

  12.11.1 Locating favourable positions of molecular fragments within a binding site 

  12.11.2 Connecting molecular fragments in a binding site 

  12.11.3 Structure-based design methods to design HIV-1 protease inhibitors 

  12.11.4 Structure-based design of templates for zeolite synthesis 

12.12 Quantitative structure-activity relationships 

  12.12.1 Selecting the compounds for a QSAR analysis 

  12.12.2 Deriving the QSAR equation 

  12.12.3 Cross-validation 

  12.12.4 Interpreting a QSAR equation 

  12.12.5 Alternatives to multiple linear regression: discriminant analysis, neural networks and classification 


  12.12.6 Principal Components Regression 

12.13 Partial least squares 

  12.13.1 Partial least squares and molecular field analysis 

12.14 Combinatorial libraries 

  12.14.1 The design of "drug-like" libraries 

  12.14.2 Library enumeration 

  12.14.3 Combinatorial subset selection 

  12.14.4 The future 


Appendix 2.1 Some Common Acronyms in Quantum Chemistry
AM1: Austin Model 1
AO: Atomic Orbital
BSSE: Basis-Set Superposition Error
CI: Configuration Interaction
CIS: Configuration Interaction Singles
CISD: Configuration Interaction Singles and Doubles
CNDO: Complete Neglect of Differential Overlap
DFT: Density Functional Theory
DIIS: Direct Inversion in the Iterative Subspace
DZP: Double Zeta with Polarisation
DZ: Double Zeta
EHT: Extended Hückel Theory
GVB: Generalised Valence Bond model
HOMO: Highest Occupied Molecular Orbital
INDO: Intermediate Neglect of Differential Overlap
LCAO: Linear Combination of Atomic Orbitals
LUMO: Lowest Unoccupied Molecular Orbital
MBPT: Many-body Perturbation Theory
MINDO/3: Modified INDO version 3
MNDO: Modified Neglect of Diatomic Overlap
MO: Molecular Orbital
MP2, MP3 etc.: Møller-Plesset theory at second order, third order etc.
NDDO: Neglect of Diatomic Differential Overlap
PM3: Parameterisation 3 of MNDO
QCISD: Quadratic Configuration Interaction Singles and Doubles
RHF: Restricted Hartree-Fock
SAM1: Semi-Ab-initio Model 1
SCF: Self-Consistent Field
STO: Slater Type Orbital
STO-3G, STO-4G, etc.: Minimal basis sets in which 3, 4 etc. Gaussian functions are used to represent the atomic orbitals on an atom
UHF: Unrestricted Hartree-Fock
ZDO: Zero Differential Overlap
CASSCF: Complete Active Space Self-Consistent Field
QCISD(T): Configuration interaction method involving single, double and quadratic excitations with an estimated triple excitation
LSDFT: Local Spin Density Functional Theory
LDA: Local Density Approximation
BLYP: Becke-Lee-Yang-Parr gradient-corrected functional for use with density functional theory
VWN: Correlation functional due to Vosko, Wilk and Nusair
B3LYP: Scheme for hybrid Hartree-Fock/density functional theory introduced by Becke

Appendix 10.1 Some common abbreviations and acronyms used in bioinformatics
A, G, C, T (U): Adenine, Guanine, Cytosine, Thymine - the four bases present in DNA. Uracil replaces thymine in RNA
Bp: Base pair
cDNA: Complementary DNA, synthesised from messenger RNA
Chromosome: Discrete unit of the genome consisting of a single molecule of DNA that carries many genes
Clone: Genetically identical copy (of a gene, cell or organism)
Codon: Sequence of three nucleotides that codes for a single amino acid (or a termination signal)
Contig: A group of pieces of DNA, derived from a cloning experiment (often a series of ESTs, see below), that represent overlapping regions of a chromosome
Deletion: One or more nucleotides that are not copied during DNA replication
DNA: Deoxyribonucleic acid
Domain: Sequence of polypeptide chain that can independently fold into a stable three-dimensional structure
Dynamic programming: Technique widely used in sequence alignment
EST: Expressed Sequence Tag. An EST is a partial sequence (typically less than 400 bases) selected from cDNA and used to identify genes expressed in a particular tissue
Eukaryote: Organism whose cells have a discrete nucleus and other subcellular compartments (cf. prokaryote)
Exon: Translated sequence of DNA
Gap: A break in a DNA or protein sequence which enables two or more sequences to be aligned
Gene: A sequence of DNA at a particular position on a specific chromosome that encodes a precise functional product (usually protein)
Genome: All of the genetic material in the chromosomes of an organism
Indel: Insertion or deletion required to optimise sequence alignment
Intron: Non-translated sequence of DNA
Kb: Kilobase - one thousand nucleotide bases
ktup: k-tuple. Parameter used in the FASTA and FASTP sequence alignment methods
Mbp: Megabase - one million nucleotide bases
mRNA: Messenger RNA
Mutation: A change in the DNA sequence
Nucleotide: The basic building block of DNA and RNA, made up of three components: a nitrogenous base (A, T, G, C or U), a phosphate and a sugar
Oligonucleotide: A molecule composed of a small number of nucleotides
Orthologue: Homologous proteins that perform the same function within different organisms
ORF: Open Reading Frame - region of DNA that is transcribed into RNA. Delineated by an initiator codon at one end and a stop codon at the other end
PAM: Point Accepted Mutation per 100 residues
Paralogue: Homologous proteins that perform different but related functions within one organism
PCR: Polymerase Chain Reaction. Widely used method for amplifying a DNA base sequence
Polymorphism: Differences in DNA sequence among individuals
Prokaryote: Organism lacking a nucleus and subcellular compartments (cf. eukaryote). Includes bacteria
RNA: Ribonucleic acid
SNP: Single Nucleotide Polymorphism - single base-pair variations in DNA
STS: Sequence Tagged Site. A short DNA sequence that occurs just once in the human genome and whose location and base sequence are known
Transcription: First step in gene expression, corresponding to the generation of mRNA from the original DNA
Translation: Second step in gene expression, the synthesis of proteins from mRNA
tRNA: Transfer RNA
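
The glossary entries for dynamic programming, gap and indel can be illustrated with a minimal sketch of global alignment scoring in the style of the Needleman-Wunsch algorithm. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for illustration. Real applications typically use substitution matrices such as PAM and more elaborate gap penalties.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of two sequences by dynamic programming.

    score[i][j] holds the best score for aligning the first i residues of
    seq_a with the first j residues of seq_b.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            gap_in_b = score[i - 1][j] + gap  # residue of seq_a opposite a gap
            gap_in_a = score[i][j - 1] + gap  # residue of seq_b opposite a gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)

    return score[-1][-1]


if __name__ == "__main__":
    # One indel is needed to align these two sequences.
    print(global_alignment_score("GATTACA", "GATACA"))
```

Each cell is computed from its three neighbours only, so the cost grows with the product of the two sequence lengths rather than with the number of possible alignments. Recovering the alignment itself, rather than just its score, would require a traceback through the table, which is omitted here.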

Appendix 10.2. Some of the most common sequence and structural databases used in bioinformatics
GenBank (NCBI, USA), EMBL Nucleotide Sequence Database (Europe), DDBJ (Japan): The three main nucleotide sequence databases, which are synchronised daily
PIR-International Protein Sequence Database: Redundant protein sequence database
Swiss-Prot, TrEMBL: Annotated non-redundant protein sequence database. TrEMBL is a computer-annotated supplement to Swiss-Prot. TrEMBL contains the translations of all coding sequences present in the EMBL Nucleotide Sequence Database which are not yet integrated into Swiss-Prot
GenPept: Compendium of amino acid translations derived from GenBank
PDB, NRL3D: Protein Data Bank - protein structures (mostly from X-ray crystallography). NRL3D is a derived sequence database in PIR format
SCOP: Structural Classification of Proteins. Hierarchical protein structure database
CATH, FSSP: Sequence-structure classification databases
Prosite: Motif database

References for colour figures

Reddy B S, W Saenger, K Muehlegger and G Weimann 1981. Crystal and molecular structure of the lithium salt of nicotinamide adenine dinucleotide dihydrate (NAD, DPN, cozymase, codehydrase I). Journal of the American Chemical Society 103:907-914.

Bolin J T, D J Filman, D A Matthews, R C Hamlin and J Kraut 1982. Crystal Structures of Escherichia coli and Lactobacillus casei Dihydrofolate Reductase Refined at 1.7 Ångstroms Resolution. I. Features and Binding of Methotrexate. Journal of Biological Chemistry 257:13650-13662.

Robinson A J, W G Richards, P J Thomas and M M Hann 1994. Head Group and Chain Behaviour in Biological Membranes - A Molecular Dynamics Simulation. Biophysical Journal 67:2345-2354.

Groot R D and T J Madden 1998. Dynamics simulation of diblock copolymer microphase separation. The Journal of Chemical Physics 108:8713-8724.

Siepmann J I and I R McDonald 1993b. Monte Carlo Study of the Properties of Self-Assembled Monolayers Formed by Adsorption of CH3(CH2)15SH on the (111) Surface of Gold. Molecular Physics 79:457-473.

Chung C-W, R M Cooke, A E I Proudfoot and T N C Wells 1995. The Three-Dimensional Structure of RANTES. Biochemistry 34:9307-9314.

Greasley S E, H Jhoti, C Teahan, R Solari, A Fensom, G M H Thomas, S Cockroft and B Bax 1995. The Structure of Rat ADP-Ribosylation Factor-1 (ARF-1) Complexed to GDP Determined from Two Different Crystal Forms. Nature Structural Biology 2:797-806.

Turk D, J Sturzebecher and W Bode 1991. Geometry of Binding of the N-Alpha-Tosylated Piperidides of meta-Amidino-Phenylalanine, para-Amidino-Phenylalanine and para-Guanidino-Phenylalanine to Thrombin and Trypsin - X-ray Crystal Structures of Their Trypsin Complexes and Modeling of their Thrombin Complexes. FEBS Letters 287:133-138.

Birktoft J J and D M Blow 1972. The Structure of Crystalline Alpha-Chymotrypsin V. The Atomic Structure of Tosyl-Alpha-Chymotrypsin at 2 Angstroms Resolution. Journal of Molecular Biology 68:187-240.

Turk D, H W Hoeffken, D Grosse, J Stuerzebecher, P D Martin, B F P Edwards and W Bode 1992. Refined 2.3 Ångstroms X-Ray Crystal Structure of Bovine Thrombin Complexes Formed with the 3 Benzamidine and Arginine-Based Thrombin Inhibitors NAPAP, 4-TAPAP and MQPA: A Starting Point for Improving Antithrombotics. Journal of Molecular Biology 226:1085-1099.

Bruno I J, J C Cole, J P M Lommerse, R S Rowland, R Taylor and M L Verdonk 1997. IsoStar: a library of information about nonbonded interactions. The Journal of Computer-Aided Molecular Design 11:525-537.

Marquart M, J Walter, J Deisenhofer, W Bode and R Huber 1983. The Geometry of the Reactive Site and of the Peptide Groups in Trypsin, Trypsinogen and its Complexes with Inhibitors. Acta Crystallographica B39:480-490.

McRee D E, S M Redford, E D Getzoff, J R Lepock, R A Hallewell and J A Tainer 1990. Changes in Crystallographic Structure and Thermostability of a Cu, Zn Superoxide Dismutase Mutant Resulting from the Removal of Buried Cysteine. Journal of Biological Chemistry 265:14234-14241.

Freitag S, I Le Trong, P S Stayton and R E Stenkamp 1997. Structural Studies of the Streptavidin Binding Loop. Protein Science 6:1157-

Priestle J P, A Fassler, J Rosel, M Tintelnog-Blomley, P Strop and M G Gruetter 1995. Comparative Analysis of the X-Ray Structures of HIV-1 and HIV-2 Proteases in Complex with a Novel Pseudosymmetric Inhibitor. Structure (London) 3:381-389.

Poso A, R Juvonen and J Gynther 1995. Comparative molecular field analysis of compounds with CYP2A5 binding affinity. Quantitative Structure-Activity Relationships 14:507-511.

Von Itzstein M, W Y Wu, G B Kok, M S Pegg, J C Dyason, B Jin, T V Phan, M L Smythe, H F Whites, S W Oliver, P M Colman, J N Varghese, D M Ryan, J M Woods, R C Bethell, V J Hotham, J M Cameron and C R Penn 1993. Rational Design of Potent Sialidase-Based Inhibitors of Influenza-Virus Replication. Nature 363:418-423.
