Download Notebook View in GitHub Open in Google Colab
Protein-ligand-water systems with Interchange
In this example, we’ll take a docked protein-ligand system from an OpenFF benchmark data set, parameterize and solvate it, and export the parameterized system to a variety of simulation engines.
import urllib
import nglview
import numpy as np
from openff.toolkit import ForceField, Molecule, Topology
from openff.units import unit
from openff.interchange import Interchange
from openff.interchange.components._packmol import UNIT_CUBE, pack_box
from openff.interchange.drivers import get_openmm_energies
from openff.interchange.drivers.all import get_summary_data
Collecting structures
In this example we’ll use starting coordinates data from MCL1, which is part of the Protein Ligand Benchmark data set curated by the Open Force Field Initiative. Conveniently for the purposes of this example, the ligand is already docked and the protein is relatively small (~2000 atoms), so we can focus on using Interchange without too much prep.
Start by downloading the protein and ligand structure files from GitHub:
url = (
"https://raw.githubusercontent.com/openforcefield/protein-ligand-benchmark/"
"8c94c0dcc892dfd77992567294b1ff31c62e8695/plbenchmark/sample_data/2020-08-26_mcl1_sample/"
)
urllib.request.urlretrieve(url + "01_protein/crd/protein.pdb", "protein.pdb")
urllib.request.urlretrieve(url + "02_ligands/lig_23/crd/lig_23.sdf", "lig_23.sdf")
# `protein.pdb` and `lig_23.sdf` should be in the local path now
!ls -lhrt
total 232K
-rw-r--r-- 1 runner runner 23K Feb 5 00:37 protein_ligand.ipynb
-rw-r--r-- 1 runner runner 211 Feb 5 00:37 README.md
-rw-r--r-- 1 runner runner 194K Feb 5 00:45 protein.pdb
-rw-r--r-- 1 runner runner 4.1K Feb 5 00:45 lig_23.sdf
The OpenFF Toolkit recently (version 0.13) added support for loading multi-component PDBs. There are some limitations but for our system - a well-structured PDB file including a protein and some crystal waters - it should work fine.
protein_with_crystal_water = Topology.from_pdb("protein.pdb")
protein_with_crystal_water.n_molecules
29
protein_with_crystal_water is a Topology, not a Molecule, containing the protein and a few crystal waters. We can splice out the protein as a Molecule object and visualize it:
protein = protein_with_crystal_water.molecule(0)
protein
Preparing components
This system has three components: Protein, ligand, and solvent (water). For each component, we need positions and parameters. Our protein and ligand positions come from PDBs, and we’ll generate solvent coordinates ourselves. For parameters, the Sage force field will be perfect for the ligand and water, but doesn’t support proteins - they’re coming in Rosemary. In the meantime, we’ll use a SMIRNOFF port of ff14SB, a popular force field in the Amber force field family which has a compatible functionaln form.
Unfortunately, this means we need to treat each component seperately. Interchange provides an means for combining these systems, which we’ll see in a bit.
Protein component
Let’s start with the protein component. We The Molecule.from_polymer_pdb() method constructs a Molecule from a PDB file encoding a protein. A Molecule object stores a molecule’s chemical identity, bond graph, and co-ordinates. The OpenFF Toolkit doesn’t accept PDB files for small molecules because they don’t have enough chemical information, but it makes an exception for biopolymers like proteins via a chemical substructure dictionary containing information about canonical amino aicds. This saves us from needing to do things like write up a SMILES string for an entire protein.
Our PDB file doesn’t only contain one molecule, though, it contains a protein and crystal waters. The OpenFF Toolkit recently (version 0.13) added support for loading multi-component PDBs. There are some limitations but for our system it should work fine.
protein_with_crystal_water = Topology.from_pdb("protein.pdb")
protein_with_crystal_water.n_molecules
29
protein_with_crystal_water is a Topology, not a Molecule, containing the protein and a few crystal waters. We can splice out the protein as a Molecule object and visualize it to make sure it looks reasonable:
protein = protein_with_crystal_water.molecule(0)
protein.visualize(backend="nglview")
OpenFF maintains a port of Amber ff14sb, which we’ll use for the protein parameters. We’re using the impropers variant because Interchange doesn’t support Amber’s improper torsion function.
ff14sb = ForceField("ff14sb_off_impropers_0.0.3.offxml")
We can use the Interchange.from_smirnoff constructor method to combine the protein molecule (with coordinates) and the ff14sb force field into an Interchange.
protein_intrcg = Interchange.from_smirnoff(
force_field=ff14sb,
topology=protein.to_topology(),
)
Ligand component
SDF files encode all the chemical information the Toolkit needs to construct a Molecule, so we can use the general-purpose small molecule from_file method for the ligand:
ligand = Molecule.from_file("lig_23.sdf")
ligand.name = "LIG"
ligand.visualize(backend="nglview")
We’ll use the OpenFF 2.0.0 “Sage” force field for the ligand, which is a production-ready small molecule SMIRNOFF force field. Its coordinates are taken from the lig_23.sdf file we downloaded earlier. We just want to do some point energy calculations as a proof of concept, so we’ll use the unconstrained variant of Sage (see the OpenFF Toolkit FAQ for details).
ligand_intrcg = Interchange.from_smirnoff(
force_field=ForceField("openff_unconstrained-2.0.0.offxml"),
topology=[ligand],
)
Now that we have two interchanges, we can combine them with the Interchange.combine method! We’ll need a combined system to solvate too, so this’ll be useful in a second.
docked_intrcg = protein_intrcg.combine(ligand_intrcg)
/home/runner/micromamba/envs/openff-docs-examples/lib/python3.11/site-packages/openff/interchange/operations/_combine.py:104: InterchangeCombinationWarning: Interchange object combination is complex and may produce strange results outside of use cases it has been tested in. Use with caution and thoroughly validate results!
warnings.warn(
/home/runner/micromamba/envs/openff-docs-examples/lib/python3.11/site-packages/openff/interchange/operations/_combine.py:84: InterchangeCombinationWarning: Found electrostatics 1-4 scaling factors of 5/6 with slightly different rounding (0.833333 and 0.8333333333). This likely stems from OpenFF using more digits in rounding 1/1.2. The value of 0.8333333333 will be used, which may or may not introduce small errors.
warnings.warn(
In addition to making it easy to parameterize systems for all sorts of engines, Interchange makes it easy to visualize systems. We can use the visualize() method to view our docked system in NGLView and make sure the co-ordinates make sense:
w = docked_intrcg.visualize()
w.clear_representations()
w.add_representation(
"licorice",
radius=0.1,
selection=[*range(protein_intrcg.topology.n_atoms)],
)
w.add_representation(
"spacefill",
selection=[*range(protein_intrcg.topology.n_atoms, docked_intrcg.topology.n_atoms)],
)
w
Before we move on to the solvent, we should check the partial charge of our protein-ligand complex. Since we want to run simulations with net neutral charge, we’ll add counter-ions to balance the total charge of the protein-ligand complex. To be safe, we can also double-check that the total charge in the docked_intrcg object matches the sum of the formal charges of the Molecule representations which more closely describe our intended chemistry.
total_charge = round(sum(docked_intrcg["Electrostatics"].charges.values()), 3)
assert total_charge == protein.total_charge + ligand.total_charge, (
f"Total charge of the system is {total_charge}, not {protein.total_charge + ligand.total_charge}"
)
total_charge
Solvent component
We’ll reuse the Sage force field from earlier here, as it includes parameters for TIP3P water, but first we need coordinates for our solvated system. This is a portion of the OpenFF ecosystem that will be streamlined in the future, but we can use a PACKMOL wrapper to get the job done. We’re adding a fixed amount of water for this quick example so the density will be wrong, but imagine it’s right.
We’ll also add three chloride ions to balance the net charge of the protein-ligand complex. For a production run, you’ll probably want a more realistic salt concentration - the goal here is just to get something running with a net neutral charge.
water = Molecule.from_smiles("O")
water.generate_conformers(n_conformers=1)
ion = Molecule.from_smiles("[Cl-]")
ion.generate_conformers(n_conformers=1)
# Come up with a box size based on the size of the protein plus a 2 nm buffer
xyz = protein.conformers[0]
centroid = xyz.sum(axis=0) / xyz.shape[0]
protein_radius = np.sqrt(((xyz - centroid) ** 2).sum(axis=-1).max())
box_vectors = UNIT_CUBE * (protein_radius * 2 + 2 * unit.nanometer)
# Pack the box with an arbitrary number of water
n_water = 1000
packed_topology = pack_box(
molecules=[water, ion],
number_of_copies=[n_water, int(total_charge.m)],
solute=docked_intrcg.topology,
box_vectors=box_vectors,
)
# Visualize
packed_topology.to_file("packed.pdb")
nglview.show_structure_file("packed.pdb")
And now we can create the interchange! The Topology we got from pack_box includes the positions we’ll later apply to the solvated complex. For now, we need an Interchange that represents the water component. We can pass it Sage, wihch contains TIP3P parameters, and a topology of n_water water molecules without worrying about the positions, since we’ll just set those later.
water_intrcg = Interchange.from_smirnoff(
force_field=ForceField("openff_unconstrained-2.0.0.offxml"),
topology=[water] * n_water + [ion] * int(total_charge.m),
)
Putting the system together
Now that we’ve got all the pieces, we can combine the docked protein-ligand system with the solvent, and add in the positions and box vectors we just worked out
system_intrcg = docked_intrcg.combine(water_intrcg)
system_intrcg.positions = packed_topology.get_positions()
system_intrcg.box = packed_topology.box_vectors
w = system_intrcg.visualize()
w.clear_representations()
# Protein rep
w.add_representation(
"licorice",
radius=0.2,
selection=[*range(protein_intrcg.topology.n_atoms)],
)
# Ligand rep
w.add_representation(
"spacefill",
selection=[*range(protein_intrcg.topology.n_atoms, docked_intrcg.topology.n_atoms)],
)
# Water rep
w.add_representation(
"licorice",
radius=0.1,
selection=[*range(docked_intrcg.topology.n_atoms, system_intrcg.topology.n_atoms)],
)
w
Exporting to simulation engines
Finally, we can export the final Interchange object to models understood by various simulation engines. Some of these exports are not yet optimized for large files.
OpenMM
openmm_system = system_intrcg.to_openmm()
openmm_topology = system_intrcg.topology.to_openmm(ensure_unique_atom_names=False)
print(type(openmm_system), type(openmm_topology))
<class 'openmm.openmm.System'> <class 'openmm.app.topology.Topology'>
Amber
system_intrcg.to_inpcrd("out.inpcrd")
system_intrcg.to_prmtop("out.prmtop")
LAMMPS
# The LAMMPS exporter has not yet been optimized for large molecules or systems
if False:
system_intrcg.to_lammps("out.lmp")
GROMACS
system_intrcg.to_gromacs(prefix="out", monolithic=False)
/home/runner/micromamba/envs/openff-docs-examples/lib/python3.11/site-packages/openff/interchange/components/mdconfig.py:503: UserWarning: Ambiguous failure while processing constraints. Constraining h-bonds as a stopgap.
warnings.warn(
Energy tests
In order to verify the accuracy of each export, we can use functions in the openff.interchange.drivers module to call out to each engine to evaluate single-point energies. Under the hood, each function uses the export functions just as we did in the above cells. The GROMACS and Amber exports are a little slower than the OpenMM export, so some of these cells might take a minute to execute.
To get a quick look at how a single engine reports energies, use get_openmm_energies (or get_gromacs_energies, get_amber_energies, or get_lammps_energies).
get_openmm_energies(system_intrcg)
EnergyReport(energies={'Bond': <Quantity(11255.7296, 'kilojoule / mole')>, 'Angle': <Quantity(1980.01657, 'kilojoule / mole')>, 'Torsion': <Quantity(7759.41267, 'kilojoule / mole')>, 'Nonbonded': <Quantity(5066671.6, 'kilojoule / mole')>})
For convenience there is a function get_summary_data that runs through all available engines and summarizes the results in a Pandas DataFrame. (This cell might take a minute to execute). We’re setting the argument _engines to a non-defualt value so that the LAMMPS driver is skipped even if it’s available; normally this argument is unnecessary if you don’t have LAMMPS installed on your system.
summary = get_summary_data(system_intrcg, _engines=["OpenMM", "GROMACS", "Amber"])
summary
/home/runner/micromamba/envs/openff-docs-examples/lib/python3.11/site-packages/openff/interchange/components/mdconfig.py:503: UserWarning: Ambiguous failure while processing constraints. Constraining h-bonds as a stopgap.
warnings.warn(
/home/runner/micromamba/envs/openff-docs-examples/lib/python3.11/site-packages/openff/interchange/components/mdconfig.py:431: SwitchingFunctionNotImplementedWarning: A switching distance 8.0 angstrom was specified by the force field, but Amber does not implement a switching function. Using a hard cut-off instead. Non-bonded interactions will be affected.
warnings.warn(
| Bond | Angle | Torsion | vdW | Electrostatics | RBTorsion | |
|---|---|---|---|---|---|---|
| OpenMM | 11255.729633 | 1980.016575 | 7759.412675 | 5.087932e+06 | -21260.085305 | NaN |
| Amber | 538.964470 | 2180.267338 | 7759.412514 | 5.087879e+06 | -21259.672601 | NaN |
| GROMACS | 538.962646 | 1980.016235 | 7759.417587 | 5.087855e+06 | -21271.959961 | 0.0 |
We can see from these large energy differences that something is wrong - this stems from the experimental Interchange combination operation producing incorrect results.
summary.describe()
| Bond | Angle | Torsion | vdW | Electrostatics | RBTorsion | |
|---|---|---|---|---|---|---|
| count | 3.000000 | 3.000000 | 3.000000 | 3.000000e+00 | 3.000000 | 1.0 |
| mean | 4111.218917 | 2046.766716 | 7759.414259 | 5.087889e+06 | -21263.905956 | 0.0 |
| std | 6187.327778 | 115.614930 | 0.002884 | 3.908917e+01 | 6.978025 | NaN |
| min | 538.962646 | 1980.016235 | 7759.412514 | 5.087855e+06 | -21271.959961 | 0.0 |
| 25% | 538.963558 | 1980.016405 | 7759.412595 | 5.087867e+06 | -21266.022633 | 0.0 |
| 50% | 538.964470 | 1980.016575 | 7759.412675 | 5.087879e+06 | -21260.085305 | 0.0 |
| 75% | 5897.347052 | 2080.141956 | 7759.415131 | 5.087905e+06 | -21259.878953 | 0.0 |
| max | 11255.729633 | 2180.267338 | 7759.417587 | 5.087932e+06 | -21259.672601 | 0.0 |
In the future this should work more smoothly with identical energies reported by each engine. In lieu of that, we can evaluate the energy of each component that we previously added together. This requires setting box vectors for each component and also setting the water positions, which we skipped earlier since we were able to use PACKMOL results directly.
for subset in [ligand_intrcg, protein_intrcg, water_intrcg]:
subset.box = system_intrcg.box
water_intrcg.positions = system_intrcg.positions[-3003:,]
get_summary_data(ligand_intrcg, _engines=["OpenMM", "GROMACS", "Amber"])
/home/runner/micromamba/envs/openff-docs-examples/lib/python3.11/site-packages/openff/interchange/components/mdconfig.py:431: SwitchingFunctionNotImplementedWarning: A switching distance 8.0 angstrom was specified by the force field, but Amber does not implement a switching function. Using a hard cut-off instead. Non-bonded interactions will be affected.
warnings.warn(
| Bond | Angle | Torsion | vdW | Electrostatics | RBTorsion | |
|---|---|---|---|---|---|---|
| OpenMM | 25.837656 | 356.516457 | 27.319830 | 105.023359 | -31.915307 | NaN |
| Amber | 25.837455 | 356.516548 | 27.319846 | 104.965682 | -31.912623 | NaN |
| GROMACS | 25.838177 | 356.516541 | 27.319841 | 105.025609 | -31.987274 | 0.0 |
get_summary_data(protein_intrcg, _engines=["OpenMM", "GROMACS", "Amber"])
/home/runner/micromamba/envs/openff-docs-examples/lib/python3.11/site-packages/openff/interchange/components/mdconfig.py:431: SwitchingFunctionNotImplementedWarning: A switching distance 8.0 angstrom was specified by the force field, but Amber does not implement a switching function. Using a hard cut-off instead. Non-bonded interactions will be affected.
warnings.warn(
| Bond | Angle | Torsion | vdW | Electrostatics | RBTorsion | |
|---|---|---|---|---|---|---|
| OpenMM | 11229.891977 | 1623.500118 | 7732.092845 | 5.086854e+06 | -20644.012231 | NaN |
| Amber | 11229.891982 | 1623.499947 | 7732.092668 | 5.086796e+06 | -20643.369819 | NaN |
| GROMACS | 11229.884766 | 1623.498413 | 7732.097672 | 5.086778e+06 | -20647.463379 | 0.0 |
get_summary_data(water_intrcg, _engines=["OpenMM", "GROMACS", "Amber"])
/home/runner/micromamba/envs/openff-docs-examples/lib/python3.11/site-packages/openff/interchange/components/mdconfig.py:431: SwitchingFunctionNotImplementedWarning: A switching distance 8.0 angstrom was specified by the force field, but Amber does not implement a switching function. Using a hard cut-off instead. Non-bonded interactions will be affected.
warnings.warn(
| Bond | Angle | Torsion | vdW | Electrostatics | RBTorsion | |
|---|---|---|---|---|---|---|
| OpenMM | 0.0 | 0.000000 | 0.0 | 860.237819 | -161.044844 | NaN |
| Amber | 0.0 | 200.250842 | 0.0 | 859.633762 | -161.283158 | NaN |
| GROMACS | 0.0 | 0.000000 | 0.0 | 860.232862 | -169.509521 | 0.0 |
We can see from these results that each engine reports nearly identical energies for each individual component.
Verifying charge neutrality
Earlier, when adding waters to the system, we added three chloride ions to balance the net +3 charge of the protein-ligand complex. Did we actually end up with a charge-neutral system?
import openmm
nbforce = openmm_system.getForces()[0]
charge = 0 * openmm.unit.elementary_charge
for i in range(nbforce.getNumParticles()):
charge += nbforce.getParticleParameters(i)[0]
charge
-8.012210317431823e-08 e