lammps_utils.rdkit package

High-level RDKit-based utilities for molecular structure processing.

This submodule includes:

  • Bond order estimation based on atomic distance and element types.

  • Coordinate unwrapping for RDKit molecules under periodic boundary conditions (PBC).

  • Main chain detection in polymer-like molecular structures.

lammps_utils.rdkit.compute_density(mol: Mol, cell_size: ArrayLike) → float

Compute the mass density of a molecule in g/cm³ from the sum of its atomic masses and the volume of a given cell.

This function assumes that the input cell size is given in angstroms (Å), as commonly used in molecular simulations such as LAMMPS with units real or units metal.

Parameters:
  • mol (Chem.rdchem.Mol) – RDKit molecule object. The atomic masses are obtained using atom.GetMass(), which returns values in amu.

  • cell_size (ArrayLike) – A 1D array-like object of shape (3,) specifying the dimensions of the simulation cell (in angstroms).

Returns:

The computed density in grams per cubic centimeter (g/cm³).

Return type:

float

Notes

  • The atomic mass is converted from amu to grams using Avogadro’s number.

  • The cell volume is converted from Å³ to cm³ using the relation 1 Å = 1e-8 cm, so 1 Å³ = 1e-24 cm³.
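The two conversions in these notes can be sketched in plain Python. This is a minimal illustration of the arithmetic only, not the library's code: the helper name density_g_per_cm3 is hypothetical, and it takes a precomputed total mass instead of an RDKit molecule.

```python
AVOGADRO = 6.02214076e23   # particles per mole (exact SI value)
ANGSTROM3_TO_CM3 = 1e-24   # (1e-8 cm) ** 3


def density_g_per_cm3(total_mass_amu, cell_size):
    """Density in g/cm^3 from a total mass in amu and box edges in angstroms.

    Hypothetical helper for illustration; assumes an orthorhombic cell.
    """
    lx, ly, lz = cell_size
    mass_g = total_mass_amu / AVOGADRO            # 1 amu corresponds to 1 g/mol
    volume_cm3 = lx * ly * lz * ANGSTROM3_TO_CM3  # cubic angstroms to cm^3
    return mass_g / volume_cm3


# One water molecule (about 18.015 amu) in a 10 x 10 x 10 angstrom box.
rho = density_g_per_cm3(18.015, (10.0, 10.0, 10.0))
```

With an RDKit molecule, the total mass would be the sum of atom.GetMass() over all atoms, as described for the mol parameter.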

lammps_utils.rdkit.compute_ffv(mol: Mol, confId: int = -1, cell_bounds: tuple[tuple[float, float], tuple[float, float], tuple[float, float]] | None = None, probe_radius: float = 1.4, grid_spacing: float = 1.0, n_jobs: int | None = None) → float

Compute the fractional free volume (FFV) of a molecule.

If cell_bounds is not provided, it is read from the conformer properties ("xlo", "xhi", etc.), which must already be set on the conformer.

Parameters:
  • mol (Chem.rdchem.Mol) – The input RDKit molecule.

  • confId (int) – The conformer ID to use from the molecule.

  • cell_bounds (tuple of tuple of float, optional) – The periodic cell boundaries as ((xlo, xhi), (ylo, yhi), (zlo, zhi)). If None, the bounds will be extracted from conformer properties.

  • probe_radius (float) – The radius of the spherical probe for free volume determination.

  • grid_spacing (float) – The spacing of the grid used to sample the cell volume.

  • n_jobs (int, optional) – The number of parallel jobs to run. Use -1 to utilize all CPUs.

Returns:

The fractional free volume of the molecule.

Return type:

float
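The parameters probe_radius and grid_spacing suggest a grid-sampling estimate. The sketch below shows one common way to do this, under stated assumptions: a grid point counts as occupied when it lies within (atom radius + probe radius) of any atom center, using the minimum-image convention in an orthorhombic cell. The function name grid_ffv, the (x, y, z, radius) input format, and the occupancy rule are all assumptions for illustration and may differ from what compute_ffv actually does.

```python
import itertools


def grid_ffv(atoms, cell_bounds, probe_radius=1.4, grid_spacing=1.0):
    """Fraction of grid points not covered by any (atom radius + probe) sphere.

    atoms: iterable of (x, y, z, radius) in angstroms (hypothetical format).
    cell_bounds: ((xlo, xhi), (ylo, yhi), (zlo, zhi)).
    """
    atoms = list(atoms)
    lengths = [hi - lo for lo, hi in cell_bounds]
    counts = [max(1, round(length / grid_spacing)) for length in lengths]
    # Sample points sit at the centers of a regular lattice of sub-cells.
    axes = [
        [lo + (i + 0.5) * length / n for i in range(n)]
        for (lo, _), length, n in zip(cell_bounds, lengths, counts)
    ]
    free = 0
    for point in itertools.product(*axes):
        occupied = False
        for x, y, z, radius in atoms:
            d2 = 0.0
            for p, c, length in zip(point, (x, y, z), lengths):
                delta = p - c
                delta -= length * round(delta / length)  # minimum image
                d2 += delta * delta
            if d2 <= (radius + probe_radius) ** 2:
                occupied = True
                break
        if not occupied:
            free += 1
    return free / (counts[0] * counts[1] * counts[2])


# A single carbon-sized sphere (radius 1.7) in the middle of a 10 angstrom box.
box = ((0.0, 10.0), (0.0, 10.0), (0.0, 10.0))
ffv = grid_ffv([(5.0, 5.0, 5.0, 1.7)], box)
```

A smaller grid_spacing gives a finer estimate at a cost that grows with the number of grid points times the number of atoms, which is presumably why the real function offers n_jobs.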

lammps_utils.rdkit.compute_rg(mol: Mol, confId: int = -1, mode: Literal['geometry', 'mass'] = 'geometry', removeHs: bool = False) → float

Compute the radius of gyration (Rg) of a molecule.

Parameters:
  • mol (Chem.rdchem.Mol) – The molecule for which Rg is computed.

  • confId (int, optional) – Index of the conformer to use (default is -1, which selects the first conformer).

  • mode ({'geometry', 'mass'}, optional) – Mode of computation:

      • 'geometry': Compute Rg based solely on atomic positions (default).

      • 'mass': Compute Rg based on atomic masses.

  • removeHs (bool, optional) – Whether to remove hydrogens from the molecule before computation (default is False).

Returns:

The radius of gyration (Rg) of the molecule.

Return type:

float

Raises:

ValueError – If no conformer is available for the molecule, or if an invalid mode is specified.

Notes

  • If mode='mass', Rg is computed using atomic masses as weights.

  • If mode='geometry', Rg is computed based only on atomic positions.
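The difference between the two modes can be illustrated with the standard definition Rg² = Σ wᵢ |rᵢ − r_c|² / Σ wᵢ, where r_c is the weighted center and the weights wᵢ are either 1 (geometry) or the atomic masses (mass). The sketch below works on plain coordinate lists; the helper name radius_of_gyration is hypothetical, and the code only mirrors the documented behaviour.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Rg of the points in coords; unit weights are used if masses is None."""
    weights = [1.0] * len(coords) if masses is None else list(masses)
    total = sum(weights)
    # Weighted center: centroid for unit weights, center of mass otherwise.
    center = [
        sum(w * c[k] for w, c in zip(weights, coords)) / total
        for k in range(3)
    ]
    weighted_sq = sum(
        w * sum((c[k] - center[k]) ** 2 for k in range(3))
        for w, c in zip(weights, coords)
    )
    return math.sqrt(weighted_sq / total)


points = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
rg_geometry = radius_of_gyration(points)             # unit weights
rg_mass = radius_of_gyration(points, [1.0, 3.0])     # mass weights
```

For the two points above, the geometric Rg is 1.0, while weighting the second point three times as heavily moves the center toward it and lowers Rg to about 0.866.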

lammps_utils.rdkit.find_main_chains(mol: Mol) → Generator[Mol, None, None]

Extracts and yields the longest linear (main chain) fragments from a molecule.

Parameters:

mol (Chem.rdchem.Mol) – An RDKit Mol object representing the molecular structure.

Yields:

Chem.rdchem.Mol – A fragment corresponding to the main chain (as a substructure) for each connected component.

Raises:

TypeError – If the input is not an RDKit Mol object.

Notes

This function identifies the longest acyclic paths (main chains) within each connected component of the input molecule.

  • Hydrogen atoms are removed prior to analysis.

  • An undirected graph is constructed from the molecule using the adjacency matrix.

  • Cyclic atoms are ignored when determining the main chain endpoints.

  • The longest path between two non-cyclic atoms is obtained via a breadth-first search, followed by a shortest path search in the graph.

  • The corresponding substructure (bond path) is returned as an RDKit Mol fragment.

This method is useful for isolating backbone structures or linear scaffolds in a molecule.
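The search described in the notes can be sketched on a plain adjacency mapping. This is a minimal illustration, assuming a single connected component and a caller-supplied list of candidate endpoints (the non-cyclic atoms); the names bfs and main_chain are hypothetical. It uses the classic double breadth-first sweep, which finds the longest path exactly on tree-like graphs and is only a heuristic when rings are present.

```python
from collections import deque


def bfs(adjacency, start):
    """Return (distance, parent) maps for every atom reachable from start."""
    dist, parent = {start: 0}, {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                parent[nbr] = node
                queue.append(nbr)
    return dist, parent


def main_chain(adjacency, endpoints):
    """Atom indices on the shortest path between the two most distant endpoints."""
    candidates = sorted(endpoints)          # fixed order for reproducible ties
    dist, _ = bfs(adjacency, candidates[0])
    first = max(candidates, key=lambda n: dist[n])   # farthest from arbitrary start
    dist, parent = bfs(adjacency, first)
    last = max(candidates, key=lambda n: dist[n])    # farthest from `first`
    path = [last]
    while parent[path[-1]] is not None:     # walk parents back to `first`
        path.append(parent[path[-1]])
    return path[::-1]


# Heavy-atom graph of a six-atom chain (0..5) with a one-atom branch (6) on atom 2.
graph = {0: [1], 1: [0, 2], 2: [1, 3, 6], 3: [2, 4], 4: [3, 5], 5: [4], 6: [2]}
chain = main_chain(graph, endpoints=graph.keys())
```

Here the branch atom 6 is left out and the six-atom backbone is returned. In the real function the resulting path is converted back into an RDKit Mol fragment.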

lammps_utils.rdkit.get_bond_order(atom_symbols: tuple[str, str], bond_length: float) → BondType

Estimate bond order based on atom symbols and bond length.

Parameters:
  • atom_symbols (tuple of str) – A tuple containing the atomic symbols of the two bonded atoms (e.g., ("C", "O")).

  • bond_length (float) – The bond length in angstroms.

Returns:

The estimated bond type (SINGLE, DOUBLE, TRIPLE, or AROMATIC).

Return type:

rdkit.Chem.rdchem.BondType
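A length-based estimate of this kind can be sketched as a nearest-reference lookup. Everything in the block is an assumption for illustration: the table holds approximate textbook bond lengths for a few element pairs, the function guess_bond_order is hypothetical, and it returns the bond type name as a string to stay dependency-free. The library's actual thresholds and fallback behaviour are not documented here and may differ.

```python
# Approximate textbook bond lengths in angstroms (illustrative values only,
# not the table used by lammps_utils).
REFERENCE_LENGTHS = {
    frozenset(["C"]): {"SINGLE": 1.54, "AROMATIC": 1.40, "DOUBLE": 1.34, "TRIPLE": 1.20},
    frozenset(["C", "O"]): {"SINGLE": 1.43, "DOUBLE": 1.21},
    frozenset(["C", "N"]): {"SINGLE": 1.47, "DOUBLE": 1.28, "TRIPLE": 1.16},
}


def guess_bond_order(atom_symbols, bond_length):
    """Name of the bond type whose reference length is closest to bond_length."""
    # frozenset makes the lookup independent of the order of the two symbols.
    table = REFERENCE_LENGTHS.get(frozenset(atom_symbols))
    if table is None:
        return "SINGLE"  # assumed fallback for pairs missing from this small table
    return min(table, key=lambda name: abs(table[name] - bond_length))


carbonyl = guess_bond_order(("C", "O"), 1.22)
benzene_cc = guess_bond_order(("C", "C"), 1.39)
```

The returned names match members of rdkit.Chem.rdchem.BondType, so with RDKit installed a caller could map them with getattr(Chem.rdchem.BondType, name).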

lammps_utils.rdkit.unwrap_rdkit_mol_under_pbc(mol: Mol, cell_size: ArrayLike, confId: int = -1, determine_bonds: bool = False) → Mol

Unwraps a periodic RDKit molecule so that bonded atoms are positioned close together in Cartesian space.

Parameters:
  • mol (Chem.rdchem.Mol) – The RDKit molecule to be unwrapped. Must have at least one 3D conformer.

  • cell_size (ArrayLike) – The size of the periodic simulation cell (a 3-element array-like object representing the box dimensions).

  • confId (int, optional) – The conformer ID to use for coordinate manipulation. Defaults to -1 (the first conformer).

  • determine_bonds (bool, optional) – If True, reassigns bond orders based on interatomic distances after unwrapping. Defaults to False.

Returns:

A new RDKit molecule object with unwrapped coordinates and optionally updated bond orders. All hydrogen atoms are removed from the returned molecule.

Return type:

Chem.rdchem.Mol

Raises:

AssertionError – If the input molecule has no conformers or if the cell size is invalid.

Notes

This function converts the molecule to a graph to assist in unwrapping it under periodic boundary conditions (PBC), using the unwrap_molecule_under_pbc utility. If determine_bonds is True, bond distances are recalculated post-unwrapping, and bond types are reassigned using the get_bond_order function. Hydrogens are removed from the returned molecule to simplify further processing.
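The graph-based unwrapping idea can be sketched without RDKit. The block below is a minimal illustration, assuming an orthorhombic cell and plain lists of coordinates and bonds; the function name unwrap is hypothetical, and the internals of unwrap_molecule_under_pbc may differ. Starting from one atom per connected component, it walks the bond graph breadth-first and places each neighbour at the periodic image nearest to the atom it was reached from.

```python
from collections import deque


def unwrap(coords, bonds, cell_size):
    """Return unwrapped copies of coords for an orthorhombic periodic cell."""
    adjacency = {i: [] for i in range(len(coords))}
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)

    new = [list(c) for c in coords]
    seen = set()
    for root in range(len(coords)):       # one traversal per connected component
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            i = queue.popleft()
            for j in adjacency[i]:
                if j in seen:
                    continue
                for k in range(3):
                    delta = coords[j][k] - new[i][k]
                    # Shift by whole cell lengths to the image nearest to atom i.
                    delta -= cell_size[k] * round(delta / cell_size[k])
                    new[j][k] = new[i][k] + delta
                seen.add(j)
                queue.append(j)
    return new


# Three bonded atoms in a 10 angstrom box; the chain crosses the +x boundary.
wrapped = [(9.5, 5.0, 5.0), (0.5, 5.0, 5.0), (1.5, 5.0, 5.0)]
unwrapped = unwrap(wrapped, bonds=[(0, 1), (1, 2)], cell_size=(10.0, 10.0, 10.0))
```

After unwrapping, the second and third atoms sit at x = 10.5 and x = 11.5, so every bond spans 1 Å instead of nearly the full box length.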