lammps_utils.io package

Lammps utils I/O module.

This module provides functions to convert LAMMPS data files to GROMACS gro and PDB files, to load data from LAMMPS data and dump files, and to build RDKit Mol objects from them.

lammps_utils.io.MolFromLAMMPSData(filepath_data_or_buffer: PathLike | str | TextIOBase, make_molecule_whole: bool = True, determine_bonds: bool = True) → Mol

Constructs an RDKit Mol object from a LAMMPS data file or buffer.

This function reads atomic and bonding information from a LAMMPS-style data file, reconstructs the molecular structure by inferring bond orders based on interatomic distances, and returns a corresponding RDKit Mol object.

Parameters:

filepath_data_or_buffer (Union[os.PathLike, str, io.TextIOBase]) – Path to the LAMMPS data file, or a file-like buffer object containing the data.

Returns:

An RDKit Mol object with atoms and inferred bonds, including 3D coordinates as a single conformer.

Return type:

Chem.rdchem.Mol
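The distance-based inference described above can be illustrated with a small standalone sketch. It is not the library's implementation: the covalent radii subset, the 0.45 Å tolerance, and the helper name guess_bonds are assumptions chosen for illustration, and the sketch only detects which atoms are bonded without assigning bond orders.

```python
import math
from itertools import combinations

# Covalent radii in angstroms; a small illustrative subset (assumed values).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def guess_bonds(symbols, coords, tolerance=0.45):
    """Return index pairs (i, j) whose distance is at most the sum of the
    two covalent radii plus a tolerance (all in angstroms)."""
    bonds = []
    for i, j in combinations(range(len(symbols)), 2):
        cutoff = COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]] + tolerance
        if math.dist(coords[i], coords[j]) <= cutoff:
            bonds.append((i, j))
    return bonds


# Water: O at the origin, two H atoms about 0.96 angstroms away.
symbols = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
bonds = guess_bonds(symbols, coords)
```

Here the two O-H pairs fall inside the cutoff while the H-H pair does not. A pairwise loop like this scales quadratically with atom count, so a real implementation for large systems would likely use a neighbor search instead.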

lammps_utils.io.MolFromLAMMPSDump(filepath_dump: PathLike | str, mol_template: Mol, make_molecule_whole: bool = False, n_jobs: int | None = None) → Mol

Create an RDKit molecule with conformers from a LAMMPS dump file.

This function loads atom coordinates from a LAMMPS trajectory file and assigns them to the provided molecular template as conformers. If specified, the molecule can be unwrapped under periodic boundary conditions to make it whole.

Parameters:
  • filepath_dump (Union[os.PathLike, str]) – Path to the LAMMPS dump file to load.

  • mol_template (Chem.rdchem.Mol) – An RDKit molecule used as a template. The returned molecule will copy its atom and bond structure.

  • make_molecule_whole (bool) – If True, unwrap the molecule based on PBC to make it whole in each frame.

  • n_jobs (int, optional) – Number of parallel jobs for loading the dump. -1 uses all available CPUs.

Returns:

An RDKit molecule with one conformer per frame in the LAMMPS dump file. Each conformer stores the simulation cell bounds as properties.

Return type:

Chem.rdchem.Mol

lammps_utils.io.data2gro(filepath_data_or_buffer: str | PathLike | TextIOBase, filepath_gro: str | PathLike | None = None) → str | None

Convert LAMMPS data file to GROMACS gro file.

Parameters:
  • filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase]) – file path or buffer of LAMMPS data file

  • filepath_gro (Optional[Union[str, os.PathLike]], optional) – file path of GROMACS gro file, by default None

Returns:

GROMACS gro file content if filepath_gro is None, otherwise None

Return type:

Union[str, None]

Raises:

FileNotFoundError – If the directory of filepath_gro does not exist
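As a rough illustration of the output side of this conversion, the sketch below builds gro text with the fixed-width columns of the GROMACS gro format and follows the same "return the content when no path is given" convention. The helper names format_gro and to_gro are hypothetical, the box is assumed orthorhombic, and the input coordinates are assumed to be in angstroms (as in LAMMPS "real" or "metal" units), so they are divided by 10 to obtain nm.

```python
def format_gro(title, atoms, box_nm):
    """Build GROMACS gro text.

    atoms: list of (resid, resname, atomname, x, y, z) with coordinates in nm.
    box_nm: (lx, ly, lz) box lengths in nm for an orthorhombic cell.
    """
    lines = [title, f"{len(atoms):5d}"]
    for serial, (resid, resname, atomname, x, y, z) in enumerate(atoms, start=1):
        # Fixed-width columns: resid(5) resname(5) atomname(5) serial(5), then x y z as 8.3f.
        lines.append(
            f"{resid % 100000:5d}{resname:<5s}{atomname:>5s}{serial % 100000:5d}"
            f"{x:8.3f}{y:8.3f}{z:8.3f}"
        )
    lines.append("".join(f"{length:10.5f}" for length in box_nm))
    return "\n".join(lines) + "\n"


def to_gro(title, atoms, box_nm, filepath_gro=None):
    """Return the gro text when no path is given; otherwise write it and return None."""
    content = format_gro(title, atoms, box_nm)
    if filepath_gro is None:
        return content
    # open() raises FileNotFoundError if the target directory does not exist.
    with open(filepath_gro, "w") as f:
        f.write(content)
    return None


# Coordinates in angstroms (assumed), converted to nm before formatting.
atoms_angstrom = [(1, "MOL", "C1", 1.0, 2.0, 3.0), (1, "MOL", "H1", 2.09, 2.0, 3.0)]
atoms_nm = [(r, rn, an, x / 10, y / 10, z / 10) for r, rn, an, x, y, z in atoms_angstrom]
gro_text = to_gro("converted from LAMMPS data", atoms_nm, (2.0, 2.0, 2.0))
```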

lammps_utils.io.data2pdb(filepath_data_or_buffer: str | PathLike | TextIOBase, filepath_pdb: str | PathLike | None = None) → str | None

Convert LAMMPS data file to PDB file.

Parameters:
  • filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase]) – file path or buffer of LAMMPS data file

  • filepath_pdb (Optional[Union[str, os.PathLike]], optional) – file path of PDB file, by default None

Returns:

PDB file content if filepath_pdb is None, otherwise None

Return type:

Union[str, None]

Raises:

FileNotFoundError – If the directory of filepath_pdb does not exist
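For orientation, a single fixed-column PDB coordinate record can be formatted as in the sketch below. The helper name format_pdb_hetatm is hypothetical, and the choice of HETATM records, unit occupancy, zero temperature factor, and left-justified atom names are simplifying assumptions; the record type and fields the library actually writes are not stated here.

```python
def format_pdb_hetatm(serial, name, resname, resseq, x, y, z, element, chain="A"):
    """Format one 78-character PDB HETATM record (coordinates in angstroms)."""
    return (
        # Columns: record(1-6) serial(7-11) name(13-16) resName(18-20) chain(22) resSeq(23-26)
        f"HETATM{serial:5d} {name:<4s} {resname:>3s} {chain:1s}{resseq:4d}    "
        # Columns: x(31-38) y(39-46) z(47-54) occupancy(55-60) tempFactor(61-66) element(77-78)
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}"
    )


record = format_pdb_hetatm(1, "C1", "MOL", 1, 1.0, 2.0, 3.0, "C")
```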

lammps_utils.io.get_atom_type_masses(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) → dict[int, float]

Get the masses of atom types from a LAMMPS data file or a file-like object.

Parameters:

filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.

Returns:

A dictionary mapping atom type IDs to their masses.

Return type:

dict[int, float]

Raises:

ValueError – If the atom type masses cannot be found in the file.
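To make the expected input concrete, the following sketch reads the Masses section from LAMMPS data text and raises ValueError when no entries are found. It is a simplified stand-in, not the library's parser: the helper name parse_masses is hypothetical and it takes the file content as a string rather than a path or buffer.

```python
import io


def parse_masses(text):
    """Return {atom_type: mass} from the Masses section of LAMMPS data text."""
    masses = {}
    in_section = False
    for raw in io.StringIO(text):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not in_section:
            in_section = line == "Masses"
            continue
        if not line:
            if masses:
                break  # blank line after the entries ends the section
            continue  # blank line between the header and the first entry
        fields = line.split()
        if len(fields) != 2 or not fields[0].isdigit():
            break  # reached something that is not a "type mass" entry
        masses[int(fields[0])] = float(fields[1])
    if not masses:
        raise ValueError("Masses section not found")
    return masses


sample = """LAMMPS data file

2 atom types

Masses

1 12.011  # C
2 1.008   # H

Atoms

1 1 1 0.0 0.0 0.0 0.0
"""
masses = parse_masses(sample)
```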

lammps_utils.io.get_atom_type_symbols(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) → dict[int, str]

Get the symbols of atom types from a LAMMPS data file or a file-like object.

Parameters:

filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.

Returns:

A dictionary mapping atom type IDs to their symbols.

Return type:

dict[int, str]

lammps_utils.io.get_cell_bounds(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) → tuple[tuple[float, float], tuple[float, float], tuple[float, float]]

Get the cell bounds from a LAMMPS data file or a file-like object.

Parameters:

filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.

Returns:

A tuple of three (min, max) pairs giving the cell bounds along the x, y, and z axes.

Return type:

tuple[tuple[float, float], tuple[float, float], tuple[float, float]]
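The shape of the returned value can be seen in a small sketch that extracts the "xlo xhi", "ylo yhi", and "zlo zhi" header lines from LAMMPS data text. The helper name parse_cell_bounds is hypothetical, an orthogonal box is assumed, and a missing axis surfaces here as a KeyError, which may differ from the library's behavior.

```python
def parse_cell_bounds(text):
    """Return ((xlo, xhi), (ylo, yhi), (zlo, zhi)) from LAMMPS data header text."""
    bounds = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        # Header lines look like "0.0 20.0 xlo xhi".
        if len(fields) == 4 and fields[2][1:] == "lo" and fields[3][1:] == "hi":
            bounds[fields[2][0]] = (float(fields[0]), float(fields[1]))
    return tuple(bounds[axis] for axis in "xyz")


header = """LAMMPS data file

100 atoms

-5.0 5.0 xlo xhi
0.0 10.0 ylo yhi
0.0 20.5 zlo zhi
"""
cell_bounds = parse_cell_bounds(header)
(xmin, xmax), (ymin, ymax), (zmin, zmax) = cell_bounds
```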

lammps_utils.io.get_n_atom_types(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) → int

Get the number of atom types from a LAMMPS data file or a file-like object.

Parameters:

filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.

Returns:

The number of atom types in the LAMMPS data file.

Return type:

int

Raises:

ValueError – If the number of atom types cannot be found in the file.

lammps_utils.io.get_n_atoms(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) → int

Get the number of atoms from a LAMMPS data file or a file-like object.

Parameters:

filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.

Returns:

The number of atoms in the LAMMPS data file.

Return type:

int

Raises:

ValueError – If the number of atoms cannot be found in the file.

lammps_utils.io.get_n_bonds(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) → int

Get the number of bonds from a LAMMPS data file or a file-like object.

Parameters:

filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.

Returns:

The number of bonds in the LAMMPS data file.

Return type:

int

Raises:

ValueError – If the number of bonds cannot be found in the file.
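The count helpers above (atom types, atoms, bonds) all read a header line of the form "N keyword". A minimal sketch of that lookup, under the assumption that the counts appear one per line in the file header, might look like this; parse_header_count is a hypothetical name, not part of the package.

```python
import re


def parse_header_count(text, keyword):
    """Return the integer N from a header line of the form "N <keyword>"."""
    pattern = re.compile(
        rf"^\s*(\d+)\s+{re.escape(keyword)}\s*(?:#.*)?$", re.MULTILINE
    )
    match = pattern.search(text)
    if match is None:
        raise ValueError(f"'{keyword}' count not found in header")
    return int(match.group(1))


header = """LAMMPS data file

100 atoms
99 bonds
2 atom types
1 bond types
"""
n_atoms = parse_header_count(header, "atoms")
n_bonds = parse_header_count(header, "bonds")
n_atom_types = parse_header_count(header, "atom types")
```

Anchoring the pattern to the whole line keeps "atoms" from matching "2 atom types" and "bonds" from matching "1 bond types".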

lammps_utils.io.load_data(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase, make_molecule_whole: bool = False, return_bond_info: bool = False, return_cell_bounds: bool = False) → DataFrame | tuple[DataFrame, DataFrame] | tuple[DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]] | tuple[DataFrame, DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]]

Load atom (and optionally bond and cell) data from a LAMMPS data file into a DataFrame.

This function supports file paths and file-like objects and optionally reconstructs molecules by unwrapping coordinates under periodic boundary conditions.

Parameters:
  • filepath_data_or_buffer (str or os.PathLike or io.TextIOBase or io.BufferedIOBase) – The path to a LAMMPS data file, or a file-like object containing the data.

  • make_molecule_whole (bool, default=False) – If True, unwraps atomic coordinates using bond connectivity and periodic cell bounds so that molecules are made whole (not split across periodic boundaries).

  • return_bond_info (bool, default=False) – If True, returns an additional DataFrame containing bond information.

  • return_cell_bounds (bool, default=False) – If True, returns the simulation cell bounds as a tuple of 3 (min, max) pairs for x, y, z.

Returns:

Union[pd.DataFrame, tuple[pd.DataFrame, pd.DataFrame], tuple[pd.DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]], tuple[pd.DataFrame, pd.DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]]] – The atom DataFrame is always returned. Depending on the flags:

  • If return_bond_info is True, the bond DataFrame is included.

  • If return_cell_bounds is True, the simulation box bounds are included.

  • If both flags are True, all three values are returned as a tuple: (atom DataFrame, bond DataFrame, cell bounds).

Raises:

AssertionError – If required components (like conformers) are missing when make_molecule_whole is True.

Notes

This function assumes the input file is in LAMMPS data format with sections for atoms, bonds, and box bounds. If make_molecule_whole is enabled, bond and box information are automatically parsed regardless of the other flags.
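Because the return shape depends on two flags, callers need to unpack the result differently for each combination. The sketch below mimics that convention with plain placeholder values instead of DataFrames; assemble_result is a hypothetical helper that only illustrates the ordering stated in the Returns section and is not taken from the library's source.

```python
def assemble_result(atoms, bonds, cell_bounds,
                    return_bond_info=False, return_cell_bounds=False):
    """Return atoms alone, or a tuple extended by bonds and/or cell bounds."""
    result = [atoms]
    if return_bond_info:
        result.append(bonds)
    if return_cell_bounds:
        result.append(cell_bounds)
    # A single value is returned bare; two or three values come back as a tuple.
    return result[0] if len(result) == 1 else tuple(result)


# Placeholder values standing in for the atom DataFrame, bond DataFrame, and bounds.
atoms, bonds = "atom table", "bond table"
box = ((0.0, 10.0), (0.0, 10.0), (0.0, 10.0))

only_atoms = assemble_result(atoms, bonds, box)
with_bonds = assemble_result(atoms, bonds, box, return_bond_info=True)
with_box = assemble_result(atoms, bonds, box, return_cell_bounds=True)
everything = assemble_result(atoms, bonds, box, return_bond_info=True, return_cell_bounds=True)
```

In calling code this corresponds to writing, for example, `df_atoms, df_bonds, cell_bounds = load_data(path, return_bond_info=True, return_cell_bounds=True)` when both flags are set.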

lammps_utils.io.load_dump(filepath_dump: PathLike | str, buffer_size: int = 10485760, return_cell_bounds: bool = False, n_jobs: int | None = None) → tuple[tuple[int, DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]], ...] | tuple[tuple[int, DataFrame], ...]

Load and parse a LAMMPS dump file into structured timestep data.

Parameters:
  • filepath_dump (Union[os.PathLike, str]) – Path to the LAMMPS dump file to be loaded.

  • buffer_size (int, optional) – Size in bytes of the buffer used when scanning the file for timestep offsets. Larger values may improve performance on large files. Default is 10485760 (10 MiB).

  • return_cell_bounds (bool, optional) – Whether to extract and return cell bounds for each timestep. If True, the output will include cell bounds in addition to timestep and atomic data. Default is False.

  • n_jobs (Optional[int], optional) – Number of parallel jobs to run. If None, defaults to single-threaded operation.

Returns:

Union[tuple[tuple[int, pd.DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]], ...], tuple[tuple[int, pd.DataFrame], ...]] – A tuple of timestep data. Each element is either:

  • (timestep, DataFrame) if return_cell_bounds is False

  • (timestep, DataFrame, cell_bounds) if return_cell_bounds is True

timestep is an integer, DataFrame contains atomic data for that step, and cell_bounds is a 3-tuple of (min, max) pairs for x, y, z.
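The per-timestep structure of the result can be illustrated by splitting dump text into frames by hand. This sketch makes several assumptions: the dump uses the default text layout with an orthogonal box, the whole text is read at once rather than scanned by byte offsets, rows are plain dicts of floats instead of a DataFrame, and parse_dump is a hypothetical name.

```python
def parse_dump(text):
    """Split LAMMPS dump text into (timestep, rows, cell_bounds) frames.

    rows is a list of dicts keyed by the column names on the "ITEM: ATOMS" line.
    """
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].startswith("ITEM: TIMESTEP"):
            i += 1
            continue
        timestep = int(lines[i + 1])
        n_atoms = int(lines[i + 3])
        # Three "lo hi" lines follow "ITEM: BOX BOUNDS" (orthogonal box assumed).
        cell_bounds = tuple(
            tuple(float(v) for v in lines[i + 5 + k].split()[:2]) for k in range(3)
        )
        columns = lines[i + 8].split()[2:]  # drop "ITEM:" and "ATOMS"
        start = i + 9
        rows = [
            dict(zip(columns, map(float, lines[start + k].split())))
            for k in range(n_atoms)
        ]
        frames.append((timestep, rows, cell_bounds))
        i = start + n_atoms
    return tuple(frames)


dump_text = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
0.0 10.0
0.0 10.0
0.0 10.0
ITEM: ATOMS id type x y z
1 1 1.0 2.0 3.0
2 1 4.0 5.0 6.0
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
0.0 10.0
0.0 10.0
0.0 10.0
ITEM: ATOMS id type x y z
1 1 1.5 2.5 3.5
2 1 4.5 5.5 6.5
"""
frames = parse_dump(dump_text)
```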

lammps_utils.io.unwrap_molecule_df_under_pbc(df_atoms: DataFrame, df_bonds: DataFrame, cell_bounds: tuple[tuple[float, float], tuple[float, float], tuple[float, float]]) → DataFrame

Adjust atomic coordinates to make the molecule whole under periodic boundary conditions (PBC). This function shifts atoms so that bonded atoms appear spatially close, avoiding discontinuities across cell edges.

Parameters:
  • df_atoms (pd.DataFrame) – DataFrame containing atomic coordinates. Must include columns “x”, “y”, and “z”.

  • df_bonds (pd.DataFrame) – DataFrame defining atomic bonds, with columns “atom1” and “atom2” containing atom indices.

  • cell_bounds (tuple of tuple of float) – Bounds of the periodic cell along each axis. Format: ((xmin, xmax), (ymin, ymax), (zmin, zmax)).

Returns:

A new DataFrame with adjusted atomic coordinates that are spatially continuous across the cell.

Return type:

pd.DataFrame
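One common way to make a molecule whole is to walk the bond graph and move each newly visited atom by whole box lengths toward its already-placed neighbor (the minimum-image convention). The sketch below shows that idea on plain lists; it is not necessarily the algorithm this function uses, and the helper name unwrap_molecule, the zero-based bond indices, and the orthogonal box are assumptions.

```python
from collections import defaultdict, deque


def unwrap_molecule(coords, bonds, cell_bounds):
    """Return coordinates shifted so every traversed bond spans at most half a box length.

    coords: list of (x, y, z); bonds: list of (i, j) zero-based index pairs;
    cell_bounds: ((xmin, xmax), (ymin, ymax), (zmin, zmax)).
    """
    lengths = [hi - lo for lo, hi in cell_bounds]
    neighbors = defaultdict(list)
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    unwrapped = [list(c) for c in coords]
    visited = set()
    for root in range(len(coords)):
        if root in visited:
            continue
        # Breadth-first traversal of one bonded fragment, starting from root.
        visited.add(root)
        queue = deque([root])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if j in visited:
                    continue
                for axis, length in enumerate(lengths):
                    delta = unwrapped[j][axis] - unwrapped[i][axis]
                    # Minimum-image shift: move j by whole box lengths toward i.
                    unwrapped[j][axis] -= length * round(delta / length)
                visited.add(j)
                queue.append(j)
    return [tuple(c) for c in unwrapped]


# Two bonded atoms split across the x boundary of a 10 x 10 x 10 box.
box = ((0.0, 10.0), (0.0, 10.0), (0.0, 10.0))
coords = [(9.5, 5.0, 5.0), (0.5, 5.0, 5.0)]
whole = unwrap_molecule(coords, [(0, 1)], box)
```

After unwrapping, the second atom sits just outside the box next to its bonded partner, so the returned coordinates are continuous but no longer confined to the cell.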