lammps_utils.io package
Lammps utils I/O module.
This module provides functions to convert LAMMPS data files to GROMACS gro and PDB files, and to load data from LAMMPS data and dump files.
- lammps_utils.io.MolFromLAMMPSData(filepath_data_or_buffer: PathLike | str | TextIOBase, make_molecule_whole: bool = True, determine_bonds: bool = True) Mol
Constructs an RDKit Mol object from a LAMMPS data file or buffer.
This function reads atomic and bonding information from a LAMMPS-style data file, reconstructs the molecular structure by inferring bond orders based on interatomic distances, and returns a corresponding RDKit Mol object.
- Parameters:
filepath_data_or_buffer (Union[os.PathLike, str, io.TextIOBase]) – Path to the LAMMPS data file, or a file-like buffer object containing the data.
- Returns:
An RDKit Mol object with atoms and inferred bonds, including 3D coordinates as a single conformer.
- Return type:
Chem.rdchem.Mol
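The description above mentions inferring bond orders from interatomic distances. The following is a minimal sketch of that general idea, not the library's implementation: it picks the carbon-carbon bond order whose typical textbook length is closest to the measured distance. The helper name `guess_cc_bond_order` and the reference table are assumptions made for illustration only.

```python
import math

# Typical carbon-carbon bond lengths in angstroms, keyed by bond order.
# Textbook values, used here purely for illustration.
CC_REFERENCE_LENGTHS = {1: 1.54, 2: 1.34, 3: 1.20}


def guess_cc_bond_order(pos_a, pos_b):
    """Hypothetical helper: return the bond order whose reference length
    is closest to the distance between the two positions."""
    distance = math.dist(pos_a, pos_b)
    return min(
        CC_REFERENCE_LENGTHS,
        key=lambda order: abs(CC_REFERENCE_LENGTHS[order] - distance),
    )


ethane_like = guess_cc_bond_order((0.0, 0.0, 0.0), (1.53, 0.0, 0.0))
ethene_like = guess_cc_bond_order((0.0, 0.0, 0.0), (1.33, 0.0, 0.0))
ethyne_like = guess_cc_bond_order((0.0, 0.0, 0.0), (0.0, 1.21, 0.0))
```

A real implementation would also need to handle other element pairs, aromatic bonds, and valence constraints, which this sketch ignores.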
- lammps_utils.io.MolFromLAMMPSDump(filepath_dump: PathLike | str, mol_template: Mol, make_molecule_whole: bool = False, n_jobs: int | None = None) Mol
Create an RDKit molecule with conformers from a LAMMPS dump file.
This function loads atom coordinates from a LAMMPS trajectory file and assigns them to the provided molecular template as conformers. If specified, the molecule can be unwrapped under periodic boundary conditions to make it whole.
- Parameters:
filepath_dump (Union[os.PathLike, str]) – Path to the LAMMPS dump file to load.
mol_template (Chem.rdchem.Mol) – An RDKit molecule used as a template. The returned molecule will copy its atom and bond structure.
make_molecule_whole (bool) – If True, unwrap the molecule based on PBC to make it whole in each frame.
n_jobs (int, optional) – Number of parallel jobs for loading the dump. -1 uses all available CPUs.
- Returns:
An RDKit molecule with one conformer per frame in the LAMMPS dump file. Each conformer stores the simulation cell bounds as properties.
- Return type:
Chem.rdchem.Mol
- lammps_utils.io.data2gro(filepath_data_or_buffer: str | PathLike | TextIOBase, filepath_gro: str | PathLike | None = None) str | None
Convert a LAMMPS data file to a GROMACS gro file.
- Parameters:
filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase]) – File path or buffer of the LAMMPS data file.
filepath_gro (Optional[Union[str, os.PathLike]], optional) – File path of the GROMACS gro file to write. Defaults to None.
- Returns:
The GROMACS gro file content if filepath_gro is None, otherwise None.
- Return type:
Union[str, None]
- Raises:
FileNotFoundError – If the directory of filepath_gro does not exist
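For readers unfamiliar with the target format, the sketch below formats a single atom record in the fixed-width gro layout (residue number, residue name, atom name, atom number, then x, y, z in nanometers). It is an illustration of the file format only; the helper name `format_gro_atom` is hypothetical, and the angstrom-to-nanometer conversion assumes the LAMMPS coordinates are in angstroms (as in `real` or `metal` units), which the documentation above does not state.

```python
def format_gro_atom(res_id, res_name, atom_name, atom_id, xyz_angstrom):
    """Hypothetical helper: format one atom record in the .gro layout.

    Assumes the input coordinates are in angstroms and converts them to
    nanometers, the unit used by .gro files.
    """
    x, y, z = (value / 10.0 for value in xyz_angstrom)
    return (
        f"{res_id:5d}{res_name:<5s}{atom_name:>5s}{atom_id:5d}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


line = format_gro_atom(1, "MOL", "C1", 1, (12.5, 3.0, 7.25))
```

A complete gro file additionally has a title line, an atom-count line, and a final line with the box vectors.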
- lammps_utils.io.data2pdb(filepath_data_or_buffer: str | PathLike | TextIOBase, filepath_pdb: str | PathLike | None = None) str | None
Convert a LAMMPS data file to a PDB file.
- Parameters:
filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase]) – File path or buffer of the LAMMPS data file.
filepath_pdb (Optional[Union[str, os.PathLike]], optional) – File path of the PDB file to write. Defaults to None.
- Returns:
The PDB file content if filepath_pdb is None, otherwise None.
- Return type:
Union[str, None]
- Raises:
FileNotFoundError – If the directory of filepath_pdb does not exist
- lammps_utils.io.get_atom_type_masses(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) dict[int, float]
Get the masses of atom types from a LAMMPS data file or a file-like object.
- Parameters:
filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.
- Returns:
A dictionary mapping atom type IDs to their masses.
- Return type:
dict[int, float]
- Raises:
ValueError – If the atom type masses cannot be found in the file.
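As an illustration of what this getter extracts, here is a minimal standalone sketch that reads the `Masses` section from a text buffer. It is not the library's code: the function name `read_masses` is hypothetical, and the sketch assumes a conventional data file in which each section starts with a keyword line and `#` introduces a comment.

```python
import io


def read_masses(buffer):
    """Hypothetical helper: return {atom_type: mass} from the Masses section."""
    masses = {}
    in_section = False
    for raw_line in buffer:
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if line == "Masses":
            in_section = True
            continue
        if not in_section or not line:
            continue  # header text, or a blank line around the section
        fields = line.split()
        if not fields[0].isdigit():
            break  # the next section keyword has been reached
        masses[int(fields[0])] = float(fields[1])
    if not masses:
        raise ValueError("atom type masses not found")
    return masses


data = """LAMMPS data file

2 atoms
2 atom types

Masses

1 12.011  # C
2 1.008   # H

Atoms

1 1 1 0.0 0.0 0.0 0.0
"""
masses = read_masses(io.StringIO(data))
```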
- lammps_utils.io.get_atom_type_symbols(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) dict[int, str]
Get the symbols of atom types from a LAMMPS data file or a file-like object.
- Parameters:
filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.
- Returns:
A dictionary mapping atom type IDs to their symbols.
- Return type:
dict[int, str]
- lammps_utils.io.get_cell_bounds(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) tuple[tuple[float, float], tuple[float, float], tuple[float, float]]
Get the cell limits from a LAMMPS data file or a file-like object.
- Parameters:
filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.
- Returns:
A tuple containing the cell limits for x, y, and z axes.
- Return type:
tuple[tuple[float, float], tuple[float, float], tuple[float, float]]
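The returned structure mirrors the box lines of a data file header. The sketch below shows one way such lines could be parsed into the documented `((xmin, xmax), (ymin, ymax), (zmin, zmax))` shape; the function name `read_cell_bounds` is hypothetical and the sketch assumes an orthogonal box written as `<lo> <hi> xlo xhi` lines.

```python
import io


def read_cell_bounds(buffer):
    """Hypothetical helper: parse the xlo/xhi, ylo/yhi, zlo/zhi header lines."""
    bounds = {}
    for raw_line in buffer:
        fields = raw_line.split("#", 1)[0].split()
        # Box lines look like "<lo> <hi> xlo xhi".
        if len(fields) == 4 and fields[2] in ("xlo", "ylo", "zlo"):
            bounds[fields[2][0]] = (float(fields[0]), float(fields[1]))
        if len(bounds) == 3:
            break  # all three axes found, no need to read further
    return bounds["x"], bounds["y"], bounds["z"]


header = """LAMMPS data file

3 atoms

0.0 10.0 xlo xhi
-5.0 5.0 ylo yhi
0.0 20.0 zlo zhi
"""
cell_bounds = read_cell_bounds(io.StringIO(header))
```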
- lammps_utils.io.get_n_atom_types(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) int
Get the number of atom types from a LAMMPS data file or a file-like object.
- Parameters:
filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.
- Returns:
The number of atom types in the LAMMPS data file.
- Return type:
int
- Raises:
ValueError – If the number of atom types cannot be found in the file.
- lammps_utils.io.get_n_atoms(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) int
Get the number of atoms from a LAMMPS data file or a file-like object.
- Parameters:
filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.
- Returns:
The number of atoms in the LAMMPS data file.
- Return type:
int
- Raises:
ValueError – If the number of atoms cannot be found in the file.
- lammps_utils.io.get_n_bonds(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase) int
Get the number of bonds from a LAMMPS data file or a file-like object.
- Parameters:
filepath_data_or_buffer (Union[str, os.PathLike, io.TextIOBase, io.BufferedIOBase]) – The file path or file-like object to read.
- Returns:
The number of bonds in the LAMMPS data file.
- Return type:
int
- Raises:
ValueError – If the number of bonds cannot be found in the file.
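The three count getters (get_n_atom_types, get_n_atoms, get_n_bonds) all read a header line of the form `<number> <keyword>`. The following sketch shows how such header lines could be collected with a regular expression, raising ValueError when a count is absent, as documented. The names `read_counts` and `get_count` are hypothetical and this is not the library's implementation.

```python
import io
import re

# Matches header lines such as "3 atoms", "2 atom types", or "2 bonds".
_COUNT_PATTERN = re.compile(r"^\s*(\d+)\s+(atoms|bonds|atom types)\s*(?:#.*)?$")


def read_counts(buffer):
    """Hypothetical helper: collect the header counts found in the buffer."""
    counts = {}
    for line in buffer:
        match = _COUNT_PATTERN.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def get_count(counts, key):
    """Return one count, or raise ValueError if the header line was absent."""
    if key not in counts:
        raise ValueError(f"'{key}' count not found in header")
    return counts[key]


header = """LAMMPS data file

3 atoms
2 atom types
2 bonds
1 bond types

0.0 10.0 xlo xhi
"""
counts = read_counts(io.StringIO(header))
n_atoms = get_count(counts, "atoms")
```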
- lammps_utils.io.load_data(filepath_data_or_buffer: str | PathLike | TextIOBase | BufferedIOBase, make_molecule_whole: bool = False, return_bond_info: bool = False, return_cell_bounds: bool = False) DataFrame | tuple[DataFrame, DataFrame] | tuple[DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]] | tuple[DataFrame, DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]]
Load atom (and optionally bond and cell) data from a LAMMPS data file into a DataFrame.
This function supports file paths and file-like objects and optionally reconstructs molecules by unwrapping coordinates under periodic boundary conditions.
- Parameters:
filepath_data_or_buffer (str or os.PathLike or io.TextIOBase or io.BufferedIOBase) – The path to a LAMMPS data file, or a file-like object containing the data.
make_molecule_whole (bool, default=False) – If True, unwraps atomic coordinates using bond connectivity and periodic cell bounds so that molecules are made whole (not split across periodic boundaries).
return_bond_info (bool, default=False) – If True, returns an additional DataFrame containing bond information.
return_cell_bounds (bool, default=False) – If True, returns the simulation cell bounds as a tuple of 3 (min, max) pairs for x, y, z.
- Returns:
Union[pd.DataFrame, tuple[pd.DataFrame, pd.DataFrame], tuple[pd.DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]], tuple[pd.DataFrame, pd.DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]]] – The atom DataFrame is always returned. Depending on the flags:
If return_bond_info is True, the bond DataFrame is included.
If return_cell_bounds is True, the simulation box bounds are included.
If both flags are True, all three values are returned as a tuple: (atom DataFrame, bond DataFrame, cell bounds).
- Raises:
AssertionError – If required components (like conformers) are missing when make_molecule_whole is True.
Notes
This function assumes the input file is in LAMMPS data format with sections for atoms, bonds, and box bounds. If make_molecule_whole is enabled, bond and box information are automatically parsed regardless of the other flags.
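Because the shape of the return value depends on two flags, callers need to unpack it accordingly. The stand-in below mimics only the documented return convention so that the unpacking patterns can be shown without a data file; `pack_result` is a hypothetical function, and the strings stand in for the real DataFrames.

```python
def pack_result(df_atoms, df_bonds, cell_bounds,
                return_bond_info=False, return_cell_bounds=False):
    """Stand-in that mirrors the documented return convention of load_data."""
    result = [df_atoms]
    if return_bond_info:
        result.append(df_bonds)
    if return_cell_bounds:
        result.append(cell_bounds)
    # A single value is returned bare; several values are returned as a tuple.
    return result[0] if len(result) == 1 else tuple(result)


# Placeholders for the atom DataFrame, bond DataFrame, and cell bounds.
atoms, bonds, cell = "atoms", "bonds", ((0.0, 1.0),) * 3

only_atoms = pack_result(atoms, bonds, cell)
atoms_and_cell = pack_result(atoms, bonds, cell, return_cell_bounds=True)
all_three = pack_result(atoms, bonds, cell,
                        return_bond_info=True, return_cell_bounds=True)
```

With both flags set, a call such as `df_atoms, df_bonds, cell_bounds = load_data(path, return_bond_info=True, return_cell_bounds=True)` would follow the same three-way unpacking.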
- lammps_utils.io.load_dump(filepath_dump: PathLike | str, buffer_size: int = 10485760, return_cell_bounds: bool = False, n_jobs: int | None = None) tuple[tuple[int, DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]], ...] | tuple[tuple[int, DataFrame], ...]
Load and parse a LAMMPS dump file into structured timestep data.
- Parameters:
filepath_dump (Union[os.PathLike, str]) – Path to the LAMMPS dump file to be loaded.
buffer_size (int, optional) – Size of the buffer to use when scanning the file for timestep offsets (in bytes). Larger values may improve performance on large files. Default is 10 MB.
return_cell_bounds (bool, optional) – Whether to extract and return cell bounds for each timestep. If True, the output will include cell bounds in addition to timestep and atomic data. Default is False.
n_jobs (Optional[int], optional) – Number of parallel jobs to run. If None, defaults to single-threaded operation.
- Returns:
Union[tuple[tuple[int, pd.DataFrame, tuple[tuple[float, float], tuple[float, float], tuple[float, float]]], …], tuple[tuple[int, pd.DataFrame], …]] – A tuple of timestep data. Each element is either:
(timestep, DataFrame) if return_cell_bounds is False, or
(timestep, DataFrame, cell_bounds) if return_cell_bounds is True.
timestep is an integer, DataFrame contains atomic data for that step, and cell_bounds is a 3-tuple of (min, max) pairs for x, y, z.
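To make the per-timestep structure concrete, the sketch below parses a small text dump into one record per frame. It is a simplified, single-threaded illustration and not the library's parser: the function name `parse_dump` is hypothetical, atom rows are kept as lists of strings rather than a DataFrame, and an orthogonal box in the standard `ITEM:` layout is assumed.

```python
import io


def parse_dump(buffer):
    """Hypothetical helper: return (timestep, columns, rows, cell_bounds) per frame."""
    frames = []
    lines = iter(buffer)
    for line in lines:
        if not line.startswith("ITEM: TIMESTEP"):
            continue
        timestep = int(next(lines))
        next(lines)  # "ITEM: NUMBER OF ATOMS"
        n_atoms = int(next(lines))
        next(lines)  # "ITEM: BOX BOUNDS ..."
        cell_bounds = tuple(
            tuple(float(value) for value in next(lines).split()[:2])
            for _ in range(3)
        )
        columns = next(lines).split()[2:]  # drop "ITEM:" and "ATOMS"
        rows = [next(lines).split() for _ in range(n_atoms)]
        frames.append((timestep, columns, rows, cell_bounds))
    return tuple(frames)


dump = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
0.0 10.0
0.0 10.0
0.0 10.0
ITEM: ATOMS id type x y z
1 1 1.0 2.0 3.0
2 1 4.0 5.0 6.0
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
0.0 10.0
0.0 10.0
0.0 10.0
ITEM: ATOMS id type x y z
1 1 1.5 2.0 3.0
2 1 4.5 5.0 6.0
"""
frames = parse_dump(io.StringIO(dump))
```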
- lammps_utils.io.unwrap_molecule_df_under_pbc(df_atoms: DataFrame, df_bonds: DataFrame, cell_bounds: tuple[tuple[float, float], tuple[float, float], tuple[float, float]]) DataFrame
Adjust atomic coordinates to make the molecule whole under periodic boundary conditions (PBC). This function shifts atoms so that bonded atoms appear spatially close, avoiding discontinuities across cell edges.
- Parameters:
df_atoms (pd.DataFrame) – DataFrame containing atomic coordinates. Must include columns “x”, “y”, and “z”.
df_bonds (pd.DataFrame) – DataFrame defining atomic bonds, with columns “atom1” and “atom2” containing atom indices.
cell_bounds (tuple of tuple of float) – Bounds of the periodic cell along each axis. Format: ((xmin, xmax), (ymin, ymax), (zmin, zmax)).
- Returns:
A new DataFrame with adjusted atomic coordinates that are spatially continuous across the cell.
- Return type:
pd.DataFrame
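The unwrapping described above can be sketched as a breadth-first walk over the bond graph that places each atom at the minimum-image displacement from an already placed neighbor. The version below uses plain dictionaries instead of DataFrames so that it runs without pandas; the function name `unwrap_molecule` and its data layout are assumptions for illustration and do not reflect the library's implementation.

```python
from collections import defaultdict, deque


def unwrap_molecule(coords, bonds, cell_bounds):
    """Hypothetical helper: shift atoms so bonded neighbors are spatially close.

    coords: {atom_id: (x, y, z)}; bonds: iterable of (atom1, atom2);
    cell_bounds: ((xmin, xmax), (ymin, ymax), (zmin, zmax)).
    """
    lengths = [hi - lo for lo, hi in cell_bounds]
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    unwrapped = {}
    for start in coords:
        if start in unwrapped:
            continue  # already placed as part of an earlier molecule
        unwrapped[start] = tuple(coords[start])
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for other in neighbors[current]:
                if other in unwrapped:
                    continue
                placed = []
                for axis in range(3):
                    # Minimum-image displacement between the wrapped positions.
                    delta = coords[other][axis] - coords[current][axis]
                    delta -= lengths[axis] * round(delta / lengths[axis])
                    placed.append(unwrapped[current][axis] + delta)
                unwrapped[other] = tuple(placed)
                queue.append(other)
    return unwrapped


# A three-atom chain split across the x boundary of a 10 x 10 x 10 box.
coords = {1: (9.5, 5.0, 5.0), 2: (0.5, 5.0, 5.0), 3: (1.5, 5.0, 5.0)}
bonds = [(1, 2), (2, 3)]
cell = ((0.0, 10.0), (0.0, 10.0), (0.0, 10.0))
whole = unwrap_molecule(coords, bonds, cell)
```

In this example atoms 2 and 3 are moved just outside the box, next to atom 1, so the chain is contiguous. The approach assumes every bond is shorter than half the box length along each axis.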