Skip to content

Molecule

Molecule

Class to represent information about a molecule i.e. a set of atoms with 3D coordinates joined by covalent bonds

e.g. int, float, str etc. Will handle uncertainty values contained in parentheses.

Attributes:

Name Type Description
elements ndarray

list of element information for each atom in this molecule

positions ndarray

(N, 3) array of Cartesian coordinates for each atom in this molecule (Angstroms)

bonds dok_matrix

(N, N) adjacency matrix of bond lengths for connected atoms, 0 otherwise. If not provided this will be calculated.

labels ndarray

(N,) vector of string labels for each atom in this molecule If not provided this will assigned default labels i.e. numbered in order.

proerties

Additional keyword arguments will be stored in the properties member, and some may be utilized in methods, raising an exception if they are not set.

asym_symops property readonly

the symmetry operations which generate this molecule (default x,y,z if not set)

atomic_numbers: ndarray property readonly

Atomic numbers for each atom in this molecule

bbox_corners: Tuple property readonly

the lower, upper corners of a axis-aligned bounding box for this molecule

bbox_size: ndarray property readonly

the dimensions of the axis-aligned bounding box for this molecule

center_of_mass: ndarray property readonly

Mean cartesian position of atoms in this molecule, weighted by atomic mass

centroid: ndarray property readonly

Mean cartesian position of atoms in this molecule

distance_matrix: ndarray property readonly

The (dense) pairwise distance matrix for this molecule

molecular_formula: str property readonly

string of the molecular formula for this molecule

name property readonly

The name of this molecule, checks 'GENERIC_NAME' and 'name' keys in self.properties

partial_charges: ndarray property readonly

The partial charges associated with atoms in this molecule. If self._partial_charges is not set, the charges will be assigned based on EEM method.

unique_bonds: List property readonly

The unique bonds for this molecule. If bonds are not assigned, this will None

__init__(self, elements, positions, bonds=None, labels=None, **kwargs) special

Initialize a new molecule.

Parameters:

Name Type Description Default
elements List[Element]

N length list of elements associated with the sites

required
positions array_like

(N, 3) array of site positions in Cartesian coordinates

required
bonds dok_matrix

if bonds are already calculated provide them here

None
labels array_like

labels (array_like): N length array of string labels for each site

None
kwargs

Additional properties (will populate the properties member) to store in this molecule

{}
Source code in chmpy/core/molecule.py
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
def __init__(self, elements, positions, bonds=None, labels=None, **kwargs):
    """
    Initialize a new molecule.

    Arguments:
        elements (List[Element]): N length list of elements associated with the sites
        positions (array_like): (N, 3) array of site positions in Cartesian coordinates
        bonds (dok_matrix, optional): if bonds are already calculated provide them here
        labels (array_like, optional): labels (array_like): N length array of string labels for each site
        kwargs: Additional properties (will populate the properties member)
            to store in this molecule
    """
    self.positions = positions
    self.elements = elements
    self.properties = {}
    self.properties.update(kwargs)
    self.bonds = None

    self.charge = kwargs.get("charge", 0)
    self.multiplicity = kwargs.get("multiplicity", 1)

    if bonds is None:
        if kwargs.get("guess_bonds", False):
            self.guess_bonds()
    else:
        self.bonds = dok_matrix(bonds)

    if labels is None:
        self.assign_default_labels()
    else:
        self.labels = labels

assign_default_labels(self)

Assign the default labels to atom sites in this molecule (number them by element)

Source code in chmpy/core/molecule.py
162
163
164
165
166
167
168
169
def assign_default_labels(self):
    "Assign the default labels to atom sites in this molecule (number them by element)"
    counts = defaultdict(int)
    labels = []
    for el, _ in self:
        counts[el] += 1
        labels.append("{}{}".format(el.symbol, counts[el]))
    self.labels = np.asarray(labels)

atomic_shape_descriptors(self, l_max=5, radius=6.0, background=1e-05)

Calculate the shape descriptors[1,2] for all atoms in this isolated molecule. If you wish to use the crystal environment please see the corresponding method in :obj:chmpy.crystal.Crystal.

Parameters:

Name Type Description Default
l_max int

maximum level of angular momenta to include in the spherical harmonic transform of the shape function. (default=5)

5
radius float

Maximum distance in Angstroms between any atom in the molecule and the resulting neighbouring atoms (default=6.0)

6.0
background float

'background' density to ensure closed surfaces for isolated atoms (default=1e-5)

1e-05

Returns:

Type Description
ndarray

shape description vector

References

[1] PR Spackman et al. Sci. Rep. 6, 22204 (2016)
    https://dx.doi.org/10.1038/srep22204
[2] PR Spackman et al. Angew. Chem. 58 (47), 16780-16784 (2019)
    https://dx.doi.org/10.1002/anie.201906602
Source code in chmpy/core/molecule.py
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
def atomic_shape_descriptors(
    self, l_max=5, radius=6.0, background=1e-5
) -> np.ndarray:
    """
    Calculate the shape descriptors`[1,2]` for all
    atoms in this isolated molecule. If you wish to use
    the crystal environment please see the corresponding method
    in :obj:`chmpy.crystal.Crystal`.

    Args:
        l_max (int, optional): maximum level of angular momenta to include in the spherical harmonic
            transform of the shape function. (default=5)
        radius (float, optional): Maximum distance in Angstroms between any atom in the molecule
            and the resulting neighbouring atoms (default=6.0)
        background (float, optional): 'background' density to ensure closed surfaces for isolated atoms
            (default=1e-5)

    Returns:
        shape description vector

    References
    ```
    [1] PR Spackman et al. Sci. Rep. 6, 22204 (2016)
        https://dx.doi.org/10.1038/srep22204
    [2] PR Spackman et al. Angew. Chem. 58 (47), 16780-16784 (2019)
        https://dx.doi.org/10.1002/anie.201906602
    ```
    """
    descriptors = []
    from chmpy.shape import SHT, stockholder_weight_descriptor

    sph = SHT(l_max=l_max)
    elements = self.atomic_numbers
    positions = self.positions
    dists = self.distance_matrix

    for n in range(elements.shape[0]):
        els = elements[n : n + 1]
        pos = positions[n : n + 1, :]
        idxs = np.where((dists[n, :] < radius) & (dists[n, :] > 1e-3))[0]
        neighbour_els = elements[idxs]
        neighbour_pos = positions[idxs]
        ubound = Element[n].vdw_radius * 3
        descriptors.append(
            stockholder_weight_descriptor(
                sph,
                els,
                pos,
                neighbour_els,
                neighbour_pos,
                bounds=(0.2, ubound),
                background=background,
            )
        )
    return np.asarray(descriptors)

atomic_stockholder_weight_isosurfaces(self, **kwargs)

Calculate the stockholder weight isosurfaces for the atoms in this molecule, with the provided background density.

Parameters:

Name Type Description Default
kwargs dict

keyword arguments to be passed to isosurface generation code

Options are:

background: float, optional
    'background' density to ensure closed surfaces for isolated atoms
    (default=1e-5)
isovalue: float, optional
    level set value for the isosurface (default=0.5). Must be between
    0 and 1, but values other than 0.5 probably won't make sense anyway.
separation: float, optional
    separation between density grid used in the surface calculation
    (default 0.2) in Angstroms.
radius: float, optional
    maximum distance for contributing neighbours for the stockholder
    weight calculation
color: str, optional
    surface property to use for vertex coloring, one of ('d_norm_i',
    'd_i', 'd_norm_e', 'd_e', 'd_norm')
colormap: str, optional
    matplotlib colormap to use for surface coloring (default 'viridis_r')
midpoint: float, optional, default 0.0 if using d_norm
    use the midpoint norm (as is used in CrystalExplorer)
{}

Returns:

Type Description
List[trimesh.Trimesh]

A list of meshes representing the stockholder weight isosurfaces

Source code in chmpy/core/molecule.py
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
def atomic_stockholder_weight_isosurfaces(self, **kwargs):
    """
    Calculate the stockholder weight isosurfaces for the atoms
    in this molecule, with the provided background density.

    Args:
        kwargs (dict): keyword arguments to be passed to isosurface
            generation code

            Options are:
            ```
            background: float, optional
                'background' density to ensure closed surfaces for isolated atoms
                (default=1e-5)
            isovalue: float, optional
                level set value for the isosurface (default=0.5). Must be between
                0 and 1, but values other than 0.5 probably won't make sense anyway.
            separation: float, optional
                separation between density grid used in the surface calculation
                (default 0.2) in Angstroms.
            radius: float, optional
                maximum distance for contributing neighbours for the stockholder
                weight calculation
            color: str, optional
                surface property to use for vertex coloring, one of ('d_norm_i',
                'd_i', 'd_norm_e', 'd_e', 'd_norm')
            colormap: str, optional
                matplotlib colormap to use for surface coloring (default 'viridis_r')
            midpoint: float, optional, default 0.0 if using d_norm
                use the midpoint norm (as is used in CrystalExplorer)
            ```

    Returns:
        List[trimesh.Trimesh]: A list of meshes representing the stockholder weight isosurfaces
    """

    from chmpy import StockholderWeight
    from chmpy.surface import stockholder_weight_isosurface
    from matplotlib.cm import get_cmap
    import trimesh
    from chmpy.util.color import DEFAULT_COLORMAPS

    sep = kwargs.get("separation", kwargs.get("resolution", 0.2))
    radius = kwargs.get("radius", 12.0)
    background = kwargs.get("background", 1e-5)
    vertex_color = kwargs.get("color", "d_norm_i")
    isovalue = kwargs.get("isovalue", 0.5)
    midpoint = kwargs.get("midpoint", 0.0 if vertex_color == "d_norm" else None)
    meshes = []
    colormap = get_cmap(
        kwargs.get("colormap", DEFAULT_COLORMAPS.get(vertex_color, "viridis_r"))
    )
    isos = []
    elements = self.atomic_numbers
    positions = self.positions
    dists = self.distance_matrix

    for n in range(elements.shape[0]):
        els = elements[n : n + 1]
        pos = positions[n : n + 1, :]
        idxs = np.where((dists[n, :] < radius) & (dists[n, :] > 1e-3))[0]
        neighbour_els = elements[idxs]
        neighbour_pos = positions[idxs]

        s = StockholderWeight.from_arrays(
            els, pos, neighbour_els, neighbour_pos, background=background
        )
        iso = stockholder_weight_isosurface(s, isovalue=isovalue, sep=sep)
        isos.append(iso)
    for iso in isos:
        prop = iso.vertex_prop[vertex_color]
        norm = None
        if midpoint is not None:
            from matplotlib.colors import DivergingNorm

            norm = DivergingNorm(vmin=prop.min(), vcenter=midpoint, vmax=prop.max())
            prop = norm(prop)
        color = colormap(prop)
        mesh = trimesh.Trimesh(
            vertices=iso.vertices,
            faces=iso.faces,
            normals=iso.normals,
            vertex_colors=color,
        )
        meshes.append(mesh)
    return meshes

bond_graph(self)

Calculate the graph_tool.Graph object corresponding to this molecule. Requires the graph_tool library to be installed

Returns:

Type Description
graph_tool.Graph

the (undirected) graph of this molecule

Source code in chmpy/core/molecule.py
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
def bond_graph(self):
    """
    Calculate the `graph_tool.Graph` object corresponding
    to this molecule. Requires the graph_tool library to be
    installed

    Returns:
        graph_tool.Graph: the (undirected) graph of this molecule
    """

    if hasattr(self, "_bond_graph"):
        return getattr(self, "_bond_graph")
    try:
        import graph_tool as gt
    except ImportError:
        raise RuntimeError(
            "Please install the graph_tool library for graph operations"
        )
    if self.bonds is None:
        self.guess_bonds()
    g = gt.Graph(directed=False)
    v_el = g.new_vertex_property("int")
    g.add_edge_list(self.bonds.keys())
    e_w = g.new_edge_property("float")
    v_el.a[:] = self.atomic_numbers
    g.vertex_properties["element"] = v_el
    e_w.a[:] = list(self.bonds.values())
    g.edge_properties["bond_distance"] = e_w
    self._bond_graph = g
    return g

connected_fragments(self)

Separate this molecule into fragments/molecules based on covalent bonding criteria.

Returns:

Type Description
List

a list of connected Molecule objects

Source code in chmpy/core/molecule.py
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
def connected_fragments(self) -> List:
    """
    Separate this molecule into fragments/molecules based
    on covalent bonding criteria.

    Returns:
        a list of connected `Molecule` objects
    """
    from chmpy.util.num import cartesian_product
    from scipy.sparse.csgraph import connected_components

    if self.bonds is None:
        self.guess_bonds()

    nfrag, labels = connected_components(self.bonds)
    molecules = []
    for frag in range(nfrag):
        atoms = np.where(labels == frag)[0]
        na = len(atoms)
        sqidx = cartesian_product(atoms, atoms)
        molecules.append(
            Molecule(
                [self.elements[i] for i in atoms],
                self.positions[atoms],
                labels=self.labels[atoms],
                bonds=self.bonds[sqidx[:, 0], sqidx[:, 1]].reshape(na, na),
            )
        )
    return molecules

distance_to(self, other, method='centroid')

Calculate the euclidean distance between this molecule and another. May use the distance between centres-of-mass, centroids, or nearest atoms.

Parameters other (Molecule): the molecule to calculate distance to method (str, optional): one of 'centroid', 'center_of_mass', 'nearest_atom'

Source code in chmpy/core/molecule.py
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
def distance_to(self, other, method="centroid"):
    """
    Calculate the euclidean distance between this
    molecule and another. May use the distance between
    centres-of-mass, centroids, or nearest atoms.

    Parameters
        other (Molecule): the molecule to calculate distance to
        method (str, optional): one of 'centroid', 'center_of_mass', 'nearest_atom'
    """
    method = method.lower()
    if method == "centroid":
        return np.linalg.norm(self.centroid - other.centroid)
    elif method == "center_of_mass":
        return np.linalg.norm(self.center_of_mass - other.center_of_mass)
    elif method == "nearest_atom":
        return np.min(cdist(self.positions, other.positions))
    else:
        raise ValueError(f"Unknown method={method}")

electrostatic_potential(self, positions)

Calculate the electrostatic potential based on the partial charges associated with this molecule. The potential will be in atomic units.

Parameters:

Name Type Description Default
positions np.ndarray

(N, 3) array of locations where the molecular ESP should be calculated. Assumed to be in Angstroms.

required

Returns:

Type Description
ndarray

np.ndarray: (N,) array of electrostatic potential values (atomic units) at the given positions.

Source code in chmpy/core/molecule.py
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
def electrostatic_potential(self, positions) -> np.ndarray:
    """
    Calculate the electrostatic potential based on the partial
    charges associated with this molecule. The potential will be
    in atomic units.

    Args:
        positions (np.ndarray): (N, 3) array of locations where the molecular ESP should
            be calculated. Assumed to be in Angstroms.

    Returns:
        np.ndarray: (N,) array of electrostatic potential values (atomic units) at the given
        positions.
    """
    from chmpy.util.unit import BOHR_TO_ANGSTROM

    BOHR_PER_ANGSTROM = 1 / BOHR_TO_ANGSTROM

    v_pot = np.zeros(positions.shape[0])
    for charge, position in zip(self.partial_charges, self.positions):
        if charge == 0.0:
            continue
        r = np.linalg.norm(positions - position[np.newaxis, :], axis=1)
        v_pot += charge / (r * BOHR_PER_ANGSTROM)
    return v_pot

from_arrays(elements, positions, **kwargs) classmethod

Construct a molecule from the provided arrays. kwargs will be passed through to the Molecule constructor.

Parameters:

Name Type Description Default
elements np.ndarray

(N,) array of atomic numbers for each atom in this molecule

required
positions np.ndarray

(N, 3) array of Cartesian coordinates for each atom in this molecule (Angstroms)

required

Returns:

Type Description
Molecule

a new molecule object

Source code in chmpy/core/molecule.py
844
845
846
847
848
849
850
851
852
853
854
855
856
857
@classmethod
def from_arrays(cls, elements, positions, **kwargs):
    """
    Construct a molecule from the provided arrays. kwargs
    will be passed through to the Molecule constructor.

    Args:
        elements (np.ndarray): (N,) array of atomic numbers for each atom in this molecule
        positions (np.ndarray): (N, 3) array of Cartesian coordinates for each atom in this molecule (Angstroms)

    Returns:
        Molecule: a new molecule object
    """
    return cls([Element[x] for x in elements], np.array(positions), **kwargs)

from_sdf_dict(sdf_dict, **kwargs) classmethod

Construct a molecule from the provided dictionary of sdf terms. Not intended for typical use cases, but as a helper method for Molecule.from_sdf_file

Parameters:

Name Type Description Default
sdf_dict dict

a dictionary containing the 'atoms', 'x', 'y', 'z', 'symbol', 'bonds' members.

required

Returns:

Type Description
Molecule

Molecule: a new Molecule from the provided data

Source code in chmpy/core/molecule.py
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
@classmethod
def from_sdf_dict(cls, sdf_dict, **kwargs) -> "Molecule":
    """
    Construct a molecule from the provided dictionary of
    sdf terms. Not intended for typical use cases, but as a
    helper method for `Molecule.from_sdf_file`

    Args:
        sdf_dict (dict): a dictionary containing the 'atoms', 'x', 'y', 'z',
            'symbol', 'bonds' members.

    Returns:
        Molecule: a new `Molecule` from the provided data
    """
    atoms = sdf_dict["atoms"]
    positions = np.c_[atoms["x"], atoms["y"], atoms["z"]]
    elements = [Element[x] for x in atoms["symbol"]]
    # TODO use bonds from SDF
    # bonds = sdf_dict["bonds"]
    m = cls(elements, positions, **sdf_dict["data"])
    if "sdf" in sdf_dict:
        m.properties["sdf"] = sdf_dict["sdf"]
    return m

from_sdf_file(filename, **kwargs) classmethod

Construct a molecule from the provided SDF file. Because an SDF file can have multiple molecules, an optional keyword argument 'progress' may be provided to track the loading of many molecules.

Parameters:

Name Type Description Default
filename str

the path of the SDF file to read.

required

Returns:

Type Description
Molecule

a new Molecule or list of :obj:Molecule objects from the provided SDF file.

Source code in chmpy/core/molecule.py
 992
 993
 994
 995
 996
 997
 998
 999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
@classmethod
def from_sdf_file(cls, filename, **kwargs):
    """
    Construct a molecule from the provided SDF file.
    Because an SDF file can have multiple molecules,
    an optional keyword argument 'progress' may be provided
    to track the loading of many molecules.

    Args:
        filename (str): the path of the SDF file to read.

    Returns:
        Molecule: a new `Molecule` or list of :obj:`Molecule` objects
        from the provided SDF file.
    """

    from chmpy.fmt.sdf import parse_sdf_file

    sdf_data = parse_sdf_file(filename, **kwargs)
    progress = kwargs.get("progress", False)
    update = lambda x: None

    if progress:
        from tqdm import tqdm

        pbar = tqdm(
            desc="Creating molecule objects", total=len(sdf_data), leave=False
        )
        update = pbar.update

    molecules = []
    for d in sdf_data:
        molecules.append(cls.from_sdf_dict(d, **kwargs))
        update(1)

    if progress:
        pbar.close()

    if len(molecules) == 1:
        return molecules[0]
    return molecules

from_turbomole_file(filename, **kwargs) classmethod

Construct a molecule from the provided turbomole file. kwargs will be passed through to the Molecule constructor.

Parameters:

Name Type Description Default
filename str

the path to the .xyz file

required
kwargs

keyword arguments passed ot the Molecule constructor

{}

Returns:

Type Description
Molecule

A new Molecule object

Source code in chmpy/core/molecule.py
317
318
319
320
321
322
323
324
325
326
327
328
329
330
@classmethod
def from_turbomole_file(cls, filename, **kwargs):
    """
    Construct a molecule from the provided turbomole file. kwargs
    will be passed through to the Molecule constructor.

    Args:
        filename (str): the path to the .xyz file
        kwargs: keyword arguments passed ot the `Molecule` constructor

    Returns:
        Molecule: A new `Molecule` object
    """
    return cls.from_turbomole_string(Path(filename).read_text(), **kwargs)

from_turbomole_string(contents, **kwargs) classmethod

Construct a molecule from the provided turbomole file contents. kwargs will be passed through to the Molecule constructor.

Parameters:

Name Type Description Default
contents str

the contents of the .xyz file to read

required
kwargs

keyword arguments passed ot the Molecule constructor

{}

Returns:

Type Description
Molecule

A new Molecule object

Source code in chmpy/core/molecule.py
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
@classmethod
def from_turbomole_string(cls, contents, **kwargs):
    """
    Construct a molecule from the provided turbomole file contents. kwargs
    will be passed through to the Molecule constructor.

    Args:
        contents (str): the contents of the .xyz file to read
        kwargs: keyword arguments passed ot the `Molecule` constructor

    Returns:
        Molecule: A new `Molecule` object
    """
    from chmpy.fmt.tmol import parse_tmol_string

    elements, positions = parse_tmol_string(contents)
    return cls(elements, np.asarray(positions), **kwargs)

from_xyz_file(filename, **kwargs) classmethod

Construct a molecule from the provided xmol .xyz file. kwargs will be passed through to the Molecule constructor.

Parameters:

Name Type Description Default
filename str

the path to the .xyz file

required
kwargs

keyword arguments passed ot the Molecule constructor

{}

Returns:

Type Description
Molecule

A new Molecule object

Source code in chmpy/core/molecule.py
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
@classmethod
def from_xyz_file(cls, filename, **kwargs):
    """
    Construct a molecule from the provided xmol .xyz file. kwargs
    will be passed through to the Molecule constructor.

    Args:
        filename (str): the path to the .xyz file
        kwargs: keyword arguments passed ot the `Molecule` constructor

    Returns:
        Molecule: A new `Molecule` object
    """

    return cls.from_xyz_string(Path(filename).read_text(), **kwargs)

from_xyz_string(contents, **kwargs) classmethod

Construct a molecule from the provided xmol .xyz file. kwargs will be passed through to the Molecule constructor.

Parameters:

Name Type Description Default
contents str

the contents of the .xyz file to read

required
kwargs

keyword arguments passed ot the Molecule constructor

{}

Returns:

Type Description
Molecule

A new Molecule object

Source code in chmpy/core/molecule.py
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
@classmethod
def from_xyz_string(cls, contents, **kwargs):
    """
    Construct a molecule from the provided xmol .xyz file. kwargs
    will be passed through to the Molecule constructor.

    Args:
        contents (str): the contents of the .xyz file to read
        kwargs: keyword arguments passed ot the `Molecule` constructor

    Returns:
        Molecule: A new `Molecule` object
    """
    from chmpy.fmt.xyz_file import parse_xyz_string

    elements, positions = parse_xyz_string(contents)
    return cls(elements, np.asarray(positions), **kwargs)

functional_groups(self, kind=None)

Find all indices of atom groups which constitute subgraph isomorphisms with stored functional group data

Parameters:

Name Type Description Default
kind str

Find only matches of the given kind

None

Returns:

Type Description
Union[dict, List]

Either a dict with keys as functional group type and values as list of lists of indices, or a list of lists of indices if kind is specified.

Source code in chmpy/core/molecule.py
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
def functional_groups(self, kind=None) -> Union[dict, List]:
    """
    Find all indices of atom groups which constitute
    subgraph isomorphisms with stored functional group data

    Args:
        kind (str, optional):Find only matches of the given kind

    Returns:
        Either a dict with keys as functional group type and values as list of
        lists of indices, or a list of lists of indices if kind is specified.
    """
    global _FUNCTIONAL_GROUP_SUBGRAPHS
    try:
        import graph_tool.topology as top  # noqa: F401
    except ImportError:
        raise RuntimeError(
            "Please install the graph_tool library for graph operations"
        )
    if not _FUNCTIONAL_GROUP_SUBGRAPHS:
        from chmpy.subgraphs import load_data

        _FUNCTIONAL_GROUP_SUBGRAPHS = load_data()

    if kind is not None:
        sub = _FUNCTIONAL_GROUP_SUBGRAPHS[kind]
        matches = self.matching_subgraph(sub)
        if kind == "ring":
            matches = list(set(tuple(sorted(x)) for x in matches))
        return matches

    matches = {}
    for n, sub in _FUNCTIONAL_GROUP_SUBGRAPHS.items():
        m = self.matching_subgraph(sub)
        if n == "ring":
            m = list(set(tuple(sorted(x)) for x in m))
        matches[n] = m
    return matches

guess_bonds(self, tolerance=0.4)

Use geometric distances and covalent radii to determine bonding information for this molecule.

Bonding is determined by the distance between sites being closer than the sum of covalent radii + tolerance

Will set the bonds member.

If the graph_tool library is available, this will call the bond_graph method to populate the connectivity graph.

Parameters:

Name Type Description Default
tolerance float

Additional tolerance for attributing two sites as 'bonded'. The default is 0.4 angstroms, which is recommended by the CCDC

0.4
Source code in chmpy/core/molecule.py
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
def guess_bonds(self, tolerance=0.40):
    """
    Use geometric distances and covalent radii
    to determine bonding information for this molecule.

    Bonding is determined by the distance between
    sites being closer than the sum of covalent radii + `tolerance`

    Will set the `bonds` member.

    If the `graph_tool` library is available, this will call the
    `bond_graph` method to populate the connectivity graph.


    Args:
        tolerance (float, optional): Additional tolerance for attributing two sites as 'bonded'.
            The default is 0.4 angstroms, which is recommended by the CCDC
    """
    tree = KDTree(self.positions)
    covalent_radii = np.array([x.cov for x in self.elements])
    max_cov = np.max(covalent_radii)
    thresholds = (
        covalent_radii[:, np.newaxis] + covalent_radii[np.newaxis, :] + tolerance
    )
    max_distance = max_cov * 2 + tolerance
    dist = tree.sparse_distance_matrix(tree, max_distance=max_distance).toarray()
    mask = (dist > 0) & (dist < thresholds)
    self.bonds = np.zeros(dist.shape)
    self.bonds[mask] = dist[mask]
    self.bonds = dok_matrix(self.bonds)
    try:
        self.bond_graph()
    except Exception:
        pass

load(filename, **kwargs) classmethod

Construct a molecule from the provided file.

Parameters:

Name Type Description Default
filename str

the path to the (xyz or SDF) file

required
kwargs

keyword arguments passed ot the Molecule constructor

{}

Returns:

Type Description
Molecule

A new Molecule object

Source code in chmpy/core/molecule.py
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
@classmethod
def load(cls, filename, **kwargs):
    """
    Construct a molecule from the provided file. 

    Args:
        filename (str): the path to the (xyz or SDF) file
        kwargs: keyword arguments passed ot the `Molecule` constructor

    Returns:
        Molecule: A new `Molecule` object
    """
    fpath = Path(filename)
    n = fpath.name
    fname_map = cls._fname_load_map()
    if n in fname_map:
        return fname_map[n](filename)
    extension_map = cls._ext_load_map()
    extension = kwargs.pop("fmt", fpath.suffix.lower())
    if not extension.startswith("."):
        extension = "." + extension
    return extension_map[extension](filename, **kwargs)

mask(self, mask, **kwargs)

Convenience method to construct a new molecule from this molecule with the given mask array.

Parameters:

Name Type Description Default
mask np.ndarray

a numpy mask array used to filter which atoms to keep in the new molecule.

required

Returns:

Type Description
Molecule

a new Molecule, with atoms filtered by the mask.

Source code in chmpy/core/molecule.py
859
860
861
862
863
864
865
866
867
868
869
870
871
872
def mask(self, mask, **kwargs):
    """
    Convenience method to construct a new molecule from this molecule with the given mask
    array.

    Args:
        mask (np.ndarray): a numpy mask array used to filter which atoms to keep in the new molecule.

    Returns:
        Molecule: a new `Molecule`, with atoms filtered by the mask.
    """
    return Molecule.from_arrays(
        self.atomic_numbers[mask], self.positions[mask], **kwargs
    )

matching_fragments(self, fragment, method='connectivity')

Find the indices of a matching fragment to the given molecular fragment

Parameters:

Name Type Description Default
fragment Molecule

Molecule object containing the desired fragment

required
method str

the method for matching

'connectivity'

Returns:

Type Description
List[dict]

List of maps between matching indices in this molecule and those in the fragment

Source code in chmpy/core/molecule.py
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
def matching_fragments(self, fragment, method="connectivity"):
    """
    Find the indices of a matching fragment to the given
    molecular fragment

    Args:
        fragment (Molecule): Molecule object containing the desired fragment
        method (str, optional): the method for matching

    Returns:
        List[dict]: List of maps between matching indices in this molecule and those
            in the fragment
    """
    try:
        import graph_tool.topology as top
    except ImportError:
        raise RuntimeError(
            "Please install the graph_tool library for graph operations"
        )

    sub = fragment.bond_graph()
    g = self.bond_graph()
    matches = top.subgraph_isomorphism(
        sub,
        g,
        vertex_label=(
            sub.vertex_properties["element"],
            g.vertex_properties["element"],
        ),
    )
    return [list(x.a) for x in matches]

matching_subgraph(self, sub)

Find all indices of atoms which match the given graph.

Parameters:

Name Type Description Default
sub graph_tool.Graph

the subgraph

required

Returns:

Type Description
List

list of lists of atomic indices matching the atoms in sub to those in this molecule

Source code in chmpy/core/molecule.py
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
def matching_subgraph(self, sub):
    """Find all indices of atoms which match the given graph.

    Args:
        sub (graph_tool.Graph): the subgraph

    Returns:
        List: list of lists of atomic indices matching the atoms in sub
            to those in this molecule
    """

    try:
        import graph_tool.topology as top
    except ImportError:
        raise RuntimeError(
            "Please install the graph_tool library for graph operations"
        )

    g = self.bond_graph()
    matches = top.subgraph_isomorphism(
        sub,
        g,
        vertex_label=(
            sub.vertex_properties["element"],
            g.vertex_properties["element"],
        ),
    )
    return [tuple(x.a) for x in matches]

promolecule_density_isosurface(self, **kwargs)

Calculate promolecule electron density isosurface for this molecule.

Parameters:

Name Type Description Default
kwargs

keyword arguments passed to promolecule_density_isosurface Options are:

isovalue: float, optional
    level set value for the isosurface (default=0.002) in au.
separation: float, optional
    separation between density grid used in the surface calculation
    (default 0.2) in Angstroms.
color: str, optional
    surface property to use for vertex coloring, one of ('d_norm_i',
    'd_i', 'd_norm_e', 'd_e')
colormap: str, optional
    matplotlib colormap to use for surface coloring (default 'viridis_r')
midpoint: float, optional, default 0.0 if using d_norm
    use the midpoint norm (as is used in CrystalExplorer)
{}

Returns:

Type Description
trimesh.Trimesh

A mesh representing the promolecule density isosurface

Source code in chmpy/core/molecule.py
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
def promolecule_density_isosurface(self, **kwargs):
    """
    Calculate promolecule electron density isosurface
    for this molecule.
    Args:
        kwargs: keyword arguments passed to `promolecule_density_isosurface`
            Options are:
            ```
            isovalue: float, optional
                level set value for the isosurface (default=0.002) in au.
            separation: float, optional
                separation between density grid used in the surface calculation
                (default 0.2) in Angstroms.
            color: str, optional
                surface property to use for vertex coloring, one of ('d_norm_i',
                'd_i', 'd_norm_e', 'd_e')
            colormap: str, optional
                matplotlib colormap to use for surface coloring (default 'viridis_r')
            midpoint: float, optional, default 0.0 if using d_norm
                use the midpoint norm (as is used in CrystalExplorer)
            ```

    Returns:
        trimesh.Trimesh: A mesh representing the promolecule density isosurface
    """
    from chmpy import PromoleculeDensity
    from chmpy.surface import promolecule_density_isosurface
    from chmpy.util.color import property_to_color
    import trimesh

    isovalue = kwargs.get("isovalue", 0.002)
    sep = kwargs.get("separation", kwargs.get("resolution", 0.2))
    vertex_color = kwargs.get("color", "d_norm_i")
    extra_props = {}
    pro = PromoleculeDensity((self.atomic_numbers, self.positions))
    if vertex_color == "esp":
        extra_props["esp"] = self.electrostatic_potential
    iso = promolecule_density_isosurface(
        pro, sep=sep, isovalue=isovalue, extra_props=extra_props
    )
    prop = iso.vertex_prop[vertex_color]
    color = property_to_color(prop, cmap=kwargs.get("cmap", vertex_color))
    mesh = trimesh.Trimesh(
        vertices=iso.vertices,
        faces=iso.faces,
        normals=iso.normals,
        vertex_colors=color,
    )
    return mesh

rotate(self, rotation, origin=(0, 0, 0))

Convenience method to rotate this molecule by a given rotation matrix

Parameters:

Name Type Description Default
rotation np.ndarray

A (3, 3) rotation matrix

required
Source code in chmpy/core/molecule.py
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
def rotate(self, rotation, origin=(0, 0, 0)):
    """
    Convenience method to rotate this molecule by a given 
    rotation matrix

    Args:
        rotation (np.ndarray): A (3, 3) rotation matrix
    """

    if np.allclose(origin, (0, 0, 0)):
        np.dot(self.positions, rotation, out=self.positions)
    else:
        self.positions -= origin
        np.dot(self.positions, rotation, out=self.positions)
        self.positions += origin

rotated(self, rotation, origin=(0, 0, 0))

Convenience method to construct a new copy of thismolecule rotated by a given rotation matrix

Parameters:

Name Type Description Default
rotation np.ndarray

A (3, 3) rotation matrix

required

Returns:

Type Description
Molecule

a new copy of this Molecule rotated by the given rotation matrix.

Source code in chmpy/core/molecule.py
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
def rotated(self, rotation, origin=(0, 0, 0)):
    """
    Convenience method to construct a new copy of thismolecule
    rotated by a given rotation matrix

    Args:
        rotation (np.ndarray): A (3, 3) rotation matrix

    Returns:
        Molecule: a new copy of this `Molecule` rotated by the given rotation matrix.
    """
    from copy import deepcopy

    result = deepcopy(self)
    result.rotate(rotation, origin=origin)
    return result

save(self, filename, **kwargs)

Save this molecule to the destination file in xyz format, optionally discarding the typical header.

Parameters:

Name Type Description Default
filename str

path to the destination file

required
kwargs

keyword arguments passed to the relevant method

{}
Source code in chmpy/core/molecule.py
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
def save(self, filename, **kwargs):
    """
    Save this molecule to the destination file in xyz format,
    optionally discarding the typical header.

    Args:
        filename (str): path to the destination file
        kwargs: keyword arguments passed to the relevant method
    """
    fpath = Path(filename)
    n = fpath.name
    fname_map = self._fname_save_map()
    if n in fname_map:
        return fname_map[n](filename, **kwargs)
    extension_map = self._ext_save_map()
    extension = kwargs.pop("fmt", fpath.suffix.lower())
    if not extension.startswith("."):
        extension = "." + extension
    return extension_map[extension](filename, **kwargs)

shape_descriptors(self, l_max=5, **kwargs)

Calculate the molecular shape descriptors[1,2] for this molecule using the promolecule density.

Parameters:

Name Type Description Default
kwargs

keyword arguments passed to promolecule_density_descriptor Options are:

l_max (int, optional): maximum level of angular momenta to include in the spherical harmonic
    transform of the molecular shape function.
with_property (str, optional): describe the combination of the radial shape function and a surface
    property in the real, imaginary channels of a complex function
isovalue (float, optional): the isovalue for the promolecule density surface (default 0.0002 au)
{}

Returns:

Type Description
ndarray

shape description vector

References:

[1] PR Spackman et al. Sci. Rep. 6, 22204 (2016)
    https://dx.doi.org/10.1038/srep22204
[2] PR Spackman et al. Angew. Chem. 58 (47), 16780-16784 (2019)
    https://dx.doi.org/10.1002/anie.201906602
Source code in chmpy/core/molecule.py
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
def shape_descriptors(self, l_max=5, **kwargs) -> np.ndarray:
    """
    Calculate the molecular shape descriptors`[1,2]` for 
    this molecule using the promolecule density.

    Args:
        kwargs: keyword arguments passed to `promolecule_density_descriptor`
            Options are:
            ```
            l_max (int, optional): maximum level of angular momenta to include in the spherical harmonic
                transform of the molecular shape function.
            with_property (str, optional): describe the combination of the radial shape function and a surface
                property in the real, imaginary channels of a complex function
            isovalue (float, optional): the isovalue for the promolecule density surface (default 0.0002 au)
            ```

    Returns:
        shape description vector

    References:
    ```
    [1] PR Spackman et al. Sci. Rep. 6, 22204 (2016)
        https://dx.doi.org/10.1038/srep22204
    [2] PR Spackman et al. Angew. Chem. 58 (47), 16780-16784 (2019)
        https://dx.doi.org/10.1002/anie.201906602
    ```
    """
    from chmpy.shape import SHT, promolecule_density_descriptor

    sph = SHT(l_max=l_max)
    return promolecule_density_descriptor(
        sph, self.atomic_numbers, self.positions, **kwargs
    )

to_mesh(self, **kwargs)

Convert this molecule to a mesh of spheres and cylinders, colored by element. The origins of the spheres will be at the corresponding atomic position, and all units will be Angstroms.

Returns:

Type Description
dict

a dictionary of trimesh.Trimesh objects representing this molecule.

Source code in chmpy/core/molecule.py
818
819
820
821
822
823
824
825
826
827
828
829
830
def to_mesh(self, **kwargs):
    """
    Convert this molecule to a mesh of spheres and
    cylinders, colored by element. The origins of the spheres
    will be at the corresponding atomic position, and all units
    will be Angstroms.

    Returns:
        dict: a dictionary of `trimesh.Trimesh` objects representing this molecule.
    """
    from chmpy.util.mesh import molecule_to_meshes

    return molecule_to_meshes(self, **kwargs)

to_xyz_file(self, filename, **kwargs)

Represent this molecule as an of an xmol .xyz file. Keyword arguments are passed to self.to_xyz_string.

Parameters:

Name Type Description Default
filename str

The path in which store this molecule

required
kwargs

Keyword arguments to be passed to self.to_xyz_string

{}
Source code in chmpy/core/molecule.py
414
415
416
417
418
419
420
421
422
423
424
425
def to_xyz_file(self, filename, **kwargs):
    """
    Represent this molecule as an
    of an xmol .xyz file. Keyword arguments are
    passed to `self.to_xyz_string`.

    Args:
        filename (str): The path in which store this molecule
        kwargs: Keyword arguments to be passed to `self.to_xyz_string`
    """

    Path(filename).write_text(self.to_xyz_string(**kwargs))

to_xyz_string(self, header=True)

Represent this molecule as a string in the format of an xmol .xyz file.

Parameters:

Name Type Description Default
header bool

toggle whether or not to return the 'header' of the xyz file i.e. the number of atoms line and the comment line

True

Returns:

Type Description
str

contents (str) the contents of the .xyz file

Source code in chmpy/core/molecule.py
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
def to_xyz_string(self, header=True) -> str:
    """
    Represent this molecule as a string in the format
    of an xmol .xyz file. 

    Args:
        header (bool, optional):toggle whether or not to return the 'header' of the
            xyz file i.e. the number of atoms line and the comment line

    Returns:
        contents (str) the contents of the .xyz file
    """
    if header:
        lines = [
            f"{len(self)}",
            self.properties.get("comment", self.molecular_formula),
        ]
    else:
        lines = []
    for el, (x, y, z) in zip(self.elements, self.positions):
        lines.append(f"{el} {x: 20.12f} {y: 20.12f} {z: 20.12f}")
    return "\n".join(lines)

transform(self, rotation=None, translation=None)

Convenience method to transform this molecule by rotation and translation.

Parameters:

Name Type Description Default
rotation np.ndarray

A (3,3) rotation matrix

None
translation np.ndarray

A (3,) vector of x, y, z coordinates of the translation

None
Source code in chmpy/core/molecule.py
934
935
936
937
938
939
940
941
942
943
944
945
946
947
def transform(self, rotation=None, translation=None):
    """
    Convenience method to transform this molecule
    by rotation and translation.

    Args:
        rotation (np.ndarray): A (3,3) rotation matrix
        translation (np.ndarray): A (3,) vector of x, y, z coordinates of the translation
    """

    if rotation is not None:
        self.rotate(rotation, origin=(0, 0, 0))
    if translation is not None:
        self.translate(translation)

transformed(self, rotation=None, translation=None)

Convenience method to transform this molecule by rotation and translation.

Parameters:

Name Type Description Default
rotation np.ndarray

A (3,3) rotation matrix

None
translation np.ndarray

A (3,) vector of x, y, z coordinates of the translation

None

Returns:

Type Description
Molecule

a new copy of this Molecule transformed by the provided matrix and vector.

Source code in chmpy/core/molecule.py
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
def transformed(self, rotation=None, translation=None):
    """
    Convenience method to transform this molecule
    by rotation and translation.

    Args:
        rotation (np.ndarray): A (3,3) rotation matrix
        translation (np.ndarray): A (3,) vector of x, y, z coordinates of the translation

    Returns:
        Molecule: a new copy of this `Molecule` transformed by the provided matrix and vector.
    """

    from copy import deepcopy

    result = deepcopy(self)
    result.transform(rotation=rotation, translation=translation)
    return result

translate(self, translation)

Convenience method to translate this molecule by a given translation vector

Parameters:

Name Type Description Default
translation np.ndarray

A (3,) vector of x, y, z coordinates of the translation

required
Source code in chmpy/core/molecule.py
907
908
909
910
911
912
913
914
915
def translate(self, translation):
    """
    Convenience method to translate this molecule by a given 
    translation vector

    Args:
        translation (np.ndarray): A (3,) vector of x, y, z coordinates of the translation
    """
    self.positions += translation

translated(self, translation)

Convenience method to construct a new copy of this molecule translated by a given translation vector

Parameters:

Name Type Description Default
translation np.ndarray

A (3,) vector of x, y, z coordinates of the translation

required

Returns:

Type Description
Molecule

a new copy of this Molecule translated by the given vector.

Source code in chmpy/core/molecule.py
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
def translated(self, translation):
    """
    Convenience method to construct a new copy of this molecule
    translated by a given translation vector

    Args:
        translation (np.ndarray): A (3,) vector of x, y, z coordinates of the translation

    Returns:
        Molecule: a new copy of this `Molecule` translated by the given vector.
    """
    import copy

    result = copy.deepcopy(self)
    result.positions += translation
    return result