Package Design#
Overview & Mission#
smol is intentionally designed to be easy to use, install, and extend. To achieve these goals, the package has few dependencies [1] and a heavily object-oriented, modular design that closely follows mathematical and methodological abstractions. This enables flexible creation of complex workflows and hassle-free implementation of methodology extensions, which rarely need to be written from scratch.
Inheritance, polymorphism, and composition are the key OOP concepts behind this modular design. They are intended to make it possible to design complex calculation workflows and implement new functionality without writing new classes and functions from scratch.
smol has been designed to enable efficient and open development of new methodology for fitting and sampling applied lattice models in a user-friendly way, and as a result to allow quick development-to-application turnaround in the study of configuration-dependent properties of inorganic materials.
Module Design#
smol is organized into two main submodules, smol.cofe and
smol.moca:
- smol.cofe implements the classes and functions to represent site spaces, configuration spaces and function subspaces over them, and the tools necessary to fit lattice models, in the spirit of cluster expansions and the structure inversion method.
- smol.moca has implementations of classes to carry out Monte Carlo sampling of ensembles associated with a fitted lattice model Hamiltonian, as well as simple empirical models such as a point Coulomb potential.
Below is a detailed description of the classes and their relationships in the main modules. Before delving into these, we recommend going over the simpler overview of the package in the User Guide page.
Diagrams showing the main objects and their relationships for smol.cofe and
smol.moca are presented below. In these diagrams, base classes are depicted in
filled (and unboxed) rectangles, and derived classes are depicted in boxed rectangles with
a colored arrow pointing from the base class to each derived class.
Ownership relationships between distinct classes (i.e. when a class has another class as an attribute) are depicted as dark blue arrows. The arrow points from the class that is the attribute to the class that holds it.
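The two kinds of relationship drawn in the diagrams can be illustrated with a minimal Python sketch. All class names here (ToyBasis, ToyIndicatorBasis, ToySpinBasis, ToyOrbit) are hypothetical and are not part of smol; they only show inheritance (base to derived class) next to ownership (one class held as an attribute of another).

```python
class ToyBasis:
    """Hypothetical base class: maps a species index to a function value."""

    def evaluate(self, species_index):
        raise NotImplementedError


class ToyIndicatorBasis(ToyBasis):
    """Derived class (inheritance): 1 for the target species, 0 otherwise."""

    def __init__(self, target):
        self.target = target

    def evaluate(self, species_index):
        return 1.0 if species_index == self.target else 0.0


class ToySpinBasis(ToyBasis):
    """Derived class (inheritance): +1 for species 0, -1 for any other."""

    def evaluate(self, species_index):
        return 1.0 if species_index == 0 else -1.0


class ToyOrbit:
    """Ownership (composition): holds a basis object as an attribute."""

    def __init__(self, basis):
        self.basis = basis

    def site_values(self, occupancy):
        # Polymorphism: any ToyBasis subclass can be used here unchanged.
        return [self.basis.evaluate(s) for s in occupancy]


occupancy = [0, 1, 1, 0]
spin_orbit = ToyOrbit(ToySpinBasis())
indicator_orbit = ToyOrbit(ToyIndicatorBasis(target=1))
print(spin_orbit.site_values(occupancy))       # [1.0, -1.0, -1.0, 1.0]
print(indicator_orbit.site_values(occupancy))  # [0.0, 1.0, 1.0, 0.0]
```

Swapping the basis object changes the behavior of ToyOrbit without touching its code, which is the kind of extension the modular design aims for.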
smol.cofe#
The diagram below summarizes the overall design of the smol.cofe module. The main
classes that users directly instantiate and interact with are the
ClusterSubspace, StructureWrangler, and ClusterExpansion as
detailed in the User Guide. All main classes are derived from monty
MSONable and thus can be serialized and saved as JSON objects.
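As a rough illustration of what this serialization involves, the sketch below mimics the as_dict / from_dict convention used by monty MSONable using only the standard library. ToySiteSpace and its composition argument are hypothetical stand-ins, not smol classes; real smol objects inherit this behavior from MSONable rather than implementing it by hand like this.

```python
import json


class ToySiteSpace:
    """Hypothetical serializable class following the as_dict/from_dict pattern."""

    def __init__(self, composition):
        self.composition = dict(composition)

    def as_dict(self):
        # The "@module" and "@class" keys record which class to rebuild.
        return {
            "@module": type(self).__module__,
            "@class": type(self).__name__,
            "composition": self.composition,
        }

    @classmethod
    def from_dict(cls, d):
        return cls(d["composition"])


space = ToySiteSpace({"Li+": 0.5, "Vacancy": 0.5})
text = json.dumps(space.as_dict())                   # save as a JSON string
restored = ToySiteSpace.from_dict(json.loads(text))  # rebuild the object
print(restored.composition == space.composition)     # True
```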
In addition, several other classes are defined to enable the various features and
functionality in the module.
Following the diagram above, the main purposes of the classes depicted are:
- ClusterSubspace (and its derived class PottsSubspace) is the main workhorse for constructing and representing configuration spaces and the function spaces over them. Its main purpose is to compute correlation function values given an ordered structure. To do so, it holds a list of Orbit objects generated for a particular disordered pymatgen Structure. Additionally, an external term representing a simple empirical potential model can be included. Currently only an EwaldTerm, which represents a point electrostatic potential, is available.
  - An Orbit represents both the set of symmetrically equivalent clusters and the set of product functions that act over the configurations of those clusters. An orbit holds a base Cluster and a list of SiteBasis objects, one for each site in the cluster.
    - A Cluster is a collection of sites (derived from SiteCollection in pymatgen). The two key features of a Cluster are that, compared to a Structure, it does not hold periodic sites and, compared to a Molecule, it holds a lattice (that of the structure it is associated with).
    - A SiteBasis represents the basis set that spans the function space of the configurations of a single site. Several types of basis sets are included, and implementing new ones is relatively straightforward. A SiteBasis holds a SiteSpace, which represents its single-site configuration space.
      - A SiteSpace represents the possible configurations of a given site. It essentially holds a pymatgen Composition, with the addition of an explicit Vacancy species for compositions that do not sum to 1.
  - An external term representing a simple empirical pair potential can also be included in a ClusterSubspace to create a mixture model (i.e. a cluster expansion plus an empirical potential). Currently an EwaldTerm is implemented to allow mixture models of cluster expansions with explicit electrostatics.
- A StructureWrangler is the main object for training data preparation. Training data is held as a list of pymatgen ComputedStructureEntry objects. The StructureWrangler holds a given ClusterSubspace and makes sure that training structures can be correctly mapped to the disordered unit structure of the ClusterSubspace. The correlation vectors of the training structures that can be correctly mapped are computed to form the correlation matrix (feature matrix) necessary for training. In addition, the StructureWrangler has several methods to inspect and further prepare training data, such as checking for duplicates, obtaining correlation matrix properties, and, most importantly, obtaining a properly normalized property vector for training (the normalization is done per disordered unit cell of the ClusterSubspace).
- A ClusterExpansion represents the final fitted lattice model. It holds a ClusterSubspace and a corresponding set of fitted coefficients. A ClusterExpansion can be used to predict the energy of new structures, obtain the effective cluster interactions (ECI), and prune unimportant terms.
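The roles described above (correlation vectors, a feature matrix, a normalized property vector, and prediction from coefficients) can be sketched on a toy model. The example below is a one-dimensional periodic chain with two species and a +1/-1 site function. It does not use the smol API: every function name, the training configurations, the energies, and the coefficients are made up for illustration.

```python
def spin(species):
    """Toy site basis: species 0 maps to +1, species 1 maps to -1."""
    return 1.0 if species == 0 else -1.0


def correlation_vector(occupancy, pair_distances=(1, 2)):
    """Return [1, <s_i>, <s_i * s_(i+d)> for each d] on a periodic chain."""
    n = len(occupancy)
    s = [spin(o) for o in occupancy]
    corr = [1.0, sum(s) / n]  # empty cluster and point cluster
    for d in pair_distances:
        # Average the pair product over all symmetrically equivalent pairs.
        corr.append(sum(s[i] * s[(i + d) % n] for i in range(n)) / n)
    return corr


def predict(coefficients, occupancy):
    """Property per site: dot product of coefficients and correlations."""
    corr = correlation_vector(occupancy)
    return sum(c * x for c, x in zip(coefficients, corr))


# Made-up training configurations and total energies (not real data).
training = [[0, 0, 0, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
total_energies = [-1.6, -5.6, -4.4]

# One correlation vector per training configuration forms the feature matrix.
feature_matrix = [correlation_vector(occ) for occ in training]
# Normalize each property by the number of unit cells (one site per cell here).
property_vector = [e / len(occ) for e, occ in zip(total_energies, training)]

# Made-up coefficients standing in for fitted values.
coefficients = [-1.0, 0.0, 0.5, 0.1]
print(feature_matrix)
print(predict(coefficients, [0, 1, 0, 1]))
```

In this toy setting a fit would solve for the coefficients from feature_matrix and property_vector; the prediction step is then a single dot product per configuration.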
smol.moca#
The diagram below shows the overall design of the smol.moca module. The
main classes in this module that are necessary to run Monte Carlo sampling using
a ClusterExpansion are the Processor classes, the Ensemble,
and the Sampler class. However, a number of helper classes are implemented
to allow running and implementing a wide variety of MC calculations.
The class descriptions are as follows:
- An Ensemble represents the probability space associated with a particular lattice model and thermodynamic boundary conditions (e.g. chemical_potentials) over a finite simulation domain, which is represented by a supercell matrix of an associated unit cell. An Ensemble holds a Processor and a list of Sublattice instances.
  - A Sublattice represents the set of sites of the defined supercell that have the same site space (set of allowed species). Sublattice instances can also be split according to a particular frozen configuration.
  - Processor instances hold a ClusterSubspace and represent a particular sampling domain in the form of a supercell matrix (of the corresponding disordered unit cell). A Processor quickly and efficiently calculates the fitted property, and its differences from local updates, for a given configuration (over the represented supercell). A Processor can also generate occupancy strings given an ordered supercell structure, as well as generate the Sublattice instances for the different sites in the supercell domain.
- A Sampler is the main class to run MC sampling given a particular Ensemble instance. Apart from the ensemble to be sampled, a Sampler holds an MCKernel that implements the particular sampling algorithm and a SampleContainer to record the sampled configurations, correlation vectors, enthalpy, and any other associated state variables.
  - MCKernel classes are implementations of particular MC algorithms, such as the Metropolis kernel. They take care of generating the Markov chain, sampling histograms, and any other configuration and state attributes. They include an MCUsher to propose steps, and may also include an optional MCBias to bias samples toward particular configurations in phase space.
    - An MCUsher proposes the steps that carry out the random walk for an MC calculation. The simplest are Flip, for single-site species changes, and Swap, for swapping the species at two sites. However, implementing a new MCUsher for more complex random walks should be relatively straightforward.
    - An MCBias serves as an additional term that can be included to bias acceptance probabilities in order to carry out sampling from extended or biased ensembles.
  - MC samples are saved in a SampleContainer from sampled Trace objects. These always include the configurations, correlations or features, and the energy or enthalpy. Additional values that depend on the particular MCKernel used are also saved. A SampleContainer has functionality to obtain simple means and variances of sampled values. In addition, a SampleContainer can be saved as either a JSON file or an HDF5 container. During lengthy simulations, samples can be streamed into an HDF5 container to minimize memory requirements; further, using HDF5 containers in single writer multiple reader mode allows users to begin looking at samples before a simulation has concluded.
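The division of labor described above can be sketched with a toy Metropolis simulation on a periodic chain of +1/-1 sites. This is not smol's implementation and uses none of its API: flip_step and swap_step only play the role of step proposers, energy_change plays the role of a local-update energy difference, and the recorded trace plays the role of stored samples. The coupling J, the temperature (in units where the Boltzmann constant is 1), and all function names are assumptions made for the example.

```python
import math
import random
from statistics import fmean, pvariance

J = 1.0  # made-up nearest-neighbor coupling


def energy(spins):
    """Total energy of a periodic nearest-neighbor chain."""
    n = len(spins)
    return -J * sum(spins[i] * spins[(i + 1) % n] for i in range(n))


def flip_step(spins, rng):
    """Propose changing the value at a single site."""
    i = rng.randrange(len(spins))
    return [(i, -spins[i])]


def swap_step(spins, rng):
    """Propose exchanging the values at two sites (composition is conserved)."""
    i, j = rng.sample(range(len(spins)), 2)
    return [(i, spins[j]), (j, spins[i])]


def local_energy(spins, sites):
    """Energy of only the bonds that touch the given sites."""
    n = len(spins)
    bonds = {((i - 1) % n, i) for i in sites} | {(i, (i + 1) % n) for i in sites}
    return -J * sum(spins[a] * spins[b] for a, b in bonds)


def energy_change(spins, step):
    """Energy difference of a proposed step, using only the affected bonds."""
    sites = [i for i, _ in step]
    trial = list(spins)
    for i, value in step:
        trial[i] = value
    return local_energy(trial, sites) - local_energy(spins, sites)


def run(spins, proposer, n_steps, temperature, seed=0):
    """Metropolis sampling; returns a list of (configuration, energy) samples."""
    rng = random.Random(seed)
    spins = list(spins)
    current = energy(spins)
    trace = []
    for _ in range(n_steps):
        step = proposer(spins, rng)
        delta = energy_change(spins, step)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            for i, value in step:
                spins[i] = value
            current += delta
        trace.append((tuple(spins), current))
    return trace


start = [1, -1] * 8
trace = run(start, swap_step, n_steps=2000, temperature=1.0)
energies = [e for _, e in trace]
print(fmean(energies), pvariance(energies))  # simple mean and variance
```

Passing flip_step instead of swap_step changes the random walk without modifying run, which mirrors how a different step proposer can be plugged into the same sampling loop.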
 
Footnotes
