Expansion#

ClusterExpansion contains the fitted coefficients of the cluster expansion for predicting CE properties of new structures.

We provide a thin RegressionData dataclass to record the specifics of the regression used while fitting, for example when using a linear model from scikit-learn.

ClusterExpansion#

This module implements the ClusterExpansion class.

A ClusterExpansion holds the necessary attributes to represent a CE and predict the property for new structures.

The class also allows pruning of the CE to remove low-importance orbit function terms and speed up Monte Carlo runs.

It also supports numerical conversion of ECI to other basis sets, but this functionality has not been thoroughly tested.

class ClusterExpansion(cluster_subspace, coefficients, regression_data=None)[source]#

Bases: MSONable

Class for the ClusterExpansion proper.

This needs a ClusterSubspace and a corresponding set of coefficients from a fit.

The main method to use is the predict() method to predict the fitted property for new structures. This can be used to compare the accuracy of the fit with a set of test structures not used in training.

Although it is optional and does not affect how the class behaves, it is recommended that you save fit metrics such as the CV score, test/train RMSE, or anything else that quantifies the “goodness” of the fit in the metadata dictionary. See sklearn.metrics for many useful functions to compute these quantities.
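As a rough illustration of the kind of metric worth recording, the sketch below computes a test RMSE with the standard library and stores it in a plain dict. The numbers are made up, and the key names `test_rmse` and `pruned` are hypothetical, not a layout required by the class.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up numbers standing in for test-set predictions and reference values.
predicted = [1.0, 2.0, 3.0]
reference = [1.0, 2.0, 5.0]

# Hypothetical metadata layout; the key names are illustrative only.
metadata = {"test_rmse": rmse(predicted, reference), "pruned": False}
```

Since the metadata attribute is documented as a dict, entries like these could presumably be assigned to it directly after constructing the expansion.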

This class is also used in Monte Carlo simulations to create a ClusterExpansionProcessor that evaluates the CE for a fixed supercell size. Before using a ClusterExpansion for Monte Carlo, consider pruning the correlation/orbit functions that have very small coefficients or ECI.

coefficients#

coefficients of the ClusterExpansion

Type: ndarray

metadata#

dict to save optional values describing the cluster expansion, e.g. whether it was pruned, any error metrics, etc.

Type: dict

Initialize a ClusterExpansion.

Parameters:

cluster_subspace (ClusterSubspace) – a ClusterSubspace representing the subspace over which the ClusterExpansion was fit. Must be the same used to create the feature matrix.
coefficients (ndarray) – coefficients for the cluster expansion. Make sure the supplied coefficients match the correlation vector terms (length and order). These correspond to the ECI × the orbit multiplicity × the bit ordering multiplicity.
regression_data (RegressionData) – optional RegressionData object with details used in the fit of the corresponding expansion. The feature_matrix attribute here is necessary to compute things like numerical ECI transformations for different bases.

as_dict()[source]#

Get JSON-serializable dict representation.

Returns: MSONable dict

property cluster_interaction_tensors#

Get tuple of cluster interaction tensors.

Tuple of ndarrays where each array is the interaction tensor for the corresponding orbit of clusters.

cluster_interactions_from_structure(structure, normalized=True, scmatrix=None, site_mapping=None)[source]#

Compute the vector of cluster interaction values for given structure.

A cluster interaction is simply a vector made up of the sum of all cluster expansion terms over the same orbit.

Parameters:

structure (Structure) – Structure to compute the cluster interactions for.
normalized (bool) – Whether to return the cluster interaction values normalized by the prim cell size.
scmatrix (ndarray) – optional supercell matrix relating the prim structure to the given structure. Passing this if it has already been matched will make things much quicker. You are responsible that the supercell matrix is correct.
site_mapping (list) – optional Site mapping as obtained by StructureMatcher.get_mapping such that the elements of site_mapping represent the indices of the matching sites to the prim structure. If you pass this option, you are fully responsible that the mappings are correct!

Returns: ndarray: vector of cluster interaction values

property cluster_subspace#: Get ClusterSubspace.

copy()[source]#: Return a copy of self.

property eci#

Get the ECI for the cluster expansion.

This just divides coefficients by the corresponding multiplicities. External terms are dropped since their fitted coefficients do not represent ECI.
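A minimal sketch of that division, using plain lists rather than the class itself. It assumes each multiplicity is the combined factor (orbit multiplicity times bit ordering multiplicity) described for the coefficients parameter, and it ignores external terms. The function name and the numbers are illustrative.

```python
def eci_from_coefficients(coefficients, multiplicities):
    """Divide each fitted coefficient by its corresponding multiplicity."""
    if len(coefficients) != len(multiplicities):
        raise ValueError("need exactly one multiplicity per coefficient")
    return [c / m for c, m in zip(coefficients, multiplicities)]


# Made-up fit coefficients and combined multiplicities for three terms.
coefficients = [-2.0, 0.6, 0.12]
multiplicities = [1, 3, 6]

eci = eci_from_coefficients(coefficients, multiplicities)
```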

property eci_orbit_ids#

Get Orbit ids corresponding to each ECI in the Cluster Expansion.

If the Cluster Expansion includes external terms these are not included in the list since they are not associated with any orbit.

property effective_cluster_weights#

Calculate the cluster weights.

The cluster weights are defined as the weighted sum of ECI squared, where the weights are the ordering multiplicities.
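The definition above can be sketched as follows, assuming one weight per orbit obtained by summing the ordering multiplicity times the squared ECI over all terms that share an orbit id. The function below is a stand-alone illustration with made-up inputs, not the property's actual implementation.

```python
from collections import defaultdict


def effective_cluster_weights(eci, ordering_multiplicities, orbit_ids):
    """Sum multiplicity * ECI**2 over all terms that share an orbit id."""
    weights = defaultdict(float)
    for value, mult, orbit_id in zip(eci, ordering_multiplicities, orbit_ids):
        weights[orbit_id] += mult * value ** 2
    return dict(weights)


# Made-up values: orbit 1 contributes two terms with different orderings.
eci = [0.5, 0.2, -0.1, 0.3]
ordering_multiplicities = [1, 1, 2, 1]
orbit_ids = [0, 1, 1, 2]

weights = effective_cluster_weights(eci, ordering_multiplicities, orbit_ids)
```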

property expansion_structure#

Get expansion structure.

Prim structure with only sites included in the expansion (i.e. sites with partial occupancies)

property feature_matrix#

Get the feature matrix used in fit.

If no feature matrix was given, returns an identity matrix of size num_corrs.

classmethod from_dict(d)[source]#: Create ClusterExpansion from serialized MSONable dict.

predict(structure, normalized=False, scmatrix=None, site_mapping=None)[source]#

Predict the fitted property for a given structure.

Parameters:

structure (Structure) – Structure to predict from
normalized (bool) – optional Whether to return the predicted property normalized by the prim cell size.
scmatrix (ndarray) – optional supercell matrix relating the prim structure to the given structure. Passing this if it has already been matched will make things much quicker. You are responsible that the supercell matrix is correct.
site_mapping (list) – optional Site mapping as obtained by StructureMatcher.get_mapping such that the elements of site_mapping represent the indices of the matching sites to the prim structure. If you pass this option, you are fully responsible that the mappings are correct!

Returns:

float
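Conceptually, a cluster expansion prediction is linear in the correlation vector. The sketch below assumes the prediction is the dot product of the coefficients with a per-prim-cell correlation vector, and that the unnormalized value is that result scaled by the number of prim cells in the supercell. The function and the `num_prims` argument are hypothetical and stand in for what the class derives from the structure.

```python
def predict_from_correlations(coefficients, correlations, num_prims=1, normalized=False):
    """Dot product of coefficients and per-prim-cell correlations.

    When normalized is False the result is scaled by the number of
    prim cells in the supercell (an assumption of this sketch).
    """
    if len(coefficients) != len(correlations):
        raise ValueError("coefficients and correlations must match in length")
    value = sum(c * x for c, x in zip(coefficients, correlations))
    return value if normalized else value * num_prims


# Made-up coefficients and correlation vector for a supercell of 4 prim cells.
coefficients = [-1.0, 0.5, 0.25]
correlations = [1.0, 0.0, -1.0]

per_prim = predict_from_correlations(coefficients, correlations, num_prims=4, normalized=True)
total = predict_from_correlations(coefficients, correlations, num_prims=4)
```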

prune(threshold=0, with_multiplicity=False)[source]#

Remove fit coefficients or ECIs with small values.

Removes ECIs and orbits in the ClusterSubspace that have ECI/parameter values smaller than the given threshold.

This will change the fit's error metrics (e.g. RMSE) a little, but the change should be small. If they change a lot, the threshold is probably too high and important functions are being pruned.

This will not re-fit the ClusterExpansion. Note that if you re-fit after pruning, the ECI will probably change and hence also the fit performance.

Parameters:

threshold (float) – threshold below which to remove.
with_multiplicity (bool) – if True, threshold is applied to the ECI proper, otherwise to the fit coefficients

property structure#: Get primitive structure which the expansion is based on.

RegressionData#

class RegressionData(module, estimator_name, feature_matrix, property_vector, parameters)[source]#

Bases: object

Dataclass used to store regression model details.

This class stores the details used in fitting a cluster expansion for future reference and good provenance practices. It is highly recommended to initialize ClusterExpansion objects with an instance of this class.

estimator_name: str#

feature_matrix: ndarray#

classmethod from_object(estimator, feature_matrix, property_vector, parameters=None)[source]#

Create a RegressionData object from an estimator class.

Parameters:

estimator (object) – Estimator class or function.
feature_matrix (ndarray) – feature matrix used in fit.
property_vector (ndarray) – target property vector used in fit.
parameters (dict) – Dictionary with pertinent fitting parameters, e.g. regularization. It is highly recommended that you save these as good practice and to ensure reproducibility.

Returns:

RegressionData

classmethod from_sklearn(estimator, feature_matrix, property_vector)[source]#

Create a RegressionData object from sklearn estimator.

Parameters:

estimator (object) – scikit-learn estimator class or derived class.
feature_matrix (ndarray) – feature matrix used in fit.
property_vector (ndarray) – target property vector used in fit.

Returns:

RegressionData

module: str#

parameters: dict#

property_vector: ndarray#