This page was generated from docs/src/notebooks/creating-a-ce-w-electrostatics.ipynb.

Creating a Cluster Expansion with an additional Ewald electrostatic term#

Fitting cluster expansions with an ewald term was proposed by former student Will Richards. See chapter 4.6 of his thesis for details.

[1]:

import numpy as np
import json
from monty.serialization import loadfn
from pymatgen.core.structure import Structure
from smol.cofe import ClusterSubspace, StructureWrangler, ClusterExpansion, RegressionData

[2]:

# load the prim structure
lno_prim = loadfn('data/lno_prim.json')

# load the fitting data
lno_entries = loadfn("data/lno_entries.json")

1) Create the cluster subspace#

[3]:

subspace = ClusterSubspace.from_cutoffs(
    lno_prim,
    cutoffs={2: 5, 3: 4.1},
    basis='sinusoid',
    supercell_size='O2-'
)

2) Add the ewald term.#

An EwaldTerm can be added to a cluster expansion to account for long range electrostatic interactions in ionic materials and therefore reduce the cluster complexity required to train the cluster expansion.

[4]:

from smol.cofe.extern import EwaldTerm

# eta is the screening parameter used in computing
# real/reciprocal space parts in the Ewald summation
# See pymatgen.analysis.ewald.EwaldSummation
subspace.add_external_term(EwaldTerm(eta=None))

2.1) The Electrostatic term#

The last entry in the correlation vector corresponds to the electrostatic energy from the Ewald summation. It essentially is the normalized electrostatic interaction energy. Since it has units of energy it is not rigorously a correlation like the orbit correlations which are unitless.

[5]:

structure = lno_entries[1].structure
corr = subspace.corr_from_structure(structure)

print(f'The Ewald interaction for a structure with'
      f' composition {structure.composition} is: '
      f'\n{corr[-1]} eV/prim')

The Ewald interaction for a structure with composition Li+1 Ni4+5 Ni3+1 O2-12 is:
-116.41651881128503 eV/prim

3) Creating the cluster expansion#

Preparing the training data, fiting and creating the cluster expansion with the Ewald term is same procedure as done for a regular cluster expansion.

[6]:

from sklearn.linear_model import LinearRegression

# create and add data to the wrangler
wrangler = StructureWrangler(subspace)
for entry in lno_entries:
    wrangler.add_entry(entry, verbose=False)

# fit the data with an estimator
estimator = LinearRegression(fit_intercept=False)
estimator.fit(wrangler.feature_matrix,
              wrangler.get_property_vector('energy'))

# save regression details
reg_data = RegressionData.from_sklearn(
    estimator, wrangler.feature_matrix, wrangler.get_property_vector('energy')
)
# create the cluster expansion
expansion = ClusterExpansion(
    subspace, coefficients=estimator.coef_, regression_data=reg_data
)

3.1) Check the quality of the fit and the “dielectric” constant#

We will check the quality of the fit with the simple methods from before.

It is also useful to look at the value of the fit coefficient obtained for the Ewald interaction, since its inverse can be interpreted as a dielectric constant.

[7]:

from sklearn.metrics import mean_squared_error, max_error

train_predictions = np.dot(wrangler.feature_matrix,
                           expansion.coefs)
rmse = mean_squared_error(
    wrangler.get_property_vector('energy'),
    train_predictions,
    squared=False
)
maxer = max_error(
    wrangler.get_property_vector('energy'),
    train_predictions
)

print(f'RMSE {1E3 * rmse} meV/prim')
print(f'MAX {1E3 * maxer} meV/prim')
print(f'Fitted dielectric constant {1/expansion.coefs[-1]}')

RMSE 7.366465328125201 meV/prim
MAX 16.051863315254877 meV/prim
Fitted dielectric constant 9.413114998055713

4) Constraining the value of the “dielectric” constant#

Sometimes we want to constrain the obtained value of the dielectric constant, when it is too large or when it is negative (which would mean higher electrostatic interaction energy is more favorable?)

[8]:

from sparselm.tools import constrain_coefficients

max_dielectric = 5.0

# since the sklearn LinearRegression.fit does not return the coefs, we can do this:
def fit(X, y):
    estimator.fit(X, y)
    return estimator.coef_

coefs = constrain_coefficients(
    indices=[-1,], low=1/max_dielectric, high=np.inf)(fit)(
    wrangler.feature_matrix, wrangler.get_property_vector('energy')
)
train_predictions = np.dot(wrangler.feature_matrix, coefs)
rmse = mean_squared_error(
    wrangler.get_property_vector('energy'),
    train_predictions,
    squared=False
)
maxer = max_error(
    wrangler.get_property_vector('energy'),
    train_predictions
)

print(f'RMSE {1E3 * rmse} meV/prim')
print(f'MAX {1E3 * maxer} meV/prim')
print(f'Fitted dielectric constant {1/coefs[-1]}')

RMSE 10.332921196955532 meV/prim
MAX 22.073503797322758 meV/prim
Fitted dielectric constant 5.0

4.1) If you want to play with decorators you can also do the above in a cleaner looking way.#

[9]:

@constrain_coefficients(indices=[-1,], low=1/max_dielectric, high=np.inf)
def fit(X, y):
    estimator.fit(X, y)
    return estimator.coef_

coefs = fit(wrangler.feature_matrix,
            wrangler.get_property_vector('energy'))

train_predictions = np.dot(wrangler.feature_matrix, coefs)
rmse = mean_squared_error(
    wrangler.get_property_vector('energy'),
    train_predictions,
    squared=False
)
maxer = max_error(
    wrangler.get_property_vector('energy'),
    train_predictions
)

print(f'RMSE {1E3 * rmse} meV/prim')
print(f'MAX {1E3 * maxer} meV/prim')
print(f'Fitted dielectric constant {1/coefs[-1]}')

RMSE 10.332921196955532 meV/prim
MAX 22.073503797322758 meV/prim
Fitted dielectric constant 5.0

5) Save work#

[ ]:

from smol.io import save_work

file_path = 'data/basic_ce_ewald.mson'
# we can save the subspace as well, but since both the wrangler
# and the expansion have it, there is no need to do so.
save_work(file_path, wrangler, expansion)