Running Canonical Monte Carlo Sampling#
[1]:
import numpy as np
import json
from pymatgen.core.structure import Structure
from smol.io import load_work
0) Load the previous LNO CE with electrostatics#
[2]:
work = load_work('data/basic_ce_ewald.mson')
expansion = work['ClusterExpansion']
1) Create a canonical ensemble#
The Ensemble class can be used to run MC in a fixed-composition (canonical) ensemble. The ensemble class determines the active sublattices by grouping all sites that have the same possible partial occupancies.
To run at a fixed chemical potential instead, see the notebook on semigrand ensemble MC.
[3]:
from smol.moca import Ensemble
# Create the ensemble
# This specifies the size of the MC simulation domain.
sc_matrix = np.array([
    [6, 1, 1],
    [1, 2, 1],
    [1, 1, 2]
])
# this convenience method will take care of creating the appropriate
# processor for the given cluster expansion.
ensemble = Ensemble.from_cluster_expansion(expansion, sc_matrix)
# In a real scenario you may want a much larger processor.size
# An MC step is O(1) with the processor.size, meaning it runs at
# the same speed regardless of the size. However, larger sizes
# will need many more steps to reach equilibrium in an MC simulation.
print(f'The supercell size for the processor is {ensemble.processor.size} prims.')
print(f'The ensemble has a total of {ensemble.num_sites} sites.')
print('The active sublattices are:')
for sublattice in ensemble.sublattices:
    print(sublattice)
The supercell size for the processor is 16 prims.
The ensemble has a total of 64 sites.
The active sublattices are:
Sublattice(site_space=Li+0.5 vacA0+0.5, sites=array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]), active_sites=array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]), encoding=array([0, 1]))
Sublattice(site_space=Ni3+0.5 Ni4+0.5 , sites=array([16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]), active_sites=array([16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]), encoding=array([0, 1]))
Sublattice(site_space=O2-1 , sites=array([32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]), active_sites=array([], dtype=float64), encoding=array([0]))
2) Create an MC sampler#
A Sampler takes care of running MC sampling for a given ensemble. The sampler offers many options for MC sampling, most importantly the choice of MCMC algorithm and the type of MC steps taken, but the defaults are usually sufficient for almost all use cases.
[4]:
from smol.moca import Sampler
# This will take care of setting the defaults
# for the supplied canonical ensemble
sampler = Sampler.from_ensemble(ensemble, temperature=1500)
print(f"Sampling information: {sampler.samples.metadata}")
Sampling information: Metadata(cls_name='SampleContainer', kernels=[Metadata(seed=305970771585639230070722133755157608508, step=Metadata(sublattices=[(Species Li+, Vacancy vacA0+), (Species Ni3+, Species Ni4+), (Species O2-,)], sublattice_probabilities=array([0.5, 0.5]), cls_name='Swap'), cls_name='Metropolis')])
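The metadata above shows the defaults chosen for a canonical ensemble: a Metropolis kernel taking swap steps on the active sublattices. If you want to override these, a minimal sketch follows; the step_type and kernel_type keyword arguments are assumptions here, so check the Sampler.from_ensemble documentation for your smol version.
[ ]:
# Sketch only: explicitly choose the MCMC kernel and step type instead of
# relying on the defaults. The step_type and kernel_type keyword arguments
# are assumed; verify them against Sampler.from_ensemble for your version.
explicit_sampler = Sampler.from_ensemble(
    ensemble,
    kernel_type='Metropolis',  # MCMC algorithm
    step_type='swap',          # composition-conserving steps for canonical MC
    temperature=1500,
)
print(explicit_sampler.samples.metadata)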
3) Create an initial structure and get occupancies#
You will need to create an initial test structure to obtain an initial occupancy to start an MC run. There are many ways to do this; you could simply rescale a training structure and use that, but since the composition is fixed in a canonical ensemble you need to make sure you input the right composition. It can also be helpful to run a simulated annealing step to obtain a good initial structure rather than starting from a low-energy one.
Here we will use the underlying processor to generate a random occupancy at the composition of the disordered structure used in the original cluster expansion.
[5]:
from smol.capp.generate import generate_random_ordered_occupancy
compositions = [sublattice.composition for sublattice in ensemble.sublattices]
init_occu = generate_random_ordered_occupancy(ensemble.processor, composition=compositions)
print(f"The disordered structure has composition: {ensemble.processor.structure.composition}")
print(f"The initial occupancy has composition: {ensemble.processor.structure_from_occupancy(init_occu).composition}")
The disordered structure has composition: Li+8 Ni3+8 Ni4+8 O2-32
The initial occupancy has composition: Li+8 Ni3+8 Ni4+8 O2-32
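As an alternative to a random occupancy, you can run the simulated anneal mentioned above to relax the starting configuration before production sampling. A minimal sketch, assuming the Sampler.anneal(temperatures, mcmc_steps, ...) signature; the temperature schedule and step counts below are arbitrary illustrations:
[ ]:
# Sketch only: anneal from high to low temperature starting from the random
# occupancy, then use the last sampled occupancy as the production start.
# The anneal signature and the schedule below are assumptions; adjust them
# for your system and smol version.
sampler.anneal(
    [5000, 3000, 2000, 1500],  # temperature schedule in K
    10000,                     # MC steps per temperature
    initial_occupancies=init_occu,
    thin_by=100,
    progress=True,
)
annealed_occu = sampler.samples.get_occupancies()[-1]
# If you anneal, you may want to call sampler.clear_samples() before the
# production run, and then start it from annealed_occu.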
[6]:
# The occupancy strings created by the processor
# are by default "encoded" by the indices of the species
# for each given site. You can always see the actual
# species in the occupancy string by decoding it.
print(f'The encoded occupancy is:\n{init_occu}')
print(f'The initial occupancy is:\n {ensemble.processor.decode_occupancy(init_occu)}')
The encoded occupancy is:
[1 1 1 0 1 1 1 0 0 0 0 1 0 0 0 1 0 1 1 0 1 0 0 0 1 0 0 1 0 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
The initial occupancy is:
[Vacancy vacA0+, Vacancy vacA0+, Vacancy vacA0+, Species Li+, Vacancy vacA0+, Vacancy vacA0+, Vacancy vacA0+, Species Li+, Species Li+, Species Li+, Species Li+, Vacancy vacA0+, Species Li+, Species Li+, Species Li+, Vacancy vacA0+, Species Ni3+, Species Ni4+, Species Ni4+, Species Ni3+, Species Ni4+, Species Ni3+, Species Ni3+, Species Ni3+, Species Ni4+, Species Ni3+, Species Ni3+, Species Ni4+, Species Ni3+, Species Ni4+, Species Ni4+, Species Ni4+, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-]
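The encoding can also be interpreted per sublattice. For example, from the Sublattice printout above, the Li+/vacancy sublattice occupies sites 0-15, with 0 encoding Li+ and 1 encoding the vacancy (compare the decoded string), so a quick count of the encoded values on those sites recovers the Li8/vacancy8 composition:
[ ]:
# Count the encoded values on the Li+/vacancy sublattice sites
# (0 encodes Li+ and 1 encodes the vacancy, per the decoded string above).
li_sublattice = ensemble.sublattices[0]
values, counts = np.unique(init_occu[li_sublattice.sites], return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))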
4) Run MC iterations#
MC iterations are by default run by swapping sites from all active sublattices, but fine-grained simulations can also be run by flipping only some of the active sublattices, or even by freezing specific sites in active sublattices (see the sketch below). MC samples are saved in a SampleContainer that is created by, or given to, the Sampler when it is created.
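For example, specific sites can be frozen by restricting them on the corresponding sublattice before running. This is a sketch only; the restrict_sites and reset_restricted_sites methods on Sublattice are assumptions, so check the API for your smol version.
[ ]:
# Sketch only: freeze the first two Li+/vacancy sites so they are never
# proposed in swap steps, then undo the restriction.
# restrict_sites / reset_restricted_sites are assumed method names; verify
# them against the Sublattice documentation for your smol version.
li_sublattice = ensemble.sublattices[0]
li_sublattice.restrict_sites(li_sublattice.sites[:2])
print(li_sublattice.active_sites)       # the frozen sites are no longer active
li_sublattice.reset_restricted_sites()  # undo before the runs below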
[8]:
# run 1M iterations
# since this is the first run, the initial occupancy must be supplied
sampler.run(
    1000000,
    initial_occupancies=init_occu,
    thin_by=100,    # thin_by will save only every 100th sample
    progress=True,  # progress will show a progress bar
)
Sampling 1 chain(s) from a cell with 64 sites...: 100%|██████████| 1000000/1000000 [08:03<00:00, 2070.18it/s]
[9]:
# Samples are saved in a sample container
samples = sampler.samples
print(f'Fraction of successful steps (efficiency) {sampler.efficiency()}')
print(f'The last step energy is {samples.get_energies()[-1]} eV')
print(f'The minimum energy in trajectory is {samples.get_minimum_energy()} eV')
# You can get the minimum energy structure and current structure
# by using the ensemble processor
curr_s = ensemble.processor.structure_from_occupancy(samples.get_occupancies()[-1])
min_s = ensemble.processor.structure_from_occupancy(samples.get_minimum_energy_occupancy())
Fraction of successful steps (efficiency) 0.003902
The last step energy is -551.9159632697251 eV
The minimum energy in trajectory is -552.6314360816468 eV
4.1) Continuing or resetting the MC trajectory#
You can always continue running more iterations from where the trajectory left off by calling run again. You can also reset to the initial state. (We will skip this step for now so we can show the results from the run above.)
[10]:
# You can continue the MC trajectory simply by calling run again.
# It is recommended to use the same thin_by used before.
#sampler.run(10000, thin_by=100)  # this will append new data
# If you want to start from scratch instead:
#sampler.clear_samples()  # this will delete the recorded data and reset the sample container
# Now you can start a fresh run
#sampler.run(1000000,
#            initial_occupancies=init_occu,
#            thin_by=100,     # thin_by will save only every 100th sample
#            progress=True)   # progress will show a progress bar
5) Look at trajectory samples and averages#
We can look at the sampled energies and their average and variance directly from the SampleContainer properties. For further analysis, the energy and occupancy string of each saved sample can be retrieved from the container (for example with get_energies and get_occupancies, as used above).
[11]:
# you can discard burn-in samples from the analysis,
# i.e. here we treat the first 1000 samples as burn-in
discard = 1000  # this is in samples, so it corresponds to discard*thin_by MC steps
print(f'A total of {len(samples)} samples taken.')
print(f'A total of {len(samples.get_energies(discard=discard))} samples used for production.')
print(f'The average energy is {samples.mean_energy(discard=discard)} eV')
print(f'The energy variance is {samples.energy_variance(discard=discard)} eV^2')
A total of 10000 samples taken.
A total of 9000 samples used for production.
The average energy is -551.7811770626871 eV
The energy variance is 0.04697867310315079 eV^2
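Since these energies are totals for the 16-prim supercell, it is often convenient to normalize them per primitive cell. A small sketch, assuming get_energies returns a NumPy array (as its use above suggests):
[ ]:
# Normalize the production energies per primitive cell.
energies = samples.get_energies(discard=discard)
energies_per_prim = energies / ensemble.processor.size
print(f'Mean energy per prim: {energies_per_prim.mean()} eV/prim')
print(f'Energy std dev per prim: {energies_per_prim.std()} eV/prim')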
Save your work#
The Sampler class is not (and cannot be) saved, since it does not really hold any computed values. However, the SampleContainer where the MC samples are recorded can be saved. You can use the same save_work convenience function to save your work.
You can also save the SampleContainer as an HDF5 file. You will need h5py installed.
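A minimal sketch of both options with placeholder file names; the to_hdf5 method on SampleContainer is an assumption here, so check the smol documentation for your version.
[ ]:
from smol.io import save_work

# Save the expansion together with the recorded samples as an mson file.
# (file names here are only placeholders)
save_work('data/canonical_mc.mson', expansion, sampler.samples)

# Alternatively, write the samples to an HDF5 file (requires h5py).
# to_hdf5 is an assumed method name; verify it for your smol version.
sampler.samples.to_hdf5('data/canonical_mc_samples.h5')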