Running Canonical Monte Carlo Sampling#
[1]:
import numpy as np
import json
from pymatgen.core.structure import Structure
from smol.io import load_work
0) Load the previous LNO CE with electrostatics#
[2]:
work = load_work('data/basic_ce_ewald.mson')
expansion = work['ClusterExpansion']
1) Create a canonical ensemble#
The Ensemble class can be used to run MC in a fixed-composition (canonical) ensemble. The ensemble class determines the active sublattices by grouping all sites that have the same possible partial occupancies.
To run at a fixed chemical potential instead, see the notebook on semigrand ensemble MC.
[3]:
from smol.moca import Ensemble
# Create the ensemble
# This specifies the size of the MC simulation domain.
sc_matrix = np.array([
    [6, 1, 1],
    [1, 2, 1],
    [1, 1, 2]
])
# this convenience method will take care of creating the appropriate
# processor for the given cluster expansion.
ensemble = Ensemble.from_cluster_expansion(expansion, sc_matrix)
# In a real scenario you may want a much larger processor.size
# An MC step is O(1) with the processor.size, meaning it runs at
# the same speed regardless of the size. However, larger sizes
# will need many more steps to reach equilibrium in an MC simulation.
print(f'The supercell size for the processor is {ensemble.processor.size} prims.')
print(f'The ensemble has a total of {ensemble.num_sites} sites.')
print('The active sublattices are:')
for sublattice in ensemble.sublattices:
    print(sublattice)
The supercell size for the processor is 16 prims.
The ensemble has a total of 64 sites.
The active sublattices are:
Sublattice(site_space=Li+0.5 vacA0+0.5, sites=array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]), active_sites=array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]), encoding=array([0, 1]))
Sublattice(site_space=Ni3+0.5 Ni4+0.5 , sites=array([16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]), active_sites=array([16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]), encoding=array([0, 1]))
Sublattice(site_space=O2-1 , sites=array([32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]), active_sites=array([], dtype=float64), encoding=array([0]))
2) Create an MC sampler#
A Sampler takes care of running MC sampling for a given ensemble. The sampler offers many options for MC sampling, most importantly the choice of MCMC algorithm and the type of MC steps taken, but the defaults are usually sufficient for almost all use cases.
[4]:
from smol.moca import Sampler
# This will take care of setting the defaults
# for the supplied canonical ensemble
sampler = Sampler.from_ensemble(ensemble, temperature=1500)
print(f"Sampling information: {sampler.samples.metadata}")
Sampling information: Metadata(cls_name='SampleContainer', kernels=[Metadata(seed=305970771585639230070722133755157608508, step=Metadata(sublattices=[(Species Li+, Vacancy vacA0+), (Species Ni3+, Species Ni4+), (Species O2-,)], sublattice_probabilities=array([0.5, 0.5]), cls_name='Swap'), cls_name='Metropolis')])
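The metadata above shows the defaults chosen for a canonical ensemble: a Metropolis kernel taking swap steps on the active sublattices. If you want to override these, a minimal sketch follows; the step_type and kernel_type keyword arguments are assumptions here, so check the Sampler.from_ensemble documentation for your smol version.
[ ]:
# Sketch only: explicitly choose the MCMC kernel and step type instead of
# relying on the defaults. The step_type and kernel_type keyword arguments
# are assumed; verify them against Sampler.from_ensemble for your version.
explicit_sampler = Sampler.from_ensemble(
    ensemble,
    kernel_type='Metropolis',  # MCMC algorithm
    step_type='swap',          # composition-conserving steps for canonical MC
    temperature=1500,
)
print(explicit_sampler.samples.metadata)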
3) Create an initial structure and get occupancies#
You will need to create an initial test structure to obtain an initial occupancy to start an MC run. There are many ways to do this; you could simply rescale a training structure and use that, but since the composition is fixed in a canonical ensemble you need to make sure you input the right composition. It can also be helpful to run a simulated annealing step to obtain a good initial structure rather than starting from a low-energy one.
Here we will use the underlying processor to generate a random occupancy at the composition of the disordered structure used in the original cluster expansion.
[5]:
from smol.capp.generate import generate_random_ordered_occupancy
compositions = [sublattice.composition for sublattice in ensemble.sublattices]
init_occu = generate_random_ordered_occupancy(ensemble.processor, composition=compositions)
print(f"The disordered structure has composition: {ensemble.processor.structure.composition}")
print(f"The initial occupancy has composition: {ensemble.processor.structure_from_occupancy(init_occu).composition}")
The disordered structure has composition: Li+8 Ni3+8 Ni4+8 O2-32
The initial occupancy has composition: Li+8 Ni3+8 Ni4+8 O2-32
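As an alternative to a random occupancy, you can run the simulated anneal mentioned above to relax the starting configuration before production sampling. A minimal sketch, assuming the Sampler.anneal(temperatures, mcmc_steps, ...) signature; the temperature schedule and step counts below are arbitrary illustrations:
[ ]:
# Sketch only: anneal from high to low temperature starting from the random
# occupancy, then use the last sampled occupancy as the production start.
# The anneal signature and the schedule below are assumptions; adjust them
# for your system and smol version.
sampler.anneal(
    [5000, 3000, 2000, 1500],  # temperature schedule in K
    10000,                     # MC steps per temperature
    initial_occupancies=init_occu,
    thin_by=100,
    progress=True,
)
annealed_occu = sampler.samples.get_occupancies()[-1]
# If you anneal, you may want to call sampler.clear_samples() before the
# production run, and then start it from annealed_occu.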
[6]:
# The occupancy strings created by the processor
# are by default "encoded" by the indices of the species
# for each given site. You can always see the actual
# species in the occupancy string by decoding it.
print(f'The encoded occupancy is:\n{init_occu}')
print(f'The initial occupancy is:\n {ensemble.processor.decode_occupancy(init_occu)}')
The encoded occupancy is:
[1 1 1 0 1 1 1 0 0 0 0 1 0 0 0 1 0 1 1 0 1 0 0 0 1 0 0 1 0 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
The initial occupancy is:
[Vacancy vacA0+, Vacancy vacA0+, Vacancy vacA0+, Species Li+, Vacancy vacA0+, Vacancy vacA0+, Vacancy vacA0+, Species Li+, Species Li+, Species Li+, Species Li+, Vacancy vacA0+, Species Li+, Species Li+, Species Li+, Vacancy vacA0+, Species Ni3+, Species Ni4+, Species Ni4+, Species Ni3+, Species Ni4+, Species Ni3+, Species Ni3+, Species Ni3+, Species Ni4+, Species Ni3+, Species Ni3+, Species Ni4+, Species Ni3+, Species Ni4+, Species Ni4+, Species Ni4+, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-, Species O2-]
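The encoding can also be interpreted per sublattice. For example, from the Sublattice printout above, the Li+/vacancy sublattice occupies sites 0-15, with 0 encoding Li+ and 1 encoding the vacancy (compare the decoded string), so a quick count of the encoded values on those sites recovers the Li8/vacancy8 composition:
[ ]:
# Count the encoded values on the Li+/vacancy sublattice sites
# (0 encodes Li+ and 1 encodes the vacancy, per the decoded string above).
li_sublattice = ensemble.sublattices[0]
values, counts = np.unique(init_occu[li_sublattice.sites], return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))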
4) Run MC iterations#
MC iterations are by default run by swapping sites from all active sublattices, but fine-grained simulations can also be run by flipping only some of the active sublattices, or even by freezing specific sites in active sublattices (see the sketch below). MC samples are saved in a SampleContainer that is created by, or given to, the Sampler when it is created.
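For example, specific sites can be frozen by restricting them on the corresponding sublattice before running. This is a sketch only; the restrict_sites and reset_restricted_sites methods on Sublattice are assumptions, so check the API for your smol version.
[ ]:
# Sketch only: freeze the first two Li+/vacancy sites so they are never
# proposed in swap steps, then undo the restriction.
# restrict_sites / reset_restricted_sites are assumed method names; verify
# them against the Sublattice documentation for your smol version.
li_sublattice = ensemble.sublattices[0]
li_sublattice.restrict_sites(li_sublattice.sites[:2])
print(li_sublattice.active_sites)       # the frozen sites are no longer active
li_sublattice.reset_restricted_sites()  # undo before the runs below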
[8]:
# run 1M iterations
# since this is the first run, the initial occupancy must be supplied
sampler.run(
    1000000,
    initial_occupancies=init_occu,
    thin_by=100,    # thin_by will save only every 100th sample
    progress=True,  # progress will show a progress bar
)
Sampling 1 chain(s) from a cell with 64 sites...: 100%|██████████| 1000000/1000000 [08:03<00:00, 2070.18it/s]
[9]:
# Samples are saved in a sample container
samples = sampler.samples
print(f'Fraction of successful steps (efficiency) {sampler.efficiency()}')
print(f'The last step energy is {samples.get_energies()[-1]} eV')
print(f'The minimum energy in trajectory is {samples.get_minimum_energy()} eV')
# You can get the minimum energy structure and current structure
# by using the ensemble processor
curr_s = ensemble.processor.structure_from_occupancy(samples.get_occupancies()[-1])
min_s = ensemble.processor.structure_from_occupancy(samples.get_minimum_energy_occupancy())
Fraction of successful steps (efficiency) 0.003902
The last step energy is -551.9159632697251 eV
The minimum energy in trajectory is -552.6314360816468 eV
4.1) Continuing or resetting the MC trajectory#
You can always continue running more iterations from where the trajectory left off by calling run again. You can also reset to the initial state. (We will skip this step for now so we can show the results from the run above.)
[10]:
# You can continue the MC trajectory simply by calling run again.
# It is recommended to use the same thin_by used before.
#sampler.run(10000, thin_by=100)  # this will append new data
# If you want to start from scratch instead:
#sampler.clear_samples()  # this will delete the recorded data and reset the sample container
# Now you can start a fresh run
#sampler.run(1000000,
#            initial_occupancies=init_occu,
#            thin_by=100,     # thin_by will save only every 100th sample
#            progress=True)   # progress will show a progress bar
5) Look at trajectory samples and averages#
We can look at the sampled energies and their average and variance directly from the SampleContainer properties. For further analysis, the energy and occupancy string of each saved sample can be retrieved from the container (for example with get_energies and get_occupancies, as used above).
[11]:
# you can discard burn-in samples from the analysis,
# i.e. here we treat the first 1000 samples as burn-in
discard = 1000  # this is in samples, so it corresponds to discard*thin_by MC steps
print(f'A total of {len(samples)} samples taken.')
print(f'A total of {len(samples.get_energies(discard=discard))} samples used for production.')
print(f'The average energy is {samples.mean_energy(discard=discard)} eV')
print(f'The energy variance is {samples.energy_variance(discard=discard)} eV^2')
A total of 10000 samples taken.
A total of 9000 samples used for production.
The average energy is -551.7811770626871 eV
The energy variance is 0.04697867310315079 eV^2
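Since these energies are totals for the 16-prim supercell, it is often convenient to normalize them per primitive cell. A small sketch, assuming get_energies returns a NumPy array (as its use above suggests):
[ ]:
# Normalize the production energies per primitive cell.
energies = samples.get_energies(discard=discard)
energies_per_prim = energies / ensemble.processor.size
print(f'Mean energy per prim: {energies_per_prim.mean()} eV/prim')
print(f'Energy std dev per prim: {energies_per_prim.std()} eV/prim')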
Save your work#
The Sampler class is not (and cannot be) saved, since it does not really hold any computed values. However, the SampleContainer where the MC samples are recorded can be saved. You can use the same save_work convenience function to save your work.
You can also save the SampleContainer as an HDF5 file. You will need h5py installed.
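A minimal sketch of both options with placeholder file names; the to_hdf5 method on SampleContainer is an assumption here, so check the smol documentation for your version.
[ ]:
from smol.io import save_work

# Save the expansion together with the recorded samples as an mson file.
# (file names here are only placeholders)
save_work('data/canonical_mc.mson', expansion, sampler.samples)

# Alternatively, write the samples to an HDF5 file (requires h5py).
# to_hdf5 is an assumed method name; verify it for your smol version.
sampler.samples.to_hdf5('data/canonical_mc_samples.h5')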