Sample Container#

A SampleContainer holds the data (occupancies, feature vectors, energies) recorded during MC sampling. A SampleContainer is usually created automatically when a Sampler is instantiated. In addition to holding sample information, the SampleContainer provides methods for simple analysis of the raw data, including means, variances, and minima of energies and enthalpies, mean compositions, and minimum-energy occupancies.

class SampleContainer(ensemble, sample_trace, sampling_metadata=None)[source]#

Bases: MSONable

A SampleContainer class stores Monte Carlo simulation samples.

A SampleContainer holds samples and sampling information from an MCMC sampling run. It is useful to obtain the raw data and minimal empirical properties of the underlying distribution in order to carry out further analysis of the MCMC sampling results.
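As a minimal sketch of that kind of follow-up analysis, the helper below gathers a few summary statistics through methods listed on this page (`mean_energy`, `energy_variance`, `get_minimum_energy`, `mean_composition`). The function name `summarize` and the layout of the returned dictionary are hypothetical and not part of the library; `container` is assumed to be a SampleContainer or any object that exposes the same methods.

```python
def summarize(container, discard=0, thin_by=1):
    """Collect summary statistics from a SampleContainer-like object.

    `summarize` is a hypothetical helper, not a library function. It only
    calls methods documented on this page and forwards the shared
    `discard` and `thin_by` arguments to each of them.
    """
    kwargs = {"discard": discard, "thin_by": thin_by}
    return {
        "mean_energy": container.mean_energy(**kwargs),
        "energy_variance": container.energy_variance(**kwargs),
        "minimum_energy": container.get_minimum_energy(**kwargs),
        "mean_composition": container.mean_composition(**kwargs),
    }
```

Passing the same `discard` and `thin_by` to every call keeps the statistics comparable, since they are all computed from the same subset of samples.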

Most methods that retrieve values from the samples share the following arguments:

discard (int): optional

number of initial samples to discard before computing the requested value.

thin_by (int): optional

use only every thin_by-th sample to compute the requested value.

flat (bool): optional

if more than one walker is used, merge the chains of all walkers into a single chain. Defaults to True.
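The effect of these three arguments can be sketched on a toy chain stored as nested lists. This is an illustration under stated assumptions, not the library's implementation: the function `select_samples` is hypothetical, the chain is assumed to be indexed as (sample, walker), and thinning is assumed to start at the first sample after the discarded ones (the exact offset the library uses is not stated on this page).

```python
def select_samples(chain, discard=0, thin_by=1, flat=True):
    """Apply discard, thin_by and flat to a chain of samples.

    `chain` is a list with one entry per saved sample; each entry is a
    list with one value per walker. Hypothetical helper for illustration.
    """
    # Drop the first `discard` samples, then keep every `thin_by`-th one.
    kept = chain[discard::thin_by]
    if flat:
        # Merge the walkers into one sequence, sample by sample.
        return [value for sample in kept for value in sample]
    return kept


# Toy chain: 6 saved samples from 2 walkers.
chain = [[10 * step + walker for walker in range(2)] for step in range(6)]

print(select_samples(chain, discard=2, thin_by=2))
# [20, 21, 40, 41]
print(select_samples(chain, discard=2, thin_by=2, flat=False))
# [[20, 21], [40, 41]]
```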

num_sites#

Size of the system (usually the number of primitive cells in the supercell, but can be any representative measure, e.g. the number of sites).

Type:

int

sublattices#

Sublattices of the ensemble sampled.

Type:

list of Sublattice

natural_parameters#

Array of natural parameters used in the ensemble.

Type:

ndarray

metadata#

Dictionary of metadata from the MC run that generated the samples.

Type:

dict

Initialize a sample container.

Parameters:
  • ensemble (Ensemble) – Ensemble object used to generate the samples.

  • sample_trace (Trace) – a trace object for the traced values during MC sampling

  • sampling_metadata (dict) – optional Sampling metadata (e.g. ensemble name, mckernel type, etc.)

allocate(nsamples)[source]#

Allocate more space in arrays for more samples.

as_dict()[source]#

Get the JSON-serializable dict representation.

Returns:

MSONable dict

clear()[source]#

Clear all samples from container.

composition_variance(discard=0, thin_by=1, flat=True)[source]#

Get the variance in composition of all species.

energy_variance(discard=0, thin_by=1, flat=True)[source]#

Calculate the variance of sampled energies.

property ensemble#

Return the ensemble name.

enthalpy_variance(discard=0, thin_by=1, flat=True)[source]#

Get the variance in enthalpy.

feature_vector_variance(discard=0, thin_by=1, flat=True)[source]#

Get the variance of feature vector elements.

flush_to_backend(backend)[source]#

Flush current samples and trace to backend file.

Parameters:

backend (object) – backend file object; currently only HDF5 is supported.

classmethod from_dict(d, ensemble=None)[source]#

Instantiate a SampleContainer from dict representation.

Parameters:
  • d (dict) – dictionary representation.

  • ensemble (Ensemble) – optional The ensemble object used to generate the samples. Only needed to update legacy files.

Returns:

SampleContainer

classmethod from_hdf5(file_path, swmr_mode=True, ensemble=None)[source]#

Instantiate a SampleContainer from an hdf5 file.

Parameters:
  • file_path (str) – path to file

  • swmr_mode (bool) – optional If true, allows the file to be read from other processes (Single Writer Multiple Readers mode).

  • ensemble (Ensemble) – optional The ensemble object used to generate the samples. Only needed to update legacy files.

Returns:

SampleContainer

get_backend(file_path, alloc_nsamples=0, swmr_mode=False)[source]#

Get a backend file object.

Currently only HDF5 files are supported.

Parameters:
  • file_path (str) – path to backend file.

  • alloc_nsamples (int) – optional Number of new samples to allocate. Datasets are extended only if this number is larger than the space left to write samples into.

  • swmr_mode (bool) – optional If true, allows the file to be read from other processes (Single Writer Multiple Readers mode).

Returns:

h5.File object

get_compositions(discard=0, thin_by=1, flat=True)[source]#

Get the compositions for each occupancy in the chain.

get_energies(discard=0, thin_by=1, flat=True)[source]#

Get the energies from samples in chain.

get_enthalpies(discard=0, thin_by=1, flat=True)[source]#

Get the generalized enthalpy changes from samples in chain.

get_feature_vectors(discard=0, thin_by=1, flat=True)[source]#

Get the feature vector changes from samples in chain.

get_minimum_energy(discard=0, thin_by=1, flat=True)[source]#

Get the minimum energy from samples.

get_minimum_energy_occupancy(discard=0, thin_by=1, flat=True)[source]#

Find the occupancy with minimum energy from samples.

get_minimum_enthalpy(discard=0, thin_by=1, flat=True)[source]#

Get the minimum enthalpy from samples.

get_minimum_enthalpy_occupancy(discard=0, thin_by=1, flat=True)[source]#

Find the occupancy with minimum enthalpy from samples.

get_occupancies(discard=0, thin_by=1, flat=True)[source]#

Get the occupancy chain of the samples.

get_orbit_factors(function_orbit_ids, discard=0, thin_by=1, flat=True)[source]#

Get the orbit factor vectors for samples.

get_sampled_structures(indices, flat=True)[source]#

Get sampled structures for MC steps given by indices.

Parameters:
  • indices (list of int or int) – A single index or list of indices to obtain sampled structures.

  • flat (bool) – optional If true, the chains are flattened and the indices refer to the flattened chain. If false, the chains are not flattened, and if multiple walkers were used a list of lists is returned, where each inner list has the sampled structure for each walker. Defaults to True.

Returns:

list of Structure or list of list of Structure
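When `flat=True`, an index refers to a position in the flattened chain rather than to a (sample, walker) pair. The sketch below shows one way to translate between the two. It rests on an assumption that this page does not confirm: that flattening keeps the walkers of the same sample next to each other (sample-major order). The helper `unflatten_index` is hypothetical.

```python
def unflatten_index(flat_index, num_walkers):
    """Map an index into a flattened chain to (sample_index, walker_index).

    Hypothetical helper. Assumes sample-major flattening, i.e. the
    flattened chain lists all walkers of sample 0, then all walkers of
    sample 1, and so on.
    """
    if num_walkers < 1:
        raise ValueError("num_walkers must be at least 1")
    return divmod(flat_index, num_walkers)


# With 3 walkers, flat indices 0-2 belong to sample 0, 3-5 to sample 1, ...
sample_index, walker_index = unflatten_index(7, num_walkers=3)
print(sample_index, walker_index)
# 2 1
```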

get_species_counts(discard=0, thin_by=1, flat=True)[source]#

Get the species counts for each occupancy in the chain.

get_sublattice_compositions(sublattice, discard=0, thin_by=1, flat=True)[source]#

Get the compositions of a specific sublattice.

get_sublattice_species_counts(sublattice, discard=0, thin_by=1, flat=True)[source]#

Get the counts of each species in a sublattice.

Returns:

array where the last axis is the count for each species, in the same order as the underlying site space.

Return type:

ndarray

get_trace_value(name, discard=0, thin_by=1, flat=True)[source]#

Get sampled values of a traced value given by name.

mean_composition(discard=0, thin_by=1, flat=True)[source]#

Get mean composition for all species regardless of sublattice.

mean_energy(discard=0, thin_by=1, flat=True)[source]#

Calculate the mean energy from samples.

mean_enthalpy(discard=0, thin_by=1, flat=True)[source]#

Get the mean generalized enthalpy.

mean_feature_vector(discard=0, thin_by=1, flat=True)[source]#

Get the mean feature vector from samples.

mean_sublattice_composition(sublattice, discard=0, thin_by=1, flat=True)[source]#

Get the mean composition of a specific sublattice.

mean_trace_value(name, discard=0, thin_by=1, flat=True)[source]#

Get mean of a traced value given by name.

property natural_parameters#

Return the natural parameters.

property num_samples#

Get the total number of samples.

sampling_efficiency(discard=0, flat=True)[source]#

Return the sampling efficiency for chains.

If the sampling is thinned by more than 1, this value is only an estimate of the true sampling efficiency, because the efficiency of the samples that were thinned out is not known.
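The caveat about thinning can be illustrated with a toy calculation. The sketch assumes that sampling efficiency means the fraction of accepted proposals and that each recorded step carries a 0/1 acceptance flag; both are assumptions made for illustration, and `acceptance_fraction` is a hypothetical helper, not the library's method.

```python
def acceptance_fraction(accepted_flags, discard=0):
    """Fraction of accepted proposals after dropping the first steps.

    Hypothetical helper; `accepted_flags` holds 1 for an accepted step
    and 0 for a rejected one.
    """
    kept = accepted_flags[discard:]
    if not kept:
        raise ValueError("no samples left after discarding")
    return sum(kept) / len(kept)


# Acceptance flags for 8 consecutive MC steps.
flags = [1, 1, 0, 1, 0, 1, 0, 0]

print(acceptance_fraction(flags))       # all steps recorded
# 0.5
print(acceptance_fraction(flags[::2]))  # only every 2nd step recorded
# 0.25
```

With only every second step recorded, the flags of the skipped steps are unavailable, so the computed value differs from the one obtained from the full record.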

save_sampled_trace(trace, thinned_by)[source]#

Save a sampled trace.

Parameters:
  • trace (Trace) – Trace of sampled values

  • thinned_by (int) – the amount that the sampling was thinned by. Used to update the total mc iterations.

property shape#

Get the shape of the samples in chain.

sublattice_composition_variance(sublattice, discard=0, thin_by=1, flat=True)[source]#

Get the variance in composition of a specific sublattice.

property sublattices#

Return the sublattices.

to_hdf5(file_path)[source]#

Save SampleContainer as an HDF5 file.

Parameters:

file_path (str) – path to file save location. If the file exists and the dimensions match, the samples will be appended.

property total_mc_steps#

Return the total number of MC steps taken during sampling.

trace_value_variance(name, discard=0, thin_by=1, flat=True)[source]#

Get variance of a traced value given by name.

property traced_values#

Get the names of traced values being sampled.

vacuum()[source]#

Remove any trailing allocated space that has not been used.
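The relationship between `allocate`, `clear`, and `vacuum` can be pictured with a toy buffer that tracks how many of its preallocated slots are filled. This is a simplified model written for illustration only: the class `ToyBuffer`, its attributes, and its `save` method are hypothetical and do not reflect how the library stores samples internally.

```python
class ToyBuffer:
    """Preallocated sample storage with a fill counter (illustration only)."""

    def __init__(self):
        self.slots = []       # allocated space, filled or not
        self.num_samples = 0  # number of slots actually filled

    def allocate(self, nsamples):
        """Append `nsamples` empty slots for future samples."""
        self.slots.extend([None] * nsamples)

    def save(self, value):
        """Write one sample into the next free slot."""
        if self.num_samples == len(self.slots):
            raise RuntimeError("no free slots; call allocate() first")
        self.slots[self.num_samples] = value
        self.num_samples += 1

    def vacuum(self):
        """Drop trailing slots that were allocated but never filled."""
        del self.slots[self.num_samples:]

    def clear(self):
        """Forget all samples but keep the allocated space."""
        self.slots = [None] * len(self.slots)
        self.num_samples = 0


buffer = ToyBuffer()
buffer.allocate(5)
buffer.save(-1.2)
buffer.save(-1.4)
print(len(buffer.slots))  # space for 5 samples, 2 of them filled
# 5
buffer.vacuum()
print(buffer.slots)
# [-1.2, -1.4]
```

Allocating in bulk avoids growing the storage on every saved sample, and trimming afterwards releases whatever space the run did not use.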