Sample Container
A SampleContainer holds the data (occupancies, feature vectors, energies) collected during Monte Carlo (MC) sampling. A SampleContainer is generally created automatically when a Sampler is instantiated. In addition to holding sample information, the SampleContainer also provides methods for simple analysis of the raw data, including means, variances, and minima of energies and enthalpies, mean compositions, minimum-energy occupancies, etc.
- class SampleContainer(ensemble, sample_trace, sampling_metadata=None)
Bases: MSONable
A SampleContainer class stores Monte Carlo simulation samples.
A SampleContainer holds samples and sampling information from an MCMC sampling run. It gives access to the raw data and to simple empirical properties of the sampled distribution, which can be used for further analysis of the MCMC sampling results.
Most methods that return sampled values accept the same optional keyword arguments:
- discard (int): optional
number of initial samples (e.g. burn-in) to discard before computing the requested value. Defaults to 0.
- thin_by (int): optional
keep only every thin_by-th sample when computing the requested value. Defaults to 1.
- flat (bool): optional
if more than one walker is used, flatten all chains into a single chain. Defaults to True.
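To make the three arguments concrete, here is a minimal sketch of how discard, thin_by and flat could act on a stored chain. It assumes samples are indexed as `samples[step][walker]` (mirroring an array of shape `(num_samples, num_walkers)`), uses a plain `discard::thin_by` slice, and flattens in row-major order. `select_samples` is a hypothetical helper, not part of the SampleContainer API, and the library's exact slicing offset may differ.

```python
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")


def select_samples(
    samples: Sequence[Sequence[T]],
    discard: int = 0,
    thin_by: int = 1,
    flat: bool = True,
) -> Union[List[T], List[List[T]]]:
    """Hypothetical helper mimicking the discard/thin_by/flat arguments.

    samples[step][walker] mirrors an array of shape (num_samples, num_walkers).
    """
    if discard < 0 or thin_by < 1:
        raise ValueError("discard must be >= 0 and thin_by >= 1")
    # Drop the first `discard` samples (burn-in), then keep every thin_by-th one.
    kept = [list(step) for step in samples[discard::thin_by]]
    if flat:
        # Merge all walkers into one chain, step-major, like a row-major
        # reshape of (num_samples, num_walkers) -> (num_samples * num_walkers,).
        return [value for step in kept for value in step]
    return kept
```

For a chain of 6 steps and 2 walkers, `select_samples(samples, discard=2, thin_by=2)` keeps steps 2 and 4 from both walkers.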
- num_sites
Size of the system, usually the number of primitive cells in the supercell, but it can be any representative measure of size (e.g. the number of sites).
- Type:
int
- sublattices
Sublattices of the ensemble sampled.
- Type:
list of Sublattice
- natural_parameters
Array of natural parameters used in the ensemble.
- Type:
ndarray
- metadata
Dictionary of metadata from the MC run that generated the samples.
- Type:
dict
Initialize a sample container.
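- Parameters:
ensemble (Ensemble) – the ensemble object used to generate the samples.
sample_trace (Trace) – trace of the values sampled during MC sampling.
sampling_metadata (dict) – optional dictionary of metadata from the MC run that generated the samples.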
- composition_variance(discard=0, thin_by=1, flat=True)
Get the variance in composition of all species.
- energy_variance(discard=0, thin_by=1, flat=True)
Calculate the variance of sampled energies.
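The variance methods differ mainly in what they pool. A minimal sketch of a variance over sampled scalars (such as energies), assuming the `values[step][walker]` layout, the population variance (ddof=0, which is also the `numpy.var` default), pooling of all walkers when flat=True, and one variance per walker when flat=False. `sampled_variance` is hypothetical and is not how SampleContainer is implemented.

```python
from statistics import pvariance
from typing import List, Sequence, Union


def sampled_variance(
    values: Sequence[Sequence[float]],
    discard: int = 0,
    thin_by: int = 1,
    flat: bool = True,
) -> Union[float, List[float]]:
    """Hypothetical sketch of a variance over sampled scalar values.

    values[step][walker] mirrors an array of shape (num_samples, num_walkers).
    """
    kept = values[discard::thin_by]
    if not kept:
        raise ValueError("no samples left after discard/thin_by")
    if flat:
        # Pool every walker into one chain before taking the variance.
        return pvariance([v for step in kept for v in step])
    # Otherwise report one variance per walker.
    num_walkers = len(kept[0])
    return [pvariance([step[w] for step in kept]) for w in range(num_walkers)]
```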
- property ensemble
Return the ensemble name.
- feature_vector_variance(discard=0, thin_by=1, flat=True)
Get the variance of feature vector elements.
- flush_to_backend(backend)
Flush current samples and trace to backend file.
- Parameters:
backend (object) – backend file object; currently only HDF5 is supported.
- classmethod from_dict(d, ensemble=None)
Instantiate a SampleContainer from dict representation.
- Parameters:
d (dict) – dictionary representation.
ensemble (Ensemble) – optional. The ensemble object used to generate the samples. Only needed to update legacy files.
- Returns:
SampleContainer
- classmethod from_hdf5(file_path, swmr_mode=True, ensemble=None)
Instantiate a SampleContainer from an hdf5 file.
- Parameters:
file_path (str) – path to file
swmr_mode (bool) – optional. If True, allows the file to be read by other processes (Single Writer Multiple Readers mode).
ensemble (Ensemble) – optional. The ensemble object used to generate the samples. Only needed to update legacy files.
- Returns:
SampleContainer
- get_backend(file_path, alloc_nsamples=0, swmr_mode=False)
Get a backend file object.
Currently only HDF5 files are supported.
- Parameters:
file_path (str) – path to backend file.
alloc_nsamples (int) – optional. Number of new samples to allocate. Datasets are only extended if this number is larger than the space left to write samples into.
swmr_mode (bool) – optional. If True, allows the file to be read by other processes (Single Writer Multiple Readers mode).
- Returns:
h5.File object
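The alloc_nsamples rule above can be read as a simple sizing decision. The sketch below assumes each dataset has a fixed capacity and a count of samples already written, and that a dataset which must grow is extended just enough to fit the request; `new_dataset_length`, `capacity` and `written` are hypothetical names, and the code does not touch HDF5.

```python
def new_dataset_length(capacity: int, written: int, alloc_nsamples: int) -> int:
    """Hypothetical: dataset length after requesting alloc_nsamples new slots.

    A dataset is only extended when the request exceeds the free space
    (capacity - written); otherwise its current capacity is kept.
    """
    if not 0 <= written <= capacity:
        raise ValueError("written must be between 0 and capacity")
    if alloc_nsamples < 0:
        raise ValueError("alloc_nsamples must be non-negative")
    free = capacity - written
    if alloc_nsamples > free:
        # Assumed growth policy: just enough room after the written samples.
        return written + alloc_nsamples
    return capacity
```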
- get_compositions(discard=0, thin_by=1, flat=True)
Get the compositions for each occupancy in the chain.
- get_enthalpies(discard=0, thin_by=1, flat=True)
Get the generalized enthalpy changes from the samples in the chain.
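In ensembles described by natural parameters and feature vectors, a generalized enthalpy is commonly the dot product of the two. The sketch below assumes that relation, H = Σᵢ θᵢ φᵢ, and computes it for each sampled feature vector in plain Python. Check the ensemble's own definition before relying on it; `generalized_enthalpies` is a hypothetical helper.

```python
from typing import List, Sequence


def generalized_enthalpies(
    natural_parameters: Sequence[float],
    feature_vectors: Sequence[Sequence[float]],
) -> List[float]:
    """Assumed relation: H = sum_i theta_i * phi_i for each sampled feature vector."""
    enthalpies = []
    for features in feature_vectors:
        if len(features) != len(natural_parameters):
            raise ValueError("feature vector and natural parameters differ in length")
        # Dot product of natural parameters (theta) and features (phi).
        enthalpies.append(sum(t * f for t, f in zip(natural_parameters, features)))
    return enthalpies
```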
- get_feature_vectors(discard=0, thin_by=1, flat=True)
Get the feature vector changes from the samples in the chain.
- get_minimum_energy_occupancy(discard=0, thin_by=1, flat=True)
Find the sampled occupancy with the minimum energy.
- get_minimum_enthalpy_occupancy(discard=0, thin_by=1, flat=True)
Find the sampled occupancy with the minimum enthalpy.
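Both minimum-value lookups reduce to an argmin over the selected samples. A minimal sketch, assuming the energies (or enthalpies) and occupancies have already been flattened and filtered with discard/thin_by, and that ties resolve to the earliest sample; `minimum_value_occupancy` is hypothetical.

```python
from typing import List, Sequence, Tuple


def minimum_value_occupancy(
    values: Sequence[float],
    occupancies: Sequence[Sequence[int]],
) -> Tuple[float, List[int]]:
    """Hypothetical sketch: lowest sampled energy (or enthalpy) and its occupancy.

    values[i] belongs to occupancies[i]; both are already flattened over walkers.
    """
    if len(values) != len(occupancies) or not values:
        raise ValueError("need one value per occupancy and at least one sample")
    # Index of the first sample with the smallest value (ties keep the earliest).
    best = min(range(len(values)), key=values.__getitem__)
    return values[best], list(occupancies[best])
```

Passing energies gives the minimum-energy occupancy; passing enthalpies gives the minimum-enthalpy one.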
- get_orbit_factors(function_orbit_ids, discard=0, thin_by=1, flat=True)
Get the orbit factor vectors for the samples.
- get_sampled_structures(indices, flat=True)
Get sampled structures for MC steps given by indices.
- Parameters:
indices (list of int or int) – A single index or list of indices to obtain sampled structures.
flat (bool) – optional. If True, chains are flattened and the indices refer to the flattened values. If False, chains are not flattened and, if multiple walkers were used, a list of lists is returned where each inner list holds the sampled structures for each walker. Defaults to True.
- Returns:
list of Structure or list of list of Structure
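With flat=True an index refers to a position in the flattened chain. Assuming flattening is a row-major reshape of an array of shape `(num_samples, num_walkers)` (an assumption; the exact layout is not stated here), a flat index maps back to a (sample, walker) pair with `divmod`. `unflatten_index` is a hypothetical helper.

```python
from typing import Tuple


def unflatten_index(flat_index: int, num_samples: int, num_walkers: int) -> Tuple[int, int]:
    """Hypothetical: map a flattened-chain index to (sample, walker).

    Assumes samples are stored as (num_samples, num_walkers) and flattened in
    row-major order, so the walker index varies fastest.
    """
    total = num_samples * num_walkers
    if flat_index < 0:
        flat_index += total  # allow negative indices, like Python sequences
    if not 0 <= flat_index < total:
        raise IndexError("flat index out of range")
    sample, walker = divmod(flat_index, num_walkers)
    return sample, walker
```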
- get_species_counts(discard=0, thin_by=1, flat=True)
Get the species counts for each occupancy in the chain.
- get_sublattice_compositions(sublattice, discard=0, thin_by=1, flat=True)
Get the compositions of a specific sublattice.
- get_sublattice_species_counts(sublattice, discard=0, thin_by=1, flat=True)
Get the counts of each species in a sublattice.
- Returns:
array of counts whose last axis holds the count of each species, in the same order as the underlying site space.
- Return type:
ndarray
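Species counts and compositions of a sublattice can be illustrated with an occupancy encoded as integer species codes. This is a minimal sketch under two assumptions: each site's code is an index into the sublattice's site space, and a composition is the count divided by the number of sublattice sites. `sublattice_sites` and the encoding are hypothetical stand-ins for a Sublattice object.

```python
from typing import List, Sequence


def sublattice_species_counts(
    occupancy: Sequence[int],
    sublattice_sites: Sequence[int],
    num_species: int,
) -> List[int]:
    """Count each species on the given sites, ordered like the site space."""
    counts = [0] * num_species
    for site in sublattice_sites:
        counts[occupancy[site]] += 1
    return counts


def sublattice_composition(
    occupancy: Sequence[int],
    sublattice_sites: Sequence[int],
    num_species: int,
) -> List[float]:
    """Fraction of the sublattice's sites occupied by each species."""
    counts = sublattice_species_counts(occupancy, sublattice_sites, num_species)
    return [c / len(sublattice_sites) for c in counts]
```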
- get_trace_value(name, discard=0, thin_by=1, flat=True)
Get the sampled values of the traced quantity with the given name.
- mean_composition(discard=0, thin_by=1, flat=True)
Get mean composition for all species regardless of sublattice.
- mean_feature_vector(discard=0, thin_by=1, flat=True)
Get the mean feature vector from samples.
- mean_sublattice_composition(sublattice, discard=0, thin_by=1, flat=True)
Get the mean composition of a specific sublattice.
- mean_trace_value(name, discard=0, thin_by=1, flat=True)
Get the mean of the traced quantity with the given name.
- property natural_parameters
Return the natural parameters.
- property num_samples
Get the total number of samples.
- sampling_efficiency(discard=0, flat=True)
Return the sampling efficiency of the chains.
If sampling was thinned by more than 1, this value is only an estimate of the true sampling efficiency, because the efficiency of the samples that were thinned out is not known.
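Sampling efficiency is usually read as the fraction of attempted MC steps that were accepted. The sketch below assumes one boolean acceptance flag per stored sample and walker (a hypothetical layout, `accepted[step][walker]`). When only every thinned_by-th step is stored, the same ratio over the stored flags is just an estimate, which is the caveat noted above. `sampling_efficiency` here is a hypothetical stand-in, not the method's implementation.

```python
from typing import List, Sequence, Union


def sampling_efficiency(
    accepted: Sequence[Sequence[bool]],
    discard: int = 0,
    flat: bool = True,
) -> Union[float, List[float]]:
    """Hypothetical sketch: fraction of accepted steps after discarding burn-in.

    accepted[step][walker] is True when that step's proposed move was accepted.
    """
    kept = accepted[discard:]
    if not kept:
        raise ValueError("no samples left after discarding")
    num_walkers = len(kept[0])
    # Number of accepted steps for each walker (True counts as 1).
    per_walker = [sum(step[w] for step in kept) for w in range(num_walkers)]
    if flat:
        return sum(per_walker) / (len(kept) * num_walkers)
    return [n / len(kept) for n in per_walker]
```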
- save_sampled_trace(trace, thinned_by)
Save a sampled trace.
- Parameters:
trace (Trace) – Trace of sampled values
thinned_by (int) – the factor by which sampling was thinned. Used to update the total number of MC iterations.
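The thinned_by argument links stored samples to MC iterations: if each saved sample stands for thinned_by iterations, the step count grows by thinned_by per save. A minimal bookkeeping sketch under that assumption. `TraceStore` is a hypothetical class that uses plain dicts in place of Trace objects. It is not smol's implementation.

```python
from typing import Dict, List


class TraceStore:
    """Hypothetical container showing how thinned_by could update the step count."""

    def __init__(self) -> None:
        self.samples: List[Dict[str, float]] = []
        self.total_mc_steps = 0

    def save_sampled_trace(self, trace: Dict[str, float], thinned_by: int) -> None:
        if thinned_by < 1:
            raise ValueError("thinned_by must be >= 1")
        self.samples.append(dict(trace))
        # One stored sample summarizes thinned_by MC iterations.
        self.total_mc_steps += thinned_by

    @property
    def num_samples(self) -> int:
        return len(self.samples)
```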
- property shape
Get the shape of the samples in the chain.
- sublattice_composition_variance(sublattice, discard=0, thin_by=1, flat=True)
Get the variance in composition of a specific sublattice.
- property sublattices
Return the sublattices.
- to_hdf5(file_path)
Save SampleContainer as an HDF5 file.
- Parameters:
file_path (str) – path to the save location. If the file exists and the dimensions match, the samples will be appended.
- property total_mc_steps
Return the total number of MC steps taken during sampling.
- trace_value_variance(name, discard=0, thin_by=1, flat=True)
Get the variance of the traced quantity with the given name.
- property traced_values
Get the names of traced values being sampled.