API References

Contents in module s4.thermo.calc

Module for computing thermodynamics using calculated databases (Materials Project).

The general application of this module is to compute thermodynamics quantities for calculated material entries. The following is the step to do this:

Obtain the system of interest using query_system_mp() or
query_system_scan().
Obtain the zero-temperature formation enthalpy using
compute_corrected_dhf_mp() or compute_corrected_dhf_scan().
Interpolate the finite temperature Gibbs energy of formation using
compute_corrected_dgf_mp() or compute_corrected_dgf_scan().

s4.thermo.calc.compute_corrected_dgf_mp(mp_entry: pymatgen.entries.computed_entries.ComputedEntry, temperature: float = 300) → float

Compute corrected \(\Delta G_f(T)\) in ev/atom for a material in the Materials Project.

We apply three corrections as follows:

Elemental contributions as used in the Materials Project.
For carbonates, we apply a correction for CO32- anion (fitted by Haoyan).
Chris Bartels’ finite \(dG_f(T)\) correction.

Parameters

mp_entry – A computed entry retrieved from the Materials Project.
temperature – Temperature for which \(dG_f(T)\) will be computed.

Returns

The corrected gibbs energy of formation at the temperature specified.

s4.thermo.calc.compute_corrected_dgf_scan(mp_entry: pymatgen.entries.computed_entries.ComputedEntry, temperature: float = 300) → float

Compute corrected \(\Delta G_f(T)\) in ev/atom for a material in the SCAN database.

We apply only one corrections as follows:

Chris Bartels’ finite \(dG_f(T)\) correction.

Parameters

mp_entry – A computed entry retrieved from the SCAN database.
temperature – Temperature for which \(dG_f(T)\) will be computed.

Returns

The corrected gibbs energy of formation at the temperature specified.

s4.thermo.calc.compute_corrected_dhf_mp(mp_entry: pymatgen.entries.computed_entries.ComputedEntry) → float

Compute corrected \(\Delta H_f(0K)\) in ev/atom for a material in the Materials Project.

We apply two corrections as follows:

Elemental contributions as used in the Materials Project.
For carbonates, we apply a correction for CO32- anion (fitted by Haoyan).

Parameters: mp_entry – A computed entry retrieved from the Materials Project.
Returns: The corrected formation enthalpy at zero Kelvin.

s4.thermo.calc.compute_corrected_dhf_scan(mp_entry: pymatgen.entries.computed_entries.ComputedEntry) → float

Compute corrected \(\Delta H_f(0K)\) in ev/atom for a material in the SCAN database.

Parameters: mp_entry – A computed entry retrieved from the SCAN database.
Returns: The corrected formation enthalpy at zero Kelvin.

s4.thermo.calc.finite_dg_correction(mp_entry: pymatgen.entries.computed_entries.ComputedEntry, temperature: float, dhf: float) → float

Compute finite-temperature \(dG(T)\) correction using Chris Bartel’s method, see [Chris2018].

Parameters

mp_entry – The entry to a Materials Project entry, which must contain the volume of the structure.
temperature – Finite temperature for which \(dG(T)\) is approximated.
dhf – Zero-temperature formation enthalpy.

Returns

Interpolated gibbs energy of formation at finite temperature.

Chris2018: Bartel, Christopher J., et al. “Physical descriptor for the Gibbs energy of inorganic crystalline solids and temperature-dependent materials chemistry.” Nature communications 9.1 (2018): 1-10.

s4.thermo.calc.query_system_mp(system: Union[List[str], str]) → List[pymatgen.entries.computed_entries.ComputedEntry]

Query the Materials Project to get a list of compositions within a system or a specific composition. There are multiple ways of specifying a system:

List of chemical elements, e.g., Ba-Ti-O.
Exact composition, e.g., BaCO3.

Parameters: system – Chemical system identifier.
Returns: List of matched entries in the Materials Project.

s4.thermo.calc.query_system_scan(system: Union[List[str], str]) → List[pymatgen.entries.computed_entries.ComputedEntry]

Query Ryan’s SCAN database to get a list of compositions within a system or a specific composition. There are multiple ways of specifying a system:

List of chemical elements, e.g., Ba-Ti-O.
Exact composition, e.g., BaCO3.

Parameters: system – Chemical system identifier.
Returns: List of matched entries in the SCAN database.

Contents in module s4.thermo.exp

Module for computing thermodynamics using the FREED database [FREED].

FREED: https://www.thermart.net/freed-thermodynamic-database/

class s4.thermo.exp.EnthalpyEq(A: float, B: float, C: float, D: float, E: float, F: float, T_transition: float, H_transition: Optional[float], comment: Optional[str])

Bases: object

Represents an enthalpy equation in the form of:

\(H(T) - H_0(298K) = AT + BT^2 + \frac{C}{T} + DT^{\frac{1}{2}} + ET^3 + F\)

A: float: Coefficient \(A\).

B: float: Coefficient \(B\).

C: float: Coefficient \(C\).

D: float: Coefficient \(D\).

E: float: Coefficient \(E\).

F: float: Coefficient \(F\).

H_transition: Optional[float]: Transition enthalpy value.

T_transition: float: Transition temperature (upper bound of this equation).

comment: Optional[str]: Comments on this enthalpy equation.

class s4.thermo.exp.ExpThermoDatabase(freed_db_fn='./FREED 11.0.xlsm')

Bases: object

A database containing all entries in FREED database.

dgf(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, unit='ev', allow_extrapolate=False) → float: Get the formation gibbs free energy of a compound at desired temperature.

dhf(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, unit='ev', allow_extrapolate=False) → float: Get the formation enthalpy of a compound at desired temperature.

h(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, unit='ev', allow_extrapolate=False) → float

Get the enthalpy of a compound at desired temperature.

Parameters

composition – Composition of the compound.
temperature – Desired temperature.
unit – Return unit.
allow_extrapolate – Whether an extrapolated value is accepted.

Returns

class s4.thermo.exp.FREEDEntry(formula: str, desc: str, name: str, mineral_name: str, molar_mass: float, density: float, h0: float, ent0: float, tmax: float, dh_eqs: List[s4.thermo.exp.freed.EnthalpyEq], dgf_eqs: List[s4.thermo.exp.freed.GibbsFormationEq])

Bases: object

A verbatim entry in FREED database.

density: float: Density of the material.

desc: str: Description of the material.

dgf(temperature: float, unit='ev', allow_extrapolate=False) → float

Compute gibbs energy of formation at a temperature. If you specify allow_extrapolate=True and the temperature falls outside of the range defined by FREED database, you will get an extrapolated value.

Parameters

temperature – Temperature for the energy calculation.
unit – Can be one of {‘ev’, ‘ev/atom’, ‘kcal’, ‘kj’, ‘cal’}
allow_extrapolate – Whether extrapolation is allowed.

Returns

Gibbs energy of formation at the specified temperature

dgf_eqs: List[s4.thermo.exp.freed.GibbsFormationEq]: List of Gibbs energy equations.

dh_eqs: List[s4.thermo.exp.freed.EnthalpyEq]: List of enthalpy equations.

ent0: float: Entropy of formation at 298K (unit in cal/kelvin).

formula: str: Formula of the compound.

static from_row(row: pandas.core.series.Series): Construct a <FREEDEntry> from a pandas row.

h(temperature: float, unit='ev', allow_extrapolate=False) → float

Compute enthalpy of formation at a temperature. If you specify allow_extrapolate=True and the temperature falls outside of the range defined by FREED database, you will get an extrapolated value.

Parameters

temperature – Temperature for the enthalpy calculation.
unit – Can be one of {‘ev’, ‘ev/atom’, ‘kcal’, ‘kj’, ‘cal’}
allow_extrapolate – Whether extrapolation is allowed.

Returns

Enthalpy of formation at the specified temperature.

h0: float: Standard enthalpy of formation at 298K (unit in cal).

mineral_name: str: Mineral name of the material.

molar_mass: float: Molar mass of the material.

name: str: Name of the material.

plot_dgf(temp_range=None, unit='ev/atom', ax=None, show=True) → None

Plot Gibbs energy of formation at various temperatures

Parameters

temp_range – The range of temperatures. If None, defaults to 298K-TMax.
unit – Unit of the plot
ax – If not None, plot to an existing axis.
show – If True, display this plot after function call.

plot_h(temp_range=None, unit='ev/atom', ax=None, show=True) → None

Plot enthalpy at various temperatures

Parameters

temp_range – The range of temperatures. If None, defaults to 298K-TMax.
unit – Unit of the plot
ax – If not None, plot to an existing axis.
show – If True, display this plot after function call.

tmax: float: Maximal characterized temperature.

class s4.thermo.exp.GibbsFormationEq(A: float, B: float, C: float, D: float, E: float, F: float, G: float, T_transition: float)

Bases: object

Represents an gibbs formation energy equation in the form of:

\(\Delta G_f(T) = AT\ln(T) + BT + CT^2 + \frac{D}{T} + ET^{\frac{1}{2}} + FT^3 + G\)

A: float: Coefficient \(A\).

B: float: Coefficient \(B\).

C: float: Coefficient \(C\).

D: float: Coefficient \(D\).

E: float: Coefficient \(E\).

F: float: Coefficient \(F\).

G: float: Coefficient \(G\).

T_transition: float: Transition temperature (upper bound of this equation).

Contents in module s4.tmr

class s4.tmr.ExpDeterminedMaterial(composition: pymatgen.core.composition.Composition)

Bases: object

Material with experimentally measured (from FREED, see s4.thermo.exp.ExpThermoDatabase) thermodynamic quantities.

composition: pymatgen.core.composition.Composition: The composition of the material.

dgf(temperature: float) → float

Returns the Gibbs energy of formation of this material in ev/atom.

Parameters: temperature – Temperature for which Gibbs energy of formation is calculated..
Returns: Gibbs energy of formation at the specified temperature.

dhf(temperature: float) → float

Returns the formation enthalpy of this material in ev/atom.

Parameters: temperature – Temperature for which formation enthalpy is calculated.
Returns: Formation enthalpy at the specified temperature.

static from_material_dict(material, allow_many=False)

Generate from material data. The parameter material should contain a field named “thermo”. The following is an example:

material = {
    "thermo": [
    {
        "interpolation": "BaCO3",
        "formula": "BaCO3",
    }
]}

class s4.tmr.GasMaterial(composition: pymatgen.core.composition.Composition, fugacity: float)

Bases: object

Gaseous material, such as O2, CO2, etc.

composition: pymatgen.core.composition.Composition: Composition of the gas.

dgf(temperature: float) → float

Returns the Gibbs energy of formation of this material in ev/atom.

Parameters: temperature – Temperature for which Gibbs energy of formation is calculated..
Returns: Gibbs energy of formation at the specified temperature.

dhf(temperature: float) → float

Returns the formation enthalpy of this material in ev/atom.

Parameters: temperature – Temperature for which formation enthalpy is calculated.
Returns: Formation enthalpy at the specified temperature.

fugacity: float: Effective partial pressure of the gas.

mu(temperature: float) → float

Returns the chemical potential of this gas in ev/atom

Parameters: temperature – Temperature for which chemical potential is calculated..
Returns: Chemical potential at the specified temperature.

exception s4.tmr.InvalidMPIdError

Bases: KeyError

The material cannot be associated with a Materials Project ID.

class s4.tmr.MPInterpolatedMaterial(compositions: List[pymatgen.core.composition.Composition], amounts: List[float], mp_entries: List[pymatgen.entries.computed_entries.ComputedEntry])

Bases: object

Material with thermo quantities interpolated using MP.

amounts: List[float]

property composition: Total composition.

compositions: List[pymatgen.core.composition.Composition]

dgf(temperature: float) → float

Returns the dGf of this interpolated material in ev/atom.

Parameters: temperature – Temperature of dGf.
Returns

dhf(temperature) → float

Returns the dHf of this interpolated material in ev/atom.

Returns

static from_material_dict(material, allow_many=False): Generate an instance from materials data.

mp_entries: List[pymatgen.entries.computed_entries.ComputedEntry]

property polytypes: List of polytypes in the structure as predicted by geometry analyzer.

class s4.tmr.MPUniverseInterpolation(mp_space=None, mp_data_by_temp=None)

Bases: object

Interpolate a unknown material to the Materials Project (MP) using all compounds in the MP.

This code is adapted from the original version by Christopher J. Bartel.

The interpolation is done by optimizing the geometry energy, calculated by \(-\exp(-D)\) where \(D=sqrt(\sum fractional\_comp\_diff^2)\), under the constraint that all atoms must conserve during optimization.

static competing_phases(composition: pymatgen.core.composition.Composition, target_space: Set[str], neighbors, neighbor_energies) → Tuple[Dict, List[pymatgen.core.composition.Composition]]

Find all competing phases for a target compound given list of neighboring compounds.

Parameters

composition – Composition to search for.
target_space – Set of chemical elements in the target composition.
neighbors – Neighboring compounds.
neighbor_energies – Energies of the neighboring compounds.

Returns

Energies of the neighboring compounds, and the energy hull.

static geometry_energy(composition: pymatgen.core.composition.Composition, target_space, neighbors): Compute geometry energy given a target compound and a list of neighboring compounds.

interpolate(composition: Union[str, pymatgen.core.composition.Composition]) → Dict[pymatgen.core.composition.Composition, Dict[str, float]]

Interpolate a composition using compounds in the Materials Project.

The returned data looks like the following:

result = {
    Composition("BaO"): {'amt': 1.0, 'E': 0.0}
}

Parameters: composition – The composition to interpolate.
Returns: Dictionary that contains the decomposed compositions and their information.

property mp_data_by_temp: MP thermodynamic data by temperature.

property mp_space: Dictionary describing the MP universe.

neighbors(composition: pymatgen.core.composition.Composition, target_space): Compute the neighboring compounds for a given compound.

static optimize_energy(composition: pymatgen.core.composition.Composition, target_space: Set[str], hull: Dict[pymatgen.core.composition.Composition, Dict[str, float]], competing_phases: List[pymatgen.core.composition.Composition]) → scipy.optimize.optimize.OptimizeResult

Optimize geometry energy and find the combination of competing phases that generate the lowest geometry energy.

Parameters

composition – Composition of the target material.
target_space – List of chemical elements in the target material.
hull – Dictionary whose keys are phases and values are {‘E’: energy}.
competing_phases – List of compositions as competing phases.

Returns

Optimization result that contains the solution.

class s4.tmr.MaterialWithEnergy(thermo: Union[s4.tmr.thermo.MPInterpolatedMaterial, s4.tmr.thermo.GasMaterial], composition: pymatgen.core.composition.Composition, is_target: bool, side: str, amt: float)

Bases: object

A material with thermodynamics quantities resolved.

amt: float: The amount of the material.

composition: pymatgen.core.composition.Composition: The composition of this material.

is_target: bool: Whether this material is a target material.

side: str: Which side of the reaction this material belongs to.

thermo: Union[s4.tmr.thermo.MPInterpolatedMaterial, s4.tmr.thermo.GasMaterial]: The resolved thermodynamic entity.

class s4.tmr.ReactionEnergies(target: pymatgen.core.composition.Composition, species: List[s4.tmr.entry.MaterialWithEnergy], vars_sub: Dict[str, float])

Bases: object

Reaction with thermodynamics quantities.

dgcrxn(temperature: float) → float

Compute the grand canonical energy of reaction at the specified temperature.

Parameters: temperature – Temperature for which the reaction grand canonical energy should be calculated.
Returns: Calculated grand canonical energy of reaction at the specified temperature.

dgrxn(temperature: float) → float

Compute the Gibbs energy of reaction at the specified temperature.

Parameters: temperature – Temperature for which the reaction Gibbs energy should be calculated.
Returns: Calculated Gibbs energy of reaction at the specified temperature.

property reaction_string: str: String representation of the reaction.

species: List[s4.tmr.entry.MaterialWithEnergy]: Species involved in this reaction.

target: pymatgen.core.composition.Composition: The target composition.

vars_sub: Dict[str, float]: Variable substitutions

class s4.tmr.ReactionEntry(k: str, reaction_string: str, exp_t: float, exp_time: float, reactions: List[s4.tmr.entry.ReactionEnergies])

Bases: object

An entry derived from a text-mined recipe.

exp_t: float: Experimental temperature.

exp_time: float: Experimental time.

experimental_dgcrxn() → Dict[pymatgen.core.composition.Composition, float]: Grand canonical free energy of the reaction.

experimental_dgrxn() → Dict[pymatgen.core.composition.Composition, float]: Gibbs free energy of the reaction.

static from_reaction_entry(data, key, override_fugacity=None, use_database='mp')

Compute ReactionEntry from text-mined data, identified by keys.

Parameters

data – Dictionary consisting of reaction entries.
key – Key of entry.
override_fugacity – If not None, set the fugacity (effective partial pressure of certain gases.
use_database – Which thermodynamic database to use. Choices are {‘mp’, ‘freed’}.

k: str: Identifier or key of the reaction.

reaction_string: str: Reaction string.

reactions: List[s4.tmr.entry.ReactionEnergies]: List of reactions contained in this recipe.

s4.tmr.from_reactions(data: Dict[str, dict], keys: List[str], *, override_fugacity=None, use_database='mp')

Generate list of ReactionEntry.

Parameters

data – Dictionary containing solid-state synthesis dataset.
keys – List of keys for which reactions are generate.
override_fugacity – Override some of the fugacities.
use_database – Which database to use.

Returns

s4.tmr.from_reactions_multiprocessing(data: Dict[str, dict], keys: List[str], *, processes=16, suppress_output=True, use_database='mp', override_fugacity: Optional[Dict[pymatgen.core.composition.Composition, float]] = None, return_errors: bool = False) → Union[Dict[str, s4.tmr.entry.ReactionEntry], Tuple]

Generate list of ReactionEntry using multiprocessing.

Parameters

data – Dictionary containing solid-state synthesis dataset.
keys – List of keys for which reactions are generate.
processes – Number of processes to use.
suppress_output – Suppress the output of workers.
use_database – Which database to use.
override_fugacity – Override some of the fugacities.
return_errors – Whether or not to return errors.

Returns

Contents in module s4.ml

Computes features for each reaction.

class s4.ml.features.Featurizer(thermo_temps=None)

Bases: object

This featurizer computes all 133 features used in this work.

The features are divided into four types: 1. Precursor properties (melting points, Gibbs formation energy, formation enthalpy, etc.) 2. Target compositions. 3. Experiment-adjacent features. 4. Thermodynamic driving forces.

elements = ['Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Hg', 'Tl', 'Pb', 'Bi', 'Th', 'U']: Chemical elements that qualify as a compositional feature.

featurize(reaction: s4.tmr.entry.ReactionEnergies, exp_t: float, exp_time: float, k: str = '', i: int = 0, tms_data: Optional[Dict] = None, calculate_thermo_bounds=True) → Dict[str, float]

Featurize a reaction using the 133 features.

Parameters

reaction – The reaction to compute features for.
k – Key of the recipe.
i – Index of the reaction.
exp_t – Experimental temperature in Celsius.
exp_time – Experimental time.
tms_data – Text-mined dataset dictionary.
calculate_thermo_bounds – Whether to calculate synthesis temperature bounds predicted by thermodynamic.

Returns

Features for this reaction.

thermo_temps = [1073.15, 1173.15, 1273.15, 1373.15, 1473.15, 1573.15]: Temperatures for which thermodynamic driving forces are calculated.

Plotting utility functions.

s4.ml.plotting.generate_pt_triangulation(ncols: int, rows2id: Dict[int, int], padding=(0.1, 0.1, 0.1, 0.1), spacing=1.5, header=0.5)

Generate triangulation for plotting side-by-side periodic tables.

Parameters

ncols – Number of columns.
rows2id – Periodic-table row id to index-based row id.
padding – Padding of the triangles inside each box.
spacing – Height of each box.
header – Height of the header (i.e., chemical element symbol).

Returns

Generated triangulation.

s4.ml.plotting.periodic_table_heatmap(elemental_data: Dict[str, float], cbar_label: str = '', cbar_label_size: int = 14, show_plot: bool = False, cmap: str = 'YlOrRd', cmap_range: Optional[Tuple[float, float]] = None, blank_color: str = 'w', value_format: Optional[str] = None, max_row: int = 9, ax=None)

Plot heatmap in a periodic table.

Copied from pymatgen.util.plotting.

Parameters

elemental_data – The dictionary of elemental data.
cbar_label – Label for the colorbar.
cbar_label_size – Label size for the colorbar.
show_plot – Whether to show this plot after plotting.
cmap – What colormap to use.
cmap_range – Range of the colormap.
cax – The ax to put colorbar.
blank_color – Color of blank elemental data.
value_format – Formatter to use for values.
max_row – Maximal number of rows.
ax – Ax to put the periodic table.

s4.ml.plotting.periodic_table_heatmap_sbs(elemental_data1: Dict[str, float], elemental_data2: Dict[str, float], cbar_label: str = '', cbar_label_size: int = 10, show_plot: bool = False, cmap: str = 'YlOrRd', cmap_range: Optional[Tuple[float, float]] = None, cax=None, blank_color: str = 'w', value_format: Optional[str] = None, include_rows: Optional[List[int]] = None, ax=None)

Plot heatmaps side-by-side.

Parameters

elemental_data1 – The first set of elemental data.
elemental_data2 – The second set of elemental data.
cbar_label – Label for the colorbar.
cbar_label_size – Label size for the colorbar.
show_plot – Whether to show this plot after plotting.
cmap – What colormap to use.
cmap_range – Range of the colormap.
cax – The ax to put colorbar.
blank_color – Color of blank elemental data.
value_format – Formatter to use for values.
include_rows – What rows to include in the plot.
ax – Ax to put the periodic table.

s4.ml.plotting.scatter_matrix_plot(data_frame: pandas.core.frame.DataFrame, figsize: Tuple[int, int] = (10, 10), fontsize: int = 9, confusion_matrix_fontsize: int = 9, hist_bins: int = 20, binary_bins: int = 20, scatter_alpha: float = 0.3, scatter_size: int = 2)

Plot scatter matrix, just as pandas.plotting.scatter_matrix.

Parameters

data_frame – The data frame to plot.
figsize – Figure size.
fontsize – Fontsize in the plot.
confusion_matrix_fontsize – Fontsize for the labels in the confusion matrix.
hist_bins – Number of bins for histograms.
binary_bins – Number of bins for the binary variable histograms.
scatter_alpha – Alpha of the points in the scatter plot.
scatter_size – Size of the points in the scatter plot.

Simple linear models (OLS) and weighted least squares (WLS)

s4.ml.wls.feature_importance(data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, *, do_wls: bool = True, weights=None, submodel_search=100, processes=None)

Estimate average partial importance.

The feature importance is calculated by sampling N=submodel_search submodels, since the total size of the submodel space may be huge to sample and calculate.

Parameters

data_frame – Training data frame.
features – List of features to use.
y_name – Prediction target name.
do_wls – Whether or not to perform WLS.
weights – Sample weights.
submodel_search – Number of sub-models when computing importance.
processes – Number of processes.

s4.ml.wls.forward_selection_by_bic(data_frame: pandas.core.frame.DataFrame, features_sorted: List[str], y_name: str, weights: Optional[numpy.ndarray] = None, do_wls: bool = False)

Perform feature forward selection by optimizing BIC (Bayesian Information Criteria). The features are taken according to the order defined by features_sorted.

Parameters

data_frame – The training data frame.
features_sorted – Features to use. Features are taken in order.
y_name – Prediction target name.
weights – If do_wls=True, use this as sample weights.
do_wls – Whether to use Weighted Least Squares.

s4.ml.wls.linear_model_analysis(data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, y_desc: str, *, do_wls: bool = True, weights: Optional[numpy.ndarray] = None, do_feature_selection: bool = True, do_plot: bool = True, processes=None)

Perform feature importance analysis and feature selection on a dataset using OLS/WLS.

Parameters

data_frame – Training data frame.
features – List of features to use.
y_name – Prediction target name.
y_desc – Description of prediction target, will used in plots.
do_wls – Whether or not to perform WLS.
weights – Sample weights.
do_feature_selection – Whether or not to perform feature selection.
do_plot – Whether or not to plot data.
processes – Number of processes to use.

Functions for creating nonlinear models for synthesis prediction.

s4.ml.nonlinear.custom_grid_search(estimator_cls, params, data_frame, features, y_name, weights=None, do_normalization=False, cv_method='loo', processes=None)

Perform Grid search CV and find the best parameters.

Parameters

estimator_cls – scikit-learn learner class.
params – Parameters to use for estimator_cls.
data_frame – Training data frame.
features – Features to use.
y_name – Prediction target variable name.
weights – Sample weights.
do_normalization – Whether to do normalization for training features.
cv_method – Cross-validation method.
processes – Number of multiprocessing processes to use.

s4.ml.nonlinear.estimate_average_partial_importance(estimator: sklearn.base.RegressorMixin, data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, *, cv_method: Union[int, str] = 'loo', weights: Optional[numpy.ndarray] = None, features_inspect: Optional[List[str]] = None, submodel_search: int = 100, processes: Optional[int] = None)

Estimate average partial importance using cross-validation method specified.

Since the total size of the submodel space may be huge to sample and calculate, the feature importance is calculated by sampling N=submodel_search submodels,

Parameters

estimator – scikit-learn estimator to use.
data_frame – Training data frame.
features – List of features to use.
y_name – Prediction target name.
cv_method – What cross-validation to use, can be ‘loo’, int (k-fold).
weights – Sample weights.
features_inspect – What features to use when computing importance.
submodel_search – Number of sub-models when computing importance.
processes – Number of multiprocessing processes.

s4.ml.nonlinear.model_cv_analysis(data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, y_desc: str, *, weights: Optional[numpy.ndarray] = None, estimator: Optional[sklearn.base.RegressorMixin] = None, do_normalization: bool = False, cv_method: Union[str, int] = 'loo', do_plot: bool = True, processes: Optional[int] = None, display_pbar: bool = True)

Perform cross validation on a dataset using multiprocessing.

Parameters

estimator – scikit-learn estimator to use.
data_frame – Training data frame.
features – List of features to use.
y_name – Prediction target name.
y_desc – Description of prediction target, will used in plots.
do_normalization – Do normalization before running model.
cv_method – What cross-validation to use, can be ‘loo’, int (k-fold).
weights – Sample weights.
processes – Number of multiprocessing processes.
do_plot – Whether or not to plot data.
display_pbar – Whether or not to display progress bar.

Contents in module s4.cascade

Stuff for analyzing cascades.

s4.cascade.analysis.compute_cascade(reaction: s4.tmr.entry.ReactionEnergies, step_temperatures: List[float], only_icsd: bool = True, use_scan: bool = False)

Compute cascade for a reaction.

Parameters

reaction – The reaction to calculate cascades for.
step_temperatures – List of temperatures for the cascade reactions.
only_icsd – Only use ICSD universe.
use_scan – Use SCAN energies.

Returns

List of pairwise reactions.

Thermodynamic calculations of the cascade model.

class s4.cascade.thermo.ReactionDrivingForce(reaction_string: str, driving_force: float, reactants: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]], gases: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]])

Bases: object

Calculated driving force of a inorganic synthesis reaction.

driving_force: float

gases: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]]

reactants: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]]

reaction_string: str

s4.cascade.thermo.get_dgf_fu(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, use_mp=False, use_scan=False) → float

Get the gibbs formation energy for a material. If use_mp=True, the value is from Materials Project with finite gibbs formation energy interpolation. Otherwise, it’s obtained from FREED (experimental) database.

Parameters

composition – Composition of the material.
temperature – Temperature of the material.
use_mp – Whether to use MP data.
use_scan – Whether to use SCAN data.

Returns

Gibbs formation energy of the compound.

s4.cascade.thermo.get_gas_mu(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, fugacity: float = 1.0) → float

Compute chemical potential of gas. Enthalpy values are from the FREED experimental database. Entropy values are from NIST data files.

Parameters

composition – Composition of the gas.
temperature – Temperature of the gas.
fugacity – Fugacity (partial pressure) of the gas.

Returns

Gas molecule chemical potential.

s4.cascade.thermo.reaction_driving_force(precursors: List[Union[pymatgen.core.composition.Composition, str]], open_comp: List[Union[pymatgen.core.composition.Composition, str]], target: Union[pymatgen.core.composition.Composition, str], target_mixture: Dict[s4.thermo.utils.as_composition, float], atom_set: Set[pymatgen.core.periodic_table.Element], temperature: float, gas_partials: Dict[s4.thermo.utils.as_composition, float], use_scan: bool) → Optional[s4.cascade.thermo.ReactionDrivingForce]

Balance reaction and compute grand canonical driving force at once. The computed driving forces are in ev/metal_atom. Note that the reaction equation is normalized to have 1 metal atom per target composition.

Parameters

precursors – List of precursors of the reaction.
open_comp – List of open compositions of the reaction.
target – Target material composition.
target_mixture – Dictionary containing the mixtures of target material.
atom_set – Set of “non-volatile” atoms.
temperature – Temperature of the reaction.
gas_partials – Dictionary containing gas partial pressures.
use_scan – Whether to use SCAN database.

Returns

Calculated reaction driving force, or None if no reaction can be balanced.

s4.cascade.thermo.reduce_formula(sym_amt: Mapping[pymatgen.core.periodic_table.Element, float], iupac_ordering: bool = False) → Tuple[str, int]

Faster implementation of pymatgen.periodic_table.reduce_formula.

The original pymatgen implementation is too slow. For example, some conversions between string and Element are not necessary. Since we will call this function for significantly many times, we need to optimize this function as much as possible.

Parameters

sym_amt – Dictionary that contains {Elements: amount}
iupac_ordering – Whether to use IUPAC ordering.

Returns

The reduced composition as string and the factor of reduction.

Greedy finding competing pairwise reactions.

s4.cascade.greedy_resolver.find_competing_reactions(precursors: List[pymatgen.core.composition.Composition], open_comp: List[pymatgen.core.composition.Composition], all_targets: List[pymatgen.core.composition.Composition], target_mixture: Dict[pymatgen.core.composition.Composition, Dict[pymatgen.core.composition.Composition, float]], atom_set, temperature: float, gas_partials: Optional[Dict[pymatgen.core.composition.Composition, float]], use_scan=False)

Find competing pairwise reactions at a interface by finding largest driving force.

Parameters

precursors – List of precursors to consider.
open_comp – List of open compositions such as gas.
all_targets – List of possible target compounds.
target_mixture – Composition mixture.
atom_set – Set of “non-volatile” atoms.
temperature – Temperature of the reaction.
gas_partials – Dictionary containing gas partial pressures.
use_scan – Whether to use SCAN database.

The cascade model implementation.

s4.cascade.cascade_model.find_cascade_reaction(precursors: List[pymatgen.core.composition.Composition], open_comp: List[pymatgen.core.composition.Composition], temperature: float, target_mixture: Dict[pymatgen.core.composition.Composition, Dict[pymatgen.core.composition.Composition, float]], gas_partials: Optional[Dict[pymatgen.core.composition.Composition, float]] = None, use_scan=False, only_icsd=True): Find pairwise reaction by comparing energies at the interface.

s4.cascade.cascade_model.find_competing_reactions(precursors: List[pymatgen.core.composition.Composition], open_comp: List[pymatgen.core.composition.Composition], all_targets: List[pymatgen.core.composition.Composition], target_mixture: Dict[pymatgen.core.composition.Composition, Dict[pymatgen.core.composition.Composition, float]], atom_set, temperature: float, gas_partials: Optional[Dict[pymatgen.core.composition.Composition, float]], use_scan=False)

Find competing pairwise reactions at a interface by finding largest driving force.

Parameters

precursors – List of precursors to consider.
open_comp – List of open compositions such as gas.
all_targets – List of possible target compounds.
target_mixture – Composition mixture.
atom_set – Set of “non-volatile” atoms.
temperature – Temperature of the reaction.
gas_partials – Dictionary containing gas partial pressures.
use_scan – Whether to use SCAN database.

Utility to quickly balance reactions.

exception s4.cascade.balance.CannotBalance

Bases: ValueError

A reaction cannot be balanced.

s4.cascade.balance.quick_balance(precursors: List[Union[pymatgen.core.composition.Composition, str]], open_comp: List[Union[pymatgen.core.composition.Composition, str]], target: Union[pymatgen.core.composition.Composition, str]) → Tuple[List[float], List[float]]

Balance a chemical reaction.

Parameters

precursors – List of precursors.
open_comp – List of open compositions.
target – The target material.

Returns

Coefficients for the precursor and open materials.