API References

Contents in module s4.thermo.calc

Module for computing thermodynamics using calculated databases (Materials Project).

The general application of this module is to compute thermodynamics quantities for calculated material entries. The following is the step to do this:

  1. Obtain the system of interest using query_system_mp() or

    query_system_scan().

  2. Obtain the zero-temperature formation enthalpy using

    compute_corrected_dhf_mp() or compute_corrected_dhf_scan().

  3. Interpolate the finite temperature Gibbs energy of formation using

    compute_corrected_dgf_mp() or compute_corrected_dgf_scan().

s4.thermo.calc.compute_corrected_dgf_mp(mp_entry: pymatgen.entries.computed_entries.ComputedEntry, temperature: float = 300) float

Compute corrected \(\Delta G_f(T)\) in ev/atom for a material in the Materials Project.

We apply three corrections as follows:

  1. Elemental contributions as used in the Materials Project.

  2. For carbonates, we apply a correction for CO32- anion (fitted by Haoyan).

  3. Chris Bartels’ finite \(dG_f(T)\) correction.

Parameters
  • mp_entry – A computed entry retrieved from the Materials Project.

  • temperature – Temperature for which \(dG_f(T)\) will be computed.

Returns

The corrected gibbs energy of formation at the temperature specified.

s4.thermo.calc.compute_corrected_dgf_scan(mp_entry: pymatgen.entries.computed_entries.ComputedEntry, temperature: float = 300) float

Compute corrected \(\Delta G_f(T)\) in ev/atom for a material in the SCAN database.

We apply only one corrections as follows:

  1. Chris Bartels’ finite \(dG_f(T)\) correction.

Parameters
  • mp_entry – A computed entry retrieved from the SCAN database.

  • temperature – Temperature for which \(dG_f(T)\) will be computed.

Returns

The corrected gibbs energy of formation at the temperature specified.

s4.thermo.calc.compute_corrected_dhf_mp(mp_entry: pymatgen.entries.computed_entries.ComputedEntry) float

Compute corrected \(\Delta H_f(0K)\) in ev/atom for a material in the Materials Project.

We apply two corrections as follows:

  1. Elemental contributions as used in the Materials Project.

  2. For carbonates, we apply a correction for CO32- anion (fitted by Haoyan).

Parameters

mp_entry – A computed entry retrieved from the Materials Project.

Returns

The corrected formation enthalpy at zero Kelvin.

s4.thermo.calc.compute_corrected_dhf_scan(mp_entry: pymatgen.entries.computed_entries.ComputedEntry) float

Compute corrected \(\Delta H_f(0K)\) in ev/atom for a material in the SCAN database.

Parameters

mp_entry – A computed entry retrieved from the SCAN database.

Returns

The corrected formation enthalpy at zero Kelvin.

s4.thermo.calc.finite_dg_correction(mp_entry: pymatgen.entries.computed_entries.ComputedEntry, temperature: float, dhf: float) float

Compute finite-temperature \(dG(T)\) correction using Chris Bartel’s method, see [Chris2018].

Parameters
  • mp_entry – The entry to a Materials Project entry, which must contain the volume of the structure.

  • temperature – Finite temperature for which \(dG(T)\) is approximated.

  • dhf – Zero-temperature formation enthalpy.

Returns

Interpolated gibbs energy of formation at finite temperature.

Chris2018

Bartel, Christopher J., et al. “Physical descriptor for the Gibbs energy of inorganic crystalline solids and temperature-dependent materials chemistry.” Nature communications 9.1 (2018): 1-10.

s4.thermo.calc.query_system_mp(system: Union[List[str], str]) List[pymatgen.entries.computed_entries.ComputedEntry]

Query the Materials Project to get a list of compositions within a system or a specific composition. There are multiple ways of specifying a system:

  1. List of chemical elements, e.g., Ba-Ti-O.

  2. Exact composition, e.g., BaCO3.

Parameters

system – Chemical system identifier.

Returns

List of matched entries in the Materials Project.

s4.thermo.calc.query_system_scan(system: Union[List[str], str]) List[pymatgen.entries.computed_entries.ComputedEntry]

Query Ryan’s SCAN database to get a list of compositions within a system or a specific composition. There are multiple ways of specifying a system:

  1. List of chemical elements, e.g., Ba-Ti-O.

  2. Exact composition, e.g., BaCO3.

Parameters

system – Chemical system identifier.

Returns

List of matched entries in the SCAN database.

Contents in module s4.thermo.exp

Module for computing thermodynamics using the FREED database [FREED].

FREED

https://www.thermart.net/freed-thermodynamic-database/

class s4.thermo.exp.EnthalpyEq(A: float, B: float, C: float, D: float, E: float, F: float, T_transition: float, H_transition: Optional[float], comment: Optional[str])

Bases: object

Represents an enthalpy equation in the form of:

\(H(T) - H_0(298K) = AT + BT^2 + \frac{C}{T} + DT^{\frac{1}{2}} + ET^3 + F\)

A: float

Coefficient \(A\).

B: float

Coefficient \(B\).

C: float

Coefficient \(C\).

D: float

Coefficient \(D\).

E: float

Coefficient \(E\).

F: float

Coefficient \(F\).

H_transition: Optional[float]

Transition enthalpy value.

T_transition: float

Transition temperature (upper bound of this equation).

comment: Optional[str]

Comments on this enthalpy equation.

class s4.thermo.exp.ExpThermoDatabase(freed_db_fn='./FREED 11.0.xlsm')

Bases: object

A database containing all entries in FREED database.

dgf(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, unit='ev', allow_extrapolate=False) float

Get the formation gibbs free energy of a compound at desired temperature.

dhf(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, unit='ev', allow_extrapolate=False) float

Get the formation enthalpy of a compound at desired temperature.

h(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, unit='ev', allow_extrapolate=False) float

Get the enthalpy of a compound at desired temperature.

Parameters
  • composition – Composition of the compound.

  • temperature – Desired temperature.

  • unit – Return unit.

  • allow_extrapolate – Whether an extrapolated value is accepted.

Returns

class s4.thermo.exp.FREEDEntry(formula: str, desc: str, name: str, mineral_name: str, molar_mass: float, density: float, h0: float, ent0: float, tmax: float, dh_eqs: List[s4.thermo.exp.freed.EnthalpyEq], dgf_eqs: List[s4.thermo.exp.freed.GibbsFormationEq])

Bases: object

A verbatim entry in FREED database.

density: float

Density of the material.

desc: str

Description of the material.

dgf(temperature: float, unit='ev', allow_extrapolate=False) float

Compute gibbs energy of formation at a temperature. If you specify allow_extrapolate=True and the temperature falls outside of the range defined by FREED database, you will get an extrapolated value.

Parameters
  • temperature – Temperature for the energy calculation.

  • unit – Can be one of {‘ev’, ‘ev/atom’, ‘kcal’, ‘kj’, ‘cal’}

  • allow_extrapolate – Whether extrapolation is allowed.

Returns

Gibbs energy of formation at the specified temperature

dgf_eqs: List[s4.thermo.exp.freed.GibbsFormationEq]

List of Gibbs energy equations.

dh_eqs: List[s4.thermo.exp.freed.EnthalpyEq]

List of enthalpy equations.

ent0: float

Entropy of formation at 298K (unit in cal/kelvin).

formula: str

Formula of the compound.

static from_row(row: pandas.core.series.Series)

Construct a <FREEDEntry> from a pandas row.

h(temperature: float, unit='ev', allow_extrapolate=False) float

Compute enthalpy of formation at a temperature. If you specify allow_extrapolate=True and the temperature falls outside of the range defined by FREED database, you will get an extrapolated value.

Parameters
  • temperature – Temperature for the enthalpy calculation.

  • unit – Can be one of {‘ev’, ‘ev/atom’, ‘kcal’, ‘kj’, ‘cal’}

  • allow_extrapolate – Whether extrapolation is allowed.

Returns

Enthalpy of formation at the specified temperature.

h0: float

Standard enthalpy of formation at 298K (unit in cal).

mineral_name: str

Mineral name of the material.

molar_mass: float

Molar mass of the material.

name: str

Name of the material.

plot_dgf(temp_range=None, unit='ev/atom', ax=None, show=True) None

Plot Gibbs energy of formation at various temperatures

Parameters
  • temp_range – The range of temperatures. If None, defaults to 298K-TMax.

  • unit – Unit of the plot

  • ax – If not None, plot to an existing axis.

  • show – If True, display this plot after function call.

plot_h(temp_range=None, unit='ev/atom', ax=None, show=True) None

Plot enthalpy at various temperatures

Parameters
  • temp_range – The range of temperatures. If None, defaults to 298K-TMax.

  • unit – Unit of the plot

  • ax – If not None, plot to an existing axis.

  • show – If True, display this plot after function call.

tmax: float

Maximal characterized temperature.

class s4.thermo.exp.GibbsFormationEq(A: float, B: float, C: float, D: float, E: float, F: float, G: float, T_transition: float)

Bases: object

Represents an gibbs formation energy equation in the form of:

\(\Delta G_f(T) = AT\ln(T) + BT + CT^2 + \frac{D}{T} + ET^{\frac{1}{2}} + FT^3 + G\)

A: float

Coefficient \(A\).

B: float

Coefficient \(B\).

C: float

Coefficient \(C\).

D: float

Coefficient \(D\).

E: float

Coefficient \(E\).

F: float

Coefficient \(F\).

G: float

Coefficient \(G\).

T_transition: float

Transition temperature (upper bound of this equation).

Contents in module s4.tmr

class s4.tmr.ExpDeterminedMaterial(composition: pymatgen.core.composition.Composition)

Bases: object

Material with experimentally measured (from FREED, see s4.thermo.exp.ExpThermoDatabase) thermodynamic quantities.

composition: pymatgen.core.composition.Composition

The composition of the material.

dgf(temperature: float) float

Returns the Gibbs energy of formation of this material in ev/atom.

Parameters

temperature – Temperature for which Gibbs energy of formation is calculated..

Returns

Gibbs energy of formation at the specified temperature.

dhf(temperature: float) float

Returns the formation enthalpy of this material in ev/atom.

Parameters

temperature – Temperature for which formation enthalpy is calculated.

Returns

Formation enthalpy at the specified temperature.

static from_material_dict(material, allow_many=False)

Generate from material data. The parameter material should contain a field named “thermo”. The following is an example:

material = {
    "thermo": [
    {
        "interpolation": "BaCO3",
        "formula": "BaCO3",
    }
]}
class s4.tmr.GasMaterial(composition: pymatgen.core.composition.Composition, fugacity: float)

Bases: object

Gaseous material, such as O2, CO2, etc.

composition: pymatgen.core.composition.Composition

Composition of the gas.

dgf(temperature: float) float

Returns the Gibbs energy of formation of this material in ev/atom.

Parameters

temperature – Temperature for which Gibbs energy of formation is calculated..

Returns

Gibbs energy of formation at the specified temperature.

dhf(temperature: float) float

Returns the formation enthalpy of this material in ev/atom.

Parameters

temperature – Temperature for which formation enthalpy is calculated.

Returns

Formation enthalpy at the specified temperature.

fugacity: float

Effective partial pressure of the gas.

mu(temperature: float) float

Returns the chemical potential of this gas in ev/atom

Parameters

temperature – Temperature for which chemical potential is calculated..

Returns

Chemical potential at the specified temperature.

exception s4.tmr.InvalidMPIdError

Bases: KeyError

The material cannot be associated with a Materials Project ID.

class s4.tmr.MPInterpolatedMaterial(compositions: List[pymatgen.core.composition.Composition], amounts: List[float], mp_entries: List[pymatgen.entries.computed_entries.ComputedEntry])

Bases: object

Material with thermo quantities interpolated using MP.

amounts: List[float]
property composition

Total composition.

compositions: List[pymatgen.core.composition.Composition]
dgf(temperature: float) float

Returns the dGf of this interpolated material in ev/atom.

Parameters

temperature – Temperature of dGf.

Returns

dhf(temperature) float

Returns the dHf of this interpolated material in ev/atom.

Returns

static from_material_dict(material, allow_many=False)

Generate an instance from materials data.

mp_entries: List[pymatgen.entries.computed_entries.ComputedEntry]
property polytypes

List of polytypes in the structure as predicted by geometry analyzer.

class s4.tmr.MPUniverseInterpolation(mp_space=None, mp_data_by_temp=None)

Bases: object

Interpolate a unknown material to the Materials Project (MP) using all compounds in the MP.

This code is adapted from the original version by Christopher J. Bartel.

The interpolation is done by optimizing the geometry energy, calculated by \(-\exp(-D)\) where \(D=sqrt(\sum fractional\_comp\_diff^2)\), under the constraint that all atoms must conserve during optimization.

static competing_phases(composition: pymatgen.core.composition.Composition, target_space: Set[str], neighbors, neighbor_energies) Tuple[Dict, List[pymatgen.core.composition.Composition]]

Find all competing phases for a target compound given list of neighboring compounds.

Parameters
  • composition – Composition to search for.

  • target_space – Set of chemical elements in the target composition.

  • neighbors – Neighboring compounds.

  • neighbor_energies – Energies of the neighboring compounds.

Returns

Energies of the neighboring compounds, and the energy hull.

static geometry_energy(composition: pymatgen.core.composition.Composition, target_space, neighbors)

Compute geometry energy given a target compound and a list of neighboring compounds.

interpolate(composition: Union[str, pymatgen.core.composition.Composition]) Dict[pymatgen.core.composition.Composition, Dict[str, float]]

Interpolate a composition using compounds in the Materials Project.

The returned data looks like the following:

result = {
    Composition("BaO"): {'amt': 1.0, 'E': 0.0}
}
Parameters

composition – The composition to interpolate.

Returns

Dictionary that contains the decomposed compositions and their information.

property mp_data_by_temp

MP thermodynamic data by temperature.

property mp_space

Dictionary describing the MP universe.

neighbors(composition: pymatgen.core.composition.Composition, target_space)

Compute the neighboring compounds for a given compound.

static optimize_energy(composition: pymatgen.core.composition.Composition, target_space: Set[str], hull: Dict[pymatgen.core.composition.Composition, Dict[str, float]], competing_phases: List[pymatgen.core.composition.Composition]) scipy.optimize.optimize.OptimizeResult

Optimize geometry energy and find the combination of competing phases that generate the lowest geometry energy.

Parameters
  • composition – Composition of the target material.

  • target_space – List of chemical elements in the target material.

  • hull – Dictionary whose keys are phases and values are {‘E’: energy}.

  • competing_phases – List of compositions as competing phases.

Returns

Optimization result that contains the solution.

class s4.tmr.MaterialWithEnergy(thermo: Union[s4.tmr.thermo.MPInterpolatedMaterial, s4.tmr.thermo.GasMaterial], composition: pymatgen.core.composition.Composition, is_target: bool, side: str, amt: float)

Bases: object

A material with thermodynamics quantities resolved.

amt: float

The amount of the material.

composition: pymatgen.core.composition.Composition

The composition of this material.

is_target: bool

Whether this material is a target material.

side: str

Which side of the reaction this material belongs to.

thermo: Union[s4.tmr.thermo.MPInterpolatedMaterial, s4.tmr.thermo.GasMaterial]

The resolved thermodynamic entity.

class s4.tmr.ReactionEnergies(target: pymatgen.core.composition.Composition, species: List[s4.tmr.entry.MaterialWithEnergy], vars_sub: Dict[str, float])

Bases: object

Reaction with thermodynamics quantities.

dgcrxn(temperature: float) float

Compute the grand canonical energy of reaction at the specified temperature.

Parameters

temperature – Temperature for which the reaction grand canonical energy should be calculated.

Returns

Calculated grand canonical energy of reaction at the specified temperature.

dgrxn(temperature: float) float

Compute the Gibbs energy of reaction at the specified temperature.

Parameters

temperature – Temperature for which the reaction Gibbs energy should be calculated.

Returns

Calculated Gibbs energy of reaction at the specified temperature.

property reaction_string: str

String representation of the reaction.

species: List[s4.tmr.entry.MaterialWithEnergy]

Species involved in this reaction.

target: pymatgen.core.composition.Composition

The target composition.

vars_sub: Dict[str, float]

Variable substitutions

class s4.tmr.ReactionEntry(k: str, reaction_string: str, exp_t: float, exp_time: float, reactions: List[s4.tmr.entry.ReactionEnergies])

Bases: object

An entry derived from a text-mined recipe.

exp_t: float

Experimental temperature.

exp_time: float

Experimental time.

experimental_dgcrxn() Dict[pymatgen.core.composition.Composition, float]

Grand canonical free energy of the reaction.

experimental_dgrxn() Dict[pymatgen.core.composition.Composition, float]

Gibbs free energy of the reaction.

static from_reaction_entry(data, key, override_fugacity=None, use_database='mp')

Compute ReactionEntry from text-mined data, identified by keys.

Parameters
  • data – Dictionary consisting of reaction entries.

  • key – Key of entry.

  • override_fugacity – If not None, set the fugacity (effective partial pressure of certain gases.

  • use_database – Which thermodynamic database to use. Choices are {‘mp’, ‘freed’}.

k: str

Identifier or key of the reaction.

reaction_string: str

Reaction string.

reactions: List[s4.tmr.entry.ReactionEnergies]

List of reactions contained in this recipe.

s4.tmr.from_reactions(data: Dict[str, dict], keys: List[str], *, override_fugacity=None, use_database='mp')

Generate list of ReactionEntry.

Parameters
  • data – Dictionary containing solid-state synthesis dataset.

  • keys – List of keys for which reactions are generate.

  • override_fugacity – Override some of the fugacities.

  • use_database – Which database to use.

Returns

s4.tmr.from_reactions_multiprocessing(data: Dict[str, dict], keys: List[str], *, processes=16, suppress_output=True, use_database='mp', override_fugacity: Optional[Dict[pymatgen.core.composition.Composition, float]] = None, return_errors: bool = False) Union[Dict[str, s4.tmr.entry.ReactionEntry], Tuple]

Generate list of ReactionEntry using multiprocessing.

Parameters
  • data – Dictionary containing solid-state synthesis dataset.

  • keys – List of keys for which reactions are generate.

  • processes – Number of processes to use.

  • suppress_output – Suppress the output of workers.

  • use_database – Which database to use.

  • override_fugacity – Override some of the fugacities.

  • return_errors – Whether or not to return errors.

Returns

Contents in module s4.ml

Computes features for each reaction.

class s4.ml.features.Featurizer(thermo_temps=None)

Bases: object

This featurizer computes all 133 features used in this work.

The features are divided into four types: 1. Precursor properties (melting points, Gibbs formation energy, formation enthalpy, etc.) 2. Target compositions. 3. Experiment-adjacent features. 4. Thermodynamic driving forces.

elements = ['Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Hg', 'Tl', 'Pb', 'Bi', 'Th', 'U']

Chemical elements that qualify as a compositional feature.

featurize(reaction: s4.tmr.entry.ReactionEnergies, exp_t: float, exp_time: float, k: str = '', i: int = 0, tms_data: Optional[Dict] = None, calculate_thermo_bounds=True) Dict[str, float]

Featurize a reaction using the 133 features.

Parameters
  • reaction – The reaction to compute features for.

  • k – Key of the recipe.

  • i – Index of the reaction.

  • exp_t – Experimental temperature in Celsius.

  • exp_time – Experimental time.

  • tms_data – Text-mined dataset dictionary.

  • calculate_thermo_bounds – Whether to calculate synthesis temperature bounds predicted by thermodynamic.

Returns

Features for this reaction.

thermo_temps = [1073.15, 1173.15, 1273.15, 1373.15, 1473.15, 1573.15]

Temperatures for which thermodynamic driving forces are calculated.

Plotting utility functions.

s4.ml.plotting.generate_pt_triangulation(ncols: int, rows2id: Dict[int, int], padding=(0.1, 0.1, 0.1, 0.1), spacing=1.5, header=0.5)

Generate triangulation for plotting side-by-side periodic tables.

Parameters
  • ncols – Number of columns.

  • rows2id – Periodic-table row id to index-based row id.

  • padding – Padding of the triangles inside each box.

  • spacing – Height of each box.

  • header – Height of the header (i.e., chemical element symbol).

Returns

Generated triangulation.

s4.ml.plotting.periodic_table_heatmap(elemental_data: Dict[str, float], cbar_label: str = '', cbar_label_size: int = 14, show_plot: bool = False, cmap: str = 'YlOrRd', cmap_range: Optional[Tuple[float, float]] = None, blank_color: str = 'w', value_format: Optional[str] = None, max_row: int = 9, ax=None)

Plot heatmap in a periodic table.

Copied from pymatgen.util.plotting.

Parameters
  • elemental_data – The dictionary of elemental data.

  • cbar_label – Label for the colorbar.

  • cbar_label_size – Label size for the colorbar.

  • show_plot – Whether to show this plot after plotting.

  • cmap – What colormap to use.

  • cmap_range – Range of the colormap.

  • cax – The ax to put colorbar.

  • blank_color – Color of blank elemental data.

  • value_format – Formatter to use for values.

  • max_row – Maximal number of rows.

  • ax – Ax to put the periodic table.

s4.ml.plotting.periodic_table_heatmap_sbs(elemental_data1: Dict[str, float], elemental_data2: Dict[str, float], cbar_label: str = '', cbar_label_size: int = 10, show_plot: bool = False, cmap: str = 'YlOrRd', cmap_range: Optional[Tuple[float, float]] = None, cax=None, blank_color: str = 'w', value_format: Optional[str] = None, include_rows: Optional[List[int]] = None, ax=None)

Plot heatmaps side-by-side.

Parameters
  • elemental_data1 – The first set of elemental data.

  • elemental_data2 – The second set of elemental data.

  • cbar_label – Label for the colorbar.

  • cbar_label_size – Label size for the colorbar.

  • show_plot – Whether to show this plot after plotting.

  • cmap – What colormap to use.

  • cmap_range – Range of the colormap.

  • cax – The ax to put colorbar.

  • blank_color – Color of blank elemental data.

  • value_format – Formatter to use for values.

  • include_rows – What rows to include in the plot.

  • ax – Ax to put the periodic table.

s4.ml.plotting.scatter_matrix_plot(data_frame: pandas.core.frame.DataFrame, figsize: Tuple[int, int] = (10, 10), fontsize: int = 9, confusion_matrix_fontsize: int = 9, hist_bins: int = 20, binary_bins: int = 20, scatter_alpha: float = 0.3, scatter_size: int = 2)

Plot scatter matrix, just as pandas.plotting.scatter_matrix.

Parameters
  • data_frame – The data frame to plot.

  • figsize – Figure size.

  • fontsize – Fontsize in the plot.

  • confusion_matrix_fontsize – Fontsize for the labels in the confusion matrix.

  • hist_bins – Number of bins for histograms.

  • binary_bins – Number of bins for the binary variable histograms.

  • scatter_alpha – Alpha of the points in the scatter plot.

  • scatter_size – Size of the points in the scatter plot.

Simple linear models (OLS) and weighted least squares (WLS)

s4.ml.wls.feature_importance(data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, *, do_wls: bool = True, weights=None, submodel_search=100, processes=None)

Estimate average partial importance.

The feature importance is calculated by sampling N=submodel_search submodels, since the total size of the submodel space may be huge to sample and calculate.

Parameters
  • data_frame – Training data frame.

  • features – List of features to use.

  • y_name – Prediction target name.

  • do_wls – Whether or not to perform WLS.

  • weights – Sample weights.

  • submodel_search – Number of sub-models when computing importance.

  • processes – Number of processes.

s4.ml.wls.forward_selection_by_bic(data_frame: pandas.core.frame.DataFrame, features_sorted: List[str], y_name: str, weights: Optional[numpy.ndarray] = None, do_wls: bool = False)

Perform feature forward selection by optimizing BIC (Bayesian Information Criteria). The features are taken according to the order defined by features_sorted.

Parameters
  • data_frame – The training data frame.

  • features_sorted – Features to use. Features are taken in order.

  • y_name – Prediction target name.

  • weights – If do_wls=True, use this as sample weights.

  • do_wls – Whether to use Weighted Least Squares.

s4.ml.wls.linear_model_analysis(data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, y_desc: str, *, do_wls: bool = True, weights: Optional[numpy.ndarray] = None, do_feature_selection: bool = True, do_plot: bool = True, processes=None)

Perform feature importance analysis and feature selection on a dataset using OLS/WLS.

Parameters
  • data_frame – Training data frame.

  • features – List of features to use.

  • y_name – Prediction target name.

  • y_desc – Description of prediction target, will used in plots.

  • do_wls – Whether or not to perform WLS.

  • weights – Sample weights.

  • do_feature_selection – Whether or not to perform feature selection.

  • do_plot – Whether or not to plot data.

  • processes – Number of processes to use.

Functions for creating nonlinear models for synthesis prediction.

Perform Grid search CV and find the best parameters.

Parameters
  • estimator_cls – scikit-learn learner class.

  • params – Parameters to use for estimator_cls.

  • data_frame – Training data frame.

  • features – Features to use.

  • y_name – Prediction target variable name.

  • weights – Sample weights.

  • do_normalization – Whether to do normalization for training features.

  • cv_method – Cross-validation method.

  • processes – Number of multiprocessing processes to use.

s4.ml.nonlinear.estimate_average_partial_importance(estimator: sklearn.base.RegressorMixin, data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, *, cv_method: Union[int, str] = 'loo', weights: Optional[numpy.ndarray] = None, features_inspect: Optional[List[str]] = None, submodel_search: int = 100, processes: Optional[int] = None)

Estimate average partial importance using cross-validation method specified.

Since the total size of the submodel space may be huge to sample and calculate, the feature importance is calculated by sampling N=submodel_search submodels,

Parameters
  • estimator – scikit-learn estimator to use.

  • data_frame – Training data frame.

  • features – List of features to use.

  • y_name – Prediction target name.

  • cv_method – What cross-validation to use, can be ‘loo’, int (k-fold).

  • weights – Sample weights.

  • features_inspect – What features to use when computing importance.

  • submodel_search – Number of sub-models when computing importance.

  • processes – Number of multiprocessing processes.

s4.ml.nonlinear.model_cv_analysis(data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, y_desc: str, *, weights: Optional[numpy.ndarray] = None, estimator: Optional[sklearn.base.RegressorMixin] = None, do_normalization: bool = False, cv_method: Union[str, int] = 'loo', do_plot: bool = True, processes: Optional[int] = None, display_pbar: bool = True)

Perform cross validation on a dataset using multiprocessing.

Parameters
  • estimator – scikit-learn estimator to use.

  • data_frame – Training data frame.

  • features – List of features to use.

  • y_name – Prediction target name.

  • y_desc – Description of prediction target, will used in plots.

  • do_normalization – Do normalization before running model.

  • cv_method – What cross-validation to use, can be ‘loo’, int (k-fold).

  • weights – Sample weights.

  • processes – Number of multiprocessing processes.

  • do_plot – Whether or not to plot data.

  • display_pbar – Whether or not to display progress bar.

Contents in module s4.cascade

Stuff for analyzing cascades.

s4.cascade.analysis.compute_cascade(reaction: s4.tmr.entry.ReactionEnergies, step_temperatures: List[float], only_icsd: bool = True, use_scan: bool = False)

Compute cascade for a reaction.

Parameters
  • reaction – The reaction to calculate cascades for.

  • step_temperatures – List of temperatures for the cascade reactions.

  • only_icsd – Only use ICSD universe.

  • use_scan – Use SCAN energies.

Returns

List of pairwise reactions.

Thermodynamic calculations of the cascade model.

class s4.cascade.thermo.ReactionDrivingForce(reaction_string: str, driving_force: float, reactants: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]], gases: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]])

Bases: object

Calculated driving force of a inorganic synthesis reaction.

driving_force: float
gases: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]]
reactants: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]]
reaction_string: str
s4.cascade.thermo.get_dgf_fu(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, use_mp=False, use_scan=False) float

Get the gibbs formation energy for a material. If use_mp=True, the value is from Materials Project with finite gibbs formation energy interpolation. Otherwise, it’s obtained from FREED (experimental) database.

Parameters
  • composition – Composition of the material.

  • temperature – Temperature of the material.

  • use_mp – Whether to use MP data.

  • use_scan – Whether to use SCAN data.

Returns

Gibbs formation energy of the compound.

s4.cascade.thermo.get_gas_mu(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, fugacity: float = 1.0) float

Compute chemical potential of gas. Enthalpy values are from the FREED experimental database. Entropy values are from NIST data files.

Parameters
  • composition – Composition of the gas.

  • temperature – Temperature of the gas.

  • fugacity – Fugacity (partial pressure) of the gas.

Returns

Gas molecule chemical potential.

s4.cascade.thermo.reaction_driving_force(precursors: List[Union[pymatgen.core.composition.Composition, str]], open_comp: List[Union[pymatgen.core.composition.Composition, str]], target: Union[pymatgen.core.composition.Composition, str], target_mixture: Dict[s4.thermo.utils.as_composition, float], atom_set: Set[pymatgen.core.periodic_table.Element], temperature: float, gas_partials: Dict[s4.thermo.utils.as_composition, float], use_scan: bool) Optional[s4.cascade.thermo.ReactionDrivingForce]

Balance reaction and compute grand canonical driving force at once. The computed driving forces are in ev/metal_atom. Note that the reaction equation is normalized to have 1 metal atom per target composition.

Parameters
  • precursors – List of precursors of the reaction.

  • open_comp – List of open compositions of the reaction.

  • target – Target material composition.

  • target_mixture – Dictionary containing the mixtures of target material.

  • atom_set – Set of “non-volatile” atoms.

  • temperature – Temperature of the reaction.

  • gas_partials – Dictionary containing gas partial pressures.

  • use_scan – Whether to use SCAN database.

Returns

Calculated reaction driving force, or None if no reaction can be balanced.

s4.cascade.thermo.reduce_formula(sym_amt: Mapping[pymatgen.core.periodic_table.Element, float], iupac_ordering: bool = False) Tuple[str, int]

Faster implementation of pymatgen.periodic_table.reduce_formula.

The original pymatgen implementation is too slow. For example, some conversions between string and Element are not necessary. Since we will call this function for significantly many times, we need to optimize this function as much as possible.

Parameters
  • sym_amt – Dictionary that contains {Elements: amount}

  • iupac_ordering – Whether to use IUPAC ordering.

Returns

The reduced composition as string and the factor of reduction.

Greedy finding competing pairwise reactions.

s4.cascade.greedy_resolver.find_competing_reactions(precursors: List[pymatgen.core.composition.Composition], open_comp: List[pymatgen.core.composition.Composition], all_targets: List[pymatgen.core.composition.Composition], target_mixture: Dict[pymatgen.core.composition.Composition, Dict[pymatgen.core.composition.Composition, float]], atom_set, temperature: float, gas_partials: Optional[Dict[pymatgen.core.composition.Composition, float]], use_scan=False)

Find competing pairwise reactions at a interface by finding largest driving force.

Parameters
  • precursors – List of precursors to consider.

  • open_comp – List of open compositions such as gas.

  • all_targets – List of possible target compounds.

  • target_mixture – Composition mixture.

  • atom_set – Set of “non-volatile” atoms.

  • temperature – Temperature of the reaction.

  • gas_partials – Dictionary containing gas partial pressures.

  • use_scan – Whether to use SCAN database.

The cascade model implementation.

s4.cascade.cascade_model.find_cascade_reaction(precursors: List[pymatgen.core.composition.Composition], open_comp: List[pymatgen.core.composition.Composition], temperature: float, target_mixture: Dict[pymatgen.core.composition.Composition, Dict[pymatgen.core.composition.Composition, float]], gas_partials: Optional[Dict[pymatgen.core.composition.Composition, float]] = None, use_scan=False, only_icsd=True)

Find pairwise reaction by comparing energies at the interface.

s4.cascade.cascade_model.find_competing_reactions(precursors: List[pymatgen.core.composition.Composition], open_comp: List[pymatgen.core.composition.Composition], all_targets: List[pymatgen.core.composition.Composition], target_mixture: Dict[pymatgen.core.composition.Composition, Dict[pymatgen.core.composition.Composition, float]], atom_set, temperature: float, gas_partials: Optional[Dict[pymatgen.core.composition.Composition, float]], use_scan=False)

Find competing pairwise reactions at a interface by finding largest driving force.

Parameters
  • precursors – List of precursors to consider.

  • open_comp – List of open compositions such as gas.

  • all_targets – List of possible target compounds.

  • target_mixture – Composition mixture.

  • atom_set – Set of “non-volatile” atoms.

  • temperature – Temperature of the reaction.

  • gas_partials – Dictionary containing gas partial pressures.

  • use_scan – Whether to use SCAN database.

Utility to quickly balance reactions.

exception s4.cascade.balance.CannotBalance

Bases: ValueError

A reaction cannot be balanced.

s4.cascade.balance.quick_balance(precursors: List[Union[pymatgen.core.composition.Composition, str]], open_comp: List[Union[pymatgen.core.composition.Composition, str]], target: Union[pymatgen.core.composition.Composition, str]) Tuple[List[float], List[float]]

Balance a chemical reaction.

Parameters
  • precursors – List of precursors.

  • open_comp – List of open compositions.

  • target – The target material.

Returns

Coefficients for the precursor and open materials.