API References
Contents in module s4.thermo.calc
Module for computing thermodynamics using calculated databases (Materials Project).
The general application of this module is to compute thermodynamics quantities for calculated material entries. The following is the step to do this:
- Obtain the system of interest using
query_system_mp()
or
- Obtain the system of interest using
- Obtain the zero-temperature formation enthalpy using
- Interpolate the finite temperature Gibbs energy of formation using
- s4.thermo.calc.compute_corrected_dgf_mp(mp_entry: pymatgen.entries.computed_entries.ComputedEntry, temperature: float = 300) float
Compute corrected \(\Delta G_f(T)\) in ev/atom for a material in the Materials Project.
We apply three corrections as follows:
Elemental contributions as used in the Materials Project.
For carbonates, we apply a correction for CO32- anion (fitted by Haoyan).
Chris Bartels’ finite \(dG_f(T)\) correction.
- Parameters
mp_entry – A computed entry retrieved from the Materials Project.
temperature – Temperature for which \(dG_f(T)\) will be computed.
- Returns
The corrected gibbs energy of formation at the temperature specified.
- s4.thermo.calc.compute_corrected_dgf_scan(mp_entry: pymatgen.entries.computed_entries.ComputedEntry, temperature: float = 300) float
Compute corrected \(\Delta G_f(T)\) in ev/atom for a material in the SCAN database.
We apply only one corrections as follows:
Chris Bartels’ finite \(dG_f(T)\) correction.
- Parameters
mp_entry – A computed entry retrieved from the SCAN database.
temperature – Temperature for which \(dG_f(T)\) will be computed.
- Returns
The corrected gibbs energy of formation at the temperature specified.
- s4.thermo.calc.compute_corrected_dhf_mp(mp_entry: pymatgen.entries.computed_entries.ComputedEntry) float
Compute corrected \(\Delta H_f(0K)\) in ev/atom for a material in the Materials Project.
We apply two corrections as follows:
Elemental contributions as used in the Materials Project.
For carbonates, we apply a correction for CO32- anion (fitted by Haoyan).
- Parameters
mp_entry – A computed entry retrieved from the Materials Project.
- Returns
The corrected formation enthalpy at zero Kelvin.
- s4.thermo.calc.compute_corrected_dhf_scan(mp_entry: pymatgen.entries.computed_entries.ComputedEntry) float
Compute corrected \(\Delta H_f(0K)\) in ev/atom for a material in the SCAN database.
- Parameters
mp_entry – A computed entry retrieved from the SCAN database.
- Returns
The corrected formation enthalpy at zero Kelvin.
- s4.thermo.calc.finite_dg_correction(mp_entry: pymatgen.entries.computed_entries.ComputedEntry, temperature: float, dhf: float) float
Compute finite-temperature \(dG(T)\) correction using Chris Bartel’s method, see [Chris2018].
- Parameters
mp_entry – The entry to a Materials Project entry, which must contain the volume of the structure.
temperature – Finite temperature for which \(dG(T)\) is approximated.
dhf – Zero-temperature formation enthalpy.
- Returns
Interpolated gibbs energy of formation at finite temperature.
- Chris2018
Bartel, Christopher J., et al. “Physical descriptor for the Gibbs energy of inorganic crystalline solids and temperature-dependent materials chemistry.” Nature communications 9.1 (2018): 1-10.
- s4.thermo.calc.query_system_mp(system: Union[List[str], str]) List[pymatgen.entries.computed_entries.ComputedEntry]
Query the Materials Project to get a list of compositions within a system or a specific composition. There are multiple ways of specifying a system:
List of chemical elements, e.g., Ba-Ti-O.
Exact composition, e.g., BaCO3.
- Parameters
system – Chemical system identifier.
- Returns
List of matched entries in the Materials Project.
- s4.thermo.calc.query_system_scan(system: Union[List[str], str]) List[pymatgen.entries.computed_entries.ComputedEntry]
Query Ryan’s SCAN database to get a list of compositions within a system or a specific composition. There are multiple ways of specifying a system:
List of chemical elements, e.g., Ba-Ti-O.
Exact composition, e.g., BaCO3.
- Parameters
system – Chemical system identifier.
- Returns
List of matched entries in the SCAN database.
Contents in module s4.thermo.exp
Module for computing thermodynamics using the FREED database [FREED].
- class s4.thermo.exp.EnthalpyEq(A: float, B: float, C: float, D: float, E: float, F: float, T_transition: float, H_transition: Optional[float], comment: Optional[str])
Bases:
object
Represents an enthalpy equation in the form of:
\(H(T) - H_0(298K) = AT + BT^2 + \frac{C}{T} + DT^{\frac{1}{2}} + ET^3 + F\)
- A: float
Coefficient \(A\).
- B: float
Coefficient \(B\).
- C: float
Coefficient \(C\).
- D: float
Coefficient \(D\).
- E: float
Coefficient \(E\).
- F: float
Coefficient \(F\).
- H_transition: Optional[float]
Transition enthalpy value.
- T_transition: float
Transition temperature (upper bound of this equation).
- comment: Optional[str]
Comments on this enthalpy equation.
- class s4.thermo.exp.ExpThermoDatabase(freed_db_fn='./FREED 11.0.xlsm')
Bases:
object
A database containing all entries in FREED database.
- dgf(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, unit='ev', allow_extrapolate=False) float
Get the formation gibbs free energy of a compound at desired temperature.
- dhf(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, unit='ev', allow_extrapolate=False) float
Get the formation enthalpy of a compound at desired temperature.
- h(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, unit='ev', allow_extrapolate=False) float
Get the enthalpy of a compound at desired temperature.
- Parameters
composition – Composition of the compound.
temperature – Desired temperature.
unit – Return unit.
allow_extrapolate – Whether an extrapolated value is accepted.
- Returns
- class s4.thermo.exp.FREEDEntry(formula: str, desc: str, name: str, mineral_name: str, molar_mass: float, density: float, h0: float, ent0: float, tmax: float, dh_eqs: List[s4.thermo.exp.freed.EnthalpyEq], dgf_eqs: List[s4.thermo.exp.freed.GibbsFormationEq])
Bases:
object
A verbatim entry in FREED database.
- density: float
Density of the material.
- desc: str
Description of the material.
- dgf(temperature: float, unit='ev', allow_extrapolate=False) float
Compute gibbs energy of formation at a temperature. If you specify allow_extrapolate=True and the temperature falls outside of the range defined by FREED database, you will get an extrapolated value.
- Parameters
temperature – Temperature for the energy calculation.
unit – Can be one of {‘ev’, ‘ev/atom’, ‘kcal’, ‘kj’, ‘cal’}
allow_extrapolate – Whether extrapolation is allowed.
- Returns
Gibbs energy of formation at the specified temperature
- dgf_eqs: List[s4.thermo.exp.freed.GibbsFormationEq]
List of Gibbs energy equations.
- dh_eqs: List[s4.thermo.exp.freed.EnthalpyEq]
List of enthalpy equations.
- ent0: float
Entropy of formation at 298K (unit in cal/kelvin).
- formula: str
Formula of the compound.
- static from_row(row: pandas.core.series.Series)
Construct a <FREEDEntry> from a pandas row.
- h(temperature: float, unit='ev', allow_extrapolate=False) float
Compute enthalpy of formation at a temperature. If you specify allow_extrapolate=True and the temperature falls outside of the range defined by FREED database, you will get an extrapolated value.
- Parameters
temperature – Temperature for the enthalpy calculation.
unit – Can be one of {‘ev’, ‘ev/atom’, ‘kcal’, ‘kj’, ‘cal’}
allow_extrapolate – Whether extrapolation is allowed.
- Returns
Enthalpy of formation at the specified temperature.
- h0: float
Standard enthalpy of formation at 298K (unit in cal).
- mineral_name: str
Mineral name of the material.
- molar_mass: float
Molar mass of the material.
- name: str
Name of the material.
- plot_dgf(temp_range=None, unit='ev/atom', ax=None, show=True) None
Plot Gibbs energy of formation at various temperatures
- Parameters
temp_range – The range of temperatures. If None, defaults to 298K-TMax.
unit – Unit of the plot
ax – If not None, plot to an existing axis.
show – If True, display this plot after function call.
- plot_h(temp_range=None, unit='ev/atom', ax=None, show=True) None
Plot enthalpy at various temperatures
- Parameters
temp_range – The range of temperatures. If None, defaults to 298K-TMax.
unit – Unit of the plot
ax – If not None, plot to an existing axis.
show – If True, display this plot after function call.
- tmax: float
Maximal characterized temperature.
- class s4.thermo.exp.GibbsFormationEq(A: float, B: float, C: float, D: float, E: float, F: float, G: float, T_transition: float)
Bases:
object
Represents an gibbs formation energy equation in the form of:
\(\Delta G_f(T) = AT\ln(T) + BT + CT^2 + \frac{D}{T} + ET^{\frac{1}{2}} + FT^3 + G\)
- A: float
Coefficient \(A\).
- B: float
Coefficient \(B\).
- C: float
Coefficient \(C\).
- D: float
Coefficient \(D\).
- E: float
Coefficient \(E\).
- F: float
Coefficient \(F\).
- G: float
Coefficient \(G\).
- T_transition: float
Transition temperature (upper bound of this equation).
Contents in module s4.tmr
- class s4.tmr.ExpDeterminedMaterial(composition: pymatgen.core.composition.Composition)
Bases:
object
Material with experimentally measured (from FREED, see
s4.thermo.exp.ExpThermoDatabase
) thermodynamic quantities.- composition: pymatgen.core.composition.Composition
The composition of the material.
- dgf(temperature: float) float
Returns the Gibbs energy of formation of this material in ev/atom.
- Parameters
temperature – Temperature for which Gibbs energy of formation is calculated..
- Returns
Gibbs energy of formation at the specified temperature.
- dhf(temperature: float) float
Returns the formation enthalpy of this material in ev/atom.
- Parameters
temperature – Temperature for which formation enthalpy is calculated.
- Returns
Formation enthalpy at the specified temperature.
- static from_material_dict(material, allow_many=False)
Generate from material data. The parameter material should contain a field named “thermo”. The following is an example:
material = { "thermo": [ { "interpolation": "BaCO3", "formula": "BaCO3", } ]}
- class s4.tmr.GasMaterial(composition: pymatgen.core.composition.Composition, fugacity: float)
Bases:
object
Gaseous material, such as O2, CO2, etc.
- composition: pymatgen.core.composition.Composition
Composition of the gas.
- dgf(temperature: float) float
Returns the Gibbs energy of formation of this material in ev/atom.
- Parameters
temperature – Temperature for which Gibbs energy of formation is calculated..
- Returns
Gibbs energy of formation at the specified temperature.
- dhf(temperature: float) float
Returns the formation enthalpy of this material in ev/atom.
- Parameters
temperature – Temperature for which formation enthalpy is calculated.
- Returns
Formation enthalpy at the specified temperature.
- fugacity: float
Effective partial pressure of the gas.
- mu(temperature: float) float
Returns the chemical potential of this gas in ev/atom
- Parameters
temperature – Temperature for which chemical potential is calculated..
- Returns
Chemical potential at the specified temperature.
- exception s4.tmr.InvalidMPIdError
Bases:
KeyError
The material cannot be associated with a Materials Project ID.
- class s4.tmr.MPInterpolatedMaterial(compositions: List[pymatgen.core.composition.Composition], amounts: List[float], mp_entries: List[pymatgen.entries.computed_entries.ComputedEntry])
Bases:
object
Material with thermo quantities interpolated using MP.
- amounts: List[float]
- property composition
Total composition.
- compositions: List[pymatgen.core.composition.Composition]
- dgf(temperature: float) float
Returns the dGf of this interpolated material in ev/atom.
- Parameters
temperature – Temperature of dGf.
- Returns
- dhf(temperature) float
Returns the dHf of this interpolated material in ev/atom.
- Returns
- static from_material_dict(material, allow_many=False)
Generate an instance from materials data.
- mp_entries: List[pymatgen.entries.computed_entries.ComputedEntry]
- property polytypes
List of polytypes in the structure as predicted by geometry analyzer.
- class s4.tmr.MPUniverseInterpolation(mp_space=None, mp_data_by_temp=None)
Bases:
object
Interpolate a unknown material to the Materials Project (MP) using all compounds in the MP.
This code is adapted from the original version by Christopher J. Bartel.
The interpolation is done by optimizing the geometry energy, calculated by \(-\exp(-D)\) where \(D=sqrt(\sum fractional\_comp\_diff^2)\), under the constraint that all atoms must conserve during optimization.
- static competing_phases(composition: pymatgen.core.composition.Composition, target_space: Set[str], neighbors, neighbor_energies) Tuple[Dict, List[pymatgen.core.composition.Composition]]
Find all competing phases for a target compound given list of neighboring compounds.
- Parameters
composition – Composition to search for.
target_space – Set of chemical elements in the target composition.
neighbors – Neighboring compounds.
neighbor_energies – Energies of the neighboring compounds.
- Returns
Energies of the neighboring compounds, and the energy hull.
- static geometry_energy(composition: pymatgen.core.composition.Composition, target_space, neighbors)
Compute geometry energy given a target compound and a list of neighboring compounds.
- interpolate(composition: Union[str, pymatgen.core.composition.Composition]) Dict[pymatgen.core.composition.Composition, Dict[str, float]]
Interpolate a composition using compounds in the Materials Project.
The returned data looks like the following:
result = { Composition("BaO"): {'amt': 1.0, 'E': 0.0} }
- Parameters
composition – The composition to interpolate.
- Returns
Dictionary that contains the decomposed compositions and their information.
- property mp_data_by_temp
MP thermodynamic data by temperature.
- property mp_space
Dictionary describing the MP universe.
- neighbors(composition: pymatgen.core.composition.Composition, target_space)
Compute the neighboring compounds for a given compound.
- static optimize_energy(composition: pymatgen.core.composition.Composition, target_space: Set[str], hull: Dict[pymatgen.core.composition.Composition, Dict[str, float]], competing_phases: List[pymatgen.core.composition.Composition]) scipy.optimize.optimize.OptimizeResult
Optimize geometry energy and find the combination of competing phases that generate the lowest geometry energy.
- Parameters
composition – Composition of the target material.
target_space – List of chemical elements in the target material.
hull – Dictionary whose keys are phases and values are {‘E’: energy}.
competing_phases – List of compositions as competing phases.
- Returns
Optimization result that contains the solution.
- class s4.tmr.MaterialWithEnergy(thermo: Union[s4.tmr.thermo.MPInterpolatedMaterial, s4.tmr.thermo.GasMaterial], composition: pymatgen.core.composition.Composition, is_target: bool, side: str, amt: float)
Bases:
object
A material with thermodynamics quantities resolved.
- amt: float
The amount of the material.
- composition: pymatgen.core.composition.Composition
The composition of this material.
- is_target: bool
Whether this material is a target material.
- side: str
Which side of the reaction this material belongs to.
- thermo: Union[s4.tmr.thermo.MPInterpolatedMaterial, s4.tmr.thermo.GasMaterial]
The resolved thermodynamic entity.
- class s4.tmr.ReactionEnergies(target: pymatgen.core.composition.Composition, species: List[s4.tmr.entry.MaterialWithEnergy], vars_sub: Dict[str, float])
Bases:
object
Reaction with thermodynamics quantities.
- dgcrxn(temperature: float) float
Compute the grand canonical energy of reaction at the specified temperature.
- Parameters
temperature – Temperature for which the reaction grand canonical energy should be calculated.
- Returns
Calculated grand canonical energy of reaction at the specified temperature.
- dgrxn(temperature: float) float
Compute the Gibbs energy of reaction at the specified temperature.
- Parameters
temperature – Temperature for which the reaction Gibbs energy should be calculated.
- Returns
Calculated Gibbs energy of reaction at the specified temperature.
- property reaction_string: str
String representation of the reaction.
- species: List[s4.tmr.entry.MaterialWithEnergy]
Species involved in this reaction.
- target: pymatgen.core.composition.Composition
The target composition.
- vars_sub: Dict[str, float]
Variable substitutions
- class s4.tmr.ReactionEntry(k: str, reaction_string: str, exp_t: float, exp_time: float, reactions: List[s4.tmr.entry.ReactionEnergies])
Bases:
object
An entry derived from a text-mined recipe.
- exp_t: float
Experimental temperature.
- exp_time: float
Experimental time.
- experimental_dgcrxn() Dict[pymatgen.core.composition.Composition, float]
Grand canonical free energy of the reaction.
- experimental_dgrxn() Dict[pymatgen.core.composition.Composition, float]
Gibbs free energy of the reaction.
- static from_reaction_entry(data, key, override_fugacity=None, use_database='mp')
Compute ReactionEntry from text-mined data, identified by keys.
- Parameters
data – Dictionary consisting of reaction entries.
key – Key of entry.
override_fugacity – If not None, set the fugacity (effective partial pressure of certain gases.
use_database – Which thermodynamic database to use. Choices are {‘mp’, ‘freed’}.
- k: str
Identifier or key of the reaction.
- reaction_string: str
Reaction string.
- reactions: List[s4.tmr.entry.ReactionEnergies]
List of reactions contained in this recipe.
- s4.tmr.from_reactions(data: Dict[str, dict], keys: List[str], *, override_fugacity=None, use_database='mp')
Generate list of ReactionEntry.
- Parameters
data – Dictionary containing solid-state synthesis dataset.
keys – List of keys for which reactions are generate.
override_fugacity – Override some of the fugacities.
use_database – Which database to use.
- Returns
- s4.tmr.from_reactions_multiprocessing(data: Dict[str, dict], keys: List[str], *, processes=16, suppress_output=True, use_database='mp', override_fugacity: Optional[Dict[pymatgen.core.composition.Composition, float]] = None, return_errors: bool = False) Union[Dict[str, s4.tmr.entry.ReactionEntry], Tuple]
Generate list of ReactionEntry using multiprocessing.
- Parameters
data – Dictionary containing solid-state synthesis dataset.
keys – List of keys for which reactions are generate.
processes – Number of processes to use.
suppress_output – Suppress the output of workers.
use_database – Which database to use.
override_fugacity – Override some of the fugacities.
return_errors – Whether or not to return errors.
- Returns
Contents in module s4.ml
Computes features for each reaction.
- class s4.ml.features.Featurizer(thermo_temps=None)
Bases:
object
This featurizer computes all 133 features used in this work.
The features are divided into four types: 1. Precursor properties (melting points, Gibbs formation energy, formation enthalpy, etc.) 2. Target compositions. 3. Experiment-adjacent features. 4. Thermodynamic driving forces.
- elements = ['Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Hg', 'Tl', 'Pb', 'Bi', 'Th', 'U']
Chemical elements that qualify as a compositional feature.
- featurize(reaction: s4.tmr.entry.ReactionEnergies, exp_t: float, exp_time: float, k: str = '', i: int = 0, tms_data: Optional[Dict] = None, calculate_thermo_bounds=True) Dict[str, float]
Featurize a reaction using the 133 features.
- Parameters
reaction – The reaction to compute features for.
k – Key of the recipe.
i – Index of the reaction.
exp_t – Experimental temperature in Celsius.
exp_time – Experimental time.
tms_data – Text-mined dataset dictionary.
calculate_thermo_bounds – Whether to calculate synthesis temperature bounds predicted by thermodynamic.
- Returns
Features for this reaction.
- thermo_temps = [1073.15, 1173.15, 1273.15, 1373.15, 1473.15, 1573.15]
Temperatures for which thermodynamic driving forces are calculated.
Plotting utility functions.
- s4.ml.plotting.generate_pt_triangulation(ncols: int, rows2id: Dict[int, int], padding=(0.1, 0.1, 0.1, 0.1), spacing=1.5, header=0.5)
Generate triangulation for plotting side-by-side periodic tables.
- Parameters
ncols – Number of columns.
rows2id – Periodic-table row id to index-based row id.
padding – Padding of the triangles inside each box.
spacing – Height of each box.
header – Height of the header (i.e., chemical element symbol).
- Returns
Generated triangulation.
- s4.ml.plotting.periodic_table_heatmap(elemental_data: Dict[str, float], cbar_label: str = '', cbar_label_size: int = 14, show_plot: bool = False, cmap: str = 'YlOrRd', cmap_range: Optional[Tuple[float, float]] = None, blank_color: str = 'w', value_format: Optional[str] = None, max_row: int = 9, ax=None)
Plot heatmap in a periodic table.
Copied from pymatgen.util.plotting.
- Parameters
elemental_data – The dictionary of elemental data.
cbar_label – Label for the colorbar.
cbar_label_size – Label size for the colorbar.
show_plot – Whether to show this plot after plotting.
cmap – What colormap to use.
cmap_range – Range of the colormap.
cax – The ax to put colorbar.
blank_color – Color of blank elemental data.
value_format – Formatter to use for values.
max_row – Maximal number of rows.
ax – Ax to put the periodic table.
- s4.ml.plotting.periodic_table_heatmap_sbs(elemental_data1: Dict[str, float], elemental_data2: Dict[str, float], cbar_label: str = '', cbar_label_size: int = 10, show_plot: bool = False, cmap: str = 'YlOrRd', cmap_range: Optional[Tuple[float, float]] = None, cax=None, blank_color: str = 'w', value_format: Optional[str] = None, include_rows: Optional[List[int]] = None, ax=None)
Plot heatmaps side-by-side.
- Parameters
elemental_data1 – The first set of elemental data.
elemental_data2 – The second set of elemental data.
cbar_label – Label for the colorbar.
cbar_label_size – Label size for the colorbar.
show_plot – Whether to show this plot after plotting.
cmap – What colormap to use.
cmap_range – Range of the colormap.
cax – The ax to put colorbar.
blank_color – Color of blank elemental data.
value_format – Formatter to use for values.
include_rows – What rows to include in the plot.
ax – Ax to put the periodic table.
- s4.ml.plotting.scatter_matrix_plot(data_frame: pandas.core.frame.DataFrame, figsize: Tuple[int, int] = (10, 10), fontsize: int = 9, confusion_matrix_fontsize: int = 9, hist_bins: int = 20, binary_bins: int = 20, scatter_alpha: float = 0.3, scatter_size: int = 2)
Plot scatter matrix, just as pandas.plotting.scatter_matrix.
- Parameters
data_frame – The data frame to plot.
figsize – Figure size.
fontsize – Fontsize in the plot.
confusion_matrix_fontsize – Fontsize for the labels in the confusion matrix.
hist_bins – Number of bins for histograms.
binary_bins – Number of bins for the binary variable histograms.
scatter_alpha – Alpha of the points in the scatter plot.
scatter_size – Size of the points in the scatter plot.
Simple linear models (OLS) and weighted least squares (WLS)
- s4.ml.wls.feature_importance(data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, *, do_wls: bool = True, weights=None, submodel_search=100, processes=None)
Estimate average partial importance.
The feature importance is calculated by sampling N=submodel_search submodels, since the total size of the submodel space may be huge to sample and calculate.
- Parameters
data_frame – Training data frame.
features – List of features to use.
y_name – Prediction target name.
do_wls – Whether or not to perform WLS.
weights – Sample weights.
submodel_search – Number of sub-models when computing importance.
processes – Number of processes.
- s4.ml.wls.forward_selection_by_bic(data_frame: pandas.core.frame.DataFrame, features_sorted: List[str], y_name: str, weights: Optional[numpy.ndarray] = None, do_wls: bool = False)
Perform feature forward selection by optimizing BIC (Bayesian Information Criteria). The features are taken according to the order defined by features_sorted.
- Parameters
data_frame – The training data frame.
features_sorted – Features to use. Features are taken in order.
y_name – Prediction target name.
weights – If do_wls=True, use this as sample weights.
do_wls – Whether to use Weighted Least Squares.
- s4.ml.wls.linear_model_analysis(data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, y_desc: str, *, do_wls: bool = True, weights: Optional[numpy.ndarray] = None, do_feature_selection: bool = True, do_plot: bool = True, processes=None)
Perform feature importance analysis and feature selection on a dataset using OLS/WLS.
- Parameters
data_frame – Training data frame.
features – List of features to use.
y_name – Prediction target name.
y_desc – Description of prediction target, will used in plots.
do_wls – Whether or not to perform WLS.
weights – Sample weights.
do_feature_selection – Whether or not to perform feature selection.
do_plot – Whether or not to plot data.
processes – Number of processes to use.
Functions for creating nonlinear models for synthesis prediction.
- s4.ml.nonlinear.custom_grid_search(estimator_cls, params, data_frame, features, y_name, weights=None, do_normalization=False, cv_method='loo', processes=None)
Perform Grid search CV and find the best parameters.
- Parameters
estimator_cls – scikit-learn learner class.
params – Parameters to use for estimator_cls.
data_frame – Training data frame.
features – Features to use.
y_name – Prediction target variable name.
weights – Sample weights.
do_normalization – Whether to do normalization for training features.
cv_method – Cross-validation method.
processes – Number of multiprocessing processes to use.
- s4.ml.nonlinear.estimate_average_partial_importance(estimator: sklearn.base.RegressorMixin, data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, *, cv_method: Union[int, str] = 'loo', weights: Optional[numpy.ndarray] = None, features_inspect: Optional[List[str]] = None, submodel_search: int = 100, processes: Optional[int] = None)
Estimate average partial importance using cross-validation method specified.
Since the total size of the submodel space may be huge to sample and calculate, the feature importance is calculated by sampling N=submodel_search submodels,
- Parameters
estimator – scikit-learn estimator to use.
data_frame – Training data frame.
features – List of features to use.
y_name – Prediction target name.
cv_method – What cross-validation to use, can be ‘loo’, int (k-fold).
weights – Sample weights.
features_inspect – What features to use when computing importance.
submodel_search – Number of sub-models when computing importance.
processes – Number of multiprocessing processes.
- s4.ml.nonlinear.model_cv_analysis(data_frame: pandas.core.frame.DataFrame, features: List[str], y_name: str, y_desc: str, *, weights: Optional[numpy.ndarray] = None, estimator: Optional[sklearn.base.RegressorMixin] = None, do_normalization: bool = False, cv_method: Union[str, int] = 'loo', do_plot: bool = True, processes: Optional[int] = None, display_pbar: bool = True)
Perform cross validation on a dataset using multiprocessing.
- Parameters
estimator – scikit-learn estimator to use.
data_frame – Training data frame.
features – List of features to use.
y_name – Prediction target name.
y_desc – Description of prediction target, will used in plots.
do_normalization – Do normalization before running model.
cv_method – What cross-validation to use, can be ‘loo’, int (k-fold).
weights – Sample weights.
processes – Number of multiprocessing processes.
do_plot – Whether or not to plot data.
display_pbar – Whether or not to display progress bar.
Contents in module s4.cascade
Stuff for analyzing cascades.
- s4.cascade.analysis.compute_cascade(reaction: s4.tmr.entry.ReactionEnergies, step_temperatures: List[float], only_icsd: bool = True, use_scan: bool = False)
Compute cascade for a reaction.
- Parameters
reaction – The reaction to calculate cascades for.
step_temperatures – List of temperatures for the cascade reactions.
only_icsd – Only use ICSD universe.
use_scan – Use SCAN energies.
- Returns
List of pairwise reactions.
Thermodynamic calculations of the cascade model.
- class s4.cascade.thermo.ReactionDrivingForce(reaction_string: str, driving_force: float, reactants: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]], gases: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]])
Bases:
object
Calculated driving force of a inorganic synthesis reaction.
- driving_force: float
- gases: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]]
- reactants: List[Tuple[Union[pymatgen.core.composition.Composition, str], float]]
- reaction_string: str
- s4.cascade.thermo.get_dgf_fu(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, use_mp=False, use_scan=False) float
Get the gibbs formation energy for a material. If use_mp=True, the value is from Materials Project with finite gibbs formation energy interpolation. Otherwise, it’s obtained from FREED (experimental) database.
- Parameters
composition – Composition of the material.
temperature – Temperature of the material.
use_mp – Whether to use MP data.
use_scan – Whether to use SCAN data.
- Returns
Gibbs formation energy of the compound.
- s4.cascade.thermo.get_gas_mu(composition: Union[pymatgen.core.composition.Composition, str], temperature: float, fugacity: float = 1.0) float
Compute chemical potential of gas. Enthalpy values are from the FREED experimental database. Entropy values are from NIST data files.
- Parameters
composition – Composition of the gas.
temperature – Temperature of the gas.
fugacity – Fugacity (partial pressure) of the gas.
- Returns
Gas molecule chemical potential.
- s4.cascade.thermo.reaction_driving_force(precursors: List[Union[pymatgen.core.composition.Composition, str]], open_comp: List[Union[pymatgen.core.composition.Composition, str]], target: Union[pymatgen.core.composition.Composition, str], target_mixture: Dict[s4.thermo.utils.as_composition, float], atom_set: Set[pymatgen.core.periodic_table.Element], temperature: float, gas_partials: Dict[s4.thermo.utils.as_composition, float], use_scan: bool) Optional[s4.cascade.thermo.ReactionDrivingForce]
Balance reaction and compute grand canonical driving force at once. The computed driving forces are in ev/metal_atom. Note that the reaction equation is normalized to have 1 metal atom per target composition.
- Parameters
precursors – List of precursors of the reaction.
open_comp – List of open compositions of the reaction.
target – Target material composition.
target_mixture – Dictionary containing the mixtures of target material.
atom_set – Set of “non-volatile” atoms.
temperature – Temperature of the reaction.
gas_partials – Dictionary containing gas partial pressures.
use_scan – Whether to use SCAN database.
- Returns
Calculated reaction driving force, or None if no reaction can be balanced.
- s4.cascade.thermo.reduce_formula(sym_amt: Mapping[pymatgen.core.periodic_table.Element, float], iupac_ordering: bool = False) Tuple[str, int]
Faster implementation of pymatgen.periodic_table.reduce_formula.
The original pymatgen implementation is too slow. For example, some conversions between string and Element are not necessary. Since we will call this function for significantly many times, we need to optimize this function as much as possible.
- Parameters
sym_amt – Dictionary that contains {Elements: amount}
iupac_ordering – Whether to use IUPAC ordering.
- Returns
The reduced composition as string and the factor of reduction.
Greedy finding competing pairwise reactions.
- s4.cascade.greedy_resolver.find_competing_reactions(precursors: List[pymatgen.core.composition.Composition], open_comp: List[pymatgen.core.composition.Composition], all_targets: List[pymatgen.core.composition.Composition], target_mixture: Dict[pymatgen.core.composition.Composition, Dict[pymatgen.core.composition.Composition, float]], atom_set, temperature: float, gas_partials: Optional[Dict[pymatgen.core.composition.Composition, float]], use_scan=False)
Find competing pairwise reactions at a interface by finding largest driving force.
- Parameters
precursors – List of precursors to consider.
open_comp – List of open compositions such as gas.
all_targets – List of possible target compounds.
target_mixture – Composition mixture.
atom_set – Set of “non-volatile” atoms.
temperature – Temperature of the reaction.
gas_partials – Dictionary containing gas partial pressures.
use_scan – Whether to use SCAN database.
The cascade model implementation.
- s4.cascade.cascade_model.find_cascade_reaction(precursors: List[pymatgen.core.composition.Composition], open_comp: List[pymatgen.core.composition.Composition], temperature: float, target_mixture: Dict[pymatgen.core.composition.Composition, Dict[pymatgen.core.composition.Composition, float]], gas_partials: Optional[Dict[pymatgen.core.composition.Composition, float]] = None, use_scan=False, only_icsd=True)
Find pairwise reaction by comparing energies at the interface.
- s4.cascade.cascade_model.find_competing_reactions(precursors: List[pymatgen.core.composition.Composition], open_comp: List[pymatgen.core.composition.Composition], all_targets: List[pymatgen.core.composition.Composition], target_mixture: Dict[pymatgen.core.composition.Composition, Dict[pymatgen.core.composition.Composition, float]], atom_set, temperature: float, gas_partials: Optional[Dict[pymatgen.core.composition.Composition, float]], use_scan=False)
Find competing pairwise reactions at a interface by finding largest driving force.
- Parameters
precursors – List of precursors to consider.
open_comp – List of open compositions such as gas.
all_targets – List of possible target compounds.
target_mixture – Composition mixture.
atom_set – Set of “non-volatile” atoms.
temperature – Temperature of the reaction.
gas_partials – Dictionary containing gas partial pressures.
use_scan – Whether to use SCAN database.
Utility to quickly balance reactions.
- exception s4.cascade.balance.CannotBalance
Bases:
ValueError
A reaction cannot be balanced.
- s4.cascade.balance.quick_balance(precursors: List[Union[pymatgen.core.composition.Composition, str]], open_comp: List[Union[pymatgen.core.composition.Composition, str]], target: Union[pymatgen.core.composition.Composition, str]) Tuple[List[float], List[float]]
Balance a chemical reaction.
- Parameters
precursors – List of precursors.
open_comp – List of open compositions.
target – The target material.
- Returns
Coefficients for the precursor and open materials.