Tutorial 1: Automated Refinement (with BGMN)#

Dara provides a Python wrapper around the refinement software BGMN, which implements a robust optimization algorithm that can refine automatically in most cases. This tutorial shows how to drive BGMN through Dara and how to submit, adjust, and visualize your refinements.

You can download this tutorial project from here.

%pip install ipywidgets nbformat
from pathlib import Path

from dara.refine import do_refinement_no_saving
data = Path("tutorial_data")
cif_paths = list(data.glob("*.cif"))  # include all the cif files in the data folder

pattern_fn = "CaNi(PO3)4_800_240_Ca(OH)2_(NH4)2HPO4_NiO.xy"

Basic Refinement#

A one-line refinement with the default settings.

Do refinement#

Running do_refinement_no_saving performs the refinement and returns the results as a RefinementResult object. No BGMN refinement output folder is saved to disk.

You only need to provide two inputs:

  • the path to the pattern. Dara currently supports only the xy, xrdml, and raw formats.

  • a list of CIF file paths. Each CIF is used as a reference structure for the refinement.
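Before starting a refinement it can help to confirm that the pattern file has one of the formats listed above. The following is a minimal sketch using only the standard library; the helper name is_supported_pattern and the SUPPORTED_SUFFIXES set are illustrative and not part of Dara.

```python
from pathlib import Path

# Formats named in this tutorial (assumption: matched by file extension only).
SUPPORTED_SUFFIXES = {".xy", ".xrdml", ".raw"}


def is_supported_pattern(path):
    """Return True if the file extension is one of the formats listed above."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


print(is_supported_pattern("CaNi(PO3)4_800_240_Ca(OH)2_(NH4)2HPO4_NiO.xy"))  # True
print(is_supported_pattern("scan.csv"))  # False
```

This checks only the extension, not the file contents, so a wrongly named file would still pass.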

refinement = do_refinement_no_saving(data / pattern_fn, cif_paths)
2026-06-08 02:48:56,506 WARNING dara.bgmn_worker BGMN executable not found. Downloading BGMN.
/opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/site-packages/pymatgen/core/composition.py:1372: FutureWarning: gcd is deprecated, and will be removed on 2028-01-01
Use math.gcd instead.
  factor = abs(gcd(*(int(i) for i in sym_amt.values())))

Visualization#

Call visualize to plot the refinement results: the observed, calculated, and difference patterns.

refinement.visualize()

Save the refinement plot#

Optionally, if you want to share the plot with others, you can save it by calling write_html or write_image on the plotly Figure object returned by .visualize(). The plot is written to disk. Note that write_image needs a static image export backend such as the kaleido package.

fig = refinement.visualize()  # build the figure once and reuse it
fig.write_html("tutorial_refinement.html")  # output the interactive html file to the disk
fig.write_image("tutorial_refinement.png")  # output the png image to the disk

Extracting information from the refinement#

After finishing refinement, you can read information from the RefinementResult object. The object contains the following attributes:

  • lst_data: information about phases, metrics of refinement (from the .lst file in BGMN)

  • peak_data: the simulated peaks in the calculated pattern

  • plot_data: x (two-theta), y_obs, y_calc, y_bkg, contribution from each phase. This is mainly used for visualization.

For example, you can get the Rwp from lst_data:

f"The refinement has Rwp = {refinement.lst_data.rwp} %"
'The refinement has Rwp = 5.24 %'
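For orientation, Rwp is the weighted-profile R factor, commonly defined as sqrt(Σ w_i (y_obs,i − y_calc,i)² / Σ w_i y_obs,i²) × 100 %. The sketch below computes it from plain lists such as those in plot_data. The function name and the default counting-statistics weights w_i = 1 / y_obs,i are assumptions for illustration; the value BGMN reports may use different weighting or background handling.

```python
import math


def weighted_profile_r(y_obs, y_calc, weights=None):
    """Weighted-profile R factor in percent (textbook definition)."""
    if weights is None:
        # Assumption: counting-statistics weights, w_i = 1 / y_obs_i.
        weights = [1.0 / y if y > 0 else 0.0 for y in y_obs]
    numerator = sum(w * (o - c) ** 2 for w, o, c in zip(weights, y_obs, y_calc))
    denominator = sum(w * o ** 2 for w, o in zip(weights, y_obs))
    return 100.0 * math.sqrt(numerator / denominator)


# Two-point toy pattern: residuals of 10 and 20 counts.
print(f"Rwp = {weighted_profile_r([100, 400], [90, 420]):.2f} %")
```

A lower Rwp means the calculated pattern tracks the observed one more closely, which is why it is used later in this tutorial to compare refinements.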

You can also get the lattice parameters and the weight fraction of each phase.

Some values are stored as a tuple, where the first element is the value and the second element is its error.

phase_name = "CaNi(PO3)4_15_sym"
phase_result = refinement.lst_data.phases_results[phase_name]

gewicht = phase_result.gewicht
lattice_a = phase_result.a
lattice_b = phase_result.b
lattice_c = phase_result.c
lattice_alpha = phase_result.alpha
lattice_beta = phase_result.beta
lattice_gamma = phase_result.gamma

print(
    f"The lattice parameters of the phase {phase_name} are:\n"
    f"    a = {lattice_a} nm, b = {lattice_b} nm, c = {lattice_c} nm,\n"
    f"    alpha = {lattice_alpha}, beta = {lattice_beta}, gamma = {lattice_gamma}\n"
)
print(f"The weight fraction of the phase {phase_name} is ({gewicht[0]} ± {gewicht[1]}) %")
The lattice parameters of the phase CaNi(PO3)4_15_sym are:
    a = (1.20707, 0.0001) nm, b = (0.869242, 9.2e-05) nm, c = (0.97908, 0.00011) nm,
    alpha = None, beta = (117.9527, 0.0055), gamma = None

The weight fraction of the phase CaNi(PO3)4_15_sym is (0.2163 ± 0.0019) %
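The raw tuples in the output above are easier to read once formatted as value ± error. A minimal sketch follows; format_with_error and nm_to_angstrom are hypothetical helpers, not Dara functions. It treats None, as reported for alpha and gamma above, as "n/a", and converts the nm values to Å for comparison with CIF files.

```python
def format_with_error(value, unit="", digits=5):
    """Format a plain number or a (value, error) tuple; None becomes 'n/a'."""
    if value is None:
        return "n/a"
    if isinstance(value, tuple):
        val, err = value
        return f"{val:.{digits}f} ± {err:.{digits}f}{unit}"
    return f"{value:.{digits}f}{unit}"


def nm_to_angstrom(value):
    """Convert a number or a (value, error) tuple from nm to Å (1 nm = 10 Å)."""
    if value is None:
        return None
    if isinstance(value, tuple):
        return tuple(v * 10 for v in value)
    return value * 10


lattice_a = (1.20707, 0.0001)  # copied from the output above
print(format_with_error(lattice_a, " nm"))                            # 1.20707 ± 0.00010 nm
print(format_with_error(nm_to_angstrom(lattice_a), " Å", digits=4))   # 12.0707 ± 0.0010 Å
print(format_with_error(None))                                        # n/a
```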

Peak data is stored in a pandas DataFrame.

refinement.peak_data
        2theta    intensity        b1            b2    h    k    l              phase  phase_idx
0    13.374690    36.856555  0.004895  8.341699e-12    1    1    0  CaNi(PO3)4_15_sym          0
1    14.301134  1004.319829  0.004895  1.094419e-11    1    1   -1  CaNi(PO3)4_15_sym          0
2    16.858555    21.811937  0.004895  2.127608e-11    2    0    0  CaNi(PO3)4_15_sym          0
3    19.170896   507.016959  0.004895  3.568629e-11    1    1    1  CaNi(PO3)4_15_sym          0
4    19.673643   311.878004  0.004895  3.959430e-11    2    0   -2  CaNi(PO3)4_15_sym          0
...        ...          ...       ...           ...  ...  ...  ...                ...        ...
481  43.435170   560.348820  0.004461  1.129632e-09    2    0    0        NiO_225_sym          1
482  62.965924   271.558717  0.004461  4.491814e-09    2    2    0        NiO_225_sym          1
483  75.453810    96.103098  0.004461  8.478542e-09    3    1    1        NiO_225_sym          1
484  79.430805    70.799557  0.004461  1.008618e-08    2    2    2        NiO_225_sym          1
485  95.010070    29.037994  0.004461  1.791315e-08    4    0    0        NiO_225_sym          1

486 rows × 9 columns

Export the refined structure#

You can also export the refined structure as a pymatgen.Structure object.

structure = refinement.export_structure("CaNi(PO3)4_15_sym")
print(structure.to("tutorial_refinement_refined_CaNi(PO3)4_15_sym.cif", symprec=1e-3))
# generated using pymatgen
data_CaNi(PO3)4
_symmetry_space_group_name_H-M   C2/c
_cell_length_a   12.07070000
_cell_length_b   8.69242000
_cell_length_c   9.79080000
_cell_angle_alpha   90.00000000
_cell_angle_beta   117.95270000
_cell_angle_gamma   90.00000000
_symmetry_Int_Tables_number   15
_chemical_formula_structural   CaNi(PO3)4
_chemical_formula_sum   'Ca4 Ni4 P16 O48'
_cell_volume   907.43746863
_cell_formula_units_Z   4
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
  2  '-x, -y, -z'
  3  '-x, y, -z+1/2'
  4  'x, -y, z+1/2'
  5  'x+1/2, y+1/2, z'
  6  '-x+1/2, -y+1/2, -z'
  7  '-x+1/2, y+1/2, -z+1/2'
  8  'x+1/2, -y+1/2, z+1/2'
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Ca  Ca0  4  0.00000000  0.04710000  0.25000000  1.0
  Ni  Ni1  4  0.25000000  0.25000000  0.50000000  1.0
  P  P2  8  0.00320000  0.72360000  0.47450000  1.0
  P  P3  8  0.18860000  0.49180000  0.19450000  1.0
  O  O4  8  0.03010000  0.86190000  0.41040000  1.0
  O  O5  8  0.06150000  0.25470000  0.42910000  1.0
  O  O6  8  0.07200000  0.60250000  0.15680000  1.0
  O  O7  8  0.13280000  0.36240000  0.06680000  1.0
  O  O8  8  0.21690000  0.08020000  0.33440000  1.0
  O  O9  8  0.22930000  0.42260000  0.34390000  1.0
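As a quick sanity check on the exported CIF, the cell volume can be recomputed from the cell lengths and angles with the general triclinic formula V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2 cosα cosβ cosγ). The sketch below uses only the standard library; cell_volume is an illustrative helper, and the numbers are copied from the CIF output above.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume from lengths (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


# Cell parameters from the exported CIF (Å and degrees).
volume = cell_volume(12.0707, 8.69242, 9.7908, 90.0, 117.9527, 90.0)
print(f"V = {volume:.2f} Å^3")  # should agree with _cell_volume in the CIF
```

The cell lengths in the CIF are ten times the nm values reported in lst_data, because CIF files use Å.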

Refining with customized phase parameters#

The refinement with the default settings looks good. But can it be better?

In BGMN, peak profiles are modeled by separating the instrumental profile from the sample's real-structure profile: the calculated peak is their convolution. The sample broadening is described by a combination of Lorentzian and Gaussian-like functions.

Dara supports the basic refinement parameters in BGMN / Profex. You can adjust them by passing them to the do_refinement_no_saving function.

Common parameters include:

  • lattice_range: you can (and need to) specify the range within which the lattice parameters may vary. A small range, such as 0.01 to 0.03, is usually enough. By default it is applied symmetrically to all lattice parameters (a, b, c, alpha, beta, gamma), but it accepts several forms:

    • a single float r: a symmetric range [a*(1-r), a*(1+r)] applied to all parameters.

    • the string "fixed": all lattice parameters are held fixed and not refined.

    • a tuple (lo, hi): explicit fractional deltas giving bounds [a*(1+lo), a*(1+hi)], which lets you build asymmetric or one-sided windows. The range can span both signs (e.g. (-0.05, 0.2)), allow only positive deltas / expansion (e.g. (0.02, 0.1)), or only negative deltas / contraction (e.g. (-0.1, 0.0)). If the starting value falls outside the window, it is clamped to the nearest bound.

    • a dict mapping parameter names ("A", "B", "C", "ALPHA", "BETA", "GAMMA", case-insensitive) to any of the above, for per-parameter control. Use the wildcard key "*" as a fallback for unlisted parameters; without it, unlisted parameters default to a symmetric 0.1. For example, {"C": "fixed", "A": (-0.05, 0.2), "*": 0.1}.

  • b1 (Lorentzian Size Broadening): controls the Lorentzian broadening of the peak. Physically, it accounts for the broadening caused by the average crystallite (domain) size. A larger b1 gives a broader Lorentzian profile with heavy tails, indicating smaller average crystallites. It is usually constrained to a small range, such as 0 to 0.005. If the fitted b1 is too large, the peaks become so broad that the simulated pattern resembles an amorphous material and can easily be absorbed into the background.

  • k1 (Gaussian Size Broadening): defines the Gaussian-like contribution to the size effects. Physically, it measures the width of the crystallite size distribution: the larger k1 is, the narrower the distribution. It is usually constrained to the range 0 to 1.

  • k2 (Gaussian Strain Broadening): defines the Gaussian-like broadening caused by microstrain (the mean squared strain in the crystal lattice). A larger k2 indicates higher internal strain in the sample. It is usually fixed, for example at 0.

  • gewicht: means “weight” in German. It carries the scale factor information. In BGMN, it can also be used to specify the preferred orientation model for the refinement. A preferred orientation model lets the intensity of a set of reflections vary, which can help fit the pattern better. BGMN decides automatically which reflections to adjust; you only need to specify how strong the preferred orientation may be. Usual choices are SPHAR0 (none), SPHAR2 (two preferred orientation parameters), SPHAR4 (four preferred orientation parameters), and so on up to SPHAR10. The larger the order, the stronger the preferred orientation that can be modeled, but the higher the risk of overfitting.
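To make the lattice_range forms described above concrete, here is a minimal sketch of how a setting could translate into refinement bounds. It follows the rules stated in the list (symmetric float, "fixed", (lo, hi) tuple, dict with "*" fallback and a default of 0.1, clamping of the start value). The functions setting_for and window are hypothetical illustrations and are not Dara's implementation.

```python
def setting_for(lattice_range, name):
    """Resolve the setting for one lattice parameter (dict keys are case-insensitive)."""
    if isinstance(lattice_range, dict):
        lookup = {key.upper(): value for key, value in lattice_range.items()}
        if name.upper() in lookup:
            return lookup[name.upper()]
        return lookup.get("*", 0.1)  # fallback: wildcard, else symmetric 0.1
    return lattice_range  # a single setting applies to every parameter


def window(start, setting):
    """Return (lower, upper, clamped_start) for a starting value and a setting."""
    if setting == "fixed":
        return start, start, start
    lo, hi = setting if isinstance(setting, tuple) else (-setting, setting)
    lower, upper = start * (1 + lo), start * (1 + hi)
    return lower, upper, min(max(start, lower), upper)


spec = {"C": "fixed", "A": (-0.05, 0.2), "*": 0.1}
for name in ("a", "b", "c"):
    print(name, window(10.0, setting_for(spec, name)))

# Expansion-only window: the start value lies below the window and is clamped up.
print(window(10.0, (0.02, 0.1)))
```

With this spec, a gets an asymmetric window, b falls back to the wildcard, and c stays fixed.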

Input parameter format#

In Dara, all phase parameters are passed as a dictionary argument called phase_params. Each key is a parameter name, and each value is the setting for that parameter. Dara supports three types of values:

  • fixed. This is a string. The parameter is held at its default value (usually 0) and is not refined.

  • (initial value)_(min value)^(max value). This is a string. The parameter starts at the initial value and is refined between the min value and the max value. The min value is prefixed with _, and the max value with ^.

  • Other values. These can be strings or numbers. For example, setting gewicht to SPHAR2 means that preferred orientation is modeled with the SPHAR2 settings; setting lattice_range to 0.05 means that the lattice parameters can vary by up to 5%. See the BGMN / Profex manual for more information on these settings.
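The bounded-value string format described above is easy to mistype, so a small helper can build and parse it. This is a sketch only: bounded and parse_bounded are hypothetical names, and the range check is an added convenience rather than something Dara is known to enforce.

```python
def bounded(initial, minimum, maximum):
    """Build an '(initial)_(min)^(max)' string, e.g. '0_0^0.005'."""
    if not minimum <= initial <= maximum:
        raise ValueError("expected minimum <= initial <= maximum")
    return f"{initial}_{minimum}^{maximum}"


def parse_bounded(text):
    """Split an '(initial)_(min)^(max)' string into three floats."""
    initial, rest = text.split("_", 1)
    minimum, maximum = rest.split("^", 1)
    return float(initial), float(minimum), float(maximum)


print(bounded(0, 0, 0.005))             # 0_0^0.005
print(bounded(0, 0, 1))                 # 0_0^1
print(parse_bounded("0.05_0.01^0.20"))  # (0.05, 0.01, 0.2)
```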

For example, to allow 5% variation in the lattice parameters, refine b1 (start at 0, min 0, max 0.005) and k1 (start at 0, min 0, max 1), fix k2 at 0, and set gewicht to SPHAR2, use the following phase_params dict:

phase_params = {
    "lattice_range": 0.05,
    "b1": "0_0^0.005",
    "k1": "0_0^1",
    "k2": "fixed",
    "gewicht": "SPHAR2"
}
refinement = do_refinement_no_saving(data / pattern_fn, cif_paths, phase_params=phase_params)
refinement.visualize()

Now you can see the Rwp of the refinement is slightly lower, indicating a better fit.

Advanced: custom parameters#

For cases where the standard parameters above aren’t enough, such as refining high-temperature measurements acquired during in-situ XRD, Dara supports two optional arguments that let you inject your own BGMN definitions: custom_params and custom_params_map.

  • custom_params: an optional list of raw BGMN string lines, for defining global parameters or equations you can reference elsewhere, e.g. "PARAM=Bglobal=0.05_0.01^0.20 //".

  • custom_params_map: an optional dictionary mapping element symbols to per-site parameters to add or overwrite, e.g. TDS for thermal displacement and Occ for occupancy. Use the wildcard key "*" to apply parameters to every element not otherwise matched.

See the BGMN / Profex manual for more details on these settings.

custom_params = [
    "PARAM=Bglobal=0.05_0.01^0.20 //",
    "PARAM=BO=0.1_0.02^0.3 //",
    "PARAM=BCo=0.1_0.02^0.3 //",
]

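custom_params_map = {
    "*": {"TDS": "Bglobal"},  # every element not matched below
    # Note: OccO is not declared in custom_params above; it presumably needs its own PARAM line.
    "O": {"TDS": "BO", "Occ": "OccO"},
    "Co": {"TDS": "BCo"},
}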

Specify different parameters for different phases#

In the previous example, the refinement options were applied to all phases. You can also specify different parameters for different phases. To do so, pass a list of RefinementPhase objects to the phases parameter. Any of the parameters defined above can be set differently for one or more individual phases.

from dara import RefinementPhase

phases = [RefinementPhase.make(cif_path) for cif_path in cif_paths]

for phase in phases:
    # use a smaller lattice range for each phase
    phase.params["lattice_range"] = 0.01


refinement = do_refinement_no_saving(
    data / pattern_fn,
    phases=phases,
    # if a parameter is set both in phase_params and in a RefinementPhase object, the value in the RefinementPhase is used.
    phase_params={
        "lattice_range": 0.05,  # <- this is ignored because it has already been set in each RefinementPhase object
        "b1": "0_0^0.005",
        "k1": "0_0^1",
        "k2": "fixed",
        "gewicht": "SPHAR2"
    }
)
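The precedence rule used in the call above (per-phase values win over the shared phase_params) behaves like a dictionary merge. The sketch below illustrates that rule with plain dicts; effective_params is a hypothetical helper and does not reproduce Dara's internal code.

```python
def effective_params(shared_params, phase_specific):
    """Merge settings so that per-phase values override the shared ones."""
    return {**shared_params, **phase_specific}


shared = {"lattice_range": 0.05, "b1": "0_0^0.005", "k2": "fixed"}
per_phase = {"lattice_range": 0.01}  # set on the RefinementPhase object

print(effective_params(shared, per_phase))
# {'lattice_range': 0.01, 'b1': '0_0^0.005', 'k2': 'fixed'}
```

Parameters that a phase does not set itself, such as b1 and k2 here, still come from the shared dict.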

Refining with different instrument profiles, angle range, wavelength, etc.#

If you would like to refine with a different instrument profile, wavelength, or angle range, you can specify these in the refinement function as well.

  • instrument_profile: the instrument profile you would like to use. The default is Aeris-fds-Pixcel1d-Medipix3. You can find the available instrument profiles in the dara/data/BGMN-Templates/Devices folder.

  • wavelength: the wavelength you would like to use in the refinement. It can be one of two types:

    • a number: the wavelength in nm. This is useful when you are analyzing data from a synchrotron.

    • a string: the element symbol of the target (anode) material in the X-ray tube. BGMN can automatically look up the wavelength distribution for the given metal. Currently supported sources are "Cu", "Co", "Cr", "Fe", and "Mo".

  • wmin, wmax: the angle range (two-theta, in degrees) you would like to use in the refinement. They are set in refinement_params.
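Because the numeric wavelength is given in nm while many sources quote Å, a small check before calling the refinement can prevent a factor-of-ten mistake. The sketch below is illustrative only: check_wavelength and angstrom_to_nm are hypothetical helpers, and the set of tube targets is copied from the list above.

```python
TUBE_TARGETS = {"Cu", "Co", "Cr", "Fe", "Mo"}  # sources named in this tutorial


def angstrom_to_nm(value):
    """Convert a wavelength from Å to nm (1 nm = 10 Å)."""
    return value / 10


def check_wavelength(wavelength):
    """Accept a tube target symbol or a positive wavelength in nm."""
    if isinstance(wavelength, str):
        if wavelength not in TUBE_TARGETS:
            raise ValueError(f"unsupported source {wavelength!r}")
        return wavelength
    if isinstance(wavelength, bool) or not isinstance(wavelength, (int, float)):
        raise TypeError("wavelength must be a string or a number")
    if wavelength <= 0:
        raise ValueError("wavelength must be positive")
    return float(wavelength)


print(check_wavelength("Cu"))                    # Cu
print(check_wavelength(angstrom_to_nm(1.5406)))  # Cu K-alpha1 quoted as 1.5406 Å, converted to nm
```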

refinement = do_refinement_no_saving(
    data / pattern_fn,
    cif_paths,
    instrument_profile="Aeris-fds-Pixcel1d-Medipix3",
    wavelength="Cu",
    refinement_params={
        "wmin": 20,  # set the minimum two-theta in the refinement to 20 deg.
        "wmax": 50  # set the maximum two-theta in the refinement to 50 deg.
    }
)
refinement.visualize()
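To see which d-spacings a given wmin / wmax window covers, Bragg's law d = λ / (2 sin θ) can be applied to the two limits. The sketch below assumes a single wavelength of 0.15406 nm (Cu Kα1) for illustration; BGMN itself works with a wavelength distribution, so the numbers are approximate. The helper name d_spacing_nm is hypothetical.

```python
import math


def d_spacing_nm(two_theta_deg, wavelength_nm=0.15406):
    """d-spacing in nm from Bragg's law for a given two-theta in degrees."""
    theta = math.radians(two_theta_deg / 2)
    return wavelength_nm / (2 * math.sin(theta))


d_max = d_spacing_nm(20)  # at wmin = 20 deg, about 0.444 nm
d_min = d_spacing_nm(50)  # at wmax = 50 deg, about 0.182 nm
print(f"refined window covers d = {d_min:.3f} nm to {d_max:.3f} nm")
```

Reflections with d-spacings outside this interval fall outside the 20 to 50 degree window and do not contribute to the fit.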

Save the project file to a folder on the disk#

Usually, you don’t have to directly interact with the BGMN input/output files.

However, if you would like to save the project directory for later use, you can use the do_refinement function instead.

The refinement files will be saved in the path specified by working_dir. Other than that, do_refinement and do_refinement_no_saving share the same parameters and output.

Feel free to modify the refinement project files yourself or to open them in the Profex software.

from dara import do_refinement

refinement = do_refinement(data / pattern_fn, cif_paths, working_dir="tutorial_refinement")
refinement_folder = Path("tutorial_refinement")

# show all the files in the folder
for file in refinement_folder.glob("*"):
    print(">", file.name)
> NiO_225_sym.str
> CaNi(PO3)4_800_240_Ca(OH)2_(NH4)2HPO4_NiO.lst
> CaNi(PO3)4_800_240_Ca(OH)2_(NH4)2HPO4_NiO.par
> Aeris-fds-Pixcel1d-Medipix3.geq
> CaNi(PO3)4_800_240_Ca(OH)2_(NH4)2HPO4_NiO.dia
> CaNi(PO3)4_15_sym.str
> CaNi(PO3)4_800_240_Ca(OH)2_(NH4)2HPO4_NiO.sav
> CaNi(PO3)4_800_240_Ca(OH)2_(NH4)2HPO4_NiO.xy
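The file listing above can be grouped by role to make the project folder easier to navigate. The role descriptions in the sketch below are assumptions based on common BGMN / Profex conventions and should be checked against the BGMN / Profex manual; group_by_role and FILE_ROLES are illustrative names.

```python
from collections import defaultdict
from pathlib import Path

# Assumed roles of BGMN project files (verify against the BGMN / Profex manual).
FILE_ROLES = {
    ".sav": "refinement control file",
    ".str": "structure file (one per phase)",
    ".lst": "refinement results",
    ".par": "peak parameters",
    ".dia": "pattern data for plotting",
    ".geq": "instrument profile",
    ".xy": "measured pattern",
}


def group_by_role(filenames):
    """Group file names by their assumed role, based on the extension."""
    groups = defaultdict(list)
    for name in filenames:
        role = FILE_ROLES.get(Path(name).suffix.lower(), "other")
        groups[role].append(name)
    return dict(groups)


files = ["NiO_225_sym.str", "CaNi(PO3)4_15_sym.str", "Aeris-fds-Pixcel1d-Medipix3.geq"]
for role, names in group_by_role(files).items():
    print(f"{role}: {names}")
```

In the tutorial folder, the same function can be applied to `[f.name for f in refinement_folder.glob("*")]`.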