Tutorial 2: Phase analysis with tree search
Dara is equipped with a parallelized tree search algorithm to identify possible phases present in a given XRD pattern.
In this tutorial, we will try to identify the phases in one experimental solid-state
reaction sample between GeO2 and ZnO.
You can download this tutorial project from here.
%pip install ipywidgets nbformat
from pathlib import Path
from dara import search_phases
pattern_path = "tutorial_data/GeO2-ZnO_700C_60min.xrdml"
# three elements are present in the sample
chemical_system = "Ge-O-Zn"
Step 1: Prepare reference phases
Dara ships with a pre-built index of all the unique, low-energy phases in the ICSD and COD databases. It can also download CIF structures directly from the COD data server, so you do not need an offline copy of the database.
Before every search, we will need to gather all the reference phases in the chemical
system for the search algorithm. Dara provides ICSDDatabase and CODDatabase to do
the filtering.
In this example, we will use CODDatabase to download all the phases in the chemical system of Ge-O-Zn.
from dara.structure_db import CODDatabase
# The COD database contains methods to filter phases in the chemical system
cod_database = CODDatabase()
# gather reference phases and save them to a directory called "cifs"
all_cod_ids = cod_database.get_cifs_by_chemsys(chemical_system, dest_dir="cifs")
2026-06-08 02:49:32,871 WARNING dara.structure_db Local copy of database not found. Attempting to download structures...
2026-06-08 02:49:36,278 INFO dara.structure_db Saving downloaded CIFs to dara_downloaded_cifs
Skipping high-energy phase: 1528389 (Ge, 96): e_hull = 0.1494
Skipping high-energy phase: 9013109 (Ge, 64): e_hull = 0.3137
2026-06-08 02:49:36,289 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,289 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,290 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,290 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,291 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,291 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,292 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,292 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,293 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,293 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,293 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,294 INFO dara.structure_db Skipping common gas: O2
2026-06-08 02:49:36,294 INFO dara.structure_db Skipping common gas: O2
Skipping high-energy phase: 1525835 (GeO2, 205): e_hull = 0.2246
Skipping high-energy phase: 1533322 (Ge7O23, 215): e_hull = 0.6571
Skipping high-energy phase: 1011223 (ZnO2, 19): e_hull = 0.1674
Skipping high-energy phase: 1529590 (ZnO2, 164): e_hull = 0.4588
Skipping high-energy phase: 1534836 (ZnO, 225): e_hull = 0.1473
Successfully copied 9011050.cif to Ge_227_(cod_9011050)-0.cif in cifs
Successfully copied 7101738.cif to Ge_227_(cod_7101738)-0.cif in cifs
Successfully copied 1538108.cif to O17.28_12_(cod_1538108)-None.cif in cifs
Successfully copied 9012435.cif to Zn_194_(cod_9012435)-0.cif in cifs
Successfully copied 4030923.cif to Zn_12_(cod_4030923)-None.cif in cifs
Successfully copied 9007435.cif to GeO2_136_(cod_9007435)-0.cif in cifs
Successfully copied 1525833.cif to GeO2_60_(cod_1525833)-36.cif in cifs
Successfully copied 2104024.cif to GeO2_60_(cod_2104024)-36.cif in cifs
Successfully copied 1526227.cif to GeO2_14_(cod_1526227)-None.cif in cifs
Successfully copied 2300365.cif to GeO2_152_(cod_2300365)-0.cif in cifs
Successfully copied 8000212.cif to Ge5O11_12_(cod_8000212)-None.cif in cifs
Successfully copied 9006858.cif to GeO2_58_(cod_9006858)-6.cif in cifs
Successfully copied 9007477.cif to GeO2_154_(cod_9007477)-0.cif in cifs
Successfully copied 9015579.cif to GeO2_92_(cod_9015579)-1.cif in cifs
Successfully copied 9004178.cif to ZnO_186_(cod_9004178)-0.cif in cifs
Successfully copied 1527883.cif to ZnO2_44_(cod_1527883)-None.cif in cifs
Successfully copied 1536063.cif to Zn10.26O48_160_(cod_1536063)-None.cif in cifs
Successfully copied 1537875.cif to ZnO_216_(cod_1537875)-7.cif in cifs
Successfully copied 4517837.cif to Zn5O12_15_(cod_4517837)-None.cif in cifs
Successfully copied 1007256.cif to Zn2Ge3O8_212_(cod_1007256)-2.cif in cifs
Successfully copied 1549040.cif to Zn2GeO4_227_(cod_1549040)-None.cif in cifs
Successfully copied 1549041.cif to Zn2GeO4_95_(cod_1549041)-None.cif in cifs
Successfully copied 9014631.cif to Zn2GeO4_148_(cod_9014631)-0.cif in cifs
Since we are using a pre-filtered database (i.e., the COD), the downloaded CIF files are automatically named according to the following convention:
{composition}_{spacegroup}_({cod|icsd}_{id})-{e_hull}.cif
Where the e_hull is the energy above the convex hull in meV/atom, as determined from
the Materials Project database for the ground-state entry with matching composition and spacegroup.
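Because the file names carry this metadata, they can be parsed without opening the CIFs. The following is a minimal sketch, not part of Dara: the regular expression, the `ReferenceInfo` tuple and the `parse_cif_name` helper are hypothetical names introduced here, and the pattern is inferred only from the file names printed above (including the `-None` suffix that appears on some of them).

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from names such as "GeO2_136_(cod_9007435)-0.cif".
# This is an assumption based on the printed file names, not a Dara API.
CIF_NAME = re.compile(
    r"^(?P<composition>.+)_(?P<spacegroup>\d+)"
    r"_\((?P<source>cod|icsd)_(?P<db_id>[^)]+)\)"
    r"-(?P<e_hull>.+)\.cif$"
)


class ReferenceInfo(NamedTuple):
    composition: str
    spacegroup: int
    source: str
    db_id: str
    e_hull: Optional[float]  # meV/atom; None when the name ends in "-None"


def parse_cif_name(name: str) -> ReferenceInfo:
    """Split a reference CIF file name into its labelled parts."""
    match = CIF_NAME.match(name)
    if match is None:
        raise ValueError(f"Unexpected CIF file name: {name}")
    e_hull_text = match["e_hull"]
    e_hull = None if e_hull_text == "None" else float(e_hull_text)
    return ReferenceInfo(
        composition=match["composition"],
        spacegroup=int(match["spacegroup"]),
        source=match["source"],
        db_id=match["db_id"],
        e_hull=e_hull,
    )


info = parse_cif_name("GeO2_60_(cod_1525833)-36.cif")
print(info.composition, info.spacegroup, info.e_hull)
```

A helper like this could be used, for example, to keep only references with `e_hull == 0` before running a search.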
Step 2: Search for phases
After preparing the reference CIFs, we can start the phase search on a provided XRD pattern.
In this case, we are using the XRD pattern of the solid-state reaction sample, measured
on our laboratory’s Aeris diffractometer (tutorial_data/GeO2-ZnO_700C_60min.xrdml).
# gather all the phases in the "cifs" directory
all_cifs = list(Path("cifs").glob("*.cif"))
search_results = search_phases(
    pattern_path=pattern_path,
    phases=all_cifs,
    wavelength="Cu",
    instrument_profile="Aeris-fds-Pixcel1d-Medipix3",
)
2026-06-08 02:49:37,192 INFO worker.py:1852 -- Started a local Ray instance.
2026-06-08 02:49:38,350 INFO dara.search.tree rpb_threshold automatically set to 1.00 based on pattern SNR.
2026-06-08 02:49:38,437 INFO dara.search.tree Detecting peaks in the pattern.
2026-06-08 02:50:06,412 INFO dara.search.tree The wmax is automatically adjusted to 60.04.
2026-06-08 02:50:06,414 INFO dara.search.tree The intensity threshold is automatically set to 9.06 % of maximum peak intensity.
2026-06-08 02:50:06,414 INFO dara.search.tree Creating the root node.
2026-06-08 02:50:06,415 INFO dara.search.tree Refining all the phases in the dataset.
(remote_do_refinement_no_saving pid=3067) /opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/site-packages/pymatgen/core/composition.py:1372: FutureWarning: gcd is deprecated, and will be removed on 2028-01-01
(remote_do_refinement_no_saving pid=3067) Use math.gcd instead.
(remote_do_refinement_no_saving pid=3067) factor = abs(gcd(*(int(i) for i in sym_amt.values())))
(remote_do_refinement_no_saving pid=3070) /opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/site-packages/pymatgen/core/composition.py:1372: FutureWarning: gcd is deprecated, and will be removed on 2028-01-01 [repeated 9x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(remote_do_refinement_no_saving pid=3070) Use math.gcd instead. [repeated 9x across cluster]
(remote_do_refinement_no_saving pid=3070) factor = abs(gcd(*(int(i) for i in sym_amt.values()))) [repeated 9x across cluster]
2026-06-08 02:50:29,706 INFO dara.search.tree The initial value of eps2 is automatically set to 0.000000_-0.05^0.05.
2026-06-08 02:50:29,707 INFO dara.search.tree Finished refining 23 phases, with 7 phases removed.
2026-06-08 02:50:29,707 INFO dara.search.tree Express mode is enabled. Grouping phases before starting.
2026-06-08 02:50:30,366 INFO dara.search.tree Phases are grouped into 15 groups. In express mode, only the best phase in each group will be considered during the search.
(_remote_expand_node pid=3069) 2026-06-08 02:50:30,425 INFO dara.search.tree Expanding node c187992e-62e4-11f1-9fe1-70a8a59b6a40 with current phases [], Rwp = None
(remote_do_refinement_no_saving pid=3067) /opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/site-packages/pymatgen/core/composition.py:1372: FutureWarning: gcd is deprecated, and will be removed on 2028-01-01 [repeated 12x across cluster]
(remote_do_refinement_no_saving pid=3067) Use math.gcd instead. [repeated 12x across cluster]
(remote_do_refinement_no_saving pid=3067) factor = abs(gcd(*(int(i) for i in sym_amt.values()))) [repeated 12x across cluster]
(_remote_expand_node pid=3070) 2026-06-08 02:50:31,438 INFO dara.search.tree Expanding node d051727d-62e4-11f1-9fe1-70a8a59b6a40 with current phases [RefinementPhase(path=PosixPath('cifs/ZnO_186_(cod_9004178)-0.cif'), params={'k1': '0.000000_0.0^0.01', 'b1': '0.004610_0.0^0.005'})], Rwp = 49.17
(remote_do_refinement_no_saving pid=3551) /opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/site-packages/pymatgen/core/composition.py:1372: FutureWarning: gcd is deprecated, and will be removed on 2028-01-01 [repeated 10x across cluster]
(remote_do_refinement_no_saving pid=3551) Use math.gcd instead. [repeated 10x across cluster]
(remote_do_refinement_no_saving pid=3551) factor = abs(gcd(*(int(i) for i in sym_amt.values()))) [repeated 10x across cluster]
(_remote_expand_node pid=3069) 2026-06-08 02:50:37,353 INFO dara.search.tree Expanding node d3d60ce7-62e4-11f1-9fe1-70a8a59b6a40 with current phases [RefinementPhase(path=PosixPath('cifs/ZnO_186_(cod_9004178)-0.cif'), params={'k1': '0.000000_0.0^0.01', 'b1': '0.004610_0.0^0.005'}), RefinementPhase(path=PosixPath('cifs/Zn2GeO4_148_(cod_9014631)-0.cif'), params={'k1': '0.010000_0.0^0.01', 'b1': '0.005000_0.0^0.005'})], Rwp = 44.9 [repeated 2x across cluster]
(remote_do_refinement_no_saving pid=3762) /opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/site-packages/pymatgen/core/composition.py:1372: FutureWarning: gcd is deprecated, and will be removed on 2028-01-01 [repeated 8x across cluster]
(remote_do_refinement_no_saving pid=3762) Use math.gcd instead. [repeated 8x across cluster]
(remote_do_refinement_no_saving pid=3762) factor = abs(gcd(*(int(i) for i in sym_amt.values()))) [repeated 8x across cluster]
(_remote_expand_node pid=3765) 2026-06-08 02:50:45,051 INFO dara.search.tree Expanding node d6dd3aea-62e4-11f1-9fe1-70a8a59b6a40 with current phases [RefinementPhase(path=PosixPath('cifs/GeO2_152_(cod_2300365)-0.cif'), params={'k1': '0.010000_0.0^0.01', 'b1': '0.005000_0.0^0.005'}), RefinementPhase(path=PosixPath('cifs/ZnO_186_(cod_9004178)-0.cif'), params={'k1': '0.000000_0.0^0.01', 'b1': '0.004610_0.0^0.005'}), RefinementPhase(path=PosixPath('cifs/Zn2GeO4_148_(cod_9014631)-0.cif'), params={'k1': '0.010000_0.0^0.01', 'b1': '0.005000_0.0^0.005'})], Rwp = 12.11 [repeated 4x across cluster]
(remote_do_refinement_no_saving pid=3067) /opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/site-packages/pymatgen/core/composition.py:1372: FutureWarning: gcd is deprecated, and will be removed on 2028-01-01 [repeated 14x across cluster]
(remote_do_refinement_no_saving pid=3067) Use math.gcd instead. [repeated 14x across cluster]
(remote_do_refinement_no_saving pid=3067) factor = abs(gcd(*(int(i) for i in sym_amt.values()))) [repeated 14x across cluster]
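The "Expanding node" lines in the log above give a rough view of how the tree search proceeds: each expanded node holds one more phase than its parent, and the Rwp falls as phases are added. The sketch below tabulates only the nodes that appear in this (deduplicated) log; the values are copied by hand from those lines, and the shortened phase labels are for display only.

```python
# (phases in the node, Rwp in %) copied from the "Expanding node" log lines.
# Ray deduplicates log messages, so other nodes may have been expanded too.
expansions = [
    (["ZnO_186"], 49.17),
    (["ZnO_186", "Zn2GeO4_148"], 44.9),
    (["GeO2_152", "ZnO_186", "Zn2GeO4_148"], 12.11),
]

previous_rwp = None
for phases, rwp in expansions:
    # Show the change in Rwp relative to the previous node in the list.
    change = "" if previous_rwp is None else f" ({rwp - previous_rwp:+.2f})"
    print(f"{len(phases)} phase(s): Rwp = {rwp:.2f} %{change}  [{', '.join(phases)}]")
    previous_rwp = rwp
```

In this run, adding the third phase (GeO2) accounts for most of the improvement in the fit.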
Step 3: Result analysis
The returned search result is a list of SearchResult objects.
search_results
[SearchResult(refinement_result=RefinementResult(lst_data=LstResult(raw_lst='Rietveld refinement to file(s) GeO2-ZnO_700C_60min.xy\nBGMN version 4.2.23, 4614 measured points, 135 peaks, 24 parameters\nStart: Mon Jun 8 02:50:38 2026; End: Mon Jun 8 02:50:42 2026\n23 iteration steps\n\nRp=9.82% Rpb=18.72% R=10.35% Rwp=12.11% Rexp=2.68%\nDurbin-Watson d=0.10\n1-rho=2.04%\n\nGlobal parameters and GOALs\n****************************\nQGeO2152cod23003650=0.4809+-0.0021\nQZnO186cod90041780=0.3862+-0.0024\nQZn2GeO4148cod90146310=0.1329+-0.0013\nEPS2=-0.002894+-0.000012\n\nLocal parameters and GOALs for phase GeO2152cod23003650\n******************************************************\nSpacegroupNo=152\nHermannMauguin=P3_121\nXrayDensity=4.276\nRphase=11.17%\nUNIT=NM\nA=0.499118+-0.000020\nC=0.564812+-0.000033\nk1=0.0100000\nB1=0.00500000\nGEWICHT=0.2613+-0.0011\nGrainSize(1,1,1)=84.1811\nAtomic positions for phase GeO2152cod23003650\n---------------------------------------------\n 3 0.4512 0.0000 0.3333 E=(GE(1.0000))\n 6 0.3974 0.3022 0.2429 E=(O(1.0000))\n\nLocal parameters and GOALs for phase ZnO186cod90041780\n******************************************************\nSpacegroupNo=186\nHermannMauguin=P6_3mc\nXrayDensity=5.669\nRphase=9.24%\nUNIT=NM\nA=0.325086+-0.000010\nC=0.520833+-0.000029\nk1=0\nB1=0.003365+-0.000094\nGEWICHT=0.2098+-0.0019\nGrainSize(1,1,1)=126.1+-3.5\nAtomic positions for phase ZnO186cod90041780\n---------------------------------------------\n 2 0.3333 0.6667 0.0000 E=(ZN(1.0000))\n 2 0.3333 0.6667 0.3821 E=(O(1.0000))\n\nLocal parameters and GOALs for phase Zn2GeO4148cod90146310\n******************************************************\nSpacegroupNo=148\nHermannMauguin=R-3\nXrayDensity=4.776\nRphase=19.33%\nUNIT=NM\nA=1.423920+-0.000081\nC=0.952754+-0.000072\nk1=0.0100000\nB1=0.00500000\nGEWICHT=0.07221+-0.00069\nGrainSize(1,1,1)=84.1811\nAtomic positions for phase Zn2GeO4148cod90146310\n---------------------------------------------\n 18 0.2150 0.1940 
0.5830 E=(ZN(1.0000))\n 18 0.5483 0.8607 0.5837 E=(ZN(1.0000))\n 18 0.2150 0.1940 0.2500 E=(GE(1.0000))\n 18 0.8877 0.4633 0.4293 E=(O(1.0000))\n 18 0.2220 0.1310 0.4030 E=(O(1.0000))\n 18 0.2230 0.1140 0.7500 E=(O(1.0000))\n 18 0.9957 0.6613 0.5833 E=(O(1.0000))\n', pattern_name='GeO2-ZnO_700C_60min.xy', num_steps=23, rp=9.82, rpb=18.72, r=10.35, rwp=12.11, rexp=2.68, d=0.1, rho=2.04, phases_results={'GeO2_152_(cod_2300365)-0': PhaseResult(spacegroup_no=152, hermann_mauguin='P3_121', xray_density=4.276, rphase=11.17, unit='NM', gewicht=(0.2613, 0.0011), gewicht_name=None, a=(0.499118, 2e-05), b=None, c=(0.564812, 3.3e-05), alpha=None, beta=None, gamma=None, atom_positions_string=' 3 0.4512 0.0000 0.3333 E=(GE(1.0000))\n 6 0.3974 0.3022 0.2429 E=(O(1.0000))', k1=0.01, B1=0.005), 'ZnO_186_(cod_9004178)-0': PhaseResult(spacegroup_no=186, hermann_mauguin='P6_3mc', xray_density=5.669, rphase=9.24, unit='NM', gewicht=(0.2098, 0.0019), gewicht_name=None, a=(0.325086, 1e-05), b=None, c=(0.520833, 2.9e-05), alpha=None, beta=None, gamma=None, atom_positions_string=' 2 0.3333 0.6667 0.0000 E=(ZN(1.0000))\n 2 0.3333 0.6667 0.3821 E=(O(1.0000))', k1=0, B1=(0.003365, 9.4e-05)), 'Zn2GeO4_148_(cod_9014631)-0': PhaseResult(spacegroup_no=148, hermann_mauguin='R-3', xray_density=4.776, rphase=19.33, unit='NM', gewicht=(0.07221, 0.00069), gewicht_name=None, a=(1.42392, 8.1e-05), b=None, c=(0.952754, 7.2e-05), alpha=None, beta=None, gamma=None, atom_positions_string=' 18 0.2150 0.1940 0.5830 E=(ZN(1.0000))\n 18 0.5483 0.8607 0.5837 E=(ZN(1.0000))\n 18 0.2150 0.1940 0.2500 E=(GE(1.0000))\n 18 0.8877 0.4633 0.4293 E=(O(1.0000))\n 18 0.2220 0.1310 0.4030 E=(O(1.0000))\n 18 0.2230 0.1140 0.7500 E=(O(1.0000))\n 18 0.9957 0.6613 0.5833 E=(O(1.0000))', k1=0.01, B1=0.005)}, QGeO2152cod23003650=(0.4809, 0.0021), QZnO186cod90041780=(0.3862, 0.0024), QZn2GeO4148cod90146310=(0.1329, 0.0013), EPS2=(-0.002894, 1.2e-05))), phases=((RefinementPhase(path=PosixPath('cifs/GeO2_154_(cod_9007477)-0.cif'), 
params={'k1': '0.010000_0.0^0.01', 'b1': '0.005000_0.0^0.005'}), RefinementPhase(path=PosixPath('cifs/GeO2_152_(cod_2300365)-0.cif'), params={'k1': '0.010000_0.0^0.01', 'b1': '0.005000_0.0^0.005'})), (RefinementPhase(path=PosixPath('cifs/ZnO_186_(cod_9004178)-0.cif'), params={'k1': '0.000000_0.0^0.01', 'b1': '0.004610_0.0^0.005'}),), (RefinementPhase(path=PosixPath('cifs/Zn2GeO4_148_(cod_9014631)-0.cif'), params={'k1': '0.010000_0.0^0.01', 'b1': '0.005000_0.0^0.005'}),)), foms=((0.036542242871968895, 0.036651544160763515), (0.023575326529864105,), (0.014573994434645588,)), lattice_strains=((0.0005516300389977617, 0.0002442188732087964), (0.00039042658300650893,), (-0.007717561975400113,)), missing_peaks=[], extra_peaks=[])]
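The printed result above is dense, but the per-phase values can be read from the `phases_results` dictionary shown in it (under `refinement_result.lst_data`). As a minimal sketch, the snippet below normalizes the `gewicht` values to weight fractions. It works on numbers copied by hand from the output rather than on the live object, and it assumes that `gewicht` is proportional to the weight content of each phase.

```python
# (value, uncertainty) pairs copied from the PhaseResult entries printed above.
gewicht = {
    "GeO2_152_(cod_2300365)-0": (0.2613, 0.0011),
    "ZnO_186_(cod_9004178)-0": (0.2098, 0.0019),
    "Zn2GeO4_148_(cod_9014631)-0": (0.07221, 0.00069),
}

# Normalize so that the fractions of all refined phases sum to one.
total = sum(value for value, _ in gewicht.values())
weight_fractions = {name: value / total for name, (value, _) in gewicht.items()}

for name, fraction in weight_fractions.items():
    print(f"{name}: {fraction:.1%}")
```

Under that assumption, the normalized values agree to within rounding with the `Q...` parameters reported in the same output (0.4809, 0.3862 and 0.1329).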
In this pattern, only one solution is found, with Rwp = 12.11 %.
for i in range(len(search_results)):
    print(f"Rwp of solution {i} = {search_results[i].refinement_result.lst_data.rwp} %")
Rwp of solution 0 = 12.11 %
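When a search returns several solutions, one simple way to choose among them is to take the one with the lowest Rwp. The sketch below uses `SimpleNamespace` stand-ins that mimic only the `refinement_result.lst_data.rwp` and `.rexp` attributes seen in the output above, so that it runs without Dara; `make_result` and `search_results_demo` are hypothetical names. It also prints the ratio Rwp/Rexp, a common goodness-of-fit indicator.

```python
from types import SimpleNamespace


def make_result(rwp: float, rexp: float) -> SimpleNamespace:
    """Hypothetical stand-in exposing only the attributes used below."""
    lst_data = SimpleNamespace(rwp=rwp, rexp=rexp)
    return SimpleNamespace(refinement_result=SimpleNamespace(lst_data=lst_data))


# One entry, using the Rwp and Rexp values printed in this tutorial.
search_results_demo = [make_result(rwp=12.11, rexp=2.68)]

# Select the solution with the lowest weighted-profile R factor.
best = min(search_results_demo, key=lambda r: r.refinement_result.lst_data.rwp)
lst = best.refinement_result.lst_data
goodness_of_fit = lst.rwp / lst.rexp

print(f"Best Rwp = {lst.rwp} %, Rwp/Rexp = {goodness_of_fit:.2f}")
```

With real results, the same `min(...)` expression can be applied to `search_results` directly.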
Each SearchResult has a .visualize() method that plots the refined pattern together with
any missing or extra peaks in the solution. If there are no missing or extra peaks, this option
will not appear.
search_results[0].visualize()
You can also view all the alternative phases in one solution from the SearchResult.phases attribute.
print("Phases found in solution 0:")
for i, phases_ in enumerate(search_results[0].phases):
    print(f"  - Phase {i}: {[phase.path.name for phase in phases_]}")
Phases found in solution 0:
- Phase 0: ['GeO2_154_(cod_9007477)-0.cif', 'GeO2_152_(cod_2300365)-0.cif']
- Phase 1: ['ZnO_186_(cod_9004178)-0.cif']
- Phase 2: ['Zn2GeO4_148_(cod_9014631)-0.cif']
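From the result, you can see that for GeO2 the algorithm identifies two
similar phases that differ only in space group (152 and 154). These two space groups
(P3_121 and P3_221) form an enantiomorphic pair, which powder XRD cannot tell apart,
so both structures are listed as alternatives for the same phase.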
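The `foms` and `lattice_strains` attributes in the printed SearchResult have the same nested shape as `phases`, so each alternative can be paired with its own values. The sketch below does this with tuples copied by hand from the output above. This tutorial does not define how the figure of merit is computed or which direction is better, so the snippet only lines the numbers up and does not rank them.

```python
# Nested tuples copied from the SearchResult printed above (file names only).
phase_groups = (
    ("GeO2_154_(cod_9007477)-0.cif", "GeO2_152_(cod_2300365)-0.cif"),
    ("ZnO_186_(cod_9004178)-0.cif",),
    ("Zn2GeO4_148_(cod_9014631)-0.cif",),
)
foms = (
    (0.036542242871968895, 0.036651544160763515),
    (0.023575326529864105,),
    (0.014573994434645588,),
)
lattice_strains = (
    (0.0005516300389977617, 0.0002442188732087964),
    (0.00039042658300650893,),
    (-0.007717561975400113,),
)

# Walk the three structures in parallel: outer level = phase, inner = alternatives.
rows = []
for i, (names, group_foms, group_strains) in enumerate(
    zip(phase_groups, foms, lattice_strains)
):
    for name, fom, strain in zip(names, group_foms, group_strains):
        rows.append((i, name, fom, strain))
        print(f"Phase {i}: {name}  fom = {fom:.4f}  strain = {strain:+.5f}")
```

The two GeO2 alternatives have nearly identical values, which is consistent with their being grouped as one phase.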