Duplicacy check

Check duplicacy between structures.

clean_up_decoration(s)

Remove all decoration from a structure.

Typically used before comparing two structures.

Parameters:

s (Structure) – A structure.

Returns:

The cleaned-up structure, containing only Element species.

Return type:

Structure
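As a rough illustration of what "removing decoration" means, the sketch below strips charge labels from plain species strings. It is not the library's implementation: strip_decoration is a hypothetical helper, and it operates on strings such as "Mn3+" rather than on Structure objects.

```python
import re


def strip_decoration(species: str) -> str:
    """Return only the element symbol of a decorated species string.

    Hypothetical helper for illustration; the real function works on
    Structure objects, not on strings.
    """
    # An element symbol is one uppercase letter, optionally followed by
    # one lowercase letter; everything after it is treated as decoration.
    match = re.match(r"[A-Z][a-z]?", species)
    if match is None:
        raise ValueError(f"Cannot find an element symbol in {species!r}")
    return match.group(0)


decorated = ["Li+", "Mn3+", "Mn4+", "O2-"]
cleaned = [strip_decoration(sp) for sp in decorated]
print(cleaned)  # ['Li', 'Mn', 'Mn', 'O']
```

After cleaning, Mn3+ and Mn4+ become indistinguishable, which is why this step is usually reserved for comparisons where only the underlying elements matter.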

is_corr_duplicate(s1, proc1, s2=None, proc2=None, features2=None)

Check whether two structures have the same correlation vectors.

Note

This is the most commonly used criterion for structure duplicacy, because two structures with the same correlation vector should, in principle, not both be included in the training set. Comparing correlation vectors can also be much faster than comparing two structures with StructureMatcher.

Parameters:
  • s1 (Structure) – A structure to be checked.

  • proc1 (CompositeProcessor) –

    A processor established on the super-cell matrix of s1.

    Note

    Must use ClusterExpansionProcessor instead of ClusterDecompositionProcessor.

  • s2 (Structure) – optional. Same as s1. Not needed if features2 is given.

  • proc2 (CompositeProcessor) – optional. Same as proc1. Not needed if features2 is given.

  • features2 (1D arrayLike) – optional. The feature vector of s2. If not given, both s2 and proc2 must be given.
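The argument contract and the comparison itself can be sketched in plain Python. This is a minimal sketch, not the library's code: check_second_input and features_match are hypothetical names, the tolerances are arbitrary, and both vectors are assumed to have been computed and normalized in the same way.

```python
import math
from typing import Optional, Sequence


def check_second_input(s2=None, proc2=None, features2=None) -> None:
    """Enforce the documented rule: give features2, or both s2 and proc2."""
    if features2 is None and (s2 is None or proc2 is None):
        raise ValueError("Give either features2, or both s2 and proc2.")


def features_match(
    f1: Sequence[float],
    f2: Sequence[float],
    rel_tol: float = 1e-9,  # assumed tolerance, chosen for illustration
    abs_tol: float = 1e-8,  # assumed tolerance, chosen for illustration
) -> bool:
    """Return True if two feature vectors agree element-wise within tolerance."""
    if len(f1) != len(f2):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(f1, f2)
    )


reference = [1.0, 0.25, -0.5]
same_up_to_noise = [1.0, 0.25 + 1e-12, -0.5]
different = [1.0, 0.5, -0.5]

print(features_match(reference, same_up_to_noise))  # True
print(features_match(reference, different))         # False
```

Comparing with a tolerance rather than exact equality avoids treating floating-point noise as a real difference between structures.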

is_duplicate(s1, s2, remove_decorations=False, matcher=None)

Check the duplicacy between structures.

Parameters:
  • s1 (Structure) – A structure to be checked.

  • s2 (Structure) – Same as s1.

  • remove_decorations (bool) – optional. Whether to remove all decorations (i.e., charge and other properties) from species before comparing. Defaults to False.

  • matcher (StructureMatcher) – optional. A StructureMatcher used to compare the two structures. Using the same _site_matcher as the cluster_subspace is highly recommended.

Returns:

bool
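The effect of remove_decorations can be shown with a toy model. This is only a sketch under strong assumptions: structures are reduced to lists of (species label, fractional coordinates) pairs, is_duplicate_sketch and strip_label are hypothetical names, and the default matcher is a plain equality check that stands in for a real StructureMatcher without any of its symmetry, scaling or tolerance handling.

```python
import re
from typing import Callable, List, Optional, Tuple

# Toy stand-in for a structure: (species label, fractional coordinates).
Site = Tuple[str, Tuple[float, float, float]]


def strip_label(label: str) -> str:
    """Keep only the element symbol of a label such as 'O2-' (hypothetical)."""
    match = re.match(r"[A-Z][a-z]?", label)
    if match is None:
        raise ValueError(f"Cannot find an element symbol in {label!r}")
    return match.group(0)


def is_duplicate_sketch(
    s1: List[Site],
    s2: List[Site],
    remove_decorations: bool = False,
    matcher: Optional[Callable[[List[Site], List[Site]], bool]] = None,
) -> bool:
    """Optionally strip decorations, then delegate to the matcher."""
    if matcher is None:
        # Exact comparison of sorted sites; a real matcher is far more lenient.
        matcher = lambda a, b: sorted(a) == sorted(b)
    if remove_decorations:
        s1 = [(strip_label(sp), xyz) for sp, xyz in s1]
        s2 = [(strip_label(sp), xyz) for sp, xyz in s2]
    return matcher(s1, s2)


charged = [("Li+", (0.0, 0.0, 0.0)), ("O2-", (0.5, 0.5, 0.5))]
neutral = [("Li", (0.0, 0.0, 0.0)), ("O", (0.5, 0.5, 0.5))]

print(is_duplicate_sketch(charged, neutral))                           # False
print(is_duplicate_sketch(charged, neutral, remove_decorations=True))  # True
```

With decorations kept, the charged and neutral structures differ by their species labels; once decorations are removed, they compare as duplicates.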