When the reacting atoms (RAs) of a new compound are predicted, the set of all possible SoLAs (structures with one labeled atom) for that compound is generated, each with its LMNA descriptors. The RA prediction for the compound is then derived from the prediction results for all of its SoLAs. Each SoLA corresponds to one candidate RA, namely its labeled atom.
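The enumeration step can be sketched as follows. This is a minimal illustration under stated assumptions. The molecule is a plain list of element symbols plus index-pair bonds, and the `SoLA` class, `generate_solas` and `neighborhood_descriptor` are hypothetical names. `neighborhood_descriptor` is only a toy stand-in for the LMNA descriptors: it collects the sorted element symbols at each breadth-first level around the labeled atom. It is not the actual LMNA definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoLA:
    """Hypothetical SoLA: the whole molecule plus the index of its one labeled atom."""
    atoms: tuple   # element symbols, e.g. ("C", "C", "O")
    bonds: tuple   # undirected bonds as (i, j) index pairs
    labeled: int   # index of the labeled (candidate reacting) atom


def generate_solas(atoms, bonds):
    """Generate one SoLA per atom, so each atom becomes a candidate RA once."""
    atoms, bonds = tuple(atoms), tuple(bonds)
    return [SoLA(atoms, bonds, i) for i in range(len(atoms))]


def neighborhood_descriptor(sola, depth=2):
    """Toy stand-in for LMNA (assumption): sorted symbols per BFS level from the labeled atom."""
    adj = {i: set() for i in range(len(sola.atoms))}
    for i, j in sola.bonds:
        adj[i].add(j)
        adj[j].add(i)
    seen = {sola.labeled}
    frontier = [sola.labeled]
    levels = []
    for _ in range(depth + 1):
        # Record the atoms at the current distance, sorted so the result is order-independent.
        levels.append(tuple(sorted(sola.atoms[k] for k in frontier)))
        nxt = []
        for k in frontier:
            for n in adj[k]:
                if n not in seen:
                    seen.add(n)
                    nxt.append(n)
        frontier = nxt
    return tuple(levels)


# Ethanol heavy atoms C-C-O: three SoLAs, one per atom.
for s in generate_solas(["C", "C", "O"], [(0, 1), (1, 2)]):
    print(s.labeled, neighborhood_descriptor(s))
```

Because the descriptor is computed from the labeled atom's position, the same molecule yields a different descriptor for each SoLA. That is what allows each atom to be scored separately.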
For every SoLA, the following values are calculated:
Pt is the probability that the labeled atom in the SoLA is the RA of the corresponding reaction.
Pf is the probability that the labeled atom in the SoLA is not the RA of the corresponding reaction.
The atoms of the compound are then ranked in decreasing order of their ΔP = Pt - Pf values.
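The ranking step can be sketched as follows, assuming Pt and Pf have already been estimated for each SoLA by some trained model that is not shown here. The function name `rank_atoms` and its input format, a dict from atom index to a `(Pt, Pf)` pair, are hypothetical.

```python
def rank_atoms(predictions):
    """Rank candidate reacting atoms by deltaP = Pt - Pf, highest first.

    predictions: dict mapping atom index -> (Pt, Pf), one entry per SoLA.
    Returns a list of (atom index, deltaP) pairs sorted by decreasing deltaP.
    """
    scored = [(atom, pt - pf) for atom, (pt, pf) in predictions.items()]
    # Sort by deltaP descending; ties are broken by atom index for a deterministic order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


print(rank_atoms({0: (0.25, 0.75), 1: (0.5, 0.25), 2: (0.75, 0.25)}))
```

Atoms at the top of the list are the most likely RAs. Atoms with negative ΔP are judged more likely not to react than to react.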
The Invariant Accuracy of Prediction (IAP) criterion, which is similar to the AUC (area under the ROC curve), was used to estimate the accuracy of the method. Mathematically, IAP equals the probability that ΔP is higher for a randomly selected positive example (a SoLA whose labeled atom is an RA, ΔP+) than for a randomly selected negative example (a SoLA whose labeled atom is not an RA, ΔP-):
IAP = Probability{ΔP+ > ΔP-}
During training, each SoLA in turn is excluded from the training set and predicted by the model built from the remaining SoLAs; that is, leave-one-out cross-validation (LOO CV) is performed.
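The IAP estimate and the LOO procedure can be sketched as follows. `iap` computes the fraction of (positive, negative) pairs where the positive SoLA has the higher ΔP. Counting ties as 1/2 is an assumption borrowed from the usual AUC convention. The model is a deliberately simple, hypothetical stand-in: `train` and `predict_delta_p` use Laplace-smoothed counts of RA and non-RA SoLAs per descriptor. They are not the estimator used by the method. Each example is assumed to be a `(descriptor, is_ra)` pair.

```python
from collections import Counter


def iap(pos_scores, neg_scores):
    """IAP = Probability{deltaP+ > deltaP-}, estimated over all positive/negative pairs.

    Ties count as 1/2 (assumption, mirroring the common AUC convention).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def train(examples):
    """Hypothetical toy model: count RA SoLAs and all SoLAs per descriptor."""
    pos, total = Counter(), Counter()
    for desc, is_ra in examples:
        total[desc] += 1
        pos[desc] += int(is_ra)
    return pos, total


def predict_delta_p(model, desc):
    """Toy Pt with Laplace smoothing, Pf = 1 - Pt; returns deltaP = Pt - Pf."""
    pos, total = model
    pt = (pos[desc] + 1) / (total[desc] + 2)
    pf = 1.0 - pt
    return pt - pf


def loo_iap(examples):
    """LOO CV: each SoLA is scored by a model trained on all the other SoLAs."""
    pos_scores, neg_scores = [], []
    for i, (desc, is_ra) in enumerate(examples):
        model = train(examples[:i] + examples[i + 1:])
        score = predict_delta_p(model, desc)
        (pos_scores if is_ra else neg_scores).append(score)
    return iap(pos_scores, neg_scores)


data = [("a", True), ("a", True), ("a", True), ("b", False), ("b", False), ("b", False)]
print(loo_iap(data))
```

An IAP of 0.5 corresponds to random ranking and 1.0 to perfect separation of RAs from non-RAs. Because each SoLA is scored without itself in the training set, the estimate is not inflated by memorizing the example being tested.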
The detailed description of the algorithm is available here