To predict sites of metabolism (SOMs) for a new compound, the set of all possible SoLAs (structures with one labeled atom) is generated, each with its LMNA descriptors. The SOM prediction for the compound is built from the prediction results of all SoLAs generated for it. Each SoLA corresponds to one candidate SOM.
For every SoLA the following values are calculated:
Pt is the probability that the labeled atom in the SoLA is the SOM of the appropriate enzyme.
Pf is the probability that the labeled atom in the SoLA is not the SOM of the appropriate enzyme.
The atoms in a compound are ranked according to their deltaP (Pt - Pf) values.
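The ranking step can be illustrated with a minimal sketch. The `SoLAPrediction` container, the `rank_atoms` helper, and the numeric values are hypothetical and are not part of the described software. Sorting from highest to lowest deltaP is also an assumption, made so that the most likely SOM comes first.

```python
from dataclasses import dataclass


@dataclass
class SoLAPrediction:
    """Prediction for one SoLA (hypothetical container, not from the source)."""
    atom_index: int  # index of the labeled atom in the compound
    pt: float        # Pt: probability that the labeled atom is a SOM
    pf: float        # Pf: probability that the labeled atom is not a SOM

    @property
    def delta_p(self) -> float:
        # deltaP = Pt - Pf, the value used to rank atoms
        return self.pt - self.pf


def rank_atoms(predictions):
    """Return predictions sorted from highest to lowest deltaP (assumed order)."""
    return sorted(predictions, key=lambda p: p.delta_p, reverse=True)


# Made-up values for a three-atom example.
example = [
    SoLAPrediction(atom_index=0, pt=0.20, pf=0.50),
    SoLAPrediction(atom_index=1, pt=0.75, pf=0.10),
    SoLAPrediction(atom_index=2, pt=0.40, pf=0.30),
]
ranked = rank_atoms(example)
print([p.atom_index for p in ranked])
```

In this made-up example atom 1 has the largest deltaP and would be listed first as the most probable SOM.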
The Invariant Accuracy of Prediction (IAP) criterion, which is similar to AUC (the area under the ROC curve), was used to estimate the accuracy of the method. Mathematically, the IAP value equals the probability that deltaP is higher for a randomly selected positive example (a SoLA in which the labeled atom is a SOM, deltaP+) than for a randomly selected negative example (a SoLA in which the labeled atom is not a SOM, deltaP-):
IAP = Probability{deltaP+ > deltaP-}.
During the training procedure, each SoLA is excluded from the training set in turn; that is, a leave-one-out cross-validation (LOO CV) procedure is performed.
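As a rough illustration, the IAP definition can be estimated by comparing every positive example with every negative example. The function name `iap` and the sample values are hypothetical. Counting ties as 0.5 is an assumption borrowed from the usual AUC convention, since the text does not say how ties are handled. In a LOO CV setting, each deltaP would presumably come from a prediction made with that SoLA left out of training.

```python
def iap(delta_p_pos, delta_p_neg):
    """Estimate IAP = Probability{deltaP+ > deltaP-} over all pairs.

    delta_p_pos: deltaP values of SoLAs whose labeled atom is a SOM.
    delta_p_neg: deltaP values of SoLAs whose labeled atom is not a SOM.
    Ties are counted as 0.5 (assumption, not stated in the source).
    """
    if not delta_p_pos or not delta_p_neg:
        raise ValueError("need at least one positive and one negative example")
    score = 0.0
    for dp in delta_p_pos:
        for dn in delta_p_neg:
            if dp > dn:
                score += 1.0
            elif dp == dn:
                score += 0.5
    return score / (len(delta_p_pos) * len(delta_p_neg))


# Made-up deltaP values for illustration only.
positives = [0.65, 0.10]
negatives = [-0.30, 0.20, -0.05]
value = iap(positives, negatives)
print(value)
```

Here 5 of the 6 positive/negative pairs are ordered correctly, so the estimate is 5/6. A value of 1 would mean every SOM outranks every non-SOM, and 0.5 corresponds to random ordering.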
The detailed description of the algorithm is available here