Pa (probability "to be active") estimates the chance that the studied peptide belongs to the class of the appropriate Secondary Structure of Protein (SSP), that is, that it resembles the structures of peptides that are most typical in the subset of "actives" in the MultiPASS training set. The membership of a peptide in an SSP class is treated here as its activity.
Pi (probability "to be inactive") estimates the chance that the studied peptide does not belong to the class of the appropriate SSP, that is, that it resembles the structures of peptides that are most typical in the subset of "inactives" in the MultiPASS training set.
IAP (Invariant Accuracy of Prediction) is the average accuracy of prediction obtained for the whole MultiPASS training set in the leave-one-out cross-validation procedure.
IAP is numerically equal to the ROC AUC.
Leave-one-out cross-validation (LOO CV) procedure is performed on the whole MultiPASS training set to validate prediction quality. Each prediction result is compared with the known experimental 3D protein structures from the PDB database (version from November 2023) annotated by the DSSP software (version 4.0.4). The procedure is repeated for all peptides in the MultiPASS training set; then the average Invariant Accuracy of Prediction (IAP = 1 - IEP) values are calculated for each class of SSP.
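The link between IAP and ROC AUC can be illustrated with a small calculation. The following is a minimal sketch, not the MultiPASS implementation: it assumes that each peptide has already received a LOO CV score for one SSP class, and it computes ROC AUC as the fraction of ("active", "inactive") pairs in which the "active" peptide is scored higher. The function name `roc_auc` and all score values are hypothetical.

```python
def roc_auc(active_scores, inactive_scores):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties are counted as half a correct pair.
    """
    correct = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                correct += 1.0
            elif a == i:
                correct += 0.5
    return correct / (len(active_scores) * len(inactive_scores))


# Hypothetical LOO CV scores for one SSP class (illustrative values only).
active_scores = [0.9, 0.8, 0.4]
inactive_scores = [0.7, 0.3, 0.2, 0.1]

iap = roc_auc(active_scores, inactive_scores)  # 11 of 12 pairs are ranked correctly
iep = 1.0 - iap                                # follows from IAP = 1 - IEP

print(round(iap, 3))  # 0.917
```

The pairwise form is quadratic in the number of peptides, so it is suitable only for illustration; a rank-based computation gives the same value on large sets.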
Only activities with Pa > Pi are considered possible for a particular peptide (Confidence > 0).
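The Pa > Pi rule can be applied as a simple filter over the prediction results. The sketch below is illustrative only: the `Prediction` record, the class labels, and the numeric values are hypothetical, and it assumes that Confidence is computed as Pa - Pi, which is consistent with "Confidence > 0" but is not stated explicitly here.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    ssp_class: str  # hypothetical label of an SSP class
    pa: float       # probability "to be active"
    pi: float       # probability "to be inactive"


def possible_classes(predictions):
    """Keep predictions with Pa > Pi, sorted by descending Pa - Pi."""
    kept = [p for p in predictions if p.pa > p.pi]
    # Assumption: Confidence = Pa - Pi.
    return sorted(kept, key=lambda p: p.pa - p.pi, reverse=True)


# Illustrative values only.
predictions = [
    Prediction("helix", 0.82, 0.03),
    Prediction("strand", 0.10, 0.45),
    Prediction("turn", 0.35, 0.20),
]

result = possible_classes(predictions)
print([p.ssp_class for p in result])  # ['helix', 'turn']
```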
Even known peptides from proteins whose sequences are not typical of the sequences of "actives" in the training set may obtain a low Pa value, and even Pa < Pi, during prediction. This follows from the way the functions Pa(B) and Pi(B) are constructed: the Pa values for "actives" and the Pi values for "inactives" are distributed uniformly. Taking this into account, the following interpretation of prediction results is possible.
If, for instance, the Pa value equals 0.9, then for 90% of "actives" from the training set the B values are lower than for this peptide, and only for 10% of "actives" is this value higher. If we reject the suggestion that this peptide is active, we will make a wrong decision with probability 0.9.
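This reading of Pa as a fraction of "actives" can be reproduced with an empirical distribution. The sketch below is a simplified illustration, not the actual construction of Pa(B) in MultiPASS: the function `empirical_pa` and the B values are hypothetical, and Pa is taken as the share of training "actives" whose B value is lower than that of the studied peptide.

```python
from bisect import bisect_left


def empirical_pa(b_value, active_b_values):
    """Fraction of training 'actives' whose B value is lower than b_value."""
    ordered = sorted(active_b_values)
    # bisect_left returns the number of elements strictly below b_value.
    return bisect_left(ordered, b_value) / len(ordered)


# Hypothetical B values of ten "actives" from a training set.
active_b = [-0.8, -0.5, -0.2, 0.0, 0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

pa = empirical_pa(0.8, active_b)
print(pa)  # 0.9: nine of ten "actives" have a lower B value
```

With these values, a peptide with B = 0.8 is ranked above 90% of the "actives", which corresponds to the Pa = 0.9 case.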
If the Pa value is less than 0.5 but Pa > Pi, then for more than half of "actives" from the training set the B values are higher than for this peptide. If we reject the suggestion that this peptide is active, we will make a wrong decision with probability less than 0.5. In such a case the probability of confirming this kind of activity in an experiment is small, but if it is confirmed, there is a more than 50% chance that this peptide has high novelty.
Based on these criteria, one may choose which types of SSP are plausible for the studied protein as a compromise between novelty and the risk of obtaining a negative result in experimental testing.