Definitions
The Biological Activity Spectrum of a chemical compound is the set of different types of biological activity that reflect the results of the compound's interaction with various biological entities. Biological activity is defined qualitatively ("yes"/"no"), which implies that the biological activity spectrum is an "intrinsic" property of a substance, depending only on its structure and physical-chemical characteristics. Although this is a simplification, it makes it possible to combine information from many different sources in a single training set. Such pooling is necessary because no single publication comprehensively covers all facets of a compound's biological action.
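Pooling qualitative findings from many publications into one spectrum per compound can be pictured as a simple data structure. The Python below is a minimal illustration, not part of PASS: the record layout `(compound_id, activity, source)`, the function `build_spectra` and the sample data are hypothetical. Each record means "active"; merging into a set discards the same finding reported by several sources.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical records: (compound_id, activity_name, source).
# A record means the compound is "active" for that activity (qualitative "yes").
records = [
    ("cmpd-1", "Antiinflammatory", "paper A"),
    ("cmpd-1", "Cyclooxygenase inhibitor", "paper B"),
    ("cmpd-2", "Antibacterial", "paper A"),
    ("cmpd-1", "Antiinflammatory", "paper C"),  # same finding, another source
]

def build_spectra(records: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Merge qualitative activity records from many sources into one spectrum per compound."""
    spectra: Dict[str, Set[str]] = defaultdict(set)
    for compound_id, activity, _source in records:
        spectra[compound_id].add(activity)  # a set keeps each activity once
    return dict(spectra)

spectra = build_spectra(records)
print(sorted(spectra["cmpd-1"]))
```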
PASS (Prediction of Activity Spectra for Substances) is a software product designed as a tool for evaluating the general biological potential of an organic drug-like molecule. PASS provides simultaneous predictions of many types of biological activity based on the structure of organic compounds. Thus, PASS can be used to estimate the biological activity profiles for virtual molecules, prior to their chemical synthesis and biological testing.
Pa (probability "to be active") estimates the chance that the studied compound belongs to the sub-class of active compounds, i.e., that it resembles the structures of the molecules most typical of the "actives" subset of the PASS training set.
Pi (probability "to be inactive") estimates the chance that the studied compound belongs to the sub-class of inactive compounds, i.e., that it resembles the structures of the molecules most typical of the "inactives" subset of the PASS training set.
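A minimal Python sketch of how a list of Pa and Pi estimates might be screened. It assumes the interpretation used in PASS output that an activity is treated as possible for a compound only when Pa > Pi (stated here as an assumption); the `Prediction` type, the optional `min_pa` cut-off, the ranking by Pa - Pi and the sample numbers are all hypothetical.

```python
from typing import List, NamedTuple

class Prediction(NamedTuple):
    activity: str
    pa: float  # probability "to be active"
    pi: float  # probability "to be inactive"

def probable_activities(predictions: List[Prediction], min_pa: float = 0.0) -> List[Prediction]:
    """Keep activities with Pa > Pi (and Pa >= min_pa), ranked by Pa - Pi, highest first.

    The Pa > Pi rule is an assumed convention; min_pa and the ranking are illustrative choices.
    """
    kept = [p for p in predictions if p.pa > p.pi and p.pa >= min_pa]
    return sorted(kept, key=lambda p: p.pa - p.pi, reverse=True)

# Hypothetical prediction output for one compound.
preds = [
    Prediction("Antibacterial", 0.20, 0.35),
    Prediction("Analgesic", 0.55, 0.04),
    Prediction("Antiinflammatory", 0.81, 0.01),
]

for p in probable_activities(preds):
    print(f"{p.activity}: Pa={p.pa:.2f} Pi={p.pi:.2f}")
```

Raising `min_pa` (for example to 0.7) narrows the list to the activities with the strongest resemblance to known actives.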
IEP (Invariant Error of Prediction) is the average prediction error obtained for the whole PASS training set in the leave-one-out cross-validation procedure.
The leave-one-out cross-validation (LOO CV) procedure is performed on the whole PASS training set to validate prediction quality. The biological activity spectrum of each compound is predicted using structure-activity relationships calculated from the data for all other compounds, and the prediction is compared with the known experimental data for that compound. The procedure is repeated for every compound in the PASS training set; then the average Invariant Accuracy of Prediction (IAP = 1 - IEP) is calculated for each biological activity and over all biological activities.
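The leave-one-out loop and the relation IAP = 1 - IEP can be illustrated with a toy model. This Python sketch is not the PASS algorithm: it substitutes a hypothetical nearest-neighbour predictor over descriptor sets (Tanimoto similarity), and it measures error as a plain misclassification rate per activity, which is only a stand-in for PASS's own IEP estimate. The names `loo_iap` and `tanimoto` and the data are illustrative.

```python
from typing import Dict, List, Set, Tuple

def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto similarity of two descriptor sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def loo_iap(training_set: List[Tuple[Set[str], Set[str]]],
            activities: List[str]) -> Tuple[Dict[str, float], float]:
    """Leave-one-out: predict each compound from all the others.

    training_set holds (descriptor_set, known_active_set) pairs.
    Toy predictor (assumption): copy the activities of the most similar remaining compound.
    Returns IAP per activity and the mean IAP over all activities.
    """
    n = len(training_set)
    errors = {act: 0 for act in activities}
    for i, (desc_i, known_i) in enumerate(training_set):
        others = [training_set[j] for j in range(n) if j != i]  # leave compound i out
        _, predicted = max(others, key=lambda item: tanimoto(desc_i, item[0]))
        for act in activities:
            if (act in predicted) != (act in known_i):  # compare with experimental data
                errors[act] += 1
    iap = {act: 1 - errors[act] / n for act in activities}  # IAP = 1 - error rate
    return iap, sum(iap.values()) / len(iap)

training = [
    ({"a", "b", "c"}, {"X"}),
    ({"a", "b", "d"}, {"X"}),
    ({"e", "f"}, {"Y"}),
    ({"e", "f", "g"}, {"Y"}),
    ({"a", "e"}, {"X"}),
]
iap, mean_iap = loo_iap(training, ["X", "Y"])
print(iap, mean_iap)
```

Here the fifth compound's nearest neighbour carries a different activity, so each activity gets one error out of five compounds and IAP = 0.8.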
Robustness of the PASS algorithm means that PASS provides reasonable estimates of structure-activity relationships despite the incompleteness of the PASS training set (and some errors in its data).
MOLfile is a file format created by MDL (later Symyx, now Accelrys) for holding information about the atoms, bonds, connectivity and coordinates of a molecule. A MOLfile consists of a header block and a Connection Table (CT), which lists the atoms and then the bond connections with their types; sections for more complex information follow.
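To make the layout concrete, here is a minimal Python sketch that reads the connection table of a MOLfile in the common V2000 fixed-column layout: three header lines, a counts line giving the numbers of atoms and bonds, an atom block, then a bond block. It ignores the properties block and the V3000 variant; `parse_molfile_v2000` and the ethanol example are illustrative, not an MDL API.

```python
from typing import List, NamedTuple, Tuple

class Atom(NamedTuple):
    symbol: str
    x: float
    y: float
    z: float

def parse_molfile_v2000(text: str) -> Tuple[str, List[Atom], List[Tuple[int, int, int]]]:
    """Return (name, atoms, bonds) from a V2000 MOLfile; bonds are (atom1, atom2, bond_type)."""
    lines = text.splitlines()
    name = lines[0].strip()  # header line 1: molecule name
    counts = lines[3]        # line 4: counts line
    if "V2000" not in counts:
        raise ValueError("this sketch only reads V2000 connection tables")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    # Atom block: x, y, z in 10-character columns, element symbol in columns 32-34.
    atoms = [
        Atom(line[31:34].strip(), float(line[0:10]), float(line[10:20]), float(line[20:30]))
        for line in lines[4:4 + n_atoms]
    ]
    # Bond block: two 1-based atom indices and a type code (1 single, 2 double, 3 triple, 4 aromatic).
    bonds = [
        (int(line[0:3]), int(line[3:6]), int(line[6:9]))
        for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]
    ]
    return name, atoms, bonds

ethanol = "\n".join([
    "ethanol",
    "  sketch",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5400    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.0500    1.4200    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  2  3  1  0  0  0  0",
    "M  END",
])

name, atoms, bonds = parse_molfile_v2000(ethanol)
print(name, [a.symbol for a in atoms], bonds)
```

Slicing by column rather than splitting on whitespace matters because large atom and bond counts in the counts line can run together without a separating space.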
SDfile is one of a family of chemical-data file formats developed by MDL and is intended especially for structural information. "SDF" stands for structure-data file; an SDF file wraps one or more MOLfiles. Multiple compounds are delimited by lines consisting of four dollar signs ($$$$). A distinctive feature of the SDF format is its ability to store associated data items (named fields such as identifiers or measured properties) alongside each structure.
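The record delimiter and the associated data can be handled with a short Python sketch. It assumes the usual SDfile layout in which each record is a MOLfile ending in `M  END`, followed by data items whose header line starts with `>` and contains the field name in angle brackets, with the value on the following lines up to a blank line. `read_sdf`, `_parse_record` and the sample records are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Data-item header: a line starting with ">" that contains <FIELD_NAME>.
FIELD_HEADER = re.compile(r"^>.*?<([^>]+)>")

def _parse_record(lines: List[str]) -> Tuple[str, Dict[str, str]]:
    """Split one record into its MOLfile text and a dict of data items.

    Assumes every record contains an "M  END" line closing the MOLfile part.
    """
    end = next(i for i, line in enumerate(lines) if line.startswith("M  END"))
    molfile = "\n".join(lines[:end + 1])
    data: Dict[str, str] = {}
    field, values = None, []
    for line in lines[end + 1:]:
        match = FIELD_HEADER.match(line)
        if match:
            field, values = match.group(1), []
        elif field is not None:
            if line.strip():
                values.append(line)
            else:  # a blank line ends the current value
                data[field] = "\n".join(values)
                field = None
    if field is not None:  # last value not followed by a blank line
        data[field] = "\n".join(values)
    return molfile, data

def read_sdf(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split an SDfile on "$$$$" delimiter lines and parse each record."""
    records, block = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(block))
            block = []
        else:
            block.append(line)
    if any(line.strip() for line in block):  # tolerate a missing final delimiter
        records.append(_parse_record(block))
    return records

counts = "  1  0  0  0  0  0  0  0  0  0999 V2000"
atom_c = "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0"
atom_o = "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0"
sdf = "\n".join([
    "methane", "", "", counts, atom_c, "M  END",
    "> <ID>", "cmpd-1", "",
    "> <ACTIVITY>", "Antiinflammatory", "",
    "$$$$",
    "water", "", "", counts, atom_o, "M  END",
    "> <ID>", "cmpd-2", "",
    "$$$$",
])

for molfile, data in read_sdf(sdf):
    print(molfile.splitlines()[0], data)
```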