2-(4-hydroxyphenyl)-2-oxoacetaldehyde oxime
- Formula: C8H7NO3
- Molecular weight: 165.15
- SMILES: C1=CC(=CC=C1C(=O)C=NO)O
Names
- Mycotoxin name: 2-(4-hydroxyphenyl)-2-oxoacetaldehyde oxime
- First synonym: SCHEMBL9056522
- Synonyms: SCHEMBL9056522, ZINC2559912, 1-(p-Hydroxyphenyl)glyoxal 2-oxime, (E)-2-(4-Hydroxyphenyl)-2-oxoacetaldehyde oxime
Identifiers / External links
- PubChem CID: 6412242
- SCHEMBL: SCHEMBL9056522
Structure
- SMILES: C1=CC(=CC=C1C(=O)C=NO)O
- Isomeric SMILES: C1=CC(=CC=C1C(=O)/C=NO)O
- InChI: InChI=1S/C8H7NO3/c10-7-3-1-6(2-4-7)8(11)5-9-12/h1-5,10,12H/b9-5-
- InChIKey: QIMBQAFHVVTXTD-UITAMQMPSA-N
Physico-chemical properties
- Formula: C8H7NO3
- Molecular weight: 165.15
- Monoisotopic mass: 165.042593085
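
The molecular weight and monoisotopic mass listed above can be cross-checked from the formula alone. The following is a minimal sketch, not the database's own procedure: the helper names (`parse_formula`, `mass`) are hypothetical, and the atomic masses are rounded reference constants typed in for illustration. The parser handles only flat formulas without brackets or charges.

```python
import re

# Rounded standard atomic weights (average isotopic composition).
# These constants are assumptions entered by hand for this sketch.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Masses of the most abundant isotope of each element (12C, 1H, 14N, 16O).
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def parse_formula(formula):
    """Return {element: count} for a flat formula such as 'C8H7NO3'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mass(formula, table):
    """Sum the per-element masses from `table` over the parsed formula."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())


average = mass("C8H7NO3", AVERAGE_MASS)
monoisotopic = mass("C8H7NO3", MONOISOTOPIC_MASS)
print(f"average mass:      {average:.3f}")
print(f"monoisotopic mass: {monoisotopic:.6f}")
```

With these constants the two results should land close to the values listed for this record; small differences in the last digits depend on which atomic mass table is used.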
| Endpoint | Tool | QSAR ID | Value | Unit | Comments | Reference | 
|---|---|---|---|---|---|---|
| nRot | PKCSM | | 2.0 | | | doi: 10.1021/acs.jmedchem.5b00104 |
| LogP | VEGA | MLogP | 0.3 | log(mol/L) | LogP model (MLogP)-prediction | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ |
| nRing | ADMETLAB2 | | 1.0 | | | doi: 10.1093/nar/gkab255 |
| LogP | VEGA | ALogP | 0.66 | log(mol/L) | LogP model (ALogP)-prediction | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ |
| MaxRing | ADMETLAB2 | | 6.0 | | | doi: 10.1093/nar/gkab255 |
| LogP | PKCSM | iLOGP | 1.0349 | log(mol/L) | LOGP | doi: 10.1021/acs.jmedchem.5b00104 |
| nHet | ADMETLAB2 | | 4.0 | | | doi: 10.1093/nar/gkab255 |
| LogP | SWISSADME | iLOGP | 0.59 | log(mol/L) | | |
| fChar | ADMETLAB2 | | 0.0 | | | doi: 10.1093/nar/gkab255 |
| LogP | SWISSADME | XLOGP3 | 1.72 | log(mol/L) | | |
| nRig | ADMETLAB2 | | 8.0 | | | doi: 10.1093/nar/gkab255 |
| LogP | SWISSADME | WLOGP | 1.03 | log(mol/L) | | |
| MW | ADMETLAB2 | | 165.04 | | | doi: 10.1093/nar/gkab255 |
| nHAt | SWISSADME | | 12.0 | | | |
| LogP | SWISSADME | MLOGP | 0.12 | log(mol/L) | | |
| MW | SWISSADME | | 165.15 | | | |
| Flex | ADMETLAB2 | | 0.25 | | | doi: 10.1093/nar/gkab255 |
| LogP | SWISSADME | Silicos-IT Log P | 1.07 | log(mol/L) | | |
| MW | PKCSM | | 165.148 | | | doi: 10.1021/acs.jmedchem.5b00104 |
| nStereo | ADMETLAB2 | | 0.0 | | | doi: 10.1093/nar/gkab255 |
| LogS | ADMETLAB2 | | -1.77 | log(mol/L) | logS: The logarithm of aqueous solubility value. The prediction is based on Multi-task Graph Attention (MGA) framework. MGA is composed of input, Relation graph convolution network (RGCN) layers, attention layer and fully-connected (FC) layers. In the Input, a node represents the information of an atom, and after passing RGCN layers, the node represents general features of circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in the tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For those endpoints predicted by the regression models concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, and different performance measures for the training test regression model: R-square (R2) of 0.967, mean absolute error (MAE) of 0.399, and root mean squared error (RMSE) of 0.287. The logarithm of aqueous solubility value. The first step in the drug absorption process is the disintegration of the tablet or capsule, followed by the dissolution of the active drug. Low solubility is detrimental to good and complete oral absorption, and early measurement of this property is of great importance in drug discovery. This model is based on 4797 Total molecules, with 3836 in the training set, 480 test set, and 480 validation set drug like molecules. How to interpret: The predicted solubility of a compound is given as the logarithm of the molar concentration (log mol/L). Compounds in the range from -4 to 0.5 log mol/L will be considered proper. | doi: 10.1093/nar/gkab255 |
| RatioCsp3 | SWISSADME | | 0.0 | | | |
| LogD7.4 | ADMETLAB2 | | 0.736 | log(mol/L) | | doi: 10.1093/nar/gkab255 |
| LogS | ADMETSAR | ESOL | -1.6214 | log(mol/L) | | DOI: 10.1093/bioinformatics/bty707 |
| nARO | SWISSADME | | 6.0 | | | |
| VDW_Vol | ADMETLAB2 | | 162.553 | | | doi: 10.1093/nar/gkab255 |
| LogS | PKCSM | Ali | -1.342 | log(mol/L) | | doi: 10.1021/acs.jmedchem.5b00104 |
| nHA | ADMETLAB2 | | 4.0 | | | doi: 10.1093/nar/gkab255 |
| Dens | ADMETLAB2 | | 1.015 | | | doi: 10.1093/nar/gkab255 |
| LogS | VEGA | ESOL | -1.52 | log(mol/L) | | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ |
| nHA | SWISSADME | | 4.0 | | | |
| VSA | PKCSM | | 68.761 | | | doi: 10.1021/acs.jmedchem.5b00104 |
| LogS | SWISSADME | ESOL | -2.19 | log(mol/L) | | |
| nHA | PKCSM | | 4.0 | | | doi: 10.1021/acs.jmedchem.5b00104 |
| Mref | SWISSADME | | 43.07 | | | |
| LogS | SWISSADME | ALI | -2.8 | log(mol/L) | | |
| nHD | SWISSADME | | 2.0 | | | |
| TPSA | ADMETLAB2 | | 69.89 | Å² | | doi: 10.1093/nar/gkab255 |
| LogS | SWISSADME | Silicos-IT | -1.28 | log(mol/L) | | |
| nHD | ADMETLAB2 | | 2.0 | | | doi: 10.1093/nar/gkab255 |
| TPSA | SWISSADME | | 69.89 | Å² | | |
| nHD | PKCSM | | 2.0 | | | doi: 10.1021/acs.jmedchem.5b00104 |
| nRot | ADMETLAB2 | | 2.0 | | | doi: 10.1093/nar/gkab255 |
| LogP | ADMETLAB2 | | 1.312 | log(mol/L) | logP: The logarithm of the n-octanol/water distribution coefficient. The prediction is based on Multi-task Graph Attention (MGA) framework. MGA is composed of input, Relation graph convolution network (RGCN) layers, attention layer and fully-connected (FC) layers. In the Input, a node represents the information of an atom, and after passing RGCN layers, the node represents general features of circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in the tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For those endpoints predicted by the regression models concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, and different performance measures for the training test regression model: R-square (R2) of 0.980, mean absolute error (MAE) of 0.257, and root mean squared error (RMSE) of 0.193. The logarithm of the n-octanol/water distribution coefficient. Log P holds a leading position, with considerable impact on both membrane permeability and hydrophobic binding to macromolecules, including the target receptor as well as other proteins like plasma proteins, transporters, or metabolizing enzymes. This model is based on 12682 Total molecules, with 10145 in the training set, 1270 test set, and 1267 validation set drug like molecules. How to interpret: The predicted logP of a compound is given as the logarithm of the molar concentration (log mol/L). Compounds in the range from 0 to 3 log mol/L will be considered proper. | doi: 10.1093/nar/gkab255 |
| nRot | SWISSADME | | 2.0 | | | |
| LogP | VEGA | Meylan-Kowwin | 0.6 | log(mol/L) | LogP model (Meylan-Kowwin)-prediction | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ |
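
The ADMETlab 2.0 interpretation notes in the table give acceptance ranges for logS (-4 to 0.5) and logP (0 to 3). The sketch below collects the LogS and LogP predictions from the rows above and checks them against those ranges. Applying the ADMETlab ranges to predictions from the other tools is an assumption made here for illustration only, and the names `RANGES`, `PREDICTIONS`, `in_range` and `summarize` are hypothetical.

```python
# Acceptance ranges quoted in the ADMETlab 2.0 "How to interpret" notes.
# Using them for every tool's prediction is an assumption of this sketch.
RANGES = {"LogS": (-4.0, 0.5), "LogP": (0.0, 3.0)}

# (endpoint, tool, model, value) copied from the table rows above.
PREDICTIONS = [
    ("LogP", "VEGA", "MLogP", 0.3),
    ("LogP", "VEGA", "ALogP", 0.66),
    ("LogP", "PKCSM", "iLOGP", 1.0349),
    ("LogP", "SWISSADME", "iLOGP", 0.59),
    ("LogP", "SWISSADME", "XLOGP3", 1.72),
    ("LogP", "SWISSADME", "WLOGP", 1.03),
    ("LogP", "SWISSADME", "MLOGP", 0.12),
    ("LogP", "SWISSADME", "Silicos-IT Log P", 1.07),
    ("LogP", "ADMETLAB2", "", 1.312),
    ("LogP", "VEGA", "Meylan-Kowwin", 0.6),
    ("LogS", "ADMETLAB2", "", -1.77),
    ("LogS", "ADMETSAR", "ESOL", -1.6214),
    ("LogS", "PKCSM", "Ali", -1.342),
    ("LogS", "VEGA", "ESOL", -1.52),
    ("LogS", "SWISSADME", "ESOL", -2.19),
    ("LogS", "SWISSADME", "ALI", -2.8),
    ("LogS", "SWISSADME", "Silicos-IT", -1.28),
]


def in_range(endpoint, value):
    """True if `value` lies inside the quoted range for `endpoint`."""
    low, high = RANGES[endpoint]
    return low <= value <= high


def summarize(endpoint):
    """Count, spread and range check for all predictions of one endpoint."""
    values = [v for e, _tool, _model, v in PREDICTIONS if e == endpoint]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "all_in_range": all(in_range(endpoint, v) for v in values),
    }


for name in ("LogP", "LogS"):
    print(name, summarize(name))
```

The spread between tools (more than a log unit for both endpoints) is a reminder that these are model predictions rather than measurements.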
| Category | Endpoint | Tool | QSAR ID | Value | Unit | Comments | Reference | 
|---|---|---|---|---|---|---|---|
| Metabolism | CYP2C9-inh | ADMETSAR | | - | Active/"-" = Inactive/Not predicted | Five subgroups of CYP inhibitors were collected by Cheng et al.,[1] including 1a2, 2d6, 2c9, 2c19, and 3a4. A compound was assigned as a CYP inhibitor if the AC50 (the compound concentration that leads to 50% of the activity of an inhibition control) value was ≤10 μM, and it was considered a noninhibitor if AC50 was >57 μM. In addition, a compound was regarded as a CYP inhibitor if it had a PubChem activity score between 40 and 100, and as a noninhibitor if it had a PubChem activity score equal to 0. Three subgroups of CYP substrates were collected by Carbon-Mangels et al., including 2d6, 2c9, and 3a4.[2] The models were built with MACCS fingerprints and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 |
| Absorption | HIA | SWISSADME | | Active | Active/"-" = Inactive/Not predicted | The predictions for passive human gastrointestinal absorption (HIA) and blood-brain barrier (BBB) permeation both consist in the readout of the BOILED-Egg model (Daina, A. & Zoete, V. A BOILED-Egg To Predict Gastrointestinal Absorption and Brain Penetration of Small Molecules. ChemMedChem 11, 1117–1121 (2016)), an intuitive graphical classification model, which can be displayed in the SwissADME result page by clicking the red button appearing below the sketcher when all input molecules have been processed (refer to Graphical Output). These models are based on the computation of the lipophilicity (WLOGP) and polarity (tPSA). Combining both best ellipses yields the BOILED-Egg predictive model for HIA and BBB, respectively. The white region is the physicochemical space of molecules with the highest probability of being absorbed by the gastrointestinal tract, and the yellow region (yolk) is the physicochemical space of molecules with the highest probability of permeating to the brain. Yolk and white areas are not mutually exclusive. Other binary classification models are included, which focus on the propensity for a given small molecule to be a substrate or inhibitor of proteins governing important pharmacokinetic behaviours. Gastrointestinal absorption: according to the white of the BOILED-Egg. doi: 10.1002/cmdc.201600182. | |
| Transporter | Pgp Inhibitor | vNN-ADMET | | - | Active/"-" = Inactive/Not predicted | Pgp inhibitors: P-glycoprotein inhibitors. The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is to use a predetermined similarity criterion, the vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. P-glycoprotein (Pgp) is an essential cell membrane protein that extracts many foreign substances from the cell (Ambudkar et al., 2003). As such, it is a critical determinant of the pharmacokinetic properties of drugs. Cancer cells often overexpress Pgp, which increases the efflux of chemotherapeutic agents from the cell and prevents treatment by reducing the effective intracellular concentrations of such agents, a phenomenon known as multidrug resistance (Borst and Elferink, 2002). For this reason, identifying compounds that can either be transported out of the cell by Pgp (substrates) or impair Pgp function (inhibitors) is of great interest. The vNN model is based on a dataset that included 1,319 inhibitors and 937 non-inhibitors. We classified the Pgp inhibitors and non-inhibitors as positives and negatives, respectively. Overall accuracy of 85%, when using 10-fold CV, with corresponding kappa value of 0.66. These models reliably predicted 76% of the compounds in their datasets to be Pgp inhibitors. Results were adapted and the “Yes” and “No” indicators were changed respectively to “Active” and “-”. | doi: 10.3389/fphar.2017.00889 |
| Transporter | MATE1-inh | ADMETSAR | | - | Active/"-" = Inactive/Not predicted | MATE1 is an apically expressed poly-specific proton antiporter which mediates the efflux of diverse substrates, primarily organic cations, in the kidney and the liver. Following its relatively recent discovery, MATE1 has rapidly emerged as an important transporter in the renal and biliary excretion of endogenous and exogenous organic cations, particularly metformin. It appears that clinical inhibitors of organic cation transporters (OCTs) are also potent inhibitors of MATEs, and therefore modulation of the activity of both OCTs and MATEs, or predominantly of MATEs, may better describe DDIs currently ascribed to OCTs. The major focus of investigation for MATE1 has been on its role in renal drug disposition and elimination, notably on the renal elimination of metformin and the renal toxicity of cisplatin. Various studies of the impact of functional gene polymorphisms of MATE1, MATE2K, OCT1, and OCT2 on metformin pharmacokinetics, efficacy and safety, as well as preclinical assessments in Mate1 knockout mice, imply a significant role for these transporters. The recent FDA regulatory guideline now recommends evaluation of MATE1-mediated drug interactions for NCEs that undergo significant renal elimination. The human multidrug and toxin extrusion (MATE) transporter 1 contributes to the tissue distribution and excretion of many drugs. Inhibition of MATE1 may result in potential drug–drug interactions (DDIs) and alterations in drug exposure and accumulation in various tissues. In total 80 inhibitors and 738 non inhibitors were collected and the model was built by SubFP and random forest. | DOI: 10.1093/bioinformatics/bty707 |
| Distribution | BBB permeant | ADMETSAR | | - | Active/"-" = Inactive/Not predicted | Blood brain barrier (BBB) model based on 1438/401 (positive/negative) molecules based on binary model. ADMET data are collected from literature and databases, represented by fingerprints and descriptors, and the models were built by machine (deep) learning methods. ADMETopt can be used to optimize the ADMET properties of a query compound by scaffold hopping. Robustness of the model: AUC: 0.944, Accuracy: 0.907, Sensitivity: 0.921, Specificity: 0.861. | DOI: 10.1093/bioinformatics/bty707 |
| Absorption | Caco-2 Permeability | ADMETSAR | | Active | Active/"-" = Inactive/Not predicted | In total, 674 drug or drug-like molecules with experimental Caco-2 permeability values were used, with 303 positives and 371 negatives. The dataset was collected from Hai Pham The et al. (2011). The model is based on AtomPairs with a support vector machine (SVM). The binary model’s performances were AUC: 0.857, Accuracy: 0.768, Sensitivity: 0.73, Specificity: 0.799. The result of the prediction is binary: Active or - (inactive). | DOI: 10.1093/bioinformatics/bty707 |
| Metabolism | CYP1A2-inh | PKCSM | | - | Active/"-" = Inactive/Not predicted | CYP1A2 inhibitor: Cytochrome P450 substrate 1A2 isoform: The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, did the qualitative predictions (classification tasks). Cytochrome P450 is an important detoxification enzyme in the body, mainly found in the liver. It oxidises xenobiotics to facilitate their excretion. Many drugs are deactivated by the cytochrome P450’s, and some can be activated by it. Inhibitors of this enzyme, such as grapefruit juice, can affect drug metabolism and are contraindicated. It is therefore important to assess a compound’s ability to inhibit the cytochrome P450. The model for CYP1A2 inhibitors was built using over 14903 compounds whose ability to inhibit the cytochrome P450 1A2 has been determined. A compound is considered to be a cytochrome P450 inhibitor if the concentration required to lead to 50% inhibition is less than 10 uM. The best performing predictor in each task was chosen based on a 5-fold cross-validation approach. The Weka toolkit was used for training and testing the models. How to interpret the results: The predictors will assess a given molecule to determine whether it is likely to be a cytochrome P450 inhibitor, for a given isoform. Qualitative results were relabeled: "Yes" was replaced with "Active" and "No" was replaced with "-". | doi: 10.1021/acs.jmedchem.5b00104 |
| Metabolism | CYP3A4-inh | ADMETSAR | | - | Active/"-" = Inactive/Not predicted | Five subgroups of CYP inhibitors were collected by Cheng et al.,[1] including 1a2, 2d6, 2c9, 2c19, and 3a4. A compound was assigned as a CYP inhibitor if the AC50 (the compound concentration that leads to 50% of the activity of an inhibition control) value was ≤10 μM, and it was considered a noninhibitor if AC50 was >57 μM. In addition, a compound was regarded as a CYP inhibitor if it had a PubChem activity score between 40 and 100, and as a noninhibitor if it had a PubChem activity score equal to 0. Three subgroups of CYP substrates were collected by Carbon-Mangels et al., including 2d6, 2c9, and 3a4.[2] The models were built with MACCS fingerprints and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 |
| Excretion | CL | ADMETLAB2 | | 5.5 | ml/min/kg bw | CL: The clearance of a drug. The prediction is based on Multi-task Graph Attention (MGA) framework. MGA is composed of input, Relation graph convolution network (RGCN) layers, attention layer and fully-connected (FC) layers. In the Input, a node represents the information of an atom, and after passing RGCN layers, the node represents general features of circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in the tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For those endpoints predicted by the regression models concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, and different performance measures for the training test regression model: R-square (R2) of 0.977, mean absolute error (MAE) of 0.740, and root mean squared error (RMSE) of 0.556. The clearance of a drug. Clearance is an important pharmacokinetic parameter that defines, together with the volume of distribution, the half-life, and thus the frequency of dosing of a drug. This model is based on 831 Total molecules, with 666 in the training set, 81 test set, and 84 validation set drug like molecules. How to interpret: The unit of predicted CL is ml/min/kg. > 15 ml/min/kg: high clearance; 5-15 ml/min/kg: moderate clearance; < 5 ml/min/kg: low clearance. | doi: 10.1093/nar/gkab255 |
| Transporter | Pgp substrate | VEGA | | - | Active/"-" = Inactive/Not predicted | The model provides a qualitative prediction of P-Glycoprotein inhibition/substrate activity. 96 molecular descriptors were used. To further reduce the likelihood of correlations between descriptors, a Kohonen top-map was used (Drgan et al., 2017). In this way, the remaining descriptors were mapped onto a network with a 7 by 7 architecture of neurons using the transpose of the descriptor matrix; two descriptors were selected from each neuron, those with the largest and the shortest Euclidean distance to the central neuron, yielding a final set of 96 molecular descriptors for further use. The dataset was collected mainly from the admetSAR database (http://lmmd.ecust.edu.cn/admetsar2) and from the work of Li et al. (doi.org/10.1021/mp400450m) and contains 1785 chemicals (training set). The P-Glycoprotein Activity Classification Model (NIC) is a Counter Propagation Artificial Neural Network (CP ANN) multiclass classification model combined with a genetic algorithm (GA): Mora Lagares, L., Minovski, N., & Novic, M. (2019). Multiclass Classifier for P-Glycoprotein Substrates, Inhibitors, and Non-Active Compounds. Molecules, 24(10). doi:10.3390/molecules24102006. The training set statistics were Accuracy = 0.95, Specificity = 0.95, Sensitivity = 0.95. The initial predictions were Inhibitor/Substrate/Non-active. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ |
| Transporter | BSEP-inh | ADMETSAR | | - | Active/"-" = Inactive/Not predicted | ABCB11, more commonly referred to as BSEP (Bile Salt Export Pump), is a uni-directional, ATP-dependent efflux transporter that plays an important role in the elimination of bile salts from the hepatocyte into the bile canaliculi for export into the gastrointestinal tract (GIT). It is almost exclusively expressed in the liver, with much lower levels reported in the kidney. It is mainly of relevance to hepatotoxicity, as BSEP inhibition by a drug and/or its metabolites can result in the buildup of bile salts in the liver, which can lead to cholestasis and drug-induced liver injury (DILI). Compared to other drug transporters there are only few identified drug substrates and inhibitors of BSEP; thus, its involvement in drug-drug interactions (DDI) is very limited. The relevance of in vitro BSEP inhibition as a predictor of clinical outcomes is not clearly established, but whenever cholestatic liver injury is observed in clinical or preclinical trials, characterization of BSEP interactions should be considered. In contrast with the FDA guidance, the EMA guidance recommends consideration of in vitro BSEP inhibition testing for NCEs. In total 317 inhibitors and 290 noninhibitors were collected and the model was built with AtomPairs and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 |
| Metabolism | CYP2D6-sub | PKCSM | | - | Active/"-" = Inactive/Not predicted | CYP2D6 substrate: Cytochrome P450 substrate 2D6 isoform: The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, did the qualitative predictions (classification tasks). The cytochrome P450’s are responsible for metabolism of many drugs. However, inhibitors of the P450’s can dramatically alter the pharmacokinetics of these drugs. It is therefore important to assess whether a given compound is likely to be a cytochrome P450 substrate. The two main isoforms responsible for drug metabolism are 2D6 and 3A4. These models were built using 671 compounds whose metabolism by each cytochrome P450 isoform has been measured. The best performing predictor in each task was chosen based on a 5-fold cross-validation approach. The Weka toolkit was used for training and testing the models. How to interpret the results: The predictor will assess whether a given molecule is likely to be metabolized by either P450. Qualitative results were relabeled: "Yes" was replaced with "Active" and "No" was replaced with "-". | doi: 10.1021/acs.jmedchem.5b00104 |
| Absorption | Caco-2 Permeability | ADMETLAB2 | | 27.6058 | 10^-6 cm/s | Caco-2 Permeability. The prediction is based on Multi-task Graph Attention (MGA) framework. MGA is composed of input, Relation graph convolution network (RGCN) layers, attention layer and fully-connected (FC) layers. In the Input, a node represents the information of an atom, and after passing RGCN layers, the node represents general features of circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in the tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For those endpoints predicted by the regression models concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, and different performance measures for the training test regression model: R-square (R2) of 0.943, mean absolute error (MAE) of 0.152, and root mean squared error (RMSE) of 0.117. CACO-2: Before an oral drug reaches the systemic circulation, it must pass through intestinal cell membranes via passive diffusion, carrier-mediated uptake or active transport processes. The human colon adenocarcinoma cell lines (Caco-2), as an alternative approach for the human intestinal epithelium, have been commonly used to estimate in vivo drug permeability due to their morphological and functional similarities. Thus, Caco-2 cell permeability has also been an important index for an eligible candidate drug compound. This model is based on 2464 Total molecules (positive/Negative), with 1970 in the training set (positive/Negative), 247 test set (positive/Negative), and 247 validation set (positive/Negative) drug like molecules with Caco-2 permeability values and predicts the logarithm of the apparent permeability coefficient (log Papp; log cm/s). | doi: 10.1093/nar/gkab255 |
| Excretion | T1/2 | VEGA | | 1.7906 | h | This study addresses the development of QSAR models for the prediction of total body elimination half-lives. The first aim of this work is the creation of statistically valid and predictive models for the prediction of half-lives in human; the second aim is to show how QSAR predictions can be used for the refinement of chemical screening procedures for hazard assessment. The kT (h^-1) rate was converted to a normalized biotransformation half-life value (HLT, h), and then expressed in base 10 log units, LogHLT. The dataset was taken from literature (J.A. Arnot, T.N. Brown, F. Wania, Estimating screening-level organic chemical half-lives in human. Environ Sci Technol. 2014; 48:723-730). The HL dataset consists of the union of several datasets to obtain a variety of discrete organic chemical structures with a range of HL values. The final data set is composed of 1105 chemicals with molar mass ranging from 30 (formaldehyde) to 960 (decabromodiphenyl ether) g/mol. The HLs span approximately 7.5 orders of magnitude from 0.05 h (0.002 d) for nitroglycerin to 2 × 10^6 h (83 000 d) for 2,3,4,5,2′,3′,5′,6′-octachlorobiphenyl with a median of 7.6 h (0.32 d). The corresponding rate constants range from 14/h (330/d) to 3.5 × 10^-7/h (8.3 × 10^-6/d) with a median of 0.091/h (2.2/d). Eighty percent of the chemicals in the HLT QSAR data set are pharmaceuticals (measured HLT) and 20% are environmental contaminants (estimated or assumed to approximate HLT). The LogHLT range is -1.30 to 6.30 for the training set and -1.08 to 5.83 for the test set. After successful validation, the model has been retrained on the entire dataset for implementation. The dataset was split into a training set (552) and a test set (553). For more details see section 6.6 and 7.6 of QMRF. LogHLT (total elimination half-life in human), OLS-MLR method: model developed on a training set of 552 compounds. LogHLT (total elimination half-life in human), full model, OLS-MLR method: model developed on a training set of 1105 compounds. Split model equation: LogHLT = 0.5683 + 0.3299 ScCl + 0.6018 AATS7p + 0.2385 nF - 0.0043 TopoPSA - 0.0484 gmax + 0.0778 GGI1 - 0.2404 minsCl - 0.27 minsHsOH. Full model equation: LogHLT = 0.6577 + 0.351 ScCl + 0.5905 AATS7p - 0.0042 TopoPSA + 0.2105 nF - 0.0495 gmax - 0.4298 minsCl + 0.0686 GGI1 - 0.2927 minsHsOH. Statistics for goodness-of-fit: R2 = 0.78; CCCtr[9,10] = 0.88; RMSEtr = 0.62. The VEGA implementation returns the following statistics on the entire dataset (1105 compounds): R2 ext = 0.77; MAE = 0.489; MSE = 0.404; RMSE = 0.63. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ |
| Transporter | Pgp substrate | vNN-ADMET | | - | Active/"-" = Inactive/Not predicted | Pgp substrates: P-glycoprotein substrates. The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is to use a predetermined similarity criterion, the vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. P-glycoprotein (Pgp) is an essential cell membrane protein that extracts many foreign substances from the cell (Ambudkar et al., 2003). As such, it is a critical determinant of the pharmacokinetic properties of drugs. Cancer cells often overexpress Pgp, which increases the efflux of chemotherapeutic agents from the cell and prevents treatment by reducing the effective intracellular concentrations of such agents, a phenomenon known as multidrug resistance (Borst and Elferink, 2002). For this reason, identifying compounds that can either be transported out of the cell by Pgp (substrates) or impair Pgp function (inhibitors) is of great interest. The vNN model is based on a dataset that included measurements for 422 substrates and 400 non-substrates. We classified the Pgp substrates and non-substrates as positives and negatives, respectively. Overall accuracy of 79%, when using 10-fold CV, with corresponding kappa value of 0.58. These models reliably predicted 65% of the compounds in their datasets to be Pgp substrates. | doi: 10.3389/fphar.2017.00889 |
| Metabolism | CYP2C9-sub | ADMETLAB2 | | Active | Active/"-" = Inactive/Not predicted | CYP2C9 substrate: The prediction is based on Multi-task Graph Attention (MGA) framework. MGA is composed of input, Relation graph convolution network (RGCN) layers, attention layer and fully-connected (FC) layers. In the Input, a node represents the information of an atom, and after passing RGCN layers, the node represents general features of circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in the tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For those endpoints predicted by the regression models concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, and different performance of classification models in training and validation sets: AUC: 0.967, ACC: 0.904, SP: 0.911, Sen: 0.894, MCC: 0.801. Based on the chemical nature of biotransformation, the process of drug metabolism reactions can be divided into two broad categories: phase I (oxidative reactions) and phase II (conjugative reactions). The human cytochrome P450 family (phase I enzymes) contains 57 isozymes and these isozymes metabolize approximately two-thirds of known drugs in human, with 80% of this attributed to five isozymes: 1A2, 3A4, 2C9, 2C19 and 2D6. Most of these CYPs responsible for phase I reactions are concentrated in the liver. This model is based on 811 (325/486) Total molecules, with 647 (259/388) in the training set, 82 (33/49) test set, and 82 (33/49) validation set drug like molecules. Leave-cluster-out validation of classification models was used for model validation. How to interpret: Category 0: Non-substrate / Non-inhibitor; Category 1: substrate / inhibitor. The output value is the probability of being substrate / inhibitor, within the range of 0 to 1. Empirical decision: If the prediction >= 0.5 the endpoint is considered “active”. If not, it is considered inactive (“-”). | doi: 10.1093/nar/gkab255 |
| Absorption | F(20%) | ADMETLAB2 | Active | Active/"-" = Inactive/Not predicted | F20%: human oral bioavailability of 20%. For any drug administered by the oral route, oral bioavailability is one of the most important pharmacokinetic parameters because it indicates the efficiency of drug delivery to the systemic circulation. Result interpretation: Molecules with a bioavailability ≥ 20% were classified as F20%- (Category 0), while molecules with a bioavailability < 20% were classified as F20%+ (Category 1). | doi: 10.1093/nar/gkab255 | |
| Distribution | VDss | ADMETLAB2 | 0.542 | L/kg | VDss: volume of distribution at steady state. The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom, and after passing through the RGCN layers, the node represents general features of the circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) that introduces edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predicted values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform. Performance measures reported for the regression model (training/test): R-squared (R2) of 0.895, mean absolute error (MAE) of 0.492, and root mean squared error (RMSE) of 0.330. The volume of distribution (VD) is a theoretical concept that connects the administered dose with the actual initial concentration present in the circulation, and it is an important parameter for describing the in vivo distribution of drugs. In practice, the distribution characteristics of an unknown compound, such as its binding to plasma proteins, its distribution in body fluids and its uptake in tissues, can be inferred from its VD value. This model is based on 1086 drug-like molecules in total, with 872 in the training set, 107 in the test set, and 107 in the validation set. How to interpret: The unit of predicted VD is L/kg. | doi: 10.1093/nar/gkab255 | |
| Transporter | OATP1B3-inh | ADMETSAR | Active | Active/"-" = Inactive/Not predicted | OATP1B3 is an uptake transporter exclusively expressed in the liver on the sinusoidal (basolateral) side of centrilobular hepatocytes. In conjunction with OATP1B1, it is responsible for the hepatic uptake of some important drug classes, notably statins, for the uptake of bile acids (in conjunction with OATP1B1 and NTCP) and bilirubin, as well as some other endogenous molecules. It is a mediator of drug interactions, but as it shares many substrates and inhibitors with another major hepatic uptake transporter, OATP1B1, its role may not be fully appreciated. The FDA and EMA recommend in vitro testing of OATP1B3 interactions for drug candidates that are eliminated in part via the liver and/or will be co-administered with OATP1B3 substrates. The OATP1B3 inhibition model (OATP1B3i) was trained on 1743 inhibitors and 130 noninhibitors using Morgan fingerprints and a random forest. | DOI: 10.1093/bioinformatics/bty707 | |
| Metabolism | CYP1A2-sub | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | CYP1A2 substrate: The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom, and after passing through the RGCN layers, the node represents general features of the circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) that introduces edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predicted values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform. Performance of the classification model on the training and validation sets: AUC: 0.985, ACC: 0.936, SP: 0.942, Sen: 0.929, MCC: 0.871. Based on the chemical nature of biotransformation, drug metabolism reactions can be divided into two broad categories: phase I (oxidative reactions) and phase II (conjugative reactions). The human cytochrome P450 family (phase I enzymes) contains 57 isozymes, and these isozymes metabolize approximately two-thirds of known drugs in humans, with 80% of this attributed to five isozymes: 1A2, 3A4, 2C9, 2C19 and 2D6. Most of the CYPs responsible for phase I reactions are concentrated in the liver. This model is based on 366 (176/190) drug-like molecules in total, with 292 (140/152) in the training set, 37 (18/19) in the test set, and 37 (18/19) in the validation set. Leave-cluster-out validation was used for model validation. How to interpret: Category 0: non-substrate / non-inhibitor; Category 1: substrate / inhibitor. The output value is the probability of being a substrate / inhibitor, within the range of 0 to 1. Empirical decision: if the prediction is >= 0.5, the endpoint is considered “Active”; otherwise it is considered inactive (“-”). | doi: 10.1093/nar/gkab255 | |
| Transporter | Pgp substrate | ADMETSAR | - | Active/"-" = Inactive/Not predicted | The P-glycoprotein substrates (pgps+) and nonsubstrates (pgps-) were collected from two research articles. In total, 718 pgps+ and 847 pgps- were obtained after preprocessing, which included removing salts, duplicates and inorganic compounds. The model was built using Morgan fingerprints with a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Metabolism | CYP2D6-inh | SWISSADME | - | Active/"-" = Inactive/Not predicted | CYP2D6 inhibitor: Cytochrome P450 inhibition (drug-drug interaction): The Support Vector Machine (SVM) method (Cortes, C. & Vapnik, V., 1995) was applied on meticulously cleansed large datasets of known inhibitors/non-inhibitors. In similar contexts, SVM was found to perform better than other machine-learning algorithms for binary classification (Mishra et al. 2010). The models return “Yes” or “No” depending on whether the molecule under investigation has a higher probability of being an inhibitor or a non-inhibitor, respectively, of a given CYP. Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., 2008). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose, an effect known as a drug-drug interaction. CYP2D6: Cytochrome P450 2D6 inhibitor: SVM model built on 3664 molecules (training set) and tested on 1068 molecules (test set). 10-fold CV: ACC = 0.79 / AUC = 0.85; external: ACC = 0.81 / AUC = 0.87. Results were adapted: the “Yes” and “No” indicators were changed to “Active” and “-”, respectively. | ||
| Transporter | OCT1-inh | ADMETSAR | - | Active/"-" = Inactive/Not predicted | OCT1 is primarily a hepatic uptake transporter, expressed on the sinusoidal membrane (blood side) of hepatocytes. It plays a key role in the disposition and hepatic clearance of mostly cationic drugs and endogenous compounds. It functions in conjunction with MATE1 that facilitates the biliary elimination of OCT1 substrates transported into the liver. Metformin is an important clinical substrate of OCT1. Genetic polymorphisms of OCT1 are associated with altered metformin pharmacokinetics, safety, and efficacy, but the contributions of other cation transporters and their functional SNPs are also important. Since the discovery of MATEs, DDIs ascribed to OCTs are being re-evaluated, and it is likely that some interactions may be re-assigned to MATEs. Regardless of this, the role of OCT1 as the first step in active hepatic extraction of cationic drugs remains important. Current FDA and EMA guidances do not specifically recommend evaluation of OCT1 liabilities, although investigation of OCT2 or OCTs in general is advised. It is appropriate to consider evaluating OCT1 interactions for drugs that are likely to be co-administered with OCT/MATE substrates, particularly metformin. Although there is no guidance for MATEs either, simultaneous evaluation of their interactions is also advisable. | DOI: 10.1093/bioinformatics/bty707 | |
| Metabolism | CYP2C19-inh | SWISSADME | - | Active/"-" = Inactive/Not predicted | CYP2C19 inhibitor: Cytochrome P450 inhibition (drug-drug interaction): The Support Vector Machine (SVM) method (Cortes, C. & Vapnik, V., 1995) was applied on meticulously cleansed large datasets of known inhibitors/non-inhibitors. In similar contexts, SVM was found to perform better than other machine-learning algorithms for binary classification (Mishra et al. 2010). The models return “Yes” or “No” depending on whether the molecule under investigation has a higher probability of being an inhibitor or a non-inhibitor, respectively, of a given CYP. Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., 2008). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose, an effect known as a drug-drug interaction. CYP2C19: Cytochrome P450 2C19 inhibitor: SVM model built on 9272 molecules (training set) and tested on 3000 molecules (test set). 10-fold CV: ACC = 0.80 / AUC = 0.86; external: ACC = 0.80 / AUC = 0.87. | ||
| Metabolism | UGT activity | ADMETSAR | - | Active/"-" = Inactive/Not predicted | Human uridine diphosphate (UDP)-glucuronosyltransferases (UGTs) are major phase II drug-metabolizing enzymes that catalyze the transfer of glucuronic acid from UDP-glucuronic acid to various substrates containing a nucleophilic functional group, e.g. alcohols, phenols, carboxylic acids, amines and thiols. To date, 22 human UGT proteins have been identified, and they can be classified into four families: UGT1, UGT2, UGT3 and UGT8. | DOI: 10.1093/bioinformatics/bty707 | |
| Metabolism | CYP2D6-inh | ADMETSAR | - | Active/"-" = Inactive/Not predicted | Five subgroups of CYP inhibitors were collected by Cheng et al.,[1] including 1a2, 2d6, 2c9, 2c19, and 3a4. A compound was assigned as a CYP inhibitor if its AC50 value (the compound concentration that leads to 50% of the activity of an inhibition control) was ≤ 10 μM, and it was considered a noninhibitor if its AC50 was > 57 μM. In addition, a compound was regarded as a CYP inhibitor if it had a PubChem activity score between 40 and 100, and as a noninhibitor if it had a PubChem activity score equal to 0. Three subgroups of CYP substrates were collected by Carbon-Mangels et al., including 2d6, 2c9, and 3a4.[2] The models were built with MACCS fingerprints and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Transporter | Renal OCT2 substrate | PKCSM | - | Active/"-" = Inactive/Not predicted | Renal OCT2 substrate: Organic Cation Transporter 2: The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, were used for the qualitative predictions (classification tasks). Organic Cation Transporter 2 is a renal uptake transporter that plays an important role in the disposition and renal clearance of drugs and endogenous compounds. OCT2 substrates also have the potential for adverse interactions with co-administered OCT2 inhibitors. Assessing a candidate’s potential to be transported by OCT2 provides useful information regarding not only its clearance but also potential contraindications. This model was built using 906 compounds whose transport by OCT2 has been experimentally measured. The best performing predictor in each task was chosen based on a 5-fold CV approach. The Weka toolkit was used for training and testing the models. How to interpret the results: The predictor assesses whether a given molecule is likely to be an OCT2 substrate. The qualitative outputs were relabeled: "Yes" was replaced by "Active" and "No" was replaced by "-". | doi: 10.1021/acs.jmedchem.5b00104 | |
| Metabolism | CYP2C19-inh | ADMETSAR | - | Active/"-" = Inactive/Not predicted | Five subgroups of CYP inhibitors were collected by Cheng et al.,[1] including 1a2, 2d6, 2c9, 2c19, and 3a4. A compound was assigned as a CYP inhibitor if its AC50 value (the compound concentration that leads to 50% of the activity of an inhibition control) was ≤ 10 μM, and it was considered a noninhibitor if its AC50 was > 57 μM. In addition, a compound was regarded as a CYP inhibitor if it had a PubChem activity score between 40 and 100, and as a noninhibitor if it had a PubChem activity score equal to 0. Three subgroups of CYP substrates were collected by Carbon-Mangels et al., including 2d6, 2c9, and 3a4.[2] The models were built with MACCS fingerprints and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Transporter | Pgp Inhibitor | VEGA | Active | Active/"-" = Inactive/Not predicted | The model provides a qualitative prediction of P-glycoprotein inhibition/substrate activity. 96 molecular descriptors were used. To further reduce the likelihood of correlations between descriptors, a Kohonen top-map was used (Drgan et al., 2017). In this way, the remaining descriptors were mapped onto a network with a 7 by 7 architecture of neurons using the transpose of the descriptor matrix; two descriptors were selected from each neuron, those with the largest and the shortest Euclidean distance to the central neuron, yielding a final set of 96 molecular descriptors for further use. The dataset was collected mainly from the admetSAR database (http://lmmd.ecust.edu.cn/admetsar2) and from the work of Li et al. (doi.org/10.1021/mp400450m) and contains 1785 chemicals (training set). The P-Glycoprotein Activity Classification Model (NIC) is a multiclass classification model based on a Counter Propagation Artificial Neural Network (CP ANN) in combination with a genetic algorithm (GA) (Mora Lagares, L., Minovski, N., & Novic, M. (2019). Multiclass Classifier for P-Glycoprotein Substrates, Inhibitors, and Non-Active Compounds. Molecules, 24(10). doi:10.3390/molecules24102006). The training set performance was: Accuracy = 0.95, Specificity = 0.95, Sensitivity = 0.95. The initial predictions were Inhibitor/Substrate/Non-active. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Metabolism | CYP2C9-inh | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | CYP2C9 inhibitor: The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom, and after passing through the RGCN layers, the node represents general features of the circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) that introduces edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predicted values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform. Performance of the classification model on the training and validation sets: AUC: 0.960, ACC: 0.880, SP: 0.849, Sen: 0.942, MCC: 0.755. Based on the chemical nature of biotransformation, drug metabolism reactions can be divided into two broad categories: phase I (oxidative reactions) and phase II (conjugative reactions). The human cytochrome P450 family (phase I enzymes) contains 57 isozymes, and these isozymes metabolize approximately two-thirds of known drugs in humans, with 80% of this attributed to five isozymes: 1A2, 3A4, 2C9, 2C19 and 2D6. Most of the CYPs responsible for phase I reactions are concentrated in the liver. This model is based on 12111 (4017/8094) drug-like molecules in total, with 9686 (3213/6473) in the training set, 1213 (402/811) in the test set, and 1212 (402/810) in the validation set. Leave-cluster-out validation was used for model validation. How to interpret: Category 0: non-substrate / non-inhibitor; Category 1: substrate / inhibitor. The output value is the probability of being a substrate / inhibitor, within the range of 0 to 1. Empirical decision: if the prediction is >= 0.5, the endpoint is considered “Active”; otherwise it is considered inactive (“-”). | doi: 10.1093/nar/gkab255 | |
| Distribution | BBB permeant | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | BBB penetration: The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom, and after passing through the RGCN layers, the node represents general features of the circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) that introduces edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predicted values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform. Performance of the classification model on the training and validation sets: AUC: 0.992, ACC: 0.957, SP: 0.948, Sen: 0.964, MCC: 0.912. Drugs that act in the CNS need to cross the blood–brain barrier (BBB) to reach their molecular target. By contrast, for drugs with a peripheral target, little or no BBB penetration might be required in order to avoid CNS side effects. This model is based on 2905 (1651 pos/1254 neg) drug-like molecules in total, with 2324 (1321/1003) in the training set, 290 (165/125) in the test set, and 291 (165/126) in the validation set. Leave-cluster-out validation was used for model validation. How to interpret: The unit of BBB penetration is cm/s. Molecules with logBB > -1 were classified as BBB+ (Category 1), while molecules with logBB ≤ -1 were classified as BBB- (Category 0). The output value is the probability of being BBB+, within the range of 0 to 1. | doi: 10.1093/nar/gkab255 | |
| Metabolism | CYP3A4-inh | ADMETLAB2 | Active | Active/"-" = Inactive/Not predicted | CYP3A4 inhibitor: The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom, and after passing through the RGCN layers, the node represents general features of the circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) that introduces edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predicted values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform. Performance of the classification model on the training and validation sets: AUC: 0.960, ACC: 0.891, SP: 0.869, Sen: 0.922, MCC: 0.781. Based on the chemical nature of biotransformation, drug metabolism reactions can be divided into two broad categories: phase I (oxidative reactions) and phase II (conjugative reactions). The human cytochrome P450 family (phase I enzymes) contains 57 isozymes, and these isozymes metabolize approximately two-thirds of known drugs in humans, with 80% of this attributed to five isozymes: 1A2, 3A4, 2C9, 2C19 and 2D6. Most of the CYPs responsible for phase I reactions are concentrated in the liver. This model is based on 12339 (5092/7247) drug-like molecules in total, with 9880 (4074/5806) in the training set, 1232 (510/722) in the test set, and 1227 (508/719) in the validation set. Leave-cluster-out validation was used for model validation. How to interpret: Category 0: non-substrate / non-inhibitor; Category 1: substrate / inhibitor. The output value is the probability of being a substrate / inhibitor, within the range of 0 to 1. Empirical decision: if the prediction is >= 0.5, the endpoint is considered “Active”; otherwise it is considered inactive (“-”). | doi: 10.1093/nar/gkab255 | |
| Absorption | Kp | PKCSM | 0.5658 | 10-6 cm/s | Skin Permeability (log Kp): The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Gaussian Processes and Model Tree Regression, were used for the quantitative predictions (regression tasks). Skin permeability is a significant consideration for the efficacy of many consumer products, and is of interest for the development of transdermal drug delivery. The best performing predictor in each task was chosen based on a 5-fold CV approach. The Weka toolkit was used for training and testing the models. This predictor was built using 211 compounds whose in vitro human skin permeability has been measured. How to interpret the results: The model predicts whether a given compound is likely to be skin permeable, expressed as the skin permeability constant logKp (cm/h). A compound is considered to have a relatively low skin permeability if it has a logKp > -2.5. DOI: 10.1016/j.taap.2014.12.013. The data were converted to cm/s to be more easily compared with other tools. | doi: 10.1021/acs.jmedchem.5b00104 | |
| Distribution | CNS permeability | PKCSM | 1.1641 | (µL/min/g brain) | CNS permeability: central nervous system (CNS) permeability (alternative to blood-brain barrier permeability). The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Gaussian Processes and Model Tree Regression, were used for the quantitative predictions (regression tasks). Measuring blood-brain permeability can be difficult because of confounding factors. The blood-brain permeability-surface area product (logPS) is a more direct measurement. It is obtained from in situ brain perfusions with the compound directly injected into the carotid artery. This lacks the systemic distribution effects which may distort brain penetration. This predictive model was built using 153 compounds whose logPS has been experimentally measured. PS is measured in mL/min/g brain: [PS = −F ln(1 – (Kin/F))], where F is the cerebral blood or perfusion flow rate and Kin is the unidirectional transfer constant; [Kin = (Qbr/Cpf)/T], in which Qbr is the concentration of compound in the brain, corrected for the vascular volume, Cpf is the concentration of compound in the perfusion fluid and T is the perfusion time. The best performing predictor in each task was chosen based on a 10-fold CV approach. The Weka toolkit was used for training and testing the models. How to interpret the results: Compounds with a logPS > -2 are considered to penetrate the central nervous system (CNS), while those with logPS < -3 are considered unable to penetrate the CNS. | doi: 10.1021/acs.jmedchem.5b00104 | |
| Metabolism | CYP3A4-sub | ADMETSAR | - | Active/"-" = Inactive/Not predicted | Five subgroups of CYP inhibitors were collected by Cheng et al.,[1] including 1a2, 2d6, 2c9, 2c19, and 3a4. A compound was assigned as a CYP inhibitor if its AC50 value (the compound concentration that leads to 50% of the activity of an inhibition control) was ≤ 10 μM, and it was considered a noninhibitor if its AC50 was > 57 μM. In addition, a compound was regarded as a CYP inhibitor if it had a PubChem activity score between 40 and 100, and as a noninhibitor if it had a PubChem activity score equal to 0. Three subgroups of CYP substrates were collected by Carbon-Mangels et al., including 2d6, 2c9, and 3a4.[2] The models were built with MACCS fingerprints and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Metabolism | CYP3A4-inh | SWISSADME | - | Active/"-" = Inactive/Not predicted | CYP3A4 inhibitor: Cytochrome P450 inhibition (drug-drug interaction): The Support Vector Machine (SVM) method (Cortes, C. & Vapnik, V., 1995) was applied on meticulously cleansed large datasets of known inhibitors/non-inhibitors. In similar contexts, SVM was found to perform better than other machine-learning algorithms for binary classification (Mishra et al. 2010). The models return “Yes” or “No” depending on whether the molecule under investigation has a higher probability of being an inhibitor or a non-inhibitor, respectively, of a given CYP. Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., 2008). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose, an effect known as a drug-drug interaction. CYP3A4: Cytochrome P450 3A4 inhibitor: SVM model built on 7518 molecules (training set) and tested on 2579 molecules (test set). 10-fold CV: ACC = 0.77 / AUC = 0.85; external: ACC = 0.78 / AUC = 0.86. Results were adapted: the “Yes” and “No” indicators were changed to “Active” and “-”, respectively. | ||
| Metabolism | CYP2C9-inh | vNN-ADMET | - | Active/"-" = Inactive/Not predicted | CYP2C9 Cytochrome P450 inhibition (drug-drug interaction): The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is the vNN method, which applies a predetermined similarity criterion and uses all nearest neighbors that meet this structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., 2008). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose, an effect known as a drug-drug interaction. CYP2C9: CYP inhibitors were retrieved from ChEMBL (Bento et al., 2014) and classified as inhibitors if the IC50 was below 10 μM. The vNN model was built on a dataset of 8,072 molecules with a Tanimoto-distance threshold value of 0.50: accuracy 0.91, sensitivity 0.55, specificity 0.96, kappa 0.54, coverage 0.76. Results were adapted: the “Yes” and “No” indicators were changed to “Active” and “-”, respectively. | doi: 10.3389/fphar.2017.00889. | |
| Excretion | T1/2 | ADMETLAB2 | Active | Active/"-" = Inactive/Not predicted | T1/2: half-life below or above 3 hours. The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom, and after passing through the RGCN layers, the node represents general features of the circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) that introduces edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predicted values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform. Performance of the classification model on the training and validation sets: AUC: 0.948, ACC: 0.869, SP: 0.822, Sen: 0.938, MCC: 0.746. The half-life of a drug is a hybrid concept that involves clearance and volume of distribution, and it is arguably more appropriate to have reliable estimates of these two properties instead. This model is based on 1219 (500/719) drug-like molecules in total, with 973 (399/574) in the training set, 124 (51/73) in the test set, and 122 (50/72) in the validation set. Leave-cluster-out validation was used for model validation. How to interpret: Molecules with T1/2 > 3 hours were classified as T1/2- (Category 0), while molecules with T1/2 ≤ 3 hours were classified as T1/2+ (Category 1). The output value is the probability of being T1/2+, within the range of 0 to 1. Empirical decision: if the prediction is >= 0.5, the endpoint is considered “Active”; otherwise it is considered inactive (“-”). | doi: 10.1093/nar/gkab255 | |
| Distribution | BBB permeant | PKCSM | 0.6622 | Brain/blood partition coefficient (no unit) | BBB permeability: blood-brain barrier (BBB) permeability. The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Gaussian Processes and Model Tree Regression, were used for the quantitative predictions (regression tasks). The brain is protected from exogenous compounds by the blood-brain barrier (BBB). The ability of a drug to cross into the brain is an important parameter to consider, to help reduce side effects and toxicities or to improve the efficacy of drugs whose pharmacological activity is within the brain. Blood-brain permeability is measured in vivo in animal models as logBB, the logarithmic ratio of brain to plasma drug concentrations: logBB = log(C brain / C blood). This predictive model was built using 320 compounds whose logBB has been experimentally measured. The best performing predictor in each task was chosen based on a 10-fold cross-validation approach. The Weka toolkit was used for training and testing the models. | doi: 10.1021/acs.jmedchem.5b00104 | |
| Absorption | F(30%) | ADMETLAB2 | Active | Active/"-" = Inactive/Not predicted | F30%: human oral bioavailability of 30%. For any drug administered by the oral route, oral bioavailability is undoubtedly one of the most important pharmacokinetic parameters because it indicates the efficiency of drug delivery to the systemic circulation. Result interpretation: Molecules with a bioavailability ≥ 30% were classified as F30%- (Category 0), while molecules with a bioavailability < 30% were classified as F30%+ (Category 1). The output value is the probability of being F30%+, within the range of 0 to 1. Empirical decision: 0-0.3: excellent (green); 0.3-0.7: medium (yellow); 0.7-1.0(++): poor (red). If the prediction is greater than or equal to 0.5, the molecule is considered “Active”; otherwise it is noted “-”. | doi: 10.1093/nar/gkab255 | |
| Metabolism | CYP-inh Pro | ADMETSAR | - | Active/"-" = Inactive/Not predicted | Five subgroups of CYP inhibitors were collected by Cheng et al.,[1] including 1a2, 2d6, 2c9, 2c19, and 3a4. A compound was assigned as a CYP inhibitor if the AC50 (the compound concentration that leads to 50% of the activity of an inhibition control) value was ≤ 10 μM, and it was considered a noninhibitor if the AC50 was > 57 μM. In addition, a compound was regarded as a CYP inhibitor if it had a PubChem activity score between 40 and 100, and as a noninhibitor if it had a PubChem activity score equal to 0. Three subgroups of CYP substrates were collected by Carbon-Mangels et al., including 2d6, 2c9, and 3a4.[2] The models were built with MACCS fingerprints and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Metabolism | CYP3A4-inh | vNN-ADMET | - | Active/"-" = Inactive/Not predicted | CYP3A4 Cytochrome P450 inhibition (drug-drug interaction): The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is the vNN method, which applies a predetermined similarity criterion and uses all nearest neighbors that meet this structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., 2008). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose, an effect known as a drug-drug interaction. CYP3A4: CYP inhibitors were retrieved from ChEMBL (Bento et al., 2014) and classified as inhibitors if the IC50 was below 10 μM. The vNN model was built on a dataset of 10,373 molecules with a Tanimoto-distance threshold of 0.50 (accuracy 0.88, sensitivity 0.76, specificity 0.92, kappa 0.68, coverage 0.78). Results were adapted: the “Yes” and “No” indicators were changed to “Active” and “-”, respectively. | doi: 10.3389/fphar.2017.00889 | |
| Metabolism | CYP2C9-inh | PKCSM | - | Active/"-" = Inactive/Not predicted | CYP2C9 inhibitor: Cytochrome P450 2C9 isoform. The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, were used for the qualitative predictions (classification tasks). Cytochrome P450 is an important detoxification enzyme in the body, mainly found in the liver. It oxidises xenobiotics to facilitate their excretion. Many drugs are deactivated by the cytochrome P450s, and some can be activated by them. Inhibitors of this enzyme, such as grapefruit juice, can affect drug metabolism and are contraindicated. It is therefore important to assess a compound’s ability to inhibit cytochrome P450. The model for CYP2C9 inhibition was built using over 14709 compounds whose ability to inhibit cytochrome P450 2C9 has been determined. A compound is considered to be a cytochrome P450 inhibitor if the concentration required to reach 50% inhibition is less than 10 μM. The best performing predictor in each task was chosen based on a 5-fold cross-validation approach. The Weka toolkit was used for training and testing the models. How to interpret the results: The predictor assesses whether a given molecule is likely to be a cytochrome P450 inhibitor for a given isoform. The qualitative output was recoded: "Yes" was replaced by "Active" and "No" was replaced by "-". | doi: 10.1021/acs.jmedchem.5b00104 | |
| Distribution | FU | ADMETLAB2 | 21.2616 | % | FU: Fraction unbound in plasma. The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom; after passing through the RGCN layers, the node represents general features of the circular substructure centered on that atom. RGCN extends the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers assign different attention weights to different substructures and then generate customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predicted values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, with the following performance measures for the regression model: R-square (R2) of 0.861, mean absolute error (MAE) of 0.268, and root mean squared error (RMSE) of 0.197. Most drugs in plasma exist in equilibrium between an unbound state and a state bound to serum proteins. The efficacy of a given drug may be affected by the degree to which it binds proteins within blood, as the more that is bound, the less efficiently it can traverse cellular membranes or diffuse. This model is based on 2575 drug-like molecules in total, with 2059 in the training set, 258 in the test set, and 258 in the validation set. | doi: 10.1093/nar/gkab255 | |
| Absorption | HOB | ADMETSAR | - | Active/"-" = Inactive/Not predicted | In total, 995 molecules were collected from Kim et al. (2014), including 509 positive and 486 negative compounds. Compounds with logK(%F) > 0 were considered positive. The model is based on the Morgan fingerprint descriptor and a random forest algorithm. The binary model’s performance was AUC: 0.752, Accuracy: 0.697, Sensitivity: 0.739, Specificity: 0.654. The result of the prediction is binary: Active or - (inactive). | DOI: 10.1093/bioinformatics/bty707 | |
| Transporter | Pgp II Inhibitor | PKCSM | - | Active/"-" = Inactive/Not predicted | Pgp inhibitor II: The P-glycoprotein I is an ATP-binding cassette (ABC) transporter. The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, were used for the qualitative predictions (classification tasks). The best performing predictor in each task was chosen based on a 5-fold cross-validation approach. The Weka toolkit was used for training and testing the models. P-glycoprotein I and II inhibitors: Modulation of P-glycoprotein mediated transport has significant pharmacokinetic implications for Pgp substrates, which may either be exploited for specific therapeutic advantages or result in contraindications. These predictive models were built using 1273 and 1275 compounds that have been characterized for their ability to inhibit P-glycoprotein I and P-glycoprotein II transport, respectively. How to interpret the results: The predictor determines whether a given compound is likely to be a P-glycoprotein I/II inhibitor. | doi: 10.1021/acs.jmedchem.5b00104 | |
| Transporter | Pgp substrate | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | Pgp substrate: substrate of P-glycoprotein. The prediction is based on Multi-task Graph Attention (MGA) framework MGA is composed of input, Relation graph convolution network (RGCN) layers, attention layer and fully-connected (FC) layers. In the Input, a node represents the information of an atom, and after passing RGCN layers, the node represents general features of circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in the tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For those endpoints predicted by the regression models concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, and different performance of classification models in training and validation sets: AUC: 1.000, ACC 1.000, SP: 1.000, Sen: 1.000, MCC: 1.000. As described in the Pgp-inhibitor section, modulation of P-glycoprotein mediated transport has significant pharmacokinetic implications for Pgp substrates, which may either be exploited for specific therapeutic advantages or result in contraindications. This model is based on 1185 (586/599) molecules, 949 (471/478) Total in the training set, 118 (58/60) test set, and 118 (57/61) validation set drug like molecules. Leave-cluster-out validation of classification models was used for model validation. 
How to interpret: Category 0: Non-substrate / Non-inhibitor; Category 1: substrate / inhibitor. The output value is the probability of being substrate / inhibitor, within the range of 0 to 1. Empirical decision: If the prediction is >= 0.5, the endpoint is considered “Active”; otherwise it is considered inactive (“-”). | doi: 10.1093/nar/gkab255 | |
| Distribution | BBB permeant | SWISSADME | - | Active/"-" = Inactive/Not predicted | BBB: blood-brain barrier. The predictions for passive human gastrointestinal absorption (HIA) and blood-brain barrier (BBB) permeation both consist of the readout of the BOILED-Egg model (Daina, A. & Zoete, V. A BOILED-Egg To Predict Gastrointestinal Absorption and Brain Penetration of Small Molecules. ChemMedChem 11, 1117–1121 (2016)), an intuitive graphical classification model, which can be displayed in the SwissADME result page by clicking the red button appearing below the sketcher when all input molecules have been processed (refer to Graphical Output). These models are based on the computation of lipophilicity (WLOGP) and polarity (tPSA). Combining both best ellipses yields the BOILED-Egg predictive model for HIA and BBB, respectively. The white region is the physicochemical space of molecules with the highest probability of being absorbed by the gastrointestinal tract, and the yellow region (yolk) is the physicochemical space of molecules with the highest probability of permeating to the brain. Yolk and white areas are not mutually exclusive. Other binary classification models are included, which focus on the propensity of a given small molecule to be a substrate or inhibitor of proteins governing important pharmacokinetic behaviours. Blood-brain barrier: according to the yolk of the BOILED-Egg (doi: 10.1002/cmdc.201600182). The "High", "Low" and "null" predictions were replaced by "Active", "-", and "NP" (not predicted), respectively. | ||
| Metabolism | CYP1A2-inh | SWISSADME | - | Active/"-" = Inactive/Not predicted | CYP1A2 inhibitor: Cytochrome P450 inhibition (drug-drug interaction). The model applies the support vector machine (SVM) method (Cortes, C. & Vapnik, V., 1995) to meticulously cleansed large datasets of known inhibitors/non-inhibitors. In similar contexts, SVM was found to perform better than other machine-learning algorithms for binary classification (Mishra et al. 2010). The models return “Yes” or “No” if the molecule under investigation has a higher probability of being an inhibitor or a non-inhibitor of a given CYP, respectively. Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., 2008). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose, an effect known as a drug-drug interaction. Cytochrome P450 1A2 inhibitor: SVM model built on 9145 molecules (training set) and tested on 3000 molecules (test set). 10-fold CV: ACC = 0.83 / AUC = 0.90; external: ACC = 0.84 / AUC = 0.91. Results were adapted: the “Yes” and “No” indicators were changed to “Active” and “-”, respectively. | ||
| Metabolism | CYP3A4-inh | PKCSM | - | Active/"-" = Inactive/Not predicted | CYP3A4 inhibitor: Cytochrome P450 3A4 isoform. The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, were used for the qualitative predictions (classification tasks). Cytochrome P450 is an important detoxification enzyme in the body, mainly found in the liver. It oxidises xenobiotics to facilitate their excretion. Many drugs are deactivated by the cytochrome P450s, and some can be activated by them. Inhibitors of this enzyme, such as grapefruit juice, can affect drug metabolism and are contraindicated. It is therefore important to assess a compound’s ability to inhibit cytochrome P450. The model for CYP3A4 inhibition was built using over 18561 compounds whose ability to inhibit cytochrome P450 3A4 has been determined. A compound is considered to be a cytochrome P450 inhibitor if the concentration required to reach 50% inhibition is less than 10 μM. The best performing predictor in each task was chosen based on a 5-fold cross-validation approach. The Weka toolkit was used for training and testing the models. How to interpret the results: The predictor assesses whether a given molecule is likely to be a cytochrome P450 inhibitor for a given isoform. The qualitative output was recoded: "Yes" was replaced by "Active" and "No" was replaced by "-". | doi: 10.1021/acs.jmedchem.5b00104 | |
| Excretion | CL | PKCSM | 1.5959 | ml/min/kg bw | CLtot: Total Clearance. The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Gaussian Processes and Model Tree Regression, were used for the quantitative predictions (regression tasks). Drug clearance is measured by the proportionality constant CLtot, and occurs primarily as a combination of hepatic clearance (metabolism in the liver and biliary clearance) and renal clearance (excretion via the kidneys). It is related to bioavailability, and is important for determining dosing rates to achieve steady-state concentrations. This predictor was built using the total clearance data for 398 compounds. The best performing predictor in each task was chosen based on a train/test approach. The Weka toolkit was used for training and testing the models. How to interpret the results: The predicted total clearance log(CLtot) of a given compound is given in log(ml/min/kg). | doi: 10.1021/acs.jmedchem.5b00104 | |
| Distribution | PPB | ADMETLAB2 | 73.718 | % | PPB: Plasma protein binding. The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom; after passing through the RGCN layers, the node represents general features of the circular substructure centered on that atom. RGCN extends the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers assign different attention weights to different substructures and then generate customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predicted values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, with the following performance measures for the regression model: R-square (R2) of 0.961, mean absolute error (MAE) of 0.054, and root mean squared error (RMSE) of 0.037. One of the major mechanisms of drug uptake and distribution is through PPB; thus the binding of a drug to proteins in plasma has a strong influence on its pharmacodynamic behavior. PPB can directly influence oral bioavailability because the free concentration of the drug is at stake when a drug binds to serum proteins. This model is based on 4712 drug-like molecules with PPB values in total, with 3771 in the training set, 479 in the test set, and 480 in the validation set. | doi: 10.3389/fphar.2017.00889 | |
| Distribution | Kab | VEGA | 0.4029 | Adipose/blood partition coefficient (no unit) | Adipose tissue:blood partition coefficient (Kab). Key endpoint for predicting bioaccumulation and pharmacokinetics in humans and animals, since other organ:blood affinities can be estimated as a function of this parameter. The dataset comprises 101 in vivo Kab values measured in rats, retrieved from one review and several papers in the literature ([5]-[10]). It contains mono-constituent organic chemicals belonging to different categories and uses: drugs, plant protection products, polychlorinated biphenyls, volatile organic compounds. All chemical names were converted to SMILES using the CIR and REST nodes of KNIME v 3.5.1; CAS numbers were retrieved from ChemIDplus and PubChem. Consistency among the CAS numbers, the chemical names and the chemical structures of all substances was checked. All structures were standardized and normalized. All duplicates were removed. The experimental data (in vivo tissue:plasma partition coefficients) were converted to adipose tissue:blood partition coefficients by dividing the partition coefficients determined in plasma by the blood-to-plasma ratio. Due to the lack of experimental values of the blood-to-plasma ratio, this value was taken as 0.55 for acid drugs and 1 for the remaining chemicals. All Kab values were converted to their base-10 logarithms. The dataset was split into training (63) and test (38) sets according to three criteria: 1) the covered range of the experimental activity; 2) the representativeness of the chemical structures (using PCA on PaDEL descriptors); 3) a balanced distribution between training and test set chemicals with respect to their ionisation state. The model is based on the Random Forest (RF) approach, a machine learning algorithm for regression. The number of trees selected for the RF was finely tuned by identifying the onset of the plateau of the curve describing the Q2LOO as a function of the number of trees. Six descriptors were used, computed with PaDEL-Descriptor v. 2.21 (which computes all the 2D descriptors): ALogp2, ATSC1s, BCUTp-1l, minHBd, XLogP, WTPT-5. Robustness - Statistics obtained by leave-one-out cross-validation: Q2 LOO = 0.73. Robustness - Statistics obtained by other methods: MAE LOO = 0.41 (mean absolute error calculated for leave-one-out). | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Transporter | Pgp Inhibitor | ADMETSAR | - | Active/"-" = Inactive/Not predicted | The P-glycoprotein inhibitors (pgpi+) and noninhibitors (pgpi-) were collected from four research articles. In total, 1172 pgpi+ and 771 pgpi- were obtained after data preparation, which included removing salts, repeated compounds and inorganic compounds. The model was built with the AtomPairs fingerprint and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Transporter | BCRP-inh | ADMETSAR | - | Active/"-" = Inactive/Not predicted | ABCG2, more commonly referred to as BCRP (Breast Cancer Resistance Protein), is an efflux transporter that serves two major drug transport functions. Firstly, it restricts the distribution of its substrates into organs such as the brain, testes, placenta, and across the gastrointestinal tract (GIT). Secondly, it eliminates its substrates from excretory organs, mediating both biliary and renal excretion, and occasionally direct gut secretion. Although less well studied than e.g. MDR1, BCRP is generally co-expressed with MDR1, and shares many of its substrates, inhibitors and inducers. Of its known substrates, rosuvastatin has been implicated in DDI, especially with perpetrator drugs that also inhibit OATPs (e.g. cyclosporine). It is probable that a synergy exists between the action of BCRP, MDR1, and the drug-metabolizing enzyme CYP3A4, particularly in the GIT. BCRP is included in the list of important drug transporters that both the FDA and EMA consider necessary to investigate regarding liabilities for NCEs. Drugs whose ADME, and bioavailability in particular, is influenced by BCRP may require clinical investigation to reveal a potential DDI with potent clinical BCRP inhibitors. For instance, since the GIT absorption of rosuvastatin is modulated by BCRP, it may be necessary to study the impact of BCRP inhibitors on the oral absorption of rosuvastatin. Because of the potential synergy between BCRP, CYP3A4, and MDR1, a clinical investigation examining the contribution of both drug transporters and enzymes to drug ADME may be necessary. In total, 432 inhibitors and 538 noninhibitors were collected, and the model was built with the AtomPairs fingerprint and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Metabolism | CYP2D6-inh | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | CYP2D6 inhibitor: The prediction is based on Multi-task Graph Attention (MGA) framework MGA is composed of input, Relation graph convolution network (RGCN) layers, attention layer and fully-connected (FC) layers. In the Input, a node represents the information of an atom, and after passing RGCN layers, the node represents general features of circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in the tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For those endpoints predicted by the regression models concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, and different performance of classification models in training and validation sets: AUC: 0.973, ACC: 0.880, SP: 0.866 Sen: 0.958, MCC: 0.715. Based on the chemical nature of biotransformation, the process of drug metabolism reactions can be divided into two broad categories: phase I (oxidative reactions) and phase II (conjugative reactions). The human cytochrome P450 family (phase I enzymes) contains 57 isozymes and these isozymes metabolize approximately two-thirds of known drugs in human with 80% of this attribute to five isozymes––1A2, 3A4, 2C9, 2C19 and 2D6. Most of these CYPs responsible for phase I reactions are concentrated in the liver. 
This model is based on 13073 (2535 positive / 10538 negative) drug-like molecules in total, with 10471 (2032/8439) in the training set, 1304 (255/1051) in the test set, and 1298 (250/1048) in the validation set. Leave-cluster-out validation was used for model validation. How to interpret: Category 0: Non-substrate / Non-inhibitor; Category 1: substrate / inhibitor. The output value is the probability of being substrate / inhibitor, within the range of 0 to 1. Empirical decision: If the prediction is >= 0.5, the endpoint is considered “Active”; otherwise it is considered inactive (“-”). | doi: 10.1093/nar/gkab255 | |
| Metabolism | CYP1A2-inh | ADMETSAR | - | Active/"-" = Inactive/Not predicted | Five subgroups of CYP inhibitors were collected by Cheng et al.,[1] including 1a2, 2d6, 2c9, 2c19, and 3a4. A compound was assigned as a CYP inhibitor if the AC50 (the compound concentration that leads to 50% of the activity of an inhibition control) value was ≤ 10 μM, and it was considered a noninhibitor if the AC50 was > 57 μM. In addition, a compound was regarded as a CYP inhibitor if it had a PubChem activity score between 40 and 100, and as a noninhibitor if it had a PubChem activity score equal to 0. Three subgroups of CYP substrates were collected by Carbon-Mangels et al., including 2d6, 2c9, and 3a4.[2] The models were built with MACCS fingerprints and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Metabolism | CYP2C19-inh | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | CYP2C19 inhibitor: The prediction is based on Multi-task Graph Attention (MGA) framework MGA is composed of input, Relation graph convolution network (RGCN) layers, attention layer and fully-connected (FC) layers. In the Input, a node represents the information of an atom, and after passing RGCN layers, the node represents general features of circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in the tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For those endpoints predicted by the regression models concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, and different performance of classification models in training and validation sets: AUC: 0.952, ACC: 0.877, SP: 0.845, Sen: 0.916, MCC: 0.758. Based on the chemical nature of biotransformation, the process of drug metabolism reactions can be divided into two broad categories: phase I (oxidative reactions) and phase II (conjugative reactions). The human cytochrome P450 family (phase I enzymes) contains 57 isozymes and these isozymes metabolize approximately two-thirds of known drugs in human with 80% of this attribute to five isozymes––1A2, 3A4, 2C9, 2C19 and 2D6. Most of these CYPs responsible for phase I reactions are concentrated in the liver. 
This model is based on 12611 (5770/6841) drug-like molecules in total, with 10096 (4618/5478) in the training set, 1257 (577/680) in the test set, and 1258 (575/683) in the validation set. Leave-cluster-out validation was used for model validation. How to interpret: Category 0: Non-substrate / Non-inhibitor; Category 1: substrate / inhibitor. The output value is the probability of being substrate / inhibitor, within the range of 0 to 1. Empirical decision: If the prediction is >= 0.5, the endpoint is considered “Active”; otherwise it is considered inactive (“-”). | doi: 10.1093/nar/gkab255 | |
| Transporter | Pgp substrate | SWISSADME | - | Active/"-" = Inactive/Not predicted | Pgp substrates: P-glycoprotein substrates. Implementation within SwissADME enriched the graphical output with the prediction of P-gp substrates, P-gp being the most important active efflux mechanism involved in those biological barriers (refer to the SVM model described in Pharmacokinetics). As a result, the user conveniently obtains on the same graph a global evaluation of passive absorption (inside/outside the white), passive brain access (inside/outside the yolk) and active efflux from the CNS or to the gastrointestinal lumen by colour-coding: blue dots for P-gp substrates (PGP+) and red dots for P-gp non-substrates (PGP−). The SVM model was built on the training set (TR: 1033) and then applied to the test set (TS: 415). 10-fold cross-validation accuracy: 0.72; 10-fold cross-validation area under the receiver operating characteristic (ROC) curve: 0.77; external validation accuracy: 0.89; external validation area under the ROC curve: 0.94. | ||
| Distribution | VDss | PKCSM | 0.4592 | L/Kg | VDss (Human): Distribution Human Volume of Distribution at steady state. The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Gaussian Processes and Model Tree Regression did the quantitative predictions (regression tasks). The steady state volume of distribution (VDss) is the theoretical volume that the total dose of a drug would need to be uniformly distributed to give the same concentration as in blood plasma. The higher the VD is, the more of a drug is distributed in tissue rather than plasma. It can be affected by renal failure and dehydration. This predictive model was built using the calculated steady state volume of distribution (VDss) in humans from 670 drugs. The best performing predictor in each task was chosen based on Leave-one-out cv and Train/test approach. The Weka toolkit was used for training and testing the models. The predicted logarithm of VDss of a given compound is given as the log L/kg. How to interpret the results: VDss is considered low if below 0.71 L/kg (log VDss < -0.15) and high if above 2.81 L/kg (log VDss > 0.45). | doi: 10.1021/acs.jmedchem.5b00104 | |
| Transporter | OATP1B1-inh | ADMETSAR | Active | Active/"-" = Inactive/Not predicted | OATP1B1 is an uptake transporter exclusively expressed on the sinusoidal side of hepatocytes. It is responsible for the hepatic uptake of drugs and endogenous compounds from the blood. OATP1B1 substrates often, but by no means always, contain a carboxylic acid moiety. Some important therapeutic drugs, most notably HMG-CoA inhibitors also known as statins, are substrates and/or inhibitors of OATP1B1. Inhibition of OATP1B1 can result in supra-proportional systemic exposure of the victim drug. This is particularly important for members of the statin class of medicines, where elevated blood concentrations due to inhibition of hepatic uptake can result in myopathy and rhabdomyolysis. Complex drug interactions involving OATP1B1, other uptake and efflux transporters, and drug-metabolizing enzymes (DMEs) have been described, as have clinically important genetic polymorphisms, resulting in label recommendations, dose adjustments, and product withdrawals. The FDA and EMA recommend in vitro testing of OATP1B1 interactions for drug candidates that are eliminated in part via the liver and/or will be co-administered with OATP1B1 substrates. OATP1B1i was trained on 1657 inhibitors and 198 noninhibitors with Morgan fingerprints and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Metabolism | CYP2D6-sub | ADMETSAR | - | Active/"-" = Inactive/Not predicted | Five subgroups of CYP inhibitors were collected by Cheng et al.,[1] including 1a2, 2d6, 2c9, 2c19, and 3a4. A compound was assigned as a CYP inhibitor if the AC50 (the compound concentration that leads to 50% of the activity of an inhibition control) value was 10 μM, and it was considered a noninhibitor if AC50 was >57 μM. In addition, a compound was regarded as a CYP inhibitor if it had a PubChem activity score between 40 and 100, and as a noninhibitor if it had a PubChem activity score equal to 0. Three subgroups of CYP substrates were collected by Carbon-Mangles et al., including 2d6, 2c9, and 3a4.[2] The models were built with MACCS fingerprints and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Transporter | Pgp substrate | PKCSM | - | Active/"-" = Inactive/Not predicted | Pgp Substrate: The P-glycoprotein is an ATP-binding cassette (ABC) transporter. The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, did the qualitative predictions (classification tasks). The best performing predictor in each task was chosen based on a 5-fold cv approach. The Weka toolkit was used for training and testing the models. P-glycoprotein functions as a biological barrier by extruding toxins and xenobiotics out of cells. P-glycoprotein transport screening is performed using transgenic mdr knockout mice and in vitro cell systems. This model was built using 332 compounds that have been characterized for their ability to be transported by Pgp. How to interpret the results: The model predicts whether a given compound is likely to be a substrate of Pgp or not. | doi: 10.1021/acs.jmedchem.5b00104 | |
| Metabolism | CYP2D6-inh | vNN-ADMET | - | Active/"-" = Inactive/Not predicted | CYP2D6 Cytochrome P450 inhibition (drug-drug interaction): The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is the vNN method, which uses all nearest neighbors that meet a predetermined structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., 2008). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose, an effect known as a drug-drug interaction. CYP2D6: CYP inhibition data were taken from ChEMBL (Bento et al., 2014), and compounds were classified as inhibitors if the IC50 was below 10 μM. The vNN model was built on a dataset of 7,805 molecules with a Tanimoto-distance threshold of 0.50: accuracy 0.89, sensitivity 0.61, specificity 0.94, kappa 0.57, coverage 0.75. Results were adapted: the “Yes” and “No” indicators were changed to “Active” and “-”, respectively. | doi: 10.3389/fphar.2017.00889. | |
| Transporter | OATP2B1-inh | ADMETSAR | - | Active/"-" = Inactive/Not predicted | OATP2B1 is a ubiquitously expressed uptake transporter with broad substrate specificity. It mostly transports organic anionic endo- and xenobiotics, and its activity appears to be pH-dependent. OATP2B1 is primarily associated with the oral absorption of drugs, notably fexofenadine, whose PK is altered when intestinal OATPs and/or MDR1 are inhibited. Its expression in the liver and other tissues, as well as the results of in vitro studies, suggest a broader role in drug ADME, DDI, and toxicology; these aspects, however, are not well understood or characterized. The FDA and EMA guidances recommend evaluation of OATP drug interaction liabilities, but do not specifically recommend investigation of OATP2B1. OATP2B1i was trained by 44 inhibitors and 175 noninhibitors with AtomPairs fingerprint and random forest. | DOI: 10.1093/bioinformatics/bty707 | |
| Metabolism | CYP2C19-inh | vNN-ADMET | - | Active/"-" = Inactive/Not predicted | CYP2C19 Cytochrome P450 inhibition (drug-drug interaction): The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is the vNN method, which uses all nearest neighbors that meet a predetermined structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., 2008). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose, an effect known as a drug-drug interaction. CYP2C19: CYP inhibition data were taken from ChEMBL (Bento et al., 2014), and compounds were classified as inhibitors if the IC50 was below 10 μM. The vNN model was built on a dataset of 8,155 molecules with a Tanimoto-distance threshold of 0.50: accuracy 0.87, sensitivity 0.64, specificity 0.93, kappa 0.58, coverage 0.76. Results were adapted: the “Yes” and “No” indicators were changed to “Active” and “-”, respectively. | doi: 10.3389/fphar.2017.00889. | |
| Metabolism | CYP3A4-sub | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | CYP3A4 substrate: The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom, and after passing through the RGCN layers, the node represents general features of the circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) that introduces edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform. Reported performance of the classification model in training and validation sets: AUC: 0.948, ACC: 0.887, SP: 0.920, Sen: 0.855, MCC: 0.776. Based on the chemical nature of biotransformation, drug metabolism reactions can be divided into two broad categories: phase I (oxidative reactions) and phase II (conjugative reactions). The human cytochrome P450 family (phase I enzymes) contains 57 isozymes, and these isozymes metabolize approximately two-thirds of known drugs in humans, with 80% of this attributed to five isozymes: 1A2, 3A4, 2C9, 2C19 and 2D6. Most of the CYPs responsible for phase I reactions are concentrated in the liver. This model is based on 979 (497/482) total molecules, with 786 (397/389) in the training set, 97 (49/48) in the test set, and 96 (51/45) in the validation set, all drug-like molecules. Leave-cluster-out validation was used for model validation. How to interpret: Category 0: non-substrate / non-inhibitor; Category 1: substrate / inhibitor. The output value is the probability of being a substrate / inhibitor, within the range of 0 to 1. Empirical decision: if the prediction is >= 0.5, the endpoint is considered “active”; otherwise it is considered inactive (“-”). | doi: 10.1093/nar/gkab255 | |
| Metabolism | CYP2D6-inh | PKCSM | - | Active/"-" = Inactive/Not predicted | CYP2D6 inhibitor: Cytochrome P450 2D6 isoform: The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, did the qualitative predictions (classification tasks). Cytochrome P450 is an important detoxification enzyme in the body, mainly found in the liver. It oxidises xenobiotics to facilitate their excretion. Many drugs are deactivated by the cytochrome P450s, and some can be activated by them. Inhibitors of this enzyme, such as grapefruit juice, can affect drug metabolism and are contraindicated. It is therefore important to assess a compound’s ability to inhibit the cytochrome P450. The model for CYP2D6 inhibition was built using over 14741 compounds whose ability to inhibit the cytochrome P450 2D6 has been determined. A compound is considered to be a cytochrome P450 inhibitor if the concentration required to lead to 50% inhibition is less than 10 uM. The best performing predictor in each task was chosen based on a 5-fold cv approach. The Weka toolkit was used for training and testing the models. How to interpret the results: The predictors will assess a given molecule to determine whether it is likely to be a cytochrome P450 inhibitor for a given isoform. Qualitative outputs were changed: "Yes" was replaced with "Active" and "No" was replaced with "-". | doi: 10.1021/acs.jmedchem.5b00104 | |
| Absorption | Caco-2 Permeability | PKCSM | 1.9588 | 10-6 cm/s | Caco-2 Permeability. The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Gaussian Processes and Model Tree Regression did the quantitative predictions (regression tasks). The best performing predictor in each task was chosen based on 5-fold cv. The Weka toolkit was used for training and testing the models. The Caco-2 cell line is composed of human epithelial colorectal adenocarcinoma cells. The Caco 2 monolayer of cells is widely used as an in vitro model of the human intestinal mucosa to predict the absorption of orally administered drugs. This model is based on 674 drug like molecules with Caco-2 permeability values and predicts the logarithm of the apparent permeability coefficient (log Papp; log cm/s). How to interpret: A compound is considered to have a high Caco-2 permeability if it has a Papp > 8 x 10-6 cm/s. For the pkCSM predictive model, high Caco-2 permeability would translate in predicted values > 0.90. More information: DOI 10.1021/acs.jmedchem.5b00104 | doi: 10.1021/acs.jmedchem.5b00104 | |
| Transporter | OCT2-inh | ADMETSAR | - | Active/"-" = Inactive/Not predicted | OCT2 is a primarily renal uptake transporter that is expressed on the basolateral (blood) side of proximal tubule cells. It plays a key role in the disposition and renal clearance of mostly cationic drugs and endogenous compounds. It functions in conjunction with MATE1 and MATE2-K which facilitate the elimination of OCT2 substrates into the urine. Important clinical substrates include metformin and cisplatin. Gene polymorphisms of OCT2 are associated with altered metformin and cisplatin pharmacokinetics and toxicity, but the role of other cation transporters, and their functional SNPs are also important. Since the discovery of MATEs, DDIs ascribed to OCT2 are being re-evaluated, and it is likely that some interactions may be re-assigned to MATEs. Regardless of this, the role of OCT2 as the first step in active renal secretion of cationic drugs remains important. Current FDA and EMA guidelines recommend evaluation of OCT2 liabilities for drugs with high renal elimination, or which are likely to be co-administered with OCT2 substrates such as metformin. Simultaneous evaluation of MATE interactions is also advisable. In total 244 inhibitors and 633 noninhibitors were collected and the model was built by MACCS fingerprint and random forest. | DOI: 10.1093/bioinformatics/bty707 | |
| Metabolism | CYP2C19-inh | PKCSM | - | Active/"-" = Inactive/Not predicted | CYP2C19 inhibitor: Cytochrome P450 2C19 isoform: The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, did the qualitative predictions (classification tasks). Cytochrome P450 is an important detoxification enzyme in the body, mainly found in the liver. It oxidises xenobiotics to facilitate their excretion. Many drugs are deactivated by the cytochrome P450s, and some can be activated by them. Inhibitors of this enzyme, such as grapefruit juice, can affect drug metabolism and are contraindicated. It is therefore important to assess a compound’s ability to inhibit the cytochrome P450. The model for CYP2C19 inhibition was built using over 14576 compounds whose ability to inhibit the cytochrome P450 2C19 has been determined. A compound is considered to be a cytochrome P450 inhibitor if the concentration required to lead to 50% inhibition is less than 10 uM. The best performing predictor in each task was chosen based on a 5-fold cv approach. The Weka toolkit was used for training and testing the models. How to interpret the results: The predictors will assess a given molecule to determine whether it is likely to be a cytochrome P450 inhibitor for a given isoform. | doi: 10.1021/acs.jmedchem.5b00104 | |
| Absorption | HIA | PKCSM | 67.285 | % of Absorption | Human Intestinal Absorption (HIA): The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Gaussian Processes and Model Tree Regression, did the quantitative predictions (regression tasks). The intestine is normally the primary site for absorption of a drug from an orally administered solution. This method is built to predict the proportion of a compound that is absorbed through the human small intestine. The best performing predictor in each task was chosen based on a train/test approach. The Weka toolkit was used for training and testing the models. The Human Intestinal Absorption (HIA) model is based on information for 552 drugs. How to interpret the results: For a given compound it predicts the percentage that will be absorbed through the human intestine. | doi: 10.1021/acs.jmedchem.5b00104 | |
| Absorption | Kp | VEGA | Potts and Guy method | 0.1613 | 10-6 cm/s | The model is based on a dataset of 271 compounds. Following the criteria reported in the OECD guideline 428, only data obtained in compliance with the following features have been kept: - All the data are retrieved from “in vitro” experiments - Data are collected from human skin experiments - Studies concerned skin application of chemicals dissolved in water, aqueous solution, water gel, PBS and distilled water - The buffer solution at a pH of 7.4 - The permeation coefficients were measured under comparable circumstances. The model is an application of the Potts and Guy equation to the entire dataset. For this reason, a splitting into training and test set is not provided. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | 
| Metabolism | CYP2C9-sub | ADMETSAR | - | Active/"-" = Inactive/Not predicted | Five subgroups of CYP inhibitors were collected by Cheng et al.,[1] including 1a2, 2d6, 2c9, 2c19, and 3a4. A compound was assigned as a CYP inhibitor if the AC50 (the compound concentration that leads to 50% of the activity of an inhibition control) value was 10 μM, and it was considered a noninhibitor if AC50 was >57 μM. In addition, a compound was regarded as a CYP inhibitor if it had a PubChem activity score between 40 and 100, and as a noninhibitor if it had a PubChem activity score equal to 0. Three subgroups of CYP substrates were collected by Carbon-Mangles et al., including 2d6, 2c9, and 3a4.[2] The models were built with MACCS fingerprints and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Absorption | HIA | ADMETSAR | Active | Active/"-" = Inactive/Not predicted | The intestine is normally the primary site for absorption of a drug from an orally administered solution. This method is built to predict the proportion of a compound that is absorbed through the human small intestine. The entire dataset was collected from Shen et al. (2010) and included 578 compounds (500 HIA+ and 78 HIA- compounds). How to interpret the results: For a given compound it predicts the percentage that will be absorbed through the human intestine. If a compound's HIA% is less than 30%, it is labeled as -; otherwise it is labeled as Active. | DOI: 10.1093/bioinformatics/bty707 | |
| Transporter | Pgp Inhibitor | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | Pgp inhibitor: inhibitor of P-glycoprotein. The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom, and after passing through the RGCN layers, the node represents general features of the circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) that introduces edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform. Reported performance of the classification model in training and validation sets: AUC: 1.000, ACC: 0.994, SP: 0.993, Sen: 0.994, MCC: 0.987. The P-glycoprotein, also known as MDR1 or ABCB1, is a membrane protein member of the ATP-binding cassette (ABC) transporters superfamily. It is probably the most promiscuous efflux transporter, since it recognizes a number of structurally different and apparently unrelated xenobiotics; notably, many of them are also CYP3A4 substrates. This model is based on 2209 (1315/894) total molecules, with 1764 (1051/713) in the training set, 222 (132/90) in the test set, and 223 (132/91) in the validation set, all drug-like molecules. Leave-cluster-out validation was used for model validation. How to interpret: Category 0: non-substrate / non-inhibitor; Category 1: substrate / inhibitor. | doi: 10.1093/nar/gkab255 | |
| Distribution | BBB permeant | vNN-ADMET | - | Active/"-" = Inactive/Not predicted | The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is the vNN method, which uses all nearest neighbors that meet a predetermined structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. All information is available at doi: 10.3389/fphar.2017.00889. The model is based on 353 compounds whose BBB permeability values (logBB) were obtained from the literature (Muehlbacher et al., 2011; Naef, 2015). | doi: 10.3389/fphar.2017.00889. | |
| Metabolism | CYP1A2-inh | ADMETLAB2 | Active | Active/"-" = Inactive/Not predicted | CYP1A2 inhibitor: The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom, and after passing through the RGCN layers, the node represents general features of the circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) that introduces edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform. Reported performance of the classification model in training and validation sets: AUC: 0.972, ACC: 0.914, SP: 0.898, Sen: 0.932, MCC: 0.828. Based on the chemical nature of biotransformation, drug metabolism reactions can be divided into two broad categories: phase I (oxidative reactions) and phase II (conjugative reactions). The human cytochrome P450 family (phase I enzymes) contains 57 isozymes, and these isozymes metabolize approximately two-thirds of known drugs in humans, with 80% of this attributed to five isozymes: 1A2, 3A4, 2C9, 2C19 and 2D6. Most of the CYPs responsible for phase I reactions are concentrated in the liver. This model is based on 12635 (5876 positive / 6759 negative) total molecules, with 10111 (4702/5425) in the training set, 1261 (588/673) in the test set, and 1263 (586/677) in the validation set, all drug-like molecules. Leave-cluster-out validation was used for model validation. How to interpret: Category 0: non-substrate / non-inhibitor; Category 1: substrate / inhibitor. The output value is the probability of being a substrate / inhibitor, within the range of 0 to 1. Empirical decision: if the prediction is >= 0.5, the endpoint is considered “active”; otherwise it is considered inactive (“-”). | doi: 10.1093/nar/gkab255 | |
| Metabolism | CYP3A4-sub | PKCSM | - | Active/"-" = Inactive/Not predicted | CYP3A4 substrate: Cytochrome P450 substrate 3A4 isoform: The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, did the qualitative predictions (classification tasks). The cytochrome P450s are responsible for the metabolism of many drugs. However, inhibitors of the P450s can dramatically alter the pharmacokinetics of these drugs. It is therefore important to assess whether a given compound is likely to be a cytochrome P450 substrate. The two main isoforms responsible for drug metabolism are 2D6 and 3A4. These models were built using 671 compounds whose metabolism by each cytochrome P450 isoform has been measured. The best performing predictor in each task was chosen based on a 5-fold cv approach. The Weka toolkit was used for training and testing the models. How to interpret the results: The predictor will assess whether a given molecule is likely to be metabolized by either P450. Qualitative outputs were changed: "Yes" was replaced with "Active" and "No" was replaced with "-". | doi: 10.1021/acs.jmedchem.5b00104 | |
| Absorption | Kp | VEGA | Ten Berge method | 0.1379 | 10-6 cm/s | The model is based on a dataset of 271 compounds. Following the criteria reported in the OECD guideline 428, only data obtained in compliance with the following features have been kept: - All the data are retrieved from “in vitro” experiments - Data are collected from human skin experiments - Studies concerned skin application of chemicals dissolved in water, aqueous solution, water gel, PBS and distilled water - The buffer solution at a pH of 7.4 - The permeation coefficients were measured under comparable circumstances. The model is an application of the ten Berge equation to the entire dataset. For this reason, a splitting into training and test set is not provided. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | 
| Metabolism | CYP2C19-sub | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | CYP2C19 substrate: The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom, and after passing through the RGCN layers, the node represents general features of the circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) that introduces edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform. Reported performance of the classification model in training and validation sets: AUC: 0.974, ACC: 0.928, SP: 0.894, Sen: 0.977, MCC: 0.859. Based on the chemical nature of biotransformation, drug metabolism reactions can be divided into two broad categories: phase I (oxidative reactions) and phase II (conjugative reactions). The human cytochrome P450 family (phase I enzymes) contains 57 isozymes, and these isozymes metabolize approximately two-thirds of known drugs in humans, with 80% of this attributed to five isozymes: 1A2, 3A4, 2C9, 2C19 and 2D6. Most of the CYPs responsible for phase I reactions are concentrated in the liver. This model is based on 258 (107/151) total molecules, with 206 (85/121) in the training set, 26 (11/15) in the test set, and 26 (11/15) in the validation set, all drug-like molecules. Leave-cluster-out validation was used for model validation. How to interpret: Category 0: non-substrate / non-inhibitor; Category 1: substrate / inhibitor. The output value is the probability of being a substrate / inhibitor, within the range of 0 to 1. | doi: 10.1093/nar/gkab255 | |
| Metabolism | CYP2D6-sub | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | CYP2D6 substrate: The prediction is based on the Multi-task Graph Attention (MGA) framework. MGA is composed of an input layer, relational graph convolution network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input, a node represents the information of an atom, and after passing through the RGCN layers, the node represents general features of the circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) that introduces edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For endpoints predicted by regression models, concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform. Reported performance of the classification model in training and validation sets: AUC: 0.947, ACC: 0.893, SP: 0.849, Sen: 0.937, MCC: 0.788. Based on the chemical nature of biotransformation, drug metabolism reactions can be divided into two broad categories: phase I (oxidative reactions) and phase II (conjugative reactions). The human cytochrome P450 family (phase I enzymes) contains 57 isozymes, and these isozymes metabolize approximately two-thirds of known drugs in humans, with 80% of this attributed to five isozymes: 1A2, 3A4, 2C9, 2C19 and 2D6. Most of the CYPs responsible for phase I reactions are concentrated in the liver. This model is based on 877 (435/442) total molecules, with 703 (347/356) in the training set, 85 (44/41) in the test set, and 89 (44/45) in the validation set, all drug-like molecules. Leave-cluster-out validation was used for model validation. How to interpret: Category 0: non-substrate / non-inhibitor; Category 1: substrate / inhibitor. The output value is the probability of being a substrate / inhibitor, within the range of 0 to 1. Empirical decision: if the prediction is >= 0.5, the endpoint is considered “active”; otherwise it is considered inactive (“-”). | doi: 10.1093/nar/gkab255 | |
| Metabolism | HLM | vNN-ADMET | Active | Active/"-" = Inactive/Not predicted | HLM: Human liver microsomal stability. The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is to use a predetermined similarity criterion, vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. The human liver is the most important organ for drug metabolism. For a drug to achieve effective therapeutic concentrations in the body, it cannot be metabolized too rapidly by the liver. Otherwise, it would need to be administered at high doses, which are associated with high toxicity. To identify and exclude rapidly metabolized compounds (Di et al., 2003), pharmaceutical companies commonly use the human liver microsomal (HLM) stability assay. This has led to the accumulation of a substantial body of HLM stability data in publicly accessible databases. However, our knowledge of how enzymes in the HLM assay metabolize drugs remains fragmentary. Therefore, we examined whether the vNN method could effectively predict drugs that are rapidly metabolized by the liver. We retrieved HLM data from the ChEMBL database (Bento et al., 2014), manually curated the data, and classified compounds as stable or unstable based on the reported half-life [T1/2 > 30 min was considered stable, and T1/2 < 30 min unstable (Liu et al., 2015)]. The final dataset contained 3,219 compounds. Of these, we classified 2,047 as stable and 1,166 as unstable. The HLM model performed with an overall accuracy of 81%; sensitivity and specificity values of 71 and 87%, respectively; and a high kappa value of 0.60. The HLM model reliably predicted 91% of the compounds in the HLM dataset when using 10-fold CV. 
Results were adapted and the “Yes” and “No” indicators were changed respectively to “Active” and “-“. | doi: 10.3389/fphar.2017.00889. | |
| Absorption | Kp | SWISSADME | 0.8128 | 10-6 cm/s | Skin permeation (Log Kp): The model is a multiple linear regression (QSPR model) that aims to predict the skin permeability coefficient (Kp). It is adapted from Potts and Guy (Pharm. Res., 1992), who found Kp to be linearly correlated with molecular size and lipophilicity (R2 = 0.67). | ||
| Absorption | MDCK Permeability | ADMETLAB2 | 15.7848 | 10-6 cm/s | MDCK Permeability. The prediction is based on Multi-task Graph Attention (MGA) framework MGA is composed of input, Relation graph convolution network (RGCN) layers, attention layer and fully-connected (FC) layers. In the Input, a node represents the information of an atom, and after passing RGCN layers, the node represents general features of circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in the tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For those endpoints predicted by the regression models concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, and different performance measures for the training test regression model: R-square (R2) of 0.934, mean absolute error (MAE) of 0.140, and root mean squared error (RMSE) of 0.105. Madin−Darby Canine Kidney cells (MDCK) have been developed as an in vitro model for permeability screening. Its apparent permeability coefficient, Papp, is widely considered to be the in vitro gold standard for assessing the uptake efficiency of chemicals into the body. Papp values of MDCK cell lines are also used to estimate the effect of the blood-brain barrier (BBB). 
This model is based on 1140 drug-like molecules with MDCK permeability values (912 in the training set, 114 in the test set, and 114 in the validation set) and predicts the logarithm of the apparent permeability coefficient (log Papp; log cm/s). How to interpret: The unit of predicted MDCK permeability is cm/s. A compound is considered to have high passive MDCK permeability for Papp > 20 x 10-6 cm/s, medium permeability for 2-20 x 10-6 cm/s, and low permeability for < 2 x 10-6 cm/s. Empirical decision: > 2 x 10-6 cm/s: excellent (green); otherwise: poor (red). | doi: 10.1093/nar/gkab255 | |
| Metabolism | CYP2C9-inh | SWISSADME | - | Active/"-" = Inactive/Not predicted | CYP2C9 Cytochrome P450 inhibition (drug-drug interaction): The models apply the support vector machine (SVM) method (Cortes and Vapnik, 1995) to meticulously cleansed large datasets of known inhibitors/non-inhibitors. In similar contexts, SVM was found to perform better than other machine-learning algorithms for binary classification (Mishra et al. 2010). The models return “Yes” or “No” if the molecule under investigation has a higher probability of being, respectively, an inhibitor or a non-inhibitor of a given CYP. Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., 2008). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose, an effect known as a drug-drug interaction. CYP2C9: Cytochrome P450 2C9 inhibitor: SVM model built on 5940 molecules (training set) and tested on 2075 molecules (test set). 10-fold CV: ACC = 0.78 / AUC = 0.85; external: ACC = 0.71 / AUC = 0.81. Results were adapted and the “Yes” and “No” indicators were changed respectively to “Active” and “-”. | ||
| Distribution | FU | PKCSM | 47.9 | % | FU (Human): Human Fraction Unbound. The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Gaussian Processes and Model Tree Regression, were used for the quantitative predictions (regression tasks). Most drugs in plasma exist in equilibrium between an unbound state and a state bound to serum proteins. The efficacy of a given drug may be affected by the degree to which it binds proteins within blood, as the more that is bound, the less efficiently it can traverse cellular membranes or diffuse. This predictive model was built using the measured free proportion of 552 compounds in human blood (Fu). The best performing predictor in each task was chosen based on leave-one-out CV and a train/test approach. The Weka toolkit was used for training and testing the models. How to interpret the results: For a given compound, the predicted fraction that would be unbound in plasma is calculated and expressed in %. | doi: 10.1021/acs.jmedchem.5b00104 | |
| Transporter | Pgp I Inhibitor | PKCSM | - | Active/"-" = Inactive/Not predicted | Pgp inhibitor I: P-glycoprotein I is an ATP-binding cassette (ABC) transporter. The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, were used for the qualitative predictions (classification tasks). The best performing predictor in each task was chosen based on a 5-fold CV approach. The Weka toolkit was used for training and testing the models. P-glycoprotein I and II inhibitors: Modulation of P-glycoprotein-mediated transport has significant pharmacokinetic implications for Pgp substrates, which may either be exploited for specific therapeutic advantages or result in contraindications. These predictive models were built using 1273 and 1275 compounds that have been characterized for their ability to inhibit P-glycoprotein I and P-glycoprotein II transport, respectively. | doi: 10.1021/acs.jmedchem.5b00104 | |
| Metabolism | CYP1A2-inh | vNN-ADMET | - | Active/"-" = Inactive/Not predicted | CYP1A2 Cytochrome P450 inhibition (drug-drug interaction): The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is to use a predetermined similarity criterion, the vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. Cytochrome P450 enzymes (CYPs) constitute a superfamily of proteins that play an important role in the metabolism and detoxification of xenobiotics (Brown et al., 2008). A drug should not be rapidly metabolized by CYPs if it is to maintain an effective concentration. In addition, it should not inhibit drug-metabolizing CYPs, because such an effect could elevate the concentration of a co-administered drug and potentially lead to drug overdose, an effect known as a drug-drug interaction. Cytochrome P450 inhibition (drug-drug interaction): CYP inhibitors were taken from ChEMBL (Bento et al., 2014) and classified as inhibitors if the IC50 was below 10 μM. The vNN model was built on a dataset of 7,558 molecules with a Tanimoto-distance threshold of 0.50. Accuracy: 0.90, sensitivity: 0.70, specificity: 0.95, kappa: 0.66, coverage: 0.75. | doi: 10.3389/fphar.2017.00889. | 
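
The interpretation rules quoted in the table above (a 0.5 probability cut-off for classification endpoints, and the MDCK permeability bands of below 2, 2-20, and above 20 x 10-6 cm/s) can be sketched in Python as follows. This is an illustrative sketch only, not code from ADMETlab 2.0 or from this database: the function names are hypothetical, and assigning the band edges (exactly 2 and exactly 20) to the "medium" class is an assumption, because the table does not say which side the edges fall on.

```python
def label_from_probability(prob: float, threshold: float = 0.5) -> str:
    """Map a classifier probability (0 to 1) to the table's Active / "-" labels."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # Rule from the table: prediction >= 0.5 is "Active", otherwise "-".
    return "Active" if prob >= threshold else "-"


def mdck_permeability_class(papp_e6: float) -> str:
    """Classify MDCK permeability given Papp in units of 10^-6 cm/s."""
    if papp_e6 < 0.0:
        raise ValueError("Papp cannot be negative")
    if papp_e6 > 20.0:
        return "high"
    # Assumption: the edges 2 and 20 are counted as "medium".
    if papp_e6 >= 2.0:
        return "medium"
    return "low"
```

Under these assumptions, the MDCK value listed for this compound (15.7848 x 10-6 cm/s) falls in the "medium" band.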
Select an endpoint:
| Category | Sub category | Endpoint | Tool | QSAR ID | Value | Unit | Comments | Reference | 
|---|---|---|---|---|---|---|---|---|
| Cell toxicology | Mito-toxicity | MMP | ADMETLAB2 | Active | Agonist/Antagonist/"-" = Inactive/Not predicted | Mitochondrial membrane potential (MMP), one of the parameters for mitochondrial function, is generated by the mitochondrial electron transport chain, which creates an electrochemical gradient through a series of redox reactions. This gradient drives the synthesis of ATP, a crucial molecule for various cellular processes. Measuring MMP in living cells is commonly used to assess the effect of chemicals on mitochondrial function; decreases in MMP can be detected using lipophilic cationic fluorescent dyes. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: If the prediction is greater than or equal to 0.5, the molecule is considered “Active”. If not, the molecule is noted “-”. | doi: 10.1093/nar/gkab255 | |
| Organ toxicology | Hepatotoxicity | Liver NOAEL | VEGA | 87.4984 | mg/kg bw /d | No-information available | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Carcinogenicity | Carcino | VEGA | IRFMN-ISSCAN-CGX | - | Active/"-" = Inactive/Not predicted | The rules (structural alerts) were extracted with SARpy from the ISSCAN database and the CGX dataset. The method is based on a set of 43 rules (structural alerts) related to carcinogenic activity. Qualitative information was changed: a "mutagenic" prediction was replaced with "Active" and a "Possible NON-Carcinogen" prediction was replaced with "-", whatever the quality of the prediction. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Endocrine Disruption | GR | VEGA_NRMEA | - | Agonist/Antagonist/ a-anta agonist and antagonist /"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Organ toxicology | Cardiotoxicity | hERG Blocker | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | The human ether-a-go-go related gene (hERG). During cardiac depolarization and repolarization, a voltage-gated potassium channel encoded by hERG plays a major role in the regulation of the exchange of cardiac action potential and resting potential. hERG blockade may cause long QT syndrome (LQTS), arrhythmia, and Torsade de Pointes (TdP), which lead to palpitations, fainting, or even sudden death. Result interpretation: Molecules with IC50 more than 10 μM or less than 50% inhibition at 10 μM were classified as hERG- (Category 0), while molecules with IC50 less than 10 μM or more than 50% inhibition at 10 μM were classified as hERG+ (Category 1). The output value is the probability of being hERG+, within the range of 0 to 1. | doi: 10.1093/nar/gkab255 | |
| Human toxicology | MRTD | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | The prediction is based on Multi-task Graph Attention (MGA) framework MGA is composed of input, Relation graph convolution network (RGCN) layers, attention layer and fully-connected (FC) layers. In the Input, a node represents the information of an atom, and after passing RGCN layers, the node represents general features of circular substructure centered on the atom. RGCN is an extension of the standard graph convolution network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. Attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The prediction results are mainly displayed in the tabular format in the browser, with the 2D molecular structure and a radar plot summarizing the physicochemical quality of the compound. For those endpoints predicted by the regression models concrete predictive values are provided. To obtain robust and accurate prediction models, the model training process was repeated ten times with random data splitting. The best performing models were incorporated into the online platform, and different performance of classification models in training and validation sets: AUC: 0.869, ACC: 0.787, SP: 0.766, Sen: 0.810, MCC: 0.575. The half-life of a drug is a hybrid concept that involves clearance and volume of distribution, and it is arguably more appropriate to have reliable estimates of these two properties instead. This model is based on 1197 (561/636) Total molecules, with 957 (448/509) in the training set, 120 (56/64) test set, and 120 (57/63) validation set drug like molecules. Leave-cluster-out validation of classification models was used for model validation. Result interpretation: MRTD Active if MRTD ≤ 0.011 mmol/kg -bw/day | doi: 10.1093/nar/gkab255 | ||
| Carcinogenicity | Fem rat carcino | VEGA | -1.5109 | [log(1/(mg/kg-day))] | no information | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| Cell toxicology | Response to Stress | P53 | ADMETLAB2 | Active | Agonist/Antagonist/"-" = Inactive/Not predicted | P53, a tumor suppressor protein, is activated following cellular insult, including DNA damage and other cellular stresses. The activation of p53 regulates cell fate by inducing DNA repair, cell cycle arrest, apoptosis, or cellular senescence. The activation of p53, therefore, is a good indicator of DNA damage and other cellular stresses. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: If the prediction is greater than or equal to 0.5, the molecule is considered “Active”. If not, the molecule is noted “-”. | doi: 10.1093/nar/gkab255 | |
| Endocrine Disruption | ER | ADMETSAR | - | Agonist/Antagonist/"-" = Inactive/Not predicted | Binary models for 6 targets implicated in endocrine disruption (ED), namely AR (androgen receptor), ER (estrogen receptor), TR (thyroid receptor), GR (glucocorticoid receptor), PPARγ (peroxisome proliferator-activated receptor γ) and aromatase. All the datasets were collected from Tox21, and a random under-sampling technique was used to achieve a balanced dataset for model training. A multi-label model was developed by combining the best single-label model of each target; the resulting model can be used to distinguish whether certain endocrine disrupting chemicals can simultaneously modulate multiple receptors related to ED. Finally, all the binary models and the multi-label model were evaluated with the corresponding single-label test sets and a multi-label test set, respectively, with reasonable reliability. | DOI: 10.1093/bioinformatics/bty707 | ||
| Organ toxicology | Hepatotoxicity | PXR up liver stea | VEGA | - | Active/"-" = Inactive/Not predicted | Data refer to ToxCast assays ATG_PXR_TRANS_up (AEID: 135) and ATG_PXRE_CIS_up (AEID: 103). Attagene (ATG) assays are cell-based, multiplexed-readout assays that use HepG2, a human liver cell line, with measurements taken 24 hours after chemical dosing in a 24-well plate. A consensus of four single models based on Random Forest (RF) and Balanced Random Forest (BRF) was applied to the training dataset of 853 chemicals. The output statistics for goodness of fit were Balanced Accuracy: 0.99, Sensitivity: 0.99, Specificity: 1, MCC: 0.99. TP 509, TN 422, FP 0, FN 3. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Organ toxicology | Ocular toxicity | EC | ADMETLAB2 | Active | Active/"-" = Inactive/Not predicted | Assessing the eye irritation/corrosion (EI/EC) potential of a chemical is a necessary component of risk assessment. Cornea and conjunctiva tissues comprise the anterior surface of the eye, and hence cornea and conjunctiva tissues are directly exposed to the air and easily suffer injury by chemicals. There are several substances, such as chemicals used in manufacturing, agriculture and warfare, ocular pharmaceuticals, cosmetic products, and household products, that can cause EI or EC. Result interpretation: Category 1: corrosives / irritants chemicals; Category 0: non-corrosives / non-irritants chemicals. The output value is the probability of being toxic, within the range of 0 to 1. | doi: 10.1093/nar/gkab255 | |
| Endocrine Disruption | RARa | VEGA_NRMEA | - | Agonist/Antagonist/"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Developmental/Reproductive Toxicology | Reprotoxicity | Reprotox | ADMETSAR | Active | Agonist/Antagonist/"-" = Inactive/Not predicted | A reproductive toxicity data set of 1823 compounds (861 positive compounds and 962 negative compounds) was collected from the ECHA‐C&L Inventory and OECD‐eChemPortal.[1] | DOI: 10.1093/bioinformatics/bty707 | |
| General Toxicology | LD50/ROA | ADMETSAR | 1265.6795 | mg/kg of bw | Rat oral acute toxicity. In total, 10207 molecules with LD50 (mol/kg) against rat were collected from Zhu's work.[1] The model was built with a graph convolutional neural network implemented in DeepChem. https://pubs.acs.org/doi/pdfplus/10.1021/tx900189p. The results are expressed in log(1/(mol/kg)) according to the website and to the publication by Zhu et al. Therefore, a conversion to mg/kg was performed. | DOI: 10.1093/bioinformatics/bty707 | ||
| Organ toxicology | Hepatotoxicity | DILI | ADMETSAR | Active | Active/"-" = Inactive/Not predicted | In total, 3115 toxic molecules and 593 nontoxic molecules were collected from publications and databases such as DrugBank by Mulliner et al. All the molecules were prepared in Pipeline Pilot, removing inorganic compounds, large molecules (> 800 Da), and inorganic salts in mixtures. The "+" indicating a hepatotoxic effect was changed to "Active" in the table. | DOI: 10.1093/bioinformatics/bty707 | |
| General Toxicology | NOAEL | VEGA | 43.0527 | mg/kg of bw/day | NOAEL: All doubtful or inorganic compounds, salts, and mixtures were eliminated, because the relationships between molecular structure and the NOAEL are very complex. We considered only data referring to 90 days of oral administration in rats and rejected reproductive toxicity studies. It is to be noted that replacing the 90-day study with shorter testing is an attractive alternative. Taking this into account, values for 28 days of treatment were considered but, in order to have consistent data, they were divided by a factor of 3, as specified by the Scientific Committee on Consumer Safety (SCCS), in order to approximate the 90-day NOAEL. After the above selection, about four hundred substances remained, ranging from very small molecules (e.g., 2–3 atoms) to extremely large ones (e.g., 100 or more atoms) and including molecules with specific groups, such as [N+], [NH4+], [nH], etc., and molecules containing many different cycles / heterocycles. Under such circumstances, the following limitations were used in the selection of compounds for the work set: (i) molecules that were too large or too small were removed (in practice, molecules whose SMILES are longer than 10 and shorter than 70 symbols were selected); (ii) molecules which have only one cycle or no cycles at all were selected; and (iii) molecules with special groups (indicated by square brackets) were removed from the work set. Thus, a dataset of 140 compounds was selected. All values were converted to decimal logarithms (-log NOAEL). The algorithm is the Monte Carlo method. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| Organ toxicology | Hepatotoxicity | PPARa up liver stea | VEGA | - | Active/"-" = Inactive/Not predicted | Data refer to ToxCast assay ATG_PPARa_TRANS_up (AEID: 132). Attagene (ATG) assays are cell-based, multiplexed-readout assays that use HepG2, a human liver cell line, with measurements taken 24 hours after chemical dosing in a 24-well plate. A consensus of four single models based on Random Forest (RF) and Balanced Random Forest (BRF) was applied to the training dataset of 1057 chemicals. The output statistics for goodness of fit were Balanced Accuracy: 0.76, Sensitivity: 0.60, Specificity: 0.91, MCC: 0.34. TP 30, TN 917, FP 90, FN 20. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Organ toxicology | Skin toxicity | SkinSen | PKCSM | - | Active/"-" = Inactive/Not predicted | SkinSen: Skin sensitization: The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, were used for the qualitative predictions (classification tasks). Skin sensitization is a potential adverse effect for dermally applied products. Evaluating whether a compound that may come into contact with the skin can induce allergic contact dermatitis is an important safety concern. This predictor was built using 254 compounds which have been evaluated for their ability to induce skin sensitization. The best performing predictor in each task was chosen based on a 5-fold CV approach. The Weka toolkit was used for training and testing the models. How to interpret the results: It predicts whether a given compound is likely to be associated with skin sensitization. Qualitative information was changed: "Yes" was replaced with "Active" and "No" was replaced with "-". | doi: 10.1021/acs.jmedchem.5b00104 | |
| Endocrine Disruption | TRa | VEGA_NRMEA | - | Agonist/Antagonist/"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Organ toxicology | Cardiotoxicity | hERG I Blocker | PKCSM | - | Active/"-" = Inactive/Not predicted | hERG I Inhibitor: human ether-a-go-go gene I inhibitor: The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, were used for the qualitative predictions (classification tasks). Inhibition of the potassium channels encoded by hERG (human ether-a-go-go gene) is the principal cause of acquired long QT syndrome, which can lead to fatal ventricular arrhythmia. Inhibition of hERG channels has resulted in the withdrawal of many substances from the pharmaceutical market. This predictor was built using hERG I inhibition information for 368 compounds. The best performing predictor in each task was chosen based on a 5-fold CV approach. The Weka toolkit was used for training and testing the models. How to interpret the results: The predictor determines whether a given compound is likely to be a hERG I inhibitor. Qualitative information was changed: "Yes" was replaced with "Active" and "No" was replaced with "-". | doi: 10.1021/acs.jmedchem.5b00104 | |
| General Toxicology | LD50/ROA | PKCSM | 1569.9048 | mg/kg of bw | Oral Rat Acute Toxicity (LD50). The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Gaussian Processes and Model Tree Regression, were used for the quantitative predictions (regression tasks). It is important to consider the toxic potency of a potential compound. The lethal dosage values (LD50) are a standard measurement of acute toxicity used to assess the relative toxicity of different molecules. The LD50 is the amount of a compound given all at once that causes the death of 50% of a group of test animals. The model was built on over 10207 compounds tested in rats and predicts the LD50 (in mol/kg). The best performing predictor in each task was chosen based on a 5-fold CV approach. The Weka toolkit was used for training and testing the models. How to interpret the results: the LD50 prediction is given in mol/kg. Values are also expressed in g/kg. (The prediction is based on the first version of the ADMETSAR algorithm.) Considering the results obtained from the prediction, we assume that the results are expressed in log(1/(mol/kg)), as for ADMETSAR2. We therefore corrected the values as mol/kg = 10^(-value). The results are finally expressed in mg/kg of bw. | doi: 10.1021/acs.jmedchem.5b00104 | ||
| Organ toxicology | Hepatotoxicity | H-HT | VEGA | Not predicted | Active/"-" = Inactive/Not predicted | Only human data from the literature were considered: - Fourches et al. (2010) [2], which contains 950 hepatotoxicity data (drugs) on humans, rodents and non-rodent species. We selected only data referring to humans (650 data) and eliminated the rest. - United States Food and Drug Administration (US FDA) Human Liver Adverse Effects Database [3]. This contains 631 unique pharmaceuticals, 491 of which (non-proprietary data) have adverse drug reaction data for one or more of the 47 liver effects Coding Symbols for Thesaurus of Adverse Reaction (COSTAR) term endpoints. Since only two compounds were labeled as M (marginally active), we eliminated them in order to reduce the uncertainty of the data set. The two datasets were merged: duplicates and compounds with contrasting experimental values were eliminated. Compounds with concordant experimental activity were considered once. The final data set was fairly balanced, with 510 compounds labeled as hepatotoxic and 440 as non-hepatotoxic. The final dataset was randomly split into a training set (760 mono-constituent organic compounds) and a test set, test set 1 (190 mono-constituent organic compounds). The external validation set (test set 2) was retrieved from the Liver Toxicity Knowledge Base (LTKB) Benchmark Dataset developed by the US FDA [7]. 101 chemicals were selected (after elimination of compounds already present in the dataset), 69 labeled as hepatotoxic and 32 labeled as non-hepatotoxic. The model implemented in VEGA merged test set 1 and test set 2 (external validation set) and hence consisted of 291 substances (171 labeled hepatotoxic and 120 labeled non-hepatotoxic). Decision tree based on structural alerts (SAs). NON-Toxic predictions (whatever the reliability) are changed to “-”, Toxic predictions (whatever the reliability) are changed to “Active”, and Unknown is changed to NP for no prediction. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Genotoxicity/Mutagenicity | Mutagenicity | AMES | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | The Ames test for mutagenicity. The mutagenic effect has a close relationship with the carcinogenicity, and it is the most widely used assay for testing the mutagenicity of compounds. Result interpretation: Category 0: AMES negative(-); Category 1: AMES positive(+). The output value is the probability of being toxic, within the range of 0 to 1. We adapted the results, if prediction >= 0.5 the compound is considered as Active | doi: 10.1093/nar/gkab255 | |
| Endocrine Disruption | ER-RBA | VEGA | - | Agonist/Antagonist/"-" = Inactive/Not predicted | The tested substance is added to a system where radio-labeled reference hormone binds to a prescribed quantity of hormone receptor. The chemical concentration that inhibits 50% of the binding of the reference hormone to the receptor is measured and defined as IC50. Then, Relative Binding Affinity (RBA) between IC50 values of the chemical and natural hormone (E2) is defined as the endpoint when the IC50 concentration of natural hormone is set at 100. The final dataset comprised 806 single 2D structures, with the majority of the compounds considered inactive. The dataset was split into training (656 chemicals) and test (150 chemicals). Classification and regression tree (CART) uses the methodology of tree building as a hierarchical classification method. the model is based on 8 physicochemical descriptors. The model Statistics for goodness-of-fit were: Accuracy 0.86, Sensitivity 0.87, Specificity 0.85, MCC 0.70. TP 203, TN 356, FP 64, FN 31. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| Genotoxicity/Mutagenicity | Mutagenicity | MicroN-In vivo | VEGA | - | Active/"-" = Inactive/Not predicted | The models have been developed using a set of 1228 compounds and their experimental results of in vivo micronucleus test, classified as genotoxic (378) and non-genotoxic (850). The model performs a consensus assessment based on the predictions of two single models: 1) SAR in python (SARpy) and 2) k-nearest neighbor (k-NN). The Statistics for goodness-of-fit: Accuracy 0.99, Sensitivity 0.99, Specificity 1.00, MCC 0.99. TP 363, TN 839, FP 2. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Organ toxicology | Nephrotoxicity | Nephro | ADMETSAR | Active | Active/"-" = Inactive/Not predicted | Drug-induced nephrotoxicity has been one of the main reasons for the failure of drug development. Early prediction of the nephrotoxicity for drug candidates is critical to the success of clinical trials. We manually collected 777 valid data, and divided the molecules into negative and positive according to clinical reports and in vivo assay. Our model can be applied to the prediction of nephrotoxicity of Chinese herbal medicines and chemical drugs. | DOI: 10.1093/bioinformatics/bty707 | |
| Genotoxicity/Mutagenicity | Mutagenicity | MicroN-In vivo | ADMETSAR | Active | Active/"-" = Inactive/Not predicted | Genotoxicity testing of new chemical entities is an integral part of the drug development process and is a regulatory requirement prior to the approval of new drugs. The in vivo micronucleus assay is commonly used to detect chemical genotoxicity. Chemicals were labeled as positive or negative according to the result of the assay. A total of 641 chemicals with in vivo micronucleus assay results were collected from the available literature and databases. The performance of the binary models in admetSAR was: AUC 0.937, Accuracy 0.87, Sensitivity 0.819, Specificity 0.906. | DOI: 10.1093/bioinformatics/bty707 | |
| Cell toxicology | Mito-toxicity | MMP | vNN-ADMET | - | Agonist/Antagonist/"-" = Inactive/Not predicted | MMP: The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is to use a predetermined similarity criterion, vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. MMP Disruption (Mitochondrial Toxicity) vNN-based MMP prediction model, using 6,261 compounds collected from a previous study that screened a library of 10,000 compounds (∼8,300 unique chemicals) at 15 concentrations, each in triplicate, to measure changes in the MMP in HepG2 cells (Attene-Ramos et al., 2015). The study found that 913 compounds decreased the MMP, whereas 5,395 compounds had no effect. It made predictions for compounds that were well represented in the applicability domain, but not for any other compound. The model showed a high overall accuracy of 89% and a kappa value of 0.61, with a coverage of 69%. Results were adapted and the “Yes” and “No” indicators were changed respectively to “Active” and “-“. | doi: 10.3389/fphar.2017.00889. | |
| General Toxicology | LD50/ROA | VEGA | 3052.62 | mg/kg of bw | LD50 values (mg/kg) were converted to logLD50 (mmol/kg) in order to have a distribution of data more suitable for modelling. The MW used for conversion was calculated as the sum of MWs of the main molecule and of its counterion, if present. There were data referring to the same chemical (i.e., InChi code) in the “Training Dataset” and “Complete LD50 inventory”. They have been defined as follows in this document: Duplicates: two or more records sharing the same InChI code in the “Training Dataset”. The final dataset was composed of 6280 substances, 5029 as training set (TS) and 1251 as validation set (VS). After the implementation in VEGA, the final dataset was composed of 6280 substances (training set). Defining the algorithm - OECD Principle 2 4.1.Type of model: The Acute Toxicity (LD50) model is a Regression model (kNN) based on 6280 substances retrieved from several sources. It is specific for the acute oral systemic toxicity tests in Rats. Explicit algorithm: Regression model (kNN). | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| Endocrine Disruption | AR | VEGA | - | Agonist/Antagonist/"-" = Inactive/Not predicted | Androgen Receptor-mediated effect (IRFMN-COMPARA)-assessment. 1689 curated chemical structures with AR experimental activity were provided by the EPA’s National Center for Computational Toxicology as a training set to develop the in silico models. Experimental data were derived from a collection of 11 in vitro HTS assays exploring multiple points in the AR pathway including three receptor binding, two cofactor recruitment, one RNA transcription, three agonist-mode protein production and two antagonist-mode protein production. A chemical was considered as a binder if it was either an active agonist or antagonist. The model provides a qualitative prediction for Androgen Receptor (AR) effects mediated through the AR pathway. The data were used to generate binary classification models to discriminate active (both agonists and antagonists) compounds from inactive ones. It is a two steps model developed with SARpy. In the first step SARpy was used to model the two classes, identifying a set of 127 rules (17 for active and 110 for inactive). Then, a second set of 22 rules identifying active compounds is applied to unpredicted compounds only. Statistics for goodness-of-fit were for training set: accuracy 0.94, sensitivity 0.77, specificity 0.96, MCC 0.70. TP 155, TN 1402, FP 65, FN 42. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| Genotoxicity/Mutagenicity | Mutagenicity | AMES | VEGA | KNN-Read-Across | Active | Active/"-" = Inactive/Not predicted | The read-across model has been built with the k-Nearest Neighbor (KNN) application, and it is based on the similarity index of the k most similar compounds. Model based on 5,770 compounds. Qualitative information was transformed: a "mutagenic" prediction is considered "Active" and a "non-mutagenic" prediction is considered "-", whatever the quality of the prediction. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | 
| Carcinogenicity | Carcino | VEGA | ISS | - | Active/"-" = Inactive/Not predicted | Decisional algorithm based on rules of toxicity. The model has been built as a set of 56 rules, taken from the work of Benigni and Bossa (ISS) as implemented in the software ToxTree and derived from a training set of 797 compounds. Qualitative information was changed: a "mutagenic" prediction was replaced with "Active" and a "non-mutagenic" prediction was replaced with "-", whatever the quality of the prediction. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Endocrine Disruption | PPARg | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | PPAR-gamma: The peroxisome proliferator-activated receptors (PPARs) are lipid-activated transcription factors of the nuclear receptor superfamily with three distinct subtypes, namely PPAR alpha, PPAR delta (also called PPAR beta) and PPAR gamma (PPARg). All these subtypes heterodimerize with Retinoid X receptor (RXR) and these heterodimers regulate transcription of various genes. PPAR-gamma receptor (glitazone receptor) is involved in the regulation of glucose and lipid metabolism. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: if the prediction is greater than or equal to 0.5, the molecule is considered “Active”; otherwise, it is noted “-”. | doi: 10.1093/nar/gkab255 | ||
| Organ toxicology | Cardiotoxicity | hERG Blocker | vNN-ADMET | - | Active/"-" = Inactive/Not predicted | hERG Blocker: human ether-à-go-go-related gene. The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is to use a predetermined similarity criterion, vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. hERG: 282 known hERG blockers from the literature and classified compounds with an IC50 cutoff value of 10 μM or less as blockers (Wang et al., 2012). We also collected a set of 404 compounds with IC50 values >10 μM from ChEMBL (Bento et al., 2014) and classified them as non blockers (Czodrowski, 2013). hERG blockers and non-blockers were classified as positives and negatives, respectively. The hERG model performed with an overall accuracy of 84%, well-balanced sensitivity and specificity values (84 and 83%, respectively), and a kappa value of 0.68. The model reliably predicted 80% of the compounds in our dataset when using 10-fold CV. Results were adapted and the “Yes” and “No” indicators were changed respectively to “Active” and “-“. | doi: 10.3389/fphar.2017.00889. | |
| Organ toxicology | Skin toxicity | SkinSen | VEGA | NCSTOX | - | Active/"-" = Inactive/Not predicted | no information | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | 
| Human toxicology | MRTD | vNN-ADMET | 25.5167 | mg/Kg of bw /day | MRTD: maximum recommended therapeutic dose (MRTD). The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is to use a predetermined similarity criterion, vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. Maximum recommended therapeutic dose: A basic principle of toxicology is that “the dose makes the poison.” For most drugs, the therapeutic dose is limited by toxicity, and the maximum recommended therapeutic dose (MRTD) is an estimated upper daily dose that is safe (Contrera et al., 2004). Investigators carry out toxicological experiments on animals to determine the toxic effects of a drug and the initial dose for human clinical trials. Unfortunately, there is a lack of correlation between animal and human toxicity data. Therefore, we investigated whether the vNN method could predict the MRTD values of new compounds based on known human MRTD data. If so, the values could be used to estimate the starting dose in phase I clinical trials, while significantly reducing the number of animals used in preliminary toxicology studies. Maximum Recommended Therapeutic Dose: MRTD values publically disclosed by the FDA, mostly of single-day oral doses for an average adult with a body weight of 60 kg, for 1,220 compounds (most of which are small organic drugs). For modeling purposes we converted the MRTD unit from mg/kg-body weight/day to mol/kg-body weight/day via the molecular weight of the compound. However, the predicted values on the website are reported in mg/day based upon an average adult weighing 60 kg. We used an external test set of 160 compounds, which was collected by the FDA for validation. 
The total dataset for our model contained 1,184 compounds (Liu et al., 2012). The MRTD model reliably predicted 69% of the FDA MRTD dataset, with a Pearson's correlation coefficient (R) of 0.79 between the predicted and measured log(MRTD) values, and a mean deviation (mDev) of 0.56 log units, using 40-fold CV (Liu et al., 2012). To facilitate the comparison between tool predictions, the data expressed in mg/day for an adult of 60 kg were converted to mg/Kg of BW/day. | doi: 10.3389/fphar.2017.00889. | ||
| Cell toxicology | Response to Stress | HSE | ADMETLAB2 | - | Agonist/Antagonist/"-" = Inactive/Not predicted | Heat shock factor response element. Various chemicals and environmental and physiological stress conditions may lead to the activation of the heat shock response/unfolded protein response (HSR/UPR). There are three heat shock transcription factors (HSFs) (HSF-1, -2, and -4) mediating transcriptional regulation of the human HSR. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: if the prediction is greater than or equal to 0.5, the molecule is considered “Active”; otherwise, it is noted “-”. | doi: 10.1093/nar/gkab255 | |
| Endocrine Disruption | AhR | ADMETLAB2 | Active | Agonist/Antagonist/"-" = Inactive/Not predicted | NR-AhR: The Aryl hydrocarbon Receptor (AhR), a member of the family of basic helix-loop-helix transcription factors, is crucial to adaptive responses to environmental changes. AhR mediates cellular responses to environmental pollutants such as aromatic hydrocarbons through induction of phase I and II enzymes but also interacts with other nuclear receptor signaling pathways. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: if the prediction is greater than or equal to 0.5, the molecule is considered “Active”; otherwise, it is noted “-”. | doi: 10.1093/nar/gkab255 | ||
| Organ toxicology | Hepatotoxicity | Liver LOAEL | VEGA | 407.0052 | mg/kg bw/d | No-information available | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Carcinogenicity | Carcino | ADMETSAR | Trinary | - | Active/"-" = Inactive/Not predicted | The data set used for model building was compiled from CPDB, which contains a large number of chemical structures (1547 substances) with tumor data in rodents. For these chemicals, the carcinogenic potency is expressed as TD50 values. The data set was prepared in the following steps: (1) Removing mixtures, inorganic, salts and organometallic compounds; (2) Removing compounds that have inconsistent results in different experimental groups; (3) Removing compounds with molecular weights less than 40 or more than 600; (4) Only one stereoisomer was retained because the 2D fingerprints of a pair of stereoisomers are identical. Finally, 476 carcinogens and 440 noncarcinogens were collected. The trinary model was built with MACCS fingerprints and a support vector machine. The "Danger" and "Active" predictions were changed to "Active" and the "non-required" prediction was changed to "-". | DOI: 10.1093/bioinformatics/bty707 | |
| Organ toxicology | Ocular toxicity | EC | ADMETSAR | - | Active/"-" = Inactive/Not predicted | A total of 5220 chemicals (3874+/1346-) for a serious eye irritation (EI) dataset and 2299 chemicals (887+/1412-) as an eye corrosion (EC) dataset were collected from available databases and literature. The EI model was built by AtomPairs with support vector machine and the EC model was built by MACCS and support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Endocrine Disruption | PR | VEGA_NRMEA | - | Agonist/Antagonist/ a-anta agonist and antagonist /"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Organ toxicology | Skin toxicity | SkinSen Rules | VEGA | No class found | Alerts | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| Endocrine Disruption | ARO | ADMETLAB2 | - | Agonist/Antagonist/"-" = Inactive/Not predicted | Aromatase: Endocrine disrupting chemicals (EDCs) interfere with the biosynthesis and normal functions of steroid hormones including estrogen and androgen in the body. Aromatase catalyzes the conversion of androgen to estrogen and plays a key role in maintaining the androgen and estrogen balance in many of the EDC-sensitive organs. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: if the prediction is greater than or equal to 0.5, the molecule is considered “Active”; otherwise, it is noted “-”. | doi: 10.1093/nar/gkab255 | ||
| Carcinogenicity | Genotox-Carci-muta | ADMETLAB2 | 2.0 | Number of structural alerts | Molecules containing these substructures may cause carcinogenicity or mutagenicity through genotoxic mechanisms. There are 117 substructures in this endpoint. | doi: 10.1093/nar/gkab255 | ||
| Endocrine Disruption | ER | VEGA | Active | Agonist/Antagonist/"-" = Inactive/Not predicted | The model has been built as a set of rules, extracted with SARpy software from a dataset obtained from a collection of high-quality estrogen receptor (ER) signaling data (1529 chemicals screened across 18 high-throughput screening assays integrated into a single score) from the ToxCast program. The model is based on 59 rules. Statistics for goodness-of-fit after the implementation in VEGA were: n = 1529, not predicted = 241, Accuracy 0.97, Sensitivity 0.85, Specificity 0.97, MCC 0.70. TP 60, TN 1179, FP 38, FN 11. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| Organ toxicology | Skin toxicity | SkinSen | ADMETSAR | - | Active/"-" = Inactive/Not predicted | A large data set of 1007 compounds and their experimental LLNA data were collected from two databases including the OECD's eChemPortal database and the National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM) database. The compounds were classified as negative and positive based on their EC3 values according to the following convention: Negative (without EC3) and Positive (with EC3).[1] | DOI: 10.1093/bioinformatics/bty707 | |
| Endocrine Disruption | TR | ADMETSAR | - | Active/"-" = Inactive/Not predicted | Binary models for 6 targets implicated in endocrine disruption (ED), namely AR (androgen receptor), ER (estrogen receptor), TR (thyroid receptor), GR (glucocorticoid receptor), PPARγ (peroxisome proliferator-activated receptor γ) and Aromatase. All the datasets were collected from Tox21, and a random under-sampling technique was used to achieve a balanced dataset for model training. A multi-label model was developed by combining the best single-label model of each target, and the resulting model can be used to distinguish whether certain endocrine disrupting chemicals can simultaneously modulate multiple receptors related to ED. Finally, all the binary models and the multi-label model were evaluated on the corresponding single-label test sets and a multi-label test set, respectively, with reasonable reliability. | DOI: 10.1093/bioinformatics/bty707 | ||
| Developmental/Reproductive Toxicology | Developmental toxicity | Dev tox | VEGA | Active | Agonist/Antagonist/"-" = Inactive/Not predicted | The data set was split into training (234 substances) and test sets (58 substances) using rational design, by CAESAR Partner Helmholtz-Zentrum für Umweltforschung, using ChemProp. QSAR classification model for Developmental Toxicity based on a Random Forest method implemented using WEKA open-source libraries. EPA descriptors have been used for modeling. They refer to descriptors calculated using Toxicity Estimation Software Tool (T.E.S.T.). The selected number of descriptors is 13. Statistics for goodness-of-fit for the training set were: n = 234; Accuracy 100%; FP rate 0%; FN rate 0%; PPV 100%; NPV 100%; Sensitivity 100%; Specificity 100%; Nb unpredicted 0. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Endocrine Disruption | ARO | VEGA | TOX21 | NP | Agonist/Antagonist/"-" = Inactive/Not predicted | no information | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Organ toxicology | Hepatotoxicity | DILI | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | Drug-induced liver injury (DILI) has become the most common safety problem of drug withdrawal from the market over the past 50 years. Result interpretation: Category 0: DILI negative(-); Category 1: DILI positive(+). The output value is the probability of being toxic, within the range of 0 to 1. Empirical decision: 0-0.3: excellent (green); 0.3-0.7: medium (yellow); 0.7-1.0(++): poor (red). We adapted the results: if the prediction is >= 0.5, the compound is considered Active. | doi: 10.3389/fphar.2017.00889 | |
| Genotoxicity/Mutagenicity | Mutagenicity | AMES | vNN-ADMET | - | Active/"-" = Inactive/Not predicted | Mutagens : Chemical mutagenicity (AMES test). The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is to use a predetermined similarity criterion, vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. Mutagens are chemicals that cause abnormal genetic mutations leading to cancer. A common way to assess a chemical's mutagenicity is the Ames test (Ames et al., 1973). This test has become the standard for assessing the safety of chemicals and drugs, and has been used to test thousands of molecules. We examined whether the vNN method could effectively use existing data to predict mutagenicity. Ames mutagenicity dataset consisting of 6,512 compounds, of which 3,503 were Ames-positive (Hansen et al., 2009), and developed a vNN Ames mutagenicity prediction model. The model performed well, with an overall accuracy of 82%; sensitivity and specificity values of 86 and 75%, respectively; and a high kappa value of 0.62. The model also reliably predicted 79% of the compounds in the Ames dataset when using 10-fold CV. | doi: 10.3389/fphar.2017.00889. | |
| Endocrine Disruption | ER-LBD | ADMETLAB2 | - | Agonist/Antagonist/"-" = Inactive/Not predicted | ER-LBD: Estrogen receptor (ER), a nuclear hormone receptor, plays an important role in development, metabolic homeostasis and reproduction. Two subtypes of ER, ER-alpha and ER-beta, have similar expression patterns with some uniqueness in both types. Endocrine disrupting chemicals (EDCs) and their interactions with steroid hormone receptors like ER cause disruption of normal endocrine function. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: if the prediction is greater than or equal to 0.5, the molecule is considered “Active”; otherwise, it is noted “-”. | doi: 10.1093/nar/gkab255 | ||
| Organ toxicology | Skin toxicity | SkinSen | VEGA | CAESAR | Active | Active/"-" = Inactive/Not predicted | Skin sensitisation on mouse (local lymph node assay model) OECD 429. The final dataset is composed of 209 mono-constituent organic compounds. The dataset was randomly split into a training set and a test set containing 80% (167) and 20% (42) of the compounds, respectively. The model consists of an Adaptive Fuzzy Partition (AFP) based on 8 descriptors. The AFP produces as output two values that represent the degree of membership in the sensitizer and non-sensitizer classes, respectively. Statistics on the training set: Accuracy 91%, Sensitivity 95%, Specificity 74%. TP 127, TN 27, FP 7, FN 6. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | 
| Cell toxicology | Mito-toxicity | MMP | ADMETSAR | - | Agonist/Antagonist/"-" = Inactive/Not predicted | The chemicals associated with mitochondrial toxicity were collected from Pubchem bioassay database and DrugBank, and Zhang's dataset.[1] In total, 1440 positive chemicals which can cause membrane potential drop and 1089 negative chemicals that have been marketed but without related mitochondrial toxicity and side effects were collected from Zhao et al.[2] The model was built by MACCS and random forest. | DOI: 10.1093/bioinformatics/bty707 | |
| Genotoxicity/Mutagenicity | Mutagenicity | AMES | VEGA | SarPy-IRFMN | - | Active/"-" = Inactive/Not predicted | Model based on a set of rules extracted from a set of 4,000 compounds that were used for defining structural alerts (SAs) by SARpy software without any ‘a priori’ knowledge. There are 112 rules for mutagenicity and 93 rules for non-mutagenicity. Qualitative information was transformed: a "mutagenic" prediction is considered "Active" and a "non-mutagenic" prediction is considered "-", whatever the quality of the prediction. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | 
| Carcinogenicity | Carcino | VEGA | CAESAR | Active | Active/"-" = Inactive/Not predicted | It uses a Counter Propagation Artificial Neural Network (CP ANN) consisting of two layers of neurons arranged in a two-dimensional rectangular matrix. The algorithm is based on 12 descriptors and 645 chemicals as the training set. Qualitative information was changed: a "mutagenic" prediction was replaced with "Active" and a "non-mutagenic" prediction was replaced with "-", whatever the quality of the prediction. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Endocrine Disruption | PPARg | ADMETSAR | - | Active/"-" = Inactive/Not predicted | Binary models for 6 targets implicated in endocrine disruption (ED), namely AR (androgen receptor), ER (estrogen receptor), TR (thyroid receptor), GR (glucocorticoid receptor), PPARγ (peroxisome proliferator-activated receptor γ) and Aromatase. All the datasets were collected from Tox21, and a random under-sampling technique was used to achieve a balanced dataset for model training. A multi-label model was developed by combining the best single-label model of each target, and the resulting model can be used to distinguish whether certain endocrine disrupting chemicals can simultaneously modulate multiple receptors related to ED. Finally, all the binary models and the multi-label model were evaluated on the corresponding single-label test sets and a multi-label test set, respectively, with reasonable reliability. | DOI: 10.1093/bioinformatics/bty707 | ||
| Organ toxicology | Cardiotoxicity | hERG Blocker | ADMETSAR | - | Active/"-" = Inactive/Not predicted | The original chemicals with experimental IC50 values were collected from the literature and the ChEMBL database by Zhang et al.[1] Only patch clamp determined IC50 values on different mammalian cell lines were collected in this study. In total, 717 toxic molecules (IC50 < 30 uMol) and 261 nontoxic molecules were collected. The model was built with AtomPairs and a support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Cell toxicology | Genome Instability | ATAD5 | ADMETLAB2 | Active | Agonist/Antagonist/"-" = Inactive/Not predicted | ATAD5: ATPase family AAA domain-containing protein 5. Cancer cells divide rapidly, and during every cell division they need to duplicate their genome by DNA replication. Failure to do so results in cancer cell death. Based on this concept, many chemotherapeutic agents were developed, but they have limitations such as low efficacy and severe side effects. Enhanced Level of Genome Instability Gene 1 (ELG1; human ATAD5) protein levels increase in response to various types of DNA damage. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: if the prediction is greater than or equal to 0.5, the molecule is considered “Active”; otherwise, it is noted “-”. | doi: 10.1093/nar/gkab255 | |
| Endocrine Disruption | AR-LBD | ADMETLAB2 | - | Agonist/Antagonist/"-" = Inactive/Not predicted | AR-LBD: Androgen receptor (AR), a nuclear hormone receptor, plays a critical role in AR-dependent prostate cancer and other androgen-related diseases. Endocrine disrupting chemicals (EDCs) and their interactions with steroid hormone receptors like AR may cause disruption of normal endocrine function as well as interfere with metabolic homeostasis, reproduction, developmental and behavioral functions. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: if the prediction is greater than or equal to 0.5, the molecule is considered “Active”; otherwise, it is noted “-”. | doi: 10.1093/nar/gkab255 | ||
| Carcinogenicity | Carcino | ADMETSAR | Binary | Active | Active/"-" = Inactive/Not predicted | The data set used for model building was compiled from CPDB, which contains a large number of chemical structures (1547 substances) with tumor data in rodents. For these chemicals, the carcinogenic potency is expressed as TD50 values. The data set was prepared in the following steps: (1) Removing mixtures, inorganic, salts and organometallic compounds; (2) Removing compounds that have inconsistent results in different experimental groups; (3) Removing compounds with molecular weights less than 40 or more than 600; (4) Only one stereoisomer was retained because the 2D fingerprints of a pair of stereoisomers are identical. Finally, 476 carcinogens and 440 noncarcinogens were collected, and the binary model was built with Morgan fingerprints and the k-nearest neighbors method. The trinary model was built with MACCS fingerprints and a support vector machine. The "+" prediction was changed to "Active". | DOI: 10.1093/bioinformatics/bty707 | |
| Endocrine Disruption | MR | VEGA_NRMEA | - | Agonist/Antagonist/"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Organ toxicology | Respiratory toxicity | Respiratory | ADMETLAB2 | Active | Active/"-" = Inactive/Not predicted | Among these safety issues, respiratory toxicity has become the main cause of drug withdrawal. Drug-induced respiratory toxicity is usually underdiagnosed because it may not have distinct early signs or symptoms in common medications and can occur with significant morbidity and mortality. Therefore, careful surveillance and treatment of respiratory toxicity is of great importance. Result interpretation: Category 1: respiratory toxicants; Category 0: non-respiratory toxicants. The output value is the probability of being toxic, within the range of 0 to 1. | doi: 10.1093/nar/gkab255 | |
| Endocrine Disruption | ARO | ADMETSAR | - | Agonist/Antagonist/"-" = Inactive/Not predicted | Binary models for 6 targets implicated in endocrine disruption (ED), namely AR (androgen receptor), ER (estrogen receptor), TR (thyroid receptor), GR (glucocorticoid receptor), PPARγ (peroxisome proliferator-activated receptor γ) and Aromatase. All the datasets were collected from Tox21 and a random under-sampling technique was used to achieve a balanced dataset for model training. A multi-label model was developed by combining the best single-label model of each target and the resulting model can be used to distinguish whether certain endocrine disrupting chemicals can simultaneously modulate multiple receptors related to ED. Finally, all the binary models and the multi-label model were respectively evaluated by corresponding single-label test sets and a multi-label test set with reasonable reliability. | DOI: 10.1093/bioinformatics/bty707 | ||
| Carcinogenicity | Non-Genotox-Carc | ADMETLAB2 | 0.0 | Number of structural alerts | Molecules containing these substructures may cause carcinogenicity through nongenotoxic mechanisms. There are 23 substructures in this endpoint. | doi: 10.1093/nar/gkab255 | ||
| Organ toxicology | Hepatotoxicity | PPARg up liver stea | VEGA | - | Active/"-" = Inactive/Not predicted | Data referred to ToxCast assays ATG_PPARg_TRANS_up (AEID: 134). Attagene (ATG) assays are cell-based, multiplexed-readout assays that use HepG2, a human liver cell line, with measurements taken 24 hours after chemical dosing in a 24-well plate. A consensus of four single models based on Random Forest (RF) and Balanced Random Forest (BRF) was applied to the training dataset of 908 chemicals. The output statistics for goodness of fit were Balanced Accuracy: 0.97, Sensitivity: 0.99, Specificity: 0.94, MCC: 0.88. TP 211, TN 655, FP 41, FN 1. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Organ toxicology | Ocular toxicity | EI | ADMETLAB2 | Active | Active/"-" = Inactive/Not predicted | Assessing the eye irritation/corrosion (EI/EC) potential of a chemical is a necessary component of risk assessment. Cornea and conjunctiva tissues comprise the anterior surface of the eye, and hence are directly exposed to the air and easily injured by chemicals. Several kinds of substances, such as chemicals used in manufacturing, agriculture and warfare, ocular pharmaceuticals, cosmetic products, and household products, can cause EI or EC. Result interpretation: Category 1: corrosive / irritant chemicals; Category 0: non-corrosive / non-irritant chemicals. The output value is the probability of being toxic, within the range of 0 to 1. | doi: 10.1093/nar/gkab255 | |
| Endocrine Disruption | RARr | VEGA_NRMEA | - | Agonist/Antagonist/"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Carcinogenicity | inhal carcino | VEGA | - | Active/"-" = Inactive/Not predicted | The RAIS database includes inhalation slope factor (ISF) values only for chemicals with carcinogenic effects, so chemicals with a defined value (in our case ISF) were considered carcinogenic, and compounds with no value were considered non-carcinogenic. The slope of this line, known as the slope factor, is an upper-bound estimate of risk per increment of dose for carcinogens that can be used to assess the increase over a lifetime in incidence of cancers in humans from inhalation exposure to a dose of a carcinogenic chemical. The final dataset for the classification model included 598 compounds (210 positive, 388 negative). The model uses classification and regression trees (CART) based on 9 molecular descriptors. Statistics for goodness-of-fit: Accuracy = 0.81, Sensitivity = 0.73, Specificity = 0.86 (TP 154, TN 333, FP 55, FN 56). | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| Organ toxicology | Hepatotoxicity | DILI | vNN-ADMET | - | Active/"-" = Inactive/Not predicted | DILI: The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is to use a predetermined similarity criterion, the vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. Drug-induced liver injury (DILI) has been one of the most commonly cited reasons for drug withdrawals from the market. This application predicts whether a compound could cause DILI. The dataset of 1,431 compounds was obtained from four sources used by Xu et al. This dataset contains both pharmaceuticals and non-pharmaceuticals. A compound was classified as causing DILI if it was associated with a high risk of DILI, and as not causing DILI if there was no such risk. More information is available at doi: 10.3389/fphar.2017.00889. "Yes"/"No" predictions are changed to "Active"/"-". | doi: 10.3389/fphar.2017.00889. | |
| Genotoxicity/Mutagenicity | Mutagenicity | AMES | PKCSM | - | Active/"-" = Inactive/Not predicted | AMES test: mutagenicity prediction based on the Ames test. The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, did the qualitative predictions (classification tasks). The Ames test is a widely employed method to assess a compound's mutagenic potential using bacteria. A positive test indicates that the compound is mutagenic and therefore may act as a carcinogen. This predictive model was built on the Ames test results of over 8445 compounds. The best performing predictor in each task was chosen based on a 5-fold cross-validation approach. The Weka toolkit was used for training and testing the models. How to interpret the results: It predicts whether a given compound is likely to be Ames positive and hence mutagenic. | doi: 10.1021/acs.jmedchem.5b00104 | |
| Endocrine Disruption | ERb | VEGA_NRMEA | - | Agonist/Antagonist/"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Organ toxicology | Hepatotoxicity | NRF2 up liver stea | VEGA | - | Active/"-" = Inactive/Not predicted | Data referred to ToxCast assays ATG_NRF2_ARE_CIS_up (AEID: 97). Attagene (ATG) assays are cell-based, multiplexed-readout assays that use HepG2, a human liver cell line, with measurements taken 24 hours after chemical dosing in a 24-well plate. A consensus of four single models based on Random Forest (RF) and Balanced Random Forest (BRF) was applied to the training dataset of 853 chemicals. The output statistics for goodness of fit were Balanced Accuracy: 0.99, Sensitivity: 1.00, Specificity: 0.99, MCC: 0.98. TP 276, TN 570, FP 7, FN 0. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Endocrine Disruption | VDR | VEGA_NRMEA | - | Agonist/Antagonist/ a-anta agonist and antagonist /"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Cell toxicology | Sub-loc | Sub-loc | ADMETSAR | Mitochondria | no information available | DOI: 10.1093/bioinformatics/bty707 | ||
| Carcinogenicity | inhal carcino | VEGA | 32.3594 | mg/kg-day | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |||
| Endocrine Disruption | AR | ADMETLAB2 | - | AR: Androgen receptor (AR), a nuclear hormone receptor, plays a critical role in AR-dependent prostate cancer and other androgen related diseases. Endocrine disrupting chemicals (EDCs) and their interactions with steroid hormone receptors like AR may cause disruption of normal endocrine function as well as interfere with metabolic homeostasis, reproduction, developmental and behavioral functions. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: if the prediction is greater than or equal to 0.5, the molecule is considered "Active"; otherwise it is noted "-". | doi: 10.1093/nar/gkab255 | |||
| Genotoxicity/Mutagenicity | Mutagenicity | AMES | VEGA | ISS | - | Active/"-" = Inactive/Not predicted | The model has been built as a set of 69 rules, taken from the work of Benigni and Bossa (ISS) as implemented in the software ToxTree and based on a training set of 670 compounds. Qualitative information was transformed: a "mutagenic" prediction, whatever its quality, is considered "Active", and a "non-mutagenic" prediction, whatever its quality, is noted "-". | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | 
| Carcinogenicity | Carcino | ADMETLAB2 | Active | Active/"-" = Inactive/Not predicted | Among various toxicological endpoints of chemical substances, carcinogenicity is of great concern because of its serious effects on human health. The carcinogenic mechanism of chemicals may be due to their ability to damage the genome or disrupt cellular metabolic processes. Many approved drugs have been identified as carcinogens in humans or animals and have been withdrawn from the market. Result interpretation: Category 1: carcinogens; Category 0: non-carcinogens. Chemicals are labelled as active (carcinogens) or inactive (non-carcinogens) according to their TD50 values. The output value is the probability of being toxic, within the range of 0 to 1. The results were adapted: if the prediction value is >= 0.5 the compound is considered "Active"; otherwise the value is replaced by "-". | doi: 10.1093/nar/gkab255 | ||
| Organ toxicology | Cardiotoxicity | hERG II Blocker | PKCSM | - | Active/"-" = Inactive/Not predicted | hERG II Inhibitor: human ether-a-go-go gene II Inhibitor: The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, did the qualitative predictions (classification tasks). Inhibition of the potassium channels encoded by hERG (human ether-a-go-go gene) is the principal cause of acquired long QT syndrome, leading to fatal ventricular arrhythmia. Inhibition of hERG channels has resulted in the withdrawal of many substances from the pharmaceutical market. This predictor was built using hERG II inhibition information for 806 compounds. The best performing predictor in each task was chosen based on a 5-fold cross-validation approach. The Weka toolkit was used for training and testing the models. How to interpret the results: The predictor will determine if a given compound is likely to be a hERG II inhibitor. Qualitative information was changed: "Yes" was replaced by "Active" and "No" was replaced by "-". | doi: 10.1021/acs.jmedchem.5b00104 | |
| Organ toxicology | Skin toxicity | SkinSen | VEGA | IRFMN-JRC | - | Active/"-" = Inactive/Not predicted | Skin sensitisation on mouse (local lymph node assay model) OECD 429. The training set contains 264 compounds. The test set contains 68 compounds. The model consists of decision trees based on 8 descriptors. Statistics for goodness-of-fit: Training set: n = 264; Accuracy = 0.80; Specificity = 0.79; Sensitivity = 0.81; TP: 145, TN: 66, FP: 18, FN: 35. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | 
| Carcinogenicity | oral carcino | VEGA | Active | Active/"-" = Inactive/Not predicted | The RAIS database includes oral slope factor (OSF) values only for chemicals with carcinogenic effects, so chemicals with a defined value (in our case OSF) were considered carcinogenic, and compounds with no value were considered non-carcinogenic. The slope of this line, known as the slope factor, is an upper-bound estimate of risk per increment of dose for carcinogens that can be used to assess the increase over a lifetime in incidence of cancers in humans from oral or inhalation exposure to a dose of a carcinogenic chemical. The final dataset for the classification model included 593 compounds (257 positive, 336 negative). The model uses classification and regression trees (CART) based on 7 molecular descriptors. Statistics for goodness-of-fit: Accuracy = 0.81, Sensitivity = 0.82, Specificity = 0.79 (TP 211, TN 267, FP 69, FN 46). | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| Human toxicology | MRTD | PKCSM | 6.637 | mg/kg of bw/day | MRTD: Max. Recommended Therapeutic Dose (MRTD). The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Gaussian Processes and Model Tree Regression, did the quantitative predictions (regression tasks). The maximum recommended tolerated dose (MRTD) provides an estimate of the toxic dose threshold of chemicals in humans. The model is trained using 1222 experimental data points from human clinical trials and predicts the logarithm of the MRTD (log mg/kg/day). This will help guide the maximum recommended starting dose for pharmaceuticals in phase I clinical trials, which are currently based on extrapolations from animal data. The best performing predictor in each task was chosen based on a 10-fold cross-validation approach. The Weka toolkit was used for training and testing the models. How to interpret the results: For a given compound, an MRTD of less than or equal to 0.477 log(mg/kg/day) is considered low, and high if greater than 0.477 log(mg/kg/day). To facilitate the comparison between tool predictions, the data expressed in log(mg/kg/day) were transformed into mg/kg of bw/day. | doi: 10.1021/acs.jmedchem.5b00104 | ||
| Cell toxicology | Oxidative stress | ARE | ADMETLAB2 | Active | Agonist/Antagonist/"-" = Inactive/Not predicted | ARE: Oxidative stress has been implicated in the pathogenesis of a variety of diseases ranging from cancer to neurodegeneration. The antioxidant response element (ARE) signaling pathway plays an important role in the amelioration of oxidative stress. The CellSensor ARE-bla HepG2 cell line (Invitrogen) can be used for analyzing the Nrf2/antioxidant response signaling pathway. Nrf2 (NF-E2-related factor 2) and Nrf1 are transcription factors that bind to AREs and activate these genes. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: if the prediction is greater than or equal to 0.5, the molecule is considered "Active"; otherwise it is noted "-". | doi: 10.1093/nar/gkab255 | |
| Carcinogenicity | Male rat carcino | VEGA | -4.0069 | [log(1/(mg/kg-day))] | no information | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| General Toxicology | LD50/ROA | ADMETLAB2 | Active | Active/"-" = Inactive/Not predicted | Determination of acute toxicity in mammals (e.g. rats or mice) is one of the most important tasks for the safety evaluation of drug candidates. Result interpretation: Category 0: low-toxicity, > 500 mg/kg; Category 1: high-toxicity; < 500 mg/kg. The output value is the probability of being toxic, within the range of 0 to 1. We adapted the results, if prediction >= 0.5 the compound is considered as Active. | doi: 10.1093/nar/gkab255 | ||
| Endocrine Disruption | AR-LBD | ADMETSAR | - | Agonist/Antagonist/"-" = Inactive/Not predicted | Binary models for 6 targets implicated in endocrine disruption (ED), namely AR (androgen receptor), ER (estrogen receptor), TR (thyroid receptor), GR (glucocorticoid receptor), PPARγ (peroxisome proliferator-activated receptor γ) and Aromatase. All the datasets were collected from Tox21 and a random under-sampling technique was used to achieve a balanced dataset for model training. A multi-label model was developed by combining the best single-label model of each target and the resulting model can be used to distinguish whether certain endocrine disrupting chemicals can simultaneously modulate multiple receptors related to ED. Finally, all the binary models and the multi-label model were respectively evaluated by corresponding single-label test sets and a multi-label test set with reasonable reliability. | DOI: 10.1093/bioinformatics/bty707 | ||
| Carcinogenicity | Carcino | VEGA | IRFMN-Antares | Active | Active/"-" = Inactive/Not predicted | The model has been built as a set of 127 rules, extracted with SARpy software (based on molecular fragments) from a dataset obtained from the carcinogenicity database of the EU-funded project ANTARES. 1,543 compounds were used as the dataset. Qualitative information was changed: a "mutagenic" prediction, whatever its quality, was replaced by "Active", and a "Possible non-mutagenic" prediction, whatever its quality, was replaced by "-". | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Endocrine Disruption | GR | ADMETSAR | - | Active/"-" = Inactive/Not predicted | Binary models for 6 targets implicated in endocrine disruption (ED), namely AR (androgen receptor), ER (estrogen receptor), TR (thyroid receptor), GR (glucocorticoid receptor), PPARγ (peroxisome proliferator-activated receptor γ) and Aromatase. All the datasets were collected from Tox21 and a random under-sampling technique was used to achieve a balanced dataset for model training. A multi-label model was developed by combining the best single-label model of each target and the resulting model can be used to distinguish whether certain endocrine disrupting chemicals can simultaneously modulate multiple receptors related to ED. Finally, all the binary models and the multi-label model were respectively evaluated by corresponding single-label test sets and a multi-label test set with reasonable reliability. | DOI: 10.1093/bioinformatics/bty707 | ||
| Organ toxicology | Respiratory toxicity | Respiratory | ADMETSAR | - | Active/"-" = Inactive/Not predicted | In total, 2529 compounds (1440+/1089-) were obtained from three databases. The positive data are compounds that have adverse effects on the human respiratory system, and the negative data are substances that are harmless to the respiratory system, including respiratory non-sensitizers and skin non-sensitizers.[1] | DOI: 10.1093/bioinformatics/bty707 | |
| Organ toxicology | Skin toxicity | SkinSen Rules | ADMETLAB2 | 0.0 | Number of alerts | Molecules containing these substructures may cause skin irritation. There are 155 substructures in this endpoint. | doi: 10.1093/nar/gkab255 | |
| Carcinogenicity | oral carcino | VEGA | 22.3872 | mg/kg BW - day | The RAIS database includes oral slope factor (OSF) values only for chemicals with carcinogenic effects, so chemicals with a defined value (in our case OSF) were considered carcinogenic, and compounds with no value were considered non-carcinogenic. The slope of this line, known as the slope factor, is an upper-bound estimate of risk per increment of dose for carcinogens that can be used to assess the increase over a lifetime in incidence of cancers in humans from oral or inhalation exposure to a dose of a carcinogenic chemical. The final dataset for the classification model included 315 compounds and 226 were used for the training. The model is a multi-layer perceptron artificial neural network (MLP-ANN) based on 12 molecular descriptors. Statistics for goodness-of-fit: R2 0.70, RMSE 0.88. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| Endocrine Disruption | EDC-s | VEGA | - | Agonist/Antagonist/"-" = Inactive/Not predicted | No description | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | ||
| General Toxicology | LOAEL | PKCSM | 699.842 | mg/kg of bw/day | LOAEL: Toxicity Oral Rat Chronic Toxicity (LOAEL). The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Gaussian Processes and Model Tree Regression, did the quantitative predictions (regression tasks). It is important to consider the toxic potency of a potential compound. Exposure to low-moderate doses of chemicals over long periods of time is of significant concern in many treatment strategies. Chronic studies aim to identify the lowest dose of a compound that results in an observed adverse effect (LOAEL), and the highest dose at which no adverse effects are observed (NOAEL). This predictor was built using the LOAEL results from 445 compounds. The best performing predictor in each task was chosen based on a leave-one-out approach. The Weka toolkit was used for training and testing the models. How to interpret the results: For a given compound, the predicted log Lowest Observed Adverse Effect Level (LOAEL) in log(mg/kg_bw/day) is generated and converted to mg/kg bw/day. The LOAEL results need to be interpreted relative to the bioactive concentration and treatment lengths required. | doi: 10.1021/acs.jmedchem.5b00104 | ||
| Endocrine Disruption | ER | ADMETLAB2 | - | Agonist/Antagonist/"-" = Inactive/Not predicted | ER: Estrogen receptor (ER), a nuclear hormone receptor, plays an important role in development, metabolic homeostasis and reproduction. Endocrine disrupting chemicals (EDCs) and their interactions with steroid hormone receptors like ER cause disruption of normal endocrine function. Therefore, it is important to understand the effect of environmental chemicals on the ER signaling pathway. Traditional multitask graph neural network (GNN) methods usually handle homogeneous tasks, such as pure regression or classification tasks. However, in ADMET prediction, both regression tasks and classification tasks are needed. Therefore, a multi-task graph attention (MGA) framework was used to simultaneously learn the regression and classification tasks for ADMET predictions in this study. Result interpretation: Category 1: actives; Category 0: inactives. The output value is the probability of being active, within the range of 0 to 1. Empirical decision: if the prediction is greater than or equal to 0.5, the molecule is considered "Active"; otherwise it is noted "-". | doi: 10.1093/nar/gkab255 | ||
| Genotoxicity/Mutagenicity | Mutagenicity | Chro-Ab | VEGA | Active | Active/"-" = Inactive/Not predicted | Data for chromosomal aberrations determined by in vitro test using Chinese hamster lung (CHL) and ovary (CHO) cells, with and without metabolic activation (metabolic system S9). After the implementation in VEGA, the dataset was split in training (442 chemicals) and test (35 chemicals). One-variable model based on SMILES-derived descriptors. Training set: n = 442, Balanced Accuracy 0.77, Sensitivity 0.72, Specificity 0.81, MCC 0.54. TP 149, TN 191, FP 44, FN 58. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Organ toxicology | Ocular toxicity | EI | ADMETSAR | Active | Active/"-" = Inactive/Not predicted | A total of 5220 chemicals (3874+/1346-) for a serious eye irritation (EI) dataset and 2299 chemicals (887+/1412-) as an eye corrosion (EC) dataset were collected from available databases and literature. The EI model was built by AtomPairs with support vector machine and the EC model was built by MACCS and support vector machine. | DOI: 10.1093/bioinformatics/bty707 | |
| Endocrine Disruption | RARb | VEGA_NRMEA | - | Agonist/Antagonist/"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Developmental/Reproductive Toxicology | Repro/dev toxicity | Repro/dev tox | VEGA | - | Agonist/Antagonist/"-" = Inactive/Not predicted | Data collection is described in "A Framework for Identifying Chemicals with Structural Features Associated with Potential to Act as Developmental or Reproductive Toxicants", Wu et al. 2013 (DOI: 10.1021/tx400226u). The final dataset contains 685 substances: from the original dataset (n = 716) we selected substances on the basis of their structure (e.g. polymers, inorganic compounds and organometals were excluded) and with data for at least one endpoint (developmental toxicity, reproductive toxicity). The model is a structure-based model and does not make use of descriptors. Statistics for goodness-of-fit: Sensitivity 89%, Specificity 44%, Accuracy 85%, MCC 0.27. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Endocrine Disruption | ARO | VEGA | IRFMN | - | Agonist/Antagonist/"-" = Inactive/Not predicted | This assay is a cell-based assay using an aromatase breast cancer cell line (MCF-7 aro), and measures the inhibition of the conversion of testosterone to estradiol catalyzed by aromatase. The control used for this assay is Letrozole (IC50 = 9.44 ± 1.4 nM (n = 27)). The final dataset has 3254 compounds, with 281 active agonists, 170 active antagonists, and 2803 inactive. The model is built on 18 descriptors. Statistics for goodness-of-fit were: Accuracy 0.94, MCC 0.74. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Organ toxicology | Hepatotoxicity | DILI | PKCSM | - | Active/"-" = Inactive/Not predicted | DILI: Drug-induced liver injury: The prediction is based on molecular properties (Molecular Weight; Heavy Atom count; LogP; Heteroatoms count; Rotatable Bonds count; Ring count; TPSA; Labute ASA; Fluorine atom Count; Toxicophore [1-36]; Pharmacophore count) calculated using the RDKit cheminformatics toolkit and used for training the predictive models. Two different algorithms, Random Forest and Logistic Regression, did the qualitative predictions (classification tasks). Drug-induced liver injury is a major safety concern for drug development and a significant cause of drug attrition. This predictor was built using the liver associated side effects of 531 compounds observed in humans. A compound was classed as hepatotoxic if it had at least one pathological or physiological liver event which is strongly associated with disrupted normal function of the liver. The best performing predictor in each task was chosen based on a 5-fold cross-validation approach. The Weka toolkit was used for training and testing the models. How to interpret the results: It predicts whether a given compound is likely to be associated with disrupted normal function of the liver. Qualitative information was changed: "Yes" was replaced by "Active" and "No" was replaced by "-". | doi: 10.1021/acs.jmedchem.5b00104 | |
| Genotoxicity/Mutagenicity | Mutagenicity | AMES | ADMETSAR | - | Active/"-" = Inactive/Not predicted | In total 4866 mutagens and 3482 non-mutagens were collected from literature and CPDB and CCRIS by Xu et al. The model was built by Morgan fingerprint and random forest. | DOI: 10.1093/bioinformatics/bty707 | |
| Endocrine Disruption | ERa | VEGA_NRMEA | - | Agonist/Antagonist/ a-anta agonist and antagonist /"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Genotoxicity/Mutagenicity | Mutagenicity | MicroN-In vitro | VEGA | Active | Active/"-" = Inactive/Not predicted | The dataset includes 380 mono-constituent organic compounds with experimental data collected from peer-reviewed literature, SCCS and EFSA opinions, ECVAM guidelines and reviews, and the eChemPortal inventory. We carefully revised the sources in order to ensure their quality and reliability and, to our knowledge, most of the selected data can be classified with a Klimisch score of 1 because the studies were done with test procedures in accordance with validated standard methods. The In vitro Micronucleus Activity (IRFMN/VERMEER) model (version 1.0.0) provides a qualitative prediction of genotoxicity as induction of micronucleus in mammalian cells in vitro. It is based on a set of rules extracted from a set of compounds by SARpy software without any 'a priori' knowledge. The rule set comprises 82 active structural alerts (SAs; genotoxic, active/positive) and 56 inactive structural alerts (non-genotoxic, inactive/negative). 293 molecules (171 active, 122 inactive) were used in the training set, giving Accuracy 0.88; Specificity 0.73; Sensitivity 0.97; Matthews correlation coefficient 0.75. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | |
| Organ toxicology | Skin toxicity | SkinSen | ADMETLAB2 | Active | Active/"-" = Inactive/Not predicted | Skin sensitization is a potential adverse effect for dermally applied products. The evaluation of whether a compound, that may encounter the skin, can induce allergic contact dermatitis is an important safety concern. | doi: 10.1093/nar/gkab255 | |
| Endocrine Disruption | TRb | VEGA_NRMEA | - | Agonist/Antagonist/"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Cell toxicology | Cytotoxicity | Cyto-tox | vNN-ADMET | - | Agonist/Antagonist/"-" = Inactive/Not predicted | Cytotoxicity (HepG2): The k-nearest neighbor (k-NN) method is widely used to develop QSAR models (Zheng and Tropsha, 2000). An alternative approach is to use a predetermined similarity criterion, the vNN method, which uses all nearest neighbors that meet a structural similarity criterion to define the model's applicability domain (Liu et al., 2012, 2015; Liu and Wallqvist, 2014). When no nearest neighbor meets the criterion, the vNN method makes no prediction. We developed a cytotoxicity prediction model, using a training dataset of in vitro toxicity against HepG2 cells for 6,097 structurally diverse compounds, which we collected from Chemical European Biology Laboratory (ChEMBL) (Bento et al., 2014). In developing our model, we considered compounds with an IC50 of 10 μM or less in the in vitro assay as cytotoxic. We classified cytotoxic compounds as positives and non-toxic compounds as negatives. The cytotoxicity model performed well, with an overall accuracy of 84% and a kappa value of 0.64. Because compounds in the dataset achieved only sparse coverage of the chemical space, the model only predicted compounds that were well represented in the dataset. Results were adapted and the "Yes" and "No" indicators were changed respectively to "Active" and "-". | doi: 10.3389/fphar.2017.00889. | |
| Endocrine Disruption | AR | VEGA_NRMEA | - | Agonist/Antagonist/a-anta = agonist and antagonist/"-" = Inactive/Not predicted | https://www.vegahub.eu/wp/wp-content/uploads/2019/12/VEGA_NRMEA_model_Introduction.pdf | |||
| Organ toxicology | Hepatotoxicity | H-HT | ADMETLAB2 | - | Active/"-" = Inactive/Not predicted | Human hepatotoxicity (H-HT). Drug-induced liver injury is of great concern for patient safety and a major cause of drug withdrawal from the market. Adverse hepatic effects in clinical trials often lead to late and costly termination of drug development programs. Of 2304 molecules (1299 positive/1005 negative), 1850 (1044 positive/806 negative) were used as the training dataset. Performance of the classification model in training: AUC 0.975, ACC 0.895, SP 0.976, Sen 0.835, MCC 0.802. | doi: 10.1093/nar/gkab255 | |
| Genotoxicity/Mutagenicity | Mutagenicity | AMES | VEGA | CAESAR | - | Active/"-" = Inactive/Not predicted | Combines two models: first data mining with a Support Vector Machine (SVM), then expert knowledge coded as structural alerts (SA). The model was built from 3,367 chemicals using 41 descriptors. Qualitative outputs were transformed: a "mutagenic" prediction is reported as "Active" and a "non-mutagenic" prediction as "-", regardless of the quality of the prediction. | https://www.vegahub.eu/portfolio-item/vega-qsar-models-qrmf/ | 
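
Several comments in the table above quote accuracy, sensitivity, specificity, and Matthews correlation coefficient (MCC) for the underlying classification models. As a reminder of how these figures relate to a confusion matrix, here is a minimal Python sketch. The function name and the counts used in the example are hypothetical and chosen for illustration only; they are not the confusion matrices of the VEGA, ADMETlab, or vNN models, which this page does not report.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion-matrix counts.

    tp/fn: active compounds predicted active/inactive
    tn/fp: inactive compounds predicted inactive/active
    """
    total = tp + fp + tn + fn
    # MCC denominator; zero when any row or column of the matrix is empty
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC uses all four cells of the confusion matrix, which makes it a useful complement to accuracy when the active and inactive classes are unbalanced.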
Select an endpoint:
| Endpoint | Tool | Value | Unit | Comments | Reference | 
|---|---|---|---|---|---|
| Fsp3 | ADMETLAB2 | 0.0 | score | doi: 10.1093/nar/gkab255 | |
| Muegge | SWISSADME | 1.0 | Nb of alert | ||
| Bioavailability Score | SWISSADME | 0.55 | Probability | ||
| MCE-18 | ADMETLAB2 | 6.0 | score | doi: 10.1093/nar/gkab255 | |
| Natural Product-likeness | ADMETLAB2 | 0.114 | score | doi: 10.1093/nar/gkab255 | |
| Brenk | SWISSADME | 3.0 | Nb of alert | ||
| Leadlikeness | SWISSADME | 1.0 | Nb of alert | ||
| Alarm_NMR | ADMETLAB2 | 2.0 | Nb of alert | doi: 10.1093/nar/gkab255 | |
| Veber | SWISSADME | 0.0 | Nb of alert | ||
| BMS | ADMETLAB2 | 1.0 | Nb of alert | doi: 10.1093/nar/gkab255 | |
| Chelating | ADMETLAB2 | 0.0 | Nb of alert | doi: 10.1093/nar/gkab255 | |
| PAINS | ADMETLAB2 | 0.0 | Nb of alert | PAINS. Pan Assay Interference Compounds (PAINS) is one of the best-known frequent-hitter (FH) filters; it comprises 480 substructures derived from the analysis of FHs identified in six target-based HTS assays. Applying these filters makes it easier to screen out false-positive hits and to flag suspicious compounds in screening databases. The Journal of Medicinal Chemistry, one of the most authoritative medicinal chemistry journals, even requires authors to provide PAINS screening results for active compounds when submitting manuscripts. Results interpretation: a non-zero number of alerts flags the compound as suspicious. | doi: 10.1093/nar/gkab255 | 
| PAINS | SWISSADME | 0.0 | Nb of alert | Pan Assay Interference Structures: implemented from Baell JB. & Holloway GA. 2010 J. Med. Chem. | |
| Lipinski | ADMETLAB2 | Accepted | Result | Lipinski Rule: Content: MW ≤ 500; logP ≤ 5; Hacc ≤ 10; Hdon ≤ 5. Results interpretation: if two properties are out of range, poor absorption or permeability is possible; one violation is acceptable. Empirical decision: < 2 violations: excellent (green); ≥ 2 violations: poor (red) | doi: 10.1093/nar/gkab255 | 
| Lipinski | SWISSADME | 0.0 | Nb of alert | Lipinski (Pfizer) filter: implemented from Lipinski CA. et al. 2001 Adv. Drug Deliv. Rev: 5 rules: MW ≤ 500; MLOGP ≤ 4.15; N or O ≤ 10; NH or OH ≤ 5. | |
| Pfizer | ADMETLAB2 | Accepted | Result | doi: 10.1093/nar/gkab255 | |
| GSK | ADMETLAB2 | Accepted | Result | doi: 10.1093/nar/gkab255 | |
| GoldenTriangle | ADMETLAB2 | Rejected | Result | doi: 10.1093/nar/gkab255 | |
| Ghose | SWISSADME | 1.0 | Nb of alert | ||
| QED | ADMETLAB2 | 0.297 | score | doi: 10.1093/nar/gkab255 | |
| Synth | ADMETLAB2 | 2.295 | score | Synth: The synthetic accessibility score is designed to estimate the ease of synthesis of drug-like molecules, based on a combination of fragment contributions and a complexity penalty. The score ranges from 1 (easy to make) to 10 (very difficult to make). The synthetic accessibility score (SAscore) is calculated as a combination of two components: SAscore = fragmentScore − complexityPenalty. Results interpretation: high SAscore: ≥ 6, difficult to synthesize; low SAscore: < 6, easy to synthesize. Empirical decision: ≤ 6: excellent (green); > 6: poor (red) | doi: 10.1093/nar/gkab255 | 
| Synth | SWISSADME | 1.56 | score | Synthetic accessibility score: from 1 (very easy) to 10 (very difficult), based on 1024 fragmental contributions (FP2) modulated by size and complexity penalties, trained on 12,782,590 molecules and tested on 40 external molecules (r2 = 0.94) | |
| Egan | SWISSADME | 0.0 | Nb of alert | 
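
The Lipinski criteria quoted in the ADMETLAB2 row above (MW ≤ 500; logP ≤ 5; Hacc ≤ 10; Hdon ≤ 5, with fewer than two violations considered acceptable) can be sketched in Python as follows. This is an illustration, not the ADMETlab or SwissADME implementation: the class and function names are hypothetical, the "Accepted"/"Rejected" labels are assumed to map to fewer than two and two or more violations, and the acceptor and donor counts are read off the structure (four N/O atoms, two OH groups) rather than taken from a value listed on this page. The molecular weight (165.15) and logP (0.66, VEGA ALogP) are the values reported for this compound.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # molecular weight
    log_p: float        # octanol/water partition coefficient
    h_acceptors: int    # hydrogen-bond acceptors (here: N + O count)
    h_donors: int       # hydrogen-bond donors (here: NH + OH count)


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski criteria are out of range."""
    checks = [
        d.mol_weight <= 500,
        d.log_p <= 5,
        d.h_acceptors <= 10,
        d.h_donors <= 5,
    ]
    return sum(not ok for ok in checks)


def lipinski_result(d: Descriptors) -> str:
    """Fewer than two violations is treated as acceptable (assumed label mapping)."""
    return "Accepted" if lipinski_violations(d) < 2 else "Rejected"


# 2-(4-hydroxyphenyl)-2-oxoacetaldehyde oxime, C8H7NO3
compound = Descriptors(mol_weight=165.15, log_p=0.66, h_acceptors=4, h_donors=2)
print(lipinski_violations(compound), lipinski_result(compound))
```

With these inputs the sketch finds no violations, which agrees with the "Accepted" result and the zero alert count reported in the two Lipinski rows.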
Fungi
| Fungi id | Species | 
|---|---|
| 91 | Aspergillus avenaceus | 
 
                         
                    