This document provides detailed instructions for preparing input files and running PgpRules. PgpRules is easy to use and requires only a structure file in a supported format.
The PgpRules server predicts two P‑glycoprotein (P‑gp) endpoints: substrate property and inhibitor property. Predictions are based on the classification and regression tree (CART) algorithm, with rules calculated from PubChem 2D fingerprints and RDKit descriptors. The branching rules, which offer guidance for refining compounds, are listed as well. This server should be helpful for researchers working in drug discovery, and we hope it contributes to lowering the cost of discovering new drug molecules.
Required files for PgpRules
PgpRules accepts only the MDL Molfile and SDF file formats. SDF is one of a family of chemical-data file formats developed by MDL and is intended especially for structural information. "SDF" stands for structure-data file, and an SDF file wraps the MDL Molfile format. Multiple compounds are delimited by lines consisting of four dollar signs ($$$$). A feature of the SDF format is its ability to include associated data. Here is an example SDF file containing five chemical structures.
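To check a file before uploading, the record structure described above can be inspected with a few lines of Python. The following is a minimal sketch, not part of PgpRules: the function names `split_sdf_records` and `read_data_fields` are hypothetical, and the code assumes the usual SDF layout in which each record ends with a `$$$$` line and associated data items follow the `M  END` line as `> <NAME>` headers.

```python
def split_sdf_records(sdf_text):
    """Split SDF text into one string per compound record."""
    records = []
    current = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            # Delimiter line: close the current record if it has content.
            if any(part.strip() for part in current):
                records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Covers a single Molfile or a last record without a trailing delimiter.
    if any(part.strip() for part in current):
        records.append("\n".join(current))
    return records


def read_data_fields(record):
    """Collect the associated data items that follow the molfile block."""
    lines = record.splitlines()
    start = 0
    for idx, line in enumerate(lines):
        if line.startswith("M  END"):
            start = idx + 1  # data items begin after the molfile block
            break
    fields = {}
    i = start
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line and ">" in line[1:]:
            name = line[line.index("<") + 1 : line.rindex(">")]
            values = []
            i += 1
            # Values run until the next blank line.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            fields[name] = "\n".join(values)
        i += 1
    return fields
```

Calling `len(split_sdf_records(text))` on the contents of an SDF file gives the number of compounds it holds, which is a quick sanity check before submission.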
Once the MDL Molfile or SDF file is ready, click the "Submit" button and PgpRules starts running immediately. Execution time depends mainly on the size of the submitted Molfile/SDF file, because of upload speed and descriptor calculation speed. An SDF file containing 6000 chemical structures has been tested and works well.
Example files of PgpRules
Here we provide a few SDF files so you can try out your own virtual screening.
Result of PgpRules
A PgpRules result contains three main pieces of information: a visualization of the compound structure, the prediction (for example, inhibitor or non-inhibitor), and the rules the compound fits.
When PgpRules finishes the prediction process, a result page is displayed as follows:
The prediction for the selected P‑gp endpoint is either substrate or non-substrate. As shown in the red square, the calculated descriptors of the molecule satisfy specific rules of the substrate or non-substrate model.
The prediction for the selected P‑gp endpoint is either inhibitor or non-inhibitor. As shown in the red square, the calculated descriptors of the molecule satisfy specific rules of the inhibitor or non-inhibitor model.
Branching Rules Explanation
The rules that lead to the endpoint prediction for a given compound are listed sequentially, starting from the root of the binary classification tree. The rows under the predicted endpoint show the rules the compound fits and their conditions. The compound's actual calculated value for each rule is shown in bold in parentheses.
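The way a rule list is read off a binary classification tree can be illustrated with a small Python sketch. It is only an illustration of the general idea, not the PgpRules implementation: the `Leaf` and `Split` classes, the `trace_rules` function, the descriptor names and the thresholds are all hypothetical, and the convention that the left branch means "less than or equal to the threshold" is an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # predicted endpoint class


@dataclass
class Split:
    feature: str                 # descriptor tested at this node
    threshold: float
    left: Union["Split", Leaf]   # followed when value <= threshold (assumed)
    right: Union["Split", Leaf]  # followed when value > threshold


def trace_rules(node, descriptors):
    """Walk from the root to a leaf, recording each rule that was satisfied."""
    rules = []
    while isinstance(node, Split):
        value = descriptors[node.feature]
        if value <= node.threshold:
            rules.append(f"{node.feature} <= {node.threshold} ({value})")
            node = node.left
        else:
            rules.append(f"{node.feature} > {node.threshold} ({value})")
            node = node.right
    return node.label, rules


# Hypothetical two-level tree with placeholder descriptor names.
example_tree = Split(
    "descriptor_a", 3.5,
    Leaf("non-inhibitor"),
    Split("descriptor_b", 0.5, Leaf("non-inhibitor"), Leaf("inhibitor")),
)

label, rules = trace_rules(example_tree, {"descriptor_a": 4.2, "descriptor_b": 1.0})
print(label)
for rule in rules:
    print(rule)
```

Each printed rule pairs the condition with the compound's own value in parentheses, mirroring the layout of the result page described above.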
Distance to Model Estimation
To estimate the applicability domain of our prediction models, the 3-nearest-neighbor (3-NN) distances between the test compound and the compounds in the training sets are provided. Based on our estimate of how prediction accuracy varies with the 3-NN distance cut-off, we suggest that, for an accuracy over 0.8, the 3-NN distance should be over 0.2 for the substrate model and over 0.02 for the inhibitor model.
The scaled 3-NN distances of each compound to the training set in the P-gp substrate and inhibitor models, compared to the cumulative prediction accuracy.
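For readers unfamiliar with nearest-neighbor distances, the following Python sketch shows one common way to compute a 3-NN distance between a query compound and a training set. It rests on assumptions the page does not state: Euclidean distance over descriptor vectors, and the 3-NN distance taken as the mean of the three smallest distances. The function name `knn_distance` is hypothetical, and the scaling step mentioned in the caption is not reproduced, so values from this sketch are not directly comparable to the cut-offs above.

```python
import math


def knn_distance(query, training, k=3):
    """Mean Euclidean distance from `query` to its k nearest training vectors.

    Assumption: the page does not say which metric or aggregation PgpRules
    uses; Euclidean distance and the mean over k neighbors are illustrative.
    """
    if len(training) < k:
        raise ValueError("training set must contain at least k compounds")
    distances = sorted(math.dist(query, row) for row in training)
    return sum(distances[:k]) / k


# Hypothetical descriptor vectors for four training compounds.
training_set = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
print(knn_distance((0.0, 0.0), training_set))
```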
- Email: firstname.lastname@example.org
- Websites:
- Computational Molecular Design & Detection Laboratory: https://www.cmdm.tw/
- Graduate Institute of Biomedical Electronics and Bioinformatics: http://www.bebi.ntu.edu.tw/
- National Taiwan University: https://www.ntu.edu.tw/
- Phone: +886-2-33664888 ext.403