Deep learning model for peptide y-ion fragmentation prediction
You can run the pre-trained model on the website.
- Python ver. 3.8.3 with the following Python libraries
- torch ver. 1.7.1
- numpy ver. 1.18.5
- pandas ver. 1.2.2
- pyteomics ver. 4.3.2
- sklearn ver. 0.24
- pyYAML ver. 5.4.1
- easydict ver. 1.9
To install the environment, run `pip install -r requirements.txt` in a terminal.
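After installing, it can help to confirm that the installed versions match the pinned ones. The following is a minimal sketch, not part of the repository: the distribution names (for example `scikit-learn` for sklearn and `PyYAML`) are assumptions about how the packages are published, and `check_environment` is a hypothetical helper.

```python
from importlib import metadata

# Pinned versions copied from the list above. The distribution names are
# assumptions (e.g. "scikit-learn" is assumed to be the name behind sklearn).
PINNED = {
    "torch": "1.7.1",
    "numpy": "1.18.5",
    "pandas": "1.2.2",
    "pyteomics": "4.3.2",
    "scikit-learn": "0.24",
    "PyYAML": "5.4.1",
    "easydict": "1.9",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(pinned=PINNED, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # A prefix match lets a pin such as "0.24" accept "0.24.2".
        matches = found is not None and (
            found == expected or found.startswith(expected + ".")
        )
        if not matches:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_environment().items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package was found at the expected version.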
To reproduce the results in the manuscript, run `training.py` from the console. To train with different data, change the training file in `config.yaml`. The training file must be in the same format as the original CSV file. Trained models are saved in `logs/`.
To predict with PrAI-frag, either use the website or use the provided `inference.py`.
The input file should be in the following format.
- File type should be `*.csv`
- File must have 3 columns named `Peptide`, `Charge`, `CE`. The `CE` and `Charge` values are calculated automatically if they are left blank.
Peptide | Charge | CE |
---|---|---|
AAAAAAAAAK | 2 | 24.6086 |
AAAAAAAAAR | 2 | |
AAAAAAAVSR | | 31.5383 |
... | ... | ... |
AAAACLDK | | |
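An input file in this layout can be produced with Python's standard `csv` module. This is a sketch under assumptions: `write_input_csv` and the file name `my_peptides.csv` are hypothetical, and blank cells are written as empty strings on the assumption that this is what "left blank" means for `inference.py`.

```python
import csv
import os
import tempfile

FIELDS = ["Peptide", "Charge", "CE"]


def write_input_csv(path, rows):
    """Write rows to a CSV with the Peptide, Charge, CE columns.

    Each row is a dict with a "Peptide" key. "Charge" and "CE" are optional
    and are written as empty cells when absent.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="")
        writer.writeheader()
        for row in rows:
            writer.writerow(row)


rows = [
    {"Peptide": "AAAAAAAAAK", "Charge": 2, "CE": 24.6086},
    {"Peptide": "AAAAAAAAAR", "Charge": 2},   # CE left blank
    {"Peptide": "AAAAAAAVSR", "CE": 31.5383},  # Charge left blank
    {"Peptide": "AAAACLDK"},                   # both left blank
]

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "my_peptides.csv")
write_input_csv(out_path, rows)
```

The resulting file can then be referenced from `config.yaml` like any other input.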
To run inference on different data, fill in `config.yaml`.

    ### INPUT ###
    ...
    INFER_DATA: '{your workspace}/input/Testset_data(NIST-rat).csv' # <-- use default testset or fill in path of your data

Run `inference.py` from the console.

    python workspace/src/inference.py

If the run was successful, `{your input file name}_pred.csv` will be created at `{your input file path}/{your input file name}_pred.csv`.
Peptide | Charge | CE | y1 | y1^2 | ... | y14^2 |
---|---|---|---|---|---|---|
AAAAAAAAAK | 2 | 24.6086 | ... | ... | ... | ... |
AAAAAAAAAR | 2 | 25.9651 | ... | ... | ... | ... |
AAAAAAAVSR | 2 | 31.5383 | ... | ... | ... | ... |
... | ... | ... | ... | ... | ... | ... |
AAAACLDK | 2 | 26.1567 | ... | ... | ... | ... |
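The location of the prediction file follows from the input path, so a script can work it out before looking for the file. A minimal sketch, assuming "input file name" means the file name without its `.csv` extension; `predicted_path` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def predicted_path(input_csv):
    """Return the path where the prediction file is expected to appear."""
    source = Path(input_csv)
    # "{your input file name}_pred.csv", in the same directory as the input.
    return source.with_name(f"{source.stem}_pred.csv")


pred_file = predicted_path("input/Testset_data(NIST-rat).csv")
print(pred_file.name)  # Testset_data(NIST-rat)_pred.csv
```

Checking `pred_file.exists()` after running `inference.py` is one way to confirm that the run produced output.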
First, open `{workplace}/src/result_PCC.ipynb` with Jupyter Notebook.
Second, fill in the data path.
- File type should be `*.csv`
- File must have 2 columns named `Peptide`, `Charge`.
- The rows of the target data and the rows of the prediction data should be in the same order.
- The target file of the test set (NIST_rat) is located at `/data/Testset_data(NIST_rat)_target.csv`.

    '''
    Read target data & prediction data
    - The order of the target data's rows and the order of the prediction data's rows
      should be the same.
    '''
    # target = pd.read_csv("{your target file's path}")
    # pred = pd.read_csv("{your prediction file's path}")
    '''
    Prosit & MS2PIP results can be parsed as follows.
    '''
    ### Prosit result
    # prosit_result = pd.read_csv("{path of prosit result}")
    # parsed_prosit_result = parse_prosit_result(prosit_result)
    ### MS2PIP
    # ms2pip_result = pd.read_csv("{path of ms2pip result}")
    # parsed_ms2pip_result = parse_ms2pip_result(ms2pip_result)

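Because the target and prediction rows must be in the same order, a quick alignment check before computing PCC can catch mismatched files. The sketch below uses only the standard library; `read_keys` and `first_mismatch` are hypothetical helpers, and converting `Charge` with `int(float(...))` is an assumption so that `2` and `2.0` compare equal.

```python
import csv


def read_keys(path):
    """Return (Peptide, Charge) for every row of a CSV file, in file order."""
    with open(path, newline="") as handle:
        # int(float(...)) is an assumption so "2" and "2.0" compare equal.
        return [
            (row["Peptide"], int(float(row["Charge"])))
            for row in csv.DictReader(handle)
        ]


def first_mismatch(target_keys, pred_keys):
    """Return the index of the first row that differs, or None if aligned."""
    for index, (target_key, pred_key) in enumerate(zip(target_keys, pred_keys)):
        if target_key != pred_key:
            return index
    if len(target_keys) != len(pred_keys):
        # One file has extra rows after the shared prefix.
        return min(len(target_keys), len(pred_keys))
    return None


# Example usage (the paths are placeholders):
# index = first_mismatch(read_keys("target.csv"), read_keys("input_pred.csv"))
# if index is not None:
#     print(f"Rows differ starting at row {index}")
```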
Third, run the calculation cell.

    '''
    Calculate PCC and create table
    '''
    # get_pcc(target, pred)

`get_pcc(target, pred)` returns a PCC data frame.

Peptide | Charge | PCC |
---|---|---|
AAAAAAAAAK | 2 | |
AAAAAAAAAR | 2 | |
AAAAAAAVSR | 2 | |
... | ... | ... |
AAAACLDK | 2 | |
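For reference, the PCC (Pearson correlation coefficient) between a target intensity vector and a predicted one can be computed as below. This is a standalone sketch of the standard formula, not the notebook's `get_pcc` implementation, and the `pearson` helper is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # The coefficient is undefined when either vector is constant.
        return float("nan")
    return covariance / math.sqrt(var_x * var_y)
```

Applied row by row to the y-ion intensity columns of the target and prediction files, a function like this would yield one PCC value per peptide.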