bertis-prai/PrAI-frag

PrAI-frag

Deep learning model for peptide y-ion fragmentation prediction
You can run the pre-trained model on the website.

Environment

Python ver. 3.8.3 with the following Python libraries:
  • torch ver. 1.7.1
  • numpy ver. 1.18.5
  • pandas ver. 1.2.2
  • pyteomics ver. 4.3.2
  • sklearn (scikit-learn) ver. 0.24
  • PyYAML ver. 5.4.1
  • easydict ver. 1.9
To install the environment, run pip install -r requirements.txt in a terminal.

How to Use

Training

To reproduce the results in the manuscript, run training.py from the console.
To train with different data, change the training file path in config.yaml. The training file must be in the same format as the original CSV file. Trained models are saved in logs/.

Inference

To predict with PrAI-frag, either use the website or use the provided inference.py. The input file should be in the following format.
  • File type should be *.csv
  • File must have 3 columns named Peptide, Charge, CE.
  • CE and Charge values are calculated automatically if they are left blank.

Peptide    | Charge | CE
AAAAAAAAAK | 2      | 24.6086
AAAAAAAAAR | 2      |
AAAAAAAVSR |        | 31.5383
...        | ...    | ...
AAAACLDK   |        |
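The format rules above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, and it assumes that an "unsubmitted" Charge or CE value is represented by an empty cell; the helper name write_input_csv is hypothetical and not part of the repository.

```python
import csv
import io

# Rows mirror the example table above; None marks a value left blank.
rows = [
    ("AAAAAAAAAK", 2, 24.6086),
    ("AAAAAAAAAR", 2, None),
    ("AAAAAAAVSR", None, 31.5383),
    ("AAAACLDK", None, None),
]


def write_input_csv(rows, handle):
    """Write rows under the three required column names.

    Assumption: a missing Charge or CE is written as an empty cell.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Peptide", "Charge", "CE"])
    for peptide, charge, ce in rows:
        writer.writerow([
            peptide,
            "" if charge is None else charge,
            "" if ce is None else ce,
        ])


# Write to an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_input_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, open it with open(path, "w", newline="") and pass the handle in place of the buffer.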

To run inference on different data, set the input path in config.yaml.
 ### INPUT ###

...

INFER_DATA: '{your workspace}/input/Testset_data(NIST-rat).csv' # <-- use the default test set or fill in the path to your data

Run inference.py from the console.

python {your workspace}/src/inference.py

If the run was successful, the predictions are written next to the input file as {your input file path}/{your input file name}_pred.csv.

Peptide    | Charge | CE      | y1  | y1^2 | ... | y14^2
AAAAAAAAAK | 2      | 24.6086 |     |      | ... |
AAAAAAAAAR | 2      | 25.9651 |     |      | ... |
AAAAAAAVSR | 2      | 31.5383 |     |      | ... |
...        | ...    | ...     | ... | ...  | ... | ...
AAAACLDK   | 2      | 26.1567 |     |      | ... |
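As a small aid for scripting, the output naming rule described above (same folder, _pred appended to the input file name) can be expressed with pathlib. This is only a sketch of that rule; the function name prediction_path and the file name my_peptides.csv are hypothetical.

```python
from pathlib import Path


def prediction_path(input_csv):
    """Return the expected output path: same folder, '_pred' before '.csv'."""
    path = Path(input_csv)
    return path.with_name(path.stem + "_pred.csv")


# Hypothetical input file used only for illustration.
example = prediction_path("input/my_peptides.csv")
print(example)
```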


Calculate PCC

First, open {workplace}/src/result_PCC.ipynb with Jupyter Notebook.
Second, fill in the data paths.
  • File type should be *.csv
  • Files must have 2 columns named Peptide and Charge.
  • The rows of the target data and the rows of the prediction data must be in the same order.
  • The target file of the test set (NIST_rat) is located at /data/Testset_data(NIST_rat)_target.csv.
'''
Read target data & prediction data
    - The order of the target data's rows and the order of the prediction data's rows
      should be the same.
'''
# target = pd.read_csv("{your target file's path}")
# pred = pd.read_csv("{your prediction file's path}")
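Because the notebook relies on both files being in the same row order, it can help to verify that before computing anything. The following is a minimal standard-library sketch, not code from the repository; the helper names row_keys and first_mismatch are hypothetical, and the in-memory CSV text stands in for real files.

```python
import csv
import io


def row_keys(handle):
    """Return the (Peptide, Charge) pair of every row, in file order."""
    return [(row["Peptide"], row["Charge"]) for row in csv.DictReader(handle)]


def first_mismatch(target_handle, pred_handle):
    """Return the index of the first differing row, or None if aligned."""
    target_keys = row_keys(target_handle)
    pred_keys = row_keys(pred_handle)
    for index, (t, p) in enumerate(zip(target_keys, pred_keys)):
        if t != p:
            return index
    # All shared rows match; a length difference is still a mismatch.
    if len(target_keys) != len(pred_keys):
        return min(len(target_keys), len(pred_keys))
    return None


# Illustrative in-memory data; replace with open(path, newline="") handles.
target_text = "Peptide,Charge\nAAAAAAAAAK,2\nAAAAAAAAAR,2\n"
aligned_text = "Peptide,Charge\nAAAAAAAAAK,2\nAAAAAAAAAR,2\n"
swapped_text = "Peptide,Charge\nAAAAAAAAAR,2\nAAAAAAAAAK,2\n"

aligned_result = first_mismatch(io.StringIO(target_text), io.StringIO(aligned_text))
swapped_result = first_mismatch(io.StringIO(target_text), io.StringIO(swapped_text))
print(aligned_result, swapped_result)
```

Extra columns in either file are ignored, so the same check works on a prediction file that also carries CE and intensity columns.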
To reproduce the data from the manuscript:
'''
Prosit & MS2PIP results can be parsed as follows,
'''
### Prosit result
# prosit_result = pd.read_csv("{path of prosit result}")
# parsed_prosit_result = parse_prosit_result(prosit_result)
### MS2PIP
# ms2pip_result = pd.read_csv("{path of ms2pip result}")
# parsed_ms2pip_result = parse_ms2pip_result(ms2pip_result)

Third, run the calculation cell.
'''
Calculate PCC and create table
'''
# get_pcc(target, pred)
get_pcc(target, pred) returns a PCC data frame.
Peptide    | Charge | PCC
AAAAAAAAAK | 2      |
AAAAAAAAAR | 2      |
AAAAAAAVSR | 2      |
...        | ...    | ...
AAAACLDK   | 2      |
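For readers unfamiliar with the metric, PCC is the Pearson correlation coefficient. The sketch below computes it for one pair of intensity vectors in plain Python. It assumes the comparison is made per peptide between target and predicted fragment intensities; how get_pcc selects and handles the columns is not shown here, and the numbers are made up for illustration, not results.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when either vector is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only: the prediction is the target scaled by 2,
# so the two vectors are perfectly linearly related.
target_intensities = [0.0, 0.2, 0.5, 1.0]
pred_intensities = [0.0, 0.4, 1.0, 2.0]
pcc = pearson(target_intensities, pred_intensities)
print(pcc)
```

PCC is insensitive to overall scale, which is why a prediction that is a constant multiple of the target still scores as a perfect match.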
