PrAI-frag

Deep learning model for peptide y-ion fragmentation prediction
You can run the pre-trained model on the website.

Environment

Python ver. 3.8.3, using the following Python libraries:
torch ver. 1.7.1
numpy ver. 1.18.5
pandas ver. 1.2.2
pyteomics ver. 4.3.2
sklearn ver. 0.24
PyYAML ver. 5.4.1
easydict ver. 1.9

To install the environment, run pip install -r requirements.txt in a terminal.
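After installing, you can optionally check that the installed versions match the ones listed above. This is a minimal sketch using the standard-library importlib.metadata (available from Python 3.8); check_versions is a hypothetical helper, not part of PrAI-frag, and the PyPI distribution names (scikit-learn for sklearn, PyYAML for yaml) are assumptions about how each library is published.

```python
from importlib import metadata

# Versions from the Environment section; keys are assumed PyPI distribution names.
EXPECTED = {
    "torch": "1.7.1",
    "numpy": "1.18.5",
    "pandas": "1.2.2",
    "pyteomics": "4.3.2",
    "scikit-learn": "0.24",  # imported as sklearn
    "PyYAML": "5.4.1",       # imported as yaml
    "easydict": "1.9",
}


def check_versions(expected):
    """Return {distribution: (installed_version_or_None, matches_expected)}."""
    report = {}
    for dist, want in expected.items():
        try:
            found = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = (None, False)
            continue
        # Drop local build tags such as "+cu101" before comparing.
        base = found.split("+")[0]
        # A short version like "0.24" is treated as a prefix, so "0.24.1" matches.
        ok = base == want or base.startswith(want + ".")
        report[dist] = (found, ok)
    return report


if __name__ == "__main__":
    for dist, (found, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{dist:14s} expected {EXPECTED[dist]:8s} found {found}  {status}")
```

Missing packages are reported as None instead of raising, so the script can run before the environment is complete.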

How to Use

Training

To reproduce the results in the manuscript, run training.py from the console.
To train on different data, change the training file in config.yaml. The training file must have the same format as the original CSV file. Trained models are saved in logs/.
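Because a custom training file must follow the format of the original CSV, comparing headers before a long training run can catch mismatches early. This is a minimal sketch using only the standard csv module; same_header and the example paths are hypothetical, it assumes comma-separated files, and it checks column names and their order only, not the contents of each column.

```python
import csv


def read_header(path):
    """Return the list of column names from the first row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def same_header(original_path, new_path):
    """Compare column names (and their order) of two CSV files.

    Returns (True, []) when they match, otherwise (False, differences).
    """
    orig = read_header(original_path)
    new = read_header(new_path)
    if orig == new:
        return True, []
    diffs = [f"missing: {c}" for c in orig if c not in new]
    diffs += [f"unexpected: {c}" for c in new if c not in orig]
    if not diffs:
        diffs.append("same columns, different order")
    return False, diffs


# Example (hypothetical paths):
# ok, diffs = same_header("data/original_training.csv", "data/my_training.csv")
```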

Inference

To make predictions with PrAI-frag, either use the website or run the provided inference.py. The input file must have the following format:

The file type must be *.csv.
The file must have three columns named Peptide, Charge, and CE.
If Charge or CE values are left empty, they are calculated automatically.

Peptide	Charge	CE
AAAAAAAAAK	2	24.6086
AAAAAAAAAR	2
AAAAAAAVSR		31.5383
...	...	...
AAAACLDK
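Before submitting a file, you can sanity-check it against the rules above. The sketch below is a hypothetical helper, not part of inference.py, and assumes a comma-separated file: it requires the three column names and a non-empty Peptide, and where Charge or CE is given it checks that the value parses as a positive integer or a number, respectively. Empty Charge or CE cells are only reported, since PrAI-frag fills those in itself.

```python
import csv

REQUIRED = ("Peptide", "Charge", "CE")


def validate_input(path):
    """Check an inference CSV; return (errors, rows_to_autofill).

    errors: problem descriptions (empty if the file looks valid).
    rows_to_autofill: 1-based data-row numbers with an empty Charge or CE.
    """
    errors, autofill = [], []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED if c not in header]
        if missing:
            return [f"missing column(s): {', '.join(missing)}"], []
        for i, row in enumerate(reader, start=1):
            # Short rows yield None for absent fields, so normalize to "".
            peptide = (row["Peptide"] or "").strip()
            charge = (row["Charge"] or "").strip()
            ce = (row["CE"] or "").strip()
            if not peptide:
                errors.append(f"row {i}: empty Peptide")
            if charge and not (charge.isdigit() and int(charge) > 0):
                errors.append(f"row {i}: Charge {charge!r} is not a positive integer")
            if ce:
                try:
                    float(ce)
                except ValueError:
                    errors.append(f"row {i}: CE {ce!r} is not a number")
            if not charge or not ce:
                autofill.append(i)
    return errors, autofill
```

For the example table above, this would report no errors and flag rows 2, 3, and 4 (and any row in the elided part with a missing value) for automatic calculation.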

To run inference on different data, fill in config.yaml:

### INPUT ###
...
INFER_DATA: '{your workspace}/input/Testset_data(NIST-rat).csv' # <-- use the default test set or fill in the path of your data

Then run inference.py from the console:

python workspace/src/inference.py

If the run succeeds, the predictions are written to {your input file path}/{your input file name}_pred.csv:
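The output naming rule above can be expressed as a small helper, for example to locate the prediction file from a script. This is a sketch of the described convention only; pred_path is a hypothetical name, and it assumes that "{your input file name}" means the file name without its .csv extension.

```python
import os


def pred_path(input_path):
    """Return the prediction file path for an input CSV.

    Assumed convention: {input file path}/{input file name}_pred.csv
    where the input file name excludes its extension.
    """
    directory, filename = os.path.split(input_path)
    stem, _ext = os.path.splitext(filename)  # drop ".csv"
    return os.path.join(directory, f"{stem}_pred.csv")


print(pred_path("input/Testset_data(NIST-rat).csv"))
```

On POSIX systems this prints input/Testset_data(NIST-rat)_pred.csv; os.path.join uses the platform's own separator elsewhere.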

Peptide	Charge	CE	y1	y1^2	...	y14^2
AAAAAAAAAK	2	24.6086			...
AAAAAAAAAR	2	25.9651			...
AAAAAAAVSR	2	31.5383			...
...	...	...	...	...	...	...
AAAACLDK	2	26.1567			...
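To work with the predictions outside the notebook, you can read each row's fragment-ion columns into a list. This is a sketch under stated assumptions, not the project's own loader: it assumes a comma-separated file, treats every column whose name starts with "y" (such as y1 or y1^2) as a predicted y-ion intensity, keeps the file's column order, and maps empty cells to None. load_predictions is a hypothetical name.

```python
import csv


def load_predictions(path):
    """Return (ion_columns, records) from a *_pred.csv file.

    ion_columns: names of columns starting with "y", in file order.
    records: one dict per row with Peptide, Charge, CE and a list of intensities.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        ion_cols = [c for c in (reader.fieldnames or []) if c.startswith("y")]
        records = []
        for row in reader:
            records.append({
                "Peptide": row["Peptide"],
                "Charge": int(row["Charge"]),
                "CE": float(row["CE"]),
                # Empty cells are kept as None rather than guessed.
                "intensities": [float(row[c]) if row[c] else None for c in ion_cols],
            })
    return ion_cols, records
```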

Calculate PCC (Pearson correlation coefficient)

First, open {your workspace}/src/result_PCC.ipynb in Jupyter Notebook.
Second, fill in the data paths.

The file type must be *.csv.
The file must have columns named Peptide and Charge.
The rows of the target data and the prediction data must be in the same order.
The target file for the NIST rat test set is located at /data/Testset_data(NIST_rat)_target.csv.
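Because the target and prediction rows are matched by their order, a misaligned file can silently produce wrong correlations. The following sketch compares the Peptide and Charge of each row in the two files and reports the first mismatch. first_mismatch is a hypothetical helper using only the csv module; it assumes comma-separated files and that Charge is written identically in both (for example "2", not "2.0").

```python
import csv


def read_keys(path):
    """Return the (Peptide, Charge) pair of every data row in a CSV file."""
    with open(path, newline="") as f:
        return [(row["Peptide"], row["Charge"]) for row in csv.DictReader(f)]


def first_mismatch(target_path, pred_path):
    """Return None if both files list the same (Peptide, Charge) rows in the same order.

    Otherwise return (row_number, target_key, pred_key) for the first
    difference, using 1-based data-row numbers; a missing row is shown as None.
    """
    target, pred = read_keys(target_path), read_keys(pred_path)
    for i in range(max(len(target), len(pred))):
        t = target[i] if i < len(target) else None
        p = pred[i] if i < len(pred) else None
        if t != p:
            return i + 1, t, p
    return None
```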

import pandas as pd

'''
Read target data & prediction data
    - The rows of the target data and the prediction data
      must be in the same order.
'''
# target = pd.read_csv("{your target file's path}")
# pred = pd.read_csv("{your prediction file's path}")

To reproduce the data from the manuscript:

'''
Prosit & MS2PIP results can be parsed as follows:
'''
### Prosit result
# prosit_result = pd.read_csv("{path of prosit result}")
# parsed_prosit_result = parse_prosit_result(prosit_result)
### MS2PIP
# ms2pip_result = pd.read_csv("{path of ms2pip result}")
# parsed_ms2pip_result = parse_ms2pip_result(ms2pip_result)

Third, run the cell that calculates the PCC.

'''
Calculate PCC and create table
'''
# get_pcc(target, pred)

get_pcc(target, pred) returns a PCC DataFrame in the following format:

Peptide	Charge	PCC
AAAAAAAAAK	2
AAAAAAAAAR	2
AAAAAAAVSR	2
...	...	...
AAAACLDK	2
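PCC here refers to the Pearson correlation coefficient; each row's value is presumably computed between the target and predicted y-ion intensities of that peptide and charge. The sketch below computes the coefficient for one pair of vectors in plain Python. It is not the notebook's get_pcc: how get_pcc handles missing or impossible ions is not described here, so this hypothetical pearson helper simply drops positions where either value is None.

```python
import math


def pearson(target, pred):
    """Pearson correlation coefficient between two equal-length intensity lists.

    Positions where either value is None are dropped first (an assumption;
    the notebook's get_pcc may treat missing ions differently).
    Returns None if fewer than two pairs remain or either vector is constant.
    """
    if len(target) != len(pred):
        raise ValueError("target and pred must have the same length")
    pairs = [(t, p) for t, p in zip(target, pred) if t is not None and p is not None]
    if len(pairs) < 2:
        return None
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_p = sum((p - mean_p) ** 2 for _, p in pairs)
    if var_t == 0 or var_p == 0:
        return None  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_t * var_p)


print(pearson([0.1, 0.5, 1.0], [0.2, 0.6, 0.9]))
```

A value of 1 means the predicted intensities are a perfect positive linear match to the target, 0 means no linear relationship, and -1 means a perfect inverse relationship.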


bertis-prai/PrAI-frag

Folders and files

Latest commit

History

Repository files navigation

PrAI-frag

Enviroment

How to Use

Training

Inference

Calculate PCC

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages