@lareinahu-2023 commented Sep 8, 2025

Final Report: GSoC ’25

Student Name: Jiahui Hu (Lareina)
Organization: National Resource for Network Biology (NRNB)
Mentors: Nantia Leonidou, Prof. Dr. Andreas Dräger
Project: Enhancing SBOannotator with LLM Integration & Dynamic Term Retrieval


Overview

This project transforms SBOannotator from a static, hard-coded tool into a dynamic, intelligent system for annotating Systems Biology Ontology (SBO) terms in SBML models. The enhanced system integrates:

  • Real-time SBO term retrieval
  • Multiple enzymology data sources (BiGG, KEGG, Reactome, SEED)
  • Fine-tuned LLM-assisted annotation
  • A Python runtime GUI and a packaged desktop GUI with interactive visualization

These improvements significantly boost accuracy and usability while preserving the core rule-based strengths.


Methods

1) Automated SBO File Management

  • Auto-update detection: compare the upstream commit SHA at startup.
  • Versioning: keep timestamped local copies and automatically delete the oldest SBO file once more than two versions accumulate (see the sketch after this list).
  • Formats: support .obo and .json with a 4-step validation pipeline.
  • User interaction: apply updates, download SBO files, or upload custom SBO files.
  • Integrity: round-trip conversion tests to ensure lossless persistence.
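
The update check and version pruning could look roughly like the sketch below. The repository endpoint, file name, and cache layout are placeholders for illustration, not SBOannotator's actual configuration.

```python
from pathlib import Path

import requests

# Placeholder locations; the real repo path and cache dir may differ.
COMMITS_API = "https://api.github.com/repos/EBI-BioModels/SBO/commits"
CACHE_DIR = Path.home() / ".sboannotator" / "sbo_cache"
SHA_FILE = CACHE_DIR / "last_sha.txt"

def latest_remote_sha() -> str:
    """SHA of the newest upstream commit touching the SBO file."""
    resp = requests.get(
        COMMITS_API, params={"path": "SBO_OBO.obo", "per_page": 1}, timeout=10
    )
    resp.raise_for_status()
    return resp.json()[0]["sha"]

def update_needed() -> bool:
    """Compare the remote commit SHA with the one recorded locally."""
    local = SHA_FILE.read_text().strip() if SHA_FILE.exists() else ""
    return latest_remote_sha() != local

def prune_old_versions(keep: int = 2) -> None:
    """Keep only the `keep` newest timestamped SBO files; delete the rest."""
    versions = sorted(CACHE_DIR.glob("sbo_*.obo"), reverse=True)  # timestamped names sort chronologically
    for old in versions[keep:]:
        old.unlink()
```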

2) Three-Layer Rule-Based Annotation Workflow

Layer 1 — Configuration / Strategy
Lets users configure which databases to query, in what order, and how many to use.
Layer 2 — Adapter Execution
Unified multi-database adapters for identifier extraction and EC-number lookup:

  • BiGG (direct API), KEGG (regex + REST), SEED (Solr), Reactome (web parsing + QuickGO)
  • Early termination: stop querying once a precise SBO term is found
  • EC-number truncation: normalize each EC by truncating at its first non-numeric component for consistency (sketch after this list)
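
To make the truncation rule concrete, here is a minimal sketch (the function name is hypothetical):

```python
def truncate_ec(ec: str) -> str:
    """Keep only the leading all-numeric components of an EC number.

    '1.1.1.-'  -> '1.1.1'
    '2.7.1.n2' -> '2.7.1'
    """
    parts = []
    for part in ec.split("."):
        if not part.isdigit():
            break
        parts.append(part)
    return ".".join(parts)
```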

Layer 3 — LLM Filter
Targets only the reactions that need disambiguation (a routing sketch follows this list):

  • Distinct handling for single vs multiple ECs
  • Filter conflicting ECs to avoid ambiguity
  • Log the selected ECs and fetch reaction context to build the LLM input
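
A rough sketch of how such a filter might route reactions, reusing the truncate_ec helper above; the exact conflict rule here is an assumption, not SBOannotator's documented behavior:

```python
def ecs_for_llm(ec_numbers: list[str]) -> list[str]:
    """Select which ECs (if any) to hand to the LLM for disambiguation."""
    ecs = sorted({truncate_ec(ec) for ec in ec_numbers if ec})
    if len(ecs) <= 1:
        return ecs                     # zero or one EC: no ambiguity to resolve
    top_classes = {ec.split(".", 1)[0] for ec in ecs}
    if len(top_classes) > 1:
        return []                      # conflicting top-level classes: filter out
    return ecs                         # related ECs: let the LLM disambiguate
```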

3) Fine-tuned LLM for EC → SBO

  • Base model: BioBERT (dmis-lab/biobert-base-cased-v1.1)
  • Architecture: 768-dimensional encoder → FC head (768→384→42) with Focal Loss (~111M params); see the sketch after this list
  • Two-stage training:
    • Stage-1: 331 expert samples, 80 epochs
    • Stage-2: 6,966 GPT-generated samples, lower LR, 10 epochs (noise adaptation)
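
A sketch of the classifier in PyTorch/transformers terms. The layer sizes (768→384→42) and base checkpoint follow the report; pooling, dropout, and the focal-loss gamma are assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class ECtoSBOClassifier(nn.Module):
    """BioBERT encoder with a 768 -> 384 -> 42 classification head."""

    def __init__(self, num_classes: int = 42):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
        self.head = nn.Sequential(
            nn.Linear(768, 384),
            nn.ReLU(),
            nn.Dropout(0.1),                # assumed regularization
            nn.Linear(384, num_classes),
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]   # [CLS] token embedding
        return self.head(cls)

class FocalLoss(nn.Module):
    """Focal loss (Lin et al., 2017); gamma = 2.0 is an assumed default."""

    def __init__(self, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits, targets):
        ce = nn.functional.cross_entropy(logits, targets, reduction="none")
        pt = torch.exp(-ce)                 # probability of the true class
        return ((1 - pt) ** self.gamma * ce).mean()
```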

4) GUI Application

  • PyQt5-based Python runtime GUI with side-by-side pre/post annotation tables and file upload/download (a layout skeleton follows this list)
  • Packaged via PyInstaller as a macOS DMG (ships the rule-based pipeline)
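
A minimal skeleton of the side-by-side layout; the real GUI adds upload/download controls and interactive visualization on top of this:

```python
import sys

from PyQt5.QtWidgets import (
    QApplication, QHBoxLayout, QMainWindow, QTableWidget, QWidget,
)

class AnnotationWindow(QMainWindow):
    """Side-by-side pre/post annotation tables (illustrative skeleton)."""

    def __init__(self):
        super().__init__()
        self.setWindowTitle("SBOannotator")
        central = QWidget()
        layout = QHBoxLayout(central)
        self.before = QTableWidget(0, 2)
        self.after = QTableWidget(0, 2)
        self.before.setHorizontalHeaderLabels(["Reaction", "SBO (before)"])
        self.after.setHorizontalHeaderLabels(["Reaction", "SBO (after)"])
        layout.addWidget(self.before)
        layout.addWidget(self.after)
        self.setCentralWidget(central)

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = AnnotationWindow()
    window.show()
    sys.exit(app.exec_())
```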

Results

  • SBO updates: switched to direct GitHub download (~1 min per update)
  • Coverage: across 108 models, 3,317 reactions upgraded from generic SBO:0000176 to specific terms via multi-database integration
  • Efficiency: the initial naive multi-database flow took ~14 h per model; early termination cut end-to-end time to a mean of 432.99 s (~7.2 min) per model
  • LLM classification: Top-1 accuracy 94.00% (42 classes); Macro-F1 0.4563, Weighted-F1 0.9184; mean confidence 78.83% (median 81.41%, max 88.21%); automatic fallback to Stage-1 when Stage-2 degrades
  • Deliverables: Python runtime GUI and packaged macOS DMG app

Constraints & Future Improvements

  • Packaging: DMG includes rule-based workflow; LLM features provided via CLI to avoid heavy runtime dependencies
  • Data quality control: intelligent filtering of GPT-generated training data
  • More expert labels: expand high-quality human annotations for better generalization
  • Fine-tune the LLM to predict SBO terms for reactions without EC numbers

Thank You

Thanks to the SBOannotator community and Google Summer of Code for this opportunity. Special thanks to mentors Nantia Leonidou and Andreas Dräger for guidance and support. I will continue to monitor issues and PRs and look forward to future collaborations.


Quick Links

…the link list in the README, because they are too large to upload to GitHub.