Commit ec1ac7c

Merge pull request #3 from lareinahu-2023/feature/ols/add_dynamic_sbo_fetching
Add dynamic SBO fetching from OLS API
2 parents 6c41db7 + cf72c7c commit ec1ac7c

37 files changed, +32363 -0 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -146,3 +146,4 @@ testSQL.py
/dist/
/static/
/templates/
+.idea/
Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
# OLS Fetch from GitHub Module

A Python module for fetching, processing, and managing SBO (Systems Biology Ontology) files from GitHub repositories, with automated change tracking and user file validation.

## Table of Contents
- [Overview](#overview)
- [Quick Start](#quick-start)
- [Repository Structure](#repository-structure)
- [Directory Structure](#directory-structure)
- [Module Components](#module-components)
- [Testing](#testing)
- [Dependencies](#dependencies)
- [License](#license)

## Overview

The `ols_fetch_from_github` module provides a complete workflow for:
- 🔄 Fetching SBO ontology files from GitHub repositories
- 📊 Comparing file versions and tracking changes
- 📁 Processing user-uploaded files (OBO/JSON formats)
- ✅ Validating file structure and content
- 📝 Logging changes between versions
- 🛠️ Converting between OBO and JSON formats

## Quick Start

**Main Entry Point**: `main_workflow.py`

```bash
# From project root directory
python -m src.ols_fetch_from_github.main_workflow
```

## Repository Structure

```
src/ols_fetch_from_github/
├── README.md                  # This file
├── config.json                # Configuration settings
├── __init__.py                # Package initialization
├── main_workflow.py           # Main workflow orchestrator
├── github_file_updater.py     # GitHub file management
├── user_file_processor.py     # User file processing
├── config.py                  # Configuration management
├── file_downloader.py         # File download utilities
├── file_converter.py          # OBO ↔ JSON conversion
├── file_validator.py          # File validation logic
├── file_comparator.py         # File comparison utilities
├── obo_parser.py              # OBO format parser
├── change_logger.py           # Change tracking and logging
├── utils.py                   # General utilities and helpers

└── SBO_OBO_Files/             # Data directory (created at runtime)
    ├── localfiles/            # Processed SBO files
    ├── customerfile/          # User uploaded files
    └── logs/                  # Change logs

tests/ (separate directory)
├── run_tests.py               # Test runner
├── test_*.py                  # Individual test modules
└── __init__.py                # Test package init
```

## Directory Structure

### `SBO_OBO_Files/` - Main Data Directory
This directory contains all SBO ontology files and related data:

#### `localfiles/` - System Files
- **Purpose**: Stores officially processed SBO files from GitHub
- **Contents**:
  - `SBO_OBO_YYYYMMDD_HHMMSS.obo` - Original OBO files from GitHub
  - `SBO_OBO_YYYYMMDD_HHMMSS.json` - Converted JSON files
  - `SBO_OBO_YYYYMMDD_HHMMSS.obo.update_info` - Update metadata
- **Management**: Old versions are cleaned up automatically; only the 2 most recent files are kept
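
The retention rule for `localfiles/` can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function name `prune_old_versions`, its `keep` parameter, and the idea that all files sharing one `YYYYMMDD_HHMMSS` stamp form a single version are assumptions made for this example.

```python
import re
from pathlib import Path

# Matches names such as SBO_OBO_20230516_110122.obo (pattern taken from the list above).
STAMP_RE = re.compile(r"^SBO_OBO_(\d{8}_\d{6})\.")


def prune_old_versions(directory, keep=2):
    """Delete all but the `keep` newest timestamped versions; return removed names.

    Hypothetical helper: the real module may group or select files differently.
    """
    by_stamp = {}
    for path in Path(directory).iterdir():
        match = STAMP_RE.match(path.name)
        if path.is_file() and match:
            by_stamp.setdefault(match.group(1), []).append(path)

    # YYYYMMDD_HHMMSS stamps sort chronologically as plain strings.
    stamps = sorted(by_stamp)
    stale = stamps[:-keep] if keep > 0 else stamps

    removed = []
    for stamp in stale:
        for path in by_stamp[stamp]:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Files that do not match the `SBO_OBO_<stamp>.` pattern are left untouched, so unrelated files in the same directory are never deleted by this sketch.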

#### `customerfile/` - User Uploads
- **Purpose**: Temporary storage for user-uploaded files
- **Contents**:
  - User-uploaded `.obo` or `.json` files
  - `*_user_upload.json` - Processed user files
  - `*_user_upload_converted.obo` - Validation files
- **Management**: Cleaned up at the beginning of each session

#### `logs/` - Change Tracking
- **Purpose**: Maintains detailed logs of changes between versions
- **Contents**:
  - `sbo_changes_YYYYMMDD_HHMMSS.json` - Change logs with timestamps
- **Structure**:
  ```json
  {
    "timestamp": "2023-05-16 11:01:22",
    "has_changes": true,
    "stats": {
      "terms_added": 5,
      "terms_updated": 12,
      "terms_deleted": 0
    },
    "term_changes": {
      "added": [...],
      "updated": [...],
      "deleted": [...]
    }
  }
  ```
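
A log with this shape could be produced by diffing two parsed term sets. The sketch below assumes terms are held as dictionaries keyed by term ID and that the `added`, `updated`, and `deleted` lists hold term IDs; `build_change_log` is a hypothetical name, and the real `ChangeLogger` may record more detail per term.

```python
from datetime import datetime


def build_change_log(old_terms, new_terms, now=None):
    """Compare two {term_id: term_dict} mappings and return a change-log dict.

    Hypothetical helper mirroring the JSON structure documented above.
    """
    old_ids, new_ids = set(old_terms), set(new_terms)
    added = sorted(new_ids - old_ids)
    deleted = sorted(old_ids - new_ids)
    # A term counts as updated when it exists in both versions but differs.
    updated = sorted(
        term_id
        for term_id in old_ids & new_ids
        if old_terms[term_id] != new_terms[term_id]
    )
    now = now or datetime.now()
    return {
        "timestamp": now.strftime("%Y-%m-%d %H:%M:%S"),
        "has_changes": bool(added or updated or deleted),
        "stats": {
            "terms_added": len(added),
            "terms_updated": len(updated),
            "terms_deleted": len(deleted),
        },
        "term_changes": {"added": added, "updated": updated, "deleted": deleted},
    }
```

Passing `now` explicitly keeps the function deterministic, which makes it straightforward to unit test.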

### `tests/` - Test Suite
Complete test coverage for all module components:

#### Test Files
- `test_main_workflow.py` - Main workflow testing
- `test_github_file_updater.py` - GitHub operations testing
- `test_user_file_processor.py` - User file processing testing
- `test_file_converter.py` - Format conversion testing
- `test_file_validator.py` - File validation testing
- `test_obo_parser.py` - OBO parsing testing
- `test_change_logger.py` - Change logging testing
- `test_config.py` - Configuration testing
- `test_utils.py` - Utility functions testing

## Module Components

### Core Classes
- **`SBOWorkflowManager`** (main_workflow.py) - Main workflow orchestrator that coordinates the entire SBO file processing pipeline
- **`GitHubFileUpdater`** (github_file_updater.py) - Manages GitHub file operations including downloading, updating, and version comparison
- **`UserFileProcessor`** (user_file_processor.py) - Processes user uploaded files with validation and format conversion
- **`Config`** (config.py) - Configuration management system that loads and provides access to system settings

### File Processing Classes
- **`FileDownloader`** (file_downloader.py) - Handles downloading files from GitHub API with error handling and retry logic
- **`FileConverter`** (file_converter.py) - Converts between OBO and JSON formats while preserving data structure
- **`FileValidator`** (file_validator.py) - Validates file structure and content for both OBO and JSON formats
- **`FileComparator`** (file_comparator.py) - Compares different versions of files to detect changes
- **`OBOFileParser`** (obo_parser.py) - Parses OBO format files into structured data representations
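
To make the parsing and conversion steps concrete, here is a minimal sketch of reading `[Term]` stanzas from OBO text and serializing them as JSON. It is not the code in `obo_parser.py` or `file_converter.py`: the function names `parse_obo_terms` and `obo_to_json` and the output layout (each tag mapped to a list of values) are assumptions, and only a simplified subset of the OBO format is handled.

```python
import json


def parse_obo_terms(text):
    """Parse [Term] stanzas of OBO text into {term_id: {tag: [values]}}.

    Simplified sketch: header lines and non-Term stanzas are skipped, and
    trailing " ! comment" parts are dropped without handling quoted "!".
    """
    terms = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            continue
        if current is None or not line or line.startswith("!"):
            continue
        tag, sep, value = line.partition(":")
        if not sep:
            continue
        tag = tag.strip()
        value = value.split(" ! ")[0].strip()
        # Tags such as is_a may repeat, so every tag maps to a list.
        current.setdefault(tag, []).append(value)
        if tag == "id":
            terms[value] = current
    return terms


def obo_to_json(text):
    """Return the parsed terms as a JSON string."""
    return json.dumps(parse_obo_terms(text), indent=2)
```

Storing every tag as a list keeps repeated tags without special cases, at the cost of single-valued tags such as `name` also being wrapped in a list.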

### Utility Classes
- **`ChangeLogger`** (change_logger.py) - Tracks and logs changes between file versions with detailed analysis
- **`FileUtils`** (utils.py) - Provides static utility functions for file operations and directory management
- **`DirectoryManager`** (utils.py) - Manages directory structure and ensures proper file organization
- **`ValidationResult`** (utils.py) - Data structure for storing validation results and error information
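
As an illustration of how a validation result container might be used, the sketch below defines a small stand-in class and a validator that fills it. The fields, the `is_valid` property, the `validate_terms` function, and the rule that each term needs an `id` and a `name` are all hypothetical; the real `ValidationResult` in `utils.py` may look different.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    """Hypothetical stand-in: collects error messages from a validation run."""

    errors: List[str] = field(default_factory=list)

    @property
    def is_valid(self):
        # A result is valid when no errors were recorded.
        return not self.errors


def validate_terms(terms):
    """Check that every term record has non-empty 'id' and 'name' fields."""
    result = ValidationResult()
    for index, term in enumerate(terms):
        for key in ("id", "name"):
            if not term.get(key):
                result.errors.append(f"term {index}: missing '{key}'")
    return result
```

Returning a result object instead of raising on the first problem lets a caller report every issue in an uploaded file at once.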

## Testing

### Test Coverage
- **173 total tests** covering all modules
- **Unit tests** for individual components
- **Mock testing** for external dependencies
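
A mock-based test for an external dependency might look like the sketch below, which replaces `urllib.request.urlopen` so that no network access is needed. The helper `fetch_text` and the test class are hypothetical stand-ins, not code from this repository's test suite, and the URL is a placeholder.

```python
import unittest
import urllib.request
from unittest import mock


def fetch_text(url):
    """Stand-in for a download helper: return the response body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


class FetchTextTest(unittest.TestCase):
    def test_returns_decoded_body_without_network(self):
        fake_response = mock.MagicMock()
        fake_response.read.return_value = b"format-version: 1.2\n"
        # urlopen is used as a context manager, so __enter__ must yield the fake.
        fake_response.__enter__.return_value = fake_response

        with mock.patch("urllib.request.urlopen", return_value=fake_response) as urlopen:
            body = fetch_text("https://example.invalid/sbo.obo")

        urlopen.assert_called_once_with("https://example.invalid/sbo.obo")
        self.assertEqual(body, "format-version: 1.2\n")
```

Because the patch is applied only inside the `with` block, the real `urlopen` is restored as soon as the test finishes.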

### Test Categories
- **Configuration**: Config loading and validation
- **File Operations**: Download, conversion, validation
- **Workflow**: End-to-end workflow testing
- **Error Handling**: Exception and error cases
- **User Interaction**: Input/output testing

### Running Tests
```bash
# Run all tests
python tests/run_tests.py

# Run specific test module
python tests/run_tests.py test_config

# Run with verbose output
python tests/run_tests.py -v

# Run specific test class
python -m pytest tests/test_main_workflow.py::TestSBOWorkflowManager -v
```

## Dependencies

### Core Dependencies
- **Python 3.8+** - Core language version
- **Standard Library**: `json`, `os`, `glob`, `shutil`, `datetime`, `urllib`
- **Git** - For advanced file comparison operations
- **requests** - For HTTP operations (with fallback to `urllib`)
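
One common way to combine an optional `requests` dependency with a `urllib` fallback and simple retries is sketched below. It assumes the fallback applies when `requests` is not installed, which the list above does not spell out; `http_get_text`, `fetch_with_retry`, and their parameters are hypothetical names for this example rather than the module's API.

```python
import time
import urllib.request

try:
    import requests  # optional dependency
except ImportError:
    requests = None


def http_get_text(url, timeout=10):
    """Fetch a URL as text with requests when installed, else with urllib."""
    if requests is not None:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        return response.text
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_with_retry(url, attempts=3, delay=1.0, fetch=http_get_text):
    """Call `fetch(url)` up to `attempts` times, sleeping `delay` seconds between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception:
            # Broad catch keeps the sketch short; re-raise once retries are exhausted.
            if attempt == attempts:
                raise
            time.sleep(delay)
```

The `fetch` parameter exists so that tests can inject a fake download function instead of touching the network.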

### Development Dependencies
- **pytest** - Testing framework
- **pytest-cov** - Coverage reporting
- **unittest** - Built-in testing (alternative)

## License

This module is part of the SBOannotator project and follows the same licensing terms.
