| 1 | +# OLS Fetch from GitHub Module |
| 2 | + |
| 3 | +A comprehensive Python module for fetching, processing, and managing SBO (Systems Biology Ontology) files from GitHub repositories with automated change tracking and user file validation. |
| 4 | + |
| 5 | +## Table of Contents |
| 6 | +- [Overview](#overview) |
| 7 | +- [Quick Start](#quick-start) |
| 8 | +- [Repository Structure](#repository-structure) |
| 9 | +- [Directory Structure](#directory-structure) |
| 10 | +- [Testing](#testing) |
| 11 | +- [Dependencies](#dependencies) |
| 12 | + |
| 13 | +## Overview |
| 14 | + |
| 15 | +The `ols_fetch_from_github` module provides a complete workflow for: |
| 16 | +- 🔄 Fetching SBO ontology files from GitHub repositories |
| 17 | +- 📊 Comparing file versions and tracking changes |
| 18 | +- 📁 Processing user-uploaded files (OBO/JSON formats) |
| 19 | +- ✅ Validating file structure and content |
| 20 | +- 📝 Logging changes between versions |
| 21 | +- 🛠️ Converting between OBO and JSON formats |
| 22 | + |
| 23 | +## Quick Start |
| 24 | + |
| 25 | +**Main Entry Point**: `main_workflow.py` |
| 26 | + |
| 27 | +```bash |
| 28 | +# From project root directory |
| 29 | +python -m src.ols_fetch_from_github.main_workflow |
| 30 | + |
| 31 | +``` |
| 32 | + |
| 33 | +## Repository Structure |
| 34 | + |
| 35 | +``` |
| 36 | +src/ols_fetch_from_github/ |
| 37 | +├── README.md # This file |
| 38 | +├── config.json # Configuration settings |
| 39 | +├── __init__.py # Package initialization |
| 40 | +├── main_workflow.py # Main workflow orchestrator |
| 41 | +├── github_file_updater.py # GitHub file management |
| 42 | +├── user_file_processor.py # User file processing |
| 43 | +├── config.py # Configuration management |
| 44 | +├── file_downloader.py # File download utilities |
| 45 | +├── file_converter.py # OBO ↔ JSON conversion |
| 46 | +├── file_validator.py # File validation logic |
| 47 | +├── file_comparator.py # File comparison utilities |
| 48 | +├── obo_parser.py # OBO format parser |
| 49 | +├── change_logger.py # Change tracking and logging |
| 50 | +├── utils.py # General utilities and helpers |
| 51 | +│ |
| 52 | +└── SBO_OBO_Files/ # Data directory (created at runtime) |
| 53 | + ├── localfiles/ # Processed SBO files |
| 54 | + ├── customerfile/ # User uploaded files |
| 55 | + └── logs/ # Change logs |
| 56 | +
|
| 57 | +tests/ (separate directory) |
| 58 | +├── run_tests.py # Test runner |
| 59 | +├── test_*.py # Individual test modules |
| 60 | +└── __init__.py # Test package init |
| 61 | +``` |
| 62 | + |
| 63 | +## Directory Structure |
| 64 | + |
| 65 | +### `SBO_OBO_Files/` - Main Data Directory |
| 66 | +This directory contains all SBO ontology files and related data: |
| 67 | + |
| 68 | +#### `localfiles/` - System Files |
| 69 | +- **Purpose**: Stores officially processed SBO files from GitHub |
| 70 | +- **Contents**: |
| 71 | + - `SBO_OBO_YYYYMMDD_HHMMSS.obo` - Original OBO files from GitHub |
| 72 | + - `SBO_OBO_YYYYMMDD_HHMMSS.json` - Converted JSON files |
| 73 | + - `SBO_OBO_YYYYMMDD_HHMMSS.obo.update_info` - Update metadata |
| 74 | +- **Management**: Old versions are cleaned up automatically; only the 2 most recent files are kept |
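The retention rule above can be sketched as follows. This is a minimal illustration, not the module's actual code: `timestamped_name` and `prune_old_versions` are hypothetical helper names, and the sketch assumes the `SBO_OBO_YYYYMMDD_HHMMSS` naming shown above and that the limit of 2 applies per file extension.

```python
from datetime import datetime
from pathlib import Path
from typing import List

KEEP_LATEST = 2  # retention count stated in this README


def timestamped_name(now: datetime, suffix: str = ".obo") -> str:
    """Build a name such as SBO_OBO_20230516_110122.obo (hypothetical helper)."""
    return f"SBO_OBO_{now:%Y%m%d_%H%M%S}{suffix}"


def prune_old_versions(directory: Path, suffix: str = ".obo",
                       keep: int = KEEP_LATEST) -> List[Path]:
    """Delete all but the `keep` newest files with `suffix`; return the removed paths."""
    # The fixed-width timestamp makes lexicographic order match chronological order,
    # so a reverse sort by name puts the newest files first.
    candidates = sorted(directory.glob(f"SBO_OBO_*{suffix}"), reverse=True)
    removed = []
    for stale in candidates[keep:]:
        stale.unlink()
        removed.append(stale)
    return removed
```

Sorting by file name rather than by modification time keeps the result stable even if files are copied or touched after download.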
| 75 | + |
| 76 | +#### `customerfile/` - User Uploads |
| 77 | +- **Purpose**: Temporary storage for user-uploaded files |
| 78 | +- **Contents**: |
| 79 | + - User uploaded `.obo` or `.json` files |
| 80 | + - `*_user_upload.json` - Processed user files |
| 81 | + - `*_user_upload_converted.obo` - Validation files |
| 82 | +- **Management**: Cleaned up at the beginning of each session |
| 83 | + |
| 84 | +#### `logs/` - Change Tracking |
| 85 | +- **Purpose**: Maintains detailed logs of changes between versions |
| 86 | +- **Contents**: |
| 87 | + - `sbo_changes_YYYYMMDD_HHMMSS.json` - Change logs with timestamps |
| 88 | +- **Structure**: |
| 89 | + ```json |
| 90 | + { |
| 91 | + "timestamp": "2023-05-16 11:01:22", |
| 92 | + "has_changes": true, |
| 93 | + "stats": { |
| 94 | + "terms_added": 5, |
| 95 | + "terms_updated": 12, |
| 96 | + "terms_deleted": 0 |
| 97 | + }, |
| 98 | + "term_changes": { |
| 99 | + "added": [...], |
| 100 | + "updated": [...], |
| 101 | + "deleted": [...] |
| 102 | + } |
| 103 | + } |
| 104 | + ``` |
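A log with the structure above could be produced along these lines. This is a sketch under assumptions, not the `ChangeLogger` implementation: `build_change_log` is a hypothetical name, terms are assumed to be dictionaries keyed by term ID, and the `added`/`updated`/`deleted` lists are assumed to hold term IDs, since the example above elides their contents.

```python
from datetime import datetime
from typing import Dict


def build_change_log(old_terms: Dict[str, dict], new_terms: Dict[str, dict]) -> dict:
    """Compare two {term_id: term_data} mappings and summarise the differences."""
    added = sorted(new_terms.keys() - old_terms.keys())
    deleted = sorted(old_terms.keys() - new_terms.keys())
    # A term counts as updated when it exists in both versions with different data.
    updated = sorted(
        term_id
        for term_id in old_terms.keys() & new_terms.keys()
        if old_terms[term_id] != new_terms[term_id]
    )
    return {
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "has_changes": bool(added or updated or deleted),
        "stats": {
            "terms_added": len(added),
            "terms_updated": len(updated),
            "terms_deleted": len(deleted),
        },
        "term_changes": {"added": added, "updated": updated, "deleted": deleted},
    }
```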
| 105 | + |
| 106 | +### `tests/` - Test Suite |
| 107 | +The test suite covers the module's components: |
| 108 | + |
| 109 | +#### Test Files |
| 110 | +- `test_main_workflow.py` - Main workflow testing |
| 111 | +- `test_github_file_updater.py` - GitHub operations testing |
| 112 | +- `test_user_file_processor.py` - User file processing testing |
| 113 | +- `test_file_converter.py` - Format conversion testing |
| 114 | +- `test_file_validator.py` - File validation testing |
| 115 | +- `test_obo_parser.py` - OBO parsing testing |
| 116 | +- `test_change_logger.py` - Change logging testing |
| 117 | +- `test_config.py` - Configuration testing |
| 118 | +- `test_utils.py` - Utility functions testing |
| 119 | + |
| 120 | +## Module Components |
| 121 | + |
| 122 | +### Core Classes |
| 123 | +- **`SBOWorkflowManager`** (main_workflow.py) - Main workflow orchestrator that coordinates the entire SBO file processing pipeline |
| 124 | +- **`GitHubFileUpdater`** (github_file_updater.py) - Manages GitHub file operations including downloading, updating, and version comparison |
| 125 | +- **`UserFileProcessor`** (user_file_processor.py) - Processes user uploaded files with validation and format conversion |
| 126 | +- **`Config`** (config.py) - Configuration management system that loads and provides access to system settings |
| 127 | + |
| 128 | +### File Processing Classes |
| 129 | +- **`FileDownloader`** (file_downloader.py) - Handles downloading files from GitHub API with error handling and retry logic |
| 130 | +- **`FileConverter`** (file_converter.py) - Converts between OBO and JSON formats while preserving data structure |
| 131 | +- **`FileValidator`** (file_validator.py) - Validates file structure and content for both OBO and JSON formats |
| 132 | +- **`FileComparator`** (file_comparator.py) - Compares different versions of files to detect changes |
| 133 | +- **`OBOFileParser`** (obo_parser.py) - Parses OBO format files into structured data representations |
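To make the parsing and conversion steps concrete, here is a minimal sketch of reading `[Term]` stanzas from OBO text and serialising them as JSON. It is not the `OBOFileParser` or `FileConverter` code: `parse_obo_terms` and `obo_to_json` are hypothetical names, the output layout is an assumption, and the sketch ignores trailing `!` comments, escape sequences, and non-`[Term]` stanzas.

```python
import json
from typing import Dict, List, Optional

Stanza = Dict[str, List[str]]


def _store(terms: Dict[str, Stanza], stanza: Optional[Stanza]) -> None:
    """Keep a finished stanza if it carries an id tag."""
    if stanza and "id" in stanza:
        terms[stanza["id"][0]] = stanza


def parse_obo_terms(text: str) -> Dict[str, Stanza]:
    """Collect the tag-value pairs of each [Term] stanza, keyed by term id."""
    terms: Dict[str, Stanza] = {}
    current: Optional[Stanza] = None  # None while outside a [Term] stanza
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Any stanza header closes the stanza that was being read.
            _store(terms, current)
            current = {} if line == "[Term]" else None
        elif current is not None and line and not line.startswith("!") and ":" in line:
            tag, _, value = line.partition(":")
            # Tags such as is_a may repeat, so every tag maps to a list of values.
            current.setdefault(tag.strip(), []).append(value.strip())
    _store(terms, current)
    return terms


def obo_to_json(text: str) -> str:
    """Serialise the parsed terms as indented JSON."""
    return json.dumps(parse_obo_terms(text), indent=2)
```

Storing every tag as a list keeps repeated tags intact, which makes a round trip back to OBO easier than collapsing single values into strings.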
| 134 | + |
| 135 | +### Utility Classes |
| 136 | +- **`ChangeLogger`** (change_logger.py) - Tracks and logs changes between file versions with detailed analysis |
| 137 | +- **`FileUtils`** (utils.py) - Provides static utility functions for file operations and directory management |
| 138 | +- **`DirectoryManager`** (utils.py) - Manages directory structure and ensures proper file organization |
| 139 | +- **`ValidationResult`** (utils.py) - Data structure for storing validation results and error information |
| 140 | + |
| 141 | + |
| 142 | + |
| 143 | + |
| 144 | + |
| 145 | +## Testing |
| 146 | + |
| 147 | +### Test Coverage |
| 148 | +- **173 total tests** covering all modules |
| 149 | +- **Unit tests** for individual components |
| 150 | +- **Mock testing** for external dependencies |
| 151 | + |
| 152 | +### Test Categories |
| 153 | +- **Configuration**: Config loading and validation |
| 154 | +- **File Operations**: Download, conversion, validation |
| 155 | +- **Workflow**: End-to-end workflow testing |
| 156 | +- **Error Handling**: Exception and error cases |
| 157 | +- **User Interaction**: Input/output testing |
| 158 | +### Running Tests |
| 159 | +```bash |
| 160 | +# Run all tests |
| 161 | +python tests/run_tests.py |
| 162 | + |
| 163 | +# Run specific test module |
| 164 | +python tests/run_tests.py test_config |
| 165 | + |
| 166 | +# Run with verbose output |
| 167 | +python tests/run_tests.py -v |
| 168 | + |
| 169 | +# Run specific test class |
| 170 | +python -m pytest tests/test_main_workflow.py::TestSBOWorkflowManager -v |
| 171 | +``` |
| 172 | + |
| 173 | +## Dependencies |
| 174 | + |
| 175 | +### Core Dependencies |
| 176 | +- **Python 3.8+** - Core language version |
| 177 | +- **Standard Library**: `json`, `os`, `glob`, `shutil`, `datetime`, `urllib` |
| 178 | +- **Git** - For advanced file comparison operations |
| 179 | +- **requests** - For HTTP operations (falls back to `urllib`) |
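One common way to arrange such a fallback is shown below, reading the note above as "use `requests` when it is installed, otherwise use `urllib`". The function names are hypothetical and this is not the `FileDownloader` code; retry logic is left out.

```python
import urllib.request

try:
    import requests  # optional third-party dependency
except ImportError:
    requests = None


def fetch_with_urllib(url: str, timeout: float = 30.0) -> bytes:
    """Download `url` using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download `url` with requests when installed, otherwise with urllib."""
    if requests is None:
        return fetch_with_urllib(url, timeout)
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn HTTP error statuses into exceptions
    return response.content
```

Resolving the optional import once at module load keeps the choice out of the download path, so callers only ever see `fetch_bytes`.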
| 180 | + |
| 181 | +### Development Dependencies |
| 182 | +- **pytest** - Testing framework |
| 183 | +- **pytest-cov** - Coverage reporting |
| 184 | +- **unittest** - Built-in testing (alternative) |
| 185 | + |
| 186 | + |
| 187 | +## License |
| 188 | + |
| 189 | +This module is part of the SBOannotator project and follows the same licensing terms. |