hongsw
Collaborator

@hongsw hongsw commented Nov 23, 2024

  • feat: refactor SQL Trial DB from Pandas Trial DB, and Test code
  • 🚑 fix: Set correct WORK_DIR based on environment variable

Seungwoo hong added 2 commits November 23, 2024 22:50
- Updated the logic in app.py to properly set the `WORK_DIR` based on the environment variable `AUTORAG_API_ENV`. If the environment is 'dev', the `WORK_DIR` will be located at `"../projects"`, otherwise, it will be set to `"projects"`. Additionally, the `.env` file path is now correctly constructed using the determined `WORK_DIR` value.
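The `WORK_DIR` selection described above could look roughly like the sketch below. This is not the actual code in app.py: the helper name `resolve_work_dir` is hypothetical, and placing the `.env` file directly inside `WORK_DIR` is an assumption, since the commit note only says the path is built from `WORK_DIR`.

```python
import os
from pathlib import Path


def resolve_work_dir(env=None):
    """Return (work_dir, env_file) for a given AUTORAG_API_ENV value.

    Hypothetical helper; the real logic lives inline in app.py.
    """
    if env is None:
        env = os.getenv("AUTORAG_API_ENV", "")
    # In 'dev' the server is started from inside api/, so projects sit one level up.
    work_dir = Path("../projects") if env == "dev" else Path("projects")
    # Assumption: the .env file lives directly inside WORK_DIR.
    return work_dir, work_dir / ".env"
```

Calling `resolve_work_dir("dev")` yields the `../projects` location, and any other value falls back to `projects`.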
Collaborator Author

hongsw commented Nov 23, 2024

(.venv) ➜  api git:(Feature/api/984) ✗ python -m pytest tests/test_trial_config.py  # run the tests
=============================================================================================== test session starts ===============================================================================================
platform darwin -- Python 3.11.10, pytest-8.3.3, pluggy-1.5.0
rootdir: /Users/martin/Development/org_autorag/AutoRAG/api
plugins: anyio-4.6.2.post1
collected 7 items                                                                                                                                                                                                 

tests/test_trial_config.py .......                                                                                                                                                                          [100%]

================================================================================================ warnings summary =================================================================================================
../.venv/lib/python3.11/site-packages/pydantic/_internal/_config.py:291
../.venv/lib/python3.11/site-packages/pydantic/_internal/_config.py:291
  /Users/martin/Development/org_autorag/AutoRAG/.venv/lib/python3.11/site-packages/pydantic/_internal/_config.py:291: PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.9/migration/
    warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================================== 7 passed, 2 warnings in 0.63s ==========================================================================================

@hongsw hongsw changed the base branch from main to Feature/#959 November 23, 2024 14:08
…'] assignment and update set_trial_config for trial_id with TrialConfig model dump JSON. Add get_all_config_ids and get_all_trial_ids SQL query functions.
@hongsw hongsw self-assigned this Nov 23, 2024
@hongsw hongsw linked an issue Nov 23, 2024 that may be closed by this pull request
Collaborator Author

hongsw commented Nov 23, 2024


AUTORAG_API_ENV: dev
--------------------------------
### Server start
--------------------------------
collected 6 items                                                                                                                                                                                  

tests/test_app.py::test_create_project PASSED
tests/test_app.py::test_create_trial FAILED
tests/test_app.py::test_create_trial_invalid_project PASSED
tests/test_app.py::test_get_trial PASSED
tests/test_app.py::test_delete_trial PASSED
tests/test_app.py::test_environment_variables PASSED

Seungwoo hong added 7 commits November 24, 2024 00:38
This commit introduces the addition of CORS headers in every response and explicit handling of OPTIONS requests in the API server. Includes setting Access-Control-Allow-Origin, Access-Control-Allow-Credentials, Access-Control-Allow-Headers, and Access-Control-Allow-Methods based on the request origin.
…tures, including logging configurations, environment setup, client creation, and project directory validation
…of working directory and model configuration. Fix deprecated usage in test_app.py and enhance testing in test_trial_config.py.
This commit introduces changes to the document parsing task initiation. The import statement for `parse_documents` has been updated within the file. Additionally, the logic for initiating the parsing process has been streamlined and improved for better performance and handling of imports.
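The CORS commit in the list above sets four headers based on the request origin and answers OPTIONS requests explicitly. A framework-independent sketch of that idea follows. The function names, the allowed method and header lists, and the 204 status for preflight are assumptions for illustration, not the PR's actual values.

```python
# Assumed values; the PR does not list the exact methods or headers it allows.
ALLOWED_METHODS = "GET, POST, PUT, DELETE, OPTIONS"
ALLOWED_HEADERS = "Content-Type, Authorization"


def cors_headers(origin):
    """Build CORS response headers that echo the request origin."""
    if not origin:
        return {}
    return {
        # With credentials enabled the origin must be explicit, not "*".
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
        "Access-Control-Allow-Headers": ALLOWED_HEADERS,
        "Access-Control-Allow-Methods": ALLOWED_METHODS,
    }


def handle_request(method, origin, handler):
    """Return (status, headers, body); OPTIONS preflights skip the handler."""
    headers = cors_headers(origin)
    if method == "OPTIONS":
        return 204, headers, ""
    status, body = handler()
    return status, headers, body
```

In a real server the same two steps would typically live in an after-request hook and an OPTIONS route of whichever web framework is in use.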
Collaborator Author

hongsw commented Nov 23, 2024

Test http://localhost:5555/
This is the Flower UI for monitoring Celery tasks.

…g DB, setting/getting trials, updating trial configurations, and retrieving trial information by project or ID.
Collaborator Author

hongsw commented Nov 24, 2024

python -m pytest tests/test_project_db.py -s -v
====================================================================== test session starts ======================================================================
platform darwin -- Python 3.11.10, pytest-8.3.3, pluggy-1.5.0 -- /Users/martin/Development/org_autorag/AutoRAG/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/martin/Development/org_autorag/AutoRAG/api
configfile: pytest.ini
plugins: asyncio-0.24.0, anyio-4.6.2.post1
asyncio: mode=Mode.AUTO, default_loop_scope=function
collected 8 items                                                                                                                                               

tests/test_project_db.py::test_db_initialization 
[Test] DB initialization
- DB file creation confirmed: /var/folders/v0/_g3z68kd417dkyyj406b1tpc0000gn/T/tmpj604z2tw/test_project/project.db
PASSED
tests/test_project_db.py::test_set_and_get_trial 
[Test] Save and retrieve trial
- Trial saved: ID=test_trial_1
- Retrieve trial and verify data

[Config verification]
Original config: {'trial_id': 'test_trial_1', 'project_id': 'test_project', 'raw_path': '/path/to/raw', 'corpus_path': '/path/to/corpus', 'qa_path': '/path/to/qa', 'config_path': '/path/to/config', 'metadata': {}}
Retrieved config: {'trial_id': 'test_trial_1', 'project_id': 'test_project', 'raw_path': '/path/to/raw', 'corpus_path': '/path/to/corpus', 'qa_path': '/path/to/qa', 'config_path': '/path/to/config', 'metadata': {}}
- Verification complete: all fields match
PASSED
tests/test_project_db.py::test_get_nonexistent_trial 
[Test] Retrieve nonexistent trial
- Lookup with nonexistent ID: nonexistent_id
- Verification complete: None returned
PASSED
tests/test_project_db.py::test_set_trial_config 
[Test] Update trial config
- Existing trial saved: ID=test_trial_1
- Update with new config
- Verification complete: config update confirmed
PASSED
tests/test_project_db.py::test_get_trials_by_project 
[Test] List trials by project
- Save first trial
- Create and save second trial
- Pagination test (limit=1)
- Retrieve-all-trials test
- Verification complete: 2 trials found in total
PASSED
tests/test_project_db.py::test_get_all_config_ids PASSED
tests/test_project_db.py::test_delete_trial PASSED
tests/test_project_db.py::test_get_all_trial_ids PASSED

======================================================================= 8 passed in 0.22s =
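The behaviour exercised by the test run above (save and fetch a trial, `None` for an unknown ID, listing by project with a limit, deletion) can be sketched with the standard `sqlite3` module. The class below is a hypothetical stand-in, not the PR's `SQLiteProjectDB`: the table schema, the method signatures, and storing the config as a JSON string are assumptions.

```python
import json
import sqlite3


class TrialStore:
    """Hypothetical stand-in for SQLiteProjectDB; the schema is assumed."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS trials ("
            "id TEXT PRIMARY KEY, project_id TEXT NOT NULL, config TEXT)"
        )

    def set_trial(self, trial_id, project_id, config):
        # Store the config as a JSON string, replacing any existing row.
        self.conn.execute(
            "INSERT OR REPLACE INTO trials (id, project_id, config) VALUES (?, ?, ?)",
            (trial_id, project_id, json.dumps(config)),
        )
        self.conn.commit()

    def get_trial(self, trial_id):
        row = self.conn.execute(
            "SELECT id, project_id, config FROM trials WHERE id = ?", (trial_id,)
        ).fetchone()
        if row is None:
            return None  # unknown IDs yield None rather than raising
        return {"id": row[0], "project_id": row[1], "config": json.loads(row[2])}

    def get_trials_by_project(self, project_id, limit=10, offset=0):
        # LIMIT/OFFSET gives simple pagination over a project's trials.
        rows = self.conn.execute(
            "SELECT id FROM trials WHERE project_id = ? ORDER BY id LIMIT ? OFFSET ?",
            (project_id, limit, offset),
        ).fetchall()
        return [r[0] for r in rows]

    def delete_trial(self, trial_id):
        self.conn.execute("DELETE FROM trials WHERE id = ?", (trial_id,))
        self.conn.commit()
```

Using parameterized queries throughout keeps trial and project IDs from being interpolated into SQL text.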

Seungwoo hong added 9 commits November 25, 2024 02:58
- Refactored the `_init_db` method to enhance database initialization.
- Added logging and enhanced debugging statements for better clarity.
- Now checks for the existence of the database file and its directory before initializing.
- If the database file does not exist, it creates the necessary directory and tables.
- Adjusted permissions for directories (777) and the database file (666) accordingly.
- Updated import statement in app.py to include chunk_documents from trial_tasks module.
- Changed the logging level from INFO to DEBUG for more detailed logging information.
- Refactored the parsing endpoint to handle configuration data retrieval more efficiently.
- Improved error handling to provide more informative error messages in case of missing data or failed tasks.
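The `_init_db` changes listed in the bullets above (check for the database file, create the directory and tables when it is missing, then set permissions 777 and 666) might be sketched as follows. The function name, the return value, and the single-table schema are hypothetical, and the logging mentioned in the commit is omitted for brevity.

```python
import os
import sqlite3
from pathlib import Path


def init_db(db_path):
    """Create directory, file, and tables on first use; return True if created."""
    db_file = Path(db_path)
    if db_file.exists():
        return False  # already initialized, nothing to do
    db_file.parent.mkdir(parents=True, exist_ok=True)
    os.chmod(db_file.parent, 0o777)  # directory mode stated in the commit note
    conn = sqlite3.connect(db_file)
    try:
        # Assumed schema; the real table layout is not shown in this PR page.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS trials (id TEXT PRIMARY KEY, config TEXT)"
        )
        conn.commit()
    finally:
        conn.close()
    os.chmod(db_file, 0o666)  # file mode stated in the commit note
    return True
```

Modes this open are presumably chosen so that different container users can share the mounted project folder; a stricter mode may suit other deployments.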
…import in app.py

- Avoid using uvloop by setting asyncio event loop policy to DefaultEventLoopPolicy().
- Apply nest_asyncio after that to prevent conflicts.
- Change the import in app.py from `from database.project_db import SQLiteProjectDB` to the correct import.

refactor: Update Celery configuration in celery_app.py

- Adjust broker and backend URLs to use 'redis://redis:6379/0'.
- Modify the timezone to 'Asia/Seoul' for better synchronization.
… service

- Removed unnecessary comments related to installing pip as it's clear from the command itself
- Added installation of 'watchfiles', setting PYTHONPATH and PYTHONUNBUFFERED environment variables
- Created a directory for celery beat schedule and added an entrypoint script
- Adjusted permissions for the entrypoint script and removed Windows line endings
- Updated entrypoint to /entrypoint.sh in the API service section
- Added environment variables for watching files, setting time zone, log level, and disabling Python output buffering
Collaborator Author

hongsw commented Nov 24, 2024

  • Celery integration completed.
  • SQLiteProjectDB migration completed.
  • The Start Chunking task in trial_tasks.py is still to be determined (TBD); assistance is needed from @vkehfdl1.
  • Frontend migration completed, including changes to the waitForTask function.
  • The filepath configuration schema needs to be updated from (raw_data, parse, chunk, qa) to (files, knowledges, qa_docs).
Figure 1: Files are initially managed in the files folder. In the knowledges folder, selected files are processed, parsed, and chunked for further use. QA documents are generated and stored in the qa_docs folder. Lastly, in the trials folder, specific knowledge and QA documents are selected to perform evaluations and generate reports.
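To make the proposed layout concrete, the four folders named in the figure caption could be created per project as in the sketch below. The helper name and the idea of one set of folders per project ID are assumptions; the PR only names the folders.

```python
from pathlib import Path

# Folder names taken from the caption; nesting them per project is an assumption.
PROJECT_SUBDIRS = ("files", "knowledges", "qa_docs", "trials")


def create_project_layout(work_dir, project_id):
    """Create the per-project folders and return the project directory."""
    project_dir = Path(work_dir) / project_id
    for name in PROJECT_SUBDIRS:
        (project_dir / name).mkdir(parents=True, exist_ok=True)
    return project_dir
```

Because `exist_ok=True` is set, calling the helper again on an existing project leaves its contents untouched.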

@hongsw hongsw mentioned this pull request Nov 25, 2024
5 tasks
Collaborator Author

hongsw commented Nov 25, 2024

  • "Move config folder data to SQLite trial schema"

Collaborator Author

hongsw commented Nov 25, 2024

image

Test http://localhost:5555/
This is the Flower UI for monitoring Celery tasks.

@vkehfdl1 vkehfdl1 marked this pull request as ready for review November 25, 2024 05:34
@vkehfdl1 vkehfdl1 self-requested a review November 25, 2024 05:35
Contributor

@vkehfdl1 vkehfdl1 left a comment


Let's merge this first and split the issues

@vkehfdl1 vkehfdl1 merged commit 7565f7c into Feature/#959 Nov 25, 2024
2 checks passed
@vkehfdl1 vkehfdl1 deleted the Feature/api/984 branch November 25, 2024 05:35
Collaborator Author

hongsw commented Nov 25, 2024

generate_qa_docs in trial_tasks.py

vkehfdl1 added a commit that referenced this pull request Feb 2, 2025
* just commit

* just commit

* add the root directory

* .gitignore in the autorag source folder

* edit github actions

* fix .env .gitignore

* add root .gitignore

* set PYTHONPATH at test.yml

* change the name of the test_base.py

* change the VERSION path at docs/conf.py

* Add api to repository

* Add api to repository

* add autorag at pythonpath

* edit gitignore for tracking projects folder

* add README.md at projects folder for tracking projects folder

* add autorag-frontend as git submodule

* Do not run API test at github actions

* rename: update file path from api/projects/README.md to projects/README.md

* 🚑 fix: Update .gitignore and add .dockerignore and Dockerfile

Added various entries to ignore specific files and directories in both the root directory's .gitignore and the api directory's .dockerignore. Additionally, included a Dockerfile for building a Python 3.10-slim-based API image with specified dependencies and runtime configurations. A docker-compose.yml file was introduced to define services and networks for frontend and API components.

* 📝 docs: remove AutoRAG Workflow API documentation and related resources.

* ✨ feat: Add description for tutorial_1 project

* 🔧 chore: update .gitignore to exclude .DS_Store

* 🚑 fix: Update project naming convention in README and adjust requirements

This commit updates the project naming convention in the README file from "AutoRAG API Server" to "AutoRAG-API" for consistency. Additionally, it modifies the version requirement in the `requirements.txt` file for AutoRAG to be greater than or equal to 0.3.8 to ensure compatibility with the latest features.

* 🚑 fix: Update ports and environment variables in docker-compose.yml to use port 5001 instead of 5000

* 🚑 fix: Update schema.py with corrected field indentation and added 'path' field in ParseRequest model

* 🚑 fix: Fix indentation in validate.py for decorator functions.

* 🚑 fix: refactor authentication decorator in auth.py

* 🚑 fix: Correct get_new_trial_dir parameter naming and handle trial directory creation accurately

* 🚑 fix: Corrected import formatting in qa_create.py and standardized function indentation.

* ✨ feat: Add dashboard module to autorag package and implement async parser function

* 🚑 fix: Refactor PandasTrialDB to handle trial operations more efficiently and improve error handling

* move upload file endpoint

* turn evaluate_history.py workable again

* just reformat and edit ignore files

* working with uvicorn now

* Add env variable to locate the project folder and resolve new pydantic version issues (#971)

Co-authored-by: jeffrey <[email protected]>

* Add env variable endpoints for managing env variable (#975)

* add delete endpoint and change to .env based operations

* add api endpoint for gathering all env settings

* load env variable when start each task

* change GET /env to return everything (key & values)

---------

Co-authored-by: jeffrey <[email protected]>

* upload multiple files at once using key 'files' (#981)

Co-authored-by: jeffrey <[email protected]>

* [API] fix validate and evaluation api config, set_trial_config #984 (#987)

* feat: refactor SQL Trial DB from Pandas Trial DB, and Test code

* 🚑 fix: Set correct WORK_DIR based on environment variable

- Updated the logic in app.py to properly set the `WORK_DIR` based on the environment variable `AUTORAG_API_ENV`. If the environment is 'dev', the `WORK_DIR` will be located at `"../projects"`, otherwise, it will be set to `"projects"`. Additionally, the `.env` file path is now correctly constructed using the determined `WORK_DIR` value.

* 🚑 fix: Update method to use model_validate_json in trial_dict['config'] assignment and update set_trial_config for trial_id with TrialConfig model dump JSON. Add get_all_config_ids and get_all_trial_ids SQL query functions.

* ✨ feat: Add CORS headers and handle OPTIONS requests

This commit introduces the addition of CORS headers in every response and explicit handling of OPTIONS requests in the API server. Includes setting Access-Control-Allow-Origin, Access-Control-Allow-Credentials, Access-Control-Allow-Headers, and Access-Control-Allow-Methods based on the request origin.

* ✅ test: add test file for project creation with setup and cleanup fixtures, including logging configurations, environment setup, client creation, and project directory validation

* 🚑 fix: Remove unnecessary commented-out properties in Trial class

* 🚑 fix: Set correct WORK_DIR based on environment variable AUTORAG_WORK_DIR

* ♻️ refactor: Update code in app.py and schema.py for better handling of working directory and model configuration. Fix deprecated usage in test_app.py and enhance testing in test_trial_config.py.

* 📝 docs: update README with instructions for running using Docker Compose and monitoring options.

* ✨ feat: start parsing documents task with improved import handling

This commit introduces changes to the document parsing task initiation. The import statement for `parse_documents` has been updated within the file. Additionally, the logic for initiating the parsing process has been streamlined and improved for better performance and handling of imports.

* ✅ test: add tests for project database operations such as initializing DB, setting/getting trials, updating trial configurations, and retrieving trial information by project or ID.

* ♻️ refactor: Improve database initialization in SQLiteProjectDB

- Refactored the `_init_db` method to enhance database initialization.
- Added logging and enhanced debugging statements for better clarity.
- Now checks for the existence of the database file and its directory before initializing.
- If the database file does not exist, it creates the necessary directory and tables.
- Adjusted permissions for directories (777) and the database file (666) accordingly.

* 🚑 fix: correct chunking and parsing tasks in trial_tasks.py

* 🔧 chore: Update imports and debug logging level in app.py

- Updated import statement in app.py to include chunk_documents from trial_tasks module.
- Changed the logging level from INFO to DEBUG for more detailed logging information.

* ♻️ refactor: refactor parsing endpoint and improve error handling

- Refactored the parsing endpoint to handle configuration data retrieval more efficiently.
- Improved error handling to provide more informative error messages in case of missing data or failed tasks.

* 🚑 fix: Correct chunked data path and task handling in start_chunking function

* ✨ feat: Configure not to use uvloop, apply nest_asyncio, and correct import in app.py

- Avoid using uvloop by setting asyncio event loop policy to DefaultEventLoopPolicy().
- Apply nest_asyncio after that to prevent conflicts.
- Change the import in app.py from `from database.project_db import SQLiteProjectDB` to the correct import.

refactor: Update Celery configuration in celery_app.py

- Adjust broker and backend URLs to use 'redis://redis:6379/0'.
- Modify the timezone to 'Asia/Seoul' for better synchronization.

* 🚑 fix: Install system dependencies and pip, adjust Dockerfile for API service

- Removed unnecessary comments related to installing pip as it's clear from the command itself
- Added installation of 'watchfiles', setting PYTHONPATH and PYTHONUNBUFFERED environment variables
- Created a directory for celery beat schedule and added an entrypoint script
- Adjusted permissions for the entrypoint script and removed Windows line endings
- Updated entrypoint to /entrypoint.sh in the API service section
- Added environment variables for watching files, setting time zone, log level, and disabling Python output buffering

* 🔧 chore: update subproject commit reference in autorag-frontend

* 🔧 chore: add test_projects to .gitignore

* add new lines and fix .env.dev

* fix chunk_documents

---------

Co-authored-by: Seungwoo hong <Seungwoo hong [email protected]>
Co-authored-by: jeffrey <[email protected]>

* Make the default timezone at the API server to UTC (#992)

* Change all datetime.now() to the timezone UTC

* properly working UTC timezone in the API server

---------

Co-authored-by: jeffrey <[email protected]>

* ✨ feat: Add QA document generation task in trial_tasks.py and schema.py (#1005)

* ✨ feat: Add QA document generation task in trial_tasks.py and schema.py

- Added a new field `qa_task_id` in the Trial schema to store the QA task ID.
- Introduced `generate_qa_documents` shared task in `trial_tasks.py` for creating QA documents.
- Updated imports and added `QACreationRequest` in `trial_tasks.py`.
- Included function `run_qa_creation` in `generate_qa_documents` task for generating QA documents with status tracking and database updates.

* 🚑 fix: Return full trial config in get_trial_config

Adjusts the return statement in `get_trial_config` to return the complete trial configuration instead of just the model dump.

* 🔧 chore: update subproject commit in autorag-frontend to 1434e797

---------

Co-authored-by: Seungwoo hong <Seungwoo hong [email protected]>

* Change the api port to 8000 (#1007)

* artifacts/content GET endpoint for sending raw_data files (#1008)

* Change the WORK_DIR setting

* send file directly

* Change the API server that qa, chunk, and qa contains to the project_id. (#1011)

* get all parsed documents and the parse is not relevant to the trial_id now

* add get chunk list at the API server

* chunk document at project view

* /parse POST with parse_name

* QA creation endpoint

* Working API with SQL DB (#1016)

* Refactor start_evaluate api endpoint

* if there is no .env, make one

* make to one api endpoint that retrieve file content
/artifacts/content

* add /artifacts/content delete operation to delete the file

* upload korean filenames

* working parse with frontend

* working QA!

* validation back to normal, shout!

* checkpoint (working but no result at evaluation)

* Fix problem that trial_tasks.py cannot load the env

* Finally success!!!!
Working evaluate and validate

* API server refactor to celery with report, streamlit, and Quart api server with streaming (#1021)

* working running dashboard

* working running and closing report

* working and closing the chat streamlit server

* working and closing the external api server port to 8100

* add ko version at requirements.txt (#1026)

* Update frontend

* update for api compatibility

* update autorag-frontend

* Add parsed data get endpoint (#1041)

* add parsed file get endpoint

* Add an "all_files" endpoint.

* update AutoRAG version to 0.3.11rc3

* update AutoRAG version to 0.3.12

* update autorag frontend to the latest

* Enable the file extensions (data, html, etc.) (#1053)

* change to the dynamic root directory

* enable uploading html and data file extensions

* merge main into Feature/#959

* Feature/#1080 (#1082)

* Add documentation about AutoRAG GUI

* Delete talk with founders

* Change to build the docs

* move embedding package

---------

Co-authored-by: kimbwook <[email protected]>
Co-authored-by: jeffrey <[email protected]>
Co-authored-by: 홍승우 <[email protected]>
Co-authored-by: Seungwoo hong <[email protected]>
Co-authored-by: Seungwoo hong <Seungwoo hong [email protected]>
Development

Successfully merging this pull request may close these issues.

[API] fix validate and evaluation api config, set_trial_config
2 participants