Document async endpoint functionality #786
Conversation
Signed-off-by: David Gardner <[email protected]>
Walkthrough

Adds documentation for an async `/generate` endpoint, an OpenAI v1-compatible endpoint, server startup and installation notes (the `async_endpoints` extra), and streaming examples. Adjusts the FastAPI front end to size `LocalCluster` workers from config, clarifies the `max_running_async_jobs` description, updates the `JobStore` `db_url` docstring, and adds "SQLAlchemy" to the Vale vocabulary.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant API as API Server (FastAPI)
    participant Dask as Dask (LocalCluster or External)
    participant DB as JobStore (SQLAlchemy)
    note over API: /generate/async (requires async_endpoints extra)
    Client->>API: POST /generate/async {input, job_id?, sync_timeout?, expiry_seconds?}
    alt scheduler_address provided
        API->>Dask: Submit job to external scheduler
    else local cluster
        API->>Dask: Submit job to LocalCluster(n_workers=max_running_async_jobs, threads_per_worker=1)
    end
    API->>DB: Persist job metadata/status
    alt completes within sync_timeout
        Dask-->>API: Result
        API->>DB: Update job status/result
        API-->>Client: 200 {status: completed, output}
    else async/pending
        API-->>Client: 202 {job_id, status: pending}
        Client->>API: GET /jobs/{job_id}
        API->>DB: Fetch status/result
        DB-->>API: state/result
        API-->>Client: {status, output?}
    end
```

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks: ✅ 3 passed
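The submit-then-poll flow in the diagram can be sketched from the client side. The snippet below is a minimal illustration only: `submit_job` and `fetch_status` are stubbed stand-ins for `POST /generate/async` and `GET /jobs/{job_id}` (a real client would make HTTP calls, for example with `httpx`), and the job ID and output values are invented for the demo.

```python
import time

# Stubbed transport: in a real client these two functions would be HTTP
# calls to the FastAPI server (POST /generate/async, GET /jobs/{job_id}).
_FAKE_DB: dict[str, object] = {}


def submit_job(payload: dict) -> dict:
    """Pretend to submit a job; it 'finishes' on the second status poll."""
    job_id = "job_0001"  # hypothetical ID for the demo
    _FAKE_DB[job_id] = iter(["running", "success"])
    return {"job_id": job_id, "status": "submitted"}


def fetch_status(job_id: str) -> dict:
    """Return the next status for the job, with output once it succeeds."""
    status = next(_FAKE_DB[job_id], "success")
    result = {"job_id": job_id, "status": status}
    if status == "success":
        result["output"] = "42"
    return result


def run_async_job(payload: dict, poll_interval: float = 0.01) -> dict:
    """Submit a job, then poll until it reaches a terminal state."""
    job = submit_job(payload)
    while True:
        job = fetch_status(job["job_id"])
        if job["status"] in ("success", "failure"):
            return job
        time.sleep(poll_interval)


result = run_async_job({"input_message": "Is 2 + 2 greater than 3?"})
print(result["status"], result.get("output"))
```

In a real client the polling loop would also honor the `sync_timeout` behavior described below: a request with `sync_timeout > 0` may return the completed result directly and skip polling entirely.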
Pull Request Overview
This PR documents the async endpoint functionality and fixes configuration handling for the max_running_async_jobs parameter. The changes improve documentation for asynchronous job processing capabilities and ensure proper configuration of the Dask cluster.
- Documents the `/generate/async` endpoint with examples and configuration details
- Fixes the handling of `max_running_async_jobs` to properly configure the Dask worker count
- Updates documentation links and adds SQLAlchemy to the accepted vocabulary
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
src/nat/front_ends/fastapi/job_store.py | Updates SQLAlchemy documentation URL reference |
src/nat/front_ends/fastapi/fastapi_front_end_plugin.py | Applies max_running_async_jobs config to Dask LocalCluster |
src/nat/front_ends/fastapi/fastapi_front_end_config.py | Improves documentation for max_running_async_jobs parameter |
docs/source/reference/evaluate-api.md | Documents async_endpoints dependency requirement |
docs/source/reference/api-server-endpoints.md | Adds comprehensive documentation for async endpoint |
ci/vale/styles/config/vocabularies/nat/accept.txt | Adds SQLAlchemy to accepted vocabulary |
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
ci/vale/styles/config/vocabularies/nat/accept.txt (1)

**1-4: Missing SPDX header — will fail CI header checks.** Add the standard SPDX Apache-2.0 header at the top of this .txt file.

Apply this diff:

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
 # List of case-sensitive regular expressions matching words that should be accepted by Vale. For product names like
 # "cuDF" or "cuML", we want to ensure that they are capitalized the same way they're written by the product owners.
 # Regular expressions are parsed according to the Go syntax: https://golang.org/pkg/regexp/syntax/
```

src/nat/front_ends/fastapi/job_store.py (4)
**515-516: Fix invalid SQLAlchemy filter: `.not_in` doesn't exist.** Use `.notin_(...)` (or `~col.in_(...)`) to generate a NOT IN clause.

Apply this diff:

```diff
-        stmt = select(JobInfo).where(
-            and_(JobInfo.is_expired == sa_expr.false(),
-                 JobInfo.status.not_in(self.ACTIVE_STATUS))).order_by(JobInfo.updated_at.desc())
+        stmt = select(JobInfo).where(
+            and_(JobInfo.is_expired == sa_expr.false(),
+                 JobInfo.status.notin_(self.ACTIVE_STATUS))
+        ).order_by(JobInfo.updated_at.desc())
```
**159-160: Status type mismatch causes logic errors; store and compare consistently as strings.** `ACTIVE_STATUS` currently holds Enum members but the DB stores strings; membership checks and SQL filters will misbehave.

Apply this diff:

```diff
-    ACTIVE_STATUS = {JobStatus.RUNNING, JobStatus.SUBMITTED}
+    ACTIVE_STATUS = {JobStatus.RUNNING.value, JobStatus.SUBMITTED.value}
```
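The mismatch this comment describes is easy to reproduce with a plain `enum.Enum`: members never compare equal to their string values, so a status string read back from a `String` column silently misses a set of Enum members. This is an illustrative sketch, not the toolkit's actual `JobStatus` class (which, if it mixed in `str`, would behave differently):

```python
from enum import Enum


class JobStatus(Enum):
    SUBMITTED = "submitted"
    RUNNING = "running"


# A plain Enum member is not equal to its string value, so a string loaded
# from a String column never matches a set of members.
active_members = {JobStatus.RUNNING, JobStatus.SUBMITTED}
db_value = "running"  # what the String column actually stores

print(db_value in active_members)  # → False: the check silently misses

# Comparing against .value (or storing .value) restores consistent behavior.
active_values = {JobStatus.RUNNING.value, JobStatus.SUBMITTED.value}
print(db_value in active_values)   # → True
```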
**470-475: Filter by the string value, not the Enum object.** `JobInfo.status` is a string column; comparing it to a `JobStatus` Enum will not match.

Apply this diff:

```diff
-        if not isinstance(status, JobStatus):
-            status = JobStatus(status)
-
-        stmt = select(JobInfo).where(JobInfo.status == status)
+        if not isinstance(status, JobStatus):
+            status = JobStatus(status)
+
+        stmt = select(JobInfo).where(JobInfo.status == status.value)
```
**255-263: Insert uses an Enum object in a String column — may break on some DB drivers.** Persist the Enum's `.value` for consistency with queries and updates.

Apply this diff:

```diff
-        job = JobInfo(job_id=job_id,
-                      status=JobStatus.SUBMITTED,
+        job = JobInfo(job_id=job_id,
+                      status=JobStatus.SUBMITTED.value,
                       config_file=config_file,
                       created_at=datetime.now(UTC),
                       updated_at=datetime.now(UTC),
                       error=None,
                       output_path=None,
                       expiry_seconds=clamped_expiry)
```

src/nat/front_ends/fastapi/fastapi_front_end_config.py (1)
**60-75: Strengthen `job_id` validation to enforce allowed characters.** The current check blocks path traversal but doesn't enforce "alphanumeric or underscore" as documented.

Apply this diff:

```diff
@@
-import typing
+import typing
+import re
@@
     def validate_job_id(cls, job_id: str):
         job_id = job_id.strip()
-        job_id_path = Path(job_id)
-        if len(job_id_path.parts) > 1 or job_id_path.resolve().name != job_id:
+        # Enforce allowed characters
+        if not re.fullmatch(r"[A-Za-z0-9_]+", job_id):
             raise ValueError(
-                f"Job ID '{job_id}' contains invalid characters. Only alphanumeric characters and underscores are"
-                " allowed.")
+                f"Job ID '{job_id}' contains invalid characters. Only alphanumeric characters and underscores are allowed."
+            )
+        # Block any path-like input defensively
+        job_id_path = Path(job_id)
+        if len(job_id_path.parts) > 1:
+            raise ValueError(f"Job ID '{job_id}' must not contain path separators.")
```
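As a standalone sketch of the stricter check (the helper name and layout here are illustrative, not the toolkit's actual validator):

```python
import re

_JOB_ID_RE = re.compile(r"[A-Za-z0-9_]+")


def validate_job_id(job_id: str) -> str:
    """Return the stripped job ID, rejecting anything outside [A-Za-z0-9_]."""
    job_id = job_id.strip()
    if not _JOB_ID_RE.fullmatch(job_id):
        raise ValueError(
            f"Job ID {job_id!r} contains invalid characters. "
            "Only alphanumeric characters and underscores are allowed.")
    return job_id


print(validate_job_id("  job_42 "))  # stripped and accepted

# Path-like, hyphenated, and empty inputs are all rejected by the regex.
for bad in ("../etc/passwd", "job-42", ""):
    try:
        validate_job_id(bad)
    except ValueError:
        print(f"rejected {bad!r}")
```

Note that `re.fullmatch` anchors the pattern to the whole string, so no separate path-separator check is strictly required; the defensive `Path` check in the diff above is belt-and-suspenders.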
🧹 Nitpick comments (9)

src/nat/front_ends/fastapi/job_store.py (3)

**121-123: Optional: Align the Python type with the stored DB type, or switch to a native Enum.** Either annotate as `str` to match the String column, or use `SAEnum(JobStatus)` for safer typing.

Two options:

```diff
-    status: Mapped[JobStatus] = mapped_column(String(11))
+    status: Mapped[str] = mapped_column(String(11), index=True)
```

Or:

```diff
+from sqlalchemy import Enum as SAEnum
 ...
-    status: Mapped[JobStatus] = mapped_column(String(11))
+    status: Mapped[JobStatus] = mapped_column(
+        SAEnum(JobStatus, values_callable=lambda e: [m.value for m in e]), index=True)
```
**441-447: Optional: Add a DB index to speed "last job" queries.** Ordering by `created_at`/`updated_at` is frequent; an index helps on large tables.

Apply one of:

```diff
-    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True),
-                                                 default=datetime.now(UTC),
-                                                 onupdate=datetime.now(UTC))
+    updated_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True),
+        default=datetime.now(UTC),
+        onupdate=datetime.now(UTC),
+        index=True,
+    )
```
**306-325: Optional: Make `sync_timeout` robust to long-running jobs.** Consider catching/canceling stale local waits and logging at DEBUG when timing out.

```diff
-        if sync_timeout > 0:
+        if sync_timeout > 0:
             try:
                 _ = await future.result(timeout=sync_timeout)
                 job = await self.get_job(job_id)
                 assert job is not None, "Job should exist after future result"
                 return (job_id, job)
             except TimeoutError:
-                pass
+                logger.debug("submit_job timed out after %s s for job_id=%s", sync_timeout, job_id)
```

src/nat/front_ends/fastapi/fastapi_front_end_plugin.py (1)
**109-110: Good fix: wire `max_running_async_jobs` into the Dask `LocalCluster`.** This aligns runtime concurrency with configuration. Consider capping workers to the CPU count to avoid excessive process fan-out on small hosts.

Example:

```diff
-        self._cluster = LocalCluster(n_workers=self.front_end_config.max_running_async_jobs,
-                                     threads_per_worker=1)
+        self._cluster = LocalCluster(
+            n_workers=min(self.front_end_config.max_running_async_jobs, max(1, os.cpu_count() or 1)),
+            threads_per_worker=1,
+        )
```

docs/source/reference/evaluate-api.md (1)
**23: Looks good; consider switching the inline commands to fenced blocks for copy/paste.** Turning the two install commands into fenced bash blocks improves readability and reduces copy errors.

````diff
-... installed. For users installing from source, this can be done by running `uv pip install -e .[async_endpoints]` from the root directory of the NeMo Agent toolkit library. Similarly, for users installing from PyPI, this can be done by running `pip install nvidia-nat[async_endpoints]`.
+... installed. For users installing from source:
+
+```bash
+uv pip install -e .[async_endpoints]
+```
+
+For users installing from PyPI:
+
+```bash
+pip install nvidia-nat[async_endpoints]
+```
````

docs/source/reference/api-server-endpoints.md (4)
**21-25: Fix markdown list indentation (MD007).** Unindent the list to satisfy markdownlint and render consistently.

```diff
-  - **Generate Interface:** Uses the transaction schema defined by your workflow. The interface documentation is accessible
-    using Swagger while the server is running [`http://localhost:8000/docs`](http://localhost:8000/docs).
-  - **Chat Interface:** [OpenAI API Documentation](https://platform.openai.com/docs/guides/text?api-mode=chat) provides
-    details on chat formats compatible with the NeMo Agent toolkit server.
+- **Generate Interface:** Uses the transaction schema defined by your workflow. The interface documentation is accessible
+  using Swagger while the server is running [`http://localhost:8000/docs`](http://localhost:8000/docs).
+- **Chat Interface:** [OpenAI API Documentation](https://platform.openai.com/docs/guides/text?api-mode=chat) provides
+  details on chat formats compatible with the NeMo Agent toolkit server.
```
**34-39: Avoid a bare URL and improve sentence flow (MD034).** Add punctuation and wrap the URL.

```diff
-The following examples assume that the simple calculator workflow has been installed and is running on http://localhost:8000 to do so run the following commands:
+The following examples assume that the simple calculator workflow has been installed and is running on <http://localhost:8000>. To do so, run:

 uv pip install -e examples/getting_started/simple_calculator
 nat serve --config_file examples/getting_started/simple_calculator/configs/config.yml
```
**71-74: Grammar/consistency in `expiry_seconds`.** Fix pluralization and wording.

```diff
-  - `expiry_seconds`: The amount of time in seconds after the job completes (either successfully or unsuccessfully) which any output files will be preserved before being deleted. Default is `3600` (1 hours), minimum is `600` (10 minutes) and maximum value for this field is `86400` (24 hours). The text output in the response is not affected by this field.
+  - `expiry_seconds`: Time in seconds after job completion (success or failure) to preserve any output files before deletion. Default is `3600` (1 hour); minimum `600` (10 minutes); maximum `86400` (24 hours). The text output in the response is not affected by this field.
```
**109-118: Use consistent RFC 3339 timestamps.** Add a `Z` suffix to the UTC timestamps for `created_at` and `updated_at` to match `expires_at`.

```diff
-    "created_at": "2025-09-10T20:52:24.768066",
+    "created_at": "2025-09-10T20:52:24.768066Z",
@@
-    "updated_at": "2025-09-10T20:52:30.734659"
+    "updated_at": "2025-09-10T20:52:30.734659Z"
```
🪛 markdownlint-cli2 (0.17.2)

docs/source/reference/api-server-endpoints.md

- 21: MD007 (ul-indent) — Unordered list indentation: expected 0, actual 2
- 23: MD007 (ul-indent) — Unordered list indentation: expected 0, actual 2
- 34: MD034 (no-bare-urls) — Bare URL used
🔇 Additional comments (7)

ci/vale/styles/config/vocabularies/nat/accept.txt (1)

**130-131: LGTM — adding "SQLAlchemy" to the accepted vocabulary avoids false positives in docs.**

src/nat/front_ends/fastapi/job_store.py (4)

**150-151: Doc link improvement is clear and specific.**

**495-503: Active-job expiry check compares strings to Enums; will incorrectly expire running jobs.** With the fix to `ACTIVE_STATUS` (strings), this check works. If you don't change `ACTIVE_STATUS`, wrap with `JobStatus(job.status)`.

Apply either:

- Keep the `ACTIVE_STATUS` string set (preferred); no change needed here.
- Or:

```diff
-        if job.status in self.ACTIVE_STATUS:
+        if JobStatus(job.status) in self.ACTIVE_STATUS:
             return None
```
**211-218: Is `async_scoped_session.remove()` awaitable?** If `remove()` is synchronous (as in `scoped_session`), `await` will fail at runtime.

Please confirm against the SQLAlchemy version in use. If not awaitable:

```diff
-        await self._session.remove()
+        self._session.remove()
```
**552-553: No await required — keep `var.delete()` as-is.** `Variable.delete` on the Dask client is a synchronous method (it sends `variable_delete` to the scheduler) and is not awaitable, so the current `var.delete()` call is correct. (docs.dask.org)

src/nat/front_ends/fastapi/fastapi_front_end_config.py (2)
**208-213: Docstring clarification for `max_running_async_jobs` looks good.** The clearer behavior note (only used with a local cluster) and the impact on worker count are helpful.

**200-207: `LocalCluster` already uses `max_running_async_jobs`; confirm it's guarded by `scheduler_address`.** `LocalCluster` in `src/nat/front_ends/fastapi/fastapi_front_end_plugin.py` (≈lines 106-112) is instantiated with `n_workers=self.front_end_config.max_running_async_jobs` and `threads_per_worker=1` — confirm this code path only runs when `front_end_config.scheduler_address` is `None`.
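The worker-sizing discussed in these comments amounts to a small helper; a sketch under the assumption that capping at the host CPU count is desired (`local_cluster_workers` is a hypothetical name, not toolkit code):

```python
import os


def local_cluster_workers(max_running_async_jobs: int) -> int:
    """Cap the LocalCluster worker count at the host CPU count (minimum 1).

    os.cpu_count() can return None on exotic platforms, hence the fallback.
    """
    return min(max_running_async_jobs, max(1, os.cpu_count() or 1))


# A small configured value is respected; a huge one is capped to the CPUs.
print(local_cluster_workers(2), local_cluster_workers(10_000))
```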
Actionable comments posted: 0
♻️ Duplicate comments (2)
docs/source/reference/api-server-endpoints.md (2)
**20: Count mismatch fixed — looks good.** The numeral now matches the five listed transactions.
**65: Replace the incorrect config key, fix punctuation/spacing, and split into readable sentences.** Use `db_url` (not `database_url`), add the missing space in "SQLAlchemy's Dialects," and avoid run-ons.

```diff
-Asynchronous jobs are managed using [Dask](https://docs.dask.org/en/stable/) by default a local Dask cluster is created at start time, however you can also configure the server to connect to an existing Dask scheduler by setting the `scheduler_address` configuration parameter. The Dask scheduler is used to manage the execution of asynchronous jobs, and can be configured to run on a single machine or across a cluster of machines. Job history and metadata is stored in a SQL database using [SQLAlchemy](https://www.sqlalchemy.org/) by default a temporary SQLite database is created at start time, however you can also configure the server to use a persistent database by setting the `database_url` configuration parameter. Any database supported by [SQLAlchemy's Asynchronous I/O extension](https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html) can be used, refer to [SQLAlchemy'sDialects](https://docs.sqlalchemy.org/en/20/dialects/index.html) for a complete list (many but not all of these support Asynchronous I/O).
+Asynchronous jobs are managed using [Dask](https://docs.dask.org/en/stable/). By default, a local Dask cluster is created at start time. You can also connect to an existing Dask scheduler by setting the `scheduler_address` configuration parameter. The Dask scheduler manages the execution of asynchronous jobs and can run on a single machine or across a cluster.
+
+Job history and metadata are stored in a SQL database using [SQLAlchemy](https://www.sqlalchemy.org/). By default, a temporary SQLite database is created at start time. To use a persistent database, set the `db_url` configuration parameter. Any database supported by [SQLAlchemy's Asynchronous I/O extension](https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html) can be used; refer to [SQLAlchemy's Dialects](https://docs.sqlalchemy.org/en/20/dialects/index.html) for a complete list (many but not all support asynchronous I/O).
```
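For illustration, typical `db_url` values for async-capable dialects look like the following; driver packages such as `aiosqlite` or `asyncpg` must be installed separately, and the hosts, credentials, and database names here are placeholders:

```text
# SQLite via aiosqlite (the default temporary-database behavior, made explicit)
sqlite+aiosqlite:///jobs.db

# PostgreSQL via asyncpg
postgresql+asyncpg://user:password@db-host:5432/jobs

# MySQL via aiomysql
mysql+aiomysql://user:password@db-host:3306/jobs
```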
🧹 Nitpick comments (7)

docs/source/reference/api-server-endpoints.md (7)

**21-24: Fix top-level list indentation (markdownlint MD007).** Unindent the bullets to column 0.

```diff
-  - **Generate Interface:** Uses the transaction schema defined by your workflow. The interface documentation is accessible
-    using Swagger while the server is running [`http://localhost:8000/docs`](http://localhost:8000/docs).
-  - **Chat Interface:** [OpenAI API Documentation](https://platform.openai.com/docs/guides/text?api-mode=chat) provides
-    details on chat formats compatible with the NeMo Agent toolkit server.
+- **Generate Interface:** Uses the transaction schema defined by your workflow. The interface documentation is accessible
+  using Swagger while the server is running [`http://localhost:8000/docs`](http://localhost:8000/docs).
+- **Chat Interface:** [OpenAI API Documentation](https://platform.openai.com/docs/guides/text?api-mode=chat) provides
+  details on chat formats compatible with the NeMo Agent toolkit server.
```
**27: Branding/style: use "NeMo Agent toolkit" (lowercase "toolkit") and sentence case.** Aligns with the docs naming rule.

```diff
-## Start the NeMo Agent Toolkit Server
+## Start the NeMo Agent toolkit server
```
**34-38: Fix the run-on sentence and bare URL (markdownlint MD034).** Split the sentence and format the URL.

````diff
-The following examples assume that the simple calculator workflow has been installed and is running on http://localhost:8000 to do so run the following commands:
+The following examples assume the simple calculator workflow is installed and running at `http://localhost:8000`. To do so, run:

 ```bash
 uv pip install -e examples/getting_started/simple_calculator
 nat serve --config_file examples/getting_started/simple_calculator/configs/config.yml
 ```
````

**63-64: Quote extras in pip commands to avoid shell globbing.** Prevents bracket expansion in some shells.

```diff
-... by running `uv pip install -e .[async_endpoints]` from the root directory of the NeMo Agent toolkit library. Similarly, for users installing from PyPI, this can be done by running `pip install nvidia-nat[async_endpoints]`.
+... by running `uv pip install -e '.[async_endpoints]'` from the root directory of the NeMo Agent toolkit library. Similarly, for users installing from PyPI, this can be done by running `pip install 'nvidia-nat[async_endpoints]'`.
```
**71-74: Tighten wording and fix minor grammar.** Plural agreement and "1 hour".

```diff
-  - `sync_timeout`: The maximum time in seconds to wait for the job to complete before returning a response. If the job completes in less than `sync_timeout` seconds then the response will include the job result, otherwise the `job_id` and `status` is returned. Default is `0` which causes the request to return immediately, and maximum value for this field is `300`.
+  - `sync_timeout`: The maximum time in seconds to wait for the job to complete before returning a response. If the job completes within `sync_timeout`, the response includes the job result; otherwise, the `job_id` and `status` are returned. The default is `0` (return immediately). The maximum value is `300`.
-  - `expiry_seconds`: The amount of time in seconds after the job completes (either successfully or unsuccessfully) which any output files will be preserved before being deleted. Default is `3600` (1 hours), minimum is `600` (10 minutes) and maximum value for this field is `86400` (24 hours). The text output in the response is not affected by this field.
+  - `expiry_seconds`: The number of seconds after job completion (success or failure) that any output files are preserved before deletion. Default is `3600` (1 hour); minimum is `600` (10 minutes); maximum is `86400` (24 hours). The text output in the response is not affected by this field.
```
`109-118`: **Standardize timestamps (add "Z" or state timezone).**

`created_at` and `updated_at` omit `Z` while `expires_at` includes it. Prefer RFC 3339 UTC everywhere.

```diff
-    "created_at": "2025-09-10T20:52:24.768066",
+    "created_at": "2025-09-10T20:52:24.768066Z",
 ...
-    "updated_at": "2025-09-10T20:52:30.734659"
+    "updated_at": "2025-09-10T20:52:30.734659Z"
```
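As a sketch of the suggested normalization, a stdlib-only helper (not from the toolkit's code) that renders an aware datetime as RFC 3339 UTC with a trailing `Z`:

```python
from datetime import datetime, timezone

def rfc3339_utc(dt: datetime) -> str:
    """Render an aware datetime as an RFC 3339 UTC timestamp ending in 'Z'."""
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")

# Matches the created_at value from the example response above.
ts = rfc3339_utc(datetime(2025, 9, 10, 20, 52, 24, 768066, tzinfo=timezone.utc))
print(ts)  # 2025-09-10T20:52:24.768066Z
```

Emitting all three timestamp fields through one such helper keeps them mutually consistent.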
`60-66`: **Add retrieval endpoints and failure states for async jobs.**

Document how to poll job status and fetch results/errors (route, method, response schema, typical HTTP status codes).

I can draft a short "Job Status and Retrieval" subsection once you confirm the route(s), e.g., `GET /jobs/{job_id}` and/or `GET /generate/async/{job_id}`, with optional deletion/cleanup semantics.
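Whichever route is confirmed, the client-side pattern is a poll-with-timeout loop. A minimal sketch with a stubbed status fetcher — the real HTTP call, route, and status strings are unconfirmed assumptions here:

```python
import time

def poll_job(fetch_status, job_id: str, timeout_s: float = 30.0,
             interval_s: float = 1.0) -> str:
    """Poll fetch_status(job_id) until a terminal state or the timeout.

    fetch_status stands in for an HTTP GET against a hypothetical route
    such as GET /jobs/{job_id}; "success"/"failure" are assumed terminal states.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status(job_id)
        if status in ("success", "failure"):
            return status
        if time.monotonic() >= deadline:
            return "timeout"
        time.sleep(interval_s)

# Stubbed status sequence standing in for successive server responses.
states = iter(["submitted", "running", "success"])
result = poll_job(lambda _id: next(states), "job-123", interval_s=0)
print(result)  # success
```

The same loop handles a failed job (terminal `"failure"`) and a client-side give-up (`"timeout"`), which is why documenting the terminal states matters.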
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/source/reference/api-server-endpoints.md
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
docs/source/**/*.md
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
docs/source/**/*.md
: Use the official naming: first use “NVIDIA NeMo Agent toolkit”; subsequent uses “NeMo Agent toolkit”; never use deprecated names in documentation
Documentation sources must be Markdown under docs/source; keep docs in sync and fix Sphinx errors/broken links
Documentation must be clear, comprehensive, free of TODO/FIXME/placeholders/offensive/outdated terms; fix spelling; adhere to Vale vocab allow/reject lists
Files:
docs/source/reference/api-server-endpoints.md
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}
: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up‑to‑date
All source files must include the SPDX Apache‑2.0 header; do not bypass CI header checks
Files:
docs/source/reference/api-server-endpoints.md
**/*.{py,md}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Never hard‑code version numbers in code or docs; versions are derived by setuptools‑scm
Files:
docs/source/reference/api-server-endpoints.md
**/*
⚙️ CodeRabbit configuration file
**/*
: Code Review Instructions

- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example: `def my_function(param1: int, param2: str) -> bool: pass`
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare `raise` statements to maintain the original stack trace, and use `logger.error()` (not `logger.exception()`) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use `logger.exception()` to capture the full stack trace information.

Documentation Review Instructions

- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the `ci/vale/styles/config/vocabularies/nat/reject.txt` file. Words that might appear to be spelling mistakes but are listed in the `ci/vale/styles/config/vocabularies/nat/accept.txt` file are OK.

Misc.

- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to-date whenever a file is changed.
Files:
docs/source/reference/api-server-endpoints.md
docs/source/**/*
⚙️ CodeRabbit configuration file
This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the
docs/source/_static
directory.
Files:
docs/source/reference/api-server-endpoints.md
🪛 markdownlint-cli2 (0.17.2)
docs/source/reference/api-server-endpoints.md
21-21: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
23-23: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
34-34: Bare URL used
(MD034, no-bare-urls)
🔇 Additional comments (1)
docs/source/reference/api-server-endpoints.md (1)
27-39
: Validation complete — internal links and config key "db_url" verified.Quick checks: "Obtaining API Keys" anchor exists; evaluate-api.md and websockets.md are present. db_url is used consistently: defined in src/nat/front_ends/fastapi/fastapi_front_end_config.py, consumed by src/nat/front_ends/fastapi/job_store.py (falls back to NAT_JOB_STORE_DB_URL), passed through the FastAPI front-end/plugin and worker code, and covered by tests.
Actionable comments posted: 0
🧹 Nitpick comments (9)
docs/source/reference/api-server-endpoints.md (9)
`18-18`: **Use the official product name casing in the first mention.**

Change "Toolkit" → "toolkit" to match the guideline "NVIDIA NeMo Agent toolkit" on first use.

```diff
-# NVIDIA NeMo Agent Toolkit API Server Endpoints
+# NVIDIA NeMo Agent toolkit API Server Endpoints
```
`21-24`: **Fix list indentation to satisfy markdownlint MD007.**

Top-level list items shouldn't be indented.

```diff
-  - **Generate Interface:** Uses the transaction schema defined by your workflow. The interface documentation is accessible
+- **Generate Interface:** Uses the transaction schema defined by your workflow. The interface documentation is accessible
   using Swagger while the server is running [`http://localhost:8000/docs`](http://localhost:8000/docs).
-  - **Chat Interface:** [OpenAI API Documentation](https://platform.openai.com/docs/guides/text?api-mode=chat) provides
+- **Chat Interface:** [OpenAI API Documentation](https://platform.openai.com/docs/guides/text?api-mode=chat) provides
   details on chat formats compatible with the NeMo Agent toolkit server.
```
`34-38`: **Split run-on sentence and avoid bare URL (MD034).**

Improve readability and lint compliance.

```diff
-The following examples assume that the simple calculator workflow has been installed and is running on http://localhost:8000 to do so run the following commands:
+The following examples assume the simple calculator workflow is installed and running at <http://localhost:8000>. To do so, run:
```

```bash
uv pip install -e examples/getting_started/simple_calculator
nat serve --config_file examples/getting_started/simple_calculator/configs/config.yml
```

---

`60-66`: **Clarify Dask/SQLAlchemy configuration, fix grammar, and mention `max_running_async_jobs`.**

- Break up run-ons.
- "metadata are stored" (plural).
- Document how `max_running_async_jobs` controls local parallelism when no external scheduler is used.

Please confirm `max_running_async_jobs` maps to LocalCluster workers in code.

```diff
-This endpoint is only available when the `async_endpoints` optional dependency extra is installed. For users installing from source, this can be done by running `uv pip install -e .[async_endpoints]` from the root directory of the NeMo Agent toolkit library. Similarly, for users installing from PyPI, this can be done by running `pip install nvidia-nat[async_endpoints]`.
+This endpoint is only available when the `async_endpoints` optional dependency extra is installed. For source installs, run `uv pip install -e .[async_endpoints]` from the repository root. For PyPI installs, run `pip install nvidia-nat[async_endpoints]`.
-Asynchronous jobs are managed using [Dask](https://docs.dask.org/en/stable/) by default a local Dask cluster is created at start time, however you can also configure the server to connect to an existing Dask scheduler by setting the `scheduler_address` configuration parameter. The Dask scheduler is used to manage the execution of asynchronous jobs, and can be configured to run on a single machine or across a cluster of machines. Job history and metadata is stored in a SQL database using [SQLAlchemy](https://www.sqlalchemy.org/) by default a temporary SQLite database is created at start time, however you can also configure the server to use a persistent database by setting the `db_url` configuration parameter. Refer to the [SQLAlchemy documentation](https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls) for the format of the `db_url` parameter. Any database supported by [SQLAlchemy's Asynchronous I/O extension](https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html) can be used, refer to [SQLAlchemy's Dialects](https://docs.sqlalchemy.org/en/20/dialects/index.html) for a complete list (many but not all of these support Asynchronous I/O).
+Asynchronous jobs are managed using [Dask](https://docs.dask.org/en/stable/). By default, if `scheduler_address` is not set, the server creates a local Dask cluster at startup. The number of concurrent jobs is controlled by the `max_running_async_jobs` configuration option. To use an external scheduler, set `scheduler_address`.
+
+Job history and metadata are stored in a SQL database using [SQLAlchemy](https://www.sqlalchemy.org/). By default, a temporary SQLite database is created at start time. To use a persistent database, set the `db_url` configuration parameter. Refer to the [SQLAlchemy documentation](https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls) for the `db_url` format. Any database supported by [SQLAlchemy's asynchronous I/O extension](https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html) can be used; see [SQLAlchemy's Dialects](https://docs.sqlalchemy.org/en/20/dialects/index.html) for a complete list (many but not all support asynchronous I/O).
```
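Under these suggestions, a front-end configuration sketch might look like the following. The key nesting is an assumption; the option names `max_running_async_jobs`, `scheduler_address`, and `db_url` come from the discussion above:

```yaml
general:
  front_end:
    _type: fastapi
    # Sizes the local Dask cluster when no external scheduler is configured.
    max_running_async_jobs: 4
    # Uncomment to attach to an existing Dask scheduler instead.
    # scheduler_address: "tcp://dask-scheduler:8786"
    # Uncomment for a persistent job store instead of the temporary SQLite DB.
    # db_url: "sqlite+aiosqlite:////tmp/nat_jobs.db"
```

Documenting a snippet like this alongside the prose would make the local-cluster vs. external-scheduler choice concrete.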
`71-74`: **Tighten wording and fix grammar in optional fields.**

Subject–verb agreement, clearer phrasing, and correct "1 hour".

```diff
- - `sync_timeout`: The maximum time in seconds to wait for the job to complete before returning a response. If the job completes in less than `sync_timeout` seconds then the response will include the job result, otherwise the `job_id` and `status` is returned. Default is `0` which causes the request to return immediately, and maximum value for this field is `300`.
+ - `sync_timeout`: The maximum time in seconds to wait for the job to complete before returning a response. If the job completes within `sync_timeout`, the response includes the job result; otherwise, the `job_id` and `status` are returned. Default is `0` (return immediately). The maximum value is `300`.
- - `expiry_seconds`: The amount of time in seconds after the job completes (either successfully or unsuccessfully) which any output files will be preserved before being deleted. Default is `3600` (1 hours), minimum is `600` (10 minutes) and maximum value for this field is `86400` (24 hours). The text output in the response is not affected by this field.
+ - `expiry_seconds`: The number of seconds after the job completes (success or failure) during which any output files are preserved before deletion. Default is `3600` (1 hour), minimum is `600` (10 minutes), and the maximum is `86400` (24 hours). The text output in the response is not affected by this field.
```
`109-118`: **Use consistent ISO 8601 timestamps with timezone.**

Add "Z" (UTC) or an explicit offset for consistency with `expires_at`.

```diff
-    "created_at": "2025-09-10T20:52:24.768066",
+    "created_at": "2025-09-10T20:52:24.768066Z",
 @@
-    "updated_at": "2025-09-10T20:52:30.734659"
+    "updated_at": "2025-09-10T20:52:30.734659Z"
```
`345-357`: **Standardize product name casing.**

Use "NeMo Agent toolkit" (lowercase "toolkit") in narrative text.

```diff
-The NeMo Agent Toolkit provides full OpenAI Chat Completions API compatibility through a dedicated endpoint that enables seamless integration with existing OpenAI-compatible client libraries and workflows.
+The NeMo Agent toolkit provides full OpenAI Chat Completions API compatibility through a dedicated endpoint that enables seamless integration with existing OpenAI-compatible client libraries and workflows.
```
`467-471`: **Standardize product name casing in code comment.**

Align with naming guideline.

```diff
-# Initialize client pointing to your NeMo Agent Toolkit server
+# Initialize client pointing to your NeMo Agent toolkit server
```
`534-536`: **Standardize section heading casing.**

Use "NeMo Agent toolkit".

```diff
-## NeMo Agent Toolkit API Server Interaction Guide
+## NeMo Agent toolkit API Server Interaction Guide
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/source/reference/api-server-endpoints.md
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
Need quotes around `.[async_endpoints]` because some shells are picky.
…avid-document-async-endpoints Signed-off-by: David Gardner <[email protected]>
Co-authored-by: lvojtku <[email protected]> Signed-off-by: David Gardner <[email protected]>
Co-authored-by: lvojtku <[email protected]> Signed-off-by: David Gardner <[email protected]>
Co-authored-by: lvojtku <[email protected]> Signed-off-by: David Gardner <[email protected]>
Co-authored-by: Will Killian <[email protected]> Signed-off-by: David Gardner <[email protected]>
Co-authored-by: lvojtku <[email protected]> Signed-off-by: David Gardner <[email protected]>
…-nv/AIQtoolkit into david-document-async-endpoints Signed-off-by: David Gardner <[email protected]>
Co-authored-by: lvojtku <[email protected]> Signed-off-by: David Gardner <[email protected]>
Co-authored-by: lvojtku <[email protected]> Signed-off-by: David Gardner <[email protected]>
Co-authored-by: lvojtku <[email protected]> Signed-off-by: David Gardner <[email protected]>
Co-authored-by: Will Killian <[email protected]> Signed-off-by: David Gardner <[email protected]>
Signed-off-by: David Gardner <[email protected]>
…-nv/AIQtoolkit into david-document-async-endpoints Signed-off-by: David Gardner <[email protected]>
Signed-off-by: David Gardner <[email protected]>
/merge
Description
• Documents the `/generate/async` endpoint
• Documents the `max_running_async_jobs` config parameter

By Submitting this PR I confirm:
Summary by CodeRabbit
• New Features
• Added an asynchronous generate API endpoint (/generate/async) with job IDs, optional timeouts, and output expiry; supports external schedulers and SQL-backed job history.
• Added an OpenAI v1-compatible API endpoint option.
• Made async worker count configurable (one thread per worker).
• Documentation
• Expanded startup guide, API key setup, install/run examples, migration guidance, and detailed async/streaming request/response examples (including intermediate streaming payloads).
• Noted installation requirement and install commands for the evaluation endpoint.