Update document capabilities for LLModel #543

micahsmith · 2025-08-05T16:29:57Z

While using Koog with Gemini models to extract structured data from PDFs, I discovered that the Google models aren't listed for LLMCapability.Document.

LLMCapability.Document may be intended as a broader category than just PDFs. However, since there is no PDF-specific capability, I propose being more permissive and listing the Document capability for any API that supports PDFs.
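
To illustrate why the missing capability matters, here is a minimal, self-contained Kotlin sketch of a capability gate. The names Capability, ModelSpec, and canAttachPdf are hypothetical stand-ins for illustration, not Koog's actual types or API, and the model id is only an illustrative string.

```kotlin
// Hypothetical stand-ins for illustration; these are NOT Koog's actual types.
enum class Capability { Completion, VisionImage, Document }

data class ModelSpec(val id: String, val capabilities: Set<Capability>)

// A PDF attachment is allowed only when the model declares the Document capability.
fun canAttachPdf(model: ModelSpec): Boolean =
    Capability.Document in model.capabilities

fun main() {
    // Assumed state before the change: vision is declared, Document is not.
    val before = ModelSpec(
        id = "gemini-2.0-flash", // illustrative id only
        capabilities = setOf(Capability.Completion, Capability.VisionImage),
    )
    // After the change: the same model with Document added to its capabilities.
    val after = before.copy(capabilities = before.capabilities + Capability.Document)

    println(canAttachPdf(before))
    println(canAttachPdf(after))
}
```

Under these assumptions, the first model is rejected for PDF input even though the underlying API accepts PDFs, and only the declared capability set changes between the two cases.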

For instance, the Gemini docs here specifically note that formats like markdown, etc., are supported, but that "document vision only meaningfully understands PDFs", suggesting PDFs are considered a separate processing category.

Anthropic model support for PDFs is documented here. Currently the models listed are:

Claude Opus 4 (claude-opus-4-20250514)
Claude Sonnet 4 (claude-sonnet-4-20250514)
Claude Sonnet 3.7 (claude-3-7-sonnet-20250219)
Claude Sonnet 3.5 models (claude-3-5-sonnet-20241022, claude-3-5-sonnet-20240620)
Claude Haiku 3.5 (claude-3-5-haiku-20241022)

Gemini model support for PDFs appears to be ubiquitous (the documentation doesn't specify any models for which it doesn't work). The model support documentation is misleading, as I've confirmed PDFs are supported across generations (specifically 1.5 flash, 2.0 flash, 2.5 flash and 2.5 pro).

The Llama documentation on vision capabilities assumes that processing documents is a part of any multimodal model supporting vision (and this has been confirmed in my own experience).

The OpenAI documentation states explicitly that "OpenAI models with vision capabilities can also accept PDF files as input".

Type of the change

New feature
Bug fix
Documentation fix
Tests improvement

Checklist for all pull requests

The pull request has a description of the proposed change
I read the Contributing Guidelines before opening the pull request
The pull request uses develop as the base branch
Tests for the changes have been added
All new and existing tests passed

Additional steps for pull requests adding a new feature

An issue describing the proposed change exists
The pull request includes a link to the issue
The change was discussed and approved in the issue
Docs have been added / updated

aozherelyeva

Thanks for the contribution!

micahsmith added 2 commits August 5, 2025 12:22

added document capability to google and anthropic models

3094bc7

added document capability to bedrock, openai, and ollama models

c99342a

micahsmith marked this pull request as ready for review August 5, 2025 19:00

Merge branch 'JetBrains:develop' into msmith/llm-model-document-support

e493e6e

aozherelyeva requested review from Rizzen and aozherelyeva August 6, 2025 11:05

aozherelyeva approved these changes Aug 6, 2025

aozherelyeva merged commit 1d78f42 into JetBrains:develop Aug 6, 2025
4 of 5 checks passed

micahsmith deleted the msmith/llm-model-document-support branch August 6, 2025 13:16
