@micahsmith micahsmith commented Aug 5, 2025

While using Koog with Gemini models to extract structured data from PDFs, I found that the Google models do not list LLMCapability.Document among their capabilities.

LLMCapability.Document may be intended as a broader category than just PDFs. However, since there is no PDF-specific capability, I propose being more permissive and listing the Document capability for any API that supports PDFs.

For instance, the Gemini docs here note that formats such as Markdown are supported, but that "document vision only meaningfully understands PDFs", which suggests that PDFs are treated as a separate processing category.

Anthropic model support for PDFs is documented here. Currently the models listed are:

  • Claude Opus 4 (claude-opus-4-20250514)
  • Claude Sonnet 4 (claude-sonnet-4-20250514)
  • Claude Sonnet 3.7 (claude-3-7-sonnet-20250219)
  • Claude Sonnet 3.5 models (claude-3-5-sonnet-20241022, claude-3-5-sonnet-20240620)
  • Claude Haiku 3.5 (claude-3-5-haiku-20241022)

Gemini model support for PDFs appears to be universal: the documentation doesn't name any model for which it doesn't work. The model support documentation is misleading on this point, as I've confirmed that PDFs are supported across generations (specifically 1.5 Flash, 2.0 Flash, 2.5 Flash, and 2.5 Pro).

The Llama documentation on vision capabilities treats document processing as part of any multimodal model that supports vision, which matches my own experience.

The OpenAI documentation states explicitly that "OpenAI models with vision capabilities can also accept PDF files as input".
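
To illustrate the effect of listing the Document capability, here is a minimal, self-contained sketch. It does not use Koog's actual classes: `Capability`, `Model`, and `supportsDocuments` are hypothetical stand-ins, and the model id is only an illustrative label.

```kotlin
// Hypothetical stand-ins for illustration only; these are NOT Koog's real types.
enum class Capability { Completion, Vision, Document }

data class Model(val id: String, val capabilities: Set<Capability>)

// Returns true when the model declares that it accepts document (e.g. PDF) input.
fun supportsDocuments(model: Model): Boolean =
    Capability.Document in model.capabilities

fun main() {
    // Before: a PDF-capable model whose capability list omits Document.
    val before = Model(
        id = "gemini-2.0-flash", // illustrative label, assumed for this sketch
        capabilities = setOf(Capability.Completion, Capability.Vision),
    )
    // After: the same model with Document added, as proposed here.
    val after = before.copy(capabilities = before.capabilities + Capability.Document)

    println(supportsDocuments(before)) // false
    println(supportsDocuments(after))  // true
}
```

Any caller that gates PDF attachments on a check like `supportsDocuments` would reject the first model even though the underlying API accepts PDFs, which is the situation this change is meant to avoid.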


Type of the change

  • New feature
  • Bug fix
  • Documentation fix
  • Tests improvement

Checklist for all pull requests

  • The pull request has a description of the proposed change
  • I read the Contributing Guidelines before opening the pull request
  • The pull request uses develop as the base branch
  • Tests for the changes have been added
  • All new and existing tests passed

Additional steps for pull requests adding a new feature

  • An issue describing the proposed change exists
  • The pull request includes a link to the issue
  • The change was discussed and approved in the issue
  • Docs have been added / updated

@micahsmith micahsmith marked this pull request as ready for review August 5, 2025 19:00

@aozherelyeva aozherelyeva left a comment

Thanks for the contribution!

@aozherelyeva aozherelyeva merged commit 1d78f42 into JetBrains:develop Aug 6, 2025
4 of 5 checks passed
@micahsmith micahsmith deleted the msmith/llm-model-document-support branch August 6, 2025 13:16