Expanding ChatPromptParser to handle other content types #11044

@sophialagerkranspandey

Description

Discussed in #11012

Originally posted by glorious-beard March 17, 2025
Now that OpenAI accepts file inputs (PDFs) in addition to text and images, are there plans to let ChatPromptParser parse additional content tags for other content types, such as BinaryContent and AudioContent? (Claude can handle PDFs too - see here)

Additional tags could include:

  • '<audio> (base64 audio stream) </audio>' - Parsed into an AudioContent instance
  • '<binary mimeType="(mime type)"> (base64 content) </binary>' - Parsed into a BinaryContent instance, with mimeType defaulting to "application/octet-stream" if not present
  • '<pdf> (base64 content) </pdf>' - Parsed into a new PdfContent class derived from BinaryContent
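
To make the proposed tag semantics concrete, here is a rough, language-neutral sketch in Python. It is not Semantic Kernel's ChatPromptParser and does not use its API: the classes below are hypothetical stand-ins that only borrow the names from the list above, and `parse_content_tags` is an invented helper. It assumes the tags arrive as well-formed XML with base64 payloads, and that a PDF defaults to the "application/pdf" MIME type (an assumption not stated in the proposal).

```python
import base64
import xml.etree.ElementTree as ET
from dataclasses import dataclass

DEFAULT_MIME = "application/octet-stream"


@dataclass
class BinaryContent:
    """Hypothetical stand-in: raw bytes plus a MIME type."""
    data: bytes
    mime_type: str = DEFAULT_MIME


@dataclass
class AudioContent(BinaryContent):
    """Hypothetical stand-in for decoded audio bytes."""


@dataclass
class PdfContent(BinaryContent):
    """Hypothetical PdfContent derived from BinaryContent, as proposed."""
    mime_type: str = "application/pdf"  # assumed default, not from the proposal


def _decode(element: ET.Element) -> bytes:
    # Surrounding whitespace inside the tag is dropped before decoding.
    return base64.b64decode((element.text or "").strip())


def parse_content_tags(fragment: str) -> list[BinaryContent]:
    """Parse <audio>, <binary>, and <pdf> tags into content objects."""
    # Wrap the fragment so several sibling tags form one XML document.
    root = ET.fromstring(f"<root>{fragment}</root>")
    items: list[BinaryContent] = []
    for element in root:
        if element.tag == "audio":
            items.append(AudioContent(_decode(element)))
        elif element.tag == "pdf":
            items.append(PdfContent(_decode(element)))
        elif element.tag == "binary":
            # mimeType falls back to application/octet-stream when absent.
            mime = element.get("mimeType", DEFAULT_MIME)
            items.append(BinaryContent(_decode(element), mime))
        # Any other tag is ignored in this sketch.
    return items
```

In this sketch, a `<binary>` tag without a `mimeType` attribute yields a `BinaryContent` whose `mime_type` is "application/octet-stream", matching the default described above. A real implementation would also need to decide how these tags interact with the existing text and image tags.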

My application makes heavy use of the YAML prompt templates, so this would save me from manually building chat histories for any operation involving inputs beyond text and images.

I volunteer to add the above if it's not already planned for a near-term release.

(Maybe this is an extension of this discussion?)

Status

Sprint: Done