.Net: Add support for audio and binary tags to chat prompt parser #11919

glorious-beard · 2025-05-06T20:38:40Z

Motivation and Context

Why is this change required?

This change allows template parsers, such as the YAML parser, to embed content types other than text and images for LLMs that support additional content types, such as PDFs for OpenAI and DOCX files for Claude. Without this capability, functions whose prompts have attachments would have to build their chat history manually in code.

What problem does it solve?

See above

What scenario does it contribute to?

Using additional content types beyond visuals and audio in user messages

Open Issues Addressed

Fixes Expanding ChatPromptParser to handle other content types #11044

Description

Chat Prompt Parser

To preserve backward compatibility, rather than consolidating binary content types, I chose to add new content types so that LLM chat service providers can opt in to them. This also reduces the chance of breaking existing code.

3 new content types are created:

PdfContent for PDF files. Uses the tag "<pdf>". Allows for Base64 data URIs or standard URIs, similar to ImageContent.
DocContent for MS Word .doc files. Uses the tag "<doc>". Allows for Base64 data URIs or standard URIs, similar to ImageContent.
DocxContent for MS Word .docx files. Uses the tag "<docx>". Allows for Base64 data URIs or standard URIs, similar to ImageContent.

(NOTE: DocContent and DocxContent are mainly separate because they have different MIME types and different content formats, though they could easily be consolidated into a single tag and just let the LLM provider handle distinguishing between "doc" and "docx" files. Alternately, I could also see the case for dropping ".doc" support and requiring the caller to only use ".docx".)

In addition, the following two content types are now parsed from the XML:

AudioContent - Parses the tag "<audio>" with either Base64 data URIs or standard URIs, similar to ImageContent.
BinaryContent - Parses the tag "<file>" with either Base64 data URIs or standard URIs, similar to ImageContent.

Here is a sample:

<message role='user'>
  This part will be discarded upon parsing
  <text>Make sense of this random assortment of stuff.</text>
  <image>https://fake-link-to-image/</image>
  <audio>data:audio/wav;base64,UklGRiQAAABXQVZFZm10IBAAAAABAAEAIlYAAACABAAZGF0YVgAAAAA</audio>
  <pdf>data:application/pdf;base64,JVBERi0xLjQKJeLjz9MKMyAwIG9iago8PC9UeXBlL1hSZWYvUGFnZXMgNiAwIFIKL1R5cGUvUGFnZS9NZWRpYUJveCBbMCAwIDQ4MCA1MF0KL0NvbnRlbnRzIDw8L0V4dEdTdGF0ZSA8PC9JRCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GMSA8PC9GMiA8PC9GMyA8PC9GNCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GNSA8PC9GNiA8PC9GNyBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GOCAvPj4KZW5kb2JqCjEwIDAgb2JqCjw8L1R5cGUvUGFnZS9NYWRlYUJveCBbMCAwIDQ4MCA1MF0KL0NvbnRlbnRzIDw8L0V4dEdTdGF0ZSA8PC9JRCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GMSA8PC9GMiA8PC9GMyA8PC9GNCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GNSA8PC9GNiA8PC9GNyBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GOCAvPj4KZW5kb2JqCjEwIDAgb2JqCjw8L1R5cGUvUGFnZS9NYWRlYUJveCBbMCAwIDQ4MCA1MF0KL0NvbnRlbnRzIDw8L0V4dEdTdGF0ZSA8PC9JRCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GMSA8PC9G</pdf>
  <pdf>https://fake-link-to-pdf/</pdf>
  <doc>data:application/msword;base64,UEsDBBQAAAAIAI+Q1k5a2gAAABQAAAAIAAAAbmFtZS5kb2N4VVQJAAD9AAAACwAAAB4AAAAAA==</doc>
  <doc>https://fake-link-to-doc/</doc>
  <docx>data:application/vnd.openxmlformats-officedocument.wordprocessingml.document;base64,UEsDBBQAAAAIAI+Q1k5a2gAAABQAAAAIAAAAbmFtZS5kb2N4VVQJAAD9AAAACwAAAB4AAAAAA==</docx>
  <docx>https://fake-link-to-docx/</docx>
  <file>data:application/octet-stream;base64,UEsDBBQAAAAIAI+Q1k5a2gAAABQAAAAIAAAAbmFtZS5kb2N4VVQJAAD9AAAACwAAAB4AAAAAA==</file>
  <file>https://fake-link-to-binary/</file>
  This part will also be discarded upon parsing
</message>

Amazon Bedrock

Modified the Converse API request generator to handle the subset of binary content supported by Amazon Bedrock (PDF, DOC, DOCX, and Image), as documented here.

OpenAI

Modified the client to handle PDF content, audio content, and file references when generating a request to an OpenAI (or OpenAI-compatible) client.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
All unit tests pass, and I have added new tests where possible
I didn't break anyone 😄

rogerbarreto · 2025-05-30T09:45:52Z

@glorious-beard I updated the proposal to be more abstract, since this change applies to the SemanticKernel.Abstractions package.

Since there will be many different types of documents and binary files, it is better to stay broad rather than specific: instead of introducing special content types, we should use the existing ones that already work.

Given that, I updated the logic to accept a mimetype attribute, as in <binary mimetype="type/subtype"/>, to cover the scenarios where you provide a URI.

For data URI content, the MIME type is picked up automatically from the data:mimeType prefix.
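The MIME type rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in ChatPromptParser.cs: the function name `resolve_mime_type` is hypothetical, and letting the data URI win over an explicit attribute (and falling back to the attribute when the data URI has no media type) are assumptions that the PR description does not spell out.

```python
from typing import Optional


def resolve_mime_type(payload: str, mimetype_attr: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: decide the MIME type for a tag's payload.

    payload       -- the text inside the tag (a data URI or a regular URI)
    mimetype_attr -- the value of the optional mimetype attribute, if any
    """
    payload = payload.strip()
    if payload.startswith("data:"):
        # A data URI looks like data:<mediatype>[;base64],<data>.
        # Everything before the first comma is the header.
        header = payload[len("data:"):].split(",", 1)[0]
        # The media type is the header up to the first ';' parameter.
        media_type = header.split(";", 1)[0]
        # Assumption: an empty media type falls back to the attribute.
        return media_type or mimetype_attr
    # Regular URI: the only available hint is the attribute.
    return mimetype_attr


print(resolve_mime_type("data:application/pdf;base64,UklGRiQAAAB..."))
print(resolve_mime_type("https://fake-link-to-pdf/", "application/pdf"))
print(resolve_mime_type("https://fake-link-to-binary/"))
```

With a regular URI and no attribute the sketch returns `None`, leaving it to the caller (for example a connector) to decide how to treat content of unknown type.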

rogerbarreto · 2025-05-30T10:05:52Z

Updated PR Description

Motivation and Context

Enhance the Chat Prompt XML parsing capability to also support audio and documents.

Fixes Expanding ChatPromptParser to handle other content types #11044

Description

The following two content types are now supported in the Chat Prompt XML:

AudioContent - Parses the tag <audio mimetype="type/subtype"> with either Base64 data URIs or standard URIs, similar to ImageContent.
BinaryContent - Parses the tag <binary mimetype="type/subtype"> with either Base64 data URIs or standard URIs, similar to ImageContent.

The mimetype attribute is optional, and can be omitted for Base64 data URIs.

Here is a sample:

<message role='user'>
  This part will be discarded upon parsing
  <text>Summarize all the contents I provided in this message.</text>
  <image mimetype="image/png">https://fake-link-to-image/</image>
  <audio>data:audio/wav;base64,UklGRiQAAAB...</audio>
  <binary>data:application/pdf;base64,UklGRiQAAAB...</binary>
  <binary mimetype="application/pdf">https://fake-link-to-pdf/</binary>
  <binary>data:application/msword;base64,UklGRiQAAAB...</binary>
  <binary mimetype="application/octet-stream">https://fake-link-to-binary/</binary>
</message>
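To make the sample easier to reason about, here is a rough Python sketch of how such a message could be turned into a list of content items. It is not the actual C# parser: `parse_message` and `TAG_TO_CONTENT` are hypothetical names, the content names are plain strings used only as labels, and silently skipping unknown tags is an assumption. It does reproduce the behavior called out in the sample, where text placed directly inside `<message>` is discarded.

```python
import xml.etree.ElementTree as ET

# Tag names are taken from the PR description. The values are labels only;
# this mapping is illustrative and not the parser's real lookup table.
TAG_TO_CONTENT = {
    "text": "TextContent",
    "image": "ImageContent",
    "audio": "AudioContent",
    "binary": "BinaryContent",
}


def parse_message(xml_text):
    """Hypothetical helper: return (role, items) for one <message> element."""
    root = ET.fromstring(xml_text)
    items = []
    # Iterating over child elements only means loose text directly inside
    # <message> (root.text and each child's tail) is never collected.
    for child in root:
        kind = TAG_TO_CONTENT.get(child.tag)
        if kind is None:
            continue  # assumption: unknown tags are skipped
        value = (child.text or "").strip()
        items.append({
            "kind": kind,
            "mimetype": child.get("mimetype"),  # None when the attribute is omitted
            "is_data_uri": value.startswith("data:"),
            "value": value,
        })
    return root.get("role"), items


SAMPLE = """<message role='user'>
  This part will be discarded upon parsing
  <text>Summarize all the contents I provided in this message.</text>
  <audio>data:audio/wav;base64,UklGRiQAAAB...</audio>
  <binary mimetype="application/pdf">https://fake-link-to-pdf/</binary>
</message>"""

role, items = parse_message(SAMPLE)
for item in items:
    print(role, item["kind"], item["mimetype"], item["is_data_uri"])
```

Keeping the MIME type as a separate field next to the raw value mirrors the idea in the updated description: one generic binary item carries its type as data instead of needing a dedicated content class per document format.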

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
All unit tests pass, and I have added new tests where possible
I didn't break anyone 😄

dotnet/src/SemanticKernel.Abstractions/AI/ChatCompletion/ChatPromptParser.cs

glorious-beard added 2 commits May 6, 2025 13:10

feat: added handling for audio, pdf, docx, doc, and generic binary content. This is to support microsoft#11044.

32d48eb

fix: switched to factory function for creating kernel content.

3e7be45

glorious-beard requested a review from a team as a code owner May 6, 2025 20:38

glorious-beard changed the title ~~Glorious-beard/11044-expand-chat-prompt-parser~~ .Net: Add support for audio, pdf, doc, and docx to chat prompt parser May 6, 2025

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

b40bc84

rogerbarreto added the ai connector and needs discussion labels May 8, 2025

rogerbarreto assigned rogerbarreto and glorious-beard May 8, 2025

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

15d1751

markwallace-microsoft added the .NET, kernel, and kernel.core labels May 8, 2025

glorious-beard and others added 4 commits May 9, 2025 11:45

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

b02ac5e

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

d0f7f2b

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

f7384d8

Abstract chat parser update

ab7def9

rogerbarreto removed the needs discussion label May 30, 2025

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

4380df4

rogerbarreto temporarily deployed to integration May 30, 2025 09:41 — with GitHub Actions Inactive

SergeyMenshykh approved these changes May 30, 2025

dotnet/src/SemanticKernel.Abstractions/AI/ChatCompletion/ChatPromptParser.cs

Address PR comments

b48b978

markwallace-microsoft added the documentation label Jun 4, 2025

rogerbarreto changed the title ~~.Net: Add support for audio, pdf, doc, and docx to chat prompt parser~~ .Net: Add support for audio and binary tags to chat prompt parser Jun 4, 2025

rogerbarreto temporarily deployed to integration June 4, 2025 16:37 — with GitHub Actions Inactive

markwallace-microsoft approved these changes Jun 4, 2025

rogerbarreto added this pull request to the merge queue Jun 5, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2025

rogerbarreto added this pull request to the merge queue Jun 5, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2025

rogerbarreto added this pull request to the merge queue Jun 5, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2025

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

2cd0136

rogerbarreto enabled auto-merge June 6, 2025 15:20

rogerbarreto temporarily deployed to integration June 6, 2025 15:20 — with GitHub Actions Inactive

rogerbarreto added this pull request to the merge queue Jun 6, 2025

Merged via the queue into microsoft:main with commit 5c04bbe Jun 6, 2025
19 checks passed
