Conversation

@Gezi-lzq Gezi-lzq (Contributor) commented Aug 31, 2025

Config

  • (Add) automq.table.topic.convert.value.type (String)

    • Explanation: Specifies how to parse Kafka record values. Supported values are raw, string, by_schema_id, and by_latest_schema. A Schema Registry URL is required for by_schema_id and by_latest_schema.
  • (Add) automq.table.topic.convert.key.type (String)

    • Explanation: Specifies how to parse Kafka record keys. Supported values are raw, string, by_schema_id, and by_latest_schema. A Schema Registry URL is required for by_schema_id and by_latest_schema.
  • (Add) automq.table.topic.convert.value.subject (String, Optional)

    • Explanation: The Schema Registry subject name for value schemas. Defaults to {topic-name}-value if not specified.
  • (Add) automq.table.topic.convert.value.message.full.name (String, Optional)

    • Explanation: The fully qualified message name for Protobuf value schemas, used when the schema contains multiple message types. Defaults to the first message type.
  • (Add) automq.table.topic.convert.key.subject (String, Optional)

    • Explanation: The Schema Registry subject name for key schemas. Defaults to {topic-name}-key if not specified.
  • (Add) automq.table.topic.convert.key.message.full.name (String, Optional)

    • Explanation: The fully qualified message name for Protobuf key schemas, used when the schema contains multiple message types. Defaults to the first message type.
  • (Add) automq.table.topic.transform.value.type (String)

    • Explanation: The transformation to apply to the record value after conversion. Supported values are none, flatten, and flatten_debezium.
      • none: No transformation is applied.
      • flatten: Extracts fields from structured records, promoting nested fields to the top level.
      • flatten_debezium: Processes Debezium CDC events, extracting the before/after state based on the operation type.
  • (Deprecated) automq.table.topic.schema.type (String)

    • Explanation: This configuration is deprecated. Use separate converter and transform configurations instead.
    • Migration: schema.type=schema maps to convert.value.type=by_schema_id plus transform.value.type=flatten (see the example after this list).

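As a concrete illustration of these settings and the migration path above, here is a minimal sketch that applies them to an existing topic with Kafka's AdminClient. The topic name and bootstrap address are illustrative; the property names and values are the ones introduced in this PR.

import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TableTopicConfigExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            // Equivalent of the deprecated automq.table.topic.schema.type=schema:
            List<AlterConfigOp> ops = List.of(
                new AlterConfigOp(new ConfigEntry(
                    "automq.table.topic.convert.value.type", "by_schema_id"),
                    AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry(
                    "automq.table.topic.transform.value.type", "flatten"),
                    AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}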
Changelist:

  • Add RecordProcessorFactory to support dynamic creation of processors based on convert and transform configs
  • Introduce ConverterFactory for unified converter instantiation and management
  • Replace RegistryConverterFactory with more flexible ConverterFactory implementation
  • Add SchemaFormat enum to support different schema formats (Avro, Protobuf, Raw, String)
  • Implement TableTopicConvertType and TableTopicTransformType enums for config-driven processing (a simplified sketch follows this list)
  • Enhance Converter interface to support separate key/value conversion with unified RecordData output
  • Add RecordAssembler for assembling processed records into final output format
  • Refactor AvroRegistryConverter and ProtobufRegistryConverter to work with new converter architecture
  • Add StringConverter for simple string-based conversions
  • Replace ValueUnwrapTransform with FlattenTransform for improved field extraction
  • Update DebeziumUnwrapTransform with enhanced CDC event processing
  • Add SchemalessTransform for handling raw record transformations
  • Remove obsolete KafkaRecordTransform and related classes
  • Update TopicConfig with new converter and transform configuration options
  • Add comprehensive unit tests for new processor factory and transforms
  • Include protobuf test schema and related test utilities
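To make the config-driven wiring concrete, the following is a minimal, self-contained sketch of how a processor could be assembled from the convert and transform types. All names below are simplified stand-ins for illustration, not the actual classes added in this PR.

import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class ProcessorFactorySketch {
    // Simplified stand-ins for the enums named in the changelist.
    enum ConvertType { RAW, STRING, BY_SCHEMA_ID, BY_LATEST_SCHEMA }
    enum TransformType { NONE, FLATTEN, FLATTEN_DEBEZIUM }

    interface Converter { Object convert(byte[] payload); }
    interface RecordProcessor { Object process(byte[] key, byte[] value); }

    static Converter converterFor(ConvertType type) {
        switch (type) {
            case RAW:    return payload -> payload;                                  // pass bytes through
            case STRING: return payload -> new String(payload, StandardCharsets.UTF_8);
            default:     // BY_SCHEMA_ID / BY_LATEST_SCHEMA would consult a Schema Registry
                throw new UnsupportedOperationException("registry lookup omitted in this sketch");
        }
    }

    static Function<Object, Object> transformFor(TransformType type) {
        // Placeholder only: FLATTEN would promote nested fields to the top level
        // (e.g. {"a":{"b":1}} becomes a record with field "a.b"), and FLATTEN_DEBEZIUM
        // would extract the before/after state of a CDC event.
        return value -> value;
    }

    static RecordProcessor create(ConvertType convert, TransformType transform) {
        Converter converter = converterFor(convert);
        Function<Object, Object> t = transformFor(transform);
        return (key, value) -> t.apply(converter.convert(value));
    }
}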

@Gezi-lzq Gezi-lzq force-pushed the feat/icberg-writer branch 4 times, most recently from 5211a93 to 80502e9, on August 31, 2025 at 11:59
… conversion/transform pipeline

- Add RecordProcessorFactory to support dynamic creation of processors based on schema and transform configs
- Refactor RegistryConverterFactory for improved schema format handling and converter instantiation
- Implement SchemaFormat, TableTopicConvertType, and TableTopicTransformType enums for config-driven processing
- Enhance Converter interface and conversion records to include key, value, and timestamp fields
- Refactor AvroRegistryConverter and ProtobufRegistryConverter to return unified RecordData objects
- Add ProtoToAvroConverter for robust Protobuf-to-Avro conversion
- Update transform chain: add KafkaMetadataTransform for metadata enrichment, refactor DebeziumUnwrapTransform
- Update DefaultRecordProcessor and TransformContext to support partition-aware processing
- Improve error handling and code clarity across conversion and transform modules
@Gezi-lzq Gezi-lzq force-pushed the feat/icberg-writer branch from 80502e9 to 0b078da on August 31, 2025 at 12:18
@Gezi-lzq Gezi-lzq changed the title from "feat(process): introduce flexible record processor factory and enrich conversion/transform pipeline" to "feat(process): introduce record processor factory and enrich conversion/transform pipeline" on Aug 31, 2025
List<Schema.Field> originalFields = new ArrayList<>();
for (Schema.Field field : originalSchema.getFields()) {
// Create a new Schema.Field instance for each original field
originalFields.add(new Schema.Field(field.name(), field.schema(), field.doc(), field.defaultVal(), field.order()));
Collaborator

Why do we need create a new Schema.Field here instead of reusing the old Schema.Field?

@Gezi-lzq Gezi-lzq (Contributor, Author) commented Sep 3, 2025

The newly created Field must be used here; reusing the existing one fails because its position has already been set (f.position != -1):

for (Field f : fields) {
    if (f.position != -1) {
        throw new AvroRuntimeException("Field already used: " + f);
    }
    ....
}

https://github.com/apache/avro/blob/c53857b8c9694c2f9b8cb071dd2ef617d5cda8b7/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L966-L986
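For illustration, here is a small standalone reproduction of this behavior against the Avro API (the record and field names are made up for the demo):

import java.util.List;

import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class FieldReuseDemo {
    public static void main(String[] args) {
        Schema original = SchemaBuilder.record("Original").fields()
                .requiredString("name").endRecord();
        // This Field is already attached to a schema, so its position is set.
        Schema.Field used = original.getField("name");

        try {
            Schema.createRecord("Copy", null, null, false, List.of(used));
        } catch (AvroRuntimeException e) {
            System.out.println(e.getMessage()); // Field already used: ...
        }

        // A freshly constructed Field starts with position == -1 and attaches cleanly.
        Schema.Field fresh = new Schema.Field(used.name(), used.schema(),
                used.doc(), used.defaultVal(), used.order());
        Schema copy = Schema.createRecord("Copy", null, null, false, List.of(fresh));
        System.out.println(copy.getFields());
    }
}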

@Gezi-lzq Gezi-lzq force-pushed the feat/icberg-writer branch 2 times, most recently from 6301e2f to 046fd8f, on September 3, 2025 at 12:11
@Gezi-lzq Gezi-lzq requested a review from Copilot September 3, 2025 12:28
@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a comprehensive record processing factory and transformation pipeline that replaces the previous simplified schema-based approach with a flexible converter/transform architecture. The changes add support for granular configuration of key/value conversion and transformation types while maintaining backward compatibility with deprecated schema-based configurations.

Key changes:

  • Introduced a RecordProcessorFactory for dynamic processor creation based on configuration
  • Added new converter types (RAW, STRING, BY_SCHEMA_ID, BY_LATEST_SCHEMA) and transform types (NONE, FLATTEN, FLATTEN_DEBEZIUM)
  • Replaced the monolithic schema type configuration with separate key/value conversion and transformation settings

Reviewed Changes

Copilot reviewed 42 out of 42 changed files in this pull request and generated 3 comments.

Summary per file:

  • LogConfig.java: Added new table topic converter and transform configuration options
  • TableTopicConvertType.java: Enum defining converter types (raw, string, schema-based)
  • TableTopicTransformType.java: Enum defining transformation types (none, flatten, debezium)
  • TableTopicSchemaType.java: Added a NONE option to the existing schema type enum
  • RecordProcessorFactoryTest.java: Comprehensive test suite for the new processor factory
  • Various converter classes: New converter implementations for different data types
  • Various transform classes: Updated and new transform implementations
  • Test classes: Updated to work with the new architecture
Comments suppressed due to low confidence (1)

core/src/test/java/kafka/automq/table/process/RecordProcessorFactoryTest.java:1

  • [nitpick] Inconsistent case handling for enum names. Other test methods use the enum name directly (e.g., TableTopicTransformType.NONE.name) while this line applies toLowerCase(). Consider using a consistent approach throughout the test class.


@Gezi-lzq Gezi-lzq merged commit 01e5371 into main Sep 5, 2025
6 checks passed
@Gezi-lzq Gezi-lzq deleted the feat/icberg-writer branch September 5, 2025 02:29
Gezi-lzq added a commit that referenced this pull request Sep 5, 2025
…on/transform pipeline (#2796)

* feat(process): introduce flexible record processor factory and enrich conversion/transform pipeline

- Add RecordProcessorFactory to support dynamic creation of processors based on schema and transform configs
- Refactor RegistryConverterFactory for improved schema format handling and converter instantiation
- Implement SchemaFormat, TableTopicConvertType, and TableTopicTransformType enums for config-driven processing
- Enhance Converter interface and conversion records to include key, value, and timestamp fields
- Refactor AvroRegistryConverter and ProtobufRegistryConverter to return unified RecordData objects
- Add ProtoToAvroConverter for robust Protobuf-to-Avro conversion
- Update transform chain: add KafkaMetadataTransform for metadata enrichment, refactor DebeziumUnwrapTransform
- Update DefaultRecordProcessor and TransformContext to support partition-aware processing
- Improve error handling and code clarity across conversion and transform modules
Gezi-lzq added a commit that referenced this pull request Sep 5, 2025
…on/transform pipeline (#2796)
superhx pushed a commit that referenced this pull request Sep 5, 2025
…on/transform pipeline (#2796)