[CORE-1620] Migrate ntp_archiver uploads to segment_collector_stream interface #25951

Conversation
Force-pushed from c1963fb to a3928ed
Force-pushed from a3928ed to 0f7e1cb
Force-pushed from 0f7e1cb to 08a1cac
Force-pushed from 08a1cac to 9fc4458
@Lazin FYI
Pull Request Overview
This PR migrates NTP uploads to the new segment_collector_stream interface and updates associated types and APIs. Key changes include updating function signatures to use new stream-based types, renaming and refactoring functions in archival_policy and ntp_archiver, and switching partition pointer types from ss::lw_shared_ptr to raw pointers in async uploader files.
Reviewed Changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
tests/rptest/utils/si_utils.py | Enhanced assertion message to include restored_ntps for better debugging. |
src/v/cluster/archival/tests/async_data_uploader_test.cc | Updated get_test_partition() to explicitly obtain the partition via get(). |
src/v/cluster/archival/segment_reupload.h | Introduced new types, parameters, and constructors for reupload candidates and adjusted function signatures. |
src/v/cluster/archival/ntp_archiver_service.h | Modified parameter types (from candidate to stream) and renamed internal functions to support the new streaming interface. |
src/v/cluster/archival/async_data_uploader.h/.cc | Changed partition argument types from smart pointers to raw pointers. |
src/v/cluster/archival/archival_policy.h | Renamed candidate retrieval functions to better reflect their behavior. |
src/v/cluster/archival/adjacent_segment_merger.cc | Updated logging and offset computations to use new stream fields. |
Files not reviewed (2)
- src/v/cluster/archival/tests/BUILD: Language not supported
- src/v/cluster/archival/tests/CMakeLists.txt: Language not supported
Comments suppressed due to low confidence (4)
src/v/cluster/archival/segment_reupload.h:245
- The signature of find_replacement_boundary now requires a mode parameter; please update the related comments or documentation to clarify how this parameter influences the replacement boundary computation (see the doc-comment sketch after this list).

```cpp
model::offset find_replacement_boundary(segment_collector_mode mode) const;
```

src/v/cluster/archival/ntp_archiver_service.h:455
- The function do_upload_local now accepts a segment_collector_stream instead of an upload_candidate; please update the function documentation to reflect this interface change (also covered in the sketch after this list).

```cpp
ss::future<bool> do_upload_local(
  archival_stm_fence fence,
  segment_collector_stream strm,
  std::optional<std::reference_wrapper<retry_chain_node>> source_rtc);
```

src/v/cluster/archival/archival_policy.h:36
- [nitpick] The renaming of get_next_candidate to get_next_compacted_segment (and subsequently to get_next_segment) may cause confusion; updating the inline documentation to explain the distinctions between these methods is recommended.

```cpp
ss::future<segment_collector_stream_result> get_next_compacted_segment(
```

src/v/cluster/archival/adjacent_segment_merger.cc:222
- [nitpick] The offset field used for computing the next offset has changed from candidate.final_offset to locks.end_offset; verify that all consumers of this value correctly interpret the new field and update related comments if necessary.

```cpp
auto next = model::next_offset(find_res.locks.value().end_offset);
```
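As referenced above, possible doc-comment wordings for the first two items. This is an editor's sketch: the phrasing is assumed, not taken from the PR.

```cpp
// Sketch of possible doc comments; wording is an assumption, not the authors'.

// Returns the offset at which replacement segments begin. 'mode' controls
// how the collector gathers segments (e.g. compacted vs. regular
// collection), which in turn shifts where the boundary may fall.
model::offset find_replacement_boundary(segment_collector_mode mode) const;

// Uploads data drawn from 'strm', a segment_collector_stream, rather than
// from a materialized upload_candidate; the stream appears to own the byte
// stream and read locks for the duration of the upload.
ss::future<bool> do_upload_local(
  archival_stm_fence fence,
  segment_collector_stream strm,
  std::optional<std::reference_wrapper<retry_chain_node>> source_rtc);
```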
```diff
 static ss::future<result<std::unique_ptr<segment_upload>>>
 make_segment_upload(
-  ss::lw_shared_ptr<cluster::partition> part,
+  cluster::partition* part,
```
The type change from ss::lw_shared_ptr<cluster::partition> to a raw pointer requires careful lifetime management; please ensure that the caller guarantees the partition object's validity throughout the segment_upload usage.
```diff
-cluster::partition* part,
+ss::lw_shared_ptr<cluster::partition> part,
```
bad bot
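For context, the ownership pattern the change relies on (per the commit message later in this page: the archiver holds the partition and the uploader only borrows it) looks roughly like this hedged sketch. Only make_segment_upload comes from the diff; upload_one and do_upload are illustrative names.

```cpp
// Sketch of the assumed ownership pattern: the caller keeps the
// lw_shared_ptr alive across the upload, so the borrowed raw pointer
// cannot dangle.
ss::future<> upload_one(ss::lw_shared_ptr<cluster::partition> part) {
    // The coroutine frame owns 'part' across every co_await below.
    auto upl = co_await make_segment_upload(part.get() /* other args elided */);
    co_await do_upload(std::move(upl)); // illustrative
    // 'part' is released only after the upload completes.
}
```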
Force-pushed from 9fc4458 to fe99ed0
Force-pushed from 196faae to d06aa20
Force-pushed from d06aa20 to d704fc6
LGTM, the upload path looks way cleaner than ever before
```cpp
// background operation. We can't background it as is because the
// 'upload_index' call is taking 'rtc' as a reference. So there should be
// some wrapper for this call.
// QUESTION(oren): if we did background this, where would the future
```
nit: it can be wrapped using ssx::spawn_with_gate and the rtc could be captured, or something similar
I think I'm going to punt this to a followup. A bunch of unit tests need small changes and the rtc accounting is a bit of a pain since they're neither movable nor copyable as written. Better as an isolated change IMO.
i guess we could use a lw_shared_ptr<retry_chain_node> w/o issue. point stands about test changes though.
the followup is OK
it's also not super important, I think that previously it was not running in the background
> previously it was not running in the background

correct. seems like a sensible change to me though 🙂
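A minimal sketch of the wrapper suggested in this thread, assuming ssx::spawn_with_gate (from Redpanda's ssx utilities), an archiver-owned gate, and a child-node constructor on retry_chain_node; _gate, parent_rtc, index, and upload_index stand in for the real members.

```cpp
// Sketch only: hold the retry_chain_node via lw_shared_ptr so the
// backgrounded continuation keeps it alive after the caller returns.
auto rtc = ss::make_lw_shared<retry_chain_node>(&parent_rtc);
ssx::spawn_with_gate(_gate, [this, rtc, ix = std::move(index)]() mutable {
    // 'rtc' lives as long as the lambda does, and the gate holds the
    // archiver open until the returned future resolves.
    return upload_index(*rtc, std::move(ix));
});
```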
```cpp
auto lazy_abort = lazy_abort_source{
  [this]() { return upload_should_abort(); }};
auto stream = strm.create_input_stream();
auto [upload_stream, indexing_stream] = input_stream_fanout<2>(
```
Not right now, but at some point we should get rid of this. Currently, the byte stream is split into two parts: one is uploaded and the other is parsed to create an index. But after conversion to storage::log_reader it will make no sense at all. We will get record batches from the log reader and then serialize them for upload. Somewhere in the middle we can build the index state incrementally.
interesting. the way i have this organized in #26099 has this logic staying largely the same, but with a log_reader feeding the input stream rather than a concat_segment_reader_view. Is there a clear disadvantage to doing it this way?
I think it's fine to do it this way. The disadvantage is that we need to serialize the data coming from the log_reader first and then feed it to the fanout stream, where one of the branches deserializes it again (only the headers, but anyway). We could instead build the index from the data before serialization, which would be more efficient. But it's not necessary for correctness.
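A sketch of that incremental approach: consume each batch once, feed the index from the already-parsed header, then serialize for upload, so no fanout branch needs to re-parse anything. All type and member names here are assumptions for illustration, not the actual Redpanda interfaces.

```cpp
// Illustrative only: the index API, serialize() helper, and wiring are
// assumptions.
struct indexing_serializer {
    ss::future<ss::stop_iteration> operator()(model::record_batch b) {
        // Index state is built from the parsed header, pre-serialization.
        index.add(b.base_offset(), bytes_out); // assumed index API
        // ...then the batch is serialized exactly once for upload.
        auto payload = serialize(std::move(b)); // assumed helper -> iobuf
        bytes_out += payload.size_bytes();
        // write_iobuf_to_output_stream: Redpanda helper, assumed available.
        co_await write_iobuf_to_output_stream(std::move(payload), out);
        co_return ss::stop_iteration::no;
    }
    offset_index& index;          // assumed index type
    ss::output_stream<char>& out; // upload stream
    size_t bytes_out{0};
};
```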
Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
The ntp_archiver doesn't store the smart pointer, so there is no point in passing a smart pointer into the async_data_uploader when it is invoked by the ntp_archiver. Signed-off-by: Evgeny Lazin <[email protected]> Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
…tream Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
Mostly related to upload_candidates and whatnot Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
Force-pushed from 546afcf to 11d1141
force push to fix up some stale comments and rebase dev to fix merge conflict
A previous PR[1] refactored upload_segment and in so doing added retry_strategy::disallow to the retry_chain_node governing the upload. As a result, we saw an uptick in:

- 'cloud_storage_failed_uploads'
- 'cloud_storage_bytes_sent'
- 'io_queue_total_read_bytes'

along with an increased incidence of 'backoff quota exceeded' logs from cloud_io. cloud_io::remote::upload_stream increments 'cloud_storage_failed_uploads' in a number of failure cases, but given the increased frequency of the backoff-quota log line, it is all but certain that the uptick in failures comes from rtc node retries being exhausted by transient errors that the retry logic inside cloud_io would previously have masked. For now, we should return to the default retry strategy (exponential backoff) and assess the viability of offloading retries to the archival loop at a later time.

[1] redpanda-data#25951

Signed-off-by: Oren Leiman <[email protected]>
(cherry picked from commit a9dea64)
In a previous PR[1] we began to rely on the archiver loop to retry, and moved away from relying on `cloud_io::remote` for retries in two ways:

1. setting an explicit `disallow` retry policy on the retry node passed to the remote, and
2. setting the `max_retries` passed to `remote::upload_segment()` to 1.

In practice, we saw that _not_ relying on the remote resulted in an uptick in the `vectorized_cloud_storage_failed_uploads` metric, which is monitored and alerted on. In [2] we reverted (1), but didn't notice (2). This commit reverts (2).

[1] redpanda-data#25951
[2] redpanda-data#26969

(cherry picked from commit 7f409da)
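For reference, the two knobs those commit messages describe, in a hedged sketch. Argument order and values are assumptions; the real signatures live in utils/retry_chain_node.h and the cloud_io remote headers.

```cpp
using namespace std::chrono_literals;

// (1) The retry policy on the node handed to cloud_io: 'disallow' pushed
//     all retrying up to the archival loop; reverted in [2] above.
retry_chain_node local_rtc(
  10s, 100ms,               // timeout / initial backoff (values assumed)
  retry_strategy::disallow, // vs. the default exponential backoff
  &parent_rtc);

// (2) A retry budget of 1 disables retries regardless of strategy; the
//     second commit above reverts this (call shape illustrative):
// co_await remote.upload_segment(..., /*max_retries=*/1);
```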
Rather than a collection of segments for reading, segment_collector produces a segment_collector_stream struct. Includes corresponding changes to ntp_archiver to upload from one of these.

Backports Required
Release Notes
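From the fragments visible in this review (create_input_stream() in the upload path, locks.end_offset in adjacent_segment_merger, and the segment_collector_stream_result return type), the new interface appears to look roughly like the following. Everything beyond those observed names is an assumption; the authoritative definition is src/v/cluster/archival/segment_reupload.h.

```cpp
// Rough shape inferred from the review excerpts; fields beyond the
// observed names are assumptions.
struct segment_collector_stream {
    // Single-use byte stream over the collected segment data
    // (seen as strm.create_input_stream() in do_upload_local).
    ss::input_stream<char> create_input_stream();
    // ... offset range, term, and upload metadata (assumed)
};

struct segment_collector_stream_result {
    // Read locks held over the source segments; adjacent_segment_merger
    // computes the next offset from locks.value().end_offset.
    std::optional<segment_read_locks> locks; // lock-holder type assumed
    // ... the stream itself, or an error (assumed)
};
```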