Member

@oleiman oleiman commented Apr 26, 2025

segment_collector now produces a segment_collector_stream struct rather than a collection of segments for reading. Includes corresponding changes to ntp_archiver so that it can upload from one of these streams.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.1.x
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@secpanda

secpanda commented Apr 26, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)

@oleiman
Member Author

oleiman commented Apr 26, 2025

/dt

@vbotbuildovich
Collaborator

vbotbuildovich commented Apr 26, 2025

Retry command for Build#65139

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/recovery_mode_test.py::RecoveryModeTest.test_rolling_restart
tests/rptest/tests/delete_records_test.py::DeleteRecordsTest.test_delete_records_segment_deletion@{"cloud_storage_enabled":true,"truncate_point":"one_below_high_watermark"}
tests/rptest/tests/cloud_storage_usage_test.py::CloudStorageUsageTest.test_cloud_storage_usage_reporting
tests/rptest/tests/usage_test.py::UsageTestCloudStorageMetrics.test_usage_manager_cloud_storage
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_higher_level_migration_api
tests/rptest/tests/recovery_mode_test.py::RecoveryModeTest.test_recovery_mode
tests/rptest/tests/consumer_group_recovery_test.py::ConsumerOffsetsRecoveryTest.test_consumer_offsets_partition_recovery
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_conflicting_names
tests/rptest/tests/datalake/datalake_omb_test.py::DatalakeOMBTest.basic_workload_linear_20_test@{"cloud_storage_type":1}
tests/rptest/tests/e2e_shadow_indexing_test.py::EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures@{"cloud_storage_type":1}
tests/rptest/tests/follower_fetching_test.py::FollowerFetchingTest.test_follower_fetching_with_maintenance_mode
tests/rptest/tests/shadow_indexing_compacted_topic_test.py::TSWithAlreadyCompactedTopic.test_initial_upload
tests/rptest/tests/delete_records_test.py::DeleteRecordsTest.test_delete_records_segment_deletion@{"cloud_storage_enabled":true,"truncate_point":"random_offset"}

@vbotbuildovich
Collaborator

vbotbuildovich commented Apr 26, 2025

CI test results

test results on build#65139
test_id test_kind job_url test_status passed
rptest.tests.cloud_storage_usage_test.CloudStorageUsageTest.test_cloud_storage_usage_reporting ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85ce-4a1c-a9f5-fd3caa38a732 FAIL 0/21
rptest.tests.consumer_group_recovery_test.ConsumerOffsetsRecoveryTest.test_consumer_offsets_partition_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85cf-4dc5-aa9b-7cf7f3d493c3 FLAKY 13/21
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_conflicting_names ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85cf-4dc5-aa9b-7cf7f3d493c3 FAIL 0/21
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_higher_level_migration_api ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85ce-4a1c-a9f5-fd3caa38a732 FLAKY 15/21
rptest.tests.datalake.datalake_omb_test.DatalakeOMBTest.basic_workload_linear_20_test.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85ce-4024-9c7d-96faf0c16200 FAIL 0/21
rptest.tests.delete_records_test.DeleteRecordsTest.test_delete_records_segment_deletion.cloud_storage_enabled=True.truncate_point=one_below_high_watermark ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85cd-437f-825b-d9f11893579d FLAKY 7/21
rptest.tests.delete_records_test.DeleteRecordsTest.test_delete_records_segment_deletion.cloud_storage_enabled=True.truncate_point=random_offset ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85ce-4024-9c7d-96faf0c16200 FLAKY 5/21
rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTestWithDisruptions.test_write_with_node_failures.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85ce-4024-9c7d-96faf0c16200 FLAKY 18/21
rptest.tests.follower_fetching_test.FollowerFetchingTest.test_follower_fetching_with_maintenance_mode ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85ce-4024-9c7d-96faf0c16200 FAIL 0/21
rptest.tests.recovery_mode_test.RecoveryModeTest.test_recovery_mode ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85cf-4dc5-aa9b-7cf7f3d493c3 FAIL 0/21
rptest.tests.recovery_mode_test.RecoveryModeTest.test_rolling_restart ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85cd-437f-825b-d9f11893579d FAIL 0/21
rptest.tests.shadow_indexing_compacted_topic_test.TSWithAlreadyCompactedTopic.test_initial_upload ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85ce-4024-9c7d-96faf0c16200 FAIL 0/21
rptest.tests.usage_test.UsageTestCloudStorageMetrics.test_usage_manager_cloud_storage ducktape https://buildkite.com/redpanda/redpanda/builds/65139#0196717a-85ce-4a1c-a9f5-fd3caa38a732 FAIL 0/21
test results on build#65155
test_id test_kind job_url test_status passed
rptest.tests.cloud_storage_chunk_read_path_test.CloudStorageChunkReadTest.test_prefetch_chunks.prefetch=0 ducktape https://buildkite.com/redpanda/redpanda/builds/65155#019679e8-f4e3-4785-a9ef-f7f048a3aeca FLAKY 20/21
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_conflicting_names ducktape https://buildkite.com/redpanda/redpanda/builds/65155#019679e8-f4e6-41f8-83c7-1389933fac42 FLAKY 7/21
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_conflicting_names ducktape https://buildkite.com/redpanda/redpanda/builds/65155#019679ed-eeb1-486e-a3fb-8589abe37f93 FLAKY 6/21
rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/65155#019679e8-f4e6-41f8-83c7-1389933fac42 FLAKY 12/21
rptest.transactions.consumer_offsets_test.VerifyConsumerOffsets.test_consumer_group_offsets ducktape https://buildkite.com/redpanda/redpanda/builds/65155#019679e8-f4e4-48cd-b315-7c721ede3259 FLAKY 18/21
test results on build#65423
test_id test_kind job_url test_status passed
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_creating_and_listing_migrations ducktape https://buildkite.com/redpanda/redpanda/builds/65423#01968e08-5515-4ed9-bf61-4f49f2568f0d FLAKY 20/21
rptest.tests.datalake.iceberg_toggling_test.IcebergTogglingTest.test_iceberg_toggling.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/65423#01968e03-c804-46fe-b079-2148b374fb9f FLAKY 20/21
rptest.tests.partition_balancer_test.PartitionBalancerTest.test_fuzz_admin_ops ducktape https://buildkite.com/redpanda/redpanda/builds/65423#01968e08-5515-4ed9-bf61-4f49f2568f0d FLAKY 18/21
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/65423#01968e08-5515-428a-8770-7cb4a6e584b8 FAIL 0/1
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/65423#01968e08-5513-4f2a-9731-5ba3d4bede6e FLAKY 6/21
rptest.tests.scram_test.SaslPlainTest.test_plain_authn.client_type=ClientType.KCL.scram_type=SCRAM-SHA-256.sasl_plain_enabled=False ducktape https://buildkite.com/redpanda/redpanda/builds/65423#01968e08-5515-4ed9-bf61-4f49f2568f0d FLAKY 20/21
rptest.tests.scram_test.SaslPlainTest.test_plain_authn.client_type=ClientType.PYTHON_RDKAFKA.scram_type=SCRAM-SHA-512.sasl_plain_enabled=False ducktape https://buildkite.com/redpanda/redpanda/builds/65423#01968e08-5514-430e-a60d-b01c0c675f44 FLAKY 20/21
rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/65423#01968e03-c804-46fe-b079-2148b374fb9f FLAKY 15/21
rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=False ducktape https://buildkite.com/redpanda/redpanda/builds/65423#01968e08-5515-428a-8770-7cb4a6e584b8 FAIL 0/1
test results on build#65442
test_id test_kind job_url test_status passed
rptest.tests.cloud_storage_timing_stress_test.CloudStorageTimingStressTest.test_cloud_storage_with_partition_moves.cleanup_policy=compact.delete ducktape https://buildkite.com/redpanda/redpanda/builds/65442#01968fc8-79cc-4d2c-839c-7bda24420b02 FLAKY 20/21
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_finishes_after_manual_cancellation.delete_topic=False ducktape https://buildkite.com/redpanda/redpanda/builds/65442#01968fc8-79cc-4c2f-aedd-a3d6eea8866b FLAKY 20/21
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=True.with_tiered_storage=True.with_iceberg=False.with_chunked_compaction=False.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/65442#01968fc8-79cc-4d2c-839c-7bda24420b02 FLAKY 20/21
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/65442#01968fc8-79cc-4d2c-839c-7bda24420b02 FLAKY 9/21
rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=False ducktape https://buildkite.com/redpanda/redpanda/builds/65442#01968fc8-79cd-4fab-93b0-8b847722586e FLAKY 12/21
test results on build#65485
test_id test_kind job_url test_status passed
gtest_raft_rpunit.gtest_raft_rpunit unit https://buildkite.com/redpanda/redpanda/builds/65485#01969292-b31f-4648-a5aa-2de09b71ad82 FLAKY 1/2
rptest.tests.datalake.datalake_e2e_test.DatalakeDelayedEnablementTest.test_enabling_iceberg_in_existing_cluster.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_JDBC ducktape https://buildkite.com/redpanda/redpanda/builds/65485#019692d8-531c-48aa-b90d-6fbccc332528 FLAKY 18/21
rptest.tests.partition_balancer_test.PartitionBalancerTest.test_transfer_controller_leadership ducktape https://buildkite.com/redpanda/redpanda/builds/65485#019692d8-5319-487b-96ab-c7d24c86eaf8 FLAKY 12/21
rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/65485#019692ea-a030-4562-b384-0e3696d55e55 FLAKY 12/21
rptest.tests.topic_creation_test.RecreateTopicMetadataTest.test_recreated_topic_metadata_are_valid.replication_factor=3 ducktape https://buildkite.com/redpanda/redpanda/builds/65485#019692ea-a02f-4450-bfb2-48e81c6498c0 FLAKY 20/21
rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=False ducktape https://buildkite.com/redpanda/redpanda/builds/65485#019692ea-a030-4562-b384-0e3696d55e55 FLAKY 11/21
rptest.tests.write_caching_fi_e2e_test.WriteCachingFailureInjectionE2ETest.test_crash_all_with_consumer_group ducktape https://buildkite.com/redpanda/redpanda/builds/65485#019692ea-a02f-4450-bfb2-48e81c6498c0 FLAKY 20/21
test results on build#65649
test_id test_kind job_url test_status passed
rptest.tests.datalake.datalake_e2e_test.DatalakeDelayedEnablementTest.test_enabling_iceberg_in_existing_cluster.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_JDBC ducktape https://buildkite.com/redpanda/redpanda/builds/65649#0196ac4a-b62e-46a7-8ce4-c04f91e350ec FLAKY 19/21
rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/65649#0196ac5f-6703-419c-ba58-002324391aae FAIL 0/21
rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=False ducktape https://buildkite.com/redpanda/redpanda/builds/65649#0196ac5f-6703-419c-ba58-002324391aae FLAKY 13/21
test results on build#65965
test_class test_method test_arguments test_kind job_url test_status passed reason
PartitionReassignmentsTest test_add_partitions_with_inprogress_reassignments ducktape https://buildkite.com/redpanda/redpanda/builds/65965#0196cd49-cfc6-4642-b7f4-b6dd9ca84ed5 FLAKY 18/21 upstream reliability is '89.27335640138409'. current run reliability is '85.71428571428571'. drift is 3.55907 and the allowed drift is set to 50. The test should PASS
RandomNodeOperationsTest test_node_operations {"cloud_storage_type": 1, "enable_failures": false, "mixed_versions": false, "with_chunked_compaction": true, "with_iceberg": true, "with_tiered_storage": false} ducktape https://buildkite.com/redpanda/redpanda/builds/65965#0196cd49-cfc4-4d2e-9ede-ab71f6b9f71c FLAKY 19/21 upstream reliability is '95.63567362428842'. current run reliability is '90.47619047619048'. drift is 5.15948 and the allowed drift is set to 50. The test should PASS
TopicDeleteCloudStorageTest topic_delete_installed_snapshots_test ducktape https://buildkite.com/redpanda/redpanda/builds/65965#0196cd49-cfc6-4642-b7f4-b6dd9ca84ed5 FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
test results on build#66000
test_class test_method test_arguments test_kind job_url test_status passed reason
NodesDecommissioningTest test_decommissioning_node_rf_1_replica ducktape https://buildkite.com/redpanda/redpanda/builds/66000#0196cfde-ee3e-439f-aef5-87cff43ca87e FAIL 0/21 The test has failed across all retries
test results on build#66029
test_class test_method test_arguments test_kind job_url test_status passed reason
NodesDecommissioningTest test_decommissioning_node_rf_1_replica ducktape https://buildkite.com/redpanda/redpanda/builds/66029#0196d139-aa55-490d-84d8-0b7ca70ebcd2 FAIL 0/21 The test has failed across all retries
test results on build#66317
test_class test_method test_arguments test_kind job_url test_status passed reason
CloudRetentionTest test_cloud_retention {"cloud_storage_type": 2, "max_consume_rate_mb": null} ducktape https://buildkite.com/redpanda/redpanda/builds/66317#0196f8f2-c224-41da-8120-9f0a94964f78 FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
RandomNodeOperationsTest test_node_operations {"cloud_storage_type": 1, "enable_failures": false, "mixed_versions": false, "with_chunked_compaction": false, "with_iceberg": true} ducktape https://buildkite.com/redpanda/redpanda/builds/66317#0196f8f2-c223-494f-9ca2-940463e1f73f FLAKY 20/21 upstream reliability is '97.05882352941177'. current run reliability is '95.23809523809523'. drift is 1.82073 and the allowed drift is set to 50. The test should PASS
RandomNodeOperationsTest test_node_operations {"cloud_storage_type": 1, "enable_failures": false, "mixed_versions": true, "with_chunked_compaction": false, "with_iceberg": false} ducktape https://buildkite.com/redpanda/redpanda/builds/66317#0196f8f2-c223-494f-9ca2-940463e1f73f FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
RandomNodeOperationsTest test_node_operations {"cloud_storage_type": 2, "enable_failures": true, "mixed_versions": true, "with_chunked_compaction": false, "with_iceberg": false} ducktape https://buildkite.com/redpanda/redpanda/builds/66317#0196f8f2-c225-4490-9335-fce580394cb7 FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
test results on build#66505
test_class test_method test_arguments test_kind job_url test_status passed reason
CloudStorageTimingStressTest test_cloud_storage_with_partition_moves {"cleanup_policy": "compact,delete"} ducktape https://buildkite.com/redpanda/redpanda/builds/66505#01971805-acc9-450e-9a2f-6f1744acf7cc FLAKY 20/21
DataTransformsTest test_consume_from_offset {"offset": "-1"} ducktape https://buildkite.com/redpanda/redpanda/builds/66505#019717ec-482d-40bc-8759-0679f8bb479f FLAKY 20/21 upstream reliability is '99.22178988326849'. current run reliability is '95.23809523809523'. drift is 3.98369 and the allowed drift is set to 50. The test should PASS
PartitionReassignmentsTest test_add_partitions_with_inprogress_reassignments ducktape https://buildkite.com/redpanda/redpanda/builds/66505#01971805-acca-4e1d-920d-b8c790a7d6a1 FLAKY 18/21 upstream reliability is '89.57475994513031'. current run reliability is '85.71428571428571'. drift is 3.86047 and the allowed drift is set to 50. The test should PASS
RandomNodeOperationsTest test_node_operations {"cloud_storage_type": 1, "enable_failures": false, "mixed_versions": false, "with_chunked_compaction": false, "with_iceberg": true} ducktape https://buildkite.com/redpanda/redpanda/builds/66505#01971805-acc8-42e0-93b5-c76346428b4d FAIL 0/1
RandomNodeOperationsTest test_node_operations {"cloud_storage_type": 1, "enable_failures": false, "mixed_versions": false, "with_chunked_compaction": true, "with_iceberg": true} ducktape https://buildkite.com/redpanda/redpanda/builds/66505#01971805-acc9-450e-9a2f-6f1744acf7cc FLAKY 19/20
RandomNodeOperationsTest test_node_operations {"cloud_storage_type": 1, "enable_failures": false, "mixed_versions": true, "with_chunked_compaction": false, "with_iceberg": false} ducktape https://buildkite.com/redpanda/redpanda/builds/66505#01971805-acc8-42e0-93b5-c76346428b4d FAIL 0/1
RandomNodeOperationsTest test_node_operations {"cloud_storage_type": 1, "enable_failures": true, "mixed_versions": true, "with_chunked_compaction": true, "with_iceberg": false} ducktape https://buildkite.com/redpanda/redpanda/builds/66505#01971805-acc9-450e-9a2f-6f1744acf7cc FLAKY 18/19
test results on build#66906
test_class test_method test_arguments test_kind job_url test_status passed reason
DeleteRecordsTest test_delete_records_concurrent_truncations {"cloud_storage_enabled": true, "truncate_point": "one_below_high_watermark"} ducktape https://buildkite.com/redpanda/redpanda/builds/66906#01973ed3-dfb0-47e5-879e-f09208c41833 FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
PartitionBalancerTest test_rack_awareness ducktape https://buildkite.com/redpanda/redpanda/builds/66906#01973ecf-803b-4f22-bdc7-368cc03c4862 FLAKY 20/21 upstream reliability is '99.25373134328358'. current run reliability is '95.23809523809523'. drift is 4.01564 and the allowed drift is set to 50. The test should PASS
test results on build#67368
test_class test_method test_arguments test_kind job_url test_status passed reason
CompactionGapsTest test_translation_no_gaps {"catalog_type": "nessie", "cloud_storage_type": 1} ducktape https://buildkite.com/redpanda/redpanda/builds/67368#0197755d-e6b8-49e6-8a78-656aee40f65d FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
MultiRestartTest test_recovery_after_multiple_restarts {"cloud_storage_type": 2} ducktape https://buildkite.com/redpanda/redpanda/builds/67368#0197755d-e6b8-4cc5-9be2-765d88ba4a23 FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
RandomNodeOperationsTest test_node_operations {"cloud_storage_type": 1, "compaction_mode": "sliding_window", "enable_failures": true, "mixed_versions": false, "with_iceberg": true} ducktape https://buildkite.com/redpanda/redpanda/builds/67368#0197755d-e6b8-4cc5-9be2-765d88ba4a23 FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS
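
The `reason` column in the newer result tables reports a reliability drift. The reported figures are consistent with reliability being the pass percentage and drift being the upstream reliability minus the current run's reliability. A minimal sketch of that arithmetic follows; the function names and the acceptance rule are assumptions inferred from the report text, not the CI bot's actual code.

```cpp
#include <cassert>

// Hypothetical helpers; names are illustrative and not taken from the CI tooling.

// Reliability as a percentage of passed runs, e.g. 18/21 -> ~85.714.
inline double reliability_pct(int passed, int total) {
    assert(total > 0);
    return 100.0 * static_cast<double>(passed) / static_cast<double>(total);
}

// Drift as reported: upstream reliability minus the current run's reliability.
inline double drift(double upstream_pct, int passed, int total) {
    return upstream_pct - reliability_pct(passed, total);
}

// Assumed rule: the result is acceptable while drift stays within the allowance.
inline bool within_allowed_drift(
  double upstream_pct, int passed, int total, double allowed_drift) {
    return drift(upstream_pct, passed, total) <= allowed_drift;
}
```

For the PartitionReassignmentsTest row on build#65965, `drift(89.27335640138409, 18, 21)` comes out at roughly 3.55907, well inside the allowed drift of 50, which matches "The test should PASS".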

@oleiman
Member Author

oleiman commented Apr 27, 2025

/dt

@oleiman oleiman force-pushed the ts/core-1620/archival-policy-streams branch from c1963fb to a3928ed Compare April 27, 2025 23:49
@oleiman
Member Author

oleiman commented Apr 27, 2025

/dt

@vbotbuildovich
Collaborator

vbotbuildovich commented Apr 28, 2025

Retry command for Build#65155

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/transactions/consumer_offsets_test.py::VerifyConsumerOffsets.test_consumer_group_offsets
tests/rptest/tests/cloud_storage_chunk_read_path_test.py::CloudStorageChunkReadTest.test_prefetch_chunks@{"prefetch":0}
tests/rptest/tests/shadow_indexing_compacted_topic_test.py::ShadowIndexingCompactedTopicTest.test_upload@{"cloud_storage_type":2}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_conflicting_names

@oleiman oleiman force-pushed the ts/core-1620/archival-policy-streams branch from a3928ed to 0f7e1cb Compare May 1, 2025 21:17
@oleiman
Member Author

oleiman commented May 1, 2025

/dt

@secpanda

secpanda commented May 1, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)

@oleiman oleiman force-pushed the ts/core-1620/archival-policy-streams branch from 0f7e1cb to 08a1cac Compare May 2, 2025 05:33
@oleiman
Member Author

oleiman commented May 2, 2025

/ci-repeat 1
release

@oleiman oleiman force-pushed the ts/core-1620/archival-policy-streams branch from 08a1cac to 9fc4458 Compare May 2, 2025 19:54
@oleiman oleiman marked this pull request as ready for review May 2, 2025 19:55
@oleiman oleiman changed the title from "Ts/core 1620/archival policy streams" to "WIP: Migrate ntp_archiver uploads to segment_collector_stream interface" May 3, 2025
@oleiman
Member Author

oleiman commented May 7, 2025

@Lazin FYI

@oleiman oleiman requested review from Lazin and Copilot May 7, 2025 17:23
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR migrates NTP uploads to the new segment_collector_stream interface and updates associated types and APIs. Key changes include updating function signatures to use new stream‐based types, renaming and refactoring functions in archival_policy and ntp_archiver, and switching partition pointer types from ss::lw_shared_ptr to raw pointers in async uploader files.

Reviewed Changes

Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.

File Description
tests/rptest/utils/si_utils.py Enhanced assertion message to include restored_ntps for better debugging.
src/v/cluster/archival/tests/async_data_uploader_test.cc Updated get_test_partition() to explicitly obtain the partition via get().
src/v/cluster/archival/segment_reupload.h Introduced new types, parameters, and constructors for reupload candidates and adjusted function signatures.
src/v/cluster/archival/ntp_archiver_service.h Modified parameter types (from candidate to stream) and renamed internal functions to support the new streaming interface.
src/v/cluster/archival/async_data_uploader.h/.cc Changed partition argument types from smart pointers to raw pointers.
src/v/cluster/archival/archival_policy.h Renamed candidate retrieval functions to better reflect their behavior.
src/v/cluster/archival/adjacent_segment_merger.cc Updated logging and offset computations to use new stream fields.
Files not reviewed (2)
  • src/v/cluster/archival/tests/BUILD: Language not supported
  • src/v/cluster/archival/tests/CMakeLists.txt: Language not supported
Comments suppressed due to low confidence (4)

src/v/cluster/archival/segment_reupload.h:245

  • The signature of find_replacement_boundary now requires a mode parameter; please update the related comments or documentation to clarify how this parameter influences the replacement boundary computation.
model::offset find_replacement_boundary(segment_collector_mode mode) const;

src/v/cluster/archival/ntp_archiver_service.h:455

  • The function do_upload_local now accepts a segment_collector_stream instead of an upload_candidate; please update the function documentation to reflect this interface change.
ss::future<bool> do_upload_local(archival_stm_fence fence, segment_collector_stream strm, std::optional<std::reference_wrapper<retry_chain_node>> source_rtc);

src/v/cluster/archival/archival_policy.h:36

  • [nitpick] The function name changes from get_next_candidate to get_next_compacted_segment (and subsequently to get_next_segment) may cause confusion; updating the inline documentation to explain the distinctions between these methods is recommended.
ss::future<segment_collector_stream_result> get_next_compacted_segment(

src/v/cluster/archival/adjacent_segment_merger.cc:222

  • [nitpick] The offset field used for computing the next offset has changed from candidate.final_offset to locks.end_offset; verify that all consumers of this value correctly interpret the new field and update related comments if necessary.
auto next = model::next_offset(find_res.locks.value().end_offset);

static ss::future<result<std::unique_ptr<segment_upload>>>
make_segment_upload(
ss::lw_shared_ptr<cluster::partition> part,
cluster::partition* part,

Copilot AI May 7, 2025

The type change from ss::lw_shared_ptr<cluster::partition> to a raw pointer requires careful lifetime management; please ensure that the caller guarantees the partition object's validity throughout the segment_upload usage.

Suggested change
cluster::partition* part,
ss::lw_shared_ptr<cluster::partition> part,
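
The lifetime question raised here can be illustrated outside Seastar. This is a minimal sketch that uses `std::shared_ptr` as a stand-in for `ss::lw_shared_ptr`; `partition`, `make_upload_name`, and `upload_name_for` are hypothetical simplified names, not the Redpanda types. The owner keeps the object alive, and the callee only borrows a raw pointer for the duration of the call.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for cluster::partition.
struct partition {
    std::string ntp;
};

// Borrows the partition: no ownership is taken, so the caller must keep the
// object alive for as long as this call uses it.
inline std::string make_upload_name(const partition* part) {
    assert(part != nullptr);
    return part->ntp + "/segment";
}

// The caller still holds the owning pointer while the borrowed raw pointer is
// in use, so the callee never needs to share ownership.
inline std::string upload_name_for(const std::shared_ptr<partition>& owner) {
    return make_upload_name(owner.get());
}
```

The raw-pointer signature is only safe under the stated assumption that the owner outlives every use of the borrowed pointer; if the callee stored the pointer beyond the call, shared ownership would be the safer choice.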


Member Author

bad bot

@oleiman oleiman force-pushed the ts/core-1620/archival-policy-streams branch from 9fc4458 to fe99ed0 Compare May 7, 2025 18:29
@vbotbuildovich
Collaborator

Retry command for Build#65649

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/shadow_indexing_compacted_topic_test.py::ShadowIndexingCompactedTopicTest.test_upload@{"cloud_storage_type":2}

@oleiman oleiman force-pushed the ts/core-1620/archival-policy-streams branch 2 times, most recently from 196faae to d06aa20 Compare May 14, 2025 00:40
@vbotbuildovich
Collaborator

vbotbuildovich commented May 14, 2025

Retry command for Build#65954

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/shadow_indexing_compacted_topic_test.py::ShadowIndexingCompactedTopicTest.test_upload@{"cloud_storage_type":2}
tests/rptest/tests/shadow_indexing_compacted_topic_test.py::ShadowIndexingCompactedTopicTest.test_upload@{"cloud_storage_type":1}

@oleiman oleiman force-pushed the ts/core-1620/archival-policy-streams branch from d06aa20 to d704fc6 Compare May 14, 2025 04:19
@oleiman
Member Author

oleiman commented Jun 5, 2025

/ci-repeat 1

@oleiman oleiman requested a review from Lazin June 5, 2025 15:06
Lazin previously approved these changes Jun 10, 2025
Contributor

@Lazin Lazin left a comment

LGTM, the upload path looks way cleaner than ever before

// background operation. We can't background it as is because the
// 'upload_index' call is taking 'rtc' as a reference. So there should be
// some wrapper for this call.
// QUESTION(oren): if we did background this, where would the future
Contributor

nit: it can be wrapped using ssx::spawn_with_gate with the rtc captured, or something similar

Member Author

I think I'm going to punt this to a followup. A bunch of unit tests need small changes and the rtc accounting is a bit of a pain since rtc nodes are neither movable nor copyable as written. Better as an isolated change IMO.

Member Author

i guess we could use a lw_shared_ptr<retry_chain_node> w/o issue. point stands about test changes though.
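
The shared-pointer idea in this thread can be sketched with the standard library alone. Everything below is a hypothetical stand-in: `retry_node` mimics a type that is neither copyable nor movable, and `background_tasks` is a toy queue, not `ssx::spawn_with_gate` or the real `retry_chain_node`. The point is only that capturing a shared pointer keeps the node alive until the deferred task has run.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a node that is neither copyable nor movable.
struct retry_node {
    explicit retry_node(int budget) : retries_left(budget) {}
    retry_node(const retry_node&) = delete;
    retry_node& operator=(const retry_node&) = delete;
    retry_node(retry_node&&) = delete;
    retry_node& operator=(retry_node&&) = delete;

    int retries_left;
};

// Toy background queue: tasks are stored and run later, after the function
// that spawned them has already returned.
class background_tasks {
public:
    void spawn(std::function<void()> task) { _tasks.push_back(std::move(task)); }

    // Runs and then destroys every pending task; returns how many ran.
    std::size_t drain() {
        auto pending = std::move(_tasks);
        _tasks.clear();
        std::size_t ran = 0;
        for (auto& task : pending) {
            task();
            ++ran;
        }
        return ran;
    }

private:
    std::vector<std::function<void()>> _tasks;
};

// The node cannot be moved into the task, so it is heap-allocated and the
// task captures a shared pointer to it. Returns a non-owning observer.
inline std::weak_ptr<retry_node>
spawn_index_upload(background_tasks& bg, int budget) {
    auto rtc = std::make_shared<retry_node>(budget);
    bg.spawn([rtc] { --rtc->retries_left; });
    return std::weak_ptr<retry_node>(rtc);
}

// The node stays alive while the task is pending and is released once the
// task has run and been destroyed.
inline bool demo_node_outlives_spawner() {
    background_tasks bg;
    auto observer = spawn_index_upload(bg, 3);
    bool alive_while_pending = !observer.expired();
    bg.drain();
    return alive_while_pending && observer.expired();
}
```

In the real code the gate would additionally make shutdown wait for the background work; that part is not modelled here.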

Contributor

the followup is OK

Contributor

it's also not super important, I think that previously it was not running in the background

Member Author

> previously it was not running in the background

correct. seems like a sensible change to me though 🙂

auto lazy_abort = lazy_abort_source{
[this]() { return upload_should_abort(); }};
auto stream = strm.create_input_stream();
auto [upload_stream, indexing_stream] = input_stream_fanout<2>(
Contributor

Not right now but at some point we should get rid of this.
Currently, the byte stream is split into two parts. One is uploaded and another one is parsed to create an index.
But after conversion to storage::log_reader it will make no sense at all. We will get record batches from the log reader and then we will serialize them to upload. Somewhere in the middle we can build the index state incrementally.

Member Author

interesting. the way i have this organized in #26099 has this logic staying largely the same but with a log_reader feeding the input stream rather than a concat_segment_reader_view. Is there a clear disadvantage to doing it this way?

Contributor

I think it's fine to do it this way. The disadvantage is that we need to serialize the data coming from the log_reader first and then feed it to the fanout stream, where one of the branches deserializes it again (only the headers, but still). We could build the index from the data before serialization instead, which would be more efficient, but it's not necessary for correctness.
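
The alternative described here, building the index from batches before they are serialized, might look roughly like the sketch below. The `batch`, `index_entry`, and `upload_result` types and the toy wire format are all assumptions made for illustration; none of this is the storage::log_reader or segment index API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified record batch.
struct batch {
    std::int64_t base_offset;
    std::vector<std::uint8_t> payload;
};

// Hypothetical index entry: where a batch starts in the uploaded byte stream.
struct index_entry {
    std::int64_t base_offset;
    std::size_t file_position;
};

struct upload_result {
    std::vector<std::uint8_t> bytes; // what would be sent to cloud storage
    std::vector<index_entry> index;  // built in the same pass
};

// Toy wire format: 8-byte little-endian base offset, 4-byte little-endian
// payload size, then the payload itself.
inline void serialize(const batch& b, std::vector<std::uint8_t>& out) {
    auto off = static_cast<std::uint64_t>(b.base_offset);
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<std::uint8_t>(off >> (8 * i)));
    }
    auto size = static_cast<std::uint32_t>(b.payload.size());
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(size >> (8 * i)));
    }
    out.insert(out.end(), b.payload.begin(), b.payload.end());
}

// Records each index entry from the in-memory batch before serializing it,
// so nothing has to re-parse the byte stream afterwards.
inline upload_result serialize_and_index(const std::vector<batch>& batches) {
    upload_result res;
    for (const auto& b : batches) {
        res.index.push_back({b.base_offset, res.bytes.size()});
        serialize(b, res.bytes);
    }
    return res;
}
```

Compared with a two-way fanout, this single pass avoids decoding headers from bytes that were just encoded, at the cost of coupling index construction to the serialization loop.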

@dotnwat dotnwat requested a review from WillemKauf June 11, 2025 02:21
oleiman and others added 11 commits June 13, 2025 19:06
The ntp_archiver doesn't store the smart pointer, so there is no point in
passing a smart pointer into the async_data_uploader when it is invoked
by the ntp_archiver.

Signed-off-by: Evgeny Lazin <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
Mostly related to upload_candidates and whatnot

Signed-off-by: Oren Leiman <[email protected]>
@oleiman oleiman force-pushed the ts/core-1620/archival-policy-streams branch from 546afcf to 11d1141 Compare June 15, 2025 19:49
Member Author

oleiman commented Jun 15, 2025

force push to fix up some stale comments and rebase dev to fix merge conflict

@oleiman oleiman requested a review from Lazin June 15, 2025 19:49
@oleiman oleiman merged commit 8cb6455 into redpanda-data:dev Jun 17, 2025
18 checks passed
oleiman added a commit to oleiman/redpanda that referenced this pull request Jun 26, 2025
…620/archival-policy-streams"

This reverts commit 8cb6455, reversing
changes made to 6df64f5.
oleiman added a commit to oleiman/redpanda that referenced this pull request Jul 23, 2025
A previous PR[1] refactored upload_segment and in so doing added
retry_strategy::disallow to the retry_chain_node governing the upload.

As a result, we saw an uptick in
- 'cloud_storage_failed_uploads'
- 'cloud_storage_bytes_sent'
- 'io_queue_total_read_bytes'

Along with an increased incidence of 'backoff quota exceeded' logs from
cloud_io.

Specifically, cloud_io::remote::upload_stream increments
'cloud_storage_failed_uploads' in a number of failure cases. Given the
increased frequency of the backoff quota log line, it is almost certain
that the uptick in failures comes from rtc node retries being exhausted
by transient errors that were previously masked by retry logic inside
cloud_io.

For now, we should return to the default retry strategy (exponential backoff)
and assess the viability of offloading retries to the archival loop at a
later time.

[1] redpanda-data#25951

Signed-off-by: Oren Leiman <[email protected]>
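
As a rough illustration of the effect described in this commit message, the sketch below compares a no-retry strategy with exponential backoff when an upload hits a couple of transient errors. It is a toy model, not the `retry_chain_node` or `cloud_io` API: `retry_strategy` here, `upload_outcome`, `upload_with_retries`, and `fails_n_times` are all hypothetical names, and the backoff is accumulated rather than slept.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Names echo the commit message; the semantics here are an assumption.
enum class retry_strategy { backoff, disallow };

struct upload_outcome {
    bool success;
    int attempts;
    std::chrono::milliseconds waited; // total backoff that would be slept
};

// Hypothetical upload operation that fails `n` times, then succeeds.
std::function<bool()> fails_n_times(int n) {
    return [remaining = n]() mutable {
        if (remaining > 0) {
            --remaining;
            return false;
        }
        return true;
    };
}

// Try the upload up to `max_attempts` times. With `disallow`, the first
// failure is final; with `backoff`, the wait doubles after each failure.
upload_outcome upload_with_retries(
  const std::function<bool()>& try_upload,
  retry_strategy strategy,
  int max_attempts,
  std::chrono::milliseconds initial_backoff) {
    upload_outcome out{false, 0, std::chrono::milliseconds(0)};
    auto backoff = initial_backoff;
    while (out.attempts < max_attempts) {
        ++out.attempts;
        if (try_upload()) {
            out.success = true;
            return out;
        }
        if (strategy == retry_strategy::disallow) {
            break; // transient error surfaces as a failed upload
        }
        if (out.attempts < max_attempts) {
            out.waited += backoff;
            backoff *= 2;
        }
    }
    return out;
}
```

In this model, an upload that fails twice before succeeding is reported as failed after one attempt under `disallow`, while `backoff` absorbs both errors and succeeds on the third attempt. That matches the direction of the metric change described above, though the real accounting in cloud_io may differ.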
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request Jul 24, 2025
(cherry picked from commit a9dea64)
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 29, 2025
In a previous PR[1] we began to rely on the archiver loop to retry, and
moved away from relying on `cloud_io::remote` for retries in two ways:
1. setting an explicit `disallow` retry policy on the retry node passed
   to the remote, and
2. setting the `max_retries` passed to `remote::upload_segment()` to 1.

In practice, we saw that _not_ relying on the remote resulted in an
uptick in the `vectorized_cloud_storage_failed_uploads` metric, which is
monitored and alerted on. In [2] we reverted item 1, but didn't notice item 2.
This commit reverts item 2.

[1] redpanda-data#25951
[2] redpanda-data#26969
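
A small sketch of why reverting only one of the two changes was not enough. It assumes, as the message above implies, that either knob on its own limits the upload to a single attempt. `upload_retry_config` and `effective_attempts` are hypothetical names for illustration and do not appear in the Redpanda code base.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the two knobs named in the commit message.
struct upload_retry_config {
    bool policy_allows_retry; // false models a `disallow` retry policy
    int max_retries;          // models the `max_retries` argument
};

// Assumption: either knob alone is enough to cap the upload at one attempt.
int effective_attempts(const upload_retry_config& cfg) {
    if (!cfg.policy_allows_retry) {
        return 1;
    }
    return std::max(1, cfg.max_retries);
}
```

Under this model, restoring the retry policy while `max_retries` stays at 1 still yields a single attempt, which is why the second revert was needed.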
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request Jul 29, 2025
(cherry picked from commit 7f409da)