bug(iceberg): add_files doesn't actually check duplicated files #1394

@Erigara

Description

Apache Iceberg Rust version

Latest main branch.

Describe the bug

I've noticed that add_files with check_duplicate enabled does not detect duplicate files.

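The problem is that the implementation returns the paths of all manifest files instead of listing the data files that already exist in the table. The paths of the files being added are therefore compared against manifest paths, so a match is never found.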

I added a print statement here to verify.

Here is what was returned (I trimmed the paths to exclude irrelevant parts):

s3://../039fe264-9c78-4f14-9d93-473b1d39afa6/metadata/a7149190-3785-4780-b416-3a62dfe2bd4b-m0.avro
s3://../039fe264-9c78-4f14-9d93-473b1d39afa6/metadata/e2eb79eb-c884-43a2-9a47-c35cef8aaf6b-m0.avro
...
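
To make the mismatch concrete, here is a minimal sketch of the two lookups. It does not use the iceberg-rust API: the `Manifest` struct, the function names, and the example paths are all hypothetical stand-ins, chosen only to show why comparing against manifest paths can never flag a data file as a duplicate.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a manifest: its own path plus the data files it lists.
/// This is NOT the iceberg-rust type, only an illustration.
struct Manifest {
    path: String,
    data_files: Vec<String>,
}

/// Mirrors the behaviour described in this issue: collects the paths of the
/// manifest files themselves (the `...-m0.avro` entries).
fn existing_paths_buggy(manifests: &[Manifest]) -> HashSet<&str> {
    manifests.iter().map(|m| m.path.as_str()).collect()
}

/// What the check presumably needs: the data file paths recorded inside
/// each manifest.
fn existing_paths_fixed(manifests: &[Manifest]) -> HashSet<&str> {
    manifests
        .iter()
        .flat_map(|m| m.data_files.iter().map(String::as_str))
        .collect()
}

/// Returns every file in `new_files` that is already present in `existing`.
fn find_duplicates<'a>(existing: &HashSet<&str>, new_files: &[&'a str]) -> Vec<&'a str> {
    new_files
        .iter()
        .copied()
        .filter(|f| existing.contains(f))
        .collect()
}
```

With a single manifest at a made-up path such as `metadata/m0.avro` that lists `data/a.parquet`, re-adding `data/a.parquet` yields no duplicates when checked against `existing_paths_buggy`, and is reported as a duplicate when checked against `existing_paths_fixed`.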

To Reproduce

Create a simple table, then try to add the same file multiple times.

Expected behavior

Duplicate files are detected.

Willingness to contribute

I'm willing to contribute to this issue.
