Skip to content

Conversation

sftse
Copy link
Contributor

@sftse sftse commented Jun 22, 2025

This commit adds Cargo.lock to the set of tracked files.

The benefits of doing this are that a) changes to transitive dependencies like in #1756 become obvious, put gentle backpressure on compile time/version multiplicity and b) all contributors are on a single source of truth.

The only downside to this change from experience is that merge conflicts with Cargo.lock can be quite bad and require some knowledge to resolve. Usually, when rebasing A onto main the most likely thing one wants to do is retain the version of Cargo.lock from main, resolve the conflicts in Cargo.toml and build the project to let cargo figure out the new patch versions, then commit the newly generated Cargo.lock.

@ArthurZucker
Copy link
Collaborator

Thanks! We do force push this file when releasing. Not sure if it is worth it for dev

@sftse
Copy link
Contributor Author

sftse commented Jul 1, 2025

Feel free to close if you think it's not worth it. I think it is rather important to be able to reproduce the exact state in which the library was tested.
One example that can be unpleasant without a Cargo.lock is if one needs to find the point at which a regression was introduced. Dependencies that do accidentally introduce breaking changes in a patch version can make git bisect painful. It's not sufficient, granted, as one would also have to track the compiler version, but it's one step towards making it more automated.

@sftse
Copy link
Contributor Author

sftse commented Aug 29, 2025

#1842 intends to add clap as a dependency. clap tends to be quite heavyweight, but without checking out and building the PR locally to get the Cargo.lock it is hard to judge what the cost in terms of dependencies is. If the Cargo.lock were checked in, it would be more convenient to judge the cost vs benefit during regular code review.

@ArthurZucker
Copy link
Collaborator

I am not super on point with pros + cons but will ask for advice internally!

@Narsil
Copy link
Collaborator

Narsil commented Sep 4, 2025

@ArthurZucker Having the lockfile in tree is super cumbersome. Basicaly every PR will have changes to this file, usually leading to conflicts.

The rust code it's pretty standard to not commit the lockfile. The documentation recommended for a long time to purely ignore it. They changed that because some projects started to behave differently: https://blog.rust-lang.org/2023/08/29/committing-lockfiles/

Honestly I feel for tokenizers it's just much better to ignore the lockfile most of the time. Tokenizers is a library, we shouldn't have anything to say about pinning versions (because that's what our users will see anyway).

For the python library it's slightly different since the library is embedded in the Python wheel. There for vendoring reasons it's nicer to provide the Cargo.lock so users can recompile and verify that they produce the same binary, it's great to avoid some types of depency chain attacks.

That's why there is the current setup. You are free to change that, but the caveat will be much more "useless" code modifications and merge conflicts when merging PRs (which tend to happen in bulk here so not really practical).

@sftse
Copy link
Contributor Author

sftse commented Sep 15, 2025

Sorry for the response delay.

I've simulated what the changes would be if this PR were to be accepted. If the lockfile were tracked, of the last 15 commits on the main branch, only the PRs related to #1834 would have resulted in changes to the lockfile, and these were later reverted. This seems statistically on par with the larger history of the repo, having 133 commits (7%) touching Cargo.toml which roughly corresponds with changes to the lockfile, unless one has Dependabot that sends PRs for every patch update. Translated into PRs, of the last 11 there was one that changed the lock file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants