-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix(core): don't dedupe virtual descriptors with different base idents resolved to the same virtual package #1726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…s resolved to the same virtual package
Wow, that's a tricky one indeed 🐉 |
3 tasks
arcanis
pushed a commit
that referenced
this pull request
Mar 28, 2025
## What's the problem this PR addresses? Fixes #6573 In order to explain *why* the bug happens, let's do a refresher on virtual packages and deduplication, and *how* we end up here ### Virtuals Let's say we have the following packages: - `pkg-z` has a peer dependency on `pkg-p@*` - `pkg-a` has a dependency on `pkg-z@^1.0.0` and `pkg-p@^1.0.0` - `pkg-b` has a dependency on `pkg-z@^1.0.0` and `pkg-p@^2.0.0` ``` pkg-a --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@* --[ pkg-p@^1.0.0 ]--> pkg-p@npm:1.0.0 pkg-b --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@* --[ pkg-p@^2.0.0 ]--> pkg-p@npm:2.0.0 ``` When we resolve the peer dependencies, we resolve which instance of `pkg-p` we each `pkg-z` instance should get, then we add that instance of `pkg-p` as a regular dependency to the `pkg-z` instance in our internal package data ``` pkg-a --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 --[...]--> pkg-p@npm:1.0.0 pkg-b --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 --[...]--> pkg-p@npm:2.0.0 ``` This is a problem because we end up with the same package having two different sets of dependencies. To avoid that, we create **virtual packages** for each instance of peer requesters (i.e. packages with peer dependencies) in the tree. We also replace descriptors that resolve to those packages with **virtual descriptors**. ``` pkg-a --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]--> pkg-p@npm:1.0.0 pkg-b --[ pkg-z@virtual:<id-b> ]--> pkg-z@virtual:<id-b> --[...]--> pkg-p@npm:2.0.0 ``` The two instance of `pkg-z` that have different dependency sets are now literally two different packages in our internals. Note that the virtual id is based on the parent package and the pre-virtualization package. ### Deduplication However, just virtualizing every peer requester naively can lead to some problems. Take this example: ``` pkg-a --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@* --[ pkg-p@^1.0.0 ]--> pkg-p@npm:1.0.0 pkg-b --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@* --[ pkg-p@^1.0.0 ]--> pkg-p@npm:1.0.0 ``` ``` pkg-a --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]-> pkg-p@npm:1.0.0 pkg-b --[ pkg-z@virtual:<id-b> ]--> pkg-z@virtual:<id-b> --[...]-> pkg-p@npm:1.0.0 ``` We have the reverse "problem" -- we end up with two different virtual `pkg-z` with the same dependency sets. That's not incorrect *per se*, but it'd be nice if we can use the same package instance for both. That's why we also have a deduplication process where we find virtual packages that are the virtualizations of the same package and have the same dependency sets, and replace all of their virtual descriptors and packages with a single "master" descriptor/package among them ``` pkg-a --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]-> pkg-p@npm:1.0.0 pkg-b --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]-> pkg-p@npm:1.0.0 ``` Now we go back to having one `pkg-z` instance, and we can remove the discarded virtual descriptor/package (`pkg-z@virtual:<id-b>`) from our internals. ### Bug with aliases This deduplication, however, in turn leads to a bug (#1352) in some edge cases. (In this and subsequent examples, I'll omit `pkg-p`. Assume that all instances of `pkg-z` has their peer requests satisfied the same way) ```json { "name": "pkg-a", "dependencies": { "pkg-x": "npm:pkg-z@^1.0.0", "pkg-y": "npm:pkg-z@^1.0.0" } } ``` ``` pkg-a --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 --[ pkg-y@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ``` ``` pkg-a --[ pkg-x@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[ pkg-y@virtual:<id-a> ]--> pkg-z@virtual:<id-a> ``` We have used aliases to make two different virtual descriptors resolve to the same virtual package. When deduping, we will dedupe one of the virtual descriptors to the other descriptor, and **discard the package it resolves to**. Now we end up with zero instances of `pkg-z`... #1726 fixes that, by having the deduplication key include the descriptor ident (`pkg-x` vs `pkg-y`), so those two are not deduped against each other. This is where we are currently at. ### More edge cases !!! This fix, however, once again causes more edge cases. In one edge case, it will cause virtual packages to not dedupe when they should ``` pkg-a --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 pkg-b --[ pkg-y@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ``` ``` pkg-a --[ pkg-x@virtual:<id-a> ]--> pkg-z@virtual:<id-a> pkg-b --[ pkg-y@virtual:<id-b> ]--> pkg-z@virtual:<id-b> ``` #1726 fixes specifically for two aliased dependencies of the same package. What if we have two packages each using an alias? We don't dedupe those because the virtual descriptors have different idents, and end up with two virtual instances because the parent package is part of the virtual id calculation ---- That case also not a problem *per se*, just inefficient. However, using that we can construct a situation where the same package can either dedupe or not dedupe depending on which descriptor is being checked. ``` pkg-a --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 pkg-b --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 --[ pkg-y@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ``` ``` pkg-a --[ pkg-x@virtual:<id-a> ]--> pkg-z@virtual:<id-a> pkg-b --[ pkg-x@virtual:<id-b> ]--> pkg-z@virtual:<id-b> --[ pkg-y@virtual:<id-b> ]--> pkg-z@virtual:<id-b> ``` Here, the second descriptor `pkg-x@virtual:<id-b>` will be deduped into the first: they have the same ident and resolve to the same "physical" package. Once again we will **discard `pkg-z@virtual:<id-b>` in the process**. However, the third descriptor `pkg-x@virtual:<id-b>` is not deduped, leaving a dangling reference to the discarded package. This is the case reported in #6573 ## How did you fix it? At its root, #1352 is caused by the same virtual package being checked for dedupe twice against two diffent virtual descriptors that resolve to it. What if we just... not do that? **Instead of deduping virtual descriptors, just dedupe the virtual packages.** We would have virtual descriptors resolving to virtual packages with a different virtual id, but that would not cause problems. In fact, the entire point of resolution is to link descriptors and locators. This ends up simplifying the implementation a lot because we don't need to keep track of which packages have which virtual descriptors as dependencies. We just need a reverse resolution map from virtual locators to virtual descriptors. Deduping is as simple as updating the project resolution map. The only downside is we have a few more descriptors in the internals, but that's just a very tiny cost compared to the benefits. ## Checklist <!--- Don't worry if you miss something, chores are automatically tested. --> <!--- This checklist exists to help you remember doing the chores when you submit a PR. --> <!--- Put an `x` in all the boxes that apply. --> - [x] I have read the [Contributing Guide](https://yarnpkg.com/advanced/contributing). <!-- See https://yarnpkg.com/advanced/contributing#preparing-your-pr-to-be-released for more details. --> <!-- Check with `yarn version check` and fix with `yarn version check -i` --> - [x] I have set the packages that need to be released for my changes to be effective. <!-- The "Testing chores" workflow validates that your PR follows our guidelines. --> <!-- If it doesn't pass, click on it to see details as to what your PR might be missing. --> - [x] I will check that all automated PR checks pass before the PR gets reviewed.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What's the problem this PR addresses?
Virtual descriptors with different base idents that resolve to the same virtual package (aka aliases) were incorrectly deduped out of existence.
Fixes #1352.
How did you fix it?
I've made it so that the
identHash
is taken into account when computing thedependencyHash
.I've added a dragon test for this edge case, and all of the other dragon tests are passing.
Checklist