Skip to content

Conversation

paul-soporan
Copy link
Member

What's the problem this PR addresses?

Virtual descriptors with different base idents that resolve to the same virtual package (aka aliases) were incorrectly deduped out of existence.

Fixes #1352.

How did you fix it?

I've made it so that the identHash is taken into account when computing the dependencyHash.

I've added a dragon test for this edge case, and all of the other dragon tests are passing.

Checklist

  • I have set the packages that need to be released for my changes to be effective.
  • I have verified that all automated PR checks pass.

@paul-soporan paul-soporan requested a review from arcanis as a code owner August 20, 2020 10:45
@arcanis
Copy link
Member

arcanis commented Aug 20, 2020

Wow, that's a tricky one indeed 🐉

@arcanis arcanis merged commit 2326463 into master Aug 20, 2020
@arcanis arcanis deleted the paul/fix/virtual-descriptor-dedupe branch August 20, 2020 11:18
arcanis pushed a commit that referenced this pull request Mar 28, 2025
## What's the problem this PR addresses?

Fixes #6573

In order to explain *why* the bug happens, let's do a refresher on
virtual packages and deduplication, and *how* we end up here

### Virtuals

Let's say we have the following packages:
- `pkg-z` has a peer dependency on `pkg-p@*`
- `pkg-a` has a dependency on `pkg-z@^1.0.0` and `pkg-p@^1.0.0`
- `pkg-b` has a dependency on `pkg-z@^1.0.0` and `pkg-p@^2.0.0`
```
pkg-a --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@*
      --[ pkg-p@^1.0.0 ]--> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@*
      --[ pkg-p@^2.0.0 ]--> pkg-p@npm:2.0.0
```

When we resolve the peer dependencies, we resolve which instance of
`pkg-p` we each `pkg-z` instance should get, then we add that instance
of `pkg-p` as a regular dependency to the `pkg-z` instance in our
internal package data
```
pkg-a --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 --[...]--> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 --[...]--> pkg-p@npm:2.0.0
```

This is a problem because we end up with the same package having two
different sets of dependencies. To avoid that, we create **virtual
packages** for each instance of peer requesters (i.e. packages with peer
dependencies) in the tree. We also replace descriptors that resolve to
those packages with **virtual descriptors**.
```
pkg-a --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]--> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@virtual:<id-b> ]--> pkg-z@virtual:<id-b> --[...]--> pkg-p@npm:2.0.0
```
The two instance of `pkg-z` that have different dependency sets are now
literally two different packages in our internals. Note that the virtual
id is based on the parent package and the pre-virtualization package.

### Deduplication

However, just virtualizing every peer requester naively can lead to some
problems. Take this example:
```
pkg-a --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@*
      --[ pkg-p@^1.0.0 ]--> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0 ==> pkg-p@*
      --[ pkg-p@^1.0.0 ]--> pkg-p@npm:1.0.0
```
```
pkg-a --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]-> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@virtual:<id-b> ]--> pkg-z@virtual:<id-b> --[...]-> pkg-p@npm:1.0.0
```

We have the reverse "problem" -- we end up with two different virtual
`pkg-z` with the same dependency sets. That's not incorrect *per se*,
but it'd be nice if we can use the same package instance for both.

That's why we also have a deduplication process where we find virtual
packages that are the virtualizations of the same package and have the
same dependency sets, and replace all of their virtual descriptors and
packages with a single "master" descriptor/package among them

```
pkg-a --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]-> pkg-p@npm:1.0.0
pkg-b --[ pkg-z@virtual:<id-a> ]--> pkg-z@virtual:<id-a> --[...]-> pkg-p@npm:1.0.0
```

Now we go back to having one `pkg-z` instance, and we can remove the
discarded virtual descriptor/package (`pkg-z@virtual:<id-b>`) from our
internals.

### Bug with aliases

This deduplication, however, in turn leads to a bug (#1352) in some edge
cases. (In this and subsequent examples, I'll omit `pkg-p`. Assume that
all instances of `pkg-z` has their peer requests satisfied the same way)
```json
{
  "name": "pkg-a",
  "dependencies": {
    "pkg-x": "npm:pkg-z@^1.0.0",
    "pkg-y": "npm:pkg-z@^1.0.0"
  }
}
```
```
pkg-a --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
      --[ pkg-y@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
```
```
pkg-a --[ pkg-x@virtual:<id-a> ]--> pkg-z@virtual:<id-a>
      --[ pkg-y@virtual:<id-a> ]--> pkg-z@virtual:<id-a>
```

We have used aliases to make two different virtual descriptors resolve
to the same virtual package. When deduping, we will dedupe one of the
virtual descriptors to the other descriptor, and **discard the package
it resolves to**. Now we end up with zero instances of `pkg-z`...

#1726 fixes that, by having the deduplication key include the descriptor
ident (`pkg-x` vs `pkg-y`), so those two are not deduped against each
other.

This is where we are currently at.

### More edge cases !!!

This fix, however, once again causes more edge cases.

In one edge case, it will cause virtual packages to not dedupe when they
should
```
pkg-a --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
pkg-b --[ pkg-y@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
```
```
pkg-a --[ pkg-x@virtual:<id-a> ]--> pkg-z@virtual:<id-a>
pkg-b --[ pkg-y@virtual:<id-b> ]--> pkg-z@virtual:<id-b>
```
#1726 fixes specifically for two aliased dependencies of the same
package. What if we have two packages each using an alias? We don't
dedupe those because the virtual descriptors have different idents, and
end up with two virtual instances because the parent package is part of
the virtual id calculation

----

That case also not a problem *per se*, just inefficient. However, using
that we can construct a situation where the same package can either
dedupe or not dedupe depending on which descriptor is being checked.
```
pkg-a --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
pkg-b --[ pkg-x@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
      --[ pkg-y@npm:pkg-z@^1.0.0 ]--> pkg-z@npm:1.0.0
```
```
pkg-a --[ pkg-x@virtual:<id-a> ]--> pkg-z@virtual:<id-a>
pkg-b --[ pkg-x@virtual:<id-b> ]--> pkg-z@virtual:<id-b>
      --[ pkg-y@virtual:<id-b> ]--> pkg-z@virtual:<id-b>
```
Here, the second descriptor `pkg-x@virtual:<id-b>` will be deduped into
the first: they have the same ident and resolve to the same "physical"
package. Once again we will **discard `pkg-z@virtual:<id-b>` in the
process**.

However, the third descriptor `pkg-x@virtual:<id-b>` is not deduped,
leaving a dangling reference to the discarded package.

This is the case reported in #6573

## How did you fix it?

At its root, #1352 is caused by the same virtual package being checked
for dedupe twice against two diffent virtual descriptors that resolve to
it. What if we just... not do that? **Instead of deduping virtual
descriptors, just dedupe the virtual packages.**

We would have virtual descriptors resolving to virtual packages with a
different virtual id, but that would not cause problems. In fact, the
entire point of resolution is to link descriptors and locators.

This ends up simplifying the implementation a lot because we don't need
to keep track of which packages have which virtual descriptors as
dependencies. We just need a reverse resolution map from virtual
locators to virtual descriptors. Deduping is as simple as updating the
project resolution map.

The only downside is we have a few more descriptors in the internals,
but that's just a very tiny cost compared to the benefits.

## Checklist

<!--- Don't worry if you miss something, chores are automatically
tested. -->
<!--- This checklist exists to help you remember doing the chores when
you submit a PR. -->
<!--- Put an `x` in all the boxes that apply. -->
- [x] I have read the [Contributing
Guide](https://yarnpkg.com/advanced/contributing).

<!-- See
https://yarnpkg.com/advanced/contributing#preparing-your-pr-to-be-released
for more details. -->
<!-- Check with `yarn version check` and fix with `yarn version check
-i` -->
- [x] I have set the packages that need to be released for my changes to
be effective.

<!-- The "Testing chores" workflow validates that your PR follows our
guidelines. -->
<!-- If it doesn't pass, click on it to see details as to what your PR
might be missing. -->
- [x] I will check that all automated PR checks pass before the PR gets
reviewed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Can`t use same versions of aliased packages with peer dependencies
2 participants