@basilevs basilevs commented Aug 31, 2025

The deadlock was caused by working with ServiceTracker (which may activate bundles) while holding a lock.

Do not hold any locks while working with ServiceTracker.
Use ServiceTracker capabilities to:

  • compute priorities
  • monitor whole sets of services (instead of just one, chosen at an arbitrary moment)
  • manage the lifetime of services (not only factories)

Decouple handling of OSGi services from explicitly registered ones to avoid cross-lock interaction.
Hide ServiceTrackerCustomizer as an implementation detail.
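
For illustration only (this is not code from the PR): a minimal sketch of the locking rule above. `UnlockedLookupRegistry` and its `slowLookup` function are hypothetical stand-ins; `slowLookup` plays the role of a call such as `ServiceTracker.getService()` that may run foreign code. The lock guards only the map of explicitly registered services, and the potentially blocking call runs after the lock is released.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry; names are illustrative and not taken from ProvisioningAgent. */
class UnlockedLookupRegistry {
    private final Object lock = new Object();
    private final Map<String, Object> explicitServices = new HashMap<>();
    // Assumption: stand-in for a tracker-backed lookup that may activate bundles.
    private final Function<String, Object> slowLookup;

    UnlockedLookupRegistry(Function<String, Object> slowLookup) {
        this.slowLookup = slowLookup;
    }

    void register(String name, Object service) {
        synchronized (lock) {
            explicitServices.put(name, service);
        }
    }

    Object get(String name) {
        Object explicit;
        synchronized (lock) {
            // Only cheap map access happens while the lock is held.
            explicit = explicitServices.get(name);
        }
        if (explicit != null) {
            return explicit;
        }
        // The potentially blocking call runs with no lock held.
        return slowLookup.apply(name);
    }
}
```

Because the fallback lookup runs unlocked, code triggered by it can call back into `register` from another thread without waiting on the registry's lock.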

Fixes #937

This is not ready yet, as tests are needed.
@merks is the overall approach acceptable?

The deadlock was caused by working with ServiceTracker (which may
activate bundles) while holding a lock

Do not hold any locks while working with ServiceTracker

Fixes eclipse-equinox#937

github-actions bot commented Aug 31, 2025

Test Results

  128 files  ±0    128 suites  ±0   13m 8s ⏱️ +11s
1 899 tests ±0  1 896 ✅ ±0  3 💤 ±0  0 ❌ ±0 
2 240 runs  ±0  2 237 ✅ ±0  3 💤 ±0  0 ❌ ±0 

Results for commit 812a471. ± Comparison against base commit 47c6f19.

♻️ This comment has been updated with latest results.

@eclipse-equinox-bot
Contributor

This pull request changes some projects for the first time in this development cycle.
Therefore the following files need a version increment:

bundles/org.eclipse.equinox.p2.core/META-INF/MANIFEST.MF
features/org.eclipse.equinox.p2.core.feature/feature.xml
features/org.eclipse.equinox.p2.extras.feature/feature.xml
features/org.eclipse.equinox.p2.rcp.feature/feature.xml
features/org.eclipse.equinox.p2.sdk/feature.xml
features/org.eclipse.equinox.p2.user.ui/feature.xml
features/org.eclipse.equinox.server.p2/feature.xml

An additional commit containing all the necessary changes was pushed to the top of this PR's branch. To obtain these changes (for example if you want to push more changes) either fetch from your fork or apply the git patch.

Git patch
From 73c5115ce94b57f5fc21fbe3dcb786c851d1a80a Mon Sep 17 00:00:00 2001
From: Eclipse Equinox Bot <[email protected]>
Date: Sun, 31 Aug 2025 20:09:32 +0000
Subject: [PATCH] Version bump(s) for 4.38 stream


diff --git a/bundles/org.eclipse.equinox.p2.core/META-INF/MANIFEST.MF b/bundles/org.eclipse.equinox.p2.core/META-INF/MANIFEST.MF
index d810d1ccd..4e49543f7 100644
--- a/bundles/org.eclipse.equinox.p2.core/META-INF/MANIFEST.MF
+++ b/bundles/org.eclipse.equinox.p2.core/META-INF/MANIFEST.MF
@@ -2,7 +2,7 @@ Manifest-Version: 1.0
 Bundle-ManifestVersion: 2
 Bundle-Name: %pluginName
 Bundle-SymbolicName: org.eclipse.equinox.p2.core;singleton:=true
-Bundle-Version: 2.13.100.qualifier
+Bundle-Version: 2.13.200.qualifier
 Bundle-Activator: org.eclipse.equinox.internal.p2.core.Activator
 Bundle-Vendor: %providerName
 Bundle-Localization: plugin
@@ -63,7 +63,7 @@ Export-Package: org.eclipse.equinox.internal.p2.core;x-friends:="org.eclipse.equ
    org.eclipse.equinox.p2.updatesite,
    org.eclipse.equinox.p2.director.app,
    org.eclipse.equinox.p2.transport.ecf",
- org.eclipse.equinox.p2.core;version="2.13.100";uses:="org.eclipse.core.runtime",
+ org.eclipse.equinox.p2.core;version="2.13.200";uses:="org.eclipse.core.runtime",
  org.eclipse.equinox.p2.core.spi;version="2.2.0";uses:="org.eclipse.equinox.p2.core"
 Bundle-RequiredExecutionEnvironment: JavaSE-17
 Bundle-ActivationPolicy: lazy
diff --git a/features/org.eclipse.equinox.p2.core.feature/feature.xml b/features/org.eclipse.equinox.p2.core.feature/feature.xml
index 4fd7e5946..b0b616d46 100644
--- a/features/org.eclipse.equinox.p2.core.feature/feature.xml
+++ b/features/org.eclipse.equinox.p2.core.feature/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.p2.core.feature"
       label="%featureName"
-      version="1.7.800.qualifier"
+      version="1.7.900.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
diff --git a/features/org.eclipse.equinox.p2.extras.feature/feature.xml b/features/org.eclipse.equinox.p2.extras.feature/feature.xml
index d5adc408b..d69cce961 100644
--- a/features/org.eclipse.equinox.p2.extras.feature/feature.xml
+++ b/features/org.eclipse.equinox.p2.extras.feature/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.p2.extras.feature"
       label="%featureName"
-      version="1.4.2900.qualifier"
+      version="1.4.3000.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
diff --git a/features/org.eclipse.equinox.p2.rcp.feature/feature.xml b/features/org.eclipse.equinox.p2.rcp.feature/feature.xml
index 3da8783e2..a7dcf8d68 100644
--- a/features/org.eclipse.equinox.p2.rcp.feature/feature.xml
+++ b/features/org.eclipse.equinox.p2.rcp.feature/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.p2.rcp.feature"
       label="%featureName"
-      version="1.4.2900.qualifier"
+      version="1.4.3000.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
diff --git a/features/org.eclipse.equinox.p2.sdk/feature.xml b/features/org.eclipse.equinox.p2.sdk/feature.xml
index 884bcfd27..02f3488f7 100644
--- a/features/org.eclipse.equinox.p2.sdk/feature.xml
+++ b/features/org.eclipse.equinox.p2.sdk/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.p2.sdk"
       label="%featureName"
-      version="3.11.2900.qualifier"
+      version="3.11.3000.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
diff --git a/features/org.eclipse.equinox.p2.user.ui/feature.xml b/features/org.eclipse.equinox.p2.user.ui/feature.xml
index ee5457fb2..ba059b135 100644
--- a/features/org.eclipse.equinox.p2.user.ui/feature.xml
+++ b/features/org.eclipse.equinox.p2.user.ui/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.p2.user.ui"
       label="%featureName"
-      version="2.4.2900.qualifier"
+      version="2.4.3000.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
diff --git a/features/org.eclipse.equinox.server.p2/feature.xml b/features/org.eclipse.equinox.server.p2/feature.xml
index 2d2bc2e3d..0f2719038 100644
--- a/features/org.eclipse.equinox.server.p2/feature.xml
+++ b/features/org.eclipse.equinox.server.p2/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.server.p2"
       label="%featureName"
-      version="1.12.1800.qualifier"
+      version="1.12.1900.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
-- 
2.51.0

Further information is available in Common Build Issues - Missing version increments.

private volatile boolean stopped = false;
private ServiceRegistration<IProvisioningAgent> reg;
private final Map<ServiceReference<IAgentServiceFactory>, ServiceTracker<IAgentServiceFactory, Object>> trackers = Collections
private final Map<String, ServiceTracker<IAgentServiceFactory, Object>> trackers = Collections
Member

Today I would use a ConcurrentHashMap here. Also, instead of storing a ServiceTracker object, it would be better to use a dedicated class (that internally holds and manages a ServiceTracker). One can then use a quite nice pattern: first compute (look up or create) that class, and then synchronize on the methods of that particular instance. That way the map can work completely lock-free.
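
One possible reading of this suggestion, as a hedged sketch rather than a proposal for the actual code: `ServiceHolder` and `HolderRegistry` are hypothetical names, and the `Supplier` stands in for opening a ServiceTracker and fetching its service. The map's mapping function only allocates a holder; the slow work happens in a method synchronized on that one holder.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical holder that would wrap a ServiceTracker in real code. */
class ServiceHolder {
    // Assumption: stand-in for opening a tracker and obtaining the service.
    private final Supplier<Object> opener;
    private Object service;

    ServiceHolder(Supplier<Object> opener) {
        this.opener = opener;
    }

    // Synchronizes on this holder only, so lookups of other keys are unaffected.
    synchronized Object get() {
        if (service == null) {
            service = opener.get();
        }
        return service;
    }
}

class HolderRegistry {
    private final Map<String, ServiceHolder> holders = new ConcurrentHashMap<>();

    Object get(String name, Supplier<Object> opener) {
        // The mapping function only allocates the holder; it does no slow work.
        ServiceHolder holder = holders.computeIfAbsent(name, key -> new ServiceHolder(opener));
        return holder.get();
    }
}
```

Note that `ServiceHolder.get()` still runs the slow call while holding the holder's monitor, so this narrows the lock's scope to one service name rather than removing the lock.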

Contributor Author

@basilevs basilevs Sep 1, 2025

While a lock-free map is nice, it is not critical in this case, because long-running operations are already done and managed by ServiceTracker outside of locks. The map only holds an instance of ServiceTracker, whose creation does not require any synchronization. ServiceTracker also provides the necessary method synchronization, so no additional wrapping is needed. Indeed, ServiceTracker was designed for this exact purpose.

Also, performance is not a concern here, but computeIfAbsent() for ConcurrentHashMap carries the same deadlock risks as Collections.synchronizedMap(); it just hides some of the conflicts.
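
To illustrate the concern: `ConcurrentHashMap.computeIfAbsent()` runs its mapping function while other updates to the map may be blocked, and the function must not update other mappings of the same map. A minimal sketch of one way around this (the class name `OutsideComputeCache` is hypothetical) creates the value outside the map and publishes it with `putIfAbsent()`:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache; illustrates value creation outside of any map-internal lock. */
class OutsideComputeCache<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();

    V getOrCreate(K key, Function<K, V> factory) {
        V existing = map.get(key);
        if (existing != null) {
            return existing;
        }
        // The factory runs with no map-internal lock held, so it may block
        // or even call back into this cache for another key.
        V created = factory.apply(key);
        // putIfAbsent returns the previously stored value, or null if ours was stored.
        V raced = map.putIfAbsent(key, created);
        return raced != null ? raced : created;
    }
}
```

The trade-off is that two racing threads may both run the factory; the loser's value is discarded here, and a resource such as a tracker would also need to be closed in that case.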

if (stopped) {
return;
}
agentServices.remove(serviceName, service);
Member

I would also use a ConcurrentHashMap for agentServices.

@laeubi
Member

laeubi commented Sep 1, 2025

The test failures seem to be mostly caused by the fact that

ProvisioningAgent.stop

closes the tracker but has already been marked as being stopped. I think it must first close all trackers (and maybe release other things as well) to give a chance for the services to properly shut down.

@basilevs
Contributor Author

basilevs commented Sep 1, 2025

> I think it must first close all trackers (and maybe release other things as well) to give a chance for the services to properly shut down.

This will allow population/restoration of services during the stop procedure.

I suggest instead allowing access to and removal of services while stopped. It makes no sense to disallow access when the service is present. I've pushed a prototype.
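
A rough sketch of the behavior proposed here, not the pushed prototype: `StoppableRegistry` is a hypothetical class in which lookups and removals keep working after `stop()`, while new registrations are refused so that nothing is populated or restored during shutdown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry; not the ProvisioningAgent implementation. */
class StoppableRegistry {
    private final Map<String, Object> services = new ConcurrentHashMap<>();
    private volatile boolean stopped;

    /** Returns false when the registry is stopped and the service was not added. */
    boolean register(String name, Object service) {
        // Simplification: a real implementation would also have to close the
        // race between this check and the put below.
        if (stopped) {
            return false;
        }
        services.put(name, service);
        return true;
    }

    /** Access stays possible after stop as long as the service is still present. */
    Object get(String name) {
        return services.get(name);
    }

    /** Removal stays possible after stop so that services can be released. */
    boolean unregister(String name, Object service) {
        return services.remove(name, service);
    }

    void stop() {
        stopped = true;
    }
}
```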

@basilevs basilevs requested a review from laeubi September 1, 2025 11:51
@laeubi
Member

laeubi commented Sep 1, 2025

> This will allow population/restoration of services during the stop procedure.

If I can't perform the required action, the stop is not really useful. I also wonder in which cases it will really make sense here, and, given that we did not call stop() before, it may even lead to undesired effects.

Overall, as this is a very crucial part of P2 and the Eclipse platform, and is even used inside Tycho, I think we would need to extract this into much smaller pieces, each of them only covering a small subset of this PR, to become more confident that it does not break anything and to understand why each change is good.

Also, ideally there would be some kind of test case that shows the problem and passes after the fix.

@basilevs
Contributor Author

basilevs commented Sep 1, 2025

@laeubi

> If I can't perform the required action, the stop is not really useful

It is able to stop each service as long as the service's dependencies are still present. To ensure this, my implementation disposes of services in the reverse order of their creation.
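
As a hedged illustration of reverse-order disposal (the class `ReverseOrderDisposer` is hypothetical and not taken from the PR): each created service pushes a disposer onto a stack, and shutdown runs the disposers most-recent-first, so a service is stopped before the dependencies that were created ahead of it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper; records disposers and runs them in reverse creation order. */
class ReverseOrderDisposer {
    private final Deque<Runnable> disposers = new ArrayDeque<>();

    /** Call once per created service, in creation order. */
    synchronized void created(Runnable disposer) {
        // push() adds to the head, so the head is always the newest service.
        disposers.push(disposer);
    }

    void disposeAll() {
        List<Runnable> snapshot;
        synchronized (this) {
            // Copying iterates head to tail, that is newest to oldest.
            snapshot = new ArrayList<>(disposers);
            disposers.clear();
        }
        // Disposers run outside the lock, since they may call foreign code.
        for (Runnable disposer : snapshot) {
            disposer.run();
        }
    }
}
```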

> I also wonder in which cases it will really make sense here, and, given that we did not call stop() before, it may even lead to undesired effects.

The case is: a stopped service erroneously accesses a dependency that has already gone away. We cannot allow the dependency to be recreated, because the service would then work with a new instance while wrongly assuming that it is the original.

The current implementation has known defects:

  • it leaks unstopped services
  • obsolete services continue to be provided when one with a higher ranking is registered
  • unstarted services are exposed to consumers

> extract this into much smaller pieces, each of them only covering a small subset of this PR

Splitting the PR is hard because ServiceTrackers are misused in the existing implementation (they monitor a ServiceReference of volatile ranking instead of the highest-ranking one). I will reopen #938 and see what can be done.
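
To make the ranking point concrete, here is a small sketch with hypothetical types (`Candidate`, `HighestRanking`) instead of real `ServiceReference` objects. It assumes the ordering OSGi defines for service references: a higher `service.ranking` wins, and ties go to the lower `service.id`. The selection is recomputed over the whole current set, so a newly registered higher-ranked service replaces the previous choice.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for a ServiceReference with its ranking and id properties. */
final class Candidate {
    final String name;
    final int ranking;
    final long serviceId;

    Candidate(String name, int ranking, long serviceId) {
        this.name = name;
        this.ranking = ranking;
        this.serviceId = serviceId;
    }
}

class HighestRanking {
    // Preferred candidates sort first: higher ranking, then lower service id.
    static final Comparator<Candidate> PREFERRED_FIRST = Comparator
            .<Candidate>comparingInt(c -> c.ranking).reversed()
            .thenComparingLong(c -> c.serviceId);

    /** Picks the preferred candidate from the whole current set, if any. */
    static Optional<Candidate> select(List<Candidate> candidates) {
        return candidates.stream().min(PREFERRED_FIRST);
    }
}
```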

Tests are required, but they would take significant effort, so I'm collecting input on the overall approach (thanks for the comments so far).

Successfully merging this pull request may close these issues.

Deadlock in ProvisioningAgent
3 participants