kolyshkin
Contributor

@kolyshkin kolyshkin commented Aug 7, 2025

Since GHA now provides ARM, we can switch away from actuated.

Many thanks to @alexellis (@self-actuated) for supporting this project.


P.S. Currently, criu installed from the opensuse build farm repo doesn't work
on GHA arm. While we investigate, let's disable this combination.
Tracked in checkpoint-restore/criu#2709.

@kolyshkin
Contributor Author

So

  1. With the GHA arm64 environment (ubuntu-24.04-arm), criu_4.1-1_arm64.deb from https://download.opensuse.org/repositories/devel:/tools:/criu/xUbuntu_24.04/ gets stuck in tests (the timeout is hit).
  2. With actuated CI (ubuntu-22.04, arm) and the same criu version (but from https://download.opensuse.org/repositories/devel:/tools:/criu/xUbuntu_22.04/), it works.
  3. With ubuntu-24.04 (x64) and the same criu version from the same repo as in 1 (except for the architecture, of course), it works.

Not sure where to start debugging it. It appears that criu CI on ubuntu-24.04-arm (GHA) runs just fine since April (checkpoint-restore/criu#2566). Cc: @adrianreber @rst0git

@kolyshkin
Contributor Author

> It appears that criu CI on ubuntu-24.04-arm (GHA) runs just fine since April (checkpoint-restore/criu#2566).

Correction: it was running fine, but not anymore. Opened checkpoint-restore/criu#2704.

@kolyshkin
Contributor Author

@rst0git can you help us here? Here's the situation with ubuntu-24.04-arm and criu:

Here you can find all the c/r logs for a test which timed out: test (ubuntu-24.04-arm, 1.24.x), raw logs

Here is a job with criu-dev which doesn't fail: https://github.com/opencontainers/runc/actions/runs/16841823814/job/47714378171?pr=4844

I see there were two patches to criu/arch/aarch64/crtools.c in criu-dev after the 4.1 release. One (checkpoint-restore/criu@5d5a1e1) is missing from the deb, and the other (checkpoint-restore/criu@799504d) looks different in the deb than in criu-dev.

Can you take a look?

@kolyshkin kolyshkin force-pushed the gha-arm branch 2 times, most recently from 373b12e to c6b3d23 Compare August 10, 2025 02:34
@kolyshkin kolyshkin changed the title from "[test] CI: switch to GHA for arm" to "CI: switch to GHA for arm" Aug 10, 2025
@kolyshkin kolyshkin marked this pull request as ready for review August 11, 2025 17:57
@kolyshkin kolyshkin requested review from rata, AkihiroSuda and lifubang and removed request for rata August 11, 2025 17:57
Member

@rata rata left a comment

LGTM, thanks!

@rst0git
Contributor

rst0git commented Aug 12, 2025

> @rst0git can you help us here?

I've updated the OBS and Launchpad packages to 4.1.1 but have not been able to replicate the error yet.

> I see there were two patches to criu/arch/aarch64/crtools.c in criu-dev after 4.1 release.

We merged these patches because the 4.1 release fails to compile when building the deb packages. These patches were already included: https://github.com/rst0git/criu-deb-packages/commits/open-build-service/

> the other (checkpoint-restore/criu@799504d) looks different in deb than in criu-dev.

If I remember correctly, we had to rename user_pac_generic_keys to cr_user_pac_generic_keys to resolve a compilation error in Launchpad: https://github.com/rst0git/criu-deb-packages/blob/launchpad-24.04/debian/patches/

# (need to compile criu) and don't add much value/coverage.
- criu: criu-dev
go-version: 1.23.x
os: ubuntu-24.04
Member

Why was this exclusion added?

Contributor Author

Because we want ubuntu-24.04 to run with criu from the package, but ubuntu-24.04-arm to run with criu-dev (as the criu package is not working there yet, as described in the commit message).

Yes, I know, it is kind of complicated; I'd rather have the criu package fixed.

os: actuated-arm64-6cpu-8gb
- criu: criu-dev
os: actuated-arm64-6cpu-8gb
os: ubuntu-24.04
Member

And this one.

Contributor Author

Same reason as above

sudo apt -y install criu
- name: install CRIU (criu ${{ matrix.criu }})
- name: install CRIU (${{ matrix.criu }})
Member

@lifubang lifubang Aug 12, 2025

Can we remove this and use 'else' here?

Contributor Author

GHA workflows don't have an else for if. Or do you mean something else? If so, please show the code.

Member

Yes, I think we can use else in bash, or else we will see a log like this in CI:

Install CRIU ()
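For illustration, here is roughly what lifubang's suggestion could look like: a single step whose body branches in bash, instead of relying on step-level if: conditions. This is a minimal sketch, not the workflow that was merged. It assumes the matrix value reaches the script as an environment variable (for example `env: CRIU: ${{ matrix.criu }}`; the CRIU name is hypothetical), that an empty value means "use the packaged criu", and that the criu apt repository is configured earlier in the job. The `run` helper and the `DRY_RUN` switch are also hypothetical, added only so the branch logic can be checked without installing anything.

```shell
#!/usr/bin/env bash
# Hypothetical single "install CRIU" step body: an empty CRIU value selects
# the packaged criu, any other value is a criu git branch to build from source.
set -eu -o pipefail

# run executes its arguments, or only prints them when DRY_RUN=1
# (a hypothetical helper for checking which branch would be taken).
run() {
	if [ "${DRY_RUN:-0}" = 1 ]; then
		echo "+ $*"
	else
		"$@"
	fi
}

install_criu() {
	local criu="${1:-}"
	if [ -z "$criu" ]; then
		# Packaged criu; assumes the criu apt repository was added earlier.
		run sudo apt -y install criu
	else
		# Build the requested branch from source, mirroring the existing CI commands.
		run sudo apt -qy install \
			libcap-dev libnet1-dev libnl-3-dev uuid-dev \
			libprotobuf-c-dev libprotobuf-dev protobuf-c-compiler protobuf-compiler
		run git clone --depth 1 --branch "$criu" --single-branch \
			https://github.com/checkpoint-restore/criu.git ~/criu
		run sudo make -C ~/criu -j "$(nproc)" install-criu
		run rm -rf ~/criu
	fi
}

# In the workflow step, the body would then end with:
#   install_criu "${CRIU:-}"
```

With one step, the step name could also use an expression fallback such as `install CRIU (${{ matrix.criu || 'package' }})`, which should avoid the empty parentheses in the CI log.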

Since GHA now provides ARM, we can switch away from actuated.

Many thanks to @alexellis (@self-actuated) for being the sponsor of this
project.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin marked this pull request as draft August 12, 2025 21:00
@kolyshkin
Contributor Author

> @rst0git can you help us here?
>
> I've updated the OBS and Launchpad packages to 4.1.1 but have not been able to replicate the error yet.

Let's test here.

@kolyshkin
Contributor Author

> @rst0git can you help us here?
>
> I've updated the OBS and Launchpad packages to 4.1.1 but have not been able to replicate the error yet.
>
> Let's test here.

aaand it times out just like before, using criu_4.1.1-1_arm64.deb

OK, I'm finalizing this PR to use criu built from sources on gha arm, and opening an issue with criu.

Currently, criu package from opensuse build farm times out on GHA arm,
so let's only use criu-dev (i.e. compiled from source on CI machine).

Once this is fixed, this patch can be reverted.

Related to criu issue 2709.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin requested a review from lifubang August 12, 2025 21:51
@rst0git
Contributor

rst0git commented Aug 12, 2025

> it times out just like before, using criu_4.1.1-1_arm64.deb

Yes, I was expecting this. The changes between 4.1 and 4.1.1 are unrelated to this problem. The newer release contains a patch only for checkpoint-restore/criu#2694. I would need to git bisect the criu-dev branch to find the commit fixing this.
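Bisecting for the commit that fixed a problem inverts the usual git bisect workflow, which hunts for the commit that broke something. One way to do it, sketched here under stated assumptions, is git's custom bisect terms (`--term-old`/`--term-new`), so the known-broken release is "old" and the known-working branch is "new". The example runs against a throwaway toy repository rather than criu. For criu, the refs would presumably be the 4.1 release tag and criu-dev (assuming the tag is an ancestor of criu-dev), and the test command a hypothetical script that builds criu and exits 0 while the runc checkpoint tests still time out. The helper names `find_fixing_commit` and `make_demo_repo` are made up for this sketch.

```shell
#!/usr/bin/env bash
# Find the commit that fixed a bug, using git bisect with custom terms:
# "broken" is the old state and "fixed" is the new one.
set -eu

# find_fixing_commit REPO OLD_REF NEW_REF CMD...
# OLD_REF must be known broken and an ancestor of the known-fixed NEW_REF.
# CMD must exit 0 while the bug is present (old term, "broken") and
# 1 once it is gone (new term, "fixed"); exit 125 would skip a commit.
find_fixing_commit() {
	local repo=$1 old=$2 new=$3
	shift 3
	(
		cd "$repo"
		git bisect start --term-old=broken --term-new=fixed >/dev/null 2>&1
		git bisect fixed "$new" >/dev/null 2>&1
		git bisect broken "$old" >/dev/null 2>&1
		git bisect run "$@" >/dev/null 2>&1
		# With custom terms, the first "fixed" commit ends up in refs/bisect/fixed.
		git rev-parse refs/bisect/fixed
		git bisect reset >/dev/null 2>&1
	)
}

# Toy stand-in for criu: commits 1-3 are "broken", commits 4-6 are "fixed".
make_demo_repo() {
	local dir i state
	dir=$(mktemp -d)
	git -C "$dir" init -q
	for i in 1 2 3 4 5 6; do
		if [ "$i" -lt 4 ]; then state=broken; else state=fixed; fi
		echo "$state $i" >"$dir/state.txt"
		git -C "$dir" add state.txt
		git -C "$dir" -c user.name=demo -c user.email=demo@example.invalid \
			-c commit.gpgsign=false commit -qm "commit $i"
	done
	echo "$dir"
}

# Usage: in the demo, the first fixing commit is commit 4, i.e. HEAD~2.
#   repo=$(make_demo_repo)
#   old=$(git -C "$repo" rev-list --max-parents=0 HEAD)
#   find_fixing_commit "$repo" "$old" HEAD sh -c 'grep -q broken state.txt'
```

Because git bisect run treats exit code 125 as "skip", a real criu script could return 125 for commits that fail to build, rather than misclassifying them.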

@kolyshkin kolyshkin marked this pull request as ready for review August 13, 2025 17:27
@kolyshkin
Contributor Author

> Yes, I was expecting this. The changes between 4.1 and 4.1.1 are unrelated to this problem. The newer release contains a patch only for checkpoint-restore/criu#2694. I would need to git bisect the criu-dev branch to find the commit fixing this.

The issue is, when I use the same criu version but compile it from source right there in the CI job, it works. So my gut feeling is that this is caused by something in your (i.e. opensuse) build environment: an older compiler, some compiler flags, etc.

Here's how we compile criu in CI (in this case, using ubuntu-24.04-arm):

sudo apt -qy install \
libcap-dev libnet1-dev libnl-3-dev uuid-dev \
libprotobuf-c-dev libprotobuf-dev protobuf-c-compiler protobuf-compiler
git clone --depth 1 --branch ${{ matrix.criu }} --single-branch \
https://github.com/checkpoint-restore/criu.git ~/criu
(cd ~/criu && sudo make -j $(nproc) install-criu)
rm -rf ~/criu

@rst0git
Contributor

rst0git commented Aug 13, 2025

@kolyshkin I'm still investigating this. It seems to happen with both OBS and Launchpad packages.

@kolyshkin
Contributor Author

> @kolyshkin I'm still investigating this. It seems to happen with both OBS and Launchpad packages.

Thank you, but we're not blocked here (we can use criu-dev for now). The main reason for this PR is to switch away from actuated CI as we've been using it for free all this time.

@kolyshkin
Contributor Author

kolyshkin commented Aug 14, 2025

> OK, I'm finalizing this PR to use criu built from sources on gha arm, and opening an issue with criu.

For reference, the criu timeout on GHA arm is now tracked by checkpoint-restore/criu#2709.

@kolyshkin
Contributor Author

@lifubang PTAL

@cyphar cyphar merged commit 6b08448 into opencontainers:main Aug 15, 2025
34 checks passed
@cyphar
Member

cyphar commented Aug 27, 2025

This needs to be backported to the release-1.x branches, btw.

@cyphar cyphar added backport/1.2-todo A PR in main branch which needs to be backported to release-1.2 backport/1.3-todo A PR in main branch which needs to be backported to release-1.3 labels Aug 27, 2025
@lifubang lifubang added backport/1.2-done A PR in main branch which has been backported to release-1.2 backport/1.3-done A PR in main branch which has been backported to release-1.3 and removed backport/1.2-todo A PR in main branch which needs to be backported to release-1.2 backport/1.3-todo A PR in main branch which needs to be backported to release-1.3 labels Aug 27, 2025
Labels
backport/1.2-done A PR in main branch which has been backported to release-1.2 backport/1.3-done A PR in main branch which has been backported to release-1.3
5 participants