Skip to content

Conversation

limengning
Copy link

@limengning limengning commented Aug 16, 2025

Description

In multi support, when reset file will only set file point as NULL, but not release the memory allocated to message.
This pr will release the message and gm memory and rebuild gm link array on grib context.
Need to merge ecmwf/cfgrib#428 as well to fix the bug.

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

@limengning
Copy link
Author

limengning commented Aug 17, 2025

Fix issue ecmwf/cfgrib#283 ecmwf/cfgrib#325 and ecmwf/cfgrib#429

@codecov-commenter
Copy link

codecov-commenter commented Aug 18, 2025

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.39%. Comparing base (5b1f691) to head (110e08e).
⚠️ Report is 6 commits behind head on develop.

Files with missing lines Patch % Lines
src/eccodes/grib_handle.cc 84.61% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #352      +/-   ##
===========================================
- Coverage    88.41%   88.39%   -0.02%     
===========================================
  Files          844      844              
  Lines        62721    62735      +14     
  Branches     11116    11119       +3     
===========================================
+ Hits         55456    55457       +1     
- Misses        7265     7278      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@shahramn
Copy link
Collaborator

Can you please provide a test which shows there is a leak. We have an example here:
https://github.com/ecmwf/eccodes/blob/develop/examples/C/multi2.c
which calls the function
codes_grib_multi_support_reset_file(codes_context_get_default(), fp);

Running this with a multi-field input GRIB file shows no leaks

% cd $build_dir
% cd examples/C
% valgrind ./c_multi2
...
==3834087== LEAK SUMMARY:
==3834087==    definitely lost: 0 bytes in 0 blocks
==3834087==    indirectly lost: 0 bytes in 0 blocks
==3834087==      possibly lost: 0 bytes in 0 blocks

@limengning
Copy link
Author

limengning commented Aug 18, 2025

Can you please provide a test which shows there is a leak. We have an example here: https://github.com/ecmwf/eccodes/blob/develop/examples/C/multi2.c which calls the function codes_grib_multi_support_reset_file(codes_context_get_default(), fp);

Running this with a multi-field input GRIB file shows no leaks

% cd $build_dir
% cd examples/C
% valgrind ./c_multi2
...
==3834087== LEAK SUMMARY:
==3834087==    definitely lost: 0 bytes in 0 blocks
==3834087==    indirectly lost: 0 bytes in 0 blocks
==3834087==      possibly lost: 0 bytes in 0 blocks

Here is a test in python lib. ecmwf/cfgrib#429
The read_file function called in loop and will cause the memory leak.
This code is in cfgrib lib, when we use xarray to open grib file will case a huge memory leak.

def read_data():
    with xr.open_dataset('./data/2.2.grb2', engine='cfgrib', decode_timedelta=True) as ds:
        lons = ds.longitude.to_numpy()
        lats = ds.latitude.to_numpy()
        values = ds['unknown'].sum(dim='step').to_numpy()
    return (lons, lats, values)

Whe I open a 50M size of grib2 file having 70 counts of values in one geom point will use 50*70=3.5G memory.
memory_leak.py
2.2.grb2
Need to merge ecmwf/cfgrib#428 as well to fix the bug.

@github-actions github-actions bot removed the approved-for-ci Approved to run CI on ECMWF machines label Aug 19, 2025
@shahramn
Copy link
Collaborator

shahramn commented Aug 19, 2025

Actually if I run my test with your proposed changes, then there is a leak!

cd examples/C
valgrind ./c_multi2
==3618244== definitely lost: 24 bytes in 4 blocks

The current version of ecCodes does not leak (see above)

So I think the leak is in cfgrib and not in the core library.

@limengning
Copy link
Author

limengning commented Aug 19, 2025

I change my code and there is no memory leak.
And I add a example multi3.c for open file for 100 times. If we don't free memory when reset file, the pragram whill use large mount of memroy.
I have to call codes_context_delete after 100 times read file because when I call codes_context_delete in loop will cause a error.
In my change, it will release memory immediately and will reduce a lot memory when the program is running.

@shahramn shahramn added the approved-for-ci Approved to run CI on ECMWF machines label Aug 21, 2025
@shahramn
Copy link
Collaborator

shahramn commented Sep 1, 2025

Thanks for providing the latest multi3.c
I have run that with the current version of ecCodes (develop branch) and here is the valgrind report:

==3405421== HEAP SUMMARY:
==3405421==     in use at exit: 523,850 bytes in 1,766 blocks
==3405421==   total heap usage: 1,314,734 allocs, 1,312,968 frees, 604,926,299 bytes allocated
==3405421== 
==3405421== LEAK SUMMARY:
==3405421==    definitely lost: 0 bytes in 0 blocks
==3405421==    indirectly lost: 0 bytes in 0 blocks
==3405421==      possibly lost: 0 bytes in 0 blocks
==3405421==    still reachable: 523,850 bytes in 1,766 blocks
==3405421==         suppressed: 0 bytes in 0 blocks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved-for-ci Approved to run CI on ECMWF machines contributor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants