Skip to content

Conversation

midichef
Copy link
Contributor

@midichef midichef commented Jul 31, 2025

This PR fixes failure of addcol-shell in replay mode or macros, where the resulting column has no shell output, just an error in every cell. It also fixes a race condition where most cells would have the expected output, but a rare cell could be empty with an error. Closes #2786.

@midichef midichef force-pushed the asynccache_threadsafe branch from 09a33ec to b6f1a43 Compare July 31, 2025 05:45
@midichef
Copy link
Contributor Author

Demonstration of the errors
The failure in replay/macros is this: when addcol-shell is run in a replay file or a macro, it gives an error running calcValue() for each cell:

        File "/home/midichef/.venv/lib/python3.12/site-packages/visidata/shell.py", line 60, in <lambda>
            Column(cmd+'_stdout', type=bytes.rstrip, srccol=shellcol, getter=lambda col,row: col.srccol.getValue(row)[0]),
        TypeError: '_MainThread' object is not subscriptable

This error can be demonstrated with vd -b -i -p shellcol.vdj using a replay file shellcol.vdj:

#!vd -p
{"sheet": "", "col": "", "row": "", "longname": "open-file", "input": "sample_data/benchmark.csv", "keystrokes": "o", "comment": "", "replayable": true}
{"sheet": "benchmark", "col": "Date", "row": "", "longname": "addcol-shell", "input": "echo $SKU", "keystrokes": "z;", "comment": "", "replayable": true}

To demonstrate the race condition possibility, put this in .visidatarc:

import random, time
@VisiData.after
def execAsync(vd, func, *args, **kwargs):
    # run only for the @asynccache commands
    if len(args) == 3 and 'ColumnShell: echo $SKU' in repr(args[0]):
        if random.randint(1,4) < 3:
            time.sleep(0.5)

then run vd sample_data/benchmark.csv and do z; and input echo $SKU. After several seconds of delay, the new columns appears. Several cells, but not all, should be blank with errors.

@midichef
Copy link
Contributor Author

The use of setdefault is apparently a common way to read/write a dict value safely as an atomic operation. A long comp.lang.python thread from 2018 gives a sense that it is fairly standard. This python.org thread from 2024 implies that the lack of explicit guarantees is deliberate. I've gathered that using this technique is common enough to be safe, and future implementations of CPython won't casually break this use case.

However, I don't know anything about how the removal of the GIL will affect this implementation. But for today's CPython, this PR is safer than the existing code.

Copy link
Owner

@saulpw saulpw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for figuring this out, @midichef! The fix looks good to me.

midichef added 2 commits July 31, 2025 00:22
The straightforward way to assign to d[k] is not thread-safe:
d[k] = vd.execAsync(_func, k, *args, **kwargs)
because events can happen in this order:
1) _func finishes, assigning func()'s return value to d[k], then
2) execAsync's return value is assigned to d[k], overwriting it.

The same failure happens when replay mode temporarily uses
execSync in place of every execAsync call. Using setdefault() is
safe, assigning to d[k] only if _func hasn't yet done so.
The setdefault() technique is commonly said to be atomic on
CPython, for a standard dictionary with no custom methods.
@midichef midichef force-pushed the asynccache_threadsafe branch from b6f1a43 to 9510c9e Compare July 31, 2025 07:25
@midichef
Copy link
Contributor Author

Okay, I've moved the comments into the commit message.

@midichef midichef requested a review from saulpw September 6, 2025 06:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[macros/addcol-shell] Asynchronous Issues with addcol-shell Command in Macros
2 participants