Skip to content

Conversation

UkuLoskit
Copy link
Contributor

@UkuLoskit UkuLoskit commented Oct 11, 2023

Implements #1890

  • Each shard has its own RNG due to the share nothing architecture

Questions:

  • Redis uses AES in counter mode. I think this is not important but just an implementation detail, so this is why I decided against using it via OpenSSL and opted to use the standard library. Is this fine?

  • Should tests use a fixed seed for testing actual returned value? If not, then the changes to test RNG initialization can be removed.

Signed-off-by: Uku Loskit <[email protected]>
@kostasrim kostasrim self-requested a review October 11, 2023 19:10
Copy link
Contributor

@kostasrim kostasrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @UkuLoskit and thank you for your contribution 🚀

I left a couple of comments, mostly about:

  1. The functionality to be on par with redis
  2. Some small changes on random generation

Let me know if you have any questions!

Comment on lines 487 to 496
auto rbe = ServerState::tlocal()->random_bits_engine;
std::vector<unsigned char> data((random_bits + 3) / 4);
std::generate(begin(data), end(data), std::ref(*rbe));

const std::string hex_chars = "0123456789abcdef";
std::string response;
for (unsigned char c : data) {
response += hex_chars[c & 0x0F];
}

Copy link
Contributor

@kostasrim kostasrim Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rewrite this a little bit but first let me answer your question:

Each shard has its own RNG due to the share nothing architecture
Each of the shards can use this function concurrently since data is not really shared between different calls to this function. Therefore, the logic I commented/removed is not really needed.

Now let's rewrite this to be a little closer to redis:

  1. You can use std::[random_device](https://en.cppreference.com/w/cpp/numeric/random/random_device). This is a nice function that accepts a token. Redis uses, /dev/urandom/ which nicely bundles with this class because:

The implementation in libstdc++ expects token to name the source of random bytes. Possible token values include "default", "hw", "rand_s", "rdseed", "rdrand", "rdrnd", "/dev/urandom", "/dev/random", "mt19937", and integer string specifying the seed of the mt19937 engine. (Token values other than "default" are only valid for certain targets.)

Therefore, each time you call this function you can:

std::random_device urandom("/dev/urandom");

and this will work, because the /dev/urandom will produce a correct random number each time (the std::random_device is just a wrapper around it).

Now we have a nice generator than can generate random number for us. How can we transform it?
First, we need some simple compile time information:

constexpr random_bytes_produced = sizeof(decltype(urandom.max()));

This expression will return the number of bytes returned by a single call to /dev/urandom. That's our minimum. So:

  1. Adjust the minimum bits of this number, basically random_bytes_produced * 8. This is the minimum value that the [bits] command parameter accept.
  2. Write the result output into the string with the help of abseil:
std::string result;
for(number in generated numbers): 
  absl::StrAppend(&result, absl::Hex(number, kZeroPad4));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small update, I edited the last step and removed the chore with the module since number + 3 / 4 covers it :)

Copy link
Contributor Author

@UkuLoskit UkuLoskit Oct 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kostasrim, first of all thanks for the thorough and speedy feedback!

I don't quite follow the constexpr random_bytes_produced = sizeof(decltype(urandom.max())); explanation.
I think the point is that the number of bytes returned can be platform-dependent and this piece of code calculates how many bytes this abstraction returns?

Or at least for me the important part where we actually generate the random bytes is missing. I'm also kind of wondering if this C++ abstraction makes sense at all in this case, when we just to read a predetermined number of bytes. Sounds like it would be a lot more straight-forward just to do that explicitly :/ (open /dev/urandom, read n bytes).

Moreover, the way I implemented it in the first place was to match Redis's behavior:

The password generation is also very cheap because we don't really ask /dev/urandom for bits at every execution*. At startup Redis creates a seed using /dev/urandom, then it will use SHA256 in counter mode, with HMAC-SHA256(seed,counter) as primitive, in order to create more random bytes as needed. This means that the application developer should be feel free to abuse ACL GENPASS to create as many secure pseudorandom strings as needed.

From what I understand, you suggested that I remove the the random engine member from and read from /dev/urandom on every invocation. My idea was that this PRNG would be seeded once (for each thread) and the resulting stream could be reused. However, I submitted this PR perhaps too hastily, and did not check if this is how it actually works.

So, I have two concerns with my own and the proposed solution:
Firstly, this would make the ACL GENPASS command definitely a lot slower and spamming this command could result in reduced entropy as well.

Secondly, I'm not sure if this construction is suitable for a secure PRNG. Perhaps it would be safer to employ the same solution as Redis with its AES CTR? In that case I believe each thread should still have its own counter, read its own seed from urandom etc.

Copy link
Collaborator

@romange romange Oct 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we should be creative here unless we understand well the security domain ( i do not). Security and simplicity have higher precedence here over performance and if implementing what Redis did achieve these results we could do the same.

Copy link
Contributor

@kostasrim kostasrim Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@UkuLoskit hello hello hello :)

I don't quite follow the constexpr random_bytes_produced = sizeof(decltype(urandom.max())); explanation.
I think the point is that the number of bytes returned can be platform-dependent and this piece of code calculates how many bytes this abstraction returns?

Yes it's the size of the type the function returns. This should be 4 on all platforms but for sanity sake I like this chore because the standard doesn't mandate what the exact type is.

Or at least for me the important part where we actually generate the random bytes is missing. I'm also kind of wondering if this C++ abstraction makes sense at all in this case, when we just to read a predetermined number of bytes. Sounds like it would be a lot more straight-forward just to do that explicitly :/ (open /dev/urandom, read n bytes)

Nah, the C++ wraps it for you, including the error handling etc. There is no reason to open this file and read n bytes. That's why you have abstractions, to do it for you.

Moreover, the way I implemented it in the first place was to match Redis's behavior:

I did not see any HMAC-SHA25 in your PR?
In fact, we can match redis behaviour. Instead, of generating a random number each time from urandom you can use it as a seed once and then use HMAC (see redis/util.c:860 getRandomBytes)

ACL GENPASS command definitely a lot slower

This is not in the hot path, we can take a hit here -- which I doubt will be measurable

spamming this command could result in reduced entropy as well.

Generally speaking, I think urandom covers entropy well from its pool

In that case I believe each thread should still have its own counter, read its own seed from urandom etc.

Atomic would do here, no need for complicated logic. This is not in the hot path, even for other ACL commands we use locks...

I'm not sure if this construction is suitable for a secure PRNG.

From google (as I said I am not an expert in security), urandom should be pretty secure :)

I agree with @romange that we should be as close to redis as possible so if you are ok with it, you can take the HMAC route (I would be also happy with urandom).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kostasrim and @romange, as my work day is starting now, I tried to submit my changes before that.
I changed it to the inline version that @kostasrim suggested in the first place, perhaps it's for the best not to overengineer the initial solution too much and only revisit it in the future if there really are any performance concerns. From my investigation using strace ti seems that this actually calls getrandom under the hood and asks for far fewer bytes.

finally, @kostasrim , I did not know how apply your constexpr solution, so suggestions welcome for this.

Comment on lines 488 to 498
std::random_device urandom("/dev/urandom");
std::independent_bits_engine<std::default_random_engine, CHAR_BIT, unsigned char>
random_bits_engine(urandom());
std::vector<unsigned char> data((random_bits + 3) / 4);
std::generate(begin(data), end(data), std::ref(random_bits_engine));

std::string response;
for (auto c : data) {
absl::StrAppend(&response, absl::Hex(c & 0x0F, absl::kNoPad));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    std::random_device urandom("/dev/urandom");
    const size_t result_length = (random_bits + 3) / 4;
    constexpr size_t step_size = sizeof(decltype(urandom.max()));
    std::string response;
    for(size_t bytes_written = 0; bytes_written < result_length; bytes_written += step_size) {
      absl::StrAppend(&response, absl::Hex(urandom(), absl::kZeroPad8));
    }
  
    if(response.size() > result_length) {
      const size_t stride = response.size() - result_length;
      response.erase(response.end() - stride, response.end());
    }


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will solve the problem :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, but in the end I did not still understand why the independent_bits engine solution was no good? Is it because it calls the generator too many times?

Copy link
Contributor

@kostasrim kostasrim Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, because:

[independent_bits_engine](https://en.cppreference.com/w/cpp/numeric/random/independent_bits_engine#top) 

is a random number engine adaptor that produces random numbers with 
different number of bits than that of the wrapped engine.

What we want here, is to use urandom and then trim a portion of it, if and only if we got more bits than what we requested. Basically, I want to avoid, independent_bits_engine interfering with urandom() and producing random bits that do not come from there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also:

template<
    class Engine,
    std::size_t W,
    class UIntType
    class independent_bits_engine;

W is a compile time parameter but we know this number at runtime

}
std::random_device urandom("/dev/urandom");
const size_t result_length = (random_bits + 3) / 4;
constexpr size_t step_size = sizeof(decltype(std::random_device::max()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kostasrim hey, I changed this a bit, sincemax is a static member. Is this correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, https://en.cppreference.com/w/cpp/numeric/random/random_device/max turns out it's an unsigned int and it's defined by the standard.

So yes, it's fine!

Copy link
Contributor

@kostasrim kostasrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @UkuLoskit for your contribution 🚀 🔥

@kostasrim kostasrim enabled auto-merge (squash) October 19, 2023 15:15
@kostasrim kostasrim merged commit 395f65d into dragonflydb:main Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants