Trino 476 on AWS EKS - Thread exhaustion on coordinator causes crash #26229
Unanswered · jonroquet2 asked this question in Q&A · Replies: 0 comments
We're deploying our Trino app on an EKS cluster in AWS. We can run queries against the cluster, but when we scale the load up to 12 concurrent user-request threads, we get the following error in the coordinator logs:
[714.377s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[714.377s][warning][os,thread] Failed to start the native thread for java.lang.Thread "remote-task-callback-240"
[714.377s][error ][jvmti ] Posting Resource Exhausted event: unable to create native thread: possibly out of memory or process/resource limits reached
ResourceExhausted: unable to create native thread: possibly out of memory or process/resource limits reached: killing current process!
[714.378s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
Our assumption is that the coordinator pod is running out of native threads: this isn't a JVM heap problem, it's an OS-level process/resource limit. When we shell into the coordinator pod and check the limits with `ulimit -a`, we see this:
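One way to test that assumption directly is to count the native threads of the Trino JVM and compare against the limit that `pthread_create` actually counts against (`Max processes` in procfs covers threads, not just processes). This is a sketch to run inside the coordinator pod; the `pgrep` pattern assumes Trino's main class name and may need adjusting for your image:

```shell
# Hypothetical check, run inside the coordinator pod on a Linux container
# with procfs. The main-class pattern is an assumption about the image.
TRINO_PID=$(pgrep -f 'io.trino.server.TrinoServer' | head -n 1)

# Each entry under /proc/<pid>/task is one native thread of the process.
THREADS=$(ls "/proc/${TRINO_PID}/task" | wc -l)

# The limit pthread_create fails against when it returns EAGAIN.
NPROC=$(awk '/Max processes/ {print $3}' "/proc/${TRINO_PID}/limits")

echo "threads=${THREADS} nproc_limit=${NPROC}"
```

If `threads` is close to `nproc_limit` when the EAGAIN warnings appear, the thread-exhaustion theory is confirmed.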
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 30446
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
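One caveat worth checking: `ulimit -a` in a `kubectl exec` shell reports that shell's limits, which may not match what the JVM inherited (an entrypoint script can change limits before launching Trino). Reading the main process's limits straight from procfs avoids that ambiguity; this sketch assumes PID 1 is the container's entrypoint/Trino launcher:

```shell
# Read the limits actually applied to the container's main process,
# rather than those of an interactively exec'd shell.
grep -E 'Max (open files|processes)' /proc/1/limits
```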
We're assuming the low `-n` (open files) and `-u` (max user processes) limits are causing the issue, since each native thread counts against the process limit. However, every attempt to override these limits has failed: building a custom image, overriding values in the Helm chart, and changing the EC2 node configuration. Can anyone help us understand where we're going wrong?
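One possible explanation for the failed overrides: containers inherit their default ulimits from the container runtime, so an in-container `ulimit` call or a Helm value cannot raise a hard limit the runtime never granted. A sketch of raising containerd's defaults with a systemd drop-in, applied through the node group's launch-template user data (not verified against every EKS AMI; the specific limit values here are assumptions to adjust for your workload):

```shell
# Run on the EKS node (e.g. via launch-template user data, as root).
# Raises the default NOFILE/NPROC limits containerd passes to containers.
mkdir -p /etc/systemd/system/containerd.service.d
cat <<'EOF' > /etc/systemd/system/containerd.service.d/99-ulimits.conf
[Service]
LimitNOFILE=1048576
LimitNPROC=infinity
EOF
systemctl daemon-reload
systemctl restart containerd
```

Note that already-running pods keep their old limits; the coordinator pod has to be recreated on an updated node before the new defaults take effect.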