Open
Labels
bug (Something isn't working)
Description
Current Behavior
We deploy APISIX in a Kubernetes cluster and have a problem with Prometheus metrics.
We noticed that the prometheus-metrics lua_shared_dict overflows. After that, the apisix_nginx_metric_errors_total counter starts to grow and all metrics stop displaying correctly.
We tried increasing prometheus-metrics to 40m in the ConfigMap (config.yaml), but after 2 months this lua_shared_dict was full on all pods and the errors started to occur again.
nginx_config:                     # config used to render the template that generates nginx.conf
  error_log: "/dev/stderr"
  error_log_level: "warn"         # warn,error
  worker_processes: "auto"
  enable_cpu_affinity: true
  worker_rlimit_nofile: 20480     # the number of files a worker process can open, should be larger than worker_connections
  event:
    worker_connections: 10620
  http:
    enable_access_log: true
    access_log: "/dev/stdout"
    access_log_format: '$remote_addr - $remote_user [$time_local] $http_host \"$request\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\"'
    access_log_format_escape: default
    keepalive_timeout: "60s"
    client_header_timeout: 60s    # timeout for reading client request header, then 408 (Request Time-out) error is returned to the client
    client_body_timeout: 60s      # timeout for reading client request body, then 408 (Request Time-out) error is returned to the client
    send_timeout: 10s             # timeout for transmitting a response to the client, then the connection is closed
    underscores_in_headers: "on"  # default enables the use of underscores in client request header fields
    real_ip_header: "X-Real-IP"   # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
    real_ip_from:                 # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
      - 127.0.0.1
      - 'unix:'
    lua_shared_dict:
      prometheus-metrics: 40m
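To see how close the dict is to overflowing before errors appear, the fill level can be read from the metrics endpoint itself. The sketch below is an illustration, not part of the report: it assumes the prometheus plugin exports `apisix_shared_dict_capacity_bytes` and `apisix_shared_dict_free_space_bytes` with a `name` label, and the helper name `shared_dict_usage` is hypothetical. Fetching the metrics text (for example with curl from the pod) is left out so the snippet stays self-contained.

```python
import re

# Matches exposition lines such as:
#   apisix_shared_dict_free_space_bytes{name="prometheus-metrics"} 4194304
SAMPLE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+([0-9.eE+-]+)$')


def shared_dict_usage(metrics_text, dict_name="prometheus-metrics"):
    """Return the used fraction (0.0 to 1.0) of a shared dict, or None if not found.

    Assumption: the metric names below are the ones exported by the
    APISIX prometheus plugin; adjust them if your version differs.
    """
    values = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # skips "# HELP", "# TYPE" and unlabeled samples
        metric, labels, value = match.groups()
        if f'name="{dict_name}"' in labels:
            values[metric] = float(value)

    capacity = values.get("apisix_shared_dict_capacity_bytes")
    free = values.get("apisix_shared_dict_free_space_bytes")
    if not capacity or free is None:
        return None
    return 1.0 - free / capacity


if __name__ == "__main__":
    # Hand-written sample, not real output from the cluster in this report.
    sample = "\n".join([
        "# TYPE apisix_shared_dict_capacity_bytes gauge",
        'apisix_shared_dict_capacity_bytes{name="prometheus-metrics"} 41943040',
        'apisix_shared_dict_free_space_bytes{name="prometheus-metrics"} 4194304',
    ])
    print(shared_dict_usage(sample))
```

The free-space figure reported by OpenResty counts only completely free pages, so this ratio is a coarse upper bound on usage rather than an exact measurement. It is still useful for alerting on a steady upward trend.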
Current APISIX state
- Deployment via Helm chart: https://github.com/apache/apisix-helm-chart
- Helm Chart version: 2.10.0
- K8s pods: 3
- Pod CPU limits: 15 (usage 4%)
- Pod Memory limits: 60Gb (usage 35 GiB)
- Total requests per second: 2500 - 3000
- Active connections: 2000+
- Upstreams: 100+
- Routes: 120+
- Consumers: 60+
- Plugins: basic-auth and kafka-logger on all routes
Expected Behavior
No response
Error Logs
No response
Steps to Reproduce
- Run APISIX with the default lua_shared_dict size for prometheus-metrics
- After 2-3 weeks prometheus-metrics overflows, apisix_nginx_metric_errors_total starts to grow, and all metrics stop displaying correctly
- Change lua_shared_dict: prometheus-metrics to 40m
- After 2-3 months lua_shared_dict overflows again and we get a similar problem with displaying metrics
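The two fill times above can be compared with simple arithmetic. The sketch below is a rough estimate under stated assumptions: the default size is taken to be 10m (the report does not state it), a month is counted as 4 weeks, and the function names are hypothetical.

```python
def rate_range(size_mib, min_weeks, max_weeks):
    """Return (slowest, fastest) fill rate in MiB per week for a dict
    of size_mib that overflows after min_weeks to max_weeks."""
    return (size_mib / max_weeks, size_mib / min_weeks)


def ranges_overlap(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Assumption: default prometheus-metrics size is 10m; it filled in 2-3 weeks.
default_rate = rate_range(10, 2, 3)

# From the report: 40m filled in 2-3 months (taken here as 8-12 weeks).
enlarged_rate = rate_range(40, 8, 12)

if __name__ == "__main__":
    print("default  (MiB/week):", default_rate)
    print("enlarged (MiB/week):", enlarged_rate)
    print("consistent with a constant growth rate:",
          ranges_overlap(default_rate, enlarged_rate))
```

If both sizes imply a similar rate, that would be consistent with the dict growing steadily over time, so a larger size only postpones the overflow. This is an inference from the reported timings, not a confirmed diagnosis.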
Environment
- APISIX version (run apisix version): 3.10.0
- Operating system (run uname -a): Linux apisix-69cfdc5fbf-m7k27 5.14.0-362.13.1.el9_3.x86_64 SMP PREEMPT_DYNAMIC Fri Nov 24 01:57:57 EST 2023 x86_64 GNU/Linux
- OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.25.3.2
- etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.5.0
- APISIX Dashboard version, if relevant: 3.0.0
- Plugin runner version, for issues related to plugin runners:
- LuaRocks version, for installation issues (run luarocks --version):
Metadata
Status
📋 Backlog