Redis GetConnection stuck waiting #3434

Bug Report

My program gets stuck while trying to call the scan method asynchronously. I found a similar (closed) issue, #2422.

Current Behavior

The program randomly gets stuck while waiting for a connection.
Here is the stack trace of the stuck thread (a full thread dump is included in reproducer.zip):

Stack trace
"lettuce-nioEventLoop-7-1" #58 [300917] daemon prio=5 os_prio=0 cpu=27.51ms elapsed=125.79s tid=0x00007f9049589530 nid=300917 waiting on condition  [0x00007f9024cdd000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@21.0.5/Native Method)
	- parking to wait for  <0x0000000631972928> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(java.base@21.0.5/LockSupport.java:221)
	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@21.0.5/CompletableFuture.java:1864)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21.0.5/ForkJoinPool.java:3780)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21.0.5/ForkJoinPool.java:3725)
	at java.util.concurrent.CompletableFuture.waitingGet(java.base@21.0.5/CompletableFuture.java:1898)
	at java.util.concurrent.CompletableFuture.join(java.base@21.0.5/CompletableFuture.java:2117)
	at io.lettuce.core.DefaultConnectionFuture.join(DefaultConnectionFuture.java:64)
	at io.lettuce.core.cluster.PooledClusterConnectionProvider.getConnection(PooledClusterConnectionProvider.java:476)
	at io.lettuce.core.cluster.PooledClusterConnectionProvider.getConnection(PooledClusterConnectionProvider.java:421)
	at io.lettuce.core.cluster.StatefulRedisClusterConnectionImpl.getConnection(StatefulRedisClusterConnectionImpl.java:194)
	at io.lettuce.core.cluster.api.StatefulRedisClusterConnection.getConnection(StatefulRedisClusterConnection.java:90)
	at io.lettuce.core.cluster.RedisAdvancedClusterAsyncCommandsImpl.clusterScan(RedisAdvancedClusterAsyncCommandsImpl.java:769)
	at io.lettuce.core.cluster.RedisAdvancedClusterAsyncCommandsImpl.clusterScan(RedisAdvancedClusterAsyncCommandsImpl.java:700)
	at io.lettuce.core.cluster.RedisAdvancedClusterAsyncCommandsImpl.scan(RedisAdvancedClusterAsyncCommandsImpl.java:663)
	at Client.scanDatabase(Client.java:158)
	at Client.lambda$scanDatabase$7(Client.java:174)
	at Client$$Lambda/0x00007f8fd029f9d0.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(java.base@21.0.5/CompletableFuture.java:1150)
	at java.util.concurrent.CompletableFuture.postComplete(java.base@21.0.5/CompletableFuture.java:510)
	at java.util.concurrent.CompletableFuture.complete(java.base@21.0.5/CompletableFuture.java:2179)
	at io.lettuce.core.protocol.AsyncCommand.completeResult(AsyncCommand.java:126)
	at io.lettuce.core.protocol.AsyncCommand.complete(AsyncCommand.java:115)
	at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:67)
	at io.lettuce.core.cluster.ClusterCommand.complete(ClusterCommand.java:50)
	at io.lettuce.core.protocol.CommandHandler.complete(CommandHandler.java:769)
	at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:704)
	at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:621)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:796)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.runWith(java.base@21.0.5/Thread.java:1596)
	at java.lang.Thread.run(java.base@21.0.5/Thread.java:1583) 
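
If I read the trace correctly, the continuation that issues the next scan runs directly on the lettuce-nioEventLoop thread, and PooledClusterConnectionProvider then join()s the connection future on that same thread. Below is a minimal sketch of the pattern I believe is at play, plus a possible way around it (all class, method, and variable names here are mine for illustration, not the reproducer's code):

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;

    import io.lettuce.core.KeyScanCursor;
    import io.lettuce.core.cluster.api.async.RedisAdvancedClusterAsyncCommands;

    class ScanPattern {

        // Suspected deadlock: thenCompose runs the lambda on whichever thread
        // completed `previous` -- here the lettuce-nioEventLoop thread. scan()
        // may need a fresh node connection, and PooledClusterConnectionProvider
        // join()s the connection future on that same event-loop thread, which
        // then parks forever (the WAITING frame at the top of the trace).
        static CompletableFuture<KeyScanCursor<String>> chained(
                RedisAdvancedClusterAsyncCommands<String, String> commands,
                CompletableFuture<?> previous) {
            return previous.thenCompose(ignored -> commands.scan().toCompletableFuture());
        }

        // Possible workaround sketch: hop off the event loop before issuing the
        // next scan, so any internal join() blocks a plain executor thread
        // instead of the I/O thread.
        static CompletableFuture<KeyScanCursor<String>> offEventLoop(
                RedisAdvancedClusterAsyncCommands<String, String> commands,
                CompletableFuture<?> previous,
                ExecutorService executor) {
            return previous.thenComposeAsync(
                    ignored -> commands.scan().toCompletableFuture(), executor);
        }
    }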

Input Code

reproducer.zip

Environment

  • Lettuce version(s): 6.7.1.RELEASE
  • Redis version: the reproducer uses embedded Redis 6.2.11, but I can also reproduce the problem on 7.4.4

Additional context

The reproducer gets stuck within the first 5 iterations on my machine.
When I increase the ioThreadPoolSize to more than 5 threads, it works fine, so this is probably some thread-starvation problem?
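
For reference, this is roughly how that thread pool size is configured (a minimal sketch assuming a plain RedisClusterClient; the class name and URI are placeholders, the reproducer's Client wraps the equivalent setup):

    import io.lettuce.core.RedisURI;
    import io.lettuce.core.cluster.RedisClusterClient;
    import io.lettuce.core.resource.ClientResources;
    import io.lettuce.core.resource.DefaultClientResources;

    public class ThreadPoolSetup {
        public static void main(String[] args) {
            // ioThreadPoolSize is the knob referred to above: with a value > 5
            // the hang no longer reproduces for me, with small values it does.
            ClientResources resources = DefaultClientResources.builder()
                    .ioThreadPoolSize(8)
                    .build();
            // Placeholder URI; the reproducer connects to the embedded cluster.
            RedisClusterClient client = RedisClusterClient.create(
                    resources, RedisURI.create("redis://localhost:6379"));
            // ... use the client ...
            client.shutdown();
            resources.shutdown();
        }
    }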

Also, when I create a dedicated client for putting data into the database and another one for scanning, like this:

    try (Client cl = new Client("localhost", cluster.getPort(), 25, 2)) {

      // add some data
      putToCache(cl, "too-old", 15);
      putToCache(cl, "exactly-same", 20);
      putToCache(cl, "new-enough", 25);
    }

    try (Client cl = new Client("localhost", cluster.getPort(), 25, 2)) {
      Long deletedCount = cl.deleteOlderThan(Instant.ofEpochMilli(20)).toCompletableFuture().join();
      System.out.println("Deleted count: " + deletedCount);
    }

the hang also stops happening.

Hope this helps and thanks for checking...
