TimeoutException causes leakage of connections in several servers #139

@fooling

After a connection attempt times out during initialization, the failed address is added to a waiting queue:

        this.connector.addToWatingQueue(
            new ReconnectRequest(inetSocketAddressWrapper, 0, getHealSessionInterval()));
        log.error("Connect to " + SystemUtils.getRawAddress(inetSocketAddress) + ":"
            + inetSocketAddress.getPort() + " fail", throwable);

Stacktrace of the exception:

java.util.concurrent.TimeoutException: null
        at com.google.code.yanf4j.core.impl.FutureImpl.get(FutureImpl.java:143) ~[xmemcached-2.4.7.jar:?]
        at net.rubyeye.xmemcached.XMemcachedClient.connect(XMemcachedClient.java:565) [xmemcached-2.4.7.jar:?]
        at net.rubyeye.xmemcached.XMemcachedClient.<init>(XMemcachedClient.java:840) [xmemcached-2.4.7.jar:?]
        at net.rubyeye.xmemcached.XMemcachedClientBuilder.build(XMemcachedClientBuilder.java:362) [xmemcached-2.4.7.jar:?]

The following code in MemcachedConnector.java then causes an infinite loop:

          try {
            log.info("Trying to connect to " + address.getAddress().getHostAddress() + ":"
                + address.getPort() + " for " + request.getTries() + " times");
            if (!future.get(MemcachedClient.DEFAULT_CONNECT_TIMEOUT, TimeUnit.MILLISECONDS)) {
              connected = false;
            } else {
              connected = true;
            }
          } catch (TimeoutException e) {
            future.cancel(true);
          } catch (ExecutionException e) {
            future.cancel(true);
          } finally {
            if (!connected) {
              this.rescheduleConnectRequest(request);
            } else {
              continue;
            }
          }
        }

When future.get(MemcachedClient.DEFAULT_CONNECT_TIMEOUT, TimeUnit.MILLISECONDS) times out after 60 seconds, a TimeoutException is thrown and future.cancel(true) is called.

However, netstat shows that the underlying connection is actually established, and the cancellation does not close it. Because the request is then rescheduled and a new connection is attempted, the number of connections keeps growing: there were 2000+ ESTABLISHED connections to a single destination, even though the connection pool size is the default (1).
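To illustrate the suspected mechanism, here is a minimal, self-contained sketch. It is not xmemcached code: `CancelLeakSketch` and `stillOpenAfterTimeout` are hypothetical names, a plain `CompletableFuture` stands in for yanf4j's `FutureImpl`, and the future is deliberately never completed to mimic a connect result that never reaches the waiting thread. It only shows that cancelling a future after a timeout does not by itself close a socket that has already been established, while closing the channel explicitly does.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CancelLeakSketch {

  /**
   * Connects to a local listener, then waits on a future that is never completed
   * (an assumption standing in for a connect result that is never delivered).
   * Returns true if the socket is still open after the timeout has been handled.
   */
  static boolean stillOpenAfterTimeout(boolean closeOnTimeout)
      throws IOException, InterruptedException {
    try (ServerSocketChannel server = ServerSocketChannel.open()) {
      // Listen on an ephemeral loopback port so the sketch needs no real memcached.
      server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

      // Blocking connect: the TCP connection is established at this point.
      SocketChannel channel = SocketChannel.open(server.getLocalAddress());

      // Nobody ever completes this future, so get() below must time out.
      CompletableFuture<Boolean> future = new CompletableFuture<>();
      try {
        future.get(100, TimeUnit.MILLISECONDS);
      } catch (TimeoutException | ExecutionException e) {
        future.cancel(true); // only marks the future as cancelled
        if (closeOnTimeout) {
          channel.close(); // explicitly releases the established socket
        }
      }

      boolean open = channel.isOpen();
      channel.close(); // clean up so the sketch itself does not leak
      return open;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("cancel only:      open=" + stillOpenAfterTimeout(false));
    System.out.println("cancel and close: open=" + stillOpenAfterTimeout(true));
  }
}
```

If the real code path behaves the same way, each rescheduled attempt that times out could leave one more ESTABLISHED socket behind, which would match the netstat output. Closing the session or channel tied to the timed-out future before rescheduling is one possible direction for a fix, but that is a guess until the actual cause of the timeout is found.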

The actual network latency is under 10 ms.

Maybe something is blocked in the Reactor?
