Skip to content

Race condition in SDK shutdown #6827

@greatvovan

Description

@greatvovan

Describe the bug
In some circumstances, after successful execution of an instrumented program, an exception is thrown on termination:

java.lang.NoClassDefFoundError: io/opentelemetry/sdk/common/CompletableResultCode
    at io.opentelemetry.sdk.trace.SpanProcessor.forceFlush (SpanProcessor.java:104)
    at io.opentelemetry.sdk.trace.SpanProcessor.shutdown (SpanProcessor.java:94)
    at io.opentelemetry.sdk.trace.TracerSharedState.shutdown (TracerSharedState.java:104)
    at io.opentelemetry.sdk.trace.SdkTracerProvider.shutdown (SdkTracerProvider.java:123)
    at io.opentelemetry.sdk.OpenTelemetrySdk.shutdown (OpenTelemetrySdk.java:111)
    at Example.lambda$main$0 (Example.java:21)
    at java.lang.Thread.run (Thread.java:1583)
Caused by: java.lang.ClassNotFoundException: io.opentelemetry.sdk.common.CompletableResultCode
    at org.codehaus.mojo.exec.URLClassLoaderBuilder$ExecJavaClassLoader.loadClass (URLClassLoaderBuilder.java:198)
    at java.lang.ClassLoader.loadClass (ClassLoader.java:526)
    at io.opentelemetry.sdk.trace.SpanProcessor.forceFlush (SpanProcessor.java:104)
    at io.opentelemetry.sdk.trace.SpanProcessor.shutdown (SpanProcessor.java:94)
    at io.opentelemetry.sdk.trace.TracerSharedState.shutdown (TracerSharedState.java:104)
    at io.opentelemetry.sdk.trace.SdkTracerProvider.shutdown (SdkTracerProvider.java:123)
    at io.opentelemetry.sdk.OpenTelemetrySdk.shutdown (OpenTelemetrySdk.java:111)
    at Example.lambda$main$0 (Example.java:21)
    at java.lang.Thread.run (Thread.java:1583)
Oct 27, 2024 5:42:11 AM io.opentelemetry.sdk.OpenTelemetrySdk shutdown
INFO: Multiple shutdown calls
[WARNING] 
java.lang.NoClassDefFoundError: io/opentelemetry/sdk/common/CompletableResultCode
    at io.opentelemetry.sdk.OpenTelemetrySdk.shutdown (OpenTelemetrySdk.java:108)
    at io.opentelemetry.sdk.OpenTelemetrySdk.close (OpenTelemetrySdk.java:119)
    at java.lang.Thread.run (Thread.java:1583)
Caused by: java.lang.ClassNotFoundException: io.opentelemetry.sdk.common.CompletableResultCode
    at org.codehaus.mojo.exec.URLClassLoaderBuilder$ExecJavaClassLoader.loadClass (URLClassLoaderBuilder.java:198)
    at java.lang.ClassLoader.loadClass (ClassLoader.java:526)
    at io.opentelemetry.sdk.OpenTelemetrySdk.shutdown (OpenTelemetrySdk.java:108)
    at io.opentelemetry.sdk.OpenTelemetrySdk.close (OpenTelemetrySdk.java:119)
    at java.lang.Thread.run (Thread.java:1583)

Steps to reproduce
I am not sure how reliable my repro, but I hope maybe Docker makes it more stable.
https://github.com/greatvovan/otel-log4j-file-config/tree/bug/class-not-found

% docker run -it --rm -v $(pwd)/otel-log4j-file-config:/prj maven bash
# cd /prj
# mvn compile
# export OTEL_LOGS_EXPORTER=none OTEL_METRICS_EXPORTER=none OTEL_TRACES_EXPORTER=none
# mvn exec:java

This reliably throws exception every time for me.

Originally I discovered the error in the following way. I have this example for Log4j Appender:
https://github.com/greatvovan/otel-log4j-file-config/tree/bug/repro-runtime-hook
(Note that OTEL_LOGS_EXPORTER is set to console for this one).

It works fine, but if I comment the rows for OpenTelemetryAppender in log4j2.xml, the error appears. As from the semantics perspective it makes absolutely no sense, I conclude that it may be related to the order or timing of how the library cleans up the resources on program exit.

If I change the shutdown hook to a synchronous call to openTelemetry.shutdown(); it solves the problem but it posts

INFO: Multiple shutdown calls

I think it is the least of the evils, as it's either an info message or nothing.

What did you expect to see?
No weird exceptions.

What did you see instead?
A confusing exception.

What version and what artifacts are you using?
Artifacts: sdk, Console Exporter
Version: (e.g., 1.43.0 or bom 2.9.0)
How did you reference these artifacts? pom.xml

Environment
Compiler: Temurin-21.0.4+7 (build 21.0.4+7-LTS)
OS: Ubuntu 24.04.1 LTS
Runtime (if different from JDK above): "21.0.4" 2024-07-16 LTS

Additional context
I found the exception harmless, but it is very confusing for the user to see it, because it leads to internals of the library and makes one think something is wrong with instrumentation. Also it complains on the class that 100% exists in the class path (the line import io.opentelemetry.sdk.common.CompletableResultCode succeeds in run time), which can make user think some dependencies are missing while it is not like that.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions