OpenTelemetry C++ Multithreading, Context, and Spans
====================================================
The OpenTelemetry C++ API follows a specific concurrency model that differs from
traditional shared-state multithreading. Understanding this model is critical
for correct usage and avoiding common pitfalls.
------------------------------------------------------------------------
1. Design Philosophy
------------------------------------------------------------------------
OpenTelemetry C++ avoids shared mutable state. The API is built on four
principles:
Immutability:
Core types (Context, SpanContext) cannot be modified after creation. Any
"change" produces a new object. No defensive copying needed, no read/write
races possible.
Thread-local activation:
The "current span" concept exists per-thread, not globally. Each thread
maintains its own activation stack via thread_local storage. Creating a Scope
object activates a span only for the calling thread.
Explicit propagation:
Context does not automatically flow between threads. When spawning a worker
thread or async task, the parent thread must explicitly pass Context or
extract it for injection into RPC headers. No implicit inheritance.
Internally synchronized recording:
Span recording operations (SetAttribute, AddEvent, End) are internally
mutex-protected by the SDK. Multiple threads can safely record to the same
span instance without external synchronization.
------------------------------------------------------------------------
2. Spans
------------------------------------------------------------------------
Spans represent units of work in a distributed trace. They record timing,
attributes, events, and status.
Conceptual model:
Spans are write-only sinks for telemetry data. The API provides methods to
add information (SetAttribute, AddEvent, SetStatus, End) but intentionally
omits methods to read that information back.
A span is not a data structure queried during program execution. It is an
append-only log that gets flushed to a backend.
Thread safety guarantees:
Recording operations are safe from multiple threads:
thread_1: span->SetAttribute("key1", value1);
thread_2: span->SetAttribute("key2", value2);
thread_3: span->End();
The SDK internally synchronizes these calls (typically with a mutex protecting
the span's attribute map and event list). No external locking required.
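The internal-locking guarantee can be sketched with a mock (not the real SDK class; all names here are illustrative): each recording method takes a private mutex, so callers need no external synchronization.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical sketch of a write-only span whose recording methods are
// internally synchronized. Illustrative only, not the SDK implementation.
class MockSpan {
 public:
  void SetAttribute(const std::string &key, const std::string &value) {
    std::lock_guard<std::mutex> lock(mu_);  // internal lock, invisible to callers
    attributes_[key] = value;
  }
  void End() {
    std::lock_guard<std::mutex> lock(mu_);
    ended_ = true;
  }
  // Test-only accessor; the real API deliberately exposes no readback.
  size_t attribute_count() {
    std::lock_guard<std::mutex> lock(mu_);
    return attributes_.size();
  }

 private:
  std::mutex mu_;
  std::map<std::string, std::string> attributes_;
  bool ended_ = false;
};

// Three threads record to the same span with no external locking.
inline size_t record_from_three_threads() {
  MockSpan span;
  std::thread t1([&] { span.SetAttribute("key1", "v1"); });
  std::thread t2([&] { span.SetAttribute("key2", "v2"); });
  std::thread t3([&] { span.End(); });
  t1.join();
  t2.join();
  t3.join();
  return span.attribute_count();
}
```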
What is NOT guaranteed:
Reading span state during execution. The C++ API has no GetAttribute or
GetStatus methods. Even if the underlying implementation exposes such
methods, using them is incorrect:
if (span->GetStatus() == StatusCode::kError) { ... } // WRONG
No happens-before relationship exists between span operations and application
logic. Span recording is an asynchronous side effect.
Common misuse patterns:
Using spans as IPC mechanism:
thread_1: span->SetAttribute("ready", true);
thread_2: while (!span->GetAttribute("ready")) { sleep(); }
This is fundamentally wrong. Use proper synchronization primitives
(condition variables, atomic flags).
Treating spans as feature flags:
if (current_span != nullptr) {
  enable_special_mode();
}
Observability should not alter program behavior.
Assuming synchronous export:
span->End();
// span data NOT guaranteed to be exported yet
Spans are typically batched and exported asynchronously. Call the SDK
provider's ForceFlush() if needed (rarely required in practice).
------------------------------------------------------------------------
3. Context
------------------------------------------------------------------------
Context is an immutable key-value map that carries cross-cutting concerns
(active span, baggage, correlation data) through the call stack and across
process boundaries.
Implementation characteristics:
Immutability:
Context ctx1 = Context::GetCurrent();
Context ctx2 = ctx1.SetValue(key, value);
// ctx1 unchanged, ctx2 is new Context with added key
Structural sharing (copy-on-write):
Context uses persistent data structure techniques (hash-array mapped tries
or similar). Copying a Context is O(1), not O(n). "Modification" creates
a new root node pointing to shared subtrees.
No hidden mutable state:
No reference counting that could race, no lazy initialization that requires
locking. Pure value type.
Thread safety by construction:
Context can be freely copied between threads:
std::thread worker([ctx]() {
  // ctx is a copy, completely independent
});
No synchronization needed because no mutation exists. This is different from
"thread-safe mutable object" (which requires locking).
Usage patterns:
Propagation within a thread:
auto ctx = Context::GetCurrent();
auto span = tracer->StartSpan("operation");
auto scope = tracer->WithActiveSpan(span); // modifies thread-local
// Context::GetCurrent() now returns different context
Propagation across threads:
auto ctx = Context::GetCurrent();
std::thread worker([ctx]() {
  // Must explicitly attach
  auto scope = RuntimeContext::Attach(ctx);
  // Now this thread's GetCurrent() returns ctx
});
Propagation across processes (HTTP):
auto ctx = Context::GetCurrent();
propagator->Inject(carrier, ctx); // propagator is a TextMapPropagator
// carrier now contains "traceparent" header
// Receiving side: propagator->Extract(carrier, ctx) reconstructs the Context
Why immutability matters:
Without immutability, Context would need locking:
Context ctx; // hypothetically mutable
thread_1: ctx.Set("key1", val1); // would need mutex
thread_2: ctx.Get("key2"); // would need mutex
Or would need thread-local-only usage (no sharing). Immutability allows both
free sharing AND zero synchronization cost.
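A minimal sketch of structural sharing, assuming a linked chain of immutable nodes rather than the SDK's actual data structure: SetValue allocates one new node and shares the tail, so the original context is never touched and copying is O(1).

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical sketch of an immutable context with structural sharing.
// Illustrative only; the SDK uses a different internal representation.
class MiniContext {
  struct Node {
    std::string key;
    std::string value;
    std::shared_ptr<const Node> next;  // shared tail, never mutated
  };
  std::shared_ptr<const Node> head_;
  explicit MiniContext(std::shared_ptr<const Node> h) : head_(std::move(h)) {}

 public:
  MiniContext() = default;

  // Returns a NEW context; *this is untouched. One allocation, O(1) copy.
  MiniContext SetValue(std::string key, std::string value) const {
    return MiniContext(std::make_shared<const Node>(
        Node{std::move(key), std::move(value), head_}));
  }

  std::string GetValue(const std::string &key) const {
    for (auto n = head_.get(); n != nullptr; n = n->next.get())
      if (n->key == key) return n->value;
    return "";
  }
};
```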
------------------------------------------------------------------------
4. Scope
------------------------------------------------------------------------
Scope is an RAII object that manages the thread-local "current context" stack.
Creating a Scope pushes a Context onto the stack; destroying it pops that
Context off.
How it works:
auto scope = RuntimeContext::Attach(ctx);
// Context::GetCurrent() now returns ctx
// When scope destructs, previous context is restored
Typical usage with spans:
{
  auto span = tracer->StartSpan("operation");
  auto scope = tracer->WithActiveSpan(span);
  // span is now "current"
  do_work(); // nested spans will be children
} // scope destructs here, span no longer current
Thread-local nature:
Scope uses thread_local storage internally (or equivalent). Each thread has
its own independent context stack:
thread_1:
  auto scope1 = RuntimeContext::Attach(ctx_A);
  // GetCurrent() returns ctx_A in thread_1
thread_2:
  auto scope2 = RuntimeContext::Attach(ctx_B);
  // GetCurrent() returns ctx_B in thread_2
Neither thread affects the other. No global "current span".
Critical pitfall: Scope does not cross thread boundaries:
WRONG:
  auto span = tracer->StartSpan("parent");
  auto scope = tracer->WithActiveSpan(span);
  std::thread worker([&]() {
    // GetCurrent() does NOT return span here!
    // New thread has empty context stack.
  });
CORRECT:
  auto span = tracer->StartSpan("parent");
  auto ctx = Context::GetCurrent(); // capture before thread creation
  std::thread worker([ctx, &tracer]() {
    auto scope = RuntimeContext::Attach(ctx); // explicit attach
    auto child_span = tracer->StartSpan("child"); // now has parent
  });
Why this design:
Automatic context inheritance across threads would require:
- Hooking thread creation (not portable)
- Deciding what to inherit (active span? all baggage? custom keys?)
- Managing lifetime (when is the inherited context still valid?)
Explicit propagation is more predictable and works with any threading model
(std::thread, thread pools, async/await, fibers).
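The thread-local activation stack behind Scope can be sketched as follows (illustrative names, not the real RuntimeContext): each thread owns an independent stack, and a freshly spawned thread starts with an empty one.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch of a per-thread context stack with an RAII scope.
// Illustrative only; the real implementation differs.
namespace mini {

thread_local std::vector<std::string> ctx_stack;  // per-thread, no locking

struct Scope {
  Scope(std::string ctx) { ctx_stack.push_back(std::move(ctx)); }
  ~Scope() { ctx_stack.pop_back(); }  // restores the previous context
  Scope(const Scope &) = delete;
  Scope &operator=(const Scope &) = delete;
};

inline std::string Current() {
  return ctx_stack.empty() ? "<empty>" : ctx_stack.back();
}

}  // namespace mini

// A new thread sees an EMPTY stack, regardless of the parent's active scopes.
inline std::string current_in_new_thread() {
  mini::Scope outer("ctx_A");  // active on this thread only
  std::string seen;
  std::thread worker([&] { seen = mini::Current(); });
  worker.join();
  return seen;
}
```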
------------------------------------------------------------------------
5. Multithreading Model
------------------------------------------------------------------------
Common concurrency patterns and how OpenTelemetry handles them.
Pattern 1: Worker thread pool
Main thread creates a span, dispatches work to pool:
auto span = tracer->StartSpan("request");
auto ctx = Context::GetCurrent();
thread_pool.submit([ctx, &tracer, span]() {
  auto scope = RuntimeContext::Attach(ctx);
  auto worker_span = tracer->StartSpan("worker");
  // worker_span is child of span
  process_data();
  worker_span->End();
  span->SetAttribute("processed", true); // safe, internal lock
});
span->End(); // can end before or after worker
Key points:
- Context captured before dispatch
- Worker explicitly attaches context
- Both threads can record to parent span safely
- Parent span can be ended from any thread
Pattern 2: Fork-join parallelism
Parent span, multiple child spans on worker threads:
auto parent = tracer->StartSpan("parallel_operation");
auto ctx = Context::GetCurrent();
std::vector<std::future<void>> futures;
for (int i = 0; i < N; ++i) {
  futures.push_back(std::async([ctx, i, &tracer]() {
    auto scope = RuntimeContext::Attach(ctx);
    auto child = tracer->StartSpan("parallel_" + std::to_string(i));
    do_work(i);
    child->End();
  }));
}
for (auto &f : futures) f.wait();
parent->End();
Pattern 3: Long-lived background thread
Background thread maintains its own trace context:
void background_loop() {
  while (running) {
    auto span = tracer->StartSpan("background_iteration");
    auto scope = tracer->WithActiveSpan(span);
    process_queue();
    span->End();
  }
}
Each iteration is an independent trace root (no parent). If the loop processes
work items that came from other threads, those items should carry Context
explicitly:
struct WorkItem {
  Context ctx; // captured when enqueued
  Data data;
};
void background_loop() {
  while (running) {
    WorkItem item = queue.pop();
    auto scope = RuntimeContext::Attach(item.ctx);
    auto span = tracer->StartSpan("process_item");
    // span is child of whatever was current when item was enqueued
    process(item.data);
    span->End();
  }
}
What does NOT work:
Assuming implicit propagation:
void func() {
  auto span = tracer->StartSpan("parent");
  auto scope = tracer->WithActiveSpan(span);
  std::thread worker([]() {
    auto current = trace::GetSpan(Context::GetCurrent());
    // current is INVALID span, not parent!
  });
}
Relying on span state for synchronization:
thread_1: span->SetAttribute("done", true);
thread_2: poll until span has "done" attribute
No mechanism exists to query span state, and even if one did, the span API is
not a synchronization primitive.
Core principle:
Context propagation is a data flow problem, not a control flow problem. Treat
Context like any other value passed to functions or threads.
------------------------------------------------------------------------
6. Read and Write Locking Expectations
------------------------------------------------------------------------
Traditional concurrent data structures provide reader-writer locks: multiple
readers OR single writer. OpenTelemetry uses different strategies.
Context: no locking needed
Context is immutable (copy-on-write). No read/write distinction exists
because no writes exist. All operations return new Context instances. Zero
synchronization cost.
Spans: write-only, internal locking
Span recording operations (SetAttribute, AddEvent, End) are writes. No read
operations exposed by API. SDK implementation typically uses mutex per span:
void Span::SetAttribute(string_view key, AttributeValue value) {
  std::lock_guard<std::mutex> lock(mu_);
  attributes_[std::string(key)] = value; // map keys own their storage
}
Multiple threads writing to same span serialize on this mutex. This is
acceptable because:
- Span writes are infrequent relative to actual work
- Contention is low (different threads usually record to different spans)
- Simplicity outweighs lock-free complexity
Providers/Tracers: immutable after initialization
TracerProvider, MeterProvider, LoggerProvider are initialized once at startup,
then treated as immutable:
auto provider = MakeTracerProvider(); // initialization
trace::Provider::SetTracerProvider(provider); // publish
// No further modification allowed
Tracer and Meter instances obtained from providers are thread-safe for all
operations (StartSpan, CreateCounter, etc.) because they only read immutable
configuration.
Instruments (metrics): lock per instrument
Counter/Histogram/Gauge recording:
counter->Add(1, labels);
SDK maintains per-instrument aggregation state protected by mutex. Similar to
span recording, contention is acceptable for typical workloads.
When locks are NOT used:
Thread-local context stack:
No mutex needed because each thread has independent storage.
Atomic reference counting:
shared_ptr uses atomic operations, not mutexes.
Batching exporters:
Use lock-free queues or mutexes depending on implementation. Not exposed to
API users.
Design principle:
Prefer immutability and internal synchronization over exposing locking
requirements to users. Users should not need to think about mutexes when
instrumenting code.
------------------------------------------------------------------------
7. Correct Mental Model
------------------------------------------------------------------------
Mental model comparison:
CORRECT mental model:
OpenTelemetry is structured logging with distributed context.
Context is like function parameters:
- Passed explicitly down the call stack
- Copied when creating threads
- Immutable values
Spans are like log statements:
- Fire and forget
- Buffered and flushed asynchronously
- No readback mechanism
Scope is like RAII resource management:
- Constructor acquires (activates context)
- Destructor releases (restores previous context)
- Strictly lexical scoping
INCORRECT mental model:
OpenTelemetry is NOT a global variable system:
- No "current span" accessible from anywhere
- Thread-local storage requires explicit setup per thread
OpenTelemetry is NOT a coordination mechanism:
- Cannot use spans to signal between threads
- Cannot poll span state to make decisions
OpenTelemetry is NOT synchronous:
- Span data may not be exported immediately
- No guarantee of happens-before between span operations
and external observable effects
Analogies to other systems:
Like errno (thread-local state):
- Each thread has independent errno value
- Setting errno in one thread doesn't affect others
- Similar to Context being thread-local
Like cout (write-only sink):
- You write to cout, but don't read back what you wrote
- Output may be buffered, not immediately visible
- Similar to span recording
Like std::optional (value semantics):
- Context behaves like value type, not reference type
- Copying is cheap and creates independent instance
- No hidden aliasing
What this means in practice:
If code relies on "current span" being set:
void instrumented_function() {
  auto span = trace::GetSpan(Context::GetCurrent());
  if (span->IsValid()) {
    // only instrument if called in traced context
  }
}
This is fragile. Better approach:
void instrumented_function() {
  auto span = tracer->StartSpan("operation");
  auto scope = tracer->WithActiveSpan(span);
  // always create span, let sampling decide
}
If code tries to use spans for control flow:
span->SetAttribute("phase", "started");
background_thread.notify(); // WRONG
Use proper primitives:
std::atomic<bool> phase_started{false};
phase_started.store(true, std::memory_order_release);
background_thread.notify();
span->SetAttribute("phase", "started"); // informational only
------------------------------------------------------------------------
8. Common Mistakes
------------------------------------------------------------------------
Mistake 1: Not propagating context to new threads
void handle_request() {
  auto span = tracer->StartSpan("request");
  auto scope = tracer->WithActiveSpan(span);
  std::thread worker([]() {
    // BUG: no context here, new span will be root
    auto span = tracer->StartSpan("work");
  });
}
Fix:
auto ctx = Context::GetCurrent();
std::thread worker([ctx]() {
  auto scope = RuntimeContext::Attach(ctx);
  auto span = tracer->StartSpan("work");
});
Mistake 2: Ending span in wrong scope
void process() {
  auto span = tracer->StartSpan("process");
  auto scope = tracer->WithActiveSpan(span);
  do_work();
  // BUG: span not ended, leaks until scope destructs
}
Fix:
auto span = tracer->StartSpan("process");
auto scope = tracer->WithActiveSpan(span);
do_work();
span->End(); // explicit end before scope destructs
Or rely on RAII:
{
  auto span = tracer->StartSpan("process");
  auto scope = tracer->WithActiveSpan(span);
  do_work();
} // in the reference SDK, the span's destructor ends an un-ended span when
  // the last reference is released
Note that automatic ending on destruction is SDK-specific behavior, not an API
guarantee. Prefer calling span->End() explicitly, and check the SDK
documentation before relying on destructor-based ending.
Mistake 3: Capturing span by reference in async lambda
void async_operation() {
  auto span = tracer->StartSpan("operation");
  thread_pool.submit([&span]() { // BUG: reference to stack variable
    span->SetAttribute("key", value); // may dangle
  });
}
Fix (copy shared_ptr):
auto span = tracer->StartSpan("operation");
thread_pool.submit([span]() { // copy shared_ptr
  span->SetAttribute("key", value);
});
Mistake 4: Forgetting to create scope
void nested_operations() {
  auto parent = tracer->StartSpan("parent");
  // BUG: no scope, so parent not current
  auto child = tracer->StartSpan("child");
  // child is a sibling of parent, not a child
}
Fix:
auto parent = tracer->StartSpan("parent");
auto scope = tracer->WithActiveSpan(parent);
auto child = tracer->StartSpan("child");
// child correctly parented
Mistake 5: Assuming span data is queryable
void conditionally_instrument() {
  auto span = trace::GetSpan(Context::GetCurrent());
  if (span->GetAttribute("debug_mode")) { // NO SUCH METHOD
    enable_verbose_logging();
  }
}
Application state should not be stored in spans. Use separate
variables:
bool debug_mode = get_debug_flag();
if (debug_mode) {
  enable_verbose_logging();
}
span->SetAttribute("debug_mode", debug_mode); // informational
Mistake 6: Race condition on span end
void parallel_work() {
  auto span = tracer->StartSpan("parallel");
  std::thread t1([span]() { work1(); span->End(); });
  std::thread t2([span]() { work2(); span->End(); });
  // BUG: span->End() called twice
}
Fix (coordinator pattern):
auto span = tracer->StartSpan("parallel");
std::thread t1([span]() { work1(); });
std::thread t2([span]() { work2(); });
t1.join();
t2.join();
span->End(); // main thread ends after join
Mistake 7: Not handling exceptions with spans
void risky_operation() {
  auto span = tracer->StartSpan("risky");
  auto scope = tracer->WithActiveSpan(span);
  might_throw(); // BUG: span not ended on exception
  span->End();
}
Fix (RAII or explicit exception handling):
auto span = tracer->StartSpan("risky");
auto scope = tracer->WithActiveSpan(span);
try {
  might_throw();
  span->End();
} catch (...) {
  span->SetStatus(StatusCode::kError);
  span->End();
  throw;
}
Better: use defer/finally pattern or scope guard to ensure End() is always
called.
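Such a guard can be sketched generically (a hypothetical helper, not part of the OpenTelemetry API; MockSpan2 stands in for a real span): the callable runs on destruction, so End() executes on every exit path, including stack unwinding.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Minimal scope guard: runs the stored callable when it goes out of scope.
// Illustrative sketch; production code might use a library equivalent.
template <typename F>
class ScopeGuard {
  F fn_;

 public:
  explicit ScopeGuard(F fn) : fn_(std::move(fn)) {}
  ~ScopeGuard() { fn_(); }  // runs on normal exit AND during unwinding
  ScopeGuard(const ScopeGuard &) = delete;
  ScopeGuard &operator=(const ScopeGuard &) = delete;
};

// Stand-in for a real span (hypothetical).
struct MockSpan2 {
  bool ended = false;
  void End() { ended = true; }
};

inline bool span_ended_despite_exception() {
  MockSpan2 span;
  try {
    ScopeGuard guard([&span] { span.End(); });  // End() is now exception-safe
    throw std::runtime_error("boom");           // stand-in for might_throw()
  } catch (...) {
  }
  return span.ended;  // guard ran during unwinding
}
```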
Mistake 8: Creating too many spans
void loop_with_spans() {
  for (int i = 0; i < 1000000; ++i) {
    auto span = tracer->StartSpan("iteration"); // wasteful
    process(i);
    span->End();
  }
}
Spans have overhead (allocation, locking, export). Batch work:
auto span = tracer->StartSpan("loop");
for (int i = 0; i < 1000000; ++i) {
  process(i);
}
span->SetAttribute("iterations", 1000000);
span->End();
Or span only significant iterations:
for (int i = 0; i < 1000000; ++i) {
  if (is_significant(i)) {
    auto span = tracer->StartSpan("iteration_" + std::to_string(i));
    process(i);
    span->End();
  } else {
    process(i);
  }
}
------------------------------------------------------------------------
9. Rule of Thumb
------------------------------------------------------------------------
Design principles summary:
Immutability over locking:
Context is immutable. No locks needed. Compare to alternative design where
Context is mutable and requires reader-writer lock on every access.
Thread-local over global:
Each thread maintains independent activation stack. No contention. Compare
to global "current span" that would require atomic operations or thread-safe
stack.
Internal synchronization over exposed locking:
Span recording methods are internally mutex-protected. API users never see
locks. Compare to an API that imposes "lock span before modifying"
requirements on its callers.
Explicit over implicit:
Context passed explicitly to threads. No magic. Compare to automatic
inheritance that breaks with thread pools or async/await.
Write-only over read-write:
Spans are append-only. No need for consistent snapshots or versioning.
Compare to queryable span state that would need consistent read protocols.
When to apply these principles to wrapper code:
The C wrapper (this codebase) must preserve these semantics. If exposing
OpenTelemetry to C:
- Do NOT create global "current span" C APIs
- Do NOT expose span reading functions (even if C++ SDK has them)
- Do require explicit context passing in C APIs for thread boundaries
- Do use opaque handles (IDs) rather than raw pointers for safety
The sharded map implementation (section 10) follows these principles:
- Handles are write-heavy (create, destroy), rarely read after creation
- Per-shard locking avoids global contention
- Single-threaded mode has zero locking overhead
- Template-based approach = no runtime configuration overhead
OpenTelemetry avoids concurrency problems by design, not by solving them with
sophisticated locking. The wrapper should maintain this philosophy.
------------------------------------------------------------------------
10. Sharded Map Implementation
------------------------------------------------------------------------
The C wrapper maintains handle maps (otel_handle<T>) that translate integer IDs
to C++ OpenTelemetry objects (spans, contexts, instruments, views). A sharded
hash map implementation reduces contention in multi-threaded workloads while
maintaining single-threaded performance.
Access pattern analysis:
Span and span_context handles:
- Created and destroyed frequently (per-request in HTTP servers)
- High concurrent access from multiple worker threads
- Short-lived (milliseconds to seconds)
- Configured with 64 shards
Instrument and view handles:
- Created once during initialization
- Rarely accessed after creation (cached by application)
- Long-lived (process lifetime)
- Configured with 1 shard (degenerates to a single map)
Key implementation details:
Template-based shard count:
template<typename T, size_t num_shards = 64>
struct otel_handle { ... };
Compile-time configuration. Different handle types instantiate with
different shard counts. static_assert ensures num_shards is a power of 2.
Storage: std::array not std::vector:
std::array<struct shard, num_shards> shards;
Stack allocation, not heap. No malloc on handle manager creation. No
pointer indirection on shard access. std::vector required heap allocation
and added measurable overhead in single-threaded benchmarks.
Shard selection: bitwise AND not modulo:
size_t get_shard_index(int64_t key) const noexcept {
  return static_cast<size_t>(key) & (num_shards - 1);
}
On x86-64: single AND instruction vs IDIV (20-80 cycles).
When num_shards=1: (key & 0) = 0, dead code eliminated by optimizer.
Requires power-of-2 shard count (enforced by static_assert).
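The shard-selection trick can be sketched as follows (illustrative, not the wrapper's exact code): the static_assert enforces the power-of-2 precondition, and the AND then behaves exactly like modulo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of power-of-2 shard selection: modulo reduces to a single AND.
// Illustrative only; names differ from the actual wrapper code.
template <std::size_t num_shards>
struct ShardIndex {
  static_assert(num_shards != 0 && (num_shards & (num_shards - 1)) == 0,
                "num_shards must be a power of 2");

  static std::size_t get(std::int64_t key) noexcept {
    // Equivalent to key % num_shards for non-negative keys, but compiles
    // to one AND instruction instead of an integer division.
    return static_cast<std::size_t>(key) & (num_shards - 1);
  }
};
```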
Per-operation locking strategy:
Single-key operations (find/emplace/erase):
Per-shard std::mutex under OTELC_USE_THREAD_SHARED_HANDLE. Lock only the
target shard. Other threads access other shards in parallel. In
single-threaded mode, no mutex exists (not just unlocked, but compiled out
entirely).
Cross-shard operations (find_if/for_each):
Global std::shared_mutex for consistent view across all shards. Shared lock
allows multiple concurrent readers. Exclusive lock for modifications.
Atomic find-or-insert (find_or_emplace):
Prevents duplicate entries when multiple threads try to create the same
instrument simultaneously. Two-phase:
Phase 1 (optimistic):
Acquire shared lock, search all shards. If found, return existing. Most
calls succeed here (instrument already exists).
Phase 2 (pessimistic):
Release shared lock, acquire exclusive lock. Search again (another
thread may have inserted while upgrading). If still not found, insert.
Guarantees exactly one entry per unique predicate.
Single-threaded version skips all locking, uses plain id++ instead of
id.fetch_add(1).
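The two-phase protocol can be sketched on a single map guarded by one std::shared_mutex (the wrapper spreads this over shards and matches by predicate; names here are illustrative):

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Sketch of two-phase find-or-insert: optimistic shared-lock read, then
// exclusive-lock re-check and insert. Illustrative, not the wrapper's code.
class Registry {
  std::shared_mutex mu_;
  std::unordered_map<std::string, int> entries_;
  int next_id_ = 0;

 public:
  int find_or_emplace(const std::string &name) {
    {
      std::shared_lock<std::shared_mutex> rd(mu_);  // phase 1: optimistic
      auto it = entries_.find(name);
      if (it != entries_.end()) return it->second;  // common fast path
    }
    std::unique_lock<std::shared_mutex> wr(mu_);    // phase 2: pessimistic
    auto it = entries_.find(name);                  // re-check: another thread
    if (it != entries_.end()) return it->second;    // may have won the race
    int id = next_id_++;
    entries_.emplace(name, id);
    return id;                                      // exactly one entry per key
  }
};
```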
Why other approaches were rejected:
Initial attempt: global mutex serializing all operations.
Result: worse than unsharded map. Sharding added overhead without
parallelism benefit. Per-shard locking is mandatory.
Lock-free hash map (libcds, junction, etc):
More complex, harder to debug. Per-shard locking is sufficient for observed
workloads. Lock-free helps when lock acquisition overhead dominates, but
handle lookups are infrequent relative to actual span recording operations.
std::shared_mutex for per-shard locks:
Overkill. Single-key operations are short (hash lookup, insert).
Shared/exclusive distinction only helps find_or_emplace, which already uses
global shared_mutex.
Runtime-configurable shard count:
Every shard access pays indirect lookup cost. Template specialization at
compile time is zero-cost abstraction.
Larger shard counts (128, 256):
Diminishing returns. 64 shards sufficient for observed thread counts
(typically 8-32 worker threads). More shards = more memory (each shard
preallocates buckets), worse cache locality.
Measured behavior:
Multi-threaded with OTELC_USE_THREAD_SHARED_HANDLE:
Scaling is sub-linear (2x threads != 2x throughput) due to:
- Occasional cross-shard operations (find_or_emplace)
- Cache line bouncing on atomic id counter
- Lock acquisition overhead
Acceptable for production use.
Single-threaded without OTELC_USE_THREAD_SHARED_HANDLE:
Performance matches non-sharded std::unordered_map baseline.
std::array + bitwise AND + no mutexes = near-zero sharding cost.
When num_shards=1, optimizer produces identical code to direct map access.
------------------------------------------------------------------------
11. Initialization and Deinitialization Architecture
------------------------------------------------------------------------
The library follows a layered initialization model. A single global entry point
opens the YAML configuration, then per-component factory functions build the
provider/exporter/processor stack from that configuration. Deinitialization
tears everything down in reverse.
Global entry points:
otelc_init (src/util.cpp):
Opens the YAML configuration file. Must be called before any component is
created.
otelc_deinit (src/util.cpp):
Destroys tracer, meter, and logger (if provided), closes the YAML
configuration document, and resets external callbacks and log handlers.
otelc_ext_init (include/opentelemetry-c-wrapper/util.h):
Registers external malloc, free, and thread-ID callbacks. Called from
otelc_deinit to reset them.
Per-component lifecycle:
Each component (tracer, meter, logger) follows an identical pattern: a public
create function calls an internal factory, which allocates via OTELC_CALLOC,
zeroes the err and scope_name fields, and wires the ops vtable.
Component   Public create          Factory            Destroy
---------   -------------          -------            -------
Tracer      otelc_tracer_create    otel_tracer_new    otel_tracer_destroy
Meter       otelc_meter_create     otel_meter_new     otel_meter_destroy
Logger      otelc_logger_create    otel_logger_new    otel_logger_destroy
Structures returned to callers:
otelc_tracer (tracer.h) -- err, scope_name, ops
otelc_meter (meter.h) -- err, scope_name, ops
otelc_logger (logger.h) -- err, scope_name, min_severity, ops
The ops field in each structure is a vtable pointer that dispatches all
subsequent operations through indirect function calls.
Handle initialization:
Tracer initialization creates two handle maps:
- otel_span handle (otel_handle<otel_span_handle *>)
- otel_span_context handle (otel_handle<otel_span_context_handle *>)
Both use OTEL_HANDLE_MAP_SHARDS (64) for concurrent access.
Meter initialization creates two handle maps:
- otel_instrument handle (otel_handle<otel_instrument_handle *>)
- otel_view handle (otel_handle<otel_view_handle *>)
Both use 1 shard (instruments are long-lived, low contention).
Logger does not use handle maps.
When OTELC_USE_THREAD_SHARED_HANDLE is defined, handles are shared across
threads with per-shard mutexes. Otherwise, handles are thread-local with
no locking overhead.
Span and span context lifecycle:
otel_span_new (src/span.cpp):
Allocates via OTEL_EXT_MALLOC, assigns a monotonically increasing index
from the otel_span handle, and wires the span ops vtable. Increments
alloc_fail_cnt on allocation failure.
otel_span_destroy / otel_nolock_span_destroy (src/span.cpp):
Ends the span and erases it from the handle map. The nolock variant skips
lock acquisition for use inside already-locked contexts.
otelc_span_context_create / otel_span_context_new (src/span.cpp):
Constructs an otel_trace::SpanContext from trace/span IDs and trace flags,
wraps it in a DefaultSpan and Context, then emplaces into the
otel_span_context handle map.
otel_span_context_destroy / otel_nolock_span_context_destroy:
Erases the span context from the handle map.
Provider/exporter/processor stack:
Each component internally creates a provider backed by one or more exporters
and processors. The stack is built bottom-up:
1. otel_*_exporter_create (src/exporter.cpp)
Creates an exporter from YAML config. Tracer supports: InMemory,
OStream, OTLP File, OTLP gRPC, OTLP HTTP, Zipkin, Elasticsearch.
Meter and logger support similar variants.
2. otel_tracer_processor_create / otel_logger_processor_create
(src/processor.cpp)
Wraps the exporter in a Batch or Simple processor. Batch mode
is configured via max_queue_size, schedule_delay, export_timeout,
and max_export_batch_size. Wraps with counting_exporter and
counting_span_processor for dropped-span tracking.
3. otel_sampler_create (src/sampler.cpp)
Tracer-only. Creates AlwaysOn, AlwaysOff, TraceIdRatioBased, or
ParentBased sampler from YAML config.
4. otel_*_provider_create (src/provider.cpp)
Builds a Resource from environment and YAML attributes via
otel_resource_create (src/resource.cpp), then constructs the provider
with processors (and sampler for tracer, metric reader for meter).
Destruction reverses this stack:
1. otel_*_destroy clears the global atomic pointer, preventing new
operations.
2. Force-flushes the provider with a 5-second timeout.
3. Resets the global provider via Set*Provider(nullptr).
4. Frees err and scope_name strings.
For the tracer, otel_tracer_destroy additionally:
- Ends all remaining spans in the handle map.
- Destroys span and span context handles.
- Clears the global text map propagator.
Global state:
Each component maintains two global variables:
- otel_nostd::shared_ptr<T> *_owner (reference-counted ownership)
- std::atomic<T *> *_ptr (lock-free access for hot paths)
The atomic pointer is cleared first during destruction, acting as a gate that
causes in-flight operations to see nullptr and bail out before the provider
is torn down.
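The gate can be sketched as follows (illustrative names, not the codebase's actual symbols): hot paths load the atomic pointer and bail out on nullptr; teardown clears the gate before destruction begins.

```cpp
#include <atomic>
#include <cassert>

// Sketch of the atomic-pointer gate: clearing the pointer first causes
// in-flight operations to see nullptr and bail out. Illustrative only.
struct Provider {
  int record(int v) { return v; }
};

std::atomic<Provider *> g_provider_ptr{nullptr};

// Hot path: one atomic load, no locks.
inline int record_or_bail(int v) {
  Provider *p = g_provider_ptr.load(std::memory_order_acquire);
  if (p == nullptr) return -1;  // gate closed: bail out before touching state
  return p->record(v);
}

inline void install(Provider *p) {
  g_provider_ptr.store(p, std::memory_order_release);
}

inline void teardown() {
  // Clear the gate FIRST so new operations stop before destruction begins;
  // the owning shared_ptr (not shown) is released afterwards.
  g_provider_ptr.store(nullptr, std::memory_order_release);
}
```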
Library constructor/destructor (debug builds only):
otelc_lib_constructor (src/util.cpp):
__attribute__((constructor)), currently a no-op.
otelc_lib_destructor (src/util.cpp):
__attribute__((destructor)), currently a no-op.
Overall initialization flow:
otelc_init
 |
 +-- opens YAML configuration
 |
 +-- otelc_*_create (one per component)
      |
      +-- otel_*_new (allocate struct, wire vtable)
      |
      +-- handle map initialization (tracer, meter only)
      |
      +-- otel_*_exporter_create
      +-- otel_*_processor_create
      +-- otel_sampler_create (tracer only)
      +-- otel_meter_reader_create (meter only)
      +-- otel_*_provider_create
Overall deinitialization flow:
otelc_deinit
 |
 +-- otel_*_destroy (one per component, via ops vtable)
      |
      +-- clear atomic pointer
      +-- end remaining spans (tracer only)