Skip to content

Fix pipelines and tests#1415

Open
bmehta001 wants to merge 12 commits intomicrosoft:mainfrom
bmehta001:bhamehta/ci-fixes
Open

Fix pipelines and tests#1415
bmehta001 wants to merge 12 commits intomicrosoft:mainfrom
bmehta001:bhamehta/ci-fixes

Conversation

@bmehta001
Copy link

@bmehta001 bmehta001 commented Mar 18, 2026

  • Modernize all GitHub Actions workflows to use latest action versions and fix pre-existing bugs across iOS, Android, and Windows that cause CI failures, crashes, data races, and flaky tests.

  • Several CI pipelines were failing or logging extensive warnings due to:

    • Deprecated GitHub Actions (checkout v1/v2, setup-java v3, setup-msbuild v1.1) running on EOL Node.js 16
    • Removed macos-13 runner images (deprecated Dec 2024)
    • Hardcoded paths that break on newer runner images
    • Pre-existing bugs in iOS HTTP client, threading, and test infrastructure

    Changes

  1. Modernize GitHub Actions (11 workflow files)
  • Updated actions/checkout from v1/v2/v3 to v4
  • Updated actions/setup-java from v3 to v5
  • Updated android-actions/setup-android from v2 to v3
  • Updated microsoft/setup-msbuild from v1.1 to v2
  • Updated actions/upload-artifact from v2 to v4
  1. Pipeline improvements
  • Add concurrency groups to cancel superseded runs
  • Replace deprecated macos-13 with macos-14 and update simulator matrix
  • Remove hardcoded C:\Users\runneradmin\ → use $Env:USERPROFILE
  • Remove hardcoded cmdline-tools\7.0\bin (setup-android v3 adds sdkmanager to PATH)
  • Update vs-version from [16,) to [17,) for VS 2022
  1. Fix iOS CI: Simulator UUID Targeting (build-tests-ios.sh)

xcodebuild on macos-14 lists duplicate simulator entries, causing it to pick the "Any iOS Device" placeholder instead of a concrete simulator (actions/runner-images#8621). Fix by resolving the simulator UUID via xcrun simctl and using -destination "id=".

  1. Fix iOS HTTP Client Crashes and Leaks (5 files)
  • Use-after-destroy crash: Replace shared NSURLSession with per-instance ephemeral sessions with safe teardown via invalidateAndCancel
    • Unlikely, aside from tests, for the SDK to be torn down and then be created again, but the ephemeral session over the "dispatch once" session can also potentially help to recover from zombie sessions, since we will re-create a new session
  • Session invalidation: Move to CancelAllRequests for proper lifecycle management
  • Response body leak: Add delete before setting httpResponse to nullptr in HttpResponseDecoder
  • Test reliability: Relax timing tolerances for CI simulator environments
  1. Fix Data Race and Memory Leak (2 files)
  • TransmissionPolicyManager.cpp: Move m_scheduledUploadTime write inside LOCKGUARD to decrease chance of concurrent read/write and made m_scheduledUploadTime atomic for same reason (to avoid adding additional uses of LOCKGUARD, especially ones that could cause deadlocks if we make them too broad)
  • WorkerThread.cpp: Clean up remaining task queues on shutdown instead of just logging
  1. Fix Static-Destruction-Order Crash (Logger.cpp)

Remove LOG_TRACE from ~Logger() to prevent crash when logging subsystem is destroyed before Logger instances during static destruction. Order of namespace destruction can change depending on system, but the original implementation was causing issues for the simulator.

  1. Fix Android Build Issues (3 files)
  • HttpClientFactory.cpp: Use standard __ANDROID__ macro instead of non-standard ANDROID
  • HttpClient_Android.cpp: Mark unused log tag with attribute((unused))
  • config-default.h: Add #ifndef guards to prevent duplicate macro definitions
  1. Fix Flaky Tests (5 files)
  • Replace localhost with 127.0.0.1 in functional tests to avoid IPv6 resolution failures
  • Add synchronization loop for storage-full callback in APITest to fix assertion race
  • Increase timing tolerances in PalTests for slower CI environments

Testing

All changes have been validated through:

  • Local Windows builds (VS 2022), WSL, Android Simulator on WSL, macOS, iOS Simulator on macOS

bmehta001 and others added 7 commits March 17, 2026 23:52
Update all actions to latest major versions:
- actions/checkout: v1/v2/v3 -> v4
- actions/setup-java: v3 -> v5
- android-actions/setup-android: v2 -> v3
- microsoft/setup-msbuild: v1.1 -> v2
- actions/upload-artifact: v2 -> v4

Fix Android CI:
- Use sdkmanager from PATH (setup-android v3 adds it automatically)
  instead of hardcoded cmdline-tools/7.0/bin directory
- Replace hardcoded C:\Users\runneradmin with \C:\Users\bhamehta

Fix Windows CI:
- Update vs-version from [16,) to [17,) to match VS 2022 runners

Fix iOS/macOS CI:
- Replace deprecated macos-13 runner with macos-14
- Update iOS deployment target to 14.0
- Update simulator matrix for macos-14/macos-15 runners

General:
- Add concurrency groups to cancel superseded workflow runs
- Updates apply to both active and disabled (.off) workflows

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
xcodebuild on macos-14 runners lists duplicate simulator entries,
causing it to pick the 'Any iOS Device' placeholder as the first
match. This results in: 'Tests must be run on a concrete device'.

Fix by resolving the simulator UUID via xcrun simctl and passing
-destination id=<UUID> which is completely unambiguous. This is the
recommended approach per xcodebuild docs and actions/runner-images#8621.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use per-instance ephemeral NSURLSession to prevent stale connection
  reuse and ensure clean teardown without use-after-destroy crashes
- Move session invalidation to CancelAllRequests for safer lifecycle
- Fix response body leak in HttpResponseDecoder
- Relax flaky test tolerances for CI simulator environments

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…hread

- Move m_runningLatency assignment inside LOCKGUARD to prevent
  concurrent read/write race condition
- Clean up remaining task queues on WorkerThread shutdown instead
  of just logging, preventing memory leak

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove LOG_TRACE from ~Logger() to prevent crash when the logging
subsystem is torn down before Logger instances during static
destruction (observed on iOS simulator).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use standard __ANDROID__ macro instead of non-standard ANDROID
  in HttpClientFactory.cpp
- Mark unused log tag with __attribute__((unused)) in
  HttpClient_Android.cpp
- Add #ifndef guards in config-default.h to prevent duplicate
  macro definitions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace 'localhost' with '127.0.0.1' in functional tests to
  avoid IPv6 resolution failures on some CI environments
- Add synchronization loop for storage-full callback in APITest
  to avoid race condition in assertions
- Increase timing tolerances in PalTests for slower CI runners

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bmehta001 bmehta001 requested a review from a team as a code owner March 18, 2026 05:39
bmehta001 and others added 4 commits March 18, 2026 01:34
The queue cleanup must not run after detach() — the detached thread
still needs the shutdown item to exit its event loop. Cleaning up
after detach deletes the shutdown signal, causing the thread to hang
and then access destroyed members (use-after-free).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
m_runningLatency is read without locks at lines 219 and 376, while
written from other threads. Making it std::atomic eliminates the
undefined behavior without requiring additional locks (which could
deadlock through upload callbacks). The existing locks remain for
other shared state they protect.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sary)

Now that m_runningLatency is std::atomic, the write doesn't need
to be inside the lock scope. Restore the original position from
main to minimize the diff.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move m_runningLatency and m_scheduledUploadTime back to before the
LOCKGUARD scope, matching the original code on main. The .cpp now
has zero diff from main — all threading fixes are in the .hpp
(std::atomic declaration).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MSVC rejects passing std::atomic<T> to variadic functions (deleted
copy constructor). Use explicit .load() to pass the underlying
EventLatency value.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant