Skip to content

Conversation

@tmckayus
Copy link
Contributor

@tmckayus tmckayus commented Jan 13, 2026

Description

This change adds a new remote execution system which will ultimately replace the cuopt Python server. In this new implementation, remote solves are identical to local solves except that the env vars CUOPT_REMOTE_HOST and CUOPT_REMOTE_PORT are set to indicate a remote server.

The remote server code is included and can be built optionally with "build.sh cuopt_remote_server". It does not build by default.

Internal code paths have been modified to use data_model_view_t with non-owning pointers to store problems and solutions as GPU memory or CPU memory, the former for local solves and the latter for remote solves.

Some markdown files have been added that describe the changes:

  1. REMOTE_SOLVE_GUIDE.md (Basic understanding)
  2. REMOTE_SOLVE_MODES.md (Protocol details)
  3. remote_solve_architecture.md (Deep architecture)
  4. SOLUTION_MEMORY_ARCHITECTURE.md (Memory management)
  5. SERIALIZATION_PLUGIN_GUIDE.md (If extending)

Checklist

Need to:

evaluate the need for new tests
review the available markdowns and make sure the feature is well-documented

Summary by CodeRabbit

  • New Features

    • Remote solve platform: sync & async solves, job submit/status/result, cancellation, streaming logs, pluggable serializers (Protobuf & MsgPack), CLI and C/Python client helpers, and lazy GPU submodule loading.
  • Documentation

    • Added comprehensive remote-solve, serialization-plugin, solution-memory, and architecture guides.
  • Tests

    • New unit/integration tests and example scripts exercising remote workflows, cancellation, and client-server interactions.
  • Chores

    • New remote-server build target, Protobuf dependency and installer, packaging/build updates, and CI install step.

✏️ Tip: You can customize this high-level summary in your review settings.

This adds a header that provides convenient aliases for data_model_view_t
and span<T> in the cuopt::linear_programming namespace. The canonical
implementation remains in cuopt::mps_parser to maintain mps_parser as
a standalone library.
- Refactored problem_and_stream_view_t to use unique_ptr for both CPU and GPU storage
- Check is_remote_solve_enabled() at problem creation time
- Local solve: store problem directly in GPU memory (optimization_problem_t)
- Remote solve: store problem in CPU memory (problem_cpu_data_t)
- cuOptSolve routes to appropriate path based on is_remote_configured flag
- Updated all cuOptGet* functions to read from correct storage
- No memory overhead: only one storage type is allocated per problem
- Maintains backward compatibility with existing C API
- Add is_device_memory field to data_model_view_t to track memory location
- Simplify solve_lp/solve_mip overloads: mps_data_model_t now creates view and delegates
- C API checks remote config at creation time: GPU memory for local, CPU for remote
- Handle edge case: GPU data + remote config copies to CPU with warning
- Add white-box tests for all local/remote + CPU/GPU combinations
- Modify cython_solve.cu to check for remote solve configuration early
- When remote is enabled, mark data_model as CPU memory and call
  solve_lp/solve_mip with data_model_view_t overloads
- Fix segfault by initializing warm start data to empty device buffers
  instead of nullptr when returning from remote solve path
- Local solve path unchanged - still creates GPU optimization_problem_t
…emote solve

- Add is_device_memory flag to optimization_problem_solution_t and mip_solution_t
- Use std::unique_ptr for both CPU (std::vector) and GPU (rmm::device_uvector) storage
- Add _host() getters for C++ users to access CPU-resident solution data
- Update cython_solve.cu to populate CPU or GPU storage based on solve type
- Update Python wrapper to transparently handle both CPU and GPU solution data
- Backward compatible: existing GPU-based code unchanged, Python users need no changes
- Check is_remote_solve_enabled() before any CUDA initialization in CLI
- Skip raft::handle creation and memory resource setup for remote solve
- Use CPU-only solution constructors when remote solve returns error
- Handle nullptr handle_ptr gracefully in solve_lp/solve_mip
- Enables CLI to run on hosts without GPU when remote solve is configured
- Make raft::handle_t in problem_and_stream_view_t lazy (unique_ptr)
- Make stream_view in solution_and_stream_view_t optional
- Check is_remote_solve_enabled() before creating handle in cuOptSolve
- Update cuOptGetPrimalSolution/DualSolution/ReducedCosts to handle CPU data
- Enables C API to run on hosts without GPU when remote solve is configured
- cython_solve.cu: Check remote solve before CUDA stream/handle creation
  - Remote solve path passes nullptr handle to solve_lp/solve_mip
  - Local solve path creates CUDA resources only when needed

- cuopt/__init__.py: Make linear_programming and routing imports lazy
  - Uses __getattr__ for deferred module loading
  - Allows 'import cuopt' on CPU-only hosts with remote solve

- solver_wrapper.pyx: Lazy CUDA imports and remote solve detection
  - Remove eager imports of cupy, cudf, numba.cuda
  - Add is_remote_solve_enabled() Python function
  - Add _get_cuda_imports() for lazy module loading
  - Update create_solution() to check is_device_memory_ for warm start

- utilities/__init__.py: Lazy imports for CUDA-dependent utilities
  - type_cast, series_from_buf, check_solution via __getattr__

- utilities/utils.py: Lazy cudf/pylibcudf imports in series_from_buf

Enables Python programs to run on CPU-only hosts when remote solve
is configured via CUOPT_REMOTE_HOST and CUOPT_REMOTE_PORT.
Features:
- Remote solve server (cuopt_remote_server) with protobuf serialization
- Real-time log streaming from server to client via stdout capture
- Transparent remote solve via CUOPT_REMOTE_HOST/PORT environment variables
- Pluggable serialization interface (remote_serialization.hpp)
- Static linking of libmps_parser into libcuopt for simpler deployment

Components:
- cuopt_remote_server.cpp: Server with streaming protocol
- cuopt_remote.proto: Protocol Buffers schema for LP/MIP problems
- protobuf_serializer.cu: Serialization implementation
- remote_solve.cu: Client-side remote solve with streaming support

Build:
- Added BUILD_REMOTE_SERVER cmake option
- Added cuopt_remote_server target to build.sh
- Added mps_parser_static library for static linking

The server captures stdout during solve and streams it to clients while
also echoing to the server console (for docker logs, monitoring, etc.).
When users set constraints using set_row_types() + set_constraint_bounds()
instead of set_constraint_lower/upper_bounds(), the serializer now computes
the constraint lower/upper bounds from the row types:
- 'E' (equality): lb = ub = rhs
- 'L' (less-than): lb = -inf, ub = rhs
- 'G' (greater-than): lb = rhs, ub = inf
- 'N' (free): lb = -inf, ub = inf

Also fixed server to always use streaming protocol for solutions
(even when --no-stream is set, only log streaming is disabled).
- Integrate solve_mip_remote() into solve_mip() dispatch
- Serialize variable types (integer/binary) to protobuf is_integer/is_binary fields
- Deserialize variable types on server side
- Fix is_mip_request() to check header.problem_type for reliable detection

MIP remote solve now correctly handles integer and binary variables.
- Add to_host() method to LP/MIP solution classes for GPU->CPU copy
- Server calls to_host() before serialization, simplifying serializer impl
- Add dynamic serializer loading via CUOPT_SERIALIZER_LIB env var
- Add MsgPack serializer plugin as proof-of-concept
- Add serialization plugin development guide
Features:
- Async job management: SUBMIT_JOB, CHECK_STATUS, GET_RESULT, DELETE_RESULT, GET_LOGS
- Worker processes with shared memory job queues (fork-based, not threads)
- File-based log streaming for async mode (matches Python server architecture)
  - Per-job log files: /tmp/cuopt_logs/log_{job_id}
  - Incremental reads via byte offset (frombyte parameter)
  - Automatic cleanup on job deletion
- Sync mode still uses real-time socket streaming
- CUOPT_REMOTE_USE_SYNC=1 env var to force sync mode

Protocol additions:
- RequestType.GET_LOGS for log retrieval
- AsyncRequest.frombyte for incremental log reads
- LogsResponse with log_lines, nbytes, job_exists

Client modes:
- Sync (blocking=true): Real-time log streaming over socket
- Async (blocking=false): Submit -> Poll (logs + status) -> Get -> Delete
Server (cuopt_remote_server.cpp):
- Add worker process monitoring with waitpid
- Automatically restart dead workers
- Mark in-progress jobs as FAILED when worker dies
- Add flush calls to monitoring log messages

Msgpack serializer:
- Implement full async interface support
- Add async message types (MSG_ASYNC_LP/MIP_REQUEST, etc.)
- Implement server-side methods: is_async_request, is_blocking_request,
  extract_problem_data, get_async_request_type, get_job_id, get_frombyte
- Implement response serialization: serialize_submit_response,
  serialize_status_response, serialize_result_response,
  serialize_delete_response, serialize_logs_response
- Fix is_mip_request to detect both sync and async MIP formats

Tested with LP (afiro.mps) and MIP (flugpl.mps) using msgpack async.
- docs/developer/REMOTE_SOLVE_GUIDE.md: Comprehensive guide covering
  server setup, client configuration, async/sync modes, job lifecycle,
  custom serialization, worker monitoring, and troubleshooting

- docs/developer/SOLUTION_MEMORY_ARCHITECTURE.md: Documents the hybrid
  GPU/CPU memory strategy for solutions, including to_host() method,
  accessor patterns, and design rationale

Deleted 56 obsolete markdown files from root directory that contained
outdated information from previous development iterations.
Features:
- CANCEL_JOB: Cancel queued jobs (marked cancelled, workers skip them)
  or running jobs (worker killed via SIGKILL, auto-restarted)
- WAIT_FOR_RESULT: Blocking wait using condition variables (no polling)
  Server thread sleeps until job completes, fails, or is cancelled
- cuopt_cancel_job: Standalone CLI utility to cancel jobs by ID

Protocol changes:
- Added CANCEL_JOB (type 5) and WAIT_FOR_RESULT (type 6) request types
- Added CANCELLED (5) to JobStatus enum
- Added CancelResponse message to protobuf schema

Server changes:
- Added cancelled flag to shared memory JobQueueEntry
- Workers check cancelled flag before processing claimed jobs
- Cancel handler kills worker process for running jobs
- wait_for_result() uses JobWaiter condition variables
- Fixed check_job_status() to detect PROCESSING state
- Thread-per-connection model (already present) enables blocking waits

Serialization:
- Added cancel methods to remote_serialization.hpp base class
- Implemented in both protobuf and msgpack serializers
- Added MSG_WAIT_REQUEST (17) to msgpack message types

Documentation:
- Added docs/remote_solve_architecture.md for security review
- Covers protocol, process model, data flow, security considerations
C++ API:
- Added cancel_job_remote() function to remote_solve.hpp/cu
- Added remote_job_status_t enum and cancel_job_result_t struct
- Simplified cuopt_cancel_job utility to use the new API (194→105 lines)

Python API:
- Added cuopt.linear_programming.remote module
- cancel_job() function for cancelling remote jobs
- JobStatus enum and CancelResult dataclass
- Exported from cuopt.linear_programming package

All three APIs tested and verified working:
- Raw protobuf cancel (baseline)
- C++ utility cancel (cuopt_cancel_job)
- Python API cancel (cancel_job)
Server-side:
- All jobs (blocking and non-blocking) now go through the shared memory
  job queue and are processed by worker processes
- For blocking=true requests, server submits to queue then waits on
  condition variable until completion, then auto-deletes and returns
- This enables cancellation for 'sync' jobs and allows concurrent solves
- Single code path for all job processing

Client-side:
- Updated solve_lp_remote and solve_mip_remote to use the unified
  response format (result_response) for blocking mode
- Removed streaming log dependency for blocking mode

Benefits:
- Cancellation works for all jobs (blocking and non-blocking)
- Multiple concurrent 'sync' solves possible (one per worker)
- Simpler code with single processing path
- Worker isolation (one job failure doesn't crash server)
- Rename protobuf OptimizationProblem fields to match data_model_view_t:
  * constraint_matrix_values -> A_values (matches A_)
  * constraint_matrix_indices -> A_indices (matches A_indices_)
  * constraint_matrix_offsets -> A_offsets (matches A_offsets_)
  * objective_coefficients -> c (matches c_)
  * constraint_bounds -> b (matches b_)
  * is_integer/is_binary -> variable_types (bytes, matches variable_types_)

- Add missing fields to protobuf and msgpack serialization:
  * initial_primal_solution - for warm start initial primal values
  * initial_dual_solution - for warm start initial dual values
  * Q_values, Q_indices, Q_offsets - quadratic objective matrix for QPS

- Update msgpack serializer to use identical field names as protobuf
- Update protobuf_serializer.cu serialization/deserialization
- Both serializers now handle all data_model_view_t fields
- Rename A_values -> A (matches data_model_view_t::A_)
- Remove unused problem_category field and enum
- Rename PDLPSolverMode enum values to match pdlp_solver_mode_t:
  PDLP_MODE_STABLE1 -> Stable1, etc.
- Rename LPMethod enum values to match method_t:
  METHOD_CONCURRENT -> Concurrent, METHOD_BARRIER -> Barrier, etc.
- Update protobuf_serializer.cu for new enum names
- Update msgpack_serializer.cpp for A field rename
- Remove redundant n_rows, n_cols, nnz fields (sizes inferred from arrays)
- Add variable_names and row_names fields (matching protobuf)
- Msgpack and protobuf now use identical field names for OptimizationProblem
Shared Memory Changes:
- Replace fixed 64MB/128MB embedded data arrays with per-job shared memory
- JobQueueEntry and ResultQueueEntry now store shm_data_name instead of data[]
- Create /cuopt_job_<id> and /cuopt_result_<id> segments on demand
- Cleanup segments after job completion or on error
- Change data_size from uint32_t to uint64_t for >4GB support

Wire Protocol Changes:
- Change message size from uint32_t (4GB max) to uint64_t
- Update both server (cuopt_remote_server.cpp) and client (remote_solve.cu)
- Increase sanity check limit from 100MB to 16GB

Benefits:
- No fixed size limits - problems limited only by system RAM
- No wasted memory for small problems
- Queue entries are now ~256 bytes instead of 64MB+
- Supports very large LP/MIP problems
- Add is_mip parameter to serialize_result_response to properly distinguish
  LP vs MIP solutions (protobuf parsing is lenient and was misinterpreting
  MIP results as LP)
- Add get_job_is_mip helper function to retrieve job type from tracker
- Change job_id fields from string to bytes in protobuf schema to avoid
  spurious UTF-8 validation warnings when parsing wrong message types
- Update all serialize_result_response call sites to pass is_mip flag

Both LP and MIP remote solve now work correctly with Python Problem class.
Pipes are container-friendly (no /dev/shm size limits or volume mounts needed).
Shared memory may be slightly faster for very large problems.

Changes:
- Add WorkerPipes struct with pipe file descriptors
- Add worker_index, data_sent fields to JobQueueEntry for pipe sync
- Add worker_index field to ResultQueueEntry for pipe mode
- Add pending_job_data map for storing data before worker claims job
- Add pipe I/O helpers: write_to_pipe, read_from_pipe, send_job_data_pipe,
  recv_job_data_pipe, send_result_pipe, recv_result_pipe
- Modify spawn_workers to create pipes before forking
- Modify spawn_single_worker to recreate pipes when restarting workers
- Modify worker_process to read job data from pipe or shm based on mode
- Modify result_retrieval_thread to send job data to workers (pipe mode)
  and read results from pipes
- Add --use-shm command line option
- Display IPC mode in server startup message

Tested: Both pipe mode (default) and shm mode work correctly with
LP and MIP remote solve via Python Problem class.
When a running job is cancelled, the worker is killed with SIGKILL which
closes its pipe ends. If the server is in the middle of writing to the
worker's input pipe, it receives SIGPIPE which would terminate the server.

By ignoring SIGPIPE, the write() call returns -1 with errno=EPIPE instead,
which is handled gracefully (logged as 'Failed to send job data').

Tested: Both pipe mode and shm mode work correctly with job cancellation
and worker restart.
These scripts are for manual testing during development and won't be
run by CI (pytest looks in specific directories like python/cuopt/).

- test_pipe_cancel_restart.py: Tests job cancellation and worker restart
- test_python_problem_remote.py: Tests Python Problem class with remote solve
Move protobuf::libprotobuf from PUBLIC to PRIVATE linking for both
cuopt and cuopt_remote_server libraries. This fixes Python wheel
build failures where protobuf was incorrectly exposed as a public
dependency.
Remove unnecessary data_sent flag wait loop that caused 30s timeout.
The pipe read operation naturally blocks until data is available,
so explicit polling of the data_sent flag is redundant and can
cause the worker to timeout before the server writes data.

This fixes the 'Timeout waiting for job data' error when using
pipe-based IPC with large problems.

Bug: Worker would timeout after 30 seconds waiting for data_sent flag
Fix: Worker now directly reads from pipe (blocking read)
Result: Data transfer happens as soon as server writes to pipe
Document all three operating modes (Sync, Async, WAIT_FOR_RESULT) with:
- Detailed workflow descriptions and use cases
- Log retrieval methods for each mode
- Complete API endpoint reference
- Python usage examples and best practices
- Architecture diagrams for all modes
- Future API design recommendations

This consolidates the remote solve protocol documentation into a single
reference for users and developers.
- Fix send_typed_message to write 8-byte payload size (was incorrectly writing 4 bytes)
- Fix pipe mode deadlock: set result.ready=true BEFORE writing to pipe
  (prevents deadlock when result size exceeds pipe buffer ~64KB)
- Add 2-minute poll() timeout in read_from_pipe() for safety if worker dies
- Add #include <poll.h> for poll() function
@tmckayus tmckayus requested review from Kh4ster and rg20 January 13, 2026 20:45
@copy-pr-bot
Copy link

copy-pr-bot bot commented Jan 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link

coderabbitai bot commented Jan 13, 2026

📝 Walkthrough

Walkthrough

Adds end-to-end remote-solve: server and CLI, protobuf/msgpack serializers with pluggable factory, remote client APIs (sync/async) and cancel, view-based CPU/GPU data paths with hybrid solution memory, build/CI dependency updates, Python/C bindings, tests, docs, and developer scripts.

Changes

Cohort / File(s) Summary
Build system & install
build.sh, cpp/CMakeLists.txt, cpp/libmps_parser/CMakeLists.txt
Add cuopt_remote_server and cuopt_cancel_job targets; protobuf codegen and linking; add mps_parser_static target/aliases; update build help, runpath, and conditional install flows.
Remote server & CLI
cpp/cuopt_remote_server.cpp, cpp/cuopt_cancel_job.cpp, cpp/scripts/run_dev_server.sh
New remote server implementation (workers, IPC pipes/SHM, job tracking, log streaming, sync/async APIs), job-cancel CLI utility, and dev-run script.
Protocol buffers & schema
cpp/src/linear_programming/utilities/cuopt_remote.proto, generated pb sources via cpp/CMakeLists.txt
New protobuf schema for sync/async remote API, job lifecycle, requests/responses; CMake wiring to generate and compile pb sources.
Serialization interface & implementations
cpp/include/.../remote_serialization.hpp, cpp/src/.../protobuf_serializer.cu, cpp/src/.../serializers/msgpack_serializer.cpp, cpp/src/.../serializers/CMakeLists.txt
Add remote_serializer_t abstract interface, protobuf serializer implementation, MsgPack serializer plugin and CMake rules, and dynamic serializer factory/loading.
Remote client & utilities
cpp/src/linear_programming/utilities/remote_solve.cu, cpp/include/.../remote_solve.hpp
Client-side remote API (solve_lp_remote, solve_mip_remote), sync/async job submission/polling/logs/result retrieval, cancel_job_remote, and env-based config parsing.
Data model view & mps parser
cpp/include/.../data_model_view.hpp, cpp/libmps_parser/include/mps_parser/data_model_view.hpp, cpp/libmps_parser/src/*
Add non-owning data_model_view_t aliases, expose is_device_memory flag and setters, add int64 explicit instantiations, centralize mps parser sources, and add static/shared mps_parser targets.
Solver API & view overloads
cpp/include/.../solve.hpp, cpp/src/linear_programming/solve.cu, cpp/src/mip/solve.cu
Add converters from data_model_view_toptimization_problem_t and overloads of solve_lp/solve_mip accepting views; GPU→CPU copy helpers and view-based solve pathways for remote/local routing.
C API / CLI integration & helpers
cpp/src/linear_programming/cuopt_c.cpp, cpp/cuopt_cli.cpp
Add CPU-side problem storage and view creation, lazy CUDA initialization, create_view helpers, and remote-solve plumbing in CLI flow.
Solution memory model (LP & MIP)
cpp/include/.../mip/solver_solution.hpp, cpp/include/.../pdlp/solver_solution.hpp, cpp/src/linear_programming/solver_solution.cu, cpp/src/mip/solver_solution.cu
Introduce hybrid host/device solution storage, is_device_memory_ flags, host constructors/setters/getters, to_host() conversions, and memory-path-aware IO.
Core integration (LP core)
cpp/src/linear_programming/CMakeLists.txt, cpp/src/linear_programming/utilities/protobuf_serializer.cu, cpp/src/linear_programming/utilities/remote_solve.cu
Add protobuf serializer and remote client sources into LP core build and compilation units.
Cython / Python bindings & utilities
cpp/include/.../cython_solve.hpp, cpp/src/.../cython_solve.cu, python/.../solver/solver_wrapper.pyx, python/cuopt/cuopt/__init__.py, python/cuopt/cuopt/linear_programming/remote.py, python/cuopt/cuopt/utilities/*, python/cuopt/cuopt/linear_programming/solver/solver.pxd
Extend Cython return structs with host solution vectors and is_device_memory_; add remote detection, lazy CUDA imports, device→numpy helpers, Python remote cancel API, and lazy package import mechanics.
Dev/test scripts & examples
dev_scripts/build_c_test.sh, dev_scripts/test_c_api.c, dev_scripts/test_python_api.py, dev_scripts/test_python_problem_remote.py, dev_scripts/test_pipe_cancel_restart.py
Add C test builder and C test app (local/remote), Python examples/tests (local/remote), and pipe-mode cancel/restart test.
Unit tests
cpp/tests/linear_programming/CMakeLists.txt, cpp/tests/linear_programming/unit_tests/remote_solve_test.cu
Add remote_solve_test.cu covering GPU/CPU and remote/local solve routing and basic expectations.
Docs & developer guides
REMOTE_SOLVE_MODES.md, docs/developer/*, docs/remote_solve_architecture.md
Add extensive documentation: remote solve modes, architecture, serialization plugin guide, solution memory architecture, developer guide, and architecture references.
Packaging / CI / envs
conda/environments/*, conda/recipes/libcuopt/recipe.yaml, dependencies.yaml, python/libcuopt/pyproject.toml, ci/*, ci/utils/install_protobuf.sh
Add libprotobuf to conda envs and recipe, update dependencies.yaml and pyproject, add CI install step and helper script to install Protobuf.
Serializer plugins build
cpp/src/linear_programming/utilities/serializers/CMakeLists.txt
Add conditional build for cuopt_msgpack_serializer when msgpack-cxx present; install plugin library.
Misc / gitignore
.gitignore
Add ignore rule for dev_scripts/test_c_api.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.40% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'WIP: Add a new remote execution system that is integrated in the cuopt engine' directly describes the main change: introducing a new remote execution system integrated into cuopt, which is confirmed by the extensive changes across server implementation, serialization, configuration, and documentation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/cuopt_cli.cpp (1)

224-226: Minor: typo in log message

“MPS parser execption” (Line 225) → “exception”.

🤖 Fix all issues with AI agents
In @cpp/cuopt_remote_server.cpp:
- Around line 108-127: send_typed_message currently assumes ::write writes the
entire type byte and 8-byte size and doesn't handle partial writes or EINTR;
update it to use the existing write_all helper (and ensure write_all itself
handles EINTR/short writes) for the header fields (msg_type and payload_size)
and for the payload loop so all writes are retried until complete; also audit
and replace raw ::write usages in the other framing helpers (the code around the
other helper functions that perform framing and the read_all/write_all
implementations) to use the robust write_all/read_all semantics so headers,
sizes and payloads cannot be partially written or interrupted.
- Around line 1190-1202: The code mutates JobWaiter fields from
result_retrieval_thread() and cancel_job() without acquiring the
JobWaiter::mutex, causing data races against the waiting thread that reads
waiter->ready under that mutex; to fix, when you look up the waiter in
waiting_threads (using waiters_mutex) then acquire the specific waiter's mutex
(e.g., std::lock_guard<std::mutex> waiter_lock(waiter->mutex)) before writing
waiter->result_data, waiter->error_message, waiter->success, and waiter->ready,
and call waiter->cv.notify_one() while still holding that waiter mutex (or use
notify_one immediately after releasing but ensure memory visibility), doing the
same symmetric locking in cancel_job(); ensure the waiting thread still checks
the ready predicate under JobWaiter::mutex so no race or missed wakeup occurs.
- Around line 67-91: The problem: several strncpy(...) uses that populate
fixed-size fields (JobQueueEntry::job_id, JobQueueEntry::shm_data_name,
ResultQueueEntry::job_id, ResultQueueEntry::shm_data_name,
ResultQueueEntry::error_message and similar spots) use length-1 without
explicitly writing a trailing NUL, allowing non-NUL-terminated strings and UB
when constructing std::string or calling strcmp. Fix: after every strncpy into
these fixed buffers (job_id, shm_data_name, error_message, etc.) explicitly set
buffer[sizeof(buffer)-1] = '\0' (or switch to a safe copy like strncpy followed
by that assignment, or use strlcpy if available) so the fields are always
NUL-terminated; update all occurrences noted in the review (around lines 67-91,
548-562, 720-738, 999-1011, 1459-1476) and any other strncpy uses that write
into fixed char arrays.

In @cpp/src/linear_programming/cuopt_c.cpp:
- Around line 228-230: The GPU view creation in create_view_from_gpu_problem
currently leaves view.variable_types unset (unlike create_view_from_cpu_data
which calls create_view), causing solve routines in solve.cu (which expect
variable_types to be present) to read uninitialized data; fix by calling
view.set_variable_types(gpu.get_variable_types().data(), gpu.get_n_variables())
during create_view_from_gpu_problem so the view's variable_types span is
initialized, or alternatively modify the solve routines to check for a
missing/empty variable_types span and handle the GPU path explicitly.

In @cpp/src/mip/solve.cu:
- Around line 512-564: The CPU-local path in solve_mip can dereference a null
CUDA handle: before calling data_model_view_to_optimization_problem(handle_ptr,
view) when view.is_device_memory() is false and remote_config is not set,
validate handle_ptr; if it's null, log an error (CUOPT_LOG_ERROR) and return a
mip_solution_t<i_t,f_t> with cuopt::logic_error("No CUDA handle for CPU->GPU
copy", cuopt::error_type_t::RuntimeError) to mirror the GPU->CPU remote-check
branch; otherwise proceed to call data_model_view_to_optimization_problem and
return solve_mip(op_problem, settings).

In @dev_scripts/test_pipe_cancel_restart.py:
- Around line 29-52: The client protocol in
python/cuopt/cuopt/linear_programming/remote.py incorrectly uses 32-bit size
prefixes ("<I"), causing mismatch with the server's uint64_t sizes; update the
send and receive size packing/unpacking in functions that call struct.pack("<I",
...) and struct.unpack("<I", ...) to use "<Q" instead, and ensure any buffer
reads expect 8 bytes for the size prefix (e.g., change reads that assume 4 bytes
to 8 bytes) so the client reads the full uint64 size and then reads the
subsequent payload accordingly.

In @docs/remote_solve_architecture.md:
- Around line 304-314: The documented struct definitions for JobQueueEntry and
ResultQueueEntry are out of sync with the real implementation; update the
documentation to match cpp/cuopt_remote_server.cpp by changing
JobQueueEntry::job_id to char[64], data_size to uint64_t, remove inline data[]
and add char shm_data_name[128] for shared-memory names, change worker_pid to
std::atomic<pid_t>, and add std::atomic<int> worker_index and std::atomic<bool>
data_sent; similarly update ResultQueueEntry to include shm_data_name[128], use
uint64_t data_size, add worker_index and clarify the error message field/type;
also update surrounding explanatory text to describe dual-mode transport (pipe
and shared memory) and how data is referenced via shm_data_name instead of
inline buffers.
🟠 Major comments (14)
cpp/libmps_parser/CMakeLists.txt-81-82 (1)

81-82: Static library missing BZip2/ZLIB include directories.

The static library has the same source files as the shared library, but lines 144-150 only add the BZip2 and ZLIB include directories to mps_parser, not mps_parser_static. If any source files include headers from these libraries, the static build will fail when MPS_PARSER_WITH_BZIP2 or MPS_PARSER_WITH_ZLIB is enabled.

🔧 Proposed fix
 if(MPS_PARSER_WITH_BZIP2)
     target_include_directories(mps_parser PRIVATE BZip2::BZip2)
+    target_include_directories(mps_parser_static PRIVATE BZip2::BZip2)
 endif(MPS_PARSER_WITH_BZIP2)

 if(MPS_PARSER_WITH_ZLIB)
     target_include_directories(mps_parser PRIVATE ZLIB::ZLIB)
+    target_include_directories(mps_parser_static PRIVATE ZLIB::ZLIB)
 endif(MPS_PARSER_WITH_ZLIB)

Also applies to: 94-99

cpp/src/mip/solver_solution.cu-147-169 (1)

147-169: Potential null pointer dereference in solution accessors.

get_solution() dereferences solution_ and get_solution_host() dereferences solution_host_ without null checks. If get_solution() is called on a CPU-only solution (where solution_ is null), this will crash. Similarly for get_solution_host() on a GPU solution.

Consider adding assertions or guards to catch misuse.

🛡️ Suggested defensive check
 template <typename i_t, typename f_t>
 const rmm::device_uvector<f_t>& mip_solution_t<i_t, f_t>::get_solution() const
 {
+  cuopt_assert(solution_ != nullptr, "get_solution() called but solution is in host memory");
   return *solution_;
 }

 template <typename i_t, typename f_t>
 rmm::device_uvector<f_t>& mip_solution_t<i_t, f_t>::get_solution()
 {
+  cuopt_assert(solution_ != nullptr, "get_solution() called but solution is in host memory");
   return *solution_;
 }

 template <typename i_t, typename f_t>
 std::vector<f_t>& mip_solution_t<i_t, f_t>::get_solution_host()
 {
+  cuopt_assert(solution_host_ != nullptr, "get_solution_host() called but solution is in device memory");
   return *solution_host_;
 }

 template <typename i_t, typename f_t>
 const std::vector<f_t>& mip_solution_t<i_t, f_t>::get_solution_host() const
 {
+  cuopt_assert(solution_host_ != nullptr, "get_solution_host() called but solution is in device memory");
   return *solution_host_;
 }
cpp/src/linear_programming/utilities/serializers/msgpack_serializer.cpp-60-137 (1)

60-137: Duplicate LP settings serialization code should be extracted.

The LP settings packing code (28 fields) is duplicated between serialize_lp_request (lines 72-134) and serialize_async_lp_request (lines 577-639). This violates DRY principle and increases maintenance burden.

♻️ Extract helper method for LP settings serialization
private:
  void pack_lp_settings(msgpack::packer<msgpack::sbuffer>& pk,
                        const pdlp_solver_settings_t<i_t, f_t>& settings)
  {
    pk.pack_map(28);
    pk.pack("absolute_gap_tolerance");
    pk.pack(settings.tolerances.absolute_gap_tolerance);
    pk.pack("relative_gap_tolerance");
    pk.pack(settings.tolerances.relative_gap_tolerance);
    // ... rest of settings
  }

Then in both serialize_lp_request and serialize_async_lp_request:

pack_lp_settings(pk, settings);

Also applies to: 558-642

cpp/src/linear_programming/utilities/serializers/msgpack_serializer.cpp-139-183 (1)

139-183: Duplicate MIP settings serialization should also be extracted.

Similarly, MIP settings packing (13 fields) is duplicated between serialize_mip_request and serialize_async_mip_request.

♻️ Extract helper method for MIP settings serialization
private:
  void pack_mip_settings(msgpack::packer<msgpack::sbuffer>& pk,
                         const mip_solver_settings_t<i_t, f_t>& settings)
  {
    pk.pack_map(13);
    pk.pack("time_limit");
    pk.pack(settings.time_limit);
    // ... rest of settings
  }

Also applies to: 644-695

cpp/src/linear_programming/cuopt_c.cpp-1275-1355 (1)

1275-1355: Exception safety: potential memory leak if solve throws.

If solve_mip or solve_lp throws an exception (e.g., due to CUDA errors), the solution_and_stream_view_t allocated at lines 1284, 1300, 1325, or 1341 will be leaked.

Consider wrapping the solve calls in try-catch:

Suggested fix pattern
solution_and_stream_view_t* solution_and_stream_view =
  new solution_and_stream_view_t(true, problem_and_stream_view->handle.get());
try {
  solution_and_stream_view->mip_solution_ptr = new mip_solution_t<cuopt_int_t, cuopt_float_t>(
    solve_mip<cuopt_int_t, cuopt_float_t>(gpu_problem, mip_settings));
} catch (...) {
  delete solution_and_stream_view;
  throw;  // or return appropriate error code
}
cpp/cuopt_remote_server.cpp-1189-1221 (1)

1189-1221: Major: result data is moved into the waiter before storing to job_tracker

When a blocking waiter exists, result_data is moved into the waiter (Line 1196), then job_tracker[job_id].result_data = result_data; stores an empty vector (Line 1211). That makes subsequent GET_RESULT/DELETE flows inconsistent (especially for WAIT_FOR_RESULT).

cpp/cuopt_remote_server.cpp-380-459 (1)

380-459: Security hardening: shared memory and log files are world-writable / symlinkable

  • shm_open(..., 0666) (Line 380, 410, 437) exposes job/result control to other users on the same host.
  • Logs are written under /tmp (Line 1648) and opened with O_CREAT|O_TRUNC (Line 583), which is vulnerable to symlink attacks.

Consider 0600 perms (and/or per-user shm names), and opening logs with O_NOFOLLOW plus ensuring the log dir is owned and not world-writable.

Also applies to: 580-602, 1647-1665

cpp/cuopt_cli.cpp-72-172 (1)

72-172: Explicitly mark the MPS-backed view as host memory (avoid remote-solve false negatives/positives)

create_view_from_mps_data_model() builds spans to CPU vectors, but the view’s is_device_memory is never set here (Line 82+; later used at Line 271). If the default is true, CLI remote solves will fail because solve_lp/solve_mip will treat the view as GPU data and require a CUDA handle.

Proposed fix
 template <typename i_t, typename f_t>
 cuopt::linear_programming::data_model_view_t<i_t, f_t> create_view_from_mps_data_model(
   const cuopt::mps_parser::mps_data_model_t<i_t, f_t>& mps_data_model)
 {
   cuopt::linear_programming::data_model_view_t<i_t, f_t> view;
+  // MPS parser owns CPU buffers; this view is host-backed.
+  view.set_is_device_memory(false);

   view.set_maximize(mps_data_model.get_sense());
   ...
   return view;
 }

Also applies to: 269-272

cpp/src/linear_programming/utilities/remote_solve.cu-42-223 (1)

42-223: Harden socket I/O: handle EINTR/partial writes + add timeouts

read_all() / write_all() treat any n <= 0 as hard failure (Lines 193-200, 209-216), which will spuriously fail on EINTR. Also, connect/read/write can block indefinitely; please add timeouts (SO_RCVTIMEO/SO_SNDTIMEO or non-blocking connect + poll). This is especially important for long-running remote solves.

cpp/src/mip/solve.cu-298-381 (1)

298-381: Potential correctness gap: view helper ignores quadratic objective (and should set is_device_memory=false)

This create_view_from_mps_data_model() omits Q-objective population (unlike the CLI helper), but later code copies/serializes quadratic objective fields (cpu_problem_data_t has Q buffers; copy_view_to_cpu copies Q). That mismatch strongly suggests Q was intended to be supported here too. Also, this helper should explicitly mark the view as host memory.

Also applies to: 505-510

cpp/src/mip/solve.cu-450-495 (1)

450-495: Add CUDA error checks for cudaMemcpyAsync and cudaStreamSynchronize

Lines 57–65 missing error checking: wrap cudaMemcpyAsync and cudaStreamSynchronize with RAFT_CUDA_TRY() to ensure copies complete successfully before returning potentially incomplete data for remote serialization. Without error checking, corrupted or partial data silently propagates downstream.

cpp/src/linear_programming/solve.cu-1145-1275 (1)

1145-1275: Add CUDA error checks and use handle streams for data copies

The cudaMemcpy calls at lines 1187–1190 and 1240–1251 lack error checking. Additionally, cudaMemcpyAsync (line 1380) and cudaStreamSynchronize (line 1388) in copy_view_to_cpu are unchecked. More critically, the synchronous cudaMemcpy calls in data_model_view_to_optimization_problem do not pass handle_ptr->get_stream(), causing them to execute on the default stream and ignore upstream work enqueued on a non-default stream, which breaks correctness. Use raft::copy or cudaMemcpyAsync with the handle's stream (as done in copy_view_to_cpu), and wrap all CUDA operations with error checking.

cpp/src/linear_programming/utilities/protobuf_serializer.cu-1383-1391 (1)

1383-1391: Missing CUDA error checking on cudaMemcpy.

Same issue as in lp_solution_to_proto - wrap with RAFT_CUDA_TRY.

🔧 Proposed fix
       if (d_sol.size() > 0) {
         std::vector<f_t> h_sol(d_sol.size());
-        cudaMemcpy(h_sol.data(), d_sol.data(), d_sol.size() * sizeof(f_t), cudaMemcpyDeviceToHost);
+        RAFT_CUDA_TRY(cudaMemcpy(h_sol.data(), d_sol.data(), d_sol.size() * sizeof(f_t), cudaMemcpyDeviceToHost));
         for (const auto& v : h_sol) {
           pb_solution->add_solution(static_cast<double>(v));
         }
       }
cpp/src/linear_programming/utilities/protobuf_serializer.cu-1291-1322 (1)

1291-1322: Missing CUDA error checking on cudaMemcpy calls.

Per coding guidelines, every CUDA memory operation must have error checking. These cudaMemcpy calls could fail silently.

🔧 Proposed fix
       if (d_primal.size() > 0) {
         std::vector<f_t> h_primal(d_primal.size());
-        cudaMemcpy(
-          h_primal.data(), d_primal.data(), d_primal.size() * sizeof(f_t), cudaMemcpyDeviceToHost);
+        RAFT_CUDA_TRY(cudaMemcpy(
+          h_primal.data(), d_primal.data(), d_primal.size() * sizeof(f_t), cudaMemcpyDeviceToHost));
         for (const auto& v : h_primal) {
           pb_solution->add_primal_solution(static_cast<double>(v));
         }
       }

       // Copy dual solution from device
       if (d_dual.size() > 0) {
         std::vector<f_t> h_dual(d_dual.size());
-        cudaMemcpy(
-          h_dual.data(), d_dual.data(), d_dual.size() * sizeof(f_t), cudaMemcpyDeviceToHost);
+        RAFT_CUDA_TRY(cudaMemcpy(
+          h_dual.data(), d_dual.data(), d_dual.size() * sizeof(f_t), cudaMemcpyDeviceToHost));
         for (const auto& v : h_dual) {
           pb_solution->add_dual_solution(static_cast<double>(v));
         }
       }

       // For reduced cost, we can access via const cast since we're just reading
       auto& nc_solution    = const_cast<optimization_problem_solution_t<i_t, f_t>&>(solution);
       auto& d_reduced_cost = nc_solution.get_reduced_cost();
       if (d_reduced_cost.size() > 0) {
         std::vector<f_t> h_reduced_cost(d_reduced_cost.size());
-        cudaMemcpy(h_reduced_cost.data(),
+        RAFT_CUDA_TRY(cudaMemcpy(h_reduced_cost.data(),
                    d_reduced_cost.data(),
                    d_reduced_cost.size() * sizeof(f_t),
-                   cudaMemcpyDeviceToHost);
+                   cudaMemcpyDeviceToHost));
         for (const auto& v : h_reduced_cost) {
           pb_solution->add_reduced_cost(static_cast<double>(v));
         }
       }

You'll need to include <raft/util/cudart_utils.hpp> for RAFT_CUDA_TRY.

🟡 Minor comments (12)
dev_scripts/test_c_api.c-138-143 (1)

138-143: Add NULL check after malloc.

malloc can return NULL on allocation failure, but the code proceeds to use solution_values without checking.

Proposed fix
   // Get and print solution variables
   solution_values = (cuopt_float_t*)malloc(num_variables * sizeof(cuopt_float_t));
+  if (solution_values == NULL) {
+    printf("Error allocating memory for solution values\n");
+    status = CUOPT_ERROR;
+    goto DONE;
+  }
   status = cuOptGetPrimalSolution(solution, solution_values);
python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx-155-185 (1)

155-185: Potential issue with type_cast when obj has neither dtype nor dtypes.

When obj lacks both dtype and dtypes attributes (line 172), obj_type = type(obj) returns a Python type object (e.g., <class 'int'>). Passing this to np.issubdtype() may not behave as expected since np.issubdtype expects numpy dtype-like objects.

🐛 Consider handling this edge case
     else:
-        obj_type = type(obj)
+        # Fallback: try to infer dtype from a numpy array conversion
+        try:
+            obj_type = np.asarray(obj).dtype
+        except Exception:
+            obj_type = type(obj)

Alternatively, you could skip the type checking for this fallback case since astype will handle conversion anyway.

cpp/tests/linear_programming/CMakeLists.txt-6-10 (1)

6-10: Test coverage is incomplete; add tests for numerical correctness and degenerate cases.

The test file properly isolates environment variables and covers the basic remote/local solve paths, but lacks critical validation:

  • Tests only verify termination status (Optimal or NumericalError), not actual solution correctness
  • No degenerate cases tested (infeasible, unbounded, singleton problems)
  • No free variables or extreme problem dimensions tested
  • No verification of problem transformation correctness or index consistency

Add tests that validate:

  • Actual optimal solution values against expected results
  • Degenerate cases (infeasible, unbounded, empty, and singleton problems)
  • Free variables and extreme problem dimensions
  • Transformation and postsolve mapping consistency
cpp/cuopt_cancel_job.cpp-82-83 (1)

82-83: std::stoi can throw on invalid input.

If an invalid port string is provided (e.g., -p abc), std::stoi will throw std::invalid_argument or std::out_of_range, causing an unhandled exception and crash.

🔧 Suggested fix with exception handling
     } else if (strcmp(argv[i], "-p") == 0 && i + 1 < argc) {
-      port = std::stoi(argv[++i]);
+      try {
+        port = std::stoi(argv[++i]);
+      } catch (const std::exception& e) {
+        std::cerr << "Error: Invalid port number\n";
+        return 1;
+      }
     } else if (argv[i][0] != '-') {
docs/developer/REMOTE_SOLVE_GUIDE.md-60-67 (1)

60-67: Correct the documented default port in REMOTE_SOLVE_GUIDE.md from 8765 to 9090.

The actual default port for cuopt_remote_server is 9090 (as confirmed in cpp/cuopt_remote_server.cpp line 257 and usage output), but this guide incorrectly documents it as 8765 at lines 29 and 65. Update the documentation to match the actual implementation. Note: Port 8765 can remain in examples if intentional, but the stated default must be corrected.

build.sh-403-404 (1)

403-404: Fix CMake reconfigure command - incorrect source directory.

The cmake reconfigure command passes ${LIBCUOPT_BUILD_DIR} as the source directory, but this is the build directory, not the source directory. CMake will use the cached source directory when reconfiguring an existing build, but this syntax is potentially confusing and may cause issues.

🔧 Suggested fix
     # Reconfigure with BUILD_REMOTE_SERVER=ON
-    cmake -DBUILD_REMOTE_SERVER=ON "${LIBCUOPT_BUILD_DIR}"
+    cmake -DBUILD_REMOTE_SERVER=ON .

When already in the build directory (line 401: cd "${LIBCUOPT_BUILD_DIR}"), use . to reconfigure the current build directory, which is the idiomatic CMake approach.

docs/remote_solve_architecture.md-536-547 (1)

536-547: Update documentation: return type for deserialize_lp_solution is incorrect.

The documented return type lp_solution_t<i_t, f_t> should be optimization_problem_solution_t<i_t, f_t> to match the actual interface defined in cpp/include/cuopt/linear_programming/utilities/remote_serialization.hpp (line 78).

cpp/src/linear_programming/utilities/serializers/msgpack_serializer.cpp-1051-1059 (1)

1051-1059: Potential protocol issue: error message duplicated on failure.

In serialize_submit_response, when !success, the result string (which is the error message) is packed twice (lines 1057-1058). This may cause deserialization issues if the client doesn't expect this duplication.

🐛 Verify protocol expectation or fix duplication

If the comment "error message duplicated for compatibility" is intentional, consider adding a version check. Otherwise:

     pk.pack(result);                    // job_id on success, error message on failure
-    if (!success) { pk.pack(result); }  // error message duplicated for compatibility
cpp/include/cuopt/linear_programming/utilities/remote_serialization.hpp-101-125 (1)

101-125: Document input validation requirements in the interface.

The virtual methods deserialize_lp_request and deserialize_mip_request process untrusted network data. Update the documentation to explicitly specify that implementations must validate:

  • Array sizes do not exceed system/solver resource limits (prevent resource exhaustion)
  • CSR matrix indices are within valid range (0 to num_variables-1) and offsets are consistent
  • Constraint/objective dimensions match declared problem size (prevent buffer overflows and array access violations)

Without documented requirements, custom implementations may skip validation.

cpp/include/cuopt/linear_programming/mip/solver_solution.hpp-80-86 (1)

80-86: Potential null dereference if called when is_device_memory() is true.

The documentation states the method is only valid when is_device_memory() returns false, but there's no runtime protection. Consider adding a debug assertion or throwing an exception for misuse.

💡 Suggested defensive check (in implementation)
std::vector<f_t>& mip_solution_t<i_t, f_t>::get_solution_host()
{
  cuopt_assert(solution_host_ != nullptr, "get_solution_host() called on device memory solution");
  return *solution_host_;
}
cpp/src/linear_programming/solver_solution.cu-441-476 (1)

441-476: Potential null dereference in host getters when called on device memory solution.

These methods dereference unique_ptr without null checks. If called when is_device_memory_ is true, the underlying pointer would be null.

💡 Suggested defensive check
 template <typename i_t, typename f_t>
 std::vector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_primal_solution_host()
 {
+  cuopt_assert(!is_device_memory_ && primal_solution_host_,
+               "get_primal_solution_host() requires host memory");
   return *primal_solution_host_;
 }

Apply similar checks to all host getters.

cpp/src/linear_programming/solver_solution.cu-410-439 (1)

410-439: Device getters may cause null dereference after to_host() is called.

After to_host() is invoked, primal_solution_, dual_solution_, and reduced_cost_ are reset to null (lines 706-708). Subsequent calls to these getters would dereference null pointers.

💡 Suggested defensive check
 template <typename i_t, typename f_t>
 rmm::device_uvector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_primal_solution()
 {
+  cuopt_assert(is_device_memory_ && primal_solution_,
+               "get_primal_solution() requires device memory");
   return *primal_solution_;
 }

Apply similar checks to all device getters.

🧹 Nitpick comments (38)
REMOTE_SOLVE_MODES.md (1)

38-40: Consider adding language identifiers to ASCII diagram code blocks.

The static analysis flagged several code blocks without language specifiers. For ASCII workflow diagrams, pseudo-code, and architecture diagrams, consider adding text or plaintext as the language identifier to silence linter warnings while preserving readability.

Example fix for line 38:

-```
+```text
 Client → SUBMIT_JOB (blocking=true) → Server blocks → Worker solves → Server returns result
-```
+```

This applies to other ASCII diagram blocks at lines 67, 99, 217, 492, 502, 567, 591, 634, 766, 790, and 838.

python/cuopt/cuopt/__init__.py (1)

19-24: Consider caching the imported module in the namespace.

The current implementation calls importlib.import_module on every attribute access. While Python's import system caches modules in sys.modules, you could avoid repeated __getattr__ calls by caching the result in the module namespace:

♻️ Optional optimization
 def __getattr__(name):
     """Lazy import submodules to support CPU-only hosts with remote solve."""
     if name in _submodules:
         import importlib
-        return importlib.import_module(f"cuopt.{name}")
+        module = importlib.import_module(f"cuopt.{name}")
+        globals()[name] = module  # Cache to avoid future __getattr__ calls
+        return module
     raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

This is a minor optimization—the current approach is functionally correct.

dev_scripts/test_python_api.py (1)

33-52: Consider adding error handling for parse and solve failures.

If ParseMps or Solve throws an exception, the script will crash without useful context. Wrapping these in try-except would improve usability for a dev test script.

🛠️ Suggested improvement
+    try:
         # Parse MPS file into DataModel
         data_model = cuopt_mps_parser.ParseMps(mps_file)
-
         # Create solver settings
         settings = solver_settings.SolverSettings()
         settings.set_optimality_tolerance(1e-6)
-
         # Solve
         print("Solving...")
         solution = solver.Solve(data_model, settings)
+    except Exception as e:
+        print(f"Error during parse/solve: {e}")
+        sys.exit(1)
python/cuopt/cuopt/utilities/__init__.py (1)

15-26: Lazy import lacks caching—repeated attribute access triggers repeated imports.

Each call to from cuopt.utilities import type_cast (or direct access) will re-execute the import. For idiomatic lazy loading, cache the imported symbol in the module's __dict__ after first access.

♻️ Suggested fix with caching
 def __getattr__(name):
     """Lazy import CUDA-dependent utilities."""
     if name == "type_cast":
         from cuopt.utilities.type_casting import type_cast
+        globals()["type_cast"] = type_cast
         return type_cast
     elif name == "series_from_buf":
         from cuopt.utilities.utils import series_from_buf
+        globals()["series_from_buf"] = series_from_buf
         return series_from_buf
     elif name == "check_solution":
         from cuopt.utilities.utils import check_solution
+        globals()["check_solution"] = check_solution
         return check_solution
     raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
docs/developer/SOLUTION_MEMORY_ARCHITECTURE.md (1)

73-78: Add language specifier to fenced code blocks for ASCII diagrams.

The static analysis flagged these blocks (MD040). Use text or plaintext as the language identifier for ASCII art diagrams.

📝 Suggested fix
-```
+```text
 ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
 │   Solver    │ ──► │  Solution   │ ──► │    User     │
 │   (GPU)     │     │ (GPU mem)   │     │ (GPU/CPU)   │
 └─────────────┘     └─────────────┘     └─────────────┘

Apply similarly to the second diagram at line 90.
</details>


Also applies to: 90-95

</blockquote></details>
<details>
<summary>dev_scripts/test_python_problem_remote.py (1)</summary><blockquote>

`157-179`: **Consider adding edge case tests for degenerate problems.**

Per coding guidelines, tests should cover degenerate cases (infeasible, unbounded, empty problems). Consider adding tests for:
- Infeasible LP (e.g., x >= 1, x <= 0)
- Unbounded LP (e.g., minimize x with no upper bound)


Would you like me to generate additional test functions for infeasible and unbounded problem cases?

</blockquote></details>
<details>
<summary>docs/developer/SERIALIZATION_PLUGIN_GUIDE.md (3)</summary><blockquote>

`13-25`: **Add language specifier to fenced code block.**

The ASCII diagram code block lacks a language specifier. Consider adding `text` or `plaintext` for consistency with linting rules.


<details>
<summary>Proposed fix</summary>

```diff
-```
+```text
 ┌─────────────┐                              ┌─────────────┐
 │   Client    │                              │   Server    │

374-377: Add language specifier to fenced code block.

The server output example block lacks a language specifier.

Proposed fix
-```
+```text
 [remote_solve] Loading custom serializer from: /path/to/libcuopt_my_serializer.so
 [remote_solve] Using custom serializer: my_format
</details>

---

`182-197`: **Incomplete code example may confuse developers.**

The `deserialize_lp_solution` and `deserialize_mip_solution` method bodies only show the error case but lack a return statement for the success path. Consider adding a placeholder or explicit note.


<details>
<summary>Suggested improvement</summary>

```diff
   optimization_problem_solution_t<i_t, f_t> deserialize_lp_solution(
     const std::vector<uint8_t>& data) override
   {
     // Parse the solution data
     // Create and return solution object
+    // Example success return:
+    // return optimization_problem_solution_t<i_t, f_t>(...);

     // On error, return error solution:
     // return optimization_problem_solution_t<i_t, f_t>(
     //   cuopt::logic_error("Parse error", cuopt::error_type_t::RuntimeError));
   }
cpp/src/linear_programming/utilities/cython_solve.cu (1)

255-262: Consider using std::any_of for cleaner MIP detection.

The loop correctly identifies MIP problems, but could be more idiomatic.

Optional refactor
     // Determine if LP or MIP based on variable types
-    bool is_mip    = false;
-    auto var_types = data_model->get_variable_types();
-    for (size_t i = 0; i < var_types.size(); ++i) {
-      if (var_types.data()[i] != 'C') {
-        is_mip = true;
-        break;
-      }
-    }
+    auto var_types = data_model->get_variable_types();
+    bool is_mip = std::any_of(var_types.data(), 
+                              var_types.data() + var_rules.size(),
+                              [](char c) { return c != 'C'; });
cpp/CMakeLists.txt (2)

228-231: Prefer target-specific include directories over global include_directories.

Using include_directories() at the top level pollutes the global include path for all targets. Consider using target_include_directories() on the specific targets that need these paths (e.g., cuopt).

♻️ Suggested change
 # Protocol Buffers for remote solve serialization
 find_package(Protobuf REQUIRED)
-include_directories(${Protobuf_INCLUDE_DIRS})
-include_directories(${CMAKE_CURRENT_BINARY_DIR})
+# Protobuf includes will be added via target_include_directories on cuopt target

Then add to the cuopt target's include directories (around line 301):

target_include_directories(cuopt
  PRIVATE
  ...
  "${Protobuf_INCLUDE_DIRS}"
  "${CMAKE_CURRENT_BINARY_DIR}"  # For generated proto headers
  ...
)

551-613: Remote server build configuration looks well-structured.

The optional BUILD_REMOTE_SERVER flag with proper guards (NOT BUILD_LP_ONLY) and RUNPATH configuration for development is appropriate. The install targets and component assignment are correctly configured.

One minor observation: cuopt_cancel_job links cuopt as PUBLIC, but for executables, PRIVATE linkage is more appropriate since executables don't export their dependencies.

♻️ Minor linkage adjustment
   target_link_libraries(cuopt_cancel_job
-    PUBLIC
+    PRIVATE
     cuopt
   )
cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp (1)

274-306: Host accessor documentation could note thread-safety.

The host accessors are documented with preconditions, but per coding guidelines, consider adding a note about thread-safety. Since these return references to internal storage, concurrent access would be unsafe.

📝 Optional documentation enhancement
   /**
    * @brief Returns the primal solution in host (CPU) memory.
    * Only valid when is_device_memory() returns false.
+   * @note Not thread-safe - caller must ensure exclusive access when modifying.
    *
    * @return std::vector<f_t>& The host memory container for the primal solution.
    */
   std::vector<f_t>& get_primal_solution_host();
docs/developer/REMOTE_SOLVE_GUIDE.md (1)

16-46: Consider adding text or plaintext language identifier for the ASCII diagram.

The static analysis tool flags the missing language specifier. For ASCII diagrams, you can use text or plaintext to satisfy linters while preserving formatting.

🔧 Suggested fix
-```
+```text
 ┌─────────────────────────────────────────────────────────────┐
cpp/cuopt_cancel_job.cpp (1)

73-74: Consider validating environment variable port value.

std::atoi silently returns 0 for invalid input (e.g., CUOPT_REMOTE_PORT=abc), which would result in attempting to connect to port 0. Consider adding validation or using std::stoi with try-catch for consistency.

docs/remote_solve_architecture.md (2)

15-62: Add language specifier to ASCII diagram code blocks.

The ASCII diagrams should have a language specifier for markdown linting compliance. Use text or plaintext for ASCII art diagrams.

📝 Suggested fix
-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────────┐
 │                              CLIENT PROCESS                                 │

Apply similar changes to other ASCII diagram blocks at lines 102, 138, 193, 229, 265, 336, 385, 445, and 490.


359-366: Document the rationale for connection timeout absence.

The "Connection timeout: None" entry is flagged as a security note, but the document doesn't explain whether this is intentional (e.g., for long-running solves) or a gap to be addressed. Consider adding context about why timeouts are not implemented and whether this is planned for future enhancement.

cpp/libmps_parser/include/mps_parser/data_model_view.hpp (1)

424-426: Consider documenting thread-safety expectations.

The is_device_memory_ flag has no synchronization. If a data_model_view_t instance might be accessed from multiple threads (one modifying via set_is_device_memory, another reading via is_device_memory), this could lead to a data race. Consider either:

  1. Documenting that the view is not thread-safe (consistent with the non-owning, setup-then-use pattern), or
  2. Using std::atomic<bool> if concurrent access is expected.

Given the typical usage pattern (configure view, then use), option 1 is likely sufficient.

build.sh (1)

409-415: Consider using CMake install instead of manual install command.

The manual install -m 755 command bypasses CMake's install mechanism, which could lead to inconsistencies (e.g., missing from uninstall, not tracked by CMake install manifest). If the CMake target has proper install rules, consider using cmake --install for the specific component.

However, if the server install rules are separate from the main libcuopt install, this approach may be intentional to avoid installing all of libcuopt again.

💡 Alternative using CMake install
     # Install the server executable
     if [ -z "${INSTALL_TARGET}" ]; then
         echo "Skipping install of cuopt_remote_server (-n flag set)"
     else
-        install -m 755 "${LIBCUOPT_BUILD_DIR}/cuopt_remote_server" "${INSTALL_PREFIX}/bin/"
-        echo "Installed cuopt_remote_server to ${INSTALL_PREFIX}/bin/"
+        cmake --install "${LIBCUOPT_BUILD_DIR}" --component cuopt_remote_server --prefix "${INSTALL_PREFIX}"
+        echo "Installed cuopt_remote_server to ${INSTALL_PREFIX}"
     fi

This requires the CMake target to have a corresponding install rule with the cuopt_remote_server component.

cpp/src/linear_programming/utilities/serializers/CMakeLists.txt (2)

20-26: Consider using CMAKE_CURRENT_SOURCE_DIR or target-based includes.

Using ${CMAKE_SOURCE_DIR} for include directories can be fragile if this CMakeLists.txt is ever included from a different location or used as a subproject. Consider using ${CMAKE_CURRENT_SOURCE_DIR}/../../../.. (relative paths) or preferably, rely on include directories exported by the cuopt target dependency.

💡 Alternative using relative paths
   target_include_directories(cuopt_msgpack_serializer
     PRIVATE
-    "${CMAKE_SOURCE_DIR}/include"
-    "${CMAKE_SOURCE_DIR}/libmps_parser/include"
-    "${CMAKE_SOURCE_DIR}/src"
+    "${CMAKE_CURRENT_SOURCE_DIR}/../../../../include"
+    "${CMAKE_CURRENT_SOURCE_DIR}/../../../../libmps_parser/include"
+    "${CMAKE_CURRENT_SOURCE_DIR}/../.."
     ${CUDA_INCLUDE_DIRS}
   )

Or better, if cuopt target exports its include directories properly, some of these may be redundant since cuopt is already linked.


42-46: Consider adding SOVERSION for shared library versioning.

For a shared library plugin, setting SOVERSION helps with ABI compatibility tracking and allows multiple versions to coexist.

💡 Suggested addition
   set_target_properties(cuopt_msgpack_serializer PROPERTIES
     CXX_STANDARD 17
     CXX_STANDARD_REQUIRED ON
     POSITION_INDEPENDENT_CODE ON
+    VERSION 1.0.0
+    SOVERSION 1
   )
dev_scripts/test_pipe_cancel_restart.py (1)

194-284: Test workflow is well-structured with clear steps.

The test follows a logical sequence: submit → wait → check status → cancel → verify cancellation → wait for restart → submit new job → verify completion. Each step has appropriate logging and validation.

Consider adding a return code for the case where cancellation verification fails (lines 239-244 only print a warning but don't fail the test).

♻️ Optional: Return failure if cancellation doesn't produce expected status
     if status != cuopt_remote_pb2.CANCELLED:
         print(
             f"    WARNING: Expected CANCELLED, got {cuopt_remote_pb2.JobStatus.Name(status)}"
         )
+        # Optionally fail the test if cancellation didn't work as expected
+        # return 1
     else:
         print("    ✓ Job successfully cancelled!")
cpp/tests/linear_programming/unit_tests/remote_solve_test.cu (2)

138-194: Add numerical correctness validation for local solve tests.

The local solve tests (gpu_data_with_local_solve and cpu_data_with_local_solve) only verify Optimal termination status but don't validate the solution values. Per coding guidelines, tests should validate numerical correctness, not just "runs without error."

For this LP:

  • minimize: 0.2x1 + 0.1x2
  • subject to: 3x1 + 4x2 <= 5.4, 2.7x1 + 10.1x2 <= 4.9, x1,x2 >= 0

The optimal solution is at x1=x2=0 with objective=0.

♻️ Add numerical validation to local solve tests
// After checking termination status
EXPECT_EQ(solution.get_termination_status(), pdlp_termination_status_t::Optimal);

// Validate solution values
auto primal = solution.get_primal_solution();
EXPECT_NEAR(solution.get_primal_objective(), 0.0, 1e-6);
// For this problem, optimal is at origin (0, 0)

Also applies to: 232-265


128-136: Verify expected behavior when remote solve is fully implemented.

The tests currently expect NumericalError for remote paths because remote solving is not yet implemented. Add a TODO or tracking mechanism to update these expectations once remote solving is functional.

Would you like me to open an issue to track updating these test expectations when remote solve is fully implemented?

Also applies to: 225-229

python/cuopt/cuopt/linear_programming/remote.py (1)

163-170: Socket receive loop should handle partial reads correctly.

The receive loop correctly accumulates chunks until the full message is received. However, if the connection is closed mid-message (not chunk), the function breaks out and may process incomplete data. Consider returning an error in this case.

♻️ Handle incomplete message as error
         while len(response_data) < size:
             chunk = sock.recv(size - len(response_data))
             if not chunk:
-                break
+                sock.close()
+                return CancelResult(
+                    success=False,
+                    message="Connection closed while receiving response",
+                    job_status=JobStatus.NOT_FOUND,
+                )
             response_data += chunk
cpp/include/cuopt/linear_programming/solve.hpp (2)

112-127: Documentation may be misleading about memory location.

The Doxygen comment states the view "points to GPU memory", but per the PR's dual-path architecture, data_model_view_t can point to either CPU or GPU memory. Consider updating the documentation to reflect this:

- * This function copies data from the view (which points to GPU memory)
- * into an owning optimization_problem_t.
+ * This function copies data from the view (which can point to either CPU
+ * or GPU memory, based on is_device_memory()) into an owning
+ * optimization_problem_t stored on the GPU.

129-157: Well-documented API addition for view-based LP solving.

The documentation clearly explains the dual CPU/GPU path behavior and environment variable usage. Consider adding a brief note about thread-safety expectations, as this is a public header. For example:

@note This function is not thread-safe when multiple threads solve using the same handle.

cpp/include/cuopt/linear_programming/utilities/remote_serialization.hpp (2)

217-237: Nested types within template class.

The job_status_t enum, logs_result_t, and cancel_result_t structs are defined inside the template class. This means users must access them as remote_serializer_t<i_t, f_t>::job_status_t, which may be cumbersome.

Consider whether these types should be defined at namespace scope for easier access, especially since they don't depend on the template parameters:

namespace cuopt::linear_programming {
  enum class job_status_t { QUEUED, PROCESSING, COMPLETED, FAILED, NOT_FOUND, CANCELLED };
  struct logs_result_t { ... };
  struct cancel_result_t { ... };
}

This is a minor usability consideration and can be addressed later if needed.


307-314: Consider using enum for request type instead of int.

The get_async_request_type returns an int with magic number meanings documented in the comment. Consider defining a RequestType enum at namespace scope and using it as the return type for type safety:

enum class request_type_t {
  SUBMIT_JOB = 0, CHECK_STATUS = 1, GET_RESULT = 2,
  DELETE_RESULT = 3, GET_LOGS = 4, CANCEL_JOB = 5, WAIT_FOR_RESULT = 6
};
virtual request_type_t get_async_request_type(const std::vector<uint8_t>& data) = 0;

This would prevent callers from using incorrect integer values.

cpp/include/cuopt/linear_programming/utilities/remote_solve.hpp (2)

39-54: Add port range validation.

The port parsing at line 46 uses std::stoi but doesn't validate that the port is in the valid range (1-65535). Invalid port values could cause connection failures with unclear error messages.

Suggested improvement
   if (host != nullptr && port != nullptr && host[0] != '\0' && port[0] != '\0') {
     try {
       int port_num = std::stoi(port);
+      if (port_num < 1 || port_num > 65535) {
+        return std::nullopt;  // Invalid port range
+      }
       return remote_solve_config_t{std::string(host), port_num};
     } catch (...) {
       // Invalid port number, fall back to local solve
       return std::nullopt;
     }
   }

98-114: Duplicate job status enum definitions.

remote_job_status_t here duplicates job_status_t defined inside remote_serializer_t in remote_serialization.hpp. While this separation may be intentional (public vs internal API), consider:

  1. Using a single definition at namespace scope
  2. Or adding a comment explaining why duplication is preferred

This prevents drift between the two definitions.

cpp/src/linear_programming/cuopt_c.cpp (2)

140-146: Consider caching MIP detection result.

The is_mip() method iterates through all variable types on each call, which is O(n). For large problems called multiple times, consider caching this result during problem creation:

bool is_mip_cached = false;
bool is_mip_value = false;

bool is_mip() const {
  if (!is_mip_cached) {
    for (char vt : variable_types) {
      if (vt == CUOPT_INTEGER) { 
        is_mip_value = true; 
        break;
      }
    }
    is_mip_cached = true;
  }
  return is_mip_value;
}

However, since is_mip() is typically called only a few times per solve, this is a minor optimization.


819-823: Consider null checks for defensive programming.

The getter functions assume that if !view.is_device_memory() then cpu_data is valid, and vice versa for gpu_problem. While this invariant should hold if the problem was created correctly, adding defensive null checks would catch programming errors:

if (!problem_and_stream_view->view.is_device_memory()) {
  if (!problem_and_stream_view->cpu_data) return CUOPT_INTERNAL_ERROR;
  *num_constraints_ptr = problem_and_stream_view->cpu_data->num_constraints;
}

This is a defensive programming suggestion and may not be necessary if invariants are always maintained.

cpp/cuopt_cli.cpp (1)

72-172: Deduplicate create_view_from_mps_data_model across TUs (currently diverging)

This helper exists (with different field coverage) in multiple places (CLI + LP + MIP). It’s already diverging (e.g., quadratic objective handling differs), which is a correctness risk long-term. Suggest moving to a shared header (e.g., cuopt/linear_programming/utilities/create_view.hpp) and reusing everywhere. Based on learnings, reduce duplication and coupling by consolidating shared utilities.

cpp/cuopt_remote_server.cpp (1)

2275-2277: Bound per-connection thread creation (DoS risk)

Each accepted socket spawns a detached thread (Line 2276) with no cap/backpressure. A burst of connections can exhaust threads/memory.

cpp/src/linear_programming/utilities/protobuf_serializer.cu (3)

140-153: Consider extracting MIP settings serialization into a helper method.

This block is duplicated verbatim at lines 339-352 in serialize_async_mip_request. Per coding guidelines, refactor code duplication (3+ occurrences) into shared utilities.

♻️ Suggested refactor
+private:
+  void serialize_mip_settings_to_proto(const mip_solver_settings_t<i_t, f_t>& settings,
+                                       cuopt::remote::MIPSolverSettings* pb_settings)
+  {
+    pb_settings->set_time_limit(settings.time_limit);
+    pb_settings->set_relative_mip_gap(settings.tolerances.relative_mip_gap);
+    pb_settings->set_absolute_mip_gap(settings.tolerances.absolute_mip_gap);
+    pb_settings->set_integrality_tolerance(settings.tolerances.integrality_tolerance);
+    pb_settings->set_absolute_tolerance(settings.tolerances.absolute_tolerance);
+    pb_settings->set_relative_tolerance(settings.tolerances.relative_tolerance);
+    pb_settings->set_presolve_absolute_tolerance(settings.tolerances.presolve_absolute_tolerance);
+    pb_settings->set_log_to_console(settings.log_to_console);
+    pb_settings->set_heuristics_only(settings.heuristics_only);
+    pb_settings->set_num_cpu_threads(settings.num_cpu_threads);
+    pb_settings->set_num_gpus(settings.num_gpus);
+    pb_settings->set_presolve(settings.presolve);
+    pb_settings->set_mip_scaling(settings.mip_scaling);
+  }

Then use serialize_mip_settings_to_proto(settings, pb_settings) in both locations.


1310-1312: Consider adding a const overload for get_reduced_cost() instead of using const_cast.

The const_cast works for read-only access but is fragile. If the getter ever has side effects, this could cause undefined behavior.


800-833: Consider adding a defensive check for CPU memory requirement.

The comment at lines 803-804 notes that view must point to CPU memory, but there's no runtime verification. If this precondition is violated, the code would have undefined behavior accessing GPU pointers from CPU.

💡 Suggested defensive check
   void serialize_problem_to_proto(const mps_parser::data_model_view_t<i_t, f_t>& view,
                                   cuopt::remote::OptimizationProblem* pb_problem)
   {
     // Note: view must point to CPU memory for serialization
     // The solve functions ensure this by copying GPU data to CPU if needed
+    if (view.is_device_memory()) {
+      throw cuopt::logic_error("serialize_problem_to_proto requires CPU memory view",
+                               cuopt::error_type_t::InvalidArgument);
+    }

This assumes data_model_view_t has an is_device_memory() method similar to solution types.

Comment on lines +67 to +91
struct JobQueueEntry {
char job_id[64];
uint32_t problem_type; // 0 = LP, 1 = MIP
uint64_t data_size; // Size of problem data (uint64 for large problems)
char shm_data_name[128]; // Name of per-job shared memory segment (shm mode only)
std::atomic<bool> ready; // Job is ready to be processed
std::atomic<bool> claimed; // Worker has claimed this job
std::atomic<pid_t> worker_pid; // PID of worker that claimed this job (0 if none)
std::atomic<bool> cancelled; // Job has been cancelled (worker should skip)
// Pipe mode fields
std::atomic<int> worker_index; // Index of worker that claimed this job (-1 if none)
std::atomic<bool> data_sent; // Server has sent data to worker's pipe (pipe mode)
};

// Result queue entry - small fixed size, data stored in separate per-result shared memory or pipe
struct ResultQueueEntry {
char job_id[64];
uint32_t status; // 0 = success, 1 = error, 2 = cancelled
uint64_t data_size; // Size of result data (uint64 for large results)
char shm_data_name[128]; // Name of per-result shared memory segment (shm mode only)
char error_message[1024];
std::atomic<bool> ready; // Result is ready
std::atomic<bool> retrieved; // Result has been retrieved
std::atomic<int> worker_index; // Index of worker that produced this result (pipe mode)
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: unsafe strncpy usage can produce non-NUL-terminated strings

Several strncpy(..., n-1) calls never write the trailing '\0', so later std::string(job_queue[i].job_id) / strcmp(...) may read past bounds (undefined behavior). This affects job IDs, shm names, and error messages.

Suggested pattern
- strncpy(result_queue[i].job_id, job_id.c_str(), sizeof(result_queue[i].job_id) - 1);
+ std::snprintf(result_queue[i].job_id, sizeof(result_queue[i].job_id), "%s", job_id.c_str());

(Or set buf[sizeof(buf)-1] = '\0' immediately after each strncpy.)

Also applies to: 548-562, 720-738, 999-1011, 1459-1476

🤖 Prompt for AI Agents
In @cpp/cuopt_remote_server.cpp around lines 67 - 91, The problem: several
strncpy(...) uses that populate fixed-size fields (JobQueueEntry::job_id,
JobQueueEntry::shm_data_name, ResultQueueEntry::job_id,
ResultQueueEntry::shm_data_name, ResultQueueEntry::error_message and similar
spots) use length-1 without explicitly writing a trailing NUL, allowing
non-NUL-terminated strings and UB when constructing std::string or calling
strcmp. Fix: after every strncpy into these fixed buffers (job_id,
shm_data_name, error_message, etc.) explicitly set buffer[sizeof(buffer)-1] =
'\0' (or switch to a safe copy like strncpy followed by that assignment, or use
strlcpy if available) so the fields are always NUL-terminated; update all
occurrences noted in the review (around lines 67-91, 548-562, 720-738, 999-1011,
1459-1476) and any other strncpy uses that write into fixed char arrays.

Comment on lines +108 to +127
// Helper to send a framed message with type
static bool send_typed_message(int sockfd, MessageType type, const void* data, size_t size)
{
uint8_t msg_type = static_cast<uint8_t>(type);
uint64_t payload_size = static_cast<uint64_t>(size);

if (::write(sockfd, &msg_type, 1) != 1) return false;
if (::write(sockfd, &payload_size, sizeof(payload_size)) != sizeof(payload_size)) return false;
if (size > 0) {
const uint8_t* ptr = static_cast<const uint8_t*>(data);
size_t remaining = size;
while (remaining > 0) {
ssize_t written = ::write(sockfd, ptr, remaining);
if (written <= 0) return false;
ptr += written;
remaining -= written;
}
}
return true;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: socket framing writes are not robust (partial writes / EINTR)

send_typed_message() assumes ::write() writes the full 1-byte type and 8-byte size (Lines 114-116). On sockets this can be partial; also EINTR isn’t handled here or in write_all/read_all. This can corrupt the stream protocol.

Proposed fix (use write_all for headers too)
 static bool send_typed_message(int sockfd, MessageType type, const void* data, size_t size)
 {
   uint8_t msg_type      = static_cast<uint8_t>(type);
   uint64_t payload_size = static_cast<uint64_t>(size);

-  if (::write(sockfd, &msg_type, 1) != 1) return false;
-  if (::write(sockfd, &payload_size, sizeof(payload_size)) != sizeof(payload_size)) return false;
+  if (!write_all(sockfd, &msg_type, 1)) return false;
+  if (!write_all(sockfd, &payload_size, sizeof(payload_size))) return false;
   ...
 }

Also applies to: 315-339, 185-197

🤖 Prompt for AI Agents
In @cpp/cuopt_remote_server.cpp around lines 108 - 127, send_typed_message
currently assumes ::write writes the entire type byte and 8-byte size and
doesn't handle partial writes or EINTR; update it to use the existing write_all
helper (and ensure write_all itself handles EINTR/short writes) for the header
fields (msg_type and payload_size) and for the payload loop so all writes are
retried until complete; also audit and replace raw ::write usages in the other
framing helpers (the code around the other helper functions that perform framing
and the read_all/write_all implementations) to use the robust write_all/read_all
semantics so headers, sizes and payloads cannot be partially written or
interrupted.

Comment on lines +1190 to +1202
{
std::lock_guard<std::mutex> lock(waiters_mutex);
auto wit = waiting_threads.find(job_id);
if (wit != waiting_threads.end()) {
// Wake up the waiting thread
auto waiter = wit->second;
waiter->result_data = std::move(result_data);
waiter->error_message = error_message;
waiter->success = success;
waiter->ready = true;
waiter->cv.notify_one();
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: data races when completing/cancelling a blocking waiter

result_retrieval_thread() and cancel_job() mutate JobWaiter fields without locking JobWaiter::mutex (Lines 1195-1201, 1758-1763). The waiting thread reads waiter->ready under that mutex (Line 1620), so current code is a data race (UB) and can miss wakeups or observe torn state.

Proposed fix
 auto waiter = wit->second;
- waiter->result_data   = std::move(result_data);
- waiter->error_message = error_message;
- waiter->success       = success;
- waiter->ready         = true;
- waiter->cv.notify_one();
+ {
+   std::lock_guard<std::mutex> lk(waiter->mutex);
+   waiter->result_data   = std::move(result_data);
+   waiter->error_message = error_message;
+   waiter->success       = success;
+   waiter->ready         = true;
+ }
+ waiter->cv.notify_one();

Also applies to: 1605-1627, 1754-1764

🤖 Prompt for AI Agents
In @cpp/cuopt_remote_server.cpp around lines 1190 - 1202, The code mutates
JobWaiter fields from result_retrieval_thread() and cancel_job() without
acquiring the JobWaiter::mutex, causing data races against the waiting thread
that reads waiter->ready under that mutex; to fix, when you look up the waiter
in waiting_threads (using waiters_mutex) then acquire the specific waiter's
mutex (e.g., std::lock_guard<std::mutex> waiter_lock(waiter->mutex)) before
writing waiter->result_data, waiter->error_message, waiter->success, and
waiter->ready, and call waiter->cv.notify_one() while still holding that waiter
mutex (or use notify_one immediately after releasing but ensure memory
visibility), doing the same symmetric locking in cancel_job(); ensure the
waiting thread still checks the ready predicate under JobWaiter::mutex so no
race or missed wakeup occurs.

Comment on lines +228 to +230
// Note: variable_types in optimization_problem_t uses var_t enum, not char
// The view's variable_types span will point to GPU memory with var_t values
// This is handled specially in solve routines
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n -B 3 -A 10 'variable_types' cpp/src/linear_programming/solve.cu | head -100

Repository: NVIDIA/cuopt

Length of output: 4952


🏁 Script executed:

# Also check the optimization_problem_t definition to see the actual type
rg -n -B 2 -A 15 'struct optimization_problem_t|class optimization_problem_t' --type cpp --type cu | head -80

Repository: NVIDIA/cuopt

Length of output: 82


🏁 Script executed:

# Check the view creation methods in the file mentioned
rg -n -B 2 -A 8 'create_view_from_gpu_problem\|create_view_from_cpu_data' cpp/src/linear_programming/cuopt_c.cpp

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

cat -n cpp/src/linear_programming/cuopt_c.cpp | sed -n '149,253p'

Repository: NVIDIA/cuopt

Length of output: 5121


🏁 Script executed:

# Search for problem_and_stream_view_t definition and its create_view_from_gpu_problem method
rg -n -A 30 'create_view_from_gpu_problem' cpp/src/linear_programming/cuopt_c.cpp

Repository: NVIDIA/cuopt

Length of output: 9238


Add missing set_variable_types() call to GPU view creation or handle unset variable_types in solve routines.

The create_view_from_gpu_problem() method (lines 200-242) does not set variable_types on the view, even though create_view_from_cpu_data() does via cpu_data->create_view(). The comment claims this is "handled specially in solve routines," but the view's variable_types span remains uninitialized for GPU data. Solve routines in solve.cu expect variable_types to be set on the view (lines 1177-1202) and will receive empty/uninitialized data for GPU problems. Either call view.set_variable_types(gpu.get_variable_types().data(), gpu.get_n_variables()) in create_view_from_gpu_problem(), or ensure solve routines gracefully handle missing variable_types on the GPU path.

🤖 Prompt for AI Agents
In @cpp/src/linear_programming/cuopt_c.cpp around lines 228 - 230, The GPU view
creation in create_view_from_gpu_problem currently leaves view.variable_types
unset (unlike create_view_from_cpu_data which calls create_view), causing solve
routines in solve.cu (which expect variable_types to be present) to read
uninitialized data; fix by calling
view.set_variable_types(gpu.get_variable_types().data(), gpu.get_n_variables())
during create_view_from_gpu_problem so the view's variable_types span is
initialized, or alternatively modify the solve routines to check for a
missing/empty variable_types span and handle the GPU path explicitly.

Comment on lines +1045 to +1128
// Helper to create a data_model_view_t from mps_data_model_t (for remote solve path)
template <typename i_t, typename f_t>
static data_model_view_t<i_t, f_t> create_view_from_mps_data_model(
const cuopt::mps_parser::mps_data_model_t<i_t, f_t>& mps_data_model)
{
data_model_view_t<i_t, f_t> view;

view.set_maximize(mps_data_model.get_sense());

if (!mps_data_model.get_constraint_matrix_values().empty()) {
view.set_csr_constraint_matrix(mps_data_model.get_constraint_matrix_values().data(),
mps_data_model.get_constraint_matrix_values().size(),
mps_data_model.get_constraint_matrix_indices().data(),
mps_data_model.get_constraint_matrix_indices().size(),
mps_data_model.get_constraint_matrix_offsets().data(),
mps_data_model.get_constraint_matrix_offsets().size());
}

if (!mps_data_model.get_constraint_bounds().empty()) {
view.set_constraint_bounds(mps_data_model.get_constraint_bounds().data(),
mps_data_model.get_constraint_bounds().size());
}

if (!mps_data_model.get_objective_coefficients().empty()) {
view.set_objective_coefficients(mps_data_model.get_objective_coefficients().data(),
mps_data_model.get_objective_coefficients().size());
}

view.set_objective_scaling_factor(mps_data_model.get_objective_scaling_factor());
view.set_objective_offset(mps_data_model.get_objective_offset());

if (!mps_data_model.get_variable_lower_bounds().empty()) {
view.set_variable_lower_bounds(mps_data_model.get_variable_lower_bounds().data(),
mps_data_model.get_variable_lower_bounds().size());
}

if (!mps_data_model.get_variable_upper_bounds().empty()) {
view.set_variable_upper_bounds(mps_data_model.get_variable_upper_bounds().data(),
mps_data_model.get_variable_upper_bounds().size());
}

if (!mps_data_model.get_variable_types().empty()) {
view.set_variable_types(mps_data_model.get_variable_types().data(),
mps_data_model.get_variable_types().size());
}

if (!mps_data_model.get_row_types().empty()) {
view.set_row_types(mps_data_model.get_row_types().data(),
mps_data_model.get_row_types().size());
}

if (!mps_data_model.get_constraint_lower_bounds().empty()) {
view.set_constraint_lower_bounds(mps_data_model.get_constraint_lower_bounds().data(),
mps_data_model.get_constraint_lower_bounds().size());
}

if (!mps_data_model.get_constraint_upper_bounds().empty()) {
view.set_constraint_upper_bounds(mps_data_model.get_constraint_upper_bounds().data(),
mps_data_model.get_constraint_upper_bounds().size());
}

view.set_objective_name(mps_data_model.get_objective_name());
view.set_problem_name(mps_data_model.get_problem_name());

if (!mps_data_model.get_variable_names().empty()) {
view.set_variable_names(mps_data_model.get_variable_names());
}

if (!mps_data_model.get_row_names().empty()) {
view.set_row_names(mps_data_model.get_row_names());
}

if (!mps_data_model.get_initial_primal_solution().empty()) {
view.set_initial_primal_solution(mps_data_model.get_initial_primal_solution().data(),
mps_data_model.get_initial_primal_solution().size());
}

if (!mps_data_model.get_initial_dual_solution().empty()) {
view.set_initial_dual_solution(mps_data_model.get_initial_dual_solution().data(),
mps_data_model.get_initial_dual_solution().size());
}

return view;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: quadratic objective is dropped when solving from mps_data_model_t via view

solve_lp(handle, mps_data_model, ...) now builds a view (Line 1140) using create_view_from_mps_data_model() (Line 1047), but that helper never sets the quadratic objective matrix. This will silently change behavior for QP inputs (local and remote), since downstream view.has_quadratic_objective() will be false.

Proposed fix
 static data_model_view_t<i_t, f_t> create_view_from_mps_data_model(
   const cuopt::mps_parser::mps_data_model_t<i_t, f_t>& mps_data_model)
 {
   data_model_view_t<i_t, f_t> view;

   view.set_maximize(mps_data_model.get_sense());
+  view.set_is_device_memory(false);  // MPS data is host-backed

   ...
+  if (mps_data_model.has_quadratic_objective()) {
+    view.set_quadratic_objective_matrix(mps_data_model.get_quadratic_objective_values().data(),
+                                        mps_data_model.get_quadratic_objective_values().size(),
+                                        mps_data_model.get_quadratic_objective_indices().data(),
+                                        mps_data_model.get_quadratic_objective_indices().size(),
+                                        mps_data_model.get_quadratic_objective_offsets().data(),
+                                        mps_data_model.get_quadratic_objective_offsets().size());
+  }

   return view;
 }

Also applies to: 1138-1143

Comment on lines +512 to 564
template <typename i_t, typename f_t>
mip_solution_t<i_t, f_t> solve_mip(raft::handle_t const* handle_ptr,
const data_model_view_t<i_t, f_t>& view,
mip_solver_settings_t<i_t, f_t> const& settings)
{
// Initialize logger for this overload (needed for early returns)
init_logger_t log(settings.log_file, settings.log_to_console);

// Check for remote solve configuration first
auto remote_config = get_remote_solve_config();

if (view.is_device_memory()) {
if (remote_config.has_value()) {
// GPU data + remote solve requested: need valid handle to copy GPU→CPU
if (handle_ptr == nullptr) {
CUOPT_LOG_ERROR(
"[solve_mip] Remote solve requested with GPU data but no CUDA handle. "
"This is an internal error - GPU data should not exist without CUDA initialization.");
return mip_solution_t<i_t, f_t>(
cuopt::logic_error("No CUDA handle for GPU data", cuopt::error_type_t::RuntimeError));
}
CUOPT_LOG_WARN(
"[solve_mip] Remote solve requested but data is on GPU. "
"Copying to CPU for serialization (performance impact).");
auto cpu_data = copy_view_to_cpu(handle_ptr, view);
auto cpu_view = cpu_data.create_view();

CUOPT_LOG_INFO(
"[solve_mip] Remote solve detected: CUOPT_REMOTE_HOST=%s, CUOPT_REMOTE_PORT=%d",
remote_config->host.c_str(),
remote_config->port);
// Remote solve with GPU data - serialize cpu_view and send to remote server
return solve_mip_remote(*remote_config, cpu_view, settings);
}

// Local solve: data already on GPU - convert view to optimization_problem_t and solve
auto op_problem = data_model_view_to_optimization_problem(handle_ptr, view);
return solve_mip(op_problem, settings);
}

// Data is on CPU
if (remote_config.has_value()) {
CUOPT_LOG_INFO("[solve_mip] Remote solve detected: CUOPT_REMOTE_HOST=%s, CUOPT_REMOTE_PORT=%d",
remote_config->host.c_str(),
remote_config->port);
// Remote solve with CPU data - serialize view and send to remote server
return solve_mip_remote(*remote_config, view, settings);
}

// Local solve with CPU data: copy to GPU and solve
auto op_problem = data_model_view_to_optimization_problem(handle_ptr, view);
return solve_mip(op_problem, settings);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: local CPU-view path can dereference a null CUDA handle

Same pattern as LP: when remote_config is not set and view.is_device_memory()==false, the code converts CPU view → GPU problem without validating handle_ptr (Lines 561-563).

🤖 Prompt for AI Agents
In @cpp/src/mip/solve.cu around lines 512 - 564, The CPU-local path in solve_mip
can dereference a null CUDA handle: before calling
data_model_view_to_optimization_problem(handle_ptr, view) when
view.is_device_memory() is false and remote_config is not set, validate
handle_ptr; if it's null, log an error (CUOPT_LOG_ERROR) and return a
mip_solution_t<i_t,f_t> with cuopt::logic_error("No CUDA handle for CPU->GPU
copy", cuopt::error_type_t::RuntimeError) to mirror the GPU->CPU remote-check
branch; otherwise proceed to call data_model_view_to_optimization_problem and
return solve_mip(op_problem, settings).

Comment on lines +22 to +23
sys.path.insert(0, "/home/tmckay/repos/nvidia-cuopt/cpp/build")
import cuopt_remote_pb2
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Hard-coded developer path must be removed.

Line 22 contains a hard-coded absolute path to a specific developer's home directory. This will break for any other developer or CI environment.

🐛 Proposed fix: Use relative path or environment variable
-sys.path.insert(0, "/home/tmckay/repos/nvidia-cuopt/cpp/build")
+# Add build directory for development - use relative path from script location
+import pathlib
+build_path = pathlib.Path(__file__).parent.parent / "cpp" / "build"
+if build_path.exists():
+    sys.path.insert(0, str(build_path))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sys.path.insert(0, "/home/tmckay/repos/nvidia-cuopt/cpp/build")
import cuopt_remote_pb2
# Add build directory for development - use relative path from script location
import pathlib
build_path = pathlib.Path(__file__).parent.parent / "cpp" / "build"
if build_path.exists():
sys.path.insert(0, str(build_path))
import cuopt_remote_pb2

Comment on lines +29 to +52
def send_recv(sock, data):
"""Send request and receive response with uint64 size prefix."""
# Send size (uint64) + data
sock.sendall(struct.pack("<Q", len(data)) + data)

# Receive size (uint64)
size_data = b""
while len(size_data) < 8:
chunk = sock.recv(8 - len(size_data))
if not chunk:
raise ConnectionError("Connection closed")
size_data += chunk

size = struct.unpack("<Q", size_data)[0]

# Receive response
response = b""
while len(response) < size:
chunk = sock.recv(min(4096, size - len(response)))
if not chunk:
raise ConnectionError("Connection closed")
response += chunk

return response
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find and examine the Python files for struct.pack/unpack patterns
echo "=== Python struct.pack/unpack patterns ==="
rg -n "struct\.(pack|unpack)" --type=py -B2 -A2 | head -100

echo ""
echo "=== Check remote.py specifically ==="
find . -path "*/python/cuopt/cuopt/linear_programming/remote.py" -o -path "*/remote.py" | head -5

echo ""
echo "=== Look for dev_scripts/test_pipe_cancel_restart.py ==="
find . -name "test_pipe_cancel_restart.py"

Repository: NVIDIA/cuopt

Length of output: 1822


🏁 Script executed:

#!/bin/bash
# Check C++ files for size prefix usage
echo "=== C++ uint64_t/uint32_t size usage ==="
rg -n "uint64_t.*size|uint32_t.*size" --type=cpp --type=cu -B2 -A2 | head -100

echo ""
echo "=== Find remote_solve.cu ==="
find . -name "remote_solve.cu"

Repository: NVIDIA/cuopt

Length of output: 225


🏁 Script executed:

#!/bin/bash
# Examine remote_solve.cu around lines 89-98
echo "=== remote_solve.cu lines 80-110 ==="
sed -n '80,110p' ./cpp/src/linear_programming/utilities/remote_solve.cu

echo ""
echo "=== Search for size/socket patterns in remote_solve.cu ==="
grep -n "uint64_t\|uint32_t\|recv\|send" ./cpp/src/linear_programming/utilities/remote_solve.cu | head -20

Repository: NVIDIA/cuopt

Length of output: 2164


Fix protocol mismatch: python/cuopt/cuopt/linear_programming/remote.py uses <I (uint32) but should use <Q (uint64).

The C++ server in remote_solve.cu (line 95) sends size as uint64_t. The test script test_pipe_cancel_restart.py correctly uses struct.pack("<Q", ...), but remote.py incorrectly uses struct.pack("<I", ...) at line 149 and struct.unpack("<I", ...) at line 162. This protocol mismatch breaks remote communication—the server sends 8 bytes for size, but the client reads only 4 bytes, corrupting all subsequent messages.

🧰 Tools
🪛 Ruff (0.14.11)

39-39: Avoid specifying long messages outside the exception class

(TRY003)


49-49: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In @dev_scripts/test_pipe_cancel_restart.py around lines 29 - 52, The client
protocol in python/cuopt/cuopt/linear_programming/remote.py incorrectly uses
32-bit size prefixes ("<I"), causing mismatch with the server's uint64_t sizes;
update the send and receive size packing/unpacking in functions that call
struct.pack("<I", ...) and struct.unpack("<I", ...) to use "<Q" instead, and
ensure any buffer reads expect 8 bytes for the size prefix (e.g., change reads
that assume 4 bytes to 8 bytes) so the client reads the full uint64 size and
then reads the subsequent payload accordingly.

Comment on lines +304 to +314
struct JobQueueEntry {
char job_id[32]; // Unique job identifier
uint32_t problem_type; // 0=LP, 1=MIP
uint32_t data_size; // Size of serialized problem
uint8_t data[MAX_JOB_DATA]; // Serialized problem data (~1MB)
pid_t worker_pid; // PID of worker processing this job
std::atomic<bool> ready; // Job ready for processing
std::atomic<bool> claimed; // Job claimed by a worker
std::atomic<bool> cancelled; // Job cancelled by user
};
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify struct definitions in the actual implementation
echo "=== Searching for JobQueueEntry definition ==="
rg -n "struct JobQueueEntry" --type cpp -A 15

echo ""
echo "=== Searching for ResultQueueEntry definition ==="
rg -n "struct ResultQueueEntry" --type cpp -A 15

Repository: NVIDIA/cuopt

Length of output: 2697


Update struct definitions to match actual implementation in cpp/cuopt_remote_server.cpp.

The documentation does not match the actual code. Key discrepancies:

  • JobQueueEntry::job_id is char[64], not char[32]
  • data_size is uint64_t, not uint32_t
  • No inline uint8_t data[MAX_JOB_DATA] field; replaced with char shm_data_name[128] for shared memory segment names
  • worker_pid is std::atomic<pid_t>, not pid_t
  • Missing fields: std::atomic<int> worker_index and std::atomic<bool> data_sent (pipe mode support)

ResultQueueEntry similarly needs updates: add shm_data_name[128], update data_size to uint64_t, add worker_index, and clarify error message field. The architecture has evolved to support both pipe and shared memory modes; update the explanatory text to reflect this dual-mode design.

🤖 Prompt for AI Agents
In @docs/remote_solve_architecture.md around lines 304 - 314, The documented
struct definitions for JobQueueEntry and ResultQueueEntry are out of sync with
the real implementation; update the documentation to match
cpp/cuopt_remote_server.cpp by changing JobQueueEntry::job_id to char[64],
data_size to uint64_t, remove inline data[] and add char shm_data_name[128] for
shared-memory names, change worker_pid to std::atomic<pid_t>, and add
std::atomic<int> worker_index and std::atomic<bool> data_sent; similarly update
ResultQueueEntry to include shm_data_name[128], use uint64_t data_size, add
worker_index and clarify the error message field/type; also update surrounding
explanatory text to describe dual-mode transport (pipe and shared memory) and
how data is referenced via shm_data_name instead of inline buffers.

Comment on lines +147 to +162
# Send request (length-prefixed)
data = request.SerializeToString()
sock.sendall(struct.pack("<I", len(data)))
sock.sendall(data)

# Receive response
size_data = sock.recv(4)
if len(size_data) < 4:
sock.close()
return CancelResult(
success=False,
message="Failed to receive response size",
job_status=JobStatus.NOT_FOUND,
)

size = struct.unpack("<I", size_data)[0]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Size prefix format inconsistency with other implementations.

This code uses <I (uint32, 4 bytes) for the size prefix (lines 149, 153, 162), but:

  • dev_scripts/test_pipe_cancel_restart.py uses <Q (uint64, 8 bytes)
  • C++ remote_solve.cu uses uint64_t (8 bytes)

This will cause protocol errors when communicating with the server.

🐛 Proposed fix: Use uint64 size prefix to match server protocol
         # Send request (length-prefixed)
         data = request.SerializeToString()
-        sock.sendall(struct.pack("<I", len(data)))
+        sock.sendall(struct.pack("<Q", len(data)))
         sock.sendall(data)

         # Receive response
-        size_data = sock.recv(4)
-        if len(size_data) < 4:
+        size_data = sock.recv(8)
+        if len(size_data) < 8:
             sock.close()
             return CancelResult(
                 success=False,
                 message="Failed to receive response size",
                 job_status=JobStatus.NOT_FOUND,
             )

-        size = struct.unpack("<I", size_data)[0]
+        size = struct.unpack("<Q", size_data)[0]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Send request (length-prefixed)
data = request.SerializeToString()
sock.sendall(struct.pack("<I", len(data)))
sock.sendall(data)
# Receive response
size_data = sock.recv(4)
if len(size_data) < 4:
sock.close()
return CancelResult(
success=False,
message="Failed to receive response size",
job_status=JobStatus.NOT_FOUND,
)
size = struct.unpack("<I", size_data)[0]
# Send request (length-prefixed)
data = request.SerializeToString()
sock.sendall(struct.pack("<Q", len(data)))
sock.sendall(data)
# Receive response
size_data = sock.recv(8)
if len(size_data) < 8:
sock.close()
return CancelResult(
success=False,
message="Failed to receive response size",
job_status=JobStatus.NOT_FOUND,
)
size = struct.unpack("<Q", size_data)[0]

@tmckayus tmckayus added feature request New feature or request non-breaking Introduces a non-breaking change P0 Feature labels Jan 13, 2026
@tmckayus
Copy link
Contributor Author

/ok to test afd25ed

- Add libprotobuf to conda recipe build and host requirements
- Add libprotobuf to dependencies.yaml build_cpp section
- Regenerate conda environment files with libprotobuf

Fixes CI build failure where protobuf library was not found during
CMake configuration for remote solve server implementation.
@tmckayus
Copy link
Contributor Author

/ok to test 28ef558

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @conda/environments/all_cuda-130_arch-x86_64.yaml.orig:
- Around line 1-82: The file all_cuda-130_arch-x86_64.yaml.orig should be
deleted from the commit because it represents an unsupported CUDA 13.0
environment and is missing the libprotobuf dependency; remove the file entirely
from the branch (git rm --cached/--force as appropriate) and commit that
deletion, then verify no other generated env files reference CUDA 13.0 or lack
libprotobuf so the supported CUDA 12.9 and 13.1 variants remain intact.

In @conda/recipes/libcuopt/recipe.yaml:
- Line 65: The run requirements for the libcuopt package are missing
libprotobuf, causing runtime link failures because cpp/CMakeLists.txt links
libcuopt.so against protobuf::libprotobuf; update the recipe.yaml run section
for the libcuopt output (the run list around the current entries pinned at the
run block) to include libprotobuf alongside libboost-devel and librmm (i.e.,
insert a run entry "libprotobuf" in the run dependencies list so consumers of
libcuopt get protobuf at runtime).
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between afd25ed and 28ef558.

📒 Files selected for processing (7)
  • conda/environments/all_cuda-129_arch-aarch64.yaml
  • conda/environments/all_cuda-129_arch-x86_64.yaml
  • conda/environments/all_cuda-130_arch-x86_64.yaml.orig
  • conda/environments/all_cuda-131_arch-aarch64.yaml
  • conda/environments/all_cuda-131_arch-x86_64.yaml
  • conda/recipes/libcuopt/recipe.yaml
  • dependencies.yaml
🔇 Additional comments (5)
conda/environments/all_cuda-131_arch-aarch64.yaml (1)

35-35: LGTM - libprotobuf dependency correctly added.

The addition aligns with the dependencies.yaml source file change and supports the new protobuf-based remote solve functionality.

conda/environments/all_cuda-129_arch-x86_64.yaml (1)

35-35: LGTM - Consistent with other environment files.

conda/environments/all_cuda-131_arch-x86_64.yaml (1)

35-35: LGTM - Consistent with other environment files.

dependencies.yaml (1)

296-305: LGTM - libprotobuf correctly added to build_cpp dependencies.

The placement under build_cpp.common with output_types: conda is appropriate for the C++ build dependency needed for protobuf-based remote serialization. This follows the existing pattern of other build dependencies in this section.

conda/environments/all_cuda-129_arch-aarch64.yaml (1)

35-35: LGTM!

The libprotobuf dependency addition is consistent with the PR's protobuf-based remote solve serialization requirements. The alphabetical placement is correct.

Comment on lines +1 to +82
# This file is generated by `rapids-dependency-file-generator`.
# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
channels:
- rapidsai-nightly
- rapidsai
- conda-forge
dependencies:
- boost
- breathe
- bzip2
- c-compiler
- ccache
- clang-tools=20.1.4
- clang==20.1.4
- cmake>=3.30.4
- cpp-argparse
- cuda-nvcc
- cuda-nvtx-dev
- cuda-python>=13.0.1,<14.0a0
- cuda-sanitizer-api
- cuda-version=13.0
- cudf==26.2.*,>=0.0.0a0
- cupy>=13.6.0
- cxx-compiler
- cython>=3.0.3
- doxygen=1.9.1
- exhale
- fastapi
- gcc_linux-64=14.*
- gmock
- gtest
- ipython
- jsonref==1.1.0
- libcudss-dev >=0.7
- libcurand-dev
- libcusolver-dev
- libcusparse-dev
- libraft-headers==26.2.*,>=0.0.0a0
- librmm==26.2.*,>=0.0.0a0
- make
- msgpack-numpy==0.4.8
- msgpack-python==1.1.0
- myst-nb
- myst-parser
- ninja
- notebook
- numba-cuda>=0.19.1,<0.20.0a0
- numba>=0.60.0
- numpy>=1.23.5,<3.0a0
- numpydoc
- pandas>=2.0
- pexpect
- pip
- pre-commit
- psutil>=6.0.0
- pylibraft==26.2.*,>=0.0.0a0
- pyrsistent
- pytest-cov
- pytest<8
- python>=3.10,<3.14
- rapids-build-backend>=0.4.0,<0.5.0.dev0
- rapids-dask-dependency==26.2.*,>=0.0.0a0
- rapids-logger==0.2.*,>=0.0.0a0
- requests
- rmm==26.2.*,>=0.0.0a0
- scikit-build-core>=0.10.0
- sphinx
- sphinx-copybutton
- sphinx-design
- sphinx-markdown-tables
- sphinx_rtd_theme
- sphinxcontrib-openapi
- sphinxcontrib-websupport
- sysroot_linux-64==2.28
- tbb-devel
- uvicorn==0.34.*
- zlib
- pip:
- nvidia_sphinx_theme
- swagger-plugin-for-sphinx
- veroviz
name: all_cuda-130_arch-x86_64
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the non-.orig version exists
echo "=== Checking for existing all_cuda-130_arch-x86_64.yaml ==="
fd -t f "all_cuda-130_arch-x86_64.yaml" conda/environments/

echo ""
echo "=== Checking libprotobuf presence in all environment files ==="
for f in conda/environments/all_cuda-*.yaml; do
  if [ -f "$f" ]; then
    echo -n "$f: "
    grep -q "libprotobuf" "$f" && echo "HAS libprotobuf" || echo "MISSING libprotobuf"
  fi
done

echo ""
echo "=== Listing any .orig files that may be artifacts ==="
fd -e orig . conda/environments/

echo ""
echo "=== Checking git status and history for .orig files ==="
cd conda/environments && git status --short | grep orig || echo "No .orig files in git status"

Repository: NVIDIA/cuopt

Length of output: 694


🏁 Script executed:

#!/bin/bash
# Check if .orig file is tracked in git
echo "=== Checking if .orig file is tracked in git ==="
git ls-files | grep "all_cuda-130_arch-x86_64.yaml.orig"

echo ""
echo "=== Checking git log for this file ==="
git log --oneline -- conda/environments/all_cuda-130_arch-x86_64.yaml.orig | head -5

echo ""
echo "=== List all all_cuda*.yaml* files to see naming pattern ==="
ls -la conda/environments/all_cuda*.yaml*

Repository: NVIDIA/cuopt

Length of output: 804


🏁 Script executed:

#!/bin/bash
echo "=== Check all environment files in the directory ==="
ls -1 conda/environments/all_cuda*.yaml* | sort

echo ""
echo "=== Check what's in the git commit that added libprotobuf ==="
git show --stat 28ef558 | head -50

echo ""
echo "=== Check the full diff for the .orig file in that commit ==="
git show 28ef558 -- conda/environments/all_cuda-130_arch-x86_64.yaml.orig | head -100

Repository: NVIDIA/cuopt

Length of output: 5691


🏁 Script executed:

#!/bin/bash
echo "=== Check if dependencies.yaml mentions CUDA 130 ==="
grep -A 10 "130" dependencies.yaml | head -30

echo ""
echo "=== Count how many env variants are defined for each CUDA version in dependencies.yaml ==="
grep -E "cuda-130|cuda-129|cuda-131" dependencies.yaml | wc -l

Repository: NVIDIA/cuopt

Length of output: 202


🏁 Script executed:

#!/bin/bash
echo "=== Search for CUDA version patterns in dependencies.yaml ==="
grep -i "cuda" dependencies.yaml | head -20

echo ""
echo "=== Check the structure of dependencies.yaml ==="
head -100 dependencies.yaml

echo ""
echo "=== Look for arch and cuda variants ==="
grep -E "arch|cuda" dependencies.yaml | grep -i "x86_64\|aarch64\|130\|129\|131" | head -20

Repository: NVIDIA/cuopt

Length of output: 3306


🏁 Script executed:

#!/bin/bash
echo "=== Check git history for the .orig file ==="
git log --all --oneline -- conda/environments/all_cuda-130_arch-x86_64.yaml.orig

echo ""
echo "=== Check if there are any references to CUDA 130 in the codebase ==="
rg "130" conda/ --type yaml | head -20

echo ""
echo "=== Verify that CUDA 129 and 131 both have both arch variants ==="
echo "CUDA 129:"
ls -1 conda/environments/all_cuda-129* | sort

echo ""
echo "CUDA 131:"
ls -1 conda/environments/all_cuda-131* | sort

Repository: NVIDIA/cuopt

Length of output: 504


Delete the all_cuda-130_arch-x86_64.yaml.orig file—it should not be committed.

This file represents an unsupported CUDA version. The dependencies.yaml only defines CUDA versions 12.9 and 13.1, which generate environment files for CUDA 129 and 131. CUDA 130 is not in the supported matrix and has no corresponding aarch64 variant (unlike the 129 and 131 versions which both have x86_64 and aarch64 pairs).

Additionally, this file is missing the libprotobuf dependency that was added across all other environment files in this PR. It appears to be a leftover from the file regeneration process and should be removed entirely rather than renamed.

🤖 Prompt for AI Agents
In @conda/environments/all_cuda-130_arch-x86_64.yaml.orig around lines 1 - 82,
The file all_cuda-130_arch-x86_64.yaml.orig should be deleted from the commit
because it represents an unsupported CUDA 13.0 environment and is missing the
libprotobuf dependency; remove the file entirely from the branch (git rm
--cached/--force as appropriate) and commit that deletion, then verify no other
generated env files reference CUDA 13.0 or lack libprotobuf so the supported
CUDA 12.9 and 13.1 variants remain intact.

- tbb-devel
- zlib
- bzip2
- libprotobuf
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find the recipe.yaml file and check its structure
find . -name "recipe.yaml" -path "*libcuopt*" 2>/dev/null

Repository: NVIDIA/cuopt

Length of output: 92


🏁 Script executed:

# Look for CMakeLists.txt files that might contain protobuf references
find . -name "CMakeLists.txt" 2>/dev/null | head -20

Repository: NVIDIA/cuopt

Length of output: 799


🏁 Script executed:

# Search for any protobuf references in the repository
rg -l "protobuf" --type yaml --type cmake 2>/dev/null | head -20

Repository: NVIDIA/cuopt

Length of output: 367


🏁 Script executed:

# Read the recipe.yaml file to see the structure and all requirements
cat -n ./conda/recipes/libcuopt/recipe.yaml

Repository: NVIDIA/cuopt

Length of output: 8706


🏁 Script executed:

# Check cpp/CMakeLists.txt for protobuf linking details
rg -n "protobuf" ./cpp/CMakeLists.txt -A 3 -B 1

Repository: NVIDIA/cuopt

Length of output: 380


🏁 Script executed:

# Also check the main source CMakeLists for protobuf
rg -n "protobuf" ./cpp/src/CMakeLists.txt -A 3 -B 1

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Search for how protobuf is found/configured in CMakeLists
rg -n "find_package.*protobuf" ./cpp/CMakeLists.txt -i -A 5

Repository: NVIDIA/cuopt

Length of output: 342


🏁 Script executed:

# Check if there's any mention of static protobuf linking
rg -n "protobuf.*STATIC\|Protobuf_USE_STATIC_LIBS\|Protobuf.*static" ./cpp/CMakeLists.txt -i

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look for the full context of the protobuf link in libcuopt
rg -n "target_link_libraries.*cuopt" ./cpp/CMakeLists.txt -B 10 -A 10 | head -60

Repository: NVIDIA/cuopt

Length of output: 1934


libprotobuf missing from libcuopt run requirements — add at line 161.

While libprotobuf is correctly added to build and host requirements, it must also be in the run requirements for the libcuopt output package. The CMakeLists.txt dynamically links libcuopt.so against protobuf::libprotobuf (line 370 in cpp/CMakeLists.txt), which means consumers of libcuopt require protobuf at runtime. Without this dependency, binaries and libraries linking to libcuopt.so will fail with missing symbol errors.

Add libprotobuf to the run requirements at line 161:

      run:
        - ${{ pin_compatible("cuda-version", upper_bound="x", lower_bound="x") }}
        - ${{ pin_subpackage("libmps-parser", exact=True) }}
        - libboost-devel
        - libprotobuf
        - librmm =${{ minor_version }}
🤖 Prompt for AI Agents
In @conda/recipes/libcuopt/recipe.yaml at line 65, The run requirements for the
libcuopt package are missing libprotobuf, causing runtime link failures because
cpp/CMakeLists.txt links libcuopt.so against protobuf::libprotobuf; update the
recipe.yaml run section for the libcuopt output (the run list around the current
entries pinned at the run block) to include libprotobuf alongside libboost-devel
and librmm (i.e., insert a run entry "libprotobuf" in the run dependencies list
so consumers of libcuopt get protobuf at runtime).

- Add depends_on_protobuf section to dependencies.yaml
- Include protobuf in py_rapids_build_libcuopt requirements
- Regenerate python/libcuopt/pyproject.toml with protobuf

Fixes Python wheel CI build failure where CMake could not find
protobuf during the libcuopt wheel build. The previous commit added
protobuf to conda builds, this commit adds it to pip wheel builds.
@tmckayus
Copy link
Contributor Author

/ok to test 3b4e741

@tmckayus
Copy link
Contributor Author

TODO: take another pass and replace client protocol with gRPC

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@ci/utils/install_protobuf.sh`:
- Around line 9-25: Pin Protobuf to a compatible version across all places:
update dependencies.yaml to require a specific Protobuf range (e.g. "protobuf:
>=3.20,<4.0"), update CMakeLists.txt's find_package invocation to include the
minimum version (e.g. change find_package(Protobuf REQUIRED) to
find_package(Protobuf 3.20 REQUIRED)), and change this install script
(install_protobuf.sh) to install that pinned version via the package manager
(adjust the dnf/apt package names or add version qualifiers / use apt-get
install protobuf-compiler=3.20.* or the equivalent distro-specific
package/version) and also ensure conda environment specs are updated to the same
version range; make sure error branches still instruct manual install if the
exact version is unavailable.
🧹 Nitpick comments (2)
ci/utils/install_protobuf.sh (2)

14-17: Consider cleaning up apt cache to reduce image size.

In CI/container environments, cleaning up the apt cache after installation reduces the final image size.

Suggested improvement
     elif [[ "$ID" == "ubuntu" ]]; then
         echo "Detected Ubuntu. Installing Protobuf via apt..."
         apt-get update
-        apt-get install -y libprotobuf-dev protobuf-compiler
+        apt-get install -y --no-install-recommends libprotobuf-dev protobuf-compiler
+        rm -rf /var/lib/apt/lists/*
     else

11-21: Consider adding Debian support.

Since Debian is closely related to Ubuntu and uses the same package manager, you might want to support it explicitly. This would help if the CI environment or developer machines use Debian.

Suggested improvement
     if [[ "$ID" == "rocky" ]]; then
         echo "Detected Rocky Linux. Installing Protobuf via dnf..."
         dnf install -y protobuf-devel protobuf-compiler
-    elif [[ "$ID" == "ubuntu" ]]; then
-        echo "Detected Ubuntu. Installing Protobuf via apt..."
+    elif [[ "$ID" == "ubuntu" || "$ID" == "debian" ]]; then
+        echo "Detected $ID. Installing Protobuf via apt..."
         apt-get update
         apt-get install -y libprotobuf-dev protobuf-compiler
     else
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b4e741 and 0dcd54a.

📒 Files selected for processing (2)
  • ci/build_wheel_libcuopt.sh
  • ci/utils/install_protobuf.sh
🔇 Additional comments (2)
ci/utils/install_protobuf.sh (1)

1-6: Good shell script setup.

The strict mode (set -euo pipefail) ensures the script fails fast on errors, which is appropriate for CI scripts.

ci/build_wheel_libcuopt.sh (1)

20-22: LGTM!

The protobuf installation step is correctly placed in the dependency installation sequence, following the established pattern used by other dependencies (Boost, TBB, cuDSS). The script's set -euo pipefail ensures any installation failure will abort the build.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 9 to 25
if [ -f /etc/os-release ]; then
. /etc/os-release
if [[ "$ID" == "rocky" ]]; then
echo "Detected Rocky Linux. Installing Protobuf via dnf..."
dnf install -y protobuf-devel protobuf-compiler
elif [[ "$ID" == "ubuntu" ]]; then
echo "Detected Ubuntu. Installing Protobuf via apt..."
apt-get update
apt-get install -y libprotobuf-dev protobuf-compiler
else
echo "Unknown OS: $ID. Please install Protobuf development libraries manually."
exit 1
fi
else
echo "/etc/os-release not found. Cannot determine OS. Please install Protobuf development libraries manually."
exit 1
fi
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Check if there are version constraints specified elsewhere in the project
echo "=== Checking dependency files for protobuf version constraints ==="
rg "protobuf" --type yaml --type txt -g '!*.pb.*' -n

echo ""
echo "=== Checking CMake files for protobuf version requirements ==="
rg "find_package.*[Pp]rotobuf" -A2 -B2 -n

echo ""
echo "=== Searching for CMakeLists.txt files ==="
fd "CMakeLists.txt" -x cat -n {}

echo ""
echo "=== Checking for .proto files ==="
fd "\.proto$" -x cat -n {}

echo ""
echo "=== Checking requirements.txt or similar Python dependency files ==="
fd -E node_modules "(requirements.*\.txt|setup\.py|pyproject\.toml|poetry\.lock|Pipfile)" -x cat -n {}

echo ""
echo "=== Checking for docker or container files ==="
fd "(Dockerfile|\.dockerignore)" -x cat -n {}

Repository: NVIDIA/cuopt

Length of output: 50369


🏁 Script executed:

#!/bin/bash

echo "=== Checking dependencies.yaml for protobuf version constraints ==="
rg -A5 "depends_on_protobuf|libprotobuf" dependencies.yaml

echo ""
echo "=== Checking conda environment files for libprotobuf version specs ==="
fd "all_cuda.*\.yaml" conda/environments/ -x grep -A1 -B1 "libprotobuf" {}

echo ""
echo "=== Checking conda recipe for libprotobuf ==="
grep -A2 -B2 "libprotobuf" conda/recipes/libcuopt/recipe.yaml

echo ""
echo "=== Checking if protobuf has version specifications anywhere ==="
rg "protobuf.*==" -n

Repository: NVIDIA/cuopt

Length of output: 1147


🏁 Script executed:

#!/bin/bash

echo "=== Looking for .proto file to understand protobuf usage ==="
find . -name "*.proto" -type f -exec cat -n {} +

echo ""
echo "=== Checking what's in linear_programming/utilities/protobuf_serializer.cu ==="
fd "protobuf_serializer" -x head -30 {}

Repository: NVIDIA/cuopt

Length of output: 14312


Add protobuf version constraints to ensure compatibility across distributions.

Protobuf version mismatches between build and runtime environments can cause serialization/deserialization issues. Currently, protobuf has no version pinning in dependencies.yaml, conda environments, or CMakeLists.txt (find_package(Protobuf REQUIRED) lacks version specification), while the install script installs without version constraints. Given that this PR introduces protobuf-based serialization for remote solving, add explicit version constraints to dependencies.yaml, CMakeLists.txt, and update the install script to match the pinned version.

🤖 Prompt for AI Agents
In `@ci/utils/install_protobuf.sh` around lines 9 - 25, Pin Protobuf to a
compatible version across all places: update dependencies.yaml to require a
specific Protobuf range (e.g. "protobuf: >=3.20,<4.0"), update CMakeLists.txt's
find_package invocation to include the minimum version (e.g. change
find_package(Protobuf REQUIRED) to find_package(Protobuf 3.20 REQUIRED)), and
change this install script (install_protobuf.sh) to install that pinned version
via the package manager (adjust the dnf/apt package names or add version
qualifiers / use apt-get install protobuf-compiler=3.20.* or the equivalent
distro-specific package/version) and also ensure conda environment specs are
updated to the same version range; make sure error branches still instruct
manual install if the exact version is unavailable.

@rgsl888prabhu
Copy link
Collaborator

/ok to test 0dcd54a

@tmckayus tmckayus mentioned this pull request Jan 14, 2026
@rgsl888prabhu
Copy link
Collaborator

/ok to test de8daf7

@rgsl888prabhu
Copy link
Collaborator

/ok to test 7660217

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@cpp/CMakeLists.txt`:
- Around line 621-624: Wrap the
add_subdirectory(src/linear_programming/utilities/serializers) call with the
BUILD_REMOTE_SERVER conditional so the serializers plugin is only built when
BUILD_REMOTE_SERVER is enabled; locate the existing add_subdirectory invocation
in CMakeLists and move or surround it with the if(BUILD_REMOTE_SERVER) ...
endif() block (keeping it near other remote-server related entries and
references such as cuopt_remote_server.cpp) to prevent building the msgpack
serializer when remote server support is disabled.
🧹 Nitpick comments (2)
cpp/CMakeLists.txt (2)

234-238: Prefer target-specific includes over global include_directories().

Using global include_directories() adds paths to all targets in this directory, which can cause unintended include pollution. Modern CMake best practice is to use target_include_directories() on specific targets that need the paths.

Since multiple targets need the generated proto headers (cuopt, cuopt_remote_server), consider either:

  1. Adding the paths explicitly to each target that needs them, or
  2. Creating an interface library for the proto headers
♻️ Suggested approach using target-specific includes
 # Protocol Buffers for remote solve serialization
 find_package(Protobuf REQUIRED)
-include_directories(SYSTEM ${Protobuf_INCLUDE_DIRS})
-include_directories(${CMAKE_CURRENT_BINARY_DIR})
+# Protobuf includes added to targets via target_include_directories below

Then add to the cuopt target (around line 305-319):

target_include_directories(cuopt SYSTEM PRIVATE ${Protobuf_INCLUDE_DIRS})
target_include_directories(cuopt PRIVATE ${CMAKE_CURRENT_BINARY_DIR})

And similarly for cuopt_remote_server.


575-581: Consider using PRIVATE for executable link dependencies.

For executable targets, the PUBLIC/PRIVATE distinction on target_link_libraries has no practical effect on the executable itself. Using PRIVATE is the conventional choice and more clearly expresses intent.

♻️ Suggested change
   target_link_libraries(cuopt_remote_server
-    PUBLIC
-    cuopt
-    OpenMP::OpenMP_CXX
     PRIVATE
+    cuopt
+    OpenMP::OpenMP_CXX
     protobuf::libprotobuf
   )
   target_link_libraries(cuopt_cancel_job
-    PUBLIC
-    cuopt
+    PRIVATE cuopt
   )

Also applies to: 609-612

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a2227b and 7660217.

📒 Files selected for processing (1)
  • cpp/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cpp,hpp,h} : Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication

Applied to files:

  • cpp/CMakeLists.txt
🔇 Additional comments (3)
cpp/CMakeLists.txt (3)

142-147: LGTM!

Reasonable workaround for GCC's known false-positive -Warray-bounds diagnostics from external headers like Protobuf, while preserving -Werror for other warnings.


239-253: Proto generation setup looks correct.

The custom command is properly configured with DEPENDS for rebuild tracking and VERBATIM for safe argument handling.

Minor observation: PROTO_HDRS is defined but unused. Consider adding it to the cuopt target's sources or target_sources() to ensure proper dependency tracking for the header file, though in practice the .cc file inclusion typically suffices.


264-267: LGTM!

The protobuf sources are correctly added to the cuopt library, and the linking arrangement is appropriate:

  • mps_parser_static embeds the parser symbols directly
  • protobuf::libprotobuf as PRIVATE keeps it an implementation detail

Also applies to: 375-377

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +621 to +624
# ##################################################################################################
# - Pluggable Serializers --------------------------------------------------------------------------
# Build optional serializer plugins (e.g., msgpack)
add_subdirectory(src/linear_programming/utilities/serializers)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if serializers are used outside of remote solve code paths

# Find references to serializer in the codebase excluding the serializers directory itself
echo "=== References to serializer outside serializers directory ==="
rg -n "serializer" --type cpp --type cmake -g '!**/serializers/**' -g '!**/test**' | head -100

echo ""
echo "=== Check if serializers CMakeLists.txt has its own guards ==="
fd -t f "CMakeLists.txt" --path "*/serializers/*" -exec cat {} \;

echo ""
echo "=== Check the context around line 621-624 in CMakeLists.txt ==="
rg -B 30 -A 5 "add_subdirectory.*serializers" cpp/CMakeLists.txt

Repository: NVIDIA/cuopt

Length of output: 7539


🏁 Script executed:

#!/bin/bash
# Get serializers CMakeLists.txt
echo "=== Serializers CMakeLists.txt ==="
find . -path "*/serializers/CMakeLists.txt" -exec cat {} \;

echo ""
echo "=== Check BUILD_REMOTE_SERVER conditional structure in main CMakeLists.txt ==="
# Search for BUILD_REMOTE_SERVER conditions to see the structure
rg -n "BUILD_REMOTE_SERVER|add_subdirectory.*serializers" cpp/CMakeLists.txt | tail -50

echo ""
echo "=== Check protobuf_serializer.cu location and context ==="
fd -t f "protobuf_serializer.cu" -exec head -20 {} \;

echo ""
echo "=== Check if protobuf_serializer is conditioned in src/linear_programming/CMakeLists.txt ==="
rg -B 5 -A 5 "protobuf_serializer" cpp/src/linear_programming/CMakeLists.txt

Repository: NVIDIA/cuopt

Length of output: 2658


Move the serializers subdirectory inside the BUILD_REMOTE_SERVER conditional.

The add_subdirectory(src/linear_programming/utilities/serializers) call builds the msgpack serializer plugin, which is only used by the remote server. All serializer references in the codebase are confined to cuopt_remote_server.cpp. Building this optional plugin when remote server support is disabled is wasteful and should be gated behind BUILD_REMOTE_SERVER.

🤖 Prompt for AI Agents
In `@cpp/CMakeLists.txt` around lines 621 - 624, Wrap the
add_subdirectory(src/linear_programming/utilities/serializers) call with the
BUILD_REMOTE_SERVER conditional so the serializers plugin is only built when
BUILD_REMOTE_SERVER is enabled; locate the existing add_subdirectory invocation
in CMakeLists and move or surround it with the if(BUILD_REMOTE_SERVER) ...
endif() block (keeping it near other remote-server related entries and
references such as cuopt_remote_server.cpp) to prevent building the msgpack
serializer when remote server support is disabled.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@cpp/CMakeLists.txt`:
- Around line 266-269: The cuopt library is currently built with ${PROTO_SRCS}
always appended which breaks builds when Protobuf is disabled; ensure PROTO_SRCS
is defined to an empty value when the protobuf/remote-server branch is not taken
(e.g., set(PROTO_SRCS "" ) or guard with if(NOT DEFINED PROTO_SRCS)
set(PROTO_SRCS "")) so the add_library(cuopt SHARED ${CUOPT_SRC_FILES}
${PROTO_SRCS}) line works unconditionally; update the CMake logic that defines
PROTO_SRCS (the conditional that generates/sets it) to provide this default.
- Around line 236-255: Wrap the Protobuf dependency and proto generation in an
if(BUILD_REMOTE_SERVER) ... endif() guard so find_package(Protobuf REQUIRED),
the add_custom_command that produces PROTO_SRCS/PROTO_HDRS, and setting
PROTO_FILE/PROTO_SRCS/PROTO_HDRS only run when BUILD_REMOTE_SERVER is ON; remove
the unconditional include_directories(SYSTEM ${Protobuf_INCLUDE_DIRS}) and
include_directories(${CMAKE_CURRENT_BINARY_DIR}) and instead call
target_include_directories(<the target that consumes PROTO_SRCS/PROTO_HDRS>
PRIVATE ${CMAKE_CURRENT_BINARY_DIR} SYSTEM ${Protobuf_INCLUDE_DIRS}) so only the
target that uses the generated headers gets the includes.
♻️ Duplicate comments (1)
cpp/CMakeLists.txt (1)

623-626: Move the serializers subdirectory inside the BUILD_REMOTE_SERVER conditional.

This was flagged in a previous review. The add_subdirectory(src/linear_programming/utilities/serializers) builds the msgpack serializer plugin, which is only used by the remote server.

♻️ Suggested fix
-# ##################################################################################################
-# - Pluggable Serializers --------------------------------------------------------------------------
-# Build optional serializer plugins (e.g., msgpack)
-add_subdirectory(src/linear_programming/utilities/serializers)

Move this inside the if(BUILD_REMOTE_SERVER AND NOT BUILD_LP_ONLY) block above (before line 621).

🧹 Nitpick comments (2)
cpp/CMakeLists.txt (2)

376-379: Protobuf linkage should be conditional.

The protobuf::libprotobuf is linked unconditionally. If the Protobuf dependency is made conditional on BUILD_REMOTE_SERVER, this linkage should also be guarded:

♻️ Suggested refactor
   PRIVATE
   ${CUOPT_PRIVATE_CUDA_LIBS}
   cuopt::mps_parser_static  # Static link - symbols embedded in libcuopt.so
-  protobuf::libprotobuf
+  $<$<BOOL:${BUILD_REMOTE_SERVER}>:protobuf::libprotobuf>
   )

559-621: Remote server build section is well-structured.

The BUILD_REMOTE_SERVER guard and RUNPATH configuration are appropriate. Minor suggestions:

  1. Redundant include directories (lines 567-576): For executables, PUBLIC include directories are not meaningful since nothing links against them. The same paths appear in both PRIVATE and PUBLIC.

  2. Executable linkage (lines 577-583, 611-614): For executables, PUBLIC vs PRIVATE linkage is semantically equivalent. Consider using only PRIVATE for clarity.

♻️ Optional cleanup
   target_include_directories(cuopt_remote_server
     PRIVATE
     "${CMAKE_CURRENT_SOURCE_DIR}/src"
     "${CMAKE_CURRENT_SOURCE_DIR}/include"
     "${CMAKE_CURRENT_SOURCE_DIR}/libmps_parser/include"
     "${CMAKE_CURRENT_BINARY_DIR}"
-    PUBLIC
     "$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
     "$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}/include>"
   )
   target_link_libraries(cuopt_remote_server
-    PUBLIC
-    cuopt
-    OpenMP::OpenMP_CXX
     PRIVATE
+    cuopt
+    OpenMP::OpenMP_CXX
     protobuf::libprotobuf
   )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7660217 and 235323c.

📒 Files selected for processing (1)
  • cpp/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (12)
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cpp,hpp,h} : Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cpp,hpp,h} : Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle

Applied to files:

  • cpp/CMakeLists.txt
🔇 Additional comments (1)
cpp/CMakeLists.txt (1)

142-149: LGTM: Pragmatic warning suppression for third-party headers.

The GCC-specific suppression of -Warray-bounds and -Wstringop-overread is a reasonable approach to avoid false positives from Protobuf headers while maintaining strict warnings for project code.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +236 to +255
# Protocol Buffers for remote solve serialization
find_package(Protobuf REQUIRED)
include_directories(SYSTEM ${Protobuf_INCLUDE_DIRS})
include_directories(${CMAKE_CURRENT_BINARY_DIR})

# Generate C++ code from .proto file
set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities/cuopt_remote.proto")
set(PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.cc")
set(PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.h")

add_custom_command(
OUTPUT "${PROTO_SRCS}" "${PROTO_HDRS}"
COMMAND ${Protobuf_PROTOC_EXECUTABLE}
ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
--proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
${PROTO_FILE}
DEPENDS ${PROTO_FILE}
COMMENT "Generating C++ code from cuopt_remote.proto"
VERBATIM
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Protobuf is unconditionally required even when remote server is disabled.

The find_package(Protobuf REQUIRED) and proto generation occur unconditionally, adding a mandatory build dependency for all users even when BUILD_REMOTE_SERVER=OFF. This increases build complexity and external dependencies for users who only need local solves.

Consider making Protobuf conditional on BUILD_REMOTE_SERVER:

♻️ Suggested refactor
-# Protocol Buffers for remote solve serialization
-find_package(Protobuf REQUIRED)
-include_directories(SYSTEM ${Protobuf_INCLUDE_DIRS})
-include_directories(${CMAKE_CURRENT_BINARY_DIR})
-
-# Generate C++ code from .proto file
-set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities/cuopt_remote.proto")
-set(PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.cc")
-set(PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.h")
-
-add_custom_command(
-  OUTPUT "${PROTO_SRCS}" "${PROTO_HDRS}"
-  COMMAND ${Protobuf_PROTOC_EXECUTABLE}
-  ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
-       --proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
-       ${PROTO_FILE}
-  DEPENDS ${PROTO_FILE}
-  COMMENT "Generating C++ code from cuopt_remote.proto"
-  VERBATIM
-)
+# Protocol Buffers for remote solve serialization (optional)
+set(PROTO_SRCS "")
+if(BUILD_REMOTE_SERVER)
+  find_package(Protobuf REQUIRED)
+  
+  set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities/cuopt_remote.proto")
+  set(PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.cc")
+  set(PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.h")
+
+  add_custom_command(
+    OUTPUT "${PROTO_SRCS}" "${PROTO_HDRS}"
+    COMMAND ${Protobuf_PROTOC_EXECUTABLE}
+    ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
+         --proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
+         ${PROTO_FILE}
+    DEPENDS ${PROTO_FILE}
+    COMMENT "Generating C++ code from cuopt_remote.proto"
+    VERBATIM
+  )
+endif()

Additionally, prefer target_include_directories() over global include_directories() for better encapsulation.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Protocol Buffers for remote solve serialization
find_package(Protobuf REQUIRED)
include_directories(SYSTEM ${Protobuf_INCLUDE_DIRS})
include_directories(${CMAKE_CURRENT_BINARY_DIR})
# Generate C++ code from .proto file
set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities/cuopt_remote.proto")
set(PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.cc")
set(PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.h")
add_custom_command(
OUTPUT "${PROTO_SRCS}" "${PROTO_HDRS}"
COMMAND ${Protobuf_PROTOC_EXECUTABLE}
ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
--proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
${PROTO_FILE}
DEPENDS ${PROTO_FILE}
COMMENT "Generating C++ code from cuopt_remote.proto"
VERBATIM
)
# Protocol Buffers for remote solve serialization (optional)
set(PROTO_SRCS "")
if(BUILD_REMOTE_SERVER)
find_package(Protobuf REQUIRED)
set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities/cuopt_remote.proto")
set(PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.cc")
set(PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.h")
add_custom_command(
OUTPUT "${PROTO_SRCS}" "${PROTO_HDRS}"
COMMAND ${Protobuf_PROTOC_EXECUTABLE}
ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
--proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
${PROTO_FILE}
DEPENDS ${PROTO_FILE}
COMMENT "Generating C++ code from cuopt_remote.proto"
VERBATIM
)
endif()
🤖 Prompt for AI Agents
In `@cpp/CMakeLists.txt` around lines 236 - 255, Wrap the Protobuf dependency and
proto generation in an if(BUILD_REMOTE_SERVER) ... endif() guard so
find_package(Protobuf REQUIRED), the add_custom_command that produces
PROTO_SRCS/PROTO_HDRS, and setting PROTO_FILE/PROTO_SRCS/PROTO_HDRS only run
when BUILD_REMOTE_SERVER is ON; remove the unconditional
include_directories(SYSTEM ${Protobuf_INCLUDE_DIRS}) and
include_directories(${CMAKE_CURRENT_BINARY_DIR}) and instead call
target_include_directories(<the target that consumes PROTO_SRCS/PROTO_HDRS>
PRIVATE ${CMAKE_CURRENT_BINARY_DIR} SYSTEM ${Protobuf_INCLUDE_DIRS}) so only the
target that uses the generated headers gets the includes.

Comment on lines 266 to 269
add_library(cuopt SHARED
${CUOPT_SRC_FILES}
${PROTO_SRCS}
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Proto sources compiled into main library unconditionally.

The ${PROTO_SRCS} is added to the cuopt library regardless of whether remote server support is enabled. If Protobuf setup is made conditional per the previous comment, ensure PROTO_SRCS defaults to empty so this line works correctly in both cases.

🤖 Prompt for AI Agents
In `@cpp/CMakeLists.txt` around lines 266 - 269, The cuopt library is currently
built with ${PROTO_SRCS} always appended which breaks builds when Protobuf is
disabled; ensure PROTO_SRCS is defined to an empty value when the
protobuf/remote-server branch is not taken (e.g., set(PROTO_SRCS "" ) or guard
with if(NOT DEFINED PROTO_SRCS) set(PROTO_SRCS "")) so the add_library(cuopt
SHARED ${CUOPT_SRC_FILES} ${PROTO_SRCS}) line works unconditionally; update the
CMake logic that defines PROTO_SRCS (the conditional that generates/sets it) to
provide this default.

@anandhkb anandhkb added this to the 26.02 milestone Jan 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request Feature non-breaking Introduces a non-breaking change P0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants