Skip to content

refactor: python package to be compatible with proto3 envelope#1849

Merged
JoshuaSBrown merged 18 commits into1845-DAPS-refactor-repo-server-to-use-proto3from
1847-DAPS-refactor-python-client-to-use-proto3
Feb 7, 2026
Merged

refactor: python package to be compatible with proto3 envelope#1849
JoshuaSBrown merged 18 commits into1845-DAPS-refactor-repo-server-to-use-proto3from
1847-DAPS-refactor-python-client-to-use-proto3

Conversation

@JoshuaSBrown
Copy link
Collaborator

@JoshuaSBrown JoshuaSBrown commented Feb 5, 2026

Ticket

Description

How Has This Been Tested?

Artifacts (if appropriate):

Tasks

  • - A description of the PR has been provided, and a diagram included if it is a new feature.
  • - Formatter has been run
  • - CHANGELOG comment has been added
  • - Labels have been assigned to the pr
  • - A reviwer has been added
  • - A user has been assigned to work on the pr
  • - If new feature a unit test has been added

Summary by Sourcery

Adopt a proto3 envelope-based messaging scheme across C++ and Python components, deriving message type IDs from the Envelope descriptor at runtime and aligning framing, mapping, and tests with the new protocol.

Enhancements:

  • Document and clarify the IMessage interface, message metadata enums, protobuf mapping, and framing layer with detailed comments and clearer semantics.
  • Extend ProtoBufMap with envelope wrap/unwrap helpers, descriptor-based type resolution, and an auth requirement helper based on message names.
  • Refine Frame, FrameConverter, and FrameFactory to clearly define responsibilities for wire headers and conversions between ZMQ, protobuf, and IMessage representations.
  • Deprecate the Python registerProtocol path in favor of a new registerEnvelope API that discovers message types from the Envelope proto, and switch MessageLib to use the unified proto envelope module.
  • Update CMake configuration for the Python package to generate protobufs from the new proto3 directory layout and eliminate the custom msg-index injection script in favor of envelope-based type discovery.
  • Slightly reduce proxy-related test run durations to speed up unit tests.

Build:

  • Repoint Python protobuf generation to the proto3/common tree, adjust import rewriting, and stop invoking the custom pyproto_add_msg_idx.py indexer script.

Tests:

  • Shorten proxy and basic ZMQ proxy test run durations to reduce overall test execution time.

@JoshuaSBrown JoshuaSBrown self-assigned this Feb 5, 2026
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Feb 5, 2026

Reviewer's Guide

Refactors the messaging stack and Python client to use a proto3 Envelope-based runtime message-type registry, replacing the old build-time index generation, updates build/test wiring to the new proto3 layout, and improves documentation/comments throughout the C++ messaging headers and tests.

Sequence diagram for Python client initialization using proto3 Envelope

sequenceDiagram
    actor User
    participant MessageLib
    participant Connection
    participant envelope_pb2 as envelope_pb2_module
    participant Envelope as Envelope_message
    participant Server

    User->>MessageLib: create MessageLib(...)
    MessageLib->>Connection: __init__(...)
    activate Connection

    MessageLib->>Connection: registerEnvelope(envelope_pb2_module, "Envelope")
    Connection->>envelope_pb2_module: getattr(envelope_pb2_module, "Envelope")
    envelope_pb2_module-->>Connection: Envelope_message
    Connection->>Envelope_message: read DESCRIPTOR.fields
    loop for each message field
        Connection->>Connection: store msg_type, desc in maps
    end
    Connection-->>MessageLib: mappings ready

    deactivate Connection

    MessageLib->>Connection: sendRecv(VersionRequest(), 10000)
    activate Connection
    Connection->>envelope_pb2_module: create VersionRequest()
    Connection->>Connection: lookup msg_type via descriptor
    Connection->>Envelope_message: wrap VersionRequest into Envelope
    Connection->>Server: send framed Envelope
    Server-->>Connection: reply Envelope frame
    Connection->>Envelope_message: unwrap inner message
    Connection-->>MessageLib: VersionReply, msg_type
    deactivate Connection

    MessageLib->>Connection: sendRecv(GetAuthStatusRequest(), 10000)
    Connection->>Server: send framed Envelope
    Server-->>Connection: AuthStatusReply Envelope
    Connection-->>MessageLib: AuthStatusReply, msg_type

    MessageLib-->>User: initialized, auth status known
Loading

Class diagram for updated Python client message-type registry and usage

classDiagram
    direction LR

    class Connection {
      -logger _logger
      -dict~int,Descriptor~ _msg_desc_by_type
      -dict~string,Descriptor~ _msg_desc_by_name
      -dict~Descriptor,int~ _msg_type_by_desc
      +__del__()
      +registerEnvelope(envelope_module, envelope_class_name="Envelope")
      +registerProtocol(msg_module)
      ..other existing methods..
    }

    class MessageLib {
      -Connection _conn
      -string new_client_avail
      -bool _auth
      -string _uid
      +__init__(...)
      +manualAuthByPassword(uid, password)
      +manualAuthByToken(token)
      +getDailyMessage()
      +sendRecv(msg, timeout=None)
      ..other existing methods..
    }

    class envelope_pb2 {
      <<module>>
      +class Envelope
      +class VersionRequest
      +class GetAuthStatusRequest
      +class AuthenticateByPasswordRequest
      +class AuthenticateByTokenRequest
      +class DailyMessageRequest
      ..other message types..
    }

    class Envelope_py {
      <<protobuf message>>
      +DESCRIPTOR
      +fields
    }

    class Descriptor {
      <<protobuf Descriptor>>
      +string name
      +int number
      +Descriptor message_type
      +repeated FieldDescriptor fields
    }

    Connection --> Descriptor : uses in mappings
    Connection o--> envelope_pb2 : during registerEnvelope
    envelope_pb2 --> Envelope_py : class Envelope
    Envelope_py --> Descriptor : DESCRIPTOR

    MessageLib --> Connection : composition
    MessageLib ..> envelope_pb2 : uses message classes

    class anon_pb2 {
      <<deprecated module>>
    }

    class auth_pb2 {
      <<deprecated module>>
    }

    MessageLib ..x anon_pb2 : replaced by envelope_pb2
    MessageLib ..x auth_pb2 : replaced by envelope_pb2

    note for Connection "registerProtocol is deprecated; registerEnvelope inspects Envelope.DESCRIPTOR.fields and builds type maps at runtime"
Loading

File-Level Changes

Change Details Files
Document and slightly clarify the core C++ messaging interfaces and constants used for framing and attributes.
  • Add detailed Doxygen-style comments to MessageType, MessageState, and MessageAttribute enums and the toString helper
  • Add interface-level documentation to IMessage, clarifying ownership, routing, and metadata semantics
  • Document message-related constants used for frame_size, msg_type, and context
common/include/common/IMessage.hpp
Extend ProtoBufMap to explicitly model an envelope-based message type registry and add helpers for wrapping/unwrapping and auth classification.
  • Add high-level class/file documentation describing ProtoBufMap as an envelope-based registry
  • Clarify the descriptor maps and their purpose via comments and typedefs
  • Document getMessageType, wrapInEnvelope, unwrapFromEnvelope, and requiresAuth behaviors and error semantics
common/include/common/ProtoBufMap.hpp
common/source/ProtoBufMap.cpp
Document and slightly refactor frame handling utilities to clarify serialization semantics between ZMQ, Frame, and IMessage.
  • Add file- and type-level comments for Frame, FrameConverter, and FrameFactory
  • Document CopyDirection semantics and expectations for ZMQ frame size
  • Clarify which directions are supported for IMessage-based frame copying and frame construction
common/source/Frame.hpp
Switch the Python client from build-time message index helpers to a runtime Envelope-based registry and single proto module.
  • Introduce Connection.registerEnvelope to derive message type mappings from the Envelope descriptor at runtime and log registrations
  • Deprecate Connection.registerProtocol in favor of registerEnvelope, emitting a DeprecationWarning when used
  • Update MessageLib to register the unified proto envelope module and construct all client/auth messages from it instead of separate anon/auth modules
python/datafed_pkg/datafed/Connection.py
python/datafed_pkg/datafed/MessageLib.py
Align Python protobuf build with the new proto3 layout and Envelope module, removing obsolete helpers and updating copied artifacts.
  • Change CMake to recurse over common/proto3/common/*.proto instead of the old proto path
  • Remove pyproto_add_msg_idx.py usage and target the generated envelope_pb2.py for test-time copies
  • Delete the now-obsolete pyproto_add_msg_idx.py script
python/datafed_pkg/datafed/CMakeLists.txt
python/pyproto_add_msg_idx.py
Tighten proxy-related unit test runtimes to speed up the test suite without altering behavior.
  • Halve the run duration in Proxy tests that drive the proxy event loop
  • Reduce wait durations in ProxyBasicZMQ tests for the basic ZMQ proxy path
common/tests/unit/test_Proxy.cpp
common/tests/unit/test_ProxyBasicZMQ.cpp
Minor cleanup in server database API includes.
  • Remove a duplicate include of common/envelope.pb.h
core/server/DatabaseAPI.cpp

Assessment against linked issues

Issue Objective Addressed Explanation
#1847 Refactor the Python client messaging (Connection and MessageLib) to use the proto3, envelope-based protocol instead of the old anon/auth protocol modules and build-time message index helpers.
#1847 Update the Python package build configuration to generate protobuf code from the proto3 sources (including the Envelope message) and eliminate reliance on the pyproto_add_msg_idx.py helper and older proto layout.

Possibly linked issues

  • #: The PR implements the Python client refactor to use proto3 Envelope, directly addressing the proto3 support issue.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • In registerEnvelope, consider adding basic validation/error handling (e.g., check that envelope_class_name exists on the module and that it has a DESCRIPTOR) and avoid silently doing nothing or raising an obscure AttributeError when the wrong module/class is passed.
  • When deriving message types from the envelope descriptor, you may want to restrict registration to only the fields that are intended as payloads (for example, by checking field.containing_oneof or a naming convention) to avoid accidentally registering internal or future metadata fields as message types.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `registerEnvelope`, consider adding basic validation/error handling (e.g., check that `envelope_class_name` exists on the module and that it has a `DESCRIPTOR`) and avoid silently doing nothing or raising an obscure `AttributeError` when the wrong module/class is passed.
- When deriving message types from the envelope descriptor, you may want to restrict registration to only the fields that are intended as payloads (for example, by checking `field.containing_oneof` or a naming convention) to avoid accidentally registering internal or future metadata fields as message types.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@JoshuaSBrown JoshuaSBrown added Type: Refactor Imlplementation change, same functionality Component: Repository Relates to repository service Priority: Medium Above average priority labels Feb 5, 2026
@JoshuaSBrown JoshuaSBrown linked an issue Feb 5, 2026 that may be closed by this pull request
@JoshuaSBrown
Copy link
Collaborator Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The MessageAttribute::toString documentation now calls out all four enum values but the implementation still only handles ID and KEY; either extend the function to cover STATE and CORRELATION_ID or adjust the comment to avoid implying they are supported.
  • ProtoBufMap::requiresAuth still uses a hard-coded set of anonymous message names; consider deriving this from the same envelope/type registry used elsewhere so that new or renamed message types don't silently get the wrong auth behavior.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The MessageAttribute::toString documentation now calls out all four enum values but the implementation still only handles ID and KEY; either extend the function to cover STATE and CORRELATION_ID or adjust the comment to avoid implying they are supported.
- ProtoBufMap::requiresAuth still uses a hard-coded set of anonymous message names; consider deriving this from the same envelope/type registry used elsewhere so that new or renamed message types don't silently get the wrong auth behavior.

## Individual Comments

### Comment 1
<location> `python/datafed_pkg/datafed/Connection.py:164-171` </location>
<code_context>
+    #
     # @param msg_module - Protobuf module (imported *_pb2 module)
     #
     def registerProtocol(self, msg_module):
-        # Message descriptors are stored by name created by protobuf compiler
-        # A custom post-proc tool generates and appends _msg_name_to_type with
-        # defined DataFed-sepcific numer message types
-
+        import warnings
+        warnings.warn(
+            "registerProtocol() is deprecated, use registerEnvelope() instead",
+            DeprecationWarning,
+            stacklevel=2,
+        )
         for name, desc in sorted(msg_module.DESCRIPTOR.message_types_by_name.items()):
             msg_t = msg_module._msg_name_to_type[name]
             self._msg_desc_by_type[msg_t] = desc
</code_context>

<issue_to_address>
**issue (bug_risk):** Deprecated registerProtocol still relies on _msg_name_to_type even though the generator was removed.

Because `pyproto_add_msg_idx.py` was removed, this now assumes `msg_module._msg_name_to_type` exists when it may not, causing `AttributeError` at runtime for any remaining `registerProtocol` callers. Please either add a compatibility path that derives the mapping without `_msg_name_to_type`, or make this method fail fast with a clear error (or remove it) so consumers don’t hit a confusing runtime failure.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines 164 to 171
def registerProtocol(self, msg_module):
# Message descriptors are stored by name created by protobuf compiler
# A custom post-proc tool generates and appends _msg_name_to_type with
# defined DataFed-sepcific numer message types

import warnings
warnings.warn(
"registerProtocol() is deprecated, use registerEnvelope() instead",
DeprecationWarning,
stacklevel=2,
)
for name, desc in sorted(msg_module.DESCRIPTOR.message_types_by_name.items()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Deprecated registerProtocol still relies on _msg_name_to_type even though the generator was removed.

Because pyproto_add_msg_idx.py was removed, this now assumes msg_module._msg_name_to_type exists when it may not, causing AttributeError at runtime for any remaining registerProtocol callers. Please either add a compatibility path that derives the mapping without _msg_name_to_type, or make this method fail fast with a clear error (or remove it) so consumers don’t hit a confusing runtime failure.

@JoshuaSBrown JoshuaSBrown merged commit c531a4b into 1845-DAPS-refactor-repo-server-to-use-proto3 Feb 7, 2026
5 of 6 checks passed
JoshuaSBrown added a commit that referenced this pull request Feb 7, 2026
* [DAPS-1847] - refactor: python package to be compatible with proto3 envelope (#1849)
* [DAPS-1848] - refactor: update authz files to use proto3. (#1851)
* [DAPS-1850] - refactor web server proto3 (#1852)
JoshuaSBrown added a commit that referenced this pull request Feb 7, 2026
* [DAPS-1845] - refactor repo server to use proto3 (#1846)
* [DAPS-1847] - refactor: python package to be compatible with proto3 envelope (#1849)
* [DAPS-1848] - refactor: update authz files to use proto3. (#1851)
* [DAPS-1850] - refactor web server proto3 (#1852)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Repository Relates to repository service Priority: Medium Above average priority Type: Refactor Imlplementation change, same functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Refactor] - Python client to support proto 3

1 participant