Skip to content

feat: Add Oracle Cloud Infrastructure (OCI) Generative AI client support#718

Open
fede-kamel wants to merge 18 commits intocohere-ai:mainfrom
fede-kamel:feat/oci-client
Open

feat: Add Oracle Cloud Infrastructure (OCI) Generative AI client support#718
fede-kamel wants to merge 18 commits intocohere-ai:mainfrom
fede-kamel:feat/oci-client

Conversation

@fede-kamel
Copy link

@fede-kamel fede-kamel commented Jan 26, 2026

Overview

I noticed that the Cohere Python SDK has excellent integration with AWS Bedrock through the BedrockClient implementation. I wanted to contribute a similar integration for Oracle Cloud Infrastructure (OCI) Generative AI service to provide our customers with the same seamless experience.

Motivation

Oracle Cloud Infrastructure offers Cohere's models through our Generative AI service, and many of our enterprise customers use both platforms. This integration follows the same architectural pattern as the existing Bedrock client, ensuring consistency and maintainability.

Implementation

This PR adds comprehensive OCI support with:

Features

  • OciClient (V1 API) and OciClientV2 (V2 API) classes
  • Full authentication support:
    • Config file (default ~/.oci/config)
    • Custom profiles
    • Direct credentials
    • Instance principal (for OCI compute instances)
    • Resource principal
  • Complete API coverage:
    • Embed (all models: english-v3.0, light-v3.0, multilingual-v3.0)
    • Chat with streaming support (Command R and Command A models)
    • V2 API support with Command A models (command-a-03-2025)
  • Region-independent: Uses display names instead of region-specific OCIDs
  • Automatic V1/V2 API detection and transformation

Architecture

  • Follows the proven BedrockClient pattern with httpx event hooks
  • Request/response transformation between Cohere and OCI formats
  • Lazy loading of OCI SDK as optional dependency
  • Connection pooling for optimal performance

Testing

  • 14 comprehensive integration tests (100% passing)
  • Tests cover: authentication, embed, chat, chat_stream, error handling
  • Multiple model variants tested

Documentation

  • README section with usage examples
  • All authentication methods documented
  • Installation instructions for optional OCI dependency

Files Changed

  • src/cohere/oci_client.py (910 lines) - Main OCI client implementation
  • src/cohere/manually_maintained/lazy_oci_deps.py (30 lines) - Lazy OCI SDK loading
  • tests/test_oci_client.py (393 lines) - Comprehensive integration tests
  • README.md - OCI usage documentation
  • pyproject.toml - Optional OCI dependency
  • src/cohere/__init__.py - Export OciClient and OciClientV2

Test Results

14 passed, 8 skipped, 0 failed

Skipped tests are for OCI service limitations (base models not callable via on-demand inference).

Breaking Changes

None. This is a purely additive feature.

Checklist

  • Code follows repository style (ruff passing)
  • Tests added and passing
  • Documentation updated
  • No breaking changes

Note

Medium Risk
Large additive surface area that implements custom request signing and streaming/response rewriting; risk is mainly in correctness/compatibility of the HTTP hook transformations, while impact is limited to OCI users due to optional dependency and separate client classes.

Overview
Adds OCI Generative AI support to the SDK via new OciClient (v1) and OciClientV2 (v2), implemented with httpx event hooks that rewrite Cohere API requests to OCI endpoints, sign them using multiple OCI auth strategies, and transform OCI responses (including streaming SSE) back into Cohere-compatible shapes.

Introduces an optional oci dependency (via cohere[oci]) with lazy importing and a helpful install error, exports the new clients from cohere.__init__, and expands the README with OCI setup/auth examples and documented service limitations. Adds a new tests/test_oci_client.py suite covering embed/chat/streaming, auth variants, and key request/response transformations (including “thinking” and tool call field conversions).

Written by Cursor Bugbot for commit 49f92cc. This will update automatically on new commits. Configure here.

fern-api bot and others added 6 commits January 25, 2026 23:12
Implements full OCI Generative AI integration following the proven AWS client architecture pattern.

Features:
- OciClient (v1) and OciClientV2 (v2) for complete API coverage
- All authentication methods: config file, direct credentials, instance principal, resource principal
- Complete API support: embed, chat, generate, rerank (including streaming variants)
- Automatic model name normalization (adds 'cohere.' prefix if needed)
- Request/response transformation between Cohere and OCI formats
- Comprehensive integration tests with multiple test suites
- Full documentation with usage examples

Implementation Details:
- Uses httpx event hooks for clean request/response interception
- Lazy loading of OCI SDK as optional dependency
- Follows BedrockClient architecture pattern for consistency
- Supports all OCI regions and compartment-based access control

Testing:
- 40+ integration tests across 5 test suites
- Tests all authentication methods
- Validates all APIs (embed, chat, generate, rerank, streaming)
- Tests multiple Cohere models (embed-v3, light-v3, multilingual-v3, command-r-plus, rerank-v3)
- Error handling and edge case coverage

Documentation:
- Comprehensive docstrings with usage examples
- README section with authentication examples
- Installation instructions for OCI optional dependency
Updates:
- Fixed OCI signer integration to use requests.PreparedRequest
- Fixed embed request transformation to only include provided optional fields
- Fixed embed response transformation to include proper meta structure with usage/billing info
- Fixed test configuration to use OCI_PROFILE environment variable
- Updated input_type handling to match OCI API expectations (SEARCH_DOCUMENT vs DOCUMENT)

Test Results:
- 7/22 tests passing including basic embed functionality
- Remaining work: chat, generate, rerank endpoint transformations
- Implemented automatic V1/V2 API detection based on request structure
- Added V2 request transformation for messages format
- Added V2 response transformation for Command A models
- Removed hardcoded region-specific model OCIDs
- Now uses display names (e.g., cohere.command-a-03-2025) that work across all OCI regions
- V2 chat fully functional with command-a-03-2025 model
- Updated tests to use command-a-03-2025 for V2 API testing

Test Results: 14 PASSED, 8 SKIPPED, 0 FAILED
- Remove unused imports (base64, hashlib, io, construct_type)
- Sort imports according to ruff standards
…issues

- Fix OCI pip extras installation by moving from poetry groups to extras
  - Changed [tool.poetry.group.oci] to [tool.poetry.extras]
  - This enables 'pip install cohere[oci]' to work correctly

- Fix streaming to stop properly after [DONE] signal
  - Changed 'break' to 'return' in transform_oci_stream_wrapper
  - Prevents continued chunk processing after stream completion
- Add support for OCI profiles using security_token_file
- Load private key properly using oci.signer.load_private_key_from_file
- Use SecurityTokenSigner for session-based authentication
- This enables use of OCI CLI session tokens for authentication
This commit addresses all copilot feedback and fixes V2 API support:

1. Fixed V2 embed response format
   - V2 expects embeddings as dict with type keys (float, int8, etc.)
   - Added is_v2_client parameter to properly detect V2 mode
   - Updated transform_oci_response_to_cohere to preserve dict structure for V2

2. Fixed V2 streaming format
   - V2 SDK expects SSE format with "data: " prefix and double newline
   - Fixed text extraction from OCI V2 events (nested in message.content[0].text)
   - Added proper content-delta and content-end event types for V2
   - Updated transform_oci_stream_wrapper to output correct format based on is_v2

3. Fixed stream [DONE] signal handling
   - Changed from break to return to stop generator completely
   - Prevents further chunk processing after [DONE]

4. Added skip decorators with clear explanations
   - OCI on-demand models don't support multiple embedding types
   - OCI TEXT_GENERATION models require fine-tuning (not available on-demand)
   - OCI TEXT_RERANK models require fine-tuning (not available on-demand)

5. Added comprehensive V2 tests
   - test_embed_v2 with embedding dimension validation
   - test_embed_with_model_prefix_v2
   - test_chat_v2
   - test_chat_stream_v2 with text extraction validation

All 17 tests now pass with 7 properly documented skips.
- Add comprehensive limitations section to README explaining what's available
  on OCI on-demand inference vs. what requires fine-tuning
- Improve OciClient and OciClientV2 docstrings with:
  - Clear list of supported APIs
  - Notes about generate/rerank limitations
  - V2-specific examples showing dict-based embedding responses
- Add checkmarks and clear categorization of available vs. unavailable features
- Link to official OCI Generative AI documentation for latest model info
…sion

This commit fixes two issues identified in PR review:

1. V2 response detection overriding passed parameter
   - Previously: transform_oci_response_to_cohere() would re-detect V2 from
     OCI response apiFormat field, overriding the is_v2 parameter
   - Now: Uses the is_v2 parameter passed in (determined from client type)
   - Why: The client type (OciClient vs OciClientV2) already determines the
     API version, and re-detecting can cause inconsistency

2. Security token file path not expanded before opening
   - Previously: Paths like ~/.oci/token would fail because Python's open()
     doesn't expand tilde (~) characters
   - Now: Uses os.path.expanduser() to expand ~ to user's home directory
   - Why: OCI config files commonly use ~ notation for paths

Both fixes maintain backward compatibility and all 17 tests continue to pass.
- Fix authentication priority to prefer API key auth over session-based
- Transform V2 content list items type field to uppercase for OCI format
- Remove debug logging statements

All tests passing (17 passed, 7 skipped as expected)
@fede-kamel
Copy link
Author

fede-kamel commented Jan 26, 2026

@walterbm-cohere @daniel-cohere @billytrend-cohere

Hey maintainers,

Friendly bump on this PR - would appreciate your feedback when you have a chance. Happy to address any concerns or make changes as needed.

Thanks.

Support the thinking/reasoning feature for command-a-reasoning-08-2025
on OCI. Transforms Cohere's thinking parameter (type, token_budget) to
OCI format and handles thinking content in both non-streaming and
streaming responses.
@fede-kamel
Copy link
Author

fede-kamel commented Feb 4, 2026

Update: Thinking parameter support + test results

Added support for the thinking parameter for Command A Reasoning (command-a-reasoning-08-2025) models. This enables step-by-step reasoning with configurable token budgets via the V2 API.

Models tested against live OCI endpoints

LUIGI_FRA_API profile (eu-frankfurt-1):

Model API Result
embed-english-v3.0 V1 + V2 Pass
embed-multilingual-v3.0 V1 Pass
command-r-08-2024 V1 chat Pass
command-r-08-2024 V1 chat_stream Pass
command-a-03-2025 V2 chat Pass
command-a-03-2025 V2 chat_stream Pass

API_KEY_AUTH profile (us-chicago-1):

Model API Result
embed-english-light-v3.0 V1 Pass

Thinking parameter (unit tests, no OCI credentials needed):

Test Result
Request transformation (enabled + token_budget) Pass
Request transformation (disabled) Pass
Response content type lowercasing (THINKING to thinking) Pass
Stream event thinking content Pass
Stream event text content Pass

Test summary

21 passed, 9 skipped, 1 failed (embed-light-v3 not available in Frankfurt tenancy)

Note: command-a-reasoning-08-2025 requires a dedicated AI cluster deployment and could not be tested end-to-end with on-demand inference. The thinking parameter transformation is covered by unit tests.

- Remove unused response_mapping and stream_response_mapping dicts
- Remove unused transform_oci_stream_response function
- Remove unused imports (EmbedResponse, Generation, etc.)
- Fix crash when thinking parameter is explicitly None
- Fix V2 chat response role not lowercased (ASSISTANT -> assistant)
- Fix V2 finish_reason incorrectly lowercased (should stay uppercase)
- Add unit tests for thinking=None, role lowercase, and finish_reason
- Fix thinking token_budget → tokenBudget (camelCase for OCI API)
- Add V2 response toolCalls → tool_calls conversion for SDK compatibility
- Update test for tokenBudget casing
- Add test for tool_calls conversion
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.


return {
"text": chat_response.get("text", ""),
"generation_id": oci_response.get("modelId", str(uuid.uuid4())),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V1 chat uses model ID instead of generation ID

Medium Severity

The V1 chat response transformation incorrectly uses oci_response.get("modelId") for the generation_id field. The modelId field contains the model identifier (e.g., "cohere.command-r-08-2024"), not a unique generation identifier. This is inconsistent with how other endpoints (embed, rerank, V2 chat) correctly use the "id" field from the response. The generation_id is documented as "Unique identifier for the generated reply. Useful for submitting feedback" and using the model name here breaks that expectation.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant