Skip to content

Add get_feature_names#4

Merged
aryehklein-rise merged 8 commits into
aryehlev:publishfrom
marcomq:publish
Apr 19, 2026
Merged

Add get_feature_names#4
aryehklein-rise merged 8 commits into
aryehlev:publishfrom
marcomq:publish

Conversation

@marcomq
Copy link
Copy Markdown
Contributor

@marcomq marcomq commented Dec 1, 2025

  • adding rust functions to get feature names
    • get_feature_names
    • get_float_feature_names
    • get_cat_feature_names
    • get_text_feature_names
    • get_embedding_feature_names

I already created a PR in the original catboost repo. But I also found this code here and not sure yet which crate I should use.

Summary by CodeRabbit

  • New Features

    • Added APIs to fetch feature names (all and by type: float, categorical, text, embedding) from a model.
  • Examples

    • Example flow now includes a "Feature names and indices" section that displays available names or a message if unavailable.
  • Bug Fixes

    • Improved robustness when querying feature name lists; no changes to prediction behavior.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Dec 1, 2025

Warning

Rate limit exceeded

@aryehklein-rise has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 38 minutes and 1 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 38 minutes and 1 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ef2daca4-4efa-4754-814b-dd7ac0ef99d7

📥 Commits

Reviewing files that changed from the base of the PR and between 4d25656 and 87ebd38.

📒 Files selected for processing (3)
  • Cargo.toml
  • examples/advanced_usage.rs
  • src/model.rs

Walkthrough

Added runtime dependency libc and FFI helpers in src/model.rs to convert C-allocated buffers/strings into Rust types, free C memory, and expose five feature-name retrieval APIs (gated by catboost_feature_indices). The new helper/API block was accidentally duplicated. Also added example calls in examples/advanced_usage.rs.

Changes

Cohort / File(s) Summary
Dependency Management
Cargo.toml
Added runtime dependency libc = "0.2.178".
Feature Name Retrieval APIs
src/model.rs
Added public APIs (gated by catboost_feature_indices): get_feature_names(), get_float_feature_names(), get_cat_feature_names(), get_text_feature_names(), get_embedding_feature_names().
Internal FFI & Conversion Helpers
src/model.rs
Added RAII CFreeGuard<T>, unsafe fn from_c_allocated_buffer<T: Copy>, and helpers: get_feature_indices_from_c(), get_feature_names_from_c(), get_specific_feature_names(), get_feature_indices(). Includes unit tests for null/empty handling and copying.
Duplication Notice
src/model.rs
The helper/public-API block and test module appear duplicated within the file, causing repeated declarations to reconcile.
Example Updates
examples/advanced_usage.rs
Inserted a new "Feature names and indices" section calling a private display_feature_names helper (feature-flagged behavior) before model validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • support all catboost versions. #2 — Adds/emits the catboost_feature_indices feature-flag infrastructure that the new APIs are gated by; directly related to build/feature gating.

Poem

🐰 I hopped through pointers by moonlit streams,
I freed C crumbs and harvested names in dreams,
Five lists I gathered, tidy and bright,
Libc lent a hand through the quiet night,
Hooray — the model's labels now in sight! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Add get_feature_names' accurately reflects the main objective: adding feature-name retrieval APIs to the Model struct.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/model.rs (2)

325-337: Guard assumptions when mapping feature indices to names

The pattern of:

  • calling get_feature_names() to get all_names, then
  • calling get_feature_indices(...) and indexing all_names[i]

is clean, but it rests entirely on the C API guaranteeing that:

  • every index returned by indices_fn is < all_names.len(), and
  • the index space used by Get*FeatureIndices matches that of GetModelUsedFeaturesNames.

If that contract is ever violated (version skew, bug, or undocumented change), all_names[i] will panic or read garbage.

Suggested follow‑ups:

  • Add a short debug‑only sanity check before indexing, e.g. debug_assert!(i < all_names.len());.
  • Document in the get_specific_feature_names doc comment that it assumes indices are in bounds for the array returned by GetModelUsedFeaturesNames.

The rest of the FFI wrapper (indices_fn signature, indices_ptr/count handling) looks consistent with the surrounding code.

Also applies to: 356-371


373-403: Type‑specific feature name accessors are clean; consider version gating for text/embedding APIs

The four type‑specific getters:

  • get_float_feature_names
  • get_cat_feature_names
  • get_text_feature_names
  • get_embedding_feature_names

nicely reuse get_specific_feature_names and the corresponding sys::Get*FeatureIndices functions, keeping the surface API small and consistent.

Two non‑blocking concerns to keep in mind:

  1. Version/feature availability

    • GetTextFeatureIndices and GetEmbeddingFeatureIndices may not exist in older CatBoost binaries, while this Rust code calls them unconditionally (unlike the get_*_features_count methods, which are behind #[cfg(...)] flags).
    • If you intend to support multiple CatBoost versions, you may want to mirror the #[cfg(catboost_text_count)] / #[cfg(catboost_embeddings)] gating here, or clearly document that this crate now targets a minimum CatBoost version where these functions are guaranteed to exist.
  2. Failure mode if indices retrieval fails

    • Right now, any failure in the underlying Get*FeatureIndices or GetModelUsedFeaturesNames will bubble up via CatBoostError, which is appropriate, but it might be worth mentioning in the Rustdoc that these functions can error if the underlying C API doesn’t support them or returns an error.

Overall, the API design is good; I’d just align the version gating/documentation with how you already handle text/embedding feature counts.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5693195 and 5d9e87a.

📒 Files selected for processing (2)
  • Cargo.toml (1 hunks)
  • src/model.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/model.rs (1)
src/error.rs (1)
  • check_return_value (15-21)
🔇 Additional comments (2)
Cargo.toml (1)

13-15: libc dependency addition looks appropriate; just confirm version and usage scope

Adding libc as a runtime dependency to get access to free is a standard pattern for FFI with a C library and fits the new helpers in model.rs. Please just double‑check that:

  • this exact 0.2.177 version aligns with your intended MSRV and other deps, and
  • you don’t need to gate it behind any feature flags (e.g., if you later introduce a no-ffi or similar mode).
src/model.rs (1)

339-354: get_feature_names FFI pattern is sound; ensure CatBoost version compatibility

The implementation of get_feature_names:

  • correctly initializes names_ptr/count,
  • uses GetModelUsedFeaturesNames once, checks the boolean return via CatBoostError::check_return_value, and
  • delegates pointer conversion and freeing to get_feature_names_from_c.

This is a solid, minimal wrapper.

One thing to watch: GetModelUsedFeaturesNames is only available in newer versions of the CatBoost evaluation library. If your build script still supports older CatBoost binaries, this symbol (and possibly the corresponding header declaration) might be missing, causing compilation or link failures.

It would be good to:

  • Explicitly state in the crate docs or README that these APIs require a sufficiently new CatBoost evaluation library, or
  • Gate them behind an appropriate Cargo feature / #[cfg(...)] tied to the CatBoost version you download in build.rs.

Comment thread src/model.rs
Comment on lines +278 to +297
fn get_feature_indices_from_c(
indices_ptr: *mut usize,
count: usize,
err_msg: &str,
) -> CatBoostResult<Vec<usize>> {
if indices_ptr.is_null() {
if count == 0 {
return Ok(Vec::new());
}
return Err(CatBoostError {
description: err_msg.to_owned(),
});
}
let mut indices = Vec::with_capacity(count);
for i in 0..count {
indices.push(unsafe { *indices_ptr.add(i) });
}
unsafe { libc::free(indices_ptr as *mut _) };
Ok(indices)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add documentation confirming the C API memory ownership contract

The get_feature_indices_from_c and get_feature_names_from_c functions both use libc::free() to deallocate pointers returned from the CatBoost C API, which requires verifying and documenting the memory ownership contract:

  • Confirm in CatBoost C API documentation or comments that these functions allocate with malloc-compatible allocators and expect callers to deallocate with free
  • Add a SAFETY comment in each helper function explaining this assumption
  • Consider centralizing the pattern in a single unsafe wrapper (e.g., from_c_allocated_buffer<T>) to reduce duplication and make the contract explicit

Without documented verification of the allocator compatibility, this represents a potential source of undefined behavior (heap corruption/use-after-free), particularly if CatBoost or its dependencies use custom allocators or library-specific deallocators.

🤖 Prompt for AI Agents
In src/model.rs around lines 278 to 297, both get_feature_indices_from_c and
get_feature_names_from_c call libc::free on pointers returned by the CatBoost C
API without documenting the allocator contract; update the code by (1) adding a
SAFETY comment above each helper stating that the CatBoost C API must return
malloc-compatible buffers that the caller must free with libc::free (cite the
exact CatBoost C API doc or source line that confirms this), (2) centralize the
pattern by implementing a single unsafe helper (e.g.,
from_c_allocated_buffer<T>) that reads count elements, frees the pointer with
libc::free, and returns a Vec<T>, and (3) replace the existing functions to call
that helper so the ownership contract and unsafe boundary are explicit and
documented.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/model.rs (4)

278-303: Excellent safety documentation—past concerns addressed.

The comprehensive safety documentation and centralized memory management pattern directly addresses the previous review comment about documenting the C API memory ownership contract. The implementation correctly copies data before freeing and handles null pointers appropriately.

Optional: Consider using std::slice::from_raw_parts for a more idiomatic conversion:

unsafe fn from_c_allocated_buffer<T: Copy>(ptr: *mut T, count: usize) -> Vec<T> {
    if ptr.is_null() {
        return Vec::new();
    }
    let result = unsafe { std::slice::from_raw_parts(ptr, count).to_vec() };
    unsafe { libc::free(ptr as *mut _) };
    result
}

356-380: Consider adding feature gate to prevent dead code.

This helper function is only called by feature-gated public APIs and itself calls get_feature_names() which is also feature-gated. Adding the same gate here would prevent dead code warnings when the feature is disabled.

Apply this diff:

     /// Get names of specific type of features used in model,
     /// returns error if index out of bounds
+    #[cfg(catboost_feature_indices)]
     fn get_specific_feature_names(

382-415: Public API looks good; private helper could also use feature gate.

get_feature_names is properly implemented with appropriate error handling and memory management. The get_feature_indices helper has the same dead code consideration as get_specific_feature_names—it's only used by feature-gated code but isn't itself gated.

Consider adding #[cfg(catboost_feature_indices)] to get_feature_indices (line 400) as well to maintain consistency with get_specific_feature_names.


417-451: LGTM! Consistent public API implementation.

All four feature name getter functions follow a consistent pattern with proper feature gating and error handling. The implementation correctly delegates to get_specific_feature_names with the appropriate C function for each feature type.

Optional: The doc comments could be enhanced with information about return values and potential failure cases. For example:

/// Get names of float features used in model
///
/// # Returns
/// A vector of feature names, or an error if:
/// - The C API call fails
/// - Feature indices are out of bounds
///
/// # Example
/// ```ignore
/// let float_names = model.get_float_feature_names()?;
/// ```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68ce50e and 2831014.

📒 Files selected for processing (2)
  • Cargo.toml (1 hunks)
  • src/model.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/model.rs (1)
src/error.rs (1)
  • check_return_value (15-21)
🔇 Additional comments (2)
src/model.rs (1)

305-354: LGTM! Proper C memory management for both simple and nested structures.

Both functions correctly handle the C memory lifecycle:

  • get_feature_indices_from_c uses the centralized helper for flat arrays
  • get_feature_names_from_c properly handles the 2D structure by freeing each inner string pointer before freeing the outer array
  • Null pointer and empty cases are handled correctly
Cargo.toml (1)

13-14: The libc version 0.2.178 is valid and free from known vulnerabilities.

Version 0.2.178 is the latest stable release on crates.io with no security advisories in the RustSec database. The dependency is appropriate for FFI memory management operations.

@aryehklein-rise
Copy link
Copy Markdown
Collaborator

@marcomq

i apologize i missed this PR, will review and merge later today.

Comment thread src/model.rs Outdated
return Vec::new();
}
let mut result = Vec::with_capacity(count);
for i in 0..count {
Copy link
Copy Markdown
Owner

@aryehlev aryehlev Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think u can use std::slice::from_raw_parts.
unsafe { std::slice::from_raw_parts(ptr, count) }.to_vec()

Comment thread src/model.rs Outdated
let mut names = Vec::with_capacity(count);
for i in 0..count {
let ptr = unsafe { *names_ptr.add(i) };
let s = unsafe { CStr::from_ptr(ptr) }
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this panics then im not sure the underlying string will be freed, i think maybe better to implement a guard around all free's wdyt?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can use https://github.com/bluss/scopeguard or just implement a CFreeGuard struct.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/model.rs`:
- Around line 360-369: The helper functions get_specific_feature_names and
get_feature_indices must be gated with #[cfg(catboost_feature_indices)] so the
compiler doesn't type-check their bodies when the feature is disabled; add
#[cfg(catboost_feature_indices)] above the fn get_specific_feature_names(...)
and above fn get_feature_indices(...) (both are private) so their calls to
get_feature_names() and other cfg-gated items are only compiled when
catboost_feature_indices is enabled.
- Around line 351-355: The conversion from C pointers to Rust Strings must check
each inner char* for null and validate UTF‑8 before converting: in the block
that builds guards (the Vec<CFreeGuard<c_char>> and subsequent mapping), iterate
with index over the original str_ptrs/guards and for each g.0 return a
CatBoostError if the pointer is null instead of calling CStr::from_ptr; then
call CStr::from_ptr and use to_str() (not to_string_lossy()) to validate UTF‑8
and return a CatBoostError on invalid UTF‑8; only after all checks succeed,
collect owned Strings and return Ok(Vec<String>). Ensure the error paths free
resources consistently via the CFreeGuard wrapper.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ed2ef684-b20c-43fc-a9a7-810fce8a4145

📥 Commits

Reviewing files that changed from the base of the PR and between 2831014 and 4d25656.

📒 Files selected for processing (2)
  • examples/advanced_usage.rs
  • src/model.rs

Comment thread src/model.rs Outdated
Comment thread src/model.rs
@aryehklein-rise aryehklein-rise merged commit 6ed6b05 into aryehlev:publish Apr 19, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants