Add get_feature_names by marcomq · Pull Request #4 · aryehlev/catboost-rust

marcomq · 2025-12-01T18:17:38Z

adding rust functions to get feature names
- get_feature_names
- get_float_feature_names
- get_cat_feature_names
- get_text_feature_names
- get_embedding_feature_names

I already created a PR in the original catboost repo. But I also found this code here and not sure yet which crate I should use.

Summary by CodeRabbit

New Features
- Added APIs to fetch feature names (all and by type: float, categorical, text, embedding) from a model.
Examples
- Example flow now includes a "Feature names and indices" section that displays available names or a message if unavailable.
Bug Fixes
- Improved robustness when querying feature name lists; no changes to prediction behavior.

coderabbitai · 2025-12-01T18:17:46Z

Warning

Rate limit exceeded

@aryehklein-rise has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 38 minutes and 1 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 38 minutes and 1 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ef2daca4-4efa-4754-814b-dd7ac0ef99d7

📥 Commits

Reviewing files that changed from the base of the PR and between 4d25656 and 87ebd38.

📒 Files selected for processing (3)

Cargo.toml
examples/advanced_usage.rs
src/model.rs

Walkthrough

Added runtime dependency libc and FFI helpers in src/model.rs to convert C-allocated buffers/strings into Rust types, free C memory, and expose five feature-name retrieval APIs (gated by catboost_feature_indices). The new helper/API block was accidentally duplicated. Also added example calls in examples/advanced_usage.rs.

Changes

Cohort / File(s)	Summary
Dependency Management `Cargo.toml`	Added runtime dependency `libc = "0.2.178"`.
Feature Name Retrieval APIs `src/model.rs`	Added public APIs (gated by `catboost_feature_indices`): `get_feature_names()`, `get_float_feature_names()`, `get_cat_feature_names()`, `get_text_feature_names()`, `get_embedding_feature_names()`.
Internal FFI & Conversion Helpers `src/model.rs`	Added RAII `CFreeGuard<T>`, `unsafe fn from_c_allocated_buffer<T: Copy>`, and helpers: `get_feature_indices_from_c()`, `get_feature_names_from_c()`, `get_specific_feature_names()`, `get_feature_indices()`. Includes unit tests for null/empty handling and copying.
Duplication Notice `src/model.rs`	The helper/public-API block and test module appear duplicated within the file, causing repeated declarations to reconcile.
Example Updates `examples/advanced_usage.rs`	Inserted a new "Feature names and indices" section calling a private `display_feature_names` helper (feature-flagged behavior) before model validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

support all catboost versions. #2 — Adds/emits the catboost_feature_indices feature-flag infrastructure that the new APIs are gated by; directly related to build/feature gating.

Poem

🐰 I hopped through pointers by moonlit streams,
I freed C crumbs and harvested names in dreams,
Five lists I gathered, tidy and bright,
Libc lent a hand through the quiet night,
Hooray — the model's labels now in sight! 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Add get_feature_names' accurately reflects the main objective: adding feature-name retrieval APIs to the Model struct.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

src/model.rs (2)

325-337: Guard assumptions when mapping feature indices to names

The pattern of:

calling get_feature_names() to get all_names, then

calling get_feature_indices(...) and indexing all_names[i]

is clean, but it rests entirely on the C API guaranteeing that:

every index returned by indices_fn is < all_names.len(), and

the index space used by Get*FeatureIndices matches that of GetModelUsedFeaturesNames.

If that contract is ever violated (version skew, bug, or undocumented change), all_names[i] will panic or read garbage.

Suggested follow‑ups:

Add a short debug‑only sanity check before indexing, e.g. debug_assert!(i < all_names.len());.

Document in the get_specific_feature_names doc comment that it assumes indices are in bounds for the array returned by GetModelUsedFeaturesNames.

The rest of the FFI wrapper (indices_fn signature, indices_ptr/count handling) looks consistent with the surrounding code.

Also applies to: 356-371

373-403: Type‑specific feature name accessors are clean; consider version gating for text/embedding APIs

The four type‑specific getters:

get_float_feature_names

get_cat_feature_names

get_text_feature_names

get_embedding_feature_names

nicely reuse get_specific_feature_names and the corresponding sys::Get*FeatureIndices functions, keeping the surface API small and consistent.

Two non‑blocking concerns to keep in mind:

Version/feature availability

GetTextFeatureIndices and GetEmbeddingFeatureIndices may not exist in older CatBoost binaries, while this Rust code calls them unconditionally (unlike the get_*_features_count methods, which are behind #[cfg(...)] flags).

If you intend to support multiple CatBoost versions, you may want to mirror the #[cfg(catboost_text_count)] / #[cfg(catboost_embeddings)] gating here, or clearly document that this crate now targets a minimum CatBoost version where these functions are guaranteed to exist.

Failure mode if indices retrieval fails

Right now, any failure in the underlying Get*FeatureIndices or GetModelUsedFeaturesNames will bubble up via CatBoostError, which is appropriate, but it might be worth mentioning in the Rustdoc that these functions can error if the underlying C API doesn’t support them or returns an error.

Overall, the API design is good; I’d just align the version gating/documentation with how you already handle text/embedding feature counts.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5693195 and 5d9e87a.

📒 Files selected for processing (2)

Cargo.toml (1 hunks)
src/model.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/model.rs (1)

src/error.rs (1)

check_return_value (15-21)

🔇 Additional comments (2)

Cargo.toml (1)

13-15: libc dependency addition looks appropriate; just confirm version and usage scope

Adding libc as a runtime dependency to get access to free is a standard pattern for FFI with a C library and fits the new helpers in model.rs. Please just double‑check that:

this exact 0.2.177 version aligns with your intended MSRV and other deps, and

you don’t need to gate it behind any feature flags (e.g., if you later introduce a no-ffi or similar mode).

src/model.rs (1)

339-354: get_feature_names FFI pattern is sound; ensure CatBoost version compatibility

The implementation of get_feature_names:

correctly initializes names_ptr/count,

uses GetModelUsedFeaturesNames once, checks the boolean return via CatBoostError::check_return_value, and

delegates pointer conversion and freeing to get_feature_names_from_c.

This is a solid, minimal wrapper.

One thing to watch: GetModelUsedFeaturesNames is only available in newer versions of the CatBoost evaluation library. If your build script still supports older CatBoost binaries, this symbol (and possibly the corresponding header declaration) might be missing, causing compilation or link failures.

It would be good to:

Explicitly state in the crate docs or README that these APIs require a sufficiently new CatBoost evaluation library, or

Gate them behind an appropriate Cargo feature / #[cfg(...)] tied to the CatBoost version you download in build.rs.

coderabbitai · 2025-12-01T18:23:14Z

+    fn get_feature_indices_from_c(
+        indices_ptr: *mut usize,
+        count: usize,
+        err_msg: &str,
+    ) -> CatBoostResult<Vec<usize>> {
+        if indices_ptr.is_null() {
+            if count == 0 {
+                return Ok(Vec::new());
+            }
+            return Err(CatBoostError {
+                description: err_msg.to_owned(),
+            });
+        }
+        let mut indices = Vec::with_capacity(count);
+        for i in 0..count {
+            indices.push(unsafe { *indices_ptr.add(i) });
+        }
+        unsafe { libc::free(indices_ptr as *mut _) };
+        Ok(indices)
+    }


⚠️ Potential issue | 🟠 Major

Add documentation confirming the C API memory ownership contract

The get_feature_indices_from_c and get_feature_names_from_c functions both use libc::free() to deallocate pointers returned from the CatBoost C API, which requires verifying and documenting the memory ownership contract:

Confirm in CatBoost C API documentation or comments that these functions allocate with malloc-compatible allocators and expect callers to deallocate with free

Add a SAFETY comment in each helper function explaining this assumption

Consider centralizing the pattern in a single unsafe wrapper (e.g., from_c_allocated_buffer<T>) to reduce duplication and make the contract explicit

Without documented verification of the allocator compatibility, this represents a potential source of undefined behavior (heap corruption/use-after-free), particularly if CatBoost or its dependencies use custom allocators or library-specific deallocators.

🤖 Prompt for AI Agents

In src/model.rs around lines 278 to 297, both get_feature_indices_from_c and get_feature_names_from_c call libc::free on pointers returned by the CatBoost C API without documenting the allocator contract; update the code by (1) adding a SAFETY comment above each helper stating that the CatBoost C API must return malloc-compatible buffers that the caller must free with libc::free (cite the exact CatBoost C API doc or source line that confirms this), (2) centralize the pattern by implementing a single unsafe helper (e.g., from_c_allocated_buffer<T>) that reads count elements, frees the pointer with libc::free, and returns a Vec<T>, and (3) replace the existing functions to call that helper so the ownership contract and unsafe boundary are explicit and documented.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

src/model.rs (4)
278-303: Excellent safety documentation—past concerns addressed.

The comprehensive safety documentation and centralized memory management pattern directly addresses the previous review comment about documenting the C API memory ownership contract. The implementation correctly copies data before freeing and handles null pointers appropriately.

Optional: Consider using std::slice::from_raw_parts for a more idiomatic conversion:
unsafe fn from_c_allocated_buffer<T: Copy>(ptr: *mut T, count: usize) -> Vec<T> {
    if ptr.is_null() {
        return Vec::new();
    }
    let result = unsafe { std::slice::from_raw_parts(ptr, count).to_vec() };
    unsafe { libc::free(ptr as *mut _) };
    result
}
356-380: Consider adding feature gate to prevent dead code.

This helper function is only called by feature-gated public APIs and itself calls get_feature_names() which is also feature-gated. Adding the same gate here would prevent dead code warnings when the feature is disabled.

Apply this diff:
     /// Get names of specific type of features used in model,
     /// returns error if index out of bounds
+    #[cfg(catboost_feature_indices)]
     fn get_specific_feature_names(
382-415: Public API looks good; private helper could also use feature gate.

get_feature_names is properly implemented with appropriate error handling and memory management. The get_feature_indices helper has the same dead code consideration as get_specific_feature_names—it's only used by feature-gated code but isn't itself gated.

Consider adding #[cfg(catboost_feature_indices)] to get_feature_indices (line 400) as well to maintain consistency with get_specific_feature_names.

417-451: LGTM! Consistent public API implementation.

All four feature name getter functions follow a consistent pattern with proper feature gating and error handling. The implementation correctly delegates to get_specific_feature_names with the appropriate C function for each feature type.

Optional: The doc comments could be enhanced with information about return values and potential failure cases. For example:
/// Get names of float features used in model
///
/// # Returns
/// A vector of feature names, or an error if:
/// - The C API call fails
/// - Feature indices are out of bounds
///
/// # Example
/// ```ignore
/// let float_names = model.get_float_feature_names()?;
/// ```

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68ce50e and 2831014.

📒 Files selected for processing (2)

Cargo.toml (1 hunks)
src/model.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/model.rs (1)

src/error.rs (1)

check_return_value (15-21)

🔇 Additional comments (2)

src/model.rs (1)

305-354: LGTM! Proper C memory management for both simple and nested structures.

Both functions correctly handle the C memory lifecycle:

get_feature_indices_from_c uses the centralized helper for flat arrays

get_feature_names_from_c properly handles the 2D structure by freeing each inner string pointer before freeing the outer array

Null pointer and empty cases are handled correctly

Cargo.toml (1)

13-14: The libc version 0.2.178 is valid and free from known vulnerabilities.

Version 0.2.178 is the latest stable release on crates.io with no security advisories in the RustSec database. The dependency is appropriate for FFI memory management operations.

aryehklein-rise · 2026-04-16T12:21:28Z

@marcomq

i apologize i missed this PR, will review and merge later today.

aryehlev · 2026-04-16T17:13:19Z

+            return Vec::new();
+        }
+        let mut result = Vec::with_capacity(count);
+        for i in 0..count {


i think u can use std::slice::from_raw_parts.
unsafe { std::slice::from_raw_parts(ptr, count) }.to_vec()

aryehlev · 2026-04-16T17:25:47Z

+        let mut names = Vec::with_capacity(count);
+        for i in 0..count {
+            let ptr = unsafe { *names_ptr.add(i) };
+            let s = unsafe { CStr::from_ptr(ptr) }


if this panics then im not sure the underlying string will be freed, i think maybe better to implement a guard around all free's wdyt?

maybe we can use https://github.com/bluss/scopeguard or just implement a CFreeGuard struct.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/model.rs`:
- Around line 360-369: The helper functions get_specific_feature_names and
get_feature_indices must be gated with #[cfg(catboost_feature_indices)] so the
compiler doesn't type-check their bodies when the feature is disabled; add
#[cfg(catboost_feature_indices)] above the fn get_specific_feature_names(...)
and above fn get_feature_indices(...) (both are private) so their calls to
get_feature_names() and other cfg-gated items are only compiled when
catboost_feature_indices is enabled.
- Around line 351-355: The conversion from C pointers to Rust Strings must check
each inner char* for null and validate UTF‑8 before converting: in the block
that builds guards (the Vec<CFreeGuard<c_char>> and subsequent mapping), iterate
with index over the original str_ptrs/guards and for each g.0 return a
CatBoostError if the pointer is null instead of calling CStr::from_ptr; then
call CStr::from_ptr and use to_str() (not to_string_lossy()) to validate UTF‑8
and return a CatBoostError on invalid UTF‑8; only after all checks succeed,
collect owned Strings and return Ok(Vec<String>). Ensure the error paths free
resources consistently via the CFreeGuard wrapper.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ed2ef684-b20c-43fc-a9a7-810fce8a4145

📥 Commits

Reviewing files that changed from the base of the PR and between 2831014 and 4d25656.

📒 Files selected for processing (2)

examples/advanced_usage.rs
src/model.rs

add get feature names

26b0a74

fmt

5d9e87a

coderabbitai Bot reviewed Dec 1, 2025

View reviewed changes

marcomq added 2 commits December 2, 2025 18:17

error handling and specific cfg

68ce50e

add safety comments

2831014

coderabbitai Bot reviewed Dec 15, 2025

View reviewed changes

aryehlev reviewed Apr 16, 2026

View reviewed changes

add guards in case of panic.

4d25656

coderabbitai Bot reviewed Apr 19, 2026

View reviewed changes

Comment thread src/model.rs Outdated

Comment thread src/model.rs

aryehklein-rise added 3 commits April 19, 2026 11:08

Merge remote-tracking branch 'origin/publish' into marcomq/publish

28ff9c7

put back null checks.

93d2206

fix linters.

87ebd38

aryehklein-rise merged commit 6ed6b05 into aryehlev:publish Apr 19, 2026
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add get_feature_names#4

Add get_feature_names#4
aryehklein-rise merged 8 commits into
aryehlev:publishfrom
marcomq:publish

marcomq commented Dec 1, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Dec 1, 2025 •

edited

Loading

Rate limit exceeded

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Dec 1, 2025

Uh oh!

coderabbitai Bot left a comment

Uh oh!

aryehklein-rise commented Apr 16, 2026

Uh oh!

aryehlev Apr 16, 2026 •

edited

Loading

Uh oh!

aryehlev Apr 16, 2026

Uh oh!

aryehlev Apr 16, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

marcomq commented Dec 1, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

aryehklein-rise commented Apr 16, 2026

Uh oh!

aryehlev Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aryehlev Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

aryehlev Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

marcomq commented Dec 1, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Dec 1, 2025 •

edited

Loading

aryehlev Apr 16, 2026 •

edited

Loading