
New tracer ergonomics (JUD-3957) #697

Open

abhishekg999 wants to merge 30 commits into staging from ahh/scope-new-tracer-ergonomics
Conversation

abhishekg999 (Contributor) commented Feb 13, 2026

This is a WIP PR containing a primary refactor that separates Judgeval (an SDK around the JudgmentLabs REST API) from Judgment's observability product.

It introduces a new top-level Tracer object for tracing, along with better multi-project support and ergonomics.

Example of new syntax:

from judgeval import Tracer
Tracer.init("test")

@Tracer.observe()
def test(x: int) -> int:
    Tracer.async_evaluate("always accept")
    Tracer.async_evaluate(
        "Answer Correctness",
        {
            "question": "What is the capital of France?",
            "answer": "Paris",
        },
    )
    return x + 1

test(1)


@gemini-code-assist (Contributor) commented:

Summary of Changes

Hello @abhishekg999, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant architectural refactor of the judgeval library, decoupling its core SDK functionalities from its observability product. The primary outcome is a new, more flexible Tracer object that streamlines tracing operations, offers improved support for diverse project structures, and integrates an asynchronous background queue for evaluation tasks. These changes aim to enhance the library's modularity, performance, and user experience for observability features.

Highlights

  • New Tracer Object: Introduced a new top-level Tracer object for enhanced observability, separating it from the core Judgeval SDK.
  • Improved Ergonomics and Multi-Project Support: The new Tracer design provides better ergonomics and robust support for tracing across multiple projects.
  • Asynchronous Background Queue: Implemented a BackgroundQueue for asynchronous dispatch of evaluation tasks, improving performance and responsiveness.
  • OpenTelemetry Integration: The new tracing system leverages OpenTelemetry for span management, including custom exporters, processors, and ID generators, and allows registration of third-party OTel instrumentors.
  • Deprecation of Old Tracer: The previous judgeval.v1.tracer.Tracer and BaseTracer classes are now deprecated, with a warning added to guide users to the new judgeval.v1.trace.Tracer.
  • New Utility Functions and Decorators: Added a debug_time decorator for performance measurement and serialize_attribute for controlled attribute serialization.


Changelog
  • src/bar.py
    • Added a new example file demonstrating threaded Fibonacci and FizzBuzz functions using the new Tracer.
  • src/foo.py
    • Added a new example file showcasing asynchronous Fibonacci, FizzBuzz, chat, and long-running tasks with the new Tracer.
  • src/judgeval/__init__.py
    • Updated the __all__ export list to include the new Tracer, ProxyTracerProvider, BackgroundQueue, enqueue, and flush components.
  • src/judgeval/env.py
    • Added optional_int_env_var function for parsing integer environment variables.
    • Removed deprecated environment variables related to default GPT and Together models and max concurrent evaluations.
    • Introduced new environment variables JUDGMENT_BG_WORKERS and JUDGMENT_BG_MAX_QUEUE for background queue configuration.
  • src/judgeval/utils/decorators/debug_time.py
    • Added a new debug_time decorator to measure and log the execution time of functions.
  • src/judgeval/utils/serialize.py
    • Added a serialize_attribute function to conditionally serialize object attributes, preserving primitive types.
  • src/judgeval/v1/__init__.py
    • Refactored imports to expose the new Judgeval, Tracer, ProxyTracerProvider, BackgroundQueue, enqueue, and flush from their respective new modules.
  • src/judgeval/v1/background_queue.py
    • Added a new module defining BackgroundQueue for managing and executing tasks asynchronously in a thread pool.
  • src/judgeval/v1/internal/api/api_client.py
    • Added new post_projects_eval_queue_judge_examples and post_projects_eval_queue_judge_traces methods for asynchronous evaluation queuing.
  • src/judgeval/v1/internal/api/api_types.py
    • Added new TypedDict definitions for JudgeEvalQueueExamplesResponse, JudgeEvalQueueTracesResponse, JudgeExampleEvaluationRun, and JudgeTraceEvaluationRun.
  • src/judgeval/v1/judgeval.py
    • Moved the Judgeval class definition from src/judgeval/v1/__init__.py to its own dedicated file.
  • src/judgeval/v1/trace/__init__.py
    • Added a new __init__.py file to organize and expose the new tracing components.
  • src/judgeval/v1/trace/base_tracer.py
    • Added the abstract BaseTracer class, providing a foundation for tracing with static methods for span creation, attribute setting, context propagation, and asynchronous evaluation dispatch.
  • src/judgeval/v1/trace/exporters/__init__.py
    • Added a new __init__.py file for trace exporters.
  • src/judgeval/v1/trace/exporters/noop_span_exporter.py
    • Added a NoOpSpanExporter for scenarios where span export is disabled or not configured.
  • src/judgeval/v1/trace/exporters/span_exporter.py
    • Added a SpanExporter class that wraps OTLPSpanExporter to send traces to the Judgment API with custom headers.
  • src/judgeval/v1/trace/generators.py
    • Added _ObservedSyncGenerator and _ObservedAsyncGenerator classes to manage spans for observed generator functions.
  • src/judgeval/v1/trace/id_generator.py
    • Added IsolatedRandomIdGenerator for generating unique trace and span IDs using a dedicated random instance.
  • src/judgeval/v1/trace/processors/__init__.py
    • Added a new __init__.py file for trace processors.
  • src/judgeval/v1/trace/processors/_lifecycles/__init__.py
    • Added a new __init__.py file to organize and expose lifecycle span processors and context keys.
  • src/judgeval/v1/trace/processors/_lifecycles/agent_id_processor.py
    • Added AgentIdProcessor to set agent-related attributes on spans based on context.
  • src/judgeval/v1/trace/processors/_lifecycles/context_keys.py
    • Added new context keys for propagating customer ID, session ID, agent ID, and project ID overrides.
  • src/judgeval/v1/trace/processors/_lifecycles/customer_id_processor.py
    • Added CustomerIdProcessor to automatically set customer ID attributes on spans from the context.
  • src/judgeval/v1/trace/processors/_lifecycles/project_id_override_processor.py
    • Added ProjectIdOverrideProcessor to apply project ID overrides to spans from the context.
  • src/judgeval/v1/trace/processors/_lifecycles/registry.py
    • Added a registry system for lifecycle span processors.
  • src/judgeval/v1/trace/processors/_lifecycles/session_id_processor.py
    • Added SessionIdProcessor to automatically set session ID attributes on spans from the context.
  • src/judgeval/v1/trace/processors/noop_span_processor.py
    • Added a NoOpSpanProcessor for when span processing is disabled.
  • src/judgeval/v1/trace/processors/span_processor.py
    • Added a custom SpanProcessor that extends BatchSpanProcessor to handle partial span emissions and internal attributes.
  • src/judgeval/v1/trace/proxy_tracer_provider.py
    • Added ProxyTracerProvider to act as a central point for managing active tracers and OpenTelemetry instrumentation.
  • src/judgeval/v1/trace/tracer.py
    • Added the concrete Tracer class, implementing BaseTracer and handling the full OpenTelemetry setup with exporters and processors.
  • src/judgeval/v1/tracer/base_tracer.py
    • Added a DeprecationWarning to inform users about the move to the new judgeval.v1.trace.Tracer.
  • src/judgeval/v1/utils.py
    • Modified resolve_project_id to use a dictionary-based cache instead of lru_cache and to include client.organization_id in the cache key for better multi-organization support.
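As a rough sketch of what a timing decorator like the debug_time described in the changelog could look like (illustrative only; the actual implementation in src/judgeval/utils/decorators/debug_time.py may differ):

```python
import functools
import logging
import time

logger = logging.getLogger("judgeval")


def debug_time(fn):
    """Log how long the wrapped function takes, at debug level.

    Hypothetical sketch of the decorator described in the changelog;
    not the SDK's actual implementation.
    """

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.debug("%s took %.2f ms", fn.__name__, elapsed_ms)

    return wrapper
```

functools.wraps keeps the wrapped function's name and docstring intact, which matters when the decorated function is later observed by the tracer.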
Activity
  • The pull request is currently marked as 'Work In Progress' (WIP), indicating ongoing development.
  • The author, abhishekg999, is actively refactoring the Judgeval SDK to separate it from the observability product.
  • The changes introduce a new Tracer top-level object, suggesting a significant shift in how tracing and observability are handled within the library.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a major refactoring of the tracing capabilities, creating a new, more ergonomic Tracer object with better support for multi-project and multi-threaded/multi-tasking scenarios. The core of the change is a new ProxyTracerProvider that can dynamically switch between different tracer instances. The changes are extensive, adding many new modules for tracers, exporters, processors, and background task handling.

My review focuses on the new example files that demonstrate these features. I've found a couple of issues in the async example (src/foo.py) that could be misleading for users of the SDK: one related to an incorrect client configuration for a third-party API, and another related to an inefficient usage pattern of the new Tracer.create method. Addressing these would make the examples clearer and more robust.

src/foo.py Outdated
Comment on lines +8 to +11
client = AsyncOpenAI(
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    base_url="https://api.anthropic.com/v1",
)
medium

The configuration for the AsyncOpenAI client appears incorrect for calling the Anthropic API. The openai library sends requests in a format specific to OpenAI's API, which is not compatible with Anthropic's native API at https://api.anthropic.com/v1. This will likely lead to runtime errors.

To call the Anthropic API, you should use the anthropic Python SDK.

Additionally, the model name claude-opus-4-1 used on line 47 does not seem to be a valid Anthropic model name. The latest Opus model is claude-3-opus-20240229.

If you are using a proxy that makes Anthropic's API OpenAI-compatible, it would be helpful to clarify this with a comment and ensure the base_url points to that proxy.

src/foo.py Outdated
    tags: list[str],
    **kwargs,
):
    Tracer.create(project_name=name)
medium

Calling Tracer.create() inside handle_request means a new Tracer instance (along with its own TracerProvider and other resources) is created for every single request. This is inefficient and not recommended for production code. Tracer instances are meant to be long-lived and created once per project.

A better approach would be to create the tracers for different projects once at application startup and then select the appropriate one to activate within the request handler. The commented-out code at the top of the file and the pattern in src/bar.py demonstrate this better pattern.

For example:

# At application startup
tracers = {
    "fibonacci": Tracer.create(project_name="fibonacci", set_active=False),
    "fizzbuzz": Tracer.create(project_name="fizzbuzz", set_active=False),
    "chat": Tracer.create(project_name="chat", set_active=False),
    "long_running_task": Tracer.create(project_name="long_running_task", set_active=False),
}

async def handle_request(name: str, ...):
    tracers[name].set_active()
    # ... rest of the handler

This would be much more efficient and is likely the intended usage of the new multi-project support.

@abhishekg999 abhishekg999 marked this pull request as ready for review March 4, 2026 01:49
@abhishekg999 abhishekg999 changed the title wip: scope new tracer ergonomics New tracer ergonomics (JUD-3957) Mar 4, 2026

linear bot commented Mar 4, 2026

propel-code-bot[bot]

This comment was marked as outdated.

devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

View 11 additional findings in Devin Review.


Comment on lines +108 to +112
@classmethod
def get_instance(cls) -> ProxyTracerProvider:
    if cls._instance is None:
        cls._instance = cls()
    return cls._instance
@devin-ai-integration bot commented Mar 4, 2026

🟡 JudgmentTracerProvider singleton is not thread-safe, unlike BackgroundQueue

JudgmentTracerProvider.get_instance() (src/judgeval/v1/trace/judgment_tracer_provider.py:109-112) uses a simple check-then-act pattern without any locking. In contrast, BackgroundQueue.get_instance() (src/judgeval/v1/background_queue.py:28-36) correctly uses double-checked locking with cls._lock. In a multi-threaded web server, two threads calling get_instance() simultaneously before initialization completes could create two separate JudgmentTracerProvider instances, with the first thread's registered tracers and context being silently discarded when the second thread overwrites _instance.
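The double-checked locking pattern the finding points to (the one BackgroundQueue.get_instance reportedly uses) can be sketched like this; the class and attribute names here are illustrative stand-ins, not the SDK's:

```python
import threading
from typing import Optional


class ProviderSingleton:
    """Hypothetical stand-in for a tracer provider singleton,
    illustrating double-checked locking."""

    _instance: Optional["ProviderSingleton"] = None
    _lock = threading.Lock()

    @classmethod
    def get_instance(cls) -> "ProviderSingleton":
        # First check avoids taking the lock on the common hot path.
        if cls._instance is None:
            with cls._lock:
                # Second check: another thread may have created the
                # instance while we were waiting for the lock.
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance
```

This also answers the author's question below: the lock is only taken on the first few racing calls; once `_instance` is set, the unlocked fast path returns immediately.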


Contributor Author

@devin-ai-integration you want to create a lock on every get_instance call?

@propel-code-bot bot left a comment

Found a shutdown logic issue that prevents flushing queued jobs; changes suggested.

Status: Changes Suggested | Risk: Medium

Issues Identified & Suggestions
  • Ensure shutdown flushes pending jobs before setting shutdown flag: src/judgeval/v1/background_queue.py
Review Details

📁 182 files reviewed | 💬 1 comment

Comment on lines +67 to +72
    return True

def shutdown(self, timeout_ms: int = 30000) -> None:
    if self._shutdown:
        return
    self._shutdown = True


Important

[Logic] shutdown() sets _shutdown = True before calling force_flush(), but force_flush() immediately returns False when _shutdown is set. That means pending jobs are never flushed during shutdown. Move the flag assignment after force_flush() (or allow flushing when shutting down).

def shutdown(self, timeout_ms: int = 30000) -> None:
    if self._shutdown:
        return
    self.force_flush(timeout_ms)
    self._shutdown = True
    self._executor.shutdown(wait=False)
File: src/judgeval/v1/background_queue.py
Line: 72

@devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 14 additional findings in Devin Review.


    return False
if not self._futures:
    return True
_, not_done = wait(self._futures, timeout=timeout_ms / 1000.0)
@devin-ai-integration bot commented Mar 4, 2026

🔴 BackgroundQueue _futures set mutated concurrently without synchronization

The _futures set is a plain set accessed from the calling thread in enqueue() (line 45: self._futures.add(future)) and force_flush() (line 61: wait(self._futures, ...)), and from thread-pool worker threads in _on_done() (line 51: self._futures.discard(future)). concurrent.futures.wait() iterates the set to build an internal copy via set(fs). If _on_done calls discard() on a worker thread during that iteration, it raises RuntimeError: Set changed size during iteration. This race is triggered when force_flush() or shutdown() is called while background jobs are completing.
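The race described above can be avoided by guarding the shared set with a lock and waiting on a snapshot. This is an illustrative sketch, not the SDK's BackgroundQueue; the class and method names are made up for the example:

```python
import threading
from concurrent.futures import Future, wait


class FutureRegistry:
    """Hypothetical registry showing one way to guard a futures set
    that is mutated from worker-thread done-callbacks."""

    def __init__(self) -> None:
        self._futures: set = set()
        self._lock = threading.Lock()

    def add(self, future: Future) -> None:
        with self._lock:
            self._futures.add(future)
        future.add_done_callback(self._on_done)

    def _on_done(self, future: Future) -> None:
        # Runs on a worker thread; must hold the same lock.
        with self._lock:
            self._futures.discard(future)

    def wait_all(self, timeout_s: float) -> bool:
        # Snapshot under the lock so wait() never iterates a set that
        # callbacks are concurrently mutating.
        with self._lock:
            snapshot = set(self._futures)
        if not snapshot:
            return True
        _, not_done = wait(snapshot, timeout=timeout_s)
        return not not_done
```

Waiting on a private snapshot means a future discarded mid-wait is simply observed as done, rather than invalidating the iteration.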


@adivate2021 left a comment

some minor comments/clarifying questions

self._project_id = project_id

- def get(self, name: str) -> Optional[CustomScorer]:
+ def get(self, name: str) -> BaseJudge | None:
Contributor
We're still keeping get? What can you do with the scorer once you get it? Also, why does this seem to return prompt-scorer-specific fields when it's supposed to be general?

- class CustomScorerFactory:
+ class JudgesFactory:
      __slots__ = ("_client", "_project_id")
      _cache: Dict[Tuple[str, str, str, str], APIPromptScorer] = {}
Contributor
this is a general judges factory but it has to do with prompt scorer?

"""Abstract base for all Judgment tracers.

Provides the core tracing surface: span creation, attribute recording,
the ``@observe`` and ``@agent`` decorators, context propagation for
Contributor
no agent anymore

"""

__slots__ = (
"project_name",
Contributor
why do we need project name, can we remove project name from the tracer?

Contributor Author

it is scoped to a project name tho?

Contributor Author

The underlying tracer instance is, yes. The user only worries about the static Tracer, but internally Tracer.init does have a backing object.

api_key: Optional[str],
organization_id: Optional[str],
api_url: Optional[str],
environment: Optional[str],
Contributor
is this to support when we add environments in the future?

Contributor Author
yeah correct

@@ -5,8 +5,8 @@
from opentelemetry.context import Context, get_value
Contributor

let's remove this like we did for TypeScript

Contributor Author
updating to use baggage

@propel-code-bot bot left a comment

Two important issues were noted around thread safety and potential memory growth that should be addressed.

Status: Changes Suggested | Risk: Medium

Issues Identified & Suggestions
  • Guard shared futures set to avoid concurrent mutation errors: src/judgeval/v1/background_queue.py
  • Clear accumulated async payloads to prevent memory leak over time: src/judgeval/v1/trace/base_tracer.py
Review Details

📁 182 files reviewed | 💬 2 comments


self._semaphore.release()
self._futures.discard(future)
exc = future.exception()
if exc is not None:


Important

[Logic]

```
_, not_done = wait(self._futures, timeout=timeout_ms / 1000.0)
```

`_futures` is mutated by `_on_done` running in worker threads while `force_flush()` iterates it. This can raise `RuntimeError: Set changed size during iteration` or cause missed futures. Protect `_futures` with a lock or snapshot before waiting.

```python
self._futures_lock = threading.Lock()

# enqueue/_on_done: wrap add/discard with the lock
with self._futures_lock:
    self._futures.add(future)

# force_flush
with self._futures_lock:
    futures = set(self._futures)
_, not_done = wait(futures, timeout=timeout_ms / 1000.0)
```
File: src/judgeval/v1/background_queue.py
Line: 53

@alanzhang25 (Collaborator) left a comment

  1. Do you have a test coverage report? Might need to add some UTs for the background queue.

# Static API: Async Evaluation #
# ------------------------------------------------------------------ #

_pending_evals: Dict[str, List[Dict[str, Any]]] = {}
Collaborator

Why do we need to store this? I don't think the eval_name actually matters in this case, and if it does, I think we can just store a count instead of the actual payload.

Contributor Author

Removed. We do need the full array since it's serialized in the span attribute, but I will use internal_attributes since that gets cleaned up.

Collaborator

to confirm, this is only used by tags right now right?

Contributor Author
tags and async_evaluate

tracer.set_span_attribute(
    agent_span, f"agent.{key}", value
)
if result_metadata:
Collaborator
not sure why the indent here changed

# Static API: Async Evaluation #
# ------------------------------------------------------------------ #

_pending_evals: Dict[str, List[Dict[str, Any]]] = {}
Collaborator

Is this _pending_evals unbounded memory?

propel-code-bot[bot]

This comment was marked as outdated.

devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 13 additional findings in Devin Review.


Comment on lines 15 to +16
  def __init__(self):
-     self.resource_attributes = {}
+     pass

🔴 NoOpSpanProcessor does not override state_incr/state_append/state_get, will crash with AttributeError

NoOpSpanProcessor.__init__ does pass (line 15-16 of noop_span_processor.py), skipping the parent JudgmentSpanProcessor.__init__ which initializes self._state and self._lock. The NoOp overrides the old method names set_internal_attribute/get_internal_attribute, but the parent class was renamed to use state_set/state_get/state_incr/state_append. When async_evaluate in base_tracer.py:589 calls processor.state_incr() on a NoOpSpanProcessor, it hits the parent's state_incr which accesses self._lock — an attribute that doesn't exist because the parent __init__ was never called. This raises AttributeError. While the call is wrapped with @dont_throw so it's silently swallowed, the evaluation silently fails.

Suggested change
- def __init__(self):
-     self.resource_attributes = {}
-     pass
+ def __init__(self):
+     self._lock = __import__('threading').RLock()
+     self._state: dict = {}
+     self._span_finalizers: dict = {}

@propel-code-bot bot left a comment
Changes suggested: fix semaphore release when executor submit fails.

Status: Changes Suggested | Risk: Medium

Issues Identified & Suggestions
  • Release semaphore if submit raises to avoid capacity leak: src/judgeval/v1/background_queue.py
Review Details

📁 188 files reviewed | 💬 1 comment

if self._shutdown:
    return False
if not self._semaphore.acquire(blocking=False):
    judgeval_logger.warning("[BackgroundQueue] Queue full, dropping job")


Important

[Logic] If ThreadPoolExecutor.submit() raises (e.g., executor already shut down), the semaphore was acquired but never released, permanently reducing queue capacity. Wrap submit in try/except and release the semaphore on failure.

if not self._semaphore.acquire(blocking=False):
    ...
try:
    future = self._executor.submit(fn)
except Exception:
    self._semaphore.release()
    raise
File: src/judgeval/v1/background_queue.py
Line: 42

devin-ai-integration[bot]

This comment was marked as resolved.

@adivate2021 left a comment

LGTM other than the thing we talked about with JudgesFactory.

@propel-code-bot bot left a comment

One important logic issue found around missing score_type values that could break API validation.

Status: Changes Suggested | Risk: Medium

Issues Identified & Suggestions
  • Populate or omit score_type to match server contract: src/judgeval/v1/evaluation/hosted_evaluation.py
Review Details

📁 210 files reviewed | 💬 1 comment

- "judgment_scorers": [s.get_scorer_config() for s in scorers],
+ "judgment_scorers": [
+     {"name": name, "score_type": "", "threshold": 0.5} for name in scorers
+ ],


Important

[Logic] Hosted evaluations now send {"score_type": ""} for each scorer. If the API validates score_type as required (as it was previously), this will be rejected or interpreted incorrectly.

Fix: populate the real score_type for each scorer name (if known), or omit the field and update the server contract accordingly.

File: src/judgeval/v1/evaluation/hosted_evaluation.py
Line: 32


from judgeval.v1.trace.baggage.propagator import JudgmentBaggagePropagator

_JUDGMENT_BAGGAGE_KEY = create_key("judgment.baggage")
Collaborator
maybe keep this otel

propel-code-bot[bot]

This comment was marked as outdated.

@propel-code-bot

✔️ Propel has finished reviewing this change.

@propel-code-bot bot left a comment

Review suggests addressing an important initialization issue in the no-op span processor.

Status: Changes Suggested | Risk: Medium

Issues Identified & Suggestions
  • Initialize NoOpSpanProcessor state/lock or override inherited methods: src/judgeval/v1/trace/processors/noop_span_processor.py
Review Details

📁 212 files reviewed | 💬 1 comment


- class NoOpJudgmentSpanProcessor(JudgmentSpanProcessor):
-     __slots__ = ("resource_attributes",)
+ class NoOpSpanProcessor(JudgmentSpanProcessor):


Important

[Logic] NoOpSpanProcessor.__init__ doesn’t initialize _state/_lock, but it inherits state_set/state_incr/state_append from JudgmentSpanProcessor. Any caller using those will hit AttributeError. Either call super().__init__(...) with a noop exporter or override those methods to no-op.

File: src/judgeval/v1/trace/processors/noop_span_processor.py
Line: 12
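The "override those methods to no-op" option from this comment can be sketched as follows. The class names here are illustrative stand-ins for JudgmentSpanProcessor and NoOpSpanProcessor, not the SDK's actual classes:

```python
import threading


class StatefulProcessor:
    """Stand-in for a processor whose state_* methods assume __init__
    has created _lock and _state."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._state: dict = {}

    def state_set(self, key, value) -> None:
        with self._lock:
            self._state[key] = value

    def state_get(self, key, default=None):
        with self._lock:
            return self._state.get(key, default)


class NoOpProcessor(StatefulProcessor):
    """No-op subclass that deliberately skips super().__init__().

    Because every inherited state method is overridden to do nothing,
    the missing _lock/_state attributes can never be touched.
    """

    def __init__(self) -> None:
        pass

    def state_set(self, key, value) -> None:
        pass

    def state_get(self, key, default=None):
        return default
```

Overriding keeps the no-op allocation-free, at the cost of having to mirror every state method the parent gains later; calling super().__init__() with a no-op exporter is the lower-maintenance alternative.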
