Conversation
Contributor
There was a problem hiding this comment.
Caution
Changes requested ❌
Reviewed everything up to 2de5bf0 in 1 minute and 49 seconds. Click for details.
- Reviewed
246lines of code in1files - Skipped
0files when reviewing. - Skipped posting
4draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/pipelines/pipeline.rs:99
- Draft comment:
Using a single 'last_error' to record transient errors may lose detail if multiple failures occur. Consider aggregating errors or preserving more context for debugging. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The current implementation logs each error as it occurs via eprintln!, so error history is available in logs. The last_error is only used to return a final status code, not for debugging. Storing multiple errors would add complexity without clear benefit since errors are already logged. The suggestion seems to misunderstand the purpose of last_error. I could be undervaluing the benefit of having structured error history available programmatically rather than just in logs. There might be use cases for aggregated error reporting I haven't considered. The current error handling meets the needs - errors are logged for debugging, and the status code handling is appropriate for an HTTP API. Adding complexity to store multiple errors would be overengineering. The comment should be deleted. The current error handling is appropriate and adding more error tracking would increase complexity without clear benefit.
2. src/pipelines/pipeline.rs:102
- Draft comment:
Repeatedly cloning the payload in each loop iteration (e.g. at line 102) might be inefficient if the payload is large. Consider restructuring to avoid unnecessary clones. - Reason this comment was not posted:
Comment was on unchanged code.
3. src/pipelines/pipeline.rs:115
- Draft comment:
Consider replacing eprintln with a structured logging library (or using the existing OtelTracer) for consistency and better log management. - Reason this comment was not posted:
Confidence changes required:50%<= threshold50%None
4. src/pipelines/pipeline.rs:155
- Draft comment:
There is an inconsistency in return types—chat_completions returns Result<impl IntoResponse, StatusCode> while completions (and embeddings) return impl IntoResponse directly. Ensure this difference is intentional for error handling purposes. - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_8jYBYamauvT5rNLz
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
| if let ChatCompletionResponse::NonStream(completion) = response { | ||
| tracer.log_success(&completion); | ||
| return Ok(Json(completion).into_response()); | ||
| let matching_models: Vec<_> = model_keys |
Contributor
There was a problem hiding this comment.
The filtering of matching models (lines 81-90) is repeated in three functions. Consider extracting this logic into a helper function to reduce code duplication.
Contributor
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed b172efe in 41 seconds. Click for details.
- Reviewed
56lines of code in1files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/pipelines/pipeline.rs:134
- Draft comment:
Using unwrap() on last_error assumes it’s always Some. Although matching_models is checked earlier, consider using expect("transient error expected") to clarify this invariant. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
2. src/pipelines/pipeline.rs:203
- Draft comment:
Consider replacing unwrap() with expect() here to document that a transient error is expected at this point in completions. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
3. src/pipelines/pipeline.rs:261
- Draft comment:
Using unwrap() assumes last_error is Some. Use expect() with a clear message to document the assumption in embeddings. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
Workflow ID: wflow_y5DpiO4Ap60pEUQx
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
galkleinman
approved these changes
Jun 21, 2025
Contributor
galkleinman
left a comment
There was a problem hiding this comment.
General comments:
- I'm quite concerned about this one, for these reasons:
- Design wise - we should generalize it (but ok, let's duck tape it for now)
- Performance/behaviour wise - Since LLM calls are usually quite long (in terms of response time), you'll end up waiting for the model till it times out, then you either place the hub at risk of timing out (without pointing out that it's the model fault), or if you set a long timeout on our server, it can flood it with a lot of hanging requests (and also the client app) - not sure how the mainloop here is working and wether there's a reactive way of dealing with it...
- DRY this retry logic?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Important
Add fallback mechanism to handle transient errors by trying other same-family models in
chat_completions,completions, andembeddings.chat_completions,completions, andembeddingsfunctions, added fallback to other same-family models on transient errors.is_transient_errorfunction to identify transient errors likeStatusCode::TOO_MANY_REQUESTS,StatusCode::REQUEST_TIMEOUT, etc.chat_completions,completions, andembeddings.This description was created by
for b172efe. You can customize this summary. It will automatically update as commits are pushed.