
TTYG-166 Configurable LLM#50

Open
pgan002 wants to merge 42 commits into main from TTYG-166

Conversation


@pgan002 pgan002 commented Feb 7, 2026

Modify the configuration file to specify LLM details using the litellm library. If the parameter `config_file_path` is not specified or the file does not contain the key `llm`, the metrics that require an LLM are skipped.

We change the code to receive LLM parameters from the top-level function.

@pgan002 pgan002 self-assigned this Feb 7, 2026
from pydantic import BaseModel, Field, model_validator

from . import custom_evaluation
from . import llm as llm_
Collaborator

why do we rename llm to llm_?

Collaborator Author

Otherwise, defining the field llm in class Config gives the error 'Type of "llm" could not be determined because it refers to itself'.
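For illustration, the clash can be reproduced with any module whose name matches a field; here stdlib `json` stands in for the project's `llm` module, so everything below is a stand-in rather than the PR's actual code:

```python
# Stand-in reproduction of the name clash (stdlib `json` plays the role of
# the project's `llm` module). Without the alias, a field named `json`
# annotated as `json.JSONDecoder` leads static checkers to report that the
# annotation refers to the field itself; aliasing the import avoids this.
import json as json_  # alias, analogous to `from . import llm as llm_`
from typing import Optional, get_type_hints

class Config:
    json: Optional[json_.JSONDecoder] = None  # annotation resolves via the alias

# At runtime the annotation resolves cleanly through the alias:
hints = get_type_hints(Config)
```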

Collaborator

I would suggest then to rename the field llm in the Config class to llm_config or something. I don't like the underscore. The readability is not good.

Collaborator Author

I don't like llm_config. Renamed the imported module instead.

class Config(BaseModel):
name: str
temperature: float = Field(ge=0.0, le=1.0)
max_tokens: int = Field(ge=1)
Collaborator

@nelly-hateva Feb 9, 2026

If we merge #47 first, this should be removed, right? Also, we should introduce an embeddings model for answer relevancy, right https://github.com/Ontotext-AD/graphrag-eval/pull/47/changes#diff-f3dd3002658ae3670117b3df5ee6937158b2fad7c1543420fc25ccae901bfe24R9 ?

Collaborator Author

@pgan002 Feb 9, 2026

Of course! After one of these PRs is merged, the other has to be rebased and re-reviewed. Feel free to pause reviewing one of these until the other is merged and rebased.

Collaborator

We still have this field. What is the purpose of it?

Collaborator Author

See section Configuration: "maximum number of tokens to generate"

def call(config: Config, prompt: str) -> str:
import litellm
try:
response = litellm.completions(
Collaborator

@nelly-hateva Feb 9, 2026

My IDE is complaining that it can't find reference to this function

[screenshot of the IDE warning]

so I can't click on it and inspect the parameters. What I wonder is if for example I want to use Azure OpenAI how can I pass the API version? I know Azure requires some additional params, which I don't see how to pass.

Collaborator Author

@pgan002 Feb 9, 2026

Did you run `poetry install --with llm`?

Now linked LiteLLM documentation from the description.

It seems that there are many optional params that differ between agent/model types and providers. For providers whose name starts with either 'A' or 'B', I found:

  • temperature
  • top_p
  • api_base
  • api_version
  • reasoning_effort
  • reasoning
  • verbosity
  • tools
  • tool_choice
  • max_output_tokens
  • response_format
  • enforce_validation
  • thinking
  • vertex_project
  • vertex_location
  • vertex_credentials
  • use_psc_endpoint_format
  • extra_headers
  • parallel_tool_calls
  • aws_access_key_id
  • aws_secret_access_key
  • aws_session_name
  • aws_session_token
  • aws_region_name
  • aws_profile_name
  • aws_role_name
  • aws_web_identity_token
  • aws_bedrock_runtime_endpoint
  • api_key
  • max_tokens

I modified the PR to expect name, temperature and max_tokens, use them where appropriate, and pass all params to litellm.completions().
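That pass-through can be sketched as follows; the kwarg-assembly shape here is an assumption for illustration, not the PR's exact code, and the real call site would end in `litellm.completion(**kwargs)`:

```python
# Sketch of forwarding the whole LLM config (shapes assumed): the required
# keys are handled explicitly, and any extra provider-specific params
# (api_version, aws_region_name, ...) ride along as keyword arguments.
def build_completion_kwargs(llm_config: dict, prompt: str) -> dict:
    kwargs = dict(llm_config)             # copy so provider extras are preserved
    kwargs["model"] = kwargs.pop("name")  # litellm's parameter is `model`
    kwargs["messages"] = [{"role": "user", "content": prompt}]
    return kwargs  # real code would call litellm.completion(**kwargs)
```

Because LiteLLM accepts provider-specific keyword arguments (e.g. api_version for Azure OpenAI), no per-provider handling is needed in this sketch.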

Collaborator

@nelly-hateva Feb 23, 2026

Yes, I did `poetry install --with llm`, but I still see this warning. I also see a warning that .dict() is deprecated.

[screenshot of the deprecation warning]

Collaborator Author

> My IDE is complaining that it can't find reference to this function

Good catch: it's completion().

> warning that .dict() is deprecated

Changed to .model_dump().

@pgan002 pgan002 requested a review from Aleksis99 February 12, 2026 01:14
README.md Outdated
* `temperature`: (float in the range [0.0, 2.0]) adversarial temperature for generation
* `max_tokens`: (int > 0) maximum number of tokens to generate
* Optional keys: parameters to be passed to LiteLLM for generation (for (`answer_correctness`)[#output-keys] and (custom evaluation)[#custom-evaluation-(custom-metrics)])
* `embedding`: required for (`answer_relevance`)[#output-keys].
Collaborator

the link is not rendered as one

Collaborator Author

Fixed

README.md Outdated
* `embedding`: required for (`answer_relevance`)[#output-keys].
* `provider`: (str) name of the organization providing the embedding model
* `model`: (str) name of the embedding model
* `custom_evaluations`: (list of the following maps) required nonempty for (custom evaluation)[#custom-evaluation-(custom-metrics)]. Each map has keys:
Collaborator

the link is not rendered as one

Collaborator Author

Fixed

README.md Outdated

#### Example Configuration File With LLM Configuration

Below is a YAML file that configures the LLM generation ((for metrics that require an LLM)[#llm-use-in-evaluation]) and embedding (for (answer relevance)[#output-keys]).
Collaborator

the links are not rendered as such

Collaborator Author

Fixed
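The quoted README section promises a YAML example that the excerpts above do not show. A sketch using only the keys documented in the quoted README lines (all values illustrative, not from the PR) might look like:

```yaml
llm:
  name: gpt-4o               # illustrative model name
  temperature: 0.0
  max_tokens: 1024
  api_version: 2024-02-01    # optional: extra keys are passed through to LiteLLM
embedding:
  provider: openai           # illustrative
  model: text-embedding-3-small
```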


nelly-hateva commented Feb 23, 2026

  • There is a typo in the PR description - detais
  • The title can be improved to something like Implement the ability for the user to define which LLM to use using the litellm library

@pgan002 pgan002 force-pushed the TTYG-166 branch 2 times, most recently from 8b05abe to caf2862 Compare February 24, 2026 08:37
@pgan002 pgan002 changed the title TTYG-166 Configure LLMs TTYG-166 Configurable LLM Feb 24, 2026
@pgan002 pgan002 force-pushed the TTYG-166 branch 2 times, most recently from cef58eb to 746fa65 Compare February 24, 2026 09:01