LLM Experimentation Notes #10

@sarah114tran

Prompt 1:

  1. Prompt Used: Long Prompt with Schema Provided
  2. About 5 seconds to pull down the GitHub data, averaging 0.14 seconds per repository
  3. Standard model (gpt-4.5)
  4. Long prompt with chain of thought
  5. Running the prompt on the set of 15 took 3 min and 38 seconds
  6. Primarily struggles with the BOM and assembly instructions: for both of these pieces of documentation, brief descriptions of materials/components or of assembly steps get incorrectly classified as present.

| Accuracy | Precision | Recall | F1 Score | True Positive Rate | False Negative Rate |
| --- | --- | --- | --- | --- | --- |
| 0.88 | 0.81 | 0.9 | 0.86 | 0.9 | 0.1 |
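
For reference, the metric columns reported in these notes can all be derived from the four confusion-matrix counts. The sketch below is a minimal illustration, not the script used for these experiments: the `Confusion` class, the `compute_metrics` function, and the example counts are hypothetical. It also shows why the True Positive Rate column always matches Recall and why the False Negative Rate is 1 minus Recall.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of present/missing classifications (hypothetical container)."""
    tp: int  # documentation present, classified as present
    fp: int  # documentation missing, classified as present
    tn: int  # documentation missing, classified as missing
    fn: int  # documentation present, classified as missing


def compute_metrics(c: Confusion) -> dict:
    """Derive the reported metrics from raw counts, guarding against zero division."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    fnr = c.fn / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "true_positive_rate": recall,  # TPR is recall by definition
        "false_negative_rate": fnr,    # equals 1 - recall
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from these experiments.
    example = Confusion(tp=9, fp=2, tn=8, fn=1)
    for name, value in compute_metrics(example).items():
        print(f"{name}: {value:.2f}")
```

Since false positives are the main failure mode described here, precision is the column to watch: it drops with every missing document classified as present, while recall is unaffected.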

Prompt 2:

  1. Prompt Used: Revised Prompt
  2. Standard model (gpt-4.5)
  3. Long prompt: a revision of Prompt 1 with greater specificity regarding the bill of materials (BOM) and assembly instructions. I specify that brief descriptions of components do not constitute a BOM and that brief mentions of assembly do not constitute assembly instructions.
  4. Running the prompt took around 4.5 minutes
  5. A single set of assembly instructions was classified as present when it was actually missing

| Accuracy | Precision | Recall | F1 Score | True Positive Rate | False Negative Rate |
| --- | --- | --- | --- | --- | --- |
| 0.99 | 0.97 | 1 | 0.98 | 1 | 0 |

Prompt 3:

  1. Prompt Used: Revised Prompt, no schema provided
  2. Standard model (gpt-4.5)
  3. Running the prompt took 3.24 min
  4. Struggles mainly with false positives, classifying the assembly instructions and the mechanical files as present when they do not exist.

| Accuracy | Precision | Recall | F1 Score | True Positive Rate | False Negative Rate |
| --- | --- | --- | --- | --- | --- |
| 0.95 | 0.92 | 0.98 | 0.95 | 0.98 | 0.02 |
