LLM Experimentation Notes

**Prompt 1:** 
1. Prompt Used: [Long Prompt with Schema Provided](https://github.com/WeberLab-UW/OSH_Datasets/blob/main/Prompt%20Evaluation/Test%20%235/one_shot_with_schema_prompt.md)
2. Roughly 5 seconds to pull down the GitHub data, averaging 0.14 seconds per repository
3. Standard model (gpt-4.5)
4. Long prompt with chain of thought 
5. Running the prompt on the set of 15 took 3 min and 38 seconds
6. Primarily struggles with the BOM and assembly instructions: for both types of documentation, brief descriptions of materials/components or of assembly steps get incorrectly classified as present.

| Accuracy | Precision | Recall | F1 Score | True Positive Rate | False Negative Rate |
| :--- | :------: | -----: | -----: | -----: | -----: |
| 0.88 | 0.81 | 0.9 | 0.86 | 0.9 | 0.1 |
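
For reference, the columns in these tables can be derived from confusion-matrix counts roughly as sketched below. This is a minimal illustration, not the evaluation script used for these runs; the `ConfusionCounts` class, the `classification_metrics` function, and the example counts are all hypothetical. It assumes "present" is the positive class, so the true positive rate is the same quantity as recall.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical container for per-label outcomes ("present" = positive)."""
    tp: int  # predicted present, actually present
    fp: int  # predicted present, actually missing
    tn: int  # predicted missing, actually missing
    fn: int  # predicted missing, actually present


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the six table columns from raw counts, guarding against zero division."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "true_positive_rate": recall,          # TPR is recall by definition
        "false_negative_rate": 1.0 - recall if (c.tp + c.fn) else 0.0,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not the counts from these experiments.
    example = ConfusionCounts(tp=9, fp=1, tn=9, fn=1)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.2f}")
```

Because the false negative rate is `1 - recall`, the last two columns of each table always sum to 1.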

**Prompt 2:** 
1. Prompt Used: [Revised Prompt](https://github.com/WeberLab-UW/OSH_Datasets/blob/main/Prompt%20Evaluation/Test%20%236/revised_long_prompt.md)
2. Standard model (gpt-4.5)
3. Long prompt: a revision of Prompt 1 with greater specificity regarding the bill of materials and assembly instructions. I specify that brief descriptions of components do not constitute a BOM and that brief mentions of assembly do not constitute assembly instructions.
4. Running the prompt took around 4.5 minutes 
5. A single set of assembly instructions was classified as present when it is actually missing

| Accuracy | Precision | Recall | F1 Score | True Positive Rate | False Negative Rate |
| :--- | :------: | -----: | -----: | -----: | -----: |
| 0.99 | 0.97 | 1 | 0.98 | 1 | 0 |

**Prompt 3:**
1. Prompt Used: Revised Prompt, no schema provided 
2. Standard model (gpt-4.5)
3. Running the prompt took 3.24 min
4. Struggles mainly with false positives, classifying assembly instructions and mechanical files as present when they do not exist.

| Accuracy | Precision | Recall | F1 Score | True Positive Rate | False Negative Rate |
| :--- | :------: | -----: | -----: | -----: | -----: |
| 0.95 | 0.92 | 0.98 | 0.95 | 0.98 | 0.02 |
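
As a quick sanity check on the three tables, the reported F1 scores and false negative rates can be compared against values recomputed from the reported precision and recall. The sketch below is only an illustration: the `REPORTED` dictionary copies the rounded figures from the tables above, the function names are hypothetical, and the tolerance of 0.015 is an assumption chosen to absorb two-decimal rounding.

```python
# Rounded values copied from the three results tables in these notes.
REPORTED = {
    "Prompt 1": {"precision": 0.81, "recall": 0.90, "f1": 0.86, "fnr": 0.10},
    "Prompt 2": {"precision": 0.97, "recall": 1.00, "f1": 0.98, "fnr": 0.00},
    "Prompt 3": {"precision": 0.92, "recall": 0.98, "f1": 0.95, "fnr": 0.02},
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_consistency(rows: dict, tol: float = 0.015) -> dict:
    """Return, per prompt, whether reported F1 and FNR match recomputed values within tol."""
    results = {}
    for name, row in rows.items():
        f1_ok = abs(f1_from(row["precision"], row["recall"]) - row["f1"]) <= tol
        fnr_ok = abs((1.0 - row["recall"]) - row["fnr"]) <= tol
        results[name] = f1_ok and fnr_ok
    return results


if __name__ == "__main__":
    for name, ok in check_consistency(REPORTED).items():
        print(f"{name}: {'consistent' if ok else 'check the numbers'}")
```

Since the inputs are already rounded to two decimals, a small mismatch in the last digit of a recomputed F1 is expected and does not by itself indicate an error.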


