OSH Prompt Evaluation #9

@sarah114tran

Test 1 Prompt: prompt_extract.md

  • Prompt Description: Provides detailed operational definitions, required json schema, critical instructions, and 1 example.

  • Prompt Results

  • Observations for iaq_board

    • License Information: Correct and accurate sentence extraction. Does an accurate job of separating the hardware, software, and documentation licenses.
    • BOM: Pulls the component name correctly even though the column is named "item" in this repo. The schematic reference number is not extracted. Mistakes the "Comment" column for sourcing_info. Evidence for the BOM is clear and accurate.
    • Assembly Instructions:
    • Design/Mechanical Files: Does an accurate job of separating the design files from the mechanical files.
  • Observations for Open Scout

    • License Information: Correct and accurate sentence extraction. Even though the README does not state which of the three licenses applies to which component, the model makes a reasonable assignment of one license to each component.
    • Contributing is correctly extracted
    • BOM: there is an actual BOM in the repo, but that's not provided. The paragraph that details components is extracted as the BOM, and it is classified as "partial".
    • Assembly Instructions: Correctly finds that the video provided is a link to assembly instructions.
    • Design/Mechanical Files: Does an accurate job of separating the design files from the mechanical files.
  • Observations for totp

    • All of the categories are correct, EXCEPT
    • Design/Mechanical Files: there are technical files (.stl), but this category is incorrectly marked as not present.
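The totp miss (.stl files exist, yet the category is marked as not present) is the kind of error that a plain extension check over the repo's file list could flag after the model responds. A minimal sketch, assuming a hypothetical extension-to-category mapping; the extension sets, category names, and function names below are illustrative and are not taken from the prompts:

```python
from pathlib import PurePosixPath

# Assumed mapping for illustration only; adjust to the project's definitions.
MECHANICAL_EXTS = {".stl", ".step", ".stp"}
DESIGN_EXTS = {".kicad_pcb", ".sch", ".brd"}


def classify_files(paths):
    """Group repository paths into mechanical and design files by extension."""
    found = {"mechanical": [], "design": []}
    for p in paths:
        ext = PurePosixPath(p).suffix.lower()
        if ext in MECHANICAL_EXTS:
            found["mechanical"].append(p)
        elif ext in DESIGN_EXTS:
            found["design"].append(p)
    return found


def missed_categories(paths, model_says_present):
    """Return categories that have matching files but that the model marked absent."""
    found = classify_files(paths)
    return [
        category
        for category, files in found.items()
        if files and not model_says_present.get(category, False)
    ]
```

Calling `missed_categories` with the repo's file list and the model's presence flags returns the categories worth a manual second look.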

Test 2 Prompt: No Schema Prompt

  • Prompt Description: Provides the operational definitions but does not provide the required json schema or examples (zero-shot).

  • Prompt Results

  • Observations for iaq_board

    • License information: correctly splits the license for each component
    • BOM: returns the column names, but incorrectly interprets the "Comment" column as sourcing information
    • Assembly Instructions: correctly identifies assembly instructions
    • Design files: returns all of the design files separated by type, such as CAD, PCB, 3D, and other files
  • Observations for Open Scout

    • License information: extracts sentence, but does not separate the three licenses
    • BOM: "The README mentions materials like aluminum extrusions and motors but lacks a complete list with quantities." --> this is a correct evaluation
    • Design/Mechanical Files: Incorrect -- only returns one of the design files ("CAD Files"), but does not return the mechanical files
    • Provides the directory structure
  • Observations for totp

    • All of the categories are correct, EXCEPT
    • Design/Mechanical Files: there are stl files, but "3D Files" are returned as empty
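Tests 1 and 2 both read the "Comment" column as sourcing information on iaq_board. One way to catch this outside the prompt is to normalize BOM headers against an explicit alias table before comparing with the model output. A rough sketch, assuming a hypothetical alias table; apart from `sourcing_info`, the canonical field names here are made up for illustration:

```python
# Hypothetical alias table: lowercased BOM header -> canonical field name.
HEADER_ALIASES = {
    "item": "component_name",
    "component": "component_name",
    "part": "component_name",
    "qty": "quantity",
    "quantity": "quantity",
    "reference": "schematic_reference",
    "designator": "schematic_reference",
    "supplier": "sourcing_info",
    "vendor": "sourcing_info",
    "comment": "notes",  # free-text remarks, deliberately not sourcing_info
}


def normalize_headers(headers):
    """Map each raw header to a canonical field, or 'unmapped' if unknown."""
    return {h: HEADER_ALIASES.get(h.strip().lower(), "unmapped") for h in headers}
```

Any header the model assigned to `sourcing_info` that this table maps elsewhere can then be reported as a likely misclassification.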

Test 3 Prompt: prompt_extractTune

  • Prompt Description: Provides basic instructions for content classification, response schema, and a shortened example.

  • Prompt Results

  • Observations for iaq_board

    • License info correct.
    • BOM: identifies that there are 20 components, but only returns 2 of them
    • Assembly instructions: correctly identified and returned
    • Design/Mechanical Files: correctly separates between the mechanical and the design files
  • Observations for Open Scout

    • License info correct. Even though the README does not state which of the three licenses applies to which component, the model makes a reasonable assignment of one license to each component.
    • BOM: same as test 1 and test 2, returns the materials used as the BOM --> labeled as "partial" completeness
    • Assembly Instructions: Correctly finds that the video provided is a link to assembly instructions.
    • Design/Mechanical Files: Does an accurate job of separating the design files from the mechanical files.
  • Observations for totp

    • Same results as test 1 and 2 -- does not pick up on the mechanical design files
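The iaq_board result in this test (20 components identified, only 2 returned) suggests a simple truncation check: compare the count the model reports with the number of items it actually returns. A minimal sketch, assuming the response exposes a declared count and a list of components; both the parameter names and the returned dictionary layout are hypothetical:

```python
def bom_completeness(declared_count, components):
    """Compare the component count the model reports with the items it returns."""
    returned = len(components)
    if declared_count <= 0:
        # Nothing declared, so no ratio can be computed.
        return {"declared": declared_count, "returned": returned,
                "ratio": None, "truncated": False}
    return {
        "declared": declared_count,
        "returned": returned,
        "ratio": returned / declared_count,
        "truncated": returned < declared_count,
    }
```

A `truncated` flag on a response is a cue to rerun the extraction or raise the output length limit before scoring the BOM fields.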

Test 4 Prompt: reasoning

  • Prompt Description: Provides the operational definitions but does not provide the required json schema or examples (zero-shot) --> (the same prompt used in Test 2).

  • Prompt Results

  • Observations for iaq_board

    • no difference between the standard and the reasoning model
    • confidence scores are comparable
  • Observations for Open Scout

    • All classifications are correct. Notably, it specifies that "The README lists key materials (aluminium extrusions, motor with encoder, and Arduino Mega) without detailed tables or full specifications.", which is more accurate than the other tests
    • confidence scores are lower for the standard model than the reasoning model
  • Observations for totp

    • classifications are correct
    • confidence scores are lower for the standard model than the reasoning model
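The confidence comparisons above could be made concrete by computing per-category differences between the two models. A small sketch, assuming each run yields a dictionary of category name to confidence score; that structure and the function name are assumptions, not the format used in these tests:

```python
from statistics import mean


def compare_confidence(standard, reasoning):
    """Return per-category differences (reasoning minus standard) and their mean."""
    shared = sorted(set(standard) & set(reasoning))
    diffs = {category: reasoning[category] - standard[category] for category in shared}
    average = mean(list(diffs.values())) if diffs else None
    return diffs, average
```

A positive mean indicates that the reasoning model reports higher confidence on the categories both runs cover.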

Test 5 Prompt: Semi-Chain-of-Thought

  • Prompt Description: uses the shortened prompt from Test 3 but provides a detailed example and asks the model to provide a reason prior to the classification

  • Prompt Results

  • Observations for iaq_board

    • BOM: The BOM is correctly detected and all of the components are extracted. The "Comment" column is no longer misinterpreted as supplier information.
    • Design/Mechanical Files: For design file, the "type" categories are correctly classified.
    • "Reasoning": the reasoning is convincing and logical
  • Observations for Confagrid

    • BOM: correctly identified from the filename and provides the correct path. The reasoning provided is correct.
    • Design/Mechanical Files: "Type" is correctly identified
  • Observations for totp

    • all classifications are correct
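Since this prompt asks for a reason before the classification, it can be useful to verify that the response actually follows that order. A minimal sketch, assuming the model returns a JSON object and that the fields are named `reasoning` and `classification`; both key names are hypothetical defaults:

```python
import json


def reasoning_comes_first(raw_json, reasoning_key="reasoning", label_key="classification"):
    """Check that the reasoning field appears before the classification field."""
    obj = json.loads(raw_json)
    keys = list(obj)  # json.loads preserves the key order of the source text
    if reasoning_key not in keys or label_key not in keys:
        return False
    return keys.index(reasoning_key) < keys.index(label_key)
```

Responses that fail the check did not emit the reason ahead of the label, so they may not reflect the intended reason-then-classify behavior.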
