
Prompting-Strategies-CodeGen-Study

Chain-of-Thought vs. Few-Shot: A Comparative Study of Prompting Strategies for Code Generation

This repository accompanies the research study and provides the code, dataset, and analysis artifacts referenced in the paper (see Associated Publication).


Contributors


Contents

1) Token Counter Program

  • Authored by us; uses the tiktoken library for tokenization.
  • Installation instructions for tiktoken are available in the official repository: https://github.com/openai/tiktoken.
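For reference, a minimal sketch of such a counter is shown below. It is illustrative rather than the exact script in this repository, and the encoding name (`cl100k_base`) is an assumption:

```python
# Minimal token-counting sketch using tiktoken.
# Illustrative only -- not the exact script in this repository;
# the encoding name ("cl100k_base") is an assumption.
import sys

import tiktoken


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Return the number of tokens in `text` under the given encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


if __name__ == "__main__":
    # Count tokens in each file passed on the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            print(f"{path}: {count_tokens(f.read())} tokens")
```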

2) Data and Analytics Excel File

  • Contains the raw data for prompts, responses, and human evaluations.
  • Includes basic analytics for quick inspection.

3) ANOVA Excel File

  • ANOVA conducted using the Analysis ToolPak add-in.
  • Effect sizes (eta-squared, partial eta-squared, and omega-squared) were calculated manually using standard definitions (see Effect Sizes).
  • To enable the add-in in Excel:
    1. File → Options → Add-ins
    2. In the Manage drop-down, select Excel Add-ins and click Go…
    3. Check Analysis ToolPak, click OK.

4) Dataset

  • Organized by the combination of Reasoning-Style (CoT vs. Non-CoT) and Example-Context (Zero-Shot vs. Few-Shot).
  • Each combination contains 20 tasks (cases).
  • Each task has three files:
    • Prompt file: the prompt authored by the LLM.
    • Response file: the LLM’s response to that prompt.
    • Data file: structured metadata, evaluation results, and other task-level information.

Example layout:

Dataset/
├─ CoT Few-Shot (CFS)/
│  ├─ CFS 1/
│  │  ├─ task_031_data.json
│  │  ├─ task_031_prompt.txt
│  │  └─ task_031_response.txt
│  ├─ CFS 2/
│  │  ├─ ...
│  │  └─ ...
│  └─ ...
├─ CoT Zero-Shot (CZS)/
│  ├─ CZS 1/
│  │  ├─ ...
│  │  └─ ...
│  └─ ...
├─ Non-CoT Few-Shot (NCFS)/
│  ├─ NCFS 1/
│  │  ├─ ...
│  │  └─ ...
│  └─ ...
└─ Non-CoT Zero-Shot (NCZS)/
   ├─ NCZS 1/
   │  ├─ ...
   │  └─ ...
   └─ ...
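
The sketch below illustrates one way to traverse this layout and load the three files for each task. It is an illustration, not a script shipped with this repository; the condition folder name follows the example tree above, and actual task numbering may differ:

```python
# Illustrative loader for the dataset layout shown above.
# Folder and file naming follow the example tree; actual numbering may differ.
import json
from pathlib import Path


def load_tasks(condition_dir: str):
    """Yield (prompt, response, data) triples for each task folder."""
    for task_dir in sorted(Path(condition_dir).iterdir()):
        if not task_dir.is_dir():
            continue
        prompt = next(task_dir.glob("*_prompt.txt")).read_text(encoding="utf-8")
        response = next(task_dir.glob("*_response.txt")).read_text(encoding="utf-8")
        data = json.loads(next(task_dir.glob("*_data.json")).read_text(encoding="utf-8"))
        yield prompt, response, data


# Example: iterate over the CoT Few-Shot condition.
for prompt, response, data in load_tasks("Dataset/CoT Few-Shot (CFS)"):
    print(len(prompt), len(response), sorted(data.keys()))
```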

Data Files

  • Data files store structured information about each prompt and its response.
  • These files were originally produced by the LLM and then edited by human reviewers to correct metadata and add missing information, ensuring accuracy and completeness.

Notes on stored evaluations:

  • Self-evaluation refers to the model’s own assessment (values typically on a 1–10 scale). These were retained for completeness but disregarded in analysis due to unreliability.
  • Supervised (human) evaluations were performed by real evaluators. The rubric fields are:
| field | weight | range | explanation |
| --- | --- | --- | --- |
| factual_correctness | 25% | 1 to 5 | Are the facts and steps correct? |
| reasoning_quality | 25% | 1 to 5 | Is the logic transparent? |
| coherency_and_clarity | 20% | 1 to 5 | Is the response clear and easy to follow? |
| completeness | 20% | 1 to 5 | Does it cover all required aspects? |
| understanding_depth | 10% | 1 to 5 | Does it show insight beyond the surface level? |
| weighted_total | N/A | 0 to 100 (%) | Final composite score computed from the weights |
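
As a reading aid, the sketch below shows one plausible way weighted_total could be computed from the rubric fields. The rescaling of 1–5 scores to a percentage is an assumption, not necessarily the authors' exact formula:

```python
# Illustrative computation of `weighted_total` from the rubric fields.
# ASSUMPTION: each 1-5 score contributes its fraction of the 5-point
# maximum, weighted; this may differ from the authors' exact formula.
WEIGHTS = {
    "factual_correctness": 0.25,
    "reasoning_quality": 0.25,
    "coherency_and_clarity": 0.20,
    "completeness": 0.20,
    "understanding_depth": 0.10,
}


def weighted_total(scores: dict[str, int]) -> float:
    """Combine 1-5 rubric scores into a 0-100 composite using the weights."""
    return 100 * sum(w * scores[field] / 5 for field, w in WEIGHTS.items())


# Example: a response scoring 4 on every field gets 80.0.
print(weighted_total({field: 4 for field in WEIGHTS}))
```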

For more information regarding the evaluation of accuracy, see the Accuracy Evaluation Process and Criteria file.


Effect Sizes

  • Eta-squared (η²), partial eta-squared (ηp²), and omega-squared (ω²) were derived from the ANOVA results using their standard formulas based on sums of squares (SS), mean squares (MS), and degrees of freedom (df).

$$ \eta^{2} = \frac{SS_{\text{effect}}}{SS_{\text{total}}} $$

$$ \text{partial } \eta^{2} = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}} $$

$$ \omega^{2} = \frac{ SS_{\text{effect}} - (df_{\text{effect}})(MS_{\text{error}}) }{ SS_{\text{total}} + MS_{\text{error}} } $$

  • Not all of these formulas are presented in the paper.
  • For full details on the formulas, see the reference below; a small computational sketch follows.
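
The sketch below applies these formulas to quantities read off an ANOVA table. The numeric inputs are placeholders for demonstration, not study results:

```python
# Effect sizes from standard ANOVA quantities (Tabachnick & Fidell, 2013).
# Illustrative only: the input values below are placeholders, not study data.

def eta_squared(ss_effect: float, ss_total: float) -> float:
    return ss_effect / ss_total


def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    return ss_effect / (ss_effect + ss_error)


def omega_squared(ss_effect: float, df_effect: float,
                  ms_error: float, ss_total: float) -> float:
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


# Placeholder values for demonstration:
ss_effect, ss_error, ss_total = 12.0, 48.0, 80.0
df_effect = 1
ms_error = 0.8  # SS_error / df_error

print(eta_squared(ss_effect, ss_total))                        # 0.15
print(partial_eta_squared(ss_effect, ss_error))                # 0.20
print(omega_squared(ss_effect, df_effect, ms_error, ss_total)) # ~0.1386
```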

Reference for effect-size formulas
B. G. Tabachnick and L. S. Fidell, Using Multivariate Statistics, 6th ed., Upper Saddle River, NJ: Pearson Education, 2013, pp. 54–55.


Associated Publication

This repository contains the source code, datasets, and experimental materials for the following publication:

Chain-of-Thought vs. Few-Shot: A Comparative Study of Prompting Strategies for Code Generation
K. Nobakhtfar, K. Çakılcı, R. Zilan

Published in the IEEE conference proceedings of the 5th International Informatics and Software Engineering Conference (IISEC 2026) and available through IEEE Xplore.

DOI: https://doi.org/10.1109/IISEC69317.2026.11418478

Publication Timeline

| Event | Date |
| --- | --- |
| Paper Submission | December 1, 2025 |
| Notification of Acceptance | January 9, 2026 |
| Camera-Ready Submission | January 28, 2026 |
| Date of IISEC 2026 Conference | February 5–6, 2026 |
| Published on IEEE Xplore | March 10, 2026 |

Citation

If you use this work in your research, please cite the paper:

BibTeX

@INPROCEEDINGS{11418478,
  author={Nobakhtfar, Koorosh and Çakılcı, Kenan and Zilan, Ruken},
  booktitle={2026 5th International Informatics and Software Engineering Conference (IISEC)}, 
  title={Chain-of-Thought vs. Few-Shot: A Comparative Study of Prompting Strategies for Code Generation}, 
  year={2026},
  volume={},
  number={},
  pages={382-387},
  keywords={Costs;Codes;Accuracy;Limiting;Large language models;Cognition;Robustness;Prompt engineering;Informatics;Software engineering;Chain-of-Thought (CoT);Few-Shot Learning;Large Language Models;Prompt Engineering;Empirical Study},
  doi={10.1109/IISEC69317.2026.11418478}
}

Plain Text

K. Nobakhtfar, K. Çakılcı and R. Zilan, "Chain-of-Thought vs. Few-Shot: A Comparative Study of Prompting Strategies for Code Generation," 2026 5th International Informatics and Software Engineering Conference (IISEC), Ankara, Turkiye, 2026, pp. 382-387, doi: 10.1109/IISEC69317.2026.11418478.


Links