humaneval

Here are 38 public repositories matching this topic...

SkyCode is a multilingual open-source large language model for programming, built on the GPT-3 model architecture. It supports Java, JavaScript, C, C++, Python, Go, shell, and other mainstream programming languages, and it can understand Chinese comments. The model completes code and has strong problem-solving ability, freeing you from routine programming so you can focus on more important problems.

  • Updated Mar 2, 2023

Benchmark suite for evaluating LLMs and SLMs on coding and SE tasks. Features HumanEval, MBPP, SWE-bench, and BigCodeBench with an interactive Streamlit UI. Supports cloud APIs (OpenAI, Anthropic, Google) and local models via Ollama. Tracks pass rates, latency, token usage, and costs.

  • Updated Apr 23, 2026
  • Python

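LLM evaluation platform with three-way backend support (local/API/HuggingFace/OpenCompass). Supports data production (Self-Instruct/Evol-Instruct), long-tail scenario generation, weakness mining, regression analysis, contamination detection, and bad-case attribution. Includes an extensible benchmark system and LLM-as-Judge automatic scoring.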

  • Updated Jun 3, 2026
  • Python
