Add prompt baseline test to check lisa_test_writer prompt quality#4365

Open
paxue wants to merge 1 commit into main from paxue/eval_prompt

Conversation

@paxue
Collaborator

@paxue paxue commented Mar 20, 2026

Description

Add a prompt baseline test to check the quality of the "lisa_test_writer.prompt.md" prompt.
The default score threshold is 70% for a large cloud model and 65% for a small local model.
It can also be used to verify other prompts that help generate LISA tests.
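A minimal sketch of how such a tiered threshold gate might work. The function and tier names below are illustrative assumptions, not code from this PR:

```python
# Illustrative threshold gate for prompt eval scores.
# Tier names and defaults are hypothetical; the PR's actual
# eval_runner.py may structure this differently.
DEFAULT_THRESHOLDS = {
    "cloud_large": 0.70,  # default for large cloud-hosted models
    "local_small": 0.65,  # default for small local models
}

def passes_baseline(scores, model_tier="cloud_large"):
    """Return True if the average case score meets the tier's threshold."""
    threshold = DEFAULT_THRESHOLDS[model_tier]
    average = sum(scores) / len(scores)
    return average >= threshold

print(passes_baseline([0.8, 0.7, 0.65]))                 # → True  (avg ≈ 0.72)
print(passes_baseline([0.7, 0.6, 0.6], "local_small"))   # → False (avg ≈ 0.63)
```

Keeping the thresholds in one mapping makes it easy to tune them per model tier without touching the scoring loop.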

Related Issue

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation update

Checklist

  • Description is filled in above
  • No credentials, secrets, or internal details are included
  • Peer review requested (if not, add required peer reviewers after raising PR)
  • Tests executed and results posted below

Test Validation

Key Test Cases:

Impacted LISA Features:

Tested Azure Marketplace Images:

Test Results

Image | VM Size | Result (PASSED / FAILED / SKIPPED)

Copilot AI review requested due to automatic review settings March 20, 2026 01:35
Contributor

Copilot AI left a comment


Pull request overview

Adds a prompt evaluation framework under .github/prompts/eval/ to baseline and score the quality of outputs produced by lisa_test_writer.prompt.md, using JSONL-defined cases and an LLM-as-judge scoring rubric.
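LLM-as-judge scoring, as described above, typically asks a second model to grade each generated output against the case's rubric and return a numeric score. A rough sketch of the two helper steps involved; the prompt format and reply parsing here are assumptions, not the PR's actual code:

```python
def build_judge_prompt(case_input, output, rubric):
    """Compose a grading prompt for the judge model (format is illustrative)."""
    return (
        "You are grading a generated LISA test.\n"
        f"Task: {case_input}\n"
        f"Candidate output:\n{output}\n"
        f"Rubric: {rubric}\n"
        "Reply with a single integer score from 0 to 10."
    )

def parse_score(judge_reply, max_score=10):
    """Extract the first integer in the reply and clamp it to [0, max_score]."""
    for token in judge_reply.split():
        cleaned = token.strip(".:")
        if cleaned.isdigit():
            return max(0, min(max_score, int(cleaned)))
    raise ValueError("judge reply contained no score")

print(parse_score("Score: 8. Good coverage."))  # → 8
```

Clamping and defensive parsing matter here because judge models do not always follow the "single integer" instruction exactly.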

Changes:

  • Added eval_runner.py to run prompt eval cases against a configurable LLM provider and emit scored results.
  • Added cases.jsonl with baseline evaluation prompts + rubrics across multiple capability dimensions.
  • Added documentation (README.md) describing setup, usage, scoring, and case design.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
.github/prompts/eval/scripts/eval_runner.py Implements the runner (provider selection, generation, judging, scoring, output).
.github/prompts/eval/cases.jsonl Defines baseline eval cases and rubrics used by the runner.
.github/prompts/eval/README.md Documents how to run/evolve the prompt evaluation framework and interpret results.

Copilot AI review requested due to automatic review settings March 20, 2026 01:49
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings April 4, 2026 00:05
@paxue paxue force-pushed the paxue/eval_prompt branch from ce1fc11 to e55a96a Compare April 4, 2026 00:05
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

modify PR review process to add some standard
@paxue paxue force-pushed the paxue/eval_prompt branch from e55a96a to 9fbad18 Compare April 4, 2026 00:34