
ci: daily Evals CI for Extensions/Skills on github using Evalbench#152

Open
omkargaikwad23 wants to merge 19 commits into main from ci-evals


@omkargaikwad23

No description provided.

@omkargaikwad23 requested review from a team as code owners on April 13, 2026 10:23
@github-actions (bot) requested a review from isaurabhuttam on April 13, 2026 10:24
@omkargaikwad23 added the "do not merge" label (indicates a pull request not ready for merge, due to either quality or timing) on Apr 14, 2026
@omkargaikwad23 force-pushed the ci-evals branch 2 times, most recently from 4004d05 to e69c7b6, on April 14, 2026 05:21
@omkargaikwad23 changed the title from "Daily Evals CI for Extensions/Skills on github using Evalbench" to "ci: daily Evals CI for Extensions/Skills on github using Evalbench" on Apr 17, 2026
Comment thread cloudbuild.yaml
steps:

# --- Evaluation Step ---
- name: 'us-central1-docker.pkg.dev/cloud-db-nl2sql/evalbench/eval_server:89aa9fefd4b247610a95ef0896ba55d468563f50'
Author


Will switch to the latest image once the recently pushed changes in evalbench are deployed.
https://github.com/GoogleCloudPlatform/evalbench/pull/336/changes

Added an extension_id: cloud-sql-postgresql key so that the evaluation results belonging to my extension can be identified.
@omkargaikwad23 removed the "do not merge" label (indicates a pull request not ready for merge, due to either quality or timing) on Apr 21, 2026
Comment thread evals/dataset.json
Comment thread evals/run_config.yaml
scorers:
  trajectory_matcher: {}
  goal_completion:
    model_config: /workspace/evals/gemini_2.5_pro_model.yaml
Contributor


Add all the required scorers.

Author


Added.


Labels: None yet
Projects: None yet
Participants: 3