FEAT Add VisualLeakBench dataset loader (arXiv:2603.13385)#1531

Open
Copilot wants to merge 11 commits into main from copilot/add-visual-leak-bench-dataset-loader

Conversation

Copilot AI commented Mar 22, 2026

Adds PyRIT support for the VisualLeakBench / MM-SafetyBench dataset — a multimodal benchmark of 1,000 adversarial images testing LVLMs against OCR injection (harmful instructions embedded in images) and PII leakage (social engineering to extract SSNs, passwords, API keys, etc.).

Fixes #1530

New: _VisualLeakBenchDataset

  • Fetches metadata.csv from YoutingWang/MM-SafetyBench on GitHub and downloads images with local caching
  • Produces image+text prompt pairs per example, linked by prompt_group_id (image at sequence=0, category-specific query at sequence=1)
  • Maps harm categories: ocr_injection for OCR entries; pii_leakage + normalized PII type (e.g. ssn, api_key) for PII entries
  • Supports filtering via categories, pii_types, and max_examples
  • Registered with tags={"default", "safety", "privacy"}, modalities=["image", "text"] for SeedDatasetFilter discovery
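
The pairing and harm-category mapping listed above can be sketched without PyRIT. In this minimal illustration, `SketchPrompt`, `harm_categories_for`, and `build_prompt_pair` are hypothetical names rather than PyRIT classes or functions. The `"OCR Injection"` category string and the normalization rule (lowercase, spaces to underscores) are assumptions inferred from the `ssn` / `api_key` examples, so the loader itself may differ.

```python
import uuid
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchPrompt:
    """Hypothetical stand-in for one seed prompt; not a PyRIT class."""

    value: str
    data_type: str  # "image_path" or "text"
    prompt_group_id: uuid.UUID
    sequence: int
    harm_categories: Tuple[str, ...]


def normalize_pii_type(raw: str) -> str:
    # Assumed rule: "API Key" -> "api_key", "SSN" -> "ssn".
    return raw.strip().lower().replace(" ", "_")


def harm_categories_for(category: str, pii_type: Optional[str]) -> Tuple[str, ...]:
    # OCR entries map to a single category; PII entries also carry the PII type.
    if category == "OCR Injection":  # assumed category string
        return ("ocr_injection",)
    categories = ["pii_leakage"]
    if pii_type:
        categories.append(normalize_pii_type(pii_type))
    return tuple(categories)


def build_prompt_pair(
    image_path: str, query: str, category: str, pii_type: Optional[str] = None
) -> Tuple[SketchPrompt, SketchPrompt]:
    # Both prompts share one group id; the image comes first, the query second.
    group_id = uuid.uuid4()
    harms = harm_categories_for(category, pii_type)
    image = SketchPrompt(image_path, "image_path", group_id, 0, harms)
    text = SketchPrompt(query, "text", group_id, 1, harms)
    return image, text
```

Sharing one `prompt_group_id` across the two prompts is what lets a consumer reassemble the image and its query into a single multimodal message, with `sequence` fixing their order.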

New enums

| Enum | Values |
| --- | --- |
| VisualLeakBenchCategory | OCR_INJECTION, PII_LEAKAGE |
| VisualLeakBenchPIIType | EMAIL, DOB, PHONE, PASSWORD, PIN, API_KEY, SSN, CREDIT_CARD |

Usage

from pyrit.datasets.seed_datasets.remote import (
    _VisualLeakBenchDataset,
    VisualLeakBenchCategory,
    VisualLeakBenchPIIType,
)

# Load only PII leakage examples for SSN and Password
loader = _VisualLeakBenchDataset(
    categories=[VisualLeakBenchCategory.PII_LEAKAGE],
    pii_types=[VisualLeakBenchPIIType.SSN, VisualLeakBenchPIIType.PASSWORD],
)
dataset = await loader.fetch_dataset()

Test coverage

28 unit tests covering init validation, OCR/PII pair creation, harm category mapping, category/PII-type filtering, max_examples, failed image handling, and metadata correctness. Integration test updated to cap image downloads at max_examples=6 (same pattern as _VLSUMultimodalDataset).

Additionally: refactoring VisualLeakBench + VLSU loaders

Applied during code review to both the new loader and the existing _VLSUMultimodalDataset:

  • Moved prompt constants from module-level to class-level (OCR_INJECTION_PROMPT, PII_LEAKAGE_PROMPT)
  • Made class metadata attributes immutable (frozenset for tags, tuple for modalities/harm_categories)
  • Extracted _matches_filters() and _build_prompt_pair_async() helpers from long fetch_dataset methods
  • Replaced manual setup_memory fixture with @pytest.mark.usefixtures("patch_central_database") in tests
  • Renamed test_failed_image_download_skips_example to test_all_images_fail_produces_empty_dataset for clarity
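
As a rough illustration of what a filter helper like `_matches_filters()` might do, the sketch below applies category and PII-type filters to metadata rows and then caps the result at `max_examples`. Everything here is an assumption made for illustration: the column names `category` and `pii_type`, the enum values other than `"PII Leakage"`, the choice to apply `pii_types` only to PII rows, and the `select_rows` wrapper. It is not the code in this PR.

```python
from enum import Enum
from itertools import islice
from typing import Dict, Iterable, List, Optional, Sequence


class Category(Enum):
    OCR_INJECTION = "OCR Injection"  # assumed value
    PII_LEAKAGE = "PII Leakage"  # value quoted in this PR


class PIIType(Enum):
    SSN = "SSN"  # assumed value
    PASSWORD = "Password"  # assumed value


def matches_filters(
    row: Dict[str, str],
    categories: Optional[Sequence[Category]] = None,
    pii_types: Optional[Sequence[PIIType]] = None,
) -> bool:
    # A filter set to None means "no restriction on this field".
    if categories is not None:
        if row["category"] not in {c.value for c in categories}:
            return False
    # Assumption: the PII-type filter only constrains PII leakage rows.
    if pii_types is not None and row["category"] == Category.PII_LEAKAGE.value:
        return row.get("pii_type") in {p.value for p in pii_types}
    return True


def select_rows(
    rows: Iterable[Dict[str, str]],
    categories: Optional[Sequence[Category]] = None,
    pii_types: Optional[Sequence[PIIType]] = None,
    max_examples: Optional[int] = None,
) -> List[Dict[str, str]]:
    # Filter lazily, then stop after max_examples matches (None keeps all).
    matching = (r for r in rows if matches_filters(r, categories, pii_types))
    return list(islice(matching, max_examples))
```

Keeping the predicate separate from the loop makes each filter rule testable on a single row, which is the usual motivation for extracting such a helper from a long fetch method.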

Fix: strict enum validation across all dataset loaders

Replaced leaky set-based validation with strict isinstance checks in all dataset loaders that accept enum-typed filter parameters. Previously, raw strings matching enum values (e.g. "PII Leakage") would silently pass __init__ validation but crash downstream when .value was called on them.

Added _validate_enums / _validate_enum helpers to _RemoteDatasetLoader base class. Applied to:

  • visual_leak_bench_dataset.py (categories, pii_types)
  • vlsu_multimodal_dataset.py (categories)
  • harmbench_multimodal_dataset.py (categories)
  • promptintel_dataset.py (severity, categories)

Non-enum values are now rejected immediately with a clear error listing valid enum members.
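
To make the failure mode concrete, here is a minimal, self-contained sketch contrasting the two validation styles. `leaky_validate` is a guess at what a set-based check could look like, not the removed code, and `strict_validate` is a hypothetical stand-in for the `_validate_enums` helper, whose real signature is not shown in this PR. The `"OCR Injection"` value is likewise assumed.

```python
from enum import Enum
from typing import Iterable, Type


class Category(Enum):
    OCR_INJECTION = "OCR Injection"  # assumed value
    PII_LEAKAGE = "PII Leakage"  # value quoted in this PR


def leaky_validate(values: Iterable[object], enum_cls: Type[Enum]) -> None:
    """Set-based check: a raw string equal to an enum value slips through."""
    valid = {member.value for member in enum_cls}
    for item in values:
        candidate = item.value if isinstance(item, Enum) else item
        if candidate not in valid:
            raise ValueError(f"Invalid value: {item!r}")


def strict_validate(values: Iterable[object], enum_cls: Type[Enum]) -> None:
    """isinstance check: anything that is not an enum member is rejected."""
    for item in values:
        if not isinstance(item, enum_cls):
            members = ", ".join(f"{enum_cls.__name__}.{m.name}" for m in enum_cls)
            raise ValueError(
                f"Expected {enum_cls.__name__}, got {item!r}. Valid members: {members}"
            )


# The leaky check accepts the raw string without complaint...
leaky_validate(["PII Leakage"], Category)

# ...but later code that assumes enum members fails on attribute access.
try:
    [c.value for c in ["PII Leakage"]]
except AttributeError as exc:
    print(f"downstream crash: {exc}")

# The strict check fails fast at construction time instead.
try:
    strict_validate(["PII Leakage"], Category)
except ValueError as exc:
    print(f"rejected early: {exc}")
```

Failing in `__init__` keeps the error next to the caller's mistake instead of surfacing it later inside the filtering code.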

Copilot AI changed the title [WIP] Add dataset loader for VisualLeakBench in PyRIT Add VisualLeakBench dataset loader (arXiv:2603.13385) Mar 22, 2026
Copilot AI requested a review from romanlutz March 22, 2026 20:21
@romanlutz romanlutz marked this pull request as ready for review April 11, 2026 00:40
romanlutz and others added 2 commits April 10, 2026 17:41
- Move prompt constants from module-level to class-level (OCR_INJECTION_PROMPT, PII_LEAKAGE_PROMPT)
- Make class metadata immutable (frozenset for tags, tuples for modalities/harm_categories)
- Extract _build_prompt_pair_async and _matches_filters helpers in both loaders
- Use patch_central_database fixture instead of manual CentralMemory setup in tests
- Rename test_failed_image_download_skips_example to test_all_images_fail_produces_empty_dataset

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz romanlutz changed the title Add VisualLeakBench dataset loader (arXiv:2603.13385) FEAT Add VisualLeakBench dataset loader (arXiv:2603.13385) Apr 11, 2026
romanlutz and others added 2 commits April 10, 2026 18:07
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adrian-gavrila adrian-gavrila left a comment

All looks good except for one potential minor bug

romanlutz and others added 5 commits April 13, 2026 11:29
Replace set-based validation with isinstance checks for categories and
pii_types parameters. Previously, raw strings matching enum values
(e.g. 'PII Leakage') would pass __init__ but crash in _matches_filters
when .value was called on them. Now non-enum values are rejected
immediately with a clear error message.

Add tests for the edge case where raw strings match enum values.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolve conflict in integration test by adopting main's smoke-test
structure (3 representative providers instead of exhaustive list).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Normalize string values to their enum equivalents in __init__ via
_normalize_enums(), so users can pass either VisualLeakBenchCategory.PII_LEAKAGE
or 'PII Leakage'. Invalid strings still raise ValueError with a helpful message
listing valid values. This is consistent with how vlsu_multimodal_dataset and
other loaders in the repo accept string-based filtering.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace leaky set-based validation with strict isinstance checks in four
dataset loaders. Previously, raw strings matching enum values would pass
__init__ validation but crash downstream when .value was called on them.

Add _validate_enums/_validate_enum helpers to _RemoteDatasetLoader base
class so all subclasses can reuse them. Apply the fix to:
- visual_leak_bench_dataset.py (categories, pii_types)
- vlsu_multimodal_dataset.py (categories)
- harmbench_multimodal_dataset.py (categories)
- promptintel_dataset.py (severity, categories)

Non-enum values are now rejected immediately with a clear error message
listing valid enum members.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

FEAT VisualLeakBench

3 participants