FEAT: Add Image functionality to TAP by awksrj · Pull Request #1036 · microsoft/PyRIT

awksrj · 2025-07-30T15:52:29Z

Description

This PR makes TAP (Tree of Attacks with Pruning) work with image generation targets and improves its resilience to blocked/content-filtered responses. Originally started by @awksrj, then brought up to date with main and significantly expanded.

Core changes

1. error_score_map — Resilient error handling for TAP

Adds an error_score_map parameter to TreeOfAttacksWithPruningAttack that maps response error types (e.g., "blocked") to fixed scores instead of letting the scorer crash and the branch get pruned. This prevents premature termination when all initial branches hit content filters.

Default: {"blocked": 0.0} — blocked branches survive with score 0 and are only pruned when width is exceeded
Pass {} to disable and restore previous behavior
Validated at construction time: keys must be valid PromptResponseError values, scores in [0, 1]
Synthetic scores are persisted to memory for audit trail
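
The validation, interception, and width-based pruning described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function names are hypothetical, and the set of valid error strings is an assumption standing in for the real PromptResponseError values.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: stand-in for the valid PromptResponseError values; the real set may differ.
VALID_ERRORS = {"none", "blocked", "processing", "empty", "unknown"}


def validate_error_score_map(error_score_map: Optional[Dict[str, float]]) -> Dict[str, float]:
    """Return a validated copy of the map, applying the default when None is passed."""
    if error_score_map is None:
        return {"blocked": 0.0}
    for error, score in error_score_map.items():
        if error not in VALID_ERRORS:
            raise ValueError(f"Unknown error type: {error!r}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"Score for {error!r} must be in [0, 1], got {score}")
    # Copy so callers cannot mutate the map after construction.
    return dict(error_score_map)


def score_for_error(error: str, error_score_map: Dict[str, float]) -> Optional[float]:
    """Return the fixed score for a mapped error, or None to defer to the real scorer."""
    return error_score_map.get(error)


def prune_to_width(branches: List[Tuple[str, float]], width: int) -> List[Tuple[str, float]]:
    """Keep the `width` highest-scoring branches; ties keep their original order."""
    return sorted(branches, key=lambda branch: branch[1], reverse=True)[:width]
```

With this shape, a blocked branch receives 0.0 and stays in the tree as long as the number of branches does not exceed the width, while an empty map means every response is deferred to the scorer.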

2. Single-turn target support via TargetCapabilities

TAP now checks objective_target.capabilities.supports_multi_turn and generates a fresh conversation_id before each prompt send for single-turn targets (like image generators). No special configuration needed.
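
As a rough sketch of the conversation handling described above: a multi-turn target keeps its conversation, while a single-turn target gets a new ID for every send. The helper name and signature are hypothetical; only the supports_multi_turn flag comes from the description.

```python
import uuid
from typing import Optional


def conversation_id_for_next_prompt(current_id: Optional[str], supports_multi_turn: bool) -> str:
    """Pick the conversation ID to use for the next prompt send."""
    if supports_multi_turn and current_id is not None:
        # Multi-turn targets continue the existing conversation.
        return current_id
    # Single-turn targets (e.g. image generators) start fresh on every send.
    return str(uuid.uuid4())
```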

3. Multimodal scoring fixes

SelfAskScaleScorer._score_piece_async now handles non-text content (images, audio) correctly by sending the raw content with its original data type and prepending the objective as a text piece — matching the pattern already used by SelfAskTrueFalseScorer
FloatScaleScorer._score_value_with_llm passes through prepended_text_message_piece to the parent class
TAP's default scorer auto-detects target output modalities and configures ScorerPromptValidator with the right supported types
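
The "prepend the objective as a text piece" pattern might look something like the sketch below. The Piece class and build_scoring_pieces function are hypothetical stand-ins, not PyRIT types, and the handling of plain-text responses is an assumption made only to keep the example complete.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Piece:
    """Hypothetical stand-in for a single message piece."""

    data_type: str  # e.g. "text", "image_path", "audio_path"
    value: str


def build_scoring_pieces(objective: str, response: Piece) -> List[Piece]:
    """Build the pieces sent to the scorer LLM for one response."""
    objective_piece = Piece(data_type="text", value=f"objective: {objective}")
    if response.data_type == "text":
        # Assumption: text responses are folded into one combined text piece.
        return [Piece("text", f"{objective_piece.value}\nresponse: {response.value}")]
    # Non-text content is sent as-is, with the objective prepended as text.
    return [objective_piece, response]
```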

4. Default scoring scale fix

TAP default scorer now uses TASK_ACHIEVED_SCALE instead of TREE_OF_ATTACKS_SCALE
Changed task_achieved_scale.yaml category from "jailbreak" to "task_achievement" so the scorer LLM evaluates objective completion rather than harmfulness

5. TAPSystemPromptPaths enum

Added TAPSystemPromptPaths enum (matching RTASystemPromptPaths pattern) with TEXT_GENERATION and IMAGE_GENERATION variants. The image generation system prompt is tailored for single-turn image models.
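
For readers unfamiliar with the pattern, an enum of prompt paths could be shaped like this. The class name, directory, and file names below are hypothetical placeholders; only the two variant names come from the description.

```python
import enum
import pathlib

# Hypothetical location; the real prompt files live wherever PyRIT stores them.
PROMPT_DIR = pathlib.Path("prompts") / "tree_of_attacks"


class ExampleTAPSystemPromptPaths(enum.Enum):
    """Illustrative enum mapping each generation mode to a system prompt file."""

    TEXT_GENERATION = PROMPT_DIR / "text_generation.yaml"
    IMAGE_GENERATION = PROMPT_DIR / "image_generation.yaml"
```

A caller would then read the path from the member's value, for example ExampleTAPSystemPromptPaths.IMAGE_GENERATION.value.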

Documentation

Added image generation target example to doc/code/executor/attack/tap_attack.py and .ipynb
Notebook executed with real APIs — image TAP scores 0.95 (SUCCESS) for a cat-with-hat objective

Tests

11 new unit tests for error_score_map behavior (validation, interception, propagation, opt-out)
8 parametrized scenario tests exercising the full TAP loop with mocked nodes across multiple depths, covering blocked recovery, mixed errors, gradual improvement, off-topic recovery, and threshold plateau
2 integration tests for TAP with text and image targets

Related Issue

Closes: #585

Behavioral changes

Default error_score_map: All existing TAP users will automatically get {"blocked": 0.0}. Pass error_score_map={} to restore previous behavior where blocked responses go through normal scoring.
Default scoring scale: Changed from jailbreak-oriented to task-achievement-oriented. This affects the default SelfAskScaleScorer used when no attack_scoring_config is provided.

awksrj · 2025-08-01T16:13:42Z

Thanks for all the comments. I'll go through them and push changes soon!

…PyRIT into feature/tap-image-target

awksrj · 2025-08-06T15:19:38Z

I added two unit tests to cover the pruning logic, ensuring blocked responses are scored as 0.0 and pruning only occurs when we exceed tree_width. I also updated the example in tree_of_attacks_with_pruning.py, which used to show how the old TreeOfAttacksWithPruningOrchestrator worked with text targets. I replaced it with the new TAPAttack class to reflect the current implementation, which hopefully makes the documentation more complete.

romanlutz

One of the maintainers should run the notebook as well once it exists, just to make sure we aren't missing anything.

…p-image-target

…attack notebooks, run notebooks

Resolve conflicts: keep both error_score_map (PR) and initial_prompt/prepended_conversation_config (main). Take main version for doc files (need separate follow-up).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add error_score_map parameter to TreeOfAttacksWithPruningAttack and _TreeOfAttacksNode that maps response error types to fixed scores. This prevents premature branch pruning when targets return blocked or content-filtered responses (e.g., image generation targets).

Key changes:
- Default error_score_map maps 'blocked' -> 0.0 (pass {} to disable)
- Intercepts mapped errors in _score_response_async before calling scorer
- Creates synthetic float_scale Score for mapped errors
- Propagates map through duplicate() and _create_attack_node()
- Copies dict to avoid shared mutable state

Updates from original PR microsoft#1036:
- Adapted to current Message/MessagePiece API (was PromptRequestResponse)
- Fixed Score constructor args (message_piece_id, score_category as list)
- Made default None -> {'blocked': 0.0} per reviewer feedback
- Added comprehensive unit tests for error interception, scoring, and map propagation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add input validation: keys must be valid PromptResponseError values, scores must be in [0, 1] range. Errors caught at construction time.
- Persist synthetic scores to memory via add_scores_to_memory()
- Fix multi-piece handling: iterate all message_pieces to find the error piece, not just the first piece
- Add validation unit tests for invalid key and out-of-range value

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add TAPSystemPromptPaths enum with TEXT_GENERATION and IMAGE_GENERATION variants, matching the RTASystemPromptPaths pattern
- Export TAPSystemPromptPaths from pyrit.executor.attack
- Add image generation target example to TAP doc (tap_attack.py/.ipynb) demonstrating use of IMAGE_GENERATION system prompt
- Add TAP integration tests for both text and image targets
- Regenerate tap_attack.ipynb from updated .py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>