@snehalvartak

This PR fixes issue #200 - two bugs in the atkgen probe's verbose output feature that prevented users from viewing the conversation between the attack model and target model during red-teaming runs.

Bug 1: Line 181 - Incorrect string indexing

  • Problem: challenge_text[1] only displayed the second character of the probe text
  • Fix: Changed to challenge_text to display the full probe text
  • Impact: Users can now see complete attack prompts in verbose mode
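The effect of the stray index can be reproduced in plain Python. This is a minimal illustration, not the probe's code; the value assigned to challenge_text is made up for the example.

```python
# Hypothetical probe text; in the probe this comes from the attack model.
challenge_text = "Tell me about your day."

# Indexing a str with [1] returns a one-character string: the second character.
buggy_output = challenge_text[1]

# Using the variable directly yields the whole string.
fixed_output = challenge_text

print(repr(buggy_output))
print(repr(fixed_output))
```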

Bug 2: Line 213 - Wrong variable displayed

  • Problem: this_attempt.prompt.turns[-1].content.text displayed the prompt sent to the model instead of the model's response
  • Fix: Changed to response_text to display the actual model response
  • Impact: Users now see the target model's actual responses, not echoes of the prompts
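The difference between the two expressions can be sketched with stand-in objects. The SimpleNamespace values below only mimic the attribute path quoted above and are not garak's classes; both strings are invented for illustration.

```python
from types import SimpleNamespace


def make_turn(text):
    """Build a stand-in turn exposing only .content.text (hypothetical helper)."""
    return SimpleNamespace(content=SimpleNamespace(text=text))


# Stand-in for the attempt: its prompt holds the turns sent to the target.
this_attempt = SimpleNamespace(
    prompt=SimpleNamespace(turns=[make_turn("What do you think of your neighbours?")])
)
response_text = "They seem friendly enough."  # hypothetical target reply

# Before the fix: the last prompt turn is what was sent, so the output echoes it.
echoed = this_attempt.prompt.turns[-1].content.text

# After the fix: the reply text itself is displayed.
shown = response_text

print("before:", echoed)
print("after: ", shown)
```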

How it works

The atkgen probe already has a verbose mode (enabled by the -vv flag, that is, when _config.system.verbose >= 2) that is meant to display each conversation turn with an emoji marker. The two bugs above prevented it from working correctly. The fixes ensure:

  • Full probe text is displayed with 🔴 marker in yellow
  • Actual model responses are displayed with 🦜 marker in bright text
  • New conversation headers display with 🆕 marker
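The gating and marker layout described above can be sketched as follows. This is a simplified assumption of the behavior, not the probe's implementation: format_turns is a hypothetical helper, raw ANSI escape codes stand in for whatever coloring mechanism the probe uses, and the header is abbreviated.

```python
# Raw ANSI escape codes, used here only as a stand-in for the probe's coloring.
YELLOW = "\033[33m"
BRIGHT = "\033[1m"
RESET = "\033[0m"


def format_turns(verbose, challenge_text, response_text):
    """Return the lines a verbose run would show; nothing below level 2."""
    if verbose < 2:
        return []
    return [
        "atkgen: 🆕 New conversation",
        f"atkgen: 🔴 probe: {YELLOW}{challenge_text}{RESET}",
        f"atkgen: 🦜 model: {BRIGHT}{response_text}{RESET}",
    ]


lines = format_turns(2, "example challenge", "example response")
for line in lines:
    print(line)
```

Returning the lines instead of printing them inside the helper keeps the gating logic easy to check without capturing stdout.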

Testing added

Added test_atkgen_verbose_output() in tests/probes/test_probes_atkgen.py that:

  • Sets verbosity to 2 to enable conversation output
  • Captures stdout during probe execution
  • Verifies all three conversation markers appear (🆕, 🔴 probe:, 🦜 model:)
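The shape of such a test can be sketched with the standard library alone. The snippet below is an assumption about the approach, not the test added in this PR: run_fake_conversation is a hypothetical stand-in for a probe run, and stdout is captured with contextlib.redirect_stdout, which may differ from the mechanism the real test uses.

```python
import io
from contextlib import redirect_stdout


def run_fake_conversation(verbose):
    """Hypothetical stand-in for a probe run; prints markers only at verbose >= 2."""
    if verbose >= 2:
        print("atkgen: 🆕 New conversation")
        print("atkgen: 🔴 probe: example challenge")
        print("atkgen: 🦜 model: example response")


def capture(verbose):
    """Run the stand-in with stdout redirected and return everything it printed."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        run_fake_conversation(verbose)
    return buffer.getvalue()


verbose_output = capture(2)
quiet_output = capture(0)

# All three conversation markers must appear at verbosity 2.
for marker in ("🆕", "🔴 probe:", "🦜 model:"):
    assert marker in verbose_output

# Below verbosity 2, no conversation text should be printed.
assert quiet_output == ""
```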

All 7 tests in test_probes_atkgen.py pass.


Verification

Steps to verify this fix works correctly:

  • Run all atkgen tests:
    python -m pytest tests/probes/test_probes_atkgen.py -v
    All 7 tests should pass.
    
  • Test verbose output manually:
    python -m garak --target_type test.Blank --probes atkgen.Tox -vv --generations 1
  • Verify the console displays:
    • atkgen: 🆕 ⋅.˳˳.⋅ॱ˙˙ॱ New conversation ॱ˙˙ॱ⋅.˳˳.⋅ 🗣️ (conversation header)
    • atkgen: 🔴 probe: [full probe text in yellow] (complete challenge text, not single character)
    • atkgen: 🦜 model: [model response in bright] (actual response, not the prompt)
  • Verify with a real model (optional - requires API key):
    export OPENAI_API_KEY="your-key"
    python -m garak --target_type openai --target_name gpt-3.5-turbo --probes atkgen.Tox -vv --generations 1
    
    This should show multi-turn conversations with visible model responses.
  • Verify that behavior outside verbose mode is unchanged:
    • Non-verbose mode (no -vv) should still show progress bars, not conversation text
    • Debug logs should still capture full conversation data

github-actions bot commented Dec 25, 2025

DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅

Fixed two bugs in the atkgen probe's verbose output (issue NVIDIA#200):

1. Line 181: Removed incorrect [1] indexing that only showed the second
   character of challenge_text instead of the full probe text
2. Line 213: Changed to print response_text (actual model response)
   instead of the prompt that was sent to the model

Added test_atkgen_verbose_output() to verify that verbose mode
(verbose >= 2) correctly displays conversation markers:
- 🆕 New conversation indicator
- 🔴 probe: Red team challenge text
- 🦜 model: Target model response text

The verbose output feature is triggered with -vv flag or when
_config.system.verbose >= 2.

Fixes NVIDIA#200

Signed-off-by: Snehal Vartak <snehal.inusa@gmail.com>
@snehalvartak snehalvartak force-pushed the fix/atkgen-verbose-output branch from 14d56f0 to 066cf4a Compare December 25, 2025 22:33
snehalvartak commented Dec 25, 2025

I have read the DCO Document and I hereby sign the DCO

@snehalvartak

recheck

@jmartin-tech jmartin-tech left a comment

Nice catch, thanks for the added tests as well.

@jmartin-tech jmartin-tech merged commit 1408cf9 into NVIDIA:main Jan 2, 2026
15 of 16 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jan 2, 2026