
ci: add fuzz regression testing and continuous fuzzing infrastructure#7173

Open

thepastaclaw wants to merge 18 commits into dashpay:develop from thepastaclaw:ci/fuzz-regression

Conversation

thepastaclaw commented Feb 23, 2026

Summary

Consolidates all active Dash Core fuzzing work into a single branch/PR
so we avoid cross-PR merge conflicts.

This PR now contains both:

  1. Fuzz-target expansion for Dash-specific code paths
    (deserialization, roundtrip, and functional validators)
  2. Fuzzing infrastructure
    (CI regression workflow, corpus tooling, and continuous fuzz daemon)

This is the end-to-end delivery for the Dash Core fuzzing initiative
(tracker #108).

Included work (folded into this PR)

The following previously separate fuzz PRs were merged into this branch
and closed in favor of this consolidated PR:

  • #7161 — Dash-specific deserialization fuzz targets
  • #7162 — roundtrip serialize/deserialize fuzz targets
  • #7166 — governance proposal validator fuzz target
  • #7167 — BLS operations + BLS IES fuzz targets
  • #7168 — asset lock/unlock validation fuzz targets
  • #7169 — special transaction validation fuzz target
  • #7172 — LLMQ message handler fuzz target
  • #7174 — CoinJoin validation fuzz targets

What this PR now delivers

1) Dash-specific fuzz target coverage

Adds broad fuzz coverage for Dash-specific serialization and functional
validation paths across:

  • evo/ (ProTx, asset lock/unlock, deterministic MN structures)
  • llmq/ (commitments, DKG/session message handling, quorum data)
  • governance/ (objects/votes/proposal validation)
  • coinjoin/ (queue/broadcast/denomination/status/entry logic)
  • bls/ (operations, aggregation, threshold ops, IES encryption)

2) CI fuzz regression workflow (test-fuzz.yml)

  • Runs after existing linux64_fuzz
  • Replays corpus inputs against all fuzz targets
  • Seeds inherited targets from bitcoin-core/qa-assets
  • Generates synthetic seeds for Dash-specific targets
  • Runs smoke tests for targets without corpus
  • Controlled by existing SKIP_LINUX64_FUZZ variable
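
As a rough illustration, the replay step reduces to a per-target loop like the one below (a minimal sketch assuming the FUZZ target-selection variable and the libFuzzer flags quoted in the review threads later on this page; paths and time bounds are placeholders, not the workflow's actual values):

# Enumerate targets; the binary aborts after printing, hence the || true
# (see the review notes below about guarding against empty output)
TARGETS=$(PRINT_ALL_FUZZ_TARGETS_AND_ABORT=1 "$FUZZ_BIN" 2>&1 || true)
for target in $TARGETS; do
    corpus_dir="corpus/$target"
    if [ -d "$corpus_dir" ]; then
        # Replay-only: -runs=0 executes every corpus input, then exits
        FUZZ="$target" timeout 3600 "$FUZZ_BIN" -runs=0 -rss_limit_mb=4000 "$corpus_dir"
    else
        # No corpus yet: short smoke run, bounded by an external timeout
        FUZZ="$target" timeout 30 "$FUZZ_BIN" -max_total_time=10
    fi
done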

3) Continuous fuzzing daemon

contrib/fuzz/continuous_fuzz_daemon.sh:

  • Loops all targets with configurable per-target runtime
  • Persists corpus at ~/fuzz_corpus/<target>/
  • Stores crash artifacts at ~/fuzz_crashes/<target>/
  • Supports daemon mode and --single-cycle
  • Includes a systemd service file for deployment
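
The daemon's inner loop is conceptually the following (a compressed sketch assuming libFuzzer's -max_total_time and -artifact_prefix flags; the real script also handles logging, locale, ASAN options, and target shuffling, as discussed later in this thread):

TIME_PER_TARGET="${TIME_PER_TARGET:-600}"    # configurable per-target runtime (seconds)
for target in $TARGETS; do
    corpus="$HOME/fuzz_corpus/$target"
    crashes="$HOME/fuzz_crashes/$target"
    mkdir -p "$corpus" "$crashes"
    # || true: a crashing target must not stop the loop over the remaining targets
    FUZZ="$target" "$FUZZ_BIN" \
        -max_total_time="$TIME_PER_TARGET" \
        -artifact_prefix="$crashes/" \
        "$corpus" || true
done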

4) Seed corpus generator

contrib/fuzz/seed_corpus_from_chain.py:

  • Extracts real serialized artifacts from a running Dash node
  • Covers blocks, special transactions, governance, MN/quorum data
  • Supports --synthetic-only for offline corpus generation

Validation

Build verification

Built fuzz binary with --enable-fuzz-binary (PROVIDE_FUZZ_MAIN_FUNCTION
deterministic mode) on macOS arm64 (Apple Silicon, clang, 16 cores).
Binary compiled successfully with all 95 Dash-specific targets registered
(252 total targets including inherited Bitcoin Core targets).
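
For reference, a deterministic (non-libFuzzer) build along these lines would look roughly like the following; everything beyond --enable-fuzz-binary is an assumption based on the upstream Bitcoin Core fuzzing setup rather than a quote from this PR:

./autogen.sh
CC=clang CXX=clang++ ./configure --enable-fuzz-binary
make -j16    # yields src/test/fuzz/fuzz with its own main() (deterministic mode)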

Target registration check

All 95 new Dash-specific fuzz targets confirmed registered in the binary:

  • 43 deserialization targets covering: evo (ProTx, asset lock/unlock,
    deterministic MN, credit pool), llmq (commitments, DKG messages, quorum
    snapshots, sig shares), governance (objects, votes), coinjoin (queue,
    broadcast, entry, status, accept), bls (IES encrypted blobs, multi-recipient)
  • 37 roundtrip targets verifying serialize-deserialize-reserialize
    consistency for all major Dash protocol structures
  • 15 functional targets: asset_lock_tx, asset_lock_tx_raw,
    asset_unlock_fee, bls_operations, bls_ies, coinjoin_broadcasttx,
    coinjoin_denominations, coinjoin_entry_addscriptsig, coinjoin_queue,
    coinjoin_status_update, deterministic_mn_list_diff,
    governance_proposal_validator, llmq_messages, special_tx_validation,
    process_message_dash

Concrete run results

Each of the 95 Dash-specific targets was exercised with 21 inputs
(20 random seed inputs of varying sizes 1-256 bytes + 1 empty input).
All targets processed every input without crashes or hangs.

Summary: 95/95 targets PASS, 0 crashes, 0 timeouts.

Category                      Targets  Inputs each  Result
Deserialization (evo)              13           21  all PASS
Deserialization (llmq)             14           21  all PASS
Deserialization (governance)        5           21  all PASS
Deserialization (coinjoin)          5           21  all PASS
Deserialization (bls)               3           21  all PASS
Deserialization (other)             3           21  all PASS
Roundtrip (all domains)            37           21  all PASS
Functional validators              15           21  all PASS

Total inputs processed: 1,995 (95 targets x 21 inputs)
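
A run of that shape can be approximated with the loop below (a sketch, not the exact harness used; it assumes the deterministic-mode binary reads one input from stdin per invocation, and DASH_TARGETS stands in for the 95-target list):

# 20 random seeds of 1-256 bytes plus one empty input
mkdir -p /tmp/smoke && : > /tmp/smoke/empty
for i in $(seq 1 20); do
    head -c $((RANDOM % 256 + 1)) /dev/urandom > "/tmp/smoke/seed_$i"
done
for target in $DASH_TARGETS; do
    for f in /tmp/smoke/*; do
        FUZZ="$target" "$FUZZ_BIN" < "$f" || echo "FAIL: $target on $f"
    done
done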

Continuous daemon validation

Daemon dry-run confirmed on Guix VM — target enumeration, corpus dir
creation, and crash dir creation all verified:

./continuous_fuzz_daemon.sh --dry-run
./continuous_fuzz_daemon.sh --single-cycle --targets bls_operations \
  --time-per-target 10

Seed corpus generation

Synthetic corpus generation validated (offline mode, no running node):

python3 contrib/fuzz/seed_corpus_from_chain.py --synthetic-only -o /tmp/corpus

CI behavior

  • Reuses existing linux64_fuzz artifacts
  • Integrated as a regression gate for corpus replay on PRs

Notes on validation scope

This validation confirms all targets build, register, and handle arbitrary
input without crashing in deterministic mode. Extended fuzzing campaigns
(with libFuzzer engine + coverage feedback) are tracked separately under
the continuous fuzzing deployment plan.

Bugs surfaced during fuzzing (pre-existing issues)

  • CoinJoin timeout path UB surfaced by fuzzing
    (CCoinJoinQueue::IsTimeOutOfBounds signed overflow case)
  • BLS library leak surfaced during long bls_operations campaigns
    (libdashbls internals)
  • Roundtrip deserialize issue found/fixed in related work:
    CQuorumSnapshot::Unserialize missing mnSkipList.clear()
    before read loop

Notes

  • This PR supersedes the listed closed fuzz PRs as the single
    review/merge unit for fuzzing.
  • Keeping this work in one branch is intended to minimize rebase churn
    while review is in progress.

github-actions (bot) commented Feb 23, 2026

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

coderabbitai (bot) commented Feb 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds fuzzing infrastructure: a new reusable GitHub Actions job (test-linux64_fuzz) in .github/workflows/build.yml and a new workflow (.github/workflows/test-fuzz.yml) that downloads build artifacts, prepares corpora (inherited + synthetic), discovers fuzz targets in a fuzz binary, replays or exercises each target, collects crashes/logs, and reports a pass/fail summary (non-zero exit on failures).

Adds contrib tools and docs: a continuous fuzzing daemon (contrib/fuzz/continuous_fuzz_daemon.sh), a corpus seeder that extracts real node data and creates synthetic seeds (contrib/fuzz/seed_corpus_from_chain.py), and documentation (contrib/fuzz/README.md).

Sequence Diagram(s)

sequenceDiagram
    participant GHA as GitHub Actions
    participant Artifact as Artifact Store
    participant Container as Fuzz Container
    participant FuzzBin as Fuzz Binary
    participant Reporter as Report Summary

    GHA->>Artifact: download build artifacts & seed corpus
    Artifact-->>GHA: deliver artifacts and corpora
    GHA->>Container: start container with inputs (bundle-key, build-target, container-path)
    Container->>FuzzBin: discover fuzz targets
    FuzzBin-->>Container: return target list
    Container->>Container: prepare corpus (synthetic + inherited)
    loop per fuzz target
        Container->>FuzzBin: run target (replay or empty-corpus run)
        FuzzBin-->>Container: execution result (pass/fail/crash)
        Container->>Container: collect artifacts, logs, crashes
    end
    Container->>Reporter: produce summary (Passed/Failed/Total)
    Reporter-->>GHA: return exit status / upload results
sequenceDiagram
    participant Seeder as seed_corpus_from_chain.py
    participant Node as Dash Node (RPC)
    participant CorpusDir as Corpus Directory

    Seeder->>Node: query recent blocks, txs, governance, masternode, quorum data
    Node-->>Seeder: return JSON/hex data
    Seeder->>CorpusDir: save serialized inputs (SHA256-named files) per target
    Seeder->>CorpusDir: generate synthetic seeds for multiple targets
    CorpusDir-->>Seeder: return per-target file counts and total size

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3 passed

  • Title check — ✅ Passed. The title 'ci: add fuzz regression testing and continuous fuzzing infrastructure' clearly and concisely summarizes the main changes: adding CI fuzz regression testing and continuous fuzzing infrastructure to the project.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 83.33%, which is sufficient; the required threshold is 80.00%.
  • Description check — ✅ Passed. The PR description is directly related to the changeset, providing a comprehensive overview of the fuzzing infrastructure additions including CI workflows, daemon scripts, and corpus tools.


coderabbitai (bot) left a comment

Actionable comments posted: 6

🧹 Nitpick comments (2)
.github/workflows/test-fuzz.yml (1)

127-138: No timeout guard on corpus replay — a hanging target can block the CI runner indefinitely.

The empty-corpus branch wraps the fuzz binary with timeout 30, but the corpus-replay branch (line 129) has no timeout. In libFuzzer, running a directory of test cases without fuzzing exits when all inputs are processed, but a target that hangs on a specific corpus input will block forever. Wrap the replay invocation with timeout (e.g., timeout $(($(find "$corpus_dir" -type f | wc -l) * 10 + 60)) or a fixed ceiling like 3600).

⏱️ Proposed fix
-            if FUZZ="$target" "$FUZZ_BIN" \
+            if FUZZ="$target" timeout 3600 "$FUZZ_BIN" \
                 -rss_limit_mb=4000 \
                 "$corpus_dir" 2>&1; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test-fuzz.yml around lines 127 - 138, The corpus-replay
block currently runs the fuzz binary with FUZZ="$target" "$FUZZ_BIN"
"$corpus_dir" without a timeout, so a hanging test can block CI; modify that
invocation to be wrapped with timeout (e.g., prefix with timeout 3600 or compute
timeout as $(($(find "$corpus_dir" -type f | wc -l) * 10 + 60))) so the command
using FUZZ="$target" "$FUZZ_BIN" "$corpus_dir" is subject to a bounded runtime,
and keep the existing success/failure handling (PASS/FAIL, PASSED/FAILED
counters) unchanged.
contrib/fuzz/continuous_fuzz_daemon.sh (1)

161-163: find with -o predicates should be parenthesized for clarity and safety.

Both crash-count expressions (-name 'crash-*' -o -name 'timeout-*' -o -name 'oom-*') work correctly as written since no other predicates are ANDed in, but without explicit \( ... \) grouping the intent is ambiguous and adding -type f later would silently break the logic.

🔧 Proposed fix
-    crash_count=$(find "$target_crashes" -name 'crash-*' -o -name 'timeout-*' -o -name 'oom-*' 2>/dev/null | wc -l)
+    crash_count=$(find "$target_crashes" \( -name 'crash-*' -o -name 'timeout-*' -o -name 'oom-*' \) 2>/dev/null | wc -l)

Apply the same fix to the total_crashes expression at line 233.

Also applies to: 231-233

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/fuzz/continuous_fuzz_daemon.sh` around lines 161 - 163, The find
invocations use multiple -name predicates combined with -o without explicit
grouping, which is ambiguous and brittle; update both the crash_count and
total_crashes find commands (the lines that set crash_count=$(find ... | wc -l)
and total_crashes=...) to wrap the name predicates in escaped parentheses \( ...
\) so the three -name tests are grouped (and if you later add -type f it won't
change the logic); ensure any -type f predicate is placed outside the grouped
parentheses or combined with -a as intended.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/test-fuzz.yml:
- Around line 93-95: The SKIPPED counter is never incremented but is initialized
and printed ("Skipped: $SKIPPED"), so either remove the unused SKIPPED variable
and its summary output or wire it up where targets are actually skipped;
specifically update the logic that checks for missing corpus directories or
other early-continue paths to increment SKIPPED (increment SKIPPED with
SKIPPED=$((SKIPPED+1)) before continuing) or delete both the SKIPPED=0
initialization and the "Skipped: $SKIPPED" summary line to avoid a misleading
always-zero metric.
- Line 25: Remove the unused environment variable
CI_FAILFAST_TEST_LEAVE_DANGLING from the workflow: locate the env entry
CI_FAILFAST_TEST_LEAVE_DANGLING in the job/env block and delete that key (and
its value) so the workflow no longer sets this unused variable.

In `@contrib/fuzz/seed_corpus_from_chain.py`:
- Around line 211-219: The code is saving JSON text as hex bytes for the
"dash_quorum_snapshot_deserialize" target (qinfo = dash_cli(...); qinfo_hex =
qinfo.encode().hex()), which is wrong for a binary-deserialize fuzz target;
instead, call an RPC that returns the serialized quorum snapshot (or obtain the
serialized blob from the JSON response if it contains a hex/base64 field) and
pass that raw hex-serialized snapshot into save_corpus_input for the
"dash_quorum_snapshot_deserialize" target; update the logic around dash_cli,
qinfo and qinfo_hex (and the branch that calls save_corpus_input) to use the
actual serialized snapshot bytes (hex string) or skip this target if a
serialized form isn't available.
- Around line 158-168: The loop over objects uses an unused loop variable named
obj_hash; rename it to _obj_hash to satisfy Ruff B007 and indicate it's
intentionally unused; update the for statement in the block that calls
json.loads(result) and iterates "for obj_hash, obj_data in objects.items()" to
"for _obj_hash, obj_data in objects.items()" while leaving the body that calls
save_corpus_input (for "dash_governance_object_deserialize" and
"dash_governance_object_roundtrip") unchanged; no other logic changes are
required.
- Line 22: Remove the unused import by deleting the line that imports the os
module in contrib/fuzz/seed_corpus_from_chain.py (the standalone "import os"
statement) to resolve the Flake8 F401 unused-import warning; ensure no other
references to os remain in functions or top-level code before removing.
- Around line 137-144: The current block saves the entire raw_tx for special
transactions to non-existent fuzz targets and doesn't extract the payload;
either remove this code path or implement it properly: when extra_payload > 0
(variable extra_payload) and tx_type in type_map, slice out the payload bytes as
payload = raw_tx[-extra_payload:] (instead of saving raw_tx), then save payload
via save_corpus_input using a target name that actually exists (update type_map
values to real fuzz target names or add the corresponding fuzz targets), and/or
guard the save loop (for suffix in ["_deserialize","_roundtrip"]) by verifying
the targets exist (or simply drop the loop and block if you choose to remove
incomplete support).


coderabbitai (bot) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@contrib/fuzz/seed_corpus_from_chain.py`:
- Line 41: The function save_corpus_input currently declares an unused parameter
label; remove the unused parameter from the function signature (change def
save_corpus_input(output_dir, target_name, data_hex):) and update any local
calls within this module that pass a label to instead pass only output_dir,
target_name, data_hex, or if callers elsewhere need labeling, alternatively wire
label into a debug/log/print inside save_corpus_input (e.g., include label in a
logging statement) so the parameter is actually used; focus your change on the
save_corpus_input definition and any direct callers to keep signatures
consistent.
- Line 46: The code calls bytes.fromhex(data_hex) which can raise ValueError for
malformed/non-hex RPC output; wrap this conversion in a try/except ValueError
around the bytes.fromhex(data_hex) call (the expression that assigns raw_bytes)
and handle failures by skipping the bad entry (e.g., continue) and optionally
logging the offending data_hex so the extractor doesn’t abort the whole
run—ensure you reference the same symbols (data_hex and raw_bytes) when adding
the try/except and any log message.

---

Duplicate comments:
In `@contrib/fuzz/seed_corpus_from_chain.py`:
- Around line 136-143: The loop is saving the full raw_tx to fuzz targets that
don't exist and ignores the comment that the payload is the last extra_payload
bytes; update the block in seed_corpus_from_chain.py to slice out only the last
extra_payload bytes (e.g., payload = raw_tx[-extra_payload:]) and pass that
payload to save_corpus_input instead of raw_tx, and also ensure the constructed
target names (the values in type_map combined with suffixes like "_deserialize"
and "_roundtrip") match actual registered fuzz target names in src/test/fuzz (or
adjust type_map to the correct target names) so save_corpus_input(output_dir,
f"{target}{suffix}", payload) writes to real targets.
- Around line 188-193: extract_masternode_list is saving the full
getrawtransaction bytes into non-existent/incorrect fuzz targets
(dash_proreg_tx_deserialize, dash_proreg_tx_roundtrip,
dash_deterministic_mn_deserialize); change the logic around
raw_tx/dash_cli/save_corpus_input so you extract the ProReg payload bytes (the
isolated ProReg payload parsed from the transaction) instead of the full
serialized transaction, and save those payload bytes to the correct existing
fuzz target names used under src/test/fuzz (match the target names used by
extract_special_txs); locate the raw_tx assignment and the loop over targets and
replace the raw_tx writes with the extracted proreg_payload variable (or re-use
the extraction routine from extract_special_txs) and ensure the target strings
passed to save_corpus_input correspond to real fuzzers.

thepastaclaw (author) commented Feb 23, 2026

Follow-up fix pushed in commit 5cac8c1:

  • Added export LC_ALL=C to contrib/fuzz/continuous_fuzz_daemon.sh to satisfy lint-shell-locale.py.
  • Local validation run on branch:
    • python3 test/lint/lint-shell-locale.py
    • python3 test/lint/lint-shell.py

coderabbitai (bot) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@contrib/fuzz/continuous_fuzz_daemon.sh`:
- Around line 171-173: The pipeline using grep against "$target_log" can return
exit status 1 when no matches are found which, combined with set -o pipefail and
set -e, will abort the daemon; modify the crash-logging pipeline (the grep ... |
while IFS= read -r line; do log "CRASH" "  $line"; done) to prevent a non-zero
exit from grep from failing the script — for example wrap that pipeline so
failures are ignored (append a harmless "|| true" to the pipeline) or
temporarily disable pipefail around this block, ensuring you still reference the
same variables and the log function and keep the existing crash_count guard.
- Around line 150-151: In continuous_fuzz_daemon.sh the inline environment
assignment for ASAN_OPTIONS overwrites any caller-provided options; change it to
preserve existing ASAN_OPTIONS and append (or prepend) the required option
(detect_leaks=0) using the standard colon-separated form so caller values like
abort_on_error=1 or symbolize=1 are retained, while still setting FUZZ="$target"
as before; update the assignment where ASAN_OPTIONS is set to merge with the
existing ASAN_OPTIONS rather than replacing it.

UdjinM6 commented Feb 23, 2026

Issues and Suggestions

High Priority

  1. test-fuzz.yml:83 — git clone failure silently ignored
    git clone --depth=1 https://github.com/bitcoin-core/qa-assets /tmp/qa-assets || true
    If the clone fails (network issue), the entire regression test runs with an empty corpus, and only the 10-second empty-corpus mode is exercised. This makes the CI step pass vacuously. Consider at minimum logging a warning, or better, failing if the clone fails (since without corpus data the regression test is mostly meaningless); see the sketch after this list.

  2. test-fuzz.yml:127-128 — Missing corpus runs for 10 seconds but timeout is 30 seconds

if FUZZ="$target" timeout 30 "$FUZZ_BIN" \
    -max_total_time=10 \

The 30s timeout wrapper around a 10s max_total_time is redundant but benign. However, libFuzzer may not always honor max_total_time if stuck in initialization — the external timeout is a good safety net. Consider aligning them or adding a comment explaining the intentional gap.

  3. seed_corpus_from_chain.py:678-683 — Incorrect payload extraction logic
extra_payload = tx.get("extraPayloadSize", 0)
if extra_payload > 0 and tx_type in type_map:
    target = type_map[tx_type]
    for suffix in ["_deserialize", "_roundtrip"]:
        if save_corpus_input(output_dir, f"{target}{suffix}", raw_tx):

The comment says "The extra payload is the last N bytes of the raw tx", but the code saves the entire raw transaction (raw_tx), not just the payload bytes. If the fuzz targets expect only the payload (as the target names like dash_proreg_tx_deserialize suggest), this gives them wrong input. If they expect the full transaction, that's fine — but the comment is misleading. Clarify the intent or actually extract just the payload bytes.

  4. continuous_fuzz_daemon.sh:370 — nproc not portable (macOS)
    JOBS=$(( $(nproc) / 2 ))
    Since this daemon is meant for developer use (not just CI), it should use a portable alternative; consider $(nproc 2>/dev/null || sysctl -n hw.ncpu).
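
For item 1, a fail-loud variant of the clone could look like this (a sketch only; the warning-only alternative would log instead of exiting):

if ! git clone --depth=1 https://github.com/bitcoin-core/qa-assets /tmp/qa-assets; then
    echo "ERROR: could not fetch qa-assets; corpus replay would be vacuous" >&2
    exit 1
fi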

Medium Priority

  1. test-fuzz.yml:111 — detect_leaks=0 in ASAN_OPTIONS
    Leak detection is disabled. This is likely intentional (leak detection in fuzz targets is often noisy), but worth a brief comment explaining why.

  2. continuous_fuzz_daemon.sh:439 — ASAN_OPTIONS only sets detect_leaks=0
    The CI workflow sets much more comprehensive sanitizer options (detect_stack_use_after_return, check_initialization_order, etc.). The daemon should either match these or document why it uses reduced options.

  3. continuous_fuzz_daemon.sh:509 — shuf is not available on macOS by default
    shuffled=$(echo "$targets" | shuf)
    Consider a fallback: shuf is GNU coreutils, not available on macOS without brew install coreutils. Could use sort -R or awk as a fallback.

  4. test-fuzz.yml:154 — ls "$corpus_dir" | wc -l counts may be misleading
    echo "::group::${target} ($(ls "$corpus_dir" | wc -l) inputs)"
    Hidden files (unlikely but possible) wouldn't be counted; use find "$corpus_dir" -maxdepth 1 -type f | wc -l for accuracy.

thepastaclaw (author) commented

Thanks for the detailed review — I pushed follow-up fixes in ef6525050e and addressed each item:

  1. test-fuzz.yml clone failure ignored

    • qa-assets clone now fails loudly and exits non-zero instead of || true.
  2. 10s run under 30s timeout

    • Kept the external timeout as a safety net and added an explicit inline comment explaining the intentional gap (timeout 30 vs -max_total_time=10) for startup hangs.
  3. seed_corpus_from_chain.py payload extraction mismatch

    • Updated to extract and store only extraPayload bytes using extraPayloadSize + CompactSize prefix validation.
    • If extraction is invalid/mismatched, it now warns and skips instead of saving incorrect data.
  4. nproc portability in daemon

    • Added portable CPU detection: nproc → sysctl -n hw.ncpu → fallback 2 (sketched after this list).
  5. detect_leaks=0 rationale in workflow

    • Added inline comment explaining this is intentional for fuzz regression due to libFuzzer/LSan noise.
  6. Daemon ASAN options narrower than CI

    • Expanded daemon ASAN_OPTIONS to align with CI defaults:
      • detect_stack_use_after_return=1
      • check_initialization_order=1
      • strict_init_order=1
      • detect_leaks=0
  7. shuf portability in daemon

    • Added fallback shuffle path when shuf is unavailable (awk+sort randomization).
  8. ls | wc -l corpus count

    • Replaced with find "$corpus_dir" -maxdepth 1 -type f | wc -l.
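
The portable CPU detection from item 4 has roughly this shape (a sketch; the committed code may differ in detail):

if command -v nproc >/dev/null 2>&1; then
    NCPU=$(nproc)
elif command -v sysctl >/dev/null 2>&1; then
    NCPU=$(sysctl -n hw.ncpu)
else
    NCPU=2    # conservative fallback
fi
JOBS=$(( NCPU / 2 ))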

Validation run locally:

  • bash -n contrib/fuzz/continuous_fuzz_daemon.sh
  • python3 -m py_compile contrib/fuzz/seed_corpus_from_chain.py

Commit: thepastaclaw@ef6525050e

thepastaclaw (author) commented

@UdjinM6 thanks for the detailed review — I addressed all 8 items in 3700fcc5562154f5f3b26d5af0cbffcfcb6b9968.

Point-by-point:

  1. test-fuzz.yml: silent git clone ... || true
    ✅ Fixed. Clone now hard-fails with explicit error output if qa-assets cannot be fetched.

  2. timeout 30 vs -max_total_time=10 gap
    ✅ Kept the safety net behavior and added an explicit comment documenting why the external timeout is intentionally larger.

  3. seed_corpus_from_chain.py: payload extraction mismatch
    ✅ Fixed by implementing real extra-payload extraction from raw special transactions (parse version/type, vin/vout CompactSize sections, locktime, payload CompactSize + payload bytes). The script now writes payload-only bytes into payload fuzz corpora.

  4. continuous_fuzz_daemon.sh: nproc portability
    ✅ Fixed with portable CPU detection fallback chain: nproc → sysctl -n hw.ncpu → getconf _NPROCESSORS_ONLN → 1.

  5. test-fuzz.yml: detect_leaks=0 rationale
    ✅ Added inline comment explaining intentional leak-detection disablement for fuzz regression stability/noise reasons.

  6. continuous_fuzz_daemon.sh: reduced ASAN options
    ✅ Updated daemon ASAN_OPTIONS to CI-aligned settings (detect_stack_use_after_return, check_initialization_order, strict_init_order, detect_leaks=0).

  7. continuous_fuzz_daemon.sh: shuf missing on macOS
    ✅ Fixed by adding shuffle_lines() with shuf when available and an awk|sort fallback otherwise (sketched after this list).

  8. test-fuzz.yml: ls | wc -l corpus counts
    ✅ Replaced with find "$corpus_dir" -maxdepth 1 -type f | wc -l.
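
The shuffle_lines() helper from item 7 can be sketched as follows (illustrative: awk decorates each line with a random key, sort -n orders by it, cut strips it):

shuffle_lines() {
    if command -v shuf >/dev/null 2>&1; then
        shuf
    else
        awk 'BEGIN { srand() } { printf "%f\t%s\n", rand(), $0 }' | sort -n | cut -f2-
    fi
}
shuffled=$(echo "$targets" | shuffle_lines)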

Validation run locally:

  • bash -n contrib/fuzz/continuous_fuzz_daemon.sh
  • python3 -m py_compile contrib/fuzz/seed_corpus_from_chain.py

coderabbitai (bot) left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
contrib/fuzz/seed_corpus_from_chain.py (1)

41-41: ⚠️ Potential issue | 🟡 Minor

Unused label parameter still present (Ruff ARG001).

The label="" parameter is declared but never referenced in the function body. Ruff still fires ARG001 on this line.

🔧 Proposed fix (drop the parameter)
-def save_corpus_input(output_dir, target_name, data_hex, label=""):
+def save_corpus_input(output_dir, target_name, data_hex):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/fuzz/seed_corpus_from_chain.py` at line 41, The function
save_corpus_input declares an unused parameter label="" which triggers Ruff
ARG001; remove the label parameter from the function signature in
save_corpus_input and update any callers that pass a label argument to stop
supplying it (or adjust them to only pass output_dir, target_name, data_hex),
and run tests/lint to ensure no remaining references to label remain.
.github/workflows/test-fuzz.yml (1)

25-25: ⚠️ Potential issue | 🟡 Minor

CI_FAILFAST_TEST_LEAVE_DANGLING is set but never referenced.

This env var is defined at the workflow level but is not consumed anywhere in this workflow or in any script it calls. It can be safely removed.

🔧 Proposed fix
-env:
-  CI_FAILFAST_TEST_LEAVE_DANGLING: 1
-
 jobs:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test-fuzz.yml at line 25, Remove the unused workflow
environment variable CI_FAILFAST_TEST_LEAVE_DANGLING from the workflow
definition: locate the environment block where CI_FAILFAST_TEST_LEAVE_DANGLING
is set and delete that key (the variable name CI_FAILFAST_TEST_LEAVE_DANGLING)
so the workflow no longer defines an unused env var; before removing, grep for
CI_FAILFAST_TEST_LEAVE_DANGLING across the repo to confirm nothing references it
and then commit the simplified workflow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/test-fuzz.yml:
- Around line 93-95: The TARGETS enumeration swallows real startup failures due
to the "|| true" and yields an empty string which makes TARGET_COUNT incorrectly
report 1; after populating TARGETS (from PRINT_ALL_FUZZ_TARGETS_AND_ABORT via
FUZZ_BIN) add an explicit guard that checks whether TARGETS is empty or contains
an error marker and aborts with a non-zero exit and an explanatory error log if
so; use the variables TARGETS, TARGET_COUNT, and FUZZ_BIN in this check (e.g.,
test for empty string or for common loader/architecture errors in TARGETS) so
the workflow fails fast on genuine startup problems rather than reporting a
vacuous pass.

In `@contrib/fuzz/seed_corpus_from_chain.py`:
- Around line 244-250: The extract_masternode_list function is saving the full
raw ProRegTx to targets that expect only the extra payload bytes; change it to
call extract_extra_payload_hex(raw_tx) and pass that payload hex to
save_corpus_input for the dash_proreg_tx_deserialize and
dash_proreg_tx_roundtrip targets (mirroring extract_special_txs), and do not
feed the full transaction there. For dash_deterministic_mn_deserialize,
determine the expected serialized CDeterministicMN format and extract/convert
the appropriate bytes from the ProRegTx payload (or use an existing helper that
converts a ProRegTx payload into a CDeterministicMN serialization) and pass that
serialized data to save_corpus_input instead of the full raw_tx; update
extract_masternode_list to reference extract_extra_payload_hex,
save_corpus_input, extract_masternode_list, and
dash_deterministic_mn_deserialize so the correct payload bytes are saved for
each target.


ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5cac8c1 and ef65250.

📒 Files selected for processing (3)
  • .github/workflows/test-fuzz.yml
  • contrib/fuzz/continuous_fuzz_daemon.sh
  • contrib/fuzz/seed_corpus_from_chain.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • contrib/fuzz/continuous_fuzz_daemon.sh

coderabbitai (bot) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (6)
.github/workflows/test-fuzz.yml (2)

25-25: ⚠️ Potential issue | 🟡 Minor

Unused environment variable CI_FAILFAST_TEST_LEAVE_DANGLING.

This variable is set but never referenced anywhere in the workflow or any called scripts.

🔧 Proposed fix
-env:
-  CI_FAILFAST_TEST_LEAVE_DANGLING: 1
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test-fuzz.yml at line 25, The environment variable
CI_FAILFAST_TEST_LEAVE_DANGLING is defined in the workflow but never used;
either remove the CI_FAILFAST_TEST_LEAVE_DANGLING entry from the workflow file
or wire it into the code path that needs it (e.g., pass it into steps as env or
reference it in called scripts/commands), ensuring references use the exact name
CI_FAILFAST_TEST_LEAVE_DANGLING so there are no unused vars left in the
workflow.

93-95: ⚠️ Potential issue | 🟡 Minor

Binary startup failure produces a vacuous pass.

|| true suppresses genuine startup failures (missing shared libraries, wrong architecture, etc.). In that case TARGETS is empty, TARGET_COUNT becomes 1 (off-by-one from echo "" | wc -l), the loop skips the single empty line, and the step exits 0 — reporting success with zero targets tested.

🛡️ Proposed fix
  TARGETS=$(PRINT_ALL_FUZZ_TARGETS_AND_ABORT=1 "$FUZZ_BIN" 2>&1 || true)
+ if [ -z "$TARGETS" ]; then
+   echo "ERROR: No fuzz targets found; fuzz binary may not have started correctly"
+   exit 1
+ fi
  TARGET_COUNT=$(echo "$TARGETS" | wc -l)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/test-fuzz.yml around lines 93 - 95, The current line uses
"TARGETS=$(PRINT_ALL_FUZZ_TARGETS_AND_ABORT=1 "$FUZZ_BIN" 2>&1 || true)" which
masks startup failures and causes an empty output to be treated as a successful
run; remove the "|| true" and instead check the command exit status and fail the
step on non-zero to surface missing libraries/arch issues. Also compute
TARGET_COUNT by counting non-empty lines in TARGETS (e.g., count lines that are
not blank) so an empty output yields 0 targets rather than 1; refer to the
variables/flags TARGETS, TARGET_COUNT, PRINT_ALL_FUZZ_TARGETS_AND_ABORT and the
executable "$FUZZ_BIN" when making these changes.
contrib/fuzz/continuous_fuzz_daemon.sh (2)

174-184: ⚠️ Potential issue | 🟡 Minor

ASAN_OPTIONS still overwrites any caller-supplied value.

The inline assignment always sets a fixed string; users running with custom options like abort_on_error=1 or symbolize=1 will silently lose them.

🛡️ Proposed fix
-    ASAN_OPTIONS="detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1:detect_leaks=0" \
+    ASAN_OPTIONS="${ASAN_OPTIONS:+${ASAN_OPTIONS}:}detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1:detect_leaks=0" \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/fuzz/continuous_fuzz_daemon.sh` around lines 174 - 184, The inline
assignment for ASAN_OPTIONS overwrites any caller-supplied value; change the
invocation around "$FUZZ_BIN" so it preserves existing ASAN_OPTIONS by
concatenating them with the defaults instead of replacing them (e.g. build
ASAN_OPTIONS using the existing ASAN_OPTIONS variable if set, then append the
defaults like
detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1:detect_leaks=0),
and use that combined variable in the inline env assignment before invoking
"$FUZZ_BIN" so options such as abort_on_error or symbolize provided by callers
are retained.

195-197: ⚠️ Potential issue | 🟠 Major

set -o pipefail + zero grep matches aborts the daemon.

When the log contains no matching lines, grep exits with status 1. With set -euo pipefail, the pipeline grep ... | while IFS= read -r line; ... propagates that status 1, which set -e then turns into a daemon abort — even though the intent is simply "skip if nothing matches".

🐛 Proposed fix
-        grep -E "SUMMARY|ERROR|BINGO|crash-|timeout-|oom-" "$target_log" 2>/dev/null | while IFS= read -r line; do
-            log "CRASH" "  $line"
-        done
+        grep -E "SUMMARY|ERROR|BINGO|crash-|timeout-|oom-" "$target_log" 2>/dev/null | while IFS= read -r line; do
+            log "CRASH" "  $line"
+        done || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/fuzz/continuous_fuzz_daemon.sh` around lines 195 - 197, The pipeline
using grep on "$target_log" can return exit status 1 when there are no matches,
which with set -euo pipefail aborts the daemon; fix by testing for matches first
and only running the pipeline when matches exist: use grep -Eq
"SUMMARY|ERROR|BINGO|crash-|timeout-|oom-" "$target_log" 2>/dev/null && grep -E
"SUMMARY|ERROR|BINGO|crash-|timeout-|oom-" "$target_log" 2>/dev/null | while
IFS= read -r line; do log "CRASH" "  $line"; done, so the while loop only runs
when grep finds matches (references: "$target_log" and the grep ... | while IFS=
read -r line; do log "CRASH" ... done pipeline).
contrib/fuzz/seed_corpus_from_chain.py (2)

41-41: ⚠️ Potential issue | 🟡 Minor

Unused label parameter in save_corpus_input.

label="" is declared but never referenced in the function body. Either use it (e.g., include it in a debug print) or drop it from the signature and all call sites.

🔧 Proposed fix (drop the parameter)
-def save_corpus_input(output_dir, target_name, data_hex, label=""):
+def save_corpus_input(output_dir, target_name, data_hex):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/fuzz/seed_corpus_from_chain.py` at line 41, The function
save_corpus_input declares an unused parameter label="" which should be removed
to avoid dead API surface; update the function signature in save_corpus_input to
drop the label parameter and remove it from all call sites (or alternatively, if
label is intended, thread it into the filename/logging within save_corpus_input
such as appending to the output filename or including in debug output) so there
are no unused parameters left; ensure references to save_corpus_input elsewhere
match the new signature (or use the label inside the function if you choose that
route).

274-299: ⚠️ Potential issue | 🟠 Major

extract_masternode_list saves the full raw transaction to payload-specific targets.

extract_special_txs correctly calls extract_extra_payload_hex() to strip the extra-payload bytes before saving to dash_proreg_tx_deserialize / dash_proreg_tx_roundtrip. extract_masternode_list bypasses that extraction and writes the entire raw transaction to the same targets, producing structurally mismatched corpus inputs that will generate only spurious parse errors at fuzz time. dash_deterministic_mn_deserialize almost certainly expects CDeterministicMN bytes, not a full ProRegTx.

🐛 Proposed fix
                 raw_tx = dash_cli("getrawtransaction", protx_hash, datadir=datadir)
                 if raw_tx:
-                    for target in ["dash_proreg_tx_deserialize", "dash_proreg_tx_roundtrip",
-                                   "dash_deterministic_mn_deserialize"]:
-                        if save_corpus_input(output_dir, target, raw_tx):
-                            saved += 1
+                    # Extract payload for proreg payload-specific targets
+                    # Fetch full tx info to get extraPayloadSize
+                    tx_info_str = dash_cli("getrawtransaction", protx_hash, "1", datadir=datadir)
+                    extra_payload_size = 0
+                    if tx_info_str:
+                        try:
+                            tx_info = json.loads(tx_info_str)
+                            extra_payload_size = int(tx_info.get("extraPayloadSize", 0))
+                        except (json.JSONDecodeError, TypeError, ValueError):
+                            pass
+                    proreg_payload_hex, err = extract_extra_payload_hex(raw_tx, extra_payload_size)
+                    if proreg_payload_hex:
+                        for target in ["dash_proreg_tx_deserialize", "dash_proreg_tx_roundtrip"]:
+                            if save_corpus_input(output_dir, target, proreg_payload_hex):
+                                saved += 1
+                    # NOTE: dash_deterministic_mn_deserialize expects CDeterministicMN bytes;
+                    # verify the correct binary format and implement extraction separately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contrib/fuzz/seed_corpus_from_chain.py` around lines 274 - 299, The function
extract_masternode_list currently writes the entire raw transaction to
payload-specific fuzz targets, causing malformed inputs; instead, call the
existing extract_extra_payload_hex(raw_tx) and save its returned extra-payload
hex to the dash_proreg_tx_deserialize and dash_proreg_tx_roundtrip targets, and
do NOT save the full raw_tx there; for dash_deterministic_mn_deserialize, do not
save the full tx either—either extract the CDeterministicMN bytes (using the
same extractor if it returns the deterministic-MN blob) or skip saving to that
target until a proper extractor exists, and update the loop in
extract_masternode_list to use the extracted payload variable for the
appropriate targets.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/test-fuzz.yml:
- Around line 134-136: The corpus replay invocation using FUZZ="$target"
"$FUZZ_BIN" -rss_limit_mb=4000 "$corpus_dir" can run indefinitely because it
lacks a termination flag; modify the call that constructs the fuzzer command
(the line invoking FUZZ, "$FUZZ_BIN", and "$corpus_dir") to include a
termination option such as -runs=0 (or alternatively -max_total_time=<seconds>)
so libFuzzer will only replay the corpus and exit instead of entering unlimited
mutation mode.


ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef65250 and 3700fcc.

📒 Files selected for processing (3)
  • .github/workflows/test-fuzz.yml
  • contrib/fuzz/continuous_fuzz_daemon.sh
  • contrib/fuzz/seed_corpus_from_chain.py

UdjinM6 left a comment

pls squash fixes into original commits and rebase to get rid of "Merge remote...."

thepastaclaw (author) commented

Good catch — you're right. I’m cleaning this up now by squashing the fixup commits into their originals and rebasing to remove the merge commits (including the "Merge remote..." ones). I’ll push a cleaned history shortly.

thepastaclaw added a commit to thepastaclaw/dash that referenced this pull request Feb 24, 2026
Consolidate fuzz regression workflow, continuous fuzz daemon tooling, and Dash-specific fuzz target additions into a clean commit history for PR dashpay#7173 cleanup.
UdjinM6 commented Feb 24, 2026

> Good catch — you're right. I’m cleaning this up now by squashing the fixup commits into their originals and rebasing to remove the merge commits (including the "Merge remote..." ones). I’ll push a cleaned history shortly.

Btw, you squashed everything into one commit instead of squashing the fixup commits into their originals.

@thepastaclaw thepastaclaw marked this pull request as draft February 25, 2026 08:22
thepastaclaw added a commit to thepastaclaw/dash that referenced this pull request Feb 25, 2026
Consolidate fuzz regression workflow, continuous fuzz daemon tooling, and Dash-specific fuzz target additions into a clean commit history for PR dashpay#7173 cleanup.
thepastaclaw (author) commented

@UdjinM6 I re-checked the current head 6b23e82604 while reconciling the tracker item for this stale requested-changes state.

  • all current review threads are resolved
  • current CI is green on this head, including linux64_fuzz-build
  • the only remaining CHANGES_REQUESTED metadata still appears to be the old Mar 14 Build fails review on commit 15a96a07, not a live blocker on the current tip

If you still see a blocker on the current head, please point me at the exact thread or failing job and I’ll fix it directly. Otherwise this should be ready for a fresh look.

UdjinM6 commented Apr 18, 2026

@thepastaclaw pls check d9afcdf

thepastaclaw (author) commented

@UdjinM6 checked d9afcdfb50 and applied that follow-up review bundle on top of the previous head.

New head: 0b28423a7d

Included from that pass:

  • tightened CQuorumSnapshot::Unserialize() and added an in-place re-deserialization regression test
  • hardened fuzz regression CI to fail on zero corpus signal and upload libFuzzer crash artifacts
  • fixed the UBSan/round-trip fuzz issues (governance_proposal_validator, simplified_mn_list_diff) and expanded the RPC fuzz allow/deny exclusions so startup no longer terminates on missing Dash command coverage
  • added the explanatory/documentation cleanups in the fuzz drivers noted in the review bundle

Targeted checks run locally on this head:

  • git diff --check
  • YAML parse of .github/workflows/test-fuzz.yml via python3

If there is still a blocker on the current head, please point me at the exact diff/thread/job and I’ll fix that directly.

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b28423a7d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

thepastaclaw (author) commented

Addressed the current live block-seed review on head 777e283e3a.

  • contrib/fuzz/seed_corpus_from_chain.py now prefixes the 4-byte stream version for the upstream block and block_deserialize corpus targets, not just dash_*_{deserialize,roundtrip} targets (see the sketch after this list).
  • decode_tx and other unrelated non-Dash targets were intentionally left unchanged.

Validation on this diff:

  • python3 -m py_compile contrib/fuzz/seed_corpus_from_chain.py
  • small import/assert check confirming _needs_stream_version_prefix("block") and _needs_stream_version_prefix("block_deserialize") are true while decode_tx remains false
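
In concrete terms, the fix amounts to prepending four little-endian protocol-version bytes to each saved block seed; for example (the version value 70231 here is hypothetical, not taken from the script):

# 70231 = 0x00011257 -> little-endian bytes 57 12 01 00
printf '\x57\x12\x01\x00' | cat - block_seed.bin > block_seed.prefixed.bin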

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 777e283e3a


thepastaclaw (author) commented

Addressed the current live synthetic-only corpus review on head ad26d88dd4.

What changed in .github/workflows/test-fuzz.yml:

  • the corpus gate now tracks external corpus availability separately from synthetic seed generation
  • the job now hard-fails if neither bitcoin-core/qa-assets nor dashpay/qa-assets was successfully loaded
  • synthetic seeds remain an additive layer only; they can no longer keep the regression job green by themselves during external corpus outages (gate logic sketched below)
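
Conceptually, the gate reduces to the following (a sketch; variable and function names are illustrative, not the workflow's actual ones):

EXTERNAL_CORPUS_OK=0
if git clone --depth=1 https://github.com/bitcoin-core/qa-assets /tmp/qa-assets; then
    EXTERNAL_CORPUS_OK=1
elif git clone --depth=1 https://github.com/dashpay/qa-assets /tmp/qa-assets; then
    EXTERNAL_CORPUS_OK=1
fi
generate_synthetic_seeds    # hypothetical helper; additive only, never satisfies the gate
if [ "$EXTERNAL_CORPUS_OK" -eq 0 ]; then
    echo "ERROR: no external corpus (bitcoin-core/qa-assets or dashpay/qa-assets)" >&2
    exit 1
fi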

Targeted checks on this respin:

  • python3 -c "import yaml; yaml.safe_load(open('.github/workflows/test-fuzz.yml')); print('YAML OK')"
  • git diff --check

Current review-thread state on this head is back to 0 unresolved / non-outdated threads. If there is still a blocker on the current tip, please point me at the exact diff or job and I’ll fix it directly.

UdjinM6
UdjinM6 previously approved these changes Apr 19, 2026

@UdjinM6 left a comment


CI looks good https://github.com/UdjinM6/dash/actions/runs/24637960225 (pulled into develop in my repo to actually use the updated workflow in CI).

light ACK ad26d88

@thepastaclaw (Author) left a comment


Code Review

Three valid findings remain, all in the new chain-corpus extractor: two blocking bugs where seed files are structurally incompatible with their fuzz targets (missing stream-version prefix; governance seeds are inner payload rather than full CGovernanceObject wire format), and one correctness issue where chain extraction silently requires -txindex despite already knowing the block hash. One docs nitpick about a misleading libFuzzer-process comment. All prior-SHA blockers (snapshot clear, stderr contamination, UBSAN_OPTIONS, numeric LLMQ types, runner pass-through, replay timeout, biased special-tx selection) are fixed at this SHA. Speculative/inherited-policy nits (SegWit parsing, AddressFromTag edge case, pull_request_target) dropped per verifier scope.

Reviewed commit: dc5e9fe

🔴 2 blocking | 🟡 1 suggestion(s) | 💬 1 nitpick(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `contrib/fuzz/seed_corpus_from_chain.py`:
- [BLOCKING] lines 41-58: Corpus files for *_deserialize/*_roundtrip targets lack the required 4-byte stream-version prefix
  The new `_deserialize` and `_roundtrip` Dash harnesses (src/test/fuzz/deserialize_dash.cpp:59-62, src/test/fuzz/roundtrip_dash.cpp:59-61) read a 4-byte protocol version out of the input before deserializing the object, via `ds >> version; ds.SetVersion(version);`. `save_corpus_input()` writes the raw object bytes verbatim, so for every seed the first four payload bytes get consumed as the version and the remaining bytes no longer align with the object's wire format. The result is that nearly every 'real chain' seed for these targets is rejected as a malformed input, and the new CI / continuous-fuzz setup mostly replays invalid data rather than the real serialized objects this PR claims to preserve. Prepend a fixed 4-byte little-endian protocol version (e.g. `PROTOCOL_VERSION` / `INIT_PROTO_VERSION`) to `raw_bytes` when `target_name` ends in `_deserialize` or `_roundtrip` before hashing and writing.
- [BLOCKING] lines 255-276: Governance seeds are the inner DataHex payload, not a serialized CGovernanceObject
  `dash_governance_object_deserialize` and `dash_governance_object_roundtrip` deserialize a `CGovernanceObject`, whose `Serialize()` forwards to `Governance::Object` and emits `hashParent, revision, time, collateralHash, vchData, type, masternodeOutpoint, vchSig` (src/governance/common.h:70-84, src/governance/object.h:247-269). `gobject list all`'s `DataHex` field, however, is just `HexStr(vchData)` (src/governance/common.cpp:41-44) — the inner payload only. Feeding DataHex into those corpora produces seeds that cannot represent the real wire format: the first 32 payload bytes are mis-parsed as `hashParent`, and everything after is garbage. These targets will get ~zero useful coverage from chain data. Either build a synthetic `Governance::Object` around each DataHex, serialize it, and save the result, or only route DataHex into a target that genuinely accepts payload-only bytes.
- [SUGGESTION] lines 200-223: Chain extraction calls getrawtransaction without the known block hash, silently requiring -txindex
  `extract_special_txs()` has already fetched `block_hash` at line 201 but then calls `dash_cli("getrawtransaction", txid, ...)` at line 221 with only the txid. For transactions outside the mempool, Dash Core's `getrawtransaction` requires `-txindex` unless a blockhash argument is supplied, so on a default node every historical special transaction silently fails and is skipped — producing a much smaller corpus than the daemon expects. The same omission is repeated for `protx_hash` in `extract_masternode_list()` at line 294. Pass the known `block_hash` as the third positional argument to `getrawtransaction` (and for the masternode path, locate the block for the protx tx rather than relying on txindex).
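
For reference, a minimal sketch of the prefixing fix the first finding asks for. Helper and constant names mirror the ones quoted later in this thread (_needs_stream_version_prefix, save_corpus_input, STREAM_VERSION = 70240); the actual implementation in the PR may differ in detail:

import hashlib
import struct
from pathlib import Path

STREAM_VERSION = 70240  # must track PROTOCOL_VERSION in src/version.h

def _needs_stream_version_prefix(target_name):
    # The Dash deserialize/roundtrip harnesses read 4 version bytes first.
    return target_name.endswith(("_deserialize", "_roundtrip"))

def save_corpus_input(output_dir, target_name, payload_hex):
    raw_bytes = bytes.fromhex(payload_hex)
    if _needs_stream_version_prefix(target_name):
        # Prepend the little-endian protocol version so the harness's
        # `ds >> version` consumes these 4 bytes instead of object data.
        raw_bytes = struct.pack("<i", STREAM_VERSION) + raw_bytes
    target_dir = Path(output_dir) / target_name
    target_dir.mkdir(parents=True, exist_ok=True)
    seed = target_dir / hashlib.sha1(raw_bytes).hexdigest()
    if seed.exists():
        return False
    seed.write_bytes(raw_bytes)
    return True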

Comment thread contrib/fuzz/seed_corpus_from_chain.py
Comment thread contrib/fuzz/seed_corpus_from_chain.py
Comment thread contrib/fuzz/seed_corpus_from_chain.py
Comment thread src/test/fuzz/bls_operations.cpp

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a92dc61ef


Comment thread src/test/fuzz/deterministic_mn_list_diff.cpp Outdated
@thepastaclaw (Author)

Addressed the latest current-head follow-ups in f366d17cde:

  • made the operator-key mutations in deterministic_mn_list_diff and simplified_mn_list_diff deterministic instead of calling MakeNewKey(), so identical corpus inputs replay identically
  • reworded the bls_operations comment to describe the actual safety property (each invocation overwrites bls_legacy_scheme before use) instead of implying per-process isolation

I also re-checked the older synthetic-only corpus concern: the current workflow already hard-fails when neither external qa-assets source is reachable, so it no longer passes on synthetic-only replay.

@thepastaclaw (Author) left a comment


Code Review

One valid suggestion verified: the new chain-seeded corpus workflow leaves the blockmerkleroot fuzz target without seeds even though it uses the same stream-version-prefixed input format as block_deserialize. Note: Codex's suggested fix as written would write unprefixed blobs; the target must also be added to _NON_DASH_STREAM_VERSION_TARGETS. No blocking issues; Claude produced no findings.

Reviewed commit: ad26d88

🟡 1 suggestion(s)


Comment on lines +207 to +213
# Get serialized block
block_hex = dash_cli("getblock", block_hash, "0", datadir=datadir)
if block_hex:
    if save_corpus_input(output_dir, "block_deserialize", block_hex):
        saved += 1
    if save_corpus_input(output_dir, "block", block_hex):
        saved += 1
@thepastaclaw (Author)


🟡 Suggestion: Chain-seeded block corpus skips the blockmerkleroot target

extract_blocks() only seeds block_deserialize and block, but src/test/fuzz/deserialize.cpp:197-202 defines blockmerkleroot with the identical DeserializeFromFuzzingInput(buffer, block) path. That means blockmerkleroot still falls back to the empty-corpus smoke run in CI, leaving a real regression hole in the new replay workflow. The fix also requires adding blockmerkleroot to _NON_DASH_STREAM_VERSION_TARGETS at line 60 — otherwise _needs_stream_version_prefix() returns False for it and save_corpus_input() would write blobs without the 4-byte PROTOCOL_VERSION prefix that DeserializeFromFuzzingInput consumes, producing a corpus the harness cannot parse.

💡 Suggested change
# Get serialized block
block_hex = dash_cli("getblock", block_hash, "0", datadir=datadir)
if block_hex:
    if save_corpus_input(output_dir, "block_deserialize", block_hex):
        saved += 1
    if save_corpus_input(output_dir, "block", block_hex):
        saved += 1
    if save_corpus_input(output_dir, "blockmerkleroot", block_hex):
        saved += 1

source: ['codex']


@thepastaclaw (Author)


✅ Fixed in commit 505247c88483: extract_blocks now writes block hex to blockmerkleroot alongside block_deserialize and block (contrib/fuzz/seed_corpus_from_chain.py:381-386).

@thepastaclaw (Author) left a comment


Code Review

This SHA meaningfully expands Dash fuzzing infrastructure, but two of the new corpus-generation paths still miss the targets they were added to support. The result is weaker real-world fuzz regression coverage than the PR description suggests, even though the harness additions themselves look directionally good.

Reviewed commit: f366d17

🟡 2 suggestion(s)


Comment thread contrib/fuzz/seed_corpus_from_chain.py Outdated
Comment on lines +308 to +310
data_hex = obj_data.get("DataHex", "")
if data_hex:
    if save_corpus_input(output_dir, "dash_governance_object_common_deserialize", data_hex):
@thepastaclaw (Author)


🟡 Suggestion: Governance corpus extraction saves only DataHex, not a serialized Governance::Object

extract_governance_objects() writes gobject list all's DataHex straight into the dash_governance_object_common_deserialize corpus. But that fuzz target deserializes a full Governance::Object (src/test/fuzz/deserialize_dash.cpp:252-254), whose wire format includes hashParent, revision, time, collateralHash, vchData, type, masternodeOutpoint, and vchSig (src/governance/common.h:70-83). Feeding only the payload bytes means the generated corpus entries are not valid serialized governance objects, so this extraction path does not give meaningful regression coverage to the target it is named for.

source: ['codex']
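
A hedged sketch of the wrapper this finding suggests: build a full Governance::Object around DataHex instead of saving the payload alone. Field order follows the finding above, but the integer widths and the null-outpoint encoding are assumptions to verify against src/governance/common.h before relying on them:

import struct

NULL_HASH = b"\x00" * 32

def ser_compact_size(n):
    # Bitcoin-style CompactSize length prefix for the two vector fields.
    if n < 253:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)

def wrap_governance_payload(data_hex, obj_type=1, creation_time=0, collateral_hash=NULL_HASH):
    vch_data = bytes.fromhex(data_hex)
    return (
        NULL_HASH                                      # hashParent (no parent)
        + struct.pack("<i", 1)                         # revision (assumed 4-byte int)
        + struct.pack("<q", creation_time)             # time (assumed int64)
        + collateral_hash                              # collateralHash
        + ser_compact_size(len(vch_data)) + vch_data   # vchData
        + struct.pack("<i", obj_type)                  # type (assumed 4-byte int)
        + NULL_HASH + struct.pack("<I", 0xFFFFFFFF)    # masternodeOutpoint (null)
        + ser_compact_size(0)                          # vchSig (empty)
    )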


Comment thread contrib/fuzz/seed_corpus_from_chain.py Outdated
Comment on lines +451 to +462
qinfo_str = dash_cli("quorum", "info", numeric_type, qhash, datadir=datadir)
if not qinfo_str:
    continue
try:
    qinfo = json.loads(qinfo_str)
except json.JSONDecodeError:
    continue
# Extract the commitment tx if available
mining_hash = qinfo.get("minedBlock", "")
if mining_hash:
    block_hex = dash_cli("getblock", mining_hash, "0", datadir=datadir)
    if block_hex and save_corpus_input(output_dir, "block_deserialize", block_hex):
@thepastaclaw (Author)


🟡 Suggestion: extract_quorum_info() never seeds the new Dash quorum fuzz targets

The quorum extractor claims to focus on quorum data for Dash-specific fuzzing, but it only fetches quorum info, reads minedBlock, and saves the full block hex into block_deserialize. Those block seeds are already produced by extract_blocks(), while none of the new Dash quorum targets added in this PR (for example dash_quorum_snapshot_deserialize in src/test/fuzz/deserialize_dash.cpp:245) receive any corpus from this path. As written, the new quorum-specific extraction code does not actually improve coverage for the quorum-specific targets it was added to support.

source: ['codex']



@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 11dedc48f6


Comment on lines +167 to +170
if FUZZ="$target" timeout 3600 "$FUZZ_BIN" \
    -rss_limit_mb=4000 \
    -runs=0 \
    -artifact_prefix="$artifact_prefix" \


P1: Bound corpus replay runs to avoid timeout-based false failures

In the corpus replay branch, the fuzz command is invoked with -runs=0 under timeout 3600. For libFuzzer, -runs=0 means unbounded execution, so targets keep fuzzing instead of just replaying corpus inputs and the outer timeout eventually terminates them; GNU timeout then exits with code 124, which this script treats as a target failure. This makes regression runs fail for any non-empty corpus target after an hour rather than producing deterministic replay results.


@thepastaclaw (Author) left a comment


Code Review

The PR’s Dash fuzzing additions are generally solid, and I did not find any consensus or build-breaking defects in the new targets themselves. The confirmed issues are all in the surrounding fuzz infrastructure: two bugs in the continuous daemon that misreport results, one CI reproducibility problem, and a few seeding gaps that reduce the value of the new corpus generation.

Reviewed commit: d41ed26

🟡 4 suggestion(s) | 💬 2 nitpick(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `contrib/fuzz/continuous_fuzz_daemon.sh`:
- [SUGGESTION] lines 197-209: Per-target crash reporting is poisoned by old artifacts
  `run_target()` reports a crash whenever `find "$target_crashes"` sees any `crash-*`, `timeout-*`, or `oom-*` file after the run. That directory persists across cycles, so one historical artifact makes every later invocation of the same target look like a fresh failure even when the current run was clean. In daemon mode this produces repeated false positives and destroys the signal the script is supposed to provide. Snapshot the artifact count before each target run and report only newly created files, or isolate each run in a fresh artifact directory.
- [SUGGESTION] lines 217-223: `--single-cycle` ignores non-artifact target failures
  `run_target()` captures a nonzero libFuzzer exit code in `exit_code`, logs it, and then unconditionally returns success. `main()` decides whether `--single-cycle` should fail only from the change in crash-artifact count. If a target exits nonzero without writing a `crash-*` file, the script still exits 0, which breaks the advertised cron/single-cycle regression use case. The cycle result needs to incorporate per-target exit status, not just artifact creation.

In `.github/workflows/test-fuzz.yml`:
- [SUGGESTION] lines 51-67: The regression workflow is not reproducible because it pulls corpus repos at moving HEAD
  The workflow clones `bitcoin-core/qa-assets` and `dashpay/qa-assets` with `--depth=1` and uses whatever is on their default branches at runtime. That means the same Dash commit can pass or fail later solely because an external corpus changed, which makes regressions hard to bisect and reruns hard to trust. A gating fuzz-regression job should pin those corpus sources to explicit SHAs or versioned artifacts and log the resolved revisions.

In `contrib/fuzz/seed_corpus_from_chain.py`:
- [SUGGESTION] lines 41-56: The corpus seeder duplicates `PROTOCOL_VERSION` without any sync check
  `STREAM_VERSION = 70240` must stay equal to `src/version.h`'s `PROTOCOL_VERSION`, and several harnesses consume that prefix before deserializing the seed. Right now the script only documents that requirement; nothing enforces it. After the next protocol bump this script can silently emit stale seeds that no longer exercise the intended deserialize paths. Read the value from `src/version.h` or add a check that fails when the constants diverge.
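
The daemon itself is a shell script, but the snapshot-and-delta pattern the two daemon findings above ask for is easy to show in a few lines of Python; the runner invocation and flag set below are illustrative, not the daemon's actual command line:

import os
import subprocess
from pathlib import Path

ARTIFACT_GLOBS = ("crash-*", "timeout-*", "oom-*")

def count_artifacts(crash_dir):
    # Count libFuzzer artifacts currently present for one target.
    return sum(1 for pattern in ARTIFACT_GLOBS for _ in crash_dir.glob(pattern))

def run_target(fuzz_bin, target, crash_dir, runtime_s=300):
    crash_dir.mkdir(parents=True, exist_ok=True)
    before = count_artifacts(crash_dir)  # snapshot so stale files don't count
    proc = subprocess.run(
        [fuzz_bin, f"-max_total_time={runtime_s}", f"-artifact_prefix={crash_dir}/"],
        env={**os.environ, "FUZZ": target},
        check=False,
    )
    new_artifacts = count_artifacts(crash_dir) - before
    # Fail on new artifacts OR a nonzero exit, even if no file was written.
    return proc.returncode == 0 and new_artifacts == 0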

Comment thread contrib/fuzz/continuous_fuzz_daemon.sh Outdated
Comment thread contrib/fuzz/continuous_fuzz_daemon.sh Outdated
Comment thread .github/workflows/test-fuzz.yml
Comment thread contrib/fuzz/seed_corpus_from_chain.py
Comment on lines +410 to +469
def extract_quorum_info(output_dir, datadir=None):
    """Extract quorum-related data from the chain.

    Note: quorum snapshot deserialize targets expect binary-serialized
    CQuorumSnapshot data, not JSON. We extract final commitment transactions
    from blocks instead, which are already captured by extract_special_txs()
    for type 6 (TRANSACTION_QUORUM_COMMITMENT). This function focuses on
    extracting quorum memberof data as raw bytes for other quorum targets.
    """
    print("Extracting quorum data...")
    result = dash_cli("quorum", "list", datadir=datadir)
    if not result:
        return 0

    # quorum info expects a numeric llmqType, but quorum list returns string keys
    llmq_type_map = {
        "llmq_50_60": "1",
        "llmq_400_60": "2",
        "llmq_400_85": "3",
        "llmq_100_67": "4",
        "llmq_60_75": "5",
        "llmq_25_67": "6",
        "llmq_test": "100",
        "llmq_devnet": "101",
        "llmq_test_v17": "102",
        "llmq_test_dip0024": "103",
        "llmq_test_instantsend": "104",
        "llmq_devnet_dip0024": "105",
        "llmq_test_platform": "106",
        "llmq_devnet_platform": "107",
    }

    saved = 0
    try:
        quorum_list = json.loads(result)
        for qtype, hashes in quorum_list.items():
            numeric_type = llmq_type_map.get(qtype)
            if numeric_type is None:
                print(f"WARNING: Unknown quorum type '{qtype}', skipping", file=sys.stderr)
                continue
            for qhash in hashes[:5]:  # Limit per type
                # Get the quorum commitment transaction via selectquorum
                # which gives us the quorumHash we can look up in blocks
                qinfo_str = dash_cli("quorum", "info", numeric_type, qhash, datadir=datadir)
                if not qinfo_str:
                    continue
                try:
                    qinfo = json.loads(qinfo_str)
                except json.JSONDecodeError:
                    continue
                # Extract the commitment tx if available
                mining_hash = qinfo.get("minedBlock", "")
                if mining_hash:
                    block_hex = dash_cli("getblock", mining_hash, "0", datadir=datadir)
                    if block_hex and save_corpus_input(output_dir, "block_deserialize", block_hex):
                        saved += 1
    except (json.JSONDecodeError, AttributeError):
        pass

    print(f" Saved {saved} quorum corpus inputs")
@thepastaclaw (Author)


💬 Nitpick: extract_quorum_info() does not seed the quorum deserialize targets it claims to support

The function comment says it focuses on quorum-specific data for other quorum targets, but the implementation only saves mined blocks into block_deserialize. It never writes inputs for dash_quorum_snapshot_deserialize, dash_quorum_data_request_deserialize, or dash_get_quorum_rotation_info_deserialize, so those targets still rely entirely on synthetic data or empty corpora. Either narrow the function comment to match what it actually does or extend it to emit serialized quorum objects for the existing Dash quorum fuzz targets.

source: ['claude']

@thepastaclaw (Author)


✅ Fixed in commit 505247c88483: extract_quorum_info (contrib/fuzz/seed_corpus_from_chain.py:606-856) now writes seeds for dash_quorum_data_request_*, dash_quorum_snapshot_*, and dash_get_quorum_rotation_info_* using formats matching the C++ wire definitions in src/llmq/quorums.h and src/llmq/snapshot.h.

Comment on lines +65 to +72

CMutableTransaction tx;
tx.nVersion = CTransaction::SPECIAL_VERSION;
tx.nType = TRANSACTION_ASSET_LOCK;

std::vector<CTxOut> credit_outputs;
const size_t num_credit_outputs = fuzzed_data_provider.ConsumeIntegralInRange<size_t>(1, 4);
CAmount credit_outputs_total{0};
@thepastaclaw (Author)


💬 Nitpick: The asset-lock fuzz target never reaches the MoneyRange boundary it asserts

Each credit output is bounded to [1, 10 * COIN] and there are at most four of them, so credit_outputs_total never exceeds 40 * COIN. Under those bounds assert(MoneyRange(credit_outputs_total + amount)) is always true, which leaves the accumulation boundary entirely untested in this target. If the intent is to exercise overflow and max-money edges in the valid-path builder, the amount or output-count ranges need to be widened.

source: ['claude']
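
The arithmetic behind this nitpick, as a quick Python check (assuming Dash keeps the usual MAX_MONEY of 21 million * COIN):

COIN = 100_000_000
MAX_MONEY = 21_000_000 * COIN  # MoneyRange() accepts [0, MAX_MONEY]

# Worst case under the harness bounds: 4 outputs of 10 * COIN each.
worst_case_total = 4 * 10 * COIN
assert worst_case_total <= MAX_MONEY  # always true, so the assert never fires
print(f"headroom below MAX_MONEY: {MAX_MONEY - worst_case_total} base units")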

@thepastaclaw (Author)

Addressed the latest automated fuzz-infra follow-up on ci/fuzz-regression in 505247c884 (fix(fuzz): harden regression infrastructure).

What changed:

  • contrib/fuzz/continuous_fuzz_daemon.sh: per-target crash reporting now snapshots artifact counts so stale old crash files do not poison later clean runs, and --single-cycle now fails on any per-target nonzero exit, not only newly written artifacts.
  • .github/workflows/test-fuzz.yml: pinned bitcoin-core/qa-assets and dashpay/qa-assets to explicit SHAs and switched to exact-SHA fetch + detached checkout with resolved-revision logging.
  • contrib/fuzz/seed_corpus_from_chain.py: STREAM_VERSION now derives from src/version.h at runtime instead of duplicating PROTOCOL_VERSION.

Verification run locally:

  • bash -n contrib/fuzz/continuous_fuzz_daemon.sh
  • PYTHONPYCACHEPREFIX=/tmp/pycache-tracker-971 python3 -m py_compile contrib/fuzz/seed_corpus_from_chain.py
  • targeted Python import/asserts for STREAM_VERSION == 70240 and _needs_stream_version_prefix("blockmerkleroot")
  • git diff --check
  • targeted fake-fuzz-binary daemon simulation confirming stale artifacts no longer re-trigger crashes and non-artifact failures now trip --single-cycle

@thepastaclaw (Author) left a comment


Code Review

I verified the reported fuzzing and CI issues against the checked-out 11dedc48f6743c69bc5d291b45cd22030ebf7d58 tree. Four infrastructure findings and one low-severity fuzz-coverage finding are real in the current code; the CQuorumRotationInfo::Unserialize report should be dropped because, in this tree, there is no verified production deserialization path that reuses an existing CQuorumRotationInfo instance.

Reviewed commit: 11dedc4

🟡 4 suggestion(s) | 💬 1 nitpick(s)

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `contrib/fuzz/continuous_fuzz_daemon.sh`:
- [SUGGESTION] lines 197-209: Per-target crash reporting counts historical artifacts as new failures
  `run_target()` scans the entire persistent `${CRASHES_DIR}/${target}` directory after each run and logs that the target "produced" that many artifacts. Once a target has left behind any `crash-*`, `timeout-*`, or `oom-*` file, every later clean run is reported as another crash because this code never snapshots the pre-run count or isolates artifacts per invocation. That makes the daemon's per-target signal inaccurate even though the cycle-level summary correctly computes a delta.
- [SUGGESTION] lines 217-223: `--single-cycle` still exits 0 when a target fails without writing an artifact
  `run_target()` records a nonzero libFuzzer exit status in `exit_code`, but always returns success. `main()` then decides the `--single-cycle` process status only from the change in artifact count. A target that aborts before libFuzzer writes `crash-*`/`timeout-*`/`oom-*` files still makes the whole daemon exit successfully, which breaks the documented cron/regression use case for `--single-cycle`.

In `.github/workflows/test-fuzz.yml`:
- [SUGGESTION] lines 62-80: The regression workflow is not reproducible because it clones moving corpus heads
  The job pulls `bitcoin-core/qa-assets` and `dashpay/qa-assets` with `--depth=1` from their default branches at runtime. That means the same Dash commit can pass or fail later with no change in this repository, purely because an external corpus repository advanced. For a gating regression workflow, that breaks reruns and bisection; the corpora need to be pinned to explicit revisions and those revisions should be logged.

In `contrib/fuzz/seed_corpus_from_chain.py`:
- [SUGGESTION] lines 41-56: The corpus seeder hard-codes `PROTOCOL_VERSION` without any sync check
  `STREAM_VERSION = 70240` is documented as needing to match `src/version.h`'s `PROTOCOL_VERSION`, and the current tree does in fact define `PROTOCOL_VERSION` as `70240`. The problem is that nothing enforces that relationship. After the next protocol bump, this script can silently keep prepending a stale stream-version prefix to seeds for the deserialize/roundtrip harnesses, reducing coverage while still appearing to generate valid corpus entries.
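
One possible shape for the sync check this finding asks for, assuming src/version.h keeps the usual `static const int PROTOCOL_VERSION = NNNNN;` form (both the regex and the relative path are assumptions):

import re
from pathlib import Path

def protocol_version_from_source(repo_root):
    # Read PROTOCOL_VERSION out of src/version.h instead of hard-coding it.
    text = (repo_root / "src" / "version.h").read_text()
    match = re.search(r"PROTOCOL_VERSION\s*=\s*(\d+)", text)
    if match is None:
        raise RuntimeError("PROTOCOL_VERSION not found in src/version.h")
    return int(match.group(1))

# contrib/fuzz/seed_corpus_from_chain.py -> repo root is two levels up.
STREAM_VERSION = protocol_version_from_source(Path(__file__).resolve().parents[2])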

Comment thread contrib/fuzz/continuous_fuzz_daemon.sh Outdated
Comment thread contrib/fuzz/continuous_fuzz_daemon.sh Outdated
Comment thread .github/workflows/test-fuzz.yml Outdated
Comment thread contrib/fuzz/seed_corpus_from_chain.py
Comment on lines +71 to +76
const size_t num_credit_outputs = fuzzed_data_provider.ConsumeIntegralInRange<size_t>(1, 4);
CAmount credit_outputs_total{0};
for (size_t i = 0; i < num_credit_outputs; ++i) {
    const CAmount amount = fuzzed_data_provider.ConsumeIntegralInRange<CAmount>(1, 10 * COIN);
    assert(MoneyRange(amount));
    assert(MoneyRange(credit_outputs_total + amount));
@thepastaclaw (Author)


💬 Nitpick: The MoneyRange accumulation assertion is unreachable under the harness's own bounds

This harness limits num_credit_outputs to 1..4 and each amount to 1..10 * COIN, so credit_outputs_total + amount never exceeds 40 * COIN. Under those constraints assert(MoneyRange(credit_outputs_total + amount)) is always true, so the harness does not exercise any accumulation edge around MAX_MONEY; the assertion is effectively dead coverage unless the amount or count bounds are widened.

source: ['claude', 'codex']

@thepastaclaw (Author) left a comment


Code Review

The two prior findings on seed_corpus_from_chain.py are resolved on this head: governance objects are now serialized as full Governance::Objects and quorum extraction fills the new Dash quorum target directories. One remaining suggestion: several newly added Dash fuzz targets (final_commitment, governance vote types, vote_instance, vote_rec, governance_vote_file, bls_ies, coinjoin_entry) get no chain-derived or synthetic seeds, so the CI replay step falls back to a 10-second empty-corpus smoke run for them.

Reviewed commit: 505247c

🟡 1 suggestion(s)


Comment on lines +402 to +510
    # Map special tx types to fuzz target names
    type_map = {
        1: "dash_proreg_tx",  # ProRegTx
        2: "dash_proupserv_tx",  # ProUpServTx
        3: "dash_proupreg_tx",  # ProUpRegTx
        4: "dash_prouprev_tx",  # ProUpRevTx
        5: "dash_cbtx",  # CbTx (coinbase)
        6: "dash_final_commitment_tx_payload",  # Quorum commitment
        7: "dash_mnhf_tx_payload",  # MN HF signal
        8: "dash_asset_lock_payload",  # Asset Lock
        9: "dash_asset_unlock_payload",  # Asset Unlock
    }

    for h in range(max(0, height - count), height + 1):
        block_hash = dash_cli("getblockhash", str(h), datadir=datadir)
        if not block_hash:
            continue

        block_json = dash_cli("getblock", block_hash, "2", datadir=datadir)
        if not block_json:
            continue

        try:
            block = json.loads(block_json)
        except json.JSONDecodeError:
            continue

        for tx in block.get("tx", []):
            tx_type = tx.get("type", 0)
            if tx_type == 0:
                continue

            # Get raw transaction
            txid = tx.get("txid", "")
            raw_tx = dash_cli("getrawtransaction", txid, "false", block_hash, datadir=datadir)
            if not raw_tx:
                continue

            # Save full transaction
            if save_corpus_input(output_dir, "decode_tx", raw_tx):
                saved += 1

            # Extract special payload if we know the target
            extra_payload_size = tx.get("extraPayloadSize", 0)
            try:
                extra_payload_size = int(extra_payload_size)
            except (TypeError, ValueError):
                extra_payload_size = 0

            if extra_payload_size > 0 and tx_type in type_map:
                payload_hex, err = extract_extra_payload_hex(raw_tx, extra_payload_size)
                if not payload_hex:
                    print(
                        f"WARNING: Skipping special payload for tx {txid}: {err}",
                        file=sys.stderr,
                    )
                    continue

                target = type_map[tx_type]
                # Save payload bytes for both deserialize and roundtrip variants.
                for suffix in ["_deserialize", "_roundtrip"]:
                    if save_corpus_input(output_dir, f"{target}{suffix}", payload_hex):
                        saved += 1

    print(f" Saved {saved} special transaction corpus inputs")
    return saved


def extract_governance_objects(output_dir, datadir=None):
    """Extract governance objects (proposals, triggers) into structurally valid seeds.

    The fuzz target deserializes a full Governance::Object, not raw payload bytes,
    so we reconstruct the object from the RPC fields (CollateralHash, CreationTime,
    ObjectType, SigningMasternode, DataHex) and synthesize best-effort defaults for
    fields the RPC doesn't expose (hashParent, revision, vchSig). CGovernanceObject's
    non-disk serialization is just ``m_obj`` (see src/governance/object.h:248) so the
    same bytes are valid seeds for dash_governance_object_{deserialize,roundtrip}.
    """
    print("Extracting governance objects...")
    result = dash_cli("gobject", "list", "all", datadir=datadir)
    if not result:
        return 0

    saved = 0
    try:
        objects = json.loads(result)
    except (json.JSONDecodeError, AttributeError):
        return 0

    if not isinstance(objects, dict):
        return 0

    targets = [
        "dash_governance_object_common_deserialize",
        "dash_governance_object_deserialize",
        "dash_governance_object_roundtrip",
    ]

    for obj_hash, obj_data in objects.items():
        if not isinstance(obj_data, dict):
            continue
        try:
            serialized = serialize_governance_object(obj_data)
        except (ValueError, OverflowError) as e:
            print(f"WARNING: Skipping governance object {obj_hash}: {e}", file=sys.stderr)
            continue
        seed_hex = serialized.hex()
        for target in targets:
            if save_corpus_input(output_dir, target, seed_hex):
@thepastaclaw (Author)


🟡 Suggestion: Several newly added Dash fuzz targets get no seed corpus in either live or synthetic mode

A number of Dash fuzz targets added in this PR never receive any chain-derived or synthetic seeds, so when test-fuzz.yml runs them it hits the empty-corpus fallback at .github/workflows/test-fuzz.yml:159-171 and only does a 10-second smoke run instead of a structured regression replay.

Live-mode gaps in extract_special_txs/extract_governance_objects:

  • The special-tx type_map (lines 403-413) maps tx type 6 only to dash_final_commitment_tx_payload_*. The separate dash_final_commitment_* targets at src/test/fuzz/deserialize_dash.cpp:188-191 and src/test/fuzz/roundtrip_dash.cpp:148-151 deserialize a bare llmq::CFinalCommitment, not a tx payload, and never receive seeds.
  • extract_governance_objects only writes dash_governance_object_* (lines 494-498). The new vote-related targets registered at src/test/fuzz/deserialize_dash.cpp:260-274 (dash_governance_vote_*, dash_vote_instance_*, dash_vote_rec_*, dash_governance_vote_file_*) get no chain-derived seeds.

Synthetic-only gaps in create_synthetic_seeds (lines 869-898): the offline path only synthesizes seeds for dash_coinjoin_accept, dash_coinjoin_queue, dash_coinjoin_status_update, dash_recovered_sig, dash_sig_ses_ann, dash_sig_share, dash_mnauth, and dash_dkg_complaint, then clones each into its _roundtrip sibling. That leaves most new Dash targets (the ones above plus dash_bls_ies_*, dash_coinjoin_entry_*, etc.) with no corpus at all when a chain is unavailable.

Net effect: the regression workflow degrades to startup smoke tests for a large fraction of the new Dash targets unless an external corpus repo happens to provide them. Adding extractor entries (e.g. mapping tx type 6 to both final_commitment variants, deriving vote seeds from available governance vote RPC output) and broadening the synthetic dict to cover the remaining new targets would close the gap.

source: ['codex']
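
To make the extractor side of this concrete, a hedged sketch of routing one special-tx type to several targets. The 6-byte header strip for the bare CFinalCommitment case assumes a uint16 version + int32 height payload wrapper and must be verified against the C++ payload definition before use:

# Map tx type 6 to both the tx-payload target and the bare commitment target.
TYPE6_TARGETS = ("dash_final_commitment_tx_payload", "dash_final_commitment")
TYPE6_HEADER_HEX_LEN = 2 * 6  # assumed uint16 version + int32 height wrapper

def seeds_for_type6(payload_hex):
    yield TYPE6_TARGETS[0], payload_hex
    # Bare llmq::CFinalCommitment: strip the assumed payload wrapper.
    yield TYPE6_TARGETS[1], payload_hex[TYPE6_HEADER_HEX_LEN:]

def save_type6(output_dir, payload_hex, save_corpus_input):
    saved = 0
    for target, seed_hex in seeds_for_type6(payload_hex):
        for suffix in ("_deserialize", "_roundtrip"):
            if save_corpus_input(output_dir, f"{target}{suffix}", seed_hex):
                saved += 1
    return saved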


@thepastaclaw (Author)


Fixed in 93ff82d. seed_corpus_from_chain.py now: (1) emits bare dash_final_commitment_{deserialize,roundtrip} seeds from type-6 special-tx payloads, (2) derives governance vote / vote_instance / vote_rec / vote_file seeds from gobject getcurrentvotes, and (3) broadens offline synthetic coverage for the previously empty Dash targets called out here (including the final-commitment, governance-vote, BLS IES, and CoinJoin-entry families). Verified with py_compile plus targeted helper/synthetic-seed checks.


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 93ff82d3f7


        ((1).to_bytes(2, "little") + _serialize_uint32(0) + minimal_final_commitment).hex(),
    ],
    "dash_recovered_sig_deserialize": [
        "64" + "00" * 32 + "00" * 32 + "00" * 96,  # llmqType + quorumHash + id + sig


P2: Include missing fields in synthetic LLMQ signing seeds

CRecoveredSig serialization requires llmqType, quorumHash, id, msgHash, and sig (see src/llmq/signing.h), but this seed omits msgHash; the adjacent dash_sig_ses_ann_deserialize and dash_sig_share_deserialize entries also use fixed-width layouts that do not match CSigSesAnn/CSigShare serialization (VARINT sessionId plus additional hash fields in src/llmq/signing_shares.h). In synthetic-corpus regression runs, these inputs fail deserialization early and do not exercise the intended signing/share code paths.
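
A corrected minimal seed per the layout this comment describes (llmqType + quorumHash + id + msgHash + sig); the field widths are taken from the finding above, not independently verified:

def minimal_recovered_sig_seed():
    return (
        "64"          # llmqType = 100 (llmq_test)
        + "00" * 32   # quorumHash
        + "00" * 32   # id
        + "00" * 32   # msgHash (the field the current seed omits)
        + "00" * 96   # fixed-size BLS signature
    )

assert len(bytes.fromhex(minimal_recovered_sig_seed())) == 1 + 3 * 32 + 96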


    ],
    # DKG messages
    "dash_dkg_complaint_deserialize": [
        "64" + "00" * 32 + "00" * 32 + "0000" + "00",  # minimal


P2: Emit full BLS signature bytes for DKG complaint seeds

CDKGComplaint serialization ends with a fixed-size BLS signature after the two dynbitsets (src/llmq/dkgsession.h), so a minimal valid payload still needs 96 signature bytes. This synthetic seed appends only a single byte after the bitsets, making it structurally truncated; dash_dkg_complaint_* corpus replay therefore mostly hits immediate parse failure instead of complaint-handling logic.
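
Following the same logic, a minimal complaint seed would keep the two empty dynbitsets but carry a full signature; the names of the two 32-byte hash fields are assumed from CDKGComplaint, not verified here:

def minimal_dkg_complaint_seed():
    return (
        "64"          # llmqType
        + "00" * 32   # quorumHash (assumed)
        + "00" * 32   # proTxHash (assumed)
        + "00"        # first dynbitset (empty)
        + "00"        # second dynbitset (empty)
        + "00" * 96   # full 96-byte BLS signature instead of one byte
    )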


@thepastaclaw (Author) left a comment


Code Review

I re-validated the prior and current findings against the checked-out head 93ff82d3f7e2689df745623b041ebdcaaacd6c15. Two blocking issues still apply on this head: the fuzz regression workflow still runs open-ended libFuzzer sessions instead of a deterministic corpus replay, and createwallet still rejects explicit legacy-wallet requests when descriptors=false is passed without load_on_startup. The earlier governance corpus-seeding issue is resolved on this head because the generator now serializes full Governance::Object instances instead of writing raw DataHex payloads.

Reviewed commit: 93ff82d

🔴 2 blocking

1 additional finding

🔴 blocking: `createwallet` rejects explicit legacy-wallet calls whenever `descriptors` is set without `load_on_startup`

src/wallet/rpc/wallet.cpp (lines 679-682)

The new guard throws as soon as parameter 5 is present and parameter 6 is omitted, before checking whether descriptors was actually set to true. As a result, createwallet ... descriptors=false now fails even though the RPC help still documents descriptors=false as the way to request a legacy wallet and that call is otherwise unambiguous. The functional test framework masks this by supplying load_on_startup, but direct RPC and CLI callers still hit an unnecessary API break.

💡 Suggested change
if (!request.params[5].isNull() && request.params[5].get_bool() && request.params[6].isNull()) {
    throw JSONRPCError(RPC_INVALID_PARAMETER, "The createwallet RPC requires specifying the 'load_on_startup' flag when creating descriptor wallets. Dash Core v21 introduced this requirement due to breaking changes in the createwallet RPC.");
}
if (request.params[5].isNull() || request.params[5].get_bool()) {

Comment on lines +184 to +190
# Run corpus regression (replay all inputs)
echo "::group::${target} ($(find "$corpus_dir" -maxdepth 1 -type f | wc -l) inputs)"
if FUZZ="$target" timeout 3600 "$FUZZ_BIN" \
    -rss_limit_mb=4000 \
    -runs=0 \
    -artifact_prefix="$artifact_prefix" \
    "$corpus_dir" 2>&1; then
@thepastaclaw (Author)


🔴 Blocking: Fuzz regression step still starts unbounded fuzzing instead of replaying the saved corpus once

This block is labeled as a corpus regression replay, but it invokes libFuzzer with the corpus directory and -runs=0. In libFuzzer that means there is no execution cap, so after replaying the existing inputs it immediately switches into normal mutation fuzzing until the outer timeout 3600 kills it. That makes this CI job nondeterministic and much slower than intended, and it stops serving as a reliable regression check tied to the committed corpus.

💡 Suggested change
if FUZZ="$target" timeout 3600 "$FUZZ_BIN" \
    -rss_limit_mb=4000 \
    -runs=1 \
    -artifact_prefix="$artifact_prefix" \
    "$corpus_dir" 2>&1; then

source: ['codex']


