ci: add fuzz regression testing and continuous fuzzing infrastructure#7173
thepastaclaw wants to merge 18 commits into dashpay:develop from
Conversation
✅ No Merge Conflicts Detected: This PR currently has no conflicts with other open PRs.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings, and reviews can be managed with the usual CodeRabbit commands.
Walkthrough

Adds fuzzing infrastructure: a new reusable GitHub Actions job (test-linux64_fuzz) in .github/workflows/build.yml and a new workflow (.github/workflows/test-fuzz.yml) that downloads build artifacts, prepares corpora (inherited + synthetic), discovers fuzz targets in a fuzz binary, replays or exercises each target, collects crashes/logs, and reports a pass/fail summary (non-zero exit on failures). Adds contrib tools and docs: a continuous fuzzing daemon (contrib/fuzz/continuous_fuzz_daemon.sh), a corpus seeder that extracts real node data and creates synthetic seeds (contrib/fuzz/seed_corpus_from_chain.py), and documentation (contrib/fuzz/README.md).

Sequence Diagram(s)

```mermaid
sequenceDiagram
participant GHA as GitHub Actions
participant Artifact as Artifact Store
participant Container as Fuzz Container
participant FuzzBin as Fuzz Binary
participant Reporter as Report Summary
GHA->>Artifact: download build artifacts & seed corpus
Artifact-->>GHA: deliver artifacts and corpora
GHA->>Container: start container with inputs (bundle-key, build-target, container-path)
Container->>FuzzBin: discover fuzz targets
FuzzBin-->>Container: return target list
Container->>Container: prepare corpus (synthetic + inherited)
loop per fuzz target
Container->>FuzzBin: run target (replay or empty-corpus run)
FuzzBin-->>Container: execution result (pass/fail/crash)
Container->>Container: collect artifacts, logs, crashes
end
Container->>Reporter: produce summary (Passed/Failed/Total)
Reporter-->>GHA: return exit status / upload results
```

```mermaid
sequenceDiagram
participant Seeder as seed_corpus_from_chain.py
participant Node as Dash Node (RPC)
participant CorpusDir as Corpus Directory
Seeder->>Node: query recent blocks, txs, governance, masternode, quorum data
Node-->>Seeder: return JSON/hex data
Seeder->>CorpusDir: save serialized inputs (SHA256-named files) per target
Seeder->>CorpusDir: generate synthetic seeds for multiple targets
CorpusDir-->>Seeder: return per-target file counts and total size
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 6
🧹 Nitpick comments (2)
.github/workflows/test-fuzz.yml (1)
127-138: No timeout guard on corpus replay — a hanging target can block the CI runner indefinitely. The empty-corpus branch wraps the fuzz binary with `timeout 30`, but the corpus-replay branch (line 129) has no timeout. In libFuzzer, running a directory of test cases without fuzzing exits when all inputs are processed, but a target that hangs on a specific corpus input will block forever. Wrap the replay invocation with `timeout` (e.g., `timeout $(($(find "$corpus_dir" -type f | wc -l) * 10 + 60))` or a fixed ceiling like `3600`).

⏱️ Proposed fix

```diff
-                if FUZZ="$target" "$FUZZ_BIN" \
+                if FUZZ="$target" timeout 3600 "$FUZZ_BIN" \
                     -rss_limit_mb=4000 \
                     "$corpus_dir" 2>&1; then
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-fuzz.yml around lines 127 - 138, The corpus-replay block currently runs the fuzz binary with FUZZ="$target" "$FUZZ_BIN" "$corpus_dir" without a timeout, so a hanging test can block CI; modify that invocation to be wrapped with timeout (e.g., prefix with timeout 3600 or compute timeout as $(($(find "$corpus_dir" -type f | wc -l) * 10 + 60))) so the command using FUZZ="$target" "$FUZZ_BIN" "$corpus_dir" is subject to a bounded runtime, and keep the existing success/failure handling (PASS/FAIL, PASSED/FAILED counters) unchanged.

contrib/fuzz/continuous_fuzz_daemon.sh (1)
161-163: `find` with `-o` predicates should be parenthesized for clarity and safety. Both crash-count expressions (`-name 'crash-*' -o -name 'timeout-*' -o -name 'oom-*'`) work correctly as written since no other predicates are ANDed in, but without explicit `\( ... \)` grouping the intent is ambiguous, and adding `-type f` later would silently break the logic.

🔧 Proposed fix

```diff
-    crash_count=$(find "$target_crashes" -name 'crash-*' -o -name 'timeout-*' -o -name 'oom-*' 2>/dev/null | wc -l)
+    crash_count=$(find "$target_crashes" \( -name 'crash-*' -o -name 'timeout-*' -o -name 'oom-*' \) 2>/dev/null | wc -l)
```

Apply the same fix to the `total_crashes` expression at line 233.

Also applies to: 231-233
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@contrib/fuzz/continuous_fuzz_daemon.sh` around lines 161 - 163, The find invocations use multiple -name predicates combined with -o without explicit grouping, which is ambiguous and brittle; update both the crash_count and total_crashes find commands (the lines that set crash_count=$(find ... | wc -l) and total_crashes=...) to wrap the name predicates in escaped parentheses \( ... \) so the three -name tests are grouped (and if you later add -type f it won't change the logic); ensure any -type f predicate is placed outside the grouped parentheses or combined with -a as intended.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-fuzz.yml:
- Around line 93-95: The SKIPPED counter is never incremented but is initialized
and printed ("Skipped: $SKIPPED"), so either remove the unused SKIPPED variable
and its summary output or wire it up where targets are actually skipped;
specifically update the logic that checks for missing corpus directories or
other early-continue paths to increment SKIPPED (increment SKIPPED with
SKIPPED=$((SKIPPED+1)) before continuing) or delete both the SKIPPED=0
initialization and the "Skipped: $SKIPPED" summary line to avoid a misleading
always-zero metric.
- Line 25: Remove the unused environment variable
CI_FAILFAST_TEST_LEAVE_DANGLING from the workflow: locate the env entry
CI_FAILFAST_TEST_LEAVE_DANGLING in the job/env block and delete that key (and
its value) so the workflow no longer sets this unused variable.
In `@contrib/fuzz/seed_corpus_from_chain.py`:
- Around line 211-219: The code is saving JSON text as hex bytes for the
"dash_quorum_snapshot_deserialize" target (qinfo = dash_cli(...); qinfo_hex =
qinfo.encode().hex()), which is wrong for a binary-deserialize fuzz target;
instead, call an RPC that returns the serialized quorum snapshot (or obtain the
serialized blob from the JSON response if it contains a hex/base64 field) and
pass that raw hex-serialized snapshot into save_corpus_input for the
"dash_quorum_snapshot_deserialize" target; update the logic around dash_cli,
qinfo and qinfo_hex (and the branch that calls save_corpus_input) to use the
actual serialized snapshot bytes (hex string) or skip this target if a
serialized form isn't available.
- Around line 158-168: The loop over objects uses an unused loop variable named
obj_hash; rename it to _obj_hash to satisfy Ruff B007 and indicate it's
intentionally unused; update the for statement in the block that calls
json.loads(result) and iterates "for obj_hash, obj_data in objects.items()" to
"for _obj_hash, obj_data in objects.items()" while leaving the body that calls
save_corpus_input (for "dash_governance_object_deserialize" and
"dash_governance_object_roundtrip") unchanged; no other logic changes are
required.
- Line 22: Remove the unused import by deleting the line that imports the os
module in contrib/fuzz/seed_corpus_from_chain.py (the standalone "import os"
statement) to resolve the Flake8 F401 unused-import warning; ensure no other
references to os remain in functions or top-level code before removing.
- Around line 137-144: The current block saves the entire raw_tx for special
transactions to non-existent fuzz targets and doesn't extract the payload;
either remove this code path or implement it properly: when extra_payload > 0
(variable extra_payload) and tx_type in type_map, slice out the payload bytes as
payload = raw_tx[-extra_payload:] (instead of saving raw_tx), then save payload
via save_corpus_input using a target name that actually exists (update type_map
values to real fuzz target names or add the corresponding fuzz targets), and/or
guard the save loop (for suffix in ["_deserialize","_roundtrip"]) by verifying
the targets exist (or simply drop the loop and block if you choose to remove
incomplete support).
---
Nitpick comments:
In @.github/workflows/test-fuzz.yml:
- Around line 127-138: The corpus-replay block currently runs the fuzz binary
with FUZZ="$target" "$FUZZ_BIN" "$corpus_dir" without a timeout, so a hanging
test can block CI; modify that invocation to be wrapped with timeout (e.g.,
prefix with timeout 3600 or compute timeout as $(($(find "$corpus_dir" -type f |
wc -l) * 10 + 60))) so the command using FUZZ="$target" "$FUZZ_BIN"
"$corpus_dir" is subject to a bounded runtime, and keep the existing
success/failure handling (PASS/FAIL, PASSED/FAILED counters) unchanged.
In `@contrib/fuzz/continuous_fuzz_daemon.sh`:
- Around line 161-163: The find invocations use multiple -name predicates
combined with -o without explicit grouping, which is ambiguous and brittle;
update both the crash_count and total_crashes find commands (the lines that set
crash_count=$(find ... | wc -l) and total_crashes=...) to wrap the name
predicates in escaped parentheses \( ... \) so the three -name tests are grouped
(and if you later add -type f it won't change the logic); ensure any -type f
predicate is placed outside the grouped parentheses or combined with -a as
intended.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@contrib/fuzz/seed_corpus_from_chain.py`:
- Line 41: The function save_corpus_input currently declares an unused parameter
label; remove the unused parameter from the function signature (change def
save_corpus_input(output_dir, target_name, data_hex):) and update any local
calls within this module that pass a label to instead pass only output_dir,
target_name, data_hex, or if callers elsewhere need labeling, alternatively wire
label into a debug/log/print inside save_corpus_input (e.g., include label in a
logging statement) so the parameter is actually used; focus your change on the
save_corpus_input definition and any direct callers to keep signatures
consistent.
- Line 46: The code calls bytes.fromhex(data_hex) which can raise ValueError for
malformed/non-hex RPC output; wrap this conversion in a try/except ValueError
around the bytes.fromhex(data_hex) call (the expression that assigns raw_bytes)
and handle failures by skipping the bad entry (e.g., continue) and optionally
logging the offending data_hex so the extractor doesn’t abort the whole
run—ensure you reference the same symbols (data_hex and raw_bytes) when adding
the try/except and any log message.
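A minimal sketch of that guard, assuming the conversion is factored into a hypothetical helper for illustration (the actual fix, per the prompt above, lives inline in save_corpus_input):

```python
import sys

def hex_to_bytes_or_skip(data_hex):
    """Convert RPC hex output to bytes, returning None for malformed input.

    Hypothetical helper illustrating the try/except shape so one bad RPC
    response does not abort the whole extraction run.
    """
    try:
        return bytes.fromhex(data_hex)
    except ValueError:
        # Log the offending value (truncated) and let the caller skip it
        print(f"WARNING: skipping non-hex input: {data_hex[:32]}...", file=sys.stderr)
        return None
```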
---
Duplicate comments:
In `@contrib/fuzz/seed_corpus_from_chain.py`:
- Around line 136-143: The loop is saving the full raw_tx to fuzz targets that
don't exist and ignores the comment that the payload is the last extra_payload
bytes; update the block in seed_corpus_from_chain.py to slice out only the last
extra_payload bytes (e.g., payload = raw_tx[-extra_payload:]) and pass that
payload to save_corpus_input instead of raw_tx, and also ensure the constructed
target names (the values in type_map combined with suffixes like "_deserialize"
and "_roundtrip") match actual registered fuzz target names in src/test/fuzz (or
adjust type_map to the correct target names) so save_corpus_input(output_dir,
f"{target}{suffix}", payload) writes to real targets.
- Around line 188-193: extract_masternode_list is saving the full
getrawtransaction bytes into non-existent/incorrect fuzz targets
(dash_proreg_tx_deserialize, dash_proreg_tx_roundtrip,
dash_deterministic_mn_deserialize); change the logic around
raw_tx/dash_cli/save_corpus_input so you extract the ProReg payload bytes (the
isolated ProReg payload parsed from the transaction) instead of the full
serialized transaction, and save those payload bytes to the correct existing
fuzz target names used under src/test/fuzz (match the target names used by
extract_special_txs); locate the raw_tx assignment and the loop over targets and
replace the raw_tx writes with the extracted proreg_payload variable (or re-use
the extraction routine from extract_special_txs) and ensure the target strings
passed to save_corpus_input correspond to real fuzzers.
Follow-up fix pushed in commit 5cac8c1:
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@contrib/fuzz/continuous_fuzz_daemon.sh`:
- Around line 171-173: The pipeline using grep against "$target_log" can return
exit status 1 when no matches are found which, combined with set -o pipefail and
set -e, will abort the daemon; modify the crash-logging pipeline (the grep ... |
while IFS= read -r line; do log "CRASH" " $line"; done) to prevent a non-zero
exit from grep from failing the script — for example wrap that pipeline so
failures are ignored (append a harmless "|| true" to the pipeline) or
temporarily disable pipefail around this block, ensuring you still reference the
same variables and the log function and keep the existing crash_count guard.
- Around line 150-151: In continuous_fuzz_daemon.sh the inline environment
assignment for ASAN_OPTIONS overwrites any caller-provided options; change it to
preserve existing ASAN_OPTIONS and append (or prepend) the required option
(detect_leaks=0) using the standard colon-separated form so caller values like
abort_on_error=1 or symbolize=1 are retained, while still setting FUZZ="$target"
as before; update the assignment where ASAN_OPTIONS is set to merge with the
existing ASAN_OPTIONS rather than replacing it.
Issues and Suggestions

High Priority
The 30s timeout wrapper around a 10s max_total_time is redundant but benign. However, libFuzzer may not always honor max_total_time if stuck in initialization — the external timeout is a good safety net. Consider aligning them or adding a comment explaining the intentional gap.
The comment says "The extra payload is the last N bytes of the raw tx", but the code saves the entire raw transaction (raw_tx), not just the payload bytes. If the fuzz targets expect only the payload (as the target names like dash_proreg_tx_deserialize suggest), this gives them wrong input. If they expect the full transaction, that's fine — but the comment is misleading. Clarify the intent or actually extract just the payload bytes.
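For reference, a hedged sketch of the payload extraction. Note that because `raw_tx` in the script is a hex string from `getrawtransaction`, the slice must take `2 * extra_payload` hex characters, not `extra_payload`; the helper name here is illustrative, not the one in the PR:

```python
def slice_extra_payload_hex(raw_tx_hex, extra_payload):
    """Return the hex of the last `extra_payload` *bytes* of a raw tx.

    Assumes raw_tx_hex is the hex string returned by getrawtransaction
    and extra_payload is a byte count (one byte == two hex characters).
    """
    if extra_payload <= 0 or len(raw_tx_hex) < 2 * extra_payload:
        return ""
    return raw_tx_hex[-2 * extra_payload:]
```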
Medium Priority
Thanks for the detailed review — I pushed follow-up fixes in
Validation run locally:
Commit: thepastaclaw@ef6525050e
@UdjinM6 thanks for the detailed review — I addressed all 8 items in 3700fcc5562154f5f3b26d5af0cbffcfcb6b9968. Point-by-point:
Validation run locally:
Actionable comments posted: 2
♻️ Duplicate comments (2)
contrib/fuzz/seed_corpus_from_chain.py (1)
41-41: ⚠️ Potential issue | 🟡 Minor — Unused `label` parameter still present (Ruff ARG001). The `label=""` parameter is declared but never referenced in the function body. Ruff still fires ARG001 on this line.

🔧 Proposed fix (drop the parameter)

```diff
-def save_corpus_input(output_dir, target_name, data_hex, label=""):
+def save_corpus_input(output_dir, target_name, data_hex):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@contrib/fuzz/seed_corpus_from_chain.py` at line 41, The function save_corpus_input declares an unused parameter label="" which triggers Ruff ARG001; remove the label parameter from the function signature in save_corpus_input and update any callers that pass a label argument to stop supplying it (or adjust them to only pass output_dir, target_name, data_hex), and run tests/lint to ensure no remaining references to label remain.

.github/workflows/test-fuzz.yml (1)
25-25: ⚠️ Potential issue | 🟡 Minor — `CI_FAILFAST_TEST_LEAVE_DANGLING` is set but never referenced. This env var is defined at the workflow level but is not consumed anywhere in this workflow or in any script it calls. It can be safely removed.

🔧 Proposed fix

```diff
-env:
-  CI_FAILFAST_TEST_LEAVE_DANGLING: 1
-
 jobs:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-fuzz.yml at line 25, Remove the unused workflow environment variable CI_FAILFAST_TEST_LEAVE_DANGLING from the workflow definition: locate the environment block where CI_FAILFAST_TEST_LEAVE_DANGLING is set and delete that key (the variable name CI_FAILFAST_TEST_LEAVE_DANGLING) so the workflow no longer defines an unused env var; before removing, grep for CI_FAILFAST_TEST_LEAVE_DANGLING across the repo to confirm nothing references it and then commit the simplified workflow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-fuzz.yml:
- Around line 93-95: The TARGETS enumeration swallows real startup failures due
to the "|| true" and yields an empty string which makes TARGET_COUNT incorrectly
report 1; after populating TARGETS (from PRINT_ALL_FUZZ_TARGETS_AND_ABORT via
FUZZ_BIN) add an explicit guard that checks whether TARGETS is empty or contains
an error marker and aborts with a non-zero exit and an explanatory error log if
so; use the variables TARGETS, TARGET_COUNT, and FUZZ_BIN in this check (e.g.,
test for empty string or for common loader/architecture errors in TARGETS) so
the workflow fails fast on genuine startup problems rather than reporting a
vacuous pass.
In `@contrib/fuzz/seed_corpus_from_chain.py`:
- Around line 244-250: The extract_masternode_list function is saving the full
raw ProRegTx to targets that expect only the extra payload bytes; change it to
call extract_extra_payload_hex(raw_tx) and pass that payload hex to
save_corpus_input for the dash_proreg_tx_deserialize and
dash_proreg_tx_roundtrip targets (mirroring extract_special_txs), and do not
feed the full transaction there. For dash_deterministic_mn_deserialize,
determine the expected serialized CDeterministicMN format and extract/convert
the appropriate bytes from the ProRegTx payload (or use an existing helper that
converts a ProRegTx payload into a CDeterministicMN serialization) and pass that
serialized data to save_corpus_input instead of the full raw_tx; update
extract_masternode_list to reference extract_extra_payload_hex,
save_corpus_input, extract_masternode_list, and
dash_deterministic_mn_deserialize so the correct payload bytes are saved for
each target.
---
Duplicate comments:
In @.github/workflows/test-fuzz.yml:
- Line 25: Remove the unused workflow environment variable
CI_FAILFAST_TEST_LEAVE_DANGLING from the workflow definition: locate the
environment block where CI_FAILFAST_TEST_LEAVE_DANGLING is set and delete that
key (the variable name CI_FAILFAST_TEST_LEAVE_DANGLING) so the workflow no
longer defines an unused env var; before removing, grep for
CI_FAILFAST_TEST_LEAVE_DANGLING across the repo to confirm nothing references it
and then commit the simplified workflow.
In `@contrib/fuzz/seed_corpus_from_chain.py`:
- Line 41: The function save_corpus_input declares an unused parameter label=""
which triggers Ruff ARG001; remove the label parameter from the function
signature in save_corpus_input and update any callers that pass a label argument
to stop supplying it (or adjust them to only pass output_dir, target_name,
data_hex), and run tests/lint to ensure no remaining references to label remain.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/test-fuzz.yml
- contrib/fuzz/continuous_fuzz_daemon.sh
- contrib/fuzz/seed_corpus_from_chain.py
🚧 Files skipped from review as they are similar to previous changes (1)
- contrib/fuzz/continuous_fuzz_daemon.sh
Actionable comments posted: 1
♻️ Duplicate comments (6)
.github/workflows/test-fuzz.yml (2)
25-25: ⚠️ Potential issue | 🟡 Minor — Unused environment variable `CI_FAILFAST_TEST_LEAVE_DANGLING`. This variable is set but never referenced anywhere in the workflow or any called scripts.

🔧 Proposed fix

```diff
-env:
-  CI_FAILFAST_TEST_LEAVE_DANGLING: 1
-
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-fuzz.yml at line 25, The environment variable CI_FAILFAST_TEST_LEAVE_DANGLING is defined in the workflow but never used; either remove the CI_FAILFAST_TEST_LEAVE_DANGLING entry from the workflow file or wire it into the code path that needs it (e.g., pass it into steps as env or reference it in called scripts/commands), ensuring references use the exact name CI_FAILFAST_TEST_LEAVE_DANGLING so there are no unused vars left in the workflow.
93-95: ⚠️ Potential issue | 🟡 Minor — Binary startup failure produces a vacuous pass. `|| true` suppresses genuine startup failures (missing shared libraries, wrong architecture, etc.). In that case `TARGETS` is empty, `TARGET_COUNT` becomes `1` (off-by-one from `echo "" | wc -l`), the loop skips the single empty line, and the step exits 0 — reporting success with zero targets tested.

🛡️ Proposed fix

```diff
 TARGETS=$(PRINT_ALL_FUZZ_TARGETS_AND_ABORT=1 "$FUZZ_BIN" 2>&1 || true)
+if [ -z "$TARGETS" ]; then
+  echo "ERROR: No fuzz targets found; fuzz binary may not have started correctly"
+  exit 1
+fi
 TARGET_COUNT=$(echo "$TARGETS" | wc -l)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-fuzz.yml around lines 93 - 95, The current line uses "TARGETS=$(PRINT_ALL_FUZZ_TARGETS_AND_ABORT=1 "$FUZZ_BIN" 2>&1 || true)" which masks startup failures and causes an empty output to be treated as a successful run; remove the "|| true" and instead check the command exit status and fail the step on non-zero to surface missing libraries/arch issues. Also compute TARGET_COUNT by counting non-empty lines in TARGETS (e.g., count lines that are not blank) so an empty output yields 0 targets rather than 1; refer to the variables/flags TARGETS, TARGET_COUNT, PRINT_ALL_FUZZ_TARGETS_AND_ABORT and the executable "$FUZZ_BIN" when making these changes.

contrib/fuzz/continuous_fuzz_daemon.sh (2)
174-184: ⚠️ Potential issue | 🟡 Minor — `ASAN_OPTIONS` still overwrites any caller-supplied value. The inline assignment always sets a fixed string; users running with custom options like `abort_on_error=1` or `symbolize=1` will silently lose them.

🛡️ Proposed fix

```diff
-        ASAN_OPTIONS="detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1:detect_leaks=0" \
+        ASAN_OPTIONS="${ASAN_OPTIONS:+${ASAN_OPTIONS}:}detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1:detect_leaks=0" \
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@contrib/fuzz/continuous_fuzz_daemon.sh` around lines 174 - 184, The inline assignment for ASAN_OPTIONS overwrites any caller-supplied value; change the invocation around "$FUZZ_BIN" so it preserves existing ASAN_OPTIONS by concatenating them with the defaults instead of replacing them (e.g. build ASAN_OPTIONS using the existing ASAN_OPTIONS variable if set, then append the defaults like detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1:detect_leaks=0), and use that combined variable in the inline env assignment before invoking "$FUZZ_BIN" so options such as abort_on_error or symbolize provided by callers are retained.
195-197: ⚠️ Potential issue | 🟠 Major — `set -o pipefail` + zero `grep` matches aborts the daemon. When the log contains no matching lines, `grep` exits with status 1. With `set -euo pipefail`, the pipeline `grep ... | while IFS= read -r line; ...` propagates that status 1, which `set -e` then turns into a daemon abort — even though the intent is simply "skip if nothing matches".

🐛 Proposed fix

```diff
-    grep -E "SUMMARY|ERROR|BINGO|crash-|timeout-|oom-" "$target_log" 2>/dev/null | while IFS= read -r line; do
-        log "CRASH" "  $line"
-    done
+    grep -E "SUMMARY|ERROR|BINGO|crash-|timeout-|oom-" "$target_log" 2>/dev/null | while IFS= read -r line; do
+        log "CRASH" "  $line"
+    done || true
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@contrib/fuzz/continuous_fuzz_daemon.sh` around lines 195 - 197, The pipeline using grep on "$target_log" can return exit status 1 when there are no matches, which with set -euo pipefail aborts the daemon; fix by testing for matches first and only running the pipeline when matches exist: use grep -Eq "SUMMARY|ERROR|BINGO|crash-|timeout-|oom-" "$target_log" 2>/dev/null && grep -E "SUMMARY|ERROR|BINGO|crash-|timeout-|oom-" "$target_log" 2>/dev/null | while IFS= read -r line; do log "CRASH" " $line"; done, so the while loop only runs when grep finds matches (references: "$target_log" and the grep ... | while IFS= read -r line; do log "CRASH" ... done pipeline).

contrib/fuzz/seed_corpus_from_chain.py (2)
41-41: ⚠️ Potential issue | 🟡 Minor — Unused `label` parameter in `save_corpus_input`. `label=""` is declared but never referenced in the function body. Either use it (e.g., include it in a debug log) or drop it.

🔧 Proposed fix (drop the parameter)

```diff
-def save_corpus_input(output_dir, target_name, data_hex, label=""):
+def save_corpus_input(output_dir, target_name, data_hex):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@contrib/fuzz/seed_corpus_from_chain.py` at line 41, The function save_corpus_input declares an unused parameter label="" which should be removed to avoid dead API surface; update the function signature in save_corpus_input to drop the label parameter and remove it from all call sites (or alternatively, if label is intended, thread it into the filename/logging within save_corpus_input such as appending to the output filename or including in debug output) so there are no unused parameters left; ensure references to save_corpus_input elsewhere match the new signature (or use the label inside the function if you choose that route).
274-299: ⚠️ Potential issue | 🟠 Major — `extract_masternode_list` saves the full raw transaction to payload-specific targets. `extract_special_txs` correctly calls `extract_extra_payload_hex()` to strip the extra-payload bytes before saving to `dash_proreg_tx_deserialize` / `dash_proreg_tx_roundtrip`. `extract_masternode_list` bypasses that extraction and writes the entire raw transaction to the same targets, producing structurally mismatched corpus inputs that will generate only spurious parse errors at fuzz time. `dash_deterministic_mn_deserialize` almost certainly expects `CDeterministicMN` bytes, not a full ProRegTx.

🐛 Proposed fix

```diff
         raw_tx = dash_cli("getrawtransaction", protx_hash, datadir=datadir)
         if raw_tx:
-            for target in ["dash_proreg_tx_deserialize", "dash_proreg_tx_roundtrip",
-                           "dash_deterministic_mn_deserialize"]:
-                if save_corpus_input(output_dir, target, raw_tx):
-                    saved += 1
+            # Extract payload for proreg payload-specific targets
+            # Fetch full tx info to get extraPayloadSize
+            tx_info_str = dash_cli("getrawtransaction", protx_hash, "1", datadir=datadir)
+            extra_payload_size = 0
+            if tx_info_str:
+                try:
+                    tx_info = json.loads(tx_info_str)
+                    extra_payload_size = int(tx_info.get("extraPayloadSize", 0))
+                except (json.JSONDecodeError, TypeError, ValueError):
+                    pass
+            proreg_payload_hex, err = extract_extra_payload_hex(raw_tx, extra_payload_size)
+            if proreg_payload_hex:
+                for target in ["dash_proreg_tx_deserialize", "dash_proreg_tx_roundtrip"]:
+                    if save_corpus_input(output_dir, target, proreg_payload_hex):
+                        saved += 1
+            # NOTE: dash_deterministic_mn_deserialize expects CDeterministicMN bytes;
+            # verify the correct binary format and implement extraction separately.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@contrib/fuzz/seed_corpus_from_chain.py` around lines 274 - 299, The function extract_masternode_list currently writes the entire raw transaction to payload-specific fuzz targets, causing malformed inputs; instead, call the existing extract_extra_payload_hex(raw_tx) and save its returned extra-payload hex to the dash_proreg_tx_deserialize and dash_proreg_tx_roundtrip targets, and do NOT save the full raw_tx there; for dash_deterministic_mn_deserialize, do not save the full tx either—either extract the CDeterministicMN bytes (using the same extractor if it returns the deterministic-MN blob) or skip saving to that target until a proper extractor exists, and update the loop in extract_masternode_list to use the extracted payload variable for the appropriate targets.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-fuzz.yml:
- Around line 134-136: The corpus replay invocation using FUZZ="$target"
"$FUZZ_BIN" -rss_limit_mb=4000 "$corpus_dir" can run indefinitely because it
lacks a termination flag; modify the call that constructs the fuzzer command
(the line invoking FUZZ, "$FUZZ_BIN", and "$corpus_dir") to include a
termination option such as -runs=0 (or alternatively -max_total_time=<seconds>)
so libFuzzer will only replay the corpus and exit instead of entering unlimited
mutation mode.
---
Duplicate comments:
In @.github/workflows/test-fuzz.yml:
- Line 25: The environment variable CI_FAILFAST_TEST_LEAVE_DANGLING is defined
in the workflow but never used; either remove the
CI_FAILFAST_TEST_LEAVE_DANGLING entry from the workflow file or wire it into the
code path that needs it (e.g., pass it into steps as env or reference it in
called scripts/commands), ensuring references use the exact name
CI_FAILFAST_TEST_LEAVE_DANGLING so there are no unused vars left in the
workflow.
- Around line 93-95: The current line uses
"TARGETS=$(PRINT_ALL_FUZZ_TARGETS_AND_ABORT=1 "$FUZZ_BIN" 2>&1 || true)" which
masks startup failures and causes an empty output to be treated as a successful
run; remove the "|| true" and instead check the command exit status and fail the
step on non-zero to surface missing libraries/arch issues. Also compute
TARGET_COUNT by counting non-empty lines in TARGETS (e.g., count lines that are
not blank) so an empty output yields 0 targets rather than 1; refer to the
variables/flags TARGETS, TARGET_COUNT, PRINT_ALL_FUZZ_TARGETS_AND_ABORT and the
executable "$FUZZ_BIN" when making these changes.
In `@contrib/fuzz/continuous_fuzz_daemon.sh`:
- Around line 174-184: The inline assignment for ASAN_OPTIONS overwrites any
caller-supplied value; change the invocation around "$FUZZ_BIN" so it preserves
existing ASAN_OPTIONS by concatenating them with the defaults instead of
replacing them (e.g. build ASAN_OPTIONS using the existing ASAN_OPTIONS variable
if set, then append the defaults like
detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1:detect_leaks=0),
and use that combined variable in the inline env assignment before invoking
"$FUZZ_BIN" so options such as abort_on_error or symbolize provided by callers
are retained.
- Around line 195-197: The pipeline using grep on "$target_log" can return exit
status 1 when there are no matches, which with set -euo pipefail aborts the
daemon; fix by testing for matches first and only running the pipeline when
matches exist: use grep -Eq "SUMMARY|ERROR|BINGO|crash-|timeout-|oom-"
"$target_log" 2>/dev/null && grep -E "SUMMARY|ERROR|BINGO|crash-|timeout-|oom-"
"$target_log" 2>/dev/null | while IFS= read -r line; do log "CRASH" " $line";
done, so the while loop only runs when grep finds matches (references:
"$target_log" and the grep ... | while IFS= read -r line; do log "CRASH" ...
done pipeline).
In `@contrib/fuzz/seed_corpus_from_chain.py`:
- Line 41: The function save_corpus_input declares an unused parameter label=""
which should be removed to avoid dead API surface; update the function signature
in save_corpus_input to drop the label parameter and remove it from all call
sites (or alternatively, if label is intended, thread it into the
filename/logging within save_corpus_input such as appending to the output
filename or including in debug output) so there are no unused parameters left;
ensure references to save_corpus_input elsewhere match the new signature (or use
the label inside the function if you choose that route).
- Around line 274-299: The function extract_masternode_list currently writes the
entire raw transaction to payload-specific fuzz targets, causing malformed
inputs; instead, call the existing extract_extra_payload_hex(raw_tx) and save
its returned extra-payload hex to the dash_proreg_tx_deserialize and
dash_proreg_tx_roundtrip targets, and do NOT save the full raw_tx there; for
dash_deterministic_mn_deserialize, do not save the full tx either—either extract
the CDeterministicMN bytes (using the same extractor if it returns the
deterministic-MN blob) or skip saving to that target until a proper extractor
exists, and update the loop in extract_masternode_list to use the extracted
payload variable for the appropriate targets.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/test-fuzz.yml
- contrib/fuzz/continuous_fuzz_daemon.sh
- contrib/fuzz/seed_corpus_from_chain.py
Consolidation update: I merged the open fuzz-target branches into this PR branch. Included/superseded PRs:

Those PRs were closed as superseded, and all relevant fuzz sources are now on this branch.
UdjinM6
left a comment
pls squash fixes into original commits and rebase to get rid of "Merge remote...."
Good catch — you're right. I’m cleaning this up now by squashing the fixup commits into their originals and rebasing to remove the merge commits (including the "Merge remote..." ones). I’ll push a cleaned history shortly.
Consolidate fuzz regression workflow, continuous fuzz daemon tooling, and Dash-specific fuzz target additions into a clean commit history for PR dashpay#7173 cleanup.
Force-pushed c80135d to 2c28e19
Btw, you squashed everything into one commit instead of squashing the fixup commits into their originals.
Force-pushed 61a2e13 to c194d87
Consolidate fuzz regression workflow, continuous fuzz daemon tooling, and Dash-specific fuzz target additions into a clean commit history for PR dashpay#7173 cleanup.
@UdjinM6 re-checked current head
If you still see a blocker on the current head, please point me at the exact thread or failing job and I’ll fix it directly. Otherwise this should be ready for a fresh look. |
@thepastaclaw pls check d9afcdf
@UdjinM6 checked New head: Included from that pass:
Targeted checks run locally on this head:
If there is still a blocker on the current head, please point me at the exact diff/thread/job and I’ll fix that directly. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0b28423a7d
Force-pushed 0b28423 to 777e283
Addressed the current live block-seed review on head
Validation on this diff:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 777e283e3a
Force-pushed 777e283 to ad26d88
Addressed the current live synthetic-only corpus review on head What changed in
Targeted checks on this respin:
Current review-thread state on this head is back to
UdjinM6
left a comment
CI looks good https://github.com/UdjinM6/dash/actions/runs/24637960225 (pulled into develop in my repo to actually use the updated workflow in CI).
light ACK ad26d88
thepastaclaw
left a comment
Code Review
Three valid findings remain, all in the new chain-corpus extractor: two blocking bugs where seed files are structurally incompatible with their fuzz targets (missing stream-version prefix; governance seeds are inner payload rather than full CGovernanceObject wire format), and one correctness issue where chain extraction silently requires -txindex despite already knowing the block hash. One docs nitpick about a misleading libFuzzer-process comment. All prior-SHA blockers (snapshot clear, stderr contamination, UBSAN_OPTIONS, numeric LLMQ types, runner pass-through, replay timeout, biased special-tx selection) are fixed at this SHA. Speculative/inherited-policy nits (SegWit parsing, AddressFromTag edge case, pull_request_target) dropped per verifier scope.
Reviewed commit: dc5e9fe
🔴 2 blocking | 🟡 1 suggestion(s) | 💬 1 nitpick(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `contrib/fuzz/seed_corpus_from_chain.py`:
- [BLOCKING] lines 41-58: Corpus files for *_deserialize/*_roundtrip targets lack the required 4-byte stream-version prefix
The new `_deserialize` and `_roundtrip` Dash harnesses (src/test/fuzz/deserialize_dash.cpp:59-62, src/test/fuzz/roundtrip_dash.cpp:59-61) read a 4-byte protocol version out of the input before deserializing the object, via `ds >> version; ds.SetVersion(version);`. `save_corpus_input()` writes the raw object bytes verbatim, so for every seed the first four payload bytes get consumed as the version and the remaining bytes no longer align with the object's wire format. The result is that nearly every 'real chain' seed for these targets is rejected as a malformed input, and the new CI / continuous-fuzz setup mostly replays invalid data rather than the real serialized objects this PR claims to preserve. Prepend a fixed 4-byte little-endian protocol version (e.g. `PROTOCOL_VERSION` / `INIT_PROTO_VERSION`) to `raw_bytes` when `target_name` ends in `_deserialize` or `_roundtrip` before hashing and writing.
- [BLOCKING] lines 255-276: Governance seeds are the inner DataHex payload, not a serialized CGovernanceObject
`dash_governance_object_deserialize` and `dash_governance_object_roundtrip` deserialize a `CGovernanceObject`, whose `Serialize()` forwards to `Governance::Object` and emits `hashParent, revision, time, collateralHash, vchData, type, masternodeOutpoint, vchSig` (src/governance/common.h:70-84, src/governance/object.h:247-269). `gobject list all`'s `DataHex` field, however, is just `HexStr(vchData)` (src/governance/common.cpp:41-44) — the inner payload only. Feeding DataHex into those corpora produces seeds that cannot represent the real wire format: the first 32 payload bytes are mis-parsed as `hashParent`, and everything after is garbage. These targets will get ~zero useful coverage from chain data. Either build a synthetic `Governance::Object` around each DataHex, serialize it, and save the result, or only route DataHex into a target that genuinely accepts payload-only bytes.
- [SUGGESTION] lines 200-223: Chain extraction calls getrawtransaction without the known block hash, silently requiring -txindex
`extract_special_txs()` has already fetched `block_hash` at line 201 but then calls `dash_cli("getrawtransaction", txid, ...)` at line 221 with only the txid. For transactions outside the mempool, Dash Core's `getrawtransaction` requires `-txindex` unless a blockhash argument is supplied, so on a default node every historical special transaction silently fails and is skipped — producing a much smaller corpus than the daemon expects. The same omission is repeated for `protx_hash` in `extract_masternode_list()` at line 294. Pass the known `block_hash` as the third positional argument to `getrawtransaction` (and for the masternode path, locate the block for the protx tx rather than relying on txindex).
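For the stream-version finding in the list above, a minimal sketch of the prefixing step. The 70240 value mirrors the seeder's `STREAM_VERSION` constant discussed later in this thread and is an assumption that must track `PROTOCOL_VERSION` in `src/version.h`:

```python
import struct

STREAM_VERSION = 70240  # assumption: must equal PROTOCOL_VERSION in src/version.h

def with_stream_version_prefix(raw_bytes):
    """Prepend the 4-byte little-endian protocol version that the
    *_deserialize/*_roundtrip harnesses consume via `ds >> version`
    before deserializing the object itself."""
    return struct.pack("<i", STREAM_VERSION) + raw_bytes
```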
Force-pushed ad26d88 to 0a92dc6
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a92dc61ef
Addressed the latest current-head follow-ups in
I also re-checked the older synthetic-only corpus concern: the current workflow already hard-fails when neither external qa-assets source is reachable, so it no longer passes on synthetic-only replay.
thepastaclaw
left a comment
Code Review
One valid suggestion verified: the new chain-seeded corpus workflow leaves the blockmerkleroot fuzz target without seeds even though it uses the same stream-version-prefixed input format as block_deserialize. Note: Codex's suggested fix as written would write unprefixed blobs; the target must also be added to _NON_DASH_STREAM_VERSION_TARGETS. No blocking issues; Claude produced no findings.
Reviewed commit: ad26d88
🟡 1 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `contrib/fuzz/seed_corpus_from_chain.py`:
- [SUGGESTION] lines 207-213: Chain-seeded block corpus skips the `blockmerkleroot` target
`extract_blocks()` only seeds `block_deserialize` and `block`, but `src/test/fuzz/deserialize.cpp:197-202` defines `blockmerkleroot` with the identical `DeserializeFromFuzzingInput(buffer, block)` path. That means `blockmerkleroot` still falls back to the empty-corpus smoke run in CI, leaving a real regression hole in the new replay workflow. The fix also requires adding `blockmerkleroot` to `_NON_DASH_STREAM_VERSION_TARGETS` at line 60 — otherwise `_needs_stream_version_prefix()` returns False for it and `save_corpus_input()` would write blobs without the 4-byte PROTOCOL_VERSION prefix that `DeserializeFromFuzzingInput` consumes, producing a corpus the harness cannot parse.
```python
# Get serialized block
block_hex = dash_cli("getblock", block_hash, "0", datadir=datadir)
if block_hex:
    if save_corpus_input(output_dir, "block_deserialize", block_hex):
        saved += 1
    if save_corpus_input(output_dir, "block", block_hex):
        saved += 1
```
🟡 Suggestion: Chain-seeded block corpus skips the blockmerkleroot target
extract_blocks() only seeds block_deserialize and block, but src/test/fuzz/deserialize.cpp:197-202 defines blockmerkleroot with the identical DeserializeFromFuzzingInput(buffer, block) path. That means blockmerkleroot still falls back to the empty-corpus smoke run in CI, leaving a real regression hole in the new replay workflow. The fix also requires adding blockmerkleroot to _NON_DASH_STREAM_VERSION_TARGETS at line 60 — otherwise _needs_stream_version_prefix() returns False for it and save_corpus_input() would write blobs without the 4-byte PROTOCOL_VERSION prefix that DeserializeFromFuzzingInput consumes, producing a corpus the harness cannot parse.
💡 Suggested change

```diff
 # Get serialized block
 block_hex = dash_cli("getblock", block_hash, "0", datadir=datadir)
 if block_hex:
     if save_corpus_input(output_dir, "block_deserialize", block_hex):
         saved += 1
     if save_corpus_input(output_dir, "block", block_hex):
         saved += 1
+    if save_corpus_input(output_dir, "blockmerkleroot", block_hex):
+        saved += 1
```
source: ['codex']
🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `contrib/fuzz/seed_corpus_from_chain.py`:
- [SUGGESTION] lines 207-213: Chain-seeded block corpus skips the `blockmerkleroot` target
`extract_blocks()` only seeds `block_deserialize` and `block`, but `src/test/fuzz/deserialize.cpp:197-202` defines `blockmerkleroot` with the identical `DeserializeFromFuzzingInput(buffer, block)` path. That means `blockmerkleroot` still falls back to the empty-corpus smoke run in CI, leaving a real regression hole in the new replay workflow. The fix also requires adding `blockmerkleroot` to `_NON_DASH_STREAM_VERSION_TARGETS` at line 60 — otherwise `_needs_stream_version_prefix()` returns False for it and `save_corpus_input()` would write blobs without the 4-byte PROTOCOL_VERSION prefix that `DeserializeFromFuzzingInput` consumes, producing a corpus the harness cannot parse.
✅ Fixed in commit 505247c88483: extract_blocks now writes block hex to blockmerkleroot alongside block_deserialize and block (contrib/fuzz/seed_corpus_from_chain.py:381-386).
thepastaclaw
left a comment
Code Review
This SHA meaningfully expands Dash fuzzing infrastructure, but two of the new corpus-generation paths still miss the targets they were added to support. The result is weaker real-world fuzz regression coverage than the PR description suggests, even though the harness additions themselves look directionally good.
Reviewed commit: f366d17
🟡 2 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `contrib/fuzz/seed_corpus_from_chain.py`:
- [SUGGESTION] lines 308-310: Governance corpus extraction saves only `DataHex`, not a serialized `Governance::Object`
`extract_governance_objects()` writes `gobject list all`'s `DataHex` straight into the `dash_governance_object_common_deserialize` corpus. But that fuzz target deserializes a full `Governance::Object` (`src/test/fuzz/deserialize_dash.cpp:252-254`), whose wire format includes `hashParent`, `revision`, `time`, `collateralHash`, `vchData`, `type`, `masternodeOutpoint`, and `vchSig` (`src/governance/common.h:70-83`). Feeding only the payload bytes means the generated corpus entries are not valid serialized governance objects, so this extraction path does not give meaningful regression coverage to the target it is named for.
- [SUGGESTION] lines 451-462: `extract_quorum_info()` never seeds the new Dash quorum fuzz targets
The quorum extractor claims to focus on quorum data for Dash-specific fuzzing, but it only fetches `quorum info`, reads `minedBlock`, and saves the full block hex into `block_deserialize`. Those block seeds are already produced by `extract_blocks()`, while none of the new Dash quorum targets added in this PR (for example `dash_quorum_snapshot_deserialize` in `src/test/fuzz/deserialize_dash.cpp:245`) receive any corpus from this path. As written, the new quorum-specific extraction code does not actually improve coverage for the quorum-specific targets it was added to support.
```python
data_hex = obj_data.get("DataHex", "")
if data_hex:
    if save_corpus_input(output_dir, "dash_governance_object_common_deserialize", data_hex):
```
🟡 Suggestion: Governance corpus extraction saves only DataHex, not a serialized Governance::Object
extract_governance_objects() writes gobject list all's DataHex straight into the dash_governance_object_common_deserialize corpus. But that fuzz target deserializes a full Governance::Object (src/test/fuzz/deserialize_dash.cpp:252-254), whose wire format includes hashParent, revision, time, collateralHash, vchData, type, masternodeOutpoint, and vchSig (src/governance/common.h:70-83). Feeding only the payload bytes means the generated corpus entries are not valid serialized governance objects, so this extraction path does not give meaningful regression coverage to the target it is named for.
source: ['codex']
🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `contrib/fuzz/seed_corpus_from_chain.py`:
- [SUGGESTION] lines 308-310: Governance corpus extraction saves only `DataHex`, not a serialized `Governance::Object`
`extract_governance_objects()` writes `gobject list all`'s `DataHex` straight into the `dash_governance_object_common_deserialize` corpus. But that fuzz target deserializes a full `Governance::Object` (`src/test/fuzz/deserialize_dash.cpp:252-254`), whose wire format includes `hashParent`, `revision`, `time`, `collateralHash`, `vchData`, `type`, `masternodeOutpoint`, and `vchSig` (`src/governance/common.h:70-83`). Feeding only the payload bytes means the generated corpus entries are not valid serialized governance objects, so this extraction path does not give meaningful regression coverage to the target it is named for.
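As a sketch of the "build a synthetic `Governance::Object` around each DataHex" option, assuming the field order quoted in the finding above and standard Bitcoin/Dash serialization (fixed-width little-endian integers, compact-size-prefixed vectors). The field widths are assumptions, not verified against `src/governance/common.h`:

```python
import struct

def ser_compact_size(n):
    # Bitcoin-style CompactSize length prefix
    if n < 253:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)

def wrap_datahex_as_governance_object(data_hex):
    """Wrap a gobject DataHex payload in a synthetic serialized object."""
    vch_data = bytes.fromhex(data_hex)
    return (
        b"\x00" * 32                                    # hashParent (uint256)
        + struct.pack("<i", 1)                          # revision
        + struct.pack("<q", 0)                          # time
        + b"\x00" * 32                                  # collateralHash (uint256)
        + ser_compact_size(len(vch_data)) + vch_data    # vchData
        + struct.pack("<i", 1)                          # type
        + b"\x00" * 32 + struct.pack("<I", 0xFFFFFFFF)  # masternodeOutpoint
        + ser_compact_size(0)                           # vchSig (empty)
    )
```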
```python
qinfo_str = dash_cli("quorum", "info", numeric_type, qhash, datadir=datadir)
if not qinfo_str:
    continue
try:
    qinfo = json.loads(qinfo_str)
except json.JSONDecodeError:
    continue
# Extract the commitment tx if available
mining_hash = qinfo.get("minedBlock", "")
if mining_hash:
    block_hex = dash_cli("getblock", mining_hash, "0", datadir=datadir)
    if block_hex and save_corpus_input(output_dir, "block_deserialize", block_hex):
```
🟡 Suggestion: extract_quorum_info() never seeds the new Dash quorum fuzz targets
The quorum extractor claims to focus on quorum data for Dash-specific fuzzing, but it only fetches quorum info, reads minedBlock, and saves the full block hex into block_deserialize. Those block seeds are already produced by extract_blocks(), while none of the new Dash quorum targets added in this PR (for example dash_quorum_snapshot_deserialize in src/test/fuzz/deserialize_dash.cpp:245) receive any corpus from this path. As written, the new quorum-specific extraction code does not actually improve coverage for the quorum-specific targets it was added to support.
source: ['codex']
🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `contrib/fuzz/seed_corpus_from_chain.py`:
- [SUGGESTION] lines 451-462: `extract_quorum_info()` never seeds the new Dash quorum fuzz targets
The quorum extractor claims to focus on quorum data for Dash-specific fuzzing, but it only fetches `quorum info`, reads `minedBlock`, and saves the full block hex into `block_deserialize`. Those block seeds are already produced by `extract_blocks()`, while none of the new Dash quorum targets added in this PR (for example `dash_quorum_snapshot_deserialize` in `src/test/fuzz/deserialize_dash.cpp:245`) receive any corpus from this path. As written, the new quorum-specific extraction code does not actually improve coverage for the quorum-specific targets it was added to support.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 11dedc48f6
```sh
if FUZZ="$target" timeout 3600 "$FUZZ_BIN" \
    -rss_limit_mb=4000 \
    -runs=0 \
    -artifact_prefix="$artifact_prefix" \
```
Bound corpus replay runs to avoid timeout-based false failures
In the corpus replay branch, the fuzz command is invoked with -runs=0 under timeout 3600. For libFuzzer, -runs=0 means unbounded execution, so targets keep fuzzing instead of just replaying corpus inputs and the outer timeout eventually terminates them; GNU timeout then exits with code 124, which this script treats as a target failure. This makes regression runs fail for any non-empty corpus target after an hour rather than producing deterministic replay results.
thepastaclaw
left a comment
Code Review
The PR’s Dash fuzzing additions are generally solid, and I did not find any consensus or build-breaking defects in the new targets themselves. The confirmed issues are all in the surrounding fuzz infrastructure: two bugs in the continuous daemon that misreport results, one CI reproducibility problem, and a few seeding gaps that reduce the value of the new corpus generation.
Reviewed commit: d41ed26
🟡 4 suggestion(s) | 💬 2 nitpick(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `contrib/fuzz/continuous_fuzz_daemon.sh`:
- [SUGGESTION] lines 197-209: Per-target crash reporting is poisoned by old artifacts
`run_target()` reports a crash whenever `find "$target_crashes"` sees any `crash-*`, `timeout-*`, or `oom-*` file after the run. That directory persists across cycles, so one historical artifact makes every later invocation of the same target look like a fresh failure even when the current run was clean. In daemon mode this produces repeated false positives and destroys the signal the script is supposed to provide. Snapshot the artifact count before each target run and report only newly created files, or isolate each run in a fresh artifact directory.
- [SUGGESTION] lines 217-223: `--single-cycle` ignores non-artifact target failures
`run_target()` captures a nonzero libFuzzer exit code in `exit_code`, logs it, and then unconditionally returns success. `main()` decides whether `--single-cycle` should fail only from the change in crash-artifact count. If a target exits nonzero without writing a `crash-*` file, the script still exits 0, which breaks the advertised cron/single-cycle regression use case. The cycle result needs to incorporate per-target exit status, not just artifact creation.
In `.github/workflows/test-fuzz.yml`:
- [SUGGESTION] lines 51-67: The regression workflow is not reproducible because it pulls corpus repos at moving HEAD
The workflow clones `bitcoin-core/qa-assets` and `dashpay/qa-assets` with `--depth=1` and uses whatever is on their default branches at runtime. That means the same Dash commit can pass or fail later solely because an external corpus changed, which makes regressions hard to bisect and reruns hard to trust. A gating fuzz-regression job should pin those corpus sources to explicit SHAs or versioned artifacts and log the resolved revisions.
In `contrib/fuzz/seed_corpus_from_chain.py`:
- [SUGGESTION] lines 41-56: The corpus seeder duplicates `PROTOCOL_VERSION` without any sync check
`STREAM_VERSION = 70240` must stay equal to `src/version.h`'s `PROTOCOL_VERSION`, and several harnesses consume that prefix before deserializing the seed. Right now the script only documents that requirement; nothing enforces it. After the next protocol bump this script can silently emit stale seeds that no longer exercise the intended deserialize paths. Read the value from `src/version.h` or add a check that fails when the constants diverge.
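A sketch of the suggested sync check, assuming `PROTOCOL_VERSION` appears in `src/version.h` as a plain integer constant:

```python
import re
from pathlib import Path

STREAM_VERSION = 70240  # duplicated from the seeder

def assert_stream_version_matches(repo_root):
    """Fail fast if the seeder's STREAM_VERSION drifted from src/version.h."""
    text = (Path(repo_root) / "src" / "version.h").read_text()
    m = re.search(r"PROTOCOL_VERSION\s*=\s*(\d+)", text)
    if m is None:
        raise RuntimeError("PROTOCOL_VERSION not found in src/version.h")
    if int(m.group(1)) != STREAM_VERSION:
        raise RuntimeError(
            f"STREAM_VERSION {STREAM_VERSION} != PROTOCOL_VERSION {m.group(1)}"
        )
```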
```python
def extract_quorum_info(output_dir, datadir=None):
    """Extract quorum-related data from the chain.

    Note: quorum snapshot deserialize targets expect binary-serialized
    CQuorumSnapshot data, not JSON. We extract final commitment transactions
    from blocks instead, which are already captured by extract_special_txs()
    for type 6 (TRANSACTION_QUORUM_COMMITMENT). This function focuses on
    extracting quorum memberof data as raw bytes for other quorum targets.
    """
    print("Extracting quorum data...")
    result = dash_cli("quorum", "list", datadir=datadir)
    if not result:
        return 0

    # quorum info expects a numeric llmqType, but quorum list returns string keys
    llmq_type_map = {
        "llmq_50_60": "1",
        "llmq_400_60": "2",
        "llmq_400_85": "3",
        "llmq_100_67": "4",
        "llmq_60_75": "5",
        "llmq_25_67": "6",
        "llmq_test": "100",
        "llmq_devnet": "101",
        "llmq_test_v17": "102",
        "llmq_test_dip0024": "103",
        "llmq_test_instantsend": "104",
        "llmq_devnet_dip0024": "105",
        "llmq_test_platform": "106",
        "llmq_devnet_platform": "107",
    }

    saved = 0
    try:
        quorum_list = json.loads(result)
        for qtype, hashes in quorum_list.items():
            numeric_type = llmq_type_map.get(qtype)
            if numeric_type is None:
                print(f"WARNING: Unknown quorum type '{qtype}', skipping", file=sys.stderr)
                continue
            for qhash in hashes[:5]:  # Limit per type
                # Get the quorum commitment transaction via selectquorum
                # which gives us the quorumHash we can look up in blocks
                qinfo_str = dash_cli("quorum", "info", numeric_type, qhash, datadir=datadir)
                if not qinfo_str:
                    continue
                try:
                    qinfo = json.loads(qinfo_str)
                except json.JSONDecodeError:
                    continue
                # Extract the commitment tx if available
                mining_hash = qinfo.get("minedBlock", "")
                if mining_hash:
                    block_hex = dash_cli("getblock", mining_hash, "0", datadir=datadir)
                    if block_hex and save_corpus_input(output_dir, "block_deserialize", block_hex):
                        saved += 1
    except (json.JSONDecodeError, AttributeError):
        pass

    print(f"  Saved {saved} quorum corpus inputs")
```
💬 Nitpick: extract_quorum_info() does not seed the quorum deserialize targets it claims to support
The function comment says it focuses on quorum-specific data for other quorum targets, but the implementation only saves mined blocks into block_deserialize. It never writes inputs for dash_quorum_snapshot_deserialize, dash_quorum_data_request_deserialize, or dash_get_quorum_rotation_info_deserialize, so those targets still rely entirely on synthetic data or empty corpora. Either narrow the function comment to match what it actually does or extend it to emit serialized quorum objects for the existing Dash quorum fuzz targets.
source: ['claude']
✅ Fixed in commit 505247c88483: `extract_quorum_info` (contrib/fuzz/seed_corpus_from_chain.py:606-856) now writes seeds for `dash_quorum_data_request_*`, `dash_quorum_snapshot_*`, and `dash_get_quorum_rotation_info_*` using formats matching the C++ wire definitions in src/llmq/quorums.h and src/llmq/snapshot.h.
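For orientation, the save helper used throughout the seeder writes content-addressed files, one directory per target. A sketch consistent with the PR's description of SHA256-named corpus files (the exact signature and dedup behavior of `save_corpus_input` are assumptions):

```python
import hashlib
from pathlib import Path

def save_corpus_input(output_dir: str, target: str, data_hex: str) -> bool:
    """Write hex-encoded bytes as a SHA256-named corpus file; skip duplicates."""
    try:
        data = bytes.fromhex(data_hex.strip())
    except ValueError:
        return False  # malformed hex from the RPC layer
    target_dir = Path(output_dir) / target
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / hashlib.sha256(data).hexdigest()
    if path.exists():
        return False  # already in the corpus
    path.write_bytes(data)
    return True
```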
```cpp
    CMutableTransaction tx;
    tx.nVersion = CTransaction::SPECIAL_VERSION;
    tx.nType = TRANSACTION_ASSET_LOCK;

    std::vector<CTxOut> credit_outputs;
    const size_t num_credit_outputs = fuzzed_data_provider.ConsumeIntegralInRange<size_t>(1, 4);
    CAmount credit_outputs_total{0};
```
💬 Nitpick: The asset-lock fuzz target never reaches the MoneyRange boundary it asserts
Each credit output is bounded to [1, 10 * COIN] and there are at most four of them, so `credit_outputs_total` never exceeds 40 * COIN. Under those bounds `assert(MoneyRange(credit_outputs_total + amount))` is always true, which leaves the accumulation boundary entirely untested in this target. If the intent is to exercise overflow and max-money edges in the valid-path builder, the amount or output-count ranges need to be widened.
source: ['claude']
Addressed the latest automated fuzz-infra follow-up.

What changed:

Verification run locally:
thepastaclaw left a comment
Code Review
I verified the reported fuzzing and CI issues against the checked-out 11dedc48f6743c69bc5d291b45cd22030ebf7d58 tree. Four infrastructure findings and one low-severity fuzz-coverage finding are real in the current code; the CQuorumRotationInfo::Unserialize report should be dropped because, in this tree, there is no verified production deserialization path that reuses an existing CQuorumRotationInfo instance.
Reviewed commit: 11dedc4
🟡 4 suggestion(s) | 💬 1 nitpick(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `contrib/fuzz/continuous_fuzz_daemon.sh`:
- [SUGGESTION] lines 197-209: Per-target crash reporting counts historical artifacts as new failures
`run_target()` scans the entire persistent `${CRASHES_DIR}/${target}` directory after each run and logs that the target "produced" that many artifacts. Once a target has left behind any `crash-*`, `timeout-*`, or `oom-*` file, every later clean run is reported as another crash because this code never snapshots the pre-run count or isolates artifacts per invocation. That makes the daemon's per-target signal inaccurate even though the cycle-level summary correctly computes a delta.
- [SUGGESTION] lines 217-223: `--single-cycle` still exits 0 when a target fails without writing an artifact
`run_target()` records a nonzero libFuzzer exit status in `exit_code`, but always returns success. `main()` then decides the `--single-cycle` process status only from the change in artifact count. A target that aborts before libFuzzer writes `crash-*`/`timeout-*`/`oom-*` files still makes the whole daemon exit successfully, which breaks the documented cron/regression use case for `--single-cycle`.
In `.github/workflows/test-fuzz.yml`:
- [SUGGESTION] lines 62-80: The regression workflow is not reproducible because it clones moving corpus heads
The job pulls `bitcoin-core/qa-assets` and `dashpay/qa-assets` with `--depth=1` from their default branches at runtime. That means the same Dash commit can pass or fail later with no change in this repository, purely because an external corpus repository advanced. For a gating regression workflow, that breaks reruns and bisection; the corpora need to be pinned to explicit revisions and those revisions should be logged.
In `contrib/fuzz/seed_corpus_from_chain.py`:
- [SUGGESTION] lines 41-56: The corpus seeder hard-codes `PROTOCOL_VERSION` without any sync check
`STREAM_VERSION = 70240` is documented as needing to match `src/version.h`'s `PROTOCOL_VERSION`, and the current tree does in fact define `PROTOCOL_VERSION` as `70240`. The problem is that nothing enforces that relationship. After the next protocol bump, this script can silently keep prepending a stale stream-version prefix to seeds for the deserialize/roundtrip harnesses, reducing coverage while still appearing to generate valid corpus entries.
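Returning to the first daemon finding (per-target crash accounting), the fix is to diff the artifact set around each run. A Python sketch of that bookkeeping (the daemon itself is a shell script, so this only illustrates the logic; names are illustrative):

```python
from pathlib import Path

ARTIFACT_GLOBS = ("crash-*", "timeout-*", "oom-*")

def artifact_set(crash_dir: Path) -> set[Path]:
    """Every libFuzzer artifact currently present for one target."""
    found: set[Path] = set()
    for pattern in ARTIFACT_GLOBS:
        found.update(crash_dir.glob(pattern))
    return found

def report_new_artifacts(crash_dir: Path, run_target) -> int:
    """Run one fuzz cycle and count only artifacts created by this run."""
    before = artifact_set(crash_dir)
    run_target()  # invoke the fuzz binary for this target here
    new = artifact_set(crash_dir) - before
    for artifact in sorted(new):
        print(f"new artifact: {artifact}")
    return len(new)
```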
```cpp
    const size_t num_credit_outputs = fuzzed_data_provider.ConsumeIntegralInRange<size_t>(1, 4);
    CAmount credit_outputs_total{0};
    for (size_t i = 0; i < num_credit_outputs; ++i) {
        const CAmount amount = fuzzed_data_provider.ConsumeIntegralInRange<CAmount>(1, 10 * COIN);
        assert(MoneyRange(amount));
        assert(MoneyRange(credit_outputs_total + amount));
```
💬 Nitpick: The MoneyRange accumulation assertion is unreachable under the harness's own bounds
This harness limits `num_credit_outputs` to 1..4 and each amount to 1..10 * COIN, so `credit_outputs_total + amount` never exceeds 40 * COIN. Under those constraints `assert(MoneyRange(credit_outputs_total + amount))` is always true, so the harness does not exercise any accumulation edge around MAX_MONEY; the assertion is effectively dead coverage unless the amount or count bounds are widened.
source: ['claude', 'codex']
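The bound is easy to verify numerically, using Dash's conventional `COIN = 10^8` duffs and 21M-coin `MAX_MONEY` (values assumed from src/amount.h):

```python
COIN = 100_000_000
MAX_MONEY = 21_000_000 * COIN

# Harness bounds: at most 4 credit outputs, each in [1, 10 * COIN].
max_total = 4 * 10 * COIN
assert max_total == 40 * COIN
assert max_total < MAX_MONEY  # MoneyRange(total) can therefore never fail
print(f"worst case is {max_total / MAX_MONEY:.8%} of MAX_MONEY")
```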
thepastaclaw left a comment
Code Review
The two prior findings on seed_corpus_from_chain.py are resolved on this head: governance objects are now serialized as full Governance::Objects and quorum extraction fills the new Dash quorum target directories. One remaining suggestion: several newly added Dash fuzz targets (final_commitment, governance vote types, vote_instance, vote_rec, governance_vote_file, bls_ies, coinjoin_entry) get no chain-derived or synthetic seeds, so the CI replay step falls back to a 10-second empty-corpus smoke run for them.
Reviewed commit: 505247c
🟡 1 suggestion(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `contrib/fuzz/seed_corpus_from_chain.py`:
- [SUGGESTION] lines 402-510: Several newly added Dash fuzz targets get no seed corpus in either live or synthetic mode
A number of Dash fuzz targets added in this PR never receive any chain-derived or synthetic seeds, so when `test-fuzz.yml` runs them it hits the empty-corpus fallback at .github/workflows/test-fuzz.yml:159-171 and only does a 10-second smoke run instead of a structured regression replay.
Live-mode gaps in `extract_special_txs`/`extract_governance_objects`:
- The special-tx `type_map` (lines 403-413) maps tx type 6 only to `dash_final_commitment_tx_payload_*`. The separate `dash_final_commitment_*` targets at src/test/fuzz/deserialize_dash.cpp:188-191 and src/test/fuzz/roundtrip_dash.cpp:148-151 deserialize a bare `llmq::CFinalCommitment`, not a tx payload, and never receive seeds.
- `extract_governance_objects` only writes `dash_governance_object_*` (lines 494-498). The new vote-related targets registered at src/test/fuzz/deserialize_dash.cpp:260-274 (`dash_governance_vote_*`, `dash_vote_instance_*`, `dash_vote_rec_*`, `dash_governance_vote_file_*`) get no chain-derived seeds.
Synthetic-only gaps in `create_synthetic_seeds` (lines 869-898): the offline path only synthesizes seeds for `dash_coinjoin_accept`, `dash_coinjoin_queue`, `dash_coinjoin_status_update`, `dash_recovered_sig`, `dash_sig_ses_ann`, `dash_sig_share`, `dash_mnauth`, and `dash_dkg_complaint`, then clones each into its `_roundtrip` sibling. That leaves most new Dash targets (the ones above plus `dash_bls_ies_*`, `dash_coinjoin_entry_*`, etc.) with no corpus at all when a chain is unavailable.
Net effect: the regression workflow degrades to startup smoke tests for a large fraction of the new Dash targets unless an external corpus repo happens to provide them. Adding extractor entries (e.g. mapping tx type 6 to both final_commitment variants, deriving vote seeds from available governance vote RPC output) and broadening the synthetic dict to cover the remaining new targets would close the gap.
```python
    # Map special tx types to fuzz target names
    type_map = {
        1: "dash_proreg_tx",                    # ProRegTx
        2: "dash_proupserv_tx",                 # ProUpServTx
        3: "dash_proupreg_tx",                  # ProUpRegTx
        4: "dash_prouprev_tx",                  # ProUpRevTx
        5: "dash_cbtx",                         # CbTx (coinbase)
        6: "dash_final_commitment_tx_payload",  # Quorum commitment
        7: "dash_mnhf_tx_payload",              # MN HF signal
        8: "dash_asset_lock_payload",           # Asset Lock
        9: "dash_asset_unlock_payload",         # Asset Unlock
    }

    for h in range(max(0, height - count), height + 1):
        block_hash = dash_cli("getblockhash", str(h), datadir=datadir)
        if not block_hash:
            continue

        block_json = dash_cli("getblock", block_hash, "2", datadir=datadir)
        if not block_json:
            continue

        try:
            block = json.loads(block_json)
        except json.JSONDecodeError:
            continue

        for tx in block.get("tx", []):
            tx_type = tx.get("type", 0)
            if tx_type == 0:
                continue

            # Get raw transaction
            txid = tx.get("txid", "")
            raw_tx = dash_cli("getrawtransaction", txid, "false", block_hash, datadir=datadir)
            if not raw_tx:
                continue

            # Save full transaction
            if save_corpus_input(output_dir, "decode_tx", raw_tx):
                saved += 1

            # Extract special payload if we know the target
            extra_payload_size = tx.get("extraPayloadSize", 0)
            try:
                extra_payload_size = int(extra_payload_size)
            except (TypeError, ValueError):
                extra_payload_size = 0

            if extra_payload_size > 0 and tx_type in type_map:
                payload_hex, err = extract_extra_payload_hex(raw_tx, extra_payload_size)
                if not payload_hex:
                    print(
                        f"WARNING: Skipping special payload for tx {txid}: {err}",
                        file=sys.stderr,
                    )
                    continue

                target = type_map[tx_type]
                # Save payload bytes for both deserialize and roundtrip variants.
                for suffix in ["_deserialize", "_roundtrip"]:
                    if save_corpus_input(output_dir, f"{target}{suffix}", payload_hex):
                        saved += 1

    print(f"  Saved {saved} special transaction corpus inputs")
    return saved


def extract_governance_objects(output_dir, datadir=None):
    """Extract governance objects (proposals, triggers) into structurally valid seeds.

    The fuzz target deserializes a full Governance::Object, not raw payload bytes,
    so we reconstruct the object from the RPC fields (CollateralHash, CreationTime,
    ObjectType, SigningMasternode, DataHex) and synthesize best-effort defaults for
    fields the RPC doesn't expose (hashParent, revision, vchSig). CGovernanceObject's
    non-disk serialization is just ``m_obj`` (see src/governance/object.h:248) so the
    same bytes are valid seeds for dash_governance_object_{deserialize,roundtrip}.
    """
    print("Extracting governance objects...")
    result = dash_cli("gobject", "list", "all", datadir=datadir)
    if not result:
        return 0

    saved = 0
    try:
        objects = json.loads(result)
    except (json.JSONDecodeError, AttributeError):
        return 0

    if not isinstance(objects, dict):
        return 0

    targets = [
        "dash_governance_object_common_deserialize",
        "dash_governance_object_deserialize",
        "dash_governance_object_roundtrip",
    ]

    for obj_hash, obj_data in objects.items():
        if not isinstance(obj_data, dict):
            continue
        try:
            serialized = serialize_governance_object(obj_data)
        except (ValueError, OverflowError) as e:
            print(f"WARNING: Skipping governance object {obj_hash}: {e}", file=sys.stderr)
            continue
        seed_hex = serialized.hex()
        for target in targets:
            if save_corpus_input(output_dir, target, seed_hex):
```
🟡 Suggestion: Several newly added Dash fuzz targets get no seed corpus in either live or synthetic mode
A number of Dash fuzz targets added in this PR never receive any chain-derived or synthetic seeds, so when `test-fuzz.yml` runs them it hits the empty-corpus fallback at .github/workflows/test-fuzz.yml:159-171 and only does a 10-second smoke run instead of a structured regression replay.
Live-mode gaps in `extract_special_txs`/`extract_governance_objects`:
- The special-tx `type_map` (lines 403-413) maps tx type 6 only to `dash_final_commitment_tx_payload_*`. The separate `dash_final_commitment_*` targets at src/test/fuzz/deserialize_dash.cpp:188-191 and src/test/fuzz/roundtrip_dash.cpp:148-151 deserialize a bare `llmq::CFinalCommitment`, not a tx payload, and never receive seeds.
- `extract_governance_objects` only writes `dash_governance_object_*` (lines 494-498). The new vote-related targets registered at src/test/fuzz/deserialize_dash.cpp:260-274 (`dash_governance_vote_*`, `dash_vote_instance_*`, `dash_vote_rec_*`, `dash_governance_vote_file_*`) get no chain-derived seeds.
Synthetic-only gaps in `create_synthetic_seeds` (lines 869-898): the offline path only synthesizes seeds for `dash_coinjoin_accept`, `dash_coinjoin_queue`, `dash_coinjoin_status_update`, `dash_recovered_sig`, `dash_sig_ses_ann`, `dash_sig_share`, `dash_mnauth`, and `dash_dkg_complaint`, then clones each into its `_roundtrip` sibling. That leaves most new Dash targets (the ones above plus `dash_bls_ies_*`, `dash_coinjoin_entry_*`, etc.) with no corpus at all when a chain is unavailable.
Net effect: the regression workflow degrades to startup smoke tests for a large fraction of the new Dash targets unless an external corpus repo happens to provide them. Adding extractor entries (e.g. mapping tx type 6 to both final_commitment variants, deriving vote seeds from available governance vote RPC output) and broadening the synthetic dict to cover the remaining new targets would close the gap.
source: ['codex']
Fixed in 93ff82d. `seed_corpus_from_chain.py` now: (1) emits bare `dash_final_commitment_{deserialize,roundtrip}` seeds from type-6 special-tx payloads, (2) derives governance vote / vote_instance / vote_rec / vote_file seeds from `gobject getcurrentvotes`, and (3) broadens offline synthetic coverage for the previously empty Dash targets called out here (including the final-commitment, governance-vote, BLS IES, and CoinJoin-entry families). Verified with `py_compile` plus targeted helper/synthetic-seed checks.
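The `_roundtrip` cloning step mentioned above is simple to express. A sketch (the function name is hypothetical; the `_deserialize`/`_roundtrip` naming convention comes from the target list in this PR):

```python
def clone_roundtrip_seeds(synthetic: dict[str, list[str]]) -> dict[str, list[str]]:
    """Give every *_deserialize target's seeds to its *_roundtrip sibling."""
    out = dict(synthetic)
    for target, seeds in synthetic.items():
        if target.endswith("_deserialize"):
            sibling = target[: -len("_deserialize")] + "_roundtrip"
            out.setdefault(sibling, list(seeds))  # keep explicit entries intact
    return out
```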
Force-pushed from `505247c` to `93ff82d`.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 93ff82d3f7
```python
        ((1).to_bytes(2, "little") + _serialize_uint32(0) + minimal_final_commitment).hex(),
    ],
    "dash_recovered_sig_deserialize": [
        "64" + "00" * 32 + "00" * 32 + "00" * 96,  # llmqType + quorumHash + id + sig
```
Include missing fields in synthetic LLMQ signing seeds
`CRecoveredSig` serialization requires llmqType, quorumHash, id, msgHash, and sig (see src/llmq/signing.h), but this seed omits msgHash; the adjacent `dash_sig_ses_ann_deserialize` and `dash_sig_share_deserialize` entries also use fixed-width layouts that do not match `CSigSesAnn`/`CSigShare` serialization (VARINT sessionId plus additional hash fields in src/llmq/signing_shares.h). In synthetic-corpus regression runs, these inputs fail deserialization early and do not exercise the intended signing/share code paths.
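A structurally complete seed following the comment's field list (1-byte llmqType, three 32-byte hashes, 96-byte BLS signature) would look like the sketch below; treat it as illustrative, not a verified match for `CRecoveredSig`'s serializer:

```python
def recovered_sig_seed(llmq_type: int = 0x64) -> str:
    """llmqType + quorumHash + id + msgHash + sig, zero-filled placeholders."""
    return (
        bytes([llmq_type]).hex()  # llmqType
        + "00" * 32               # quorumHash
        + "00" * 32               # id
        + "00" * 32               # msgHash (the field the current seed omits)
        + "00" * 96               # BLS signature
    )
```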
```python
    ],
    # DKG messages
    "dash_dkg_complaint_deserialize": [
        "64" + "00" * 32 + "00" * 32 + "0000" + "00",  # minimal
```
Emit full BLS signature bytes for DKG complaint seeds
`CDKGComplaint` serialization ends with a fixed-size BLS signature after the two dynbitsets (src/llmq/dkgsession.h), so a minimal valid payload still needs 96 signature bytes. This synthetic seed appends only a single byte after the bitsets, making it structurally truncated; `dash_dkg_complaint_*` corpus replay therefore mostly hits immediate parse failure instead of complaint-handling logic.
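Per the comment, a minimal structurally valid payload still needs the trailing 96 signature bytes. A sketch (the empty-dynbitset encoding and field names are assumptions based on the comment's description of src/llmq/dkgsession.h):

```python
def dkg_complaint_seed() -> str:
    """llmqType + quorumHash + proTxHash + two empty dynbitsets + full sig."""
    empty_bitset = "00"  # assumed: zero-length dynamic bitset is one size byte
    return (
        "64"             # llmqType
        + "00" * 32      # quorumHash
        + "00" * 32      # proTxHash
        + empty_bitset   # first dynbitset
        + empty_bitset   # second dynbitset
        + "00" * 96      # BLS signature
    )
```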
thepastaclaw left a comment
Code Review
I re-validated the prior and current findings against the checked-out head 93ff82d3f7e2689df745623b041ebdcaaacd6c15. Two blocking issues still apply on this head: the fuzz regression workflow still runs open-ended libFuzzer sessions instead of a deterministic corpus replay, and `createwallet` still rejects explicit legacy-wallet requests when `descriptors=false` is passed without `load_on_startup`. The earlier governance corpus-seeding issue is resolved on this head because the generator now serializes full Governance::Object instances instead of writing raw DataHex payloads.
Reviewed commit: 93ff82d
🔴 2 blocking
1 additional finding
🔴 blocking: `createwallet` rejects explicit legacy-wallet calls whenever `descriptors` is set without `load_on_startup`
src/wallet/rpc/wallet.cpp (lines 679-682)
The new guard throws as soon as parameter 5 is present and parameter 6 is omitted, before checking whether descriptors was actually set to true. As a result, createwallet ... descriptors=false now fails even though the RPC help still documents descriptors=false as the way to request a legacy wallet and that call is otherwise unambiguous. The functional test framework masks this by supplying load_on_startup, but direct RPC and CLI callers still hit an unnecessary API break.
💡 Suggested change
```cpp
if (!request.params[5].isNull() && request.params[5].get_bool() && request.params[6].isNull()) {
    throw JSONRPCError(RPC_INVALID_PARAMETER, "The createwallet RPC requires specifying the 'load_on_startup' flag when creating descriptor wallets. Dash Core v21 introduced this requirement due to breaking changes in the createwallet RPC.");
}
if (request.params[5].isNull() || request.params[5].get_bool()) {
```
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `.github/workflows/test-fuzz.yml`:
- [BLOCKING] lines 184-190: Fuzz regression step still starts unbounded fuzzing instead of replaying the saved corpus once
This block is labeled as a corpus regression replay, but it invokes libFuzzer with the corpus directory and `-runs=0`. In libFuzzer that means there is no execution cap, so after replaying the existing inputs it immediately switches into normal mutation fuzzing until the outer `timeout 3600` kills it. That makes this CI job nondeterministic and much slower than intended, and it stops serving as a reliable regression check tied to the committed corpus.
In `src/wallet/rpc/wallet.cpp`:
- [BLOCKING] lines 679-682: `createwallet` rejects explicit legacy-wallet calls whenever `descriptors` is set without `load_on_startup`
The new guard throws as soon as parameter 5 is present and parameter 6 is omitted, before checking whether `descriptors` was actually set to `true`. As a result, `createwallet ... descriptors=false` now fails even though the RPC help still documents `descriptors=false` as the way to request a legacy wallet and that call is otherwise unambiguous. The functional test framework masks this by supplying `load_on_startup`, but direct RPC and CLI callers still hit an unnecessary API break.
```bash
# Run corpus regression (replay all inputs)
echo "::group::${target} ($(find "$corpus_dir" -maxdepth 1 -type f | wc -l) inputs)"
if FUZZ="$target" timeout 3600 "$FUZZ_BIN" \
  -rss_limit_mb=4000 \
  -runs=0 \
  -artifact_prefix="$artifact_prefix" \
  "$corpus_dir" 2>&1; then
```
🔴 Blocking: Fuzz regression step still starts unbounded fuzzing instead of replaying the saved corpus once
This block is labeled as a corpus regression replay, but it invokes libFuzzer with the corpus directory and -runs=0. In libFuzzer that means there is no execution cap, so after replaying the existing inputs it immediately switches into normal mutation fuzzing until the outer timeout 3600 kills it. That makes this CI job nondeterministic and much slower than intended, and it stops serving as a reliable regression check tied to the committed corpus.
💡 Suggested change
```diff
   # Run corpus regression (replay all inputs)
   echo "::group::${target} ($(find "$corpus_dir" -maxdepth 1 -type f | wc -l) inputs)"
   if FUZZ="$target" timeout 3600 "$FUZZ_BIN" \
     -rss_limit_mb=4000 \
-    -runs=0 \
+    -runs=1 \
     -artifact_prefix="$artifact_prefix" \
     "$corpus_dir" 2>&1; then
```
source: ['codex']
Summary
Consolidates all active Dash Core fuzzing work into a single branch/PR
so we avoid cross-PR merge conflicts.
This PR now contains both:
- the Dash-specific fuzz targets (deserialization, roundtrip, and functional validators)
- the supporting infrastructure (CI regression workflow, corpus tooling, and continuous fuzz daemon)
This is the end-to-end delivery for the Dash Core fuzzing initiative (tracker #108).
Included work (folded into this PR)
The following previously separate fuzz PRs were merged into this branch
and closed in favor of this consolidated PR:
- Dash-specific deserialization fuzz targets
- roundtrip serialize/deserialize fuzz targets
- governance proposal validator fuzz target
- BLS operations + BLS IES fuzz targets
- asset lock/unlock validation fuzz targets
- special transaction validation fuzz target
- LLMQ message handler fuzz target
- CoinJoin validation fuzz targets
What this PR now delivers
1) Dash-specific fuzz target coverage
Adds broad fuzz coverage for Dash-specific serialization and functional
validation paths across:
- `evo/` (ProTx, asset lock/unlock, deterministic MN structures)
- `llmq/` (commitments, DKG/session message handling, quorum data)
- `governance/` (objects/votes/proposal validation)
- `coinjoin/` (queue/broadcast/denomination/status/entry logic)
- `bls/` (operations, aggregation, threshold ops, IES encryption)

2) CI fuzz regression workflow (`test-fuzz.yml`)

- runs against the `linux64_fuzz` build
- replays corpora inherited from `bitcoin-core/qa-assets`
- skippable via the `SKIP_LINUX64_FUZZ` variable

3) Continuous fuzzing daemon

`contrib/fuzz/continuous_fuzz_daemon.sh`:

- per-target corpora under `~/fuzz_corpus/<target>/`
- crash artifacts under `~/fuzz_crashes/<target>/`
- `--single-cycle` mode for cron/regression runs

4) Seed corpus generator

`contrib/fuzz/seed_corpus_from_chain.py`:

- extracts real chain data from a running node
- `--synthetic-only` for offline corpus generation

Validation
Build verification
Built fuzz binary with `--enable-fuzz-binary` (`PROVIDE_FUZZ_MAIN_FUNCTION` deterministic mode) on macOS arm64 (Apple Silicon, clang, 16 cores).
Binary compiled successfully with all 95 Dash-specific targets registered
(252 total targets including inherited Bitcoin Core targets).
Target registration check
All 95 new Dash-specific fuzz targets confirmed registered in the binary:
- deserialize targets across evo (ProTx, asset lock/unlock, deterministic MN, credit pool), llmq (commitments, DKG messages, quorum snapshots, sig shares), governance (objects, votes), coinjoin (queue, broadcast, entry, status, accept), bls (IES encrypted blobs, multi-recipient)
- roundtrip targets covering serialize/deserialize consistency for all major Dash protocol structures
- functional targets: `asset_lock_tx`, `asset_lock_tx_raw`, `asset_unlock_fee`, `bls_operations`, `bls_ies`, `coinjoin_broadcasttx`, `coinjoin_denominations`, `coinjoin_entry_addscriptsig`, `coinjoin_queue`, `coinjoin_status_update`, `deterministic_mn_list_diff`, `governance_proposal_validator`, `llmq_messages`, `special_tx_validation`, `process_message_dash`
Each of the 95 Dash-specific targets was exercised with 21 inputs
(20 random seed inputs of varying sizes 1-256 bytes + 1 empty input).
All targets processed every input without crashes or hangs.
Summary: 95/95 targets PASS, 0 crashes, 0 timeouts.
Total inputs processed: 1,995 (95 targets x 21 inputs)
Continuous daemon validation
Daemon dry-run confirmed on Guix VM — target enumeration, corpus dir
creation, and crash dir creation all verified:
Seed corpus generation
Synthetic corpus generation validated (offline mode, no running node):
CI behavior
linux64_fuzzartifactsNotes on validation scope
This validation confirms all targets build, register, and handle arbitrary
input without crashing in deterministic mode. Extended fuzzing campaigns
(with libFuzzer engine + coverage feedback) are tracked separately under
the continuous fuzzing deployment plan.
Bugs surfaced during fuzzing (pre-existing issues)
- `CCoinJoinQueue::IsTimeOutOfBounds` signed overflow case
- issues in libdashbls internals surfaced by `bls_operations` campaigns
- `CQuorumSnapshot::Unserialize` missing `mnSkipList.clear()` before the read loop
Notes
- This consolidated branch is the single review/merge unit for fuzzing.
- The previously separate fuzz PRs stay closed while review is in progress.