Conversation

@ternaus (Owner) commented Mar 20, 2025

Summary by Sourcery

Tests:

  • Adds tests for parsing references with multi-line descriptions and mixed formatting.

sourcery-ai bot commented Mar 20, 2025

Reviewer's Guide by Sourcery

This pull request improves the parsing of the References section in Google-style docstrings, specifically addressing multi-line descriptions and whitespace handling. It introduces new functions to identify main reference lines and process single/multiple references with continuation lines. Additionally, it fixes a bug related to leading whitespace when checking for dashes at the beginning of reference lines. Comprehensive tests have been added to ensure the correct parsing of various reference formats.
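
The distinction between main reference lines and continuation lines can be sketched as follows. This is a minimal illustration, not the merged implementation: the function name `identify_main_reference_lines`, the treatment of the first non-empty line as a main line, and the `":" in line` colon test are assumptions made for the example.

```python
def identify_main_reference_lines(lines: list[str]) -> list[str]:
    """Return the lines that start a reference (hypothetical sketch)."""
    main_lines = []
    prev_indent = None  # indentation of the last main line seen so far
    for line in lines:
        if not line.strip():
            continue  # blank lines never start a reference
        indent = len(line) - len(line.lstrip())
        is_dashed = line.lstrip().startswith("-")
        has_colon = ":" in line
        is_first = prev_indent is None  # assumption: first non-empty line is a main line
        if is_dashed or is_first or (indent <= prev_indent and has_colon):
            main_lines.append(line)
            prev_indent = indent
    return main_lines


sample = [
    "- First reference: A description",
    "  that continues here",
    "- Second reference: Another description with",
    "  multiple lines and",
    "  even more text.",
]
print(identify_main_reference_lines(sample))
```

Only the two dashed lines are reported as main lines; the deeper-indented lines are left for the caller to attach as continuations.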

Sequence diagram for parsing multiple references

sequenceDiagram
    participant Parser as Docstring Parser
    participant Processor as _process_multiple_references
    participant LineIterator as Line Iterator
    participant RefLineParser as _parse_reference_line

    loop For each line in lines
        Parser->>LineIterator: next_line = lines[i]
        LineIterator-->>Parser: Returns next_line

        alt next_line starts with '-'
            Parser->>Processor: Process reference
            Processor->>RefLineParser: _parse_reference_line(full_ref_text)
            RefLineParser-->>Processor: Returns ref
            Processor-->>Parser: Returns references
        else next_line is continuation
            Parser->>LineIterator: Skip continuation lines
        end
    end

Updated class diagram for reference parsing

classDiagram
    class _parse_references {
        +list[dict[str, str]] parse(reference_content: str)
    }
    class _identify_main_reference_lines {
        +list[str] identify(lines: list[str])
    }
    class _process_single_reference {
        +dict[str, str] process(main_line: str, all_lines: list[str])
    }
    class _process_multiple_references {
        +list[dict[str, str]] process(lines: list[str])
    }
    class _parse_reference_line {
        +dict[str, str] parse(line: str, is_single: bool = False)
    }

    _parse_references -- _identify_main_reference_lines : uses
    _parse_references -- _process_single_reference : uses
    _parse_references -- _process_multiple_references : uses
    _process_single_reference -- _parse_reference_line : uses
    _process_multiple_references -- _parse_reference_line : uses

File-Level Changes

Change: Improved handling of multi-line descriptions in the References section of docstrings.
Details:
  • Implemented logic to identify main reference lines vs. continuation lines based on indentation and the presence of a dash.
  • Added functions to process single and multiple references, accounting for potential continuation lines.
  • Modified the main _parse_references function to use the new helper functions for identifying and processing references.
  • Added tests to verify the correct parsing of references with multi-line descriptions and mixed formatting.
Files:
  • google_docstring_parser/google_docstring_parser.py
  • tests/test_references.py

Change: Fixed a bug where leading whitespace was not being stripped when checking for a dash at the beginning of a reference line.
Details:
  • Added lstrip() to remove leading whitespace before checking for a dash in _parse_reference_line.
Files:
  • google_docstring_parser/google_docstring_parser.py
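
The effect of the lstrip() fix can be illustrated with a small sketch. The helper `starts_with_dash` and its `strip_leading` flag are hypothetical and exist only to contrast the check with and without stripping; they are not part of the parser.

```python
def starts_with_dash(line: str, strip_leading: bool = True) -> bool:
    """Return True if the line begins with a dash (hypothetical helper)."""
    candidate = line.lstrip() if strip_leading else line
    return candidate.startswith("-")


indented = "    - Paper title: https://example.com"
print(starts_with_dash(indented, strip_leading=False))  # False: the indentation hides the dash
print(starts_with_dash(indented))  # True: lstrip() exposes the dash
```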

@ternaus ternaus merged commit 80a772d into main Mar 20, 2025
17 checks passed
@ternaus ternaus deleted the fix_mutliline branch March 20, 2025 00:56

@sourcery-ai sourcery-ai bot left a comment

Hey @ternaus - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding a helper function to calculate indentation to avoid repetition.
  • The logic for handling single vs. multiple references could be simplified for better readability.
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟡 Testing: 1 issue found
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good

line_indent = len(main_line) - len(main_line.lstrip())

# Add continuation lines if any
main_index = all_lines.index(main_line)

suggestion (bug_risk): Avoid potential pitfalls with .index() on duplicate lines.

Using all_lines.index(main_line) assumes that the main_line is unique in the list. If duplicate lines occur in the reference section, this method might return an unexpected index. Consider tracking the index during iteration or passing the index explicitly to ensure accuracy.

Suggested implementation:

    # Add continuation lines if any
    # Use the tracked index of main_line (e.g., current_line_index) instead of .index()
    for j in range(current_line_index + 1, len(all_lines)):
        next_line = all_lines[j]

If the variable current_line_index (or an equivalent) is not currently tracked in this context, update the calling context or the surrounding loop to pass the index of main_line explicitly into this section of the code.
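
The pitfall can be reproduced in isolation. The sketch below is not taken from the pull request: `collect_continuations` is a hypothetical helper, and the rule that a continuation is any following line indented deeper than the main line is an assumption for the example.

```python
all_lines = [
    "- Smith 2020: overview",
    "  chapter one",
    "- Smith 2020: overview",
    "  chapter two",
]

# list.index() returns the first match, so the duplicate at position 2 resolves to 0.
print(all_lines.index(all_lines[2]))  # 0


def collect_continuations(main_index: int, lines: list[str]) -> list[str]:
    """Gather the lines after main_index that are indented deeper than the main line."""
    main_line = lines[main_index]
    main_indent = len(main_line) - len(main_line.lstrip())
    continuations = []
    for next_line in lines[main_index + 1:]:
        indent = len(next_line) - len(next_line.lstrip())
        if next_line.lstrip().startswith("-") or indent <= main_indent:
            break  # next reference or dedent ends the block
        continuations.append(next_line.strip())
    return continuations


print(collect_continuations(2, all_lines))  # ['chapter two']
```

Passing the position explicitly attaches "chapter two" to the second reference, whereas a lookup through `.index()` would start from position 0 and pick up "chapter one" instead.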

Returns:
bool: True if this is a continuation line
"""
line_index = all_lines.index(line)

suggestion (bug_risk): Review use of index lookup for continuation line detection.

The use of all_lines.index(line) in _is_continuation_line may be problematic if the same line content appears more than once. To avoid potential misidentification, consider using enumerate to pass the current line’s index or otherwise ensure the correct position is obtained.

Suggested implementation:

def _is_continuation_line(line: str, line_index: int, all_lines: list[str]) -> bool:
    if line_index == 0:

Remember to update all calls to _is_continuation_line so that they pass the appropriate line_index along with the line and all_lines arguments.
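
A call site using enumerate might look like the sketch below. The body of `is_continuation_line` is an assumption: it treats a line as a continuation when it is not dashed and is indented deeper than the closest preceding dashed line. The real function may use different criteria.

```python
def is_continuation_line(line: str, line_index: int, all_lines: list[str]) -> bool:
    """Index-aware continuation check (hypothetical sketch)."""
    if line_index == 0 or not line.strip():
        return False
    if line.lstrip().startswith("-"):
        return False
    indent = len(line) - len(line.lstrip())
    # Walk backwards to the closest dashed line and compare indentation.
    for previous in reversed(all_lines[:line_index]):
        if previous.lstrip().startswith("-"):
            return indent > len(previous) - len(previous.lstrip())
    return False


lines = ["  see appendix", "- Ref A: intro", "  see appendix"]
# enumerate supplies the true position, so identical lines are judged separately.
flags = [is_continuation_line(line, i, lines) for i, line in enumerate(lines)]
print(flags)  # [False, False, True]
```

The first and last lines have identical content, but only the last one follows a dashed line, which a content-based `.index()` lookup could not tell apart.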

Comment on lines +354 to +356
- Second reference: Another description with
multiple lines and
even more text.

suggestion (testing): Add test cases for malformed multi-line references.

It would be beneficial to include test cases that cover scenarios where multi-line references are malformed, such as inconsistent indentation or missing dashes for subsequent lines in a multi-line reference. This would help ensure the parser handles these edge cases gracefully.

Suggested implementation:

import pytest
# Assuming parse_references is imported from the module containing the parser
from mymodule.docstring_parser import parse_references

def test_malformed_multiline_reference_inconsistent_indentation() -> None:
    """Test malformed multi-line reference with inconsistent indentation.

    This test verifies that the parser raises an error when multi-line references contain
    inconsistent indentation.
    """
    docstring = '''\
    Function description.

    References:
        - First reference: Valid description line.
         Second line without proper dash and inconsistent indentation.
    '''
    with pytest.raises(ValueError):
        parse_references(docstring)

def test_malformed_multiline_reference_missing_dash() -> None:
    """Test malformed multi-line reference with missing dash for continuation lines.

    This test verifies that the parser raises an error when a reference is missing the dash
    prefix on continuation lines.
    """
    docstring = '''\
    Function description.

    References:
        - First reference: Valid description.
          Continuation line missing dash.
        Second reference without dash at all.
    '''
    with pytest.raises(ValueError):
        parse_references(docstring)

If parse_references is defined in a different module or under a different name, please adjust the import statement accordingly.
Also, ensure that the parser is set to raise ValueError for malformed multi-line references in your codebase.

return _parse_reference_line(reference_text, is_single=True)


def _process_multiple_references(lines: list[str]) -> list[dict[str, str]]:

issue (complexity): Consider grouping contiguous lines into reference blocks to simplify index management in _process_multiple_references and reduce nested-loop complexity.

Consider simplifying the manual index management in `_process_multiple_references` by grouping contiguous lines into a reference block. For example, you can add a helper that groups a list of lines where a new group starts whenever a line beginning with a dash is encountered. Then, process each group without manually juggling indices.

Example helper:
```python
def _group_reference_blocks(lines: list[str]) -> list[list[str]]:
    groups = []
    current = []
    for line in lines:
        if line.lstrip().startswith("-") and current:
            groups.append(current)
            current = [line]
        else:
            current.append(line)
    if current:
        groups.append(current)
    return groups
```

Then refactor `_process_multiple_references` as:

```python
def _process_multiple_references(lines: list[str]) -> list[dict[str, str]]:
    references = []
    blocks = _group_reference_blocks(lines)
    for block in blocks:
        # Join block lines and parse the reference.
        full_ref_text = " ".join(part.strip() for part in block)
        references.append(_parse_reference_line(full_ref_text))
    return references
```

This change streamlines the iteration and reduces nested loops while keeping functionality intact.
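
The grouping approach can be tried on a standalone sample. The helper is repeated here under the name `group_reference_blocks` so the snippet runs on its own, and the sample lines are made up for illustration.

```python
def group_reference_blocks(lines: list[str]) -> list[list[str]]:
    """Start a new group at every dashed line; other lines join the current group."""
    groups, current = [], []
    for line in lines:
        if line.lstrip().startswith("-") and current:
            groups.append(current)
            current = [line]
        else:
            current.append(line)
    if current:
        groups.append(current)
    return groups


sample = [
    "- First reference: A description",
    "- Second reference: Another description with",
    "  multiple lines and",
    "  even more text.",
]
joined = [" ".join(part.strip() for part in block) for block in group_reference_blocks(sample)]
print(joined)

# Lines that precede the first dash form a group of their own.
print(group_reference_blocks(["Intro text", "- Only reference: text"]))
```

One point to check before adopting the helper: any text ahead of the first dashed line becomes a separate block and would be passed to `_parse_reference_line` like a reference.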

Comment on lines +238 to +246

if is_dashed:
    # Definitely a main reference
    main_ref_lines.append(line)
    prev_indent = line_indent
elif line_indent <= prev_indent and has_colon:
    # Same or less indentation than previous with a colon - likely a new reference
    main_ref_lines.append(line)
    prev_indent = line_indent

issue (code-quality): We've found these issues:

current_line = lines[i].rstrip()

# Check if this is a new reference (starts with dash)
if current_line.lstrip().startswith("-"):

issue (code-quality): We've found these issues:
