Fix in references #12
Reviewer's Guide by Sourcery

This pull request improves the parsing of the References section in Google-style docstrings, specifically multi-line descriptions and whitespace handling. It introduces new functions that identify main reference lines and process single or multiple references, including their continuation lines. It also fixes a bug in how leading whitespace was handled when checking whether a reference line starts with a dash. Comprehensive tests have been added to ensure that various reference formats are parsed correctly.

Sequence diagram for parsing multiple references
sequenceDiagram
participant Parser as Docstring Parser
participant Processor as _process_multiple_references
participant LineIterator as Line Iterator
participant RefLineParser as _parse_reference_line
loop For each line in lines
Parser->>LineIterator: next_line = lines[i]
LineIterator-->>Parser: Returns next_line
alt next_line starts with '-'
Parser->>Processor: Process reference
Processor->>RefLineParser: _parse_reference_line(full_ref_text)
RefLineParser-->>Processor: Returns ref
Processor-->>Parser: Returns references
else next_line is continuation
Parser->>LineIterator: Skip continuation lines
end
end
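The loop in the sequence diagram can be sketched in plain Python. This is a minimal illustration, not the PR's code: the name `collect_reference_texts` is hypothetical, and it stops before calling `_parse_reference_line` because that function's output keys are not shown here. It assumes a reference starts at a line whose first non-whitespace character is `-`, and that the following non-blank, non-dashed lines are its continuations.

```python
def collect_reference_texts(lines: list[str]) -> list[str]:
    """Join each dashed reference line with its continuation lines.

    Hypothetical helper mirroring the sequence diagram; in the PR, each
    joined string would then be handed to _parse_reference_line.
    """
    texts: list[str] = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if not line.lstrip().startswith("-"):
            # Not the start of a reference: skip it.
            i += 1
            continue
        parts = [line.strip()]
        i += 1
        # Consume continuation lines: non-blank and not starting a new reference.
        while (
            i < len(lines)
            and lines[i].strip()
            and not lines[i].lstrip().startswith("-")
        ):
            parts.append(lines[i].strip())
            i += 1
        texts.append(" ".join(parts))
    return texts


example = [
    "- First reference: A description that",
    "  continues on a second line.",
    "- Second reference: Short one.",
]
print(collect_reference_texts(example))
# ['- First reference: A description that continues on a second line.', '- Second reference: Short one.']
```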
Updated class diagram for reference parsing
classDiagram
class _parse_references {
+list[dict[str, str]] parse(reference_content: str)
}
class _identify_main_reference_lines {
+list[str] identify(lines: list[str])
}
class _process_single_reference {
+dict[str, str] process(main_line: str, all_lines: list[str])
}
class _process_multiple_references {
+list[dict[str, str]] process(lines: list[str])
}
class _parse_reference_line {
+dict[str, str] parse(line: str, is_single: bool = False)
}
_parse_references -- _identify_main_reference_lines : uses
_parse_references -- _process_single_reference : uses
_parse_references -- _process_multiple_references : uses
_process_single_reference -- _parse_reference_line : uses
_process_multiple_references -- _parse_reference_line : uses
Hey @ternaus - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a helper function to calculate indentation to avoid repetition.
- The logic for handling single vs. multiple references could be simplified for better readability.
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟡 Testing: 1 issue found
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
    line_indent = len(main_line) - len(main_line.lstrip())

    # Add continuation lines if any
    main_index = all_lines.index(main_line)
suggestion (bug_risk): Avoid potential pitfalls with .index() on duplicate lines.
Using all_lines.index(main_line) assumes that the main_line is unique in the list. If duplicate lines occur in the reference section, this method might return an unexpected index. Consider tracking the index during iteration or passing the index explicitly to ensure accuracy.
Suggested implementation:
# Add continuation lines if any
# Use the tracked index of main_line (e.g., current_line_index) instead of .index()
for j in range(current_line_index + 1, len(all_lines)):
    next_line = all_lines[j]

If the variable `current_line_index` (or an equivalent) is not currently tracked in this context, update the calling context or the surrounding loop to pass the index of `main_line` explicitly into this section of the code.
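A short illustration of the pitfall, using made-up reference lines: `list.index` returns the position of the first equal element, so when two main lines are identical, the second one is located at the first one's position. Tracking positions with `enumerate` keeps each occurrence distinct.

```python
all_lines = [
    "- See: the appendix.",
    "  Extra note.",
    "- See: the appendix.",
    "  Another note.",
]

# .index() always returns the FIRST match, even while processing the second copy.
print(all_lines.index("- See: the appendix."))  # 0

# enumerate yields the real position of every occurrence.
positions = [i for i, line in enumerate(all_lines) if line == "- See: the appendix."]
print(positions)  # [0, 2]
```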
    Returns:
        bool: True if this is a continuation line
    """
    line_index = all_lines.index(line)
suggestion (bug_risk): Review use of index lookup for continuation line detection.
The use of all_lines.index(line) in _is_continuation_line may be problematic if the same line content appears more than once. To avoid potential misidentification, consider using enumerate to pass the current line’s index or otherwise ensure the correct position is obtained.
Suggested implementation:
def _is_continuation_line(line: str, line_index: int, all_lines: list[str]) -> bool:
    if line_index == 0:

Remember to update all calls to `_is_continuation_line` so that they pass the appropriate `line_index` along with the `line` and `all_lines` arguments.
    - Second reference: Another description with
      multiple lines and
      even more text.
suggestion (testing): Add test cases for malformed multi-line references.
It would be beneficial to include test cases that cover scenarios where multi-line references are malformed, such as inconsistent indentation or missing dashes for subsequent lines in a multi-line reference. This would help ensure the parser handles these edge cases gracefully.
Suggested implementation:
import pytest

# Assuming parse_references is imported from the module containing the parser
from mymodule.docstring_parser import parse_references


def test_malformed_multiline_reference_inconsistent_indentation() -> None:
    """Test malformed multi-line reference with inconsistent indentation.

    This test verifies that the parser raises an error when multi-line references contain
    inconsistent indentation.
    """
    docstring = '''\
Function description.
References:
- First reference: Valid description line.
Second line without proper dash and inconsistent indentation.
'''
    with pytest.raises(ValueError):
        parse_references(docstring)


def test_malformed_multiline_reference_missing_dash() -> None:
    """Test malformed multi-line reference with missing dash for continuation lines.

    This test verifies that the parser raises an error when a reference is missing the dash
    prefix on continuation lines.
    """
    docstring = '''\
Function description.
References:
- First reference: Valid description.
Continuation line missing dash.
Second reference without dash at all.
'''
    with pytest.raises(ValueError):
        parse_references(docstring)

If `parse_references` is defined in a different module or under a different name, please adjust the import statement accordingly.
Also, ensure that the parser is set to raise ValueError for malformed multi-line references in your codebase.
    return _parse_reference_line(reference_text, is_single=True)


def _process_multiple_references(lines: list[str]) -> list[dict[str, str]]:
issue (complexity): Consider grouping contiguous lines into reference blocks to simplify index management in `_process_multiple_references` and reduce nested-loop complexity
Consider simplifying the manual index management in `_process_multiple_references` by grouping contiguous lines into a reference block. For example, you can add a helper that groups a list of lines where a new group starts whenever a line beginning with a dash is encountered. Then, process each group without manually juggling indices.
Example helper:
```python
def _group_reference_blocks(lines: list[str]) -> list[list[str]]:
    """Split lines into blocks; a dashed line starts a new block."""
    groups: list[list[str]] = []
    current: list[str] = []
    for line in lines:
        if line.lstrip().startswith("-") and current:
            groups.append(current)
            current = [line]
        else:
            current.append(line)
    if current:
        groups.append(current)
    return groups


# Then refactor _process_multiple_references as:
def _process_multiple_references(lines: list[str]) -> list[dict[str, str]]:
    references = []
    for block in _group_reference_blocks(lines):
        # Join the non-blank lines of the block (avoids double spaces from blank lines).
        full_ref_text = " ".join(part.strip() for part in block if part.strip())
        if not full_ref_text:
            continue  # a block made only of blank lines holds no reference
        references.append(_parse_reference_line(full_ref_text))
    return references
```
This change streamlines the iteration and reduces nested loops while keeping functionality intact.
    if is_dashed:
        # Definitely a main reference
        main_ref_lines.append(line)
        prev_indent = line_indent
    elif line_indent <= prev_indent and has_colon:
        # Same or less indentation than previous with a colon - likely a new reference
        main_ref_lines.append(line)
        prev_indent = line_indent
issue (code-quality): We've found these issues:
- Merge duplicate blocks in conditional (`merge-duplicate-blocks`)
- Remove redundant conditional (`remove-redundant-if`)
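Both branches of the conditional above append the line and update `prev_indent`, so they can be merged into a single `or` condition. A minimal sketch; `find_main_reference_lines` is a hypothetical stand-in for `_identify_main_reference_lines`, and the starting value of `prev_indent` and the skipping of blank lines are assumptions, since the diff excerpt does not show them.

```python
import sys


def find_main_reference_lines(lines: list[str]) -> list[str]:
    """Sketch with the duplicate branches merged into one condition.

    Assumption: prev_indent starts at sys.maxsize, so the first line with a
    colon qualifies as a main reference even without a dash.
    """
    main_ref_lines: list[str] = []
    prev_indent = sys.maxsize
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines are ignored
        line_indent = len(line) - len(line.lstrip())
        is_dashed = line.lstrip().startswith("-")
        has_colon = ":" in line
        # Both former branches did the same thing, so one `or` covers them.
        if is_dashed or (line_indent <= prev_indent and has_colon):
            main_ref_lines.append(line)
            prev_indent = line_indent
    return main_ref_lines


print(find_main_reference_lines(
    ["- A: first", "  continued text", "B: second", "    deeper: not new"]
))  # ['- A: first', 'B: second']
```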
    current_line = lines[i].rstrip()

    # Check if this is a new reference (starts with dash)
    if current_line.lstrip().startswith("-"):
issue (code-quality): We've found these issues:
- Swap if/else branches (`swap-if-else-branches`)
- Remove unnecessary else after guard condition (`remove-unnecessary-else`)
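Applied to a loop like the one in the diff, the two refactorings look roughly like this: the non-dashed case is handled first as a guard clause ending in `continue`, so the dashed case needs no `else`. The function name `group_by_dash` and the handling of non-dashed lines are assumptions for illustration (continuations are attached to the preceding reference, and lines before the first dash are dropped).

```python
def group_by_dash(lines: list[str]) -> list[list[str]]:
    """Group lines into references, using a guard clause instead of if/else."""
    references: list[list[str]] = []
    for raw_line in lines:
        current_line = raw_line.rstrip()
        # Guard clause: handle the non-dashed case first and move on.
        if not current_line.lstrip().startswith("-"):
            if references and current_line.strip():
                references[-1].append(current_line.strip())
            continue
        # No `else` needed: reaching here means the line starts a new reference.
        references.append([current_line.strip()])
    return references


print(group_by_dash(["- A: x", "  y", "", "- B: z"]))
# [['- A: x', 'y'], ['- B: z']]
```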
Summary by Sourcery
Tests: