PDFPlumberParser
- class langchain_community.document_loaders.parsers.pdf.PDFPlumberParser(text_kwargs: Mapping[str, Any] | None = None, dedupe: bool = False, extract_images: bool = False)
Parse PDF with PDFPlumber.
Initialize the parser.
- Parameters:
text_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to pdfplumber.Page.extract_text().
dedupe (bool) – If True, remove duplicate characters from each page before extracting text, which avoids doubled characters in the output.
extract_images (bool)
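As an illustration of the parameters above, the following sketch builds a parser with custom text-extraction options. The helper name `make_parser` and the values in `TEXT_KWARGS` are assumptions chosen for this example, not library defaults; `x_tolerance` and `layout` are options of pdfplumber's `extract_text()`. The import sits inside the function so the snippet loads even where the optional dependencies are not installed.

```python
from typing import Any, Dict

# Illustrative values (assumptions, not library defaults); both keys are
# options accepted by pdfplumber.Page.extract_text().
TEXT_KWARGS: Dict[str, Any] = {"x_tolerance": 2, "layout": True}


def make_parser(dedupe: bool = True):
    """Build a PDFPlumberParser (hypothetical helper for this example)."""
    # Imported lazily: requires langchain-community and pdfplumber.
    from langchain_community.document_loaders.parsers.pdf import PDFPlumberParser

    # text_kwargs is forwarded to extract_text(); dedupe=True asks the parser
    # to drop duplicate characters before extracting text.
    return PDFPlumberParser(text_kwargs=TEXT_KWARGS, dedupe=dedupe)
```

Setting `dedupe=True` is mainly useful for PDFs that draw each glyph more than once (for example to fake bold text), which otherwise shows up as doubled letters.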
Methods
__init__([text_kwargs, dedupe, extract_images]): Initialize the parser.
lazy_parse(blob): Lazily parse the blob.
parse(blob): Eagerly parse the blob into a document or documents.
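The difference between the lazy and eager methods can be sketched as follows. This is a minimal example, assuming `langchain-community` and `pdfplumber` are installed; the helper names `iter_page_texts` and `load_all_pages` are hypothetical, and the comment that documents arrive one per page reflects the behaviour I am aware of for this parser rather than a documented guarantee.

```python
from typing import Iterator, List


def iter_page_texts(pdf_path: str) -> Iterator[str]:
    """Yield extracted text lazily, one parsed document at a time."""
    # Imported lazily: requires langchain-community and pdfplumber.
    from langchain_community.document_loaders.blob_loaders import Blob
    from langchain_community.document_loaders.parsers.pdf import PDFPlumberParser

    parser = PDFPlumberParser()
    blob = Blob.from_path(pdf_path)
    # lazy_parse returns an iterator, so documents (typically one per page)
    # are produced only as the caller consumes them.
    for doc in parser.lazy_parse(blob):
        yield doc.page_content


def load_all_pages(pdf_path: str) -> List:
    """Return every parsed document at once."""
    from langchain_community.document_loaders.blob_loaders import Blob
    from langchain_community.document_loaders.parsers.pdf import PDFPlumberParser

    parser = PDFPlumberParser()
    # parse collects the results of lazy_parse into a list.
    return parser.parse(Blob.from_path(pdf_path))
```

For large files, iterating with `lazy_parse` keeps only the current document in memory, whereas `parse` is more convenient when the whole result is needed anyway.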
- __init__(text_kwargs: Mapping[str, Any] | None = None, dedupe: bool = False, extract_images: bool = False)
Initialize the parser.
- Parameters:
text_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to pdfplumber.Page.extract_text().
dedupe (bool) – If True, remove duplicate characters from each page before extracting text, which avoids doubled characters in the output.
extract_images (bool)
- Return type:
None