tabula-py
Note: If you use multiple area options and want the results extracted into a single table,
set multiple_tables=False for read_pdf().
Examples
[269.875,12.75,790.5,561], [[12.1,20.5,30.1,50.2], [1.0,3.2,10.5,40.2]]
• relative_area (bool, optional) – If all area values are between 0-100 (inclusive) and
preceded by '%', input will be taken as % of actual height or width of the page. Default
False.
• lattice (bool, optional) – Force PDF to be extracted using lattice-mode extraction (if
there are ruling lines separating each cell, as in a PDF of an Excel spreadsheet)
• stream (bool, optional) – Force PDF to be extracted using stream-mode extraction (if
there are no ruling lines separating each cell)
• password (str, optional) – Password to decrypt document. Default: empty
• silent (bool, optional) – Suppress all stderr output.
• columns (Sequence, optional) – X coordinates of column boundaries. Must be sorted
and of a datatype that preserves order, e.g. tuple or list
Example
[10.1, 20.2, 30.3]
• relative_columns (bool, optional) – If all values are between 0-100 (inclusive) and
preceded by '%', input will be taken as % of actual width of the page. Default False.
• format (str, optional) – Format for output file or extracted object. ("CSV", "TSV",
"JSON")
• force_subprocess (bool) – Force the use of tabula-java subprocess mode. If you have
issues with jpype, try this option in the same environment. Default False.
• options (str, optional) – Raw option string for tabula-java.
Returns
Nothing. Outputs are saved in the same directory as input_dir.
Raises
ValueError – If input_dir doesn’t exist.
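Example
The following is a minimal sketch of a batch conversion using the options documented above, not a definitive recipe. The helper names build_batch_options and convert_directory are hypothetical and not part of tabula-py. It assumes that the batch function is exposed as tabula.convert_into_by_batch, that it accepts these options as keyword arguments, and that "sorted" means ascending order.

```python
from pathlib import Path
from typing import Any, Dict, Sequence


def build_batch_options(
    columns: Sequence[float],
    relative_columns: bool = False,
    lattice: bool = False,
) -> Dict[str, Any]:
    """Collect keyword options and check the documented constraints (hypothetical helper)."""
    cols = list(columns)
    # The docs require sorted column boundaries; ascending order is assumed here.
    if cols != sorted(cols):
        raise ValueError("columns must be sorted in ascending order")
    # With relative_columns, values are percentages of the page width.
    if relative_columns and not all(0 <= c <= 100 for c in cols):
        raise ValueError("relative columns must lie between 0 and 100")
    return {"columns": cols, "relative_columns": relative_columns, "lattice": lattice}


def convert_directory(input_dir: str, output_format: str = "csv", **options: Any) -> None:
    """Convert every PDF in input_dir (hypothetical wrapper around tabula-py)."""
    if not Path(input_dir).is_dir():
        # Fail early with the same exception type the documented function raises.
        raise ValueError(f"input_dir does not exist: {input_dir}")
    import tabula  # deferred import: requires tabula-py and a Java runtime

    # Assumed call shape; outputs are written next to the input files.
    tabula.convert_into_by_batch(input_dir, output_format=output_format, **options)
```

A call such as convert_directory("pdfs", "csv", **build_batch_options([10.1, 20.2, 30.3], lattice=True)) would then convert a directory named pdfs, where the directory name is only a placeholder.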
tabula.io.read_pdf(input_path: IO | str | PathLike, output_format: str | None = None, encoding: str = 'utf-8',
java_options: List[str] | None = None, pandas_options: Dict[str, Any] | None = None,
multiple_tables: bool = True, user_agent: str | None = None, use_raw_url: bool = False,
pages: str | int | Iterable[int] | None = None, guess: bool = True, area: Iterable[float] |
Iterable[Iterable[float]] | None = None, relative_area: bool = False, lattice: bool = False,
stream: bool = False, password: str | None = None, silent: bool | None = None, columns:
Sequence[float] | None = None, relative_columns: bool = False, format: str | None = None,
batch: str | None = None, output_path: str | None = None, force_subprocess: bool = False,
options: str = '') → List[DataFrame] | Dict[str, Any]
Read tables in PDF.
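Example
As a hedged sketch of the signature above, the code below reads several areas into a single table by combining area with multiple_tables=False. The helper names single_table_options and read_single_table are hypothetical, the four values per area are assumed to follow tabula-java's top, left, bottom, right convention, and pages="all" is assumed to be an accepted value.

```python
from typing import Any, Dict, List, Sequence


def single_table_options(areas: Sequence[Sequence[float]]) -> Dict[str, Any]:
    """Build read_pdf keyword arguments for extracting several areas as one table."""
    boxes: List[List[float]] = [list(a) for a in areas]
    # Each area is assumed to be (top, left, bottom, right).
    if not boxes or any(len(b) != 4 for b in boxes):
        raise ValueError("each area must have exactly four values")
    return {"area": boxes, "multiple_tables": False}


def read_single_table(input_path: str, areas: Sequence[Sequence[float]], pages: Any = "all"):
    """Call tabula.read_pdf with the options above (hypothetical wrapper)."""
    import tabula  # deferred import: requires tabula-py and a Java runtime

    return tabula.read_pdf(input_path, pages=pages, **single_table_options(areas))
```

For instance, read_single_table("report.pdf", [[12.1, 20.5, 30.1, 50.2], [1.0, 3.2, 10.5, 40.2]]) would pass both areas in one call, where report.pdf is a placeholder file name.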