Configs for Partitioning
- additional_partition_args: A JSON string representation of any values to pass through to the partition function.
- encoding: The encoding used to decode the text input. By default, UTF-8 is used.
- ocr_languages: The languages present in the document, for use in partitioning, OCR, or both. Multiple languages indicate that the text could be in any of the specified languages.
- skip_infer_table_types: A list of document types for which to skip table extraction.
- strategy: Default: auto. The strategy to use for partitioning PDF and image files. Uses a layout detection model if set to hi_res. Otherwise, partitioning simply extracts the text from the document and processes it.
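
As a rough illustration of how these partitioning settings might be assembled, the sketch below builds them as a plain Python dictionary. The key names come from the list above; every value is a placeholder chosen for illustration. In particular, the language codes, the document type names, and the include_page_breaks pass-through argument are assumptions, not values confirmed by this page.

```python
import json

# Extra keyword arguments for the partition function (assumed example values).
extra_partition_args = {"include_page_breaks": True}

partition_config = {
    # Must be a JSON *string*, so serialize the dictionary first.
    "additional_partition_args": json.dumps(extra_partition_args),
    "encoding": "utf-8",
    # Assumed format: a list of language codes.
    "ocr_languages": ["eng", "deu"],
    # Assumed format: a list of document type names.
    "skip_infer_table_types": ["pdf", "jpg"],
    # "hi_res" enables the layout detection model; the default is "auto".
    "strategy": "hi_res",
}

# Round-trip check: the JSON string decodes back to the original arguments.
decoded_args = json.loads(partition_config["additional_partition_args"])
```

Serializing with json.dumps rather than writing the string by hand avoids quoting mistakes in nested values.
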
Configs for the Process
- api_key: If partition_by_api is set to True, requests sent to the Unstructured API use this Unstructured API key to make authenticated calls.
- fields_include: Fields to include in the output JSON. By default, the following fields are included: element_id, text, type, metadata, and embeddings.
- flatten_metadata: Default: False. If set to True, the hierarchical metadata structure is flattened so that all values exist at the top level.
- hi_res_model_name: The model to use when strategy is set to hi_res. Available values are layout_v1.0.0 (the default) and yolox.
- metadata_exclude: Values from the metadata field to exclude from the output.
- metadata_include: If provided, only the specified fields are preserved in the metadata output.
- partition_by_api: Default: False. If set to True, uses the Unstructured API to run partitioning. If set to False, runs partitioning locally.
- partition_endpoint: If partition_by_api is set to True, partitioning requests are sent to this Unstructured API URL.
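
The following sketch shows one way the process settings could look as a Python dictionary, together with a hypothetical helper that illustrates what flattening metadata to the top level means. The flatten function, its underscore separator, and the sample element are assumptions made for illustration; the actual flattening scheme and key names used by the tool may differ. The API key and endpoint values are placeholders.

```python
from typing import Any, Dict

# Illustrative process settings; key names come from the list above.
process_config = {
    "partition_by_api": True,
    "api_key": "<your-unstructured-api-key>",             # placeholder
    "partition_endpoint": "<your-unstructured-api-url>",  # placeholder
    "hi_res_model_name": "yolox",
    "flatten_metadata": True,
}


def flatten(nested: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Hypothetical helper: lift nested values to the top level.

    Nested keys are joined with `sep` (an assumed naming scheme).
    """
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# A made-up element with hierarchical metadata.
element = {
    "element_id": "abc",
    "text": "Hello",
    "type": "Title",
    "metadata": {"filename": "a.pdf", "data_source": {"url": "x"}},
}

flat_element = flatten(element) if process_config["flatten_metadata"] else element
```

With flatten_metadata enabled, the nested metadata dictionary in this sketch disappears and its values appear as top-level keys alongside element_id, text, and type.
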

