Partition configuration

A standard partition configuration is a set of parameters that controls document partitioning, whether it runs through the Unstructured API or locally with the unstructured library. The parameters fall into two groups: those passed to the partition method for the initial segmentation of documents, and those that govern how the data is handled after processing, including the dynamic metadata associated with each element.

Configs for Partitioning

additional_partition_args: A JSON string representation of any values to pass through to the partition function.
encoding: The encoding method used to decode the text input. By default, UTF-8 will be used.
ocr_languages: The languages present in the document, for use in partitioning, OCR, or both. Multiple languages indicate that the text could be in any of the specified languages.
skip_infer_table_types: A list of document types for which table extraction is skipped.
strategy: Default: auto. The strategy to use for partitioning PDF and image files. Uses a layout detection model if set to hi_res. Otherwise, partitioning simply extracts the text from the document and processes it.
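
As a rough illustration of how the partitioning settings above fit together, the following sketch collects them in a plain dictionary. The helper build_partition_settings is hypothetical and not part of the unstructured library; the language codes and the include_page_breaks pass-through argument are assumed example values. The point it shows is that additional_partition_args travels as a JSON string and is decoded again before being handed to the partition function.

```python
import json


def build_partition_settings(strategy="auto", encoding="utf-8",
                             ocr_languages=None, skip_infer_table_types=None,
                             extra_args=None):
    """Hypothetical helper: gather the documented partitioning settings."""
    return {
        "strategy": strategy,
        "encoding": encoding,
        "ocr_languages": ocr_languages or [],
        "skip_infer_table_types": skip_infer_table_types or [],
        # Documented as a JSON string, so the extra arguments are serialized.
        "additional_partition_args": json.dumps(extra_args or {}),
    }


settings = build_partition_settings(
    strategy="hi_res",                          # use the layout detection model
    ocr_languages=["eng", "deu"],               # assumed example language codes
    skip_infer_table_types=["pdf"],             # skip table extraction for PDFs
    extra_args={"include_page_breaks": True},   # assumed pass-through argument
)

# The receiving side decodes the JSON string back into keyword arguments.
extra_kwargs = json.loads(settings["additional_partition_args"])
print(settings["strategy"], extra_kwargs)
```

Keeping the pass-through arguments as a single JSON string means the configuration stays flat and easy to supply from a command line or environment variable.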

Configs for the Process

api_key: The Unstructured API key used to authenticate requests sent to the Unstructured API when partition_by_api is set to True.
fields_include: Fields to include in the output JSON. By default, the following fields are included: element_id, text, type, metadata, and embeddings.
flatten_metadata: Default: False. If set to True, the hierarchical metadata structure is flattened to have all values exist at the top level.
hi_res_model_name: The model to use when strategy is set to hi_res. Available values are layout_v1.0.0 (the default) and yolox.
metadata_exclude: Values from the metadata field to exclude from the output.
metadata_include: If provided, only the specified fields are preserved in the metadata output.
partition_by_api: Default: False. If set to True, uses the Unstructured API to run partitioning. If set to False, runs partitioning locally.
partition_endpoint: If partition_by_api is set to True, partitioning requests are sent to this Unstructured API URL.
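
To make the output-shaping options above more concrete, here is a minimal sketch of how fields_include, metadata_include, metadata_exclude, and flatten_metadata might act on a single element. The functions flatten and postprocess are hypothetical stand-ins, not the library's implementation, and the underscore separator used for flattened keys and the sample element are assumptions.

```python
def flatten(nested, parent_key="", sep="_"):
    """Flatten nested dicts; the "_" key separator is an assumption."""
    flat = {}
    for key, value in nested.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


def postprocess(element, fields_include, metadata_include=None,
                metadata_exclude=None, flatten_metadata=False):
    """Hypothetical sketch of the post-processing options described above."""
    metadata = dict(element.get("metadata", {}))
    if metadata_include:
        # Keep only the listed metadata fields.
        metadata = {k: v for k, v in metadata.items() if k in metadata_include}
    for key in metadata_exclude or []:
        metadata.pop(key, None)
    # Keep only the requested top-level fields.
    output = {k: v for k, v in element.items() if k in fields_include}
    if "metadata" in output:
        output["metadata"] = metadata
        if flatten_metadata:
            # Move every metadata value to the top level of the element.
            output.pop("metadata")
            output.update(flatten(metadata))
    return output


element = {
    "element_id": "abc123",
    "text": "Hello",
    "type": "Title",
    "metadata": {
        "filename": "a.pdf",
        "page_number": 1,
        "data_source": {"url": "s3://bucket/a.pdf"},
    },
}

result = postprocess(
    element,
    fields_include=["element_id", "text", "type", "metadata"],
    metadata_exclude=["page_number"],
    flatten_metadata=True,
)
print(result)
```

In this sketch the include and exclude filters run before flattening, so they refer to the original top-level metadata keys rather than the flattened names.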
