TableParquetWriterOptions
The `TableParquetWriterOptions` class provides specialized instructions for configuring `IcebergTableWriter` instances.
Syntax
```python
TableParquetWriterOptions(
    table_definition: Union[TableDefinition, Mapping[str, DType], Iterable[ColumnDefinition], JType],
    schema_provider: Optional[SchemaProvider] = None,
    field_id_to_column_name: Optional[Dict[int, str]] = None,
    compression_codec_name: Optional[str] = None,
    maximum_dictionary_keys: Optional[int] = None,
    maximum_dictionary_size: Optional[int] = None,
    target_page_size: Optional[int] = None,
    sort_order_provider: Optional[SortOrderProvider] = None,
    data_instructions: Optional[s3.S3Instructions] = None,
)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `table_definition` | `Union[TableDefinition, Mapping[str, DType], Iterable[ColumnDefinition], JType]` | The table definition to use when writing Iceberg data files. The definition can be used to skip some columns or add additional columns with `null` values. The provided definition must have at least one column. |
| `schema_provider` | `Optional[SchemaProvider]` | Used to extract a schema from an Iceberg table. The schema is used in conjunction with the `field_id_to_column_name` mapping to map Deephaven columns from the table definition to Iceberg fields. Defaults to `None`, which uses the table's current schema. |
| `field_id_to_column_name` | `Optional[Dict[int, str]]` | A one-to-one mapping of Iceberg field IDs from the schema specification to Deephaven column names from the table definition. Defaults to `None`, in which case columns are mapped by name. |
| `compression_codec_name` | `Optional[str]` | The compression codec to use for writing the Parquet file. Allowed values include the standard Parquet codecs, such as `"UNCOMPRESSED"`, `"SNAPPY"`, `"GZIP"`, `"LZ4"`, and `"ZSTD"`. Defaults to `None`, in which case the engine's default codec is used. |
| `maximum_dictionary_keys` | `Optional[int]` | The maximum number of unique keys the Parquet writer should add to a dictionary page before switching to non-dictionary encoding. Never used for non-string columns. Defaults to `None`, in which case the engine default is used. |
| `maximum_dictionary_size` | `Optional[int]` | The maximum number of bytes the Parquet writer should add to the dictionary before switching to non-dictionary encoding. Never used for non-string columns. Defaults to `None`, in which case the engine default is used. |
| `target_page_size` | `Optional[int]` | The target Parquet file page size in bytes. Defaults to `None`, in which case the engine default is used. |
| `sort_order_provider` | `Optional[SortOrderProvider]` | Specifies the sort order to use for sorting new data when writing to an Iceberg table with this writer. The sort order is determined at the time the writer is created and does not change if the table's sort order changes later. Defaults to `None`, which uses the table's default sort order. |
| `data_instructions` | `Optional[s3.S3Instructions]` | Special instructions for writing data files, useful when writing files to a non-local file system, like S3. If omitted, the data instructions are derived from the catalog. |
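To make the parameters above concrete, the following sketch combines several of them in one constructor call. It assumes a running Deephaven server; the specific values (codec choice, dictionary limits, page size) are illustrative only, not recommendations:

```python
from deephaven import dtypes
from deephaven.experimental import iceberg

# Illustrative values only; every parameter shown here is optional.
writer_options = iceberg.TableParquetWriterOptions(
    # A mapping of column names to Deephaven types is one accepted
    # form of table definition.
    table_definition={"X": dtypes.int32, "Y": dtypes.double},
    compression_codec_name="ZSTD",    # override the default codec
    maximum_dictionary_keys=1 << 20,  # cap dictionary entries for string columns
    maximum_dictionary_size=1 << 20,  # cap dictionary bytes for string columns
    target_page_size=1 << 16,         # target 64 KiB Parquet pages
)
```

Because every parameter except `table_definition` defaults to `None`, only the settings you want to override need to be supplied.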
Methods
None.
Constructors
A `TableParquetWriterOptions` is constructed directly from the class.
Examples
The following example creates a `TableParquetWriterOptions` object that can be used to write Deephaven tables to an Iceberg table:
```python
from deephaven.experimental import iceberg
from deephaven.experimental import s3
from deephaven import empty_table

source = empty_table(10).update(["X = i", "Y = 0.1 * X", "Z = pow(Y, 2)"])
source_def = source.definition

s3_instructions = s3.S3Instructions(
    region_name="us-east-1",
    endpoint_override="http://minio:9000",
    credentials=s3.Credentials.basic("admin", "password"),
)

writer_options = iceberg.TableParquetWriterOptions(
    table_definition=source_def, data_instructions=s3_instructions
)
```
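Once constructed, the options are passed to a table writer obtained from an Iceberg table adapter. The sketch below assumes a REST catalog backed by the same MinIO instance as above; the catalog details are hypothetical, and names such as `adapter_s3_rest`, `load_table`, `table_writer`, and `TableWriteInstructions` should be verified against the Deephaven Iceberg API reference for your version:

```python
# Hypothetical catalog details; adjust for your deployment.
catalog_adapter = iceberg.adapter_s3_rest(
    name="minio-iceberg",
    catalog_uri="http://rest:8181",
    warehouse_location="s3a://warehouse/wh",
    region_name="us-east-1",
    access_key_id="admin",
    secret_access_key="password",
    end_point_override="http://minio:9000",
)

# Load the target Iceberg table, build a writer from the options,
# and write the source table to it.
table_adapter = catalog_adapter.load_table("sample.source")
table_writer = table_adapter.table_writer(writer_options=writer_options)
table_writer.write(instructions=iceberg.TableWriteInstructions(tables=[source]))
```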