Skip to content

This function should work on all files, even if read_parquet() is unable to read them, because of an unsupported schema, encoding, compression or other reason.

Usage

parquet_schema(file)

Arguments

file

Path to a Parquet file.

Value

Data frame, the schema of the file. It has one row for
each node (inner node or leaf node). For flat files this means one
root node (inner node), always the first one, and then one row for
each "real" column. For nested schemas, the rows are in depth-first
search order. Most important columns are:
- `file_name`: file name.
- `name`: column name.
- `type`: data type. One of the low level data types.
- `type_length`: length for fixed length byte arrays.
- `repettion_type`: character, one of `REQUIRED`, `OPTIONAL` or
  `REPEATED`.
- `logical_type`: a list column, the logical types of the columns.
  An element has at least an entry called `type`, and potentially
  additional entries, e.g. `bit_width`, `is_signed`, etc.
- `num_children`: number of child nodes. Should be a non-negative
  integer for the root node, and `NA` for a leaf node.

See also

parquet_metadata() to read more metadata, parquet_column_types() to show the columns R would read, parquet_info() to show only basic information. read_parquet(), write_parquet(), nanoparquet-types.