Wednesday, June 16, 2021

Thursday, June 10, 2021

Trino and Zero-Length Parquet Files in HDFS Part 2

Continuing from Part 1.

The test application, which writes Parquet files over the course of an hour, is on GitHub at https://github.com/awcoleman/example_trino_hdfs_zero_length

We can run the app in quick mode to populate some hive-style directories with data:



Create a table from those hive-style directories:



And tell the Hive Metastore about these partitions:



(Here we simply tell the Hive Metastore to refresh all partitions; we could instead have added individual partitions.)


And query data with Trino:



Now imagine that a legacy application opens a Parquet file in an older partition directory and holds that file open while waiting to see whether any more old data arrives. We can use our test application to simulate that:



We can see that the HDFS directory now has another file:



If we run the same query in Trino again, we get an error:


And the Trino server.log shows us the issue is in the footer:


Queries against other partition directories, which contain no open Parquet files, still work fine:



Trino and Zero-Length Parquet Files in HDFS Part 1

Trino (formerly PrestoSQL) is a great distributed query engine. It lets you use SQL to query data in Parquet files.

Parquet files store their file metadata in a footer at the end of the file. The footer is written only when the Parquet file is closed.
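
To make the footer layout concrete, here is a minimal sketch of a check for whether a file ends with a plausible Parquet trailer. It relies only on the format's published layout for plain (unencrypted-footer) files: the last 8 bytes are a 4-byte little-endian footer length followed by the magic bytes PAR1. The function name has_parquet_footer is my own, and the sketch reads a local file rather than HDFS; it does not validate the footer contents.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def has_parquet_footer(path):
    """Return True if the file ends with a plausible Parquet trailer."""
    size = os.path.getsize(path)
    # Smallest possible layout: 4-byte leading magic + 4-byte footer length
    # + 4-byte trailing magic. A zero-length file fails here.
    if size < 12:
        return False
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        trailer = f.read(8)
    footer_len = struct.unpack("<I", trailer[:4])[0]
    # The footer must fit between the leading magic and the 8-byte trailer.
    return trailer[4:] == PARQUET_MAGIC and footer_len <= size - 12
```

A file that is still being written, and therefore has no trailer yet, returns False, which is the situation a reader hits when it goes looking for the footer.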


I have a client with a legacy application that writes Parquet files to HDFS or S3 in a Hive partition structure. Parquet files written to S3 do not exist until they are closed; however, those written to HDFS show up as zero-length files until they are closed. This can be a problem for Trino, because a file that is still open has no Parquet footer yet.
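
As a rough local-filesystem analogy only (this is not the HDFS client, and the mechanism inside HDFS is different), the sketch below shows how a file held open by a writer can be visible in a directory listing with a size of zero until the writer closes it. It assumes the payload is smaller than Python's write buffer, so nothing reaches disk before close. The function name is hypothetical.

```python
import os


def size_while_open_and_after_close(path, payload):
    """Write payload through a buffered handle; report size before and after close."""
    f = open(path, "wb")                 # the writer holds the file open
    f.write(payload)                     # a small payload stays in the userspace buffer
    size_open = os.path.getsize(path)    # what another process would see right now
    f.close()                            # closing flushes the buffer to disk
    size_closed = os.path.getsize(path)
    return size_open, size_closed
```

Calling it with a 4-byte payload reports a size of 0 while the handle is open and 4 once it is closed.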


Trino can use the Hive Metastore to hold the information that links tables to data files. Hive Metastore partitions operate at the directory level, not the file level. This means that every file in a partition directory is treated as part of that partition, including a zero-length file that is still being written.
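
The directory-level behavior can be sketched as follows. This is an illustration under stated assumptions, not how Trino or the Hive Metastore is implemented: a local directory stands in for HDFS, the partition keys dt and hr in the usage note are hypothetical, and both function names are my own.

```python
import os


def partition_values(partition_dir):
    """Parse hive-style path segments such as 'dt=2021-06-10/hr=05' into a dict."""
    values = {}
    for segment in os.path.normpath(partition_dir).split(os.sep):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


def files_in_partition(partition_dir):
    """Return (name, size) for every file in the directory.

    The partition is the whole directory, so a zero-length file
    is listed alongside the completed ones.
    """
    entries = []
    for name in sorted(os.listdir(partition_dir)):
        full = os.path.join(partition_dir, name)
        if os.path.isfile(full):
            entries.append((name, os.path.getsize(full)))
    return entries
```

For a path like events/dt=2021-06-10/hr=05, partition_values yields the dt and hr values, and files_in_partition returns every file under that directory regardless of size. Nothing at this level distinguishes a finished file from one a writer still has open.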


In later posts, we'll look at Apache Iceberg as an alternative to Hive Metastore to avoid this issue.


In the next post, we'll set up a test environment to show this happening.