feat: add Parquet output format for layerstats #1546
Full logs: https://github.com/onthegomap/planetiler/actions/runs/24568351799

Hi @msbarry, could you please review the PR when you have time? Thanks.

I think we should just replace TSV with Parquet, not sure how many people are actually using layerstats as-is.

Sure, I can remove the existing TSV output. Let me know if anything else needs changes.
org.apache.hadoop.fs.Path hadoopPath = new org.apache.hadoop.fs.Path(output.toString());
this.writer = ExampleParquetWriter.builder(hadoopPath)
  .withType(SCHEMA)
  .withCompressionCodec(CompressionCodecName.SNAPPY)
@bdon any preference on encoding/row group size here? What do you usually use when converting?

Thanks for adding this! I'm OK removing TSV option, let's just compare how long it takes to write the parquet output over a larger area to make sure it's not much worse than TSV was. We can compare against main branch as well so don't need to block removing tsv from this PR.

Got it, I’ll remove the TSV in this PR.
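
For the write-time comparison suggested above, a minimal timing sketch might look like the following. It is not Planetiler code: `LayerStatsWriteTiming`, `RowWriter`, `tsvWriter`, and `timeWrite` are hypothetical names, the columns are illustrative rather than the real layerstats schema, and only a TSV baseline is implemented. A Parquet-backed `RowWriter` would need to be plugged in to get the other half of the comparison.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LayerStatsWriteTiming {

  /** Hypothetical stand-in for whichever layerstats writer is being measured. */
  interface RowWriter extends AutoCloseable {
    void write(int z, int x, int y, String layer, long layerBytes) throws IOException;

    @Override
    void close() throws IOException;
  }

  /** TSV baseline; the column set here is illustrative only. */
  static RowWriter tsvWriter(Path output) throws IOException {
    BufferedWriter out = Files.newBufferedWriter(output);
    out.write("z\tx\ty\tlayer\tlayer_bytes\n");
    return new RowWriter() {
      @Override
      public void write(int z, int x, int y, String layer, long layerBytes) throws IOException {
        out.write(z + "\t" + x + "\t" + y + "\t" + layer + "\t" + layerBytes + "\n");
      }

      @Override
      public void close() throws IOException {
        out.close();
      }
    };
  }

  /** Writes {@code rows} synthetic rows, closes the writer, and returns elapsed nanoseconds. */
  static long timeWrite(RowWriter writer, int rows) throws IOException {
    long start = System.nanoTime();
    try (writer) {
      for (int i = 0; i < rows; i++) {
        writer.write(14, i % 16384, i / 16384, "building", 100 + i % 50);
      }
    }
    // Closing is included in the timing so buffered output is flushed before the clock stops.
    return System.nanoTime() - start;
  }

  public static void main(String[] args) throws IOException {
    Path tsv = Files.createTempFile("layerstats", ".tsv");
    long nanos = timeWrite(tsvWriter(tsv), 1_000_000);
    System.out.printf("tsv: %d ms, %d bytes%n", nanos / 1_000_000, Files.size(tsv));
    Files.delete(tsv);
  }
}
```

Synthetic rows only give a rough signal; timing a real run over a larger area on this branch and on main, as proposed, is still the meaningful comparison.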
Done. Please review |
…/PlanetilerConfig.java Co-authored-by: Michael Barry <msbarry@users.noreply.github.com>



Fixes: #1045
This PR adds a --layerstats-format=parquet flag so users can write layerstats as Parquet instead of TSV. Parquet output is more compact than TSV. TSV remains the default, so nothing breaks for existing users.