feat: add Parquet output format for layerstats #1546

Open
1Ninad wants to merge 7 commits into onthegomap:main from 1Ninad:write-parquet-layerstats

1Ninad commented Apr 14, 2026

Contributor

Fixes: #1045

This PR adds a --layerstats-format=parquet flag so users can write layerstats as Parquet instead of TSV. Parquet output is more compact than TSV. TSV remains the default, so nothing breaks for existing users.
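
For illustration only, here is a minimal sketch of how a format flag with a TSV default could be parsed. The `LayerStatsFormat` enum and its `parse` method are hypothetical names, not taken from this PR's diff, and the actual change may wire the option differently.

```java
import java.util.Locale;

/** Hypothetical enum for illustration; not the type used in the PR. */
enum LayerStatsFormat {
  TSV, PARQUET;

  /** Parses a flag value, falling back to TSV when the flag is absent or blank. */
  static LayerStatsFormat parse(String value) {
    if (value == null || value.isBlank()) {
      return TSV; // default keeps existing behavior
    }
    return switch (value.trim().toLowerCase(Locale.ROOT)) {
      case "tsv" -> TSV;
      case "parquet" -> PARQUET;
      default -> throw new IllegalArgumentException("Unknown layerstats format: " + value);
    };
  }
}

class LayerStatsFormatDemo {
  public static void main(String[] args) {
    System.out.println(LayerStatsFormat.parse(null));      // flag omitted
    System.out.println(LayerStatsFormat.parse("parquet")); // flag set explicitly
  }
}
```

Rejecting unknown values instead of silently falling back means a typo in the flag fails fast rather than producing output in an unexpected format.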

github-actions Bot commented Apr 14, 2026

This Branch fe217c6 Base ae35950
0:01:07 DEB [archive] - Tile stats:
0:01:07 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (161k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:88k)
2. 9/154/190 (148k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:86k)
3. 10/308/381 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
4. 10/308/380 (137k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
5. 14/4941/6092 (120k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:69k)
6. 14/4941/6093 (118k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (poi:62k)
7. 14/4946/6113 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.48389/-71.31226 (building:59k)
8. 14/4946/6112 (109k) https://onthegomap.github.io/planetiler-demo/#14.5/41.50035/-71.31226 (building:67k)
9. 14/4940/6092 (101k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
10. 14/4942/6091 (100k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
0:01:07 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  151   336   409   544   802   287   382   490   670  1.6k    2k  6.8k  6.2k  5.6k  4.4k  6.8k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   487   487   487   733   822  1.1k  1.8k  3.3k  6.2k  3.8k    2k   935    1k  6.2k
            landuse    0     0     0     0   549   695  1.6k    7k   18k   44k   58k   49k   38k   19k   12k   58k
     transportation    0     0     0     0   406    1k  1.5k  4.6k  6.4k   21k   15k   17k   67k   41k   38k   67k
           waterway    0     0     0     0   112   119     0     0     0  3.3k  2.4k    2k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.1k    4k  9.6k   18k   13k  8.2k  3.7k  3.4k  4.4k   18k
transportation_name    0     0     0     0     0     0   293   360  1.1k  1.9k  5.8k  4.8k  3.9k  3.5k   19k   19k
          landcover    0     0     0     0     0     0     0  9.7k   29k   86k   72k   82k   53k   30k   25k   86k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.4k  2.8k  1.4k  1.4k   869  4.4k
         water_name    0     0     0     0     0     0     0     0     0   528   503   475   494  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   289   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k    2k    3k  3.3k  2.8k  3.3k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   589   586   88k   88k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.7k  6.3k   21k   41k   85k  202k  185k  135k  114k  122k  254k  254k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k    5k   14k   29k   61k  148k  138k   99k   84k   87k  161k  161k
0:01:07 DEB [archive] -    Max tile: 254k (gzipped: 161k)
0:01:07 DEB [archive] -    Avg tile: 5.5k (gzipped: 4.1k) using weighted average based on OSM traffic
0:01:07 DEB [archive] -     # tiles: 4,115,030
0:01:07 DEB [archive] -  # features: 5,758,167
0:01:07 INF [archive] - Finished in 19s cpu:1m10s avg:3.7
0:01:07 INF [archive] -   read    1x(3% 0.6s wait:17s done:1s)
0:01:07 INF [archive] -   encode  4x(57% 11s wait:2s)
0:01:07 INF [archive] -   write   1x(21% 4s wait:13s)
0:01:07 INF [archive] - Finished in 1m8s cpu:3m29s gc:1s avg:3.1
0:01:07 INF [archive] - FINISHED!
0:01:07 INF [archive] - 
0:01:07 INF [archive] - ----------------------------------------
0:01:07 INF [archive] - data errors:
0:01:07 INF [archive] - 	render_snap_fix_input	16,794
0:01:07 INF [archive] - 	osm_multipolygon_missing_way	396
0:01:07 INF [archive] - 	osm_boundary_missing_way	55
0:01:07 INF [archive] - 	merge_snap_fix_input	9
0:01:07 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	7
0:01:07 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:07 INF [archive] - 	omt_fix_water_before_ne_intersect	2
0:01:07 INF [archive] - 	render_snap_fix_input2	1
0:01:07 INF [archive] - 	omt_park_area_osm_invalid_multipolygon_empty_after_fix	1
0:01:07 INF [archive] - ----------------------------------------
0:01:07 INF [archive] - 	overall          1m8s cpu:3m29s gc:1s avg:3.1
0:01:07 INF [archive] - 	lake_centerlines 2s cpu:6s avg:2.4
0:01:07 INF [archive] - 	  read     1x(19% 0.5s done:2s)
0:01:07 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:07 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:07 INF [archive] - 	water_polygons   15s cpu:39s avg:2.6
0:01:07 INF [archive] - 	  read     1x(43% 6s done:7s)
0:01:07 INF [archive] - 	  process  4x(22% 3s wait:4s done:6s)
0:01:07 INF [archive] - 	  write    1x(2% 0.3s wait:10s done:5s)
0:01:07 INF [archive] - 	natural_earth    11s cpu:18s avg:1.6
0:01:07 INF [archive] - 	  read     1x(57% 6s done:5s)
0:01:07 INF [archive] - 	  process  4x(8% 0.9s wait:6s done:5s)
0:01:07 INF [archive] - 	  write    1x(0% 0s wait:6s done:5s)
0:01:07 INF [archive] - 	osm_pass1        2s cpu:7s avg:3.3
0:01:07 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:07 INF [archive] - 	  parse    4x(35% 0.7s)
0:01:07 INF [archive] - 	  process  1x(69% 1s)
0:01:07 INF [archive] - 	osm_pass2        16s cpu:1m3s avg:4
0:01:07 INF [archive] - 	  read     1x(0% 0s wait:10s done:6s)
0:01:07 INF [archive] - 	  process  4x(70% 11s)
0:01:07 INF [archive] - 	  write    1x(2% 0.3s wait:16s)
0:01:07 INF [archive] - 	ne_lakes         0s cpu:0s avg:5.7
0:01:07 INF [archive] - 	boundaries       0s cpu:0s avg:0
0:01:07 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:07 INF [archive] - 	sort             1s cpu:4s avg:2.8
0:01:07 INF [archive] - 	  worker  1x(42% 0.6s)
0:01:07 INF [archive] - 	archive          19s cpu:1m10s avg:3.7
0:01:07 INF [archive] - 	  read    1x(3% 0.6s wait:17s done:1s)
0:01:07 INF [archive] - 	  encode  4x(57% 11s wait:2s)
0:01:07 INF [archive] - 	  write   1x(21% 4s wait:13s)
0:01:07 INF [archive] - ----------------------------------------
0:01:07 INF [archive] - 	archive	108MB
0:01:07 INF [archive] - 	features	296MB
-rw-r--r-- 1 runner runner 89M Apr 17 13:48 run.jar
0:01:04 DEB [archive] - Tile stats:
0:01:04 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (161k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:88k)
2. 9/154/190 (148k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:86k)
3. 10/308/381 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
4. 10/308/380 (137k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
5. 14/4941/6092 (120k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:69k)
6. 14/4941/6093 (118k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (poi:62k)
7. 14/4946/6113 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.48389/-71.31226 (building:59k)
8. 14/4946/6112 (109k) https://onthegomap.github.io/planetiler-demo/#14.5/41.50035/-71.31226 (building:67k)
9. 14/4940/6092 (101k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
10. 14/4942/6091 (100k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
0:01:04 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  151   336   409   544   802   287   382   490   670  1.6k    2k  6.8k  6.2k  5.6k  4.4k  6.8k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   487   487   487   733   822  1.1k  1.8k  3.3k  6.2k  3.8k    2k   935    1k  6.2k
            landuse    0     0     0     0   549   695  1.6k    7k   18k   44k   58k   49k   38k   19k   12k   58k
     transportation    0     0     0     0   406    1k  1.5k  4.6k  6.4k   21k   15k   17k   67k   41k   38k   67k
           waterway    0     0     0     0   112   119     0     0     0  3.3k  2.4k    2k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.1k    4k  9.6k   18k   13k  8.2k  3.7k  3.4k  4.4k   18k
transportation_name    0     0     0     0     0     0   293   360  1.1k  1.9k  5.8k  4.8k  3.9k  3.5k   19k   19k
          landcover    0     0     0     0     0     0     0  9.7k   29k   86k   72k   82k   53k   30k   25k   86k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.4k  2.8k  1.4k  1.4k   869  4.4k
         water_name    0     0     0     0     0     0     0     0     0   528   503   475   494  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   289   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k    2k    3k  3.3k  2.8k  3.3k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   589   586   88k   88k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.7k  6.3k   21k   41k   85k  202k  185k  135k  114k  122k  254k  254k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k    5k   14k   29k   61k  148k  138k   99k   84k   87k  161k  161k
0:01:04 DEB [archive] -    Max tile: 254k (gzipped: 161k)
0:01:04 DEB [archive] -    Avg tile: 5.5k (gzipped: 4.1k) using weighted average based on OSM traffic
0:01:04 DEB [archive] -     # tiles: 4,115,030
0:01:04 DEB [archive] -  # features: 5,758,167
0:01:04 INF [archive] - Finished in 19s cpu:1m10s avg:3.7
0:01:04 INF [archive] -   read    1x(3% 0.6s wait:17s done:1s)
0:01:04 INF [archive] -   encode  4x(55% 10s wait:2s)
0:01:04 INF [archive] -   write   1x(21% 4s wait:13s)
0:01:04 INF [archive] - Finished in 1m4s cpu:3m29s gc:1s avg:3.2
0:01:04 INF [archive] - FINISHED!
0:01:04 INF [archive] - 
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - data errors:
0:01:04 INF [archive] - 	render_snap_fix_input	16,794
0:01:04 INF [archive] - 	osm_multipolygon_missing_way	396
0:01:04 INF [archive] - 	osm_boundary_missing_way	55
0:01:04 INF [archive] - 	merge_snap_fix_input	9
0:01:04 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	7
0:01:04 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:04 INF [archive] - 	omt_fix_water_before_ne_intersect	2
0:01:04 INF [archive] - 	render_snap_fix_input2	1
0:01:04 INF [archive] - 	omt_park_area_osm_invalid_multipolygon_empty_after_fix	1
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - 	overall          1m4s cpu:3m29s gc:1s avg:3.2
0:01:04 INF [archive] - 	lake_centerlines 2s cpu:5s avg:2.4
0:01:04 INF [archive] - 	  read     1x(21% 0.5s done:2s)
0:01:04 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:04 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:04 INF [archive] - 	water_polygons   15s cpu:39s avg:2.6
0:01:04 INF [archive] - 	  read     1x(44% 7s done:7s)
0:01:04 INF [archive] - 	  process  4x(23% 3s wait:4s done:6s)
0:01:04 INF [archive] - 	  write    1x(2% 0.3s wait:9s done:5s)
0:01:04 INF [archive] - 	natural_earth    7s cpu:13s avg:2
0:01:04 INF [archive] - 	  read     1x(95% 6s)
0:01:04 INF [archive] - 	  process  4x(13% 0.9s wait:6s)
0:01:04 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:04 INF [archive] - 	osm_pass1        2s cpu:7s avg:3.2
0:01:04 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:04 INF [archive] - 	  parse    4x(34% 0.7s)
0:01:04 INF [archive] - 	  process  1x(68% 1s)
0:01:04 INF [archive] - 	osm_pass2        17s cpu:1m9s avg:4
0:01:04 INF [archive] - 	  read     1x(0% 0s wait:11s done:7s)
0:01:04 INF [archive] - 	  process  4x(67% 12s)
0:01:04 INF [archive] - 	  write    1x(2% 0.3s wait:17s)
0:01:04 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:04 INF [archive] - 	boundaries       0s cpu:0s avg:13.2
0:01:04 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:04 INF [archive] - 	sort             1s cpu:4s avg:2.7
0:01:04 INF [archive] - 	  worker  1x(47% 0.6s)
0:01:04 INF [archive] - 	archive          19s cpu:1m10s avg:3.7
0:01:04 INF [archive] - 	  read    1x(3% 0.6s wait:17s done:1s)
0:01:04 INF [archive] - 	  encode  4x(55% 10s wait:2s)
0:01:04 INF [archive] - 	  write   1x(21% 4s wait:13s)
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - 	archive	108MB
0:01:04 INF [archive] - 	features	296MB
-rw-r--r-- 1 runner runner 89M Apr 17 13:50 run.jar

Full logs: https://github.com/onthegomap/planetiler/actions/runs/24568351799

1Ninad commented Apr 14, 2026

Contributor Author

Hi @msbarry, could you please review the PR when you have time? Thanks.

bdon commented Apr 15, 2026

Contributor

I think we should just replace TSV with Parquet; I'm not sure how many people are actually using layerstats as-is.

1Ninad commented Apr 15, 2026

Contributor Author

Sure, I can remove the existing TSV output. Let me know if anything else needs changes.

org.apache.hadoop.fs.Path hadoopPath = new org.apache.hadoop.fs.Path(output.toString());
this.writer = ExampleParquetWriter.builder(hadoopPath)
  .withType(SCHEMA)
  .withCompressionCodec(CompressionCodecName.SNAPPY)

Contributor

@bdon any preference on encoding/row group size here? What do you usually use when converting?

msbarry commented Apr 15, 2026

Contributor

Thanks for adding this! I'm OK with removing the TSV option. Let's just compare how long it takes to write the Parquet output over a larger area to make sure it's not much worse than TSV was. We can compare against the main branch as well, so we don't need to block removing TSV from this PR.
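
One rough way to run that comparison outside the full pipeline is a small timing harness. The sketch below is an assumption-laden illustration: `WriteTiming`, `WriteTask`, and the placeholder row layout are hypothetical and do not reflect the real layerstats schema. It times a gzipped TSV write using only the JDK; the Parquet writer from this branch could be passed to `timeMillis` the same way.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

/** Hypothetical timing harness; names and row layout are placeholders. */
class WriteTiming {
  @FunctionalInterface
  interface WriteTask {
    void run() throws IOException;
  }

  /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
  static long timeMillis(WriteTask task) throws IOException {
    long start = System.nanoTime();
    task.run();
    return (System.nanoTime() - start) / 1_000_000;
  }

  /** Writes synthetic rows as gzipped TSV; the columns are made up for the example. */
  static void writeGzippedTsv(Path out, int rows) throws IOException {
    try (Writer w = new OutputStreamWriter(
        new GZIPOutputStream(Files.newOutputStream(out)), StandardCharsets.UTF_8)) {
      w.write("z\tx\ty\tlayer\tbytes\n");
      for (int i = 0; i < rows; i++) {
        w.write("14\t" + i + "\t" + i + "\tpoi\t" + (i % 1000) + "\n");
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path tsv = Files.createTempFile("layerstats", ".tsv.gz");
    long ms = timeMillis(() -> writeGzippedTsv(tsv, 1_000_000));
    System.out.println("tsv.gz: " + ms + " ms, " + Files.size(tsv) + " bytes");
    Files.delete(tsv);
    // A Parquet write task would be timed the same way: timeMillis(() -> ...).
  }
}
```

A single run like this is noisy, so repeating each task a few times and comparing medians would give a fairer picture. An end-to-end run over a larger extract, as suggested above, remains the more representative test.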

1Ninad commented Apr 15, 2026

Contributor Author

> Thanks for adding this! I'm OK with removing the TSV option. Let's just compare how long it takes to write the Parquet output over a larger area to make sure it's not much worse than TSV was. We can compare against the main branch as well, so we don't need to block removing TSV from this PR.

Got it, I'll remove the TSV output in this PR.

1Ninad commented Apr 15, 2026

Contributor Author

Done. Please review.

1Ninad requested a review from msbarry April 15, 2026 12:12
…/PlanetilerConfig.java

Co-authored-by: Michael Barry <msbarry@users.noreply.github.com>
Comment thread planetiler-core/src/main/java/com/onthegomap/planetiler/util/TileSizeStats.java Outdated
Comment thread planetiler-core/src/main/java/com/onthegomap/planetiler/util/TileSizeStats.java Outdated
Comment thread planetiler-core/src/test/java/com/onthegomap/planetiler/PlanetilerTests.java Outdated
Comment thread planetiler-core/src/test/java/com/onthegomap/planetiler/PlanetilerTests.java Outdated
1Ninad requested a review from msbarry April 17, 2026 14:51

Development

Successfully merging this pull request may close these issues.

[FEATURE] write Parquet layerstats directly

3 participants