@wankunde wankunde commented Sep 3, 2025

What changes were proposed in this pull request?

For large input rows, a stripe may grow excessively large, requiring more memory to both read and write a single stripe. We can check the tree writer's size in bytes and flush the stripe even when the input row count is less than 5000.

Stripe without this change:

Stripes:
  Stripe: offset: 3 data: 347494188 rows: 5120 tail: 244 index: 2304
    Stream: column 0 section ROW_INDEX start: 3 length 12
    Stream: column 1 section ROW_INDEX start: 15 length 110
    Stream: column 2 section ROW_INDEX start: 125 length 893
    Stream: column 3 section ROW_INDEX start: 1018 length 31
    Stream: column 4 section ROW_INDEX start: 1049 length 65
    Stream: column 5 section ROW_INDEX start: 1114 length 923
    Stream: column 6 section ROW_INDEX start: 2037 length 25
    Stream: column 7 section ROW_INDEX start: 2062 length 155
    Stream: column 8 section ROW_INDEX start: 2217 length 28
    Stream: column 9 section ROW_INDEX start: 2245 length 31
    Stream: column 10 section ROW_INDEX start: 2276 length 31
    Stream: column 1 section DATA start: 2307 length 81853
    Stream: column 1 section LENGTH start: 84160 length 2191
    Stream: column 2 section DATA start: 86351 length 345862763
    Stream: column 2 section LENGTH start: 345949114 length 13736
    Stream: column 3 section DATA start: 345962850 length 22
    Stream: column 3 section LENGTH start: 345962872 length 6
    Stream: column 3 section DICTIONARY_DATA start: 345962878 length 5
    Stream: column 4 section PRESENT start: 345962883 length 200
    Stream: column 4 section DATA start: 345963083 length 6322
    Stream: column 4 section LENGTH start: 345969405 length 495
    Stream: column 4 section DICTIONARY_DATA start: 345969900 length 2919
    Stream: column 5 section DATA start: 345972819 length 1507883
    Stream: column 5 section LENGTH start: 347480702 length 7346
    Stream: column 6 section DATA start: 347488048 length 22
    Stream: column 6 section LENGTH start: 347488070 length 6
    Stream: column 6 section DICTIONARY_DATA start: 347488076 length 0
    Stream: column 7 section DATA start: 347488076 length 5795
    Stream: column 7 section LENGTH start: 347493871 length 301
    Stream: column 7 section DICTIONARY_DATA start: 347494172 length 2187
    Stream: column 8 section DATA start: 347496359 length 22
    Stream: column 8 section LENGTH start: 347496381 length 6
    Stream: column 8 section DICTIONARY_DATA start: 347496387 length 4
    Stream: column 9 section DATA start: 347496391 length 58
    Stream: column 9 section LENGTH start: 347496449 length 6
    Stream: column 9 section DICTIONARY_DATA start: 347496455 length 7
    Stream: column 10 section DATA start: 347496462 length 22
    Stream: column 10 section LENGTH start: 347496484 length 6
    Stream: column 10 section DICTIONARY_DATA start: 347496490 length 5
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
    Encoding column 2: DIRECT_V2
    Encoding column 3: DICTIONARY_V2[1]
    Encoding column 4: DICTIONARY_V2[661]
    Encoding column 5: DIRECT_V2
    Encoding column 6: DICTIONARY_V2[1]
    Encoding column 7: DICTIONARY_V2[682]
    Encoding column 8: DICTIONARY_V2[1]
    Encoding column 9: DICTIONARY_V2[2]
    Encoding column 10: DICTIONARY_V2[1]
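The flush condition described above can be sketched as follows. This is a minimal illustration, not the actual patch; the constant names and the 128 MB threshold are assumptions drawn from the discussion below:

```java
// Hypothetical sketch of the proposed check: flush the stripe once the
// tree writer's in-memory size passes a byte threshold, even if fewer than
// ROWS_PER_CHECK (5000) rows have been buffered since the last check.
final class StripeSizeCheck {
  static final int ROWS_PER_CHECK = 5000;            // existing row-count trigger
  static final long STRIPE_SIZE_CHECK = 128L << 20;  // assumed byte trigger (128 MB)

  // Returns true when the writer should consider flushing the current stripe.
  // A non-positive byte threshold disables the size-based check.
  static boolean shouldCheck(int rowsSinceCheck, long estimatedBytes) {
    return rowsSinceCheck >= ROWS_PER_CHECK
        || (STRIPE_SIZE_CHECK > 0 && estimatedBytes >= STRIPE_SIZE_CHECK);
  }

  public static void main(String[] args) {
    // The ~347 MB, 5120-row stripe in the dump above trips the byte check
    // long before the row-count check would.
    System.out.println(shouldCheck(1024, 347_494_188L)); // true
    // Small rows: neither trigger fires yet.
    System.out.println(shouldCheck(1024, 64L << 20));    // false
  }
}
```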

Why are the changes needed?

To reduce memory usage when reading and writing large stripes.

How was this patch tested?

Local test

Stripe with this change:

  Stripe: offset: 3 data: 69573620 rows: 1024 tail: 227 index: 2245
    Stream: column 0 section ROW_INDEX start: 3 length 12
    Stream: column 1 section ROW_INDEX start: 15 length 111
    Stream: column 2 section ROW_INDEX start: 126 length 914
    Stream: column 3 section ROW_INDEX start: 1040 length 30
    Stream: column 4 section ROW_INDEX start: 1070 length 62
    Stream: column 5 section ROW_INDEX start: 1132 length 848
    Stream: column 6 section ROW_INDEX start: 1980 length 25
    Stream: column 7 section ROW_INDEX start: 2005 length 155
    Stream: column 8 section ROW_INDEX start: 2160 length 28
    Stream: column 9 section ROW_INDEX start: 2188 length 30
    Stream: column 10 section ROW_INDEX start: 2218 length 30
    Stream: column 1 section DATA start: 2248 length 15899
    Stream: column 1 section LENGTH start: 18147 length 478
    Stream: column 2 section DATA start: 18625 length 69245402
    Stream: column 2 section LENGTH start: 69264027 length 2795
    Stream: column 3 section DATA start: 69266822 length 11
    Stream: column 3 section LENGTH start: 69266833 length 6
    Stream: column 3 section DICTIONARY_DATA start: 69266839 length 5
    Stream: column 4 section PRESENT start: 69266844 length 55
    Stream: column 4 section DATA start: 69266899 length 1269
    Stream: column 4 section LENGTH start: 69268168 length 231
    Stream: column 4 section DICTIONARY_DATA start: 69268399 length 1261
    Stream: column 5 section DATA start: 69269660 length 302251
    Stream: column 5 section LENGTH start: 69571911 length 1548
    Stream: column 6 section DATA start: 69573459 length 11
    Stream: column 6 section LENGTH start: 69573470 length 6
    Stream: column 6 section DICTIONARY_DATA start: 69573476 length 0
    Stream: column 7 section DATA start: 69573476 length 1129
    Stream: column 7 section LENGTH start: 69574605 length 168
    Stream: column 7 section DICTIONARY_DATA start: 69574773 length 1030
    Stream: column 8 section DATA start: 69575803 length 11
    Stream: column 8 section LENGTH start: 69575814 length 6
    Stream: column 8 section DICTIONARY_DATA start: 69575820 length 4
    Stream: column 9 section DATA start: 69575824 length 11
    Stream: column 9 section LENGTH start: 69575835 length 6
    Stream: column 9 section DICTIONARY_DATA start: 69575841 length 5
    Stream: column 10 section DATA start: 69575846 length 11
    Stream: column 10 section LENGTH start: 69575857 length 6
    Stream: column 10 section DICTIONARY_DATA start: 69575863 length 5
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
    Encoding column 2: DIRECT_V2
    Encoding column 3: DICTIONARY_V2[1]
    Encoding column 4: DICTIONARY_V2[266]
    Encoding column 5: DIRECT_V2
    Encoding column 6: DICTIONARY_V2[1]
    Encoding column 7: DICTIONARY_V2[297]
    Encoding column 8: DICTIONARY_V2[1]
    Encoding column 9: DICTIONARY_V2[1]
    Encoding column 10: DICTIONARY_V2[1]

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the JAVA label Sep 3, 2025
wankunde commented Sep 3, 2025

Hi @dongjoon-hyun, could you help review this PR? Thanks.

@@ -325,9 +327,9 @@ public boolean checkMemory(double newScale) throws IOException {
  }

  private boolean checkMemory() throws IOException {
    if (rowsSinceCheck >= ROWS_PER_CHECK) {
      long size = treeWriter.estimateMemory();
Contributor:
If estimateMemory is called for each batch write, will the performance of the write be degraded?

@wankunde wankunde Sep 4, 2025
  • If rowsSinceCheck >= ROWS_PER_CHECK, estimateMemory is called exactly as often as before.
  • If rowsSinceCheck < ROWS_PER_CHECK, estimateMemory is called ROWS_PER_CHECK / VectorizedRowBatch.size times, i.e. 5000 / 1024 = 4 times by default; I think this is a much smaller overhead in the overall computation.
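The call-frequency arithmetic above can be made concrete with a small helper. This is an illustrative calculation only, assuming the default ROWS_PER_CHECK of 5000 and the default VectorizedRowBatch size of 1024:

```java
// Back-of-the-envelope count of extra estimateMemory() calls: with the byte
// check enabled, estimateMemory() can run once per written batch within a
// ROWS_PER_CHECK window, instead of once per window.
final class EstimateCallCount {
  // Extra estimateMemory() calls per ROWS_PER_CHECK window of rows
  // (integer division, matching the 5000 / 1024 = 4 estimate above).
  static int extraCallsPerWindow(int rowsPerCheck, int batchSize) {
    return rowsPerCheck / batchSize;
  }

  public static void main(String[] args) {
    System.out.println(extraCallsPerWindow(5000, 1024)); // 4
  }
}
```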

@@ -118,6 +118,11 @@ public enum OrcConf {
        "If the number of distinct keys in a dictionary is greater than this\n" +
        "fraction of the total number of non-null rows, turn off \n" +
        "dictionary encoding. Use 1 to always use dictionary encoding."),
    DICTIONARY_MAX_SIZE_IN_BYTES("orc.dictionary.maxSizeInBytes",
        "hive.exec.orc.dictionary.maxSizeInBytes",
        16 * 1024 * 1024,
Member:
May I ask why this value is the default, @wankunde?

Author:

dictionary.size = 5120, row size = 5120, dictionary SizeInBytes = 360448, rows SizeInBytes = 32768
dictionary.size = 5120, row size = 5120, dictionary SizeInBytes = 1452277760, rows SizeInBytes = 32768
dictionary.size = 1, row size = 5120, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768
dictionary.size = 661, row size = 5012, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768
dictionary.size = 5120, row size = 5120, dictionary SizeInBytes = 10223616, rows SizeInBytes = 32768
dictionary.size = 1, row size = 5120, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768
dictionary.size = 682, row size = 5120, dictionary SizeInBytes = 147456, rows SizeInBytes = 32768
dictionary.size = 1, row size = 5120, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768
dictionary.size = 2, row size = 5120, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768
dictionary.size = 1, row size = 5120, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768

The dictionary size in bytes is usually less than 10 MB.
If the dictionary size (> 16 MB) divided by the stripe size (< 128 MB) exceeds 12.5%, I think the dictionary is large enough to justify disabling dictionary encoding.
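A byte-cap fallback like the one argued for above could look as follows. This is a sketch under stated assumptions (the constant name mirrors the new config, the decision helper is hypothetical), not the patch itself:

```java
// Sketch: fall back to direct encoding once the dictionary's in-memory
// footprint crosses a byte cap. The 16 MB default matches the proposed
// orc.dictionary.maxSizeInBytes value; useDictionary() is illustrative.
final class DictionarySizeCheck {
  static final long DICTIONARY_MAX_SIZE_IN_BYTES = 16L << 20; // 16 MB

  // A non-positive cap disables the byte check entirely.
  static boolean useDictionary(long dictionarySizeInBytes) {
    return DICTIONARY_MAX_SIZE_IN_BYTES <= 0
        || dictionarySizeInBytes < DICTIONARY_MAX_SIZE_IN_BYTES;
  }

  public static void main(String[] args) {
    System.out.println(useDictionary(360_448L));       // typical case from the log: true
    System.out.println(useDictionary(1_452_277_760L)); // pathological 1.4 GB dictionary: false
  }
}
```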

Contributor:
The default dictionary max memory in Presto and Trino is also 16 MB:

https://github.com/prestodb/presto/blob/master/presto-orc/src/main/java/com/facebook/presto/orc/OrcWriterOptions.java#L37

public static final DataSize DEFAULT_DICTIONARY_MAX_MEMORY = new DataSize(16, MEGABYTE);

https://github.com/trinodb/trino/blob/master/lib/trino-orc/src/main/java/io/trino/orc/OrcWriterOptions.java#L52

private static final DataSize DEFAULT_DICTIONARY_MAX_MEMORY = DataSize.of(16, MEGABYTE);

@@ -182,6 +187,9 @@ public enum OrcConf {
        "added to all of the writers. Valid range is [1,10000] and is primarily meant for" +
        "testing. Setting this too low may negatively affect performance."
        + " Use orc.stripe.row.count instead if the value larger than orc.stripe.row.count."),
    STRIPE_SIZE_CHECK("orc.stripe.size.check", "hive.exec.orc.default.stripe.size.check",
        128L * 1024 * 1024,
Member:
ditto.

Author:

The default stripe size is 64 MB, so I think 128 MB is large enough to flush the stripe.

  STRIPE_SIZE("orc.stripe.size", "hive.exec.orc.default.stripe.size",
      64L * 1024 * 1024,
      "Define the default ORC stripe size, in bytes."),

Tested with our production jobs: our Spark jobs can run with 6 GB executors when STRIPE_SIZE_CHECK = 128 MB, but need 8 GB executors when STRIPE_SIZE_CHECK = 256 MB.

Contributor:

Can you consider the configuration according to the scale of stripe size?

orc.stripe.size*ratio
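The reviewer's suggestion amounts to deriving the byte threshold from the configured stripe size and a ratio rather than an absolute value. A minimal sketch, with illustrative names rather than the final config keys:

```java
// Sketch: compute the size-check threshold as stripeSize * ratio, so the
// check scales with orc.stripe.size instead of being a fixed byte count.
final class StripeCheckThreshold {
  // A non-positive ratio disables the size-based flush check entirely.
  static long checkThresholdBytes(long stripeSizeBytes, double ratio) {
    return ratio <= 0 ? 0L : (long) (stripeSizeBytes * ratio);
  }

  public static void main(String[] args) {
    long stripeSize = 64L << 20; // orc.stripe.size default, 64 MB
    System.out.println(checkThresholdBytes(stripeSize, 2.0)); // 134217728 (128 MB)
    System.out.println(checkThresholdBytes(stripeSize, 0.0)); // 0, i.e. disabled
  }
}
```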

wankunde commented Sep 8, 2025

Hi, @cxzl25 @dongjoon-hyun

  • Allow using 0 to disable this new check.
  • Added a local benchmark, 733834d

[benchmark results image]


Author:
@cxzl25 Thanks for your review. Changed STRIPE_SIZE_CHECK to OrcConf.STRIPE_SIZE_CHECK_RATIO * stripeSize.

@wankunde wankunde force-pushed the force_spill_stripe branch 2 times, most recently from 82fc656 to 64b2998 Compare September 15, 2025 03:04
Author:
Hi @dongjoon-hyun, do you have any thoughts on this PR?
