ORC-1986: Trigger flush stripe for large input rows #2371
Conversation
Hi, @dongjoon-hyun could you help review this PR? Thanks
```diff
@@ -325,9 +327,9 @@ public boolean checkMemory(double newScale) throws IOException {
   }

   private boolean checkMemory() throws IOException {
     if (rowsSinceCheck >= ROWS_PER_CHECK) {
       long size = treeWriter.estimateMemory();
```
If `estimateMemory` is called for each batch write, will the performance of the write be degraded?
- If rowsSinceCheck >= ROWS_PER_CHECK, `estimateMemory` will be called the same as before.
- If rowsSinceCheck < ROWS_PER_CHECK, `estimateMemory` will be called once every ROWS_PER_CHECK / VectorizedRowBatch.size batches, which is 5000 / 1024 ≈ 4 batches by default. I think this is a much smaller overhead in the overall computation.
```diff
@@ -118,6 +118,11 @@ public enum OrcConf {
       "If the number of distinct keys in a dictionary is greater than this\n" +
       "fraction of the total number of non-null rows, turn off \n" +
       "dictionary encoding. Use 1 to always use dictionary encoding."),
   DICTIONARY_MAX_SIZE_IN_BYTES("orc.dictionary.maxSizeInBytes",
       "hive.exec.orc.dictionary.maxSizeInBytes",
       16 * 1024 * 1024,
```
May I ask why this value is the default, @wankunde ?
```
dictionary.size = 5120, row size = 5120, dictionary SizeInBytes = 360448, rows SizeInBytes = 32768
dictionary.size = 5120, row size = 5120, dictionary SizeInBytes = 1452277760, rows SizeInBytes = 32768
dictionary.size = 1, row size = 5120, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768
dictionary.size = 661, row size = 5012, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768
dictionary.size = 5120, row size = 5120, dictionary SizeInBytes = 10223616, rows SizeInBytes = 32768
dictionary.size = 1, row size = 5120, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768
dictionary.size = 682, row size = 5120, dictionary SizeInBytes = 147456, rows SizeInBytes = 32768
dictionary.size = 1, row size = 5120, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768
dictionary.size = 2, row size = 5120, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768
dictionary.size = 1, row size = 5120, dictionary SizeInBytes = 81920, rows SizeInBytes = 32768
```
The dictionary size in bytes is usually less than 10MB.
If dictionary size (> 16MB) / stripe size (< 128MB) > 12.5%, I think it is large enough to disable dictionary encoding.
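The ratio argument above can be checked with a small illustrative sketch (the class, field, and method names here are hypothetical, not actual ORC code): a 16 MiB dictionary cap against a 128 MiB stripe-size check is exactly 12.5%.

```java
// Illustrative arithmetic for the 12.5% argument above; not real ORC code.
public class DictionaryRatioSketch {
    static final long DICTIONARY_MAX_BYTES = 16L * 1024 * 1024;     // proposed default
    static final long STRIPE_SIZE_CHECK_BYTES = 128L * 1024 * 1024; // proposed default

    // Disable dictionary encoding once the dictionary outgrows the cap.
    static boolean shouldDisableDictionary(long dictionaryBytes) {
        return dictionaryBytes > DICTIONARY_MAX_BYTES;
    }

    public static void main(String[] args) {
        double ratio = (double) DICTIONARY_MAX_BYTES / STRIPE_SIZE_CHECK_BYTES;
        System.out.println(ratio);                                      // 0.125
        // A typical dictionary (< 10 MiB per the samples above) stays enabled.
        System.out.println(shouldDisableDictionary(10L * 1024 * 1024)); // false
    }
}
```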
Looking at the default dictionary max memory in Presto and Trino, it is also 16MB:

```java
public static final DataSize DEFAULT_DICTIONARY_MAX_MEMORY = new DataSize(16, MEGABYTE);
private static final DataSize DEFAULT_DICTIONARY_MAX_MEMORY = DataSize.of(16, MEGABYTE);
```
```diff
@@ -182,6 +187,9 @@ public enum OrcConf {
       "added to all of the writers. Valid range is [1,10000] and is primarily meant for" +
       "testing. Setting this too low may negatively affect performance."
       + " Use orc.stripe.row.count instead if the value larger than orc.stripe.row.count."),
   STRIPE_SIZE_CHECK("orc.stripe.size.check", "hive.exec.orc.default.stripe.size.check",
       128L * 1024 * 1024,
```
ditto.
The default stripe size is 64MB, so I think 128MB is large enough to flush this stripe:
```java
STRIPE_SIZE("orc.stripe.size", "hive.exec.orc.default.stripe.size",
    64L * 1024 * 1024,
    "Define the default ORC stripe size, in bytes."),
```
Tested with our production jobs: our Spark jobs could run with 6GB executors if STRIPE_SIZE_CHECK = 128MB, but need 8GB executors if STRIPE_SIZE_CHECK = 256MB.
Can you consider making the configuration scale with the stripe size, i.e. `orc.stripe.size` * ratio?
For large input rows, the stripe may become excessively large, requiring more memory for both reading and writing one stripe. We can check the tree writer size in bytes and flush the stripe even when the input row count is less than 5000.
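The proposed behaviour can be sketched as follows (the names here are assumptions for illustration, not the actual ORC writer code): a stripe is flushed either when the buffered row count reaches the usual limit or when the estimated in-memory size crosses the new `orc.stripe.size.check` threshold.

```java
// Minimal sketch of the proposed flush condition; illustrative names only.
public class StripeFlushSketch {
    static final long STRIPE_SIZE_CHECK = 128L * 1024 * 1024; // proposed default
    static final int STRIPE_ROW_LIMIT = 5000;                 // existing row limit

    // Flush on either trigger: row count or estimated memory size.
    static boolean shouldFlushStripe(int rowsInStripe, long estimatedBytes) {
        return rowsInStripe >= STRIPE_ROW_LIMIT || estimatedBytes >= STRIPE_SIZE_CHECK;
    }

    public static void main(String[] args) {
        // Large rows: only 2000 rows buffered but already 200 MiB in memory.
        System.out.println(shouldFlushStripe(2000, 200L * 1024 * 1024)); // true
        // Small rows: 2000 rows at 8 MiB total, keep buffering.
        System.out.println(shouldFlushStripe(2000, 8L * 1024 * 1024));   // false
    }
}
```

The size trigger is what lets large-row workloads flush early instead of accumulating an oversized stripe in memory.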
Co-authored-by: cxzl25 <[email protected]>
@cxzl25 Thanks for your review.
Hi, @dongjoon-hyun do you have any thoughts on this PR?
What changes were proposed in this pull request?
For large input rows, the stripe may become excessively large, requiring more memory for both reading and writing one stripe. We can check the tree writer size in bytes and flush the stripe even when the input row count is less than 5000.
Why are the changes needed?
To optimize memory usage.
How was this patch tested?
Local test
Stripe with this change:
Was this patch authored or co-authored using generative AI tooling?
No