atris (Contributor) commented Sep 5, 2025

Following up on Phase 1 (PR #19060), this adds the cost estimation
infrastructure needed for query optimization (RFC #18906).

What this does:
- Builds a tree of QueryPlanNode objects from QueryBuilder
- Estimates query costs by summing Lucene scorer costs across all segments
- Tracks CPU/memory/IO costs alongside document counts
- Preserves field names and query metadata (lost when converting to Lucene)
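
For readers who want a concrete picture of the plan tree, here is a minimal sketch of a node that carries a four-dimensional cost and keeps its field name. `SketchCost` and `SketchPlanNode` are hypothetical names invented for illustration; they are not the `QueryPlanNode` classes from this PR, and the simple component-wise sum is an assumption rather than the PR's actual combination rule.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical four-dimensional cost; the dimensions mirror those named in this PR. */
final class SketchCost {
    final long luceneDocs;
    final double cpu;
    final double memory;
    final double io;

    SketchCost(long luceneDocs, double cpu, double memory, double io) {
        this.luceneDocs = luceneDocs;
        this.cpu = cpu;
        this.memory = memory;
        this.io = io;
    }

    /** Component-wise sum (an assumption; a real estimator may combine per clause type). */
    SketchCost plus(SketchCost other) {
        return new SketchCost(luceneDocs + other.luceneDocs, cpu + other.cpu,
                memory + other.memory, io + other.io);
    }
}

/** Hypothetical plan node that retains the field name alongside its cost. */
final class SketchPlanNode {
    final String queryType;  // e.g. "term", "range", "bool"
    final String fieldName;  // null for compound nodes such as "bool"
    final SketchCost ownCost;
    final List<SketchPlanNode> children = new ArrayList<>();

    SketchPlanNode(String queryType, String fieldName, SketchCost ownCost) {
        this.queryType = queryType;
        this.fieldName = fieldName;
        this.ownCost = ownCost;
    }

    /** Appends a child and returns this node so calls can be chained. */
    SketchPlanNode add(SketchPlanNode child) {
        children.add(child);
        return this;
    }

    /** Own cost plus the total cost of every descendant. */
    SketchCost totalCost() {
        SketchCost total = ownCost;
        for (SketchPlanNode child : children) {
            total = total.plus(child.totalCost());
        }
        return total;
    }
}
```

A bool node with a term child on `status` and a range child on `timestamp` would report the summed cost at the root while each child still exposes its own `fieldName`.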

The main insight is using Lucene's existing scorer.cost() aggregated
properly across index leaves, then adding our own heuristics for
things Lucene doesn't track.
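
The aggregation step can be sketched roughly as follows. In Lucene, a per-segment estimate is typically available through `Weight.scorerSupplier(LeafReaderContext)` and `ScorerSupplier.cost()`; to stay self-contained, the sketch below replaces those with a plain function from a leaf to a `long`. `LeafCostSketch` and `sumLeafCosts` are hypothetical names, and the saturating addition is an assumption about how overflow might be avoided, not a description of this PR's code.

```java
import java.util.List;
import java.util.function.ToLongFunction;

/** Hypothetical helper: sums a per-leaf cost estimate across all index leaves. */
final class LeafCostSketch {

    private LeafCostSketch() {}

    /**
     * Sums leafCost over every leaf. Negative estimates are treated as 0
     * (for example, a segment with no scorer), and the total saturates at
     * Long.MAX_VALUE instead of overflowing.
     */
    static <L> long sumLeafCosts(List<L> leaves, ToLongFunction<L> leafCost) {
        long total = 0L;
        for (L leaf : leaves) {
            long cost = Math.max(0L, leafCost.applyAsLong(leaf));
            if (Long.MAX_VALUE - total < cost) {
                return Long.MAX_VALUE; // saturate rather than wrap around
            }
            total += cost;
        }
        return total;
    }
}
```

With real Lucene types, the `leaves` argument would presumably be `IndexReader.leaves()` and the function would return the supplier's cost, or 0 when the supplier is null.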

Notable implementation details:
- BooleanPlanNode caps must_not penalties so costs don't explode
- Percentage minimumShouldMatch now works (e.g. "50%")
- Feature flagged: -Dopensearch.experimental.feature.query_planner.enabled
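
To illustrate the first two points, here is a rough sketch of a bounded must_not penalty and of resolving a percentage-style minimumShouldMatch against the number of should clauses. Everything here is an assumption made for illustration: `BoolCostSketch`, both method names, the fraction-based cap, and the rounding and negative-value handling are not taken from `BooleanPlanNode` in this PR.

```java
/** Hypothetical helpers; not the BooleanPlanNode implementation from this PR. */
final class BoolCostSketch {

    private BoolCostSketch() {}

    /**
     * Bounds the penalty contributed by must_not clauses to a fraction of the
     * positive clauses' cost, so a must_not over a very large document set
     * cannot dominate the estimate. The fraction is an assumed tuning knob.
     */
    static long cappedMustNotPenalty(long mustNotCost, long positiveClauseCost,
                                     double maxPenaltyFraction) {
        long cap = (long) Math.floor(positiveClauseCost * maxPenaltyFraction);
        return Math.max(0L, Math.min(mustNotCost, cap));
    }

    /**
     * Resolves specs such as "2", "50%" or "-25%" to a clause count.
     * Percentages truncate toward zero; negative values mean "this many may
     * be missing". The result is clamped to [0, optionalClauses].
     */
    static int resolveMinimumShouldMatch(String spec, int optionalClauses) {
        String s = spec.trim();
        int result;
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1).trim());
            int calc = (int) ((long) optionalClauses * percent / 100);
            result = percent < 0 ? optionalClauses + calc : calc;
        } else {
            int n = Integer.parseInt(s);
            result = n < 0 ? optionalClauses + n : n;
        }
        return Math.max(0, Math.min(optionalClauses, result));
    }
}
```

Under these assumptions, "50%" with four should clauses resolves to 2, and a must_not cost of one million against a positive cost of 1000 with a 0.5 fraction is capped at 500.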

No user-visible changes yet - this is just the foundation for Phase 3
where we'll actually optimize queries based on these costs.

Signed-off-by: Atri Sharma [email protected]

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Introduces LogicalPlanBuilder to convert QueryBuilder to QueryPlanNode tree
with multi-dimensional cost tracking (lucene_docs, cpu, memory, io). Adds
CostEstimator leveraging Lucene's scorer costs across index leaves with
OpenSearch-specific heuristics. Implements volatile cost caching with
double-checked locking for thread safety. Supports Boolean, Term, Range,
and Match queries with QueryBuilder context preservation.
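
The caching pattern named above, a volatile field guarded by double-checked locking, generally looks like the sketch below. `CachedCostSketch` and `CountingCostSketch` are hypothetical classes written for illustration and are not the node classes from this commit.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical base class: computes a cost once and caches it thread-safely. */
abstract class CachedCostSketch {
    // volatile so a value published by one thread is visible to all others
    private volatile Long cachedCost; // null until first computed

    final long cost() {
        Long local = cachedCost;            // first check, without locking
        if (local == null) {
            synchronized (this) {
                local = cachedCost;         // second check, under the lock
                if (local == null) {
                    local = computeCost();
                    cachedCost = local;     // publish for later lock-free reads
                }
            }
        }
        return local;
    }

    /** Potentially expensive estimation; called at most once per instance. */
    protected abstract long computeCost();
}

/** Example subclass that counts how often the cost is actually computed. */
final class CountingCostSketch extends CachedCostSketch {
    final AtomicInteger computations = new AtomicInteger();
    private final long value;

    CountingCostSketch(long value) {
        this.value = value;
    }

    @Override
    protected long computeCost() {
        computations.incrementAndGet();
        return value;
    }
}
```

Reading the field into a local variable keeps the fast path to a single volatile read once the cost has been cached.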

Includes feature flag (opensearch.experimental.feature.query_planner.enabled)
for gradual rollout and QueryPlannerIntegration helper for search pipeline
integration. Bounded must_not penalty prevents cost explosion on large
document sets.

Signed-off-by: Atri Sharma <[email protected]>

atris (Contributor Author) commented Sep 5, 2025

@andrross @rishabhmaurya Please review

github-actions bot (Contributor) commented Sep 5, 2025

❌ Gradle check result for 75e057f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

atris closed this on Sep 7, 2025
atris reopened this on Sep 7, 2025

github-actions bot (Contributor) commented Sep 7, 2025

❌ Gradle check result for 75e057f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
