Conversation

WaVEV
Collaborator

@WaVEV WaVEV commented Sep 9, 2025

Doc here

This PR implements a unified approach for generating MQL from Django expressions. The core idea is to centralize the control flow in a base_expression method, which decides whether the expression can be translated into a direct field: value match (index-friendly) or must fall back to $expr. This keeps the logic for wrapping and dispatching in one place, while each lookup/function only defines its own expression-building logic.

This approach also allows mixing direct field: value matches with $expr clauses within the same $match. As a result, multiple $expr entries may coexist alongside index-optimized conditions, depending on the shape of the query.
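As a rough illustration of the resulting query shape (a sketch only: the field names `name`, `price`, and `discount` are hypothetical, and this is not output captured from the PR), a filter that combines a constant comparison with a field-to-field comparison might compile to a single `$match` stage like this:

```python
# Hypothetical query shape for something like
# filter(name="Ann", price__gt=F("discount")).
# All field names are invented for illustration.
match_stage = {
    "$match": {
        "$and": [
            # Direct field: value match, which can use an index on "name".
            {"name": "Ann"},
            # A field-to-field comparison has no match-form equivalent,
            # so it falls back to $expr.
            {"$expr": {"$gt": ["$price", "$discount"]}},
        ]
    }
}

# Split the conditions by kind to show that both coexist in one $match.
conditions = match_stage["$match"]["$and"]
path_conditions = [c for c in conditions if "$expr" not in c]
expr_conditions = [c for c in conditions if "$expr" in c]
```

Only the second condition pays the cost of `$expr`; the first stays in the index-friendly form.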

Most lookups now follow this pattern by simply implementing as_mql_expr (and optionally as_mql_path when a match-based translation is possible). Only some special cases, such as Col and the Func operators (except KeyTransform), among others, override the base behavior directly. This structure also leaves room for future optimizations (e.g. constant folding) without having to change the overall flow.

Additionally, since MongoDB 6 does not allow nesting $expr inside another $expr, the flow in base_expression ensures that such cases are flattened. In practice, expressions are generated without redundant wrapping, so the final MQL never contains $expr within $expr.
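The "no $expr within $expr" invariant can be checked mechanically. The helper below is a hypothetical sketch (the name `has_nested_expr` does not come from this PR) that walks a generated MQL document and reports whether any `$expr` appears inside another one; something similar could back a test assertion.

```python
def has_nested_expr(mql, inside_expr=False):
    """Return True if any $expr key appears inside another $expr.

    Hypothetical helper for illustration; not part of the PR.
    """
    if isinstance(mql, dict):
        for key, value in mql.items():
            if key == "$expr":
                if inside_expr:
                    # Found a $expr while already inside one.
                    return True
                if has_nested_expr(value, inside_expr=True):
                    return True
            elif has_nested_expr(value, inside_expr):
                return True
        return False
    if isinstance(mql, (list, tuple)):
        return any(has_nested_expr(item, inside_expr) for item in mql)
    # Scalars cannot contain operators.
    return False


flat = {"$and": [{"a": 1}, {"$expr": {"$gt": ["$a", "$b"]}}]}
nested = {"$expr": {"$and": [{"$expr": {"$gt": ["$a", "$b"]}}]}}
```

Sibling `$expr` entries, as in `flat`, are fine; only the wrapped form in `nested` violates the invariant.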

NOTE: Some polish is still pending, but the main idea and the majority of the code are already in place.

@WaVEV WaVEV force-pushed the lookup-refactor branch 3 times, most recently from 529e0ff to a78f26b Compare September 15, 2025 21:21
@timgraham timgraham changed the title WIP lookup refactor INTPYTHON-751 Make query generation omit $expr unless required Sep 20, 2025
Substr.as_mql = substr
Trim.as_mql = trim("trim")
TruncBase.as_mql = trunc
Cast.as_mql_expr = cast
Collaborator Author

None of these functions support as_mql_path. It could be added later if we try to simplify constant expressions.

return value


def base_expression(self, compiler, connection, as_path=False, **extra):
Collaborator Author

This is the common handler for all expressions. It decides whether an $expr is needed or not.

@WaVEV WaVEV marked this pull request as ready for review September 26, 2025 04:52
@WaVEV WaVEV requested review from Jibola and timgraham and removed request for Jibola September 26, 2025 04:52
Comment on lines 258 to 259
KeyTransformExact.as_mql_expr = key_transform_exact_expr
KeyTransformExact.as_mql_path = key_transform_exact_path
Collaborator

Please check alphabetization of the classes and functions (not only in this file).

}

def range_match(a, b):
    ## TODO: MAKE A TEST TO TEST WHEN BOTH ENDS ARE NONE. WHAT SHALL I RETURN?
Collaborator

AI says, "If either the start or end value provided to the BETWEEN operator is NULL, the entire BETWEEN condition will typically evaluate to UNKNOWN (and thus FALSE in a WHERE clause), unless explicitly handled." (I confirmed this for SQLite and PostgreSQL)

However, to match the semantics implemented here where None is treated as min/max date, I would expect __range=[None, None] not to filter any values.
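To make that expectation concrete, here is a minimal sketch of a range match in which a None bound means "unbounded on that side". The function name and the `(field, start, end)` signature are assumptions for illustration; the PR's `range_match(a, b)` may take different arguments and behave differently.

```python
def range_match_sketch(field, start, end):
    """Build a match-form range condition; None means an open bound.

    Hypothetical signature, not the PR's range_match(a, b).
    """
    conditions = {}
    if start is not None:
        conditions["$gte"] = start
    if end is not None:
        conditions["$lte"] = end
    # With both ends None there is nothing to constrain, so return an
    # empty filter, which matches every document.
    return {field: conditions} if conditions else {}


both_open = range_match_sketch("published", None, None)
lower_only = range_match_sketch("published", 2000, None)
closed = range_match_sketch("published", 2000, 2010)
```

Note that this differs from the SQL BETWEEN behavior quoted above, where a NULL bound makes the condition unknown and therefore filters out every row.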

connection,
operator=None,
resolve_inner_expression=False,
**extra_context, # noqa: ARG001
Collaborator

Is the removal of extra_context strictly related to this patch? (Mainly wondering, though perhaps it could be a separate trivial PR).

Collaborator Author

Yes, it was imported from Django. I copied the extra_context argument, but then I realized that the code never uses it, so I removed it. But I agree, it could go in a separate PR.

Comment on lines +69 to +72
Aggregate.as_mql_expr = aggregate
Count.as_mql_expr = count
StdDev.as_mql_expr = stddev_variance
Variance.as_mql_expr = stddev_variance
Collaborator

I see we have as_mql_expr(), as_mql_path(), and as_mql(..., as_path=...). If this is the way we keep it, it would be good to explain in the design document which objects (aggregate, func, expression, etc.) get which.

I wonder about renaming as_mql_expr() or as_mql_path() to as_mql() (i.e. treating one of the paths as the default). Do you think it would be more or less confusing?

Collaborator Author

Yes, that was the idea. I’ll explain it in the docs, and we might also consider renaming some methods. The core concept is:

  • Every expression has an as_mql method.
  • In some cases, it’s simpler to implement as_mql directly, so those methods don’t follow the common expression flow.
  • For other expressions, as_mql is a composite function that delegates to as_path or as_expr as appropriate.
  • The base_expression.as_mql method controls when these are called and performs boilerplate checks to prevent nesting an expr inside another expr (a MongoDB 6 restriction).

In short: every object has as_mql. Some also define as_path and as_expr. The base_expression coordinates how these methods are used, except for cases where as_mql is defined directly.
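The coordination described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: the class names, the `can_use_path` flag, and the argument-free method signatures are all hypothetical (the real `base_expression` also takes `compiler` and `connection`).

```python
class ExpressionSketch:
    """Hypothetical stand-in for an expression with the common as_mql flow."""

    can_use_path = False  # Assumed flag: is a match-form translation available?

    def as_mql_expr(self):
        raise NotImplementedError

    def as_mql_path(self):
        raise NotImplementedError

    def as_mql(self, as_path=False):
        # In a match context, prefer the index-friendly field: value form.
        if as_path and self.can_use_path:
            return self.as_mql_path()
        expr = self.as_mql_expr()
        # Wrap in $expr only at the match level; callers that are already
        # inside an $expr pass as_path=False and get the bare expression,
        # so $expr is never nested.
        return {"$expr": expr} if as_path else expr


class ExactConstant(ExpressionSketch):
    can_use_path = True

    def __init__(self, field, value):
        self.field, self.value = field, value

    def as_mql_path(self):
        return {self.field: self.value}

    def as_mql_expr(self):
        return {"$eq": [f"${self.field}", self.value]}


class GreaterThanField(ExpressionSketch):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def as_mql_expr(self):
        return {"$gt": [f"${self.left}", f"${self.right}"]}
```

In this sketch, `ExactConstant("name", "Ann").as_mql(as_path=True)` yields the match form, while `GreaterThanField("price", "discount").as_mql(as_path=True)` falls back to a single `$expr` wrapper.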

Collaborator Author

@WaVEV WaVEV Sep 27, 2025

Doc here: link

return {"$or": [{a: {"$exists": False}}, {a: None}]}
return {"$and": [{a: {"$exists": True}}, {a: {"$ne": None}}]}

mongo_operators_expr = {
Collaborator

mongo_expr_operators might be a more natural word order.

lhs_mql = {"$convert": {"input": lhs_mql, "to": output_type}}
if decimal_places := getattr(self.output_field, "decimal_places", None):
    lhs_mql = {"$trunc": [lhs_mql, decimal_places]}

Collaborator

revert

from django.db.models.sql.where import AND, OR, XOR, ExtraWhere, NothingNode, WhereNode
from pymongo.errors import BulkWriteError, DuplicateKeyError, PyMongoError

from .query_conversion.query_optimizer import convert_expr_to_match
Collaborator

You can delete all that code too. :-)


def regex_match(field, regex, insensitive=False):
    options = "i" if insensitive else ""
    # return {"$regexMatch": {"input": field, "regex": regex, "options": options}}
Collaborator

chop

Comment on lines +228 to +235
def test_annotate(self):
    obj = Book.objects.create(
        author=Author(name="Shakespeare", age=55, address=Address(city="NYC", state="NY"))
    )
    book_from_ny = (
        Book.objects.annotate(city=F("author__address__city")).filter(city="NYC").first()
    )
    self.assertCountEqual(book_from_ny.city, obj.author.address.city)
Collaborator

If it passes in the current code, it could be added separately.

Collaborator Author

Hmm, maybe I should delete it. I created it to validate something, but the check is already covered by other tests.

qs = Tour.objects.filter(exhibit__sections__number=1)
self.assertCountEqual(qs, [self.egypt_tour, self.wonders_tour])

def test_foreign_field_exact_expr(self):
Collaborator Author

Should I add more tests like this, i.e. just run a query and then check the generated query?

Collaborator

Yes, an assertion could be added to existing tests where possible.


2 participants