🐛 fix(docstore): preserve retrieval ranking order in lancedb get() #745
Description
This pull request fixes a bug in the `get()` method of the `lancedb` document store: documents retrieved by ID were returned in an arbitrary order (typically insertion order) rather than in the order of the input list of IDs.
As a result, when vector retrieval paired scores with documents via `zip(docs, scores)`, each score could be attached to the wrong document. Queries returned documents with low relevance scores that did not correspond to the top vector matches, often chunks from the first pages of a document (because of insertion order) rather than the most relevant content. Consider the following code:
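A minimal sketch of the mismatch, using an in-memory list as a stand-in for the table rather than the actual `lancedb` docstore code. The names `TABLE`, `get_unordered`, and `get_ordered`, as well as the IDs and scores, are hypothetical and chosen only for illustration:

```python
# Hypothetical stand-in for a document table; rows are kept in insertion order.
TABLE = [
    {"id": "a", "text": "page 1 chunk"},
    {"id": "b", "text": "page 2 chunk"},
    {"id": "c", "text": "page 3 chunk"},
]


def get_unordered(ids):
    """Mimic the reported behaviour: matching rows come back in insertion order."""
    wanted = set(ids)
    return [row for row in TABLE if row["id"] in wanted]


def get_ordered(ids):
    """Order-preserving variant: return rows in the same order as `ids`."""
    by_id = {row["id"]: row for row in get_unordered(ids)}
    # Skip IDs that were not found instead of raising (an assumption of this sketch).
    return [by_id[i] for i in ids if i in by_id]


# Pretend vector search result, best match first (made-up scores).
ids = ["c", "a"]
scores = [0.92, 0.35]

bad = [(doc["id"], score) for doc, score in zip(get_unordered(ids), scores)]
good = [(doc["id"], score) for doc, score in zip(get_ordered(ids), scores)]

print(bad)   # [('a', 0.92), ('c', 0.35)]  scores attached to the wrong documents
print(good)  # [('c', 0.92), ('a', 0.35)]  pairing matches the ranking
```

In the sketch, reordering the fetched rows by their position in `ids` is enough to make `zip` pair each score with the document it belongs to.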
This PR modifies `get()` to return documents in the same order as `ids`, ensuring that the score-document mapping remains accurate.
Type of change
Checklist