rmitsch (Collaborator) commented Sep 21, 2025

Description

This PR includes various demo fixes and renames OCR to Ingestion to better reflect the task's purpose. The changes improve documentation consistency and add a demo file for PyData Amsterdam 2025.

Related Issues

-

Changes Made

  • Renamed OCR to Ingestion throughout the codebase to better reflect the task's role
  • Added demo.py file for PyData Amsterdam 2025
  • Fixed package name in setup.py (spacy-llm -> sieves)
  • Improved Outlines engine batch processing with fallback for unimplemented batch methods
  • Updated documentation and examples to use Ingestion instead of OCR
  • Fixed typing annotations by removing TypeAlias usage for Python 3.9 compatibility
  • General code cleanup and formatting improvements
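
For reviewers, the batch fallback and the typing change listed above can be sketched roughly as follows. This is a minimal illustration, not the code in sieves/engines/outlines_.py: `run_batch_with_fallback`, `NoBatchModel`, and the `Prompts` alias are hypothetical names, and it assumes a backend signals missing batch support by raising `NotImplementedError`. The alias is a plain assignment because `typing.TypeAlias` only exists from Python 3.10 onward.

```python
from typing import Any, Callable, List, Sequence

# Plain assignment instead of `Prompts: TypeAlias = ...`, which would need
# Python 3.10+. (Hypothetical alias, for illustration only.)
Prompts = Sequence[str]


def run_batch_with_fallback(
    batch_fn: Callable[[Prompts], List[Any]],
    single_fn: Callable[[str], Any],
    prompts: Prompts,
) -> List[Any]:
    """Try the batch call first; fall back to one call per prompt."""
    try:
        return list(batch_fn(prompts))
    except NotImplementedError:
        # Assumption: the backend has no batch implementation and says so
        # by raising NotImplementedError. Process prompts one by one.
        return [single_fn(prompt) for prompt in prompts]


class NoBatchModel:
    """Hypothetical stand-in for a backend without batch support."""

    def __call__(self, prompt: str) -> str:
        return prompt.upper()

    def batch(self, prompts: Prompts) -> List[str]:
        raise NotImplementedError("batch generation not supported")


model = NoBatchModel()
results = run_batch_with_fallback(model.batch, model, ["a", "b"])
```

Catching only `NotImplementedError` keeps real generation errors visible instead of silently retrying them item by item.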

Checklist

  • Tests have been extended to cover changes in functionality
  • Existing and new tests succeed
  • Documentation updated (if applicable)
  • Related issues linked

codecov bot commented Sep 21, 2025

Codecov Report

❌ Patch coverage is 89.47368% with 4 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
sieves/engines/outlines_.py | 66.66% | 3 Missing ⚠️
...es/tasks/predictive/information_extraction/core.py | 90.90% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #156      +/-   ##
==========================================
+ Coverage   91.16%   91.20%   +0.04%     
==========================================
  Files          44       44              
  Lines        3056     3071      +15     
==========================================
+ Hits         2786     2801      +15     
  Misses        270      270              
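
The project-level percentages in the diff above follow from the Hits and Lines rows. A small sketch of that arithmetic, assuming percentages are rounded down to two decimals (an assumption that matches the figures shown here; `coverage_pct` is a hypothetical helper, not a Codecov API):

```python
def coverage_pct(hits: int, lines: int, decimals: int = 2) -> float:
    """Coverage percentage, rounded down to `decimals` places."""
    scale = 10 ** decimals
    # Integer floor division avoids floating-point surprises before scaling.
    return (hits * 100 * scale) // lines / scale


main_cov = coverage_pct(2786, 3056)   # Hits / Lines on main
pr_cov = coverage_pct(2801, 3071)     # Hits / Lines on #156
delta = round(pr_cov - main_cov, 2)   # change reported in the diff header
```

With plain rounding the second value would come out as 91.21, so the reported 91.20% suggests rounding down.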
Files with missing lines | Coverage Δ
sieves/pipeline/core.py | 94.11% <ø> (ø)
sieves/tasks/predictive/classification/core.py | 94.07% <100.00%> (+0.28%) ⬆️
sieves/tasks/preprocessing/ingestion/core.py | 92.85% <100.00%> (ø)
sieves/tasks/preprocessing/ingestion/docling_.py | 79.16% <ø> (ø)
sieves/tasks/preprocessing/ingestion/marker_.py | 32.07% <ø> (ø)
...ves/tasks/preprocessing/ingestion/unstructured_.py | 92.59% <ø> (ø)
...es/tasks/predictive/information_extraction/core.py | 93.10% <90.90%> (+0.65%) ⬆️
sieves/engines/outlines_.py | 87.80% <66.66%> (-4.31%) ⬇️

... and 2 files with indirect coverage changes

@rmitsch rmitsch merged commit f876781 into main Sep 21, 2025
1 check passed