@hcvdwerf hcvdwerf commented Sep 24, 2025

🚀 Pull Request Checklist

  • Title:

    • [ ] A brief, descriptive title for the changes.
  • Description:

    • [ ] Provide a clear and concise description of your pull request, including the purpose of the changes and the approach you've taken.
  • Context:

    • [ ] Why are these changes necessary? What problem do they solve? Link any related issues.
  • Changes:

    • [ ] List the major changes you've made, ideally organized by commit or feature.
  • Testing:

    • [ ] Describe how the changes have been tested. Include any relevant details about the testing environment and the test cases.
  • Screenshots (if applicable):

    • [ ] If your changes are visual, include screenshots to help explain your changes.
  • Additional Information:

    • [ ] Add any other information that might be useful for reviewers, such as considerations, discussions, or dependencies.
  • Checklist:

    • [ ] I have checked that my code adheres to the project's style guidelines and that my code is well-commented.
    • [ ] I have performed self-review of my own code and corrected any misspellings.
    • [ ] I have made corresponding changes to the documentation (if applicable).
    • [ ] My changes generate no new warnings or errors.
    • [ ] I have added tests that prove my fix is effective or that my feature works.
    • [ ] New and existing unit tests pass locally with my changes.

Summary by Sourcery

Improve multilingual tag handling by sanitizing translated tags and deriving primary tags from default language translations with fallback support

New Features:

  • Introduce a method to sanitize translated tags and remove invalid entries based on CKAN length rules
  • Adjust parse_dataset to repopulate dataset tags from sanitized default language translations with fallback to other languages

Enhancements:

  • Log a warning when invalid translated tags are removed during sanitation
  • Ensure tags are always validated after translation-based updates
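
The default-language selection with fallback described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `derive_primary_tags` and the `"en"` default are hypothetical names and values chosen for illustration, and the fallback order (first non-empty language in dict order) is an assumption.

```python
from typing import Dict, List


def derive_primary_tags(
    tags_translated: Dict[str, List[str]],
    default_language: str = "en",  # assumed default; the real value comes from configuration
) -> List[Dict[str, str]]:
    """Pick tag names from the default language, else from the first non-empty language."""
    names = tags_translated.get(default_language) or []
    if not names:
        # Fallback: first language (in dict order) that still has tags after sanitation.
        names = next((tags for tags in tags_translated.values() if tags), [])
    # Dataset tags are represented as a list of {"name": ...} dicts.
    return [{"name": name} for name in names]


# English has no tags left, so the Dutch tags are used instead.
translated = {"en": [], "nl": ["gezondheid", "zorg"]}
primary = derive_primary_tags(translated)
print(primary)
```

Returning an empty list when no language has tags keeps the caller free to run its usual tag validation afterwards without special-casing missing translations.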


sourcery-ai bot commented Sep 24, 2025

Reviewer's Guide

Extends dataset parsing with sanitation for multilingual tags: each per-language tag list is sanitized, the default-language tags populate the main ‘tags’ field, and all tags are checked against CKAN length rules.

File-Level Changes

Change: Enhance parse_dataset to handle and sanitize translated tags before validation
Details:
  • Check if tags_translated is a dict and invoke sanitation
  • Populate dataset_dict['tags_translated'] with sanitized tags
  • Derive default_language tags or fallback from sanitized translations
  • Assign derived tags to dataset_dict['tags'] and run validate_tags
Files: ckanext/fairdatapoint/profiles.py

Change: Introduce _sanitize_tags_translated for per-language tag filtering and validation
Details:
  • Iterate over each language’s tag list and remove empty values
  • Convert values to tag dicts and apply validate_tags
  • Reconstruct sanitized lists and log warnings for removed tags
Files: ckanext/fairdatapoint/profiles.py
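
The per-language sanitation steps listed above can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: `validate_tags_stub` is a hypothetical stand-in for the project's `validate_tags`, and the length limits of 2 and 100 characters are assumed values for the CKAN length rules.

```python
import logging
from typing import Dict, List, Optional

log = logging.getLogger(__name__)

# Assumed limits for illustration; CKAN's own validators are the authority.
MIN_TAG_LENGTH = 2
MAX_TAG_LENGTH = 100


def validate_tags_stub(tag_dicts: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for validate_tags: keep tags within the length limits."""
    return [t for t in tag_dicts if MIN_TAG_LENGTH <= len(t["name"]) <= MAX_TAG_LENGTH]


def sanitize_tags_translated(
    tags_translated: Dict[str, Optional[List[str]]],
) -> Dict[str, List[str]]:
    """Return a copy of tags_translated with empty and invalid tags removed per language."""
    sanitized: Dict[str, List[str]] = {}
    for lang, values in tags_translated.items():
        values = values or []
        # Step 1: drop empty values.
        non_empty = [v for v in values if v]
        # Step 2: convert to tag dicts and validate.
        cleaned = validate_tags_stub([{"name": v} for v in non_empty])
        # Step 3: rebuild the plain list of names.
        sanitized[lang] = [t["name"] for t in cleaned]
        if len(values) != len(sanitized[lang]):
            log.warning(
                "Removed invalid tags for language %s during multilingual sanitation",
                lang,
            )
    return sanitized


result = sanitize_tags_translated({"en": ["health", "", "x"], "nl": ["zorg"]})
print(result)
```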


@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `ckanext/fairdatapoint/profiles.py:90-97` </location>
<code_context>
+            cleaned = validate_tags(tag_dicts)
+            sanitized[lang] = [tag['name'] for tag in cleaned]
+
+            if len(values) != len(sanitized[lang]):
+                log.warning(
+                    'Removed invalid tags for language %s during multilingual sanitation',
</code_context>

<issue_to_address>
**suggestion:** Log message may be too generic for debugging.

Consider updating the log to include details about the removed tags or original values for improved traceability.

```suggestion
            cleaned = validate_tags(tag_dicts)
            sanitized[lang] = [tag['name'] for tag in cleaned]

            if len(values) != len(sanitized[lang]):
                removed_tags = [v for v in values if v not in sanitized[lang]]
                log.warning(
                    'Removed invalid tags for language %s during multilingual sanitation. Original: %r, Removed: %r',
                    lang,
                    values,
                    removed_tags
                )
```
</issue_to_address>
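
One way to compute the removed tags for the more detailed log message is sketched below. The helper name `find_removed_tags` and the sample values are hypothetical. Note that a membership comparison like this also reports a tag as removed if validation rewrote its name rather than dropping it.

```python
import logging
from typing import List

log = logging.getLogger(__name__)


def find_removed_tags(original: List[str], kept: List[str]) -> List[str]:
    """Return the entries of original that are absent from kept, preserving order."""
    kept_set = set(kept)  # set lookup avoids rescanning the kept list for every value
    return [v for v in original if v not in kept_set]


# Hypothetical sample data: one empty value and one too-short tag were dropped.
values = ["health", "", "x", "care"]
kept = ["health", "care"]
removed = find_removed_tags(values, kept)

if removed:
    # Arguments are passed separately so formatting only happens if the record is emitted.
    log.warning(
        "Removed invalid tags for language %s during multilingual sanitation. "
        "Original: %r, Removed: %r",
        "en",
        values,
        removed,
    )
```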


Comment on lines +90 to +97
            cleaned = validate_tags(tag_dicts)
            sanitized[lang] = [tag['name'] for tag in cleaned]

            if len(values) != len(sanitized[lang]):
                log.warning(
                    'Removed invalid tags for language %s during multilingual sanitation',
                    lang
                )
