Skip to content

Conversation

hcvdwerf
Copy link
Collaborator

@hcvdwerf hcvdwerf commented Sep 27, 2024

🚀 Pull Request Checklist

  • Title:

    • [x] feat: Upgrade to DCAT AP 3
  • Description:

    • [x] This pull request upgrades the schema, and preset files to comply with DCAT AP 3. Fix of unit tests. At the moment there is a bug in the DCAT extension that not correctly handles with multiple contact and publishers
  • Context:

    • [x] These changes are necessary to bring the system in line with the latest DCAT AP 3 standard, improving metadata handling and making the system compatible with new regulatory requirements. This resolves issue #122.
  • Changes:

    • [x] Fix UT according to new schema
    • [x] Remove obsolete code
    • [x] Update schema and preset files to match DCAT AP 3
    • [x] Use new DCAT extension
  • Testing:

    • [x] Changes have been tested locally on the development environment. Unit tests have been run and updated to reflect the schema changes. Functional tests on Solr have passed without issues.
  • Screenshots (if applicable):

    • [ ] N/A
  • Additional Information:

    • [ ] None
  • Checklist:

    • [x] I have checked that my code adheres to the project's style guidelines and that my code is well-commented.
    • [x] I have performed self-review of my own code and corrected any misspellings.
    • [x] I have made corresponding changes to the documentation (if applicable).
    • [x] My changes generate no new warnings or errors.
    • [x] I have added tests that prove my fix is effective or that my feature works.
    • [x] New and existing unit tests pass locally with my changes.

Summary by Sourcery

Upgrade the system to comply with DCAT AP 3 by updating schema and preset files, fixing unit tests, and removing obsolete code. Update CI workflows and documentation to reflect these changes, ensuring compatibility with the latest standards and regulatory requirements.

New Features:

  • Upgrade the schema and preset files to comply with DCAT AP 3, enhancing metadata handling and compatibility with new regulatory requirements.

Bug Fixes:

  • Fix unit tests to align with the updated schema and address issues with handling multiple contacts and publishers in the DCAT extension.

Enhancements:

  • Remove obsolete code and refactor the codebase to improve maintainability and align with the new DCAT AP 3 standards.

CI:

  • Update the CI workflow to use the latest version of the ckanext-dcat package, ensuring compatibility with DCAT AP 3.

Documentation:

  • Update the README to reflect the changes in the installation instructions for the ckanext-dcat package.

Tests:

  • Remove outdated tests and update existing tests to reflect changes in the schema and data handling, ensuring they pass with the new DCAT AP 3 compliance.

Chores:

  • Add new schema and preset YAML files for DCAT AP 3 compatibility, replacing the old JSON schema file.

Copy link

sourcery-ai bot commented Sep 27, 2024

Reviewer's Guide by Sourcery

This pull request upgrades the CKAN extension to comply with DCAT AP 3 standard, which involves significant changes to the schema, preset files, and unit tests. The main focus is on updating the data model and parsing logic to match the new DCAT AP 3 requirements.

Sequence Diagram

sequenceDiagram
    participant Client
    participant FAIRDataPointDCATAPProfile
    participant EuropeanDCATAP3Profile
    Client->>FAIRDataPointDCATAPProfile: parse_dataset(dataset_dict, dataset_ref)
    FAIRDataPointDCATAPProfile->>EuropeanDCATAP3Profile: super().parse_dataset(dataset_dict, dataset_ref)
    FAIRDataPointDCATAPProfile->>FAIRDataPointDCATAPProfile: _parse_creator(dataset_dict, dataset_ref)
    FAIRDataPointDCATAPProfile->>FAIRDataPointDCATAPProfile: validate_tags(dataset_dict['tags'])
    FAIRDataPointDCATAPProfile-->>Client: Updated dataset_dict
Loading

File-Level Changes

Change Details Files
Update data model to comply with DCAT AP 3
  • Change 'contact_point' to 'contact' in dataset dictionary
  • Update structure of 'contact' and 'creator' fields
  • Modify 'publisher' field to be a list of dictionaries
  • Remove 'extras' field containing 'uri' and add 'uri' as a top-level field
  • Change datetime fields to string format
ckanext/fairdatapoint/tests/test_profiles.py
ckanext/fairdatapoint/profiles.py
ckanext/fairdatapoint/tests/test_processors.py
Update DCAT extension dependency
  • Upgrade ckanext-dcat from v1.5.1 to v2.0.0
.github/workflows/test.yml
README.md
Modify unit tests to reflect DCAT AP 3 changes
  • Update expected output in test cases
  • Comment out tests that are no longer applicable
  • Remove datetime conversion function and related tests
ckanext/fairdatapoint/tests/test_profiles.py
ckanext/fairdatapoint/profiles.py
Add new schema and preset files for DCAT AP 3
  • Create new YAML file for GDI userportal schema
  • Add new preset file for datetime fields
ckanext/fairdatapoint/tests/test_data/scheming/schemas/gdi_userportal.yaml
ckanext/fairdatapoint/tests/test_data/scheming/presets/gdi_presets.yaml
Remove obsolete code and files
  • Delete _convert_extras_to_declared_schema_fields function
  • Remove convert_datetime_string function
  • Delete test_schema.json file
ckanext/fairdatapoint/profiles.py
ckanext/fairdatapoint/tests/test_data/test_schema.json

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @hcvdwerf - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider revisiting the commented-out tests in test_profiles.py to ensure adequate test coverage is maintained after the DCAT AP 3 upgrade.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟡 Testing: 4 issues found
  • 🟢 Complexity: all looks good
  • 🟡 Documentation: 1 issue found

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

],
'title': '[PUBLIC] Low-Grade Gliomas (UCSF, Science 2014)',
'notes': 'Whole exome sequencing of 23 grade II glioma tumor/normal pairs.',
'url': 'https://cbioportal.health-ri.nl/study/summary?id=lgg_ucsf_2014',
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (testing): Date format changed from datetime object to string

The test has been updated to expect a string representation of the date instead of a datetime object. Ensure that this change is consistent with the new requirements and that the application can handle this format correctly.

'modified': datetime(2019, 10, 30, 23, 0),
'issued': '2019-10-30 23:00:00',
'modified': '2019-10-30 23:00:00',
'identifier': 'lgg_ucsf_2014',
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (testing): Publisher structure changed from string to list of objects

The test has been updated to reflect a change in the structure of the 'publisher' field. It's now a list of objects instead of a single string. This change should be verified against the new schema requirements.

Comment on lines +84 to +87
"issued": '2023-10-06T10:12:55.614000+00:00',
"language": ["http://id.loc.gov/vocabulary/iso639-1/en"],
"license_id": "",
"modified": datetime(2023, 10, 6, 10, 12, 55, 614000, tzinfo=tzutc()),
"publisher_name": "Automatic",
"modified": '2023-10-06T10:12:55.614000+00:00',
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (testing): Date format changed to ISO 8601 string

The test now expects dates in ISO 8601 string format instead of datetime objects. Verify that this change is consistent across the application and that all date handling code can work with this format.

Comment on lines +111 to +112
pip install -e 'git+https://github.com/ckan/ckanext-dcat.git@v2.0.0#egg=ckanext-dcat'
pip install -r https://raw.githubusercontent.com/ckan/ckanext-dcat/v2.0.0/requirements.txt
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (documentation): Consider if other documentation needs updating due to ckanext-dcat version change

The update from v1.5.1 to v2.0.0 of ckanext-dcat might introduce significant changes. Have you reviewed if any other parts of the documentation or codebase need to be updated to reflect this change?

@brunopacheco1 brunopacheco1 merged commit 44bd646 into main Sep 27, 2024
2 checks passed
@brunopacheco1 brunopacheco1 deleted the Upgrade-DCAT-AP3 branch September 27, 2024 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants