25 changes: 25 additions & 0 deletions Dockerfile
@@ -0,0 +1,25 @@
# SPDX-FileCopyrightText: 2024 Stichting Health-RI
#
# SPDX-License-Identifier: AGPL-3.0-only

FROM ckan/ckan-dev:2.10
Review comment (Collaborator):

important: can you please check these changes? They may conflict with DCAT-AP 3 support and ckanext-dcat 2.0.


WORKDIR /opt

RUN pip install -e 'git+https://github.com/ckan/[email protected]#egg=ckanext-scheming[requirements]'
RUN pip install -e 'git+https://github.com/ckan/[email protected]#egg=ckanext-harvest[requirements]'
RUN pip install -e 'git+https://github.com/ckan/[email protected]#egg=ckanext-dcat[requirements]'
RUN pip install -r https://raw.githubusercontent.com/ckan/ckanext-dcat/v1.5.1/requirements.txt

COPY . /opt/fdp
WORKDIR /opt/fdp

RUN pip install -r requirements.txt
RUN pip install -r dev-requirements.txt
RUN pip install --upgrade pytest-rerunfailures

RUN python3 setup.py develop
# Replace default path to CKAN core config file with the one on the container
RUN sed -i -e 's/use = config:.*/use = config:\/srv\/app\/src\/ckan\/test-core.ini/' test.ini

CMD ./ckanext/fairdatapoint/tests/run_tests.sh
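
The `sed` step in the Dockerfile above points `test.ini` at the CKAN core test config inside the container. A minimal shell sketch of the same substitution follows, run against a throwaway file so it can be tried outside the image. The helper name `point_config_at` and the sample `test.ini` contents are assumptions for illustration; only the target path `/srv/app/src/ckan/test-core.ini` comes from the Dockerfile.

```shell
#!/bin/sh
# point_config_at is a hypothetical helper, not part of the PR.
# It rewrites the "use = config:..." line of an ini file.
point_config_at() {
    ini_file=$1
    core_ini=$2
    # "|" as the sed delimiter avoids escaping every "/" in the path.
    # Writing through a temporary file avoids "sed -i", whose syntax
    # differs between GNU and BSD sed.
    sed -e "s|use = config:.*|use = config:${core_ini}|" "$ini_file" > "$ini_file.tmp" &&
        mv "$ini_file.tmp" "$ini_file"
}

# Demo on a throwaway file; the original "use" value here is made up.
workdir=$(mktemp -d)
printf '[app:main]\nuse = config:../ckan/test-core.ini\n' > "$workdir/test.ini"
point_config_at "$workdir/test.ini" /srv/app/src/ckan/test-core.ini
cat "$workdir/test.ini"
```

The choice of delimiter is only for readability; the escaped-slash form in the Dockerfile performs the same replacement. The sketch would misbehave if the target path itself contained `|` or `&`.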
27 changes: 12 additions & 15 deletions README.md
@@ -5,7 +5,6 @@ SPDX-FileContributor: 2024 Stichting Health-RI
SPDX-License-Identifier: CC-BY-4.0
-->


[![REUSE status](https://api.reuse.software/badge/github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint)](https://api.reuse.software/info/github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint)
[![Tests](https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint/actions/workflows/test.yml/badge.svg)](https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint/actions/workflows/test.yml)
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=GenomicDataInfrastructure_gdi-userportal-ckanext-fairdatapoint&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=GenomicDataInfrastructure_gdi-userportal-ckanext-fairdatapoint)
@@ -14,21 +13,21 @@ SPDX-License-Identifier: CC-BY-4.0
[![GitHub contributors](https://img.shields.io/github/contributors/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint)](https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint/graphs/contributors)
[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](code_of_conduct.md)


# ckanext-fairdatapoint

CKAN harvester for [FAIR Data Point](https://www.fairdatapoint.org/). Contains a harvester for FAIR data points. In the future, the FAIR data point API might be supported by this extension too.

## Stages

The harvester runs in three stages. Each of these stages can be modified.

1. Gather stage. The gather stage uses the FairDataPointRecordProvider which implements the IRecordProvider interface to create a list of identifiers of the objects which should be included in the harvest. In case of a FAIR data point, this list includes catalogs and datasets. In the future, collections could be added;
2. Fetch stage. The fetch stage downloads the actual source data. In this phase, additional data from other sources may be included to better suit the DCAT profile as expected by CKAN;
-3. Import stage. The import stage does the actual import. How the RDF from the FAIR data point is mapped to CKAN packages and resources is determined by so-called application profiles. In case of a FAIR data point which uses custom fields, a profile must be created. A profile can be defined as a Python class in the ckanext.fairdatapoint.profiles.py file. The new profile must be registered in the [ckan.rdf.profiles] section of setup.py. What profile is being used for a particular is determined by the harvester configuration.
+3. Import stage. The import stage does the actual import. How the RDF from the FAIR data point is mapped to CKAN packages and resources is determined by so-called application profiles. In case of a FAIR data point which uses custom fields, a profile must be created. A profile can be defined as a Python class in the ckanext.fairdatapoint.profiles.py file. The new profile must be registered in the [ckan.rdf.profiles] section of setup.py. What profile is being used for a particular is determined by the harvester configuration.

``
{
-"profiles": "fairdatapoint_dcat_ap"
+"profiles": "fairdatapoint_dcat_ap"
}
``

@@ -46,7 +45,6 @@ ckan --config=<full path to CKAN ini-file> search-index rebuild

For more information got to [GDI harvester information](https://genomicdatainfrastructure.github.io/gdi-userportal-docs/docs/ckan/harvester/)


## Requirements

Compatibility with core CKAN versions:
@@ -55,7 +53,6 @@ Compatibility with core CKAN versions:
|-----------------|-------------|
| 2.10 | tested |


## Installation

**TODO:** Add any additional install steps to the list below.
@@ -70,10 +67,10 @@ To install gdi-userportal-ckanext-fairdatapoint:

2. Clone the source and install it on the virtualenv

-git clone https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint.git
+git clone <https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint.git>
cd gdi-userportal-ckanext-fairdatapoint
pip install -e .
-pip install -r requirements.txt
+pip install -r requirements.txt

3. Add `fairdatapoint` to the `ckan.plugins` setting in your CKAN
config file (by default the config file is located at
@@ -83,17 +80,15 @@ To install gdi-userportal-ckanext-fairdatapoint:

sudo service apache2 reload


## Config settings

None at present

**TODO:** Document any optional config settings here. For example:

-# The minimum number of hours to wait before re-checking a resource
-# (optional, default: 24).
-ckanext.fairdatapoint.some_setting = some_default_value

+# The minimum number of hours to wait before re-checking a resource
+# (optional, default: 24).
+ckanext.fairdatapoint.some_setting = some_default_value
Review comment on lines +89 to +91:

question (documentation): Changed formatting of config settings comments

Is this change in indentation intentional? It might affect readability.


## Developer installation

@@ -107,6 +102,7 @@ do:

Fairdatapoint plugin depends on `ckanext-scheming`, `ckanext-harvester` and `ckanext-dcat`. Make sure these are installed,
otherwise run:

```commandline
pip install -e 'git+https://github.com/ckan/[email protected]#egg=ckanext-scheming[requirements]'
pip install -e 'git+https://github.com/ckan/[email protected]#egg=ckanext-harvest[requirements]'
@@ -116,8 +112,7 @@ pip install -r https://raw.githubusercontent.com/ckan/ckanext-dcat/v1.5.1/requir

## Tests

-To run the tests go to [GDI harvester test information](https://genomicdatainfrastructure.github.io/gdi-userportal-docs/docs/ckan/extension-local-setup-and-testing/)

+To run the tests, run `docker compose build && docker-compose -f docker-compose.yml up --abort-on-container-exit --exit-code-from ckan-test && docker-compose rm -fsv`
Review comment:

suggestion (documentation): Updated test instructions with specific command

Consider keeping the link to the GDI harvester test information as well for more detailed guidance.

Suggested change:
-To run the tests, run `docker compose build && docker-compose -f docker-compose.yml up --abort-on-container-exit --exit-code-from ckan-test && docker-compose rm -fsv`
+To run the tests:
+1. Run the following command:
+`docker compose build && docker-compose -f docker-compose.yml up --abort-on-container-exit --exit-code-from ckan-test && docker-compose rm -fsv`
+2. For more detailed guidance on GDI harvester testing, refer to [GDI Harvester Test Information](link-to-gdi-harvester-test-info).


## Releasing a new version of ckanext-fairdatapoint

@@ -152,7 +147,9 @@ If ckanext-fairdatapoint should be available on PyPI you can follow these steps
git push --tags

## License

This work is licensed under multiple licenses. Because keeping this section up-to-date is challenging, here is a brief summary as of January 2024:

- All original source code is licensed under [AGPL](./LICENSES/AGPL-3.0-only.txt).
- All documentation is licensed under [CC-BY-4.0](./LICENSES/CC-BY-4.0.txt).
- Some configuration and data files are licensed under [CC-BY-4.0](./LICENSES/CC-BY-4.0.txt).
14 changes: 14 additions & 0 deletions ckanext/fairdatapoint/tests/run_tests.sh
@@ -0,0 +1,14 @@
#!/bin/sh

# SPDX-FileCopyrightText: 2024 Stichting Health-RI
#
# SPDX-License-Identifier: AGPL-3.0-only

# Initialize database
ckan -c test.ini db init

# Run tests
pytest --ckan-ini=test.ini --cov=ckanext.fairdatapoint --disable-warnings ckanext/fairdatapoint

# Generate coverage report
coverage xml -o coverage.xml
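
One point worth checking in the script above: a shell script exits with the status of its last command, so as written a failing `pytest` run followed by a successful `coverage xml` would appear to leave the container with exit status 0, which is what `--exit-code-from ckan-test` reports. The sketch below shows one way to keep the test status while still producing the report. The helper `run_then_report` is hypothetical, and `exit 1` and `true` are stand-ins for the real `pytest` and `coverage` commands so that the sketch runs without CKAN installed.

```shell
#!/bin/sh
# run_then_report is a hypothetical helper, not part of the PR.
# It runs a test command, always runs the report command afterwards,
# and returns the exit status of the test command.
run_then_report() {
    test_cmd=$1
    report_cmd=$2
    test_status=0
    sh -c "$test_cmd" || test_status=$?
    # A failing report step is logged but does not mask the test result.
    sh -c "$report_cmd" || echo "report step failed" >&2
    return "$test_status"
}

# Stand-ins: "exit 1" plays a failing pytest run, "true" a successful
# coverage run.
status=0
run_then_report "exit 1" "true" || status=$?
echo "exit status: $status"
```

In `run_tests.sh` itself the same idea could be as small as storing `$?` right after the `pytest` line and ending the script with `exit "$status"`.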
57 changes: 57 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,57 @@
# SPDX-FileCopyrightText: 2024 Stichting Health-RI
#
# SPDX-License-Identifier: AGPL-3.0-only

services:
  solr:
    image: ckan/ckan-solr:2.10-solr9
    # SOLR has very annoying log output from the health check
    logging:
      driver: none
    healthcheck:
      test: [ "CMD-SHELL", "curl -f http://localhost:8983/solr/ckan/admin/ping | grep \"OK\" || exit 1" ]
      interval: 30s
      timeout: 30s
      retries: 5
      start_period: 5s

  postgres:
    image: ckan/ckan-postgres-dev:2.11
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
Review comment:

🚨 suggestion (security): Consider using environment variables or Docker secrets for sensitive information

Hardcoding passwords in the Docker Compose file poses a security risk, especially if this file is committed to version control. Use environment variables or Docker secrets to manage sensitive information more securely.

      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}

      POSTGRES_DB: postgres
    healthcheck:
      test: [ "CMD", "pg_isready" ]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 5s

  redis:
    image: redis:7
    healthcheck:
      test: [ "CMD", "redis-cli", "ping" ]
      interval: 10s
      timeout: 30s
      retries: 5
      start_period: 3s

  ckan-test:
    platform: linux/amd64
    build:
      context: ./
      dockerfile: Dockerfile
    depends_on:
      postgres:
        condition: service_healthy
      solr:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      CKAN_SQLALCHEMY_URL: postgresql://ckan_default:pass@postgres/ckan_test
      CKAN_DATASTORE_WRITE_URL: postgresql://datastore_write:pass@postgres/datastore_test
      CKAN_DATASTORE_READ_URL: postgresql://datastore_read:pass@postgres/datastore_test
      CKAN_SOLR_URL: http://solr:8983/solr/ckan
      CKAN_REDIS_URL: redis://redis:6379/1
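
If the `${POSTGRES_PASSWORD}` form from the security review comment were adopted, the value would have to come from the shell environment or from a `.env` file next to `docker-compose.yml`, which Compose reads for variable interpolation. The following is a minimal sketch of generating such a file. The helper name `write_env_file` and the choice of a random 32-character hex password are assumptions, not part of the PR, and the `ckan-test` connection URLs use separate credentials that this sketch does not touch.

```shell
#!/bin/sh
# write_env_file is a hypothetical helper, not part of the PR.
# It writes POSTGRES_PASSWORD=<random hex> to the given file.
write_env_file() {
    env_file=$1
    (
        # Subshell: the tighter umask applies only to this write.
        umask 077
        # 16 random bytes rendered as 32 hex characters.
        password=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
        printf 'POSTGRES_PASSWORD=%s\n' "$password" > "$env_file"
    )
}

# Demo in a throwaway directory; in the repository the target would be
# ./.env, which should then be excluded from version control.
demo_dir=$(mktemp -d)
write_env_file "$demo_dir/.env"
echo "wrote $demo_dir/.env"
```

For a test-only stack like this one a fixed password may be an acceptable trade-off; the sketch only illustrates what the reviewer's suggestion would involve.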