This repository contains the code to generate the Papermap journals database and compute each journal's Papermap Journal Score.
The initial journal data is retrieved from the OpenAlex database to create a list of 209,819 journals, available in the /data/journals.jsonl file.
To create the least biased score possible, all publicly available metrics about scientific journals were included:
- H-Index: the maximum value of H such that the journal has published H papers that have each been cited at least H times (from OpenAlex and SCImago)
- Impact Factor: the average number of citations received this year by papers published in the journal during the last 2 years (from OpenAlex and SCImago)
- CiteScore: the average number of citations received in the last 4 years by papers published in the journal during the same period (from Scopus)
- SCImago Journal Rank: the average number of weighted citations received this year by papers published in the journal during the last 3 years (from SCImago and Scopus)
- Source Normalized Impact per Paper: the average number of citations received in the last 3 years by papers published in the journal during the same period, weighted by differences in citation practices between subject fields (from CWTS and Scopus)
- Eigenfactor Score: the percentage of time spent visiting the journal when randomly following citation links from papers published in the last 5 years (from Eigenfactor)
- Article Influence Score: the Eigenfactor Score divided by the number of papers published in the journal in the last 5 years (from Eigenfactor)
- Self-citation rate: the percentage of citations to the journal that come from the journal itself (from CWTS)
- Rigor & Transparency Index: the average SciScore for the journal's papers, reflecting how well they adhere to criteria such as randomization, blinding, power, transparency, and more (from SciScore)
- Transparency and Openness Promotion Factor: a rating of journal policies assessing how strongly they promote transparency and reproducibility (from TOP Factor)
- News mentions: the average number of news mentions of the journal's papers (from Altmetric)
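As an illustration of the first metric above, the H-Index can be computed directly from a journal's per-paper citation counts (this helper is illustrative, not part of the repository's code):

```python
def h_index(citations):
    """H-Index: the largest H such that H papers each have >= H citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i  # the i-th most-cited paper still has >= i citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
```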
All the metrics are stored in the /data/journals.jsonl file; run the /data.ipynb notebook to update it.
The naive way to calculate a score would be to average all these metrics, but that would be biased towards the metrics with the largest values. A simple solution would be to normalize them before averaging (between 0 and 1, for example), but their distributions are still very different:
Moreover, the metrics based on citations or mentions have heavily skewed, roughly exponential distributions, so normalization crushes most of the values to almost zero.
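The problem with plain min-max normalization can be seen with a small sketch (the sample values below are made up for illustration):

```python
def min_max(values):
    """Scale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# A citation-like metric: a few huge outliers dominate the range,
# so min-max normalization crushes almost everything towards 0.
h_indices = [1, 2, 3, 5, 8, 20, 1200]
print(min_max(h_indices))
```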
To solve this, all the metric values are first replaced by their rank in the list of journals (normalized so the best journal has a rank of 1 and the worst a rank of 0):
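A minimal sketch of this rank normalization (the function name and missing-value handling are illustrative, not the actual /scores.py code):

```python
def normalized_ranks(values):
    """Replace each metric value by its normalized rank: the best
    (highest) value maps to 1.0, the worst to 0.0. `None` entries
    (missing data) are skipped. Ties are not handled specially here."""
    present = sorted(
        (i for i, v in enumerate(values) if v is not None),
        key=lambda i: values[i],
    )
    ranks = [None] * len(values)
    n = len(present)
    for pos, i in enumerate(present):
        ranks[i] = pos / (n - 1) if n > 1 else 1.0
    return ranks

print(normalized_ranks([5.0, None, 120.0, 0.3]))  # [0.5, None, 1.0, 0.0]
```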
But now the opposite problem arises: this new distribution is too linear and doesn't reflect the real differences between journals (the gap between the 1st and the 10,000th journal is not the same as between the 10,000th and the 20,000th). To correct this, a transformation is applied to the ranks of the citation- and mention-based metrics to make their distribution more exponential (the others are kept linear):
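The exact transformation isn't specified here; a simple power transform is one way to get the described effect (the exponent below is a hypothetical choice, not the value used in /scores.py):

```python
def sharpen(rank, power=3):
    """Make the rank distribution more exponential while staying in [0, 1]:
    gaps between top-ranked journals widen, the long tail is compressed."""
    return rank ** power

# The gap near the top is much larger than an equal-sized gap in the tail:
print(sharpen(1.0) - sharpen(0.95))   # top gap
print(sharpen(0.10) - sharpen(0.05))  # tail gap
```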
Finally, a warning often seen on the websites of these metrics is that they should not be compared across different fields of research. To take this into account, the same calculations are made separately for each field, and each journal's average score across its fields is merged with its global score:
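Assuming a simple even split between the two components (the actual weighting in /scores.py may differ), the merge could look like:

```python
def merged_score(global_score, field_scores):
    """Hypothetical merge: average the journal's scores across its
    fields, then average that with its global score."""
    field_avg = sum(field_scores) / len(field_scores)
    return (global_score + field_avg) / 2

print(merged_score(0.8, [0.6, 1.0]))
```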
When averaging the metrics to create the final score, an important issue is missing data: some journals have no value for some metrics. To avoid cases where a journal gets a very high score based on a single metric, a penalty is added to the denominator of the average for each missing metric.
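A sketch of such a penalized average (the penalty weight is illustrative):

```python
def penalized_mean(metric_scores, n_metrics, penalty=0.5):
    """Average the available metric scores, adding `penalty` to the
    denominator for each missing metric (None), so a journal can't
    reach a top score on the strength of a single metric."""
    present = [s for s in metric_scores if s is not None]
    missing = n_metrics - len(present)
    return sum(present) / (len(present) + penalty * missing)

# One perfect metric out of 11 no longer yields a perfect score:
print(penalized_mean([1.0] + [None] * 10, n_metrics=11))
```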
The distribution of the final scores falls off a little too fast. To correct this, the same steps described above (ranking + transformation) are applied again, but here the result is combined with the initial score instead of replacing it:
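A hedged sketch of this final pass (the blend weight and the power transform are assumptions, not the choices made in /scores.py):

```python
def final_scores(initial_scores, weight=0.5):
    """Rank the initial scores, sharpen the ranks with a power transform,
    and blend the result with each initial score instead of replacing it."""
    order = sorted(range(len(initial_scores)), key=lambda i: initial_scores[i])
    n = len(initial_scores)
    out = [0.0] * n
    for pos, i in enumerate(order):
        rank = pos / (n - 1) if n > 1 else 1.0
        out[i] = weight * initial_scores[i] + (1 - weight) * rank ** 3
    return out

print(final_scores([0.2, 0.9, 0.5]))
```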
The code that implements all these calculations is available in the /scores.py file.
The final scores are stored in the /journals/data.jsonl file; run the /scores.ipynb notebook to update it.
Here are the top 10 journals based on their Papermap Journal Score:
Rank | Journal | Papermap Journal Score |
---|---|---|
1 | Nature | 0.981 |
2 | New England Journal of Medicine | 0.974 |
3 | Cell | 0.974 |
4 | CA: A Cancer Journal for Clinicians | 0.971 |
5 | The Lancet | 0.960 |
6 | Nature Medicine | 0.959 |
7 | Immunity | 0.958 |
8 | Cancer Cell | 0.958 |
9 | Nature Biotechnology | 0.956 |
10 | Science | 0.956 |