Import legacy data
Variant 1: Import legacy data automatically via OpenAlex
Legacy data can be imported into OSIRIS with little effort, but a few installation and configuration steps are required first. The best way to complete them is to follow the instructions for setting up workflows: work through the general setup and the preparation of the queue workflow (including the configuration) until you reach the point where it says that you are ready for the legacy data import. Then return here to continue with the import of the legacy data.
The following script will import all legacy data from OpenAlex directly into your database. You can read why this is a great idea here. The script for importing the legacy data is started with the following call:
1 | |
If the script runs successfully, DOIs should now appear one after another as they are entered into the database. Depending on how much legacy data is imported, this process can take quite a while.
Variant 2: Importing legacy data via a list of DOIs
You can also use a list of DOIs to quickly import a lot of data into OSIRIS. I have prepared a Python script for this purpose, which you can find in the Jobs folder (available only in versions from 5 March 2024 onward).
Carry out the following steps:
- Enter all the DOIs you would like to import into a CSV file, one DOI per line. Bare DOIs are preferred, but DOI URLs also work. I have already included a sample file in the folder.
- If your file has a different name or is stored somewhere else, adjust the path in import_doi.py.
- That's it. Now execute the following command in the command line and off you go:
1 | |
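If your DOIs come from mixed sources (some bare, some as URLs), it can help to clean the list before the import. The following is a minimal sketch of such a preparation step, not part of OSIRIS itself: the helper names `normalize_doi` and `write_doi_csv` and the file name `dois.csv` are assumptions chosen for illustration.

```python
import re

# Optional resolver prefix such as "https://doi.org/", "http://dx.doi.org/" or "doi:".
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI for a DOI or DOI URL (hypothetical helper)."""
    return _DOI_PREFIX.sub("", raw.strip())


def write_doi_csv(dois, path):
    """Write unique, normalized DOIs to `path`, one DOI per line.

    Returns the number of DOIs written.
    """
    seen = set()
    with open(path, "w", encoding="utf-8") as handle:
        for raw in dois:
            doi = normalize_doi(raw)
            key = doi.lower()  # DOIs are compared case-insensitively
            if doi and key not in seen:
                seen.add(key)
                handle.write(doi + "\n")
    return len(seen)


if __name__ == "__main__":
    # "dois.csv" is a placeholder; use the path configured in import_doi.py.
    count = write_doi_csv(
        ["https://doi.org/10.1000/xyz123", "10.1000/XYZ123", "doi:10.1000/abc"],
        "dois.csv",
    )
    print(f"Wrote {count} DOIs")
```

The sketch removes duplicates inside the list only; duplicates already stored in the database are handled by the import script itself.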
Note
Entries whose DOI is already stored in the database are not added again. This check cannot be disabled: it ensures that a cancelled script can simply be executed again without any changes.
By default, Levenshtein similarity is also used to filter out entries with matching titles (more than 90% match). If this is not desired, set the value ignoreDupl in import_doi.py to False.
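To illustrate how the two checks interact, here is a rough sketch of a duplicate filter. It is not the code from import_doi.py: the function names, the dictionary layout, and the way similarity is normalized (one minus the edit distance divided by the longer title's length) are assumptions. The `ignore_dupl` parameter merely mirrors the ignoreDupl setting described above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (classic dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_duplicate(candidate, existing, ignore_dupl=True, threshold=0.9):
    """Return True if `candidate` should be skipped.

    The DOI check always applies; the title check only if `ignore_dupl` is True.
    """
    doi = candidate["doi"].lower()
    for entry in existing:
        if entry["doi"].lower() == doi:
            return True
        if ignore_dupl and title_similarity(candidate["title"], entry["title"]) > threshold:
            return True
    return False
```

Because the DOI comparison runs regardless of `ignore_dupl`, re-running an interrupted import skips everything that was already stored, while the title comparison can be switched off when near-identical titles are legitimately distinct works.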