Download the public data file
Use the links below to download the annual public data file.
- 2024 ORCID Public Data File
- 2023 ORCID Public Data File
- 2022 ORCID Public Data File
- 2021 ORCID Public Data File
- 2020 ORCID Public Data File
Software to access the file
- Windows: A tool that can unpack tar files, such as WinRAR or 7-Zip
- Mac: No additional software needed
- XML-JSON conversion: From 2018, the file is provided only in XML format. Read on to learn how to generate JSON versions of the file.
Process
- Download the file from the links above. Each year’s data file may include multiple tar.gz files. For example, the 2018 file includes one tar.gz archive for all ORCID records and one tar.gz archive for all record activities.
- Windows: Use the tool you installed (WinRAR or 7-Zip) to decompress the tar.gz file; this produces a single .tar file (which may appear with no extension). You may need to run the tool a second time on the .tar file to unpack it.
- Mac: Double-click the tar.gz file to decompress it; this produces a single .tar file (which may appear with no extension). Double-click the .tar file to unpack it.
- The output folder for each file differs depending on the year the file was generated and its XSD version.
- 2013-2017: Inside the generated folder you will find multiple folders, for example json/ and xml/. Inside each folder is one file for each ORCID record in the specified format and XSD version.
- 2018+, records file: Inside the generated folder you will find one folder, summary/, which contains multiple folders of individual ORCID records in XML format. The records are grouped into subfolders by the last three characters of the ORCID iD.
- 2018+, activity file: Inside the generated folder you will find multiple folders, one for each ORCID record. Each folder includes the full activities on that ORCID record in XML format, separated by activity subsection.
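As a cross-platform alternative to the tools above, Python's standard tarfile module can read the archives directly. The following is a minimal sketch, assuming you only need to read the XML records rather than unpack every file to disk (the archives can be very large). Opening the archive in streaming mode ("r|gz") reads members one after another without extracting them. The function name iter_xml_members is hypothetical.

```python
import tarfile
from typing import Iterator, Tuple


def iter_xml_members(archive_path: str, suffix: str = ".xml") -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, file bytes) for each XML file in a .tar.gz archive.

    Streaming mode ("r|gz") reads members sequentially, so nothing is
    written to disk and the archive is never fully loaded into memory.
    """
    with tarfile.open(archive_path, mode="r|gz") as archive:
        for member in archive:
            # Skip directories and any file that is not XML.
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            # In streaming mode the current member must be read before
            # the loop advances to the next one.
            yield member.name, handle.read()
```

For example, `for name, data in iter_xml_members("ORCID_2020_10_summaries.tar.gz"): ...` would visit each record summary in archive order.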
Record Summary Files
The summaries file is a tar file that contains the public record summaries for all ORCID records available at the time ORCID generated the file. It is named ORCID_YYYY_MM_summaries.tar.gz, and its root folder is ORCID_YYYY_MM_summaries.
The folder hierarchy is determined by the last three characters of the ORCID iD, as shown in the image below:
Below are examples of the folder paths for some ORCID iDs:
| ORCID iD | Path inside dump file |
| --- | --- |
| 0000-0000-0000-0001 | /ORCID_2020_10_summaries/001/0000-0000-0000-0001-summary.xml |
| 0000-0000-0000-0002 | /ORCID_2020_10_summaries/002/0000-0000-0000-0002-summary.xml |
| 0000-0000-0000-001X | /ORCID_2020_10_summaries/01X/0000-0000-0000-001X-summary.xml |
| 0000-0000-0001-001X | /ORCID_2020_10_summaries/01X/0000-0000-0001-001X-summary.xml |
| 0000-0000-0003-0001 | /ORCID_2020_10_summaries/001/0000-0000-0003-0001-summary.xml |
| 0001-0000-0003-9991 | /ORCID_2020_10_summaries/991/0001-0000-0003-9991-summary.xml |
| 0001-0000-0005-1234 | /ORCID_2020_10_summaries/234/0001-0000-0005-1234-summary.xml |
| 9999-9999-9999-9991 | /ORCID_2020_10_summaries/991/9999-9999-9999-9991-summary.xml |
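Every row in the table follows one rule: the subfolder is the last three characters of the iD (which may include the check character X), and the file is named after the iD with a -summary.xml suffix. A minimal sketch of that rule, assuming the root folder follows the ORCID_YYYY_MM_summaries pattern shown above; summary_path is a hypothetical helper, not part of any ORCID tool.

```python
def summary_path(orcid_id: str, year: int, month: int) -> str:
    """Return the path of a record summary inside ORCID_YYYY_MM_summaries.tar.gz.

    The subfolder is the last three characters of the iD, so an iD ending
    in the check character X lands in a folder such as 01X.
    """
    root = f"ORCID_{year:04d}_{month:02d}_summaries"
    bucket = orcid_id[-3:]
    return f"/{root}/{bucket}/{orcid_id}-summary.xml"
```

For example, summary_path("0001-0000-0005-1234", 2020, 10) returns "/ORCID_2020_10_summaries/234/0001-0000-0005-1234-summary.xml", matching the table. Member names reported by tarfile usually have no leading slash, so you may need to strip it before comparing.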
Below are examples of how the folder hierarchy will look inside the dump file:
Activities file
The activities file is made up of eleven compressed files, each containing a subset of the public activities available at the time the files were generated.
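Each compressed file contains the activities of the ORCID iDs that share the same final character, the checksum digit (0 to 9 or X, which accounts for the eleven files); inside each file, records are organized by the last three characters of the ORCID iD.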
For example, one file, ORCID_YYYY_MM_activities_0.tar.gz, contains the public activities for every ORCID record whose checksum digit is 0.
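The checksum digit is the final character of an ORCID iD, computed over the first 15 digits with the ISO 7064 MOD 11-2 algorithm; a result of 10 is written as X, which is why there are eleven possible values. The sketch below computes and verifies that character so you can tell which activities archive should hold a given record. The helper names orcid_check_digit and activities_bucket are hypothetical, and the mapping from the returned character to an archive file name is an assumption to confirm against the actual file names on the download page.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)


def activities_bucket(orcid_id: str) -> str:
    """Return the checksum character that selects one of the eleven activities archives.

    Raises ValueError if the iD is malformed or its checksum does not match.
    """
    chars = orcid_id.replace("-", "").upper()
    if len(chars) != 16 or not all(c in "0123456789" for c in chars[:15]):
        raise ValueError(f"malformed ORCID iD: {orcid_id!r}")
    expected = orcid_check_digit(chars[:15])
    if chars[15] != expected:
        raise ValueError(f"checksum mismatch for {orcid_id!r}")
    return expected
```

For example, activities_bucket("0000-0002-1825-0097") returns "7", so that record's activities would be in the archive for checksum digit 7.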
Using the public file
The file contains the public information associated with each user’s ORCID record. In the 2013-2017 files, each record is included as a separate file in both JSON and XML. From the 2018 file onwards, each record is included as a separate XML file, and each full activity section for each record is also included as a separate XML file. If you prefer JSON, use the ORCID Conversion Library to convert the XML files to JSON.
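The ORCID Conversion Library is the supported route to JSON, and its output follows ORCID's own JSON format. If you only need a quick look at a record in JSON-like form, a generic converter built on Python's standard library can be enough. The sketch below is not the ORCID Conversion Library and does not reproduce its output: it drops namespace prefixes, stores attributes under "@name" keys, and turns repeated elements into lists, all of which are ad hoc choices made for illustration. The function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to qualified names."""
    return tag.rsplit("}", 1)[-1]


def element_to_dict(elem: ET.Element) -> Any:
    """Recursively convert an element into plain dicts, lists and strings."""
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    node: Dict[str, Any] = {}
    for key, value in elem.attrib.items():
        node["@" + _local_name(key)] = value
    for child in children:
        name = _local_name(child.tag)
        value = element_to_dict(child)
        if name in node:
            # Repeated elements (e.g. several works) become a list.
            if not isinstance(node[name], list):
                node[name] = [node[name]]
            node[name].append(value)
        else:
            node[name] = value
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_bytes: bytes) -> str:
    """Convert one XML document to an indented JSON string."""
    root = ET.fromstring(xml_bytes)
    return json.dumps({_local_name(root.tag): element_to_dict(root)}, indent=2)
```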
For the XSD required to interpret the files, see the ORCID GitHub repository.
Please see Synchronizing with ORCID for ways to develop your integration so that you can keep track of researchers and their activities.