In keeping with our founding principles and our value of being an open organization, and as we do every year, we are celebrating Open Access Week with the release of our annual Public Data File. The dataset contains a snapshot of publicly shared information in the ORCID Registry (ORCID iDs, names, and information about affiliations, works, funding, peer review, and more) wherever record holders have chosen to make that data public. The file is available for anyone to use and is published under a CC0 waiver.
The file can be used as a primary source for research. Known use cases include the Worldwide Map of Research, a project led by Dario Rodighiero, and "Restless Minds," the 2018 NASEM award-winning study by John Bohannon.
Organizations, regardless of ORCID membership, can also use the Public Data File for metadata aggregation, among other purposes. OpenAIRE uses the data in its Research Graph to enrich its research product records. dblp, which provides open bibliographic information on major computer science journals and proceedings, harvests data from various sources, including our Public Data File.
How is the community using the file?
At the time of writing, the 2020 Public Data File has been downloaded almost 93,000 times.
This year we are highlighting two new use cases of the Public Data File: enriching scholarly metadata and visualizing COVID-19 research.
Lens Profiles: Patents & Composite Data
Lens Profiles uses the ORCID Public Data File and Member API to connect researchers to their work. Administered by Cambia, an ORCID member organization and non-profit focused on creating tools and technologies to accelerate innovation, Lens Profiles is best known for its easy integration with patents. Researchers and inventors can claim author and inventor roles, and by syncing data from ORCID records with their composite profiles drawn from a variety of data sources, Lens Profiles highlights the influence of their work on innovation. Lens Profiles contributes the highest volume of patent metadata to the ORCID Registry. Aaron Ballagh, Mark Garlinghouse (Cambia), and Brian Minihan (ORCID) will present a poster titled "Connecting Authors and Inventors: A Story of Synergy Between Two Systems" at this year's Australasian Research Management Society event, November 3-5.
Research Graph: COVID-19 Research Visualization
The Research Graph Foundation is a nonprofit initiative that connects data from scholarly records across global repositories using persistent identifiers. The foundation hosts new, innovative projects that use graph visualization and advanced analytics and that build on the ORCID Public Data File to create and animate international collaboration networks. Last July, the Research Graph team presented how they used the ORCID Public Data File to illustrate the digital footprint of COVID-19 research. You can learn more about this project on the Research Graph Foundation's site.
How will you use the Public Data File?
ORCID’s Public Data File is open, transparent, and non-proprietary, and we encourage you to download it from the ORCID repository for your organization’s or your own use. We have also created a Public Data File Use Policy recommending community norms for use of the file.
The file is available in XML format and divided into 12 subsets to facilitate its download and use. The first set contains the full record summary for each record. The other 11 contain the activities for each record, including full work data.
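Because the summaries subset holds one XML document per record, a practical first step is to stream through it rather than unpacking millions of small files to disk. The Python sketch below is a minimal illustration, not an official ORCID tool. It assumes the summaries subset is distributed as a compressed tar archive of per-record XML files that follow the ORCID v3.0 message schema (namespaces and element paths as listed in the code); `iter_summaries` is a hypothetical helper name.

```python
import tarfile
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple

# Namespace URIs from the ORCID v3.0 message schema.
# Assumption: the summaries subset you downloaded uses this schema version.
NS = {
    "common": "http://www.orcid.org/ns/common",
    "person": "http://www.orcid.org/ns/person",
    "personal-details": "http://www.orcid.org/ns/personal-details",
}


def iter_summaries(
    archive_path: str,
) -> Iterator[Tuple[str, Optional[str], Optional[str]]]:
    """Yield (orcid_id, given_names, family_name) for each XML record.

    Hypothetical helper: reads the archive sequentially ("r|*" stream mode,
    compression auto-detected) so nothing is extracted to disk.
    """
    with tarfile.open(archive_path, mode="r|*") as tar:
        for member in tar:
            # Skip directories and any non-XML files in the archive.
            if not (member.isfile() and member.name.endswith(".xml")):
                continue
            fh = tar.extractfile(member)
            if fh is None:
                continue
            # In stream mode the member must be read fully before moving on;
            # ET.parse consumes the whole file object.
            root = ET.parse(fh).getroot()
            orcid_id = root.findtext(
                "common:orcid-identifier/common:path", namespaces=NS
            )
            given = root.findtext(
                "person:person/person:name/personal-details:given-names",
                namespaces=NS,
            )
            family = root.findtext(
                "person:person/person:name/personal-details:family-name",
                namespaces=NS,
            )
            if orcid_id:
                yield orcid_id, given, family
```

Usage might look like `for orcid_id, given, family in iter_summaries("summaries.tar.gz"): ...`, where the filename is a placeholder for whichever summaries archive you downloaded. Name fields come back as `None` when a record holder has not made them public.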
If you prefer JSON, you can use our ORCID Conversion Library, available in our GitHub repository. This Java application generates JSON from XML in the default version of the ORCID schema.
For more information, please check our page on how to work with bulk data.
If you are already using the file, or are planning to and have questions, please let us know. Your use case can help others, and we’d love to hear from you!