Openness is one of ORCID’s foundational values, and sharing our annual ORCID Public Data File is one example of our commitment to openness. In fact, it is embedded in our founding principles.
Since the launch of the ORCID registry in October 2012, we have published the Public Data File annually, freely and openly accessible to all, to ensure that any interested stakeholder has broad access to a dataset that has become a vital part of the scholarly communication infrastructure. We’re pleased to announce that we have now gone a step further to make the Public Data File more accessible: we are partnering with Digital Science to provide access to the file in a way that, for the first time, facilitates exploratory data analysis.
Highlights
- ORCID publishes the Public Data File annually, at no cost, to ensure broad access to public ORCID data
- However, because the Public Data File is so large, it can be difficult to work with, inhibiting exploratory analysis
- ORCID member Digital Science is generously hosting the 2024 Public Data File on Google BigQuery, making it easily available for exploration and analysis
In the 12 years that ORCID has been sharing the Public Data File, it has been downloaded more than 190,000 times, serving as a data source for a diverse range of projects such as the analysis of relationships and individual trajectories within the research community, scientific migrations, collaboration networks, and the adoption of ORCID across disciplines and locations. However, we understand that using the Public Data File in its current form requires a large amount of effort. Would-be users need the skills to work with a very large dataset: they must download, parse, and extract the data, then load it into a local environment, before analysis can even begin.
Building on our current relationship with Figshare, which serves as the repository for the Public Data File, ORCID member Digital Science has now generously offered to host the 2024 Public Data File in Dimensions’ Google BigQuery (GBQ), meaning that the data is directly available for exploration and analysis without the need to first create a local copy.
Google BigQuery is a cloud-based, fully managed data analytics platform, optimized for handling large datasets efficiently. This makes it an ideal platform for exploring and analyzing the ORCID Public Data File, which contains millions of records. The ORCID Public Data File has been used for projects such as metadata enrichment, visualization of connections between authors, studies of data sharing practices in a particular region, and analysis of scientist migration patterns.
The beta version of this service is now available, and we hope that the lower effort required to use it will enable our community to explore and develop innovative use cases for the ORCID data, such as reporting on peer review practices, or analysis that involves linking ORCID data with data from the World Bank. While the dataset itself is and will remain freely available, those wishing to use it will need to establish their own GBQ account; Google offers a free tier of usage up to a certain level, but levies fees for usage beyond that. Within the free tier, it is possible to run many queries before running out of quota. Digital Science has also provided sample queries that allow you to efficiently query different parts of the ORCID dataset.
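To make the quota point concrete, here is a minimal Python sketch, not an official ORCID or Digital Science example. It assumes the free tier covers 1 TiB of on-demand query processing per month (please check Google's current pricing), and it uses a made-up table name purely for illustration; the real dataset path is the one given in Digital Science's documentation. The `estimate_bytes` helper relies on the `google-cloud-bigquery` client's dry-run mode, which reports how much data a query would scan without actually running it.

```python
# Assumption: free tier of 1 TiB of on-demand query processing per month.
# Verify against Google's current BigQuery pricing before relying on it.
FREE_TIER_BYTES = 1024**4

# Hypothetical table name, for illustration only. Replace it with the real
# dataset path from Digital Science's documentation.
HYPOTHETICAL_TABLE = "example-project.orcid_example.summaries"


def build_count_query(table: str) -> str:
    """Return a schema-agnostic query that counts the rows in `table`."""
    return f"SELECT COUNT(*) AS n_records FROM `{table}`"


def estimate_bytes(sql: str) -> int:
    """Dry-run `sql` and return the number of bytes it would process.

    Needs `pip install google-cloud-bigquery` and your own Google Cloud
    project and credentials. A dry run validates the query without running it.
    """
    from google.cloud import bigquery  # imported lazily; optional dependency

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


def queries_within_quota(bytes_per_query: int,
                         quota_bytes: int = FREE_TIER_BYTES) -> int:
    """Return how many queries of the given size fit in the monthly quota."""
    if bytes_per_query <= 0:
        raise ValueError("bytes_per_query must be positive")
    return quota_bytes // bytes_per_query
```

Under these assumptions, a query that scans 2 GiB could be run `queries_within_quota(2 * 1024**3)` times, that is 512 times, in a month before leaving the free tier. Selecting only the columns you need generally reduces the bytes scanned.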
To be clear, this service is offered in addition to, and does not replace, the archival copies of our Public Data File, which continue to be available for download from our Figshare repository.
What is in the ORCID Public Data File?
As mentioned on our website, the ORCID Public Data File contains all publicly available information for all ORCID iDs within the registry at the time of creation. ORCID releases the Public Data File under a CC0 1.0 Public Domain Dedication as further described in our Privacy Policy. Accordingly, ORCID does not impose restrictions or conditions on use of the Public Data File, but we have published recommended community norms in our Public Data File Use Policy.
In this version of the service, the ORCID Public Data File on Google BigQuery reflects the data contained in the ORCID summaries files, meaning that extended work-level (e.g., article) metadata is still only available in the downloadable versions of the ORCID Public Data File on Figshare.
ORCID and Digital Science invite you to explore the ORCID Public Data File in Google BigQuery. If you use the data in your project, we ask that you credit Digital Science for the tool and ORCID as the source of the Public Data File and, where technologically feasible, link back to this page to facilitate access for others.
We’d love to learn how you’re using the ORCID Public Data File on Dimensions’ Google BigQuery tool, so please let us know!