At ORCID, we have focused largely on researchers who use the Registry to obtain an iD and link it to their works. Another group of researchers also stands to benefit: those who are interested in mining the linkages in ORCID data to derive and test models of knowledge flow and innovation.
A recent workshop on the Empirical Foundations of Science and Innovation Policy, organized by Julia Lane of the American Institutes for Research, Jacques Mairesse of ENSAE, and Paula Stephan of Georgia Tech, brought together researchers and funding agency representatives from several countries to start building an international community of practice for the science of science policy. Their work has been inspired by both the U.S. STAR METRICS program and the National Science Foundation’s Science of Science and Innovation Policy (SciSIP) program. During the proceedings, it became clear that persistent identifiers and data exchange standards are an important resource for research in this emerging field.
Present at the meeting were economists, research administrators, and program administrators from funding organizations and research universities and centers, including the European Research Council, the National Science Foundation, the Observatoire des Sciences et des Techniques (OST), Agence Nationale de la Recherche, KAIST, the University of Strasbourg, Max Planck, Ministère de l'Enseignement supérieur et de la Recherche, Imperial College London, Melbourne University, California Institute of Technology, Bocconi University, KU Leuven, Charles University, Institute of the Czech Language, Center for Economic Research and Graduate Education (CERGE-EI), École Polytechnique Fédérale de Lausanne, Institutt for samfunnsforskning, and Ohio State University.
Presentations focused on developing systematic approaches to measuring the scientific and economic results of research. Participants discussed feasibility studies for gathering the data their research requires, as well as the research studies that followed. Among the questions under study are the effectiveness of “star researcher” programs, methods to link grant funding with research results, the impact of funding on the training and subsequent careers of graduate students, the impact of scientific team formation on research outputs, and the flow of people and ideas through organizations and firms. Abstracts and a participant list are available on the meeting website.
A common thread across presentations was the immense work required to obtain and clean research administration data: information on funding, faculty, trainees, research outputs, and spin-offs. Confidentiality was an important consideration that the community needs to address. Participants noted that previous work had consisted essentially of one-off studies, because barriers to access, updating, and annotation made each effort sui generis. It was not possible to generalize findings, particularly from studies in which data on researchers, funding, and outputs were derived from a single institution. In addition, tracking the mobility of scientists across organizations and disciplines requires integrating data from different countries in order to support cross-institutional or international models. The STAR METRICS and U METRICS programs, and the related international activities, have been largely focused on building a data infrastructure that permits the development of such linkages and creates the possibility of a common data platform to promote generalizable and replicable research.
Attendees agreed that these activities would be greatly helped by initiatives such as CASRAI, the euroCRIS CERIF data model, and ORCID: if widely adopted, they offer considerable potential to reduce the substantial challenges of data cleaning, linking, and standardization. CASRAI provides a common dictionary for research administration data and exchangeable business objects, CERIF a common information metadata model for storage and exchange, and ORCID a persistent researcher identifier. Together with persistent identifiers for organizations and research works, these non-profit, community-driven organizations can provide a thin-layer infrastructure to link data between databases through standard data exchange processes, a necessity for the study of the science of science policy. In many cases, all that is required to support interoperation is the addition of identifier fields to existing databases and some API mapping work. Much can be accomplished to address this disconnect in a sustainable way through cooperative efforts to tie together these various parts of the problem.
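To make the idea of adding identifier fields concrete, here is a minimal sketch of how two otherwise unconnected datasets might be linked once each carries an ORCID iD. The record layouts and field names (`grant_id`, `orcid`, `title`) are hypothetical and are not drawn from STAR METRICS, CASRAI, CERIF, or any other real schema. The check-digit routine follows the ISO 7064 MOD 11-2 scheme that ORCID iDs use, and the sample iD is ORCID's well-known fictitious test researcher.

```python
from collections import defaultdict


def orcid_checksum_ok(orcid_id: str) -> bool:
    """Return True if the iD has 16 characters and a valid MOD 11-2 check digit."""
    chars = orcid_id.replace("-", "")
    if len(chars) != 16 or any(c not in "0123456789" for c in chars[:15]):
        return False
    total = 0
    for c in chars[:15]:
        total = (total + int(c)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected


def link_by_orcid(grants, outputs):
    """Map each grant to the titles of outputs that share its ORCID iD.

    Records whose iD fails the checksum are skipped rather than guessed at.
    """
    titles_by_id = defaultdict(list)
    for out in outputs:
        if orcid_checksum_ok(out["orcid"]):
            titles_by_id[out["orcid"]].append(out["title"])
    return {
        g["grant_id"]: titles_by_id.get(g["orcid"], [])
        for g in grants
        if orcid_checksum_ok(g["orcid"])
    }


# Hypothetical records from two separate databases (illustrative only).
sample_grants = [
    {"grant_id": "G-001", "orcid": "0000-0002-1825-0097"},
    {"grant_id": "G-002", "orcid": "0000-0002-1825-0098"},  # bad check digit
]
sample_outputs = [
    {"title": "Example article", "orcid": "0000-0002-1825-0097"},
]

linked = link_by_orcid(sample_grants, sample_outputs)
print(linked)
```

The point of the sketch is that the join itself is trivial once a shared, verifiable identifier exists in both sources. In practice the difficult part is the one attendees described: getting that identifier into the records in the first place.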
ORCID and similar community-driven infrastructure organizations can learn from the Science of Science Policy community about the needs of its researchers to ensure that the developing infrastructure is appropriately responsive. One example is how ORCID is leveraging CASRAI definitions and CERIF metadata. Another is our collaboration with DataCite on the ODIN project to support data citation (see our recent blog on the DataCite and RDA meetings). In turn, Science of Science Policy researchers can benefit from improvements in the long-term quality of data for analysis, and more broadly the research community will benefit from an evidence-based understanding of innovation.