Open Science Infrastructure

From HandWiki
Open Science infrastructure is one of the four pillars of Open Science in the UNESCO Recommendation on Open Science (2021)

Open Science Infrastructure (or open scholarly infrastructure) is an information infrastructure that supports the open sharing of scientific outputs such as publications, datasets, metadata or code. In November 2021, the UNESCO Recommendation on Open Science described it as "shared research infrastructures that are needed to support open science and serve the needs of different communities".[1]

Open science infrastructures are a form of scientific infrastructure (also called cyberinfrastructure, e-Science or e-infrastructure) that supports the production of open knowledge. Beyond the management of common resources, they are frequently structured as community-led initiatives with a set of collective norms and governance regulations, which also makes them a form of knowledge commons. The definition of open science infrastructures usually excludes privately-owned scientific infrastructures run by leading commercial publishers. Conversely, it may include actors not always characterized as scientific infrastructures that play a critical role in the ecosystem of open science, such as open access publishing platforms (open scholarly communication services).

Computing infrastructures and online services have played a key role in the production and diffusion of scientific knowledge since the 1960s. While these early scientific infrastructures were initially envisioned as community initiatives, they could not be openly used due to the lack of interconnectivity and the cost of network connection. The creation of the World Wide Web made it possible to share data and publications on a large scale. The sustainability of online research projects and services became a critical policy issue and entailed the development of major infrastructures in the 2000s.

The concept of open science infrastructure emerged after 2015 following a science policy debate over the expansion of commercial and privately-owned infrastructures in numerous research activities and the publication of the Principles for Open Scholarly Infrastructures. Since the 2010s, large ecosystems of interconnected scientific infrastructures have emerged in Europe, South and North America through the development of new open science projects and the conversion of legacy infrastructures to open science principles.

Definitions and terminology

Open science infrastructure is a form of knowledge infrastructure that makes it possible to create, publish and maintain open scientific outputs such as publications, data or software.

The UNESCO Recommendation on Open Science approved in November 2021 defines open science infrastructures as "shared research infrastructures that are needed to support open science and serve the needs of different communities".[2] The SPARC report on European Open Science Infrastructure includes the following activities within the range of open science infrastructures: "We define Open Access & Open Science Infrastructure as sets of services, protocols, standards and software contributing to the research lifecycle – from collaboration and experimentation through data collection and storage, data organization, data analysis and computation, authorship, submission, review and annotation, copyediting, publishing, archiving, citation, discovery and more"[3]

Infrastructure

The use of the term "infrastructure" is an explicit reference to the physical infrastructures and networks such as power grids, road networks or telecommunications that made it possible to run complex economic and social systems after the industrial revolution: "The term infrastructure has been used since the 1920s to refer collectively to the roads, power grids, telephone systems, bridges, rail lines, and similar public works that are required for an industrial economy to function (…) If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy".[4] The concept of infrastructure was notably extended in 1996 to forms of computer-mediated knowledge production by Susan Leigh Star and Karen Ruhleder, through an empirical observation of an early form of open science infrastructure, the Worm Community System.[5] This definition remained influential over the next two decades in science and technology studies[6] and has affected the policy debate over the building of scientific infrastructure since the early 2000s.[7]

Open science infrastructures have specific properties that distinguish them from other forms of open science projects or initiatives:

  • Open science infrastructures are not simply a technical product but embed a set of tools, institutions and social norms.[8][9] Consequently, infrastructures are not always visible, as they can be largely hidden under the routine of normal activities.[10][11] The resilience and tacitness of the infrastructures make it especially difficult to identify the real contributions and "labour cost" of open science work, as it remains "invisible in the university system".[12] This also makes it difficult to allocate funding effectively, as critical infrastructure may remain undetected by funding bodies.[13]
  • Open science infrastructures are durable and resilient. They are expected to run on a long-term basis, and multiple research programs rely on them.[14][8] To some extent, infrastructures are successful when they are forgotten and become an integral part of routine research activities: "Infrastructure at its best is invisible. We tend to only notice it when it fails."[15]
  • Open science infrastructures can be shared and used by different actors and communities. They must be sufficiently consistent to remain coordinated, and yet they have to welcome a diverse array of local uses: "an infrastructure occurs when the tension between local and global is resolved".[16] Prior agreement among all stakeholders on the scope and the governance of the infrastructure is a critical step.[17]

Openness and the commons

Open science infrastructures are open, which differentiates them from other scientific and knowledge infrastructures and, more specifically, from subscription-based commercial infrastructures. Openness is both a core value and a directing principle that affects the aims, the governance and the management of the infrastructure. Open science infrastructures face issues similar to those met by other open institutions such as open data repositories or large-scale collaborative projects such as Wikipedia: "When we study contemporary knowledge infrastructures we find values of openness often embedded there, but translating the values of openness into the design of infrastructures and the practices of infrastructuring is a complex and contingent process".[18]

The conceptual definition of open science infrastructures has been largely influenced by Elinor Ostrom's analysis of the commons and more specifically of the knowledge commons. In accordance with Ostrom, Cameron Neylon underlines that open infrastructures are characterized not only by the management of a pool of common resources but also by the elaboration of common governance and norms.[19] The economic theory of the commons makes it possible to expand beyond the limited scope of scholarly associations toward large-scale community-led initiatives: "Ostrom's work (…) provides a template (…) to make the transition from a local club to a community-wide infrastructure."[20] Open science infrastructures tend to favor a not-for-profit, publicly-funded model with strong involvement from scientific communities, which dissociates them from privately-owned closed infrastructures: "open infrastructures are often scholar-led and run by non-profit organisations, making them mission-driven instead of profit-driven."[21] This status aims to ensure the autonomy of the infrastructures and prevent their incorporation into commercial infrastructure.[22] It has wide-ranging implications for the way the organization is managed: "the differences between commercial services and non-profit services permeated almost every aspect of their responses to their environment".[23]

Open science infrastructures are not only a more specific subset of scientific infrastructures and cyberinfrastructures but may also include actors that would not fall into this definition. "Open access publication platforms" such as SciELO, OpenEdition or the Open Library of Humanities are considered an integral part of open science infrastructures in the UNESCO definition[24] and in several literature reviews[25] and policy reports,[26] whereas they were usually considered as separate entities in the policy debate on cyberinfrastructure and e-infrastructures.[27] In the 2010 report of the European Commission on e-infrastructure, scientific publishing platforms are "not e-Infrastructures but closely related to it".[28]

Open science infrastructures may also incorporate additional values and ethical principles. Samuel Moore has theorized a form of care-full scholarly commons that does not exist yet but would incorporate latent forms of open science infrastructure and communities: "In addition to sharing resources with other projects, commoning also requires commoners to adopt an outwardly-focused, generous attitude to other commons projects, redirecting their labour away from proprietary."[29] In 2018, Okune et al. introduced a similar concept of "inclusive knowledge infrastructures" that "deliberately allow for multiple forms of participation amongst a diverse set of actors (…) and seek to redress power relations within a given context."[30]

Principles for open science infrastructures

In 2015, the Principles for Open Scholarly Infrastructure laid out an influential prescriptive definition of open science infrastructures. Subsequent definitions and terminologies of open science infrastructures have been largely elaborated on this basis.[31][32][33] The text has also influenced the definition of open science infrastructure retained by UNESCO in November 2021.[34]

The Principles attempt to hybridize the framework of infrastructure studies with the analysis of the commons initiated by Elinor Ostrom. They develop a series of recommendations in three areas critical to the success of open infrastructures:

  • Governance: the governance of the infrastructure should be open and accountable to the scientific communities it aims to serve. Specific measures should ensure that the management of the organization is transparent and diverse.[15]
  • Sustainability: the core activities of the organization should be covered by recurring funds. Short-term subsidies should be limited to short-term projects. While the organization could charge for services, such charges should not extend to the data, which should remain "a community property".[15]
  • Insurance: the technical infrastructure and the output of the organization are open. This ensures that the infrastructure can be recreated if necessary (in the jargon of open source, it becomes "forkable").[15]

The text ends by mentioning several potential consequences of the principles. The authors advocate for a responsible centralization that embodies a different model from large commercial web platforms like Google and Facebook while still maintaining the important benefits of centralized infrastructures: "we will be able to build accountable and trusted organisations that manage this centralization responsibly".[15] Existing examples of large open infrastructure include ORCID, the Wikimedia Foundation or CERN.

A more critical reception has focused on the underlying political philosophy of the Principles.[35][36] While the scientific community is a key part of the governance of open science infrastructure, Samuel Moore underlines that it is never precisely defined, which raises potential issues of under-representation of minority groups:

[this] raises questions over who is the community that gets to govern and exclude, and what gives them the right to decide the conditions These questions are especially relevant for understandings of the commons that are all-encompassing or operate on a large scale, which tend to favour more powerful stakeholders, wealthy disciplines and countries in the Global North. Such commons treat subjects in a political vacuum rather than embedded in a particular situation and entangled in a number of different relationships and projects with asymmetrical power structures.[37]

History

Early developments (1950–1990)

The Sputnik launch triggered one of the first major debates on scientific infrastructure

Scientific projects have been among the earliest use cases for digital infrastructure. The theorization of scientific knowledge infrastructure even predates the development of computing technologies. The knowledge networks envisioned by Paul Otlet or Vannevar Bush already incorporated numerous features of online scientific infrastructures.[38]

After the Second World War, the United States faced a "periodical crisis": existing journals could not keep up with the rapidly increasing scientific output.[39] The issue became politically relevant after the successful launch of Sputnik: "The Sputnik crisis turned the librarians’ problem of bibliographic control into a national information crisis."[40] The emerging computing technologies were immediately considered as a potential solution to make a larger amount of scientific output readable and searchable. Access to foreign-language publications was also a key issue that was expected to be solved by machine translation: in the 1950s, a significant share of scientific publications was not available in English, especially those coming from the Soviet bloc.

Influential members of the National Science Foundation like Joshua Lederberg advocated for the creation of a "centralized information system", SCITEL, that would at first coexist with printed journals and gradually replace them altogether on account of its efficiency.[41] In the plan laid out by Lederberg to Eugene Garfield in November 1961, the deposit would index as many as 1,000,000 scientific articles per year. Beyond full-text searching, the infrastructure would also ensure the indexing of citations and other metadata, as well as the automated translation of foreign-language articles.[42]

Although it anticipated key features of online scientific platforms, the SCITEL plan was technically unrealistic at the time. The first working prototype of an online retrieval system, developed in 1963 by Doug Engelbart and Charles Bourne at the Stanford Research Institute, was heavily constrained by memory issues: no more than 10,000 words of a few documents could be indexed.[43]

The citation indexing process in MEDLARS, an early scientific infrastructure for publications in medicine

Instead of a general-purpose publishing platform, the early scientific computing infrastructures focused on specific research areas, such as MEDLINE for medicine, NASA/RECON for space engineering or OCLC WorldCat for library search: "most of the earliest online retrieval system provided access to a bibliographic database and the rest used a file containing another sort of information—encyclopedia articles, inventory data, or chemical compounds."[44] This early development of scientific computing affected a large variety of disciplines and communities, including the social sciences: "The 1960s and 1970s saw the establishment of over a dozen services and professional associations to coordinate quantitative data collection".[45] Yet these infrastructures were mostly invisible to researchers, as most searches were performed by professional librarians. Not only were the search systems complicated to use, but searches had to be performed very efficiently given the prohibitive cost of long-distance telecommunication.[46] To be technically feasible, these scientific infrastructures could not be open, and they became fundamentally hidden from their end users:

The designers of the first online systems had presumed that searching would be done by end users; that assumption undergirded system design. MEDLINE was intended to be used by medical researchers and clinicians, NASA/RECON was designed for aerospace engineers and scientists. For many reasons, however, most users through the seventies were librarians and trained intermediaries working on behalf of end users. In fact, some professional searchers worried that even allowing eager end users to get at the terminals was a bad idea.[47]

The development of digital infrastructure for scientific publication was largely undertaken by private companies. In 1963, Eugene Garfield created the Institute for Scientific Information, which aimed to transform the projects initially envisioned with Lederberg into a profitable business. The Science Citation Index relied on computational processing of citation data. It had a massive and lasting influence on the structuring of global scientific publication in the last decades of the 20th century, as its most important metric, the Journal Impact Factor, "ultimately came to provide the metric tool needed to structure a competitive market among journals".[48] Garfield also successfully launched Current Contents, a periodic compilation of scientific abstracts that acted as a simplified commercial version of the central deposit envisioned within SCITEL. Rather than being replaced by a centralized information system, leading scientific publishers were able to develop their own information infrastructure, which ultimately reinforced their business position. By the end of the 1960s, the Dutch publisher Elsevier and the German publisher Springer had started to computerize their internal data, as well as the management of journal reviews.[49]

Until the advent of the web, the landscape of scientific infrastructures remained fragmented.[50] Projects and communities relied on their own unconnected networks at a national or institutional level: "the Internet was nearly invisible in Europe because people there were pursuing a separate set of network protocols".[51] The birthplace of the World Wide Web, CERN, had its own version of the Internet, CERN-Net, and also supported its own protocol for e-mail exchange.[52] The European Space Agency used its own iteration of the RECON system also used by NASA engineers (ESRO/RECON).[53] These insulated scientific infrastructures could hardly be connected before the advent of the web. Communication between scientific infrastructures was challenging not only across space, but also across time. Whenever a communication protocol was no longer maintained, the data and knowledge it disseminated were likely to disappear as well: "the relationship between historical research and computing has been durably affected by aborted projects, data loss and unrecoverable formats".[54]

The Web Revolution (1990–1995)

The World Wide Web was originally framed as an open scientific infrastructure. The project was inspired by ENQUIRE, an information management software commissioned from Tim Berners-Lee by CERN for the specific needs of high energy physics. The structure of ENQUIRE was closer to an internal web of data: it connected "nodes" that "could refer to a person, a software module, etc. and that could be interlined with various relations such as made, include, describes and so forth".[55] While it "facilitated some random linkage between information", ENQUIRE was not able to "facilitate the collaboration that was desired for in the international high-energy physics research community".[56] Like any significant scientific computing infrastructure before the 1990s, the development of ENQUIRE was ultimately impeded by the lack of interoperability and the complexity of managing network communications: "although Enquire provided a way to link documents and databases, and hypertext provided a common format in which to display them, there was still the problem of getting different computers with different operating systems to communicate with each other".[57]

Sharing of data and data documentation was a major focus in the initial communication of the World Wide Web when the project was first unveiled in August 1991: "The WWW project was started to allow high energy physicists to share data, news, and documentation. We are very interested in spreading the web to other areas, and having gateway servers for other data".[58]

The web rapidly superseded pre-existing online infrastructures, even when they included more advanced computing features. From 1991 to 1994, users of the Worm Community System, a major biology database on worms, switched to the Web and Gopher. While the Web did not include many advanced functions for data retrieval and collaboration, it was easily accessible. Conversely, the Worm Community System could only be browsed on specific terminals shared across scientific institutions: "To take on board the custom-designed, powerful WCS (with its convenient interface) is to suffer inconvenience at the intersection of work habits, computer use, and lab resources (…) The World-Wide Web, on the other hand, can be accessed from a broad variety of terminals and connections, and Internet computer support is readily available at most academic institutions and through relatively inexpensive commercial services."[59]

The Web and similar protocols developed at the time had a similar impact on scientific publications. Early forms of open access publishing were not developed by large-scale institutional infrastructures but through small initiatives. Universal access, regardless of the operating system, made it possible to maintain and share community-driven electronic journals years before online commercial scientific publishing became viable:

In the late ‘80s and early ‘90s, a host of new journal titles launched on listservs and (later) the Web. Journals such as Postmodern Cultures, Surfaces, the Bryn Mawr Classical Review and the Public-Access Computer Systems Review were all managed by scholars and library workers rather than publishing professionals.[60]

The first open-access repositories were individual or community initiatives as well. In August 1991, Paul Ginsparg created the first version of the arXiv project at the Los Alamos National Laboratory in response to recurring storage issues in academic mailboxes caused by the increasing sharing of scientific articles.[61]

Building scientific infrastructures for the web (1995–2015)


The development of the World Wide Web rendered numerous pre-existing scientific infrastructures obsolete. It also lifted numerous restrictions and obstacles to online contribution and network management, which made it possible to attempt more ambitious projects. By the end of the 1990s, the creation of public scientific computing infrastructure became a major policy issue.[62] The first wave of web-based scientific projects in the 1990s and the early 2000s revealed critical issues of sustainability. As funding was allocated for a specific time period, critical databases, online tools or publishing platforms could hardly be maintained[63] and project managers were faced with a valley of death "between grant funding and ongoing operational funding".[64]

Several competing terms appeared to fill this need. In the United States, the term cyberinfrastructure was used in a scientific context by a US National Science Foundation (NSF) blue-ribbon committee in 2003: "The newer term cyberinfrastructure refers to infrastructure based upon distributed computer, information and communication technology. If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy."[65] E-infrastructure or e-science were used with a similar meaning in the United Kingdom and European countries.

Thanks to "sizable investments",[66] major national and international infrastructures were launched between the initial policy discussion in the early 2000s and the economic crisis of 2007–2008, such as the Open Science Grid, BioGRID, JISC, DARIAH or Project Bamboo.[67][68] Specialized free software for scientific publishing like Open Journal Systems became available after 2000. This development entailed a significant expansion of non-commercial open access journals by facilitating the creation and the administration of journal websites and the digital conversion of existing journals.[69] Among the non-commercial journals registered in the Directory of Open Access Journals, the number of journals created annually rose from 100 at the end of the 1990s to 800 around 2010, and has not changed significantly since then.[70]

By 2010, infrastructures were "no longer in infancy" and yet "they are also not yet fully mature".[66] While the development of the web solved a large range of technical issues regarding network management, building scientific infrastructure remained challenging. Governance, communication across all involved stakeholders, and strategic divergences were major factors of success or failure. One of the first major infrastructures for the humanities and the social sciences, Project Bamboo, was ultimately unable to achieve its ambitious aims: "From the early planning workshops to the Mellon Foundation’s rejection of the project’s final proposal attempt, Bamboo was dogged by its reluctance and/or inability to concretely define itself".[71] This lack of clarity was further aggravated by recurring communication missteps between the project initiators and the community it aimed to serve: "The community had spoken and made it clear that continuing to emphasize Service-oriented architecture would alienate the very members of the community Bamboo was intended to benefit most: the scholars themselves".[72] Budget cuts following the economic crisis of 2007–2008 underlined the fragility of ambitious infrastructure plans relying on significant recurring funds.[73]

Leading commercial ecosystems for scientific research

Leading commercial publishers were initially outpaced by the unexpected rise of the Web for academic publication: the executive board of Elsevier "had failed to grasp the significance of electronic publishing altogether, and therefore the deadly danger that it posed—the danger, namely, that scientists would be able to manage without the journal".[74] The persistence of high revenues from subscriptions and the consolidation of the sector made it possible to fund the conversion of pre-existing online services to the web as well as the digitization of past collections. By the 2010s, leading publishers had been "moving from a content-provision to a data analytics business"[75] and developed or acquired new key infrastructures for the management of scientific and pedagogic activities: "Elsevier has acquired and launched products that extend its influence and its ownership of the infrastructure to all stages of the academic knowledge production process".[76] As they expanded beyond publishing, vertically integrated privately-owned infrastructures have become deeply embedded in daily research activities.

The privatised control of scholarly infrastructures is especially noticeable in the context of ‘vertical integration’ that publishers such as Elsevier and SpringerNature are seeking by controlling all aspects of the research lifecycle, from submission to publication and beyond. For example, this vertical integration is represented in a number of Elsevier’s business acquisitions, such as Mendeley (a reference manager), SSRN (a pre-print repository) and Bepress (a provider of repository and publishing software for universities).[77]

Toward open science infrastructures (2015–…)

The consolidation and expansion of commercial scientific infrastructure prompted renewed calls to secure "community-controlled infrastructure".[78] The acquisition of the open repositories Digital Commons and SSRN by Elsevier highlighted the lack of reliability of critical scientific infrastructure for open science.[79][80][81] The SPARC report on European Infrastructures underlines that "a number of important infrastructures at risk and as a consequence, the products and services that comprise open infrastructure are increasingly being tempted by buyout offers from large commercial enterprises. This threat affects both not-for-profit open infrastructure as well as closed, and is evidenced by the buyout in recent years of commonly relied on tools and platforms such as SSRN, bepress, Mendeley, and Github."[82]

In contrast with the consolidation of privately-owned infrastructure, the open science movement "has tended to overlook the importance of social structures and systemic constraints in the design of new forms of knowledge infrastructures."[83] It remained mostly focused on the content of scientific research, with little integration of technical tools and few large community initiatives: "common pool of resources is not governed or managed by the current scholarly commons initiative. There is no dedicated hard infrastructure and though there may be a nascent community, there is no formal membership."[84]

More precise concepts were needed to embed ethical principles of openness, community service and autonomous governance in the building of infrastructure and ensure the transformation of small localized scholarly networks into large, "community-wide" structures.[85] In 2013, Cameron Neylon underlined that the lack of common infrastructure was one of the main weaknesses of the open science ecosystem: "in a world where it can be cheaper to re-do an analysis than to store the data, we need to consider seriously the social, physical, and material infrastructure that might support the sharing of the material outputs of research".[86] Two years later, Neylon, Geoffrey Bilder and Jennifer Lin defined a series of Principles for Open Scholarly Infrastructure[15] that reacted primarily to the discrepancy between the increasing openness of scientific publications or datasets and the closed nature of the infrastructures that control their circulation.

Over the past decade, we have made real progress to further ensure the availability of data that supports research claims. This work is far from complete. We believe that data about the research process itself deserves exactly the same level of respect and care. The scholarly community does not own or control most of this information. For example, we could have built or taken on the infrastructure to collect bibliographic data and citations but that task was left to private enterprise.[15]

Since 2015, these principles have become the most influential definition of open science infrastructures. They have been endorsed by leading infrastructures such as Crossref,[87] OpenCitations[88] or Data Dryad[89] and have become a common basis for the institutional evaluation of existing open infrastructures.[90] The main focus of the Principles is to build "trustworthy institutions" with significant commitments in terms of governance, financial sustainability and technical efficiency so that they can be durably relied on by scientific communities.[91]

By 2021, public services and infrastructures for research had largely endorsed open science as an integral part of their activity and identity: "open science is the dominant discourse to which new online services for research refer."[92] According to the 2021 Roadmap of the European Strategy Forum on Research Infrastructures (ESFRI), major legacy infrastructures in Europe have embraced open science principles: "Most of the Research Infrastructures on the ESFRI Roadmap are at the forefront of Open Science movement and make important contributions to the digital transformation by transforming the whole research process according to the Open Science paradigm."[93] Examples of extensive data sharing programs include the European Social Survey (in social science), ECRIN ERIC (for clinical data) or the Cherenkov Telescope Array (in astronomy).[93]

In agreement with the original intent of the Principles, open science infrastructures are "seen as an antidote to the increased market concentration observed in the scholarly communication space."[94] In November 2021, the UNESCO Recommendation on Open Science acknowledged open science infrastructure as one of the four pillars of open science, along with open scientific knowledge, open engagement of societal actors and open dialogue with other knowledge systems, and called for sustained investment and funding: "open science infrastructures are often the result of community-building efforts, which are crucial for their longterm sustainability and therefore should be not-for-profit and guarantee permanent and unrestricted access to all public to the largest extent possible."[95]

The development of open scientific infrastructure has become a debated topic regarding the future of online scientific research. In January 2021, a collective of researchers called for a Plan I, or Plan Infrastructure, in reaction to perceived shortcomings of Plan S, the international open science initiative of cOAlition S.[96] In contrast with the focus of Plan S on scientific publication, Plan I aims to integrate all research outputs on large interoperable infrastructures: "research and scholarship are crucially dependent on an information infrastructure that treats all scholarly output, text, data and code, equally and that is based on open standards and open markets."[97]

Organization of open infrastructures

Most of the landscape reports on Open Infrastructure have been undertaken in Europe and, to a lesser extent, in Latin America. For Europe, the main sources include the SPARC report from 2020,[98] the OPERAS report on social science and humanities infrastructure[99] and the 2019 report by Katherine Skinner (which also extends to a few North American infrastructures). International studies include the European Commission's 2010 report on The Role of E-Infrastructure, which mostly received input from Europe, South America and North America.[100]

These reports underline that important open science infrastructures may already exist and yet remain invisible to funders and scientific policies: "alternative practices and projects exist inside and outside Europe, but these projects are almost invisible to the eyes of the public authorities".[101]

Type and roles

Open Access repositories are the most frequent form of Open Science Infrastructure,[102] with 5,791 repositories in existence in December 2021 according to OpenDOAR.[103]

Yet, there is a significant diversification of the roles and the activities of open science infrastructures, at least among the largest ones. In the survey of European infrastructures conducted by SPARC Europe, 95% of the respondents mention that they provide services in at least three different stages of research production out of six (Creation, Evaluation, Publishing, Hosting, Discovering and Archiving).[104] Aggregation, hosting and indexing are especially central activities, common to most Open Science Infrastructures regardless of their focus.

Specialization does happen at a higher level. A network analysis identifies "two main clusters of activities":

  • Publishing-focused infrastructures which are associated with the "publishing and hosting traditional text formats".[104] Among them, "paper submission (41 out of 70) and review (30) were the most commonly reported activities".[105]
  • Creation-focused infrastructures which deal primarily with the "processing and storing research outputs, particularly data". These actors provide specific services in the field of "data gathering (47 out of 71), and data analysis (40)".[105] Besides, "computation and machine learning (18) and Experimentation (15) were roughly half as common".[105]

Standards and technologies

Standardization is a major function of open science infrastructures, as they aim to ensure that the content they share and support is distributed consistently and is easy to reuse.

Maintaining open standards is one of the main challenges identified by leading European open infrastructures, as it implies choosing among competing standards in some cases, as well as ensuring that the standards are correctly updated and accessible through APIs or other endpoints.[106] Two thirds of the respondents have undertaken an evaluation of their technological environment during the past year, to ensure that key components have not become obsolete.[107] As a consequence of these sustained efforts, most open infrastructures comply with the newly established standards of open science, such as FAIR data or Plan S.[107]

Open science infrastructures preferably integrate standards from other open science infrastructures. Among European infrastructures: "The most commonly cited systems – and thus essential infrastructure for many – are ORCID, Crossref, DOAJ, BASE, OpenAIRE, Altmetric, and Datacite, most of which are not-for-profit".[108] Google Scholar is the first commercial service mentioned, while Scopus, the leading proprietary academic search engine developed by Elsevier, is one of the least cited leading services.[109] Open science infrastructures are thus part of an emerging "truly interoperable Open Science commons" that holds the promise of "researcher-centric, low-cost, innovative, and interoperable tools for research, superior to the present, largely closed system."[110]

Infrastructures are frequently dependent on choices made by external stakeholders, especially scientific publishers: they "do not themselves decide on the openness of content since they are dependent on the policies of content providers".[111] This affects not only the content but also the "user data policies [that] are set by publishers which limits what can be made available".[112]

Open Science Infrastructures have strong ties with the open source movement. 82% of the European infrastructures surveyed by SPARC claim to have built their infrastructure partially on open source software and 53% have their entire technological infrastructure in open source.[107]

Governance

Governance has been self-identified as a potential weakness by the European infrastructures surveyed by SPARC.[113] Fewer than half of the respondents consider that they are at a "mature" stage in this regard, and "good governance" is cited as the main challenge.[106] Interaction between the communities they aim to support and the other stakeholders and funders is especially complicated: "One specific challenge identified was the tension between serving the needs of the community of users versus prioritising the needs of clients that provide financial support to the OSI".[106]

The tension between centralization and diversity largely characterizes Open Science Infrastructures. While historically defined as a "centralized [Open Access] project", Redalyc aims to become a "community-based sustainable infrastructure in Latin America" (Berrecil). The leading European open infrastructures have reported "challenges around ensuring sufficient (and sufficiently diverse) representation" as well as around the involvement of some professional communities such as researchers and librarians.[106]

Audience

Open Science Infrastructures "target and serve a wide range of stakeholders".[114] Researchers remain the primary target, but libraries, teachers and learners are among the expected audience of more than half of the infrastructures surveyed by SPARC Europe.

A majority of European infrastructures "operate at a global scale", with English being the primary language of 82% of the respondents.[115] These infrastructures are also frequently multilingual and integrate a specific national focus: they "provide access to a range of language content of local and international significance".[115]

Distribution of disciplines among the infrastructures surveyed by the SPARC report Scoping the Open Science Infrastructure Landscape in Europe

Open Science Infrastructures benefit diverse disciplines and scientific communities. In 2020, 72% of the European infrastructures surveyed by SPARC Europe claimed to support all disciplines. The social sciences and the humanities are the most frequently mentioned disciplines, which is partly attributed to the fact that the survey was "distributed widely by the OPERAS network".[116] In 2010, the infrastructures supporting the social sciences and the humanities were much less prevalent and most of the use cases came from "biosciences, High Energy Physics and other fields of physics, earth and environmental sciences, computer science, astronomy and astrophysics".[117]

Economics

Many Open Science Infrastructures run "at a relatively low cost", as small infrastructures are an important part of the open science ecosystem.[118] In 2020, 21 out of 53 surveyed European infrastructures "report spending less than €50,000".[118] Consequently, more than 75% of surveyed European infrastructures are run by small teams of five FTEs or fewer.[119] The size of an infrastructure and the extent of its funding are not always proportional to the critical service it offers: "some of the most heavily used services make ends meet with a tiny core team of two to five people."[120] Volunteer contributions are significant as well, which is both "a strength and weakness to an OSI’s sustainability".[118] The landscape of open science infrastructures is therefore rather close to the ideal of a "decentralised network of small projects" envisioned by theorists of the scholarly commons.[121] A very large majority of open science infrastructures are non-commercial,[122] and collaborations with or financial support from the private sector remain very limited.[123]

Overall, European infrastructures were financially sustainable in 2020,[124] which contrasts with the situation ten years prior. In 2010, European infrastructures had much less visibility: they usually lacked "a long-term perspective" and struggled "with securing the funding for more than 5 years".[125] In 2020, European infrastructures frequently rely on grants from national funds and from the European Commission.[123] Without these grants, most of these actors "could only remain viable for less than a year".[122] Yet, one quarter of surveyed European infrastructures were not supported by any grants or subsidies and relied either on alternative sources of income or on voluntary contributions.[118] As they can be "difficult to define adequately", open science infrastructures can be overlooked by funding bodies, which "contributes to the challenge of securing funding".[126]

References

  1. UNESCO Recommendation on Open Science, 2021, CL/4363
  2. UNESCO Recommendation on Open Science, 2021, CL/4363
  3. Ficarra et al. 2020, p. 7
  4. Atkins 2003, p. 5
  5. Star & Ruhleder 1996
  6. Karasti et al. I 2016, p. 4
  7. Atkins 2003, p. 5
  8. Fecher et al. 2021, p. 500
  9. Edwards et al. 2006, p. 6
  10. Moore 2019, p. 121: "infrastructures are not easily divisible, recognisable or compartmentalised"
  11. Okune et al. 2018, p. 3
  12. Moore 2019, p. 143
  13. Neylon 2018, p. 1
  14. Atkins 2003, p. 5
  15. Neylon et al. 2015
  16. Star & Ruhleder 1996
  17. Bos et al. 2007, p. 667
  18. Karasti et al. IV 2016, p. 5
  19. Neylon 2018, p. 7
  20. Neylon 2018, p. 7-8
  21. Kraker 2021, p. 2
  22. Future of scholarly publishing 2019
  23. Fecher et al. 2021, p. 505
  24. UNESCO Recommendation on Open Science, 2021, CL/4363
  25. Lewis 2020, p. 6
  26. Ficarra et al. 2020, p. 8
  27. Dacos 2013
  28. Role of e-Infrastructure 2010, p. 222
  29. Moore 2019, p. 183
  30. Okune et al. 2018, p. 3
  31. Ross-Hellauer et al. 2020, p. 13
  32. Ficarra et al. 2020, p. 7
  33. SPARC 2020
  34. Open Science MOOC Response to UNESCO Draft Open Science Recommendations, December 30, 2020
  35. Moore 2019
  36. Okune et al. 2018
  37. Moore 2019, p. 173
  38. Borgman 2007, p. 40
  39. Wouters 1999, p. 61
  40. Wouters 1999, p. 62
  41. Wouters 1999, p. 60
  42. Wouters 1999, p. 64
  43. Bourne & Hahn 2003, p. 16
  44. Bourne & Hahn 2003, p. 12
  45. Shankar et al. 2016, p. 63
  46. Regazzi 2015, p. 128
  47. Bourne & Hahn 2003, p. 397
  48. Future of scholarly publishing 2019, p. 15
  49. Andriesse 2008, p. 189
  50. Campbell-Kelly & Garcia-Swartz 2013
  51. Berners-Lee & Fischetti 2008, p. 17
  52. Berners-Lee & Fischetti 2008, p. 18
  53. Bourne & Hahn 2003, p. 304
  54. Dacos 2013
  55. Hogan 2014, p. 20
  56. Bygrave & Bing 2009, p. 30
  57. Berners-Lee & Fischetti 2008, p. 17
  58. Tim Berners-Lee, “Qualifiers on Hypertext Links”, mail sent on August, 6 1991 to the alt.hypertext
  59. Star & Ruhleder 1996, p. 131
  60. Moore 2020, p. 7
  61. Feder, Toni (8 November 2021). "Joanne Cohn and the email list that led to arXiv". Physics Today 2021 (4): 1108a. doi:10.1063/PT.6.4.20211108a. https://physicstoday.scitation.org/do/10.1063/PT.6.4.20211108a/full/. 
  62. Borgman 2007, p. 21.
  63. Dacos 2013.
  64. Skinner 2019, p. 6.
  65. Atkins 2003, p. 5
  66. Eccles et al. 2009
  67. Dacos 2013
  68. Role of e-Infrastructure 2010
  69. OA Diamond Study 2021, p. 93
  70. OA Diamond Study 2021, p. 30
  71. Dombrowski 2014, p. 334
  72. Dombrowski 2014, p. 329
  73. Dombrowski 2014, p. 331
  74. Andriesse 2008, p. 257-258
  75. Aspesi et al. 2019, p. 5
  76. Posada & Chen 2018, p. 6
  77. Moore 2019, p. 156
  78. Joseph 2018, p. 1
  79. Boston 2021
  80. Joseph 2018
  81. Brembs et al. 2021
  82. Ficarra et al. 2020, p. 7
  83. Okune et al. 2018, p. 13
  84. Bosman et al. 2018, p. 19
  85. Neylon 2018, p. 7
  86. Neylon 2013
  87. Crossref’s Board votes to adopt the Principles of Open Scholarly Infrastructure
  88. OpenCitations’ compliance with the Principles of Open Scholarly Infrastructure
  89. Dryad’s Commitment to the Principles of Open Scholarly Infrastructure
  90. Ficarra et al. 2020, p. 21
  91. Neylon 2018, p. 7
  92. Fecher et al. 2021, p. 505
  93. ESFRI Roadmap 2021, p. 159
  94. Kraker 2021, p. 2
  95. UNESCO Recommendation on Open Science, 2021, CL/4363
  96. Brembs et al. 2021
  97. Brembs et al. 2021, p. 4
  98. Ficarra et al. 2020
  99. Future of Scholarly Communication 2021
  100. Role of e-Infrastructure 2010
  101. Mounier 2018, p. 305
  102. Operas Landscape Study 2017, p. 15
  103. OpenDOAR Statistics
  104. Ficarra et al. 2020, p. 13
  105. Ficarra et al. 2020, p. 15
  106. Ficarra et al. 2020, p. 23
  107. Ficarra et al. 2020, p. 29
  108. Ficarra et al. 2020, p. 50
  109. Ficarra et al. 2020, p. 31
  110. Ross-Hellauer et al. 2020, p. 13
  111. Ficarra et al. 2020, p. 27
  112. Ficarra et al. 2020, p. 24
  113. Ficarra et al. 2020, p. 22
  114. Ficarra et al. 2020, p. 18
  115. Ficarra et al. 2020, p. 20
  116. Ficarra et al. 2020, p. 19
  117. Role of e-Infrastructure 2010, p. 106
  118. Ficarra et al. 2020, p. 35
  119. Ficarra et al. 2020, p. 41
  120. Kraker 2021, p. 3
  121. Moore 2019, p. 176
  122. Ficarra et al. 2020, p. 48
  123. Ficarra et al. 2020, p. 45
  124. Ficarra et al. 2020, p. 51
  125. Role of e-Infrastructure 2010, p. 103
  126. Neylon 2018, p. 1

Bibliography

Definitions

Report

Book & thesis

  • Wouters, P. F. (1999). The citation culture (Thesis). Retrieved 2018-09-09.
  • Bourne, Charles P.; Hahn, Trudi Bellardo (2003-08-01). A History of Online Information Services, 1963-1976. MIT Press. ISBN 978-0-262-26175-3. 
  • Borgman, Christine L. (2007-10-12). Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, MA, USA: MIT Press. ISBN 978-0-262-02619-2. 
  • Berners-Lee, Tim; Fischetti, Mark (2008-06-26). Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor. Paw Prints. ISBN 978-1-4395-0036-1. 
  • Andriesse, Cornelis D. (2008-09-15). Dutch Messengers: A History of Science Publishing, 1930–1980. Leiden ; Boston: Brill. ISBN 978-90-04-17084-1. 
  • Bygrave, Lee A.; Bing, Jon (2009-01-22). Internet Governance: Infrastructure and Institutions. OUP Oxford. ISBN 978-0-19-956113-1. 
  • Hogan, A. (2014-04-09). Reasoning Techniques for the Web of Data. IOS Press. ISBN 978-1-61499-383-4. 
  • Regazzi, John J. (2015-02-12). Scholarly Communications: A History from Content as King to Content as Kingmaker. Rowman & Littlefield. ISBN 978-0-8108-9088-6. 
  • Le Deuff, Olivier (2018-04-16). Digital Humanities: History and Development. John Wiley & Sons. ISBN 978-1-119-30817-1. 
  • Moore, Samuel (2019-05-02). Common Struggles: Policy-based vs. scholar-led approaches to open access in the humanities (Thesis). Retrieved 2021-12-11.
  • Montgomery, Lucy; Hartley, John; Neylon, Cameron; Gillies, Malcolm; Gray, Eve (2021-08-03). Open Knowledge Institutions: Reinventing Universities. MIT Press. ISBN 978-0-262-36516-1. 

Article

Conference

Other resources