Open source and research infrastructure

by Gabriel Hanganu on 23 February 2010, last updated 11 February 2013

Introduction

The UK features an impressive set of online systems and services aimed at helping researchers develop new ways of conducting research. However, researchers remain somewhat reluctant to use these technologies to their full potential. This document argues that the main issues are social and organisational rather than technological, and suggests that one way to improve the current situation is to take stock of some key lessons from open source development practice. Two companion documents describe in more detail the most important community and sustainability lessons relevant to the research infrastructure field.

Let’s start with an overview of the main findings from recent studies and interviews with researchers and service providers in the academic sector.

Research infrastructure background

In recent years, academic research has been challenged by expanding access to online resources and the increased potential for distributed collaboration among researchers. Members of these research communities, supported by a technical framework that allows access regardless of geographic location, can now share, federate and exploit the collective power of global facilities. In this context, ‘e-Research’ can be defined as research performed in virtual communities across the academic and industrial sectors using specially designed online facilities and services. This network of tools, resources and services that allow globally distributed researchers to collaborate on producing research outputs is known as research e-Infrastructure, or simply e-Infrastructure.

In the UK, e-Infrastructure consists of a number of loosely connected or separate projects, tools and services. Some are general in scope and address the needs of various researchers. Others are quite specific and serve small groups of specialists. In both cases, e-Infrastructure is designed to foster new ways of conducting research and to improve cross-subject collaboration. This is similar to the role assigned to e-Infrastructure at a European level. In the European Commission’s research strategy, for example, e-Infrastructure is seen as a key element of the new European Research Area, and an important tool for global scientific cooperation. e-Infrastructure is meant to provide an innovation space where the specific interests of scientists are met and cross-subject solutions for distributed research are provided.

Embedding human and technical infrastructures

The technical deployment of a complex e-Infrastructure is only one step towards fostering new ways of conducting research. An equally important step is encouraging researchers to use this technical framework to its full potential. Addressing this issue, a number of reports mention the need to embed the technical research framework in a ‘human infrastructure’. In this context, ‘human infrastructure’ refers to the social and organisational arrangements enabling technologies to be used effectively. The AVROSS report, for example, states that uptake of e-Infrastructure is as often hindered by human and organisational issues as it is by technical ones. Focusing on the UK, it recommends continued technical innovation in e-Research. At the same time, it suggests, the social framework that would allow research communities to better exploit these technical assets should be improved.

Other reports highlight similar issues. For example, the findings of the e-Research Community Engagement study mention the need to correct the current focus on building hi-tech systems and tools at the cost of grasping the real context of their use. The study suggests creating research intermediaries, or ‘boundary spanners’, who can act as facilitators between research domains, and between research and technical staff. This is similar to the role of community in open source software development, where the community is considered essential to the software being built.

The vital role community development can play in the success of e-Research is beginning to be acknowledged on a global scale. For instance, the US National Science Foundation funds a program on ‘Virtual Organizations as Sociotechnical Systems’. The European Commission had a strand on Virtual Research Communities as part of the FP7 programme. In the UK, however, building communities around products and processes related to e-Research is not yet seen as a priority. A recent e-Research Community Engagement report suggests that UK funding bodies should also issue community engagement calls in future, similar to those of the EU and US programmes.

Lessons from open source development

Building a ‘human infrastructure’, as these reports suggest, is not an easy job. In fact, it can be more challenging than building and making available the current array of technical infrastructure tools. Human nature is more complex than technology and affected by a host of factors rarely taken into account by software engineers. However, projects need not start from scratch in this process, as a fair amount of experience in building communities around technical systems already exists in the area of open source software development. Some of this experience applies to web-based collaboration in general; some is particularly relevant to building and maintaining software. The next two sections highlight how lessons learned from open development, particularly in the areas of community building and sustainability, could help to increase the uptake of e-Infrastructure by UK researchers.

Community

In open source, helping users and developers to engage with the project is crucial for building a thriving community. In fact, building a sustainable product largely depends on forging an environment in which users and developers share a culture of mutual support and a sense of following a common goal. The effects of seeding similar ideas in research infrastructure communities could be substantial. For instance, they could help sort out problems stemming from a lack of understanding between research stakeholders. They could also help service providers to identify the right level of guidance needed by researchers to engage with e-Infrastructure, without putting them off by excessive hand-holding.

To create such a welcoming and ‘want to come back’ environment, it is essential to encourage contribution from outside the project. To facilitate this, barriers that could prevent users’ access to the community should be removed. Successful open source projects make it clear that all are welcome to make their contribution in their own way. Those who do not have technical skills are also important, as they may become future users. They can also support the project by doing other tasks. These range from simply asking questions whose answers may be useful to newcomers, to testing and giving feedback on new releases, writing documentation and promoting the product to their friends. In the research infrastructure context, non-technical users can spread success stories about the use of e-Infrastructure, or help service providers refine their products, documentation and training in response to user feedback.

Another lesson that e-Research can learn from open source communities is the importance of adapting tools to fit community needs. Do the provided infrastructure tools help users ask their research questions and carry out research in an efficient and consistent manner? Are these tools easy to adapt to the specific needs of the researchers? For instance, open development uses version control systems to synchronise material accessed by distributed users. This could be useful to infrastructure service providers who manage multiple versions of Grid nodes, which also need to be kept in sync. Another useful open development practice is the use of formats and standards that allow external developers to easily engage with the software. It is also essential that the researchers themselves are able to easily build on e-Infrastructure tools, and that the output formats of these tools are diverse enough to allow them to choose one that best suits their needs. These issues are discussed in more detail in a separate document.

Sustainability

A key aspect of building open source communities is encouraging collaboration from the earliest days of developing the software. In line with the famous open source dictum ‘release early, release often’, developers are encouraged to allow free access to the code from day one, despite its tentative nature, and to invite everyone to contribute. This attitude is very important, as it attracts key early feedback and helps build confidence in the project. In the e-Infrastructure context, fostering collaboration from an early stage could help replace the culture of competing for funding, prompted by the Research Assessment Exercise, with a culture of jointly writing funding proposals. Promoting such an environment may encourage researchers to ask themselves what they can and cannot collaborate on. For instance, if they are not always able to share data, they could at least consider sharing such resources as hardware or computing power.

Another feature that contributes to the success of an open source project is planning for sustainability from the beginning. Once the early versions of the project are available and potential users start showing an interest, it is critical that new members able to help with software or administrative tasks are brought on board. Although it may seem unlikely that people will just start contributing online without having met any of the team members, strong evidence from open source development shows that transparent, friendly, well-explained and well-managed projects attract both users and developers. Drafting a sustainability plan and making it known to everybody also improves the chances of attracting external contributions. A plan that shows that the project team has considered various options and identified a set of potential revenue streams will inspire confidence in the project’s likelihood of success. This will also attract new members.

For e-Infrastructure service providers, this is an important point. A well-thought-out sustainability plan that considers various paths for further development, including the possibility that central funding will dry up, will make researchers more likely to invest time and effort in these services. By being less dependent on continuation funding, a service provider sends a strong signal to all stakeholders that even in the event of a financial downturn, the service will continue to exist and people’s contributions will remain safe.

Avoiding reliance on centrally provided developer support is also key to sustainability. Short-term developer support runs the risk of obscuring the need to build an environment attractive to new contributors. When developer support is withdrawn, the project does not have a mechanism in place for encouraging and absorbing external developer contributions. So, useful as it may be in the short term to help researchers produce project outputs, this support model provides little benefit in the long term, as it fails to create teams that can produce self-sustainable products. A more detailed analysis of these issues is provided in a separate document.

Conclusion

Embedding the existing technical research infrastructure in a ‘human infrastructure’ could help remove some of the main barriers that prevent researchers from adopting these tools and services. A number of open development community and sustainability lessons could be applied in this area. These include creating a culture of mutual support, encouraging internal and external contributions, adapting technology to community needs, building a sustainability plan, and not relying entirely on continuation funding or central development support.

OSS Watch provides unbiased advice and guidance on the use, development, and licensing of free software, open source software, and open source hardware.