Project Catalogues and Project Descriptors using DOAP

by Ross Gardler on 2 January 2007, last updated

Introduction

Open source projects need to be marketed if they want to attract users and contributors. Most projects have a website describing their work, but how do they attract people to that website in the first place? Many projects rely on search engines to be found, but there is also a growing number of project catalogues in which a project should consider being listed. Maintaining entries in a large number of catalogues can be a time-consuming process. This document introduces the Description of a Project (DOAP) format for managing project details in emerging catalogues.

An Open Project Descriptor

Listing your project in online catalogues such as Freecode is an onerous task that can take up a considerable amount of time. Furthermore, each new release or significant change in your project setup requires someone to visit each catalogue's site and update the entry there, which is again time-consuming. In many cases, this means projects are either not listed or their entries become stale, giving a misleading impression of inactivity in the project.

What is needed is a way for projects to update all catalogues simultaneously. Fortunately, the Semantic Web (data provided in a form that is easily processed by machines) provides just such a mechanism. A Description of a Project (DOAP) is a machine-readable document that is used to share information about a project. A DOAP descriptor can be used for:

  • easy importing of projects into directories
  • automated updating of directories
  • data exchange between directories
  • automatic configuration for resources such as mailing lists, shared repositories and issue trackers
  • assisting package maintainers who bundle resources for distributors

DOAP in use

DOAP facilitates the building of project registries by allowing ‘aggregator’ sites to pull in project records from many different sources and combine them into a single database. To be incorporated in such an aggregator, a project needs to create a DOAP file (see ‘Creating and Managing DOAP Files’ below) and publish it somewhere that is accessible via an HTTP or HTTPS request. This usually means that it is published on the project’s website. The project can then either notify the aggregator directly of the location of this file, or register it at one of the sites from which the aggregator harvests files, i.e. examines them for updates. Once the aggregator website knows about a particular DOAP file, it can monitor the file for changes and update its records whenever the DOAP file is updated.

Since, for optimal effect, the DOAP file is stored locally to the project, there is no need for project maintainers to visit the aggregator site in order to update their records. They simply update their local DOAP file and wait for the aggregator site to harvest the new information. Of course, since the DOAP file is made accessible via an http or https request, any other DOAP-enabled catalogue can also access the information as long as they know where to look.

As a project is listed in more DOAP-enabled catalogues, the benefits of remote updating start to multiply. All the project maintainer needs to do is update their file, and all catalogues should automatically update themselves. How this updating occurs depends on how the catalogue is implemented: some catalogues use an Atom feed, while others periodically harvest files.
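
The periodic harvesting approach can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of any real catalogue: the function names `harvest` and `http_fetch` are hypothetical, and change detection by content hash is an assumption (a real aggregator might instead rely on HTTP caching headers or a feed).

```python
import hashlib
from typing import Callable, Dict, List
from urllib.request import urlopen


def http_fetch(url: str) -> bytes:
    """Fetch a DOAP file over HTTP or HTTPS (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def harvest(urls: List[str],
            fetch: Callable[[str], bytes],
            seen: Dict[str, str]) -> List[str]:
    """Return the URLs whose DOAP content changed since the last run.

    `seen` maps each URL to the SHA-256 digest recorded by the previous
    harvest; it is updated in place so it can be reused on the next run.
    """
    changed = []
    for url in urls:
        digest = hashlib.sha256(fetch(url)).hexdigest()
        if seen.get(url) != digest:
            seen[url] = digest
            changed.append(url)
    return changed
```

A catalogue would call `harvest(urls, http_fetch, seen)` on a schedule and re-import only the files it returns. Passing the fetch function as a parameter keeps the change detection testable without network access.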

Discovery of new DOAP files can also be supported by submitting a link to your DOAP file to one of the new wiki pages that list DOAP files. Again, a catalogue may use this list of links to discover previously unknown DOAP files. One example of a wiki used for this purpose is the Semantic Web Bulletin Board.

Specialist catalogues and DOAP extensions

DOAP catalogues can be vast, as they do not generally restrict which projects are listed. Such a large and broad catalogue is not always what is required by a user. A user may want to view a list of projects that are directly related to a particular interest. For example, a member of project X may be interested to see a catalogue containing all known projects that are dependent on project X. Alternatively, they may wish to see a catalogue of all projects that project X depends upon.

An example of such a specialist catalogue can be found at The Apache Software Foundation (ASF). The ASF has many active projects, and until it adopted DOAP there was no central point where interested parties could get a summary of all those projects. With the adoption of DOAP, the ASF is able to produce projects.apache.org, a website that provides many different ways of viewing the available projects within the ASF. For example, it is possible to retrieve a list of all XML-related projects, or all projects written in a particular programming language.

An interesting aspect of the ASF catalogue is that it has extended the DOAP description language to include additional information that is only relevant to the Apache Software Foundation. DOAP is specifically designed to allow this kind of extension without affecting other catalogues using the same DOAP file. For example, the ASF has added tags to capture information about the Project Management Committee for each project. This information is displayed in the projects.apache.org catalogue, but may be ignored by other catalogues.
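
The way a general catalogue can skip extension data may be easier to see in code. The sketch below uses Python's standard XML parser rather than a full RDF library, and the `ext` namespace and its `committee` element are invented for illustration; they are not the ASF's actual extension vocabulary.

```python
import xml.etree.ElementTree as ET

DOAP_NS = "http://usefulinc.com/ns/doap#"

# Hypothetical DOAP file with one element from an invented extension namespace.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#"
         xmlns:ext="http://example.org/ns/ext#">
    <Project>
        <name>Foo Bar OSS</name>
        <programming-language>Java</programming-language>
        <ext:committee rdf:resource="http://example.org/committee/foo"/>
    </Project>
</rdf:RDF>"""


def core_properties(doap_xml: str) -> dict:
    """Collect text-valued DOAP properties, skipping other namespaces."""
    root = ET.fromstring(doap_xml)
    project = root.find(f"{{{DOAP_NS}}}Project")
    if project is None:
        raise ValueError("no DOAP Project element found")
    props = {}
    for child in project:
        # ElementTree spells qualified names as "{namespace}local-name".
        in_doap = child.tag.startswith(f"{{{DOAP_NS}}}")
        if in_doap and child.text and child.text.strip():
            props[child.tag.split("}", 1)[1]] = child.text.strip()
    return props
```

Calling `core_properties(SAMPLE)` yields only the `name` and `programming-language` entries; the extension element is silently ignored, which is the behaviour that lets one DOAP file serve both specialist and general catalogues.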

Creating and Managing DOAP Files

Having established that DOAP is a good thing for a project, how does a project manager go about creating and maintaining a DOAP file?

DOAP files are text files, so you can use your favourite text editor to work with them. Alternatively, you can use your favourite XML editor. There are also a few web-based tools available for you to use, for example DOAP A Matic. However, these do not provide a means for editing an existing DOAP file (they simply help you create the initial file).
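
Because a DOAP file is just XML, it can also be generated by a short script. The following is a minimal sketch using Python's standard library; the function name `build_doap` and the choice of four properties are assumptions made for illustration, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP_NS = "http://usefulinc.com/ns/doap#"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_doap(name: str, shortdesc: str, homepage: str, language: str) -> str:
    """Return a minimal RDF/XML DOAP document as a string."""
    # Serialise with the conventional prefixes: rdf: and a default DOAP namespace.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("", DOAP_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    project = ET.SubElement(root, f"{{{DOAP_NS}}}Project")
    ET.SubElement(project, f"{{{DOAP_NS}}}name").text = name

    desc = ET.SubElement(project, f"{{{DOAP_NS}}}shortdesc")
    desc.set(f"{{{XML_NS}}}lang", "en")
    desc.text = shortdesc

    # Links are expressed as rdf:resource attributes, not element text.
    ET.SubElement(project, f"{{{DOAP_NS}}}homepage",
                  {f"{{{RDF_NS}}}resource": homepage})
    ET.SubElement(project, f"{{{DOAP_NS}}}programming-language").text = language

    ET.indent(root)  # pretty-print; available from Python 3.9
    return ET.tostring(root, encoding="unicode")
```

For example, `build_doap("Foo Bar OSS", "Platform for making Foo into Bar", "http://www.foobar.org", "Java")` produces a small file that can then be extended by hand.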

If you are interested in creating your own web-based DOAP generation tool that supports any customisations, then a good starting point would be the Apache Labs DOAPizer. This is a JavaScript application that illustrates the creation of a DOAP file with ASF extensions.

Projects creating software artifacts may find that their chosen build management tool, such as Apache Maven, provides a facility for managing the project’s DOAP file automatically as part of the normal project management cycle.

Some project hosting sites, such as SourceForge, also provide APIs for exposing information about projects using DOAP. In this case, the information you provide as part of your project description is used to populate the DOAP file that the site generates.

What does a DOAP file look like?

DOAP is a Resource Description Framework (RDF) schema; that is, it is a vocabulary for describing a particular topic, in this case a project. It is commonly written as an XML document (RDF/XML).

Below is a simple example of a basic DOAP file describing the fictional Foo Bar OSS project.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns="http://usefulinc.com/ns/doap#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
    <Project>
        <name>Foo Bar OSS</name> 
        <shortdesc xml:lang="en">Platform for making Foo into Bar</shortdesc> 
        <description xml:lang="en">There are too many Foos in the world. This project provides a suite of tools for automatically converting various types of Foo into just about any kind of Bar.</description>
        <homepage rdf:resource="http://www.foobar.org"/>
        <wiki rdf:resource="http://wiki.foobar.org/"/>
        <wiki rdf:resource="http://wiki.foobar.org/RecentChanges?action=rss_rc&amp;ddiffs=1&amp;unique=1" dc:format="application/rdf+xml"/>
        <repository> 
             <SVNRepository>
                 <location rdf:resource="http://svn.apache.org/repos/asf/labs/Foo/"/>
                 <browse rdf:resource="http://svn.apache.org/viewvc/labs/Foo/"/> 
             </SVNRepository> 
        </repository> 
        <bug-database rdf:resource="http://issues.foobar.org" /> 
        <mailing-list rdf:resource="http://www.foobar.org/lists" /> 
        <license rdf:resource="http://usefulinc.com/doap/licenses/asl20"/> 
        <download-page rdf:resource="http://www.foobar.org/downloads" /> 
        <programming-language>Java</programming-language> 
        <release> 
            <Version> 
                <name>FooBar-Milestone2</name> 
                <created>2006-01-01</created> 
                <revision>0.2</revision> 
            </Version> 
        </release> 
        <created>2007-01-16</created>
        <maintainer>
            <foaf:Person rdf:about="http://www.foobar.org/about/FooMan.html">
                <foaf:name>Foo Man</foaf:name> 
            </foaf:Person> 
        </maintainer> 
        <category rdf:resource="http://simal.oss-watch.ac.uk/category/socialNetworking" rdfs:label="Social Networking"/> 
        <category rdf:resource="http://projects.apache.org/category/database" rdfs:label="Database"/> 
    </Project> 
</rdf:RDF> 

There are many more elements that can be used in DOAP documents. A full list of them is not provided here, but the complete schema can be seen at the DOAP project website (see Further reading).
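
To give a feel for how a catalogue might read such a file, here is a minimal Python sketch that pulls a handful of properties out of a DOAP document. It treats the file as plain XML rather than using a full RDF parser, which is a simplifying assumption; the function name `read_project` and the shortened sample are illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "doap": "http://usefulinc.com/ns/doap#",
}
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# Shortened version of the Foo Bar OSS example.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#">
    <Project>
        <name>Foo Bar OSS</name>
        <homepage rdf:resource="http://www.foobar.org"/>
        <programming-language>Java</programming-language>
        <release>
            <Version>
                <name>FooBar-Milestone2</name>
                <revision>0.2</revision>
            </Version>
        </release>
    </Project>
</rdf:RDF>"""


def read_project(doap_xml: str) -> dict:
    """Extract a few commonly used properties from a DOAP document."""
    project = ET.fromstring(doap_xml).find("doap:Project", NS)
    if project is None:
        raise ValueError("no DOAP Project element found")

    def resource(tag):
        # Link-valued properties carry their target in rdf:resource.
        element = project.find(f"doap:{tag}", NS)
        return element.get(RDF_RESOURCE) if element is not None else None

    return {
        # find/findtext match direct children only, so the release's own
        # <name> is not mistaken for the project name.
        "name": project.findtext("doap:name", default="", namespaces=NS),
        "homepage": resource("homepage"),
        "language": project.findtext("doap:programming-language",
                                     default="", namespaces=NS),
        "latest_revision": project.findtext(
            "doap:release/doap:Version/doap:revision",
            default="", namespaces=NS),
    }
```

A production catalogue would more likely use an RDF library, since the same RDF data can be serialised as XML in several equivalent ways.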

One advantage of using RDF is that it is easy to extend the DOAP vocabulary. Such an extension can be seen in the above example. Note that the maintainer is described using another RDF format called Friend of a Friend (FOAF). This means that the DOAP vocabulary need not concern itself with how to describe a person. Similarly, as discussed above, the ASF has been able to extend the DOAP vocabulary to include information relevant to Apache projects only.

A further example of extending DOAP with other schemas is making your DOAP file useful in more than one language. For example, links can be labelled using the Dublin Core metadata schema to distinguish between download pages in various languages, as can be seen below.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
    <Project> 
        <name>Java Platform</name> 
        <download-page rdf:resource="http://java.sun.com/javase/downloads/index.jsp" dc:format="text/html" dc:language="en" /> 
        <download-page rdf:resource="http://java.sun.com/javase/ja/6/download.html" dc:format="text/html" dc:language="ja" /> 
    </Project> 
</rdf:RDF> 
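
A catalogue could use such labels to send each visitor to the right page. The Python sketch below is one possible way to do that, again treating the file as plain XML. The function name `download_page` is hypothetical, and the sample assumes the language is recorded in a Dublin Core `language` attribute; the attribute name is a parameter so that files using a different label can still be read.

```python
import xml.etree.ElementTree as ET

DOAP = "http://usefulinc.com/ns/doap#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
    <Project>
        <name>Java Platform</name>
        <download-page rdf:resource="http://java.sun.com/javase/downloads/index.jsp"
                       dc:format="text/html" dc:language="en"/>
        <download-page rdf:resource="http://java.sun.com/javase/ja/6/download.html"
                       dc:format="text/html" dc:language="ja"/>
    </Project>
</rdf:RDF>"""


def download_page(doap_xml, lang, lang_attr="language"):
    """Return the download page labelled with `lang`, or None if absent."""
    root = ET.fromstring(doap_xml)
    for page in root.iter(f"{{{DOAP}}}download-page"):
        # Attributes are looked up by their namespace-qualified name.
        if page.get(f"{{{DC}}}{lang_attr}") == lang:
            return page.get(f"{{{RDF}}}resource")
    return None
```

Here `download_page(SAMPLE, "ja")` returns the Japanese download URL, while a language with no matching entry returns `None` so the caller can fall back to a default.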

Creating and maintaining your own catalogue

As well as being easy to update other people’s catalogues with DOAP, it is also very easy to create your own catalogues using DOAP files. Before looking at how, let us consider why a project may want to do this.

As most webmasters know, the key to getting a high rank on most search engines is to have valuable and regularly updated content on the site. Furthermore, providing a useful service to your visitors is of great importance and increases the chances that they will link back to you. One way of covering each of these bases is to maintain a catalogue of projects that are related to your own project in some way. This can be done by using the category tag in the DOAP file. If several projects are listed with the same category, there may be an opportunity to work together and help multiple projects become more sustainable.

Without DOAP it is necessary to encourage all listed projects to maintain their entries in your catalogue, which consumes their resources and so is unlikely to happen. Alternatively, someone in your project can maintain each entry, but that consumes your own project’s resources. However, if the projects listed in the catalogue maintain a DOAP file, the only additional overhead in managing that catalogue is maintaining a list of projects within it. If you have created your DOAP file and have your project listed in a project catalogue, it can benefit your project as well. You can find related projects and be found by others more easily, which in turn can help you in building a community around your project.
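
As a rough sketch of how little is needed, the Python function below groups projects by the category URIs found in their DOAP files. It assumes the files have already been downloaded and are passed in as strings; the name `build_catalogue` is hypothetical, and this is not how projects.apache.org or Apache Forrest are implemented.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

DOAP = "http://usefulinc.com/ns/doap#"
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def build_catalogue(doap_documents):
    """Map each category URI to the sorted names of projects listing it."""
    catalogue = defaultdict(list)
    for document in doap_documents:
        root = ET.fromstring(document)
        for project in root.iter(f"{{{DOAP}}}Project"):
            name = project.findtext(f"{{{DOAP}}}name", default="(unnamed)")
            for category in project.findall(f"{{{DOAP}}}category"):
                uri = category.get(RDF_RESOURCE)
                if uri:
                    catalogue[uri].append(name)
    return {uri: sorted(names) for uri, names in catalogue.items()}
```

The catalogue maintainer's only ongoing task is then to keep the list of DOAP file locations up to date; the category groupings follow from whatever each project publishes.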

Software for the creation of catalogues from DOAP files has yet to fully mature. The code behind projects.apache.org is available from Apache but is still largely considered an internal project. Alternatively, Apache Forrest has experimental code in its projectInfo plugin that will produce a similar site to that at projects.apache.org. Despite the fact that these projects are still in their early stages, they both work well.

Other options include Exhibit from MIT, a lightweight structured data publishing framework that lets you create web pages with support for sorting, filtering, and rich visualizations by writing only HTML and optionally some CSS and JavaScript code. The Exhibit project is not specifically targeted at DOAP files, but it works very well with them.

Further reading

Links:

Related information from OSS Watch: