MailScanner: a case study in sustainability

by Julian Field, University of Southampton, on 5 June 2007

Archived: this page has been archived and its content will not be updated.

Introduction

In the latter half of 2006, the Joint Information Systems Committee (JISC) commissioned a study via its Learning and Teaching committee to examine the issues surrounding the sustainability of open source software. The resulting report drew together seven case studies of successful but very different open source projects and examined each project’s sustainability model. Each case study is told from the point of view of the lead developer or one of the key personnel and gives an insight into the factors that have determined the success of each project. These case studies are now presented by OSS Watch as standalone documents in a series.

This case study, examining the MailScanner project, has been written by Julian Field, University of Southampton.

Brief description

MailScanner is a complete e-mail security system, implemented at the gateway. It provides e-mail virus scanning, spam detection, and a large variety of other controls that can be used to implement a comprehensive e-mail security policy. MailScanner can be run on any Unix, Linux or POSIX-based system, and will work with Sendmail, Postfix, Exim, ZMailer and Qmail. MailScanner is available under the GNU General Public License v2, and although it is free of charge, a user will need to have an appropriate licence for a host-based anti-virus package such as Sophos, Norton or F-Secure. A partner package, MailWatch, has been written by Steve Freegard, and this provides a very full range of management and reporting facilities, all driven from a Web browser.

MailScanner is probably one of the world’s most widely-used e-mail security and anti-spam systems. It is fair to estimate that there are about 60,000 sites around the world using it, and there have been over a million downloads. It has been implemented in many prestigious institutions including the US Navy, HP Labs, MIT, BAe Systems, Harvard Medical School and Vodafone Europe.

The MailScanner project has never had any official funding of any sort. I beg and borrow the facilities I need, and have the kind co-operation of my employer, the University of Southampton. I do the vast majority of the work in my spare time, though my employers are happy for me to work on it when my workload permits.

Background

Over the years, the University of Southampton’s Intelligence, Agents, Multimedia (IAM) Group has undertaken a number of short studies in systems and network security on behalf of JISC’s JCAS sub-committee. In 2000, the group undertook an investigation into gateway-based anti-virus protection methods, detailing who was providing what and the cost implications of each product. As this project progressed, it became evident that anti-virus e-mail gateway products were both very expensive (licences were charged per host within the site) and predominantly Windows-only. For example, for a university such as Southampton, with around 20,000 users, the costs, at the time, would have been in excess of £50,000 per annum in licence fees alone.

This was clearly far too expensive and in the course of investigating alternative solutions I decided that it would be better to build one from scratch. My ideas for the system were based on the limitations and problems I found with both proprietary and open source systems, and my desire to overcome them and provide a more elegant, open source solution.

Project history

Although I was not officially part of the IAM Group study for JISC, I was brought in from time to time in my role as the local e-mail postmaster and it seemed to me that the products that were on sale from commercial vendors were very basic. I also looked at open source/free packages but they had their own problems associated with them so they were also discounted. Once I started to think about what was wrong with the existing software I had an idea for a solution that seemed to be both more elegant and more efficient than any existing system. For example, at that time, systems would start up a virus scanner engine separately for every message, but my idea for MailScanner was that it should handle messages in batches, of varying sizes, and only run the virus scanner once for each entire batch. The implication of this was that the higher the load, the bigger the batches, and hence the greater the efficiency that can be achieved, thus making the system very scalable.
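The batching idea can be illustrated with a short Python sketch. This is not MailScanner's own code: the function name `scan_in_batches`, the `max_batch_size` parameter and the toy scanner are all hypothetical, chosen only to show how one scanner run can cover a whole batch of queued messages.

```python
from collections import deque
from typing import Callable, Deque, List, Set, Tuple


def scan_in_batches(
    queue: Deque[str],
    scan_batch: Callable[[List[str]], Set[str]],
    max_batch_size: int = 30,
) -> Tuple[Set[str], int]:
    """Drain the queue in batches, running the scanner once per batch.

    Returns the set of messages flagged as infected and the number of
    scanner runs. `scan_batch` stands in for a real virus scanner and is
    an assumption of this sketch.
    """
    infected: Set[str] = set()
    scanner_runs = 0
    while queue:
        # Take whatever is waiting, up to the batch limit. Under heavy
        # load the queue is long, so batches are full and the cost of
        # starting the scanner is shared by more messages.
        size = min(max_batch_size, len(queue))
        batch = [queue.popleft() for _ in range(size)]
        infected |= scan_batch(batch)
        scanner_runs += 1
    return infected, scanner_runs


def toy_scanner(batch: List[str]) -> Set[str]:
    """Hypothetical scanner: flags any message containing 'EICAR'."""
    return {message for message in batch if "EICAR" in message}


if __name__ == "__main__":
    waiting = deque(["hello", "EICAR test file", "meeting notes"] * 20)
    flagged, runs = scan_in_batches(waiting, toy_scanner)
    print(f"{len(flagged)} distinct infected message(s), {runs} scanner run(s)")
```

A per-message design would start the scanner once for each of the queued messages, whereas the loop above starts it once per batch, which is where the efficiency gain under load comes from.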

I started work on it in my spare time and ten days later put version 1.0 into production in our department. Once I had ironed out all the bugs I put it on my website and posted a short message about it to the uk-mail-managers mailing list, run by Newcastle University. A few people tried it out and liked it because it deleted infected attachments from messages and delivered as much of the message as possible. This was an improvement on existing systems, which deleted the whole message if they found an infected attachment.
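The difference between deleting a whole message and removing only the infected attachment can be sketched with Python's standard `email` package. This is an illustration under stated assumptions, not MailScanner's implementation: `strip_infected_attachments` and the `is_infected` callback are hypothetical names, and the "scanner" is a simple byte-string check.

```python
from email.message import EmailMessage
from typing import Callable, List


def strip_infected_attachments(
    msg: EmailMessage, is_infected: Callable[[bytes], bool]
) -> List[str]:
    """Remove infected attachments in place; return their filenames.

    Only top-level parts that carry a filename are treated as
    attachments. The body and clean attachments are kept, so as much of
    the message as possible is still delivered.
    """
    removed: List[str] = []
    if not msg.is_multipart():
        return removed
    kept = []
    for part in msg.get_payload():
        filename = part.get_filename()
        data = part.get_payload(decode=True) or b""
        if filename and is_infected(data):
            removed.append(filename)
        else:
            kept.append(part)
    # Replace the list of parts with the clean ones only.
    msg.set_payload(kept)
    return removed


def build_example_message() -> EmailMessage:
    """Build a small message with one clean and one 'infected' attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Report"
    msg.set_content("See attached.")
    msg.add_attachment(b"clean data", maintype="application",
                       subtype="octet-stream", filename="ok.bin")
    msg.add_attachment(b"EICAR-TEST", maintype="application",
                       subtype="octet-stream", filename="bad.bin")
    return msg


if __name__ == "__main__":
    message = build_example_message()
    dropped = strip_infected_attachments(message, lambda data: b"EICAR" in data)
    print("removed:", dropped)
```

A real gateway would also have to handle nested multipart structures and tell the recipient that something was removed; the sketch leaves both out.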

However, after a short time, and in response to requests and suggestions from other people, it was obvious that support for additional virus scanners was much needed. So I rewrote it from scratch and produced version 2.0.

At that time spam was not really a problem, just viruses, so people who could not afford a commercial product tended towards MailScanner for purely financial reasons. It was being used by a couple of hundred sites, mostly within the UK, but also some .edu sites and various cash-strapped American government departments, such as the US urban development department and the US agricultural department. As news about MailScanner spread, uptake increased, most notably in France and the USA. New features were being requested and I added very simple spam detection through the use of blacklists, but it was rapidly outgrowing its architecture.

So, in late 2001, I began work on version 3.0. This took a month to write and had support for many other virus scanners but, more importantly, had far better spam detection due to the ability to plug in SpamAssassin. With version 3.0 MailScanner started to take off, and usage increased to several thousand sites. However, there was still little outside involvement: no-one contributed code (with the odd major exception, such as one university that sent me a patch bigger than the whole of MailScanner itself!) and testing was purely ad hoc, done by a couple of big users who already had machine clusters dedicated to e-mail system testing and development. In addition, as more features were added to MailScanner, it gradually got slower and slower, so over the summer of 2003 I rewrote it from scratch yet again.

Growth and development

When version 4.0 was first launched in 2003 the project really started to take off. It was over three times faster than the previous version and out-performed everything else on the market. Interest grew rapidly; it started to be known in slightly wider circles and the rate of downloads grew consistently over the next two or three years. Testing improved, and there is now a dedicated team of users who do regular testing, ensuring that the new features and fixes haven’t, in turn, caused any other problems. There have also been more code contributions, particularly from organisations like Vodafone Europe.

Around this time I was approached by a large cable company who were interested in implementing MailScanner. They wanted me to write a detailed training course for their system administrators that would be delivered over two or three days of training sessions. Unfortunately, the decision over whether or not to implement MailScanner was continually delayed and I realised that the time I had spent writing material was in danger of being wasted. It seemed to me that one possible solution to this was to sell it as a book. I went to various seminars and meetings with publishers such as O’Reilly, but was unhappy about how little money I was going to make out of it (5% of the shop retail price). So I went to CafePress, who were already selling T-shirts for me, and set up a book printing deal with them. This meant that I received a much larger share of the retail price. The book has sold nearly 1,000 copies to date.

In 2005 I was a finalist in two of the Reader Awards categories at the UK Linux & Open Source Awards: Best Linux/Open Source Project, and UK Individual Contribution to Linux/Open Source.

Project structure: sustainability

The MailScanner project has never had any official funding of any sort. Occasionally I have managed to get sponsorship from my employer and from Transtec, one of our major suppliers, but mostly I beg and borrow the facilities I need and do the vast majority of the work in my spare time. However, the University of Southampton is very supportive of my work on MailScanner and has provided me with a member of staff to help out with my normal duties. This gives me more time to spend on MailScanner than would otherwise be possible.

Key participants

There are several people in Cambridge, Argentina, Spain and the USA who have closely examined and audited every line of code in the project, and this auditing is very helpful. They have all suggested various improvements and, as they have all demonstrated a detailed knowledge of the code, their suggestions have been taken on board.

For the most part these people contribute on behalf of their institutions and they all have big MailScanner installations at their employers’ sites. Tony Finch at Cambridge University has made significant contributions in the past, and the people in Argentina contributed the support for ZMailer, which is a high-speed (but relatively unknown) Mail Transfer Agent (MTA). They have also contributed bug reports and have taken the time to go through the code line by line. The Spanish group used to work for Vodafone Europe but have now moved to the USA. While working for Vodafone, they made a small number of very important contributions to the code and put a lot of effort into profiling MailScanner to try to make it even faster: they re-wrote ten particular lines of code and added 25% to the speed! This took them several weeks of full-time work, and is something I don’t have time to do. Generally speaking, contribution levels tend to fluctuate and are usually related to what each participant needs MailScanner to do.

The American group is now called Fort Systems Ltd and they sell MailScanner as a commercial product that includes a much-improved version of MailWatch, and a Web-based configuration and administration package. Steve Freegard, the author of MailWatch, is one of their full-time employees.

Project structure: process and governance

I run MailScanner as a ‘benevolent dictatorship’, which, although very labour-intensive for me, leads to a package that is very reliable and has very few flaws. Whilst most open source projects allow access to Subversion or CVS archives to provide the latest version of the source code, I do not. I believe that such access is dangerous, as it would enable people to download untested code and potentially lose mail as a result. Losing mail is obviously unacceptable to users, so only I am able to release code to the general public, and only after it has been tested at least to the point where I am sure it cannot lose mail. After that point, I rely on users to do the majority of the functional testing.

Similarly, accepting contributions to the source from other people could affect the reliability of the software, as there may well be bugs and/or security vulnerabilities in the source code they supply. As MailScanner deals with e-mail, it is critical that it should maintain its reputation for being a reliable and safe package to use on a corporate e-mail system, so all code accepted from other people is closely inspected for security vulnerabilities and potential for attack and, as a result, is usually totally re-written in my own style.

Most decisions that affect the direction of the software are discussed amongst the users. If they want something, and there are enough of them to warrant the effort required, then I write it. At the time of writing, no-one has requested or suggested any major new features in a long time.

Apart from the book, there are three main sources of documentation: the wiki, the website and the automatically-generated documents. The wiki provides all sorts of sample set-ups and ‘HowTo’s contributed by the users. I don’t write any of the content in the wiki, although I created the basic framework for it. The website contains some easy-to-follow installation guides for the various different MTAs supported, which is just enough to get people started. The automatically-generated documents are extracted from the lengthy textual comment above each configuration setting in the main MailScanner configuration file, which explains how to use each particular setting.
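As a rough illustration of how documentation can be generated from a configuration file, the Python sketch below collects the comment block above each setting. It is not MailScanner's actual generator: the `#` comment prefix, the `Name = value` layout, the sample settings and the function name `extract_setting_docs` are assumptions made for the example.

```python
from typing import Dict, List


def extract_setting_docs(conf_text: str) -> Dict[str, str]:
    """Map each setting name to the comment block directly above it.

    Assumes comments start with '#' and settings look like
    'Name = value'. A blank line breaks the link between a comment
    block and the next setting.
    """
    docs: Dict[str, str] = {}
    pending: List[str] = []
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            pending.append(line.lstrip("#").strip())
        elif "=" in line:
            name = line.split("=", 1)[0].strip()
            docs[name] = " ".join(text for text in pending if text)
            pending = []
        else:
            # Blank or unrecognised line: discard any collected comments.
            pending = []
    return docs


# Hypothetical configuration fragment used only for this example.
SAMPLE_CONF = """\
# Number of scanner processes to run.
# More processes use more memory.
Max Children = 5

# Section banner, not attached to any setting

Virus Scanning = yes
"""

if __name__ == "__main__":
    for setting, description in extract_setting_docs(SAMPLE_CONF).items():
        print(f"{setting}: {description or '(no description)'}")
```

Keeping the explanation next to the setting it describes means the generated documents cannot drift away from the configuration file that ships with the software.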

Since version 4.0 was released in 2003, I have released a new version at the start of almost every month. Recently, however, due to pressures at work, I have been issuing a new release every two months. I also produce beta test versions every few weeks—as and when new features are added and fixes implemented.

Reflections and future

So far, I have not needed to give much thought to the future of the project. The model I have chosen for MailScanner is labour-intensive for me, but leads to a package that is very reliable and has very few flaws. I am settled in my current job and, due to poor health, I am not likely to change jobs or find myself in a position where I don’t have any time to spend on it. Also, MailScanner has matured now and the rate at which I need to add new features is very low. As far as I’m concerned it is a small project and needs to remain that way if I am going to be able to continue doing all of it.

If something were to happen to me, the most sensible course of action would be for Fort Systems to take over my role. They already know an enormous amount about it through their consulting business, and they employ Steve Freegard who knows MailScanner completely and could take over maintenance with no great problem.

Project details

There are three mailing lists for MailScanner:

  • An ‘announcements’ list, which has about one posting per month at most. Anyone who contacts me is encouraged to join that list.
  • A ‘beta testers’ list, where I announce changes and new beta versions that I would like people to test. This is also fairly low-traffic, averaging about two or three postings per day.
  • The main ‘discussion’ list, where most of the technical support is done. This is a very high-traffic list, now averaging 60 postings per day. A small army of volunteers answers technical queries and joins discussions as and when they have time. Every sensible question is answered by at least one person. I try to post to the list as well in order to correct or clarify information.

Current status

Since 2006, when this document was written, MailScanner has reached version 4.84. The MailScanner book was updated in June 2007 and the software continues to be updated in line with its frequent release cycle.

Acknowledgements

The sustainability study from which this case study is taken was commissioned by the JISC Learning and Teaching committee and funded from HEFCE’s IT Infrastructure funds. The Learning and Teaching committee is responsible for supporting the learning and teaching community by helping institutions to promote innovation in the use of ICT to benefit learning and teaching, research and the management of institutions.

The sustainability study was edited by Gaynor Backhouse of IntelligentContent and her editorial guidance has contributed in large part to the excellent result.