Open Source and the cloud
by Rowan Wilson on 27 May 2014
‘Cloud computing’ is a complex and evolving concept. Broadly speaking it consists of the provision and use of computing resources – such as data storage and processing – as services across a network. By commoditizing computing resources, cloud service companies can achieve considerable economies of scale which can be passed on to customers.
Companies and individuals look to cloud services as a means of reducing their IT budgets, tailoring their resource usage to their needs more exactly, and accessing levels of compute power that previously would have been prohibitively expensive.
Free and open source software (FOSS) is used extensively in the provision of cloud services, and so the move towards the cloud could be seen in one way as a victory for FOSS both in terms of implementation and philosophy.
However, there are reasons to question whether the cloud, and the reduction of direct control over our core computing hardware that it represents, is a positive step for software freedom.
This document discusses the varying levels of cloud service provision, the free and open source software that facilitates each, and the extent to which cloud ‘philosophy’ supports or undermines software freedom and openness.
Traditionally, if you wanted a server, you had the option of buying or renting physical machines and either hosting them in house, or in a data centre through a co-location service. More recently, we have seen these physical machines replaced with virtual machines, allowing several server systems to be run on the same piece of hardware. Again, these can be hosted internally or externally.
Infrastructure-as-a-Service (IaaS) takes this to the next level, abstracting away the underlying hardware from the customer. An IaaS service is typically housed in a large data centre, where servers’ resources are pooled into a cluster. Virtual servers can then be provisioned on demand, with underlying software deciding which hardware actually does the work.
This model provides excellent flexibility and scalability, as new virtual servers can be provisioned and resources allocated to meet the demands of the services they are running. It also holds the potential for cost saving – different types of customers will have different peaks in demand, which can be balanced across the shared infrastructure. This can lower the overall computing capacity required, in turn lowering costs.
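The capacity argument can be made concrete with a little arithmetic. The sketch below is only an illustration: the two workloads, their hourly figures and the function names are all invented for the example, and real providers size their infrastructure with far more sophisticated models. It compares the capacity needed when each customer has dedicated hardware sized for its own peak with the capacity needed when both share one pool.

```python
# Hypothetical hourly demand (in arbitrary "server units") for two customers
# over one day. The figures are invented purely for illustration.
office_app = [2] * 8 + [10] * 10 + [2] * 6   # busy during working hours
batch_jobs = [10] * 6 + [2] * 14 + [10] * 4  # busy overnight


def capacity_if_separate(*workloads):
    """Capacity needed if each workload gets hardware sized for its own peak."""
    return sum(max(w) for w in workloads)


def capacity_if_pooled(*workloads):
    """Capacity needed if all workloads share hardware sized for the combined peak."""
    return max(sum(hour) for hour in zip(*workloads))


separate = capacity_if_separate(office_app, batch_jobs)
pooled = capacity_if_pooled(office_app, batch_jobs)
print(separate, pooled)  # prints "20 12"
```

Because the two peaks fall at different times of day, the shared pool needs 12 units rather than 20. If both workloads peaked in the same hours, the pooled figure would be no lower than the separate one.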
Set-ups like this are one of the reasons we use the term “cloud” – the blurring of division between pieces of hardware to make one ubiquitous “blob” of computing resource.
The best known example of IaaS is without a doubt Amazon’s Elastic Compute Cloud, or EC2. Amazon owns several data centres across the globe. When renting a server from EC2, you simply specify the resources you require and the location (for example, Europe or the US). The system takes care of the rest and presents you with a remote login to your server. EC2 is used by sites like Reddit and Foursquare to give them the ability to scale in line with demand.
There are also open source IaaS platforms, notably OpenStack and CloudStack. OpenStack is produced by the OpenStack Foundation. The project was originally founded by NASA and Rackspace, but the Foundation now comprises a sector-spanning group of technology companies. Several companies in the Foundation run public cloud services on the OpenStack platform, in competition with EC2.
CloudStack was originally developed by Cloud.com, which was bought by desktop virtualisation giant Citrix. Citrix subsequently open sourced the CloudStack system through the Apache Software Foundation. CloudStack is used by big-name brands such as BT and GoDaddy, as well as some smaller ones.
If your infrastructure is being provided as a service, it might not be immediately apparent why it matters if the underlying technology is open source. Your primary concern is likely to be what software you are running on the servers that you are renting. However, it certainly warrants some consideration.
The first aspect to look at is the choice it affords you. If you want a solution running on OpenStack, there are numerous companies for you to choose from, while knowing you are getting the same product backed by the same group of vendors. These companies will still want to compete between themselves, be it on price, the types of server they offer, or the management tools they provide. Also, as the underlying system is the same across vendors, you avoid lock-in.
Another factor that shouldn’t be overlooked is that you don’t have to use OpenStack or CloudStack as a service from someone else. If you have a data centre in your organisation, you can run your own private IaaS system. This option will not suit everyone and, strictly speaking, might not count as provision as-a-service, but you can still think of it as a service provided internally. Rather than having to provision virtual machines in response to individual requests, running a private IaaS system allows users to scale solutions to meet their needs at any given time.
Private IaaS systems can have a disadvantage, however: systems running side-by-side on a private cloud are likely to have similar peaks in demand, reducing the potential for cost savings.
Platform-as-a-Service (PaaS) provides a way of provisioning development environments and tools on top of an IaaS system. PaaS allows you to easily deploy the tools required for developing and running applications (languages, runtime environments and testing tools), with the inherent flexibility of resource allocation that comes with running on IaaS.
For example, you might want to start developing a Ruby on Rails application. Without PaaS, you might create a new virtual machine, set up the operating system, install a database, Ruby, Rails and all their dependencies, configure remote access to the machine, and so on. With PaaS you can simply use the provided tools to deploy a pre-configured Ruby on Rails platform as per your specifications.
The key benefits here are that developers can focus on developing applications rather than setting up their environment and managing server resources, and that custom-built applications can scale easily to meet demand.
You can get PaaS solutions from many different vendors. Windows Azure offers PaaS tools to run on top of its IaaS offerings, while Google App Engine allows you to develop and run your software on the search giant’s infrastructure. Both of these systems are proprietary and sold purely as a service.
Open source PaaS offerings are also emerging. Red Hat’s OpenShift and Pivotal’s Cloud Foundry are both platforms providing similar features to the proprietary competitors, but are available under an open source licence. Red Hat and Pivotal both offer their respective solutions as a supported commercial service, but as is the nature of open source, there are other companies selling services based on the same systems.
The key differentiator touted by the open source solutions as opposed to their proprietary competitors is freedom from lock-in, although OpenShift and Cloud Foundry present this benefit differently. OpenShift focuses on support for portable technologies such as Java and open source languages as a means of allowing you to take your applications elsewhere if you choose. Cloud Foundry provides support for similar technologies, but focuses on avoiding lock-in to a single IaaS platform. This means that Cloud Foundry will run on VMware’s vSphere infrastructure, OpenStack, or even Amazon’s EC2. The OpenShift website does make mention of OpenShift running on Amazon Web Services, but most of the documentation refers to running on OpenStack.
Of course, the other benefit of open source solutions is that you’re not tied to using a commercial service at all. If you’ve got a supported IaaS system running in house, or rented infrastructure from an external service, you can deploy your own PaaS atop it and let your staff spend more time developing and less time setting up platforms.
Software-as-a-Service (SaaS) is usually presented to the user as a web application. It is distinct from a standalone web application in that each customer has their own instance of the software, rather than just having an account on a larger system (this is sometimes referred to as multi-tenancy).
For example, LinkedIn wouldn’t be considered SaaS since by using it you do not receive control of a copy of the LinkedIn software, just a user account. However, signing up for a blog on WordPress.com does give you your own WordPress instance, on its own subdomain, with its own set of users. This would be considered SaaS.
There are more complex examples such as SourceForge, which gives you an account within its system, but also instances of software for your project such as bug trackers and wikis. Google Docs also blurs this definition.
SaaS providers will usually offer multiple tiers of service, ranging from a highly limited free account to a well provisioned or even “unlimited” paid account.
There are two key advantages of SaaS. Firstly, it completely removes the administrative overhead of deploying software. Usually a few clicks in a web interface are all it takes to “install” your SaaS instance, and the provider takes care of the computing and storage resources required. Secondly, you can access the software from anywhere. As long as a machine has an Internet connection and a web browser, no further setup is usually required for end users.
There are of course potential issues to be considered. Unlike having software deployed locally, or a web application deployed in-house, you are unlikely to have direct access to your data (and depending on the terms of service, you might not even own it). All data will be stored by your SaaS provider and presented through the application. This makes backing up or archiving data locally difficult if not impossible, and requires absolute trust and confidence in your provider and their security policies.
When considering SaaS solutions, a key factor to look for is the portability of data. WordPress can publish your data in standard RSS formats, and Google Calendar uses the standard iCalendar format. Some products will have an “export” feature, allowing you to download a copy of your data. In these cases, you realise additional benefits of choosing a SaaS product, as you can easily move your data to another provider, or in the case of open source solutions, to an in-house copy of the software.
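To illustrate why standard formats matter for portability, here is a minimal sketch of reading posts back out of an RSS 2.0 feed using only Python’s standard library. The sample feed, its titles and URLs, and the `extract_posts` helper are all made up for the example, and a real export would contain many more fields (dates, authors, full content) that a migration would need to handle.

```python
import xml.etree.ElementTree as ET

# A tiny RSS 2.0 document standing in for data published by a blog service.
# The titles and URLs are hypothetical.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>https://blog.example.org/first-post</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://blog.example.org/second-post</link>
    </item>
  </channel>
</rss>"""


def extract_posts(rss_text):
    """Return the title and link of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


posts = extract_posts(SAMPLE_FEED)
for post in posts:
    print(post["title"], post["link"])
```

Because the format is an open standard, the same few lines work regardless of which provider produced the feed, which is what makes moving the data elsewhere practical.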
Open source software appears in the SaaS world in several guises. Some SaaS products may be built using permissively-licensed components, but with some proprietary code sticking it all together.
Of course, a complete open source software product may be offered as a service. WordPress is released under the GNU General Public Licence (GPL), but is offered as both a free and commercial service at WordPress.com and other providers. ownCloud is released under the Affero General Public Licence (AGPL) and available as a service from owncloud.com and other providers. The terms of the AGPL, unlike those of the GPL, mean that even when the software is used to provide a service, the service provider must allow you to download the source code to their version of the software.
The benefits of choosing an open source product when selecting SaaS are perhaps not as clear-cut as for lower layers of the “cloud” stack. If the product is not released under a ‘cloud-aware’ licence such as the AGPL, the service provider does not have to distribute the source code. If the software is released under a different open source licence, you will probably be able to download the software elsewhere and run your own instance locally as a contingency. However, without access to your data, the utility of this contingency is limited.
SaaS solutions using a combination of open source and proprietary components, as far as the customer is concerned, may as well be entirely proprietary. The provider may be developing and releasing the open source components, and these may be useful in other systems. However, in terms of the service being provided, the fact that some parts are open source does not directly benefit the customer, only the provider.
Cloud, openness and freedom
As we have seen, free and open source software is used and available at all levels of the cloud computing hierarchy. Major, widely used cloud infrastructure platforms are released and developed under permissive open source licences, with contributions from across the technology sector. It could be argued that open source is becoming as core to cloud services as it has always been for web technologies.
So why might the cloud be a problem for the free and open source software movements? Firstly, as Richard Stallman has pointed out many times, the cloud offers convenience in exchange for a fundamental loss of control over your software and data. Stallman has described the entire cloud as a ‘trap’ designed to give corporations access to individuals’ data and prevent them from processing it freely and privately. As the Edward Snowden NSA leaks have shown us, even where cloud service providers are not making free with our data, its very presence on the internet tends to mean that it can be intercepted and retained by third parties.
While it is not a complete solution, storing your own data on your own server, possibly using FOSS you compiled yourself, goes a long way to frustrating surveillance, whether legitimate or illegitimate. After all, the concept of ‘software freedom’ encompasses not only the free exchange of software but also software’s ability to protect our personal freedom by only doing what we want. Where the cloud replaces our ability to protect our personal freedom through personal computing and personal storage, it could be considered detrimental.
Secondly, most traditional free and open source licences were written to promote sharing of ideas and code in the context of software being distributed and run locally. Their requirements for attribution and licence-communication are triggered by distribution of the software itself. Of course this made much more sense before the advent of the cloud; now, it is quite possible to gain a lot of value from a piece of software, and sell that value to others, without ever transferring a copy of the software itself. As mentioned above, the Affero GPL tries to deal with this change by adding a new trigger for responsibilities – use of software for service provision over a network.
However, this new condition will always be harder to enforce than its distribution-centred predecessors. When we receive a copy of a piece of software, even if it is compiled, we can closely analyse it to try to ascertain if it is based on other FOSS that might require attribution or licence-communication. When we are only communicating with a program across a network, we have far less ability to discover whether it is fulfilling its responsibilities to the authors of software it may reuse.
Finally, the combination of the growth in cloud services and closed – often mobile – platforms presents a third threat. Free and open source software development thrived and grew as a result of the personal computing revolution that began in the 1980s. The PC allowed anyone to develop and run software. The resulting code could be given to others who could also run it without the need to have it signed or approved by the PC’s manufacturer.
Increasingly, however, the PC is being supplanted by more portable and usually more ‘closed’ client devices such as tablets, games consoles and phones. To make up for the computing power these devices lack, cloud services are used to supplement their capabilities. If this trend continues, we are at risk of losing the ability as individuals to easily create and run code without needing third party approval, whether from our cloud provider, our device manufacturer or both. It is questionable whether the free and open source software movement would ever have attained its present near ubiquity if it had been forced to develop under this emerging model.
So while the move to the cloud shows how useful free and open source solutions can be in implementing technologies, uniting technology companies and preventing lock-in, it also ironically threatens the conditions that permitted FOSS to develop in the first place.
The cloud opens great possibilities for individuals and small companies to access resources that would previously have been impractical to acquire and use. At the same time, it offers a model of computing that subtly shifts control away from the individual user. In a free market it is appropriate that consumers should be able to choose a solution that fits both their desire for convenience and their tolerance of mediation between themselves and their computing resources. Like most technological innovations, the cloud brings both opportunity and risk.