Integration in service provision

by Elena Blanco on 16 May 2006

Introduction

What is integration?

Integration is a term and a concept that is widely used across many disciplines. In the context of IT provision, integration can be defined as the connection of disparate services. These connections are made through the exchange of information, or communication, between applications and processes, and this communication is achieved by the use of common standards and protocols.

Why is integration important?

For the system administrator it is hard to ignore integration issues. There are few cases where a system administrator is responsible for a completely standalone system that has no need to communicate with any other systems, applications or processes. In the early days of computing it was commonplace to find such closed systems, perhaps a non-networked mainframe allowing access only from a very limited number of dedicated terminals. However, these days this type of system is seldom found.

Networking and, more recently, the Internet specifically, have given rise to the notion that access to many different services should be possible from anywhere in the world. With this notion comes the requirement to integrate various services, some of which are visible and some of which are invisible to the user.

To illustrate this consider the commonplace task of accessing email using an email client installed on a home or office PC. First of all the computer needs to connect to a network. Next the email client starts up and goes through a number of steps including looking up the email server’s address, establishing a connection to that email server, and providing login credentials to the email server. After the client has authenticated it can then interact with the email server to both download and send email messages.

Each of these simple steps hides a hugely complex interaction between different computers and software, and in each step both ends of the interaction need to communicate using commonly understood standards and protocols. Many of these complex interactions take place behind the scenes. In this example the user may not be aware that the email client finds the email server’s address by looking it up in the Domain Name Service (DNS). In contrast, they will probably be aware that login credentials are sent to the email server, as they may have to enter their username and password. These interactions are made possible by varying levels of integration between servers and services.

So, even when a system administrator is responsible for providing a single very specific service such as running the email server, they will still need to integrate the email server with other servers and services. This would almost certainly involve integration with the DNS and possibly a backup server. It could also involve integration with a directory service or an authentication service; perhaps it would need to access externally provided filestore; perhaps it would use a virus checking service; there are many possibilities.

Integration in academia

In the commercial sector, a business usually has a high degree of control over the systems and services that it implements and supports. This control means that certain things can be mandated: for example, a limited set of operating systems may be in use, or a particular email client may be the only client that is provided and supported. If the variety and breadth of supported systems and applications is reduced, there is usually a corresponding reduction in the complexity of a solution to integrate those systems and applications.

In the academic sector things are very different. The very nature of an academic institution means that the user population is extremely wide and varied. This leads directly to a wide variety of platforms and applications from which centrally provided services will be accessed. Additionally, institutions are subject to government policy that may dictate that support for certain systems or groups of people is required. For example, support for visually impaired students may require that specialist software or access methods are supported and integrated into the IT service provided to students. Accessibility is often a key consideration for academic institutions.

The need to accommodate such a wide range of needs and interests leads naturally to solutions that are as open as possible. IT service departments within academic institutions often base their solutions on open standards and protocols as this type of solution is likely to be more accessible than a solution based on a proprietary system geared towards specific systems and applications. The importance of these standards and protocols is discussed further below.

Increasing pressure on resources also leads many institutions to integrate their services in a search for the best value for money. For example, duplicated data is usually more costly to administer and store than having a single source of data that is integrated with any service that needs to use that data. Similarly, having individual system administrators all performing their own account creation and administration is costly. A central account management and authentication system is clearly more efficient but to make this possible the individual systems will all need to be integrated with this central system. IT service provision is increasingly about the integration of services rather than providing disparate standalone services.

How is integration achieved?

Communication is the key to integration. Speaking a well-described, slang-free language ensures that you can communicate with the greatest number of people. Consider a situation where three people are having a business meeting and one person speaks only English whilst the other two people can speak both English and Spanish. If the meeting is conducted in Spanish then one person cannot be part of the meeting, therefore the meeting is conducted in English allowing everyone to converse happily. However, perhaps one of the bilingual people is not a native English speaker and once the meeting has switched to English then the other two people start to use colloquial language. The use of slang may mean that the non-native speaker can understand the gist of the conversation but perhaps doesn’t get the jokes. In this case the conversation works on a limited level but one person is not getting the full picture. All of these human communication scenarios map directly to computer communication scenarios.

Communication as a tool for achieving integration revolves around standards and protocols. Using the above analogy, standards describe the languages used by the people who are conversing: standard Spanish, standard English, and colloquial English. In this analogy protocols dictate how the people interact with each other; for example shaking hands on greeting or waiting until one person has finished speaking before someone else starts to speak. Returning to the world of communication between computers, standards and protocols have strict definitions.

A standard
is a specification for achieving a specific task. It may be an open standard, in which case the specification is publicly available and unimpeded by patents and copyright, making it available for all to implement. Alternatively it may be a proprietary standard, where a licence must be obtained from the organization that owns the copyright before the standard can be obtained or implemented.
A protocol
is a set of rules that enables or controls how two separate entities connect, communicate, and transfer data. Protocols are rarely used alone and are commonly combined to form a protocol stack or protocol suite, such as the TCP/IP protocol suite, the collection of protocols on which the Internet runs. Protocols may be implemented in hardware, software, or a combination of both.

Together, protocols and standards describe the communication and structure of information so that any two entities implementing the same protocols and standards can be connected together as they speak the same language.

The use of a common language is at the heart of providing integrated services. At one end, this may concern the provision of a broad range of services all provided by proprietary software from one vendor. These have often been designed from the start to work as part of a suite and so automatically share a set of standards and protocols. Novell Groupwise and Microsoft Exchange are examples of this. The standards and protocols used in this type of solution may or may not be open. At the other end this may concern the provision of the same broad range of services but in this case each component or service may be running on the hardware, operating system, and software chosen for that task alone rather than as a component of a larger turnkey solution. In this case the different components are standalone products and can only be connected to the other components through the use of commonly understood standards and protocols. For maximum interoperability the standards and protocols used will usually be open.

There are several elements that are usually found at the heart of integrated services. Together, these elements provide the fundamental building blocks used when connecting disparate services together. They are:
Directory service
a directory service can be used to locate services or to locate information necessary to provide a service.
Common authentication mechanism
authentication has become necessary for nearly every network service. Having a single service responsible for authentication avoids duplication of effort and data and can simplify the user experience.
Filestore access
providing network accessible filestore is a simple way in which to make data available to more than one system or service.
Supporting infrastructure
behind the scenes, supporting infrastructure is always needed. Services such as the DNS are crucial to any networked system, let alone one that is integrated with others. Another example is Active Directory, which is now found in most modern Windows networks.
XML
markup languages are seeing increasing use as the glue that joins systems together, describing data in a way that allows it to be re-used by many different systems or services.

Each of these elements will now be explored further including an implementation example for each that is in use in a UK academic institution today.

Directory service

A directory service provides information. A directory service can be used to provide many different types of information and that information may be provided directly to a human, to another computer, or to a computer program. For example, it is common for a university to store email account information in one central database and to make that information available via a directory service. That directory service may be browseable by staff and students whilst also being accessible by email clients configured to use this directory service as an extra address book.

Directories may sound very similar to databases but there are some key differences between the two:

  • Directories use a hierarchical structure unlike databases where the data is stored in a related but non-hierarchical way.
  • Directories are designed to be mostly read, unlike databases which are designed to be read and written to in equal part.

There are many different types of directory services and in order to integrate a directory service within a larger system the directory service needs to make its information available via a well known protocol. The most commonly used protocol by far for this is the Lightweight Directory Access Protocol (LDAP).

LDAP

LDAP is a protocol description allowing the exchange of data between computers. The way that the information, or data, is actually stored varies according to the directory server itself; some servers store data as plain text, some store it in proprietary formats, and others use specific database file formats. Whichever format is used to store the data, if the directory server supports LDAP then the data can be accessed by any system that also understands the LDAP protocol.

In some cases the directory server may be referred to simply as an LDAP server especially if it is a dedicated server whose sole function is to provide an LDAP accessible directory. For example a university may store all of its computer account information on one LDAP server and allow all of the systems to authenticate against this central account database using LDAP. In other cases the directory server may in fact be a server running a software product that amongst other services is making some of its data accessible using LDAP. A common example of this situation occurs when an email server running Microsoft Exchange also makes its email address data available via LDAP.

When integrating different services it is clearly an advantage to store common information in one place rather than duplicating this information on each system. Whether this common information is housed on a dedicated LDAP server or whether the information is made available from the master data source using LDAP, the LDAP protocol makes a network accessible directory possible.

Case study - using the LDAP protocol to both query and provide data

The central IT service department of University A uses the LDAP protocol in two different contexts:

Use of LDAP to make authentication queries between the university web server and Active Directory

The web server is running Apache and contains some restricted pages for which a user must authenticate before access is granted. Active Directory is used as the central authentication mechanism for the University. When a user tries to access the restricted pages, the web server prompts for a username and password, which are then authenticated against the Active Directory server using an LDAP query. This is achieved very simply in Apache by using the mod_auth_ldap module to perform the LDAP query. In outline, the chain of events is: the browser requests the restricted page, Apache prompts for credentials, Apache checks those credentials against Active Directory with an LDAP query, and access is granted or denied according to the result.

Security of the user’s password is a concern. The LDAP lookup between the web server and the Active Directory server is within the machine room network so security here is felt to be adequate. However, the restricted web page may be accessed over http or https so if access is via http then the user’s password would go across the university network (or Internet) in the clear.
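
As an illustration only, a minimal Apache configuration for this kind of setup might look like the sketch below. The directory path, server name, search base and bind account are all invented, and the exact directive names vary between Apache versions (mod_auth_ldap in Apache 2.0 was later superseded by mod_authnz_ldap).

    <Directory "/var/www/restricted">
        # Prompt for credentials and check them against Active Directory via LDAP
        AuthType Basic
        AuthName "Restricted pages"
        AuthLDAPURL "ldap://ad.example.ac.uk:389/dc=example,dc=ac,dc=uk?sAMAccountName?sub?(objectClass=user)"
        # A low-privilege account used to bind and search the directory (hypothetical)
        AuthLDAPBindDN "cn=webauth,cn=Users,dc=example,dc=ac,dc=uk"
        AuthLDAPBindPassword "secret"
        require valid-user
    </Directory>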

Provision of an LDAP server to make white page information available

The LDAP server is implemented using OpenLDAP on Solaris and is populated with information from central and departmental databases. A member of the university may have more than one record in the LDAP database, one for their college affiliation and a separate record for their departmental affiliation. A standard LDAP schema for the LDAP database was not used; instead, a custom schema was developed. This LDAP server is used to store white page information, such as name, email address, department details, phone number etc. This information is available to mail clients by accessing the LDAP server. There is also a set of departmental “Contacts” web pages that are generated every night from the LDAP server and stored as static web pages.

Provision is also made for third party applications to query the LDAP server, and others in the university have written PHP scripts to query the LDAP database for their own use. However, because the custom schema is unpublished, these third party scripts have been written against its current structure, so the way that the LDAP database is organized cannot now be changed without breaking them.
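
As a sketch of the kind of query such scripts and mail clients perform, an anonymous command-line search for one person’s white page entry might look like this; the host name, search base and attribute names are illustrative and would depend on the schema actually in use.

    # Anonymous search for one person's white page entry (names and attributes are illustrative)
    ldapsearch -x -H ldap://ldap.example.ac.uk \
        -b "ou=people,dc=example,dc=ac,dc=uk" \
        "(cn=Jane Smith)" cn mail telephoneNumber department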

The LDAP server is a successful service but some lessons have been learnt that would inform the design of any replacement service. They are:

  • It is highly advisable to use a standard LDAP schema.
  • It is advisable to obtain an enterprise number (OID arc) from IANA and build the LDAP attributes and object classes within this global namespace from the beginning.

Common authentication mechanism

Nowadays all networks from the small to the large host many servers. This is especially true in devolved universities where servers may be provided both centrally and within departments. The idea of single sign on, where a user only submits their login credentials once which then grants access to multiple servers and services, has long been a holy grail. Developing a common authentication mechanism is not only a goal in its own right but is also the first step in providing a single sign-on solution.

The use of LDAP has already been explored with regard to the provision of directory services. Since the essence of LDAP is to provide data that is network accessible then it becomes clear that LDAP could be used to provide a common authentication mechanism. Rather than the account information being stored locally on each networked server or service an LDAP server can be used to store all user account information. Then, when a user needs to authenticate to a server or a Linux workstation or a Windows desktop the authentication is handed off to the central LDAP server that contains all the necessary account information.

Of course the local authentication mechanism must be configured to use the central LDAP server rather than its own local system, but since LDAP is so widely used there is rarely any difficulty here. In this instance, it would be advisable to run the LDAP server using encryption to protect the login credentials as they travel across the network.
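
As an illustration, on a Linux client using the pam_ldap and nss_ldap packages this typically amounts to pointing the client at the central server, enabling TLS, and selecting the LDAP module in the PAM configuration. The host, base DN and module options below are hypothetical and distribution specific.

    # /etc/ldap.conf (pam_ldap/nss_ldap client settings; names are illustrative)
    uri ldap://ldap.example.ac.uk/
    base dc=example,dc=ac,dc=uk
    # Upgrade the connection to TLS so credentials do not cross the network in the clear
    ssl start_tls
    tls_checkpeer yes

    # /etc/pam.d/common-auth (Debian-style): hand authentication to the LDAP server
    auth    sufficient  pam_ldap.so
    auth    required    pam_unix.so try_first_pass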

The LDAP server would be acting as a centralized login server, but it should be noted that this is far from single sign on: the appropriate login credentials need to be supplied each time that authentication is required. Some installations of this centralized login server approach appear to offer single sign on via the use of shared passwords. In this case the LDAP server still holds separate login credentials for each user account, but when a user has more than one account the passwords are synchronized between the accounts. Clearly, this is a huge security concern. A further issue with using LDAP as the centralized login server is that its security depends on the LDAP server being tightly locked down so that access is only permitted from a trusted group of servers.

An alternative approach to centralized authentication that has been designed from the outset to lead to single sign on is the use of the Kerberos protocol. The sphere in which Kerberos excels is in providing a system that allows multiple logins to multiple systems using multiple protocols. This makes it an attractive proposition for heterogeneous networks.

Kerberos

Kerberos provides a solution in three separate areas:

  • Centralized authentication - not only does Kerberos authenticate clients in one place, it also provides assurance to those clients that the user is who s/he claims to be. This protects against man-in-the-middle security breaches.
  • Secure protection of passwords - passwords never travel in the clear with Kerberos so there is no vulnerability to password sniffing. The encryption mechanisms that Kerberos was designed with can be extended to protect other protocols when used by a Kerberized client.
  • Single sign on - once a user authenticates to a Kerberos server, access is granted to any Kerberos application that defers to the main Kerberos server: no further authentication from the user will be required. It should be noted however that Kerberos is only concerned with authentication. Other information that is necessary for account management such as home directory, group membership, personal data etc. has to be maintained outside the Kerberos server. This may be maintained in local files or databases or perhaps a centralized LDAP server.
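
As a rough illustration of the single sign on experience from a Unix shell (the user name and realm here are hypothetical), a user authenticates once and subsequent Kerberized services are then accessed without further prompting:

    # Authenticate once to the Kerberos server, obtaining a ticket-granting ticket
    $ kinit alice@EXAMPLE.AC.UK
    Password for alice@EXAMPLE.AC.UK:

    # Tickets for individual services are then obtained transparently;
    # klist shows the ticket cache without any further password prompts
    $ klist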

Case study - migrating from using LDAP to Kerberos

The central IT service department of University B has moved from using LDAP for centralized authentication to Kerberos for some of the centrally provided services such as mail and Linux servers providing shell accounts. All the systems using Kerberos for authentication are running Debian Linux as are the LDAP and Kerberos servers themselves.

The motivation for migrating was that the security of the LDAP setup depended on the LDAP servers being locked down and not available for general queries. Migrating to Kerberos would also bring the following benefits:

  • Secure authentication mechanism.
  • Single sign on rather than shared passwords which is all that LDAP could provide.

With the existing LDAP solution the chain of action is, in outline: the user supplies a username and password to the application server, and the application server checks them against the LDAP server (for example by attempting to bind as that user), granting access if the check succeeds.

In this scenario the application server sees the user’s password, so it has to be highly trusted, which is not optimal. A Kerberos solution means that the application server does not have to see the user’s password: the user authenticates to the Kerberos server, obtains a ticket, and presents that ticket to the application server, which validates it without ever handling the password.

It was necessary to make a gradual move from LDAP to Kerberos, so provision had to be made for a period in which some services authenticated using LDAP and some using Kerberos. Account administration is handled by a home-grown account management system, so this system had to be made aware of Kerberos and of which services had been Kerberized; in other words, it had to be told where to change a password: Kerberos or LDAP. Even after all the services had migrated to Kerberos, the LDAP servers would still be used for all the other account information: they would still provide everything other than the authentication. The LDAP servers were implemented using OpenLDAP.

The technical bones of the migration revolved around the need to tell the LDAP servers to use the Kerberos server instead of holding the password in an LDAP field. This was complicated by the fact that LDAP does not store the password directly: it stores a hash of the password, so it was not possible simply to copy the data from LDAP to Kerberos. Fortunately, a tool is available to solve this problem. The Linux based application servers were using PAM (Pluggable Authentication Modules), libraries that provide a consistent interface to a range of authentication protocols. This allows a system administrator to change between authentication mechanisms simply by using the appropriate PAM module for the required authentication system. In this instance, the application servers could simply change from using the PAM module for LDAP (pam_ldap) to the PAM module for Kerberos (usually pam_krb5). To assist with the migration from LDAP to Kerberos there is another module, pam_krb5_migrate, that can be used in place of the pam_krb5 module. This module takes a username and password from another authentication module, in this case pam_ldap, and adds that user to the Kerberos server. In this way, once a person logged in to any of the application servers, their username and password could be migrated from the LDAP server to the Kerberos server automatically.
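
A rough sketch of what the PAM configuration on an application server might look like during the transition is shown below; the ordering and options are illustrative only, and the exact arguments accepted by pam_krb5_migrate depend on the particular implementation in use.

    # /etc/pam.d/common-auth during the migration (illustrative only)
    # pam_ldap still validates the password against the LDAP servers...
    auth    required    pam_ldap.so
    # ...and pam_krb5_migrate uses the verified username and password to
    # create the matching principal on the Kerberos server the first time
    # each user logs in.
    auth    optional    pam_krb5_migrate.so

    # Once migration is complete, authentication goes straight to Kerberos:
    # auth  required    pam_krb5.so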

Once an account had been migrated to Kerberos, the LDAP server had to be updated to indicate that the authentication information was now held on the Kerberos server. As mentioned earlier, the LDAP server was implemented using OpenLDAP, and this implementation supports saslauthd, a daemon that provides a proxy authentication service for plain text authentication requests. This allowed the contents of the LDAP server’s password field to be replaced with {SASL}, which tells the LDAP server to contact the saslauthd daemon, which in turn contacts the Kerberos server. So when an account had been migrated to the Kerberos server using the pam_krb5_migrate module, the password field in the LDAP server would be replaced with {SASL}.
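
As a sketch (the DN, user name and realm below are invented), the change to an individual entry can be expressed as an LDIF modification, with saslauthd on the LDAP server started with its Kerberos mechanism so that pass-through requests are checked against the Kerberos server:

    # LDIF: hand authentication for this entry off to saslauthd
    dn: uid=alice,ou=people,dc=example,dc=ac,dc=uk
    changetype: modify
    replace: userPassword
    userPassword: {SASL}alice@EXAMPLE.AC.UK

    # On the LDAP server, saslauthd is run with the kerberos5 mechanism,
    # for example: saslauthd -a kerberos5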

In time it was found that LDAP and saslauthd did not perform adequately for this site, which sees approximately 1.2 million logins per day, so refinements had to be made. To solve this scaling problem, the application servers were reconfigured to go directly to the Kerberos server rather than going to the LDAP server first, reducing the load on the LDAP servers. This was achieved by using the pam_krb5 module on the application servers in place of the pam_ldap module. Any servers other than the application servers could carry on using the LDAP servers, which then used the Kerberos server for authentication as before.

This migration was a complicated project but the introduction of Kerberos brought the following benefits:

  • Other services can now use Kerberos for authentication, i.e. it is available for general queries.
  • A cookie based authentication system is now in place for browsers using webauth - a way of putting Kerberos tickets inside a cookie.
  • Load was reduced on the struggling LDAP servers.
  • It produced a much more secure system.

It should also be noted, however, that in this university this is not a wide scale solution to the issue of authentication. The community does not use Kerberized clients yet and the bulk of authentication still takes place using plain authentication via username/password.

Filestore access

A natural result of the increase in the number of services on offer is the need for users to share data between these services. Additionally, many people access these services from more than one machine or location, so data needs to be accessible from multiple machines. This is especially true on academic networks where the user population is rarely contained in one place with a specific dedicated machine per user. The twin problems of making data available regardless of both platform and location have led to the general use of network storage rather than local storage. Of course access from different machines and locations is not the only reason for making use of network storage; centralized provision of filestore shifts the onus of ensuring that an adequate backup regime is implemented from the user to the IT service providers.

There are many proprietary solutions for the provision of filestore between a server and its clients. However, when disparate services are connected together then an open solution using open protocols and standards can maximize the accessibility of the common filestore. If data needs to be shared between integrated services then it is necessary to use a well known protocol that can be implemented by all services that need access to the filestore. In the case of filestore the most widely used protocol is the Server Message Block/Common Internet File System protocol (SMB/CIFS). In fact this protocol can be used to share not only files but printers too. On heterogeneous networks today it is usual to find an SMB/CIFS server on the network that is used to share files between Windows, Mac and Unix/Linux platforms. This file sharing may take the form of drives or filesystems that are mounted on client machines where the fact that this is network filestore is invisible to the end user. It may also take the form of drives or filesystems that are shared between different servers, perhaps on different operating systems so that data can be read and/or written between systems.

With the rise of Unix/Linux for back room services and the dominance of Windows on the desktop it is a common scenario to have to share files between Windows clients and Unix/Linux servers. Many universities have chosen to host their network accessible filestore on a Unix/Linux platform. The SMB/CIFS suite that is available for this platform is called Samba, and although it is possible to provide a Windows based SMB/CIFS server the use of a Unix/Linux based server has become ubiquitous due in large part to its reliability. Indeed it is common for the SMB/CIFS protocol to mistakenly be referred to simply as Samba.

Case study - making Mac OS X filestore available to Windows clients

A research group at University C uses a small computational cluster to perform complex calculations on experimental data. The computational cluster is composed of several servers running Mac OS X, a Unix operating system. The researchers all have Windows desktop PCs. The pattern of work is such that researchers prepare their computations using tools on their desktop PC. When ready, this data must be transferred to the computational cluster and the software that resides on the cluster performs the calculations. After the computation has been performed the data is manipulated once more using tools on the Windows desktop PCs.

In order to solve the problem of making the data available to two different operating systems, the computational cluster was configured to act as a Samba server in addition to its other tasks. Since version 10.2, Mac OS X has shipped with Samba built in, so although it would have been possible to build Samba from source, as on any Unix operating system, it was already available by default. The Samba server was set up so that a particular storage area was made available as a Samba share. This share was configured to allow both read and write access. The Windows desktops were then configured to access this share as they would access a share on another Windows machine. The research data was now stored on a disc that was part of the computational cluster hardware but was available to both the cluster and any Windows desktop accessing that share. However, users had to authenticate to gain access to the share.

Samba can handle authentication in various ways. It can use:

  • Plain text passwords - Modern versions of Windows (sensibly) require encrypted passwords by default. Therefore they cannot work with a Samba server configured to use plain text passwords unless a Windows Registry entry is changed.
  • Local password database within Samba - A local password database must be created and populated with account information and passwords assigned to users.
  • Defer authentication to an external authentication server - Typically if authentication is deferred elsewhere it is deferred to a Windows domain controller. In this case, the Windows desktops were already part of an NT domain so authentication was deferred to the local Windows domain controller and users authenticated to the Samba share in the same way that they authenticated to login to the Windows desktop itself. This choice made the Samba server set up as simple as possible as no local password database had to be created.
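
Putting the pieces together, a minimal smb.conf sketch for the arrangement described above might look like the following; the workgroup, domain controller, share name and path are all invented, and a production configuration would need further options.

    # smb.conf (illustrative sketch only)
    [global]
        workgroup = RESEARCHDOM
        # Defer authentication to the existing NT domain controller
        security = domain
        password server = pdc.example.ac.uk
        encrypt passwords = yes

    [clusterdata]
        # Storage area on the cluster exported to the Windows desktops
        path = /data/cluster/share
        read only = no
        browseable = yes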

An additional benefit of using Samba in this way is that all members of the research group were able to access pre- and post-computation data files that had previously been maintained on individual machines. These files are also now backed up as part of the cluster’s backup procedures rather than being subject to varying personal backup regimes.

Supporting infrastructure

On a heterogeneous network composed of different integrated services there are some supporting services that must be in place for all the various public facing services to function. Probably the most vital of these services is the Domain Name Service (DNS). Without DNS the Internet would not function. Indeed without DNS most local networks would cease to function.

The DNS is a distributed global name resolution service that records all of the computer names on the Internet and their associated IP addresses. This is clearly a huge amount of data and it is the fact that the DNS is both distributed and hierarchical that makes storing and managing this amount of data possible. At the top of the DNS hierarchy are the top level domains: these are the domains found at the end of a DNS hostname such as com, org or uk. Below the top level domains there is the next tier which maps to the next dot separated part of a hostname if it is read right to left. This might be ibm, co or ac. The tiers of the hierarchy continue downwards mapping directly to the dot separated parts of a hostname.

The scale of the DNS is accommodated because at each level of the hierarchy a server need only hold the data for that level; data both above and below that level in the hierarchy can be held by other servers.

The DNS performs two roles:

  • It converts hostnames to IP addresses for local and remote computers.
  • It provides information about local servers such as mail or web servers to remote machines trying to locate those servers.

Typically a UK academic institution will have its own domain within the ac.uk domain. This domain will be delegated to that institution to subdivide and manage using their own DNS servers. When a domain has been delegated, the DNS servers at the ac.uk level are told that in order to find information within this delegated domain the institution’s DNS servers must be consulted because they are now authoritative for that domain. Very occasionally an institution may employ a third party, perhaps another institution or an ISP, to provide DNS services for their domain but this is rare.
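
As a sketch in zone-file form (all names and addresses are invented, using the documentation address range), a fragment of an institution’s zone covering both roles might contain entries such as:

    ; Fragment of the zone file for example.ac.uk (illustrative only)
    www   IN  A   192.0.2.10             ; maps a hostname to an IP address
    mail  IN  A   192.0.2.25
    @     IN  MX  10 mail.example.ac.uk. ; tells remote mail servers where to deliver mail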

The majority of DNS servers on the Internet are implemented using Berkeley Internet Name Domain (BIND) open source software from the Internet Systems Consortium. Microsoft also provides DNS server software which is seeing increased use on internal networks in order to support the needs of Active Directory (as discussed below).

Active Directory is a directory service provided by Microsoft and is an integral part of the Windows 2000 network architecture (and subsequent versions of Windows). It is the successor to the NT domain controller and can act in exactly the same way for backwards compatibility. It is implemented as a distributed database housed on domain controllers. Active Directory can act as the central authority for network security, providing authentication and authorization for Windows clients. In an integrated network it can also provide those services for non-Windows clients under certain conditions. For example it has already been seen that Samba running on a Unix platform can defer authentication to an Active Directory server if required. Active Directory can provide much more than authentication and authorization: it also acts as a single point of management for Windows based accounts, clients, servers, and applications.

Active Directory has many points of connection with DNS:

  • They have the same hierarchical structure. Indeed a DNS and Active Directory namespace can look identical.
  • Active Directory uses DNS as a locator service, resolving Active Directory domain, site, and service names to an IP address.
  • DNS zones can be stored in Active Directory.

DNS and Active Directory may sound similar but there are important differences:

  • DNS is a highly specialized name resolution service whilst Active Directory is a full blown directory service.
  • Active Directory needs DNS in order to operate but DNS can operate without Active Directory.

It is this last fact, that Active Directory needs DNS to operate, that has been the catalyst for many institutions taking steps to integrate the two services.

There are many other infrastructure components that play a part in the process of integration. Firewalls and other security devices may be used to protect services that become more exposed as a result of integration with other services. The Dynamic Host Configuration Protocol (DHCP) may be used to dynamically assign IP address and configuration information to a growing number of networked computers. Formerly isolated services may become part of a network backup strategy as they join the fray. Indeed, backup is another service that a Samba server can provide in addition to the services already discussed. With integration there is a natural tendency to move towards centralized, network wide solutions to the normal requirements of a single service. Clearly, duplicated effort and resource in providing local solutions for these common requirements is avoided as a more network centric approach becomes the norm.

Case study - DNS and Active Directory integration

The central IT service department of University D had to integrate Active Directory (AD) with the existing DNS. This integration was necessary as AD depends heavily on DNS and was catalysed by the growing number of Exchange servers. Exchange adds extra information into AD as it requires more records than other services. AD itself also needs to have extra information present in the DNS in order to work, most notably it needs SRV records.

An SRV record, also known as a service record, is a service location record that provides information on available services. This type of record was not commonly found in the DNS until recent times. Windows clients also use DNS to locate services but will fall back to NetBIOS naming unless this is explicitly turned off.
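
For illustration, the SRV record with which a domain controller advertises its LDAP service might look something like the line below; the names are invented, and in practice such records are registered automatically by the domain controller rather than written by hand.

    ; SRV record advertising a domain controller's LDAP service (illustrative)
    ; format: priority weight port target
    _ldap._tcp.dc._msdcs.addomain.example.ac.uk.  IN SRV  0 0 389  dc1.addomain.example.ac.uk.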

Microsoft’s view of DNS is that it should be dynamic and all servers and workstations register and update themselves. Therefore AD has been designed with this dynamic update mechanism in mind and AD will try to dynamically update the DNS with the extra information that it needs. Unfortunately, this dynamic update was at odds with this institution’s policy regarding the DNS. The local DNS is considered to be a very important data source, the integrity of which is vital, and dynamic update was seen as a threat to this integrity. The existing DNS structure is provided by BIND running on Unix/Linux servers and there was no desire to migrate the DNS to Windows DNS servers so BIND and AD had to be made to work together.

The DNS in this institution is managed centrally but has been subdivided into more than 200 subdomains which map onto each of the different colleges and departments of the university. Each of the subdomains could potentially want to run an AD domain. As the capacity for dynamic update was so crucial to AD, it was decided to delegate the necessary zones within each subdomain to a Windows DNS server (or servers) within the subdomain that could then work with the AD servers, allowing dynamic update only within these delegated zones. Initially 4, then 6, zones for a subdomain were delegated to a Windows server that then became authoritative for those zones. The zones were:

  • _sites.subdomain.institution.ac.uk
  • _tcp.subdomain.institution.ac.uk
  • _udp.subdomain.institution.ac.uk
  • _msdcs.subdomain.institution.ac.uk
  • domaindnszones.subdomain.institution.ac.uk (for 2003 server)
  • forestdnszones.subdomain.institution.ac.uk (for 2003 server)

Once these zones were delegated to the Windows DNS server, the fact that the server was running Microsoft’s DNS software meant that it could easily and securely allow dynamic updates within these zones. As dynamic update was only available within these delegated zones, the server could not register itself, as it would need to do this in the zone above, which was still under the control of the BIND servers. Therefore the Windows DNS server’s A record needed to be added to the BIND servers manually.
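
In the BIND zone file for the subdomain, the delegations and the manually added host record might look roughly like this (the server name and address are invented):

    ; In the BIND zone for subdomain.institution.ac.uk (illustrative only)
    ; Delegate the AD-related zones to the Windows DNS server
    _msdcs          IN NS  windns.subdomain.institution.ac.uk.
    _sites          IN NS  windns.subdomain.institution.ac.uk.
    _tcp            IN NS  windns.subdomain.institution.ac.uk.
    _udp            IN NS  windns.subdomain.institution.ac.uk.
    domaindnszones  IN NS  windns.subdomain.institution.ac.uk.
    forestdnszones  IN NS  windns.subdomain.institution.ac.uk.

    ; The Windows DNS server cannot register itself in this zone, so its
    ; A record is maintained in BIND by hand
    windns          IN A   192.0.2.53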

Windows clients have automatic registration enabled by default and they were now able to dynamically register themselves in the Windows DNS server. Additionally the Windows DNS server allows dynamic registration of entries such as ldap, kerberos, and the global catalogue server, which Windows clients use to locate the relevant service. The most important entry that a Windows client looks for is the domain controller for its domain. However, clients were still pointing at the central BIND DNS because the Windows DNS server only has information about those delegated zones. The process of a client trying to locate its domain controller is therefore, in outline: the client asks the central BIND DNS for the domain controller’s SRV record, the query is passed via the delegation to the Windows DNS server that is authoritative for those records, the dynamically registered SRV record is returned, and the client then contacts the domain controller it names.

The approach brings the following benefits:

  • Takes advantage of the secure update mechanism of Windows DNS.
  • Keeps dynamic and static DNS separate.
  • No extra domains added at the institutional level. (Some institutions added a top level AD domain.)

This solution does however impose some constraints and extra work:

  • Restricted to existing DNS domains.
  • Needed to set up 6 zones per domain on the domain controllers. However, it is only necessary to set them up on one domain controller, as this will be replicated to any other domain controllers.

XML

At first glance it may be difficult to see the relevance of text markup when it comes to the integration of services. However, some form of structured text markup is being used more and more as the glue that joins different systems together. Perhaps the most familiar markup language in use today is the Hyper Text Markup Language (HTML), the foundation of the web.

A markup language provides structure by including special commands in a text document. These commands are themselves plain text, but are distinguished by some kind of escape characters; thus in XML all command words are surrounded by angle brackets. Markup languages were developed within the publishing industry as people in this business not only need access to the text to be printed but also need to know how that text needs to be treated. For example it is important to differentiate between the body text, footnotes, titles, front pages etc. HTML is a specialized markup language developed for presenting text on the web but the widespread use of the Internet made it inevitable that the use of markup languages would spill over into more general areas of IT.

Recent years have seen a rise in the use of general purpose markup languages. The best known of these is the eXtensible Markup Language (XML). The primary purpose of XML is to encode data in a way that allows that data to be shared across different systems. This is, of course, where it becomes important for integration.

XML can be used in many different contexts. For example, many institutions put significant effort into creating material for their website. That material may also be useful as printed material, perhaps in a prospectus, perhaps in conference flyers. A central tenet of the use of markup languages is that the content is separated from the presentation. If the material is written in XML then the material can be stored in a way that separates the content from the presentation. Thus the presentation can be altered according to the situation; the material on the website may look very different from the way that it is presented in a prospectus but both originate from the same data.

Another example of the use of XML is in encoding data that needs to move between different software applications as illustrated in the case study below. XML itself imposes certain syntax constraints on data but there is an additional layer, in the form of an XML schema, which imposes constraints on the structure and contents of a particular XML document. Clearly, if the format and structure of a document follows known rules then systems that know those rules can use the data contained within the document. The XML schema is the key to integrating systems using XML data.

Whilst it can be simpler to implement, storing data as plain text files has many disadvantages when compared to XML. Plain text files cannot model hierarchical data, they are susceptible to corruption through common problems such as accidental line breaking, and they need additional documentation to identify what each field denotes. XML does not have these disadvantages; indeed, a big strength of XML is that the special commands within the angle brackets explicitly state what each field denotes, thereby making XML files self documenting.
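
To make the contrast concrete, here is a single invented record, first as a line in a delimited plain text file, where the meaning of each field must be documented elsewhere:

    s1234567,Smith,Jane,jane.smith@example.ac.uk

and then as an equivalent XML fragment, where the markup itself states what each field denotes:

    <student>
      <id>s1234567</id>
      <surname>Smith</surname>
      <forename>Jane</forename>
      <email>jane.smith@example.ac.uk</email>
    </student>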

Case study - using XML to integrate a Virtual Learning Environment and a student records system

University E provides a Virtual Learning Environment (VLE) using Blackboard, a proprietary VLE. Currently, the account creation and course enrolment operations are performed manually using spreadsheets generated from the data held in the student records database, another proprietary system. This manual process is both time consuming and prone to error.

Fortunately, the VLE software provides a tool to perform bulk uploads of account and enrolment data. This tool supports operations on data given either as delimited plain text files or as XML encoded data. It was decided that the student record data would be best handled as XML as this would provide a path towards the institution’s longer term goal of creating a fully fledged Managed Learning Environment (MLE). It would be easy to extract the relevant data from the student records system as delimited plain text files but this would be in a form specific to the VLE and would be unlikely to be of use to any other system in this form. With XML however, there is the potential to store the data in a way that can be useful to multiple systems.

Of course just storing the data in XML is not enough; the schema used to store the XML has to be understood by all the systems that wish to use it. To this end, the IMS Global Learning Consortium, a not-for-profit organization, has developed a set of specifications that have become widely accepted standards supporting the interoperability of learning objects. In other words, provided that all systems understand and support the IMS XML specifications, and that the data from the student records system is published using the IMS XML schema, the VLE will be able to use this data. Indeed, any IMS compliant system will be able to use this data. Since both the student records system and the VLE are proprietary, support for IMS had to be provided by the vendor, but in this case both products were IMS compliant.

Complex database scripts were written to extract the relevant data from the student records database and the data was stored as XML using the IMS schema. The data concerning account creation was kept separately from the data concerning course enrolment.
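
As an indication of the kind of data involved, a fragment loosely modelled on the IMS Enterprise specification for a single person record might look something like the sketch below. The element names and structure here are only illustrative; the actual schema version and required elements would need to be checked against the IMS specifications and the vendors’ documentation.

    <!-- Illustrative sketch only; not a complete or validated IMS Enterprise document -->
    <enterprise>
      <person>
        <sourcedid>
          <source>Student Records System</source>
          <id>s1234567</id>
        </sourcedid>
        <userid>jsmith</userid>
        <name>
          <fn>Jane Smith</fn>
        </name>
        <email>jane.smith@example.ac.uk</email>
      </person>
    </enterprise>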

Once the data from the student records system was available as XML, then the tool provided with the VLE could be used to process these XML files. The VLE accounts could be created and once created, users could be enrolled on the appropriate courses. It should be noted that the student records system did not contain all the necessary information about available courses; these still had to be created manually at the start of each academic year. Once the course data had been created, the automatic account creation and subsequent enrolment could be performed.

The VLE’s management tool offered an option to keep the two data sources (the student records data and the VLE account data) fully synchronized. This meant that if an account was no longer present in the feed from the student records system then it would automatically be deleted from the VLE’s account database. After testing, the database scripts and the VLE management tool were all left to run automatically each night so that the VLE accounts were synchronized with the student records system.

Future work on this integration project includes automating the creation of course data. Eventually it is hoped that the student records system will provide tools that allow the transfer of grades and exam results from the VLE back into the student records system but this is dependent on the vendor of the student records software. This again would be achieved using XML and a common schema, the data being derived from the VLE and read into the student records system.

Summary

Integration is a classic example of a situation where effort committed initially reaps much of its benefit later on. It can seem like a great deal of unnecessary effort to integrate systems that are quite happy ploughing their own furrow. However, integration can bring great benefits, both short and long term, which is why so many institutions have invested considerable effort in this sphere. The benefits include:

  • Avoidance of duplication of data - data held in just one place automatically has higher integrity and the administration burden of managing just one data source is lower than managing many copies.
  • Avoidance of duplication of effort - if every new system handles its own account management and authentication then the effort of setting up ten new servers is ten times that of setting up one new server: there is no economy of scale to be had. If however, a common authentication mechanism is used then each new additional system need only use that mechanism rather than having to be locally configured. This would represent a significant saving of system administration time.
  • Better service to the end user - the ability to access data from anywhere is often cited as a desire by end users. Even more important to users however, is the idea of single sign on where a single act of authenticating gives access to all the services required. Predictably, people prefer to have just one username and password to remember, and many problems are avoided when end users do not have to manage multiple accounts.
  • Joined up service provision - rather than providing isolated, disparate services, a service provider who integrates services into a cohesive whole is better positioned to both influence and implement institutional strategy.

Alongside effort, integration requires the use of standards and protocols to facilitate communication between services. Once services are communicating then some or all of the key elements necessary for integration can be provided: elements such as a directory service, a common authentication mechanism, and filestore access. An integrated solution usually relies on supporting infrastructure technologies such as the DNS and Active Directory and data is often encoded using a markup language like XML. All of these issues must be considered carefully when planning an integrated solution.
