Caching for E-Learning
Keith Baker – 5th July 2002

Overview
The requirement for high-bandwidth Internet connectivity for schools has long been recognised, and moves to provide this are already under way. However, it is now also recognised that even a 10Mbps connection is insufficient for the simultaneous delivery of emerging e-learning content to many students: a single digital video stream alone could take 1.5Mbps, so only a handful of streams would saturate the link.
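As a rough illustration of the arithmetic behind this claim, the sketch below estimates how many video streams a link could carry at once. The function name is hypothetical, and it assumes the whole link is available for video, with no protocol overhead or competing traffic, so the real figure would be lower.

```python
import math


def max_simultaneous_streams(link_mbps: float, stream_mbps: float) -> int:
    """Return how many whole streams fit into the link at the same time.

    Assumption: the entire link capacity is usable for streaming.
    """
    if link_mbps < 0 or stream_mbps <= 0:
        raise ValueError("link_mbps must be >= 0 and stream_mbps must be > 0")
    return math.floor(link_mbps / stream_mbps)


# Figures taken from the text: a 10Mbps connection and 1.5Mbps per video stream.
print(max_simultaneous_streams(10, 1.5))  # 6
```

On these figures a class of thirty students could not each watch a separate video over the school's Internet connection.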

To overcome this, emphasis is currently being placed on local caching as a means of content delivery, but traditional caching technologies alone will not solve the problem. However, a specialised ‘caching appliance’ with suitable additional functionality to store, update and serve e-learning content locally would be an acceptable solution.

Traditional Caching
Caching is the technique of storing previously requested data (usually ftp and http web traffic) geographically closer to those who may require it in the future. For example, if an ISP has a cache and one of its customers requests a web page from a US site that has not been requested before, the cache will automatically add a copy of this page to its data store as it passes through. If another customer then requests the same web page, it will be delivered directly from the cache, saving the delay of requesting and retrieving the data from the original US site.
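The behaviour described above can be sketched in a few lines. This is a minimal illustration only: the class name, the `fake_origin` function and the example URL are hypothetical, and a real cache would also handle expiry, size limits and headers that forbid caching.

```python
from typing import Callable, Dict, List


class SimpleCache:
    """Keeps a copy of each response the first time it is requested."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            # Already requested by someone: serve the local copy.
            self.hits += 1
            return self._store[url]
        # First request: go to the origin and keep a copy as it passes through.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = body
        return body


# Stand-in for the distant web site; it records every request it receives.
origin_requests: List[str] = []


def fake_origin(url: str) -> bytes:
    origin_requests.append(url)
    return b"<html>" + url.encode() + b"</html>"


cache = SimpleCache(fake_origin)
cache.get("http://example.com/page.html")  # first customer: fetched from origin
cache.get("http://example.com/page.html")  # second customer: served from cache
print(cache.hits, cache.misses, len(origin_requests))  # 1 1 1
```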

The Internet is rich in caches, as they save both time and bandwidth. In practice, a request for a web page may pass through many caches en route and may be answered by one of them, rather than by the actual web site, if the page has previously passed through that cache and a copy was saved.

The most efficient form of caching from an end-user perspective is boundary caching, as it stores the data on a cache at the edge of the end user’s own network. This is particularly successful where Internet controls are also in place to restrict web access to a pre-defined list of web sites, as these could all eventually be cached locally.

Why traditional caching alone is not suitable for e-learning
A limitation of a cache is that it stores data sequentially as and when it arrives, and not in any ordered form. So, even if a complete web site had been cached over a period of time, it would be interspersed with other sites and web pages on the same caching device, and searching for each page would introduce its own delays. Indeed, it may be quicker to re-visit a web site in the US than to search a 5Gbyte local cache for it, especially considering the bandwidth now available to many end users.

A typical e-learning site could comprise many tens of gigabytes of data: far too big to fit into standard caching products and, even if it did fit, far too big to be navigated efficiently, as a new search of the cached content would be required for every page requested.

It is also worth considering how updates to content would be made. How would the local cache know that new or modified content had been added to the original source data, and that it should be updated?

Additional functions required for e-learning
For e-learning material to be delivered effectively at academic establishments, it should be served from a local source that replicates the original data in a structured manner. The local cache providing this function should also be capable of automatic content updating as and when required, and of delivering its content to many students simultaneously, all of which is beyond the capability of standard caching technologies. A ‘caching appliance’, however, may be suitable if it includes the following functionality.

Structured data store
The appliance should faithfully replicate the directory and file structure of the source content, enabling the speedy retrieval and delivery of the content without unnecessary searching.
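One way to picture a structured store is as a direct mapping from each source URL to a file at the same relative path on the appliance, so that no search is needed. The sketch below is illustrative only: the function name, the host name, the `/srv/elearning` root and the use of `index.html` for directory URLs are all assumptions.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def local_path_for(url: str, store_root: Path) -> Path:
    """Map a source URL to the matching file under store_root.

    The host name and the URL path are reproduced as directories, so the
    local copy mirrors the layout of the source content.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError("URL must include a host name")
    rel = PurePosixPath(unquote(parts.path).lstrip("/"))
    if ".." in rel.parts:
        # Refuse paths that would climb out of the store.
        raise ValueError("path escapes the store")
    if parts.path == "" or parts.path.endswith("/"):
        # Assumption: directory URLs are served from index.html.
        rel = rel / "index.html"
    return store_root.joinpath(parts.hostname, *rel.parts)


root = Path("/srv/elearning")  # hypothetical location of the store
print(local_path_for("http://content.example.org/maths/unit1/video.mpg", root))
```

Because the location of every file follows directly from its URL, retrieval is a single file-system lookup rather than a search through the cache.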

Automatic ftp mirroring (data pull)
The caching appliance should be capable of automatically checking its content against a remote server containing the source content data, and copying any new or modified files and directories. This would usually be done daily or weekly, during the night or at the weekend.
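The comparison step of such a mirror can be sketched as follows. This is a simplified illustration, not a description of any particular product: it assumes each side can be summarised as a manifest of path, size and modification time, and it leaves out how the remote manifest is obtained (for instance from an ftp directory listing) and the file transfer itself. All names and figures are hypothetical.

```python
from typing import Dict, List, Tuple

# path -> (size in bytes, modification time as seconds since the epoch)
Manifest = Dict[str, Tuple[int, float]]


def files_to_copy(remote: Manifest, local: Manifest) -> List[str]:
    """Return the paths that are new or modified on the remote server."""
    wanted = []
    for path, (size, mtime) in remote.items():
        if path not in local:
            wanted.append(path)  # new file
            continue
        local_size, local_mtime = local[path]
        if size != local_size or mtime > local_mtime:
            wanted.append(path)  # modified file
    return sorted(wanted)


remote = {
    "maths/index.html": (2048, 1025000000.0),
    "maths/video.mpg": (50_000_000, 1025500000.0),
    "science/index.html": (1024, 1025600000.0),
}
local = {
    "maths/index.html": (2048, 1025000000.0),
    "maths/video.mpg": (48_000_000, 1025100000.0),
}
print(files_to_copy(remote, local))
# ['maths/video.mpg', 'science/index.html']
```

Only the changed and new files are transferred, which keeps the nightly update small even when the full content runs to many gigabytes.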

Central ftp distribution (data push)
The appliance should also be capable of receiving updates and modifications proactively sent to it from the remote server holding the source content data. This is useful where many local caching appliances under the control of a single authority are remotely updated by that authority.

Web serving
E-learning content is almost exclusively web-based, so a web server built into the appliance would ensure its delivery locally without additional hardware. This would also add resilience against network failure, as all local PCs would point directly to the web server on the appliance, which would in turn deliver content from its local replica of the source.
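To show how little is needed to serve a local replica over http, the sketch below uses the web server from the Python standard library. It is an illustration under stated assumptions, not a production configuration: the function name, the `/srv/elearning` directory and port 8080 are hypothetical, and a real appliance would more likely use a dedicated web server.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_content_server(content_root: str, port: int = 8080,
                        host: str = "") -> ThreadingHTTPServer:
    """Build a web server that serves files from the local replica.

    Each request is handled in its own thread, so several students can be
    served at the same time.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=content_root)
    return ThreadingHTTPServer((host, port), handler)


# To run it on the appliance (blocks until interrupted):
#   make_content_server("/srv/elearning", 8080).serve_forever()
```

Local PCs would then be pointed at the appliance's address, and requests would be answered from the local disk even if the Internet link were down.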

Management statistics & logs
It is likely that both the controlling authority and the content provider would be interested in the access patterns and statistics of users, and these should be presented in a standard format ready for easy analysis. This data may also form the basis of charging for the service. It is also important that all updates are recorded, and errors reported if an update fails. These statistics and logs should be automatically transmitted to a central administrator on a regular basis.

Furthermore, where a single authority manages many caching appliances, messages relating to potential or actual problems should be automatically transmitted to a central administrator.
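As an illustration of the access statistics discussed above, the sketch below summarises web server log lines into requests per page and total bytes delivered. It assumes the logs are written in the widely used Common Log Format; the text only asks for ‘a standard format’, so this choice, the function name and the sample lines are all assumptions.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# client ident user [timestamp] "METHOD path protocol" status bytes
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)$'
)


def summarise(lines: Iterable[str]) -> Tuple[Counter, int]:
    """Return (requests per path, total bytes delivered)."""
    hits: Counter = Counter()
    total_bytes = 0
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        _client, _method, path, _status, size = match.groups()
        hits[path] += 1
        if size != "-":  # "-" means no body was sent
            total_bytes += int(size)
    return hits, total_bytes


sample = [
    '192.168.1.20 - - [05/Jul/2002:09:15:02 +0100] "GET /maths/unit1/index.html HTTP/1.0" 200 2048',
    '192.168.1.21 - - [05/Jul/2002:09:15:09 +0100] "GET /maths/unit1/video.mpg HTTP/1.0" 200 5000000',
    '192.168.1.22 - - [05/Jul/2002:09:16:40 +0100] "GET /maths/unit1/index.html HTTP/1.0" 200 2048',
]
hits, total = summarise(sample)
print(hits.most_common(1), total)
# [('/maths/unit1/index.html', 2)] 5004096
```

A summary of this kind could be produced on each appliance and sent to the central administrator alongside the update logs.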

Performance
It is likely that the combined content of providers of e-learning packages, such as Espresso and the BBC, will exceed 100Gbytes within the next three years, so a hard drive larger than this should be the minimum recommended for units presently being installed.

The appliance should also be capable of the simultaneous delivery of e-learning content, such as digital video, to many tens of users. A resilient and high-performance operating system, such as Linux, would be preferable, as it would also enable more users to be served than from an equivalent platform running a Microsoft operating system.

 
