Caching for E-Learning
Keith Baker – 5th July 2002
Overview
The requirement for high-bandwidth Internet connectivity for
schools has long been recognised, and moves to provide it
are already underway. However, it is now also recognised that
even a 10 Mbps connection is insufficient for the simultaneous
delivery of emerging e-learning content to many students:
a single digital video stream alone could take 1.5 Mbps.
To overcome this, emphasis is presently
being put on local caching as a means of content delivery,
but traditional caching technologies alone will not solve
the problem. However, a specialised ‘caching
appliance’ with suitable additional functionality to
store, update and serve e-learning content locally would be
an acceptable solution.
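As a rough illustration of the arithmetic behind this claim, the
sketch below counts how many video streams fit into a link of a
given capacity. The function name is hypothetical, and the sketch
assumes the whole link is available to video, with no protocol
overhead and no other traffic.

```python
def max_streams(link_mbps: float, stream_mbps: float = 1.5) -> int:
    """Return how many whole streams of stream_mbps fit into link_mbps.

    Assumption: the entire link is available to video traffic.
    """
    if link_mbps < 0 or stream_mbps <= 0:
        raise ValueError("link_mbps must be >= 0 and stream_mbps must be > 0")
    return int(link_mbps // stream_mbps)


# A 10 Mbps connection carrying 1.5 Mbps video streams.
print(max_streams(10))  # prints 6
```

On these assumptions a 10 Mbps connection carries only six
simultaneous streams, which is why delivery from a local source
is attractive.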
Traditional Caching
Caching is the technique of storing previously requested data
(usually FTP and HTTP web traffic) geographically
closer to those who may require it in the future. For example,
if an ISP has a cache, and one of its customers requests a
web page from a US site, which has not been requested before,
the cache will automatically add a copy of this page to its
data store as it passes through. If another customer then
requests this same web page, it will be delivered directly
from the cache, hence saving the delays of requesting and
retrieving the data from the original US site.
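The behaviour described above can be sketched as follows. This is
a minimal illustration only: the class name, the `fetch_from_origin`
callable and the example URL are assumptions, and a real cache would
also handle expiry, storage limits and the caching headers sent by
the origin server.

```python
from typing import Callable, Dict


class SimpleCache:
    """Keeps a copy of every response that passes through it."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin is a hypothetical function that retrieves
        # a URL from the original (possibly distant) web site.
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            # Requested before: deliver directly from the cache.
            self.hits += 1
            return self._store[url]
        # First request: fetch from the origin and save a copy.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = body
        return body


def slow_origin(url: str) -> bytes:
    # Stand-in for a request to the original site.
    return b"<html>page for " + url.encode() + b"</html>"


cache = SimpleCache(slow_origin)
cache.get("http://www.example.com/page.html")  # first customer: miss
cache.get("http://www.example.com/page.html")  # second customer: hit
print(cache.hits, cache.misses)  # prints 1 1
```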
The Internet is rich in caches as they save
both time and bandwidth. In practice a request for a web
page may pass through many caches en route and may be delivered
from one of them, rather than from the actual web site, if the
page had previously passed through that cache and a copy had
been saved.
The most efficient form of caching from
an end-user perspective is ‘boundary caching’, as it stores
the data on a cache at the edge of their own network. This
is particularly successful where Internet controls are also
in place to restrict web access to a pre-defined list of web
sites, as they could all eventually be cached locally.
Why traditional caching alone is not suitable for e-learning
A limitation of a cache is that it stores data sequentially
as and when it arrives, and not in any ordered form. So, even
if a complete web site had been cached over a period of time,
it would be interspersed with other sites and web pages on
the same caching device, and each page request would incur its
own delay whilst the cache was searched. Indeed, it may be
quicker to re-visit a web site in the US than to search a
5 Gbyte local cache for it, especially considering the
bandwidth now available to many end users.
A typical e-learning site could comprise many
tens of gigabytes of data, far too big to fit into standard
caching products, and far too big to be navigated efficiently
even if it did fit, as a new search of the cached content would
be required for every page requested.
It is also worth considering how updates
to content would be made. How would the local cache know when
new or modified content had been added to the original source
data, and that it should be updated?
Additional functions required for e-learning
For e-learning material to be delivered effectively at academic
establishments, it should be served from a local source that
replicates the original data in a structured manner. The local
cache providing this function should also be capable of automatic
content updating as and when required, and capable of delivering
its content to many students simultaneously, all of which
is beyond the capability of standard caching technologies.
A ‘caching appliance’, however, may be suitable
if it includes the following functionality.
Structured data store
The ability to faithfully replicate the directory and file
structure of the source content, which will enable the speedy
retrieval and delivery of the content without unnecessary
searching.
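One way to picture a structured store is as a direct mapping from
each source URL to a file path that mirrors the source layout, so
that a page is located by its path rather than by searching. The
sketch below assumes a hypothetical mirror root of `/srv/mirror`
and a hypothetical function name; it illustrates the idea and does
not describe any particular product.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def local_path(url: str, root: str = "/srv/mirror") -> str:
    """Map a source URL to the matching path in the local replica.

    root is an assumed location for the replicated content.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host:
        raise ValueError("URL must include a host name")
    segments = [s for s in unquote(parts.path).split("/") if s not in ("", ".")]
    if ".." in segments:
        raise ValueError("path traversal is not allowed")
    # A directory request is answered with that directory's index page.
    if parts.path.endswith("/") or not segments:
        segments.append("index.html")
    return str(PurePosixPath(root, host, *segments))


print(local_path("http://www.example.org/maths/unit1/video.mpg"))
# prints /srv/mirror/www.example.org/maths/unit1/video.mpg
```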
Automatic FTP mirroring (data pull)
The caching appliance should be capable of automatically checking
its content against a remote server containing the source
content data, and copying any new or modified files and directories.
This would usually be done on a daily or weekly basis, and
occur during the night or weekend.
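The comparison step of such a mirror can be sketched as below. The
sketch assumes that a listing of each side is already available as
a mapping from file path to size and modification time; how that
listing is obtained (for example from an FTP directory listing) is
outside its scope, and the names `Manifest` and `files_to_pull`
are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed manifest shape: path -> (size in bytes, modification time).
Manifest = Dict[str, Tuple[int, float]]


def files_to_pull(remote: Manifest, local: Manifest) -> List[str]:
    """Return the remote paths that are new or differ from the local copy."""
    return sorted(
        path for path, meta in remote.items() if local.get(path) != meta
    )


remote = {
    "maths/unit1/index.html": (5120, 1025859600.0),
    "maths/unit1/video.mpg": (73400320, 1025859600.0),
    "maths/unit2/index.html": (4096, 1025946000.0),
}
local = {
    "maths/unit1/index.html": (5120, 1025859600.0),   # unchanged
    "maths/unit1/video.mpg": (73400320, 1025773200.0),  # older copy
}
print(files_to_pull(remote, local))
# prints ['maths/unit1/video.mpg', 'maths/unit2/index.html']
```

A scheduler would run this comparison nightly or weekly and copy
only the listed files.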
Central FTP distribution (data push)
The appliance should also be capable of receiving updates
and modifications proactively sent to it from the remote
server holding the source content data. This is a useful function
where many local caching appliances are under the control of a
single authority and are remotely updated by that authority.
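From the central authority's side, a push to many appliances might
be organised as in the sketch below. The `send` callable stands in
for whatever transfer mechanism is used (an FTP upload, for
instance) and, like the host names and the function name, is an
assumption made for illustration. Each outcome is recorded so that
a failed update can be reported.

```python
from typing import Callable, Dict, Iterable


def push_update(
    appliances: Iterable[str],
    package: bytes,
    send: Callable[[str, bytes], None],
) -> Dict[str, str]:
    """Send one update package to every appliance and record each outcome."""
    results: Dict[str, str] = {}
    for host in appliances:
        try:
            send(host, package)
            results[host] = "ok"
        except OSError as exc:
            # One unreachable appliance must not stop the others.
            results[host] = f"failed: {exc}"
    return results


def fake_send(host: str, package: bytes) -> None:
    # Stand-in transfer function: one school is unreachable.
    if host == "school-b.example.org":
        raise OSError("unreachable")


print(push_update(["school-a.example.org", "school-b.example.org"], b"update", fake_send))
# prints {'school-a.example.org': 'ok', 'school-b.example.org': 'failed: unreachable'}
```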
Web serving
E-learning content is almost exclusively web-based, so a web
server built into the appliance would ensure its delivery
locally without additional hardware. This would also add resilience
against failure of the Internet connection, as all local PCs
would point directly to the web server on the appliance, which
would in turn deliver the locally held replica of
the source content.
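A minimal sketch of this arrangement, using only the Python
standard library, is shown below. The function name, the default
port and the idea of passing the replica directory as
`content_root` are assumptions for illustration; a production
appliance would use a hardened web server.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(
    content_root: str, host: str = "", port: int = 8080
) -> ThreadingHTTPServer:
    """Build a web server that serves files from the local replica.

    content_root is the directory holding the replicated content.
    An empty host listens on all interfaces so that local PCs can connect.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=content_root)
    # Each request is handled in its own thread, so several students
    # can be served at the same time.
    return ThreadingHTTPServer((host, port), handler)
```

Calling `make_server("/srv/mirror").serve_forever()`, with
`/srv/mirror` as a hypothetical replica location, would then
deliver the content to local PCs without any request leaving the
site.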
Management statistics & logs
It is likely that both the controlling authority and the content
provider would be interested in the access patterns and statistics
of users, and these should be presented in a standard format
ready for easy analysis. This data may also form the basis
of charging for the service. It is also important that all
updates are recorded, and errors reported if an update fails.
These statistics and logs should be automatically transmitted
to a central administrator on a regular basis.
Furthermore, where a single authority manages
many caching appliances, messages relating to potential or
actual problems should be automatically transmitted to a central
administrator.
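As an illustration of the kind of analysis this enables, the sketch
below summarises web server access logs. It assumes the logs are
written in the widely used Common Log Format; the source content
provider or authority may well require a different format, and the
function name and sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed line layout (Common Log Format):
# host ident user [timestamp] "METHOD path protocol" status bytes
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)$'
)


def summarise(lines: Iterable[str]) -> Tuple[Counter, int, int]:
    """Return (successful hits per path, total bytes sent, unparsable lines)."""
    hits: Counter = Counter()
    total_bytes = 0
    skipped = 0
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            skipped += 1
            continue
        _host, _method, path, status, size = match.groups()
        if status == "200":
            hits[path] += 1
        if size != "-":
            total_bytes += int(size)
    return hits, total_bytes, skipped


sample = [
    '10.0.0.5 - - [05/Jul/2002:09:15:02 +0100] "GET /maths/unit1/index.html HTTP/1.0" 200 5120',
    '10.0.0.6 - - [05/Jul/2002:09:15:07 +0100] "GET /maths/unit1/index.html HTTP/1.0" 200 5120',
    '10.0.0.6 - - [05/Jul/2002:09:16:11 +0100] "GET /maths/missing.html HTTP/1.0" 404 -',
    "not a log line",
]
hits, total_bytes, skipped = summarise(sample)
print(dict(hits), total_bytes, skipped)
# prints {'/maths/unit1/index.html': 2} 10240 1
```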
Performance
It is likely that the combined content of providers of e-learning
packages, such as Espresso and the BBC, will exceed 100 Gbytes
within the next three years, so a hard drive larger than this
should be the minimum recommended for units presently being
installed.
The appliance should also be capable
of the simultaneous delivery of e-learning content, such as
digital video, to many tens of users. A resilient and high
performance operating system, such as Linux, would be preferable,
as it would also enable more users to be served than from an
equivalent platform running a Microsoft operating system.
