Susannah Ross
Communication Services

 



Using a website to open up the archive

by Julia Swann and Susannah Ross
(Free Pint - www.freepint.com - 28th April 2005)

When people say "It's all on the Web now", the response is usually "Yes, but only if you can find it". In quite a few cases, the response would be "No it's not; it's still in leatherbound books on our shelves!" This article is about how the Newcomen Society, a learned society dedicated to the history of engineering and technology, made its archive of fascinating research papers available through its website and at the same time attracted new members and created a welcome new source of income for the Society.

75 volumes: 1,000 papers

The Society's Transactions, containing research papers about topics as varied as windmills, railways and vacuum cleaners, date back to 1920. Today's volumes are all produced in electronic form, and copies of recent volumes are available. Until recently, however, anyone who happened to know of the existence of a paper in one of the early volumes had to ask the Society to make a photocopy. These early volumes, some of them finely bound and long out of print, were becoming somewhat battered. Fortunately, a legacy from a member offered an opportunity both to preserve the Society's archive and to make it available to a wider audience through the Web.

Making electronic copies and a database

The volumes were unstitched and each page scanned. All 75 volumes of the Transactions are now in electronic form on a CD, which is used as the master copy for any printouts, thus preserving the fragile originals for posterity. This first part of the job was undertaken by a firm of specialists, Somcom. The second part of the job, also successfully completed by Somcom, was to mount the papers on the Web and provide a database with a search facility and an online payment system.

All available now?

It's a common misconception that once an archive is in electronic form and available online, it will be found by anyone using a search engine. What people forget is that the search engine robots will not usually be able to gain access to data in a database - especially valuable data that is supposed to generate income for its owners. To gain access to the Newcomen Society's data, the robots would need to formulate the same queries as a human being (and then get out their credit or debit cards!). The consequence was that the papers were still really only available to those who already knew how to find the site and who, once there, queried the database and then chose to buy papers in their particular area of interest. For example, anyone who typed "davy lamp" into a search engine would not be directed to the relevant paper, as the keywords would remain hidden in the database.

Enter the website

This is where the website came in. The role of the Society's main website is to draw in visitors who are interested in the subject but are not aware of the existence of the Society or its website. This website, like so many other societies' sites, was originally seen as a way of informing existing members of forthcoming meetings, visits and events, and of providing them with other useful information. It was not generating visits by non-members, unless they had already heard of the Newcomen Society, nor was it being used to show off the Society's main asset - its splendid research papers.

Using the table of contents

The next task was to get all the titles of the papers up on the Web in a form that would be attractive to the search engines. The table of contents, listing all the papers in the Transactions, was scanned, converted to machine-readable text (OCRd) and then mounted on the Web as a single HTML file. This file, however, was far too big and slow to load and needed to be split into manageable chunks.

A mini-database behind the scenes

By importing the OCRd table of contents into an Access database, it was possible to search and sort it in many different ways. Thus it could be presented in smaller quick-loading files, selected alphabetically by author (A-C, D-E, F-H etc.). Fortunately many of the titles were quite long and descriptive, containing a wealth of useful keywords, so almost instantly the traffic to the website increased.
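The splitting step described above can be sketched in a few lines of Python. This is only an illustration, not the Society's Access database: the sample rows, the field names and the letter ranges after F-H are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample rows; the real table of contents is not reproduced here.
PAPERS = [
    {"author": "Foster, A.", "title": "An example paper on windmills"},
    {"author": "Adams, B.", "title": "An example paper on canals"},
    {"author": "Dixon, C.", "title": "An example paper on railways"},
    {"author": "Baker, D.", "title": "An example paper on steam engines"},
]

# The article mentions A-C, D-E and F-H; the remaining ranges are assumed.
RANGES = [("A", "C"), ("D", "E"), ("F", "H"), ("I", "L"), ("M", "R"), ("S", "Z")]


def range_label(author):
    """Return the page label (e.g. 'A-C') for an author's surname."""
    name = author.strip()
    if not name:
        return "other"
    initial = name[0].upper()
    for first, last in RANGES:
        if first <= initial <= last:
            return f"{first}-{last}"
    return "other"


def split_by_author(papers):
    """Group papers into one small page per letter range, sorted by author."""
    pages = defaultdict(list)
    for paper in papers:
        pages[range_label(paper["author"])].append(paper)
    for entries in pages.values():
        entries.sort(key=lambda p: (p["author"], p["title"]))
    return dict(pages)


if __name__ == "__main__":
    for label, entries in sorted(split_by_author(PAPERS).items()):
        print(label)
        for paper in entries:
            print("  ", paper["author"], "-", paper["title"])
```

Each group can then be written out as its own small HTML file, which keeps every page quick to load while leaving all the descriptive titles visible to the search engines.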

Presenting the information better

A list of titles by authors A-C is not the most attractive or usable page to present to casual visitors to your website. Given the database of titles, it was not a huge job to search on a set of keywords, collect together the titles on themes such as canals, bridges, steam engines, mills and electronics, and produce a separate Web page for each theme. This also had the advantage of producing pages with greater keyword density: for example the word "mining" might occur eight times on a themed page, as opposed to once or twice in the alphabetical list of authors. And a list of papers on one topic makes a more interesting menu for the casual visitor than a list of titles by author.
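As a rough illustration of the themed pages and of the keyword-density point, the sketch below selects titles by keyword and counts how often a word appears on the resulting page. The titles and the keyword lists are hypothetical, invented for the example; the Society's own work was done in its Access database.

```python
import re

# Hypothetical titles standing in for the real table of contents.
TITLES = [
    "Early mining in Cornwall",
    "Mining pumps and steam power",
    "The canal age",
    "A note on canal locks and mining tramways",
]

# Assumed theme definitions: each theme is a list of search keywords.
THEMES = {
    "canals": ["canal"],
    "mining": ["mining"],
}


def titles_for_theme(titles, keywords):
    """Return the titles that contain any of the theme's keywords."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [title for title in titles if pattern.search(title)]


def count_keyword(text, keyword):
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    return len(re.findall(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE))


if __name__ == "__main__":
    for theme, keywords in THEMES.items():
        page = titles_for_theme(TITLES, keywords)
        print(theme, "->", len(page), "titles")
    mining_page = " ".join(titles_for_theme(TITLES, THEMES["mining"]))
    print("'mining' occurrences on the themed page:", count_keyword(mining_page, "mining"))
```

Because every title on a themed page was selected for the same keyword, that word is repeated on the page far more often than it would be in an alphabetical list of the same length.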

Advertising your wares

The casual visitor, interested in canals for example, may well be pleased to see a list of titles of research papers on canals, but lists still make rather dry reading. For this reason - and to give a taste of what is in the archive - a handful of papers were picked out and written up in 'highlights', which could be found via quick links on the home page. This was rather laborious work (though quite good fun), as it involved browsing through the archive, reading a few papers quite carefully, selecting some lively passages with good illustrations, and wrapping each excerpt around with some useful context.

Search engines love depth

The result of the 'highlights', measured by log analysis and the number of requests for full copies of the original papers, has been quite dramatic. The reason that these pages are being found so easily is, once again, keyword density. But with excerpts rather than just titles, the keywords can be much more specific. For example, whereas the term "Mulberry harbours" appears only once in the themed list of titles, in the excerpt it appears several times, as do other related keywords. So visitors who have typed in "Mulberry rafts", "Churchill + Mulberry" or "Mulberry floating breakwaters" are also directed to the Society's website because all the additional words occur in the excerpt.

It works!

An excerpt from the paper titled "Beauvais Cathedral" has been found regularly by search terms as broad as "cathedral structure", "gothic cathedral engineering" and "nave collapsed", on top of all the searches that included the word "Beauvais". In a recent application for membership, the answer to the question "How did you hear of the Newcomen Society?" was "website article on Beauvais Cathedral" - proof that presenting highlights like this really works.

Topicality helps

While Mulberry harbours and cathedrals seem to be of perennial interest, the popularity of other topics may be more fleeting. The highlight on the 'Big Stink' of 1858 was particularly popular at the time of a television programme on Sir Joseph Bazalgette, who designed London's sewers. This was pure luck, but it would be clever to anticipate this sort of opportunity by looking out for anniversaries (such as Brunel's bicentenary in 2006), and coming television programmes, films or books, and having relevant excerpts ready.

Extra breadth

Another substantial increase in traffic happened as an unexpected result of a quite separate initiative. The index to the earliest 32 volumes of Transactions had long been out of print, and two kind volunteers laboriously scanned and OCRd the remaining library copy so that it could be made available as a free download from the website in PDF. Because the index lists the full range of keywords in the papers in the volumes, this PDF file presents a concentration of yet more useful words and phrases that could bring searchers to the site.

Again, it works!

Originally intended as a way of directing members to the right volume of Transactions, the index - even though it was a PDF file and a big one at that - was acting as a very successful magnet, attracting on average 40 hits per day. For example, someone recently searched for information about the "Whitefriars glass furnace" - a fairly specific and esoteric query. The search engines would not have found it in the database, nor in the list of titles, nor in any of the excerpts. It was the juxtaposition of these three keywords in the index that enabled the searcher to find the website and, we hope, the relevant paper in the archive.
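The effect can be pictured with a toy model in which a page is returned only if it contains every word of the query. This is a deliberate simplification and not a description of how any real search engine ranks pages; the page texts below are hypothetical stand-ins, not quotations from the Society's site.

```python
import re

# Hypothetical page texts; only the index happens to contain all three words.
PAGES = {
    "titles": "Glass making in London",
    "excerpt": "The furnace at the works",
    "index": "Whitefriars glass works, furnace at",
}


def words(text):
    """Return the set of lower-cased words in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pages_matching_all(pages, query):
    """Return the names of pages that contain every word of the query."""
    wanted = words(query)
    return [name for name, text in pages.items() if wanted <= words(text)]


if __name__ == "__main__":
    print(pages_matching_all(PAGES, "Whitefriars glass furnace"))
```

In this toy model only the index page matches the three-word query, because it is the only text in which all three keywords appear together.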

We've tried to show that using a website to open up the archive involves not just the technicalities of converting documents to electronic format, but also finding ways of presenting information that are interesting and helpful to the user and make sense to the search engines.
