Project Update: Accuracy, Already useful, Technical problems and historians and IT
A look at the first seven plots used as a demonstrator and what they have already shown, plus a progress report on the scanning and some of the issues raised.
A day after my last post I uploaded a very crude demonstration of how Google Maps could be used to show spatial information. The original proof of concept had used some imaginary test data, the SS Lollipop carrying a cargo of sweets being one example. However, I was concerned about the accuracy of the plots and therefore about how useful overlaying the locations on Google Maps would be. With these concerns in mind I decided to use the first seven records scanned as an experiment. The results of just these seven reveal some surprising historical facts about how the coastline has changed over time, and they resolved a (very) minor mystery.
We also look at some technical problems of scanning, and some political ones.
The Plot
My concerns over the accuracy of the plots were twofold. The first was how accurate the source data was. The original records are written to half a second of arc, which in theory should locate a position to within some 15 metres, or about 17 yards. This would be more than sufficient for historical purposes. However, having some experience of navigation, I know that with stellar navigation an accuracy of within a nautical mile is considered good enough. The positions recorded in the records, however, are used to update Admiralty charts. They can therefore be expected to be, if not within 15 metres, then in my opinion close enough, and indeed as close as it was possible to get without a complete resurvey of the bay and river.
The greater concern was that since the records were updated in the mid 70s a new datum for latitude and longitude has been adopted, and it is this new datum that Google Maps uses. Depending on location, the discrepancy between the new and old systems could be as much as 100 metres! Before deciding whether to add the complication of adjusting the plots for the new datum, I decided to replace my imaginary set of wrecks with real data.
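The half-second figure quoted above can be checked with a little arithmetic. What follows is only a rough sketch, assuming the usual rule of thumb that one minute of latitude is one nautical mile (1852 metres). The function names and the latitude of 53.4 degrees north used for the Liverpool area are my own assumptions for illustration, not anything taken from the records. The second function shows the conversion from degrees, minutes and seconds to the decimal degrees that a web map generally expects.

```python
import math

METRES_PER_NAUTICAL_MILE = 1852.0  # exact, by definition
METRES_PER_YARD = 0.9144           # exact, by definition


def arcseconds_to_metres(arcseconds, latitude_deg=None):
    """Approximate distance on the ground covered by a small angle.

    One minute of latitude is taken as one nautical mile. If latitude_deg
    is given, the angle is treated as longitude and shrunk by cos(latitude).
    """
    metres = arcseconds / 60.0 * METRES_PER_NAUTICAL_MILE
    if latitude_deg is not None:
        metres *= math.cos(math.radians(latitude_deg))
    return metres


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South and west are negative in decimal notation
    return -value if hemisphere in ("S", "W") else value


north_south = arcseconds_to_metres(0.5)
# 53.4 is an assumed latitude for the bay, used for illustration only
east_west = arcseconds_to_metres(0.5, latitude_deg=53.4)
print(f"{north_south:.1f} m ({north_south / METRES_PER_YARD:.1f} yd) north-south")
print(f"{east_west:.1f} m east-west")
```

Half a second of latitude comes out at a little over 15 metres, or just under 17 yards, which agrees with the figure above. Half a second of longitude at this latitude is nearer 9 metres, because the meridians converge towards the pole. Note that this says nothing about the datum shift, which is a separate correction.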
You can imagine my initial reaction when one of the plots showed up not in the river or bay but a significant distance onshore. The ship is the Admiral Nelson, wrecked in 1886. The ‘Circumstances of Lost’ in the record reads ‘Drove on shore on Wallasey beach…’. If you zoom in on the plot you can see the location lies along an irregular feature, and to seaward of it is a rather straight seawall. The wall was built around 1930 and the irregular feature is the old coastline. So our sample of seven has already shown us how the coast around the Wirral has changed over time.
The next example of how plotting the wrecks on Google Maps helps is the case of the Albert. I chose seven records because the last four were all called Albert. One simply referred to ‘Albert’, and one referred to ‘Albert’ with a handwritten 2 next to it. The other two referred to ‘Albert (part of)’, one with another handwritten 2 after it. Now a professional navigator may well be able to plot the locations of these wrecks in their head, but for us lesser mortals, how did the parts of Albert relate to the two Albert wrecks? Once plotted, it became immediately clear that ‘Albert’ 2 was a totally different wreck, as it is in the bay while the other three form a cluster in the river.
It is clear, I think, from this very small sample that the plotting exercise is going to reveal a number of interesting and historically significant aspects of the wrecks of Liverpool Bay.
Scanning Report:
The original plan was to scan the Mersey Docks and Harbour Board records and OCR them into a database. There have, however, been a number of issues. The first is the quality of the original documents and of the software available. The second is the attitude of the archive to the scanning exercise itself.
While the records are clean and typed, the OCR exercise is not going as smoothly as I had hoped (surprise, surprise). The records are typed in capitals, and because the typewriter ribbons were wearing out and did not form the letters completely, Bs are often read as Rs, for example. This is exacerbated by the records being written in a shorthand, INS meaning ‘insert’ for example. The result is definitely not normal English, and as my OCR software is Adobe Acrobat, which is quite limited for OCR, the resulting output is unusable without a great deal of data cleansing. As I use scanning and OCR a lot and plan to increase their use, I have therefore decided to buy a professional edition of ABBYY for my Christmas present. This will allow a form to be specified and will hopefully ‘learn’ the jargon used, reducing the workload of preparing the scanned data for input to a database.
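To give an idea of the sort of data cleansing involved, here is a minimal sketch of one possible approach, not what I am actually running. It expands known shorthand and then nudges misread words towards the nearest entry in a list of expected words, using the difflib module from the Python standard library. Only the INS shorthand comes from the records; the word list, the function names and the 0.8 similarity cutoff are hypothetical and chosen purely for illustration.

```python
import difflib

# Shorthand seen in the records. Only INS is taken from the records;
# more entries would be added as they are found.
SHORTHAND = {"INS": "INSERT"}

# Hypothetical list of words expected in the records (illustration only)
KNOWN_WORDS = ["ALBERT", "ADMIRAL", "NELSON", "WALLASEY", "BEACH", "SHORE"]


def clean_token(token, known_words=KNOWN_WORDS, cutoff=0.8):
    """Expand shorthand, or replace a misread word with its closest known word."""
    if token in SHORTHAND:
        return SHORTHAND[token]
    # Leave known words, numbers and mixed tokens alone
    if token in known_words or not token.isalpha():
        return token
    matches = difflib.get_close_matches(token, known_words, n=1, cutoff=cutoff)
    # Keep the original token when nothing is similar enough
    return matches[0] if matches else token


def clean_line(line):
    """Clean one line of OCR output, word by word."""
    return " ".join(clean_token(token) for token in line.upper().split())


print(clean_line("ALRERT INS"))
```

Here the misread ALRERT is corrected to ALBERT and INS is expanded to INSERT. The obvious weakness is that a genuine word which merely resembles one on the list would be "corrected" as well, so the cutoff and the word list would both need tuning against real scans.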
There have been some issues arising with the museum and their attitude to the scanning. Perhaps surprisingly, this has not been about the potential damage a scanning exercise can do to the originals but about the dreaded ‘elf n safety’ and what is and is not ‘research’. The archive has placed an embargo on scanning the documents as the equipment used (mine) has not been PAT tested. The fact that the equipment does not need testing (because of its voltage), and that as a member of the public using private equipment I am exempt, appears not to be relevant. The solution shows the illogic of the situation. My scanner runs off the USB port on my laptop, so as long as I am not plugged into the mains I can continue scanning! This solution limits me to scanning 50 records a day before the battery runs out. As the Research Society want to increase the use of modern technology, discussions are ongoing to find a more reasonable (logical) solution.
The second issue is the use of the scanned records as part of the Google map. The archive wishes to have the link in the Google map removed, so that anyone wanting to access the records must visit the archive, otherwise ‘it would just be a digitization exercise’. If I had visited the archive, diligently copied the records by hand and then published a book, of which there are many in the library, this would be considered my personal and valid historical research. However, if I write an API, format a web site and offer an image of the record as evidence of the data presented, that is not research and is just copying. It is putting the image on the site which seems to be the stumbling block. While I can see a potential copyright issue, that’s not how the problem is phrased. So as it stands at the moment, while I will probably be able to show wreck locations and an edited summary of the records, you will have to travel personally to Liverpool to view the full record.
It is, I suppose, the attitudes of professional historians and archivists to IT that prompted this site in the first place, and I can at least show how it could work with the existing seven plots, which are being allowed.