Alberto Campagnolo: Errata (per oculos) corrige: Visual identification of meaningless data in database records of bookbinding structures.

In electronic databases, the encoding schema underlying the database structure allows data correctness and completeness to be monitored immediately during input. The computer highlights missing and inadmissible data right away, prompting the compiler to supply or correct it. Data validation thus reduces errors and acts as quality control.

Not all mistakes, however, can be eliminated through data validation. Just as in language, the notion of ‘grammatically correct’ (or ‘valid according to the schema’) should not be confused with ‘meaningful’. Consider the following sentences: (i) colourless sewing passing through stations; (ii) stations through passing sewing colourless. Both are nonsensical, but any English speaker would recognize (i) as grammatically correct (Chomsky 1957). In the same way, data within a database can be valid, that is, grammatically correct, and yet meaningless.
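The distinction can be illustrated with a minimal Python sketch. The record and its element names (`binding`, `spine`, `station`) are hypothetical and are not taken from any actual schema; `is_valid` is a simplified stand-in for schema validation, and measurements are assumed to be in millimetres. The record passes the check even though it places a sewing station well beyond the end of the spine.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; element and attribute names are illustrative only.
# The third sewing station lies 350 mm along a spine that is 200 mm tall.
RECORD = """
<binding>
  <spine height="200"/>
  <sewing>
    <station position="20"/>
    <station position="100"/>
    <station position="350"/>
  </sewing>
</binding>
"""

def is_valid(root):
    """Simplified stand-in for schema validation: required elements are
    present and every measurement is a non-negative number."""
    spine = root.find("spine")
    stations = root.findall("sewing/station")
    if spine is None or not stations:
        return False
    values = [spine.get("height")] + [s.get("position") for s in stations]
    try:
        return all(v is not None and float(v) >= 0 for v in values)
    except ValueError:
        return False

root = ET.fromstring(RECORD)
valid = is_valid(root)
print(valid)  # the record is accepted despite describing an impossible binding
```

Each value is admissible on its own terms; the problem only appears when the values are read together.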

The task of checking whether data is meaningful still lies with the compiler or a subsequent reader/editor. However, because the information is fragmented across the dataset, checking it for meaning requires mentally visualizing the structure and analysing all the data synchronically, a task that is unmanageable for the human mind.

Visualization of bookbinding structure records

The Ligatus Research Centre of the University of the Arts London has developed a descriptive schema and glossary for bookbinding structures using eXtensible Markup Language (XML) technologies. In 2007, this schema was used to survey the bookbinding structures of the printed books from the Library of the Monastery of Saint Catherine on Mount Sinai, Egypt. The structured description of bindings allowed for validation of data during the survey, resulting in fewer errors than in the previous paper-based survey of the manuscripts of the same library (Velios and Pickwoad 2008, 2009).

More recently, we have been working towards a methodology to automatically transform the XML bookbinding structure descriptions recorded during the 2007 survey into Scalable Vector Graphics (SVG) diagrams. These automated visualizations offer many advantages: (i) standardized output, which makes the diagrams easier to compare and remember; (ii) production speed, which can save significant surveying time; (iii) a synchronic view of the data for each structure; and (iv) better accuracy, as the diagrams serve to verify the meaningfulness of data during the survey.

While uncertain, imprecise, and incomplete data is accommodated and flagged through graphical means in the generation of the automated diagrams, errors due to valid but meaningless data are not easily identified through automated means. We therefore aim to provide a method for identifying these difficult-to-detect errors.
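As a rough illustration of how an XML description might be turned into an SVG diagram that flags uncertain data graphically, consider the following Python sketch. It is not the transformation used in the project: the element names, the `certainty` attribute, and the choice of a dashed stroke for uncertain values are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the "certainty" attribute is an assumption.
RECORD = """
<binding>
  <spine height="200"/>
  <sewing>
    <station position="20"/>
    <station position="100" certainty="uncertain"/>
    <station position="180"/>
  </sewing>
</binding>
"""

def to_svg(root, width=40):
    """Draw the spine as a rectangle and each sewing station as a line across it."""
    height = float(root.find("spine").get("height"))
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     viewBox=f"0 0 {width} {height}")
    ET.SubElement(svg, "rect", x="0", y="0", width=str(width),
                  height=str(height), fill="none", stroke="black")
    for station in root.findall("sewing/station"):
        y = station.get("position")
        line = ET.SubElement(svg, "line", x1="0", y1=y, x2=str(width), y2=y,
                             stroke="black")
        if station.get("certainty") == "uncertain":
            # Flag uncertain data graphically with a dashed stroke (assumed convention).
            line.set("stroke-dasharray", "4 2")
    return ET.tostring(svg, encoding="unicode")

svg_text = to_svg(ET.fromstring(RECORD))
print(svg_text)
```

Because the drawing is derived entirely from the record, every graphical feature can be traced back to a recorded value.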

Visual accuracy control

In order to improve accuracy, and thus reduce the post-processing and editorial work needed on the collected data, we propose that our automatically generated diagrams be used as a visual means of checking for correctness.

As we have seen, data input into the database can be valid but meaningless. One could envisage the surveyor, or a subsequent reader/editor, going through the XML tree and checking the correctness of the data one element at a time, in sequence. However, the amount of information that would need to be kept in mind in order to mentally visualize and synchronically analyse the data would be unmanageable. Mistakes thus easily slip through the net and remain uncorrected.

These problems can be solved by resorting to visual means, provided that these are strictly linked to the recorded data, as is the case with our diagrams. More than a third of the human brain is devoted to vision, our main way of gathering information about the world (Findlay and Gilchrist 2003). Diagrams, as visual communication systems, naturally offer information in a synchronic manner, and can immediately highlight mistakes.

In fact, if the data input into the database is valid but incorrect, the automated diagram will necessarily reflect this by showing something that cannot exist in reality. In other cases, the diagram will show something that is not relevant to, or consistent with, the object being described. In both cases, the diagram will prompt the surveyor to check the data and correct it accordingly. If the generation of the diagrams is integrated into the input interface, the surveyor can check the meaningfulness of the data during the survey, resulting in immediate correction of problems and increased accuracy of the data within the database.
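A small Python sketch may make this concrete. It assumes a hypothetical record in which a sewing station has been entered at 350 mm on a spine 200 mm tall; the element names and the drawing conventions (a rectangle for the spine, circles for the stations) are illustrative assumptions, not the project's own. The canvas is sized to include everything recorded, so the misplaced station is drawn rather than clipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical record with a valid but physically impossible station position.
RECORD = """<binding>
  <spine height="200"/>
  <sewing>
    <station position="20"/>
    <station position="100"/>
    <station position="350"/>
  </sewing>
</binding>"""

def to_svg(root, width=40, margin=10):
    """Render the record literally: spine as a rectangle, stations as circles."""
    height = float(root.find("spine").get("height"))
    positions = [float(s.get("position")) for s in root.findall("sewing/station")]
    # Size the canvas to fit every recorded value, so nothing is silently clipped.
    canvas = max([height] + positions) + margin
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     viewBox=f"0 0 {width} {canvas}")
    ET.SubElement(svg, "rect", x="0", y="0", width=str(width),
                  height=str(height), fill="none", stroke="black")
    for y in positions:
        ET.SubElement(svg, "circle", cx=str(width / 2), cy=str(y), r="3")
    return ET.tostring(svg, encoding="unicode")

svg_text = to_svg(ET.fromstring(RECORD))
print(svg_text)
```

When the output is viewed, the third circle floats below the rectangle that represents the spine, an arrangement that no real binding could show and that a surveyor would be likely to notice at a glance.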

Summary

Data accuracy is essential for any database, but automated data validation systems cannot prevent all kinds of errors. We propose a visual accuracy control system, based on automatically generated diagrams, to identify meaningless data and increase the accuracy of data within a database of bookbinding structures.

References

Chomsky, Noam (1957), Syntactic structures, The Hague; Paris: Mouton.

Findlay, John M. and Iain D. Gilchrist (2003), Active vision: the psychology of looking and seeing, Oxford: Oxford University Press.

Velios, Athanasios and Nicholas Pickwoad (2008), “Collecting and managing conservation survey data”, in Gillian Fellows-Jensen and Peter Springborg (Eds.), Care and Conservation of Manuscripts 10: Proceedings of the Tenth International Seminar Held at the University of Copenhagen, 19th-20th October 2006, Copenhagen: Museum Tusculanum Press; University of Copenhagen, pp. 172–188.

Velios, Athanasios and Nicholas Pickwoad (2009), “An optimised workflow for large-scale condition surveys of book collection”, in Matthew James Driscoll and Ragnheiður Mósesdóttir (Eds.), Care and Conservation of Manuscripts 11: Proceedings of the Eleventh International Seminar Held at the University of Copenhagen, 24th-25th April 2009, Copenhagen: Museum Tusculanum Press; University of Copenhagen, pp. 269–290.