CIRCA:Renear, H. Allen. “Text Encoding”


Revision as of 08:44, 9 April 2011 by JosephDung

Renear, Allen H. “Text Encoding.” A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004. Reviewed by Joseph Dung

In “Text Encoding” (2004), Renear provides the historical and theoretical context needed to understand “both contemporary text encoding practices and the various ongoing debates that surround those practices.” He expounds on the theoretical frameworks that guided the development of markup techniques and systems. He points out that traditional humanities computing was concerned mostly with literary and linguistic analysis, whereas text encoding in the wider sense also covers new cultural products such as “new media”. After a brief history of markup languages, he delineates the advantages of the dominant markup model: descriptive markup.

Even as Renear presents the descriptive model, he notes how naturally it fits the view of text as an “Ordered Hierarchy of Content Objects” (OHCO), an intuitive view of text structure that he challenges, with various axiomatic arguments, in another article, “Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies”. Markup languages grew up as a modern system for annotating text for further processing by machines, and they were initially procedural: formatting commands and instructions were embedded in the text itself. Descriptive markup coincided with the birth of SGML and had the advantage of labeling the parts of a document in terms authors could understand. Underlying it is the idea that markup should capture the structure of the document and leave the visual presentation of that structure to the interpreter. The units that descriptive models identify (words, lines, passages, paragraphs, headings) match the way humans understand text. Descriptive labels also work like assembler mnemonics or macros: a short name stands for a longer string of instructions, so one definition can be changed in a single place to alter every instance of one kind of element without touching the rest of the document. Renear compares and contrasts the descriptive and procedural models and gives greater weight to the former.
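The separation of structure from presentation described above can be illustrated with a minimal sketch. It is not taken from Renear's article: the sample document, the element names (`poem`, `title`, `line`) and the two style tables are all hypothetical, and XML parsed with Python's standard library stands in for SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptive markup: each element says what a part *is*,
# not how it should look.
DOCUMENT = (
    "<poem>"
    "<title>Song</title>"
    "<line>First line</line>"
    "<line>Second line</line>"
    "</poem>"
)

# Two hypothetical style tables. Presentation lives here, outside the
# document, so it can change without touching the marked-up text.
PLAIN_STYLE = {
    "title": lambda text: text.upper(),
    "line": lambda text: "  " + text,
}
HTML_STYLE = {
    "title": lambda text: f"<h1>{text}</h1>",
    "line": lambda text: f"<p>{text}</p>",
}


def render(source, style):
    """Render each child of the root element using the given style table."""
    root = ET.fromstring(source)
    return [style[child.tag](child.text or "") for child in root]


plain = render(DOCUMENT, PLAIN_STYLE)
html = render(DOCUMENT, HTML_STYLE)
```

The same marked-up text yields two different presentations, and changing one entry in a style table changes every element of that kind at once. A procedural encoding would instead embed the formatting commands in the text and would have to be edited instance by instance.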

While providing background on the creation of SGML, XML and the TEI, Renear also takes time to clarify confusing terminology in the field. SGML is not a markup language in the traditional sense; it is a meta-language, a language that provides the basic elements authors need to build their own markup languages. SGML thus provided a means by which specific “grammars” could be built for any range of documents. Renear also discusses how SGML is useful beyond the formatting of documents, for example in data interchange.
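The idea of a meta-language that lets authors define their own “grammars” can be sketched roughly as follows. This is a drastic simplification and an assumption of this review, not SGML itself: the hypothetical `POEM_GRAMMAR` table only records which child elements each element may contain, whereas a real SGML or XML document type definition also constrains order and number of occurrences. XML syntax is used so that Python's standard library can parse the samples.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar for one document type: each element name maps to
# the set of child elements it is allowed to contain.
POEM_GRAMMAR = {
    "poem": {"title", "stanza"},
    "stanza": {"line"},
    "title": set(),
    "line": set(),
}


def conforms(element, grammar):
    """Return True if the element and all its descendants obey the grammar."""
    if element.tag not in grammar:
        return False
    allowed = grammar[element.tag]
    return all(
        child.tag in allowed and conforms(child, grammar)
        for child in element
    )


valid = ET.fromstring(
    "<poem><title>Song</title><stanza><line>One</line></stanza></poem>"
)
# Here a line appears directly inside the poem, which the grammar forbids.
invalid = ET.fromstring("<poem><line>One</line></poem>")

valid_ok = conforms(valid, POEM_GRAMMAR)
invalid_ok = conforms(invalid, POEM_GRAMMAR)
```

The checking function knows nothing about poems. Supplying a different table defines a different markup language, which is the sense in which a meta-language sits one level above any particular grammar.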

In the end Renear acknowledges that SGML’s adoption gradually suffered: first in the publishing world, when WYSIWYG word processors appeared and offered a richer visual metaphor for text processing than macros did, and then with the creation of HTML. Although HTML initially lacked a Document Type Definition and had, in Renear’s words, “an impoverished element set”, it was a much simpler and more forgiving markup language. Curiously, HTML also mixed descriptive and procedural markup in its syntax, combining two different approaches to text encoding. It is quite possible that, for Renear, the visual metaphors of WYSIWYG text processing and the non-linear nature of hypertext in HTML challenged the descriptive OHCO model of text structure. Renear argues that a newer, more encompassing theory of text structure is needed both to understand text encoding and to create new techniques for it.
