CIRCA:TEI XML

From CIRCA


Revision as of 07:42, 21 October 2010

Slides as presented in-class

Colette Leung

October 14, 2010


Background

The acronym TEI stands for Text Encoding Initiative, and it can refer to any of three different things:

  • TEI as a research community
  • TEI as an organization and consortium
  • TEI XML

This article focuses on TEI XML as a technology, including its meaning, its history, and its significance in both academia and wider realms.

XML is a markup language that is largely undefined: it is a meta-language that comes with rules and procedures, but it has no fixed vocabulary or document structure of its own. Instead, encoders and document analysts use XML to develop particular markup languages. Thus, XML is often accompanied by a DTD (document type definition) or schema, which specifies the rules a particular encoding follows.
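To make the distinction concrete, here is a minimal sketch in Python using only the standard library. The `<recipe>` vocabulary is invented for this example, and the DTD is placed in the document's DOCTYPE declaration. Note that Python's built-in parser (based on expat) only checks that a document is well-formed; it does not validate it against the DTD, so a separate validating tool would be needed to enforce those rules.

```python
import xml.etree.ElementTree as ET

# An invented vocabulary: <recipe>, <name> and <step> are made up for this
# example. XML supplies only the syntax; the DTD in the DOCTYPE declaration
# states the rules this particular markup language follows.
DOCUMENT = """<!DOCTYPE recipe [
  <!ELEMENT recipe (name, step+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT step (#PCDATA)>
]>
<recipe>
  <name>Tea</name>
  <step>Boil water.</step>
  <step>Steep the leaves.</step>
</recipe>"""


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML.

    This checks well-formedness only; the standard-library parser does not
    validate the document against its DTD.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(DOCUMENT)
print(root.tag)                                    # recipe
print([step.text for step in root.iter("step")])  # ['Boil water.', 'Steep the leaves.']
print(is_well_formed("<recipe><name>Tea</recipe>"))  # False: <name> is never closed
# Removing <name> breaks the DTD's rules, yet the document is still well-formed:
print(is_well_formed(DOCUMENT.replace("<name>Tea</name>", "")))  # True
```

The last call shows the gap between the two levels of rules: the DTD requires a `<name>` element, but only a validating parser would report its absence.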

TEI DTDs.
TEI XML is therefore one such markup language defined with XML, created and maintained by a consortium. Through its guidelines, known as P4 and P5, which propose specific DTDs and tagsets to use, TEI XML attempts to give encoders a common basis to work from. It has been designed to suit encoding for many different types of documents in the humanities and social sciences: manuscripts, dictionaries, poems, prose, transcripts, etc. The idea is that any text should be able to be encoded using these guidelines.

TEI XML is also highly flexible, which increases its appeal to users. It can be customized to fit a project, and the encoder chooses what to tag and what to leave untagged. This means that while TEI proposes a set of guidelines that make it easier for scholars to exchange and share documents, it does not impose a standard in the sense of an inflexible set of rules that all encoders must meet. Therefore, different documents in TEI will be encoded differently: an encoder can still make an encoding their own while keeping it accessible to other scholars. Generally, TEI-encoded documents include a core tag set (such as the teiHeader), a base tag set (chosen from prose, drama, verse, etc.) and additional tag sets (linking, figures, etc.).
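As an illustration, the following sketch parses a minimal TEI document with Python's standard library. It assumes P5 conventions (a `TEI` root element in the TEI namespace); P4 documents instead used a `TEI.2` root element and no namespace. The header contains the minimal `fileDesc` with `titleStmt`, `publicationStmt` and `sourceDesc`, and the body uses the verse elements `lg` (line group) and `l` (line). The title, publication note and verse lines are placeholder content.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the TEI P5 namespace, used in ElementTree path queries.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A minimal TEI P5 document: a teiHeader describing the file, then the text.
MINIMAL_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A Sample Poem</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Born-digital; no print source.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="stanza">
        <l n="1">First line of verse,</l>
        <l n="2">second line of verse.</l>
      </lg>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(MINIMAL_TEI)

# Metadata comes from the header; content comes from the body.
title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS).text
lines = [line.text for line in root.iterfind(".//tei:l", TEI_NS)]

print(title)  # A Sample Poem
print(lines)  # ['First line of verse,', 'second line of verse.']
```

Separating the header from the text in this way is what lets tools read a document's description without interpreting its content.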

The mission of the TEI is stated as follows: “[to] develop and maintain a set of high-quality guidelines for the encoding of the humanities texts, and to support their use by a wide community of projects, institutions, and individuals.” (TEI, 2010) The TEI Consortium seeks to do this by developing guidelines, disseminating information, running training workshops, and cultivating a research community. Currently, the TEI is used by libraries, museums, publishers, and individual scholars to present texts for online research, teaching, and preservation.


History

The origins of TEI XML can be traced all the way back to 1987. That year, the Poughkeepsie Conference was held in Poughkeepsie, New York, and the TEI was founded out of it to develop guidelines for encoding machine-readable texts of interest to the humanities and social sciences. Although XML did not yet exist (the early TEI guidelines were written in SGML), this shows that the desire to share texts, create software, and simplify the training of encoders goes far back, as does the desire to bring some order to the sprawl of markup languages.

Members of the Poughkeepsie Conference
In June 1990, the first version of the guidelines, P1, was released. It built on the principles outlined at the Poughkeepsie conference, and over the following years the guidelines were revised and improved, finally resulting in the P3 guidelines in 1994. P3 would become the most widely used standard for encoding text materials for online research and teaching, and it is now the standard for the humanities.

However, by 1998 XML had become a W3C Recommendation, so P3 had to be updated to enable users to work with the emerging XML toolset. A consortium was created to maintain, develop, and promote the TEI. By 2002, P4 had been released, essentially an XML version of P3. The Consortium continued to work actively on the guidelines, and P5 was begun and then released in 2007. The P5 guidelines are currently still under revision.

The TEI website also maintains accessible archives to past guidelines and listservs.

Significance

There are many online projects that use TEI XML. The TEI website includes a list of major projects, including the Medieval Nordic Poetry Project, HESTIA: Herodotus timeline, and the Henry III Fine Rolls project.

Any of these project websites can be visited. On sites that serve their TEI files directly, selecting “View Source” under the View menu in your browser shows the TEI encoding used for the project.

View Source for the Medieval Nordic Poetry Project with TEI XML

What is perhaps most significant about TEI XML, however, is its success. Many argue that the TEI has not only met all of the goals of the Poughkeepsie conference but exceeded them. The TEI is very significant in the history of humanities computing, both because it symbolizes what technology can do for the humanities and because it is so widely used and recognized. In fact, TEI XML is the encoding scheme of choice for the production of both small and large documents and projects in the humanities, whether texts, reference works, or linguistic corpora, as well as for cultural heritage collections. The TEI thus spans many subject areas and is available to students, scholars, and the wider public. This makes it easier for scholars to share their work, and it also allows TEI to serve as a tool for long-term preservation. TEI XML connects many different social science and humanities disciplines, and it has created a diverse research community whose members can propose changes and additions to the existing guidelines.

TEI XML has been endorsed by a number of organizations, including digital libraries, electronic text projects, and the Library of Congress. The latter has produced guidelines for best practice in applying TEI metadata. Further, TEI has not only enabled the exchange of information but also “improves the ability to describe textual features.” (Renear, 2004) TEI allows for new possibilities of representation and communication, and it illuminates new textual issues.

Related Technologies

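TEI XML is an application of XML, so it is worth remembering that it is related to other markup technologies, such as SGML (of which XML is itself a simplified subset, and in which the TEI guidelines were first written), XHTML and HTML.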

Special recognition should also go to TEI Lite, a specific DTD of TEI that came into existence in 1995. It is a subset of the full TEI encoding scheme, since much of the full TEI is unnecessary for many projects. Despite its smaller size, TEI Lite meets 90% of the TEI community's needs 90% of the time. (Vanhoutte 2004, 11)
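The idea of a subset can be sketched as a check that a document only uses elements from an allowed list. In practice, TEI customizations such as TEI Lite are defined formally (for P5, through an ODD file from which schemas are generated); the `ALLOWED` set below is a small hypothetical sample, not the actual TEI Lite element list, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset: a small invented sample of permitted element names.
# This is NOT the real TEI Lite element list.
ALLOWED = {
    "TEI", "teiHeader", "fileDesc", "titleStmt", "title",
    "publicationStmt", "sourceDesc", "text", "body", "p", "lg", "l",
}

# A small TEI document whose body uses critical-apparatus elements
# (app, lem, rdg) that fall outside the hypothetical subset.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt><title>Sample</title></titleStmt>
    <publicationStmt><p>Example only.</p></publicationStmt>
    <sourceDesc><p>No source.</p></sourceDesc>
  </fileDesc></teiHeader>
  <text><body>
    <p>The <app><lem>first</lem><rdg>earliest</rdg></app> reading.</p>
  </body></text>
</TEI>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[1] if tag.startswith("{") else tag


def elements_outside_subset(root: ET.Element, allowed: set[str]) -> set[str]:
    """Return names of elements used in the document but missing from the subset."""
    return {local_name(el.tag) for el in root.iter()} - allowed


root = ET.fromstring(SAMPLE)
print(sorted(elements_outside_subset(root, ALLOWED)))  # ['app', 'lem', 'rdg']
```

A project that needed those elements would either extend its customization or move to a larger part of the full TEI scheme.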

It is also worth mentioning metadata schemes as technologies related to TEI XML. These tend to be designed for specific purposes rather than for as wide a range of documents as TEI XML covers, which makes certain metadata schemes especially useful in particular circumstances.

Dublin Core
The Dublin Core is perhaps one of the most popular of these metadata schemes. Its base set consists of 15 elements, which were developed in Dublin, Ohio, and not in Ireland as the name might suggest. This metadata scheme was meant for library and information science as well as computer science. The mission of the Dublin Core is stated as “to provide simple standards to facilitate the finding, sharing, and management of information.” It was designed to describe resources such as books, digital video, sound, image, or text files.
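A Dublin Core record is simple enough to build by hand. This sketch assumes the element-set namespace `http://purl.org/dc/elements/1.1/` and wraps the elements in a hypothetical `<metadata>` element. In the Dublin Core element set every element is optional and repeatable, so each field maps to a list of values; the sample values are taken from this article's own details.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build a simple Dublin Core record; each element may repeat."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


record = dublin_core_record({
    "title": ["TEI XML"],
    "creator": ["Colette Leung"],
    "date": ["2010-10-14"],
    "type": ["Text"],
})
print(ET.tostring(record, encoding="unicode"))
```

Passing a key such as `author` raises `ValueError`, since Dublin Core calls that element `creator`.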

It is possible to use TEI in conjunction with these metadata schemes, much as one can combine several different TEI DTDs in one document if it contains different kinds of text. (For example, a piece of literature may contain poetry as well as prose.)
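One common way of combining the two is a crosswalk, which maps fields in the `teiHeader` onto Dublin Core elements. The mapping below is a simplified, hypothetical example covering four fields; published crosswalks are more detailed, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Simplified, illustrative crosswalk: teiHeader paths -> Dublin Core names.
CROSSWALK = {
    "tei:fileDesc/tei:titleStmt/tei:title": "title",
    "tei:fileDesc/tei:titleStmt/tei:author": "creator",
    "tei:fileDesc/tei:publicationStmt/tei:publisher": "publisher",
    "tei:fileDesc/tei:publicationStmt/tei:date": "date",
}

HEADER_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>TEI XML</title>
        <author>Colette Leung</author>
      </titleStmt>
      <publicationStmt>
        <publisher>CIRCA</publisher>
        <date>2010</date>
      </publicationStmt>
      <sourceDesc><p>Born-digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Body text.</p></body></text>
</TEI>"""


def header_to_dublin_core(tei_xml: str) -> dict[str, list[str]]:
    """Collect Dublin Core values from the teiHeader of a TEI P5 document."""
    header = ET.fromstring(tei_xml).find("tei:teiHeader", NS)
    if header is None:
        raise ValueError("document has no teiHeader")
    record: dict[str, list[str]] = {}
    for path, dc_name in CROSSWALK.items():
        for el in header.findall(path, NS):
            # itertext() also picks up text inside nested markup.
            text = "".join(el.itertext()).strip()
            if text:
                record.setdefault(dc_name, []).append(text)
    return record


print(header_to_dublin_core(HEADER_SAMPLE))
```

Because the TEI header is usually richer than Dublin Core, a mapping in this direction loses detail, which is one reason the two are used side by side rather than one replacing the other.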

METS
Another popular metadata scheme is METS (Metadata Encoding and Transmission Standard), an XML schema designed for digital libraries. It is a standard for encoding descriptive, administrative, and structural metadata, and it is maintained by the Network Development and MARC Standards Office of the Library of Congress. Dublin Core and METS are simply two examples of other XML metadata schemes.
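To show how METS packages other metadata, this sketch wraps a one-element Dublin Core record in a METS descriptive metadata section (`dmdSec` with `mdWrap MDTYPE="DC"`) and links it from the structural map (`structMap`), the one section METS requires. Administrative metadata (`amdSec`) and the file inventory (`fileSec`) are omitted for brevity, and the `ID` value and title are placeholders.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def mets_tag(name: str) -> str:
    """Return an ElementTree tag name in the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def wrap_in_mets(title: str, dmd_id: str = "DMD1") -> ET.Element:
    """Build a skeletal METS document describing one text with Dublin Core."""
    mets = ET.Element(mets_tag("mets"))
    # Descriptive metadata: a Dublin Core record embedded via mdWrap/xmlData.
    dmd_sec = ET.SubElement(mets, mets_tag("dmdSec"), ID=dmd_id)
    md_wrap = ET.SubElement(dmd_sec, mets_tag("mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(md_wrap, mets_tag("xmlData"))
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title
    # Structural map: one division that points back to the descriptive metadata.
    struct_map = ET.SubElement(mets, mets_tag("structMap"))
    ET.SubElement(struct_map, mets_tag("div"), TYPE="text", LABEL=title, DMDID=dmd_id)
    return mets


doc = wrap_in_mets("TEI XML")
print(ET.tostring(doc, encoding="unicode"))
```

The same `mdWrap` mechanism can hold other kinds of descriptive metadata; METS also defines an `MDTYPE` value for TEI headers (`TEIHDR`).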

Future

Considering the wide use and scope of TEI XML, it is easy to say that TEI will likely remain in use for a long time, or at least as long as encoding languages are in use. Because TEI is used for so many different projects and covers so many different areas, it is difficult to imagine it disappearing overnight, especially as it is continually updated and revised, as with the current TEI P5 guidelines. This is part of what allows TEI to be promoted as a tool for long-term preservation, and part of why software such as oXygen, an XML editor, is designed to support it.

It is worth noting that many scholars suggest TEI XML could benefit from more extensive and easier-to-use tutorials. Perhaps this is a direction that the future of TEI XML could take. There is also speculation that TEI XML will grow to encode more complicated kinds of text (Jannidis, 2009), rather than existing only in conjunction with metadata schemes such as Dublin Core and METS.

References

Center for Digital Research in the Humanities. University of Nebraska-Lincoln. “What is TEI?” Accessed October 3, 2010. http://cdrh.unl.edu/articles/guide_site/tei.php

Jannidis, Fotis. “TEI in a crystal ball.” Literary & Linguistic Computing 24 (2009): 253-265. Accessed October 3, 2010. doi:10.1093/llc/fqp015.

Mandell, Laura. Miami University (Ohio). “An Introduction to TEI.” Accessed October 3, 2010. http://www.users.muohio.edu/mandellc/xml/

MIT Libraries. “Metadata Reference Guide: TEI (Text Encoding Initiative) Metadata.” Accessed October 3, 2010. http://libraries.mit.edu/guides/subjects/metadata/standards/tei.html

Renear, Allen H. “Text Encoding.” In Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004. http://www.digitalhumanities.org/companion/

TEI: Text Encoding Initiative. “TEI: History.” Accessed October 3, 2010. http://www.tei-c.org/About/history.xml

Vanhoutte, Edward. “An Introduction to the TEI and the TEI Consortium.” Literary & Linguistic Computing 19 (2004): 9-15. Accessed October 3, 2010.

Useful Links

W3C Homepage: http://www.w3.org/

TEI Homepage: http://www.tei-c.org/index.xml

TEI Guidelines: http://www.tei-c.org/Guidelines/P4/

TEI Teach Yourself: http://www.tei-c.org/Support/Learn/tutorials.xml

Projects Using TEI: http://www.tei-c.org/Activities/Projects/

TEI Lite: http://www.tei-c.org/Guidelines/Customization/Lite/

Dublin Core: http://dublincore.org/

METS: http://www.loc.gov/standards/mets/
