TEI XML

From CIRCA


Slides as presented in-class

Colette Leung

October 14, 2010

Background

The acronym TEI stands for Text Encoding Initiative, and can refer to any of three things:

TEI Logo.
  • TEI as a research community
  • TEI as an organization and consortium
  • TEI XML.

This article focuses on TEI XML as a technology, including its meaning, its history, and its significance in both academia and wider realms.

XML is a markup language that is largely undefined: it is a meta-language that comes with rules and procedures, but there is no fixed vocabulary of tags associated with XML itself. Instead, encoders and document analysts use XML to develop particular markup languages. Thus, XML is often accompanied by a DTD (document type definition) or Schema, which specifies the rules a particular encoding follows.
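The distinction between XML's fixed rules and its open vocabulary can be illustrated with a minimal Python sketch. The tag names (`recipe`, `step`) and the helper `is_well_formed` are invented for this example. Note that Python's standard-library parser only checks well-formedness; checking a document against a DTD or Schema requires a validating parser, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text obeys XML's syntax rules, whatever tags it uses."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# An invented vocabulary: XML itself does not care what the tags are called.
recipe = "<recipe><step n='1'>Boil water</step></recipe>"

# The same vocabulary, but the closing tags are in the wrong order,
# which breaks XML's nesting rule.
broken = "<recipe><step n='1'>Boil water</recipe></step>"

print(is_well_formed(recipe))
print(is_well_formed(broken))
```

The first document is accepted even though no DTD defines `recipe` or `step`; the second is rejected because it violates a rule of XML itself, not of any particular vocabulary.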

TEI DTDs.
TEI XML is therefore a particular application of XML; more precisely, the TEI provides guidelines for encoding literary and linguistic texts in XML. These guidelines, which can be used to create a DTD or Schema for humanities projects, were created and are maintained by the TEI Consortium.

The TEI guidelines known as P4 and P5 propose specific DTDs and tagsets to use. While not a standard, TEI XML attempts to provide some basis of commonality to encoders. TEI XML has been designed to suit the encoding of many different types of documents in the humanities and social sciences: manuscripts, dictionaries, poems, prose, transcripts, etc. The idea is that most scholarly documents should be able to be encoded using these guidelines.

There is also a great deal of flexibility in TEI XML, which increases its appeal to users. TEI XML can be customized to fit a project, and the encoder chooses what to tag and what to leave untagged. This means that while the TEI proposes a set of guidelines that make it easier for scholars to transfer and share documents, it does not impose a standard in the sense of an inflexible set of rules that all encoders must meet. Therefore, different TEI documents will be encoded differently. Users can still make a document's encoding their own, even while making it more accessible to other scholars. Generally, TEI-encoded documents include a core tag set (teiHeader), a base tag set (a selection of prose, play, etc.) and an additional tag set (linking, figures, etc.).
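To make the header-plus-text structure concrete, here is a minimal Python sketch that parses a small TEI document and reads from both parts. The sample document is invented for illustration and assumes P5 conventions (a `TEI` root element in the `http://www.tei-c.org/ns/1.0` namespace); the function `summarize` is a hypothetical helper, not part of any TEI tool.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # namespace used by TEI P5 documents

# Hypothetical sample document, invented for illustration.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A Sample Poem</title></titleStmt>
      <publicationStmt><p>Unpublished classroom example.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="stanza">
        <l n="1">First line of verse</l>
        <l n="2">Second line of verse</l>
      </lg>
      <p>A short prose note on the poem.</p>
    </body>
  </text>
</TEI>"""


def summarize(tei_xml: str) -> dict:
    """Collect the title from the header and basic counts from the text body."""
    root = ET.fromstring(tei_xml)
    ns = {"tei": TEI_NS}
    # The header carries metadata about the encoded text.
    title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", namespaces=ns
    )
    # Verse lines (<l>) and prose paragraphs (<p>) can sit side by side in the body.
    verse_lines = [line.text for line in root.iterfind(".//tei:l", ns)]
    prose = root.findall("tei:text/tei:body/tei:p", ns)
    return {
        "title": title,
        "verse_lines": verse_lines,
        "prose_paragraphs": len(prose),
    }


print(summarize(SAMPLE))
```

The sample tags only a title, two verse lines and one paragraph; another encoder could mark up the same poem in more or less detail, which is the flexibility described above.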

The mission of the TEI is stated as follows: "[to] develop and maintain a set of high-quality guidelines for the encoding of humanities texts, and to support their use by a wide community of projects, institutions, and individuals." (TEI, 2010) The TEI Consortium seeks to do this by developing guidelines, disseminating information, running training workshops, and cultivating a research community. Currently, the TEI is used by libraries, museums, publishers, and individual scholars to present texts for online research, teaching, and preservation.


History

The origins of TEI XML can be traced back to 1987. In that year the Poughkeepsie Conference was held in New York, and the TEI was founded out of it to develop guidelines for encoding machine-readable texts of interest to the humanities and social sciences. We can see from this that although XML wasn't initially involved, the desire to share texts, create software and simplify the training of encoders goes far back, as does the desire to bring some control to the sprawl of markup languages.

Members of the Poughkeepsie Conference
In June 1990, the P1 guidelines were released. These were the guidelines outlined at the Poughkeepsie Conference, and over the following three years they were revised and improved. This finally resulted in the P3 guidelines in 1994. These would become the most widely used guidelines for text materials in online research and teaching, and are now the de facto standard for the humanities.

However, by 1998, XML had been recognized by the W3C, so P3 had to be updated to enable users to work with the emerging XML toolset. A consortium was created to maintain, develop, and promote the TEI. By 2002, P4 had been released, essentially an XML version of P3. The Consortium continued to work actively on these guidelines, and P5 was released in 2007. The P5 guidelines are currently still under revision.

The TEI website also maintains accessible archives of past guidelines and listservs.

Significance

There are many online projects that use TEI XML. The TEI website includes a list of major projects, among them the Medieval Nordic Poetry Project, HESTIA: Herodotus timeline, and the Fine Rolls of Henry III.

Any one of these project websites can be visited. Upon selecting "View Source" under the View menu in your browser toolbar, you can see the TEI encoding used for these projects.

View Source for the Medieval Nordic Poetry Project with TEI XML

What is perhaps most significant about TEI XML, however, is its success. Many argue that the goals of the Poughkeepsie Conference have not only been met, but have been exceeded by the TEI. The TEI is very significant in the history of humanities computing, both because it is symbolic of what technology can do for the humanities and because it is so widely used and recognized. In fact, TEI XML is the encoding scheme of choice for the production of both small and large documents and projects in the humanities, whether they are texts, reference works, linguistic corpora, etc., as well as for cultural heritage collections. As such, the TEI spreads across many subject areas, and is available to students, scholars, and the wider public. This not only makes it easier for scholars to share work, but also allows TEI XML to serve as a tool for long-term preservation. TEI XML connects many different social science and humanities disciplines, and has created a diverse research community whose members can write in and recommend changes and additions to the existing guidelines.

TEI XML has been endorsed by a number of organizations, including digital libraries, electronic text projects, and the Library of Congress. The latter has produced guidelines for best practice in applying TEI metadata. Further, the TEI has not only allowed an exchange of information, but "improves the ability to describe textual features." (Renear, 2004) TEI allows for new possibilities of representation and communication, and illuminates new textual issues.

Related Technologies

TEI XML is an application of XML, and so it is worth remembering that it is related to other markup technologies, such as SGML, XHTML and HTML.

Special recognition should also be given to a specific TEI DTD, TEI Lite, which came into existence in 1995. It is a subset of the full TEI encoding scheme, since much of the full TEI is unnecessary for most projects. TEI Lite is so efficient, however, that it is said to meet 90% of the TEI community's needs 90% of the time (Vanhoutte 2004, 11).

It is also worth mentioning metadata schemes as technologies related to TEI XML. These tend to be designed for specific purposes, rather than covering a wide range of documents as TEI XML does. This makes certain metadata schemes very useful in particular circumstances.

Dublin Core
The Dublin Core is perhaps one of the most popular of these metadata schemes. There are 15 base elements in the Dublin Core, which was developed in Dublin, Ohio, and not Ireland as the name might suggest. This metadata scheme was meant for Library and Information Sciences, as well as Computer Sciences. The mission of the Dublin Core is stated as "to provide simple standards to facilitate the finding, sharing, and management of information." It was designed for books, digital videos, sound, image, or text files.

It is possible to use TEI in conjunction with these metadata schemes, much like one can choose to use many different TEI DTDs for one document, if there are different kinds of text in that document. (For example, a piece of literature may have poetry as well as prose.)
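One way to picture TEI and a metadata scheme working together is a small crosswalk that copies values from a TEI header into Dublin Core elements. The Python sketch below is only an illustration: the sample header, the `CROSSWALK` mapping, the wrapper element `record` and the function `tei_header_to_dc` are all assumptions made for this example, and a real project would define and document its own mapping.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"       # TEI P5 namespace
DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace

# Hypothetical TEI header, invented for illustration.
HEADER = f"""<teiHeader xmlns="{TEI_NS}">
  <fileDesc>
    <titleStmt>
      <title>A Sample Poem</title>
      <author>Jane Example</author>
    </titleStmt>
    <publicationStmt><publisher>Example University</publisher></publicationStmt>
    <sourceDesc><p>Born digital.</p></sourceDesc>
  </fileDesc>
</teiHeader>"""

# Hypothetical mapping from Dublin Core element names to paths in the TEI header.
CROSSWALK = {
    "title": "tei:fileDesc/tei:titleStmt/tei:title",
    "creator": "tei:fileDesc/tei:titleStmt/tei:author",
    "publisher": "tei:fileDesc/tei:publicationStmt/tei:publisher",
}


def tei_header_to_dc(header_xml: str) -> ET.Element:
    """Build a simple record of Dublin Core elements from a TEI header."""
    header = ET.fromstring(header_xml)
    ns = {"tei": TEI_NS}
    record = ET.Element("record")  # wrapper element name is an assumption
    for dc_name, tei_path in CROSSWALK.items():
        value = header.findtext(tei_path, namespaces=ns)
        if value is not None:
            # "{namespace}name" is ElementTree's notation for a qualified name.
            ET.SubElement(record, f"{{{DC_NS}}}{dc_name}").text = value.strip()
    return record


ET.register_namespace("dc", DC_NS)  # serialize with the conventional dc: prefix
print(ET.tostring(tei_header_to_dc(HEADER), encoding="unicode"))
```

Only three of the 15 Dublin Core elements are filled in here; the detailed TEI encoding stays in the TEI document, while the Dublin Core record offers a simple summary for finding and sharing it.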

METS
Another popular metadata scheme is METS (Metadata Encoding and Transmission Standard). This is an XML format designed for digital libraries. It is a standard for encoding descriptive, administrative, and structural metadata. This standard is maintained in the Network Development and MARC Standards Office of the Library of Congress. These are simply two examples of other XML metadata schemes.

Future

Considering the expansive use and scope of TEI XML, it is easy to say that the TEI will likely be a tool in existence for a long time, or at least as long as encoding languages are in use. TEI is used for so many different projects, and covers so many different areas, that it is difficult to imagine it disappearing overnight, especially as it is continually updated and revised, as with the current TEI P5 guidelines. This is part of what allows TEI to be seen as a tool for long-term preservation, and part of why software such as oXygen, an XML editor, supports it.

It is worth noting that many scholars suggest that TEI XML could benefit from more extensive and easier-to-use tutorials. Perhaps this is a direction that the future of TEI XML could take. As well, there is speculation that TEI XML will grow to include more complicated kinds of text (Jannidis, 2009), rather than existing only in conjunction with metadata schemes such as the Dublin Core and METS.

References

Center for Digital Research in the Humanities. University of Nebraska-Lincoln. "What is TEI?" Accessed October 3, 2010. http://cdrh.unl.edu/articles/guide_site/tei.php

Jannidis, Fotis. "TEI in a crystal ball." Literary & Linguistic Computing 24 (2009): 253-265. Accessed October 3, 2010. doi:10.1093/llc/fqp015. http://ehis.ebscohost.com.login.ezproxy.library.ualberta.ca/eds/pdfviewer/pdfviewer?vid=1&hid=114&sid=bb4cc903-5aec-4931-b42a-af2869e86232%40sessionmgr113

Mandell, Laura. Miami University (Ohio). "An Introduction to TEI." Accessed October 3, 2010. http://www.users.muohio.edu/mandellc/xml/

MIT Libraries. "Metadata Reference Guide: TEI (Text Encoding Initiative) Metadata." Accessed October 3, 2010. http://libraries.mit.edu/guides/subjects/metadata/standards/tei.html

Renear, Allen H. "Text Encoding." In Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004. http://www.digitalhumanities.org/companion/

TEI: Text Encoding Initiative. "TEI: History." Accessed October 3, 2010. http://www.tei-c.org/About/history.xml

Vanhoutte, Edward. "An Introduction to the TEI and the TEI Consortium." Literary & Linguistic Computing 19 (2004): 9-15. Accessed October 3, 2010. http://ehis.ebscohost.com.login.ezproxy.library.ualberta.ca/eds/pdfviewer/pdfviewer?vid=2&hid=114&sid=78a98960-0f8c-4903-b61a-fb6268042a6c%40sessionmgr111

Useful Links

W3C Homepage: http://www.w3.org/

TEI Homepage: http://www.tei-c.org/index.xml

TEI Guidelines: http://www.tei-c.org/Guidelines/P4/

TEI Teach Yourself: http://www.tei-c.org/Support/Learn/tutorials.xml

Projects Using TEI: http://www.tei-c.org/Activities/Projects/

TEI Lite: http://www.tei-c.org/Guidelines/Customization/Lite/

Dublin Core: http://dublincore.org/

METS: http://www.loc.gov/standards/mets/

--ColetteLeung 01:33, 3 December 2010 (UTC)
