CIRCA:American and French Research for the Treasury of the French Language (ARTFL) Project

From CIRCA

Current revision as of 19:01, 6 October 2010

Project Overview

Image of ARTFL Homepage
The American and French Research on the Treasury of the French Language Project (ARTFL) is the extension of the Trésor de la Langue Française (TLF), a database of French texts from the seventeenth through the twentieth centuries, conceived in 1957 by the French Government. It was initially built to produce a new dictionary, but was later extended to be a database of texts as well. In 1981 the ARTFL project was established through the cooperation of the University of Chicago and the Centre National de la Recherche Scientifique, which would oversee the database going online. It has since been available, by subscription, to scholars, researchers, and students around the world.

The ARTFL project's main corpus, the ARTFL-FRANTEXT database, currently contains nearly 3,000 works from the last four centuries of French writing, amounting to well over 15 million words. Accompanying the database is a proprietary search tool called PhiloLogic (now hosted by Google, via Google Sites), which was designed to navigate the wealth of texts with relative ease while also paying homage to the initial intent of the TLF. Other databases available by subscription from the ARTFL project now include, among others, a French Women Writers database, a Provençal Poetry database, and the Textes de Français Ancien (TFA), a database containing works from the twelfth through the fifteenth century. A full list of the ARTFL databases and their descriptions can be found on the ARTFL website.

The three points of focus of the ARTFL project, according to the website and staff, have been and always will be:

  • to include a variety of texts so as to make the database as versatile as possible;
  • to create a system that would be easily accessible to the research community;
  • to provide researchers with an easy-to-use but effective tool.

The ARTFL project is supported by a full-time staff at the University of Chicago, which continues to obtain valuable texts that are transcribed and added to the database, while seeking out new contributions and proposals for more texts.

History

The ARTFL project, as it exists today, began in 1981 as a cooperative project between the Centre National de la Recherche Scientifique and the University of Chicago. Its roots, however, date back to 1957, when the French Government held an international colloquium at the Centre de Philologie Romane of the Faculté des Lettres at Strasbourg. There it was decided, after much urging by lexicographers led by Paul Imbs, that a new dictionary, under the project name Trésor de la Langue Française, would be created and that it would be comprehensive in both its synchronic and its diachronic dimensions (Morrissey, 1993, p 93). The government decided that the best way to compile all of the word samples for the new dictionary was to transcribe an extensive selection of French texts for use with a computer. This was a daunting task, certainly, but also an extremely forward-thinking one, given that the first French-built computers had been made available as little as two years prior (Mounier-Kuhn, 1990). During its creation, the team responsible for implementing the TLF oversaw the transcription of nearly 1,500 works from the eighteenth through the twentieth centuries, though only a modern (nineteenth and twentieth century) version would be produced, with plans for the previous-century version set aside for a later date. What resulted was, perhaps, a lexicographer's dream, but a logistical nightmare for everyone else involved. Once it was decided that the appropriate number of works from the modern centuries had been transcribed, the team was able to make only relatively limited use of the TLF: they proceeded to generate one giant printed concordance of the two modern centuries, filling a large room at the headquarters in Nancy with bound volumes of word occurrences and their three-line context (Morrissey, 1993, p 93).
If a lexicographer desired more context, they could visit a larger room with a matching “fiche-texte,” an oversized file card with 18 lines of context, or, if they were really in need of context, the basement of the building, which housed typed editions of the works. As it turned out, many of those who desired that much context for a word ended up referring to the regular published edition of the text instead.
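
The printed concordance and the fiche-texte described above differ mainly in how many lines of context surround each occurrence. The following is a minimal sketch of that idea in Python, not a reconstruction of the TLF's actual procedure: the function name `concordance`, its parameters, and the choice to centre the window on the matching line are all assumptions made for illustration.

```python
import re

def concordance(lines, word, context=3):
    """Return (line_number, context_lines) for every line containing `word`.

    `context` is the total number of lines kept per hit (assumed here to be
    centred on the matching line): 3 mimics the bound concordance, 18 the
    fiche-texte.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    before = (context - 1) // 2          # lines kept above the hit
    after = context - 1 - before         # lines kept below the hit
    hits = []
    for i, line in enumerate(lines):
        if pattern.search(line):
            start = max(0, i - before)
            end = min(len(lines), i + after + 1)
            hits.append((i + 1, lines[start:end]))
    return hits
```

Calling `concordance(lines, "peuple", context=18)` on the same text would return the wider window, which is the trade-off the lexicographers faced between the bound volumes and the file cards.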

Surprisingly (or perhaps not, given the French obsession with their treasured language), despite constant criticism from the research community in France, the TLF was maintained (and funded) more as a large collection of machine-readable texts than as a functional dictionary. Finally, in the late 1970s, its director, Paul Imbs, was succeeded by Bernard Quémada, who “wished to modernize the technology and open up use of the database to a wider research community and for projects other than the creation of a dictionary” (Morrissey, 1993, p 94). The TLF was absorbed by the Institut National de la Langue Française (INaLF), which would use the technology in “many aspects of linguistic analysis, distribution of electronic texts, and the creation of dictionary of Middle French” (Morrissey, 1993, p 94). Unfortunately, it was also about this time that the TLF was starting to show its age and had trouble responding to newer challenges, particularly the issue of providing greater access to the database. As luck would have it, a group of scholars from the University of Chicago was working at the research center for the TLF. Conversations ensued, and the result was the ARTFL project, which was mandated to make the TLF corpus available in North America and to develop facilities for querying the database. A few years later, in 1981, “three members of the INaLF arrived in Chicago, computer tapes in hand” (Morrissey, 1993, p 94).

The limited technology of the time made storing the database online far too costly. Early versions of the database query tools were unwieldy, as they allowed access to only a few texts at a time. However, by the mid-1990s storage techniques and information networks made it possible not only to store the entire database online, but also to give researchers an opportunity to expand it. By the turn of the century, over 2,700 texts populated the ever-expanding ARTFL-FRANTEXT database, the entire system was available online, and much easier-to-use navigation software, known as PhiloLogic, had been designed and implemented.

Purpose / Audience

Table of Sample TLF word occurrence output
Because the initial purpose of the TLF was to create a dictionary of the French language, it was of little use to anyone who was not a lexicographer. Thus, although the intended audience of the TLF may initially have been French speakers around the world, it was never really useful to that audience. The main product of the TLF was a set of lists of over 15 million French words, listed by occurrence and cross-referenced to their period of origin. The idea, then, was to create a dictionary based on the frequency of the words, with definitions drawn from their most common contexts (also available with the TLF). Needless to say, outside of lexicography and linguistics it is difficult to find a use for such lists. When the TLF was expanded to be not only a database of French words, but also a database of French texts, it became far more useful to researchers and scholars alike. When the database was then put online under the new banner of the ARTFL project, it became even more valuable simply because of its availability. For example, Keith Baker, a researcher of the French Revolution, was able to compare and contrast the occurrences of "opinion publique" in the latter half of eighteenth-century France. Because it was possible to find not only every occurrence of the phrase, but also every similar occurrence, and then determine its context, Baker was able to trace the transformation of the phrase from an association with uncertainty and disorder to one with rational authority during the revolutionary years (Morrissey, 1993, p 94).
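
The kind of query behind Baker's comparison, counting a phrase across dated texts and grouping the counts by period, can be sketched as follows. This is only an illustration under stated assumptions: the function `phrase_counts_by_period`, the `(year, text)` input format, and the 25-year bins are hypothetical and are not taken from ARTFL or PhiloLogic.

```python
from collections import Counter

def phrase_counts_by_period(texts, phrase, period=25):
    """Count case-insensitive occurrences of `phrase`, grouped by period.

    `texts` is an iterable of (year, body) pairs (an assumed format).
    Each year is mapped to the first year of its `period`-year bin.
    """
    counts = Counter()
    needle = phrase.lower()
    for year, body in texts:
        bin_start = year - year % period
        counts[bin_start] += body.lower().count(needle)
    return dict(sorted(counts.items()))
```

Given dated texts from a real corpus, the resulting dictionary shows how often the phrase appears in each period; reading the surrounding context of each hit is still needed to see how its meaning shifts.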

Technologies

The ARTFL project, or rather the TLF before it, was always conceived as a project involving computers. The computers of the time in France, having been conceived and delivered only two years prior, would likely have used vacuum tubes, with electrical line or ferrite core working memory able to store anywhere from 60 to 256 words, and a backup storage drum able to hold anywhere from 2,000 to 32,000 words, depending on the model. Clock speeds would have been in the hundreds of kHz and likely would not have exceeded even half a MHz. The input systems for computers of the period were almost certainly punch cards, which meant a user would first need to produce the punch cards for a text and then have them fed into the system. For output, a user could make use of an electronic printer known as a “Numérograph,” a “fiche-texte,” or reels of punched tape that only another computer could then read (Mounier-Kuhn, 1990).

When the ARTFL project was created, in 1981, the database was transferred to more modern computers of the period. These would have been servers somewhat similar to those in use today, though admittedly much slower and much larger. As the ARTFL project became publicly accessible, it quickly became apparent that a modern search tool was necessary to truly tap into the database's potential. PhiloLogic is that tool. PhiloLogic is essentially a full-text search engine built primarily for use with the ARTFL database, but available for other projects. Although ARTFL has its own sophisticated set of instructions, it remains (even to this day) remarkably well documented and is actually quite easy for anyone to use. The ARTFL project is available on the World Wide Web, and therefore uses HyperText Markup Language (HTML) for viewing in a web browser, which anyone in the world with an internet connection can use. The PhiloLogic tool is available for use on the project's website, though a subscription is required to access the database.
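
To give a rough sense of what a full-text search engine does, the sketch below builds a simple inverted index, a generic textbook technique that maps each word to the set of documents containing it. It is not PhiloLogic's implementation, and the names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercased word to the set of document ids containing it.

    `documents` is a dict of {doc_id: text} (an assumed format).
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every word in `query`."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Building the index once and answering each query by set intersection is what makes searching a whole corpus practical, in contrast to the early query tools that could reach only a few texts at a time.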

References

Robare, Lori, and Joni Roberts. "ARTFL Project." College & Research Libraries News 61, no. 10 (November 2000): 942. Education Research Complete, EBSCOhost (accessed September 24, 2010).

Morrissey, Robert. "Texts and Contexts: The ARTFL Database in French Studies." Profession (1993): 27-33. MLA International Bibliography, EBSCOhost (accessed September 24, 2010).

Mounier-Kuhn, Pierre-E. "Specifications of Twelve Early Computers Made in France." IEEE Annals of the History of Computing 12, no. 1 (January 1, 1990): 3. TOC Plus, EBSCOhost (accessed September 24, 2010).

Wolff, M. "Poststructuralism and the ARTFL Database: Some Theoretical Considerations." INFORMATION TECHNOLOGY AND LIBRARIES 13, no. 1 (1994): 35. British Library Document Supply Centre Inside Serials & Conference Proceedings, EBSCOhost (accessed September 24, 2010).

External Links

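American and French Research on the Treasury of the French Language Project (ARTFL) - Home Page: http://artfl-project.uchicago.edu/

ARTFL Database - University of Alberta Access: http://www.library.ualberta.ca/databases/databaseinfo/index.cfm?ID=83

University of Chicago - Homepage: http://www.uchicago.edu/index.shtml

Centre National de la Recherche Scientifique - Homepage: http://www.cnrs.fr/

PhiloLogic - Old Homepage: http://philologic.uchicago.edu/

PhiloLogic - New Homepage: http://sites.google.com/site/philologic3/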
