CIRCA:TextArc

From CIRCA


Revision as of 01:04, 20 November 2010

This page is under construction.


Introduction to TextArc

TextArc is a program for visualizing the structure of text. It treats the text as a "data container" [Paul, 2007] and uses word frequency and pairings to develop a "visual index" [Kimport, 2006].

It draws the full text on an outer spiral (the spiral highlighted in green on the image below), meaning that the entire text is always visible. Although this full text is in a 1 point font, it can be read by hovering over a line. Every word is then repeated inside the spiral, with the word's appearances in the text pulling it into position (as though a "rubber band" were attached to each appearance [Fildes, 2002]).
Selecting a word shows a line to every appearance of that word in the text. It can also show, in a different colour (purple in the image below), lines to other words that are associated with that word by proximity. Additionally, TextArc can 'play' the text by moving an orange line through each word in sequence.
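
The "rubber band" placement can be illustrated with a short sketch. This is not TextArc's own code: it assumes the outer spiral is simplified to a unit circle and that a word's resting position is the plain average of the points where it appears. The names tokenize, arc_point and word_positions are hypothetical.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def arc_point(index, total):
    """Map the index-th token onto a unit circle (assumed stand-in for the spiral).

    Index 0 sits at the top of the circle and later tokens run clockwise.
    """
    angle = 2 * math.pi * index / total
    return (math.sin(angle), math.cos(angle))


def word_positions(tokens):
    """Place each distinct word at the mean of the arc points of its occurrences."""
    points = defaultdict(list)
    for i, tok in enumerate(tokens):
        points[tok].append(arc_point(i, len(tokens)))
    return {
        word: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for word, pts in points.items()
    }


if __name__ == "__main__":
    tokens = tokenize("Alice saw the Queen. The Queen saw Alice.")
    for word, (x, y) in sorted(word_positions(tokens).items()):
        print(f"{word:6s} x={x:+.2f} y={y:+.2f}")
```

Under these assumptions, a word that occurs only once stays on the outer ring next to its single appearance, while a word spread evenly through the text is pulled toward the centre.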

The visualization is highly interactive, allowing the user to move through the text and its words to explore the text's structure. The overall display is also artistic in design (it has been shown in numerous art galleries and shows) and blurs the lines between tool, science, art, and text [Chen, 2010].

Screenshot of TextArc

TextArc reading Alice in Wonderland. The reading line is shown in orange, words associated with "Alice" glow purple, and the places where the word "Alice" appears are marked in green on the outer ring.

Audience and Purpose

TextArc is aimed at people who need to sift through a text quickly. It exposes the structure implied by word distribution, along with the timing and interconnection of the words. It can potentially support a deeper interpretation of the text based on its structure.

The developer of TextArc frames its use in this way: "Suppose you have 5 minutes to understand a 500 page book with no index or chapters..."

Significance

TextArc has been described as "the first accurate cyber-accountant of literature that is capable of analysing the content and structure of a text" [Emerson, 2003]. This quote appears in reference to a book by Italo Calvino ("If on a Winter's Night a Traveller") which proposes the idea of a computer that can understand a text by analysing its word count and frequency.

A simple interpretation of its usage is that, by exposing the structure of a text, it allows researchers to better analyse and interpret that text.

Bradford Paley, the developer, suggests that deeper interpretations are also possible[1]. Consider the image below, which shows the words 'King' and 'Queen' selected in Alice in Wonderland. Both appear mainly in two sections of the book. We can see that the Queen dominates the first section, while the King dominates the second. But beyond this, we can see that before these sections, the Queen is mentioned on four occasions. These are other characters mentioning the Queen, foreshadowing her arrival and ensuring she stays in the reader's mind. The even spacing suggests that Lewis Carroll structured these mentions deliberately, aware that the Queen was a key figure in the novel's dénouement.

TextArc has been influential. It has been seen by thousands at various shows, and has inspired other text visualization projects, including a visualization of the history of information visualization, for an information visualization conference (InfoVis) [Hsu, 2004].

TextArc explores the relationship between structure and meaning, raising the question of how much meaning is inherent in structure. For TextArc, importance is based on frequency, and connection is based on co-location. Ultimately, though, it privileges the word as a discrete unit. It cannot distinguish between the different meanings a single word may have. Nor can it interpret the meanings that emerge through specific phrasings or sequences of words, other than by noticing their proximity.
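
A minimal sketch can make this limitation concrete. It is not TextArc's implementation: it assumes "importance" is a raw word count and "connection" is a count of distinct word pairs found within a fixed window of tokens. The window size of 2 and the names tokenize, frequencies and cooccurrences are hypothetical choices for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequencies(tokens):
    """Importance, under the assumption above: how often each word occurs."""
    return Counter(tokens)


def cooccurrences(tokens, window=2):
    """Connection: count unordered word pairs within `window` tokens of each other."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


if __name__ == "__main__":
    tokens = tokenize("The river bank flooded. The bank raised rates.")
    print(frequencies(tokens).most_common(3))
    print(cooccurrences(tokens).most_common(3))
```

In this example the riverside "bank" and the financial "bank" are counted as one word with two occurrences, and that single entry ends up linked to both "river" and "rates". A purely word-level index has no way to separate the two senses.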

Although TextArc is presented as a way to quickly understand a text, it is arguably more useful to consult a review of a novel in order to quickly establish its meaning. Some texts, such as speeches, may be better suited to this use of TextArc.

Screenshot of TextArc

Lines radiate from the words King (in gray) and Queen (in orange), showing that both words chiefly appear in two sections of the book. There are also four earlier mentions of the Queen at regular intervals; closer inspection reveals these to be foreshadowing mentions of the Queen by other characters.

Technologies

TextArc is a Java applet, typically run in a web browser. Java is an operating-system-independent programming language released by Sun Microsystems in 1995. It was designed to remove the need to rewrite code when porting it from one system to another.

The text is the key input parameter to the applet. Because TextArc is linked to Project Gutenberg, it is able to visualize thousands of texts. Other input that could be usefully displayed by TextArc includes:

  • E-mail archives
  • Legal documents
  • Source code
  • Financial news updates
  • Genomics

History

TextArc was conceived, designed and developed by Bradford Paley. Paley teaches interaction design as "cognitive engineering" at Columbia University. He is also a consultant for Wall Street, creating visualizations for stock traders. The program was originally conceived as a text analysis tool.

Bradford Paley did not write every line of code. Hai Ng, JueyChong Ong, and Greta Peterman at Paley's company Digital Image Design Incorporated (didi) worked on a Java codebase from which TextArc was built. In addition, inspiration for aspects of the caching that allows the entire text to appear (twice) on the screen is credited to Clifford Beshers [2].

TextArc was released in 2002 (although a preview was shown at the Banff Centre for the Arts in 2001). Since then it has been displayed in numerous locations, including:

Citations

Fildes, Jonathon. (2002, November 29). Visual net spins literary web. BBC News.

Kimport, Katrina. (2006). Research report for the Transliteracies Project.

Emerson, L. "Digital Poetry as Reflexive Embodiment." Cybertext Yearbook (2002-2003): 88 – 106.

Hsu, T.-W.; Inman, L.; Mccolgin, D.; Stamper, K. (2004): MonkEllipse: Visualizing the History of Information Visualization. In: Proc. of INFOVIS’04, IEEE Computer Society, p. 216.9.

Paul, Christiane. "The Database as System and Cultural Form: Anatomies of Cultural Narratives." Database Aesthetics: Art in the Age of Information Overflow. Ed. Victoria Vesna. Minneapolis: U of Minnesota P, 2007. 95-109.

Chen, Chaomei. "Information visualization." Wiley Interdisciplinary Reviews Computational Statistics, Volume 2, Issue 4, pages 387–403, July/August 2010.
