CIRCA:TextArc

TextArc (http://www.textarc.org/)

Michael Burden

17 November 2010

Introduction to TextArc

TextArc is a program for visualizing the structure of text. It treats the text as a "data container" [Paul, 2007] and uses word frequency and pairings to develop a "visual index" [Kimport, 2006].

TextArc draws the full text on an outer spiral (highlighted in green in the image below), so the entire text is always visible. Although the full text is set in a 1-point font, any line can be read by hovering over it. Every word is then repeated inside the spiral, with the word's appearances in the text pulling it into position (as though a "rubber band" were attached to each appearance [Fildes, 2002]). Thus, in Alice in Wonderland, Alice, who appears throughout the book, sits in the centre, while the Gryphon, who appears only in a single chapter, sits close to 9 o'clock.
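The "rubber band" placement can be sketched in a few lines of Python. This is only an illustration, not Paley's implementation: it assumes the text runs clockwise around a circle (rather than a spiral) starting at 12 o'clock, and that each word settles at the plain average of its occurrence points. The names tokenize, arc_point, and word_positions are hypothetical.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def arc_point(index, total, radius=1.0):
    """Map token number `index` of `total` to a point on a circle.

    Assumption: the text starts at 12 o'clock and runs clockwise, so a
    token a quarter of the way through lands at 3 o'clock.
    """
    angle = 2 * math.pi * index / total
    return (radius * math.sin(angle), radius * math.cos(angle))


def word_positions(text):
    """Place each distinct word at the mean of its occurrence points."""
    tokens = tokenize(text)
    points = defaultdict(list)
    for i, token in enumerate(tokens):
        points[token].append(arc_point(i, len(tokens)))
    return {
        word: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for word, pts in points.items()
    }
```

For the toy text "alice b alice c", alice occurs at 12 o'clock and 6 o'clock and so settles at the centre, while c occurs once, three quarters of the way through, and stays at 9 o'clock.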

Selecting a word draws a line to every appearance of that word in the text. TextArc can also show, in a different colour (purple in the image below), lines to other words that are associated with the selected word by proximity. Additionally, TextArc can 'play' the text by moving an orange line through each word in sequence.
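Association by proximity can be approximated by counting the words that fall within a few tokens of each appearance of the selected word. The source does not describe TextArc's actual measure, so the sketch below is an assumption: the function name associated_words and the window parameter are hypothetical.

```python
import re
from collections import Counter


def associated_words(text, target, window=3):
    """Count words that occur within `window` tokens of `target`.

    Assumption: a fixed-size token window stands in for whatever
    proximity measure TextArc really uses.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    target = target.lower()
    neighbours = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # Skip the target itself so it is never its own neighbour.
            if tokens[j] != target:
                neighbours[tokens[j]] += 1
    return neighbours
```

Calling associated_words(text, "alice").most_common(10) would then give the ten words most often found near "alice", which are the candidates for the purple lines.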

The visualization is highly interactive, allowing the user to move through the text and its words to explore the text's structure. The overall display is also artistic in design (and has been shown in numerous art galleries and shows) and blurs the lines between tool, science, art, and text [Chen, 2010].

Screenshot of TextArc

TextArc is shown reading Alice in Wonderland (in orange), with words associated with Alice glowing purple and the places where the word Alice appears shown in green on the outer ring.

Audience and Purpose

"TextArc was developed to help people deal with the ever-increasing influx of data they are forced to accept and integrate into their knowledge base" [Paley, 2002].

TextArc is aimed at people who need to filter a text quickly, such as academics, financial analysts, or lawyers. It exposes the structure implied by word distribution, and the timing and interconnection of the words. It can potentially provide a deeper interpretation of the text based on its structure.

The developer of TextArc frames its use in this way: "Suppose you have 5 minutes to understand a 500 page book with no index or chapters..."

As a work of art, it is designed with aesthetic principles in mind, and has been called beautiful [Vesna, p.101] and "remarkable" [Mirapaul, 2002].

Significance

TextArc has been described as "the first accurate cyber-accountant of literature that is capable of analysing the content and structure of a text" [Emerson, 2003]. This quote appears in reference to a book by Italo Calvino ("If on a Winter's Night a Traveller") which proposes the idea of a computer that can understand a text by analysing its word count and frequency.

A simple interpretation of its usage is that, by exposing the structure of a text, it allows researchers to better analyse and interpret the text.

Bradford Paley, the developer, suggests that deeper interpretations are also possible[1]. In the image below, the words 'King' and 'Queen' are selected, and both appear mainly in two sections of the book (around 8 o'clock and 11 o'clock). We can see that the Queen dominates the first of these sections, while the King dominates the second. But beyond this, we can see that before these sections the Queen is mentioned on four occasions. These are other characters mentioning the Queen, foreshadowing her arrival and ensuring she stays in the reader's mind. The even spacing suggests that Lewis Carroll structured these mentions deliberately, aware that the Queen was a key figure in the novel's dénouement.

TextArc has been influential. It has been seen by thousands of people at various shows, and has inspired other text visualization projects, including a visualization of the history of information visualization, for an information visualization conference (InfoVis) [Hsu, 2004].

TextArc explores the relationship between structure and meaning, raising the question of how much meaning is inherent in structure. For TextArc, importance is based on frequency, and connection is based on co-location. Ultimately, though, it privileges the word as a discrete unit. It cannot distinguish the different meanings a single word may have. Nor can it interpret the meanings that emerge from specific phrasings or sequences of words, other than by noticing their proximity.

Although TextArc is presented as a way to quickly understand a text, it is arguably more useful to consult a review of a novel in order to quickly establish its meaning. Some texts, such as speeches, may be better suited to this use of TextArc.

Screenshot of TextArc

In this image, lines radiate from the words King (in gray) and Queen (in orange), showing that both words chiefly appear in two sections of the book, although the Queen is also mentioned on four earlier occasions at regular intervals. Closer inspection reveals these to be foreshadowing mentions of the Queen by other characters.

Technologies

TextArc is a Java applet, typically run in a web browser. Java is an operating-system-independent programming language released by Sun in 1995. It was designed to solve the problem of needing to rewrite code in order to port it from system to system.

The text is the key input parameter to the applet. Because TextArc is linked to Project Gutenberg, it can visualize thousands of texts. Other input that could be usefully displayed by TextArc includes:

  • E-mail archives
  • Legal documents
  • Source code
  • Financial news updates
  • Genomics

History

TextArc was conceived, designed and developed by Bradford Paley. Bradford teaches interaction design as "cognitive engineering" at Columbia University. He is also a consultant for Wall Street, creating visualizations for stock traders. The program was originally conceived as a text analysis tool.

Bradford Paley did not write every line of code. Hai Ng, JueyChong Ong, and Greta Peterman at Bradford's company Digital Image Design Incorporated (didi) worked on a Java codebase from which TextArc was built. In addition, inspiration for aspects of the caching that allows the entire text to appear (twice) on the screen is credited to Clifford Beshers[2].

TextArc was released in 2002 (although a preview was shown at the Banff Centre for the Arts in 2001). Since then it has been displayed in numerous locations, including:

Citations

Chen, Chaomei. "Information visualization." Wiley Interdisciplinary Reviews Computational Statistics, Volume 2, Issue 4, pages 387–403, July/August 2010.

Emerson, L. "Digital Poetry as Reflexive Embodiment." Cybertext Yearbook (2002-2003): 88 – 106.

Fildes, Jonathon. (2002, November 29). Visual net spins literary web. BBC News. Retrieved 19 November 2010

Hsu, T.-W.; Inman, L.; Mccolgin, D.; Stamper, K. (2004): MonkEllipse: Visualizing the History of Information Visualization. In: Proc. of INFOVIS’04, IEEE Computer Society, p. 216.9.

Kimport, Katrina. (2006). TextArc Research Report. Transliteracies Project (Research in the Technological, Social, and Cultural Practices of Online Reading). Home page. University of California. Retrieved 19 November 2010.

Mirapaul, Matthew. (September 16, 2002). "Secrets of Digital Creativity Revealed in Miniatures" The New York Times, "Arts" section. Retrieved 20 November 2010.

Paley, W. Bradford. 2002. "TextArc: Showing word frequency and distribution in text". In Proc. of IEEE Symp. on Information Visualization (InfoVis), Poster, Boston, USA, October. IEEE Computer Society.

Paul, Christiane. “The Database as System and Cultural Form: Anatomies of Cultural Narratives.” Database Aesthetics: Art in the Age of Information Overflow. Ed. Victoria Vesna. Minneapolis: U of Minnesota P, 2007. 95-109.

Vesna, Victoria, ed. Database Aesthetics: Art in the Age of Information Overflow. Minneapolis: U of Minnesota P, 2007. Print.
