CIRCA:TextArc
This page is under construction.
Introduction to TextArc
TextArc is a program for visualizing the structure of text. It treats the text as a "data container" [Paul, 2007] and uses word frequency and pairings to develop a "visual index" [Kimport, 2006].
It draws the full text on an outer spiral (highlighted in green in the image below), so the entire text is always visible. Although this full text is set in a 1-point font, any line can be read by hovering over it.
Every word is then repeated inside the spiral, with the word's appearances in the text pulling it into position (as though a "rubber band" were attached to each appearance [BBC article, 2002]).
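The "rubber band" description can be read as placing each word at the average of the positions where it appears on the outer path. The following is a minimal Python sketch of that reading, not TextArc's actual code: it assumes the text is laid out evenly around an ellipse (the real display uses a spiral of lines), and the names `ellipse_point` and `word_positions` are hypothetical.

```python
import math
from collections import defaultdict


def ellipse_point(fraction, rx=1.0, ry=0.6):
    """Map a fraction of the way through the text (0..1) to a point on an ellipse.

    Assumption: the text starts at the top and runs clockwise (y axis pointing up).
    """
    angle = 2 * math.pi * fraction
    return (rx * math.sin(angle), ry * math.cos(angle))


def word_positions(words):
    """Place each distinct word at the centroid of its appearances on the ellipse."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    total = len(words)
    for index, word in enumerate(words):
        x, y = ellipse_point(index / total)
        entry = sums[word.lower()]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    return {word: (sx / n, sy / n) for word, (sx, sy, n) in sums.items()}


# Made-up sample text for illustration.
text = "alice saw the rabbit and the rabbit saw alice"
positions = word_positions(text.split())
for word, (x, y) in sorted(positions.items()):
    print(f"{word:>6}: ({x:+.2f}, {y:+.2f})")
```

Under this assumption, a word that appears only once sits on the ellipse itself, while a word that appears in several places is pulled inward toward the middle of those places.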
Selecting a word shows a line to every appearance of that word in the text. It can also show, in a different colour (purple in the image below), lines to other words that are associated with it by proximity. Additionally, TextArc can 'play' the text by showing an orange line moving through each word in sequence.
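Association by proximity can be approximated by counting which words fall within a few words of each appearance of the selected word. This is an illustrative Python sketch only: the window size of 2 and the name `neighbours` are assumptions, and the sources cited here do not say how TextArc itself measures proximity.

```python
from collections import Counter


def neighbours(words, target, window=2):
    """Count words that occur within `window` positions of each appearance of `target`."""
    target = target.lower()
    words = [w.lower() for w in words]
    counts = Counter()
    for i, word in enumerate(words):
        if word != target:
            continue
        start = max(0, i - window)
        end = min(len(words), i + window + 1)
        for j in range(start, end):
            # Skip the target itself so it is never counted as its own neighbour.
            if j != i and words[j] != target:
                counts[words[j]] += 1
    return counts


# Made-up sample text for illustration.
words = "the queen shouted and the king whispered to the queen".split()
print(neighbours(words, "queen").most_common(3))
```

A larger window links more words to the selection but makes each link weaker evidence of a real association.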
The visualization is highly interactive, allowing the user to move through the text and its words to explore the text's structure. The overall display is also artistic in design (it has been shown in numerous art galleries and shows) and blurs the lines between tool, science, art, and text [Chen, 2010].
TextArc reading Alice in Wonderland, with the reading line in orange, words associated with Alice glowing purple, and the places where the word Alice appears shown in green on the outer ring.
Audience and Purpose
TextArc is aimed at people who need to filter a text quickly. It exposes the structure implied by word distribution, and the timing and interconnection of the words. It can potentially provide a deeper interpretation of the text based on its structure.
The developer of TextArc frames its use in this way: "Suppose you have 5 minutes to understand a 500 page book with no index or chapters..."
Significance
TextArc has been described as "the first accurate cyber-accountant of literature that is capable of analysing the content and structure of a text" [Emerson, 2003]. The quotation refers to Italo Calvino's novel "If on a Winter's Night a Traveller", which proposes the idea of a computer that can understand a text by analysing its word counts and frequencies.
A simple interpretation of its usage is that, by exposing the structure of a text, it allows researchers to better analyse and interpret that text.
Bradford Paley, the developer, suggests that deeper interpretations are also possible. The image below shows the words 'King' and 'Queen' selected. Both appear mainly in two sections of the book: the Queen dominates the first, while the King dominates the second. Before these sections, however, the Queen is mentioned on four occasions. These are other characters mentioning the Queen, foreshadowing her arrival and keeping her in the reader's mind. The even spacing suggests that Lewis Carroll structured these mentions deliberately, aware that the Queen was a key figure in the novel's dénouement[1].
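The kind of pattern described here, where a word's appearances cluster in some parts of a text, can also be checked without the visualization by listing where each appearance falls. A rough Python sketch, using a made-up word sequence rather than the novel itself and a hypothetical helper named `appearance_fractions`:

```python
def appearance_fractions(words, target):
    """Return each appearance of `target` as a fraction (0..1) of the way through the text."""
    target = target.lower()
    total = len(words)
    return [i / total for i, w in enumerate(words) if w.lower() == target]


# Made-up sample text, not a quotation from the novel.
sample = "queen one two queen three four king five king six".split()
for name in ("queen", "king"):
    marks = ", ".join(f"{f:.0%}" for f in appearance_fractions(sample, name))
    print(f"{name}: {marks}")
```

Run on a full text, evenly spaced values for one word before a dense cluster would correspond to the regularly spaced early mentions described above.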
TextArc has been influential. It has been seen by thousands at various shows, and has inspired other text visualization projects, including a visualization of the history of information visualization, for an information visualization conference (InfoVis) [Hsu, 2004].
TextArc explores the relationship between structure and meaning, raising the question of how much meaning is inherent in structure. For TextArc, importance is based on frequency, and connection is based on co-location. Ultimately, though, it privileges the word as a discrete unit.
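Treating the word as a discrete unit and ranking by frequency can be illustrated in a few lines of Python. This is a sketch under stated assumptions, not TextArc's method: the regular-expression tokenizer, the lower-casing, and the name `word_frequencies` are choices made here for illustration, and no stemming or stop-word filtering is applied.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Split text into lower-cased words and count each one."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Made-up sample sentence for illustration.
freq = word_frequencies("The Queen! The Queen! said the Rabbit.")
for word, count in freq.most_common():
    print(word, count)
```

Even this small example shows a limit of the approach: the most frequent word is "the", so a count alone says little about meaning until it is combined with other information such as where the words fall.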
Although TextArc is presented as a way to understand a text quickly, consulting a review is arguably a more useful way to establish a novel's meaning in a short time. Some texts, such as speeches, may be better suited to being explored with TextArc in this fashion.
Lines radiate from the words King (in gray) and Queen (in orange), showing that both words appear chiefly in two sections of the book, although the Queen is also mentioned on four earlier occasions at regular intervals. Closer inspection reveals these to be foreshadowing mentions of the Queen by other characters.
Technologies
TextArc is a Java applet, typically run in a web browser. Java is an operating-system-independent programming language released by Sun Microsystems in 1995. It was designed to remove the need to rewrite code in order to port it from system to system.
The text is the key input to the applet. Because TextArc is linked to Project Gutenberg, it can visualize thousands of texts. Other input that TextArc could usefully display includes:
- E-mail archives
- Legal documents
- Source code
- Financial news updates
- Genomics
History
TextArc was conceived, designed and developed by Bradford Paley. Bradford teaches interaction design as "cognitive engineering" at Columbia University. He is also a consultant for Wall Street, creating visualizations for stock traders. The program was originally conceived as a text analysis tool.
Bradford Paley did not write every line of code, though. Hai Ng, JueyChong Ong, and Greta Peterman at Digital Image Design Incorporated (didi) worked on a Java codebase from which TextArc was built. Inspiration for aspects of the caching that allows the entire text to appear twice on screen is credited to Clifford Beshers[2].
TextArc was released in 2002 (although a preview was shown at the Banff Centre for the Arts in 2001). Since then it has been displayed in numerous locations, including:
- Columbia University
- SIGGRAPH Art Show (Bradford Paley was a "working artist" at the show)
- New York Public Library plasma screen
- Whitney Museum of American Art ARTPORT gallery
- The Japan Media Arts Festival in 2002, where it won the Grand Prize Non-Interactive Digital Art Award for a poster of TextArc displaying Alice in Wonderland
- Places & Spaces Part 4: 2nd iteration, where TextArc displayed the text of "History of Science", in 2006
- Google Project Room at Chelsea Art Museum in 2010
References
Research Report by Katrina Kimport for the Transliteracies Project, 2006.
Emerson, L. "Digital Poetry as Reflexive Embodiment." Cybertext Yearbook (2002-2003): 88 – 106.
Hsu, T.-W.; Inman, L.; Mccolgin, D.; Stamper, K. (2004): MonkEllipse: Visualizing the History of Information Visualization. In: Proc. of INFOVIS’04, IEEE Computer Society, p. 216.9.
Paul, Christiane. “The Database as System and Cultural Form: Anatomies of Cultural Narratives.” Database Aesthetics: Art in the Age of Information Overflow. Ed. Victoria Vesna. Minneapolis: U of Minnesota P, 2007. 95-109.
Chen, Chaomei. "Information visualization." Wiley Interdisciplinary Reviews Computational Statistics, Volume 2, Issue 4, pages 387–403, July/August 2010.