TextArc

From CIRCA


TextArc (http://www.textarc.org/)

Michael Burden

17 November 2010

Introduction to TextArc

TextArc is a program for visualizing the structure of text. It treats the text as a "data container" [Paul, 2007] and uses word frequency and pairings to develop a "visual index" [Kimport, 2006].

TextArc draws the full text on an outer spiral (the spiral highlighted in green on the image below), meaning that the entire text is always visible. Although this full text is in a 1 point font, it can be read by hovering over a line. Every word is then repeated inside the spiral, with the word's appearances in the text pulling it into position (as though a "rubber band" was attached to each appearance [Fildes, 2002]). Thus Alice, which appears throughout the book, appears in the centre, while the Gryphon, who appears only in a single chapter, appears close to 9 o'clock.
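The "rubber band" placement can be approximated by averaging the positions of a word's occurrences. The following is a minimal sketch, not TextArc's actual code: it assumes a unit circle rather than TextArc's spiral, assumes the text starts at 12 o'clock and runs clockwise, and the function names `tokenize`, `arc_point`, and `word_positions` are invented for illustration.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def arc_point(index, total):
    """Map token number `index` (of `total`) to a point on a unit circle.

    Assumption: the text starts at 12 o'clock and runs clockwise.
    """
    angle = 2 * math.pi * index / total
    return (math.sin(angle), math.cos(angle))


def word_positions(text):
    """Place each distinct word at the mean of its occurrences' arc points."""
    tokens = tokenize(text)
    points = defaultdict(list)
    for i, tok in enumerate(tokens):
        points[tok].append(arc_point(i, len(tokens)))
    return {
        word: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for word, pts in points.items()
    }


# A word spread evenly through the text is pulled to the centre,
# while a word that occurs once stays out on the arc.
positions = word_positions("alice fell alice swam alice gryphon alice woke")
```

Under these assumptions, a word that recurs throughout the text drifts toward the centre, and a word confined to one passage stays near that passage's place on the arc.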

Selecting a word will show a line to every appearance of that word in the text. It can also show, in a different colour (purple in the image below), lines to other words that are associated by proximity to that word. Additionally, TextArc can 'play' the text by showing an orange line moving through each word in sequence.
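Association by proximity can be illustrated by counting the words that fall within a few tokens of each appearance of the selected word. This is a sketch under assumptions: the source does not say how TextArc measures proximity, so the fixed `window` size and the function name `associated_words` are hypothetical.

```python
import re
from collections import Counter


def associated_words(text, target, window=3):
    """Count words found within `window` tokens of each `target` occurrence.

    Assumption: "proximity" means a fixed-size token window; TextArc's
    real rule is not documented in this article.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    target = target.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # Skip the occurrence itself and other copies of the target word.
            if j != i and tokens[j] != target:
                counts[tokens[j]] += 1
    return counts


neighbours = associated_words(
    "The Queen shouted. The King whispered to the Queen.", "Queen", window=2
)
```

Words with the highest counts would be the candidates for the purple association lines.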

The visualization is highly interactive, allowing the user to move through the text and its words to explore the text's structure. The overall display is also artistic in design (it has been shown in numerous art galleries and shows) and blurs the lines between tool, science, art, and text [Chen, 2010].

Screenshot of TextArc

TextArc is shown reading Alice in Wonderland in orange, with words associated with Alice glowing purple and the places where the word Alice appears shown in green on the outer ring.

Audience and Purpose

"TextArc was developed to help people deal with the ever-increasing influx of data they are forced to accept and integrate into their knowledge base" [Paley, 2002].

TextArc is aimed at an audience of people who need to filter a text quickly. This might include academics, financial analysts, or lawyers. It is able to expose the structure implied by word distribution, and the timing and interconnection of the words. It can potentially provide a deeper interpretation of the text based on its structure.

The developer of TextArc frames its use in this way: "Suppose you have 5 minutes to understand a 500 page book with no index or chapters..."

As a work of art, it is designed with aesthetic principles in mind, and has been called beautiful [Vesna, p.101] and "remarkable" [Mirapaul, 2002].

Significance

TextArc has been described as "the first accurate cyber-accountant of literature that is capable of analysing the content and structure of a text" [Emerson, 2003]. This quote appears in reference to a book by Italo Calvino ("If on a Winter's Night a Traveller") which proposes the idea of a computer that can understand a text by analysing its word count and frequency.

A simple interpretation of its usage is that, by exposing the structure of a text, it allows researchers to better analyse and interpret that text.

Bradford Paley, the developer, suggests that deeper interpretations are also possible [1]. In the image below, which shows the words 'King' and 'Queen' selected, both appear mainly in two sections of the book (around 8 o'clock and 11 o'clock). The Queen dominates the first section, while the King dominates the second. Before these sections, however, the Queen is mentioned on four occasions. These are other characters mentioning the Queen, foreshadowing her arrival and keeping her in the reader's mind. The even spacing suggests that Lewis Carroll structured these mentions deliberately, aware that the Queen was a key figure in the novel's dénouement.

TextArc has been influential. It has been seen by thousands of people at various shows, and has inspired other text visualization projects, including a visualization of the history of information visualization for an information visualization conference (InfoVis) [Hsu, 2004].

TextArc explores the relationship between structure and meaning, raising the question of how much meaning is inherent in structure. For TextArc, importance is based on frequency, and connection is based on co-location. Ultimately, though, it privileges the word as a discrete unit. It is not able to distinguish the different meanings a single word may have. Nor is it able to interpret the meanings that emerge through specific phrasings or sequences of words, other than noticing their proximity.

Although TextArc is presented as a way to quickly understand a text, it is arguably more useful to consult a review of a novel in order to quickly establish its meaning. Some texts, such as speeches, may be better suited to being read with TextArc in this fashion.
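The limit of treating the word as a discrete unit shows up in even the simplest frequency count. The sketch below is illustrative only: the sample sentence and the name `word_frequencies` are invented, and the counting rule is an assumption rather than TextArc's own.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word form occurs, ignoring case."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Hypothetical sample: "bank" is used in two unrelated senses.
sample = "She sat on the river bank. He robbed the bank."
frequencies = word_frequencies(sample)

# Both senses collapse into one entry, so frequency alone
# cannot tell the riverside from the financial institution.
bank_count = frequencies["bank"]
```

Any importance score built on such counts inherits the same blind spot, whatever weighting is applied afterwards.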

Screenshot of TextArc

In this image, lines radiate from the words King (in gray) and Queen (in orange), showing that both words chiefly appear in two sections of the book, although there are four prior occasions at regular intervals that mention the Queen: closer inspection reveals these to be foreshadowing mentions of the Queen by other characters.

Technologies

TextArc is a Java applet, typically run in a web browser. Java is an operating-system-independent programming language released by Sun Microsystems in 1995. It was designed to solve the problem of having to rewrite code in order to port it from system to system.

The text is the key input parameter to the applet. Because it is linked to Project Gutenberg, TextArc is able to visualize thousands of texts. Other input that could be usefully displayed by TextArc includes:

  • E-mail archives
  • Legal documents
  • Source code
  • Financial news updates
  • Genomics

History

TextArc was conceived, designed and developed by Bradford Paley. Paley teaches interaction design as "cognitive engineering" at Columbia University. He is also a consultant for Wall Street, creating visualizations for stock traders. The program was originally conceived as a text analysis tool.

Bradford Paley did not write every line of code. Hai Ng, JueyChong Ong, and Greta Peterman at Paley's company Digital Image Design Incorporated (didi) worked on a Java codebase from which TextArc was built. In addition, inspiration for aspects of the caching that allows the entire text to appear (twice) on the screen is credited to Clifford Beshers [2].

TextArc was released in 2002 (although a preview was shown at the Banff Centre for the Arts in 2001). Since then it has been displayed in numerous locations, including:

Citations

Chen, Chaomei. "Information visualization." Wiley Interdisciplinary Reviews: Computational Statistics, Volume 2, Issue 4, pages 387–403, July/August 2010.

Emerson, L. "Digital Poetry as Reflexive Embodiment." Cybertext Yearbook (2002-2003): 88–106.

Fildes, Jonathon. (2002, November 29). Visual net spins literary web. BBC News. Retrieved 19 November 2010.

Hsu, T.-W.; Inman, L.; Mccolgin, D.; Stamper, K. (2004): MonkEllipse: Visualizing the History of Information Visualization. In: Proc. of INFOVIS '04, IEEE Computer Society, p. 216.9.

Kimport, Katrina. (2006). TextArc Research Report. Transliteracies Project (Research in the Technological, Social, and Cultural Practices of Online Reading). Home page. University of California. Retrieved 19 November 2010.

Mirapaul, Matthew. (September 16, 2002). "Secrets of Digital Creativity Revealed in Miniatures" The New York Times, "Arts" section. Retrieved 20 November 2010.

Paley, W. Bradford. 2002. "TextArc: Showing word frequency and distribution in text". In Proc. of IEEE Symp. on Information Visualization (InfoVis), Poster, Boston, USA, October. IEEE Computer Society.

Paul, Christiane. "The Database as System and Cultural Form: Anatomies of Cultural Narratives." Database Aesthetics: Art in the Age of Information Overflow. Ed. Victoria Vesna. Minneapolis: U of Minnesota P, 2007. 95-109.

Vesna, Victoria, ed. Database Aesthetics: Art in the Age of Information Overflow. Minneapolis: U of Minnesota P, 2007. Print.
