CIRCA:Text Analysis Literature Review

Revision as of 17:05, 27 January 2011 by AshleyMoroz

  • CYBERGATE: A DESIGN FRAMEWORK AND SYSTEM FOR TEXT ANALYSIS OF COMPUTER-MEDIATED COMMUNICATION.

By: Abbasi, Ahmed; Chen, Hsinchun. MIS Quarterly, Dec 2008, Vol. 32 Issue 4, p811-837

  • Supervised categorization of JavaScript™ using program analysis features.

By: Lu, W.; Kan, M.Y. Information Processing and Management (Special issue on AIRS2005: Information Retrieval Research in Asia), Mar 2007, Vol. 43 Issue 2, p431-444


Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret or provide crucial information about the web page. We have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language. We then view understanding embedded scripts as a text categorization problem. We show how traditional information retrieval methods can be augmented with the features distilled from the domain knowledge of JavaScript and software analysis to improve classification performance. We perform experiments on the standard WT10G web page corpus, and show that our techniques eliminate over 50% of errors over a standard text classification baseline.
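The idea of augmenting ordinary text features with script-specific ones can be illustrated with a minimal sketch. It is not the authors' feature set, which the abstract does not list: the function name `extract_features`, the `DOMAIN_PATTERNS` table, and the four indicators in it are all hypothetical, chosen only to show how bag-of-words counts and domain-knowledge flags might sit side by side in one feature dictionary.

```python
import re
from collections import Counter

# Hypothetical domain-knowledge indicators (assumed for illustration only;
# the abstract does not say which program-analysis features were used).
DOMAIN_PATTERNS = {
    "calls_document_write": r"\bdocument\.write\s*\(",
    "opens_window": r"\bwindow\.open\s*\(",
    "sets_timer": r"\bset(?:Timeout|Interval)\s*\(",
    "handles_form": r"\bdocument\.forms\b|\.submit\s*\(",
}


def extract_features(script: str) -> dict:
    """Combine plain token counts with binary script-specific indicators."""
    features = Counter()

    # Baseline text-categorization features: lowercased identifier-like tokens.
    for token in re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", script):
        features["tok=" + token.lower()] += 1

    # Augmenting features: set a flag when a domain pattern occurs in the script.
    for name, pattern in DOMAIN_PATTERNS.items():
        if re.search(pattern, script):
            features["dom=" + name] = 1

    return dict(features)


if __name__ == "__main__":
    sample = 'window.open("http://example.com/ad"); document.write("<b>hi</b>");'
    for key, value in sorted(extract_features(sample).items()):
        print(key, value)
```

A dictionary like this could be passed to any standard text classifier; the `tok=` and `dom=` prefixes simply keep the two feature families distinguishable.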

Anyone who has used a text-analysis tool like TACT has at some point been frustrated by its limitations and wished that a feature or two could be added. In 1992 we set out to imagine a text-analysis environment which would not only have the features we desired, but could be extended continually as our research evolved. This paper describes the limitations of current tools, possible solutions to these limitations, and the design philosophy behind Eye-ConTact, a prototype of a visual programming environment suited for text manipulation.

A good deal of the emerging research literature concerned with online information resources focuses on information retrieval, which is concerned with the use of search engines to locate desired information. Far less attention has been paid to how the found materials are read and how that critical engagement can be enhanced in online reading environments. This paper reports on a study examining the question of whether a set of well-designed reading tools can assist humanities computing scholars in comprehending, evaluating and utilizing the research literature in their area. Thirteen computing humanists were interviewed regarding their experience using the reading tools. They were asked which tools, if any, contributed to their comprehension, evaluation, and interest in utilizing the work they were reading, and to what degree. Responses varied widely among users, but overall the reading tools were found to have the potential to lead to a variety of useful additional materials that would help one come to a better understanding of a particular article. The reading tools were deemed to be an exceptionally good resource for students or beginners in the field. Participants also identified several issues with the tools themselves and the web as a whole that affect the online reading and research experience.
