CIRCA:Text Analysis Literature Review


Revision as of 16:51, 27 January 2011 by AshleyMoroz
  • CYBERGATE: A DESIGN FRAMEWORK AND SYSTEM FOR TEXT ANALYSIS OF COMPUTER-MEDIATED COMMUNICATION.

By: Abbasi, Ahmed; Chen, Hsinchun. MIS Quarterly, Dec2008, Vol. 32 Issue 4, p811-837

  • Supervised categorization of JavaScript™ using program analysis features.

By: Lu, W.; Kan, M.Y. Information Processing and Management, Mar2007, Vol. 43 Issue 2, p431-444 (Special issue on AIRS2005: Information Retrieval Research in Asia)


Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can help interpret a web page or reveal crucial information about it. We have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language, and frame the task of understanding embedded scripts as a text categorization problem. We show how traditional information retrieval methods can be augmented with features distilled from domain knowledge of JavaScript and software analysis to improve classification performance. Experiments on the standard WT10G web page corpus show that our techniques eliminate over 50% of the errors made by a standard text classification baseline.
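As an illustration of the general idea in this abstract (not the authors' implementation), the minimal Python sketch below appends hand-crafted JavaScript cues to an ordinary bag-of-words representation before classification. The cue list in `DOMAIN_CUES`, the category labels, and the toy training scripts are all hypothetical. The classifier is multinomial Naive Bayes, chosen only for brevity, because the abstract names neither the learner nor the exact program-analysis features.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical "program analysis" cues: API uses that hint at a script's function.
# Illustrative assumptions only, not the feature set used by Lu and Kan.
DOMAIN_CUES = {
    "window.open": "CALL_POPUP",
    "document.write": "CALL_DOCWRITE",
    "setTimeout": "CALL_TIMER",
    "navigator.userAgent": "READ_USERAGENT",
}

def text_features(script: str) -> Counter:
    """Plain bag-of-words baseline: lower-cased identifier tokens."""
    return Counter(t.lower() for t in re.findall(r"[A-Za-z_$][\w$]*", script))

def domain_features(script: str) -> Counter:
    """One synthetic (upper-case) feature per domain cue present in the source."""
    return Counter(name for cue, name in DOMAIN_CUES.items() if cue in script)

def features(script: str) -> Counter:
    """Baseline tokens augmented with domain cues in a single count vector."""
    return text_features(script) + domain_features(script)

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.feat_counts[label].update(features(doc))
        self.vocab = set()
        for counts in self.feat_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, doc):
        feats = features(doc)
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for f, k in feats.items():   # smoothed log likelihood
                score += k * math.log((counts[f] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

# Toy usage with invented scripts and labels.
train = [
    ("window.open('ad.html', 'promo', 'width=300');", "popup"),
    ("var w = window.open(url); w.focus();", "popup"),
    ("document.write('<img src=banner.gif>');", "ad_banner"),
    ("document.write('<a href=sponsor>'); document.write('</a>');", "ad_banner"),
]
model = NaiveBayes().fit([s for s, _ in train], [y for _, y in train])
print(model.predict("window.open('x.html');"))  # prints: popup
```

Because the cue features live in the same count vector as ordinary tokens, a stronger learner could replace Naive Bayes without changing the feature extraction.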
