CIRCA:Text Analysis Literature Review


Here is our Zotero lit review library: https://www.zotero.org/groups/jwdtd/items


"Text Analysis and the Dynamic Edition? A Working Paper, Briefly Articulating Some Concerns with an Algorithmic Approach to the Electronic Scholarly Edition." Ray Siemens, with the TAPoR Community. Text Technology. Volume 14 issue 1. 2005. pg. 91-98

In this article, Siemens discusses the rise of text analysis. In the 1990s, text analysis and Text Analysis Computing Tools (TACT) were becoming popular; however, these highly encoded electronic texts did not offer text analysis features that met scholars' expectations. Concerns over the development of the electronic scholarly edition fell into four areas: meeting community needs, expectations, and levels of expertise or familiarity; repurposing existing tools and developing new ones; the seamless integration of those tools with one another; and the development of an interface. Within the community, hypertext was adopted quickly because it was more intuitive. Basic searching, collocation, and concordancing were the tools most familiar to scholars, while the community was less familiar with the rest. Both simple navigational strategies and more complex ones, such as visualizations and the ability to work with large corpora, were identified as needs. Text analysis tools such as those found in TAPoR and TACT could be repurposed and adapted, but new tools would also have to be created to work with texts in various formats and encoded states. These tools would then need to be integrated through a seamless, intuitive interface.
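
As a concrete illustration of the basic concordancing Siemens describes scholars being familiar with, here is a minimal keyword-in-context (KWIC) sketch in Python. It is purely illustrative: the code is not from the article, and it does not reproduce how TACT or TAPoR actually implement concordancing.

 # Minimal keyword-in-context (KWIC) concordance: each hit on the
 # keyword is shown with a fixed window of surrounding characters.
 # Illustrative only; not code from Siemens's article or from TACT/TAPoR.
 def kwic(text, keyword, width=20):
     hits = []
     lower, target = text.lower(), keyword.lower()
     start = lower.find(target)
     while start != -1:
         left = text[max(0, start - width):start]
         right = text[start + len(keyword):start + len(keyword) + width]
         hits.append(f"{left:>{width}} [{text[start:start + len(keyword)]}] {right}")
         start = lower.find(target, start + 1)
     return hits
 
 sample = "The edition is electronic; the electronic edition is dynamic."
 for line in kwic(sample, "electronic"):
     print(line)

Run against the sample sentence, this prints both occurrences of "electronic" aligned in their surrounding context, which is the core of what a concordancer does.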



"Towards Next Generation Text Analysis Tools: The Text Analysis Markup Language (TAML)". Stéfan Sinclair. Text Technology. Volume 14 issue 1. 2005. pg 99-107

In this article, Sinclair proposes the Text Analysis Markup Language (TAML) as a way for the community to collaborate on developing text analysis tools. Humanists rarely recognize text analysis tools, and those in use are limited to older applications such as TACT, WordCruncher, and Concordancer; other available tools are stand-alone programs that do not benefit the larger community of developers. Developing text analysis tools has been a long-term plan, with peer review proposed as part of it: by providing incentives and recognition, more colleagues would be willing to develop new tools. TAML would let developers reuse many existing resources so they can focus on creating innovative functionality, extending the possibilities of existing tools rather than building them again. So far there has been no successful collaborative code-development effort for text analysis tools; such efforts have failed for lack of a robust mechanism for data interchange between tools.
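
Sinclair's diagnosis, that collaborative efforts failed for want of a common interchange format, can be made concrete with a small sketch. The Python below shows the general idea of tools exchanging results through a shared XML structure; the element and attribute names are invented for illustration and are not TAML's actual schema.

 # Sketch of the interchange idea behind TAML: tool A serializes its
 # results in a shared XML format, tool B consumes them, and neither
 # needs to know the other's internals. The element names ("analysis",
 # "token") are invented for this sketch; they are NOT Sinclair's schema.
 import xml.etree.ElementTree as ET
 from collections import Counter
 
 def frequencies_to_xml(text):
     """Tool A: compute word frequencies and emit the shared format."""
     root = ET.Element("analysis", {"type": "frequency"})
     for word, count in Counter(text.lower().split()).items():
         ET.SubElement(root, "token", {"form": word, "count": str(count)})
     return ET.tostring(root, encoding="unicode")
 
 def top_tokens(xml_string, n=3):
     """Tool B: read the shared format, knowing nothing about Tool A."""
     root = ET.fromstring(xml_string)
     pairs = [(t.get("form"), int(t.get("count"))) for t in root.iter("token")]
     return sorted(pairs, key=lambda p: p[1], reverse=True)[:n]
 
 print(top_tokens(frequencies_to_xml("to be or not to be")))

Because the two functions agree only on the XML format, either tool could be replaced without touching the other, which is exactly the kind of reuse Sinclair argues a standard like TAML would enable.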



"Drawing Knowledge from Information: Early Modern Texts and Images on the TAPoR Platform". Claire Carlin . Text Technology. Volume 14 issue 1. 2005. pg 13-20



"Determining Value for Digital Humanities Tools: Report on a Survey of Tool Developers". Susan Schreibman. Digital Humanities Quaterly. Volume 4 issue 2. 2010.



"Eye-ConTact: Towards a New Design for Text-Analysis Tools". Geoffrey Rockwell, John Bradley. CHWP A.4. Feb. 1998.



"Words, Patterns and Documents: Experiments in Machine Learning and Text Analysis". Shlomo Argamon, Mark Olsen. Digital Humanities Quarterly. Volume 3 issue 2. Spring 2009.



"ManiWordle: Providing Flexible Control over Wordle". Kyle Koh, Bongshin Lee, Bohyoung Kim, Jinwook Seo. Visualization and Computer Graphics, IEEE. Volume 16 issue 6. 2010. pg 1190-1197


"Participatory Visualization with Wordle." F.B. Viegas, M. Wattenberg, J. Feinberg. Visualization and Computer Graphics, IEEE. Volume 15 issue 6. 2009. pg 1137-1144


"Using Wordle as a Supplementary Research Tool". Carmel McNaught, Paul Lam. Qualitative Report. Volume 15 issue 3. May 2010. pg 630-643
