Tuesday, 5 December 2017

Using Corpus Analysis Software to Analyse Specialised Texts


1. What is a corpus?

A corpus is a collection of texts, written or spoken, usually stored in a computer database. A corpus may be quite small, for example, containing only 50,000 words of text, or very large, containing many millions of words.

Ref: https://21centurytext.wordpress.com/home-2/special-section-window-to-corpus/what-is-corpus/
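To make the idea of corpus size concrete, here is a minimal sketch that counts texts, running words (tokens), and distinct word forms (types) in a tiny in-memory corpus. The tokenizer is a deliberately rough assumption (runs of ASCII letters, with internal apostrophes or hyphens allowed), and the names `tokenize`, `corpus_size`, and `sample_corpus` are made up for illustration; real corpus tools apply their own tokenization rules.

```python
import re

# Assumption: a "word" is a run of ASCII letters, optionally joined by
# internal apostrophes or hyphens (e.g. "writer's", "well-known").
TOKEN_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def tokenize(text):
    """Split a text into lowercase word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]

def corpus_size(texts):
    """Return (number of texts, total tokens, distinct types)."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    return len(texts), len(tokens), len(set(tokens))

# Hypothetical two-text corpus, only for demonstration.
sample_corpus = [
    "A corpus is a collection of texts.",
    "Texts may be written or spoken.",
]

n_texts, n_tokens, n_types = corpus_size(sample_corpus)
print(n_texts, n_tokens, n_types)  # prints: 2 13 11
```

A 50,000-word corpus in the sense used above would simply be one where the token count reaches about 50,000.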

2. Sources of language corpora

·        Subscribe to a large corpus provider such as the British National Corpus (BNC)
- http://www.natcorp.ox.ac.uk/
·        Use web concordancing
- http://corpus.leeds.ac.uk/protected/query.html (general corpus; English)
- http://corpus.byu.edu/ (general corpus; American/British English)
- http://lextutor.ca/conc/eng/ (general and specialized corpora; English)
·        Compile your own corpora and analyze the data using corpus analysis software
- ‘Antconc’ (for monolingual corpus)
- ‘Wordsmith’ (for monolingual corpus)
- ‘Paraconc’ (for multilingual corpora)
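Concordancing, which the web services and tools listed above provide, means showing every occurrence of a search word with its surrounding context (keyword in context, or KWIC). The sketch below is a simplified illustration of that idea, not the method used by any of the tools named above; the function name `kwic`, the `width` parameter, and the sample text are assumptions.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left context, hit, right context) for each whole-word match."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad both sides so the hits line up in one column when printed.
        lines.append((left.rjust(width), m.group(0), right.ljust(width)))
    return lines

# Hypothetical sample text, only for demonstration.
sample = ("A corpus is a collection of texts. A specialized corpus "
          "contains texts from one subject field.")

for left, hit, right in kwic(sample, "corpus", width=20):
    print(f"{left} [{hit}] {right}")
```

Aligning the search word in a central column is what makes recurring patterns to its left and right easy to spot.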

3. Designing a specialized corpus

Corpus size
·        There are no fixed rules; size depends on the research purpose, the availability of data, and the time available.
Text extracts vs. full texts
·        Depends on the aim of corpus compilation.
Number of texts
·        You can collect a few large texts or a larger number of smaller texts.
·        You can also select texts written by one or two key writers or sources, or texts from a wider range of writers and sources.
·        The choice depends on your research focus, e.g. studying overall language use or studying idiosyncrasy.
Medium
·        Can be spoken or written texts or mixed.
·        Depends on research questions.
Subject and text type
·        Should mainly focus on the specialized text under investigation, although this is less clear-cut in multidisciplinary subjects.
·        Texts may come from different subjects if the research focus is on the study of particular language features rather than term extraction.
·        Text types within a specialized subject field may vary from ‘expert-to-expert’ texts to ‘expert-to-non-expert’ texts, or in other words, from technical to popular texts.
Other considerations
·        Authorship: Texts written by experts in a field tend to present more reliable and authentic examples of specialized language.
·        Language: Specialized texts can be stored and retrieved in the form of monolingual, comparable, or parallel corpora.
·        Publication date: Texts should come from recent publications unless queries are made in relation to particular periods of time.
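The design criteria above can be recorded as metadata for each text, so that the make-up of the corpus is easy to check later. The following is a minimal sketch under stated assumptions: the `TextRecord` fields, the `summarize` helper, and the two sample records are all hypothetical and only mirror the criteria listed in this section.

```python
from dataclasses import dataclass

@dataclass
class TextRecord:
    """Metadata for one text in a specialized corpus (hypothetical fields)."""
    title: str
    author: str
    year: int        # publication date
    medium: str      # "written" or "spoken"
    text_type: str   # e.g. "expert-to-expert"
    language: str
    full_text: bool  # False if only an extract was collected
    n_words: int

def summarize(records):
    """Give a quick overview of corpus size, date range, and media."""
    return {
        "texts": len(records),
        "words": sum(r.n_words for r in records),
        "years": (min(r.year for r in records), max(r.year for r in records)),
        "media": sorted({r.medium for r in records}),
    }

# Invented records, only for demonstration.
records = [
    TextRecord("Sample article A", "Author 1", 2015, "written",
               "expert-to-expert", "en", True, 4200),
    TextRecord("Sample lecture B", "Author 2", 2017, "spoken",
               "expert-to-non-expert", "en", False, 1800),
]

print(summarize(records))
```

Keeping such a record for every text makes it straightforward to see whether the corpus is drifting away from the intended subject, medium, or period.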

 
4. Sources of specialized texts

·        Printed materials
·        Word documents
·        CD-ROMs
·        Texts on the Web
·        Online databases

 
5. Getting started with Antconc

- Download the latest version of Antconc.
- Create a specialized corpus profile.
- Do small-scale research on your own specialized corpus.
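A common first step in small-scale research on your own corpus is a word frequency list. The sketch below shows the general idea in plain Python; it is not Antconc's implementation, and the function name `word_list`, the `top_n` parameter, and the two sample sentences are assumptions made for illustration.

```python
from collections import Counter
import re

def word_list(texts, top_n=5):
    """Return the top_n most frequent lowercase word forms with their counts."""
    counts = Counter()
    for text in texts:
        # Assumption: words are runs of the letters a-z after lowercasing.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(top_n)

# Hypothetical mini-corpus, only for demonstration.
my_corpus = [
    "The patient received the vaccine.",
    "The vaccine protects the patient.",
]

for word, freq in word_list(my_corpus, top_n=3):
    print(word, freq)
```

In a specialized corpus, the content words near the top of such a list are often good candidates for a closer look with a concordance.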
