Using Corpus Analysis Software to Analyse Specialised Texts
1. What is a corpus?
A corpus is a collection of texts, written or spoken,
usually stored in a computer database. A corpus may be quite small, for
example, containing only 50,000 words of text, or very large, containing many
millions of words.
Ref: https://21centurytext.wordpress.com/home-2/special-section-window-to-corpus/what-is-corpus/
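Since corpus size is usually stated in words, a rough word count is a handy first check on a collection. The following is only a minimal sketch: the function names and the sample sentences are illustrative assumptions, and the simple regular-expression tokenizer will not match the counts reported by any particular corpus tool.

```python
import re

def count_words(text: str) -> int:
    """Count word tokens, keeping forms like "don't" or "clear-cut" as one token."""
    return len(re.findall(r"\w+(?:['’-]\w+)*", text))

def corpus_size(texts: list[str]) -> int:
    """Total number of word tokens across all texts in the corpus."""
    return sum(count_words(t) for t in texts)

# Two made-up sentences standing in for two texts of a corpus.
sample_corpus = [
    "A corpus is a collection of texts.",
    "It may be small or very large.",
]
print(corpus_size(sample_corpus))
```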
2. Sources of language corpora
· Subscribe to a large corpus provider such as the British National Corpus (BNC)
- http://www.natcorp.ox.ac.uk/
· Use web concordancing
- http://corpus.leeds.ac.uk/protected/query.html (general corpus; English)
- http://corpus.byu.edu/ (general corpus; American/British English)
- http://lextutor.ca/conc/eng/ (general and specialized corpora; English)
· Compile your own corpora and analyze the data using corpus analysis software
- ‘AntConc’ (for monolingual corpora)
- ‘WordSmith’ (for monolingual corpora)
- ‘ParaConc’ (for multilingual corpora)
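To give an idea of what concordancing produces, here is a minimal keyword-in-context (KWIC) sketch. It is not how AntConc, WordSmith or ParaConc work internally; the function name `kwic`, the `width` parameter and the sample text are assumptions made for illustration only.

```python
import re

def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one keyword-in-context line per whole-word, case-insensitive match."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group()}] {right}")
    return lines

# Made-up sample text.
sample = "A corpus may be small. A specialized corpus focuses on one subject field."
for line in kwic(sample, "corpus", width=20):
    print(line)
```

Each output line shows one occurrence of the search word with a fixed amount of context on either side, which is the basic display that concordancers build on.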
3. Designing a specialized corpus
Corpus size
· There are no fixed rules; the size depends on the research purposes, the availability of data and the time available.
Text extracts vs. full texts
· Depends on the aim of corpus compilation.
Number of texts
· Choices can be made between collecting a few large texts or a larger number of smaller texts.
· Choices can also be made between selecting texts written by one or two key writers or sources and texts drawn from a wider range of writers or sources.
· Depends on your research focus, e.g. to study overall language use or to study idiosyncrasy.
Medium
· Can be spoken or written texts, or a mix of both.
· Depends on research questions.
Subject and text type
· Should mainly focus on the specialized text under investigation, although this is less clear-cut in multidisciplinary subjects.
· Texts may come from different subjects if the research focus is on the study of particular language features rather than term extraction.
· Text types within a specialized subject field may vary from ‘expert-to-expert’ texts to ‘expert-to-non-expert’ texts, or in other words, from technical to popular texts.
Other considerations
· Authorship: Texts written by experts in a field tend to present more reliable and authentic examples of specialized language.
· Language: Specialized texts can be stored and retrieved in the form of monolingual, comparable, or parallel corpora.
· Publication date: Texts should come from recent publications unless queries are made in relation to particular periods of time.
4. Sources of specialized texts
· Printed materials
· Word documents
· CD-ROMs
· Texts on the Web
· Online databases
5. Getting started with AntConc
- Download the latest version of AntConc.
- Create a specialized corpus profile.
- Do small-scale research on your own specialized corpora.
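One possible way to keep track of the design decisions from section 3 (medium, text type, language, publication date, full text vs. extract) is a small metadata record per text. The sketch below is an assumption about how such a profile might be organized, not a format required by AntConc; all class names, field names and sample entries are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TextRecord:
    title: str
    medium: str             # "written" or "spoken"
    text_type: str          # e.g. "expert-to-expert" or "expert-to-non-expert"
    language: str
    year: int               # publication year
    word_count: int
    full_text: bool = True  # False if only an extract was collected

@dataclass
class CorpusProfile:
    subject: str
    texts: list[TextRecord] = field(default_factory=list)

    def total_words(self) -> int:
        """Overall corpus size in words."""
        return sum(t.word_count for t in self.texts)

    def published_since(self, year: int) -> list[TextRecord]:
        """Texts published in or after the given year."""
        return [t for t in self.texts if t.year >= year]

# Hypothetical entries, for illustration only.
profile = CorpusProfile(subject="example subject field")
profile.texts.append(TextRecord("Text A", "written", "expert-to-expert", "English", 2012, 6000))
profile.texts.append(TextRecord("Text B", "written", "expert-to-non-expert", "English", 2005, 1500, full_text=False))
print(profile.total_words(), len(profile.published_since(2010)))
```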