UNIVERSITY OF EASTERN FINLAND
PHILOSOPHICAL FACULTY
SCHOOL OF HUMANITIES
English language and translation

Tuomas Ari Tapani Kuosmanen

Using Specialised Target Language Corpora to Improve Translation Quality

An experimental case study of terminological accuracy in LSP translation

MA Thesis

May 2017

ITÄ-SUOMEN YLIOPISTO – UNIVERSITY OF EASTERN FINLAND

Tiedekunta – Faculty: Philosophical Faculty
Osasto – School: School of Humanities
Tekijät – Author: Tuomas Ari Tapani Kuosmanen
Työn nimi – Title: Using Specialised Target Language Corpora to Improve Translation Quality – An experimental case study of terminological accuracy in LSP translation
Pääaine – Main subject: English Language and Translation
Työn laji – Level: Pro gradu -tutkielma (MA thesis)
Päivämäärä – Date: 20.05.2017
Sivumäärä – Number of pages: 68 pages + appendix of 17 pages

Tiivistelmä – Abstract

Technological advancements during the past few decades have opened up new avenues for linguistic research. In particular, corpus-based Translation Studies has become more relevant, since it is now possible to compile and process large amounts of information electronically. However, research into corpus-enhanced translation has been limited despite the possible benefits it may hold for translating language for specific purposes (LSP). This makes it an interesting and necessary field of study.

The first chapter of the theoretical section introduces corpus linguistics as a methodology and shows how it has been applied to translation research. The following chapters examine what corpora are, how they are classified, and how they are compiled for research purposes. The chapters also discuss the associated legal and ethical ramifications of using online materials to compile corpora, as explored by Wilkinson (2005) and McEnery and Hardie (2012). The theoretical section concludes with an in-depth look at the two key issues of this study. The first issue is the problem of translating LSP, and the second is translation quality assessment (TQA). Based on the theoretical section, I created a TQA model using the models of House (2015) and Hassani (2011) as references.

The hypothesis of this study was that using a specialised target language corpus (STLC) as a translation aid when translating specialised field texts increases translation accuracy. Specifically, this study was interested in the terminological accuracy of medical texts and whether or not using a corpus improves said accuracy. Moreover, there were two research questions which this study also sought to answer. First, how does using a corpus affect the translation process and, secondly, what position do corpora inhabit among other translation aids? To test the hypothesis, and to answer the research questions, an experimental case study was conducted in which six test subjects translated two Finnish-language medical texts into English. One of these texts was translated using the General Oncology Corpus (GOC) and the other without the corpus. The instances of terminological translation within the target texts were then analysed according to the TQA model described in the theory section. The results were then contrasted against the translations done without the corpus to determine whether the GOC had improved the terminological accuracy of the translations.

The hypothesis of this study was not supported, but it was discovered that using a corpus in conjunction with other translation aids seemed to produce more accurate translations faster than translating without a corpus. The two research questions were also answered: the corpus had a beneficial effect on the translation process, and it yielded the best results when used as an auxiliary translation aid.

Avainsanat – Keywords

corpus-based translation studies; translating language for special purposes; translation quality assessment.

ITÄ-SUOMEN YLIOPISTO – UNIVERSITY OF EASTERN FINLAND

Tiedekunta – Faculty: Filosofinen tiedekunta (Philosophical Faculty)
Osasto – School: Humanistinen osasto (School of Humanities)
Tekijät – Author: Tuomas Ari Tapani Kuosmanen
Työn nimi – Title: Using Specialised Target Language Corpora to Improve Translation Quality – An experimental case study of terminological accuracy in LSP translation
Pääaine – Main subject: Englannin kieli ja kääntäminen (English Language and Translation)
Työn laji – Level: Pro gradu -tutkielma (MA thesis)
Päivämäärä – Date: 20.05.2017
Sivumäärä – Number of pages: 68 pages + appendix, 17 pages

Tiivistelmä – Abstract

Technological advances over the past few decades have opened new doors for linguistic research. Corpus-based translation research in particular has become an increasingly relevant part of language study, because computers and the Internet have made it possible to collect and process large amounts of data quickly and efficiently. Although corpus-based language research has increased over recent decades, its full potential has not yet been exploited. For example, the use of corpora in translating specialised language has not been studied extensively, which makes it a timely and worthwhile research topic.

The first chapter of the theoretical section grounds the study by introducing corpus linguistics as a way of studying language and by showing how it has already been applied to translation research. The following chapters of the theoretical section discuss corpora on a general level, how they are classified, and how they are compiled for research purposes. The section also examines the legal and ethical problems, described by Wilkinson (2005) and McEnery and Hardie (2012), involved in compiling corpora from Internet material. The theoretical section concludes with a discussion of the difficulty of translating specialised language and of translation quality assessment. Based on the theoretical section, a translation quality assessment model was developed for the study, drawing on the models of House (2015) and Hassani (2011).

The hypothesis of the study was that using a specialised language corpus when translating specialised language improves the accuracy of the translation. The study focused on measuring the terminological accuracy of medical translations. In addition to testing the hypothesis, the study had two research questions: how does using a corpus affect the translation process, and how should a corpus be used alongside other translation aids? The hypothesis was tested in an experiment in which six test subjects translated one Finnish-language medical text into English using a corpus and another text without it. The terms in the translations were analysed according to the quality assessment model described in the theoretical section. The results obtained from the corpus translations were compared with the control group's translations to assess whether corpus use had improved the terminological accuracy of the translations.

The hypothesis of the study was not confirmed, as corpus use alone did not increase the accuracy of the translations. However, using the corpus together with other translation aids produced more accurate and more quickly completed translations than those of the control group. Both research questions were also answered: the corpus had a positive effect on the translation process, and it was at its most useful when used alongside other translation aids.

Avainsanat – Keywords

korpuspohjainen käännöstutkimus (corpus-based translation studies); erikoiskielen kääntäminen (translating specialised language); käännöslaadun arviointi (translation quality assessment)

Contents

1. Introduction
2. Corpus linguistics and applied corpus-based Translation Studies
2.1. Corpora in Translation Studies
2.2. What is a corpus?
2.3. Typology of corpora
2.4. Compiling corpora for translation purposes
2.5. The legal issue with compiling corpora from the Internet
3. The problem of translating language for special purposes (LSP)
3.1. Translating LSP with corpora
3.2. Corpora in comparison to other translation aids when translating LSP
3.2.1. Dictionaries
3.2.2. The Internet
3.2.3. Translation memories
4. Overview of translation quality assessment (TQA)
4.1. The problem of TQA
4.2. Evaluating medical translations
4.3. A model for evaluating terminological accuracy in medical translations
4.4. Examples of how the TQA model was implemented on a general level
5. Methods and materials
5.1. Study setting and participants
5.2. General Oncology Corpus
5.3. Finnish medical source texts translated in this study
5.4. Evaluation corpora
5.5. Camtasia recordings
6. Results
6.1. Overall results
6.2. Text 1 translation results
6.3. Disambiguation of Text 1 corpus translation results
6.3.1. Translation 1 (T1)
6.3.2. Translation 2 (T2)
6.3.3. Translation 3 (T3)
6.3.4. Summation of Text 1 corpus translation results
6.4. Text 2 translation results
6.5. Disambiguation of Text 2 corpus translation results
6.5.1. Translation 4 (T4)
6.5.2. Translation 5 (T5)
6.5.3. Translation 6 (T6)
6.5.4. Summation of Text 2 corpus translation results
7. Discussion and conclusions
References
Appendices

1. Introduction

The aim of this thesis is to explore the use of a specialised target language corpus (STLC) in translation and to find out if it has a positive impact on the resulting translations. Specifically, this study aims to test the hypothesis that using an STLC will improve the accuracy of special field translation. For the purposes of this study, the terms accuracy and quality will be used interchangeably to refer to terminological accuracy. The hypothesis also serves as the basis for two research questions, which are the following:

1) How does using a corpus affect the translation process?

2) What position do corpora inhabit among other translation aids?

To test the hypothesis, and to answer the research questions, I conducted an experimental case study. In this study, six test subjects translated two oncology-related Finnish-language medical articles into English. All of the subjects were advanced students of English language and translation, meaning they had all finished their BA theses. One subject had recently graduated with an MA degree and another was close to doing so. The rest of the subjects were studying at MA level. For the study, the subjects were divided into two groups of three and translated one of the Finnish medical texts using the General Oncology Corpus (GOC) and the other without it. Furthermore, the groups translated different texts using the corpus so that the resulting translations formed both the control group and the test group for the study. This approach evened out any difference in skill between the subjects and also helped to reduce any impact that the order in which the texts were translated might have had. The GOC was compiled specifically for use in this study and it consists of 288 English-language oncology-related medical texts, taken from three different medical journals and amounting to 2,236,064 words.

Accuracy in this study was defined as terminological accuracy or equivalence of the target text, i.e. the translation, to the source text, i.e. the original untranslated text. The accuracy of the translations was determined through the use of two separate evaluation corpora – one for each Finnish medical article. These evaluation corpora were compiled from the English-language source/parallel texts and abstracts of the Finnish-language source texts to ensure that the terminology was as close to the authors' original intent as possible.

The reason why a study like this is pertinent in the current state of Translation Studies is that technology has advanced rapidly during the past three decades. We live in a digital age, in which we have access to vast amounts of almost any kind of information through the Internet. Not only can we access that information, but we also have the tools (computers) to compile and analyse it effectively. Because of this technological development, it is now possible to compile large, electronic corpora, which are collections of written and transcribed spoken text, with relative ease (Xiaoping and Rij-Heyligers 2008: 2). This has led to the corpus-based approach to language study becoming more common throughout the various disciplines of linguistics and Translation Studies (McEnery and Hardie 2012: 1, Kennedy 2014: 2).

However, the field of corpus-based Translation Studies (CTS) is still a relatively new one (Tengku Mahadi et al. 2010: 35) and, as such, is in need of new research. Furthermore, the use of corpora as actual translation aids has not been explored to the same extent as various other aspects of CTS (Wilkinson 2005). There has been relatively little research into how corpora affect translation on a practical level. Most research on the subject of corpora as translation aids has been more descriptive in nature (see e.g. Bowker 2000, Zanettin 2002, Wilkinson 2005). Although some studies into how a corpus can improve translation quality do exist (see e.g. Bowker 1998), they have been inconclusive. Because of this lack of recent, in-depth research and the still uncharted possibilities that modern corpora hold, research into the subject of corpus-enhanced translation is relevant.

My reasons for choosing this subject are threefold. First, there is the aforementioned need for new in-depth research into the growing field of corpus-based Translation Studies (CTS). Secondly, this study could provide insight into how corpora can be better utilized in translation practice and how a corpus affects the translation process. Thirdly, I have a personal interest in continuing to study the subject of corpus-enhanced translation (see Kuosmanen 2013).

The theoretical framework of this study aims to define corpora and place them among other translation aids. The theoretical section explains what a corpus is, how it is compiled, and why its use in translation practice is the subject of this study. In addition, a brief history of corpus linguistics and how corpora have been used in translation research is presented in order to provide context. The section ends by identifying and exploring the main issues of language for special purposes (LSP) and translation quality assessment (TQA). Throughout the thesis, the theory will be discussed in relation to the study itself.

The thesis is structured as follows: the theoretical section begins with Chapter 2, which provides a brief overview of the history of corpus linguistics and explains in more detail what CTS is. This is followed by a short introduction to how the methodology of corpus linguistics has been implemented in translation research. In addition, Chapter 2 defines corpora and their typology, as well as how the GOC fits the presented definitions and categorisations. The chapter also covers the practice of compiling corpora for translation purposes from the Internet and the legal/ethical ramifications therein. Chapter 3 discusses the issue of translating LSP and also places corpora in the larger context of translation aids to help define the best way to use them in translation. Chapter 4 presents the topic of TQA alongside the prerequisites for evaluating medical translations and a proposal for a model for evaluating terminological accuracy in medical translations. The empirical portion of this experimental case study begins in Chapter 5 with an explanation and justification of the methods, as well as a description of how this study was conducted. The chapter also explains how the GOC was compiled and the way in which the data from the translations was gathered. This is followed by Chapter 6, which contains the analysis section, in which the data from the experiment is presented and analysed – first together and then individually. Finally, the results are discussed and conclusions drawn in Chapter 7.


2. Corpus linguistics and applied corpus-based Translation Studies

This study does not fall directly under the purview of corpus linguistics, but rather the field of corpus-based Translation Studies. However, the latter cannot be discussed without first reviewing the former, since when a corpus is used as a basis for linguistic research we enter the field of corpus linguistics. Corpus linguistics can be said to have existed as far back as 1897, when Käding used a collection of 11 million words to study the sequence of letters in language (McEnery and Wilson 2010: 2–3). However, even though Käding undoubtedly employed a corpus-based methodology, the term corpus linguistics did not exist back then. Instead, research like Käding's that took place before the 1960s is referred to as early corpus linguistics.

This early form of corpus linguistics faced a lot of criticism, most notably from the American linguist Noam Chomsky (McEnery and Wilson 1996: 5–9), whose critique of it in the late 1950s was so deeply influential as to cause a drastic shift in linguistics – steering it away from empiricism and towards rationalism (Kennedy 2014: 23–25). Chomsky believed corpora to be fundamentally skewed, since they could never account for the entirety of language and would thus provide an incomplete or inaccurate view of it (McEnery and Wilson 1996: 7–12). While Chomsky's criticism was exceedingly influential, it was directed specifically at early corpus linguistics, which existed in a period when the collection and processing of large quantities of data was a laborious affair. Due to the physical nature of early corpora, the process of analysing the data was slow and prone to error (McEnery and Wilson 1996: 7–12, Kennedy 2014: 5). Now, with the advancement of computer technology and the emergence of electronic corpora, Chomsky's criticism has become irrelevant. As such, corpus linguistics as a method for language study owes much to the advancement of computer technology (Aijmer and Altenberg 1996: 9–10, Kennedy 2014: 3). Modern corpora contain millions upon millions of words; while they are far from containing everything in language, some conclusions pertaining to the nature of language use, if not of language itself, can be drawn from such quantities of data (Aijmer and Altenberg 1996: 10–15, McEnery and Wilson 1996: 13–21).

The term corpus linguistics first emerged, in its modern form, in the 1960s (Connor and Upton 2004: 1, Laviosa 2002: 5), and it can be described as an empirical approach to studying the principles and processes of language through the use of corpora (Kennedy 2014: 4). However, describing corpus linguistics simply as an empirical approach to linguistics is only one of many definitions of the term. Many linguists are unwilling to define corpus linguistics and its position within the larger framework of Language Studies (see e.g. Bennett 2010: 7, Teubert and Krishnamurthy 2007: 1). To some, like Kennedy (2014: 4), corpus linguistics is clearly a methodology; a system of methods and practices through which they can inspect language. To others, such as Bennett (2010: 7) or Teubert and Krishnamurthy (2007: 1), the definition is more uncertain, even though they do acknowledge that there is a method being employed when the principles of corpus linguistics are used. For the purposes of this study, the term corpus linguistics will refer to a methodology in which the study of language is based on the analysis of examples of authentic, real-life language (i.e. a corpus), rather than on abstract postulations and rational introspection about language (McEnery and Wilson 1996: 1–2, Bennett 2010: 7).

Tognini-Bonelli (2001: 84–85) suggests that any linguistic enquiry which uses the methodology of corpus linguistics can generally be classified as being either corpus-based or corpus-driven.

A corpus-based approach (CBA) supposes a pre-existing hypothesis or assumption about language, which is then either proved, disproved or refined based on information gained from a corpus. A corpus-driven approach (CDA), conversely, would have the researcher examine corpus data without a pre-existing hypothesis and draw conclusions based solely on the evidence found in the data. The difference between these approaches is that CBA uses the corpus as a tool to test a hypothesis, whereas CDA achieves its results through inductive reasoning based on corpus data.

Tognini-Bonelli’s (2001) classification has generally been accepted among researchers as a useful one (see e.g. Storjohann 2005, McEnery and Hardie 2012), but it has also faced criticism.

Saldanha (2009: 4) argues that – while Tognini-Bonelli's classification is useful for illustrating the distinction between approaches – it is too simple, since CBA and CDA are not mutually exclusive. A study can contain aspects of both approaches. Furthermore, intuition will always play a part in any kind of study. This is something Tognini-Bonelli herself acknowledged by stating that no study can be purely inductive in its approach (2001: 84). While there is no denying the simplicity of Tognini-Bonelli's classification, it works as a much-needed rule of thumb.

According to Tognini-Bonelli’s model, this study is classified as a corpus-based study.

However, while this study makes use of corpora, they are not the subject of scrutiny. Instead, it is the effect that corpus use might have on the translation that is being observed. Because of its practical nature, this study also falls under the purview of applied corpus-based Translation Studies (ACTS), which is an aspect of CTS that studies the practice of using a corpus in translation, translation education, and translation research.

2.1. Corpora in Translation Studies

Having discussed the subject of corpus linguistics as a methodology in the previous chapter, it is relevant to look at what types of research it has already been applied to and how this study fits among them. As stated earlier, the use of electronic corpora in Translation Studies has a shorter history than the use of corpora in linguistics (Olohan 2004: 1). However, since the 1990s, corpora have become more prominent in translation research. Thus, despite its shorter history in the field of Translation Studies, the methodology of corpus linguistics has already been applied to a wide range of research.

For example, Baker (1993: 243) advocated the idea that corpora could prove an ideal tool for searching for universals in translation. These universals are ubiquitous features or patterns in translation that have arisen as unintentional side-effects of the translation process and are not caused by interference between language pairs. Since then, perhaps inspired or galvanized by Baker's influential work, many researchers have performed corpus-based studies to describe — rather than to prescribe — translation as a science. For instance, Laviosa (1998a), and later Xiao (2010a), used monolingual corpora to perform contrastive studies to prove the existence of translation universals. Similarly, Mauranen (2004) performed a comparative study to examine the nature of interference, and whether it is a translation universal in its own right or something that is inherent to translation as a practice.

The search for universals in translation is but one example of how corpora have been used in Translation Studies. Wang (2010) performed a study in which he examined the use of the word however in two translation corpora. These kinds of descriptive studies, which focus on a single element or phenomenon in translated language, be it grammatical, syntactical or lexical, are common within modern CTS (see e.g. Xiao 2010b).

Corpora have also been applied to many practical research questions. For instance, corpora have been an integral part of the rise and development of machine translation and computer-assisted translation (Delpech 2014: 3–7). Furthermore, translator education has been the subject of many corpus-based studies (see e.g. Marco and van Lawick 2009, Inés 2009), and some theorists have taken to using corpora as a means to evaluate translation quality (see e.g. Bowker 2001, Hassani 2011). The subject of this study, which is how using a corpus affects translation quality, has also been examined in some form or another by previous studies (see e.g. Bowker 1998), but more in-depth studies are needed to chart the field more fully.

With such a wide assortment of already existing corpus-based research, the methodology of corpus linguistics seems to have established itself as an integral part of modern Translation Studies. As Laviosa (1998b: 474) aptly states:

the corpus-based approach is evolving, through theoretical elaboration and empirical realisation, into a coherent, composite and rich paradigm which addresses a variety of issues pertaining to theory, description, and the practice of translation.

However, there are still plenty of aspects of translation that could be, and perhaps should be, investigated using a corpus-based methodology, which is why this study is relevant.

2.2. What is a corpus?

Chapter 2 defined corpus linguistics as a methodology which uses the corpus as a means to study language. Since this thesis aims to inspect the impact that using an STLC can have on a translation, it becomes necessary to define what a corpus is. In the simplest of terms, a corpus (pl. corpora) can be defined as a collection of written or spoken texts. In modern linguistics, however, this definition often fails to fully reflect the function of the word (Fernandes 2006: 88, Meyer 2002: preface). While a corpus is a collection of texts, it is also more than that. In the context of CTS, and corpus linguistics in general, corpora have become associated with specific characteristics that set them apart from regular text collections, archives, and databases (Kennedy 2014: 4). However, what those characteristics are varies between definitions. To find out if there are any shared characteristics among the various definitions, I have looked at five definitions for a corpus spanning 14 years of linguistic study and compared them to each other.

The first definition for a corpus comes from McEnery and Wilson (1996: 29–32), who group the characteristics associated with corpora into four categories. These categories are finite size, machine-readable form, sampling and representativeness, and a standard reference. The first two criteria seem self-explanatory. Finite size implies that a corpus is a contained unit, which remains unaltered after its creation. While older forms of corpora often abide by the criterion of finite size, there are some which are updated frequently. An example of such an updatable corpus is the COBUILD corpus, which has been kept up to date by having texts added to it to account for changes in language (McEnery and Wilson 1996: 29–32). The second criterion of machine-readable form proposes that a corpus should be in a format that can be read by a machine or by specialised software called a corpus analysis tool. This characteristic is a recent addition to the definition of a corpus (Meyer 2002: preface). Before technology became advanced enough to allow the creation of electronic corpora, the term corpus was used to refer to printed or written text collections (McEnery and Wilson 1996: 31). In modern times, printed corpora still exist but have become rare, since an electronic corpus allows the information to be accessed, arranged, and analysed more easily and efficiently (Baker 1995: 225). The third criterion of sampling and representativeness means that a corpus should be composed of texts that yield an accurate depiction of the language type or field they represent. For example, a corpus of English law should be composed of English legal texts, since the addition of other types of texts would shift the purpose and focus of the corpus. The fourth and final criterion is standard reference. It maintains that for a collection of texts to qualify as a corpus it needs to constitute a standard reference (McEnery and Wilson 1996: 32). Therefore, any study that is conducted using a standard reference corpus can be compared to previous and future research which uses the same corpus, since they employ the same database. In this way, the corpus becomes a measuring device that allows researchers to contrast studies easily. This is only possible because the corpus, to McEnery and Wilson, is a closed unit. For example, the COBUILD corpus does not constitute a standard reference, since its contents have changed, meaning that older research that used it had a different set of data than newer research.
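To make the criterion of machine-readable form concrete, the sketch below shows the basic operation that corpus analysis tools build on: a keyword-in-context (KWIC) search. It is a minimal illustration in Python, not any particular tool's implementation; the corpus file name and the search term are hypothetical placeholders.

    # Minimal keyword-in-context (KWIC) search over a machine-readable corpus.
    # The file name and search term are hypothetical placeholders.
    import re

    def kwic(text, keyword, width=40):
        """Yield each occurrence of keyword with `width` characters of context."""
        for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
            start = max(match.start() - width, 0)
            end = match.end() + width
            yield "..." + text[start:end].replace("\n", " ") + "..."

    with open("corpus.txt", encoding="utf-8") as f:  # hypothetical corpus file
        corpus_text = f.read()

    for line in kwic(corpus_text, "carcinoma"):
        print(line)

A concordancer performs essentially this operation at scale, with sorting and frequency counts layered on top.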

The second definition for a corpus is by Bowker and Pearson (2002: 9–11), according to whom a corpus is a large collection of electronic, authentic texts collected according to specific criteria. This definition has many similarities with the previous one. Both definitions have four criteria by which they characterise a corpus. Furthermore, the first two criteria in both definitions share a clear theme: size and form. However, unlike McEnery and Wilson's, Bowker and Pearson's first criterion states that a corpus must be large. While McEnery and Wilson do not specify size, it is generally acknowledged that an electronic corpus is far larger than would be easy to collect and analyse manually (Fernandes 2006: 88). This means that both definitions contain, or at least imply, a large size. However, what is considered large varies from corpus to corpus and from definition to definition. Moreover, what is considered large has also changed over the years. Modern technology allows the compilation of much larger corpora than before (McEnery et al. 2006: 71). For example, the British National Corpus (BNC) contains 100,000,000 words and was for a long time considered to be one of the largest corpora. By today's standards, however, the BNC can be considered quite small. For example, the Corpus of Contemporary American English (COCA), which is one of the largest corpora today, contains 521,000,000 words. As such, Bowker and Pearson (2002: 45–55) acknowledge that the term large is ambiguous and that “there are no hard and fast rules that can be followed to determine the ideal size of a corpus”. It should be noted, however, that there are still some vague guidelines as to how large a corpus needs to be, but those will be discussed more fully in Chapter 2.4. The second criterion of electronic form has the same meaning as McEnery and Wilson's machine-readable form. Bowker and Pearson's third criterion is authentic texts. This means that the texts which comprise a corpus are real instances of written or spoken discourse that were not created to be used in a corpus (Bowker and Pearson 2002: 9–11). Finally, a corpus must contain texts that serve its purpose. This is what the fourth criterion of specific criteria posits. A corpus must be compiled out of texts chosen to be representative of a language variety or type (ibid.). This criterion overlaps with sampling and representativeness, as defined by McEnery and Wilson, increasing the number of similarities between the definitions. In addition, Bowker and Pearson's concept of authentic texts contains notions of representativeness, since an authentic text would be representative of its own genre and type.

The third definition for a corpus is that of Fernandes (2006: 88), who describes a corpus in terms of size, electronic form, representativeness, and open-endedness. Upon inspection, it is evident that the first three of Fernandes' criteria are synonymous with the criteria in the previous definitions. Like Bowker and Pearson (2002), Fernandes (2006: 88) acknowledges that it is difficult to determine what size a corpus should be, but that a corpus is customarily associated with large size. He goes even further by stating that “the issue of corpus size in CTS becomes a relative one in the sense that qualitative aspects sometimes may be more relevant than quantitative ones” (ibid.). By this he means that the size of a corpus is not directly related to its quality, and as such, the size of a corpus can vary greatly depending on its purpose. If the purpose of the corpus is to provide an overview of general English, then it will inevitably end up being much larger than a corpus describing a narrower, specialised field, such as ophthalmology. However, Fernandes does say that this is the case only when the other qualities of a corpus outweigh the need for a large sample size (ibid.). The second criterion of electronic form is identical to the ones presented before, while the third criterion of representativeness directly corresponds with McEnery and Wilson's sampling and representativeness and with the notion presented by Bowker and Pearson in their criterion of specific criteria. Fernandes' fourth criterion for a corpus is open-endedness, which differs from all the other presented criteria. By open-endedness, Fernandes refers to flexibility: the ability of the researcher to apply the corpus, or parts of it, to any research question as they see fit (Fernandes 2006: 89). This criterion runs contrary to McEnery and Wilson's view of a corpus as a closed unit that must be used as it is for it to constitute a standard reference for all associated research. Overall, Fernandes' view of a corpus is very similar to both of the previously explored definitions.

The fourth definition is by Teubert and Cermáková (2007: 140), who define a corpus as “a collection of naturally occurring texts in electronic form, often compiled according to specific design criteria and typically containing many millions of words”. In other words, a large, electronic collection of natural texts that have been gathered according to specific criteria. Teubert and Cermáková's definition matches that of Bowker and Pearson. Where the latter speak of authentic texts, the former refer to the same notion as naturally occurring. The same happens with specific criteria and specific design criteria. While the terminology is slightly different, the general ideas are similar.

The fifth and final definition comes from Bennett (2010: 2), who states that a corpus is a large, electronically stored, principled collection of naturally occurring texts. Once again, the definition, even on a superficial level, shares several characteristics with all the previously presented definitions. Large size, electronic form, and naturally occurring texts are all criteria which have been covered earlier, and their meaning remains the same. The final criterion of principled is the same as Bowker and Pearson's specific criteria – almost to the letter. Bennett (2010: 14) explains principled collection as “meaning that the language comprising the corpus cannot be random but chosen according to specific characteristics.” This is a remarkable similarity, considering that Bowker and Pearson (2002: 10) say that “a corpus is not simply a random collection of texts […] rather, the texts in a corpus are selected according to explicit criteria”. As such, the definitions presented here share many of the same criteria and notions as to what constitutes a corpus. Considering that Bennett's and McEnery and Wilson's definitions share some characteristics despite having been published 14 years apart, it is reasonable to say that there are some universal criteria which seem to apply to all modern corpora.

To summarize, there appears to be a consistency, and a continuum, as to what the most prominent characteristics of a corpus are or should be. In the light of the comparisons, both above in text and summarized in Table 1, it can be said that there are some aspects that most corpora share with one another. First, among the definitions inspected here, each contains some mention of size, form, and representativeness. While McEnery and Wilson did not specify size separately in their definition – beyond that it should be finite – an electronic corpus is historically seen as one that is far too large to have been compiled physically (Bowker and Pearson 2002: 9–11, Fernandes 2006: 88). Thus, I am inclined to believe that McEnery and Wilson's criterion of finite size contains an implicit notion of largeness as well. Secondly, all definitions claimed that a corpus should be electronic in form and representative of the language it contains. Finally, three of the five definitions contained a clause insisting that authentic texts be used when compiling a corpus. This discrepancy between the definitions acts as a reminder that there is no unanimous agreement on what a collection of texts must be for it to constitute a corpus, though there are some universally shared traits.

Table 1. Comparison between the definitions. Green means similar/corresponding criteria, red means different criteria.

                              Size         Form              Representativeness               Authenticity
McEnery and Wilson (1996)     Finite size  Machine-readable  Sampling and representativeness  Standard reference
Bowker and Pearson (2002)     Large        Electronic        Specific criteria                Authentic
Fernandes (2006)              Size         Electronic        Representativeness               Open-endedness
Teubert and Cermáková (2007)  Large        Electronic        Specific design criteria         Naturally occurring
Bennett (2010)                Large        Electronic        Principled                       Naturally occurring

Ultimately, it falls on the researcher to determine what they mean by a corpus. For the purposes of this study, the term corpus is used to refer to a large collection of authentic texts in electronic form that has been chosen with a specific purpose in mind to illustrate a certain aspect or variant of language. This description of a corpus resembles most of the definitions presented above.

2.3. Typology of corpora

The previous chapter defined what a corpus is and demonstrated that there appear to be some universally shared characteristics among all types of corpora. However, these traits merely describe what constitutes a corpus on a basic level. In practice, corpora differ from one another depending on how they are constructed (Bowker and Pearson 2002: 11), and different types of studies will make use of different types of corpora. The typology of a corpus is often classified according to sets of contrastive criteria (Laviosa 2010: 80), as shown in Table 2 below, which contains six of the most frequently used sets of these criteria. It should be noted, however, that these represent only the most common criteria, and that many other ways of classification also exist.

Table 2. Illustration of the contrasting corpus criteria

General – Specialised
Written – Spoken
Synchronic – Diachronic
Open – Closed
Monolingual – Multilingual
Parallel – Comparable

The first set of contrastive criteria is general and specialised corpora. A general corpus is representative of everyday language (Laviosa 2010: 80, Bowker and Pearson 2002: 11–12) and often contains a varied selection of text types. This type of corpus is usually compiled from both written and spoken material from a multitude of sources. Examples of general corpora include the aforementioned COCA and the BNC. As a general corpus, the BNC contains a wide selection of written and spoken British English from various sources ranging from books and newspapers to radio programs and informal conversation recorded by volunteers.

In opposition to the broad nature of general corpora there exist specialised, or special purpose, corpora. A corpus created for a special purpose has been compiled to provide a comprehensive but narrow view of a particular aspect or variety of language (Bowker and Pearson 2002: 11–12, Laviosa 2010: 80). A special purpose corpus usually depicts the language of a specific field of knowledge, such as engineering or biochemistry. The General Oncology Corpus (GOC) used in this study is this kind of specialised corpus, which aims to provide the translator with field-specific knowledge.

Where general corpora can be used to make observations about language in general, a specialised corpus can only be used to observe the specific and singular type of language it contains (Bowker and Pearson 2002: 11–12, Laviosa 2010: 80). However, these two types of corpora can be used in conjunction with each other to determine which features are specific to one variant of the language in question.

The second set of criteria is written and spoken corpora. A written corpus is compiled entirely from samples of written language, whereas a spoken corpus, such as the Michigan Corpus of Academic Spoken English, consists of spoken language that has been transcribed so that it can be accessed by corpus analysis software that works in text format (Bowker and Pearson 2002: 11–12, Laviosa 2010: 80). Most corpora contain a combination of both written and transcribed spoken material, although pure variants of both exist. In some instances, as with the Lancaster/IBM Spoken English Corpus, the actual recordings are available separately, creating a kind of audio corpus (McEnery and Wilson 1996: 31). These kinds of corpora are vital in phonetic analysis. Moreover, there are also multimedia corpora, which contain combinations of video, audio, and subtitling. An example of such a multimedia corpus can be found on the Internet at www.playphrase.me. Playphrase is a collection of 250,000 video fragments and more than 100,000 phrases intended as a learning aid for students of English. The website contains clips from various TV shows and movies with the associated audio and English-language subtitling that the user can inspect using the site's own search function.

The third set of criteria is synchronic and diachronic corpora. A synchronic corpus is one that contains examples of language use from one particular period of time (Bowker and Pearson 2002: 12, Laviosa 2010: 80). For example, a corpus containing all the football commentary from the past half-year would be classified as synchronic. In contrast, a diachronic corpus is one that contains language use either from a long stretch of time or from several different points in time. As a consequence, a researcher can compare the texts in the corpus to see how language has changed, since the texts form a timeline of language use.

The fourth set of criteria is open and closed corpora. An open corpus, such as the COBUILD monitor corpus mentioned earlier, is a corpus that can be expanded indefinitely, meaning that new texts can be added to it at any time (Bowker and Pearson 2002: 12). This allows researchers and dictionary makers to see new words emerging and meanings changing (Bowker and Pearson 2002: 13). These types of corpora are uncommon due to the constant need to keep them up to date (McEnery and Wilson 1996: 29). Unlike an open corpus, a closed corpus is a set and finite whole which remains unaltered after it has been created (Bowker and Pearson 2002: 13). This makes closed corpora susceptible to ageing, especially in fields of language that are developing quickly (e.g. computer science). However, they are easier to maintain and constitute a standard reference for all research that uses them, as discussed in the previous chapter.

The fifth set of criteria is monolingual and multilingual corpora. As the names suggest, a monolingual corpus contains only one language, while a multilingual corpus contains two or more languages (Bowker and Pearson 2002: 12, Laviosa 2010: 80). A multilingual corpus can be further categorised into two distinct types: comparable and parallel corpora (Bowker and Pearson 2002: 12), which are discussed below.

Comparable and parallel corpora form the sixth and final set of criteria. A comparable corpus contains texts which share a common feature by virtue of which they can be considered to have the same communicative function (Bowker and Pearson 2002: 12, 93). This shared feature can be almost anything: for example, the time of publishing, topic, genre or subject field. However, deciding on the shared feature is essential, since it is the only link between the texts (McEnery, Xiao, and Tono 2006: 46–47). Therefore, each text must be representative of that shared feature. In contrast, a parallel corpus is always linked, since it is compiled out of source texts and their corresponding target texts.

The usage of a comparable corpus is similar to that of a parallel corpus. Both are used for translation and contrastive studies, but they also have their own merits and flaws which affect their use (McEnery, Xiao, and Tono 2006: 48). Parallel corpora are good for studying how a message is conveyed between two languages, since the texts are translations of each other. However, parallel corpora suffer from what is referred to as translationese – a type of ungrammatical or awkward language caused by overly literal translations. This means that parallel corpora are not the best method for studying the differences between two languages. Conversely, a comparable corpus provides a useful tool for such contrastive studies, but is less useful when trying to find translation solutions.

The other subtype of multilingual corpora is the parallel corpus, which presents the same texts in their original language and their translations in one or more languages (Bowker and Pearson 2002: 12). The notion of a parallel corpus existed for several centuries before modern corpus linguistics (McEnery and Wilson 1996: 70). For instance, Bibles containing biblical texts in several languages side by side have been produced since medieval times. A parallel corpus can be either bilingual, meaning that it contains the source texts and their translations in a single target language, or multilingual, meaning that it can contain translations in any number of languages (Bowker and Pearson 2002: 92).

A parallel corpus can be used to find correspondences between texts from two different languages (McEnery and Wilson 1996: 70). In other words, it can be used to find, or at least assist in the search for, a translation solution for a phrase or a word. This is the reason the use of parallel corpora has led to significant progress in the field of machine translation (Brown et al. 1991, 1993). Parallel corpora also enable the identification of terminological similarities between languages (Bowker and Pearson 2002: 194). Furthermore, using a parallel corpus can help a translator to use more idiomatic phraseology based on the language found in the corpus, thus improving the stylistic aspect of a translation. However, it should be noted that the accuracy of the information gained from a parallel corpus depends entirely on the quality of the translations it contains. If the translations are incorrect, then the user of such a corpus is very likely to repeat the same mistakes.
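The lookup described above can be illustrated with a small sketch: given sentence-aligned source and target lists, finding the target-language counterparts of a source term is a matter of filtering the pairs. The aligned sentences below are invented toy data, not material from any actual corpus.

    # Toy illustration of querying a sentence-aligned parallel corpus:
    # find target sentences whose source counterpart contains a given term.
    # The sentence pairs are invented for illustration only.
    source_sentences = [
        "Kasvain poistettiin leikkauksessa.",
        "Potilas sai sädehoitoa kuusi viikkoa.",
    ]
    target_sentences = [
        "The tumour was removed surgically.",
        "The patient received radiotherapy for six weeks.",
    ]

    def lookup(term):
        """Return (source, target) pairs whose source sentence contains term."""
        return [(src, tgt)
                for src, tgt in zip(source_sentences, target_sentences)
                if term.lower() in src.lower()]

    for src, tgt in lookup("sädehoito"):
        print(src, "=>", tgt)  # suggests "radiotherapy" as a rendering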

The benefits of using parallel corpora as translation aids seem clear. However, parallel corpora have a disadvantage as well. Very few free, ready-made parallel corpora are available, and compiling one takes considerable time and effort (Bowker and Pearson 2002: 198). When translating a text from a specialised field, it is unlikely that texts of that specific type are available in both the source and target language, although some exceptions do exist (e.g. manuals and product labels). This makes parallel corpora scarce, especially in the field of specialised language translation, and the time and effort needed to compile one often outweigh the benefits.

Based on the above criteria, the type of corpus used in this study is a closed, written, monolingual, specialised corpus. In other words, the GOC is a specialised target language corpus (STLC). While it might seem that using a parallel corpus would yield better results, since parallel corpora are better suited to translating, they pose a problem. As Wilkinson (2006a) points out, parallel corpora are scarce, especially when trying to find one for a specialised field of knowledge. Thus, relying on premade parallel corpora is an unreliable strategy for translating specialised language. Moreover, compiling a parallel corpus is very labour intensive for most fields, since it requires access to translations of the same type of text in two languages. In a real-life scenario, where the translator must search, assimilate, and apply new information quickly, compiling a parallel corpus for each commission separately is an unlikely prospect.

Conversely, if a translator works frequently with the same types of texts, compiling a parallel corpus becomes a more viable option, because the translator has access to both the source and target texts from which the corpus can be compiled. This type of situation is unusual, however, since unless a translator has their own specialty, it is unlikely that similar commissions are received regularly enough to warrant compiling a corpus.

In contrast, to compile a monolingual corpus, all that is needed are texts of the specialised field in the target language (Bowker and Pearson 2002: 198–199). Although a monolingual corpus might not be the best option for a translator in terms of overall usability, it can become a valuable tool despite not containing direct translations. As presented by Wilkinson (2005), a monolingual corpus can be used to verify intuition. Furthermore, reflecting ideas and hunches off a corpus can yield insight into how the specialised target language works, and it can also reveal equivalences between the two languages. As a result, a monolingual corpus becomes a tool of reflection that can help with choosing between multiple translation options, steering the translator towards a more idiomatic and terminologically accurate translation (ibid.). To my mind, this is the greatest strength a monolingual corpus has, since it can lead to surprising insights about special fields through introspection of one's own choices.
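As a rough illustration of the verification workflow Wilkinson describes, a translator can compare how often competing term candidates actually occur in the STLC. The sketch below assumes the corpus is a single plain-text file; the file name and the candidate terms are hypothetical examples, not terms from the actual study.

    # Verifying a terminological hunch against a monolingual target-language
    # corpus: count how often each candidate rendering actually occurs.
    # The corpus file and the candidate terms are hypothetical examples.
    import re

    with open("goc.txt", encoding="utf-8") as f:  # hypothetical STLC file
        corpus = f.read().lower()

    candidates = ["radiation therapy", "radiotherapy", "ray treatment"]
    counts = {term: len(re.findall(re.escape(term), corpus))
              for term in candidates}

    for term, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{term!r}: {n} occurrences")

A candidate that barely occurs in the corpus is a warning sign that it is not the established term of the field, even if a dictionary lists it.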

2.4. Compiling corpora for translation purposes

The previous chapters have defined a corpus on a theoretical level: what it is, how it is classified, and why it is useful as a translation aid. Compiling a corpus for research or translation purposes, however, puts that theory into practice. Since this study makes use of three different corpora – two for evaluating the target texts and one for use in the study itself – it is necessary to inspect how a corpus is made.

Corpora that are compiled specifically in order to answer a research or a translation question are called ad hoc corpora (sometimes called DIY [do-it-yourself], disposable, or virtual corpora), and they are often abandoned after they have fulfilled their purpose (Tengku Mahadi et al. 2010: 15). Ad hoc corpora exist apart from the various research corpora, such as the BNC, which are more cemented in their status and use. In addition, they are mostly compiled electronically through the Internet, which raises legal and ethical issues that will be discussed at length in Chapter 2.5.

The process of compiling an ad hoc corpus follows certain guidelines, although, as previously stated, there are no hard and fast rules determining some aspects of a corpus. The first step in compiling an ad hoc corpus is to decide upon its purpose (Tengku Mahadi et al. 2010: 15). In many ways, the purpose of the corpus is synonymous with its typology, which was discussed in the chapter above, and it determines many aspects of the corpus. For example, a general corpus will be larger than a specific one. If the corpus is not about language use during a certain period, it is diachronic instead of synchronic, which makes it easier to compile, since it can contain texts from different time periods. However, some aspects – for example, whether the corpus will be a bilingual parallel corpus or a monolingual target language corpus – depend on the researcher, since different types of corpora can fulfil the same purpose in different ways.

The second step is to determine the size of the corpus, which – as mentioned in Chapter 2.2 – can vary greatly. The purpose of a corpus provides some guidance as to how large or small the corpus should be, but it does not define it completely, since even a small corpus can provide sufficient examples of a particular linguistic feature or grammatical device (McEnery et al. 2006: 72). For example, corpora that aim to study grammatical devices, such as how main and subordinate clauses work, can be smaller than a general corpus. This is because the syntactical freezing point of the investigated grammatical device, which is the point after which the frequency of the device does not change, is relatively low (ibid.). Another limiting factor can be the amount of data available, as is the case with some historical texts or dead languages. This means that the corpus ends up being smaller than it perhaps should be in an ideal setting, because there simply is no data to be had. At other times, for example during a translation commission, it is simply impractical to use up time compiling a large corpus to address a relatively small translation issue. There have been efforts to create statistical formulas to estimate the “required” size of a corpus. Such models, however, tend towards the largest possible size in order to provide a reliable model that accounts for even the most infrequent language features (Tengku Mahadi et al. 2010: 17). In conclusion, the size of the corpus hinges on so many practical and theoretical considerations that defining an “ideal” size for a corpus becomes an issue that should be resolved on a case-by-case basis, depending on the issue the corpus is designed to address, and what is practical (McEnery et al. 2006: 73).
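The freezing-point idea lends itself to a simple empirical check: track the relative frequency of the investigated feature in growing slices of the corpus and see where it stops changing. The sketch below is one possible way to do this, not an established formula; the corpus file and the feature word are hypothetical placeholders.

    # Rough check of whether a corpus is "large enough" for a given feature:
    # compute the feature's relative frequency in growing slices of the corpus
    # and watch for the point where it levels off. File and feature are
    # hypothetical placeholders.
    with open("corpus.txt", encoding="utf-8") as f:
        tokens = f.read().lower().split()

    feature = "although"            # hypothetical feature under investigation
    step = max(len(tokens) // 10, 1)

    for end in range(step, len(tokens) + 1, step):
        window = tokens[:end]
        freq = window.count(feature) / len(window)
        print(f"{end:>9} tokens: relative frequency {freq:.6f}")

If the printed frequencies stabilise well before the full corpus size is reached, the corpus is arguably already large enough for that feature.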

The third step is to ensure that the corpus is representative (McEnery et al. 2006: 73). In part, the purpose for which the corpus is compiled defines what can be considered representative. For instance, the Corpus of Contemporary American English (COCA) needs to be representative of its namesake. To do this, it must contain contemporary American English from a multitude of sources, e.g. fiction, popular magazines, newspapers, and academic journals. It cannot contain other languages or variants, because that would change the purpose of the corpus. In addition, the collected texts must be current, complete, and by many authors. According to Wilkinson (2006a), this ensures both that the corpus is up to date and that no language features are lost because they are present in only a part of the text material.

To summarize, when compiling a corpus for research purposes it is important to take into consideration the three guidelines of purpose, size, and representativeness. The three corpora used in this study were compiled with these factors in mind. First, the research corpus for this study is a general corpus of oncology. Therefore, to fulfil that purpose, it needed to be large enough to depict the entirety of the field. Thus, the GOC contains 288 representative full articles on various fields of oncology. The articles can be considered representative because they come from three different English-language oncology-related medical journals. These articles amount to 2,236,064 words, so the corpus is quite large for a study of this size. Furthermore, the articles are from different periods over the past 60 years, further supporting its purpose as a general corpus.

The evaluation corpora were, in comparison, much smaller – amounting to 11,300 and 117,306 words, respectively. However, the purpose of the evaluation corpora was to help determine the target language terminological translation solutions, so they did not need to be as large. To achieve this goal of evaluating terminological accuracy, the texts comprising the corpora were chosen from among the English-language source/parallel texts and abstracts of the Finnish-language source texts. This way it was certain that the terminology used in them was representative of what the original source text authors were aiming for.
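In practice, the compilation step itself reduces to gathering plain-text files, lightly normalising them, and reporting the token count used above to judge corpus size. The sketch below shows one plausible way to do this; the directory name is a hypothetical placeholder, not the actual workflow used to build the GOC.

    # Compiling an ad hoc corpus from a folder of plain-text articles and
    # reporting its size in words. The directory name is hypothetical.
    from pathlib import Path

    def compile_corpus(directory):
        """Concatenate every .txt file in the directory into one corpus string."""
        parts = []
        for path in sorted(Path(directory).glob("*.txt")):
            text = path.read_text(encoding="utf-8")
            parts.append(" ".join(text.split()))  # collapse stray whitespace
        return "\n".join(parts)

    corpus = compile_corpus("oncology_articles")  # hypothetical folder
    print(f"{len(corpus.split()):,} words in the compiled corpus")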


2.5. The legal issue with compiling corpora from the Internet

The previous chapter explained how to compile a corpus, but it did not discuss the legal and ethical concerns this raises. Since compiling a corpus from sources on the Internet is so simple and effortless, it is easy to forget the issue of copyright. After all, since the material is already public, using it without permission does not seem like a breach of copyright – especially if the resulting corpus is never published. However, this study must still address the issue in order to justify the use of corpora that have been compiled from Internet sources.

According to Wilkinson (2006b), as long as the corpus is used solely for private purposes, there are no legal issues. This notion is echoed by McEnery and Hardie (2012: 57–58), who draw attention to the fact that copyright becomes an issue only if the collected texts are shared: just because the material can be found for free on the Internet does not mean it can be distributed freely.

Copyright, unless specifically stated otherwise, applies to texts on the web just as it does to printed ones. Wilkinson (2006b) defers to fair dealing/fair use laws when it comes to justifying the sharing of a corpus on a legal level. This view is shared by Hilton (2001), who indicates that as long as sharing the corpus does little to no harm to the original author's property rights and serves to further research, there is no need to worry about copyright. However, as both Wilkinson (2006b) and McEnery and Hardie (2012: 60) point out, an issue with fair use laws is that they vary from country to country.

Fortunately for this study, the law of fair use (Tekijänoikeuslaki 8.7.1961/404) applies in Finland (Wilkinson 2006b). While there have been revisions to the law, most recently in 2015, it still allows for the fair use of copyrighted material in moderation under non-profit academic conditions (Finlex). However, fair use is not infallible, and to avoid copyright infringement McEnery and Hardie (2012: 57–60) suggest various methods, from sharing only the web addresses of the material to using only material that is free of copyright. Ultimately, they note that the only completely safe way is to acquire permission from the authors and publishers of the texts which comprise the corpus. This appears to be the consensus: while Kilgarriff (2002) and Wilkinson (2006b) both agree with the previous statement, they also say there is simply too much grey area in the legislation and that publishers are unlikely to be interested in private non-profit corpora, a notion McEnery and Hardie (2012: 60) share.


Since this study makes use of corpora, the legal issues inherent in compiling a corpus from online sources are pertinent. This is why the GOC and the evaluation corpora used in this experimental study were compiled with copyright in mind. The GOC contains free-to-view articles from the archives of three oncology journals: the American Journal of Cancer Research (AJCR), the British Journal of Cancer (BJC) and Nature Reviews Cancer (NRC). The AJCR is an open access journal, meaning it publishes articles under "the Creative Commons Attribution Non-commercial License, enabling the unrestricted non-commercial use, distribution, and reproduction of the published article in any medium, provided that the original work is properly cited" (American Journal of Cancer Research – open access journal of oncology). Since the corpus contains citations to the AJCR and the names of the authors, the material is properly cited, thus fulfilling the clause. The second journal, the BJC, does not restrict the fair use of its articles in non-commercial academic work under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license (http://www.nature.com/bjc). While the NRC does not have a Creative Commons policy similar to the other two sources, they do note on their website (www.nature.com) that:

Nature Publishing Group grants permission for authors, readers and third parties to reproduce material from its journals and online products as part of another publication or entity. This includes, for example, the use of a figure in a presentation, the posting of an abstract on a web site, or the reproduction of a full article within another journal. Certain permissions can be granted free of charge; others incur a fee.

It appears they are more concerned about the re-publication of their content than about its use in a non-profit research context where no part of it is reproduced or shared in public. In addition, Nature Publishing Group only manages the copyright, which belongs to the author.

Thus, since the corpus used in this study is a private, non-profit, source-citing, moderate collection of a few articles (not whole issues of the NRC), and since the material is freely obtainable on the Nature Reviews Cancer website, it can be used under the law of fair use.

As stated in the previous chapter, the two evaluation corpora used to determine the accuracy of the translations in this study were compiled from the English language source/parallel texts and the abstracts of the Finnish language source texts. The same reasoning applies to the evaluation corpora: using freely available public materials for research purposes without publishing them is in accordance with the law of fair use. While parts of the corpora are presented in this thesis as examples and as justification for the terminological equivalents in this study, they do not reveal enough of the original texts to be considered a breach of copyright.


To summarize, the legal and ethical ramifications of compiling a corpus from the Internet without the permission of copyright holders are many. Therefore, the corpora used in this thesis have been compiled from sources that are open access or that allow use under the laws of fair use. Furthermore, no parts of the corpora are published or shared with anyone, and the corpora were deleted after the study concluded, ensuring that no infringement could occur at a later date.
