Thursday, September 18, 2014

A phonological and phonetic description of Sumi, a Tibeto-Burman language of Nagaland

So I should probably apologise / apologize for my lack of updates the past year or so. It's been pretty crazy since I started grad school - I'd have to spend many a blog post explaining all the wonderful things I've been able to do since I started in the linguistics PhD programme here at the University of Oregon.

In the meantime, in the 'American' spirit of self-promotion, I thought I should mention that I finally finished revising my University of Melbourne MA thesis A phonological and phonetic description of Sumi, a Tibeto-Burman language of Nagaland and got it published with Asia-Pacific Linguistics in Canberra.
It's an open access ebook (print on demand), and you can download it from the ANU digital collections page here.

I have too many people to thank for this, especially my family who've supported me all through this crazy journey, as well as the Sumi community / my Sumi family. I'm so thankful for all the amazing people I've met along the way, and all the help I've received in making this possible. Noshikimithi va na!

Monday, December 2, 2013

The examples linguists use

My apologies to all my readers, I just haven't had all that much time to blog since I started grad school, though I have a lot of things I'd like to blog about! (I'll be making time after finals week next week to catch up on my posting.)

Thanks to the Nom Nom Linguistics Facebook page, I just found out about this Tumblr site called
Linguistics Sample Sentences: http://lingsamplesentences.tumblr.com/

Here you can see a selection of the weirdest / funniest / slightly more obscene examples that linguists use to illustrate various points about the grammars of other languages. Sometimes linguists need these 'weird' examples to see how a language performs a certain function. Sometimes these examples highlight how creative the speakers of a language can be.

And sometimes linguists just choose the weirdest examples for comic relief. (Because talking about grammar.)

In general, I'm told we sound like a violent bunch. If we're trying to study something like transitivity - simply put, the ways in which languages describe an event that involves more than one participant - the most common examples you see tend to involve a verb like hit, e.g. John hit Mary or Mary hit John. However, I've even been told that hit is not always the best example of a transitive verb (for the linguists: this is because in some languages, the verb hit may take an argument with locative marking instead of patient marking), so what you really need is a verb like kill to illustrate the point!

Great, even more violence.

I think the weirdest sentence I've had to elicit from a language consultant was "The man cooked me for the chicken." But I'm sure there'll be weirder ones to come!

[Note: the point of such examples is not and should not be to make fun of a language or speakers of a language - if anything, we're both showing appreciation and poking fun at (a) the nature of the science, for making linguists ask speakers of a language to say a particularly unnatural utterance; and/or (b) the linguist themself, for choosing that particular example to put in a publication just to illustrate a certain point, when another (though less humorous) example would have sufficed. But it's what you have to do if you're trying to work out the genius and creativity underlying any spoken language.]

Saturday, October 26, 2013

On Not Having a Mother Tongue

At the moment, I'm TA-ing for a course called Language and Power here at the University of Oregon, and I've been recounting the following story to my students.

It happened more than 10 years ago after I'd just moved from Singapore to Melbourne. I was at my university orientation, where I met a number of people, including a guy from Sweden. We got to talking, and he eventually asked me what languages I spoke. I told him that I spoke English and some Chinese (Mandarin), but that my Chinese wasn't very good.

The very next thing he said to me was, "Oh, so you don't speak any language well!"

Before I could recover from the shock of what he'd just said, he quickly proceeded to 'correct' my English. I remember we were talking about purchasing textbooks for our courses at a particular bookshop. I said something like: "You can get them cheap over there." He told me that it should be: "You can get them cheaply over there." because you need an adverb with the verb 'get'. At that point, I said something like, "No, I'm using it as an adjective to describe the thing I'm getting." But it was clear that I had little say in what was 'right' or what was 'wrong'.

Now this was before I'd started any formal study in linguistics, but I had had 'English grammar' lessons in school in Singapore, with explanations given for many 'grammatical rules'. Of course, people like me were a pain for our English teachers because they'd give us a particular phrase or sentence, and ask us why it was 'correct' or 'grammatical'.

We'd just say, "Because it sounds right."

And that's the thing about your 'mother tongue' - you don't need to be formally taught the rules of the language in school. Through enough exposure as a child, you just know what 'sounds' right and what doesn't. That knowledge is what linguists usually think of as 'grammar' - it's not the rules that you are explicitly taught in a classroom (unless the language is not your native language), it's knowing how to say things that don't sound odd to either you or the people from the community you grew up with.

To be repeatedly confronted and told that my mother tongue - the language I used at home and in daily life, and the language I knew best (let's not even go into what Singapore calls one's 'mother tongue') - was 'incorrect' or defective has had a few effects on me. On the downside, I find it difficult to claim 'ownership' or 'expertise' in English. Even now I am quick to get defensive about my own linguistic knowledge, sometimes justifiably so, but sometimes I perhaps get a little too defensive. On the upside, I've often felt motivated to learn more languages (to varying degrees of fluency). Most importantly, this insecurity has made me delve deeper into the field of linguistics.

Jacques Derrida, in his book Monolingualism of the Other, wrote, "I have but one language - yet that language is not mine." While his words can be interpreted on many different levels (his central thesis was that we are all alienated from our 'mother tongue'), I can think of no better quote to apply to the linguistic situation I find myself in. I also imagine that this is something many people in the modern world whose 'languages' or 'dialects' are looked down upon and vilified can relate to.

(Yes, I ended that last sentence with a preposition. And yes, it's perfectly grammatical to do so in English.)

Monday, September 30, 2013

Fun with tone sandhi - The solution!

Okay, I apologise for the long delay, but finally(!), I present you with the solution to the problem set I posted in my last blog post, many months ago (see here).

(Right click the image below and select 'Open Image in New Tab'.
Or click here for an image you can magnify.)
The language is Singaporean Teochew, as spoken by an aunt of mine who lives in Singapore. It's part of the Min Nan group of languages, but Singaporean Teochew is said to have undergone dialect leveling with Singaporean Hokkien - the two are much more mutually intelligible than their counterparts still spoken in China today. Also, although most descriptions of Teochew give 8 tones, I've only been able to find 7 contrastive ones - but there might still be an 8th one that I've missed!

I know I was supposed to post this in mid-June, but a lot of stuff came up, including a move to the United States (via Australia). As some of you may already know, I've just started grad school at the University of Oregon, where I am pursuing a PhD in Linguistics. It's a really exciting time for me. I'll be heading back to India at some point during my course, but unfortunately not this year.

Looking forward to posting about all the cool linguistics topics I'll be looking at during the next year!

Saturday, May 25, 2013

Fun with tone sandhi

The past few months, I've been learning a language here in Singapore that's been noted for its crazy mind-bending use of tone sandhi. I thought I'd write a little about it in this post, since it's a phenomenon that some linguists may not be familiar with (given the tendency for many to run away at the first 'hearing' of anything tonal). At the end of this post, I'm also going to throw in a little puzzle set that I created, just to give people a chance to see the sorts of data some linguists work with. I'm hoping it'll appeal to all the puzzle solvers out there.


Tone sandhi in Mandarin Chinese
Experienced learners of Mandarin will already be familiar with the phenomenon, exemplified by the initially confusing and dreaded rule that specifies that Tone 3 becomes Tone 2 before another Tone 3. This prevents you from saying two Tone 3s, one after the other. For example, the word for 'you' in Mandarin is 你 nǐ (with Tone 3) when said on its own and the word for '(to be) good' is 好 hǎo (also Tone 3). However, when you put them together to get the ubiquitous Mandarin greeting 你好, written as nǐ hǎo in Pinyin, you find that 你 is now pronounced ní, with Tone 2. (This makes it homophonous with 泥 'mud', but most speakers can work out from context that you're not talking about the quality of earth.)

Importantly, the rule applies whenever two Tone 3s occur next to each other in the same phrase, regardless of the actual meaning of the words. Using another example, 很 hěn, an intensifier with the meaning of 'very', remains as Tone 3 in phrases like 很多 hěn duō 'a lot' and 很快 hěn kuài 'very fast', since 多 duō has Tone 1 and 快 kuài has Tone 4. But if you want to say 很好 hěn hǎo 'very good', you would have to pronounce 很 as hén, with Tone 2.
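
For the programmers among you, here is a minimal sketch of the rule in Python. It assumes a phrase is given as a plain list of tone numbers (1-4), which is my own simplification, and the function name is made up for illustration. It only handles the pairwise rule described above; what happens in longer runs of Tone 3s depends on how the phrase is grouped, which this sketch does not attempt to model.

```python
def apply_third_tone_sandhi(tones):
    """Return a copy of `tones` with Tone 3 changed to Tone 2 before another Tone 3."""
    result = list(tones)
    for i in range(len(tones) - 1):
        # Compare the underlying (citation) tones, not the already-changed ones.
        if tones[i] == 3 and tones[i + 1] == 3:
            result[i] = 2
    return result


print(apply_third_tone_sandhi([3, 3]))  # 你好 nǐ + hǎo  -> [2, 3]
print(apply_third_tone_sandhi([3, 1]))  # 很多 hěn + duō -> [3, 1]
print(apply_third_tone_sandhi([3, 4]))  # 很快 hěn + kuài -> [3, 4]
```

Notice that the function never looks at what the words mean, only at the tones sitting next to each other, which is the whole point of the rule.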

Ask a native speaker of Mandarin why on God's less-than-green earth they would say 你好 or 很好 this way, and they'll probably just say that 'it sounds nicer'. There's also actually no physiological, or aesthetic, reason preventing you from producing two Tone 3s in a row. The thing is, tone sandhi rules are language-specific: some tone languages do allow sequences of similarly low (and creaky) tones to occur next to each other, while others may disallow sequences of two falling tones, which Mandarin does allow.

Of course, if you're only interested in learning a tone language that does have tone sandhi, it doesn't really help to ask why it happens, or for instance, why Tone 3 becomes Tone 2 and not Tone 4. You just need to accept that it does happen and that it happens the way it does. And then you need to learn how to apply the tone sandhi rules in actual speech so you don't sound completely moronic.


Tone sandhi vs Tone change
On the other hand, if you're in the business of describing tonal languages, tone sandhi is something that pops up again and again. It can sometimes be a little tricky to talk about, since there's still some disagreement as to what the term 'tone sandhi', sometimes called 变调 biàndiào in Mandarin, should include. At least, it is generally accepted that 'tone sandhi' differs from 'tone change', or 变音 biànyīn, which describes similar kinds of tone alternations that are restricted to specific words, largely due to historical reasons. For example, 好 when pronounced hào with Tone 4, means 'to be fond of' (example taken from Chen 2000: 31) - here you can see the connection with 好 hǎo '(to be) good', which indicates a likeable quality. However, this correspondence between Tone 3 and Tone 4 is specific to 好, and changing Tone 3 on another word to Tone 4 is not likely to yield a similar change in meaning.

In contrast, tone sandhi rules, which can also be the products of historical changes in a language, are more 'general', in the sense that they almost always apply regardless of the meaning of words as long as the necessary sound environment condition is present. However, there are instances when tone sandhi rules are not strictly observed - even native Mandarin speakers may sometimes fail to observe the rule described above when confronted with new compound words consisting of Tone 3 + Tone 3.


A tone sandhi puzzle
In the process of learning this tonal language in Singapore, which I'm calling 'Language X' for the moment, I came up with a little puzzle involving tone sandhi. It's similar to the problem sets we give out to undergraduate linguistics students, except I've simplified it a little so you don't need a lot of linguistic knowledge to solve it. I've used the letters A-G to indicate the tones, as well as some symbols known as Chao tone letters which give a visual representation of the tones. The 'stopped' tones refer to tones on words that end in the consonants k and h.

You can view a draft of the puzzle below. Now this may not be the easiest puzzle to cut your linguistics teeth on, but I hope it gives you a taste of the sorts of data linguists work with, and the kind of analytic skills required to describe languages.

(Right click the image below and select 'Open Image in New Tab'.
Or click here for an image you can magnify.) 


The solution will come in mid-June!

[I may have to post less frequently than I already do this coming month because I'm busy revising my Masters thesis to get it published.]


Reference
Chen, Matthew Y. 2000. Tone sandhi: Patterns across Chinese dialects. Cambridge: Cambridge University Press.

Tuesday, May 14, 2013

Issues with Ice Age linguistics

Last week I had a few friends ask me about a recently published study titled "Ultraconserved words point to deep language ancestry across Eurasia" by Mark Pagel, Quentin D. Atkinson, Andreea S. Calude and Andrew Meade. It's been making headlines all over the globe in articles with titles like "English May Have Retained Words From an Ice Age Language" (Wired.com), "Ice Age language may share words with modern tongues" (News.com.au and various sites) and "15000-year-old 'fossil' words reveal ancestral Ice Age language" (LA Times).

You can download their report here. Also, the data for the study comes from the Languages of the World Etymological Database, which can be accessed at this site.

As always, Language Log has a great post by Sally Thomason that highlights many of the issues about the study here, including issues with both the data and methodology. Similarly, another post at GeoCurrents by Asya Pereltsvaig rubbishes the study.

Now, before you go and cry 'Academics marking territory!', there are very good reasons to take the study by Pagel et al. with a sea-ful of salt. But let me start with a short personal anecdote and a brief introduction to the world of historical linguistics. Also, if you're a believer in Nostratic, you should probably just ignore this post altogether.


Nagaland and the Yucatán Peninsula?

A few years ago, a friend of mine from Nagaland in North-East India saw Mel Gibson's Apocalypto and was astounded that her language and Mayan (technically, Yucatec Maya) shared a number of words in common. She thought the two languages might be related and asked me about it. I told her this was highly unlikely given (a) the geographic distance between the two and (b) the lack of any recent contact between the people of Nagaland and the Mayans. Of course, I could tell she was still sceptical of my response even some time after.

Now my dismissal of her theory wasn't just because I found the geographic distance and lack of recent contact problematic (or the fact that she was basing her observations on translations given in the subtitles). It was the fact that given the geographic distance and the lack of recent contact, the words she cited were just too similar in both pronunciation and meaning. Such similarity between cognates, that is, words in related languages that are descended from the same etymological source (and not through borrowing), is actually highly unlikely. Such words rarely keep both their original form and meaning as time goes by, and the languages they belong to drift apart. As an example, let's look at the Italian word for 'dog': cane (pronounced /ka.ne/, like 'car-nay' with a [k] sound at the start). The French equivalent is chien (pronounced /ʃjɛ̃/ with a sound usually written in English as sh). Despite both words deriving from Latin canis, the modern equivalents in Italian and French sound quite different.


Historical Linguistics 101




(Image by Koryakov Yuri, taken from Wikimedia Commons)

To address this problem of sound change, most historical linguists apply what is known as the Comparative Method. The idea is to look for sound correspondences across a number of words in two languages, and not just individual words in each language that sound identical and mean the same thing. Applying this method reveals that the /ʃ/ 'sh' sound in French (written as ch) regularly corresponds to a 'k' sound in Italian (written as c): compare French chanter with Italian cantare 'to sing', French bouche with Italian bocca 'mouth'. It is these regular sound correspondences that form the basis for genetic groupings of languages, not similarities in the actual forms of the words themselves. Historical linguists will then use these sound correspondences to attempt to reconstruct a 'proto-language' from the forms in the modern languages. Such proto-languages are always theoretical - even 'proto-Romance', a proto-language reconstructed based on modern Romance languages like Spanish, Sardinian and Romanian, is not identical to Vulgar Latin, which had many varieties spoken across the Roman Empire.
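
To make the idea of a 'regular' correspondence a bit more concrete, here is a rough Python sketch that tallies which word-initial sounds line up across the French and Italian pairs mentioned in this post. The segmentations are my own quick approximations rather than careful transcriptions, the function names are invented for illustration, and real comparative work aligns whole words by hand rather than just looking at the first sound.

```python
from collections import Counter

# Rough, hand-segmented transcriptions (an assumption made for this sketch).
COGNATE_PAIRS = [
    # (French segments, Italian segments, gloss)
    (["ʃ", "j", "ɛ̃"], ["k", "a", "n", "e"], "dog"),                          # chien / cane
    (["ʃ", "ɑ̃", "t", "e"], ["k", "a", "n", "t", "a", "r", "e"], "to sing"),  # chanter / cantare
    (["b", "u", "ʃ"], ["b", "o", "kk", "a"], "mouth"),                       # bouche / bocca
]


def initial_correspondences(pairs):
    """Count how often each (French, Italian) pair of word-initial sounds occurs."""
    counts = Counter()
    for french, italian, _gloss in pairs:
        counts[(french[0], italian[0])] += 1
    return counts


def recurring(counts, minimum=2):
    """Keep only correspondences seen at least `minimum` times."""
    return {pair: n for pair, n in counts.items() if n >= minimum}


print(recurring(initial_correspondences(COGNATE_PAIRS)))
```

With only three word pairs this proves nothing, of course. The sketch also misses the same /ʃ/ : /k/ correspondence in the middle of bouche and bocca, because it only compares initial sounds. The point is simply that what counts as evidence is a pattern that recurs across many words, not a single lookalike pair.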

However, even before historical linguists can begin to establish sound correspondences, they first need to identify cognates in various languages. This process of identification is complicated by the fact that words don't just change in pronunciation, they also change in meaning. For example, English dog and Swedish hund /hɵnd/ 'dog' sound nothing alike, even though they share the same meaning. On the other hand, English hound /haʊnd/ and Swedish hund share many similarities in pronunciation, with similar consonants both at the start and end of each word. However, Swedish hund refers to any kind of dog, while English hound refers to only a specific breed of dog. Which word in English would we say is cognate with Swedish hund then? Given the similarity in pronunciation and the somewhat related meaning, hound is the more likely answer.

Now this may not look like a huge semantic leap that could cause much confusion, but a combination of both sound drift and semantic drift can make it difficult to locate cognates. Take for instance, the Swedish word for 'animal', pronounced /jʉːr/, almost like English you're. Based on this spoken form, can you think of a word in English that might be cognate with this?

Unless you know something about proto-Germanic linguistics, I'm guessing that you probably weren't able to work out that the Swedish word for 'animal', written as djur, is actually cognate with English deer. (Yes, the spelling might have helped, but imagine you're working with languages that have no written records.) The word deer in English does not refer to animals in general, but to a specific kind of animal, somewhat analogous to English hound. Speakers of German may have seen the connection, since German Tier means 'animal (in general)' and still sounds similar to English deer. However, the point here is that as languages diverge more over time, the task of identifying cognates between them gets increasingly difficult.

Certain types of sound and semantic change are quite common, and follow well-established patterns. For example, in a number of languages, the word for 'five' is historically derived from the word for 'hand': compare Malay lima 'five' with Hawaiian lima 'hand' (see here for more words for 'hand' in Austronesian languages). However, the rules governing such changes are not necessarily predictive, and at best can only give a probability that a word developed from a particular source. This is when historical linguists can get rather creative in deciding whether two words are cognates or not - disagreements over what words should be used as cognates can lead to rather different reconstructions of what is supposed to be the same hypothetical proto-language.


Swooning over Swadesh lists

To help identify cognates, many linguists start by comparing items from Swadesh lists in various languages. The list was first developed by Morris Swadesh in the 1940s and 50s and contains words that are viewed as belonging to the 'core vocabulary' of all languages, as opposed to culturally-specific vocabulary. Depending on the version of the list, there may be 100 or sometimes more than 200 items on the list. The items include nouns referring to body parts like 'heart' and 'tooth', personal pronouns like 'I' and 'we', kinship terms like 'father' and 'mother', some verbs of motion, the numerals 1-5, etc. It was originally assumed that such 'core vocabulary' was more stable over time and underwent replacement by other words in the language at a slow but constant rate, analogous to the process of radioactive decay. Furthermore, there was the implicit belief that words for such 'basic' concepts were not likely to be borrowed from other languages.

Based on such assumptions, Swadesh applied a method called glottochronology to these word lists, which then allowed him to propose dates for when various languages / language families split from each other. Today, this method has been largely discredited, mainly for its flawed assumption that word replacement happens at a steady rate across languages and across all words in a language - although there do remain proponents of this type of research. Furthermore, 'core vocabulary' is not always resistant to replacement by borrowed words. One notable example of this is the adoption of the Chinese numeral system in the genetically unrelated Japanese, Thai and Vietnamese languages.
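
For the curious, the radioactive-decay analogy boils down to a one-line formula: if c is the proportion of list items that are still cognate between two languages and r is the proportion of core vocabulary a language is assumed to retain per thousand years, the estimated time since the split is t = ln(c) / (2 ln(r)) millennia. The 2 is there because both languages are assumed to be changing independently. Below is a small Python sketch of that calculation. The default retention rate of 0.86 is a figure often quoted for the 100-item list, but treat it (and the function name) as an assumption of this sketch; that constant rate is exactly the part of the method that has been discredited.

```python
import math


def divergence_time(shared_proportion, retention_rate=0.86):
    """Estimated millennia since two languages split, under glottochronology's assumptions.

    shared_proportion: fraction of list items judged cognate (0 < c <= 1).
    retention_rate: assumed fraction of core vocabulary kept per millennium (0 < r < 1).
    """
    if not 0 < shared_proportion <= 1:
        raise ValueError("shared_proportion must be in (0, 1]")
    if not 0 < retention_rate < 1:
        raise ValueError("retention_rate must be in (0, 1)")
    return math.log(shared_proportion) / (2 * math.log(retention_rate))


# Two languages that each kept 86% of the list would share about 0.86 * 0.86 of it,
# which the formula maps back to roughly one millennium of separation.
print(divergence_time(0.86 ** 2))
```

Change the retention rate slightly, or disagree about a handful of cognate judgements, and the 'date' shifts by centuries. That sensitivity is part of why few linguists trust the numbers.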

Despite all these limitations, many field linguists and historical linguists see the Swadesh list as a useful starting point, myself included. But any decent fieldworker or historical linguist would also know that you need to move beyond a Swadesh list consisting of some 200 items (at the maximum) if you want to get any real insight into a language and its past. One needs to go beyond studying the etymology of only 'core vocabulary' and look at other areas like morphology (e.g. prefixes and suffixes), syntax, as well as sociolinguistic variation. Some linguists would also argue for the need to look at vocabulary associated with agriculture and material culture, words that the Swadesh list deliberately omits. In a sense, Swadesh lists are the 'standardised testing' of historical linguistics, designed to make quick and 'consistent' comparisons by omitting large amounts of information and disregarding any subtle nuances in the data. A study that uses data drawn solely from Swadesh lists is inevitably going to be woefully inadequate, just like education policies based entirely on the results of standardised testing.


Words frozen in time?

Coming back to Pagel et al.'s work, which I now have the overwhelming desire to call the 'Ice Age language study', I hope you can start to see some of the problems with their methodology. Now I'm certainly not saying that their methodology is as basic as my friend's casual linguistic comparison of what are essentially false cognates (pairs of words with similar pronunciation and meaning but very different sources) in her language and Yucatec Maya.

Nevertheless, there are issues with their study, as listed here:

(1) They only use Swadesh list data.
(2) There are a number of inaccuracies in the data used to reconstruct certain proto-words, as noted by Thomason.
(3) They apply the Comparative Method to reconstructed proto-words, which are themselves hypothetical and disputable, to reconstruct even older proto-words. (Note: this is acceptable, but only if your first reconstructions are solid.)
(4) There are some questionable judgements about which words to treat as cognates, although this is always going to be a subject of debate in any historical linguistic research. Some linguists simply err on the side of caution, while others are more liberal in their judgements.

It should be obvious by now that this is not an exact science - you can apply all the statistics you want, but if the initial data is based on somewhat subjective judgements, the results of the statistical analysis are not going to be very convincing. To their credit though, they try to show that the rate of word replacement can be correlated with frequency of use, and provide a more empirically-based study than what Swadesh did, even if this study is based on just 200 items on the Swadesh list.

Personally, I find questions about the origins of language families fascinating because they are intimately linked to human migration in prehistoric times, and going back deep enough, to our origins as a species. Judging by the amount of media coverage, this also seems to be an issue that media outlets believe people are interested in reading about. All that I've said doesn't mean that I don't believe that a super 'Eurasiatic' / 'Ice Age' language could have ever existed - I'm certainly in no position to say if one did or did not. I just don't think the evidence provided is compelling enough to suggest that one did. And given the time depth we are talking about, it's doubtful that we'll be able to recognise true cognates using the Comparative Method.

I don't think linguistics by itself will be able to give any satisfying conclusions about our origins, or about prehistoric human migration. But this doesn't mean that we should abandon the collection of linguistic data altogether. Comparative work like this calls for a lot more subtle attention to detail than lists of 200 words. Linguists such as Roger Blench and George van Driem have also increasingly started to collaborate with anthropologists, archaeologists and geneticists to try and corroborate findings from each field in order to provide a better picture of our prehistoric movements. More sophisticated statistical, genetic and geography-based computer models are also being developed and some are being applied to linguistic data. With any luck, some of these will bring promising results in the future.

Saturday, May 4, 2013

What a 'hotel' can mean in India

According to the Online Etymology Dictionary, the English word hotel was first recorded in the 1640s and denoted a 'public official residence'. The modern sense of the word as 'an inn of the better sort' (i.e. 'a place offering lodging, food and other services to travellers') was first recorded in 1765. The word comes from the French hôtel, which itself is derived from the Medieval Latin hospitale via Old French hostel.

In French, hôtel was used to refer mainly to public official buildings that frequently received visitors, but this has been largely replaced by the meaning of 'place offering lodging and food to travellers', as used in contemporary English. However, you can still see traces of this old usage in terms like hôtel de ville 'town hall', hôtel des impôts 'tax office' and hôtel de police 'police headquarters'.

In India, the term hotel has taken on a slightly different meaning (and pronunciation, with stress on the first syllable, not the second). Visitors to India are likely to find that big modern buildings offering lodging are called 'hotels', but they might be slightly shocked to see signs for hotels that do not provide lodging at all.

Take for instance this hotel located right next to the Dimapur Railway Station. As you can see, the hotel only offers 'fooding', a very common term in Indian English meaning 'the provision of food' - this can include the catering at an event or simply selling food at a restaurant.

Next to Dimapur Railway Station

I'm not entirely certain how the term 'hotel' has come to be used to refer to (what I would call) a 'restaurant', where only food and no lodging is provided. I doubt that this use derives from the original French meaning of a public building that frequently receives visitors. Incidentally, there are also hotels in India that advertise 'only lodging' with no 'fooding'.

My guess is that the term did originally designate a place that was frequented by travellers and provided both food and lodging - I imagine that travellers were the most likely people to frequent places offering food since most people would have taken their meals at home or packed their own food. Over time, some establishments may have stopped providing one service or the other for whatever reason (e.g. greater profits from selling food), but the label 'hotel' remained. Consequently, the term 'hotel' no longer denoted a place of lodging, but simply a place frequented by travellers. Someone else starting a restaurant near a train station or along a highway may then choose to call their business a 'hotel', even though they have no intention of providing lodging, as long as their expected clientele are likely to be travellers stopping in for a meal.

Whatever the history of the word may be, don't be shocked if you rock up to a hotel in India and can't get a room - some of them simply don't have any for guests!