Saturday, 29 January 2005

Using the Internet language corpus: are acronyms words?

The Economist this week has a nice article on how the Internet is starting to be accepted by linguists as a corpus for analysis of language usage. The advantages of the vast amount of data on the Net outweigh the disadvantages of its biases. (The biases: since it's a published medium, it's more formal than speech, though that's a disadvantage many corpora face; more problematic, Internet language usage may be deliberately skewed towards words used to attract people to gambling and pornography sites.)

As I recall, my old friend Bert was already doing roughly this, comparing the number of Google results for the so-called "proper" usage with those for the alternatives. Of course, there are technical issues with using the Internet. For one, although the article doesn't say this, no search engine is perfect, so the choice of engine could affect how the word distribution is skewed.

But the concluding paragraph was perhaps the most intriguing:
The easy availability of the web also serves another purpose: to democratise the way linguists work. Allowing anyone to conduct his own impromptu linguistic research, some linguists hope, will do more to popularise their notion of studying the intricacy and charm of language as it really exists, not as killjoy prescriptivists think it should be.

So, in that amateur-linguist spirit, I decided to use this rough corpus to look at the old question of whether acronyms are words. The numbers seem to suggest that yes, in the linguistic sense, people do treat them as words and not just as stand-ins for words. And that goes not just for the ones that've become part of the standard vocabulary, like "laser" or "scuba", but also for ones like "ATM" and "PIN".

Clearly the mind doesn't immediately break up acronyms into their 'component' words. If it did, people wouldn't say "ATM machine" (276,000 hits on Google) or "PIN number" (728,000 hits on Google). The high incidence of these phrases shows that they are almost instinctive, which means that at some level people's minds treat "ATM" and "PIN" as separate words that are adjectives rather than nouns. Interesting. Not an original question, I must admit, but I wasn't expecting so many hits.
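
For anyone who wants to try the same kind of count on their own text rather than on Google, here is a minimal sketch in Python. The function name count_phrases and the sample text are my own inventions for illustration (the sample is made up, not real corpus data), and counting matches in a local string is only a stand-in for search-engine hit counts, not the same measurement.

```python
import re
from collections import Counter


def count_phrases(text, phrases):
    """Count case-insensitive whole-phrase occurrences of each phrase in text."""
    counts = Counter()
    for phrase in phrases:
        # Word boundaries keep "PIN number" from matching inside "SPIN numbers".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Made-up sample text standing in for a corpus; not real data.
sample = (
    "I forgot my PIN number at the ATM machine. "
    "The ATM was out of cash, so I found another ATM machine "
    "and typed my PIN again."
)

counts = count_phrases(sample, ["ATM machine", "PIN number", "ATM", "PIN"])
for phrase, n in counts.items():
    print(f"{phrase!r}: {n}")
```

Note that the bare "ATM" count includes the occurrences inside "ATM machine", so comparing the two numbers gives a rough idea of how often the acronym shows up with its redundant noun attached.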

