The Student’s Guide to Indo-European

Anton Rytting
For Dr. Cynthia Hallen
Linguistics 450 — Winter Semester, 1998

Introduction

Indo-European has always had a special place in the field of Comparative-Historical Linguistics. Indeed, in the early stages of the discipline, Comparative-Historical and Indo-European studies were practically synonymous, the former merely referring to the preferred method of investigating the latter. Yet Indo-European (IE) is not the easiest family to reconstruct; indeed, it is still the source of some of the knottiest problems in Historical Linguistics. Why is this language family the object of so much focus? I suggest three reasons. First, it is close to home. The overwhelming majority of linguists happen to speak an IE language, and there exists a certain fascination with studying one’s roots. Second, about half of the world’s population speaks a language from the IE family, making it the world’s most expansive language family. Third, and perhaps most important, the wealth of written evidence makes it possible to investigate the family with some degree of surety as far back as the second millennium BC.

The language family called Indo-European is commonly divided into eight ‘living’ sub-families (Germanic, Italic, Greek, Indo-Iranian, Celtic, Balto-Slavic, Armenian, and Albanian) and two ‘dead’ ones (Anatolian and Tocharian). Naturally there exist relationships between these families, but the relationships are quite complex, and it is safe to say that there is more controversy than consensus regarding any higher grouping of these various sub-families. Indeed, as we will see, the history of the field may involve more discarding of misconceptions than actual progress into known fact. This little guide will introduce you to some of the ‘key players’ (both linguists and languages) in the history of IE studies, and provide a brief overview of scholars’ current views on Indo-European (what little consensus does exist).

History of the Discipline

Introductory texts on either Historical-Comparative Linguistics or Indo-European studies commonly begin with Sir William Jones’ 1786 address to the Asiatick Society in Calcutta. This is not to say that he began either discipline. Many before him had noticed the connections between the European languages and had even connected them with Sanskrit, and by the end of the 18th century most scholars had stopped regarding Biblical Hebrew a priori as the mother tongue. (During the Middle Ages, ecclesiastical tradition considered Hebrew the oldest of human languages.) For example, a certain James Parsons in 1767 published a treatise on the "European" family of tongues, which included most of the major families recognized today and excluded Hebrew. Many others published treatises claiming this or that language as the "original" language of the European peoples. However, Jones is usually credited with voicing the thought that the "original Indo-European tongue" was not Latin, not Greek, not Sanskrit, but some language for which no written evidence existed. This was a crucial turning point for ‘comparative philologists’ (as historical linguists were then called), for they had a new task before them: to reconstruct a language from scratch.

Jones’ reputation as a scholar helped his hypothesis catch on. However, although the goal was now in mind, the pathway was not. Etymologists still pieced together word-histories haphazardly, one by one, and fell into many errors through lack of a method. A method of scientific inquiry was not to be found until three Germanic philologists advanced the concept of systematic sound correlations between the various members of the IE family. The works of Rasmus Rask (1818), Franz Bopp (1816), and Jakob Grimm (1819) began the formulation of sound laws: rules of language development believed to be absolute.

During the late 19th and early 20th centuries, these sound laws were greatly refined. Grassmann’s internal reconstructions of Greek and Sanskrit (1863) explained some irregularities in their morphology, and Verner’s law (1875) resolved the seeming ‘exceptions’ in Grimm’s phonological law. Karl Brugmann’s theory of syllabic liquids and nasals (1876) helped scholars reconstruct the proto-IE phonemic inventory. As philologists began to express their findings as universal laws and principles, the science of linguistics began to emerge from philology.

While the earliest scholars focused mostly on comparative phonology, August Schleicher (1821-1868) turned his attention to the proto-IE lexicon. He began to take lists of cognates and reconstruct what he felt to be the original Proto-Indo-European roots, and was the first to employ ‘genetic’ trees to show relationships between languages and language sub-families. His only foible was perhaps taking his own results too seriously, for he later felt confident enough to write a one-paragraph ‘children’s story’ in ‘the Proto-Indo-European language!’ Johannes Schmidt (1843-1901) felt that ‘genetic’ trees badly oversimplified the complex relationships between the IE sub-families, and instead proposed a ‘wave model’ which showed both the more ancient relationships and those resulting from later periods of language contact. However, the weakness of his model is that it does not try to distinguish when or how these similarities came about; it only shows the similarities.

With the discovery and decipherment of Tocharian (c. 1902), Hittite (1915-17), and Mycenaean Greek (written in Linear B, deciphered 1952-53), the whole face of IE studies changed. The new finds broadened and deepened the scope of IE research considerably. Hittite predated even Sanskrit by at least 600 years, and Tocharian took Indo-European as far east as Chinese Turkestan.

However, these new data also discredited several central hypotheses about how to group the IE sub-families. Although the Mycenaean tablets confirmed Greek’s importance as one of the oldest IE languages, they challenged then-current chronologies of Indo-European migration into the Balkans. And while the Hittite data admirably confirmed the existence of laryngeals in Proto-Indo-European (a theory first suggested by Ferdinand de Saussure and others, and confirmed with Hittite data by Kuryłowicz in 1927), Hittite and Tocharian together undermined the most fundamental division of the IE language families: the so-called "centum-satem" split, named for the word for ‘hundred’ (Latin centum, Avestan satəm). According to this theory, proto-IE (P-IE) split early on into western and eastern dialects. Western IE languages, such as Italic and Celtic, preserved P-IE velar consonants; eastern IE languages, such as Slavic and Indo-Iranian, shifted them to palatals or sibilants. Tocharian and Hittite, although both "centum" languages by the standard definition, share many other features with their eastern neighbors. For this reason, some linguists now theorize that the fronting of velars happened independently in each of the so-called "satem" language groups, and at different times.

Enough of history for now, however: let us proceed to a brief description of each sub-family, very roughly in the order of inclusion into the larger family of Indo-European.

Germanic

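In a sense, Germanic can be said to be the starting point of the Comparative Method. In 1818 the Danish philologist Rasmus Rask noticed the ‘shift’ in Germanic consonants relative to other IE languages. Grimm named this correspondence the ‘First Germanic Sound Shift,’ and included it, substantiated with examples from Sanskrit, in the 1822 edition of his German Grammar. This law, also known as Grimm’s Law, cleanly differentiates Germanic from all other sub-families of IE. As the law is usually stated now, IE voiceless stops become fricatives in Proto-Germanic; voiced, unaspirated stops are devoiced; and voiced, aspirated stops become fricatives and then unaspirated stops. In his book In Search of the Indo-Europeans, Mallory dates this sound shift to about 500 BC (see p. 85). As mentioned above, Verner refined Grimm’s Law by accounting for a class of seeming exceptions: inside a word, the voiceless fricatives produced by Grimm’s Law (and IE *s) became voiced when the IE accent did not fall on the immediately preceding syllable. The tables below sum up Grimm’s and Verner’s Laws:

Grimm’s Law (IE > Proto-Germanic)
voiceless stops: *p *t *k > *f *þ *h (Latin pēs : English foot)
voiced stops: *b *d *g > *p *t *k (Latin decem : English ten)
voiced aspirates: *bʰ *dʰ *gʰ > *β *ð *ɣ, later *b *d *g (Sanskrit bhárati : English bear)

Verner’s Law (within Proto-Germanic)
*f *þ *h *s > *β *ð *ɣ *z when not word-initial and not immediately after the IE accent (Gothic brōþar ‘brother’ but fadar ‘father’)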
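Because Grimm’s and Verner’s Laws are regular correspondences, they can be run as a small program. The Python below is only a toy sketch, not a standard tool: the transliteration (bh, dh, gh for the voiced aspirates, an acute accent on the accented vowel) and the function names are assumptions made for illustration. It models consonants only, ignoring laryngeals, labio-velars, and all vowel changes, and it simplifies Verner’s condition to "immediately after an unaccented vowel."

```python
# Toy sketch of Grimm's Law followed by Verner's Law (consonants only).
# Assumed notation (not a standard): voiced aspirates written "bh", "dh", "gh";
# the accented vowel carries an acute (á, é, ...); laryngeals, labio-velars
# and vowel changes are ignored.
GRIMM = {
    "p": "f", "t": "þ", "k": "h",     # voiceless stops > voiceless fricatives
    "b": "p", "d": "t", "g": "k",     # voiced stops > voiceless stops
    "bh": "b", "dh": "d", "gh": "g",  # voiced aspirates > (fricatives >) voiced stops
}
VERNER = {"f": "β", "þ": "ð", "h": "ɣ", "s": "z"}  # voicing of the fricatives
UNACCENTED_VOWELS = set("aeiouāēīōū")
ASPIRATES = {"bh", "dh", "gh"}


def segment(form: str) -> list[str]:
    """Split a toy IE form into segments, keeping bh/dh/gh together."""
    segments: list[str] = []
    i = 0
    while i < len(form):
        if form[i:i + 2] in ASPIRATES:
            segments.append(form[i:i + 2])
            i += 2
        else:
            segments.append(form[i])
            i += 1
    return segments


def to_proto_germanic(form: str) -> str:
    ie = segment(form)
    result = []
    for i, seg in enumerate(ie):
        prev = ie[i - 1] if i > 0 else None
        # Grimm's Law; voiceless stops directly after *s stay unshifted (*st, *sp, *sk).
        if seg in GRIMM and not (seg in {"p", "t", "k"} and prev == "s"):
            seg = GRIMM[seg]
        # Verner's Law (simplified): a voiceless fricative right after an
        # unaccented vowel becomes voiced. Word-initial segments have no
        # preceding vowel, so they are never affected.
        if seg in VERNER and prev in UNACCENTED_VOWELS:
            seg = VERNER[seg]
        result.append(seg)
    return "".join(result)


print(to_proto_germanic("patér"))    # faðér  (cf. Gothic fadar 'father')
print(to_proto_germanic("bhrátēr"))  # bráþēr (cf. Gothic brōþar 'brother')
print(to_proto_germanic("stā"))      # stā    (no shift after *s)
```

The contrast between the first two outputs is the point of Verner’s Law: the same IE *t gives þ in ‘brother,’ where the accent fell on the root vowel, but ð (spelled d in Gothic) in ‘father,’ where the accent fell on the ending.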

The earliest substantial text in the Germanic family is a partial translation of the Bible by Bishop Wulfilas into Gothic, the principal member of the now-extinct East Germanic branch. By Wulfilas’ time (the 4th century AD) the Goths had settled in the Black Sea region, and a form of Gothic was still spoken in the Crimea as late as the 16th century.

The two living branches of Germanic are North Germanic (or Scandinavian) and West Germanic. The North Germanic branch subdivides into East and West groups: Swedish and Danish on the one hand, and Norwegian and Icelandic on the other. West Germanic split into ‘High’ and ‘Low’ dialects (referring to geographic elevation, not social status). The High version (comprising Old High German, Standard Modern German, and Yiddish) underwent another sound change, called the Second Sound Shift, in which Common Germanic voiceless stops shifted to affricates initially and to fricatives medially and finally; voiced stops were devoiced; and voiced fricatives became voiced stops. The Low version (which includes Dutch, Afrikaans, English, Flemish, Frisian, and Low German) did not undergo this shift.

Indo-Iranian

Next to Germanic, the earliest Indo-Europeanists considered Indo-Iranian the most important family in their research (as is still evidenced by the German term for Indo-European, Indogermanisch, ‘Indo-Germanic’), for it was the oldest family known at the time. The earliest Indic literature, the Vedic hymns, was composed perhaps as early as 1200 BC and handed down orally. By 500 BC, Indian grammarians had developed exceptionally accurate descriptions of Classical Sanskrit, their literary language. The oldest Iranian literature includes the Gāthās of Zarathustra (in Avestan, 600 BC) and cuneiform inscriptions in Old Persian from about 500 BC.

The Indo-Iranian family is a combination of two closely related sub-families: the Indic (including Vedic and Classical Sanskrit, and the various Prakrits from which the modern Indic languages emerged) and the Iranian (Old Persian, Avestan, Modern Persian or Farsi, Kurdish, and Pašto). These two branches, which probably split before 1500 BC, are sometimes referred to together as ‘Aryan,’ because the speakers both of Vedic Sanskrit and of Avestan referred to themselves as ‘Arya’ or ‘Airya.’ A third branch, Kafiri or Dardic, is sometimes included in this family.

Proto-Indo-Iranian shows IE */e/, */o/, and the syllabic nasals merging with /a/; */l/ merging with /r/; and palatalization of the labio-velars. Both branches are quite similar in phonology and syntax, preserving all eight cases known to be IE. Sanskrit preserves IE voiced aspirates and develops voiceless ones; Avestan changes these to unaspirated stops and fricatives.

Greek

Probably the next most important language in early IE studies was Greek, for it shared with Latin an immense body of ancient texts but was both older and morphologically more conservative than Latin. Greek’s importance resurged in the 1950s with the discovery and decipherment of a dialect 500 years older than Homer’s.

These new discoveries, the oldest texts identifiable as Greek, comprise some 4,500 clay tablets from Knossos, Mycenae, and Pylos, dating from the Late Bronze Age (13th-12th c. BC). The brief scratchings (mostly economic records) are written in a borrowed syllabary termed Linear B. Once deciphered by Michael Ventris in 1952, they revealed a dialect of Greek much closer to P-IE, and clarified how many Greek words derive from P-IE roots.

One phonemic shift unique to Greek is the split of the labio-velars /kʷ/, /gʷʰ/ into /t/, /tʰ/ before front vowels, but /p/, /pʰ/ before back vowels. The weakening of initial /s/ and /w/ to /h/ is also characteristic. IE voiced aspirates became voiceless aspirates in Greek, and sometimes underwent a strange phonotactic transformation called Grassmann’s Law: if two aspirates occur in the same word, the first one becomes unaspirated.
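Grassmann’s Law, as stated here, is simple enough to apply mechanically. The Python below is a toy sketch only: it assumes a Greek-style transliteration in which the aspirates are written ph, th, kh and every other letter counts as one segment, and it applies the rule exactly as phrased above (an aspirate followed later in the word by another aspirate loses its aspiration), ignoring the syllable and root conditions specialists attach to the law. The function names are hypothetical.

```python
# Toy sketch of Grassmann's Law on Greek-style transliterations.
# Assumption: aspirates are spelled "ph", "th", "kh"; every other letter
# is a single segment.
DEASPIRATE = {"ph": "p", "th": "t", "kh": "k"}


def segment(word: str) -> list[str]:
    """Split a word into segments, treating ph/th/kh as single aspirates."""
    segments: list[str] = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DEASPIRATE:
            segments.append(word[i:i + 2])
            i += 2
        else:
            segments.append(word[i])
            i += 1
    return segments


def grassmann(word: str) -> str:
    """Deaspirate every aspirate that has another aspirate later in the word."""
    segments = segment(word)
    aspirates = [i for i, seg in enumerate(segments) if seg in DEASPIRATE]
    for i in aspirates[:-1]:  # all but the last aspirate lose aspiration
        segments[i] = DEASPIRATE[segments[i]]
    return "".join(segments)


print(grassmann("thrikhos"))  # trikhos  (genitive of 'hair')
print(grassmann("thriks"))    # thriks   (nominative: only one aspirate left)
print(grassmann("thithēmi"))  # tithēmi  (reduplicated 'I place')
```

Since the positions of the aspirates are collected before anything is changed, a word with three aspirates would lose aspiration on the first two in this sketch; that is a simplification of the toy, not a claim about attested forms.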

The Greek language shows some aboriginal (non-IE) substratum in its names of gods and heroes, terms for "king," "slave," and other social ranks, and in Mediterranean flora and fauna.

Otherwise, Greek shows some weak evidence of kinship with Phrygian, Macedonian, and Armenian. The taxonomy of Greek dialects is still contested, but it is common to divide them into East Greek (Attic-Ionic, Aeolic, and Arcado-Cyprian) and West Greek (Northwest Greek and Doric). Mycenaean seems to have been a mix of dialects, like Homer’s ‘epic dialect.’ From Homer’s time until the "classical" period (5th-4th c. BC), the East Greek dialects dominated literature. Indeed, under the great leveling of Alexander’s empire, Attic and Ionic merged into a ‘common’ dialect called Koinē, which eclipsed all other dialects. From Koinē stem Byzantine, Medieval, and eventually Modern Greek.

Italic

When writing swept north across Italy from Greek colonists and traders around 800 BC, it preserved a hodgepodge of languages along "the Boot," some Indo-European, some not. Of the IE languages, the two principal families were the Osco-Umbrian group, preserved in Umbrian religious texts (the Iguvian tablets, c. 200 BC) and in Samnite (Oscan) graffiti at Pompeii, and Latin, with its near cousin Faliscan to the north. Linguists disagree as to whether these two groups are genetically related or converged through long, close contact. In addition, some linguists see a connection between the Italic and the Celtic families, because both Osco-Umbrian and the Brythonic (p-Celtic) languages show a shift of P-IE labio-velars into labials (e.g., */kʷ/ > /p/). (Latin and Faliscan retain the labio-velar.) This link is tenuous, however.

Judging from the evidence of non-IE groups to the west, the Italic peoples probably came either from the north over the Alps or from the east across the Adriatic. Latin gradually extinguished the other languages on the peninsula through Rome’s conquests (Oscan was probably extinct by the first century AD), and soon spread all across former Celtic territory, effectively eliminating Continental Celtic by the 4th c. AD. The Romance languages (Italian, French, Provençal, Spanish, Catalan, Portuguese, Galician, Rumanian, and Raeto-Romansh) all stem from Latin: perhaps not the elegant Latin written by Cicero or Virgil, but the ‘Vulgar Latin’ spoken by the common Roman soldier.

Celtic

The Celts dominated western and central Europe during the Iron Age, originally settling the area near Belgium and northern France. They soon spread to the British Isles, then into the Iberian peninsula (about 600 BC), into northern Italy (400 BC), and into eastern Europe and beyond (300-200 BC). In their western European homeland, the Celts remained the dominant force until invasions by the Romans (from the south) and the Germans (from the northeast) wiped the Continental Celtic languages off the map in the early centuries of the Christian Era.

Scholars have postulated a number of Continental Celtic languages, such as Gaulish, Celtiberian, and Lepontic, but very little is known about these tongues, due to the sparseness of written records. The Celts in the British Isles spoke a language (or languages) known as Insular Celtic; this branch split into the Goidelic (or q-Celtic) and Brythonic (or p-Celtic) branches, characterized by the shift of the IE labio-velar */kʷ/ into a velar or a labial, respectively (compare Old Irish cethair with Welsh pedwar, ‘four’).

Displaced from the continent by the Romans and Germans, pushed from most of Britain by the Jutes, Angles, and Saxons, and now under pressure from English and French, Celtic today is a dying family. Only Scottish and Irish Gaelic (on the Goidelic side) and Welsh and Breton (on the Brythonic side) are still spoken. Nearly all speakers of these languages are bilingual (in English or French), except on the westernmost edges of Ireland. Someday we may be dependent on written and audio-recorded records for our knowledge of this family of tongues.

Balto-Slavic

Just as with Continental Celtic, our knowledge of early ‘Balto-Slavic’ is hampered by lack of written evidence. Both these sub-families acquired their alphabets, along with Christianity, at the hands of foreign missionaries. The Slavic tongues were first written down in the 9th c. AD, when St. Cyril and St. Methodius translated the Greek Orthodox liturgies into a south Slavic tongue (now appropriately called "Old Church Slavonic"). The oldest Baltic texts are Lutheran catechisms in Lithuanian, from the 16th century. By this time the Baltic peoples’ once-expansive territory had been much reduced by the expansion of Slavs and Germans. Several Baltic languages were never written at all; indeed, the whole west branch was nearly lost. Old Prussian, the last West Baltic tongue, died out about 1700 AD, not long after it was first written.

Because both families’ histories are shrouded by lack of documentation, the exact relationship between Baltic and Slavic is hotly contested. Vladimir Georgiev, in his work Introduction to the History of Indo-European, posits a Balto-Slavic unity during the 3rd millennium BC, with Proto-Slavic branching off during the 2nd and 1st millennia BC (p. 219). Other scholars feel that the two are not closely related genetically, but converged during their long mutual contact. This latter view is not unreasonable: neither language family seems particularly averse to outside influence. Although the Baltic languages are said to be among the most conservative of all the IE languages, they still show considerable ‘contamination’ from Germanic. The Slavic languages also show significant borrowing from Germanic (especially Gothic) up until 400 AD, as well as borrowing religious vocabulary from Iranian sources.

Armenian

Even though literacy reached Armenia as early as the 5th century AD (again, at the hands of missionaries), Armenian still puzzled IE scholars for quite some time. Its connections with the rest of the family are somewhat obscured by extensive borrowings from non-IE languages. Even after its belated inclusion in the IE family, it was misclassified for some time as an Indo-Iranian offshoot. Finally, Heinrich Hübschmann established it as an independent branch in 1875. Today several scholars connect Armenian with Phrygian, Thracian, and also with Greek. As well as showing phonetic and morphological similarities, Greek and Armenian share some 400 cognates, about 10 percent of which are found in no other known language.

The Armenians’ migration route can be traced from their borrowings: from the Anatolian languages before 1200 BC, and from Iranian and Semitic languages (especially Aramaic) between 500 and 100 BC. It seems, therefore, that the Armenian people migrated east, from the Balkans, through Anatolia, into the Armenian highlands. They also must have borrowed substantially from the (non-IE) inhabitants of their new homeland, since many words in Armenian cannot be traced to any other known language.

Albanian

Like Armenian, Albanian was not definitely established as IE until the end of the 19th century, and it too was misclassified at first because of its wealth of loan-words. Albanian shows much influence from its neighbors: Latin, Greek, Slavic, and Turkish. Also, since its first written records appear very late (during the 15th century AD), neither Albanian’s history nor its relationships to other IE languages can be known with any surety. Albanian may be a descendant of the Old Illyrian tongue, but there is no way to know for sure, since all that survives of Illyrian itself are personal names and place names. Since the Albanian terminology for fishing and boating is borrowed, and there is no mention of the Albanian people living in Albania until the 9th century AD, some linguists have argued that the Albanians migrated from inland, perhaps from the same homeland as the Dacians (in present-day Rumania).

Despite Albanian’s frustrating lack of historical data, linguists can learn much from its similarities with its present-day neighbors. There exists a peculiar convergence of grammatical and syntactic patterns among Albanian, Greek, Rumanian, and the South Slavic languages, which has led some researchers to posit a Balkan Sprachbund, or linguistic area of mutual influence. For example, all of these languages have replaced an older infinitive construction with a particle plus the subjunctive, and several of them have merged the genitive and dative cases.

One important lesson from this modern convergence is that similarities between languages may have nothing to do with genetic ‘inheritance,’ but may spring from later contact, a sort of ‘peer pressure’ among languages. Thus, Albanian calls into question the traditional emphasis in IE studies on ‘genetic’ relationships. Schmidt’s wave model, in its conservative way, may often be the more accurate representation of the data, both for the IE languages and for groups of languages in general.

Tocharian

The discovery of two ‘extinct’ families of Indo-European, Tocharian and Hittite, marked an auspicious start to 20th-century IE studies. The first Tocharian manuscripts, which seem to date from 600 AD, were discovered right around the turn of the century. Because they were written in an Indic alphabet already known to scholars, the texts were deciphered quite quickly. Many of these were Buddhist texts, translations from Sanskrit; others contained treatises on magic or medicine, or business transactions for caravans. The name "Tocharian" comes from a people whom the Greeks called "Tocharoi," who moved from Turkestan to Bactria during the 2nd c. BC. This identification is still hotly debated, but the name, accurate or not, has stuck.

The so-called "Tocharian A" is preserved only in the eastern part of Chinese Turkestan, mostly in liturgical texts. The "B" variety is also found farther west, and seems to have been more vernacular. The dialects do not differ greatly from each other: Tocharian A tends to drop word-final vowels, and shows simple vowels /e/ and /o/ where Tocharian B has diphthongs /ai/ and /au/.

Tocharian’s relationship with the other IE families is far from clear, and the history of its migration is still mere speculation. As mentioned before, Tocharian preserves the velar /k/ in such words as känt (Toch. A) and kante (Toch. B), ‘100’ (compare Latin centum), and its medio-passive -r suffix resembles those of Latin, Irish, and Hittite. However, it shares adjectival suffixes with Slavic, and certain cognates with Greek. Other scholars see connections with Thracian and Armenian; still others with Germanic and Balto-Slavic. Because it shows similarities to both eastern and western IE languages, Tocharian poses serious problems for a fundamental east-west split such as the traditional centum-satem hypothesis.

Anatolian (Luwo-Hittite)

Late in the 19th century, cuneiform letters in an unknown language were found at Tell el-Amarna in Egypt. Ten years later, about 25,000 tablets in the same language were discovered 90 miles east of Ankara, Turkey. In 1906, Winckler found the "Archives of the Hittite Kings," with records dating from the 14th and 13th c. BC. In 1915, Bedřich Hrozný suggested that this previously unknown "Hittite" language was a particularly ancient member of the Indo-European family. Although the oldest inscriptions date back to around 1800 BC, these consist merely of isolated words. Continuous texts do not appear in Hittite until 1650 BC, and in Luwian (a related language) until 1400 BC. There is some evidence of other languages in the Anatolian sub-family, such as Lycian, Lydian, and Palaic, but they are not well attested. Palaic, for example, is known from only 200 words.

There is considerable influence on both Luwian and Hittite from non-IE languages, especially from neighboring Semitic tongues. They show many interesting innovations: the merging of masculine and feminine gender into a ‘common’ or ‘animate’ gender; the use of postpositions (the counterparts of prepositions, placed after the noun) and postpositional possessive pronouns; and the loss of special markers for comparatives and superlatives. Nevertheless, Hittite’s and Luwian’s most basic words and syntactic structures are definitely IE, and Hittite plays a crucial role in confirming de Saussure’s Laryngeal Theory mentioned above.

The Present State of the Discipline

We can never be sure when a new language will be unearthed from an archaeological site, or some other breakthrough will revolutionize the field. For the time being, however, linguists are keeping themselves quite busy with the data they have. There is still much work to be done on the phonology and morphology of more recently discovered languages (Tocharian and Hittite), and in reconstructing the phonological inventory of P-IE. Saussure’s ‘laryngeal theory’ has evolved considerably from the time he first proposed it, and it continues to receive considerable attention. Another ‘hot’ phonological theory is the ‘glottalic theory.’ In 1973, linguists who were uncomfortable with P-IE’s ‘unbalanced’ typology of stops (i.e., voiced aspirates with no voiceless counterparts) proposed that P-IE possessed a series of glottalic stops. Vennemann’s The New Sound of Indo-European contains nothing but articles on these two theories.

There is also continuing interest in making the genetic relationships, families, and ‘super-families’ ever more inclusive, and numerous scholars have tried, and are still trying, to connect the whole Indo-European family to other families. The most persistently recurring of these attempts is the grouping of Indo-European, Afro-Asiatic (a.k.a. Hamito-Semitic), Uralic, Altaic, Kartvelian, and others into a superfamily called ‘Nostratic.’ Nostratic scholars have amassed quite a bit of data, but traditional IE scholars question their methods, and point out that there is no clear way of distinguishing cognates from loan-words so far back in time.

As the field of archeology continues to progress, IE scholars and pre-historians keep on trying to find a when and a where for P-IE, seeking to connect it with some culture known to archeologists. As Mallory so aptly quips, "One does not ask, ‘where is the Indo-European homeland?’ but rather ‘where do they put it now?’"

Among the hard-core linguists, the trend of reconstructing IE syntax continues: a much more slippery task than phonology or even morphology. Although this problem is far from adequately solved, a few scholars in very recent years have gone one step further, to begin a theory of ‘comparative poetics.’ Calvert Watkins (1995) and Ranko Matasović (1996) have set their hands to the task of reconstructing the metrical form, patterns, and even ‘stock storylines’ and common motifs of IE poetry and song. Out of the maze of linguistics, philology rises once again!

Bibliography

Primary Sources (Cited in Text):

Arlotto, Anthony. Introduction to Historical Linguistics. Boston: Houghton Mifflin Company, 1972.

Baldi, Philip. An Introduction to Indo-European Languages. Carbondale and Edwardsville, IL: Southern Illinois UP, 1983.

Bynon, Theodora. Historical Linguistics. Cambridge: Cambridge UP, 1977.

Georgiev, Vladimir Ivanov. Introduction to the History of Indo-European. (3rd Edition.) Sofia: Publishing House of the Bulgarian Academy of Sciences, 1981.

Mallory, J. P. In Search of the Indo-Europeans: Language, Archeology, and Myth. London: Thames and Hudson, Ltd., 1989.

The American Heritage Dictionary of Indo-European Roots. Calvert Watkins, editor and revisor. Marion Severynse, editor/etymologist. Boston: Houghton Mifflin Company, 1985.

Secondary Sources (Mentioned in Text or Cited in Endnotes):

Anttila, Raimo. Historical and Comparative Linguistics (2nd Revised Edition). Amsterdam/Philadelphia: John Benjamins Publishing Company, 1989.

Gaur, Albertine. A History of Writing. New York: Charles Scribner’s Sons, 1984.

Lehmann, Winfred P., ed. and trans. A Reader in Nineteenth-Century Historical Linguistics. Bloomington and London: Indiana UP, 1967.

Matasović, Ranko. A Theory of Textual Reconstruction in Indo-European Linguistics. Frankfurt am Main: Peter Lang, 1996.

Sen, Subhadra Kumar, ed. "Proto-Indo-European: A Multiangular View." In The Journal of Indo-European Studies 22, no. 1&2 (Spring/Summer, 1994): 67-90.

Vennemann, Theo, ed. The New Sound of Indo-European: Essays in Phonological Research. Berlin/New York: Mouton De Gruyter, 1989.

Watkins, Calvert. How to Kill a Dragon: Aspects of Indo-European Poetics. New York; Oxford: Oxford UP, 1995.

1998-1999 © Dr. Cynthia L. Hallen
Department of Linguistics
Brigham Young University
Last Updated: Monday, September 6, 1999