Tuesday, March 22, 2011

Speech Technology : Phonology

Words are formed by a series of syllables, which are in turn formed by a series of phones. In the last post we looked at phonetics and how the variety of speech sounds is generated.

Contrasts, Phonemes and Allophones
We already looked at the variety and variations of phonetic elements. But not all sounds are used by all languages. Most importantly, not all sounds are distinct within the same language. Consider ph and p (aspirated vs. unaspirated). This distinction is not important in English but extremely important in Indic languages. Similarly, Indic languages have two D's, one dental and the other retroflex. English mostly uses an alveolar D, whereas Spanish mostly uses a dental one. If Indic languages have a vast number of consonants, English has many finer distinctions between vowels. The point? What makes two sounds similar or distinct really depends on the language. From a speech technology perspective this matters in two ways. Speakers may not articulate sounds exactly like each other when the distinction doesn't matter in their language, so a recognizer has to tolerate that variation. On the other hand, when a speech recognizer fails to distinguish sounds that the language does treat as distinct, it is a problem.
When two or more phonetic elements are distinct but map to the same phoneme, they are called allophones of that phoneme. One of the most striking examples of allophones that sound quite different is the "visarga" in Sanskrit, where 's' and 'h' are equivalent to each other. A similar historical case is the hard 'g' as in gut and the soft 'g' as in ginger. That is exactly the reason why the letter 'g' is used for two completely different sounds. At some point these sounds were considered close to each other and therefore allophones of each other.
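To make the idea concrete, here is a minimal sketch of how a recognizer might collapse phones into phonemes differently for each language. The table and the function names (ALLOPHONE_TO_PHONEME, to_phonemes, same_word) are my own illustration, not a real inventory or library, and the plain-text symbols "p" and "ph" stand in for proper phonetic notation.

```python
# Hypothetical, heavily simplified tables for illustration only.
ALLOPHONE_TO_PHONEME = {
    # English: aspirated and unaspirated p are allophones of one phoneme.
    "english": {"p": "p", "ph": "p"},
    # Hindi: aspiration is contrastive, so the two stay separate phonemes.
    "hindi": {"p": "p", "ph": "ph"},
}


def to_phonemes(phones, language):
    """Map a sequence of phones to the phonemes of the given language."""
    table = ALLOPHONE_TO_PHONEME[language]
    # Phones that are not in the table are passed through unchanged.
    return [table.get(phone, phone) for phone in phones]


def same_word(phones_a, phones_b, language):
    """True if two pronunciations are phonemically identical in `language`."""
    return to_phonemes(phones_a, language) == to_phonemes(phones_b, language)
```

With these toy tables, same_word(["ph", "a", "l"], ["p", "a", "l"], "english") is true, while the same call with "hindi" is false, because there the two pronunciations are different words.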

Structure of Syllables
Sounds are either 'singable', meaning they are sonorant, or not. Vowels, nasals, liquids and glides are singable. Obstruents (stops, fricatives and affricates) are not. Syllables almost always have a sonorant sound at the nucleus.
A syllable is formed as Onset + (Nucleus + Coda), where the nucleus and coda together make up the rhyme.
Not all combinations are valid. Some languages allow many consonants in the onset while others cannot handle that. In every language there are constraints on which words are valid and which are nonsense. We must understand these constraints.

Once we understand all this, our next task is to actually group the phonemes into syllables.
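As a rough sketch of that grouping step, the code below uses one common heuristic, the maximal onset principle: between two nuclei, give the following syllable the longest consonant cluster that the language allows as an onset, and leave the rest as the coda of the previous syllable. The heuristic is my choice for illustration and is not the only way to do it. VOWELS and LEGAL_ONSETS are tiny made-up inventories, letters stand in for phonemes, and only vowels are treated as nuclei.

```python
# Toy inventories, assumed for illustration; a real system needs per-language data.
VOWELS = {"a", "e", "i", "o", "u"}
LEGAL_ONSETS = {
    (), ("s",), ("t",), ("r",), ("p",), ("k",), ("n",), ("l",),
    ("s", "t"), ("t", "r"), ("p", "r"), ("s", "t", "r"),
}


def syllabify(phonemes):
    """Split a phoneme list into syllables using the maximal onset heuristic."""
    nuclei = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not nuclei:
        return [list(phonemes)]
    syllables = []
    start = 0
    for this_nucleus, next_nucleus in zip(nuclei, nuclei[1:]):
        cluster = phonemes[this_nucleus + 1:next_nucleus]
        # Try the longest candidate onset first; the empty onset always
        # matches, so the loop is guaranteed to stop.
        for coda_len in range(len(cluster) + 1):
            if tuple(cluster[coda_len:]) in LEGAL_ONSETS:
                break
        boundary = this_nucleus + 1 + coda_len
        syllables.append(list(phonemes[start:boundary]))
        start = boundary
    syllables.append(list(phonemes[start:]))
    return syllables
```

For example, syllabify(list("instrukt")) splits the cluster "nstr" into a coda "n" and an onset "str", because "nstr" is not a legal onset in the toy table but "str" is.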

Stress, Mora and Syllable timed languages
In a stress-timed language like English or Russian, stressed syllables occur at roughly equal intervals. In syllable-timed languages, such as Spanish or most of the Indic languages, each syllable takes about equal time. Mora-timed languages, such as Japanese, are yet another way of timing speech.

Yet another complication is what is called Sandhi in Sanskrit and liaison in French. I studied Sanskrit for three years and I know first hand the pain of "breaking" the Sandhi and coming up with the individual words. I won't go into all the details, but all of us, in all languages, merge words while talking, and we must be able to separate them when recognizing speech.
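Here is a very simplified sketch of the separation problem: given an unbroken stream of symbols and a known word list, find one way to cut the stream into words. This only handles plain concatenation. Real Sandhi also changes the sounds at the word boundaries, which this toy version ignores. The function name segment and the lexicon are made up for the example.

```python
def segment(stream, lexicon):
    """Return one split of `stream` into lexicon words, or None if none exists."""
    # best[i] holds a segmentation of stream[:i], or None if there is none.
    best = [None] * (len(stream) + 1)
    best[0] = []
    for end in range(1, len(stream) + 1):
        for start in range(end):
            word = stream[start:end]
            if best[start] is not None and word in lexicon:
                best[end] = best[start] + [word]
                break  # keep the first segmentation found for this prefix
    return best[len(stream)]


lexicon = {"the", "cat", "cats", "sat", "at"}
words = segment("thecatsat", lexicon)
```

Note that "thecatsat" could also be read as "the cats at". The sketch simply returns the first split it finds; a real recognizer would need something like a language model to choose between the candidates.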

