Sanzhi Dargwa dictionary

by Diana Forker

Area, speakers and sociolinguistic situation

Sanzhi Dargwa is a Nakh-Daghestanian language from the Dargwa subbranch and belongs to the South Dargwa varieties. It is spoken by approximately 250 speakers. More than 40 years ago all Sanzhi speakers left the village of origin, Sanzhi, in the Caucasian mountains and moved to the lowlands. Today, the majority of Sanzhi speakers live in the village of Druzhba in the Daghestanian lowlands (Kajakentskij Rajon). Druzhba is an ethnically and linguistically heterogeneous settlement with speakers of other South Dargwa varieties, other Nakh-Daghestanian languages such as Tabasaran, Agul, Lezgian, and Lak and also very few Kumyk (Turkic) and Russian speakers.

Sanzhi Dargwa, like many other comparatively small languages and varieties spoken on the territory of Daghestan, is an unwritten language that is only used for oral communication within the Sanzhi community. In school, Sanzhi children have around two hours of mother tongue education per week, during which they learn Standard Dargwa (based on the variety spoken in the large Dargwa settlement Akusha), which is not mutually intelligible with Sanzhi Dargwa. Russian serves as the main language of interethnic communication and the only language used in education, administration and more generally in the public sphere in Daghestan. Therefore, all Sanzhi speakers know at least some Russian. Before the arrival of Russian in the remote parts of the central Daghestanian mountains where the original village of Sanzhi is located, Kumyk served as the language of interethnic communication in the area. Nevertheless, among the Sanzhi speakers with whom I worked, nobody claimed to have a significant command of Kumyk.

Most children and young people (30 years and younger) still learn Sanzhi as their first language (depending on the family constellation), but they come into contact with Russian from the first day of their lives. By the time they attend kindergarten at the latest, Russian becomes the dominant language. Therefore, they have only a limited and mostly passive command of Sanzhi and prefer to speak only Russian. They will probably not pass the language on to the following generation, so Sanzhi is heavily endangered.

Socioeconomic and cultural situation

The village of Sanzhi, the original home village of the Sanzhi population, is located in the valley of the river Ulluchay, at an altitude of about 1,500 meters in the northeast of the Caucasian mountains. The closest neighboring villages are Itsari, Shari, Khuduts, Ashty, and Amukh. The distance from the Daghestanian capital Makhachkala is around 200 kilometers, from the regional center of the Dakhadayevskiy rayon, Urkarakh, it is 60 kilometers, and from the old and important city of Derbent around 130 kilometers. In contrast to most other Daghestanian villages, there is no direct road to Sanzhi and also no electricity. These were major reasons for Sanzhi speakers to migrate to the Daghestanian lowlands, to the village of Druzhba, which is located on the main road between Astrakhan and Baku, approximately 10 km from the Caspian Sea coast and 30 kilometers to the north of Derbent.

The village of Sanzhi was surrounded by terrace fields that have been used for centuries to grow crops such as rye, wheat, barley, oats, and in the recent past also carrots, radishes, potatoes, and others. The traditional occupations of the Sanzhi people were farming and breeding, in particular sheep breeding. Today, most Sanzhi people have given up their traditional occupations and work as employees, teachers, or builders, or have occasional jobs in the area.

Sanzhi people are Sunni Muslims and have been influenced by neighboring Daghestanian people (e.g. Lak, various Lezgic communities) and Turkic people with Persian and Arabic being important, but geographically more distant cultural reference points.

State of research

As Sanzhi is an unwritten language, there is no long tradition of description or analysis. Since 2012 Sanzhi has been documented and described in the project “Documenting Dargi languages in Daghestan: Shiri and Sanzhi”, financed by the VW foundation and led by Diana Forker. Detailed information on the project, pictures, electronic dictionaries, texts, audio recordings and other materials can be found on the project website. The recordings and annotated texts are available via the DoBeS archive. A glossed and translated corpus (around 45,000 words) can be found at the Corpora of the Russian Federation. A comprehensive grammar and a sketch grammar will be published soon (Forker In Press, accepted[b]). Topics in the morphosyntax of Sanzhi and other aspects of Sanzhi have been treated in Forker (2014; 2016; 2018; 2019a, b, c; accepted[a, b]). A collection of texts with Russian translations and a Sanzhi-Russian and Russian-Sanzhi dictionary is Forker & Gadzhimuradov (2017).

Standard Dargwa, to which Sanzhi Dargwa is relatively closely related although mutual understanding is normally not possible, is a major language in Daghestan with a certain tradition of writing and research. See Sumbatova (In Press) for a recent overview and references.

Typological profile


Sanzhi has four plain vowels and three pharyngealized vowels, of which [iˁ] is very rare and its phonemic status needs further clarification. Table 1 shows the vowel inventory. In addition, there is one long vowel [aː], which is not phonemic, but occurs relatively frequently.

Table 1: The vowel inventory

Front                   Central       Back
[ɪ], [i]; [ɪˁ], [iˁ]                  [u], [ʊ]; [ʊˁ]
i; iˁ                                 u; uˁ
                        [a], [aˁ]
                        a, aˁ

Table 2 displays the consonant inventory. A non-phonemic glottal stop, which is not written, occurs before word-initial non-pharyngealized vowels, e.g. aba [ʔaba] ‘mother’, including vowel-initial words in compounds. In addition to the segments listed in Table 2, the voiceless labiodental fricative [f] is attested in the ideophone uf b-ik’ʷ-ij ‘blow’ (uf HPL-say.IPFV-INF) and in loan words, mostly from Russian. In older loans it was replaced with [p].

Table 2: The consonant inventory

Bilabial        Dental          Alveolar        Velar               Uvular          Pharyngeal / epiglottal    Glottal
[p] [b] [pʼ]    [t] [d] [tʼ]                    [k] [ɡ] [kʼ]        [q] [qʼ]
p b pʼ          t d tʼ                          k g kʼ              q qʼ
                                                [kʷ] [gʷ] [kʼʷ]     [qʷ] [qʼʷ]
                                                kʷ gʷ kʼʷ           qʷ qʼʷ
                [s] [z]         [ʃ] [ʒ]                             [χ] [ʁ]
                s z             š ž                                 χ ʁ
                                                                    [χʷ] [ʁʷ]
                                                                    χʷ ʁʷ
                [t͡s] [t͡sʼ]      [t͡ʃ] [t͡ʃʼ]
                c cʼ            č čʼ

All plain consonants occur in initial, medial, and final position. All velars and uvulars occur also in labialized form, predominantly in syllable-initial position. All voiceless nonejective stops and fricatives (except for the pharyngeal/epiglottal and the glottal sounds) and even a number of labialized consonants also occur in tense form (geminates). Tense consonants and three labialized consonants (qʼʷ, χʷ, ʁʷ) are never found in syllable-final position.

Stress is not a very prominent category in Sanzhi Dargwa. It is dynamic and has no fixed position, but it is lexicalized. Because stress is quite weak, the stress properties of words are very hard to determine and are therefore not indicated in the dictionary. Some affixes attract stress, such that the position of stress in roots and in inflected word forms of one and the same lexeme may differ, e.g. plural suffixes of nouns: qːap ‘sack’ > qːup-né ‘sacks’.

Because Sanzhi is an unwritten language, there is no official orthography. In this dictionary and in the published corpus (Forker & Gadzhimuradov 2017), a Cyrillic orthography is used that is largely based on the orthography for Standard Dargwa. The Cyrillic orthography as well as the transcription employed in this dictionary is given in Table 7.

Morphology and syntax

The morphology is concatenative and mostly suffixing. Prefixes are only found with verbs, in the form of gender prefixes, spatial preverbs, and negation. Sanzhi has ergative alignment that is largely restricted to the morphology. It is predominantly dependent-marking with a rich case inventory. Salient traits of the grammar are two independently operating agreement systems: gender/number agreement and person agreement. Gender/number agreement operates at the phrasal and at the clause level (Section on gender/number agreement). Person agreement operates at the clausal level only, and functions according to a person hierarchy. SOV is the most frequent constituent order; other orders are also possible and found in texts. Constituent order of most types of subordinate clauses is more strictly head-final, and noun phrases and postpositional phrases are exclusively head-final.

Gender/number agreement

Gender/number agreement is a pervasive feature of Nakh-Daghestanian languages including Sanzhi. It is possible that within one clause three, four, or even more linguistic items agree with one and the same agreement controller.

Sanzhi has three genders that have a transparent semantic basis: masculine, feminine, and neuter. Agreement targets for gender/number agreement can be divided according to their agreement domains:

Clausal domain

  • most vowel-initial verbs
  • a few preverbs (e.g. b-al ‘together’, b-at ‘set free, let’)
  • the spatial preverbs b-i- ‘in, inside’ and b-it- ‘thither’
  • almost all copulas
  • the postpositions/adverbs b-i ‘in’, b-alli ‘together’, b-arxle ‘directly, straight’
  • all items inflected for the essive case, e.g. nouns, pronouns, spatial adverbs, postpositions; all items that inflect for the directional case, which are mostly spatial adverbs

Domain of the noun phrase

  • a handful of adjectives
  • the quantifier li<b>il ‘all’ and group numerals
  • all adjectives formed with the derivational suffix -či-b

Furthermore, a small number of nouns (e.g. b-ah ‘owner, master’), reflexive pronouns in the absolutive case and one reciprocal pronoun contain gender exponents that express the gender of the referent.

Within the noun phrase, modifiers agree with the head in gender and number independently of the case marking on the head. At the clausal level, gender agreement is most frequently controlled by the absolutive argument of the clause. Few verb forms allow the ergative or the dative experiencer argument as controller, and other clauses lack absolutive arguments and resort to default agreement.

The agreement affixes are given in Table 3. All forms except the zero marking for masculine singular agreement can occur as prefixes, suffixes, and infixes. Verbs (except for copulas) and adjectives have prefixes; the other agreement targets have suffixes (or very rarely infixes).

Table 3: Gender/number agreement affixes

             Singular    First and second person plural    Third person plural
Masculine    w / Ø       d                                 b
Feminine     r           d                                 b
Neuter       b           d (plural)

As Table 3 shows, there are fewer distinctions in the plural than in the singular, because masculine and feminine are united in human plural agreement. In addition, human plural is conditioned by person: first and second person plural agreement is expressed with d, third person with b.
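The paradigm in Table 3 can be read as a small lookup. The following Python sketch is only an illustration under stated assumptions: the function name `agreement_affix` and the feature labels ("m", "f", "n", "sg", "pl") are introduced here and are not part of the dictionary, and neuter plural is assumed to be d irrespective of person.

```python
def agreement_affix(gender, number, person=3):
    """Return the gender/number agreement affix following Table 3.

    gender: "m", "f" or "n"; number: "sg" or "pl"; person: 1, 2 or 3.
    The masculine singular affix w can also surface as zero, which
    this sketch does not model.
    """
    if gender not in ("m", "f", "n"):
        raise ValueError(f"unknown gender: {gender!r}")
    if number == "sg":
        return {"m": "w", "f": "r", "n": "b"}[gender]
    if number != "pl":
        raise ValueError(f"unknown number: {number!r}")
    if gender == "n":
        # Assumption: neuter plural is d regardless of person.
        return "d"
    # Masculine and feminine fall together as human plural,
    # which is conditioned by person.
    return "d" if person in (1, 2) else "b"


print(agreement_affix("f", "sg"))      # feminine singular
print(agreement_affix("m", "pl", 1))   # human plural, first person
print(agreement_affix("f", "pl", 3))   # human plural, third person
```

The person argument only matters for human plural controllers, mirroring the statement that human plural agreement is conditioned by person.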

In this dictionary, all words that have agreement affixes are given with the affix b whenever possible (i.e. neuter singular or third person plural). For those few words for which it would be difficult to find contexts in which the affix b can be used, the masculine singular form is used instead.

Parts of speech and affixes

Parts of speech can be identified on the basis of their morphosyntactic properties. Table 4 lists all parts of speech that the dictionary contains. The main properties of each part of speech are described below.

Table 4: Parts of speech

N nouns
V verbs
ADJ adjectives
ADV adverbs
CONJ conjunctions
COP copula verbs
INTERJ interjections
POST postpositions
PREV preverbs
PRON pronouns
PRT particles

There are a number of lexical items that can belong to more than one part of speech depending on their function. This mainly concerns:

  • adverbs / postpositions (see below)
  • preverbs (see below)

In addition to the parts of speech of independent words, prefixes and suffixes included in the dictionary are divided into six different types according to their function and position (Table 5).

Table 5: Affix classes in the dictionary

SUF suffixes used across different parts of speech (gender suffixes and the cross-categorical suffixes -ce and -il)
V.PREF verbal prefixes (gender prefixes, negation prefixes; NOT spatial preverbs)
V.SUF verbal suffixes (TAME and person agreement suffixes)
N.SUF nominal suffixes (number, case)
NUM.SUF suffixes used for the derivation of numerals
DERIV other derivational suffixes

Nouns

The grammatical categories of nouns and other nominals in Sanzhi are gender, number and case. Sanzhi has the typical Dargwa gender system of three genders that have a transparent semantic basis: human masculine, human feminine, and neuter. Genders are almost never marked on nouns, but on other agreement targets (Section on gender/number agreement). The combined gender / number agreement affixes are given in Table 3 above.

Most nouns can be marked for plural by means of a suffix. There are 14 different plural suffixes whose use is largely lexicalized and therefore included in the dictionary. Many nouns have more than one plural form. Alternative plural forms are also provided in this dictionary. In addition, there is an associative plural suffix.

Sanzhi Dargwa has four grammatical cases, namely absolutive, ergative, dative, and genitive, and many more semantic cases. Most of the latter are spatial cases. Case suffixation is (almost) completely regular and predictable. Case suffixes often do not directly attach to the nominal root, but are preceded by a so-called oblique marker, which for most nouns is identical to the ergative suffix (-l(i) or one of its allomorphs -ri, -ni).

Adjectives

Adjectives can be distinguished from nouns or verbs since they are not lexically specified for gender, and they cannot take tense suffixes or other inflectional morphology reserved for verbs. They are formally rather heterogeneous. Only a few adjectives have gender agreement prefixes.

As is characteristic of Dargwa varieties, Sanzhi has a class of underived adjectival stems that are used as attributes to nominals, but cannot be used substantively or predicatively. Some of these adjectival stems are also used as preverbs in compounding, especially in compound verbs (see examples (1c, d) below). All underived adjectival stems can take the cross-categorical suffix -ce (plural -te) in order to be employed as predicates or nominals, and are given together with this suffix in the dictionary.

All numerals are adjective-like in their morphosyntactic properties and therefore classified as adjectives. In order to easily identify numerals this dictionary contains the semantic domain ‘numerals’, which includes cardinal numerals, ordinal numerals, collective numerals, etc.

Pronouns

Sanzhi Dargwa has the following types of pronouns:

  • personal pronouns for first and second person
  • demonstrative pronouns, which are also used as third person pronouns
  • reflexive pronouns
  • reciprocal pronouns
  • interrogative pronouns (and pro-forms)
  • various types of indefinite pronouns (and pro-forms)

Interrogative and indefinite pro-forms which do not function as pronouns belong to the part of speech of adverbs.

Verbs

The morphosyntactic categories of verbs in Sanzhi are person, gender, number, polarity, tense, mood, aspect, evidentiality, and voice.

Based on their morphological make-up verbs can be divided into the following morphological classes:

  • simple verbal stems
  • derived verbs (using spatial preverbs or causativization)
  • compound verbs

There are comparatively few simple verbal stems that can be, and actually are, used without having undergone additional derivational or compositional operations.

Derived verbs contain spatial preverbs and / or the causative suffix -aq. In their original meaning, spatial preverbs express location, direction and deixis/elevation. Spatial preverbs form a closed paradigm with three slots and occur in the order [(location)-(direction)]-(deixis/elevation)-root (Table 6).

Table 6: Spatial preverbs

Location preverbs
či- ‘on’                                gu- ‘under’                  sa- ‘in front of’
GM-i- ‘in, inside’                      hitːi- ‘behind, after’       tːura- ‘outside’
kʷi- ‘in(to) / to, in(to) the hands’
Direction preverbs
GM- ‘essive’ (no movement)              Ø- ‘lative’ (to the goal)    r- ‘ablative’ (away from the goal)
Deixis / elevation preverbs
ha- ‘up, upwards’                       ka- ‘down, downwards’        sa- ‘to the speaker, hither’
GM-it- ‘away from the speaker, thither’
GM = gender/number agreement marker (Table 3); GM-i- and GM-it- are cited elsewhere in this text with the neuter marker as b-i- and b-it-.


All location and direction preverbs are identical to spatial postpositions, spatial adverbials or spatial cases. Spatial preverbs are mostly optional, but there are a number of bound verbal roots for which they are obligatory. Spatial preverbs are most commonly used with verbs denoting movement, position or posture, but also with other verbs. In this dictionary, a large proportion of the verbs contain spatial preverbs. The preverbs are always written together with the verbal stem and are thus verbal prefixes. They are nevertheless classified as belonging to the lexical class ‘preverb’ (=PREV) in the dictionary (see the section on preverbs below).
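The slot order of the spatial preverbs, [(location)-(direction)]-(deixis/elevation)-root, can be illustrated with a short Python sketch. It is a simplified illustration, not a description of the dictionary's data model: the function name `add_preverbs` and the inventories below are assumptions made here, the preverbs that contain a gender marker are left out, the zero lative is represented by simply omitting the direction slot, and no morphophonological adjustment at morpheme boundaries is modelled.

```python
# Preverbs with a fixed form, taken from Table 6.
LOCATION = {"či", "gu", "sa", "hitːi", "tːura", "kʷi"}
DIRECTION = {"r"}            # ablative; lative is zero, essive is a gender marker
DEIXIS = {"ha", "ka", "sa"}


def add_preverbs(stem, location=None, direction=None, deixis=None):
    """Prefix the given spatial preverbs to a stem in the fixed slot order."""
    parts = []
    for value, inventory, slot in (
        (location, LOCATION, "location"),
        (direction, DIRECTION, "direction"),
        (deixis, DEIXIS, "deixis/elevation"),
    ):
        if value is None:
            continue  # every slot is optional
        if value not in inventory:
            raise ValueError(f"{value!r} is not a {slot} preverb in this sketch")
        parts.append(value)
    parts.append(stem)
    return "-".join(parts)


# Location či- plus ablative r- on the stem aʁ-ij.
print(add_preverbs("aʁ-ij", location="či", direction="r"))  # či-r-aʁ-ij
```

Keeping the three slots as separate arguments makes the ordering explicit: whatever the caller supplies, location always precedes direction, and both precede deixis/elevation.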

Sanzhi has many more preverbs with mostly non-spatial meaning. All those other preverbs are used in verbal compounding. Compounding is a very productive way of extending the verbal lexicon. Compound verbs consist of two parts: a first part that can be a noun (often a loan word), adjectival stem (without the suffix -ce), ideophone, or bound lexical stem (including verbal stems). The bound lexical stems and the ideophones form a closed class that occurs only in compound verbs and are categorized as ‘preverbs’ (together with the spatial preverbs from Table 6). The second part of compound verbs is a light verb. The most frequent light verbs are b-iχʷ-ij ‘be, become, can’, b-ik’ʷ-ij ‘say’ and b-arq’-ij ‘do’. The verb b-ik’ʷ-ij is widely used in compounds that denote verbs of speech and the production of other sounds, but also in many verbs of movement.

Compound verbs are written separately in this dictionary. This means that the first part and the second part are not written together but as two separate words. Examples of compound verbs are given in (1).

(1)
a. er ‘life’ + b-iχʷ-ij (HPL-be.PFV-INF) ‘live’
b. sːalam ‘greeting’ + b-ičː-ij (N-give.PFV-INF) ‘greet’
c. ʡaˁħ ‘good’ + b-arq’-ij (N-do.PFV-INF) ‘improve, correct’
d. ʡaˁħ ‘good’ + b-iχʷ-ij (N-be.PFV-INF) ‘be, become good, get healthy’
e. qaˁš (ideophone) + k-aʁ-ij (down-do.PFV-INF) ‘cut off, cut into pieces’

Almost all verbal stems come in pairs that can be treated as expressing the aspectual opposition between perfective and imperfective. This opposition is preserved in most TAM forms including non-finite verb forms such as participles and converbs. Only some finite and non-finite verb forms are available for perfective as well as for imperfective verb stems; most TAM forms can be built only from imperfective or only from perfective stems. The formation of the aspectual pairs is largely lexicalized and cannot be predicted. Therefore, the dictionary gives the aspectual value of every verb form. For the few verbal stems that cannot be classified as imperfective or perfective, a note specifies that ‘the verb lacks the aspectual opposition’. Note that the Russian translations of the Sanzhi verbs do not always match the aspect, i.e., the meaning of an imperfective Sanzhi verb is not always described by means of an imperfective Russian verb.

The morphological structure of verbs in Sanzhi is fairly complex. There are up to five morphemes that can precede the root and up to five that can follow it. These morphemes include first parts of compound verbs and enclitics expressing person, tense, modality and illocutionary force. The root can be followed by up to three suffixes: a derivational suffix for the causative, which is directly attached to the root, and two further suffixes that express finite and non-finite TAM forms, including person agreement for some verb forms. Person or tense markers can be encliticized to certain verb forms. There are restrictions on the combinability of markers in the various slots, e.g. TAM forms requiring person suffixes exclude the use of enclitic person or tense markers.

Sanzhi has gender and person agreement. The two systems are formally and functionally independent. Most of the vowel-initial verbal stems and the two spatial preverbs b-i- ‘in(side)’ and b-it- ‘away from the speaker, thither’ have gender agreement affixes. Furthermore, copulas (see below) have a slot for a gender agreement suffix. The agreement affixes are displayed in Table 3. The same agreement affixes, which can show up as prefixes, infixes and suffixes, occur with all other parts of speech that have gender agreement (Section on gender/number agreement). The agreement affix for masculine singular -w is always used when it occurs as a suffix. The same affix is regularly omitted when it occurs as a prefix to a verbal root beginning with u, e.g. ukː-unne=da (masc.) vs. r-ukː-unne=da (fem.) (F-eat.IPFV-ICVB=1SG) ‘I will eat’. It is optionally omitted when the root starts with i, e.g. (w-)ik’-ul (masc.) vs. r-ik’-ul (F-say.IPFV-ICVB). In this dictionary, most verbs are given with the agreement prefix b- (neuter singular or third person plural) except for a few verbs for which this agreement prefix could only be used in a very limited number of contexts. Those few verbs are given in the form of the masculine singular and the entries often contain an example of the feminine singular.
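The behaviour of the masculine singular prefix w- before u and i can be summarized in a few lines of Python. This is a rough sketch only: the function name `masculine_singular_forms` is hypothetical, the input is assumed to be a vowel-initial, gender-agreeing verb form written without its prefix, and the omission before u, which the text calls regular, is treated here as exceptionless for simplicity.

```python
def masculine_singular_forms(stem):
    """List the masculine singular forms of a prefix-less verb form.

    Before u the prefix w- is omitted; before i it is optional;
    elsewhere it is kept.
    """
    if stem.startswith("u"):
        return [stem]
    if stem.startswith("i"):
        return ["w-" + stem, stem]
    return ["w-" + stem]


def feminine_singular_form(stem):
    """The feminine singular prefix r- is never omitted."""
    return "r-" + stem


print(masculine_singular_forms("ukː-unne=da"))  # ['ukː-unne=da']
print(masculine_singular_forms("ik’-ul"))       # ['w-ik’-ul', 'ik’-ul']
print(feminine_singular_form("ukː-unne=da"))    # r-ukː-unne=da
```

Returning a list rather than a single string keeps the optional variant before i visible instead of silently picking one of the two forms.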

Sanzhi Dargwa has only two indicative synthetic verb forms, but a large number of morphologically complex forms, which make use of the copulas or other auxiliaries.

The preterite is the default past tense with respect to form and function. It is most commonly used with perfective stems and is one of the major verb forms. The preterite is the basis for a number of frequently used verb forms (including the perfective converb). It is formed by means of one of four suffixes (-ib, -ur, -un, -ub), which cannot be predicted from the verbal stem. Therefore, the preterite suffix is given in this dictionary for all perfective verb stems and all verbs that are not specified for aspect.

Major non-indicative verb forms are imperative and optative. The imperative (like a few other verb forms) distinguishes between intransitive and transitive verbs by means of dedicated suffixes. The dictionary contains the imperative forms for some of the perfective verbs (because the imperative is mostly formed from the perfective stem).

Non-finite verb forms are infinitive (suffix -ij), subjunctive (i.e. so-called ‘agreeing infinitive’), masdar, general converbs, enclitics used for the formation of specialized converbs, conditional and concessive forms and participles. In principle, the infinitive can be formed from imperfective and perfective stems but it is almost exclusively used with perfective stems in the Sanzhi corpus. In this dictionary, the infinitive is used as the basic form for entries of verbs.

Sanzhi has two general (i.e. contextual) converbs, the imperfective converb and the perfective converb. The imperfective converb (-ul(e), -unne) can only be formed from imperfective stems and from stems of which the aspect is not specified. It is used for the formation of a number of tense forms and thus belongs to the most commonly used inflectional forms of verbs. Since its form cannot be predicted from the infinitive, the dictionary contains the imperfective converb form of every imperfective verb and of those verbs that lack the aspectual opposition.
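Taken together, the statements above determine which unpredictable forms a verb entry is expected to list, depending on its aspectual value. The following Python sketch spells this out; the field names and the function `required_verb_fields` are hypothetical labels chosen for illustration and do not correspond to the actual fields of the dictionary database.

```python
def required_verb_fields(aspect):
    """Return the forms a verb entry is expected to provide.

    aspect: "pfv" (perfective), "ipfv" (imperfective) or
    "none" (the verb lacks the aspectual opposition).
    """
    if aspect not in ("pfv", "ipfv", "none"):
        raise ValueError(f"unknown aspect value: {aspect!r}")
    # The infinitive is the basic form of every verb entry,
    # and the aspectual value is always stated.
    fields = {"infinitive", "aspect"}
    if aspect in ("pfv", "none"):
        fields.add("preterite_suffix")       # -ib, -ur, -un or -ub
    if aspect in ("ipfv", "none"):
        fields.add("imperfective_converb")   # -ul(e) or -unne
    return fields


for value in ("pfv", "ipfv", "none"):
    print(value, sorted(required_verb_fields(value)))
```

Verbs without the aspectual opposition are the only ones that need both the preterite suffix and the imperfective converb.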

Copulas

The default copula and its negative counterpart are widely used for the formation of periphrastic tenses and in copula clauses. In addition, there are four existential / locational copulas that have a similar functional range as the copula (i.e. copula clauses with existential / locational semantics and to a certain degree the formation of periphrastic tenses). All affirmative copulas have gender/number agreement suffixes; the negative copula has gender/number agreement prefixes.

Adverbs

Sanzhi has some basic spatial adverbs, but most of them are derived. Several series of deictic spatial adverbs can productively be derived from demonstrative pronouns. All spatial postpositions can also be used adverbially without a dependent NP. Some of them have not only spatial, but also temporal semantics. The spatial and temporal adverbs can be inflected for the directional cases lative (no suffix), essive (gender marker) and ablative (suffix -r).

Basic temporal adverbs denote times of the day, temporal relations, etc. Deictic manner adverbs are also derived from demonstratives and other manner adverbs are formed by means of the suffix -le. The suffix -le also forms perfective converbs.

Postpositions

Sanzhi has spatial and non-spatial postpositions. Some of the spatial postpositions also have temporal readings. The majority of the spatial postpositions are widely used as adverbs and then occur without a dependent noun phrase. Spatial postpositions govern spatial cases and/or the genitive. The non-spatial postpositions govern the comitative or the absolutive. The spatial and temporal postpositions can be inflected for the directional cases lative (no suffix), essive (gender marker) and ablative (suffix -r).

Conjunctions

Sanzhi has only one native conjunction. The main way of conjoining phrases is the use of the additive enclitic, and at the clause level converbs are employed. The dictionary contains 10 conjunctions, which have been borrowed mostly from Arabic and Russian and are attested in the corpus.

Particles

The class of ‘particles’ comprises predicative particles, temporal particles, modal particles, equative particles, and focus-sensitive particles. The vast majority of them occur as enclitics. They fulfil grammatical, semantic and partially pragmatic functions.

The term ‘predicative particles’ has been introduced in reference to a closed class of grammatical elements with functions similar to those of copula-like auxiliaries (e.g. Sumbatova & Mutalov (2003); Kalinina & Sumbatova (2007); Sumbatova & Lander (2014)). They function as heads of nominal predicate clauses and similar clauses that do not contain other verbs, and they are used in analytic verb forms together with non-finite verb forms in order to form full main clauses.

Temporal particles are enclitics used in specialized converbal clauses and for the expression of other adverbial phrases.

Interjections

Pause fillers, politeness particles, address particles, greetings, exclamatives, and interjections have all been labeled ‘interjections’ in order to be able to differentiate them from other particles, which generally have more grammatical functions. In principle, all interjections are also particles, but they show less grammatical connection with other words than the class of ‘particles’. Except for the politeness particles, which are enclitics, they are phonologically independent words. Many of the items in this class, namely address particles, greetings, exclamatives, and interjections, can be used as independent, full utterances.

Preverbs

The class labelled ‘preverbs’ in this dictionary comprises all items which are used in compound verbs and which are not nouns. The class also includes spatial preverbs, although spatial preverbs form a closer unit with the verb, are prefixes, and co-occur with the items labelled ‘preverbs’. All other preverbs occur as the first part in the compound together with a native light verb and are written as separate words in the dictionary. They can be adjectival stems (1c, d) (without the suffix -ce, see Section on parts of speech and affixes), ideophones (2a, b), bound lexical stems (2c), or, very rarely, verbal stems (2d). The category of preverbs contains borrowings, e.g. (2d) contains a verbal stem ‘open’ borrowed from Turkic.

(2)
a. ħaˁħaˁ + b-ikʼʷ-ij (HPL-say.IPFV-INF) ‘laugh’
b. cʼip + či-r-aʁ-ij (SPR-ABL-do.PFV-INF) ‘cut off, chop off’
c. lakʼ ‘throw’ + b-arq’-ij (N-do.PFV-INF) ‘throw’
d. ‘open’ + b-arq’-ij (N-do.PFV-INF) ‘open’


The Sanzhi lexicon is composed of items inherited from Proto-Dargwa and of numerous loans that mostly originate from Turkic, Arabic, Persian, and nowadays from Russian (including all recent international borrowings). Loan words in the dictionary are classified with the semantic domain ‘loans’ (although loan words of course do not form a semantic domain). Since there has been mutual borrowing between Turkic, Arabic, and Persian, it is not easy to establish from which language Sanzhi actually borrowed those words; thus sometimes their ultimate origin is indicated, and mostly no origin at all. A simple Russian-Sanzhi and Sanzhi-Russian dictionary can be found in Forker & Gadzhimuradov (2017). A preliminary version of the electronic dictionary, which is also the base for this dictionary, can be found on the project website.

The dictionary

Methodology, aims and scope

The dictionary has been prepared as part of the language documentation project “Documenting Dargi languages in Daghestan: Shiri and Sanzhi.” The words have been collected in several ways and in several stages:

  • by means of other published dictionaries of Daghestanian languages, in particular Khalilov & Comrie (2010), Sumbatova & Lander (2014), Kibrik & Kodzasov (1988; 1990) and the grammars by Sumbatova & Mutalov (2003) and van den Berg (2001), which have been used to systematically check Sanzhi equivalents of Russian words and Sanzhi equivalents of words from other Daghestanian languages
  • based on all lexemes contained in the glossed Sanzhi corpus

The vast majority of the example sentences have been elicited specifically for the dictionary because sentences from the corpus would usually be too long and/or too complicated to be used as illustrations of the lexemes.

All lexemes have been entered into a Lexique Pro dictionary, which is also available through the project website. This dictionary was originally set up by André Müller, who worked as a student assistant in the project in 2013-2014. He set the basis for the dictionary, entered the first lexemes (several hundred) and made the first recordings. His work was then continued by two more student assistants (Teresa Klemm and Felix Anker), the main language assistant Gadzhimurad Gadzhimuradov and the project leader Diana Forker. Diana Forker and Gadzhimurad Gadzhimuradov gathered the data including the example sentences, did the translations into Russian (G. Gadzhimuradov) and English (D. Forker) and entered the data into a spreadsheet table and Word files. Diana Forker also checked all entries and edited the final version of the dictionary. Gadzhimurad Gadzhimuradov read aloud all lexemes and many example sentences for the audio recordings of the entries. Teresa Klemm and Felix Anker cut the audio files, entered the audio files and new entries into the Lexique Pro dictionary, checked entries and assisted with the preparation of the Lexique Pro dictionary for Dictionaria.

This is the first dictionary of a Nakh-Daghestanian language including not only a Russian translation, but also an English translation and both the orthographic representation with Cyrillic letters and the Latin-based transcription used by specialists of Nakh-Daghestanian languages. It can therefore, in principle, be used both by members of the Sanzhi community and by linguists and other researchers. In fact, one of the aims of collecting the Sanzhi lexicon was to produce a simple dictionary for the speech community, which can be found in Forker & Gadzhimuradov (2017). Because many words are assigned to semantic domains, it is possible to restrict an investigation to specific semantic domains. The dictionary can also serve as the basis for investigating in more detail the morphology and morphophonology of Dargwa varieties.

Conventions for transcription and glossing and orthographic conventions

The dictionary uses a Latin-based broad transcription, which conforms to the conventionalized transcription system employed by the majority of linguists working on Nakh-Daghestanian and West Caucasian languages and which is also used in the Sanzhi reference grammar (Forker In Press) and other work on Sanzhi by Forker. In addition, it also contains an orthographic representation with Cyrillic letters, which is based on the orthography of Standard Dargwa with a few additional conventions. This orthography has been used by my main assistant Gadzhimurad Gadzhimuradov and by myself since 2012 and can be understood by speakers of Sanzhi Dargwa who have some training in Standard Dargwa. The letters given in brackets represent phonemes that occur only in loan words.

Table 7: Sanzhi orthography and transcription

в    w, ʷ     w, ʷ
е    e, je    e, je

In addition to the transcription in Table 7 a few more conventions were adopted by Gadzhimurad Gadzhimuradov and me during our work on the Sanzhi dictionary. One major convention is that words are generally written according to their phonological form, which can differ from their phonetic form. For instance, inflection can result in geminates occurring across morpheme boundaries, but the spelling of words and example sentences will reflect the underlying form of the two separate morphemes, not the actual pronunciation. Thus, a number of verbs contain spatial preverbs ending in a gender marker, which are prefixed to a verb with another gender marker as in či-b-b-aš-ij ‘walk’. The sequence b-b in such verbs is not spelled as a geminate, although it is pronounced as one. Similarly, the verb-initial gender prefix w- is normally omitted when the vowel i follows, but since this omission is optional and predictable, it is not reflected in the spelling.

However, pharyngealized vowels have always been written even in those environments where, in principle, they could be predicted because there appears to be some inter-speaker variation which requires further study. Furthermore, in a number of words Gadzhimurad Gadzhimuradov was not sure about whether the voiceless fricatives s, ʃ, x and χ were geminates (tense) or not, and in those cases we agreed to write them as non-geminate.

The structure and contents of lexical entries

Every entry contains minimally a lexeme written in the Latin-based transcription and in the Cyrillic orthography, an English translation, a Russian translation and information about the part of speech. The Russian translation of verbs does not always reflect the aspectual value of the Sanzhi verb. In other words, from the aspectual value of Russian translations it is not possible to derive the aspectual value of the Sanzhi verb to which the translation belongs. The list of the parts of speech and the abbreviations used for them can be found in Table 4 above. In the above sections it is also explained for each part of speech which form has been chosen as the base form.

Obligatory fields for all lexical entries:

  • headword: contains the head word (lemma)
  • Cyrillic: the representation of the word in the Cyrillic orthography
  • part of speech
  • meaning description: approximate definition in English
  • Russian: approximate definition in Russian

Optional fields:

  • morphological structure: underlying form, given for all morphologically complex words
  • plural: plural form(s), given only for nouns and some adjectives
  • imperfective converb: form of the imperfective converb, given only for imperfective verb stems and verb stems lacking the aspectual specification
  • preterite: form of the preterite tense, given only for perfective verb stems and verb stems lacking the aspectual specification
  • imperative: form of the imperative, given only for some perfective verb stems
  • semantic domain: indication of one or more semantic domains if applicable (see Section on semantic domains)
  • note: any comments about meaning, function, donor language, use, transitivity, aspectual value, etc. Some notes report comments by the main language assistant Gadzhimurad Gadzhimuradov, abbreviated as HM
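The field inventory above can be pictured as a simple record type. The following Python sketch is only an illustration: the class name `LexicalEntry`, the attribute names, the helper `missing_obligatory` and the placeholder values are assumptions made for this example and do not reflect the actual Lexique Pro or dictionaria data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexicalEntry:
    """Hypothetical record mirroring the fields listed above."""

    # Obligatory fields
    headword: str            # lemma in the Latin-based transcription
    cyrillic: str            # representation in the Cyrillic orthography
    part_of_speech: str      # abbreviation for the part of speech
    meaning: str             # approximate definition in English
    russian: str             # approximate definition in Russian

    # Optional fields
    morphological_structure: Optional[str] = None
    plural: List[str] = field(default_factory=list)
    imperfective_converb: Optional[str] = None
    preterite: Optional[str] = None
    imperative: Optional[str] = None
    semantic_domains: List[str] = field(default_factory=list)
    note: Optional[str] = None

    def missing_obligatory(self) -> List[str]:
        """Return the names of obligatory fields that are empty."""
        required = ("headword", "cyrillic", "part_of_speech", "meaning", "russian")
        return [name for name in required if not getattr(self, name).strip()]


# Placeholder values only; these are not real dictionary data.
incomplete = LexicalEntry(
    headword="HEADWORD",
    cyrillic="",
    part_of_speech="POS",
    meaning="English definition",
    russian="Russian definition",
)
print(incomplete.missing_obligatory())
```

A check of this kind makes the difference between the two groups of fields explicit: an entry without a plural or a note is still complete, whereas an entry without a Cyrillic form is not.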

Example phrases or sentences contain four fields. In Sanzhi, the gender of human referents often cannot be inferred from the form of the verbs or other constituents in the clause. By contrast, the Russian past tense forms express gender. The gender used in the Russian translations is largely masculine singular because the sentences were uttered by a male speaker (Gadzhimurad Gadzhimuradov), and many of them have first person subjects. The English translations comply with the Russian translations. However, gender is very often not formally encoded in the Sanzhi sentences, and thus the sentences could also be translated by making use of feminine or plural referents.

Fields of example phrases or sentences:

  • example sentence
  • representation in Cyrillic
  • English translation
  • Russian translation


Table 8: Abbreviations for parts of speech and affixes

COP        copula verbs
DERIV      other derivational suffixes
N.SUF      nominal suffixes
NUM.SUF    suffixes used for the derivation of numerals
V.PREF     verbal prefixes
V.SUF      verbal suffixes

Table 9: Glosses (only in the introductory text and in some notes)

1SG     first person singular
ABL     ablative case
DD      definite description
GM      gender marker
HPL     human plural
ICVB    imperfective converb
IN      spatial case ‘in’
NPL     neuter plural
SPR     spatial case ‘on’

Table 10: General abbreviations

e.g.    for example
etc.    et cetera
HM      Gadzhimurad Gadzhimuradov, the main Sanzhi language assistant and collaborator for the dictionary
lit.    literal translation
i.e.    id est

Associated audio recordings

The audio recordings of all lexical entries and all examples were made together with Gadzhimurad Gadzhimuradov between 2016 and 2018, at his home in Druzhba (Daghestan, Russian Federation) and at the University of Bamberg (Germany).

Semantic domains

The entries of the dictionary have been classified according to the following semantic domains:

  • plant
  • animal
  • body part
  • kinship
  • food
  • clothing
  • tool
  • housing & lifestyle
  • musical instrument
  • religion & mythology
  • profession
  • color
  • time
  • measure
  • numerals
  • toponym
  • demonym (i.e. label for ethnic groups or residents of particular places)
  • name

The list includes a category ‘loan words’, which is obviously not a semantic category, but was added in order to facilitate the identification of loan words. All Russian borrowings have a note that states their origin from Russian. Most other words in this category have simply been categorized as ‘loans’, but some are classified as loans from Arabic, Turkic or Persian.

Since there is some overlap between a few categories (in particular ‘tool’ and ‘housing & lifestyle’, but also between ‘loan word’ and many of the above categories), some words belong to more than one category.
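Because a word can carry several domain labels, looking up all words of one domain amounts to building an index in which the same word may appear under more than one key. The following Python sketch illustrates this under stated assumptions: the function name `index_by_domain`, the pair-based input format and the placeholder headwords are invented for this example and are not part of the dictionary's data format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def index_by_domain(
    entries: Iterable[Tuple[str, List[str]]]
) -> Dict[str, List[str]]:
    """Map each semantic domain to the headwords labelled with it.

    Each input item is a (headword, domains) pair. A headword with
    several domains is listed under every one of them.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for headword, domains in entries:
        for domain in domains:
            index[domain].append(headword)
    return dict(index)


# Placeholder headwords only; these are not real dictionary data.
sample = [
    ("entry_1", ["tool", "housing & lifestyle"]),
    ("entry_2", ["tool", "loan word"]),
    ("entry_3", []),  # no semantic domain assigned
]

by_domain = index_by_domain(sample)
print(by_domain["tool"])
```

In this sketch an entry without any domain label simply does not appear in the index, which corresponds to the fact that the semantic domain field is optional.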


Acknowledgements

This dictionary is the result of several years of collaboration with Gadzhimurad Gadzhimuradov, my main Sanzhi language assistant. Without him its compilation would not have been possible. He patiently collected words, tried to find the best Russian translation and read the lexemes and many examples aloud for the audio recordings. I also want to thank his wife Batʼičaj (Fatimat) and his entire family for being such wonderful and warm-hearted hosts through all the years.

Another big barkalla goes to Isakadi Bakhmudov, a teacher and former school director from Druzhba who supported the work on Sanzhi by helping to collect words, reading through a preliminary version of Forker & Gadzhimuradov (2017), telling stories and providing information about the Sanzhi community.

The dictionary is dedicated to the Sanzhi community and all its members, in particular those people who actively helped me in my work on Sanzhi:

  • Asabali Gadžimuradov
  • Tavlu Džaparov
  • Valijula Gasanaliev
  • Magomed-Salam Kamilov
  • Kanpaj Abdulxalikov
  • Abdulxalik Gusejnov
  • Žavgar Abdulxalikova
  • Xamis Džaparova
  • Žamilat Israpilova
  • Zarema Džabrailova
  • Amatulla Abdulxalikova
  • Zulajxat Kamilova
  • Xurijat Abdulxalikova
  • Ramazan Žaparov
  • Rabazan Džabrailov
  • Muslimat Kurbanova
  • Patimat Baxmudova
  • Xadižat Abdulxalikova
  • Zamir Gadžimuradov
  • Malla Abdulxalikov
  • Bajta Gusejnova
  • Gusejni Rabadanov
  • Sanijat Gadžimuradova
  • Aminat Ašurbekova
  • Ašura Baxmudova
  • Ašura Abdulxalikova
  • Kaj Rabadanov

I am grateful to my colleague and friend Rasul Mutalov, a native speaker of Itsari Dargwa and a linguist, who first had the idea to document Sanzhi Dargwa and collaborated with me in the project “Documenting Dargi languages in Daghestan: Shiri and Sanzhi” (2012-2019). He also helped me to organize the field trips, hosted me in Makhachkala and Moscow, made two trips to the village of Sanzhi possible and answered numerous questions about Dargwa languages.

I also thank André Müller, who was the first student assistant in the Sanzhi documentation project and who set up the dictionary in Lexique Pro and collected the first several hundred entries. He was followed by Teresa Klemm and Felix Anker, who entered the bulk of all other entries and integrated the audio recordings.

The project “Documenting Dargi languages in Daghestan: Shiri and Sanzhi”, and thus the compilation of this dictionary, was financially supported by a grant from the VW Foundation, DoBeS program, to Diana Forker (Grant Number 86 357).

I would also like to thank Iren Hartmann for her help and advice during the process of preparing the dictionary for publication with dictionaria. All errors and deficiencies are my own.


References

Comrie, Bernard & Madzhid Khalilov. 2010. Slovar' yazykov i dialektov narodov Severnogo Kavkaza [Dictionary of languages and dialects of the peoples of the Northern Caucasus]. Leipzig/Makhachkala: Max Planck Institute for Evolutionary Anthropology.

Forker, Diana. 2014. Are there subject anaphors? Linguistic Typology 18, 51–81.

Forker, Diana. 2016. Floating agreement and information structure: The case of Sanzhi. Studies in Language 40, 1–25.

Forker, Diana. 2018. Sanzhi-Russian code switching and the Matrix Language Frame Model. International Journal of Bilingualism.

Forker, Diana. 2019a. Elevational deixis and insubordination in Sanzhi Dargwa. Language Typology and Universals 71, 29–62.

Forker, Diana. 2019b. Grammatical relations in Sanzhi Dargwa. In Alena Witzlack-Makarevich & Balthasar Bickel (eds.), Argument Selectors. A new perspective on grammatical relations, 69–106. Amsterdam: Benjamins.

Forker, Diana. 2019c. Reported speech constructions in Sanzhi Dargwa and their extension to other areas of grammar. In Patrizia Noel Aziz & Barbara Sonnenhauser (eds.), The syntax of pragmatics: Addressing, adding, signaling. Special issue of Sprachwissenschaft.

Forker, Diana. In Press. A grammar of Sanzhi Dargwa. Berlin: Language Science Press.

Forker, Diana. Accepted [a]. More than just a modal particle: The enclitic =q'al in Sanzhi Dargwa. Functions of Language.

Forker, Diana. Accepted [b]. Sketch grammar of Sanzhi Dargwa. Submitted to Yuri Koryakov, Yury Lander & Timur Maisak (eds.), The Caucasian Languages: An International Handbook. Berlin: De Gruyter.

Forker, Diana & Gadzhimurad Gadzhimuradov. 2017. Sanžinskie skazki i rasskazy [Sanzhi fairy tales and stories]. Makhachkala.

Kalinina, Elena & Nina R. Sumbatova. 2007. Clause structure and verbal forms in Nakh-Daghestanian. In Irina Nikolaeva (ed.), Finiteness: Theoretical and empirical foundations, 183–249. Oxford: Oxford University Press.

Kibrik, Aleksandr E. & Sandro V. Kodzasov. 1988. Sopostavitel'noe izučenie dagestanskix jazykov: Glagol [The comparative study of Daghestanian languages: The verb]. Moscow: MGU.

Kibrik, Aleksandr E. & Sandro V. Kodzasov. 1990. Sopostavitel'noe izučenie dagestanskix jazykov: Imja. Fonetika [The comparative study of Daghestanian languages: The noun. Phonology]. Moscow: MGU.

Sumbatova, Nina R. Accepted. The Dargwa languages. Submitted to Yuri Koryakov, Yury Lander & Timur Maisak (eds.), The Caucasian Languages: An International Handbook. Berlin: De Gruyter.

Sumbatova, Nina R. & Rasul O. Mutalov. 2003. A grammar of Icari Dargwa. Munich: Lincom.

Sumbatova, Nina R. & Yury Lander. 2014. Darginskij govor selenija Tanty: Grammatičeskij očerk, voprosy sintaksisa [The Dargwa dialect of Tanti: A grammatical sketch, problems of syntax]. Moscow: Jazyki slavjanskoj kul'tury.

van den Berg, Helma. 2001. Dargi folktales: Oral stories from the Caucasus with an introduction to Dargi grammar. Leiden: Research School of Asian, African and Amerindian Studies.
