Monday, March 30, 2015

Georgian Coursebook Transcriptions and Translations

Below are three excellent coursebooks for Georgian language study that I have transcribed and translated to the best of my ability.

Georgian Language and Culture

Howard L. Aronson and Dodona Kiziria

This is a transcription and translation of all the dialogues in the Dialogues section of Georgian Language and Culture. This includes all 110 pages of dialogue content, an original English translation, as well as a brief copy of the glossary of idioms found at the end of the dialogues.

Download | Buy

Georgian: A Reading Grammar (web version)

Howard L. Aronson

This is a full transcription of Georgian: A Reading Grammar for web viewing. Original content includes full English translations of all the reading passages for lessons 5 through 15.

View | Buy

Einführung in die Georgische Sprache

Introduction to the Georgian Language

Kita Chenkeli (კიტა ჩხენკელი)

This was originally a German publication, published in 1958 for German speakers learning Georgian. These documents include transcriptions and full English translations of the lesson exercises from Volume 2.

Download

Tuesday, June 3, 2014

Georgian - English Dictionary Application

The following dictionary application was written in C# and requires .NET 2.0 or higher to be installed on your system. It is based on the content of the massive two-volume A Comprehensive Georgian - English Dictionary, Copyright 2006 © Shukia Apridonidze, Laurence Broers, Ariane Chanturia, Levan Chkhaidze, Tina Margalitadze, and Donald Rayfield.  This edition is currently out of print, but a print version of the dictionary can still be purchased here or through various other booksellers on the internet.  The dictionary is also available in electronic format, and it is this electronic content that the application contains.




The entries in the PDF were arranged in two columns per page, in the typical dictionary style. I first had to split the columns into two separate files using briss, then shuffle them together with PDF Split and Merge. Since the dictionary is so large, I split the merged PDF into four separate files. I then converted the PDF documents to Word documents using 7-PDF2Word, formatted the content in Word 2010, and finally saved them in rich text format (to more easily integrate the content into the application's RichTextBox control).  After this, the rest of the work involved writing the code for converting the dictionary entries to searchable objects, and for searching and displaying the results through the application interface.

The program contains many useful features, including fast searching, searching while you type, searching within definitions, regular-expression search, highlighting of search queries, exporting search results to a file, retaining search history, and more.  Font face and size can be modified in the preferences, and search history can be exported to a file.  Minimizing or closing to the system tray is also supported. The icons used in the program were taken from the free FatCow icon set.
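As a rough illustration of what "searchable objects" with headword, definition, and regular-expression search might look like, here is a minimal sketch. It is written in Python rather than the C# of the actual application, and it is not the application's code: the Entry class, the search function, its parameters, and the three sample entries are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dictionary entry (hypothetical structure, not the app's own)."""
    headword: str
    definition: str


def search(entries, query, in_definitions=False, use_regex=False):
    """Return the entries whose headword (or definition) matches the query.

    A plain query is escaped so it is matched literally; with
    use_regex=True the query is compiled as a regular expression.
    """
    pattern = re.compile(query if use_regex else re.escape(query), re.IGNORECASE)
    results = []
    for entry in entries:
        text = entry.definition if in_definitions else entry.headword
        if pattern.search(text):
            results.append(entry)
    return results


# Hypothetical sample data for illustration only.
ENTRIES = [
    Entry("და", "and; sister"),
    Entry("არ", "not"),
    Entry("რომ", "that, which"),
]
```

Calling search(ENTRIES, "that", in_definitions=True) looks inside the definitions instead of the headwords, which is the distinction the feature list above draws between ordinary searching and searching within definitions.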

Wednesday, May 21, 2014

Georgian Word and Phrase Frequency Lists

This first post will be used to promote the following word frequency collections I have created for the Georgian language.  Frequency lists for other foreign languages exist on the web (here is another collection), but the ones for Georgian are scarce, very limited in size, or don't necessarily reflect the colloquial or practical usage one might expect to find on news portals, message boards, or blogs.  I have thus created several lists generated from the content of such sites, with the intention not only of aiding my own study of Georgian but also of providing a resource for anyone else interested in this language and how it is used in the media and by Georgian speakers communicating on the internet.

Frequency Lists

forum.ge (Download)
This list was created from six months' worth of posts. 43 million words were analyzed.

Be warned, the list contains several words and phrases that may be regarded as non-standard or grammatically incorrect, in addition to words that are considered vulgar and/or offensive by native Georgian speakers. Many non-Georgian (predominantly Russian) words are also present in the list. As such, I don't advise you to refer to this as a standard vocabulary for studying Georgian, but as a rough portrait of modern, colloquial Georgian as it is used on the internet.

Despite the above caveat, the list may be useful as a modern reference when studying from a formal Georgian language course like Georgian: A Reading Grammar (Aronson, Howard) or Einführung in die Georgische Sprache (ჩხენკელი, კიტა). In particular, special focus may be directed towards those words and verb forms which appear in both the list and the course vocabularies.
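One way to find the words that appear in both a frequency list and a course vocabulary is sketched below in Python. This is only a sketch under assumptions: it supposes each line of a list is an entry followed by whitespace and its count (the readme.txt in each archive describes the actual format), and the function names and the small course vocabulary are made up for illustration.

```python
def parse_frequency_lines(lines):
    """Parse 'entry<whitespace>count' lines into (entry, count) pairs.

    Splitting from the right keeps multi-word phrases intact.
    The line format is an assumption; check the archive's readme.txt.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry, count = line.rsplit(None, 1)
        pairs.append((entry, int(count)))
    return pairs


def shared_vocabulary(frequency_pairs, course_words):
    """Keep only the entries that also occur in the course vocabulary,
    preserving the most-to-least-common order of the frequency list."""
    course = set(course_words)
    return [(entry, count) for entry, count in frequency_pairs if entry in course]


# Three lines in the assumed format, plus a hypothetical course vocabulary.
SAMPLE_LINES = ["და    1883889", "არ    957538", "უნდა    314222"]
COURSE_WORDS = ["და", "უნდა", "წიგნი"]

shared = shared_vocabulary(parse_frequency_lines(SAMPLE_LINES), COURSE_WORDS)
```

Because the frequency lists contain inflected forms rather than dictionary forms, an exact-match comparison like this one will only catch course words that are listed in the same form.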

intermedia.ge, civil.ge, mediamall.ge (Download)
These lists were created from three separate media sites, each focusing on a different variety of topics.


Each archive above contains not only single-word frequencies, but also two- and three-word combination frequencies.  The lists are very simple: the words and phrases are arranged from most to least common, and each entry is tagged with its count, that is, the number of occurrences in the source material analyzed.  The following example lists the top 10 words in the forum.ge archive:

და    1883889
არ    957538
რომ    604393
რა    473123
თუ    402558
უნდა    314222
ეს    308874
მე    239708
ამ    228322
მაგრამ    207494
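For anyone curious how lists of this shape can be produced, the following Python sketch counts single words and two- and three-word combinations and orders them from most to least common. It is not the code used to build the archives; it assumes simple whitespace tokenization (real text would also need punctuation handling), a tab between entry and count, and the function names and the sample text are invented for the example.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive words in whitespace-tokenized text."""
    words = text.split()
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams)


def format_list(counts):
    """Return 'entry<TAB>count' lines, ordered from most to least common."""
    return [f"{entry}\t{count}" for entry, count in counts.most_common()]


# A meaningless sample string, used only to exercise the functions.
SAMPLE_TEXT = "და არ და არ რომ"

single_words = ngram_counts(SAMPLE_TEXT, 1)
two_word = ngram_counts(SAMPLE_TEXT, 2)
three_word = ngram_counts(SAMPLE_TEXT, 3)
```

Running format_list on each of the three counters gives one list per combination length, mirroring the separate single-word, two-word, and three-word lists in each archive.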

და ('and' or 'sister') is the most common, followed by არ ('not') and რომ ('that' or 'which').  In all of the lists I've found, და has always appeared at the top.  Interestingly, the most common word occurs nearly twice as often as the second most common word (1,883,889 versus 957,538 occurrences).  The same phenomenon is present in the list for Hungarian: 'a' ('the') is twice as common as 'nem' ('not').  Both Georgian and Hungarian are agglutinating, the former possessing far more irregularities in several parts of speech.

Information on the Georgian Language

Georgian is primarily spoken in the country of Georgia, located in the Caucasus between Russia and Turkey, by about 4.2 million people.  The language uses its own writing system, and its grammar is characterized by a strikingly complex verb system.  Verbs can contain many components: in addition to the root, there are morphemes that indicate not only the subject and tense but also the direct object, indirect object, and aspect.  Because of this, a phrase in English such as "I had him send it" can be expressed in only one or two words in Georgian (გავაგზავნინე).

Due to the morphological properties of Georgian, the above lists contain counts for words exactly as they appear in the source material analyzed, not necessarily for the dictionary (or "root") form of a word.  As a result, a word like "this" will appear several times, in different declensions of both its demonstrative adjectival and nominal forms.  Included in each archive is a readme.txt file with further details on the source material and the format of each list.