Automated translation

1. Language independent UIDs

Each thing or object is represented throughout the Gellish language by its own unique identifier (UID), which is a natural language independent string of characters. Each UID is related to at least one language dependent term (name) or phrase, so that humans can interpret it in the applicable natural language and language community. So any object has only one UID, but it may have many ‘names’: the terms, codes, synonyms or phrases by which it is denoted in different languages and language communities. For example, UID 40153 is a language independent representation of a particular concept that is denoted in English by the term road and in Dutch by the term weg. Individual objects, kinds of relations and ideas likewise have their own unique UIDs and their language dependent names or phrases.
Each Gellish expression is normally a sequence of such UIDs, terms and a phrase. The expressions therefore have a natural language independent part (the UIDs) and a natural language dependent part (the ‘names’). In addition, Gellish defines how the elements in the expression (the columns in the Gellish expression tables) relate to each other. The definition of those columns and the relations between them form the language independent syntax of the Gellish language.
The following examples of expressions in a Gellish English expression table illustrate that.

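Gellish | Language= | English

Language      | UID of LH object | Name of LH object | UID of idea | UID of kind of relation | Name of kind of relation | UID of RH object | Name of RH object
--------------|------------------|-------------------|-------------|-------------------------|--------------------------|------------------|------------------
English       | 40153            | road              | 201         | 2069                    | can have as aspect a     | 550464           | width
international | 102              | B-23              | 202         | 1225                    | is classified as a       | 40153            | road
Dutch         | 40153            | weg               | 203         | 4691                    | is a translation of      | 40153            | road
Dutch         | 550464           | breedte           | 204         | 4691                    | is a translation of      | 550464           | width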

Table 1: Example expressions in Gellish English

The top row of the table specifies that the content is a Gellish expression table and that the language used is English. The first column indicates, on each line, the language of the name of the left hand (LH) object. The first two rows in the table express some ideas in English, whereas the last two rows express assertions that are typically content of a multi-language Gellish dictionary. Wherever the kind of relation with UID 4691 <is a translation of> is used, the language of the left hand name and the language of the right hand name are different.
The first idea (201) expresses the general knowledge that a road can have as aspect a width. This idea is true, independent of the language in which that idea is expressed. That idea (201) is therefore expressed in a language independent way by the combination of the UIDs: (40153, 2069, 550464)

All three UIDs in this example are selected from the Gellish English dictionary, although their combination in idea 201 is new.
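
As a minimal sketch of this separation between UIDs and names, the Python fragment below stores idea 201 as a triple of UIDs and looks up names only when the idea is rendered. The `NAMES` mapping and the `render` function are illustrative assumptions and not part of any Gellish software; the UIDs and English terms are those of Table 1.

```python
# Hypothetical name dictionary: (UID, language) -> term or phrase.
# The UIDs and English terms are taken from Table 1.
NAMES = {
    (40153, "English"): "road",
    (2069, "English"): "can have as aspect a",
    (550464, "English"): "width",
}

# Idea 201 as a language independent triple:
# (UID of LH object, UID of kind of relation, UID of RH object).
IDEA_201 = (40153, 2069, 550464)


def render(idea, language):
    """Return the expression of an idea in the given language."""
    return " ".join(NAMES[(uid, language)] for uid in idea)


print(render(IDEA_201, "English"))  # road can have as aspect a width
```

Adding entries for another language to `NAMES` would be enough to render the same triple in that language; the triple itself does not change.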

The second idea (202) in Table 1 is an expression that specifies the name of a particular individual road and it indicates that the name of the road (B-23) is given in an international language. In other words, the name of the road is language independent.

  • Note: The Gellish dictionary contains a number of concepts that have ‘names’ that are language independent, such as currencies and units of measure. Examples of international ‘names’ for numbers are 1, 2, 3, etc., whereas for decimals such as 3.5 Gellish uses the dot as international separator, although a number of languages such as Dutch and German use a comma (‘,’) as separator. Examples of international ‘names’ of units of measure are mm, m, km, bar, psi, deg C, deg F, etc.

2. Automated translation

The four lines of Table 1 are sufficient for a piece of software to present the same facts in Dutch (assuming that translations of the names of the kinds of relations are also available in the dictionary). This is possible because the UIDs of the concepts are language independent and ideas 203 and 204 on the third and fourth lines give the Dutch translations of the names of the concepts used. This means, for example, that users of Gellish enabled software can ask questions in one language and specify that the response should be in a different language. Another example is that software can create Table 2 from an interpretation of the content of Table 1.

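Gellish | Taal= | Nederlands

Language       | UID of LH object | Name of LH object | UID of idea | UID of kind of relation | Name of kind of relation   | UID of RH object | Name of RH object
---------------|------------------|-------------------|-------------|-------------------------|----------------------------|------------------|------------------
Nederlands     | 40153            | weg               | 201         | 2069                    | kan als aspect hebben een  | 550464           | breedte
Internationaal | 102              | B-23              | 202         | 1225                    | is geclassificeerd als een | 40153            | weg
Nederlands     | 40153            | road              | 203         | 4691                    | is een vertaling van       | 40153            | weg
Nederlands     | 550464           | width             | 204         | 4691                    | is een vertaling van       | 550464           | breedte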

Table 2: Automated translation of Table 1 into Gellish Nederlands

This also illustrates that there is no need for expressing the same ideas in various languages. Gellish enabled software only needs a Gellish dictionary of the other language in order to be able to present the ideas in that other language as well.
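
The translation step can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of any particular Gellish tool: the `Row` type, the `translate` function and the `DUTCH_RELATIONS` mapping are hypothetical names, and the Dutch phrases for the kinds of relations (taken from Table 2) are assumed to be supplied by a Dutch Gellish dictionary. The sketch translates only ideas 201 and 202 and leaves the language column aside.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One line of a Gellish expression table (columns as in Table 1)."""
    language: str
    lh_uid: int
    lh_name: str
    idea_uid: int
    rel_uid: int
    rel_name: str
    rh_uid: int
    rh_name: str


TRANSLATION_OF = 4691  # UID of <is a translation of>

TABLE_1 = [
    Row("English", 40153, "road", 201, 2069, "can have as aspect a", 550464, "width"),
    Row("international", 102, "B-23", 202, 1225, "is classified as a", 40153, "road"),
    Row("Dutch", 40153, "weg", 203, 4691, "is a translation of", 40153, "road"),
    Row("Dutch", 550464, "breedte", 204, 4691, "is a translation of", 550464, "width"),
]

# Assumption: these phrases come from a Dutch dictionary (see Table 2).
DUTCH_RELATIONS = {
    2069: "kan als aspect hebben een",
    1225: "is geclassificeerd als een",
}


def translate(rows, relation_names, target="Dutch"):
    """Re-express the non-dictionary rows using names in the target language."""
    # Collect target-language names from the <is a translation of> rows.
    names = {
        r.lh_uid: r.lh_name
        for r in rows
        if r.rel_uid == TRANSLATION_OF and r.language == target
    }
    translated = []
    for r in rows:
        if r.rel_uid == TRANSLATION_OF:
            continue  # dictionary rows are input, not facts to translate here
        translated.append((
            r.idea_uid,
            names.get(r.lh_uid, r.lh_name),  # international names stay as-is
            relation_names[r.rel_uid],
            names.get(r.rh_uid, r.rh_name),
        ))
    return translated


for line in translate(TABLE_1, DUTCH_RELATIONS):
    print(line)
```

Because the lookup is keyed on UIDs, the name B-23 (UID 102) has no Dutch entry and is carried over unchanged, while road and width are replaced by weg and breedte.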

3. Multi-language support

Comparison of the content of Table 1 and Table 2 illustrates that they represent the same ideas in two different languages. So the ideas are the same, but the expressions are different. The ideas with UID 201, 202, 203 and 204 are ideas that are true, independent of any language in which the ideas are expressed.

The last two ideas (203 and 204) in Table 1 are expressions in English about the translation of the term road and the term width. They represent ideas in an English – Dutch dictionary. Therefore, the name of the language in the first column (‘Dutch’) is given in English (!), because the name of that language in Dutch would have been ‘Nederlands’. The equivalent ideas 203 and 204 in Table 2 however are expressions in Dutch about the same ideas as they would appear in a Nederlands – Engels woordenboek (dictionary).

Table 1 and Table 2 illustrate that the UID of an idea remains the same, even if the language of the expression changes. This means that a database that includes expressions in multiple languages may contain multiple lines with the same idea UID. Such lines are distinguished by their line UIDs, which differ for each line. Note that, to save space, the line UIDs are not shown in Table 1 and Table 2.
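
The relation between line UIDs and idea UIDs can be sketched as below. The line UIDs 9001 and 9002 are invented for illustration, since the tables do not show the real ones, and the storage layout and the `expressions_by_idea` function are likewise assumptions.

```python
from collections import defaultdict

# Hypothetical line UIDs (9001, 9002) mapped to
# (idea UID, language, expression). Both lines express idea 201.
LINES = {
    9001: (201, "English", "road can have as aspect a width"),
    9002: (201, "Nederlands", "weg kan als aspect hebben een breedte"),
}


def expressions_by_idea(lines):
    """Group lines by idea UID, then by language."""
    grouped = defaultdict(dict)
    for line_uid, (idea_uid, language, text) in lines.items():
        grouped[idea_uid][language] = (line_uid, text)
    return dict(grouped)


by_idea = expressions_by_idea(LINES)
print(by_idea[201]["Nederlands"])
```

Each line keeps its own UID, while the shared idea UID makes it possible to find every expression of the same idea.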

4. ASCII and Unicode

The ASCII character set is not sufficient to represent the characters of many languages, nor the names of some units of measure. Gellish is therefore expressed in Unicode by default, but may also use ASCII. The character set used is indicated by a separate parameter.
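
As a small sketch of how software might decide which character set a collection of names requires, the fragment below uses Python's built-in `str.isascii()`. The function name `required_character_set` and its return values are assumptions; the form of the actual Gellish parameter is not specified here. The unit name µm is an example chosen for this sketch.

```python
def required_character_set(names):
    """Return "ASCII" if every name is pure ASCII, otherwise "Unicode"."""
    return "ASCII" if all(name.isascii() for name in names) else "Unicode"


print(required_character_set(["road", "weg", "mm", "deg C"]))  # ASCII
print(required_character_set(["road", "µm"]))                  # Unicode
```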
