Gellish Syntax and Contextual Facts

Definition of the Gellish Expression Format
for Universal Semantic Databases, Data Exchange Messages and Queries

Gellish Formal English is a universally applicable language. This makes it possible to define universal databases, provided that a universal syntax or data model can also be defined. This is indeed possible on the basis of the universal basic semantic patterns.
A Gellish Universal Semantic Database or Gellish Data Exchange Message or Query consists of a collection of Gellish Expressions with a uniform structure as briefly described in this section.
Every Gellish expression is an expression of a 'main fact' (a main statement, proposition or query) and a number of 'contextual facts' that are relevant for the correct interpretation of the main fact. Together, the contextual facts form the 'Gellish collection of contextual facts'. The collection is comparable with the 'Dublin Core'. The Gellish collection is intended to be suitable as a complete set of contextual facts. Each collection is related to a single main fact and does not imply additional main facts. The Gellish collection of contextual facts is briefly described below.

The structure or syntax of Gellish expressions is also universally applicable and does not require a dedicated data model, nor an extensive database design. The Gellish Expression Format consists of a structure that can be implemented in one universal Gellish Expression Table or a semantically equivalent structure, e.g. in RDF/XML.
The document 'Gellish Syntax and Contextual Facts', a definition of the Gellish Expression Format for Universal Semantic Databases, Data Exchange Messages and Queries, defines the full collection of contextual facts that corresponds to columns in a Gellish Expression Table, including a detailed definition of each contextual fact.

Each Gellish Database, Exchange Message or Query consists of one or more Gellish Expression Tables or equivalent formats. Each of them has basically the same structure, which is standardized and independent of any application system. This differs from conventional databases, which usually have proprietary data structures and many database tables that are all different. Each collection of Gellish Expressions shall contain at least the obligatory contextual facts of one of the subsets of contextual facts that are defined in the Gellish Syntax definition document, as summarized below.
The Gellish Expressions shall be compliant with the grammar and the dictionary of Gellish Formal English (or a Gellish variant in any other natural language). The standardized format, combined with the Gellish formal language and its management of unique identifiers (UIDs), ensures that the content of various collections of expressions can be combined and used as an integrated collection without the need for data harmonization or conversion. This makes it possible to combine an arbitrary number of collections of Gellish Expressions into one (real or virtual) database. Such a database might be centralized, but it can also be distributed. The consistency of the various collections of expressions can be verified by software. Furthermore, a Gellish query can be executed on each independent collection of expressions, and the results of the query can be combined and presented together to a user. The collections then act as one distributed database.
The various Gellish collections of expressions all have basically the same format, such as identical column definitions, except that expressions may use a subset of the contextual facts as appropriate. Preferred collections of contextual facts are defined in standard Gellish subsets of contextual facts.
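
The idea of running one query on several independent collections and merging the results can be sketched as follows. This is only an illustration under assumptions: each row is modelled as a mapping from column ID to UID, using the column IDs for the left hand object (2), relation type (60) and right hand object (15) that are listed later in this document. The function names and all UID values are hypothetical.

```python
from typing import Dict, Iterable, List

# A row maps a column ID to a UID value (an assumption of this sketch).
Row = Dict[int, int]


def query_collection(rows: Iterable[Row], relation_type_uid: int) -> List[Row]:
    """Return the rows of one collection whose relation type UID matches."""
    return [row for row in rows if row.get(60) == relation_type_uid]


def query_all(collections: Iterable[Iterable[Row]], relation_type_uid: int) -> List[Row]:
    """Run the same query on every collection and concatenate the results."""
    results: List[Row] = []
    for rows in collections:
        results.extend(query_collection(rows, relation_type_uid))
    return results


# Hypothetical sample data; the UID values are made up for illustration.
site_a = [{2: 101, 60: 9001, 15: 201}, {2: 102, 60: 9002, 15: 202}]
site_b = [{2: 103, 60: 9001, 15: 203}]

combined = query_all([site_a, site_b], relation_type_uid=9001)
```

Because both collections share the same column definitions, no conversion step is needed before the results are merged.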

A Gellish Database with Gellish Expressions may be implemented in various formats. It may be in tabular form, implemented as flat Unicode tables, SQL databases, or even XLS spreadsheet tables. The tabular form may also be converted into non-tabular implementation forms, such as RDF/XML triple stores.
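
As a rough sketch of the SQL option, the example below stores expressions in a single table using Python's built-in sqlite3 module. The table name, the column names and the UID values are hypothetical, and only a few columns corresponding to the main fact are included; a real implementation would follow the column definitions of the syntax document.

```python
import sqlite3

# In-memory database, used here only for illustration.
conn = sqlite3.connect(":memory:")

# One table for all expressions; the column names are assumptions of this sketch.
conn.execute(
    """
    CREATE TABLE expression (
        fact_uid          INTEGER PRIMARY KEY,
        lh_object_uid     INTEGER NOT NULL,
        relation_type_uid INTEGER NOT NULL,
        rh_object_uid     INTEGER NOT NULL
    )
    """
)

# Hypothetical expressions; the UID values are made up.
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?, ?)",
    [(500001, 101, 9001, 201), (500002, 102, 9002, 202)],
)

# Select all expressions that use one particular relation type.
rows = conn.execute(
    "SELECT lh_object_uid, rh_object_uid FROM expression WHERE relation_type_uid = ?",
    (9001,),
).fetchall()
```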

2. Limitations of conventional databases

Conventional databases typically consist of many tables, each of which is composed of a number of columns. The definition of those tables and columns determines the storage capabilities of the database, whereas the relations between the columns define the kinds of facts that can be stored in such a database. Those columns and relations determine the database structure that defines the expression capabilities of the database. Similar rules apply for the structure of data exchange files and thus for the information that is exchanged in electronic data files.
This conventional database technology has some major constraints:

  • Data that was not covered during the database design, and is therefore not included in the data model, cannot be stored in the database or exchanged via such a data file structure.
  • Different databases have different data structures, so data in one database cannot be integrated with data from other databases, or exchanged between databases, without dedicated data conversion.
  • A database modification or extension requires redesign of the database structure, modification of software and data conversion, which makes it a relatively complicated and costly exercise.

Another characteristic of conventional databases is that hardly any international standards are available or used for the content of the databases, that is, the data entered by their users. This typically means that local conventions are applied to limit the diversity of data that may be entered in those databases. As local conventions usually differ from each other, the disadvantage is that data entered in one database cannot be compared or integrated with data in other databases, even if the database structures are the same and even if the application domain of the databases is the same. For example, a company may have implemented the same system at various sites for the storage of data about equipment, and yet the performance data about the same type of equipment at one location cannot be compared with the performance data at another location, because the equipment types have different names and the properties are also different.

3. Gellish Expression Format Definition

The document 'Gellish Syntax and Contextual Facts' defines the full collection of main and contextual facts in each Gellish Expression. Such a collection can be a part of a Gellish Database, a Gellish Message or a Gellish Query. The document also defines a number of standardized subsets for usage in applications that do not require the full number of contextual facts. The definition of the Gellish Expression Format is also included in the book 'Semantic Modeling in Formal English'.
One of those subsets, the Business Model subset, is suitable for nearly all database contents and data exchange use cases that describe knowledge and propositions. Its application range includes business communication about designs (imaginary objects) as well as real world objects (observed individual objects) during their lifecycle, and about enquiries, answers, orders, confirmations, etc. This table is a superset (indicated in bold) of the product model subset, so it can also be used for knowledge about classes of objects.
The subsets consist of varying numbers of standard kinds of contextual facts, up to more than 30.

A summary of the syntax definition document is given below.


The Gellish Expression collection header definition

Each collection of Gellish Expressions, typically in the form of a table, should have a header or table definition that defines the facts (columns in the table) and a body of expressions of main facts and their contextual facts. Typically there is one row in the table for each expression.
A Gellish expression collection consists of a predefined number of contextual facts. Thus a table can consist either of a complete set of columns or of a subset of columns. The document defines a number of standard subsets of contextual facts.
Each contextual fact or column has a column ID and a column name and has a meaning as defined below.
Note that the presence of a value in a column field implies one or more relations with values in other columns. The semantics of these implied relations are specified in the definitions of the table columns. Those relations define the (accessory) facts about the main fact!

If a collection is implemented in a table in a spreadsheet or ASCII or Unicode file, then the table starts with a header of three lines, as follows:

  • The first line contains a sequence of the fields A1 to A4, optionally followed by A5, which shall contain the following text:

A1 = 'Gellish'
A2 = natural language of the expressions in the table. Default 'English'.
A3 = 'Version:'
A4 = version number of the applicable Gellish dictionary.
A5 = date of the release of the facts in this table (optional).
followed by free text fields.

  • The second line contains the column IDs, which consist of standard, although arbitrarily chosen, numbers. They allow the columns to be presented in a different sequence without loss of meaning (the numbers below correspond to those column IDs).
  • The third line contains human readable text in every column field providing a short name of the column. This name is free text.
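
The three header lines described above could be read as in the following sketch. It assumes a tab-separated text file, which is a choice made for this example only. The names `Header` and `parse_header`, the version number '9.0' and the column names in the sample are hypothetical.

```python
import csv
import io
from typing import Dict, List, NamedTuple, Optional


class Header(NamedTuple):
    language: str                # field A2
    dictionary_version: str      # field A4
    release_date: Optional[str]  # field A5, optional
    columns: Dict[int, str]      # column ID -> free-text column name


def parse_header(lines: List[List[str]]) -> Header:
    """Interpret the first three rows of a table as a Gellish header."""
    first, ids, names = lines[0], lines[1], lines[2]
    if len(first) < 4 or first[0] != "Gellish" or first[2] != "Version:":
        raise ValueError("not a Gellish expression table header")
    language = first[1] or "English"  # A2 defaults to 'English'
    release_date = first[4] if len(first) > 4 and first[4] else None
    # Pair each column ID with its name; cells without an ID are skipped.
    columns = {int(cid): name for cid, name in zip(ids, names) if cid.strip()}
    return Header(language, first[3], release_date, columns)


# Hypothetical tab-separated sample; '9.0' is a made-up version number.
sample = (
    "Gellish\tEnglish\tVersion:\t9.0\n"
    "2\t60\t15\n"
    "Left hand UID\tRelation type UID\tRight hand UID\n"
)
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
header = parse_header(rows)
```

Because the columns are keyed by ID rather than by position, the same code works when a table presents its columns in a different sequence.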

The Gellish expression collection body

The lines (rows) in a collection of Gellish expressions are independent of each other and thus the lines may be sorted in any sequence, without loss of semantics (meaning).

Each line in a collection of Gellish expressions (which in a spreadsheet table starts on the fourth line) expresses a group of facts, which consists of a 'main fact' and a number of 'contextual facts' that are defined as follows.

Main fact.
A main fact is expressed by a combination of the following objects (the column IDs are given in brackets):

  • A UID of a main fact (1)
  • A UID of a left hand object (2)
  • A UID of a relation type (60)
  • A UID of a right hand object (15)
  • A UID of a scale (unit of measure) (66)
  • A UID of an intention (5)
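
A main fact could be represented in code roughly as shown below, using the column IDs listed above to pick the values out of a table row. The class and field names are hypothetical, and treating the scale and the intention as optional is an assumption of this sketch, not a statement of the syntax definition.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Column IDs of the main fact, as listed above; the field names are made up.
MAIN_FACT_COLUMNS = {
    "fact_uid": 1,
    "lh_object_uid": 2,
    "relation_type_uid": 60,
    "rh_object_uid": 15,
    "scale_uid": 66,
    "intention_uid": 5,
}
# Assumption of this sketch: only these two fields may be left empty.
OPTIONAL_FIELDS = {"scale_uid", "intention_uid"}


@dataclass(frozen=True)
class MainFact:
    fact_uid: int
    lh_object_uid: int
    relation_type_uid: int
    rh_object_uid: int
    scale_uid: Optional[int] = None
    intention_uid: Optional[int] = None


def main_fact_from_row(row: Dict[int, str]) -> MainFact:
    """Build a MainFact from a row that maps column IDs to cell text."""
    values = {}
    for field_name, column_id in MAIN_FACT_COLUMNS.items():
        cell = row.get(column_id, "").strip()
        if not cell and field_name not in OPTIONAL_FIELDS:
            raise ValueError(f"column {column_id} is empty")
        values[field_name] = int(cell) if cell else None
    return MainFact(**values)


# Hypothetical row; the UID values are made up for illustration.
fact = main_fact_from_row({1: "500001", 2: "101", 60: "9001", 15: "201"})
```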

Prime contextual facts.

The prime contextual facts are represented by the following table columns, each of which implies an expression by a triple of objects (which are implicitly classified). The table columns are:

  • A UID of a left hand kind of role (72)
  • A UID of a right hand kind of role (74)
  • A pair of left hand object cardinalities (44)
  • A pair of right hand object cardinalities (45)
  • A UID of the accuracy of a quantification (76)
  • A UID of a pick list for the qualification of aspects (70)
  • A UID of the validity context for a fact (19)
  • A partial textual definition of a concept or individual thing (65)
  • A full textual definition of a concept or individual thing (4)
  • A textual description of a main fact (42)
  • Remarks on the expression of a main fact (14)
  • Approval status of the expression of a main fact (8)

Secondary contextual facts.

The secondary contextual facts are represented by the following table columns, each of which implies a triple of classified objects. These contextual facts form the context for the validity of the UIDs and the names for objects that are identified by their UIDs:

  • A reason for latest change of status
  • A UID of the successor of the fact, in cases it has the status 'replaced'
  • UID of creator of fact
  • Date-time of start of validity of the fact
  • Date-time of start of availability of the expression
  • Date-time of creation of copy
  • Date-time of latest change of the expression
  • UID of addressee of the expression
  • References
  • UID of the expression of the fact (Line UID)
  • UID of a collection of facts to which the fact belongs
  • A presentation sequence in which the expressions can be presented

In a tabular implementation, the columns with UIDs are accompanied by columns with a name for the thing that is represented by the UID.

Field formats and optionality

Several columns contain unique identifiers (UIDs). Each UID should preferably be represented by a 64-bit integer (8-byte, Int64 or bigint), of which only positive values shall be used. Using an unsigned integer type (which only allows non-negative values) is not recommended, because the SQL bigint datatype is signed.
Most other columns contain character string values. For database implementations it is indicated whether they have a fixed or variable length (nvarchar or varchar) or whether the string is externally stored (data types ntext and text). In addition to that it is indicated whether the cells may contain Unicode.
Fields in columns that are indicated as optional may be left empty, in which case the indicated default value is applicable. Otherwise a field value is obligatory.
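
The UID range and the handling of empty optional fields described above can be checked with a few lines of code. This is a minimal sketch; the function names are hypothetical and the default values would come from the column definitions in the syntax document.

```python
INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit integer


def is_valid_uid(value: int) -> bool:
    """True if value is a positive integer that fits a signed 64-bit column."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 0 < value <= INT64_MAX


def field_value(cell: str, optional: bool, default: str = "") -> str:
    """Return the cell text, or the default when an optional cell is empty."""
    if cell.strip():
        return cell
    if optional:
        return default
    raise ValueError("obligatory field is empty")
```

For example, `field_value("", optional=True, default="English")` yields the default, whereas an empty obligatory field raises an error.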

Further details of the column definitions are given in the document 'Gellish Syntax and Contextual Facts'.