Gellish Formal English is a universally applicable language. This makes it possible to define universal databases, provided that a universal syntax or data model can be defined. Such a syntax is indeed possible on the basis of the universal basic semantic patterns.
A Gellish Universal Semantic Database, a Gellish Data Exchange Message or a Gellish Query consists of a collection of Gellish Expressions with a uniform structure, as briefly described in this section.
Every Gellish expression expresses a 'main fact' (a main statement, proposition or query) together with a number of 'contextual facts' that are relevant for the correct interpretation of the main fact. Together, the contextual facts form the “Gellish collection of contextual facts”. This collection is comparable with the 'Dublin Core'. The Gellish collection is intended to be a complete set of contextual facts. Each collection is related to a single main fact and does not imply additional main facts. The Gellish collection of contextual facts is briefly described below.
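The split between one main fact and its contextual facts can be sketched as a small data structure. This is a minimal Python sketch, assuming that the main fact is a triple of UIDs (left-hand object, relation type, right-hand object); the class names, field names, context keys and UID values are all hypothetical and are not taken from the Gellish syntax document.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class MainFact:
    """The main fact as a triple of UIDs (an assumption made for this sketch)."""
    left_uid: int           # hypothetical: the object the fact is about
    relation_type_uid: int  # hypothetical: the kind of relation
    right_uid: int          # hypothetical: the related object


@dataclass
class GellishExpression:
    """One expression: a single main fact plus the contextual facts about it."""
    fact_uid: int  # hypothetical: identifies the main fact
    main: MainFact
    # Contextual facts keyed by a hypothetical column name. They qualify
    # this one main fact and do not introduce any further main fact.
    context: Dict[str, str] = field(default_factory=dict)


# Made-up UIDs, for illustration only.
expr = GellishExpression(
    fact_uid=1,
    main=MainFact(left_uid=100, relation_type_uid=200, right_uid=300),
    context={"language": "English"},
)
print(expr.main, expr.context)
```

Keeping the context inside the expression, rather than in a separate table, mirrors the rule that each collection of contextual facts belongs to exactly one main fact.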
The structure or syntax of Gellish expressions is also universally applicable and requires neither a dedicated data model nor an extensive database design. The Gellish Expression Format consists of a structure that can be implemented in one universal Gellish Expression Table or in a semantically equivalent structure, e.g. in RDF/XML.
The document 'Gellish Syntax and Contextual Facts', a definition of the Gellish Expression Format for Universal Semantic Databases, Data Exchange Messages and Queries, defines the full collection of contextual facts that correspond to the columns in a Gellish Expression Table, including a detailed definition of each contextual fact.
Each Gellish Database, Exchange Message or Query consists of one or more Gellish Expression Tables or equivalent formats. Each of them has basically the same structure, which is standardized and independent of any application system. This differs from conventional databases, which usually have proprietary data structures and many database tables that all differ from one another. Each collection of Gellish Expressions shall contain at least the obligatory contextual facts of one of the subsets of contextual facts that are defined in the Gellish Syntax definition document, as summarized below.
The Gellish Expressions shall comply with the grammar and the dictionary of Gellish Formal English (or a Gellish variant in any other natural language). The standardized format, combined with the Gellish formal language and its management of unique identifiers (UIDs), ensures that the content of various collections of expressions can be combined and used as an integrated collection without the need for data harmonization or conversion. This makes it possible to combine an arbitrary number of collections of Gellish Expressions into one (real or virtual) database. Such a database may be centralized, but it can also be distributed. The consistency of the various collections of expressions can be verified by software. Furthermore, a Gellish query can be executed on each independent collection of expressions, after which the results of the query can be combined and presented together to a user. The collections then act as one distributed database.
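Combining collections and querying them as one distributed database can be sketched as follows. This minimal Python sketch assumes a hypothetical row shape of `(fact_uid, left_uid, relation_type_uid, right_uid)` and assumes that a fact UID denotes the same fact in every collection; the function names and sample data are illustrative only.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row shape: (fact_uid, left_uid, relation_type_uid, right_uid)
Row = Tuple[int, int, int, int]


def merge_collections(collections: Iterable[List[Row]]) -> Dict[int, Row]:
    """Combine collections into one, keyed by fact UID.

    Because UIDs are assumed to be globally unique, the same fact UID must
    denote the same fact everywhere; a mismatch is reported as inconsistent.
    """
    merged: Dict[int, Row] = {}
    for collection in collections:
        for row in collection:
            existing = merged.get(row[0])
            if existing is not None and existing != row:
                raise ValueError(f"inconsistent definitions of fact {row[0]}")
            merged[row[0]] = row
    return merged


def distributed_query(collections: Iterable[List[Row]],
                      predicate: Callable[[Row], bool]) -> List[Row]:
    """Run the same query on each collection separately and combine the results."""
    results: Dict[int, Row] = {}
    for collection in collections:
        for row in collection:
            if predicate(row):
                results[row[0]] = row  # the same fact found twice collapses to one
    return sorted(results.values())


site_a = [(1, 10, 500, 20), (2, 11, 500, 20)]
site_b = [(2, 11, 500, 20), (3, 12, 600, 21)]
print(len(merge_collections([site_a, site_b])))                    # 3
print(distributed_query([site_a, site_b], lambda r: r[2] == 500))  # [(1, 10, 500, 20), (2, 11, 500, 20)]
```

No mapping step between the two sites is needed, because both use the same UIDs; the only check is that a shared fact UID is not defined differently.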
The various Gellish collections of expressions all have basically the same format, such as identical column definitions, apart from the fact that expressions may use a subset of the contextual facts as appropriate. Preferred collections of contextual facts are defined in the standard Gellish subsets of contextual facts.
A Gellish Database with Gellish Expressions may be implemented in various formats. It may be in tabular form, implemented as flat Unicode tables, SQL databases or even XLS spreadsheet tables. The tabular form may also be converted into non-tabular implementation forms, such as RDF/XML triple stores.
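As an illustration of the SQL option, the sketch below creates one universal expression table in an in-memory SQLite database using Python's standard `sqlite3` module. The column set is a hypothetical minimal selection (a UID column paired with a name column for each object), not the column list of the Gellish syntax document, and the UIDs are made up. SQLite's INTEGER type stores signed 64-bit values, which matches a bigint-style UID.

```python
import sqlite3

# Hypothetical minimal column set for one universal expression table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE gellish_expression (
           fact_uid          INTEGER PRIMARY KEY CHECK (fact_uid > 0),
           left_uid          INTEGER NOT NULL CHECK (left_uid > 0),
           left_name         TEXT,
           relation_type_uid INTEGER NOT NULL CHECK (relation_type_uid > 0),
           relation_name     TEXT,
           right_uid         INTEGER NOT NULL CHECK (right_uid > 0),
           right_name        TEXT,
           language          TEXT NOT NULL DEFAULT 'English'
       )"""
)
# Every kind of fact goes into the same table; only the row content differs.
conn.execute(
    "INSERT INTO gellish_expression "
    "(fact_uid, left_uid, left_name, relation_type_uid, relation_name, right_uid, right_name) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, 100, "pump P-101", 200, "is classified as a", 300, "centrifugal pump"),
)
row = conn.execute(
    "SELECT left_name, relation_name, right_name, language FROM gellish_expression"
).fetchone()
print(row)  # ('pump P-101', 'is classified as a', 'centrifugal pump', 'English')
```

On a server database such as SQL Server, the same design would use `bigint` for the UID columns and `nvarchar` for Unicode names; SQLite is used here only to keep the example self-contained.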
Conventional databases typically consist of many tables, each of which is composed of a number of columns. The definition of those tables and columns determines the storage capabilities of the database, whereas the relations between the columns define the kinds of facts that can be stored in such a database. Those columns and relations determine the database structure, which defines the expression capabilities of the database. Similar rules apply to the structure of data exchange files and thus to the information that is exchanged in electronic data files.
This conventional database technology has some major constraints:
Another characteristic of conventional databases is that hardly any international standards are available or used for their content, i.e. the data that is entered by their users. This typically means that local conventions are applied to limit the diversity of the data that may be entered in those databases. As local conventions usually differ from one another, data that are entered in one database cannot be compared or integrated with data in other databases, even if the database structures are the same and even if the application domain of the databases is the same. For example, within a company there may be various implementations of the same system at various sites for the storage of data about equipment, whereas the performance data about a type of equipment at one site still cannot be compared with the performance data about the same type of equipment at another site, because the equipment types have different names and the properties are also different.
The document 'Gellish Syntax and Contextual Facts' defines the full collection of main and contextual facts in each Gellish Expression. Such a collection can be a part of a Gellish Database, a Gellish Message or a Gellish Query. The document also defines a number of standardized subsets for usage in applications that do not require the full number of contextual facts. The definition of the Gellish Expression Format is also included in the book 'Semantic Modeling in Formal English'.
One of those subsets, the Business Model subset, is suitable for nearly all database content and data exchange use cases that describe knowledge and propositions. Its application range includes business communication about both designs (imaginary objects) and real-world objects (observed individual objects) during their lifecycle, as well as about enquiries, answers, orders, confirmations, etc. This subset is a superset (indicated in bold) of the Product Model subset, so it can also be used for knowledge about classes of objects.
The largest subsets comprise more than 30 standard kinds of contextual facts.
A summary of the syntax definition document is given below.
Each collection of Gellish Expressions, typically in the form of a table, should have a header or table definition that defines the facts (the columns of the table), followed by a body of expressions of main facts and their contextual facts, typically one row in the table for each expression.
A Gellish expression collection consists of a predefined number of contextual facts. Thus a table can consist either of a complete set of columns or of a subset of columns. The document defines a number of standard subsets of contextual facts.
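A program can check which standard subsets a table's columns satisfy. The Python sketch below is hypothetical throughout: the real subsets and their obligatory columns are defined in 'Gellish Syntax and Contextual Facts', so the column names here are placeholders, chosen only so that the Business Model subset is a superset of the Product Model subset as described above.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical obligatory columns per subset (placeholders, not the standard).
PRODUCT_MODEL = frozenset({"fact_uid", "left_uid", "relation_type_uid",
                           "right_uid", "language"})
BUSINESS_MODEL = PRODUCT_MODEL | {"intention", "approval_status"}

STANDARD_SUBSETS: Dict[str, FrozenSet[str]] = {
    "product model": PRODUCT_MODEL,
    "business model": BUSINESS_MODEL,
}


def matching_subsets(columns: Iterable[str]) -> List[str]:
    """Return the subsets whose obligatory columns are all present in the table.

    Extra columns are allowed; an empty result means the table does not
    contain the obligatory contextual facts of any standard subset.
    """
    present = set(columns)
    return sorted(name for name, required in STANDARD_SUBSETS.items()
                  if required <= present)


header = ["fact_uid", "left_uid", "relation_type_uid", "right_uid", "language", "remarks"]
print(matching_subsets(header))  # ['product model']
```

Because the sets nest, a table that satisfies the larger subset automatically satisfies the smaller one as well.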
Each contextual fact or column has a column ID and a column name and has a meaning as defined below.
Note that the presence of a value in a column field implies one or more relations with values in other columns. The semantics of these implied relations are specified in the definitions of the table columns. Those relations define the (accessory) facts about the main fact!
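The idea that a filled-in field implies a relation can be shown by expanding one table row into triples, which is also the essence of converting the tabular form into a triple store. This Python sketch assumes hypothetical column names and a hypothetical `has <column>` predicate naming; the real semantics of each column are given in its column definition.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

MAIN_COLUMNS = {"fact_uid", "left_uid", "relation_type_uid", "right_uid"}


def implied_triples(row: Dict[str, str]) -> List[Triple]:
    """Expand one table row into the triples that its filled-in fields imply.

    The main fact is (left, relation type, right); every other non-empty
    field is read as a contextual fact about the main fact, identified by
    its fact UID. Empty fields imply nothing.
    """
    fact = row["fact_uid"]
    triples: List[Triple] = [(row["left_uid"], row["relation_type_uid"], row["right_uid"])]
    for column, value in row.items():
        if column not in MAIN_COLUMNS and value != "":
            triples.append((fact, f"has {column}", value))
    return triples


row = {"fact_uid": "1", "left_uid": "100", "relation_type_uid": "200",
       "right_uid": "300", "language": "English", "remarks": ""}
for t in implied_triples(row):
    print(t)
# ('100', '200', '300')
# ('1', 'has language', 'English')
```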
If a collection is implemented in a table in a spreadsheet or ASCII or Unicode file, then the table starts with a header of three lines, as follows:
A1 = 'Gellish'
A2 = Natural language of the expressions in the table. Default 'English'.
A3 = 'Version:'
A4 = version number of the applicable Gellish dictionary.
A5 = date of the release of the facts in this table (optional).
followed by free text fields.
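Reading these header fields can be sketched as below. This Python sketch rests on an assumption: it treats A1 to A5 as the first five fields of the first header line of a tab-separated file. The function and class names are hypothetical, and the version number in the usage example is illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GellishHeader:
    language: str
    dictionary_version: str
    release_date: Optional[str]
    free_text: List[str]


def parse_first_header_line(line: str) -> GellishHeader:
    """Parse the identifying fields of a tab-separated Gellish table header.

    Assumes fields 1-5 hold 'Gellish', the natural language, 'Version:',
    the dictionary version and an optional release date, then free text.
    """
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (5 - len(fields))  # pad so trailing optional fields may be absent
    if fields[0] != "Gellish":
        raise ValueError("field 1 must be 'Gellish'")
    if fields[2] != "Version:":
        raise ValueError("field 3 must be 'Version:'")
    return GellishHeader(
        language=fields[1] or "English",  # default language when left empty
        dictionary_version=fields[3],
        release_date=fields[4] or None,   # the release date is optional
        free_text=fields[5:],
    )


header = parse_first_header_line("Gellish\t\tVersion:\t8.0\t\tdemo file")
print(header.language, header.dictionary_version, header.release_date)  # English 8.0 None
```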
The lines (rows) in a collection of Gellish expressions are independent of each other and thus the lines may be sorted in any sequence, without loss of semantics (meaning).
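Because rows are independent, two collections can be compared as multisets of rows rather than as ordered lists. A minimal Python sketch, assuming rows are represented as hashable tuples (a hypothetical representation):

```python
from collections import Counter
from typing import Hashable, List


def same_collection(a: List[Hashable], b: List[Hashable]) -> bool:
    """Compare two collections of expression rows, ignoring row order.

    Counter treats each collection as a multiset, so reordering rows never
    changes the result, while a missing or extra row does.
    """
    return Counter(a) == Counter(b)


rows = [(2, 11, 500, 20), (1, 10, 500, 20)]
print(same_collection(rows, sorted(rows)))  # True
```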
Each line in a collection of Gellish expressions (which in a spreadsheet table starts on the fourth line) expresses a group of facts, which consists of a 'main fact' and a number of 'contextual facts' that are defined as follows.
A main fact is expressed by a combination of the following objects (the column IDs are given in parentheses):
Prime contextual facts.
The prime contextual facts are represented by the following table columns, each of which implies an expression by a triple of objects (which are implicitly classified). The table columns are:
Secondary contextual facts.
The secondary contextual facts are represented by the following table columns, each of which implies a triple of classified objects. These contextual facts form the context for the validity of the UIDs and of the names of the objects that are identified by those UIDs:
In a tabular implementation, the columns with UIDs are accompanied by columns with a name for the thing that is represented by the UID.
Field formats and optionality
Several columns contain unique identifiers (UIDs). Each UID should preferably be represented by a 64-bit integer (8-byte, Int64 or bigint), of which only positive values shall be used. It is not recommended to use an unsigned integer (which only allows non-negative values), because SQL only provides the bigint datatype, which is signed.
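The UID rule translates into a simple range check: a UID must be at least 1 and at most the largest signed 64-bit value, since anything above that cannot be stored in a signed bigint column. A minimal Python sketch; the function name is hypothetical.

```python
INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit integer (SQL bigint)


def is_valid_uid(value: object) -> bool:
    """Check that a UID is a positive integer that fits in a signed bigint."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 0 < value <= INT64_MAX


for candidate in (1, 0, -5, INT64_MAX, 2**63):
    print(candidate, is_valid_uid(candidate))
```

The last candidate, 2**63, would fit in an unsigned 64-bit integer but not in a signed bigint, which is exactly the case the recommendation above avoids.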
Most other columns contain character string values. For database implementations it is indicated whether they have a fixed or variable length (nvarchar or varchar) or whether the string is externally stored (data types ntext and text). In addition, it is indicated whether the cells may contain Unicode.
Fields in columns that are indicated as optional may be left empty, in which case the indicated default value is applicable. Otherwise a field value is obligatory.
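The optionality rule can be sketched as a completion step applied to each row before it is stored. In this Python sketch the column specification is hypothetical: the column names are placeholders, and using 'English' as the default of a language column is an assumption borrowed from the header default, not a rule from the syntax document.

```python
from typing import Dict, Optional

# Hypothetical column specification: column name -> default value,
# or None when a value is obligatory.
COLUMN_SPEC: Dict[str, Optional[str]] = {
    "fact_uid": None,
    "left_uid": None,
    "relation_type_uid": None,
    "right_uid": None,
    "language": "English",
    "remarks": "",
}


def complete_row(row: Dict[str, str]) -> Dict[str, str]:
    """Fill empty optional fields with their defaults; reject empty obligatory ones."""
    completed: Dict[str, str] = {}
    for column, default in COLUMN_SPEC.items():
        value = row.get(column, "")
        if value == "":
            if default is None:
                raise ValueError(f"obligatory field '{column}' is empty")
            value = default
        completed[column] = value
    return completed


print(complete_row({"fact_uid": "1", "left_uid": "100",
                    "relation_type_uid": "200", "right_uid": "300"}))
```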
Further details of the column definitions are given in the document 'Gellish Syntax and Contextual Facts'.