Semantic consistency verification

Databases and data exchange messages should be verified for the semantic consistency of their content. Semantic consistency verification checks whether collections of expressions contain redundant statements or statements that conflict with other statements. The verification possibilities and processes for the content of conventional databases, such as relational (entity-attribute-relationship) data model instances, differ from those for databases whose content is expressed in Gellish, which is based on a defined taxonomy of concepts. This article discusses the verification of several semantic consistency rules and constraints in Gellish. Several consistency rules are implemented in the Gellish Communicator reference application software on GitHub.

Prerequisite: known typology (mandatory classifications and generalizations)

In Gellish, kinds of relations specify the kinds of things that are allowed to be related by such a relation. Related things shall therefore be of kinds that comply with those specifications. In other words, if something plays a role in a relation of a particular kind, then it shall be of a kind that is an allowed kind of role player or a subtype of such a kind of role player. Semantic verification therefore has two prerequisites: individual things shall be classified and kinds of things shall be related to their direct supertypes. In other words, it requires classification of all individual things and generalization of all kinds of things.

Classification of individual things

The first prerequisite is that individual things shall be of kinds that are allowed by the relations in which they are involved. This implies that the correct use of relations of particular kinds can only be verified when the kind or kinds of every individual thing are known. In other words:
– Every individual thing (including also individual aspects, properties and activities and individual relations) shall be classified at least once by a kind.
In semantic databases and data exchange messages in Gellish, the requirement for classification of individual things implies that explicit classification relations are required, for example a statement such as: P1 <is classified as a> pump. Such explicit classification relations allow the classifying kinds to be selected from the kinds in the dictionary of the language, including any of their many subtypes. Thus classification is not restricted to instantiation in available attribute types. Furthermore, multiple explicit classification relations for the same individual thing are possible and simple in Gellish, and all things can freely appear in many relations of various kinds, without the constraint that the related things shall be classified by the same kind. This means that classification in Gellish is more flexible than classification in conventional databases and conventional messages.
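
The classification requirement can be checked mechanically. The following is a minimal sketch in Python, not the Gellish Communicator implementation: it assumes that statements are held as simple (left object, relation phrase, right object) tuples, and the names M1, Plant A and the function unclassified_individuals are hypothetical.

```python
# Hypothetical in-memory statements: (left object, relation phrase, right object).
statements = [
    ("P1", "is classified as a", "pump"),
    ("P1", "is located in", "Plant A"),
    ("Plant A", "is classified as a", "plant"),
    ("M1", "is a part of", "P1"),
]

CLASSIFICATION = "is classified as a"


def unclassified_individuals(statements, individuals):
    """Return the individuals that lack an explicit classification relation."""
    classified = {left for left, phrase, _ in statements if phrase == CLASSIFICATION}
    return sorted(set(individuals) - classified)


individuals = ["P1", "Plant A", "M1"]
print(unclassified_individuals(statements, individuals))  # ['M1']
```

Here M1 is reported because it appears in a relation without being classified by any kind.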

Generalization of kinds of things

Possibilities, requirements and constraints are typically expressed as relations between kinds of things. When those kinds of things are arranged in a subtype-supertype hierarchy (a taxonomy), then the possibilities, requirements and constraints that are expressed for particular kinds are by definition inherited by all the subtypes of those kinds. When an individual thing is classified by one of those subtypes, all the possibilities, requirements and constraints that are specified for its supertype kinds are inherited by the classifying subtype kind and are consequently applicable to the classified individual thing.
Thus, to enable verification of whether individual things satisfy the possibilities, requirements and constraints, such a taxonomy must be known. The second prerequisite for this verification is therefore that generalization relations are mandatory for all kinds of things (the classifiers of the individual things). In other words:
– Every kind (concept), including also every kind of relation, shall be defined by a specialization relation as a subtype of one or more supertype concepts (more generalized concepts).
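
As an illustration, the generalization requirement and the resulting inheritance path can be sketched as follows. This is a minimal sketch under stated assumptions: the taxonomy is a plain dictionary from a kind to its direct supertypes, the top concept is assumed to be called "anything", and the function names are hypothetical.

```python
# Hypothetical taxonomy fragment: kind -> list of direct supertypes.
# The kind names and the top concept "anything" are assumptions for illustration.
specializations = {
    "pump": ["physical object"],
    "centrifugal pump": ["pump"],
    "physical object": ["anything"],
    "city": [],  # missing specialization relation
}
TOP = "anything"


def kinds_without_supertype(specializations, top=TOP):
    """Return kinds (other than the top concept) that have no supertype."""
    return sorted(k for k, sups in specializations.items() if k != top and not sups)


def all_supertypes(kind, specializations):
    """Collect the direct and indirect supertypes of a kind."""
    found, todo = set(), list(specializations.get(kind, []))
    while todo:
        sup = todo.pop()
        if sup not in found:
            found.add(sup)
            todo.extend(specializations.get(sup, []))
    return found


print(kinds_without_supertype(specializations))  # ['city']
print(sorted(all_supertypes("centrifugal pump", specializations)))
# ['anything', 'physical object', 'pump']
```

Anything specified for one of the collected supertypes is inherited by the kind itself, which is why a kind without supertypes cannot be verified.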

Thus specialization-generalization relations between concepts, denoted by the phrase ‘is a kind of’, enable possibilities, requirements, constraints and definitions to be inherited by subtypes from their supertype concepts, whereas the classification relations determine the individual things to which those possibilities, requirements and constraints are applicable. These are the reasons why classification relations and specialization-generalization relations are prerequisites for semantic verification.
Note 1: Gellish can be applied without these mandatory relations, but then many semantic verifications will be impossible.
Note 2: Conventional data models usually do not include a taxonomy (subtype-supertype hierarchy) of kinds. In that case no taxonomy is available for determining whether specified knowledge, requirements and constraints are applicable to individual things.

For example, an individual thing that is denoted as Paris is used in an expression in Gellish, such as ‘the Eiffel tower <is located in> Paris’. Thus both the Eiffel tower and Paris are things that will be incorporated in the Gellish language. However, the validity of their involvement in relations and the applicability of possibilities, requirements and constraints can only be verified when those two objects are defined by classification relations, such as by the expression ‘Paris <is classified as a> city’. The validity of that classification relation can only be verified when the concept city is a valid classifier. This can be determined from its being defined by a specialization relation as a subtype of the more generic concept (kind) that is by definition allowed to be a classifier in a classification relation. Thus the concept city should be an element in the subtype-supertype hierarchy (the taxonomic dictionary). Some concepts in that hierarchy will appear in definitions of other kinds of relations as being allowed as players of roles of various kinds in such relations. For example, physical objects are specified to be allowed to play a role as ‘locator’ in <is located in> relations. This allowed role player role is inherited via the taxonomy, e.g. by the concept ‘city’. The classification of Paris as a city thus specifies that statements such as ‘the Eiffel tower <is located in> Paris’ are valid statements.
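
The Paris example can be traced in code. The sketch below assumes a small illustrative taxonomy fragment and assumes that both roles of <is located in> require a physical object; the kinds tower, colour and aspect, the individual Red and all function names are hypothetical and are not taken from the Gellish dictionary.

```python
SUPERTYPES = {  # kind -> direct supertypes (illustrative fragment)
    "city": ["physical object"],
    "tower": ["physical object"],
    "physical object": ["anything"],
    "colour": ["aspect"],
    "aspect": ["anything"],
}
CLASSIFIERS = {"Paris": ["city"], "the Eiffel tower": ["tower"], "Red": ["colour"]}
# Assumed allowed kinds of role players per relation phrase: (left, right).
ALLOWED = {"is located in": ("physical object", "physical object")}


def is_subtype_or_same(kind, ancestor):
    """True when kind equals ancestor or inherits from it via the taxonomy."""
    if kind == ancestor:
        return True
    return any(is_subtype_or_same(s, ancestor) for s in SUPERTYPES.get(kind, []))


def is_valid(left, phrase, right):
    """Check that both related things are classified by allowed kinds."""
    left_kind, right_kind = ALLOWED[phrase]
    left_ok = any(is_subtype_or_same(k, left_kind) for k in CLASSIFIERS.get(left, []))
    right_ok = any(is_subtype_or_same(k, right_kind) for k in CLASSIFIERS.get(right, []))
    return left_ok and right_ok


print(is_valid("the Eiffel tower", "is located in", "Paris"))   # True
print(is_valid("the Eiffel tower", "is located in", "Red"))     # False
print(is_valid("the Eiffel tower", "is located in", "London"))  # False: unclassified
```

The last call fails only because London has no classification relation in this fragment, which shows why classification is a prerequisite for verification.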

To enable the maintenance of a semantically consistent database, the following rule is applicable: If all classifications of an individual thing or all specialization relations of a kind are deleted or terminated (historicized), then that thing ceases to exist; in other words, its being represented and known in the language is terminated. Such a termination implies that all binary relations in which the thing appears shall be deleted or terminated (historicized) as well. This may cause a chain of deletions and terminations.
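
The chain of terminations can be sketched as a worklist algorithm. This is a minimal sketch, assuming statements are (left, phrase, right) tuples and that relations are simply deleted rather than historicized; the function terminate and the sample data are hypothetical.

```python
CLASSIFICATION = "is classified as a"
SPECIALIZATION = "is a kind of"
DEFINING = {CLASSIFICATION, SPECIALIZATION}


def terminate(thing, statements):
    """Remove a thing and, transitively, everything that loses its last definition."""
    remaining = list(statements)
    todo = [thing]
    terminated = []
    while todo:
        current = todo.pop()
        if current in terminated:
            continue
        terminated.append(current)
        # Things that are classified by, or are a subtype of, the terminated thing.
        dependents = {l for l, p, r in remaining if p in DEFINING and r == current}
        # Drop every binary relation in which the terminated thing appears.
        remaining = [s for s in remaining if current not in (s[0], s[2])]
        for d in dependents:
            # A dependent survives only if another defining relation is left.
            if not any(l == d and p in DEFINING for l, p, r in remaining):
                todo.append(d)
    return terminated, remaining


statements = [
    ("pump", "is a kind of", "physical object"),
    ("centrifugal pump", "is a kind of", "pump"),
    ("P1", "is classified as a", "centrifugal pump"),
    ("P1", "is located in", "Plant A"),
    ("Plant A", "is classified as a", "plant"),
]
gone, kept = terminate("pump", statements)
print(gone)  # ['pump', 'centrifugal pump', 'P1']
print(kept)  # [('Plant A', 'is classified as a', 'plant')]
```

A thing that has a second classification or a second supertype is not terminated, because it still has a defining relation after the cascade.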

Redundancy – unnecessary duplicates

In Gellish every thing shall have its own unique identifier (UID). Thus one thing may not have multiple UIDs, although every thing may have multiple names (synonyms). (Note that, e.g., in RDF and OWL things may have multiple IRIs in different namespaces; thus IRIs differ from UIDs.) Furthermore, in Gellish the same name may be used for different things (homonyms), provided that the identical names are specified as having their base in different language communities (naming contexts) and those different things have different UIDs.

This means that duplicate things are defined as things that are the same but that have different UIDs.
Such duplicates can thus be detected when different UIDs are denoted by the same name in the same language community (although they may turn out not to be real duplicates when the identical names are caused by a naming error). Although it is difficult for software to prove that things are duplicates (e.g. in the case of twin babies), software can conclude that it is likely that different UIDs represent the same thing in the following situations:

When different UIDs are denoted by the same name in different language community contexts, i.e. they seem to be homonyms, they nevertheless are likely duplicates when they are individual things that are classified by the same kind (or by a near subtype of the other kind) or when they are kinds of things that are defined as being subtypes of (nearly) the same supertype. When they have the same name, but one is an individual thing and the other is a kind of thing, then they are somewhat suspect. To reduce the chance of such confusion it is recommended to denote individual things by names that start with a capital and to denote kinds of things by names that start with a lowercase character.

When different UIDs are related to the same other role player while playing roles of the same kind in relations of the same kind, they are also likely duplicates. This likelihood becomes stronger when cardinality constraints do not allow for that number of relations with the same role player. Note that the UIDs are not duplicates when they play roles of different kinds in a relation between each other (especially when another relation explicitly states that they are distinct things; e.g. when A and B are both a daughter of C, whereas a relation states that A is a first born twin of B), or when a time difference between occurrences in which they are involved excludes that they are the same thing.
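
The name-based heuristics above can be sketched as a simple pairwise scan. This is an illustration only: the record layout, the UIDs, the community names and the function likely_duplicates are invented, and the sketch compares classifying kinds for equality only, ignoring near subtypes and the role-based heuristic.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical naming records: (uid, name, language community, classifying kind).
names = [
    (101, "P1", "project A", "pump"),
    (102, "P1", "project A", "pump"),      # same name, same community
    (103, "P1", "project B", "pump"),      # apparent homonym, same kind
    (104, "P1", "project C", "document"),  # homonym of a different kind
]


def likely_duplicates(names):
    """Return (uid, uid, reason) tuples for pairs that deserve review."""
    by_name = defaultdict(list)
    for record in names:
        by_name[record[1]].append(record)
    findings = []
    for records in by_name.values():
        for (uid_a, _, com_a, kind_a), (uid_b, _, com_b, kind_b) in combinations(records, 2):
            if uid_a == uid_b:
                continue  # one thing with several names is not a duplicate
            if com_a == com_b:
                findings.append((uid_a, uid_b, "same name in same community"))
            elif kind_a == kind_b:
                findings.append((uid_a, uid_b, "homonym with same kind"))
    return findings


for finding in likely_duplicates(names):
    print(finding)
# (101, 102, 'same name in same community')
# (101, 103, 'homonym with same kind')
# (102, 103, 'homonym with same kind')
```

The findings are candidates for human review, not proven duplicates, in line with the caveats stated above.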

Duplicate binary relations are relations that relate the same two things (UIDs) by a relation of the same kind (or by a subtype of the other kind of relation) and whose validity periods overlap, even if the names of the related things are different (synonyms).
For a duplicate higher order relation the same applies as for duplicate things. In addition, higher order relations are likely duplicates when each of them has involvement relations with the same involved objects, so that the same things are involved in two similar occurrences (activities) at the same time. However, this is not proof, because occasionally the same object may be involved in two similar activities at the same time.
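
The rule for duplicate binary relations can be sketched as follows. This is a minimal sketch under stated assumptions: relations are tuples with a validity period, periods are treated as half-open intervals, an open end is represented by None, and subtypes of the kind of relation are not considered. The record layout and function names are hypothetical.

```python
from datetime import date

# Hypothetical relation records:
# (left UID, kind of relation, right UID, valid from, valid until); None = still valid.
relations = [
    (1, "is located in", 2, date(2020, 1, 1), date(2021, 1, 1)),
    (1, "is located in", 2, date(2020, 6, 1), None),
    (1, "is located in", 2, date(2021, 1, 1), date(2021, 3, 1)),
    (1, "is located in", 3, date(2020, 1, 1), None),
]


def overlaps(a, b):
    """True when two half-open validity periods [start, end) share some time."""
    a_start, a_end = a
    b_start, b_end = b
    a_end = a_end or date.max
    b_end = b_end or date.max
    return a_start < b_end and b_start < a_end


def duplicate_relations(relations):
    """Return index pairs with equal UIDs, equal kind and overlapping validity."""
    pairs = []
    for i, (l1, k1, r1, s1, e1) in enumerate(relations):
        for j in range(i + 1, len(relations)):
            l2, k2, r2, s2, e2 = relations[j]
            if (l1, k1, r1) == (l2, k2, r2) and overlaps((s1, e1), (s2, e2)):
                pairs.append((i, j))
    return pairs


print(duplicate_relations(relations))  # [(0, 1), (1, 2)]
```

Relations 0 and 2 are not reported, because the second period starts exactly when the first one ends.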

Note that in particular contexts databases might allow for the recording of different expressions of the same statements. For example, a database may record multiple identical opinions about the same topic expressed by different persons, possibly in the same language or in different languages. In those cases the things (UIDs) still may not be duplicated, because the expressions are about the same things. However, the expressions (utterances) should be distinguished from the statements (relations plus intention). In exceptional cases databases may allow for the recording that the same person expresses the same opinion several times. These are still not duplicate expressions, because they are expressed at different moments in time.

Consistent typology

Individual things can be classified explicitly and kinds can have explicit specialization relations, whereas the fact that they are used in relations of particular kinds implies that they are (subtypes of) allowed kinds of role players for such relations. These various kinds should be consistent. This requirement results in the following rules:

Consistent typology (1): The Gellish taxonomic dictionary, being a language-defining ontology, defines kinds of relations by the kinds of roles that are played in such relations and by specifying which kinds of things are allowed to play such roles. This implies that it also specifies which kinds of things are not allowed to play such a role. Thus, an individual thing (which shall always be classified) may not play a role in a relation when its kind conflicts with the kind that is required for the role player. The kind of the individual thing shall therefore be identical to the allowed kind of role player for the kind of relation, or belong to the hierarchy of subtypes of that allowed kind.

Consistent typology (2): An individual thing may be classified multiple times by different kinds, and a kind may be a subtype of various supertype kinds. In both cases the kinds are subtypes of a nearest common ancestor kind. That nearest common ancestor kind shall be allowed as a role player in the relations of the kind in which that thing or kind is involved. The typology is inconsistent when one of the kinds is not allowed in a relation of a kind in which the thing is involved.
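
A check for multiply classified things can be sketched as follows. This sketch follows one reading of the rule, namely that the typology is inconsistent as soon as one classifying kind is not a subtype of (or identical to) the allowed kind of role player. The taxonomy fragment and the function names are hypothetical.

```python
SUPERTYPES = {  # kind -> direct supertypes (illustrative fragment)
    "pump": ["physical object"],
    "asset": ["physical object"],
    "physical object": ["anything"],
    "activity": ["occurrence"],
    "occurrence": ["anything"],
}


def ancestors_or_self(kind):
    """Return the kind together with all of its direct and indirect supertypes."""
    result, todo = set(), [kind]
    while todo:
        k = todo.pop()
        if k not in result:
            result.add(k)
            todo.extend(SUPERTYPES.get(k, []))
    return result


def conflicting_kinds(kinds, allowed_kind):
    """Return the classifying kinds that are not (subtypes of) the allowed kind."""
    return sorted(k for k in kinds if allowed_kind not in ancestors_or_self(k))


# A thing classified as both a pump and an asset may play a physical object role.
print(conflicting_kinds(["pump", "asset"], "physical object"))     # []
# A thing classified as both a pump and an activity is inconsistent.
print(conflicting_kinds(["pump", "activity"], "physical object"))  # ['activity']
```

An empty result means that every classification is compatible with the role that the thing plays.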

Constraints and requirements verification

See also the related topic of modeling and verification of constraints and requirements.

Gellish.net