October 22, 2024

Taxonomize This! How to Build and Refine a Taxonomy

The development of a knowledge graph typically starts with the design of an ontology, which models the general aspects of the domain of interest - that is, classes and relations. In turn, one of the key tasks when designing an ontology is the construction of a taxonomy. Even though a taxonomy alone will be insufficient for many knowledge modelling endeavours, it still provides the backbone around which a richer ontology can be built. This is not surprising, since every inquiry or knowledge systematisation starts with the classification of the objects of interest: once we have established what we are dealing with, we can study its properties, and eventually apply this framework to dealing with individual cases. In the realm of ontology and semantic technologies, these three stages are reflected in the construction of a taxonomy, the development of a richer ontology, and the eventual generation of a data-loaded knowledge graph.

Being a key step in the design of an ontology, the construction of a taxonomy is crucial for the success of the whole endeavour.

A taxonomy is a classification system, which consists of subclasshood (or subtyping) statements - that is, statements saying that a class is a subclass of another class, as in:

:Mammal rdfs:subClassOf :Animal .
:ElectricCar rdfs:subClassOf :Car .
:Planet rdfs:subClassOf :AstronomicalObject .


That being said, the point of constructing a good taxonomy is not just to draw a network of subclasshood relationships between classes: instead, it is to draw the right network of subclasshood relationships. In this article, we will see a few foundational principles for building a good taxonomy - or improving a pre-existing one. In particular, we are going to see two different types of hierarchical structures that can be used in the construction of a taxonomy, how to comply with those structures in practice, and how their adoption can support the formulation of definitions as well as the expansion of the taxonomy.

Two types of hierarchical structures

A key decision in the development of a taxonomy is the choice between two types of hierarchical structure: mono-hierarchy and poly-hierarchy. In a mono-hierarchical taxonomy, each class has at most one direct superclass, as exemplified by the tree below:

By contrast, in a poly-hierarchical taxonomy classes are allowed to have multiple direct superclasses; for example, in the tree below Class 4 has as a direct superclass Class 2 as well as Class 3:
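The difference between the two structures can be checked mechanically. The sketch below is only an illustration: the class names and the list of (subclass, direct superclass) pairs are hypothetical, and it assumes that only direct subclasshood links are asserted (no redundant links to indirect ancestors).

```python
from collections import defaultdict

# Hypothetical subclasshood statements as (subclass, direct superclass) pairs,
# mirroring the example where Class 4 sits under both Class 2 and Class 3.
SUBCLASS_OF = [
    ("Class2", "Class1"),
    ("Class3", "Class1"),
    ("Class4", "Class2"),
    ("Class4", "Class3"),
]


def direct_superclasses(pairs):
    """Map each class to the set of its asserted direct superclasses."""
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)
    return parents


def multi_parent_classes(pairs):
    """Return the classes that break mono-hierarchy, with their superclasses."""
    return {
        cls: sorted(supers)
        for cls, supers in direct_superclasses(pairs).items()
        if len(supers) > 1
    }


violations = multi_parent_classes(SUBCLASS_OF)
print(violations)  # {'Class4': ['Class2', 'Class3']}
```

An empty result means the asserted links form a mono-hierarchy; any entry points at a class that would need to be re-homed, or at a deliberate choice of poly-hierarchy.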


While neither of these two hierarchical structures is inherently superior, each is better suited for certain applications as opposed to others. As a general rule, mono-hierarchical taxonomies are better suited for organising expert or technical knowledge. By contrast, poly-hierarchy is better suited for constructing navigation taxonomies - that is, taxonomies that are aimed at optimising findability and browsing experience, like the ones that are employed in e-commerce applications.

Expert knowledge taxonomies and mono-hierarchy

The main consideration in favour of a mono-hierarchy is that it makes for a more informative taxonomy. Consider the taxonomic tree below:

Once we know that Class 8 is a subclass of Class 6, we thereby know that it’s also a subclass of Class 3 and Class 1, but not of Class 7 or Class 4. According to semantic models of information, carrying information is about excluding possibilities, which is why ruling out certain subclasshood links makes our taxonomy more informative.

A concrete example will make this virtue more apparent. Consider the car taxonomy below:

Once we know that Mild Hybrid Car is a subclass of Hybrid Car, we thereby know that it’s also a subclass of Electric Car and Car, but not of Fully Electric Car or Petrol-only Car.
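To make this inference concrete, here is a minimal sketch that walks up a mono-hierarchy from a class to the root. The parent map is a partial, assumed reconstruction of the car taxonomy from the class names mentioned in the text (in particular, placing Fully Electric Car directly under Electric Car is an assumption), not the original figure.

```python
# Assumed, partial reconstruction of the car taxonomy: each class maps to its
# single direct superclass, which is what mono-hierarchy guarantees.
PARENT = {
    "ElectricCar": "Car",
    "InternalCombustionEngineCar": "Car",
    "HybridCar": "ElectricCar",
    "FullyElectricCar": "ElectricCar",  # placement assumed for illustration
    "MildHybridCar": "HybridCar",
}


def superclasses(cls, parent):
    """Return all superclasses of cls, from the nearest one up to the root."""
    chain = []
    seen = {cls}
    while cls in parent:
        cls = parent[cls]
        if cls in seen:  # guard against accidental cycles in the input
            raise ValueError(f"cycle detected at {cls}")
        seen.add(cls)
        chain.append(cls)
    return chain


ancestors = superclasses("MildHybridCar", PARENT)
print(ancestors)  # ['HybridCar', 'ElectricCar', 'Car']
```

Because every class has a single parent, the result is one chain rather than a graph, and any class that is absent from that chain (such as FullyElectricCar here) is thereby excluded as a superclass.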

An additional constraint that is typically combined with mono-hierarchy consists in keeping sibling classes disjoint from each other - that is, with no instances in common. Here again, the advantage of keeping sibling classes disjoint is that the resulting classification is more informative. Consider again the car taxonomy above: once we know that an individual car c is an instance of Mild Hybrid Car, we thereby know that c is not an instance of Self-charging Hybrid Car or of Plug-in Hybrid Car - because those three siblings are all disjoint from each other. Combining disjointness with the mono-hierarchical constraint, we can also infer that c is not an instance of Internal Combustion Engine Car: as a Mild Hybrid Car, c is an instance of Electric Car, which is disjoint from its sibling Internal Combustion Engine Car.

The most obvious benefit of increased information richness is for the taxonomy users, but disjointness between siblings can also be enforced within the knowledge graph by using the predefined OWL property owl:disjointWith, which allows one to run inferences and consistency checks. For example, an OWL reasoner will report an inconsistency if the knowledge graph contains a set of triples like the following:

:entity_1 rdf:type :Class_8 .
:entity_1 rdf:type :Class_9 .
:Class_8 owl:disjointWith :Class_9 .
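The kind of check a reasoner performs on these triples can be approximated in a few lines. The sketch below is a naive stand-in rather than an OWL reasoner: it only looks at directly asserted types and does not follow subclass links, and the second entity is a hypothetical addition for contrast.

```python
from itertools import combinations

# Asserted types per entity; entity_1 mirrors the triples above,
# entity_2 is a made-up entity with a single, unproblematic type.
TYPES = {
    "entity_1": {"Class_8", "Class_9"},
    "entity_2": {"Class_8"},
}

# Each disjointness axiom is stored as an unordered pair of classes.
DISJOINT = {frozenset({"Class_8", "Class_9"})}


def disjointness_clashes(types, disjoint):
    """List (entity, class_a, class_b) where an entity has two disjoint types."""
    clashes = []
    for entity, classes in sorted(types.items()):
        for a, b in combinations(sorted(classes), 2):
            if frozenset({a, b}) in disjoint:
                clashes.append((entity, a, b))
    return clashes


clashes = disjointness_clashes(TYPES, DISJOINT)
print(clashes)  # [('entity_1', 'Class_8', 'Class_9')]
```

Storing each axiom as a frozenset reflects the fact that disjointness is symmetric, so the order in which the two classes are declared does not matter.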


The value of classification criteria

The structural constraints that we have seen - mono-hierarchy and disjointness between siblings - might seem difficult to comply with in practice. However, there is a simple way of implementing them: adopting a uniform classification criterion. A classification criterion is a rule that groups entities according to a given dimension. Intuitively, a dimension consists of a series of qualities that are (i) commensurable, and (ii) mutually exclusive. A simple example of a dimension is offered by the range of neutrals that go from white to black, passing through the shades of grey:

The atomic numbers used for classifying chemical elements offer an example of a dimension within a specialised field:

This is the criterion according to which atoms are grouped into types - namely, the chemical elements. Similar guidelines can be applied to any domain; for example, the car taxonomy that we have seen in the previous section classifies cars according to their fuelling type:

To expand or refine this taxonomy, we would simply consider the different fuels and energy sources that can power a car’s engine.

Needless to say, entities vary across multiple dimensions: for example, cars also vary according to size and body type, and atoms also vary according to their mass number. Choosing one dimension as a classification criterion is the key to building a mono-hierarchical taxonomy: since all the points along one dimension are mutually exclusive, every class to which it is applied will be uniquely divided into disjoint subclasses.
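The link between a single dimension and disjoint subclasses can be illustrated with a small sketch. The car records, identifiers, and attribute values below are invented for the example; the only assumption that matters is that each entity has exactly one value per dimension.

```python
from collections import defaultdict

# Hypothetical car records; ids and attribute values are made up.
CARS = [
    {"id": "car_a", "fuelling": "petrol", "body": "hatchback"},
    {"id": "car_b", "fuelling": "battery", "body": "suv"},
    {"id": "car_c", "fuelling": "battery", "body": "hatchback"},
    {"id": "car_d", "fuelling": "hybrid", "body": "suv"},
]


def classify(entities, dimension):
    """Group entity ids by their single value along one dimension."""
    groups = defaultdict(set)
    for entity in entities:
        groups[entity[dimension]].add(entity["id"])
    return dict(groups)


by_fuelling = classify(CARS, "fuelling")
by_body = classify(CARS, "body")
print(sorted(by_fuelling))  # ['battery', 'hybrid', 'petrol']
print(sorted(by_body))      # ['hatchback', 'suv']
```

Within each grouping no car appears twice, so the groups behave like disjoint sibling classes. Mixing the two groupings under one parent would break this: car_c would fall under both battery and hatchback.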

Besides promoting certain structural features, adopting a uniform classification criterion brings a procedural advantage: it provides general guidance for the expansion of the taxonomy. Instead of relying only on their intuitions, the contributors to the taxonomy can follow explicit, agreed-upon guidelines for picking the subclasses of a given class. If applied systematically, a classification criterion can become a robust design pattern for developing and expanding a taxonomy - and thus the ontology that revolves around it.

Defining classes in a mono-hierarchy

Another major virtue of a mono-hierarchical taxonomy is that it facilitates the construction of definitions. More specifically, it facilitates the construction of Aristotelian definitions. As the name suggests, Aristotelian definitions were developed by the ancient Greek philosopher and polymath Aristotle (384-322 BCE), who analysed their logical form and employed them in a variety of classificatory tasks, in both philosophical and empirical inquiries.

Despite their venerable age, Aristotelian definitions are still fruitfully used in various types of research and endeavours, including applied ontology. An Aristotelian definition has the following form:

A terrestrial planet is a planet that is mostly made of silicates, rocks, or metals.

The part of the definition to the right of “is a” consists of two components: a proximate genus and a specific difference; in the example above, "planet" is the genus and "that is mostly made of silicates, rocks, or metals" is the specific difference. For our purposes, the proximate genus is simply the direct superclass of the class to define; the specific difference consists of the properties that differentiate the members of the class from the rest of the superclass. In our example, terrestrial planets are, among all planets, those that are mostly made of silicates, rocks, or metals.
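The genus-plus-difference template lends itself to a simple data structure. The sketch below is one possible encoding, with hypothetical names (AristotelianDefinition, render), and it assumes that the term and genus are non-empty English noun phrases.

```python
from dataclasses import dataclass


def article(noun):
    """Pick 'a' or 'an' with a naive first-letter rule (an approximation)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


@dataclass(frozen=True)
class AristotelianDefinition:
    term: str        # the class being defined
    genus: str       # its direct superclass (proximate genus)
    difference: str  # what sets it apart within the genus

    def render(self):
        """Render the definition as '<A term> is <a genus> <difference>.'"""
        lead = article(self.term).capitalize()
        return (
            f"{lead} {self.term} is {article(self.genus)} "
            f"{self.genus} {self.difference}."
        )


terrestrial = AristotelianDefinition(
    term="terrestrial planet",
    genus="planet",
    difference="that is mostly made of silicates, rocks, or metals",
)
print(terrestrial.render())
```

Keeping the genus as a separate field means it can be compared against the direct superclass recorded in the taxonomy, so that definitions and subclasshood statements do not drift apart.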

It’s easy to see why this template fits a mono-hierarchical structure: a mono-hierarchy guarantees that each subclass has exactly one direct superclass, which selects the proximate genus for the definition of the class in question. For example, in the taxonomy below Class 3 is the only direct superclass of Class 4:

For defining Class 4, we will just select Class 3 as the proximate genus of the definition. By contrast, in a poly-hierarchy the class to define might have multiple direct superclasses; in the taxonomy below, it’s not obvious whether one should select Class 2 or Class 3 for defining Class 4:

If an explicit classification criterion is being used in the development of the taxonomy, then the specific difference for the definition can be picked by simply selecting a value of the dimension of interest; in the example of planets, one will pick the different material compositions that a planet can have.

In conclusion, mono-hierarchy and Aristotelian definitions work in synergy. In one direction, a mono-hierarchical taxonomy provides the architecture for constructing Aristotelian definitions; in the other, an Aristotelian definition locates a class within a mono-hierarchical taxonomy, because it specifies both its superclass and the dimension according to which it is being singled out.

A plurality of mono-hierarchies?

That being said, there will be organisations and endeavours that need to represent a given domain from a variety of perspectives - that is, according to multiple dimensions. In those cases, a mono-hierarchical structure can be preserved by simply constructing multiple mono-hierarchical taxonomies, each with its own classification criterion. For example, an automotive company might be interested in classifying cars according to their fuelling type as well as their body type. Instead of merging together all subclasshood statements, one could organise them into two mono-hierarchical taxonomies, each with its own classification criterion:


Here again, adopting a uniform classification criterion will guide the expansion of each taxonomy, as well as the construction of Aristotelian definitions for its classes.

Navigation taxonomies and poly-hierarchy

Unlike the taxonomy examples that we have seen so far, a navigation taxonomy is not intended to organise expert knowledge, but rather to optimise the browsing experience of the users of certain applications. Poly-hierarchy has the edge here, because the users are likely to browse a domain according to a variety of dimensions - irrespective of how important those dimensions are from the standpoint of the domain experts. For example, prospective tea buyers might be interested in types of tea according to their origin, their function, or their level of oxidation; the taxonomy below encompasses all three dimensions:
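What poly-hierarchy buys in terms of findability can be shown by enumerating the browsing paths that lead to a class. The tea classes below are hypothetical and cover only two of the three dimensions (origin and oxidation); they are not taken from the original figure.

```python
# Hypothetical navigation taxonomy: each class maps to a list of direct
# superclasses, so a class may be reachable along several dimensions.
PARENTS = {
    "Japanese Tea": ["Tea"],  # origin dimension
    "Green Tea": ["Tea"],     # oxidation dimension
    "Sencha": ["Japanese Tea", "Green Tea"],
}


def browsing_paths(cls, parents):
    """Return every root-to-class path, assuming the hierarchy has no cycles."""
    direct = parents.get(cls, [])
    if not direct:
        return [[cls]]  # a class with no superclass is a root
    return [
        path + [cls]
        for parent in direct
        for path in browsing_paths(parent, parents)
    ]


paths = browsing_paths("Sencha", PARENTS)
for path in paths:
    print(" > ".join(path))
# Tea > Japanese Tea > Sencha
# Tea > Green Tea > Sencha
```

In a mono-hierarchy the same function would always return exactly one path per class; here each extra direct superclass adds another route by which a user can reach the same class.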

Overall, the developers of a navigation taxonomy shouldn’t privilege a single dimension, but rather be content with a multi-criteria classification. It’s important to stress the difference in purpose between navigation taxonomies and the type of taxonomies that we have seen in the previous sections: instead of organising the expert knowledge of a certain domain, navigation taxonomies are aimed at optimising its navigability for a certain group of users, who typically have no technical background. This fundamental difference in objectives is one of the main considerations to take into account when dealing with the construction of a taxonomy or a group of taxonomies.

Conclusions

To sum up, choosing the appropriate taxonomic structure together with explicit and systematic classification criteria can promote the development and refinement of a robust ontological architecture, as well as the formulation of well-formed and informative definitions. Once the taxonomy and the associated class definitions are in good shape, they can be expanded into a richer ontology.

