March 7, 2023

What is Linked Data?

As enterprises accumulate a vast number of disparate, heterogeneous data sources, and as demand grows for innovative, data-fuelled applications that provide intelligent output, the need to unify and interconnect those sources has become ever more pressing.

Linked Data has revolutionised the way data is structured, connected, and organised, creating a web of semantically meaningful, interlinked, shareable, machine-understandable data. Data is expressed in a lingua franca, RDF, which makes potentially heterogeneous data homogeneous. This interlinking turns isolated data silos into a seamless graph of knowledge that computers can easily navigate and make sense of. This adds value to an enterprise's data assets, since every new link increases what the data can be used for.

In this blog post, we discuss how organisations can benefit from adopting a Linked Data approach. We explore its key principles, its benefits, its challenges and pitfalls and how to mitigate them, and finally some potential use cases.

Key Principles of Linked Data

In his “Linked Data” note, Tim Berners-Lee outlined four principles of Linked Data.

  1. Uniform Resource Identifiers (URIs) - The foundation of Linked Data is the principle that every individual (entity) in the data has a name: a globally unique identifier. These URIs can be thought of as similar to web addresses.
  2. HTTP for Data Access - Names should be HTTP URIs, so that the data behind them can be looked up (resolved) using the Hypertext Transfer Protocol. In other words, every URI should be resolvable. However, one must distinguish whether the data is intended for public access: some platforms and organisations use URN (Uniform Resource Name) schemes that are resolvable only internally within an organisation or within the defined platform.
  3. Usage of open standards - In Linked Data, data should be defined using the Resource Description Framework (RDF). RDF is a data model for Linked Data based on triples of subject, predicate, and object, each representing an explicit fact (e.g. John knows Mary). Linked Data can then be queried using SPARQL, which lets users ask questions about the data, simple or complex, and extract valuable insights. Other open standards can also be used within Linked Data. For example, SHACL (Shapes Constraint Language) can be used to describe what the data should look like and to validate it, and on some platforms also to define inference rules. Nowadays, platforms also offer other ways to query, for example GraphQL or APIs, to lower the barriers to adoption further.
  4. Interlink data - Linked Data is most valuable when relationships are defined between different individuals. Interconnected data lets us discover new things and uncover new, inferred information.
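To make the triple model and the query idea concrete, here is a minimal sketch in plain Python with no RDF library. It assumes a hypothetical `http://example.org/people/` namespace; the FOAF terms `foaf:name` and `foaf:knows` are real vocabulary. The `match` function is a toy stand-in for a single SPARQL triple pattern (with `None` playing the role of a variable), not an implementation of SPARQL; in practice you would use an RDF store or a library such as rdflib.

```python
from collections.abc import Iterator

# A triple is (subject, predicate, object). Subjects and predicates are URIs;
# objects are URIs or plain literal strings in this simplified sketch.
Triple = tuple[str, str, str]

EX = "http://example.org/people/"    # hypothetical namespace for this sketch
FOAF = "http://xmlns.com/foaf/0.1/"  # the FOAF vocabulary namespace

graph: set[Triple] = {
    (EX + "john", FOAF + "name", "John"),
    (EX + "mary", FOAF + "name", "Mary"),
    (EX + "john", FOAF + "knows", EX + "mary"),  # the fact "John knows Mary"
}


def match(g: set[Triple], s=None, p=None, o=None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts like an unbound variable."""
    for t in g:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def names_known_by(g: set[Triple], person: str) -> list[str]:
    """Roughly: SELECT ?name WHERE { <person> foaf:knows ?x . ?x foaf:name ?name }"""
    names = []
    # Follow the foaf:knows link (principle 4) from one URI to another.
    for _, _, friend in match(g, s=person, p=FOAF + "knows"):
        for _, _, name in match(g, s=friend, p=FOAF + "name"):
            names.append(name)
    return sorted(names)


print(names_known_by(graph, EX + "john"))  # ['Mary']
```

The answer comes from joining two facts through a shared URI, which is exactly what interlinking buys you.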

Linked Open Data

One aspirational vision of Sir Tim Berners-Lee was to promote Linked Open Data. The idea was to make knowledge-rich data available for public and machine consumption, to boost knowledge discovery and enable better data-driven analytics. Furthermore, governments and public bodies can use Linked Open Data to publish open data in a fair, transparent manner, fostering innovation and accountability.

To determine whether a dataset can be deemed Linked Open Data, a 5-star scheme was defined:

  • 1 star – data is openly available in some format, not necessarily machine-readable (e.g. PDF);
  • 2 stars – data is openly available in a structured format (e.g. XLS);
  • 3 stars – data is openly available in a non-proprietary structured format (e.g. CSV);
  • 4 stars – data is openly available, defined with an open standard such as RDF, and uses URIs;
  • 5 stars – everything required for 4 stars, plus links to other open sources.
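Because each level of the 5-star scheme builds on the one before it, the rating can be read as a cumulative checklist. The sketch below encodes that reading in Python; the `Dataset` fields are hypothetical yes/no descriptors invented for illustration, not part of any official tooling.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical descriptors, invented for this sketch only.
    openly_available: bool
    structured: bool              # e.g. XLS rather than a scanned PDF
    non_proprietary: bool         # e.g. CSV rather than XLS
    uses_rdf_and_uris: bool
    links_to_other_sources: bool


def stars(d: Dataset) -> int:
    """Count stars; the scheme is cumulative, so stop at the first unmet criterion."""
    criteria = [
        d.openly_available,
        d.structured,
        d.non_proprietary,
        d.uses_rdf_and_uris,
        d.links_to_other_sources,
    ]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count


print(stars(Dataset(True, True, True, False, False)))  # 3
```

For example, an openly published CSV file with no RDF or URIs scores 3, while an RDF dataset that links to other sources but is not openly available scores 0, because the first criterion already fails.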

Benefits of Linked Data

Linked Data offers a wide range of benefits for enterprises, making it a valuable paradigm for data sharing and transfer between different downstream systems:

  • Data Interoperability – Since all data is modelled using a common open standard, RDF, Linked Data enables heterogeneous data to interoperate seamlessly. Linked Data breaks down disparity and nurtures compatibility.
  • Data Integration – The whole point of Linked Data is to foster the integration and interlinking of heterogeneous data.
  • Discoverability – Data assets that were previously available only within distinct sources become available and highly discoverable. Assigning unique URIs makes data easy to discover, which is crucial for knowledge sharing.
  • Flexibility – Unlike traditional relational databases with fixed schemas, Linked Data is very flexible: new knowledge can be added to the existing graph with minimal to no disruption.
  • Semantics – Data turns into knowledge. Linked Data graphs are typically backed by one or more schemas or ontologies, meaning that each piece of raw data carries meaning that machines can interpret. This semantic meaning enables business applications such as semantic search and innovative AI algorithms.
  • Transparency – Public bodies use Linked Data to publish open data, making public information available for public and machine consumption and fostering transparency, innovation, and accountability.
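To illustrate the interoperability and integration points, here is a minimal Python sketch. It assumes two hypothetical sources (a CRM export and an ERP export) that have already been mapped to triples and that identify the same customer with the same URI under a hypothetical `http://example.org/` namespace. Once the identifiers agree, merging the sources is just a set union, and a single query can follow links across both.

```python
EX = "http://example.org/"  # hypothetical shared namespace

# Source A: a CRM export, already mapped to triples (hypothetical data).
crm = {
    (EX + "customer/42", EX + "name", "Acme Ltd"),
    (EX + "customer/42", EX + "country", EX + "country/GB"),
}

# Source B: an ERP export that refers to the same customer URI.
erp = {
    (EX + "order/7", EX + "placedBy", EX + "customer/42"),
    (EX + "order/7", EX + "total", "1200"),
}

# Because both sources use the same URI for the customer, integration is a set union.
merged = crm | erp


def objects(g, s, p):
    """All objects o such that (s, p, o) is in graph g."""
    return {o for (s2, p2, o) in g if s2 == s and p2 == p}


def customer_names_for_order(g, order):
    """Follow order -> placedBy -> customer -> name, a path that spans both sources."""
    names = set()
    for customer in objects(g, order, EX + "placedBy"):
        names |= objects(g, customer, EX + "name")
    return names


print(customer_names_for_order(merged, EX + "order/7"))  # {'Acme Ltd'}
print(customer_names_for_order(erp, EX + "order/7"))     # set()
```

The ERP data alone cannot answer the question; the merged graph can. When sources use different identifiers for the same thing, an extra linking step (for example `owl:sameAs` statements or a mapping table) is needed before the union becomes useful.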

Challenges and Pitfalls

As with all technological paradigms, there are challenges and concerns that need to be addressed appropriately. Linked Data platforms offer different solutions to these challenges; however, some of them must be addressed at an organisational level.

  • Data Scale – Linked Data graphs can grow rapidly as more and more sources are added, which means that managing and querying these large datasets can become resource-intensive. This can be mitigated by partitioning graphs strategically, for example by subject area.
  • Data Governance – Like all data assets, Linked Data graphs need to be governed by the appropriate subject matter experts. As with scale, large datasets can become too complex to govern, making it harder to ensure quality, security, privacy, and compliance. One of the biggest concerns is that linking various datasets might expose sensitive data, including personally identifiable information (PII). It is therefore imperative to track data lineage, for example by recording provenance information using standard ontologies such as PROV-O.
  • Data Inconsistencies – Converting your data to Linked Data is not an overnight job, and it does not automatically guarantee high-quality, consistent data. When data is converted, any inconsistencies in the original sources are propagated unless they are cleaned beforehand or the mapper can resolve them automatically. Resolving semantic differences and maintaining data consistency can be challenging; however, good underlying models and ontologies help.
  • Technical Boundaries – On the face of it, working with Linked Data requires expertise in data modelling and related technologies such as RDF and SPARQL. Vendors and open-source tools are striving to lower the barriers to entry, and with interfaces such as GraphQL, Linked Data is becoming more accessible and mainstream.
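The scale and governance points can be sketched together. Below, quads (a triple plus a named-graph URI) partition data by subject area, and provenance is recorded in a separate metadata graph using the PROV-O property `prov:wasDerivedFrom`. This is a plain-Python sketch under assumptions: the `http://example.org/` URIs and the graph layout are hypothetical, and a real deployment would use a quad store and SPARQL rather than these helper functions.

```python
PROV = "http://www.w3.org/ns/prov#"  # the PROV-O namespace
EX = "http://example.org/"           # hypothetical namespace for this sketch

# Quads: (subject, predicate, object, named_graph). Named graphs partition data by subject area.
quads = {
    (EX + "customer/42", EX + "name", "Acme Ltd", EX + "graph/customers"),
    (EX + "product/9", EX + "label", "Bolt M6", EX + "graph/products"),
    # Provenance lives in a separate metadata graph and uses PROV-O terms.
    (EX + "graph/customers", PROV + "wasDerivedFrom", EX + "source/crm-export", EX + "graph/meta"),
    (EX + "source/crm-export", PROV + "wasDerivedFrom", EX + "source/crm-db", EX + "graph/meta"),
}


def graph_subset(qs, g):
    """Return the triples of one named graph, so a query can target a single subject area."""
    return {(s, p, o) for (s, p, o, gg) in qs if gg == g}


def lineage(qs, entity):
    """Collect every source the entity was (transitively) derived from via prov:wasDerivedFrom."""
    meta = graph_subset(qs, EX + "graph/meta")
    ancestors, stack = set(), [entity]
    while stack:
        current = stack.pop()
        for (s, p, o) in meta:
            # The 'not in ancestors' check also protects against cycles.
            if s == current and p == PROV + "wasDerivedFrom" and o not in ancestors:
                ancestors.add(o)
                stack.append(o)
    return ancestors


print(sorted(lineage(quads, EX + "graph/customers")))
```

Keeping provenance in its own graph means lineage questions (for example, "which upstream system does this customer data come from?") can be answered without touching, or exposing, the business data itself.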

Use Cases and Applications

Linked Data is used in several industry verticals and in domain-independent applications. In this section we list a few of them, although Linked Data is by no means limited to these.

Linked Data in Industry

  1. Pharma/Healthcare – Linked Data is used for various use cases, such as drug discovery, drug-drug interaction analysis, and clinical research.
  2. Banking/Finance – In this vertical we have seen Linked Data used to define data standards that support business applications, or to build metadata catalogs of data assets.
  3. Industrial – Some industrial companies use Linked Data to support their supply chain, from procuring the smallest bolt to delivering the complete product to the end user.
  4. Culture – Museums and libraries use Linked Data to link artifacts in order to (a) provide a better understanding of those artifacts and uncover new knowledge, and (b) give consumers a better overall experience.
  5. Home Furnishing – One major player in this industry built a common graph of interior design knowledge, which powers several business applications serving its consumers.

Applications

  1. Taxonomies – Using standards such as SKOS, hierarchical taxonomies can be defined with a semantic meaning.
  2. Business Glossaries – Enterprise-wide glossaries can be defined in Linked Data to serve as a standard across different business applications. Furthermore, these glossaries can be used to refine and enrich the links between conceptual and logical models.
  3. Data Catalogs – Linked Data is well suited to building a catalog of an enterprise's data assets. With a metadata model underlying the data, an organisation gets an overall view of its data assets, their metadata, and the relationships between them. Furthermore, these data assets can be linked to structured taxonomies and business glossaries to give further meaning to the organisation's vast data.
  4. AI and Machine Learning – The semantic, graph-based structure of Linked Data makes it well suited to enhancing and augmenting AI algorithms, either during training or at execution time through inference.
  5. Semantic Search – Linked Data enables complex, context-aware searches that return highly relevant results to the consumer.
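Taxonomies and semantic search fit together naturally. The sketch below uses the real SKOS properties `skos:prefLabel` and `skos:broader` with a small hypothetical furniture taxonomy under `http://example.org/taxonomy/`, and expands a search for one concept to the labels of every concept beneath it. It is a plain-Python illustration under those assumptions, not how any particular search platform works.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"  # the SKOS namespace
EX = "http://example.org/taxonomy/"            # hypothetical taxonomy namespace

taxonomy = {
    (EX + "furniture", SKOS + "prefLabel", "Furniture"),
    (EX + "seating", SKOS + "prefLabel", "Seating"),
    (EX + "sofa", SKOS + "prefLabel", "Sofa"),
    (EX + "armchair", SKOS + "prefLabel", "Armchair"),
    (EX + "seating", SKOS + "broader", EX + "furniture"),
    (EX + "sofa", SKOS + "broader", EX + "seating"),
    (EX + "armchair", SKOS + "broader", EX + "seating"),
}


def narrower_closure(g, concept):
    """All concepts below `concept`, following skos:broader links in reverse.

    skos:broader is not declared transitive in SKOS, so the closure is computed here.
    """
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        for (s, p, o) in g:
            if p == SKOS + "broader" and o == current and s not in found:
                found.add(s)
                stack.append(s)
    return found


def expand_query(g, concept):
    """Labels a semantic search could also match when a user searches for `concept`."""
    concepts = {concept} | narrower_closure(g, concept)
    return sorted(o for (s, p, o) in g if s in concepts and p == SKOS + "prefLabel")


print(expand_query(taxonomy, EX + "seating"))  # ['Armchair', 'Seating', 'Sofa']
```

A search for "Seating" can therefore also surface sofas and armchairs, even though neither label contains the word the user typed.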

Final Remarks

Linked Data is fundamental to this data-driven world: it allows enterprises to weave their various heterogeneous data sources into one standard, semantically interoperable format. The benefits are clear; however, the pitfalls must always be taken into consideration. Linked Data is not a cure-all for every enterprise problem, but it is a start. Embracing Linked Data is a step towards new horizons of innovation, with enormous potential. At Semantic Partners, we are happy to join you on this journey, helping you every step of the way to make your organisation's data more interoperable and to pave the way for more innovative products for your consumers.
