Building a Mini Datamesh with Python

Data Lab

Data mesh architecture is becoming increasingly relevant for organisations. The approach has gained popularity in recent years, and consulting firms now advocate it for the advantages it offers in data management.

In this article, I explain the concepts behind a data mesh architecture and why it is growing in popularity, then illustrate them with Python, one of the most popular languages in the data industry.

Data Mesh: A Definition and Overview

Before we start coding, let's have a closer look at what defines a data mesh approach.

The data mesh approach was introduced by Zhamak Dehghani, a computer scientist who is currently the founder and CEO of Nextdata. She specialises in transforming complex pipelines and bolt-on tools into a standardised, scalable network of autonomous data products, a concept explained in the sections that follow.

She was motivated by the realisation that the infrastructure deployed at the time was no longer suitable for managing the complexity of data ingested within organisational systems. This was partly due to the ever-increasing volume of data.

In response, she developed a decentralised system in which ownership of data is distributed across multiple domains. This promotes information sharing, accessibility and sourcing, and can be scaled up effectively. Ultimately, this allows for federated data governance.

The Three Pillars of Data Mesh

The first is Data as a Product, the main feature of a Data Mesh architecture. A product is a solution developed around the end user to achieve a specific goal or outcome. It is iterative, meaning that it consistently evolves according to changing user needs.

The second is Data Ownership. Traditionally, organisations have centralised all data collected within their systems under a single governance team that is responsible for it. However, this approach often leads to bottlenecks. In a data mesh architecture, responsibility for data products is distributed across the domains. This means that each domain is accountable for managing its own data, including access and sharing.

The last principle is called Data as Self-Service. It aims to reduce the friction involved in discovering, accessing, and using data within the organisation. Regardless of their role (technical or not) within a domain team, individuals should face minimal barriers when accessing analytical data. The organisation's leaders remain in control by providing all domains with self-service tools for analysing data. This eases the sharing and access of data between domains, and encourages collaboration.

Adopting a data mesh of course requires a cultural shift, which is essential for a successful transition from a traditional architecture. This applies especially to data products, which require people to follow agile principles such as transparency and listening.

Building a Compact Data Mesh for Agile Teams

Now that you have grasped the fundamentals of the Data Mesh concept, here is a short, practical example to help you consolidate your learning. Non-technical readers may wish to skip this section and move on to the next chapter.

My example is based on DC superheroes, with each data product containing information about the character's secret identity and ownership.

In addition to Python, I used FastAPI, a web framework, and Uvicorn, the ASGI server that runs it. Together they expose a web-based interface for the Data Mesh, enabling self-service access to the data. Inside the project folder, I created a subfolder called 'domains', containing two products: 'batman.py' and 'superman.py'.

The project also includes two additional files: 'registry.py', which lists and records all the data products and their URLs; and 'main.py', which serves as the entry point for requesting and interacting with the data products.
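To make the roles of the domain files and the registry more concrete, here is a minimal sketch in plain Python. It is not the code from the repository: the field names, the owner labels and the helper functions list_products and get_product are assumptions made for illustration, and the web layer is left out entirely so that only the registry logic is shown.

```python
# Illustrative sketch only, not the repository's code.
# Field names, owner labels and URL paths are assumptions.

# What a domain file such as domains/batman.py might define:
# a data product owned and maintained by its domain team.
BATMAN = {
    "name": "batman",
    "secret_identity": "Bruce Wayne",
    "owner": "gotham-domain-team",  # hypothetical owner label
}

# What domains/superman.py might define.
SUPERMAN = {
    "name": "superman",
    "secret_identity": "Clark Kent",
    "owner": "metropolis-domain-team",  # hypothetical owner label
}

# What registry.py might hold: every product with the URL it is served under.
REGISTRY = {
    product["name"]: {"url": f"/products/{product['name']}", "data": product}
    for product in (BATMAN, SUPERMAN)
}


def list_products() -> list[dict]:
    """Return the name and URL of every registered product (discovery)."""
    return [{"name": name, "url": entry["url"]} for name, entry in REGISTRY.items()]


def get_product(name: str) -> dict:
    """Return one product's data, or raise KeyError if it is not registered."""
    if name not in REGISTRY:
        raise KeyError(f"unknown data product: {name}")
    return REGISTRY[name]["data"]


if __name__ == "__main__":
    print(list_products())
    print(get_product("batman"))
```

In a layout like the one described above, an entry point such as 'main.py' would typically map two web routes onto functions of this kind, one for listing products and one for fetching a single product by name.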

This small project demonstrates the core principles of a Data Mesh: decentralised ownership, self-service access and discoverable data products — all within a simple, practical example. You can clone the repository and run it on your own machine:

https://github.com/flohms18/DC_Mesh

Once the server is running locally, an individual data product can be requested at, for example:

http://127.0.0.1:8000/products/batman

Feel free to add as many data products as you like, or adapt the project to create your own version. To avoid conflicts with external dependencies, be sure to create a virtual environment. Detailed instructions are provided in the README file.

The products endpoint itself is available locally at:

http://127.0.0.1:8000/products

Data Mesh in Practice: Bridging Technical and Organisational Complexity

Building a production Data Mesh infrastructure is, of course, far more complex than this example and requires strong engagement from all teams to ensure effective adoption. As well as the technical challenges, it involves organisational changes such as defining clear data ownership, establishing governance policies and fostering collaboration between teams.

As mentioned earlier, unlike a traditional centralised data platform, a data mesh requires the decentralisation of both data ownership and infrastructure. This means that each team or domain must manage its own pipelines, schemas and quality checks.
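As a rough illustration of what a domain-owned schema and quality check could look like, here is a small sketch in plain Python. The schema, the field names and the check_product function are hypothetical and chosen to match the superhero example. A real domain team would more likely rely on a dedicated validation library and far richer rules.

```python
# Hypothetical sketch: the schema and the checks are illustrative assumptions.

# Schema owned by the domain team: field name mapped to expected type.
REQUIRED_FIELDS = {"name": str, "secret_identity": str, "owner": str}


def check_product(record: dict) -> list[str]:
    """Return a list of quality problems; an empty list means the record passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
        elif isinstance(record[field], str) and not record[field].strip():
            problems.append(f"empty value for {field}")
    return problems


if __name__ == "__main__":
    # A record with a missing field and an empty owner fails two checks.
    print(check_product({"name": "superman", "owner": ""}))
```

Because the check lives with the domain that owns the product, the team can tighten or relax its rules without waiting on a central data team, which is the point of decentralised ownership.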

Successful implementation depends on the right tools and architecture, as well as alignment between business and technical stakeholders, to ensure that data becomes a reliable, self-service product that delivers real value across the organisation.
