MDS
Modern Data Stack
A term I chanced upon while browsing DuckDB’s blog. A few posts there talk about a set of tools (more on these in a bit) that make up a modern data stack (or MDS), e.g. mds-in-a-box or poor man’s data lake.
Below are some definitions of MDS:
The modern data stack (MDS) is a suite of tools used for data integration.
~ fivetran
it’s a suite of tools that makes for easier collection, operationalization, and analysis of data.
~ dataiku
A data stack, or data stack architecture, is a collection of tools, technologies, and components that organizations use to manage, process, store, and analyze data.
~ airbyte
In picture
Before going into the stack, a picture would help with a mental model.
This is an abstract architecture and lacks a lot of detail, but it helps me understand the role each tool is responsible for.
The tech stack
The tools that are commonly discussed in various blogs, and the ones I am interested in, are:
- dbt: the transformation layer of ELT in the MDS
- Dagster: orchestrator for data pipelines
- Evidence, Superset, or Metabase: BI
- DuckDB: in-process OLAP database that integrates very well with the tools above
These tools fit my local development environment and are the easiest to provision and get started with. The components of this stack can easily be swapped depending on the environment, e.g. using Postgres or Snowflake as the data mart/warehouse in production, where the scale and size of the data require it.
Intended use
I have been pulling data from JIRA manually and ingesting it into a local Postgres running in Docker with a simple CLI that I wrote. The data is then visualized via Metabase.
I intend to replace this with the above tools to automate the process, and also to expand it into more use cases, such as providing more insights about the cloud platforms, e.g. cost and usage analytics, operational analytics, etc.
Next
I have started building the ELT for Garmin activities ingestion with Dagster and DuckDB. The progress of this work will be covered in future posts.
Update
- 25-Feb-24. Section “Next”. Changed to work on Garmin activities instead.