Datasembly is a leading provider of real-time, hyper-local data for market intelligence, extracting data from over 220 retailers, including major chains like Walmart and Target. They empower brands and retailers with insights across pricing, promotional intelligence, compliance, and distribution data, serving retailers, consumer packaged goods (CPG) companies, and various partners.
Navigating Complex Data Lineage and Change Management
Datasembly's core business revolves around a massive pricing data set with intricate downstream lineage. This complexity presented significant challenges for the data team, particularly in implementing changes and new features.
The problem centered on the main pricing data table. Its downstream lineage was largely undocumented and lived mostly in the heads of individual engineers. While some team members had a mental map of the dependencies, there was no structured way to share or preserve that understanding across the organization. As a result, even small changes to the data pipeline felt risky, and teams were reluctant to introduce updates for fear of unintended consequences.
This lack of visibility led to several critical issues:
Jamie Hollowell, Lead Data Engineer at Datasembly, explains, "We were afraid of breaking things downstream, data is delivered directly to clients so don't want to break internal things but also client-facing deliverables. Changing the behavior of a column was extremely hard to understand who would be affected, which data products, which dashboards."
These challenges were compounded by Datasembly's complex data ecosystem, which includes:
Comprehensive Data Discovery and Lineage Tracking
To address these challenges, Datasembly implemented Select Star, a data management tool that provided crucial visibility into their data lineage. The solution offered several key features:
Increased Efficiency and Confidence in Data Management
The implementation of Select Star led to significant improvements in Datasembly's data management practices:
Hollowell summed up the impact: "Work that easily would have taken 80 hours, and where we would have had a confidence level of 50% can now be done in a day with high confidence. It feels like having another engineer on the team."
With Select Star, Datasembly has seen an important shift in their data management practices. Improved data governance and visibility have contributed to higher confidence in data-driven decision-making across the organization. This shift has enabled the company to operate more efficiently with a leaner data team, improve data consistency across products, and ultimately deliver more value to their clients.
Looking ahead, Datasembly is excited about future data initiatives and the continued impact of Select Star on their operations. The tool has become an integral part of their data management strategy, supporting their mission to provide accurate, comprehensive grocery pricing data and insights to their clients.