How Datasembly Saves 90% of Data Engineering Time with Select Star

Datasembly is a leading provider of real-time, hyperlocal data for market intelligence, extracting data from over 220 retailers, including major chains like Walmart and Target. It gives brands and retailers insights across pricing, promotional intelligence, compliance, and distribution data, serving retailers, consumer packaged goods (CPG) companies, and other partners.

$30k+ saved per year in Snowflake costs

90%+ reduction in time to complete data projects

Industry: Software

Company size: <100 employees

Integrations: Snowflake, dbt, Apache Airflow, Tableau

Challenge

Navigating Complex Data Lineage and Change Management

Datasembly's core business revolves around a massive pricing data set with intricate downstream lineage. This complexity presented significant challenges for the data team, particularly in implementing changes and new features.

The downstream lineage of the main pricing table was largely undocumented and lived mostly in the heads of individual engineers. Some team members had a mental map of the dependencies, but there was no structured way to share or preserve that knowledge across the organization. As a result, even small changes to the data pipeline felt risky, and teams were reluctant to ship updates for fear of unintended consequences.

This lack of visibility led to several critical issues:

Data Inconsistency: Separate teams working on separate products (the SaaS app, raw data delivery, and BI) represented the same data inconsistently across platforms.
Limited Understanding of Impact: It was extremely difficult to tell how a change would affect the various data products, dashboards, and client-facing deliverables.
Inefficient Project Planning: Designing a new project or implementing a change could take up to 80 hours, with only about 50% confidence in the outcome.
Change Management Paralysis: Fear of disrupting existing data products or client deliverables often delayed feature implementations.

Jamie Hollowell, Lead Data Engineer at Datasembly, explains: "We were afraid of breaking things downstream. Data is delivered directly to clients, so [we] don't want to break internal things, but also client-facing deliverables. [When] changing the behavior of a column, it was extremely hard to understand who would be affected: which data products, which dashboards."

These challenges were compounded by Datasembly's complex data ecosystem, which includes:

Scraped data that flows through Kafka into Snowflake
A database with trillions of rows and hundreds of terabytes
Airflow for job scheduling and dbt for data transformation
Various data delivery methods including cloud buckets, FTP, and a SaaS product with embedded Tableau dashboards


Solution

Comprehensive Data Discovery and Lineage Tracking

To address these challenges, Datasembly implemented Select Star, a data management tool that provided crucial visibility into their data lineage. The solution offered several key features:

Detailed Lineage Visualization: Select Star showed the team every connection and dependency in their data ecosystem, so they could make changes and ship new features with confidence.
Centralized and AI-Assisted Documentation: Select Star first helped the team organize its data products more consistently. Connecting primary keys and using AI-assisted documentation then streamlined the work of keeping data records comprehensive and accurate.
User-Friendly Interface: The intuitive interface and focus on data documentation were particularly appealing, making it easier for both technical and non-technical users to understand data relationships.


Result

Increased Efficiency and Confidence in Data Management

The implementation of Select Star led to significant improvements in Datasembly's data management practices:

Time Savings: Projects that previously took up to 80 hours can now be completed in 6 hours, with much higher confidence.
Cost Reduction: The improved efficiency is expected to save Datasembly between $30,000 and $40,000 annually in Kafka and Snowflake costs.
Enhanced Change Management: The team can now make changes more confidently, fostering innovation and faster feature implementation.
Improved Data Consistency: By unifying the underlying data layer, Datasembly can ensure consistency across their various data products, enhancing trust in their data.

Hollowell summed up the impact: "Work that easily would have taken 80 hours, and where we would have had a confidence level of 50% can now be done in a day with high confidence. It feels like having another engineer on the team."

With Select Star, Datasembly has seen an important shift in its data management practices. Improved data governance and visibility have contributed to higher confidence in data-driven decision-making across the organization. Select Star has enabled the company to operate more efficiently with a leaner data team, improve data consistency across products, and ultimately deliver more value to its clients.

Looking ahead, Datasembly is excited about future data initiatives and the continued impact of Select Star on its operations. The tool has become an integral part of its data management strategy, supporting its mission to provide accurate, comprehensive grocery pricing data and insights to clients.
