Select Star helps Xometry avoid millions of dollars in losses from data inaccuracies

Since 2013, Xometry (NYSE: XMTR) has helped engineers and product teams meet their custom manufacturing needs with an industrial marketplace supporting products from aerospace to consumer goods. Artificial intelligence enables the 5,000+ Xometry manufacturing partners to connect with customers around the world, making better matches and determining optimal pricing.

200+
Hours saved for Data engineering team
36x
Faster data debugging

Industry:
Marketplace
Company size:
100-500 employees

Challenge

Data Outages and Losing Trust in Data

Like any fast-growing company, Xometry faced scaling challenges as its marketplace grew exponentially. Its rapid international growth has required it to support new countries’ currencies, materials, and regulations.

In order to connect manufacturing suppliers and partners across the globe, Xometry built an AI-based algorithm to show potential manufacturing runs with prices and lead times.

However, the data required to make the AI pricing platform work can be unwieldy and chaotic: it takes in the manufacturer capabilities, previous work, evolving customer reviews, preferences, locations, lead times, and more.

“We need accurate data in order to drive accurate quotes. But we did not have an end-to-end, bird's-eye view of our data flow to tell if the data is accurate,” says Jisan Zaman, Sr Data Engineer at Xometry. “We also have cross-system data, and our business intelligence (BI) tools get data from various sources feeding into a variety of dashboards.”

At Xometry, data originates in many different source systems: customer usage, the search catalog, and manufacturer capabilities are each stored separately. Whenever data moves between systems, it must be matched, cleansed, transformed, and verified.

“We use our data to predict seller and buyer behavior accurately. The first part of the problem is sorting out where each dataset was coming from. If we don't know where the data is coming from, we can't use it in our prediction. This is why we started looking for data lineage solutions,” says Jisan.

When Jisan and his team started evaluating data lineage tools, they realized that there wasn’t a fully automated column-level lineage solution on the market. “There are many data modeling tools for table-level lineage, but the truly challenging part of data is in column-level lineage, where it discerns where that data is coming from, what it means, and what it represents.”

Moreover, Jisan and his team were starting to experience data outages related to reporting, which revealed cracks in their data pipeline. The business was now pressing them to make sure accurate data was available when needed. Data outages were a major issue because they led to:

  • Inability to make informed business decisions: as teams lost trust in the veracity of their reports and the underlying data, business decisions could not be made quickly and accurately
  • Long turnaround times: data outage issues took a long time to identify and resolve without column-level lineage
  • Increased human error: manually flagging potential issues before they led to outages was prone to human error

“In our applications, data accuracy is the most important thing. We estimate that data inaccuracies cost Xometry millions of dollars every month,” says Jisan. 

"We chose Select Star because it automatically detects and displays column-level data lineage, so it’s easy to see where data comes from and flag issues in real time."

Jisan Zaman

Senior Data Engineer, Xometry

Solution

Column-level Data Lineage Integrated into CI Workflow

Looking for a solution that could identify issues quickly and protect their data pipeline, Xometry decided their most important need was to understand the impact of column changes across tables. “In order to eliminate data outages, we needed to track our end-to-end data flow back to the root cause of the issue - where the data is being captured or transformed,” says Jisan.

After a year of trying out every data lineage tool on the market, Xometry selected Select Star.

“We chose Select Star because it automatically detects and displays column-level data lineage, so it’s easy to see where data comes from and flag issues in real time,” says Jisan.

“Select Star eliminated the data quality issues we had before and brought transparency in our data. Our engineers can now understand their impacts on downstream data easily since Select Star highlights and reports on differences right away.”

Select Star gave Xometry instant access to their data flows and showed where issues threatened the data pipelines. “Before Select Star, we would dig for answers. Now, we are just able to see that a column, say column X in our data, was derived from column F in another table,” says Jisan. “And that column F had been derived from columns A and B somewhere else. This eliminates all the time spent searching for where the problem is coming from, and locating the root cause of it.”
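The root-cause trace Jisan describes can be pictured as a walk up a lineage graph. The sketch below is illustrative only: the column names and the LINEAGE map are hypothetical stand-ins for the "column X from column F from columns A and B" example, and it does not use or represent Select Star's product or API.

```python
from collections import deque

# Hypothetical lineage map (an assumption for illustration):
# each column maps to the columns it is derived from.
LINEAGE = {
    "report.X": ["staging.F"],
    "staging.F": ["raw.A", "raw.B"],
}


def trace_sources(column, lineage):
    """Return the root source columns (no recorded parents) that feed `column`."""
    roots = []
    seen = {column}
    queue = deque([column])
    while queue:
        current = queue.popleft()
        parents = lineage.get(current, [])
        # A column with no parents is a root, unless it is the starting column.
        if not parents and current != column:
            roots.append(current)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(roots)


print(trace_sources("report.X", LINEAGE))
```

With this toy map, tracing report.X leads back through staging.F to raw.A and raw.B, which is where an engineer would start looking for the root cause.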

Furthermore, Jisan and his team have integrated Select Star’s column-level lineage API into their CI workflow, which put an end to their data outages. “We still have updates every week on our data pipeline, but nobody has run into data outage issues anymore, which is pretty impressive,” says Jisan. With no further incidents of erroneous reporting, Jisan and his team have raised the level of data trust at Xometry.

Result

Building Trust in Data with Select Star

“Since we integrated Select Star in our CI pipeline, data quality issues are visible to the data producers, and it’s handled long before they get into production. That saves time, since we never need to hunt for where the data issues are coming from,” says Jisan. “More importantly, our stakeholders can trust the numbers are correct and our customers benefit from more accurate quotes.”
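One way to picture a lineage-aware CI check is as a downstream impact query: given the columns a change touches, list everything that depends on them and flag any protected reporting columns. The sketch below is a minimal illustration under stated assumptions. The lineage map, the column names, and the functions downstream_of and ci_check are all hypothetical, and nothing here reflects Xometry's actual pipeline or Select Star's API.

```python
# Hypothetical lineage map (an assumption for illustration):
# each column maps to the columns it is derived from.
LINEAGE = {
    "report.X": ["staging.F"],
    "staging.F": ["raw.A", "raw.B"],
}


def downstream_of(changed, lineage):
    """Return every column that directly or indirectly depends on `changed`."""
    # Invert the "derived from" map into a "feeds into" map.
    children = {}
    for column, parents in lineage.items():
        for parent in parents:
            children.setdefault(parent, []).append(column)

    impacted = set()
    stack = [changed]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return sorted(impacted)


def ci_check(changed_columns, lineage, protected):
    """Return the protected columns impacted by a change.

    A CI job could fail or request review when this list is non-empty.
    """
    hits = set()
    for column in changed_columns:
        hits.update(set(downstream_of(column, lineage)) & set(protected))
    return sorted(hits)


print(ci_check(["raw.B"], LINEAGE, {"report.X"}))
```

In this toy setup, a change to raw.B is reported as affecting report.X before it reaches production, while a change to report.X itself has no downstream dependents to flag.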

Xometry’s data engineering team has saved over 200 hours this year (more than 30 hours per month) by using Select Star, allowing the team to channel its time into higher-value tasks. In the past, they would have spent those hours tracking down the origins of data downtime.

“Because of the time we’ve saved, we’re being more proactive about our data needs instead of being reactive to internal requests,” says Jisan.

Want to learn more about how other companies improve their data quality and reduce data outages? Talk to us.
