Block Automates Column-level Data Lineage at Exabyte Scale with Select Star

Block, a global finance company formerly known as Square, provides integrated, omnichannel solutions with embedded financial services tools for merchants of all sizes. With 210 million+ buyer profiles to manage, Block processes more than 200 billion dollars annually and exabytes of data on the cloud.

42M+

Lineage Objects & Relationships Managed

6,600+

Data Consumers Supported

Industry:

Financial Services

Company size:

10,000+ employees

Integrations:

Looker

Snowflake

Challenge

Billions of transactions and strict compliance regulations demand best-in-breed data governance

Block processes over 200 billion dollars in transactions annually, which generates billions of data points that they have to manage and secure. As a global finance company, they also have to comply with a long list of regulations, including GDPR, CCPA, PCI DSS, SOX, GLBA, and PSD2.

But at this scale, manually tracking sensitive data is nearly impossible. So, they set up a best-in-breed data ecosystem consisting of visualization tools, automation and integration data pipelines, data inflow and acquisition, and data storage platforms.

*Within these data ecosystem categories are tools like Select Star, dbt, Fivetran, Amazon S3, Snowflake, Looker, and others.*

Because of the sheer amount of data and users, many of the tools in use are stretched to their maximum capacities. Looker, their primary business intelligence visualization platform, must cater to 6,600 monthly active users, process more than 6 million Snowflake database tables, and accommodate over 2,200 users of ETL tools and pipelines.

“We have a lot of data, a lot of users to support, and a reasonably small number of people to get that job done,” says Sam Osborn, Sr. Software Engineer at Block.

Across all their services, processes, and data platforms, Block collects metadata. But with no single source of truth to govern that data, their decentralized, organic operating model only exacerbated their existing challenges. They had:

Inconsistencies in their tools’ usage and team communication which created a fragmented data ecosystem.
Multiple data tools that were hindering the discovery and identification of pertinent data tables.

As a result, Block’s thousands of data analysts and developers were uncertain about which tables to use and how to consume them. Data users had a hard time figuring out where the data came from and where it was going, which made it tough to ensure its accuracy and understand its journey.

It was getting trickier by the day to manage data securely in their decentralized setup.

Select Star has been incredibly helpful for tracking down the impacts of a data change. My teammate has called Select Star a godsend!

Sharon

Data Architect at Block (CashApp)

Solution

Select Star provides the column-level granularity needed for tracking dependencies, integrating across existing platforms, and handling sensitive data

Block’s foundation engineering team builds applications designed specifically for all of their other engineering teams – all based on those teams’ data. Because they support everyone else in the company, they get hundreds of support tickets and end up spending the majority of their time tracking down and debugging issues whose origin they can’t easily pinpoint.

Drowning under all that data, the foundation engineering team reached a tipping point: they could not control how every other team used the data. That's when they realized they needed a best-in-class data governance solution – especially when it came to tracking PII for compliance regulations.

After many internal customer team surveys, the foundation engineering team at Block identified automated column-level lineage as an essential component of their data ecosystem. They wanted to achieve the following use cases with lineage:

Track dependencies and data flows between data jobs based on their impact
Allow for internal applications to be built on top of lineage to integrate with existing data platforms and tools
Handle sensitive data classification, data usage, and audit reporting at scale

Overall, the goal was to use automated column-level lineage to enable smarter services, increase developer velocity, and stay in compliance with regulations. After 8 months of evaluating whether they should build a solution internally vs. buy one that already existed, and vetting multiple vendors, Block selected Select Star as their solution based on the following factors:

User Experience: Select Star received positive feedback from Block’s data users during the trial, who highlighted its intuitive UI and user-friendly experience. The ability to visualize column-level lineage within Select Star's interface helped data consumers successfully navigate their workflows without additional training.
Popularity Scores: Select Star automatically generated popularity scores for data tables, dashboards, and columns based on usage patterns, such as view counts. This feature helped Block’s data engineers and analysts understand the relevance and impact of different data objects within the organization.
API Flexibility: Select Star offered a flexible API that seamlessly integrated with Block's existing data platforms and allowed for customized integrations. This flexibility supported Block’s planned workflows and future data management initiatives.
Customer Service: Select Star provided excellent support and communication throughout the pilot program, easing the onboarding process and expediting the time to value.

“We have found Select Star enjoyable to use and highly effective in addressing the specific use cases at Block,” says Sam.

Block has set out to use Select Star as the single source of truth data portal to enhance their understanding of how data products, particularly Snowflake tables and columns, integrate within their larger ecosystem. They centralized all of their metadata in Select Star and set up automated workflows and tooling. Select Star has provided visibility and clarity on data models, which enables data producers and consumers to conduct impact analysis upstream and downstream using lineage.
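Upstream and downstream impact analysis of this kind amounts to walking a graph of column-level lineage edges. The following is a minimal Python sketch of that idea, assuming lineage is available as (upstream column, downstream column) pairs. The table and column names are hypothetical, and the code does not call the Select Star API or reflect how Select Star computes lineage internally.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage edges: (upstream column, downstream column).
EDGES = [
    ("raw.payments.amount", "staging.payments.amount_usd"),
    ("staging.payments.amount_usd", "marts.daily_revenue.total_usd"),
    ("raw.payments.buyer_email", "staging.payments.buyer_email"),
    ("marts.daily_revenue.total_usd", "looker.revenue_dashboard.total"),
]


def build_graph(edges, reverse=False):
    """Build an adjacency map; reverse=True makes edges point upstream."""
    graph = defaultdict(set)
    for src, dst in edges:
        if reverse:
            src, dst = dst, src
        graph[src].add(dst)
    return graph


def reachable(graph, start):
    """Breadth-first search returning every column reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Downstream: what could break if this source column changes?
downstream = reachable(build_graph(EDGES), "raw.payments.amount")

# Upstream: where does this dashboard field come from?
upstream = reachable(
    build_graph(EDGES, reverse=True), "looker.revenue_dashboard.total"
)

print(sorted(downstream))
print(sorted(upstream))
```

The same traversal, started from a column tagged as PII, would list every downstream column that might inherit sensitive data.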

Select Star also aided in identifying risks associated with the spread of sensitive data, particularly in relation to personally identifiable information (PII) columns. Making lineage the sole source of truth also made team communication easier – even across several layers. With improved collaboration, Block streamlined issue debugging, feature requests, and schema updates.

We have a lot of data, a lot of users to support, and a reasonably small number of people to get that job done.

Sam Osborn

Sr. Software Engineer at Block

Result

Countless hours saved and reduced data redundancies for supporting 6,600+ data consumers

Since implementing Select Star six months ago, Block has already witnessed significant improvements in their data management processes. “Select Star saves us countless hours of manual effort to track down all our dependencies while allowing us to ramp up our development velocity to drive new fact-based insights,” says Paul Luong, Head of Data Intelligence at Block.

Automated access control management is a recent initiative that has already saved Block’s teams significant time. Before implementing Select Star, it was common for a user to request help to get access to data tables and dashboards, causing the Business Intelligence Automated Reporting Tools team to spend valuable time manually editing access controls several times a day. To streamline these requests and better serve the data users, Block developed Bellhop, an internal tool that handles these repeatable steps to resolution.

Bellhop automates security requests and provides a consolidated view of access permissions, allowing users to quickly identify and resolve access issues. Using the Select Star API, it retrieves the list of Snowflake tables used in a dashboard and leverages the Looker API to obtain user roles and permission sets. By comparing this information with the centralized Registry tool, which serves as the single source of truth for security permissions, users can now simply input their username and the link to any Looker content to receive a comprehensive report with appropriate system links to obtain access.
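The comparison Bellhop performs can be pictured as a set difference between what a piece of Looker content requires and what the user already holds. The sketch below is illustrative only: Bellhop's code is not public, the inputs are hard-coded stand-ins for what the Select Star API, the Looker API, and the Registry tool would return, and every function, field, table, and role name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessReport:
    """Hypothetical report shape: what the user still needs to request."""
    user: str
    content_url: str
    missing_tables: list
    missing_roles: list


def build_access_report(user, content_url, tables_used, roles_required,
                        granted_tables, granted_roles):
    """Compare what the content needs against what the user already has."""
    missing_tables = sorted(set(tables_used) - set(granted_tables))
    missing_roles = sorted(set(roles_required) - set(granted_roles))
    return AccessReport(user, content_url, missing_tables, missing_roles)


# Stand-in data. In the workflow described above, tables_used would come
# from lineage, roles from Looker, and grants from the Registry tool.
report = build_access_report(
    user="jdoe",
    content_url="https://looker.example.com/dashboards/123",
    tables_used=["ANALYTICS.SALES.ORDERS", "ANALYTICS.SALES.REFUNDS"],
    roles_required=["sales_viewer"],
    granted_tables=["ANALYTICS.SALES.ORDERS"],
    granted_roles=["sales_viewer", "finance_viewer"],
)

print("Missing table access:", report.missing_tables)
print("Missing Looker roles:", report.missing_roles)
```

Keeping the comparison as a pure function over three inputs means each source system can be mocked or swapped without touching the report logic.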

The ability to build customized integrations with the Select Star API has significantly reduced the manual effort required to handle access-related inquiries and has improved the overall user experience. “Select Star bridged the gap between Looker and Snowflake, eliminating the need for manual parsing of LookML files and reducing errors. By leveraging Snowflake metadata and query history, Select Star provides an efficient and accurate representation of which tables are actually being used,” says Matthew White, Business Intelligence Analyst at Block.

Block has one of the largest Looker instances to date, with over 11K dashboards and 25K Looks used by over 6,600 users. Prior to the adoption of Select Star, it was challenging to determine whether a table already existed in LookML, leading to an influx of duplicate content. The lack of metadata integration from Snowflake further complicated the process.

But with Select Star, they were able to create a single source of truth by reducing redundancies in their content through automated data discovery. By leveraging Select Star’s search and filtering capabilities, developers could efficiently make empirical, data-driven decisions.

“Select Star offers a user-friendly interface allowing users to search for any object, including LookML view files, Looker Explores, and Snowflake tables. It provides comprehensive lineage information, enabling Looker administrators to easily identify all instances where the object is utilized within Looker. This functionality has proven invaluable for our administrative tasks,” says Matthew. This feature, in particular, has been highly valuable for data administrators and Looker developers, enabling them to quickly identify existing content and eliminate redundant entries.

Ultimately, through the implementation of Select Star’s automated data lineage, Block overcame their initial set of data management challenges, and they plan to continue expanding how they use the tool to optimize their data management program. “We have plans to leverage the column level lineage API and other APIs to integrate with existing tools, automate workflows, and enhance responsible ownership, data quality, documentation, navigation between tools, access control, and sensitive data handling,” says Sam.
