Today, we’re excited to announce our official partnership with Snowflake as part of Snowflake’s Data Governance Accelerated Program. Snowflake has been a pivotal integration partner of Select Star since the inception of our company. Today, Select Star processes more than 10 million SQL queries and tracks over 100 million tables, columns, and other types of metadata every month from Snowflake customers around the world.
Select Star and Snowflake share a common interest in supporting our customers’ efforts to automate data governance efforts in effective, sustainable ways. Now, as part of the Data Governance Accelerated program, we’re announcing two deeply integrated features for Snowflake: Data Lineage and Snowflake Tag Sync.
Why data governance?
As data has become central in many businesses, the need for data governance policies and tools has grown. Data is becoming widely recognized as a product, and all parts of an organization are consumers of that product with questions about what data exists, where it exists, and who is using it.
Data governance defines processes, roles, policies, standards, and metrics on who can use which data, how they can use it, and when. To implement an effective data governance strategy on a data warehouse that holds all of an organization’s data can be an overwhelming task. This is why companies are looking to streamline and automate the processes of organizing, documenting, and tracking metadata with data discovery platforms like Select Star today.
How Select Star works with Snowflake to provide the full context of data
At Select Star, we hear that data analysts and engineers spend too much time trying to trace their data from target to source and vice versa. This search can often span across multiple tools, each with different methods of presenting data. Even if a search is confined to one tool, it’s a time consuming and error-prone process.
While some companies may have sufficient resources to build in-house solutions to parse through their logs and solve this problem in the short term, cloud warehouse and data lake technology evolves rapidly. Even if building an internal tool is viable at first, the cost of maintaining it as the business changes can quickly outweigh the value it provides. An old solution may also accumulate errors, to the point that the faulty information it displays is no longer worth using.
This is why Select Star has supported up-to-date, built-in integrations since day one of our launch. When customers connect their data warehouses, BI tools, and ETL tools to Select Star, the platform analyzes and maps the metadata and query history logs and automatically generates popularity and data lineage models. Select Star can guarantee the most reliable and up-to-date methods are being used by directly integrating with the features provided by partners such as Snowflake.
Data Lineage with Snowflake Access History
Data analysts have been using Select Star as a fast and easy way to debug data issues and identify the downstream impacts of changes through the lineage graph. Data engineers can trace columns from a dashboard all the way to the source table without manually sifting through SQL logs. They can also check which downstream assets are using Snowflake tables to analyze the impact of data changes, as well as keep the main stakeholders of those downstream assets informed of planned maintenance.
Select Star’s query parser integration with Snowflake is built on their Query History, which preserves a record of all the queries run on Snowflake databases and operational metadata such as the time taken to run the query. Every day, Select Star’s query parser examines the DDL and DML queries from Query History to generate the data lineage.
Today, Snowflake is announcing a new feature: Access History. Access History shows which columns, tables, or views have been accessed per query. This provides an accurate picture of the query metadata and hence makes Select Star data lineage & popularity more accurate.
As a fast growing company, Snowflake releases many new features and SQL functions every quarter. With the integration of Access History, Select Star now has a fallback method so that even if a query includes a Snowflake custom function that the query parser doesn’t support yet, it can refer to Access History to get all the metadata.
This is also useful in cases where a query is too long to fit in the QUERY_TEXT field of the Query History View, as Select Star’s query parser can then complete the lineage using the metadata Access History returns.
Getting a Complete Picture of Data Usage
By default, Select Star runs query analysis on the last 90 days of activity logs. These queries are not only used to generate data lineage, but also to provide contextualized data auditing tools; Select Star users can see usage analysis in the context of a table, an individual, a team, or even an entire data warehouse.
Popular Queries & Joins
Select Star provides a dedicated page for each Snowflake table in its application. From these table pages, users can see popular queries, popular joins, and the most recent queries run on that table. Select Star will also show the user who ran the query and its execution time.
The table page will even show who uses the table the most, who uses the dashboards directly downstream of these tables the most, and which dashboards those would be.
Admins can also see queries for the entire data warehouse from the Select Star settings page. Here, they can easily sort by query run time to see their most costly queries.
Who is using what?
When someone creates an account in Select Star, a profile is automatically created based on their activity in different data tools. The profile page shows aggregated information about their most queried datasets, recent queries, and datasets they have created or own. When users create their accounts, they will also be added to a team, meaning the same information is available at a team level.
This resource can help not only when onboarding people who have just joined an organization, but also when trying to understand the usage patterns of different teams or individuals. Non-technical users may interact with different datasets or in different ways than engineers, and team level analysis can provide that insight.
Having contextualized query history and activity readily available makes it easier to see which data is being used the most, who is using it, and how they are using it. This can help companies determine which datasets are the most important to maintain and create data governance policies and processes to ensure the quality of those datasets.
Data Organization with Snowflake Tag Sync
Being able to trace data through its entire lifecycle helps customers identify problems in known datasets and maintain key data flows. But individuals in different departments of organizations may not have clarity of which datasets are relevant to their work. This is why Select Star supports creation of fully customizable tags which can be applied to any data objects connected in the application.
Select Star now provides Snowflake customers an option to sync Select Star tags back to Snowflake using their new Object Tagging feature. By enabling Snowflake Tag Sync in Select Star, customers will be able to see Category and Status tags applied to data from their entire data stack in Select Star, and also see those same tags supported in Snowflake without duplicating effort.
Object Tagging is a powerful feature in Snowflake for adding classification, organization and control down to the column level. Select Star provides an intuitive UI for easily managing and organizing metadata. With Snowflake Tag Sync, Snowflake instances can have up-to-date tags from Select Star, without requiring a manual update of data from one system to another.
Moving towards automated data governance
Having true visibility into the complete lifecycle of data and automatic sync of data tags allows customers of both Select Star and Snowflake to easily govern and manage metadata at scale. This post has been a brief overview of some of the ways these two companies are working to provide solutions which integrate seamlessly into existing data workflows and offer new options for implementing data governance.
Select Star is an ambitious new contender in the realm of data discovery, with data governance use cases as a key driving factor in its development. Other features include a comprehensive search, data popularity, and metadata management tools which enable organizations to explore and use their data to its full potential. To learn more about how you can improve your data discovery with Select Star and Snowflake, schedule a demo today.