
Best Practices for Organizing Data

Rya Sciban
August 13, 2024

Rya Sciban is the Head of Product at Select Star.

Organizing data is essential for both data discovery and data governance. In this post, I’ll share best practices on organizing data, including defining data domains, documenting the domains, leveraging hierarchy, and aligning with permissions. This comes from years of building products for data consumers at Select Star, Affinity, and Sisense combined with my previous experience as a data analyst. 

A data domain is a grouping or category of data that helps organizations discover, manage, or govern their data. Defining data domains creates a structured framework for managing and categorizing data, which is essential for several reasons.

Ease of Access and Efficiency

  1. Helps users quickly locate and access the data they need
  2. Reduces time spent searching, freeing users to focus on analysis and decision-making
  3. Makes relevant data easier to identify through proper tagging, logical categorization, and comprehensive descriptions

Data Governance and Compliance

  1. Ensures consistent data management across the enterprise
  2. Aids adherence to regulatory and data privacy requirements
  3. Facilitates auditing and monitoring by tracking data usage, modifications, and ownership
  4. Supports a culture of transparency, accountability, and trust for effective data-driven decision-making

Well-defined data domains facilitate better data integration, reduce redundancy, and support consistent data management practices across the organization. Ultimately, this leads to more informed decision-making, improved operational efficiency, and a robust foundation for scaling data initiatives.

Organizing Data with Tags

Tags are a great way for data consumers to discover content relevant to a particular domain. They are supported on all asset pages, so data managers can organize their key data and give consumers that context.

Select Star supports two kinds of tags: category tags and status tags. Category tags help customers categorize their data into user-defined domains. Status tags indicate the status or quality of data, for example, Certified, Sensitive (PII), or Deprecated.
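
To illustrate how the two kinds of tags divide the work, here is a minimal Python sketch. It assumes a hypothetical `Asset` record that can carry several category tags (domains) plus any of the three status tags named above. The class, its fields, and the `find_assets` helper are invented for this example and are not Select Star's data model or API.

```python
from dataclasses import dataclass, field

# Status tags named in this post; any other value is rejected below.
STATUS_TAGS = {"Certified", "Sensitive (PII)", "Deprecated"}


@dataclass
class Asset:
    """Hypothetical catalog entry: category tags are domains, status tags describe state."""
    name: str
    categories: set[str]
    statuses: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        unknown = self.statuses - STATUS_TAGS
        if unknown:
            raise ValueError(f"unknown status tags: {sorted(unknown)}")


def find_assets(assets, domain, require=(), exclude=("Deprecated",)):
    """Names of assets in `domain` that have every required status and no excluded one."""
    return [
        a.name
        for a in assets
        if domain in a.categories
        and set(require) <= a.statuses
        and not (set(exclude) & a.statuses)
    ]


catalog = [
    Asset("finance.budget_2024", {"Finance"}, {"Certified"}),
    Asset("finance.budget_2023", {"Finance"}, {"Deprecated"}),
    Asset("finance.payroll", {"Finance", "Operations"}, {"Certified", "Sensitive (PII)"}),
    Asset("sales.orders", {"Sales"}, {"Certified"}),
]

print(find_assets(catalog, "Finance", require=["Certified"]))
```

Keeping domains and statuses in separate fields lets a consumer ask a combined question, such as "certified Finance data that isn't deprecated", without the two vocabularies colliding.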

Let’s explore a few approaches to defining category tags / data domains.

Examples of category tags Select Star users leverage to define their data domains.

1. Business Function-Oriented Approach

With the business function-oriented approach, data domains align data categorization with core organizational functions such as sales, marketing, finance, and operations. This strategy provides clear connections to business objectives and streamlines communication with key stakeholders. However, it has its challenges. Data can overlap across domains (customer data, for instance, matters to both Sales and Marketing), and successful implementation demands a thorough understanding of intricate business processes. Despite these hurdles, this approach can significantly improve how data is organized and used when executed thoughtfully.

Examples of Business Data Domains:

  • Operations: This domain covers supply chain data, inventory levels, and production schedules.
  • Sales: This domain includes data related to customer transactions, sales performance, and sales targets.
  • Marketing: This domain encompasses data from campaigns, lead generation, and customer segmentation.
  • Finance: This domain holds financial statements, budgeting data, and expenditure tracking information.
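
The overlap problem becomes easy to see once you try to assign real tables to these domains. The Python sketch below assumes hypothetical keyword rules for each domain and made-up table names. It is a first-pass triage aid, not a substitute for the business-process knowledge this approach requires. Tables that match several domains, or none, are the ones that need a human decision.

```python
# Hypothetical keyword rules mapping table names to business domains.
DOMAIN_RULES = {
    "Operations": ("inventory", "supply", "production"),
    "Sales": ("order", "transaction", "quota"),
    "Marketing": ("campaign", "lead", "segment"),
    "Finance": ("budget", "expense", "ledger"),
}


def assign_domains(table_names):
    """Map each table to the set of domains whose keywords appear in its name."""
    assignments = {}
    for table in table_names:
        lowered = table.lower()
        assignments[table] = {
            domain
            for domain, keywords in DOMAIN_RULES.items()
            if any(keyword in lowered for keyword in keywords)
        }
    return assignments


def overlaps(assignments):
    """Tables claimed by more than one domain need an explicit ownership decision."""
    return sorted(t for t, domains in assignments.items() if len(domains) > 1)


def unassigned(assignments):
    """Tables no rule matched; these may point to a missing domain."""
    return sorted(t for t, domains in assignments.items() if not domains)


tables = ["inventory_levels", "order_transactions", "campaign_leads",
          "campaign_budget", "web_sessions"]
result = assign_domains(tables)
print("overlapping:", overlaps(result))
print("unassigned:", unassigned(result))
```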

2. Technology-Oriented Approach

In the technology-oriented approach, data domains are structured around the core technologies and platforms utilized within an organization, such as Customer Relationship Management (CRM) systems, Enterprise Resource Planning (ERP) software, and data warehouses. This approach capitalizes on existing technical infrastructure, making it easier to integrate data within these established platforms. While it offers the advantage of leveraging familiar technical boundaries, it's important to note that this approach may not always align perfectly with business objectives. There's a risk of creating technology silos, potentially limiting cross-platform data insights and collaboration. Organizations considering this approach should carefully weigh its technical benefits against potential limitations in meeting broader business needs.

Examples of Technical Data Domains:

  • Cloud Platform: Cloud resource utilization, cost management, and deployment logs.
  • CRM System: Customer profiles, sales pipeline data, and customer interactions.
  • ERP System: Financials, supply chain data, and human resources information.
  • Data Warehouse: Historical sales data, business intelligence reports, and data marts.

3. Product-Oriented Approach

Finally, in the product-oriented approach, data domains are structured around the organization's products or services. This method offers distinct advantages, such as enabling product-specific insights and streamlining data management for product teams. However, it also presents challenges. Shared data may be duplicated across products, and cross-product analysis requires careful coordination. Despite these hurdles, this approach can be particularly beneficial for companies with diverse product lines, because it keeps the data within each product domain focused and relevant.

Examples of Product Data Domains:

  • Product A: Sales data, customer feedback specific to Product A, and usage analytics.
  • Product B: Development timelines, market performance, and feature adoption rates.
  • Service X: Subscription data, service performance metrics, and user reviews.
  • Service Y: Implementation timelines, service-level agreements (SLAs), and customer satisfaction.

Other Considerations When Organizing Data

As you decide how to categorize your data for consumption in Select Star, consider the following nuances and related practices:

1. Align data organization and domains with permissions in source systems

Some systems, such as Snowflake and BigQuery, allow admins to use tags to control access to tables. If your organization is considering this, align your data domains in Select Star with the groups of assets you want to grant permissions on collectively. You can then sync your tags from Select Star to these systems (via the API or via Snowflake Tag Sync) so that data assets are correctly tagged in the source system as well.
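
For a sense of what tagging looks like on the Snowflake side, or if you ever need to apply tags by hand, the commands involved are ordinary `ALTER TABLE ... SET TAG` statements. The Python sketch below only builds those statements from a table-to-domain mapping. It assumes a tag object named `governance.tags.data_domain` already exists (a hypothetical name) and that table names are fully qualified, unquoted identifiers. It is not how Snowflake Tag Sync works internally.

```python
import re

# Hypothetical tag object; it must already exist (e.g. created with CREATE TAG).
DOMAIN_TAG = "governance.tags.data_domain"

# Accepts one to three dot-separated unquoted identifiers, e.g. db.schema.table.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$")


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def tag_statements(domain_by_table: dict[str, str]) -> list[str]:
    """Build one ALTER TABLE ... SET TAG statement per table, sorted by table name."""
    statements = []
    for table, domain in sorted(domain_by_table.items()):
        if not _IDENTIFIER.match(table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(
            f"ALTER TABLE {table} SET TAG {DOMAIN_TAG} = {sql_literal(domain)};"
        )
    return statements


for stmt in tag_statements({
    "analytics.finance.budget": "Finance",
    "analytics.sales.orders": "Sales",
}):
    print(stmt)
```

Access rules can be attached to the tag rather than to each table (Snowflake's tag-based masking policies are one example). As a result, the domain boundaries you draw in the catalog determine which assets end up governed together.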

2. Document your domains

Your users need to know how to use the domains: how will they know which domains contain the data assets they’re looking for? By documenting each key data domain, along with the principles behind how your domains are defined, you help data consumers find the right data assets.
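
A simple way to picture "documenting the domains" is a small registry that records what each domain holds and who owns it, which can then answer "where should I look for X?". The Python sketch below is a hypothetical structure: the `DomainDoc` fields, the sample entries, and the `where_to_look` and `render` helpers are illustrative assumptions, not a Select Star feature.

```python
from dataclasses import dataclass


@dataclass
class DomainDoc:
    """Hypothetical documentation record for one data domain."""
    name: str
    description: str
    owner: str
    example_assets: tuple[str, ...] = ()


DOMAINS = [
    DomainDoc("Sales", "Customer transactions, sales performance, and targets.",
              "sales-analytics team", ("sales.orders", "sales.targets")),
    DomainDoc("Finance", "Financial statements, budgets, and expenditure tracking.",
              "finance data steward", ("finance.budget",)),
]


def where_to_look(term: str) -> list[str]:
    """Return domains whose name, description, or example assets mention the term."""
    term = term.lower()
    return [
        d.name
        for d in DOMAINS
        if term in d.name.lower()
        or term in d.description.lower()
        or any(term in asset.lower() for asset in d.example_assets)
    ]


def render(doc: DomainDoc) -> str:
    """Format a domain as a short plain-text reference entry."""
    lines = [doc.name, f"  Description: {doc.description}", f"  Owner: {doc.owner}"]
    if doc.example_assets:
        lines.append("  Example assets: " + ", ".join(doc.example_assets))
    return "\n".join(lines)


print(where_to_look("budget"))
print(render(DOMAINS[1]))
```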

3. Use hierarchy to enable discoverability

Select Star supports hierarchical tags: a parent category tag can contain nested category tags. When you need to balance keeping domains simple and discoverable against giving them enough nuance, hierarchy is a useful tool. Consumers can browse a short list of parent categories and drill into nested tags for detail.
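
The roll-up behavior that makes hierarchy useful for discovery can be sketched in a few lines of Python. This assumes a hypothetical two-level hierarchy in which nested tags are named "Parent / Child". Neither the naming scheme nor the helpers come from Select Star. An asset tagged with a nested tag also appears under its parent, and counts per parent count each asset only once.

```python
from collections import defaultdict

# Hypothetical two-level hierarchy: parent category -> nested category tags.
HIERARCHY = {
    "Sales": ["Sales / Pipeline", "Sales / Bookings"],
    "Marketing": ["Marketing / Campaigns", "Marketing / Web"],
}

# Reverse lookup from each nested tag to its parent.
PARENT_OF = {child: parent for parent, children in HIERARCHY.items() for child in children}


def assets_under(tag: str, tagged_assets: dict[str, set[str]]) -> list[str]:
    """Assets tagged with `tag` itself or with any tag nested under it."""
    wanted = {tag, *HIERARCHY.get(tag, [])}
    return sorted(asset for asset, tags in tagged_assets.items() if tags & wanted)


def counts_by_parent(tagged_assets: dict[str, set[str]]) -> dict[str, int]:
    """Number of distinct assets under each top-level category."""
    members = defaultdict(set)
    for asset, tags in tagged_assets.items():
        for tag in tags:
            members[PARENT_OF.get(tag, tag)].add(asset)
    return {parent: len(assets) for parent, assets in sorted(members.items())}


tagged = {
    "crm.opportunities": {"Sales / Pipeline"},
    "billing.bookings": {"Sales / Bookings"},
    "ads.campaign_spend": {"Marketing / Campaigns", "Sales / Pipeline"},
    "web.sessions": {"Marketing / Web"},
}

print(assets_under("Sales", tagged))
print(counts_by_parent(tagged))
```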

Data domains and status tags can be synced back to source systems to facilitate appropriate access to data assets.

Take your next steps in organizing data

Choosing the right approach to organizing your data with data domains in a data mesh depends on the organization's structure, business goals, and data management maturity. Often, a hybrid or iterative approach is necessary to effectively define data domains. Regular reviews and adjustments are crucial to ensure the domain definitions remain relevant and effective, allowing for continuous improvement in alignment with evolving business needs and technological advancements.

Defining data domains is just one step in the data management journey. Alongside the domains themselves, it is most effective to define the responsibilities of data stewards and of business and technical owners within each domain, as well as how cross-domain data decisions will be made.
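
One lightweight way to keep those responsibilities from slipping is to record them next to each domain and check for gaps. The role keys, domains, and team names below are hypothetical placeholders. This Python sketch simply reports any domain that is missing a steward, business owner, or technical owner.

```python
# Roles recommended above for every domain (key names are illustrative).
REQUIRED_ROLES = ("steward", "business_owner", "technical_owner")

# Hypothetical assignments; absent or empty entries count as missing.
domain_roles = {
    "Finance": {"steward": "finance-steward", "business_owner": "fp-and-a",
                "technical_owner": "data-platform"},
    "Sales": {"steward": "sales-steward", "business_owner": "sales-ops",
              "technical_owner": ""},
    "Marketing": {"business_owner": "growth"},
}


def missing_roles(roles_by_domain):
    """Map each domain to the required roles it has not filled."""
    gaps = {}
    for domain, roles in sorted(roles_by_domain.items()):
        missing = [role for role in REQUIRED_ROLES if not roles.get(role)]
        if missing:
            gaps[domain] = missing
    return gaps


print(missing_roles(domain_roles))
```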
