How to Flatten JSON Data in Snowflake 

Published: June 25, 2024 - 8 min read

Julian Alvarado

JSON data is increasingly common in modern data ecosystems, but its nested structure can pose challenges for analysis. This guide explains how to flatten JSON data in Snowflake, making it easier to query and analyze.

Why Flatten JSON Data in Snowflake?

Snowflake is a cloud-based data warehouse that excels at handling structured and semi-structured data, including JSON. 

However, the nested nature of JSON can make complex queries challenging. Flattening JSON data transforms nested structures into a tabular format, offering several benefits:

  • Simplified querying using standard SQL
  • Improved query performance
  • Easier integration with other data sources
  • Enhanced data analysis capabilities

FLATTEN Function 101: Understanding the Basics

The FLATTEN function in Snowflake takes a JSON or semi-structured data type as input and returns a table with one or more rows, depending on the structure of the input data. The function can handle both JSON objects and JSON arrays, flattening them into a relational format.

The syntax for the FLATTEN function is as follows:

FLATTEN(input_column, [path], [outer_only])

  • input_column: The column or expression containing the JSON or semi-structured data to be flattened.
  • path: (Optional) The JSON path to the specific element or array you want to flatten.
  • outer_only: (Optional) A boolean flag that determines whether to include only the outermost level of the JSON structure (true) or all levels (false).

By using the FLATTEN function, you can easily transform complex JSON structures into a format that can be queried and analyzed using standard SQL. Let’s explore some examples of how to use the FLATTEN function to flatten JSON data in Snowflake.

Flatten JSON Data in Snowflake

Let’s walk through the process of flattening JSON data in Snowflake using a step-by-step approach.

Step-by-Step Guide to Flattening JSON Data

Let’s walk through the process of flattening JSON data using a simple example:

Step 1. Create a table with JSON data:

CREATE TABLE customer_data (

    customer_id INT,

    customer_details VARIANT

);

INSERT INTO customer_data (customer_id, customer_details)

VALUES (1, PARSE_JSON(‘

{

    “name”: “John Doe”,

    “email”: “john@example.com”,

    “address”: {

        “street”: “123 Main St”,

        “city”: “Anytown”,

        “state”: “CA”

    }

}

‘));

Step 2. Use the FLATTEN function to extract nested data:

SELECT

    c.customer_id,

    f.value:name::STRING AS customer_name,

    f.value:email::STRING AS customer_email,

    f.value:address.street::STRING AS customer_street,

    f.value:address.city::STRING AS customer_city,

    f.value:address.state::STRING AS customer_state

FROM customer_data c,

LATERAL FLATTEN(input => c.customer_details) f;

This query creates a flattened view of the JSON data, with each nested element as a separate column.

Handling Complex JSON Structures: Objects and Arrays

JSON data often contains more complex structures, including nested objects and arrays. Let’s explore how to handle these using the FLATTEN function.

Flattening JSON Objects

Consider a scenario where the customer_data table contains a JSON object with additional details:

INSERT INTO customer_data (customer_id, customer_details)

VALUES (2, PARSE_JSON(‘

{

    “name”: “Jane Smith”,

    “email”: “jane@example.com”,

    “address”: {

        “street”: “456 Oak Ave”,

        “city”: “Somewhere”,

        “state”: “NY”

    },

    “orders”: [

        {

            “order_id”: 1001,

            “product”: “Widget”,

            “quantity”: 2,

            “price”: 9.99

        },

        {

            “order_id”: 1002,

            “product”: “Gadget”,

            “quantity”: 1,

            “price”: 19.99

        }

    ]

}

‘));

To flatten this more complex structure, we can use multiple FLATTEN functions:

SELECT

    c.customer_id,

    f.value:name::STRING AS customer_name,

    f.value:email::STRING AS customer_email,

    f.value:address.street::STRING AS customer_street,

    f.value:address.city::STRING AS customer_city,

    f.value:address.state::STRING AS customer_state,

Coefficient Excel Google Sheets Connectors
Try the Free Spreadsheet Extension Over 314,000 Pros Are Raving About

Stop exporting data manually. Sync data from your business systems into Google Sheets or Excel with Coefficient and set it on a refresh schedule.

Get Started

    o.value:order_id::INT AS order_id,

    o.value:product::STRING AS product,

    o.value:quantity::INT AS quantity,

    o.value:price::FLOAT AS price

FROM customer_data c,

LATERAL FLATTEN(input => c.customer_details) f,

LATERAL FLATTEN(input => f.value:orders) o;

This query flattens both the outer JSON object and the nested array of orders, creating a row for each order while maintaining the customer information.

Handling JSON Arrays

For scenarios where you have a JSON array at the top level, you can use the FLATTEN function directly on the array:

CREATE TABLE order_data (

    order_batch INT,

    orders VARIANT

);

INSERT INTO order_data (order_batch, orders)

VALUES (1, PARSE_JSON(‘

[

    {

        “order_id”: 1001,

        “customer”: “John Doe”,

        “product”: “Widget”,

        “quantity”: 2,

        “price”: 9.99

    },

    {

        “order_id”: 1002,

        “customer”: “Jane Smith”,

        “product”: “Gadget”,

        “quantity”: 1,

        “price”: 19.99

    }

]

‘));

SELECT

    od.order_batch,

    f.value:order_id::INT AS order_id,

    f.value:customer::STRING AS customer,

    f.value:product::STRING AS product,

    f.value:quantity::INT AS quantity,

    f.value:price::FLOAT AS price

FROM order_data od,

LATERAL FLATTEN(input => od.orders) f;

This query flattens the JSON array, creating a separate row for each order in the array.

Best Practices for Working with JSON in Snowflake

  1. Data Validation: Ensure your JSON data is well-formed before ingestion. Use Snowflake’s TRY_PARSE_JSON function to handle potential errors.
  2. Performance Optimization:
    • Use clustering keys on frequently queried JSON paths
    • Create materialized views for commonly accessed flattened structures
    • Leverage Snowflake’s query result caching for repeated queries
  3. Data Type Casting: Always cast extracted JSON values to appropriate data types to ensure proper comparisons and aggregations.
  4. Partial Flattening: For very large JSON structures, consider flattening only the most frequently accessed elements to balance performance and flexibility.

Troubleshooting Common Issues

  1. Null Values: When flattening arrays, null values may cause unexpected results. Use COALESCE or IFNULL functions to handle these cases.
  2. Performance Issues: If queries on flattened data are slow, check your clustering keys and consider creating materialized views.
  3. Data Type Mismatches: Ensure proper type casting when extracting values from JSON. Use TRY_CAST to handle potential type conversion errors.
  4. Large JSON Documents: For very large JSON structures, consider using Snowflake’s OBJECT_CONSTRUCT and ARRAY_CONSTRUCT functions to build smaller, more manageable JSON objects.

Unlock the Power of Your Snowflake Data

Flattening JSON data in Snowflake is a powerful technique that simplifies analysis of complex, nested data structures. By mastering the FLATTEN function and following best practices, you can unlock valuable insights from your semi-structured data.

To further enhance your data analysis capabilities, consider using Coefficient, a tool that seamlessly integrates Snowflake data with spreadsheet applications. Get started with Coefficient and take your data analysis to the next level.

Sync Live Data into Your Spreadsheet

Connect Google Sheets or Excel to your business systems, import your data, and set it on a refresh schedule.

Try the Spreadsheet Automation Tool Over 350,000 Professionals are Raving About

Tired of spending endless hours manually pushing and pulling data into Google Sheets? Say goodbye to repetitive tasks and hello to efficiency with Coefficient, the leading spreadsheet automation tool trusted by over 350,000 professionals worldwide.

Sync data from your CRM, database, ads platforms, and more into Google Sheets in just a few clicks. Set it on a refresh schedule. And, use AI to write formulas and SQL, or build charts and pivots.

Julian Alvarado Content Marketing
Julian is a dynamic B2B marketer with 8+ years of experience creating full-funnel marketing journeys, leveraging an analytical background in biological sciences to examine customer needs.
350,000+ happy users
Wait, there's more!
Connect any system to Google Sheets in just seconds.
Get Started Free

Trusted By Over 20,000 Companies