Text Processing Formulas

SQL Most used formulas

This article introduces you to the most commonly used formulas that can solve a wide variety of problems you may encounter when working with data, particularly in SQL and related environments. Whether you're working with text processing, numerical calculations, or date manipulation, these formulas will help you streamline your work and improve your efficiency. This guide is aimed at users who are familiar with basic SQL functions and are looking to optimize their queries.

Prerequisites

Basic knowledge of SQL queries and functions is recommended to fully understand and apply these formulas.

CONCAT

This function addresses the common need to combine data from different sources into a single, unified field. It concatenates one or more values into a single result, making it ideal for tasks like combining first and last names or merging address components.

How to use it: CONCAT(value1, value2 [, value3, etc.])

Issue: You have separate columns for first and last name and need to display the full name in a single field. Or, you need to combine address components (street, city, state) into a single address field for reporting or analysis.

Example: CONCAT("John", " ", "Doe") -- Result: "John Doe"

SPLIT

The SPLIT function can be used to extract different categories or subcategories from a string when your data is organized hierarchically with delimiters, such as categories separated by " > " in product descriptions.

How to use it: SPLIT(column, delimiter)[SAFE_OFFSET(position)]

Tips: The offset numbering starts at zero. To extract the first element, use SAFE_OFFSET(0), and so on.

Issue: Your product descriptions are organized into hierarchical categories separated by " > ", and you need to extract the most specific subcategory for targeted marketing efforts.

Example: SPLIT("Electronics > Audio > Headphones > Wireless", " > ")[SAFE_OFFSET(3)] -- Result: " Wireless"

ARRAY_TO_STRING or IFNULL

Although both ARRAY_TO_STRING and IFNULL deal with data manipulation, they serve distinct purposes and are used in different scenarios.

ARRAY_TO_STRING: This function handles arrays, converting them into a readable string format.
IFNULL: This function deals with individual values, specifically handling nulls by replacing them with a default value.

ARRAY_TO_STRING

How to use it: ARRAY_TO_STRING(array, delimiter)

Issue: You have an array of product categories that you want to display in a human-readable format.

Solution: Use ARRAY_TO_STRING to concatenate the array elements into a single string with a specified delimiter.
ARRAY_TO_STRING(['Electronics', 'Clothing', 'Home Goods'], ', ') -- Result:"Electronics, Clothing, Home Goods"

IFNULL

How to use it: IFNULL(expression, null_replacement)

Issue: You are concatenating product names and categories, but some products have null values for their names or categories.

Solution: Use IFNULL to provide a default value for null entries before concatenation.
CONCAT(IFNULL(Product_Name, "Unknown"), " - ", IFNULL(Category, "Uncategorized"))
Result: "Laptop - Electronics" or "Unknown - Office Supplies"

REPLACE

This function tackles the challenge of maintaining accuracy and consistency in textual data. It replaces a specific part of a text string with another value, enabling you to update product descriptions, correct typos, or standardize information across your dataset.

How to use it: REPLACE(column, old_value, new_value)

Issue: You need to update product descriptions with new information, correct spelling errors in customer reviews, or standardize inconsistent data entries, such as replacing abbreviations with full words.
Example: REPLACE(description, "old_product", "new_product")

MULTIPLE_REGEXP_REPLACE

This function performs multiple string replacements simultaneously using regular expressions. It simplifies cleaning and standardizing data by applying several find-and-replace operations in one step.

How to use it: MULTIPLE_REGEXP_REPLACE(column_to_modify, [('old_expression_1', 'new_expression_1'), ('old_expression_2', 'new_expression_2'), ...])

Issue: You have inconsistent data requiring several simultaneous replacements, such as standardizing abbreviations or correcting multiple common misspellings in a single column.
Example: MULTIPLE_REGEXP_REPLACE(text_field, [('dr.', 'doctor'), ('st.', 'street'), ('mrs.', 'madam')])

LIKE and REGEXP_CONTAINS

Both LIKE and REGEXP_CONTAINS are used for string matching, but they differ in their capabilities and complexity.

LIKE: This operator provides basic pattern matching using wildcards '%' (matches any sequence of zero or more characters) and '_' (matches any single character).
REGEXP_CONTAINS: This function supports more advanced pattern matching using regular expressions, allowing for complex pattern definitions and greater flexibility.

LIKE

How to use it: column LIKE 'pattern'

Issue: You need to find all products with names containing "shirt".

Solution: Use LIKE with the '%' wildcard to match any characters before and after "shirt"
Example: product_name LIKE '%shirt%'

REGEXP_CONTAINS

How to use it: REGEXP_CONTAINS(column, regexp)

Issue: You need to find all products with IDs starting with "A" followed by three digits.

Solution: Use REGEXP_CONTAINS with a regular expression to define the pattern.
REGEXP_CONTAINS(product_id, '^A{3}$')

Tips: To learn more about REGEX regular expressions and their usage in BigQuery, refer to our dedicated article

INITCAP

This function addresses the need for consistent and visually appealing text formatting. It capitalizes the first letter of each word in a string, ensuring that names, titles, or any textual data adhere to a standardized presentation style.

How to use it: INITCAP(value)

Issue: You need to format names consistently in a report, ensure that product titles are displayed with proper capitalization, or standardize the presentation of textual data for improved readability.

Example: INITCAP("hello world")
Result: "Hello World"

LENGTH

This function counts the number of characters in a given string. It's useful for performing conditional checks based on the length of text fields.

How to use it: LENGTH(column)

Issue: You need to ensure a text field meets a minimum character length requirement, such as verifying that product descriptions are sufficiently detailed for display.

Example: LENGTH(product_description) IF(LENGTH(product_description) < 150, product_title, product_description)

Quick Answers to Common Issues

My `CONCAT` isn't adding a space between names sometimes. Why?

Ensure you've explicitly included a space as a separate value within your CONCAT function: CONCAT(first_name, ' ', last_name). Also, check your source data for any unwanted leading or trailing spaces.

My `SPLIT` function isn't returning the correct part of the string. What's likely wrong?

Double-check that the delimiter in your SPLIT function exactly matches the separator in your data, including any spaces. Also, verify that the SAFE_OFFSET value you're using corresponds to an actual element in the split string.

Why isn't my `LENGTH` function giving me the expected character count?

Be mindful of any leading or trailing spaces in your data, as these will be counted as characters. Also, ensure you are applying the LENGTH function to the correct column.

For further assistance with SmartFeeds, consider reviewing these articles:

For additional help, feel free to reach out via our Contact Us page.

Text Processing Formulas

Prerequisites

Table of Contents

CONCAT

SPLIT

ARRAY_TO_STRING or IFNULL

ARRAY_TO_STRING

IFNULL

REPLACE

MULTIPLE_REGEXP_REPLACE

LIKE and REGEXP_CONTAINS

LIKE

REGEXP_CONTAINS

INITCAP

LENGTH

Quick Answers to Common Issues

My `CONCAT` isn't adding a space between names sometimes. Why?

My `SPLIT` function isn't returning the correct part of the string. What's likely wrong?

Why isn't my `LENGTH` function giving me the expected character count?

Related Articles

Was this article helpful?

Text Processing Formulas

Prerequisites

Table of Contents

CONCAT

SPLIT

ARRAY_TO_STRING or IFNULL

ARRAY_TO_STRING

IFNULL

REPLACE

MULTIPLE_REGEXP_REPLACE

LIKE and REGEXP_CONTAINS

LIKE

REGEXP_CONTAINS

INITCAP

LENGTH

Quick Answers to Common Issues

My CONCAT isn't adding a space between names sometimes. Why?

My SPLIT function isn't returning the correct part of the string. What's likely wrong?

Why isn't my LENGTH function giving me the expected character count?

Related Articles

Was this article helpful?

My `CONCAT` isn't adding a space between names sometimes. Why?

My `SPLIT` function isn't returning the correct part of the string. What's likely wrong?

Why isn't my `LENGTH` function giving me the expected character count?