6 REGEXEXTRACT Google Sheet Examples

Joel Mesherghi | Published | Updated July 25, 2020

This post features 6 examples that will make your life infinitely easier as a digital marketer, showing how to use REGEXEXTRACT in Google Sheets GIF by GIF.

REGEXEXTRACT is a handy means to extract precise text from strings. If you work in SEO or any digital marketing job that requires a fair bit of URL analysis, then REGEXEXTRACT has your back. 

Additionally, it’s also well worth spending time learning how to use REGEXEXTRACT in Google Sheets. The same syntax can be applied to the powerful REGEXMATCH and useful REGEXREPLACE functions, as well as other Google products like Google Analytics.

I’ll be adding new REGEXEXTRACT examples to this post whenever I find really useful examples, so be sure to keep up to date!

REGEXEXTRACT examples

The syntax principles you need to know

Before we jump into the examples, I want to quickly show you the REGEXEXTRACT syntax you’ll always need to use in Google Sheets.

It’ll help you better understand how REGEXEXTRACT functions work in Google Sheets and the examples we’ll walk through, so bear with me for a moment. 

The syntax you’ll always need to use when writing a REGEXEXTRACT formula:

=REGEXEXTRACT(text, regular_expression)

  • =REGEXEXTRACT is how you’ll always start your formula. It tells Google Sheets the function you want to use.
  • Text is the cell where you want to extract a certain word or number. The text is also referred to as a string
  • Regular_expression – This is the syntax we add to create a REGEX formula

This will become clearer as we work our way through the examples.

REGEXEXTRACT Google Sheet examples

1. REGEXEXTRACT subfolders

Every digital marketer has had to run some form of URL analysis and REGEXEXTRACT can make it all the easier.

Large websites often use subfolders to organise their content e.g. /men/shirts/burton. But subfolders can sometimes make it a little tricky to run ad hoc analysis.

Let’s say I want to find out how many sessions each product category generates for a website. 

The formula we need to use:

=REGEXEXTRACT(A2,".*\/(.+)\/")

Now that I can attribute sessions to product categories, I could then use SUMIFS to get total session counts for each product type. Pretty cool, huh?  

Let’s break down the REGEX formula: 

Step 1:

To start, I used “.*” and this is telling our formula to match any character. Starting with “.*” means our formula will be as succinct as possible, as it automatically matches the /men/ subfolder for us.

Step 2:

I then added “\/” to specify the trailing slash just before the first subfolder e.g. /shirts. Remember when we had to add the escape character “\” to our parameter formula because “?” has a special meaning in REGEX? Well, it’s the same for a trailing slash.

Step 3:

Just like our extracting parameters example, I added “(.+)” and then another “\/” to match the trailing slash just after the first subfolder e.g. shirts/. 

Adding “(.+)” allowed me to pinpoint product categories while ignoring the trailing slashes either side. 

2. REGEXEXTRACT the ending subfolder

Now we know the top-performing product categories, let’s find out what the top-performing brands are. It’s even more straightforward than the last formula.

This formula will extract the ending subfolder of any URL, regardless of URL length. 

The formula we need to use:

=REGEXEXTRACT(A2,".*\/(.+)")

This formula follows the same principles as the last example. However, since we’re trying to extract the very last subfolder, we simply cut out the middle man.

Step 1:

.*” matches any character from the beginning of our formula. 

Step 2:

We then simply target the last trailing slash just before our brand names by adding “\/”.

Step 3:

Finally, we specify that we only want to extract the text after the last trailing slash by using “(.+)”.

3. REGEXEXTRACT parameters

If you’ve worked in digital marketing long enough, then you’ve likely come across URL parameters.

You know, those URLs that end in gibberish? Like:

  • www.exampledomain.com/services/cloud/application-management?rdc=55567

And as part of an analytics audit, you’ll likely have to collect a list of them. REGEXEXTRACT will make that possible in no time.

The formula we need to use:

=REGEXEXTRACT(A2,"\?(.+)=")

Step 1:

Because “?” has a special meaning in REGEX, we need to add a “\ just before our “?”. “\” is called an escape character and is telling our formula to literally match a question mark in our string. 

Step 2:

.+” means we’re going to extract one or more of every character after the “?” symbol. However, you’ll see that “.+” is in between brackets. By adding brackets around “.+” (known as grouping) we’re asking our formula to ignore both the “?” and “=” symbol for our final output.

See what happens if I don’t include the brackets below.

Our extracted parameters now include the question mark and equals symbols e.g “?rdc=”, which we don’t want.

4. REGEXEXTRACT numbers

Extract numbers from the beginning of a string

When it comes to extracting numbers, your REGEX formula is going to wildly differ depending on the use case. We must focus on understanding how to identify patterns as we go through the examples, so you can apply the same logic to your needs. 

If you want to keep tabs on competitors or just get an idea about their investment in content, one little tip is to see how frequently they post content. This can be easy with a web scraping tool like Screaming Frog, but sometimes you need to play around with the output depending on who you’re keeping tabs on.

Using Hubspot as an example, the only way to extract publish dates for each blog is to extract lot’s of unwanted text along with it. You’ll see what I mean in the GIF below.

The formula we need to use:

=REGEXEXTRACT(B2,"published(.+).:[0-9].:")

The key to getting our final output is to see what text preceded the publish date. We can see that it’s “published”. That’s one of our main patterns right there!

Step 1:

We need to tell our formula to exclude everything that comes before the month the blog was published. 

We know that “published” is before the publish dates.  And just like our parameter formula, we need to use grouping to tell our formula to ignore that text; however, we still need to extract one or more of every character after it. Our formula now looks like “published(.+)

Step 2:

We now need to tell our formula to ignore everything after the publish date. The text after the publish dates includes semicolons and other numbers. 

After “published(.+)” we use the full stop “.” to match any character and add “:” to literally match the semicolon symbol. This prevents us from extracting some of the publish time numbers i.e. “1:”. Our formula now looks like “published(.+).:

Step 3:

To match the other unwanted numbers in the string, we use “[0-9]”, which will match any single number between 0 and 9.  Finally, we add another “.:” to prevent us from extracting the remaining publish time numbers i.e. “1:3”. 

Extract numbers from the end of a string

What if we want to extract the updated date, rather than the publish date? We can do that by switching up our formula.

The formula we need to use:

=REGEXEXTRACT(B2,"updated(.+)")

Step 1:

Just like our previous formula, we need to identify what text precedes the updated date, so we can exclude it. We can see it’s “updated” and that’s how we start our formula. Our formula now looks like “updated”.

Step 2:

Because the updated date is the last few bits of text in the string, we simply need to tell our formula to extract one or more of every character after the word “updated”. Our final formula looks like “updated(.+)”.

Extract decimal points

There may be times where you need to extract decimal point figures from a string. I’ve provided a few random examples in the GIF below.

The formula we need to use:

=REGEXEXTRACT(B2,"[0-9]*\.[0-9]+")

Step 1:

We need to match the decimal point and the numbers that precede it. “[0-9]*\.” does this for us.

[0-9]” will match any single number between 0 and 9. “*” by itself means zero or more times and when we add “\.” to match the decimal point, we extract the first part of our string. 

Using the first example in the GIF, we extract 655.00 from the string The price is £655.00.

Step 2:

We then need to match the remaining numbers after the decimal point. Adding “[0-9]+” does this for us.

We know “[0-9]” will match any single number between 0 and 9, but adding a “+” after “[0-9]” i.e. “[0-9]+” will ensure we capture all digits. “+” means one or more times.

5. REGEXEXTRACT domain extensions

As an SEO, you’ll probably be asked by a stakeholder what domain extension they should use for a website launch in a new market.

Importantly, we know Google uses country-code top-level domain names (ccTLDs), such as .es for Spain, as a signal to determine a target locale. You could use REGEXEXTRACT to quickly pull the domain extension of ranking URLs and perform some analysis to provide data-driven insights. This type of analysis is particularly useful for e-commerce websites looking to expand into new markets. 

The formula we need to use:

=REGEXEXTRACT(A2,"(\..+)")

This one is easy. By grouping “( )”, targeting the full stop “(\.)” and then telling our formula to extract everything after the full stop “(\..+)”, we get what we want.

Again, the full stop has a special meaning in REGEX, so we need to add an escape character and tell our formula to literally match a full stop “.” in our string. 

This formula is a quick way to get a high-level view of the landscape and forms the foundation for deeper analysis.

6. REGEXEXTRACT domain name

This last one is pretty beastly. There’s no getting around it. But it’s extremely useful if you’re working on some sort of domain analysis, such as a  backlink audit.

The important thing to take away from this formula is that it will extract a domain regardless of protocol e.g. http or https and whether a domain uses www or not. I’ve highlighted two cells in green to show you what I mean by this.

It’s quite a long formula, so I’m going to just copy and paste it in.

The formula we need to use:

=REGEXEXTRACT(A2,"^(?:.*://)?(?:www\.)?([^:/]*).*$")

Summary

This post walks you through a few REGEXEXTRACT Google Sheet examples that should make extracting precise text or numbers a breeze. 

If you want to play around with and debug your REGEX formulas, then I recommend REGEX 101. You’ll need to select the Golang option, as that’s the type of REGEX Google uses in their products like Google Sheets and Google Analytics.

There are also a ton of REGEX cheat sheets out there. I like to glance at this REGEX cheatsheet.

If you’ve got any cool examples of using REGEXEXTRACT in Google Sheets that I haven’t listed, I’d love to see them. Drop a line in the comment section below.

And if you have questions, I’d love to help out, so feel free to ask away.

Happy REGEXing!

Leave a comment