Regexp 8211 How To Make Preparing Data Easier

Regexp 8211 How To Make Preparing Data Easier

Management Summary

RegExp formulas help with the analysis and preparation of data in various online marketing programs (e.g. Data Studio, Google Analytics,...) - although they are often intimidating due to their complex appearance, with calm, patience and a curious mindset, the basics can be learned quickly. In order to use RegExp formulas efficiently, it is important that the naming of dimensions is consistent and correct.

We explain the 3 most important and useful RegExp formulas and how you can use them::

  1. REGEXP_EXTRACT
  2. REGEXP_MATCH
  3. REGEXP_CONTAINS

Prepare data with RegExp formulas

Basically, a RegExp formula is made up of the data source, the dimension and the formula itself. For example, Campaign Manager, DV360, and Google Analytics are commonly used data sources for dashboards.

The dimensions come from the data source. “Placement” can be taken from a Campaign Manager data source and “Insertion Order” from a DV360 data source – but mixing does not work. This requires a data blend specifically for the dashboard, or a BigQuery data source that can be accessed again and again.

There are two different ways to create RegExp formulas in Data Studio. Only for a specific table or as another dimension for the entire dashboard.Blog RegExp - RegExp Formel in Data Studio erstellenCreate a RegExp formula in Data Studio:1.: For the selected table / 2.: For the dashboard

To use RegExp formulas, we still need a basic understanding of the different characters that can be used to create the formula. There is stillsome more, but here the most important ones are briefly explained:

The dot and the asterisk

To replace words and numbers in a RegExp formula without having to specify every possible combination, dot-asterisk ( .* ) can be used. The dot ( . ) is a symbol for a character in RegExp. So it stands for letters, numbers and special characters that can be used in the name.

The asterisk ( * ) indicates that there is a longer number of characters – thus covering an entire word. Dot-Asterisk replaces the characters in a formula until it encounters the next character specified in the RegExp formula. Therefore, it is common to use hyphae ( – ) or underscores ( _ ) in terms to simplify working with RegExp formulas.

For example, for

Awareness_Subject

Engagement_Subject

not two formulas can be created, but only one

.*_Subject

In this case, the dot asterisk replaces both “awareness” and “engagement”.

The pipe

Pipes ( | ) are used in RegExp formulas to specify two words at the same time in a formula. For example, several formats can be taken together:

(HPA|MediumRectangle)

The brackets and quotation marks

Brackets are used in two different ways in RegExp formulas. On the one hand, every RegExp formula needs a round bracket ( ( & ) ) at the beginning and end of the function to delimit it.

The second function of the parentheses is to highlight part of the dimension name. More about this in the examples for REGEXP_EXTRACT and REGEXP_MATCH

The quotation marks (“ “) are placed at the beginning and end of the text used in the RegExp formula. Both for the name of the dimension and for the self-selected text that is specified in the formula.

The REGEXP data set

The following data set is used for the following examples:

Display_Awareness_HPA_Cat

Display_Awareness_HPA_Dog

Display_Awareness_MediumRectangle_Cat

Display_Awareness_MediumRectangle_Dog

Display_Engagement_HPA_Cat

Display_Engagement_HPA_Dog

Display_Engagement_MediumRectangle_Cat

Display_Engagement_MediumRectangle_Dog

From this, information can be drawn about the channel (display or video), the funnel level (awareness or engagement), the format (HPA, medium rectangle or preroll) and the subject (cat or dog).

In order to use RegExp formulas, it is important that the naming of the dimensions is consistent and correct.

REGEXP_EXTRACT

The EXTRACT formula is used to pull individual words from the dimension. For example, the funnel stage – awareness or engagement – can be taken from the data set:

REGEX_EXTRACT (Placement,(Awareness|Engagement))
Result:
Commitment
Awareness

Since there is both awareness and engagement, these are summarized with a pipe in the formula. The example data set of 12 lines becomes 2 – one line per funnel level.

However, the formula can also be applied more flexibly. If there are further funnel levels (e.g. performance or loyalty), these can either be added or read out using the dot asterisk ( .* ).

The advantage of the dot asterisk ( .* ) is that not all possible funnel stages have to be listed. This flexibility is particularly helpful with varying names, such as subject names.

REGEX_EXTRACT (Placement,”Display_(.*)_.*”)

Display_Engagement_MediumRectangle_Cat

Display_Awareness_HPA_Dog

Result:

Commitment

Awareness

This formula is a little more complex in structure because the exact words are no longer specified. The formula must go to the place where the words are located.

The structure of the formula is as follows:

REGEX_EXTRACT (   )

Placement

,

= thefunctionthat is used

= thedimensionthat is used

= the decimal place after the dimension indicates that now theRulethe functionfollows.

”Display_

(.*)

_.*”)

= the beginning of the placement name

= with theBracketsis signaled that this word – replaced by dot-asterisk –taken outwill

= through the last dot asterisk therestthe placement namereplaced.

With thequotation marksbecomes the dimension nameenclosedand with thatround brackettheRegExp formula.

In the example data set, all lines start the same, with “Display_”, which makes use of the formula. However, if the lines have different beginnings, such as “Display_” and “Video_”, the beginning can also be replaced with a dot asterisk ( .* ).

REGEX_EXTRACT(Placement,”.*_(.*)_.*”)

Display_Engagement_MediumRectangle_Cat

Display_Awareness_HPA_Dog

Result:

Commitment

Awareness

Using the same logic, words that later appear in the placement name can also be removed. For example, this formula takes out the format:

REGEX_EXTRACT(Placement,”.*_.*_(.*)_.*”)

Display_Engagement_MediumRectangle_Cat

Display_Awareness_HPA_Dog

Result:

MediumRectangle

HPA

EXTRACT is suitable for quickly summarizing and analyzing data.

REGEXP_MATCH

The MATCH function is used in conjunction with another function. Data Studio can be used to assign all matched rows to a category.

CASE
WHEN (REGEXP_MATCH Formula 1) THEN “Category 1”
WHEN (REGEXP_MATCH Formula 2) THEN “Category 2”
ELSE “missing value”
END

Since this is a self-programmed field, it is important to use CASE at the beginning and END at the end to narrow down the function.

WHEN describes the condition under which THEN occurs. ELSE must be used with WHEN formulas – this makes it possible to combine everything that does not fall into the previously determined categories, or to display an error message.

The structure of the pure MATCH RegExp formula is similar to the EXTRACT formula:

REGEXP_MATCH(Placement,”.*_(Cat|Dog)”)

Display_Engagement_Medium Rectangle_Cat

Display_Awareness_HPA_Dog

Together with the WHEN – THEN function, for example, a category can be defined for the subject:

CASE
WHEN REGEXP_MATCH(Placement,”.*_(Cat|Dog)”) THEN “Animal”
ELSE (“other”)
END

Result:
animal

The MATCH formula is also suitable for reprocessing abbreviations that are in the placement name:

CASE
WHEN REGEXP_MATCH(Placement,”.*_(HPA)_.*”) THEN “Half Page Ad”
ELSE (“other”)
END

Result:
Half Page Ad

MATCH in conjunction with WHEN – THEN is one of the most flexible functions in Data Studio.

REGEXP_CONTAINS

Unlike the functions discussed so far, CONTAINS does not return a word or category, but rather whether a statement is TRUE or FALSE.

It should therefore be the formula:

REGEX_CONTAINS (Placement,”Display”)

Display_Engagement_MediumRectangle_Cat

Display_Awareness_HPA_Dog

result in TRUE for all lines of the example data set.

CONTAINS is not used in data analysis, but rather helps troubleshooting to find typos or incorrect naming orders.

Conclusion:

With RegExp formulas, data sets can be summarized, analyzed and controlled efficiently, and they also make working with large data sets easier. We promise, after the initial shock is over, it’ll be great fun!

Would you like to learn more about RegExp and need help with your dashboards? Contact us viakontakt@e-dialog.group

e-dialog office Vienna
Relevant content

More about Analytics