RegExp – Ninja Level Feed Optimization

Posted on April 17, 2014 by Mateusz Miodek

Those of you who are already using DataFeedWatch might have noticed the word “regexp” in the mapping options. In this article I will explain how RegExp can be used in our app, but let’s first clarify what exactly RegExp is.

A regular expression (RegExp for short) is a special text string to describe a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. RegExp works on the same principles but can do so much more.
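To make the wildcard comparison concrete, here is a minimal sketch in Python. The choice of Python and the file names are assumptions for illustration only; the article does not say which RegExp engine DataFeedWatch uses.

```python
import fnmatch
import re

# Hypothetical file names, used only for illustration.
names = ["notes.txt", "report.pdf", "todo.txt", "txt.bak"]

# Wildcard notation: *.txt
wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# The same search as a regular expression: anything, then a literal ".txt"
regex_hits = [n for n in names if re.fullmatch(r".*\.txt", n)]

# Something a simple wildcard cannot express in one pattern:
# lowercase letters followed by either ".txt" or ".pdf"
txt_or_pdf = [n for n in names if re.fullmatch(r"[a-z]+\.(txt|pdf)", n)]

print(wildcard_hits)  # ['notes.txt', 'todo.txt']
print(regex_hits)     # ['notes.txt', 'todo.txt']
print(txt_or_pdf)     # ['notes.txt', 'report.pdf', 'todo.txt']
```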

It takes some practice to get the hang of RegExp, but once mastered it comes in very handy. For those who are interested in learning regular expressions, I can recommend this tutorial.

It is also a good idea to test your RegExp before deploying it. There are many online tools out there to do just that. The tool I use is called Rubular.

So let’s move on to real life examples to see how RegExp can be helpful when it comes to feed optimization.

Example 1

Imagine you need to create a field ‘color’ for your Google Shopping feed. You do not have a field for color in your store, but you know that all the titles of your products end with a color name (e.g. Adidas Mens Snova Glide 5 Running Shoes Green).

The best way to deal with this situation is to map color from name and add a replace rule with RegExp like this:

replace (.*)(\s[^\s]+) with $2

What it does is:

  1. divide each name into two groups:
    group 1 – everything except the last word, represented by (.*) where
    .* => any single character appearing any number of times
    group 2 – the last word, represented by (\s[^\s]+) where
    \s => any whitespace character
    [^\s]+ => any single character except whitespace appearing at least once
  2. replace the existing value, which can be described as (.*)(\s[^\s]+), with a new value which is group 2 (written as $2 in RegExp notation)

The outcome of this mapping for “Adidas Mens Snova Glide 5 Running Shoes Green” would be “Green”.
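The rule can be tried outside the app with a short Python sketch. This is an illustration under assumptions, not DataFeedWatch's implementation: Python writes the group reference as \2 rather than $2, and the function name is hypothetical. Note that group 2 also captures the whitespace before the last word, so the sketch strips it; the app presumably trims the value in the same way.

```python
import re

# Group 1: everything before the last word.
# Group 2: one whitespace character plus the last word.
LAST_WORD = re.compile(r"(.*)(\s[^\s]+)")

def extract_color(title: str) -> str:
    """Keep only the last word of a title (hypothetical helper)."""
    # \2 is Python's spelling of $2; strip() removes the captured space.
    return LAST_WORD.sub(r"\2", title).strip()

print(extract_color("Adidas Mens Snova Glide 5 Running Shoes Green"))  # Green
```

A title consisting of a single word contains no whitespace, so the pattern does not match and the value is returned unchanged.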

Example 2

Imagine you create a price field for a channel that accepts 2 decimal places (e.g. 12.45) and your prices have 4 (12.4500). Again, a replace rule with RegExp comes in handy. To fix the format we need to set it like this:

replace ([0-9]+\.[0-9]{2})([0-9]{2}) with $1

Similarly to the previous example this rule:

  1. divides each price into 2 groups:
    group 1 – everything except the last two decimal places, represented by ([0-9]+\.[0-9]{2}) where
    [0-9]+ => any whole number
    \. => dot character
    [0-9]{2} => any 2-digit number
    group 2 – the last two decimal places, represented by ([0-9]{2})
  2. replaces the existing value, which can be described as ([0-9]+\.[0-9]{2})([0-9]{2}), with a new value which is group 1 ($1)

The outcome of this mapping for 12.4500 is 12.45.

Be advised that this mapping does not round the price to two decimal places, but instead cuts off the last two digits.
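The difference between cutting off and rounding is easy to see in a small Python sketch. As before, Python and the function name are assumptions made for illustration, and \1 is Python's spelling of $1.

```python
import re

# Group 1: whole number, dot, first two decimals. Group 2: last two decimals.
PRICE = re.compile(r"([0-9]+\.[0-9]{2})([0-9]{2})")

def truncate_price(price: str) -> str:
    """Drop the last two of four decimals (hypothetical helper)."""
    return PRICE.sub(r"\1", price)

print(truncate_price("12.4500"))  # 12.45
print(truncate_price("12.4599"))  # 12.45 (cut off, not rounded to 12.46)
print(truncate_price("12.45"))    # 12.45 (no match, value left as it is)
```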

Example 3

Let’s say you want to set product_type for Google Shopping as a main category of your products (e.g. Car parts) but in your system you have only the whole category paths (e.g. Car parts > BMW > 320i > 2013).

What you need to do here is remove everything starting from “ >”. The rule that covers this would look like this:

replace \s>.* with an empty value

\s>.* => any whitespace character followed by “>” followed by any single character appearing any number of times

The outcome of this mapping for “Car parts > BMW > 320i > 2013” would be “Car parts”.
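A minimal Python sketch of this rule, with a hypothetical function name and Python chosen only for illustration:

```python
import re

# A whitespace character, then ">", then everything up to the end of the value.
CATEGORY_TAIL = re.compile(r"\s>.*")

def main_category(path: str) -> str:
    """Keep only the first level of a category path (hypothetical helper)."""
    return CATEGORY_TAIL.sub("", path)

print(main_category("Car parts > BMW > 320i > 2013"))  # Car parts
print(main_category("Car parts"))                      # Car parts
```

Because the pattern starts at the first “ >” and .* runs to the end of the value, a single replacement removes every sub-level at once.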

Example 4

For the last example, imagine a channel that requires UPCs, but in your system not all products have UPCs, and the UPCs that you do have are not all in the proper format (12-digit). If you send a feed with products for which UPCs are empty or improper, the whole feed could be rejected. What you need to do is exclude those products. This can be achieved with a single exclude rule using, guess what, RegExp.


What we do here is include only products for which the UPC is exactly a 12-digit number. In other words, include products only if the UPC matches the regexp ^[0-9]{12}$
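The same filter can be sketched in Python. The product records and field names below are made up for illustration, and the code is only an approximation of what the include rule does inside the app.

```python
import re

# ^ and $ anchor the pattern, so the value must be exactly 12 digits.
UPC_PATTERN = re.compile(r"^[0-9]{12}$")

# Hypothetical sample products; ids and field names are assumptions.
products = [
    {"id": "A1", "upc": "036000291452"},  # exactly 12 digits
    {"id": "A2", "upc": ""},              # empty
    {"id": "A3", "upc": "12345"},         # too short
    {"id": "A4", "upc": "03600029145X"},  # contains a non-digit
    {"id": "A5", "upc": None},            # missing
]

def has_valid_upc(product: dict) -> bool:
    """True only if the UPC is exactly a 12-digit number."""
    return UPC_PATTERN.search(product.get("upc") or "") is not None

included = [p["id"] for p in products if has_valid_upc(p)]
print(included)  # ['A1']
```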

These are only a few of the countless ways RegExp can be used. The rule of thumb is that whenever you need some complex mapping, RegExp is your “weapon of choice”.

If you have any mapping issues, please describe them in the comments and I will try to find a proper RegExp to deal with them (if possible).


About DataFeedWatch

DataFeedWatch is data feed management software that enables merchants on Magento, Shopify, Volusion, BigCommerce, 3DCart and numerous other shopping carts to optimize their product data feed for Google and 200+ Comparison Shopping Engines.


Posted in: Data Feeds,DataFeedWatch News,Features & Functionality,Tips & Tricks

  • paulo rossini


    Great post with good examples, congratulations !

    What would a Regex for limiting the number of characters look like? Google Shopping recommends using no more than 70 characters in the product title, for example.

    • Jacques van der Wilt

      The regexp for that would be: replace (.{70}).* with $1. However, there is no need to do this, as DataFeedWatch now automatically truncates titles to 70 characters.

  • Mark

    Love it, awesome article, thanks!
