This article is derived from an email I sent my development team. It explains a strategy for avoiding the duplicate content penalties that can result when Googlebot indexes every variation of the product category listing pages.
This relates specifically to the Officesmart stationery and office products website I'm currently working on, but the principles apply to any digital commerce site with paginated category pages, especially those with too many products to list quickly on a "view all" type page.
The current category nav menu links to URLs in the form:
Changing the View drop-down adds the pageSize and pageNumber querystring parameters (the same link as above, with View changed to 36):
Changing the Sort by drop-down changes the orderBy parameter (same page changed to Sort By : PRICE HIGH TO LOW):
That all makes perfect sense.
Removing duplicate content
However, from a Google indexing perspective, varying the orderBy, pageSize, onSale and specialOffer parameters produces content that is also accessible somewhere else (it’s duplicated), yet there’s no one-to-one match between any two of these pages, because each combination of parameters produces a different listing. The simplest solution, therefore, is to tell Google not to index any URL that includes these parameters. This can be achieved using the following tag on pages whose URLs include those parameters:
<meta name="robots" content="noindex, follow">
The “noindex” value tells Googlebot not to index the page; “follow” instructs it to still follow the links on the page, which we want because Google should still find every path to the product pages. The canonical <link> tag should also be omitted on these pages, as there is no direct alternative to point to.
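As a rough sketch of the rule above, the check could look something like this in Python. The function name robots_meta and the NOINDEX_PARAMS constant are illustrative assumptions, not part of the Officesmart codebase; only the four parameter names and the meta tag itself come from this article.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Parameters named in the article as producing duplicate listings.
NOINDEX_PARAMS = {"orderBy", "pageSize", "onSale", "specialOffer"}


def robots_meta(url: str) -> Optional[str]:
    """Return the robots meta tag for a listing URL, or None if the page is indexable.

    Hypothetical helper: a page is treated as noindex as soon as any of the
    duplicate-producing parameters appears in its querystring.
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if NOINDEX_PARAMS.intersection(query):
        return '<meta name="robots" content="noindex, follow">'
    return None
```

Calling robots_meta on a URL with only category (and optionally pageNumber) returns None, so no robots tag is emitted and the page stays indexable.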
Where the orderBy, pageSize, onSale and specialOffer parameters are excluded from the URL, the site should assume the default options (orderBy=1, pageSize=12, onSale=0, specialOffer=0). The default URL structure (for page 1) therefore becomes:
Our <title> tag for this page becomes:
<title>Officesmart | Education & Art</title>
- or -
<title>Officesmart | Education & Art | Page 1</title>
Either is okay so long as the system is consistently applied.
Our canonical href becomes:
<link rel="canonical" href="https://example.com/shop?category=_03">
Which is handy because this is what’s already in the sitemaps.xml file.
Alternatively, we can use:
<link rel="canonical" href="https://example.com/shop?category=_03&pageNumber=1">
Again, either option is okay so long as the system is consistently applied and we only ever have one version of the canonical href referring to the page.
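One way to guarantee a single canonical form is to build the tag in one place. The sketch below is only an illustration: canonical_link is a hypothetical helper, it assumes the URL always carries a category parameter, and it picks the first convention (no pageNumber on page 1). The other convention would work equally well if applied everywhere.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Parameters named in the article; pages carrying them get no canonical tag.
NOINDEX_PARAMS = {"orderBy", "pageSize", "onSale", "specialOffer"}


def canonical_link(url: str) -> Optional[str]:
    """Build the canonical <link> tag for a listing URL (hypothetical helper).

    Returns None for noindex pages. Assumes a category parameter is present
    and that page 1 is written without pageNumber.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if NOINDEX_PARAMS.intersection(query):
        return None
    params = {"category": query["category"][0]}
    page = int(query.get("pageNumber", ["1"])[0])
    if page > 1:
        params["pageNumber"] = str(page)
    href = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))
    return f'<link rel="canonical" href="{href}">'
```

With this convention, a request for pageNumber=1 and a request with no pageNumber at all both produce the same canonical href.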
Pagination is a bit different. Using the pagination links updates the pageNumber parameter (page 4 of same selection under current structure):
Once again, we want Google to ignore any pages that include the orderBy, pageSize, onSale and specialOffer parameters, so the default pagination URL (based on the current structure) will look like:
The content and links on each page are unique compared to any other pageNumber value. Therefore, we need to create canonical <link> tags for every paginated standard page with this URL structure:
<link rel="canonical" href="https://example.com/shop?category=_03&pageNumber=4">
The orderBy, pageSize, onSale and specialOffer parameters are assumed to take their default values, and we still don’t want to index any URL that includes these parameters (using the robots meta tag above), even if there’s a pageNumber parameter.
Our <title> tag for this page, which should also be unique for each page, becomes:
<title>Officesmart | Education & Art | Page 4</title>
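To keep titles consistent with the canonical tags, the title can be generated from the same inputs. This is a minimal sketch only: page_title and the label_first_page flag are made-up names, and the category name is inserted verbatim, as in the examples in this article.

```python
def page_title(category_name: str, page_number: int = 1,
               site_name: str = "Officesmart",
               label_first_page: bool = False) -> str:
    """Build the <title> tag for a category listing page (hypothetical helper).

    label_first_page selects between the two page-1 conventions described
    earlier: with or without a "Page 1" suffix. Pages 2 and up always get
    a suffix so every paginated page has a unique title.
    """
    parts = [site_name, category_name]
    if page_number > 1 or label_first_page:
        parts.append(f"Page {page_number}")
    return "<title>" + " | ".join(parts) + "</title>"
```

Whichever page-1 convention is chosen, setting label_first_page once in a shared place keeps it applied consistently.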
Additionally, we can add the following pagination tags:
<link rel="prev" href="https://example.com/shop?category=_03&pageNumber=3">
<link rel="next" href="https://example.com/shop?category=_03&pageNumber=5">
These provide additional detail to Google to tell it that this page is part of a paginated set. The “prev” tag should be omitted from page 1; the “next” tag should be omitted from the last page in the pagination set.
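The omission rules for the first and last pages are easy to get wrong, so here is a small illustrative sketch. The names pagination_links and page_url are assumptions, as is the choice to link to page 1 without a pageNumber parameter; base_url is expected to already contain the category querystring.

```python
from typing import List


def page_url(base_url: str, page: int) -> str:
    """URL of a given page; page 1 is assumed to be the bare category URL."""
    return base_url if page == 1 else f"{base_url}&pageNumber={page}"


def pagination_links(base_url: str, page_number: int, last_page: int) -> List[str]:
    """Return the rel=prev / rel=next tags for one page of a paginated set."""
    links = []
    if page_number > 1:  # no "prev" on the first page
        links.append(f'<link rel="prev" href="{page_url(base_url, page_number - 1)}">')
    if page_number < last_page:  # no "next" on the last page
        links.append(f'<link rel="next" href="{page_url(base_url, page_number + 1)}">')
    return links
```

For example, pagination_links("https://example.com/shop?category=_03", 1, 10) returns only a "next" tag, and the same call for page 10 returns only a "prev" tag.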
If we include the query parameter pageNumber=1 for the first page, we should ensure that the canonical href is pointing to:
- or -
Either option is fine, so long as it’s consistent in all occurrences.
A couple of additional points:
- Ideally we want to incorporate category names instead of numbers (e.g. category=educationart instead of category=_03)
- Wherever there is a unique canonical <link> tag, there should also be a matching, unique <title> tag
- The search refine options (price, sub-category, brand) do not change the querystring, so can be ignored (YAY!)
Wow…that was a lot of brain work. Well done if you made it this far – hope it makes sense!
Please let me know if you have any questions, or add your comments below.