Managing Secure Pages for eCommerce Sites
When you're managing an eCommerce site, securing your shopping cart and other pages where sensitive customer data is passed is essential. But what happens to the other pages on your website when you are using an SSL certificate? Unless your SSL and site are configured correctly, an HTTPS version of your site can create duplicate content and dilute link equity throughout your site, which can hurt your rankings and SEO.
In an ideal world you’d secure only those pages that need SSL and leave all other pages on HTTP. In reality, that option isn't always available, whether because of your hosting provider or your SSL provider. So what do you do if you fall into either category and have little control over your SSL configuration?
The first step is to check if you actually have any duplicated pages on your site and whether this is even an issue. Use the site: and inurl: commands to check whether or not any of your secure pages have made it into the index. Ideally this search would return zero results, but if you see results you’ll want to try to get those pages to fall out of the index.
Example: site:yourdomain.com inurl:https
What To Do If You Have Duplicate Content Due To Secured Pages (https)
If you do find pages, it’s likely the result of an internal (or external) link to these pages. For shopping carts, it’s usually the cart link in the top right corner of the site. This is how Google crawls and eventually indexes your HTTPS pages. If you have identified the link that you believe is allowing Google to crawl that page, you may consider adding the rel="nofollow" attribute to it as an extra precaution.
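One way to track down those links is to scan a page's HTML for anchors that point at the HTTPS version of your own host. The following is a minimal sketch using only the Python standard library; the class and function names and the sample markup are illustrative assumptions, and fetching the page source is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HttpsLinkFinder(HTMLParser):
    """Collect <a> tags whose href is an HTTPS URL on the given host."""

    def __init__(self, host):
        super().__init__()
        self.host = host
        self.links = []  # list of (href, has_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        parsed = urlparse(href)
        if parsed.scheme == "https" and parsed.hostname == self.host:
            # rel can hold several space-separated values
            rel = (attr.get("rel") or "").lower().split()
            self.links.append((href, "nofollow" in rel))


def find_https_links(html, host):
    """Return every HTTPS link to `host` found in `html`."""
    finder = HttpsLinkFinder(host)
    finder.feed(html)
    return finder.links


# Hypothetical page fragment for illustration
sample = """
<a href="https://yourdomain.com/cart">Cart</a>
<a href="http://yourdomain.com/products">Products</a>
<a href="https://yourdomain.com/checkout" rel="nofollow">Checkout</a>
"""

for href, nofollow in find_https_links(sample, "yourdomain.com"):
    print(href, "nofollow" if nofollow else "followed")
```

Links reported as "followed" are the candidates for the nofollow attribute. Relative links are ignored here, since they inherit the protocol of the page they sit on.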
While nofollow, robots.txt, and redirects are all viable solutions, canonical tags are an easy way to clean up the index when it comes to non-secured and secured pages. For example, let’s say you have an eCommerce site, and the secure versions of product pages are being found in the index. While this isn’t the worst instance of duplicate content, it is technically still duplicate. To remove these secure pages from the index, you can add a canonical tag pointing to the non-secure version of each page. This is generally enough to have the secure version eventually wash out of search, but not always, and for those situations further steps can be taken.
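As a rough illustration of what that canonical tag looks like, the sketch below builds one from a page's HTTPS URL by swapping the scheme to HTTP. The function name and example URL are assumptions made for this example. It keeps any query string, drops the fragment, and does not handle URLs that carry an explicit port.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_tag(url):
    """Return a <link rel="canonical"> tag pointing at the HTTP version of url."""
    parts = urlsplit(url)
    # Swap the scheme to http, keep host, path and query, drop the fragment
    http_url = urlunsplit(("http", parts.netloc, parts.path or "/", parts.query, ""))
    return '<link rel="canonical" href="%s" />' % escape(http_url, quote=True)


print(canonical_tag("https://yourdomain.com/products/blue-widget"))
# prints: <link rel="canonical" href="http://yourdomain.com/products/blue-widget" />
```

The resulting tag belongs in the head of both the HTTP and HTTPS versions of the page, so that each version names the same preferred URL.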
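If that doesn't work, you may also consider adding 301 redirects from the HTTPS versions of your URLs that appear in search to their HTTP equivalents, then adding the pages you want disallowed to robots.txt so that they do not get crawled. Finally, adding the noindex tag to all secure pages should clean everything up if canonical tags alone do not work. Keep in mind that Google can only see a redirect, canonical tag, or noindex tag on a page it is allowed to crawl, so hold off on the robots.txt disallow until the pages have dropped out of the index.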
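The redirect rule can be sketched as a small decision function: given a requested URL, return the HTTP address to send a 301 to, or nothing if the page should stay where it is. This is a sketch under stated assumptions, not a server configuration. The path prefixes in SECURE_PREFIXES are hypothetical, and exempting them from the redirect is a choice made for this example so that the cart and checkout stay on HTTPS.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical paths that must remain on HTTPS (assumption for this sketch)
SECURE_PREFIXES = ("/cart", "/checkout", "/account")


def is_secure_path(path):
    """True if path is one of the secure sections or sits beneath one."""
    return any(path == p or path.startswith(p + "/") for p in SECURE_PREFIXES)


def redirect_target(url):
    """Return the HTTP URL to 301-redirect to, or None if no redirect applies."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.scheme != "https" or is_secure_path(path):
        return None
    # Same host, path and query, served over http; the fragment is dropped
    return urlunsplit(("http", parts.netloc, path, parts.query, ""))


print(redirect_target("https://yourdomain.com/products/blue-widget"))
# prints: http://yourdomain.com/products/blue-widget
print(redirect_target("https://yourdomain.com/cart"))
# prints: None
```

In practice this logic would live in the web server or application layer that answers the request, which would issue the 301 whenever the function returns a URL.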
It’s hard to put a finger on how much weight Google gives these factors, especially given that Google announced in June of 2014 that all websites should be on HTTPS. However, aside from any potential duplicate content penalty, having duplicate versions of your site hurts in more profound ways. Duplicate content causes a dilution of link equity: PageRank is spread across a wider set of targets instead of being concentrated on the preferred version of each page. By allowing an HTTPS version of your site to exist unchecked, you have created twice as many targets for Google to deal with. You must use either canonical tags or 301 redirects to bring the number of pages Google has to deal with back down to a single version of the site. And even that doesn't stop Google's crawler from having to deal with both versions of the site.
Tips for Duplicate Content From HTTPS Pages on WordPress Sites
For those using WordPress as their CMS, Yoast has a great tool for setting up your canonical tags so that you don’t have to worry about secure pages in the index. Simply go to your Yoast settings, and under permalinks you’ll find “Canonical Settings”. While some sites or themes may default to HTTP, you want to ensure you have selected “force http”.
To ensure that you have done this correctly, navigate to one of your HTTPS pages, right-click, and choose “view source”. In the view source window, press Ctrl+F to open a search box and look for “canonical”. You should see the http:// version of your site in the canonical tag. If you do see that version but the site is still showing HTTPS versions in search, it may take a few days or even weeks for those pages to fall out of the index.
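If you have many pages to check, the same inspection can be scripted. The sketch below parses a page's source with the Python standard library and reports whether its canonical tag points at an http:// URL. The class and function names and the sample markup are assumptions for illustration, and the page source is passed in as a string rather than fetched.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_is_http(html):
    """True if the page has a canonical tag and every one uses http://."""
    finder = CanonicalFinder()
    finder.feed(html)
    return bool(finder.canonicals) and all(
        href.lower().startswith("http://") for href in finder.canonicals
    )


# Hypothetical page source for illustration
page = '<head><link rel="canonical" href="http://yourdomain.com/products/blue-widget" /></head>'
print(canonical_is_http(page))
# prints: True
```

A result of False means the page either has no canonical tag or has one that still points at the HTTPS version.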