Updated Sep 19, 2020 to include WordPress’s default XML sitemap generation.
This post walks you through how to find your sitemap in easy-to-follow GIFs. I’ll also show you how to quickly audit your sitemap once you’ve found it!
Just a heads up, while there are a few different types of sitemap, I’ll be focusing on the most common type, which is an XML sitemap.
What is a sitemap?
A sitemap is a list of the indexable pages that belong to a website. It’s an efficient way to communicate new pages and content updates to web crawlers. Even though web crawlers can discover new and updated content simply by crawling a website, search engines like Google don’t have infinite crawling resources and use sitemaps to help them out. This is particularly relevant to websites that have hundreds of thousands of pages.
What are the different kinds of sitemaps?
The most common type of sitemap, and the focus of this post, is an XML sitemap. Google supports a range of sitemap formats, including:
- RSS, mRSS, and Atom 1.0
- Google Sites
And sitemaps for different media types:
If your website has loads of indexable URLs, then it might use a sitemap index file. A single XML sitemap file can contain a maximum of 50,000 URLs, so if a website has a bunch of XML sitemaps, or you want to better organise your content, you can list each individual sitemap in a sitemap index file. In layman’s terms, a sitemap of sitemaps.
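If you’d like to tell the two file types apart with a script, here’s a minimal Python sketch. It assumes the standard sitemap XML layout, where a sitemap index has a `sitemapindex` root element and a regular sitemap has a `urlset` root. The `read_sitemap` function and the sample file names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the "{namespace}" prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_sitemap(xml_text):
    """Return ("index" or "urlset", list of the <loc> URLs in the file)."""
    root = ET.fromstring(xml_text)
    kind = "index" if local_name(root.tag) == "sitemapindex" else "urlset"
    locs = []
    for entry in root:  # each <sitemap> (index) or <url> (regular sitemap)
        for child in entry:
            if local_name(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return kind, locs


# Hypothetical sitemap index, used only to demonstrate the function.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.exampledomain.com/post-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://www.exampledomain.com/page-sitemap.xml</loc></sitemap>
</sitemapindex>"""

kind, locs = read_sitemap(SAMPLE_INDEX)
print(kind, locs)
```

If the result is "index", each URL in the list is itself a sitemap that you can open in the same way.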
Okay, now that you have a brief overview of sitemaps, let’s find out how you can find your website’s sitemap.
How to find your website’s sitemap
1. Type your sitemap URL
Just like a webpage, a sitemap.xml file has to be hosted on your web server so that search engines can access it. That means a sitemap.xml file has a URL that you can access directly by typing it into your browser’s address bar.
The naming convention of the sitemap.xml URL will depend on the person who originally created it, but will usually follow one of the below.
Alright, I’m going to try it out in the wild by pasting https://www.delta.com/sitemap.xml into my browser’s address bar. Let’s see what happens…
Above: Delta Air Lines XML sitemap file
It worked! I’ve found Delta Air Lines’ XML sitemap file. Give it a shot on your website.
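If you have a few domains to check, you could generate the URLs to try instead of typing each one. This is only a rough Python sketch: the list of paths covers the file names mentioned in this post, so treat it as an assumption rather than a complete list, and `candidate_sitemap_urls` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# File names mentioned in this post; a site may well use a different one.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/wp-sitemap.xml"]


def candidate_sitemap_urls(site):
    """Return the sitemap URLs worth trying for a site."""
    # Accept "example.com" as well as "https://example.com/some/page".
    if "://" not in site:
        site = "https://" + site
    parts = urlsplit(site)
    root = f"{parts.scheme}://{parts.netloc}"
    return [root + path for path in COMMON_SITEMAP_PATHS]


for url in candidate_sitemap_urls("www.delta.com"):
    print(url)
```

Paste each printed URL into your browser until one of them loads a sitemap.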
2. Check your robots.txt file
So you go ahead and try the approach above. You type a sitemap URL into your browser, hit enter and eagerly await. But you’re greeted with an annoying error message…
Okay, dang. If you’re struggling to find your sitemap by typing a URL directly into your browser’s address bar, then you might be able to find it in your robots.txt file.
Robots.txt is a file that tells web crawlers how to crawl a website and is generally one of the first places a crawler visits. That’s why it’s a great place to list the location of your XML sitemaps.
And just like a sitemap file, a robots.txt file has its own URL that you can access directly. If your website has a robots.txt file, then you’ll find it by appending /robots.txt to your domain name.
Let’s have a look at Walmart’s robots.txt and see if we can spot an XML sitemap.
Above: Sitemap index files found in Walmart’s robots.txt file
Success! We actually found a couple of sitemap index files. Remember? A sitemap of sitemaps…
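Sitemaps are listed in robots.txt on lines that start with "Sitemap:", so they’re easy to pull out with a script too. Here’s a small Python sketch that reads those lines from the text of a robots.txt file. The `sitemaps_from_robots` function and the sample file contents are hypothetical, not Walmart’s real file.

```python
def sitemaps_from_robots(robots_txt):
    """Return the URLs listed on "Sitemap:" lines of a robots.txt file."""
    found = []
    for line in robots_txt.splitlines():
        # Drop any trailing comment, then split on the first colon only
        # so the "https://" in the URL stays intact.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


# Made-up robots.txt contents, used only to demonstrate the function.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /checkout/

Sitemap: https://www.exampledomain.com/sitemap_index.xml
sitemap: https://www.exampledomain.com/sitemap_products.xml
"""

print(sitemaps_from_robots(SAMPLE_ROBOTS))
```

To use it on a real site, you’d first download the robots.txt file and pass its text to the function.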
3. Check your Google Search Console
If you’re still struggling to find your website’s sitemap, you can check whether it’s been submitted in Google Search Console. I’ve submitted mine in Google Search Console, so I can show you what to expect when you sign into yours. Sign in to your Google Search Console account and click “Sitemaps”.
Above: A sitemap submitted in Google Search Console
So looking at the screenshot above, I can see that I would need to append sitemap_index.xml after my domain name, i.e. www.justjamdigital.com/sitemap_index.xml, to find my sitemap URL.
4. Look into how your CMS generates a sitemap
If your website uses an off-the-shelf content management system (CMS), like WordPress, then it’s pretty easy to find your sitemap. I’m going to go through how you can find your sitemap with some of the common CMSs listed below.
How to find your sitemap on WordPress
WordPress automatically generates a sitemap for your website and you can find it by appending /wp-sitemap.xml to your domain name, just like this: www.exampledomain.com/wp-sitemap.xml. However, you can also install Yoast SEO, a WordPress plugin that, amongst other things, will generate a more fully-featured sitemap.
In fact, I use Yoast SEO for JUST JAM and it generated the sitemap URL that I showed you in my Google Search Console account. By default, the Yoast SEO plugin will create a sitemap index, so try appending sitemap_index.xml after your domain name to find your sitemaps once you’ve installed the plugin.
How to find your sitemap on Shopify, Squarespace and Wix
How to audit your sitemap using Screaming Frog
After you’ve found your XML sitemap, you might want to run a crawl to identify all the URLs listed in it and discover any errors. I highly recommend downloading Screaming Frog and investing in the paid version to audit your sitemap, which also gives you access to loads more useful features. The free version will only crawl up to 500 URLs, so if you have a bigger website that’s going to be a problem.
After running the crawl of your XML sitemap, you’ll be able to:
- Identify all the URLs listed in your sitemap
- Identify the response codes of URLs listed in your sitemap e.g. 200, 301, 404 etc
- Identify whether they’re indexable or not e.g. do they have a noindex tag, are they canonicalised etc
Screaming Frog XML sitemap set up:
- Mode > List
- Configuration > robots.txt > Settings > Respect robots.txt
- Upload > Download XML Sitemap > Paste XML sitemap URL and click “OK”
- Export the data once the crawl has finished
Above: How to set up Screaming Frog when auditing an XML sitemap
Okay, so there’s one interesting thing to note here. You might’ve noticed I ticked the “Respect robots.txt” box. Whenever you use “List” mode, Screaming Frog defaults to “Ignore robots.txt”.
If you’re auditing your XML sitemap, you’ll want to know if there are URLs that shouldn’t be in there. Remember, an XML sitemap should only contain URLs that you want people to find on search engines and a URL blocked by robots.txt certainly shouldn’t be in there!
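If you want a quick sanity check outside Screaming Frog, Python’s built-in `urllib.robotparser` module can test URLs against robots.txt rules. The sketch below is a rough illustration: `blocked_urls`, the rules and the URLs are all made up, and the module’s matching may not mirror exactly how Google interprets more complex rules such as wildcards.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, sitemap_urls, user_agent="*"):
    """Return the sitemap URLs that the robots.txt rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs, used only to demonstrate the function.
SAMPLE_RULES = """User-agent: *
Disallow: /checkout/
"""
SAMPLE_URLS = [
    "https://www.exampledomain.com/",
    "https://www.exampledomain.com/blog/",
    "https://www.exampledomain.com/checkout/basket",
]

print(blocked_urls(SAMPLE_RULES, SAMPLE_URLS))
```

Any URL this returns is one that sits in the sitemap but is blocked from crawling, so it’s worth a closer look.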
The exported spreadsheet gives you some useful data to get stuck into. For example, I can filter the “Status Code” column for all response codes other than 200. You only want URLs that return a 200 response code in your sitemap…
Above: Screaming Frog XML sitemap export
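If the export is too big to filter comfortably in a spreadsheet, the same filter can be sketched in Python. I’m assuming here that the export is a CSV with a “Status Code” column, as mentioned above, and that the URL column is called “Address”. Check the header row of your own export and adjust the column names if they differ. The sample rows are made up.

```python
import csv
import io


def non_200_rows(csv_text, url_column="Address", status_column="Status Code"):
    """Return (URL, status code) pairs for rows that did not return 200."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row[url_column], row[status_column])
        for row in reader
        if row[status_column].strip() != "200"
    ]


# Made-up export contents; the column names are assumptions.
SAMPLE_EXPORT = """Address,Status Code
https://www.exampledomain.com/,200
https://www.exampledomain.com/old-page,301
https://www.exampledomain.com/missing,404
"""

for url, status in non_200_rows(SAMPLE_EXPORT):
    print(status, url)
```

For a real export, you’d read the file’s text and pass it to `non_200_rows` in place of the sample.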
You should now be able to find your website’s sitemap without any problems and even go one step further and audit it!
However, if you’re still having some issues, feel free to comment below and I’ll help you out.