Thierry Muller, a Developer Relations Program Manager at Google, and several contributors posted an update on the XML sitemaps feature that may land in WordPress this year. After seven months of development, the team has made the XML Sitemaps feature plugin available on GitHub. It is currently open for testing and feedback. The plugin should also be available in the WordPress plugin directory by next week.
Update (January 31, 2020): The Core Sitemaps feature plugin is now available in the WordPress plugin repository.
The project aims to ship a basic version of an XML sitemaps feature to all WordPress installations. It will also offer an API that plugin developers can use to modify how sitemaps behave. Sitemap plugins would therefore not automatically disappear. Instead, plugins would offer users various options on how their sitemaps work.
A team created by Google, Yoast, and other contributors originally proposed XML sitemaps as a core WordPress feature in June 2019. Traditionally, WordPress has left this feature to plugins to implement, and many have filled this role over the years. However, several other major content management systems ship with sitemaps as part of their core codebase.
Many praised the initiative, including WordPress project lead Matt Mullenweg. “This makes a lot of sense, looking forward to seeing the v1 of this in core and for it to evolve in future releases and cement WordPress’ well-deserved reputation of being the best CMS for SEO,” he said.
However, several people questioned whether WordPress should ship with XML sitemaps. Some worried about performance, and others felt the feature should remain in plugins.
“At a high level, expanding the number of WordPress sites with Sitemaps ultimately speeds up content discoverability by search engines and re-crawl fresher content flagged by the lastmod date faster than a scheduled bot would,” Muller said of the primary reasons the feature belongs in core.
WordPress users may see this feature arrive in a major update this year. “Ambitiously [version] 5.4,” said Muller of the release goal. “Realistically 5.5.”
The feature plugin currently indexes the following URLs for a site:
- Homepage
- Blog posts page (if not the homepage)
- Posts and pages
- Categories and tags
- Custom post types
- Custom taxonomies
- Users/Authors
Custom post types and taxonomies are registered only if they are public. There is also a filter hook available to change which post types, taxonomies, and users are indexed. Ideally, WordPress would provide a registration flag for post types and taxonomies.
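The public-only rule and the filter hook can be pictured roughly as follows. This is an illustrative Python sketch, not the plugin's actual PHP code. `ContentType`, `indexable_types`, and the `type_filter` callback are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class ContentType:
    """Hypothetical stand-in for a registered post type or taxonomy."""
    name: str
    public: bool


def indexable_types(
    registered: Iterable[ContentType],
    type_filter: Optional[Callable[[List[ContentType]], List[ContentType]]] = None,
) -> List[str]:
    """Return the names of the types that would appear in the sitemap."""
    # By default, only public types are candidates for indexing.
    candidates = [t for t in registered if t.public]
    # A plugin-supplied filter (hypothetical) may then alter the list.
    if type_filter is not None:
        candidates = type_filter(candidates)
    return [t.name for t in candidates]


registered = [
    ContentType("post", public=True),
    ContentType("page", public=True),
    ContentType("product", public=True),
    ContentType("revision", public=False),
]

# Default behavior: every public type is indexed.
default_names = indexable_types(registered)

# A plugin opting one public type out of the sitemap.
filtered_names = indexable_types(
    registered, lambda types: [t for t in types if t.name != "product"]
)
```

In this sketch the filter runs after the public check, so a plugin can narrow the list of public types but never receives the non-public ones. Whether the real hook behaves that way is an assumption.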
Solving the Performance Issues
One of the primary concerns with the initial proposal was how well a core sitemaps feature would perform and scale, particularly on larger sites. The lack of a full caching solution in core presented some hurdles for the team.
“Solving the performance issue is not trivial, and we have looked into various solutions,” said Muller. “We believe that we landed on a solution that doesn’t need full caching and will still be scalable.”
For performance, there are two primary challenges:
- The number of URLs per page.
- The lastmod date in the index.xml file.
“Addressing the number of URLs per page is fairly trivial,” said Muller. “While sitemaps can have up to 50,000 URLs per sitemap, we found that capping it at 2,000 is acceptable from a performance perspective and totally acceptable from a search engine perspective.” The team decided to stick with a default of 2,000 URLs per sitemap and to provide a filter hook for plugins to alter if necessary.
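To make the cap concrete, here is a minimal Python sketch of splitting a site's URLs into sitemap pages. The 2,000 default and the 50,000 ceiling come from the figures quoted above. The function name `paginate_urls` and the example URLs are hypothetical, and the `per_page` argument only stands in for the filter hook the team mentions.

```python
from typing import List, Sequence

MAX_URLS_PER_SITEMAP = 2000  # default cap chosen by the team
PROTOCOL_LIMIT = 50000       # upper bound per sitemap quoted by Muller


def paginate_urls(
    urls: Sequence[str], per_page: int = MAX_URLS_PER_SITEMAP
) -> List[List[str]]:
    """Split a list of URLs into consecutive sitemap pages."""
    # Reject page sizes that are empty or exceed the per-sitemap ceiling.
    if not 1 <= per_page <= PROTOCOL_LIMIT:
        raise ValueError("per_page must be between 1 and 50000")
    # Slice the list into chunks of at most per_page URLs each.
    return [list(urls[i:i + per_page]) for i in range(0, len(urls), per_page)]


# Hypothetical site with 4,500 posts.
urls = [f"https://example.com/post-{n}/" for n in range(4500)]
pages = paginate_urls(urls)
page_sizes = [len(page) for page in pages]
```

With the default cap, the 4,500 example URLs end up in three sitemap pages: two full pages of 2,000 and a final page of 500.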
Finding a solution for the lastmod date was not as easy. “We believe we found a good balance, which will be scalable and doesn’t open the can of worms that full caching exposes us to,” said Muller.
The solution the team implemented involved scheduling a cron task that runs twice daily (the frequency can be filtered by plugins). The cron job fetches the lastmod dates of each sitemap and stores them in the options table, which essentially works as a light caching solution.
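The idea of the light cache can be sketched in Python as follows. This is an illustration of the approach as described, not the plugin's code. The `options` dictionary stands in for the options table, and `refresh_lastmod_cache`, `index_entries`, and the `sitemap_lastmod` key are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical stand-in for the WordPress options table.
options: Dict[str, Dict[str, str]] = {}


def refresh_lastmod_cache(sitemaps: Dict[str, List[datetime]]) -> None:
    """Scheduled job: store the most recent modification time per sitemap."""
    options["sitemap_lastmod"] = {
        name: max(dates).isoformat()
        for name, dates in sitemaps.items()
        if dates  # skip sitemaps with no entries
    }


def index_entries() -> List[Tuple[str, str]]:
    """Build the index rows from cached values only, without content queries."""
    cached = options.get("sitemap_lastmod", {})
    return sorted(cached.items())


# Modification dates of the entries in each sitemap (example data).
sitemaps = {
    "posts-1": [
        datetime(2020, 1, 10, tzinfo=timezone.utc),
        datetime(2020, 1, 28, tzinfo=timezone.utc),
    ],
    "pages-1": [datetime(2019, 12, 1, tzinfo=timezone.utc)],
}

refresh_lastmod_cache(sitemaps)  # would run on the twice-daily schedule
entries = index_entries()        # would run when the index is requested
```

The expensive work of scanning content happens only in the scheduled job, so rendering the index becomes a single lookup. The trade-off is that a lastmod value can be stale until the next scheduled run.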
“Relying on cron should be stable enough for small to medium websites,” said Muller. “Enterprise websites usually have server cron set up to more regularly ping WP Cron instead of relying on website visitors to trigger it. In fact, most managed hosting providers have that for all plans.”
In case the initial implementation falls short, the team has been researching an alternative that uses custom post types to store and update sitemap data. Two open GitHub tickets explore performance further, and developers may want to review them: Issue #1 and Issue #39.
What Happens to Sites With Existing Sitemaps?
One question that remains unanswered is what happens when a user updates to WordPress 5.4/5.5 and already has a sitemap. There are likely millions of WordPress sites that are running a plugin or have some sort of sitemap solution in place.
“This is a question which we haven’t quite solved,” said Muller. “It is important to work with plugin authors, and in an ideal world, all plugins providing advanced sitemaps solutions would extend the core API. We would love to get feedback from the community on that one.”
WordPress must take care to avoid any major conflicts or indexing errors, or at least alleviate issues for the users who may be unaware of this upcoming feature.