// Disable the XML sitemaps feature introduced in WordPress 5.5:
add_filter( 'wp_sitemaps_enabled', '__return_false' );
WordPress lost some of my trust by automatically opting me into something I did not want. I’m especially annoyed because I just found a page in a search engine that should not be discoverable. I intentionally did not create a sitemap.xml file, and now I see WordPress is automatically redirecting my (non-existent) sitemap.xml to wp-sitemap.xml.
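For anyone who just wants the fix, the one-liner at the top of this post can be shipped as a tiny must-use plugin so it loads on every request. A minimal sketch (the plugin name is mine; as far as I can tell, returning false from this filter turns off the whole feature, redirect included):

```php
<?php
/**
 * Plugin Name: Disable Core Sitemaps
 * Description: Turns off the wp-sitemap.xml feature added in WordPress 5.5.
 *
 * Drop this file into wp-content/mu-plugins/ so it runs on every request.
 */

// Short-circuit sitemap generation entirely; WordPress should no longer
// serve /wp-sitemap.xml (and, as far as I can tell, it stops advertising
// the sitemap in robots.txt as well).
add_filter( 'wp_sitemaps_enabled', '__return_false' );
```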
I don’t know who is responsible, but I decided to write Pascal Birchler to share my concern, since he made the announcement on WordPress.org.
FWIW I made some minor edits to the actual email…
Hi Pascal, I don’t know who is responsible, but your name came up in “new-xml-sitemaps-functionality-in-wordpress-5-5”.
To be brief, here’s my concern…
Scenario: Many people depend on a blogger. Everything is fine because certain posts rank in Google, the advertising is optimized, and she keeps working. Google automatically calculated that certain pages are important, no problem. Then one day WordPress adds a new feature she didn’t ask for: a feature that tells Google which pages are important.
“Remember that sitemaps are a recommendation to Google about which pages you think are important.” Then Googlebot comes by, sees the “important pages” (that she did not choose), and Google changes her rankings by some small factor. Now her best-earning pages fall off page 1 and she’s out of business. Seems like this sitemap feature has the potential for major disruption.
“Google ignores <priority> and <changefreq> values.” OK, that is encouraging, but what if the policy changes?
For many years Google’s official position was… sitemap.xml is not required and not having one will not impact SEO. Therefore, I have always advised people not to use a sitemap… also because of the “priority” problem, the URL limit problem, and the fact Yoast is rubbish (in my opinion).
Since you work at Google, maybe you can assure bloggers this won’t be a problem. BUT if this change doesn’t affect SEO, then what is the point of a sitemap in the first place? What a huge waste of time and energy!
Page discovery was never a problem for WordPress. Google always finds my posts within a few seconds.
And if someday Google decides to use sitemap priority data, bloggers will have a hard time deciding how to prioritize their content in sitemap.xml:
– Should they give favorite content a higher priority?
– Should they give popular content a higher priority?
– Should they give profitable content a higher priority?
As a blogger, these are questions I would rather not spend time on. Just imagine “Sitemap Priority Engineer” could be an actual job title because of this nonsense.
CREATIVE TECH SYSTEMS LLC
3-14-22 Update: Pascal wrote back. Also, I think my server stack is somehow blocking this feature; I will investigate that tomorrow. How did I discover this problem in the first place? Because I noticed DuckDuckGo indexed my admin URL. I have a hunch there is a problem with the function do_robots(). If I fix the sitemap.xml problem, will the robots.txt problem fix itself? Still, I don’t understand why robots.txt would ever direct robots to an admin page; that makes no sense to me. To be continued…
3-15-22 Update: I opened this question on StackExchange. Why does do_robots() Allow: /wp-admin/admin-ajax.php by default?
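For what it’s worth, the default output can be changed without touching core. do_robots() passes its generated text through the 'robots_txt' filter before printing it, so the Allow line can be stripped. A sketch (this assumes core emits exactly the line shown; check your own /robots.txt output first):

```php
// do_robots() runs the virtual robots.txt through the 'robots_txt'
// filter, so the default lines can be rewritten here. Assumption: core
// emits exactly "Allow: /wp-admin/admin-ajax.php" followed by a newline.
add_filter( 'robots_txt', function ( $output, $is_public ) {
	return str_replace( "Allow: /wp-admin/admin-ajax.php\n", '', $output );
}, 10, 2 );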
Then I wrote this: Testing The Default WordPress Sitemap
OK, the sitemap is working for me now on Ferodynamics.com; it appears there is some kind of caching delay for new posts. Reloading the sitemap a few times with “view-source” seems to work. Also, I think there may be a bug: WordPress doesn’t end the server connection after the XML file finishes loading. That seems like a problem in the browser, but I don’t think robots will care or notice.
3-16-22 Update: I added the following rant to the ticket here.
It seems like all the comments of concern were ignored.
WordPress does not need permission from robots.txt to access itself. Take a step back and ask yourself why robots are “allowed” access to wp-admin/admin-ajax.php. It makes no sense, as if people forgot the role of robots.txt.
Robots.txt does not exist to manage all conceivable non-human interactions. It was originally created to save bandwidth, because one gig of transfer could cost over $10. If you simply crawled somebody’s website back in 1999, they might threaten to sue you. Ask me how I know. Frankly, bandwidth and CPU are cheap enough now that robots.txt should be obsolete.
After some decades and bazillions of pageviews, I have never used a robots.txt, because I don’t want to discourage robots from enjoying my content. I don’t know the exact version of WordPress where this changed, but it seems WordPress decided to make robots.txt mandatory. In my expert opinion, that was a foolish decision.
Respectful bots will avoid /wp-admin/ without being told. Disrespectful bots will do whatever they want. This auto-generated robots.txt is unnecessary; it just creates confusion and solves nothing.
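If you feel the same way, you don’t have to live with the generated file. On a typical Apache or nginx setup, a physical robots.txt at the document root is served before WordPress ever handles the request, so uploading your own file takes precedence. Alternatively, the whole virtual output can be replaced through the 'robots_txt' filter. A sketch of the latter (the permissive two-line body is my choice, not anything core recommends):

```php
// Replace WordPress's generated robots.txt wholesale. An empty Disallow
// rule means "nothing is off limits", which matches how I ran my sites
// for years without any robots.txt at all.
add_filter( 'robots_txt', function ( $output, $is_public ) {
	return "User-agent: *\nDisallow:\n";
}, 10, 2 );
```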
I’m told the WordPress philosophy is to make decisions instead of offering options, but you don’t make certain decisions without asking my permission. I’m drawing the line at robots.txt. I see this as a violation: WordPress is claiming ownership of something that does not belong to WordPress. (My server, my choice.) If I “allow” a robot onto my server, that is my decision to make as a system admin. The average WordPress user may not be very sophisticated with technology, but that doesn’t mean you can take control of whatever you want just because I gave you permission to auto-install upgrades.
What’s next? Are you going to try creeping into php.ini? Seriously, this should be a concern as larger companies are allowed to submit code to WordPress. If you’re going to draw a line somewhere, you might as well draw it at robots.txt.
As for admin-ajax.php, whoever added that line should at least include a robots.readme explaining why robots.txt is mandatory, why it makes no sense, and its relationship to wp-sitemap.xml, with a link back to this URL, because it took me two days to follow the breadcrumbs back to this ticket. To say the least, this is not how I wanted to spend my week. And after digging through pages and pages of explanations, I’m still wondering why anyone would add admin-ajax.php to robots.txt.
“Since it’s often used on front-end.”
Like I said, the front end has nothing to do with robots.txt.
“What might be the downside of allowing admin-ajax.php to be crawled? Any chance of unwanted content appearing in SERPs?”
BINGO! I’m having a problem with DuckDuckGo right now: it’s listing /wp-admin/ as the #1 search result for my domain.