Fix 504 errors for crawlers #7

Open
wants to merge 2 commits into
base: master

maximmax42

When a crawler tries to index a folder on a website, say example.com/folder, Prerender tries to fetch example.comfolder instead, so the bot gets a 504. Adding a / between %{HTTP_HOST} and $2 in the .htaccess fixes that.

Real-life example:
image
The last 2 are from before the fix; the first 2 are from after the fix.

Contributor

varrocs commented Oct 14, 2022

Hi,
It used to be that way, but the slash was deliberately removed because users had // in their URLs.
See: https://github.com/prerender/prerender-apache/pull/5/files

Author

maximmax42 commented Oct 14, 2022

From https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule:

  • (from What is matched?) In per-directory context (Directory and .htaccess), the Pattern is matched against only a partial path, for example a request of "/app1/index.html" may result in comparison against "app1/index.html" or "index.html" depending on where the RewriteRule is defined.
    The directory path where the rule is defined is stripped from the currently mapped filesystem path before comparison (up to and including a trailing slash). The net result of this per-directory prefix stripping is that rules in this context only match against the portion of the currently mapped filesystem path "below" where the rule is defined.
  • (from Per-directory Rewrites) The removed prefix always ends with a slash, meaning the matching occurs against a string which never has a leading slash. Therefore, a Pattern with ^/ never matches in per-directory context.

If I understand this correctly, in per-directory context the matched string (and therefore $2) never has a leading slash, so %{HTTP_HOST}$2 in .htaccess always produces something like example.comfolder, and the / between the host and $2 is required. It would not be required if the rewrite rule were in the VirtualHost context, which is, I assume, where people were getting double slashes with the %{HTTP_HOST}/$2 RewriteRule. Or maybe it was a very old Apache version.
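
To make the two cases concrete, here is a rough Python simulation of the behavior described in the quoted docs. It is only a sketch, not Apache itself: the pattern is a simplified, hypothetical stand-in for the one in the .htaccess (all that matters is that group 2 captures the rest of the path), the .htaccess is assumed to sit at the document root, and the function names are made up for illustration.

```python
import re

# Hypothetical, simplified stand-in for the RewriteRule pattern:
# group 1 is an optional index file, group 2 is the rest of the path.
PATTERN = re.compile(r"^(index\.html|index\.php)?(.*)")


def matched_path(request_path: str, per_directory: bool) -> str:
    """Return the string the pattern is compared against.

    Per-directory context (.htaccess, assumed to be at the document root):
    the directory prefix, including its trailing slash, is stripped first.
    Server/VirtualHost context: the full URL-path is used, leading slash included.
    """
    if per_directory and request_path.startswith("/"):
        return request_path[1:]
    return request_path


def substitute(host: str, request_path: str, per_directory: bool, with_slash: bool) -> str:
    """Mimic %{HTTP_HOST}$2 (with_slash=False) or %{HTTP_HOST}/$2 (with_slash=True)."""
    match = PATTERN.match(matched_path(request_path, per_directory))
    separator = "/" if with_slash else ""
    return f"{host}{separator}{match.group(2)}"


if __name__ == "__main__":
    for per_directory in (True, False):
        for with_slash in (False, True):
            context = ".htaccess" if per_directory else "VirtualHost"
            rule = "%{HTTP_HOST}/$2" if with_slash else "%{HTTP_HOST}$2"
            result = substitute("example.com", "/folder", per_directory, with_slash)
            print(f"{context:12} {rule:18} -> {result}")
```

Under these assumptions, only two combinations give the intended example.com/folder: the slash variant in .htaccess and the no-slash variant in VirtualHost context. The other two give example.comfolder and example.com//folder, which would match both the 504s reported here and the double slashes that led to the earlier change.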
