-
-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix open graph urls for RECAP pdfs #3094
Conversation
These are a special case where we redirect opengraph bots to /recap/og-lookup/?file_path=/recap/the/path so that they can get their opengraph content. Previously, this page was using the og:url from the base.html template, which doesn't include the query parameters. This means that clients who depend on the og:url property for the click action on an opengraph card (e.g. Mastodon) were getting URLs that 404ed when you click the preview card but not the link in the content that generated it. The og:url would always be `courtlistener.com/recap/og-lookup/` which of course 404s since the query param it's expecting isn't present. To fix, override the template on recap_document.html. Pass an `og_file_path` argument when we do the redirect and check for it in the template.
This way it will work when we redirect to the first attachment as well as the regular case.
I tweaked the approach a little to handle the case where we redirect to the first attachment if the docket entry itself wasn't found, after a test failure illuminated the fact that I was handling that wrong before. I was not able to conclusively test this in my local dev setup because I don't have any real RECAP documents, nor permissions on the API to clone them using the clone_from_cl script. However, it appears that |
Well, this is a confusing mess. When you post a PDF link on CL to Mastodon:
Right now, that HTML includes og:url content for So users click the link in Mastodon, and ploof, it fails with a 404. Great. This fix makes it so that the og:url value for the document pages is the It seems like the view for the regular document pages needs to know whether it's responding to a og bot or not. If so, return the og:url value for storage.cl.com/recap/xxx, and if not, return the regular og:url value. Or am I missing something? This is stupidly complex! |
You have the same understanding of the flow as I do after debugging it. I agree that Mastodon's behavior seems a bit silly here...it makes no sense that the URL for the OG card should be different than the URL in the post that created that same card.
Hmm, yes. That would happen here. I did not notice that.
This is more along the lines of what I wrote originally in 09ba479: I had added an argument to The problem I had there is that there's an edge case (main document may have been converted to an attachment) where In that one case, it's hard to pass the "are we responding to an OG bot" value through without adding it as part of the URL path or something. I suppose we could add a query parameter to the URL that gets constructed there after the |
Yes, good assumption. It's in an S3 lambda.
That won't happen to a PDF though. This happens because appellate dockets start out as docket entry 1, then you learn that actually there's no docket entry 1, only 1-1, and 1-2, etc, so you convert doc 1 to doc 1-1. All of this junk happens before you get the PDF, so if somebody is sharing a PDF, it would have already been figured out. I think that's right. Anyway, even if it's not, I think we can ignore this edge case because worst case, somebody shares a PDF, lambda redirects to the og-lookup, og-lookup says, "Oh shoot, this item was converted", and then redirects to the document HTML page. Mastodon users see a link to a PDF, and get the HTML. That's...fine. At least it's not a 404. |
is_og_bot will be false normally and only true when view_recap_document is invoked via redirect_og_lookup. If false, None will be sent to the template and will use og:url value from base.html template.
for more information, see https://pre-commit.ci
Excellent! So it should be straightforward to just pass an |
This looks about right. Merging, thank you. This will deploy in about twenty minutes (you can watch in the Github Actions tab). It'd be great if you could just check that it's working once it's deployed. Thank you again! |
Checked after the python deploy finished. It looks like I missed a
I'll have a PR in a second to fix that one. |
These are a special case where we redirect opengraph bots to /recap/og-lookup/?file_path=/recap/the/path so that they can get their opengraph content. Previously, this page was using the og:url from the base.html template, which doesn't include the query parameters. This means that clients who depend on the og:url property for the click action on an opengraph card (e.g. Mastodon) were loading URLs that 404ed when you click the preview card but not the link in the content that generated it, leading to a confusing experience for users.
The og:url would always be
courtlistener.com/recap/og-lookup/
which of course 404s since the query param it's expecting isn't present.To fix, override the og:url on recap_document.html template. Pass an
og_file_path
argument when we do the redirect and check for it in the template.I decided against simply including the query parameters in the OG URL (either in the base.html template or in recap_document.html) so that the og:url reflects the "canonical" URL, not the url via the og-lookup redirect. Also because I'm not sure what all the possible query parameters might be and whether it would be harmful to include them in the URL or not.