Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4489 Block V3 API usage for new and anonymous users. #4579

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

albertisfu
Copy link
Contributor

@albertisfu albertisfu commented Oct 16, 2024

As outlined in #4489, this PR introduces a permission class to block anonymous users and new users from accessing V3 of the API.

Additionally, this PR introduces stats logging in Redis for V4, so once merged, the logs will be independent from V3, using v4 as a prefix. The stats logged will be exactly the same as for V3.

I also noticed that the Stat API Events, which log total API request milestones and API user milestones, are tied to API stats in Redis. As a result, when we start logging V4 stats in Redis, we will have duplicate milestones since V4 is starting from scratch.

To differentiate events between V3 and V4, I tweaked the description so future milestones for V3 are logged as:

  • API v3 has logged 10,000 total requests.
  • User 'username' has placed their 1st API v3 request.

For V4:

  • API v4 has logged 1 total request.
  • User 'username' has placed their 1st API v4 request.

Does this seem good to you?

  • One issue I encountered was how to detect if a user is a current V3 user. Initially, I planned to use v3.user.counts in Redis to check if the user existed there, but since we'll only check some requests, when a new user request is allowed, they will become a V3 user and will never be blocked in the future. The solution is to create an initial v3-users-list, which must be set up before enabling the V3 block feature.

  • The permission process works according to the following checks, in order of priority:

  1. If the request is not from the V3 API or BLOCK_NEW_V3_USERS is disabled, the request is allowed.
  2. If the request is from an AnonymousUser, it is blocked, returning a 403 status code with this message:Anonymous users don't have permission to access V3 of the API. Please use V4 instead.
  3. Then, we check if the user is in the v3-blocked-users-list, a list where "new" users are added so that all their future requests are blocked.
  4. Lastly, it checks only some requests to identify new users by checking if they are present in the v3-users-list. If the user is not found there, the request is blocked, and the user is added to the v3-blocked-users-list. The response is a 403 status code with this message:As a new user, you don't have permission to access V3 of the API. Please use V4 instead.

If we want this to be stateless, we can just take a mod of unixtime, and go with that.

So I implemented this method:

    def check_request() -> bool:
        if int(time.time()) % 10 == 0:
            return True
        return False

It checks requests every 10 seconds (we can adjust this value as needed), meaning all requests happening within one second every 10 seconds will be checked. Does this sound good to you? An alternative is to use a random function to check about 1 in 50 requests, but that wouldn't be fully stateless.

  • BLOCK_NEW_V3_USERS is a setting that defaults to False, so the V3 block feature is disabled during testing (to avoid conflicts) and for developers by default.
    This setting should be set to True in production after creating the v3-users-list.
    To create it, we can use the following command:
from cl.lib.redis_utils import get_redis_interface

r = get_redis_interface("STATS")
v3_stats_members = r.zrange("api:v3.user.counts", 0, -1)
r.sadd("v3-users-list", *v3_stats_members)

This command will copy all the current users in api:v3.user.counts to the v3-users-list.

Finally, I added the new V3APIPermission class to all API endpoints, except for:

  • MembershipWebhookViewSet: This is not available to users and is only used for Neon, so it's probably not worth performing the V3APIPermission checks here.

I also didn't add the V3APIPermission to the following RECAP endpoints, which currently use the following permission classes:

  • PacerProcessingQueueViewSet: RECAPUploaders
  • EmailProcessingQueueViewSet: EmailProcessingQueueAPIUsers
  • PacerDocIdLookupViewSet: RECAPUsersReadOnly

My thinking is that these endpoints are mostly for internal use with the RECAPExtension and RECAPEmail, so it's probably not worth performing the V3APIPermission checks here.

For all the other API endpoints that already had a permission class, the new V3APIPermission was added at the end of the list so that it prioritizes other permissions as before.

Let me know what you think.

@albertisfu albertisfu marked this pull request as ready for review October 16, 2024 19:40
@mlissner
Copy link
Member

An alternative is to use a random function to check about 1 in 50 requests,

Since this will have some performance impact, a random number does seem better. Otherwise, every X seconds all API requests will be impacted.

The rest sounds right to me. Thank you Alberto!

@albertisfu
Copy link
Contributor Author

Since this will have some performance impact, a random number does seem better. Otherwise, every X seconds all API requests will be impacted.

Yeah, that's the downside of the time approach.
I've changed it to the random approach. So ~1 in 50 are checked.

    def check_request() -> bool:
        # Check 1 in 50 requests.
        if random.randint(1, 50) == 1:
            return True
        return False

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 👀 In review
Status: Main Backlog
Development

Successfully merging this pull request may close these issues.

2 participants