
[Terraform] Add rate limiting #1084

Open · wants to merge 2 commits into base: main

Conversation

phalbert (Contributor)

Description

What - This PR implements rate limiting logic in the Terraform Cloud integration to address frequent 429 Too Many Requests warnings in the logs.

Why - We were experiencing excessive 429 Too Many Requests warnings, indicating that our integration was exceeding Terraform Cloud's API rate limits. This was cluttering the logs and potentially impacting performance.

How - Implemented a more robust rate limiting mechanism in the TerraformClient class:

  1. Added a wait_for_rate_limit method to enforce rate limits.
  2. Implemented a token bucket algorithm to manage API request rates.
  3. Added asyncio locks to ensure thread-safe rate limiting in asynchronous operations.
  4. Updated the send_api_request method to use the new rate limiting logic.
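
For illustration, a minimal sketch of what points 1–4 can look like together (hypothetical names; assumes an httpx-style async client and mirrors the 30 req/s and 10-concurrent values used in this PR's diff — it is not the exact implementation):

import asyncio
import time

import httpx  # assumed HTTP client, used here only for illustration

RATE_LIMIT_PER_SECOND = 30
MAX_CONCURRENT_REQUESTS = 10


class RateLimitedClient:
    """Hypothetical sketch: a token bucket guarded by an asyncio.Lock,
    plus a Semaphore to cap the number of concurrent requests."""

    def __init__(self) -> None:
        self._tokens = float(RATE_LIMIT_PER_SECOND)
        self._last_refill = time.monotonic()
        self._lock = asyncio.Lock()
        self._semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)

    async def _acquire_token(self) -> None:
        async with self._lock:
            now = time.monotonic()
            # Refill the bucket in proportion to elapsed time, capped at the bucket size.
            self._tokens = min(
                RATE_LIMIT_PER_SECOND,
                self._tokens + (now - self._last_refill) * RATE_LIMIT_PER_SECOND,
            )
            self._last_refill = now
            if self._tokens < 1:
                # Not enough budget: sleep until one token has accumulated.
                await asyncio.sleep((1 - self._tokens) / RATE_LIMIT_PER_SECOND)
                self._tokens = 1.0
                self._last_refill = time.monotonic()
            self._tokens -= 1

    async def send_api_request(self, client: httpx.AsyncClient, url: str) -> dict:
        # Cap in-flight requests, then pace the request rate before calling the API.
        async with self._semaphore:
            await self._acquire_token()
            response = await client.get(url)
            response.raise_for_status()
            return response.json()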

Type of change

  • Bug fix (non-breaking change which fixes an issue)

All tests should be run against the Port production environment (using a testing org).

Core testing checklist

  • Integration able to create all default resources from scratch
  • Resync finishes successfully
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Scheduled resync able to abort existing resync and start a new one
  • Tested with at least 2 integrations from scratch
  • Tested with Kafka and Polling event listeners
  • Tested deletion of entities that don't pass the selector

Integration testing checklist

  • Integration able to create all default resources from scratch
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Resync finishes successfully
  • If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • Docs PR link here

Preflight checklist

  • Handled rate limiting
  • Handled pagination
  • Implemented the code in async
  • Support Multi account

Screenshots

[Include screenshots of logs showing reduced 429 warnings, if available]

API Documentation

phalbert self-assigned this on Oct 18, 2024
phalbert requested a review from a team as a code owner on Oct 18, 2024, 17:03
mk-armah (Member) left a comment:

Great work 👏🏽, left a few comments. Also, I was wondering if we could use aiolimiter to control requests more easily, since the Terraform rate limit is time-bound.
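
For reference, the aiolimiter suggestion could look roughly like the sketch below (illustrative only, not this PR's implementation; assumes an httpx-style client and applies the 30-per-second limit minus the 5-request buffer from the constants shown below):

import httpx  # assumed HTTP client, used here only for illustration
from aiolimiter import AsyncLimiter

# 30 requests/second minus a safety buffer of 5, per the constants in the diff.
limiter = AsyncLimiter(max_rate=25, time_period=1)

async def send_api_request(client: httpx.AsyncClient, url: str) -> dict:
    # AsyncLimiter blocks here until the current one-second window has capacity,
    # removing the need to track request timestamps manually.
    async with limiter:
        response = await client.get(url)
        response.raise_for_status()
        return response.json()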

# https://developer.hashicorp.com/terraform/cloud-docs/api-docs#rate-limiting
RATE_LIMIT_PER_SECOND = 30
RATE_LIMIT_BUFFER = 5 # Buffer to avoid hitting the exact limit
MAX_CONCURRENT_REQUESTS = 10
mk-armah (Member):

Why 10?

json=json_data,
)
response.raise_for_status()
async with self.semaphore:
mk-armah (Member):

Is there a concurrency constraint on the Terraform Cloud API as well?

            workspaces[i : i + CHUNK_SIZE]
            for i in range(0, len(workspaces), CHUNK_SIZE)
        ]:
            chunk_results = await gather(*[fetch_runs_for_workspace(w) for w in chunk])
mk-armah (Member):

Is there a specific reason for replacing as_completed with gather? Waiting for all tasks within a chunk to complete before moving on to the next chunk appears to impede performance in this context. Please correct me.
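
For context, the as_completed pattern the reviewer refers to processes each workspace's result as soon as its task finishes, rather than waiting for a whole chunk. A rough sketch (hypothetical names, not the integration's actual code); concurrency would still be bounded by the client-level semaphore or rate limiter:

import asyncio

async def stream_runs(workspaces, fetch_runs_for_workspace):
    # asyncio.as_completed yields results in completion order, so one slow
    # workspace does not hold back the rest of its chunk.
    tasks = [fetch_runs_for_workspace(w) for w in workspaces]
    for finished in asyncio.as_completed(tasks):
        yield await finished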

Comment on lines +43 to +66

        self.rate_limit = RATE_LIMIT_PER_SECOND
        self.rate_limit_remaining = RATE_LIMIT_PER_SECOND
        self.rate_limit_reset: float = 0.0
        self.last_request_time = time.time()
        self.request_times: list[float] = []
        self.semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
        self.rate_limit_lock = asyncio.Lock()

    async def wait_for_rate_limit(self) -> None:
        async with self.rate_limit_lock:
            current_time = time.time()
            self.request_times = [t for t in self.request_times if current_time - t < 1]

            if len(self.request_times) >= RATE_LIMIT_PER_SECOND:
                wait_time = 1 - (current_time - self.request_times[0])
                if wait_time > 0:
                    logger.info(
                        f"Rate limit reached, waiting for {wait_time:.2f} seconds"
                    )
                    await asyncio.sleep(wait_time)
                self.request_times = self.request_times[1:]

            self.request_times.append(current_time)
