Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for scaling based on Nakadi #636

Merged
merged 2 commits into from
Dec 7, 2023
Merged

Conversation

mikkeloscar
Copy link
Contributor

@mikkeloscar mikkeloscar commented Nov 15, 2023

This adds support for scaling based on Nakadi subscriptions by utilizing the nakadi stats api: https://nakadi.io/manual.html#/subscriptions/subscription_id/stats_get

It's possible to define an HPA like below:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    metric-config.external.my-nakadi-consumer.nakadi/interval: "60s" # optional
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: custom-metrics-consumer
  minReplicas: 0
  maxReplicas: 8 # should match number of partitions for the event type
  metrics:
  - type: External
    external:
      metric:
          name: my-nakadi-consumer
          selector:
            matchLabels:
              type: nakadi
              subscription-id: "708095f6-cece-4d02-840e-ee488d710b29"
              metric-type: "consumer-lag-seconds|unconsumed-events"
      target:
        # value is compatible with the consumer-lag-seconds metric type.
        # It describes the amount of consumer lag in seconds before scaling
        # additionally up.
        # if an event-type has multiple partitions the value of
        # consumer-lag-seconds is the max of all the partitions.
        value: "600" # 10m
        # averageValue is compatible with unconsumed-events metric type.
        # This means for every 30 unconsumed events a pod is scaled up.
        # unconsumed-events is the sum of of unconsumed_events over all
        # partitions.
        averageValue: "30"
        type: AverageValue

To scale on either consumer-lag-seconds or unconsumed-events.

Fix #5

TODO

  • Unit tests
  • Better documentation in README

selector:
matchLabels:
type: nakadi
subscription-id: "708095f6-cece-4d02-840e-ee488d710b29"
Copy link
Member

@AlexanderYastrebov AlexanderYastrebov Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think subscription is uniquely identified by the owning_application, event_types and consumer_group https://nakadi.io/manual.html#creating-subscriptions so maybe (as a future improvement) this metric handler could also support these three fields and figure out subscription id by listing subscriptions https://nakadi.io/manual.html#getting-and-listing-subscriptions

This could be handy to define the same logical subscription (same owning_application, event_types and consumer_group) in staging and production environments (instead of templating subscription-id value).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jFYI: this could be a feature improvement but not prioritized for now. But it's simple to add later even if we don't do it in the first iteration.

pkg/nakadi/nakadi.go Outdated Show resolved Hide resolved
@mikkeloscar mikkeloscar force-pushed the nakadi-collector branch 4 times, most recently from 423c640 to cc0557a Compare November 20, 2023 10:08
@mikkeloscar mikkeloscar force-pushed the nakadi-collector branch 2 times, most recently from 3d88292 to 1160fe8 Compare December 4, 2023 14:28
@mikkeloscar mikkeloscar marked this pull request as ready for review December 4, 2023 14:28
@AlexanderYastrebov
Copy link
Member

👍

@mikkeloscar mikkeloscar force-pushed the nakadi-collector branch 2 times, most recently from 537d1a3 to 998d1f1 Compare December 4, 2023 16:18
Signed-off-by: Mikkel Oscar Lyderik Larsen <mikkel.larsen@zalando.de>
@AlexanderYastrebov
Copy link
Member

👍

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
kind: Deployment
name: custom-metrics-consumer
minReplicas: 0
maxReplicas: 8 # should match number of partitions for the event type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can improve it in the future that autoscaler does that on its own. it is possible to know number of partitions for Subscription

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we need some abstraction on top of HPA to hide this setting for users. Not saying we should not, but it's more complex than what we can implement with the native resource.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved

`unconsumed-events` - is the total number of unconsumed events over all
partitions. When using this `metric-type` you should also use the target
`averageValue` which indicates the number of events which can be handled per
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that really per pod or per the whole the stack? so if I have 2 pods and average number goes beyond 30 then a pod is added. it means it is 30 per 2 pods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is how you define it. The autoscaler will always scale higher rather than lower. So if the averageTarget is 30 and you have 31 unconsumed events, it will scale to 2 pods, but it will not scale higher until 61 unconsumed events where it then needs 3 pods.

Co-authored-by: Andrey <andrey.dyachkov@gmail.com>
Signed-off-by: Mikkel Oscar Lyderik Larsen <mikkel.larsen@zalando.de>
@mikkeloscar
Copy link
Contributor Author

👍

1 similar comment
@AlexanderYastrebov
Copy link
Member

👍

@mikkeloscar mikkeloscar merged commit 49f9f5f into master Dec 7, 2023
9 checks passed
@mikkeloscar mikkeloscar deleted the nakadi-collector branch December 7, 2023 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

idea: Nakadi collector
4 participants