Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

francoposa/casie tree queue sketch integration debug #8573

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
8c0f045
Sketching out new tree queue with shuffle shard nodes
chencs Apr 10, 2024
ee2a804
Add test to illustrate updating tenantQuerierAssignments outside of tree
chencs Apr 10, 2024
fadc1be
pull out dequeue ops
chencs Apr 26, 2024
3cd2481
Deduplicate node logic and extract node state types
chencs May 1, 2024
d2f00a0
Add a test for reassigning tenantQuerierID map value
chencs May 1, 2024
4b155bf
wip
chencs May 28, 2024
26497e9
added back lastTenantIndex concept, and empty tenant placeholders
chencs Jun 3, 2024
10fcce3
actually add queuing algorithms
chencs Jun 3, 2024
92ef094
better way to pull out tenantID
chencs Jun 3, 2024
295b9ae
fixing tests
chencs Jun 4, 2024
461b92d
Port extant, relevant tree queue tests to new Tree, and finish rippin…
chencs Jun 6, 2024
d8a1330
Add config to flip tree to query component -> tenant
chencs Jun 7, 2024
abc4ab0
Add EnqueueFrontByPath tests, makeQueuePath based on prioritizeQueryC…
chencs Jun 10, 2024
d539206
Rename to TreeQueue
chencs Jun 10, 2024
3bae2d6
lint
chencs Jun 10, 2024
ce9491f
PR feedback
chencs Jun 11, 2024
57fadd6
renaming args and removing config flag re: PR feedback
chencs Jun 12, 2024
9be157a
some test cleanup
chencs Jun 12, 2024
2b45713
PR feedback on tests, mostly
chencs Jun 17, 2024
872a80d
Patrick PR feedback: Comment and naming updates
chencs Jun 18, 2024
6d7b9a6
Re-include original tree queue with a config switch
chencs Jun 24, 2024
44a393d
Implement state update fn and bring back TreeQueue-specific tests
chencs Jun 25, 2024
ecc02ea
Update CHANGELOG and flag name
chencs Jun 25, 2024
cd21632
update docs
chencs Jun 25, 2024
c56cdfc
Update docs
chencs Jun 26, 2024
3cd5486
debug setup; not enabled for any container
francoposa Jul 1, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
# Changelog

## main / unreleased
* [ENHANCEMENT] Query-scheduler: Introduce `query-scheduler.prioritize-query-components`, which allows configuration of the request queue tree to prioritize dequeuing from a specific query component more highly than dequeueing from a specific tenant. #7873

### Grafana Mimir

Expand Down
11 changes: 11 additions & 0 deletions cmd/mimir/config-descriptor.json
Original file line number Diff line number Diff line change
Expand Up @@ -15909,6 +15909,17 @@
"fieldType": "boolean",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "use_multi_algorithm_query_queue",
"required": false,
"desc": "Use an experimental version of the query queue which has the same behavior as the existing queue, but integrates tenant selection into the tree model.",
"fieldValue": null,
"fieldDefaultValue": false,
"fieldFlag": "query-scheduler.use-multi-algorithm-query-queue",
"fieldType": "boolean",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "querier_forget_delay",
Expand Down
2 changes: 2 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -2317,6 +2317,8 @@ Usage of ./cmd/mimir/mimir:
Backend storage to use for the ring. Supported values are: consul, etcd, inmemory, memberlist, multi. (default "memberlist")
-query-scheduler.service-discovery-mode string
[experimental] Service discovery mode that query-frontends and queriers use to find query-scheduler instances. When query-scheduler ring-based service discovery is enabled, this option needs be set on query-schedulers, query-frontends and queriers. Supported values are: dns, ring. (default "dns")
-query-scheduler.use-multi-algorithm-query-queue
[experimental] Use an experimental version of the query queue which has the same behavior as the existing queue, but integrates tenant selection into the tree model.
-ruler-storage.azure.account-key string
Azure storage account key. If unset, Azure managed identities will be used for authentication instead.
-ruler-storage.azure.account-name string
Expand Down
10 changes: 5 additions & 5 deletions development/mimir-microservices-mode/compose-up.sh
Original file line number Diff line number Diff line change
Expand Up @@ -25,10 +25,10 @@ cd "$SCRIPT_DIR" && make
CGO_ENABLED=0 GOOS=linux go build -mod=vendor -tags=netgo,stringlabels -gcflags "all=-N -l" -o "${SCRIPT_DIR}"/mimir "${SCRIPT_DIR}"/../../cmd/mimir
docker_compose -f "${SCRIPT_DIR}"/docker-compose.yml build --build-arg BUILD_IMAGE="${BUILD_IMAGE}" distributor-1

if [ "$(yq '.services.query-tee' "${SCRIPT_DIR}"/docker-compose.yml)" != "null" ]; then
# If query-tee is enabled, build its binary and image as well.
CGO_ENABLED=0 GOOS=linux go build -mod=vendor -tags=netgo,stringlabels -gcflags "all=-N -l" -o "${SCRIPT_DIR}"/../../cmd/query-tee "${SCRIPT_DIR}"/../../cmd/query-tee
docker_compose -f "${SCRIPT_DIR}"/docker-compose.yml build --build-arg BUILD_IMAGE="${BUILD_IMAGE}" query-tee
fi
#if [ "$(yq '.services.query-tee' "${SCRIPT_DIR}"/docker-compose.yml)" != "null" ]; then
# # If query-tee is enabled, build its binary and image as well.
# CGO_ENABLED=0 GOOS=linux go build -mod=vendor -tags=netgo,stringlabels -gcflags "all=-N -l" -o "${SCRIPT_DIR}"/../../cmd/query-tee "${SCRIPT_DIR}"/../../cmd/query-tee
# docker_compose -f "${SCRIPT_DIR}"/docker-compose.yml build --build-arg BUILD_IMAGE="${BUILD_IMAGE}" query-tee
#fi

docker_compose -f "${SCRIPT_DIR}"/docker-compose.yml up "$@"
2 changes: 2 additions & 0 deletions development/mimir-microservices-mode/config/mimir.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -138,6 +138,7 @@ frontend:
cache_results: true
additional_query_queue_dimensions_enabled: true
shard_active_series_queries: true
max_outstanding_per_tenant: 10000

# Uncomment when using "dns" service discovery mode for query-scheduler.
# scheduler_address: "query-scheduler:9011"
Expand All @@ -163,6 +164,7 @@ query_scheduler:
# Change to "dns" to switch to query-scheduler DNS-based service discovery.
service_discovery_mode: "ring"
additional_query_queue_dimensions_enabled: true
max_outstanding_requests_per_tenant: 10000

limits:
# Limit max query time range to 31d
Expand Down
5 changes: 3 additions & 2 deletions development/mimir-microservices-mode/dev.dockerfile
Original file line number Diff line number Diff line change
@@ -1,11 +1,12 @@
ARG BUILD_IMAGE # Use ./compose-up.sh to build this image.
FROM $BUILD_IMAGE
ENV CGO_ENABLED=0
RUN go install github.com/go-delve/delve/cmd/dlv@v1.21.0
RUN go install github.com/go-delve/delve/cmd/dlv@v1.22.1

FROM alpine:3.19.1

RUN mkdir /mimir
WORKDIR /mimir
COPY ./mimir ./
COPY --from=0 /go/bin/dlv ./
RUN ln -s ./mimir /usr/local/bin/mimir
COPY --from=0 /go/bin/dlv /usr/local/bin/dlv
4 changes: 2 additions & 2 deletions development/mimir-microservices-mode/docker-compose.jsonnet
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ std.manifestYamlDoc({

// If true, Mimir services are run under Delve debugger, that can be attached to via remote-debugging session.
// Note that Delve doesn't forward signals to the Mimir process, so Mimir components don't shutdown cleanly.
debug: false,
debug: true,

// How long should Mimir docker containers sleep before Mimir is started.
sleep_seconds: 3,
Expand All @@ -23,7 +23,7 @@ std.manifestYamlDoc({
ring: 'memberlist',

// If true, a load generator is started.
enable_load_generator: true,
enable_load_generator: false,

// If true, start and enable scraping by these components.
// Note that if more than one component is enabled, the dashboards shown in Grafana may contain duplicate series or aggregates may be doubled or tripled.
Expand Down
62 changes: 32 additions & 30 deletions development/mimir-microservices-mode/docker-compose.yml

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -1708,6 +1708,12 @@ The `query_scheduler` block configures the query-scheduler.
# CLI flag: -query-scheduler.additional-query-queue-dimensions-enabled
[additional_query_queue_dimensions_enabled: <boolean> | default = false]

# (experimental) Use an experimental version of the query queue which has the
# same behavior as the existing queue, but integrates tenant selection into the
# tree model.
# CLI flag: -query-scheduler.use-multi-algorithm-query-queue
[use_multi_algorithm_query_queue: <boolean> | default = false]

# (experimental) If a querier disconnects without sending notification about
# graceful shutdown, the query-scheduler will keep the querier in the tenant's
# shard until the forget delay has passed. This feature is useful to reduce the
Expand Down
2 changes: 1 addition & 1 deletion integration/e2emimir/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -1523,7 +1523,7 @@ func (c *Client) doRequest(method, url string, body io.Reader) (*http.Response,
req.Header.Set("X-Scope-OrgID", c.orgID)

client := *c.httpClient
client.Timeout = c.timeout
client.Timeout = 2 * time.Minute

return client.Do(req)
}
Expand Down
5 changes: 3 additions & 2 deletions integration/e2emimir/service.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,12 +22,13 @@ func NewMimirService(
envVars map[string]string,
httpPort int,
grpcPort int,
otherPorts ...int,
otherPorts []int,
portMapContainerToLocal map[int]int,
) *MimirService {
s := &MimirService{
// We don't expose the gRPC port cause we don't need to access it from the host
// (exposing ports have a negative performance impact on starting/stopping containers).
HTTPService: e2e.NewHTTPService(name, image, command, readiness, httpPort, otherPorts...),
HTTPService: e2e.NewHTTPService(name, image, command, readiness, httpPort, otherPorts, portMapContainerToLocal),
grpcPort: grpcPort,
}
s.SetEnvVars(envVars)
Expand Down
131 changes: 83 additions & 48 deletions integration/e2emimir/services.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,12 +15,14 @@ import (
)

const (
httpPort = 8080
grpcPort = 9095
httpPort = 8080
grpcPort = 9095
mimirBinaryName = "mimir"
delveBinaryName = "dlv"
)

// GetDefaultImage returns the Docker image to use to run Mimir.
func GetDefaultImage() string {
// GetMimirImageOrDefault returns the Docker image to use to run Mimir.
func GetMimirImageOrDefault() string {
// Get the mimir image from the MIMIR_IMAGE env variable,
// falling back to grafana/mimir:latest"
if img := os.Getenv("MIMIR_IMAGE"); img != "" {
Expand Down Expand Up @@ -68,20 +70,56 @@ func getExtraFlags() map[string]string {
return extraArgs
}

func newMimirServiceFromOptions(name string, defaultFlags, flags map[string]string, options ...Option) *MimirService {
func newMimirServiceFromOptions(name string, defaultFlags, overrideFlags map[string]string, options ...Option) *MimirService {
o := newOptions(options)
serviceFlags := o.MapFlags(e2e.MergeFlags(defaultFlags, overrideFlags, getExtraFlags()))

return NewMimirService(
name,
"grafana/mimir:latest",
e2e.NewCommandWithoutEntrypoint(mimirBinaryName, e2e.BuildArgs(serviceFlags)...),
e2e.NewHTTPReadinessProbe(o.HTTPPort, "/ready", 200, 299),
o.Environment,
o.HTTPPort,
o.GRPCPort,
o.OtherPorts,
nil,
)
}

func newMimirDebugServiceFromOptions(name string, defaultFlags, flags map[string]string, options ...Option) *MimirService {
o := newOptions(options)
serviceFlags := o.MapFlags(e2e.MergeFlags(defaultFlags, flags, getExtraFlags()))
binaryName := getBinaryNameForBackwardsCompatibility()
serviceArgs := e2e.BuildArgs(serviceFlags)

delveListenPort := o.HTTPPort + 10000 // follow convention in docker compose development environment

// delve args must be ordered correctly due to the use of `--`
// to delineate the end of delve args and beginning of the binary's args;
// do not use any interface utilizing a map structure as order will not be preserved
delveArgs := []string{
"--log",
"exec",
"mimir",
fmt.Sprintf("--listen=:%d", delveListenPort),
"--headless=true",
"--api-version=2",
"--accept-multiclient",
"--",
}

cmd := e2e.NewCommandWithoutEntrypoint(delveBinaryName, append(delveArgs, serviceArgs...)...)

return NewMimirService(
name,
o.Image,
e2e.NewCommandWithoutEntrypoint(binaryName, e2e.BuildArgs(serviceFlags)...),
cmd,
e2e.NewHTTPReadinessProbe(o.HTTPPort, "/ready", 200, 299),
o.Environment,
o.HTTPPort,
o.GRPCPort,
o.OtherPorts...,
[]int{delveListenPort},
map[int]int{delveListenPort: delveListenPort, o.HTTPPort: o.HTTPPort},
)
}

Expand All @@ -95,10 +133,8 @@ func NewDistributor(name string, consulAddress string, flags map[string]string,
"-ingester.ring.replication-factor": "1",
"-distributor.remote-timeout": "2s", // Fail fast in integration tests.
// Configure the ingesters ring backend
"-ingester.ring.store": "consul",
"-ingester.ring.consul.hostname": consulAddress,
"-ingester.partition-ring.store": "consul",
"-ingester.partition-ring.consul.hostname": consulAddress,
"-ingester.ring.store": "consul",
"-ingester.ring.consul.hostname": consulAddress,
// Configure the distributor ring backend
"-distributor.ring.store": "memberlist",
},
Expand All @@ -115,10 +151,8 @@ func NewQuerier(name string, consulAddress string, flags map[string]string, opti
"-log.level": "warn",
"-ingester.ring.replication-factor": "1",
// Ingesters ring backend.
"-ingester.ring.store": "consul",
"-ingester.ring.consul.hostname": consulAddress,
"-ingester.partition-ring.store": "consul",
"-ingester.partition-ring.consul.hostname": consulAddress,
"-ingester.ring.store": "consul",
"-ingester.ring.consul.hostname": consulAddress,
// Query-frontend worker.
"-querier.frontend-client.backoff-min-period": "100ms",
"-querier.frontend-client.backoff-max-period": "100ms",
Expand Down Expand Up @@ -160,10 +194,8 @@ func NewIngester(name string, consulAddress string, flags map[string]string, opt
"-log.level": "warn",
"-ingester.ring.num-tokens": "512",
// Configure the ingesters ring backend
"-ingester.ring.store": "consul",
"-ingester.ring.consul.hostname": consulAddress,
"-ingester.partition-ring.store": "consul",
"-ingester.partition-ring.consul.hostname": consulAddress,
"-ingester.ring.store": "consul",
"-ingester.ring.consul.hostname": consulAddress,
// Speed up the startup.
"-ingester.ring.min-ready-duration": "0s",
// Enable native histograms
Expand All @@ -174,10 +206,6 @@ func NewIngester(name string, consulAddress string, flags map[string]string, opt
)
}

func getBinaryNameForBackwardsCompatibility() string {
return "mimir"
}

func NewQueryFrontend(name string, flags map[string]string, options ...Option) *MimirService {
return newMimirServiceFromOptions(
name,
Expand All @@ -186,8 +214,6 @@ func NewQueryFrontend(name string, flags map[string]string, options ...Option) *
"-log.level": "warn",
// Quickly detect query-scheduler when running it.
"-query-frontend.scheduler-dns-lookup-period": "1s",
// Always exercise remote read limits in integration tests.
"-query-frontend.remote-read-limits-enabled": "true",
},
flags,
options...,
Expand Down Expand Up @@ -225,20 +251,31 @@ func NewCompactor(name string, consulAddress string, flags map[string]string, op
)
}

func NewSingleBinary(name string, flags map[string]string, options ...Option) *MimirService {
var singleBinaryDefaultFlags = map[string]string{
// Do not pass any extra default flags (except few used to speed up the test)
// because the config could be driven by the config file.
"-target": "all",
"-log.level": "warn",
// Speed up the startup.
"-ingester.ring.min-ready-duration": "0s",
// Enable native histograms
"-ingester.native-histograms-ingestion-enabled": "true",
}

func NewSingleBinary(name string, overrideFlags map[string]string, options ...Option) *MimirService {
return newMimirServiceFromOptions(
name,
map[string]string{
// Do not pass any extra default flags (except few used to speed up the test)
// because the config could be driven by the config file.
"-target": "all",
"-log.level": "warn",
// Speed up the startup.
"-ingester.ring.min-ready-duration": "0s",
// Enable native histograms
"-ingester.native-histograms-ingestion-enabled": "true",
},
flags,
singleBinaryDefaultFlags,
overrideFlags,
options...,
)
}

func NewDebugSingleBinary(name string, overrideFlags map[string]string, options ...Option) *MimirService {
return newMimirDebugServiceFromOptions(
name,
singleBinaryDefaultFlags,
overrideFlags,
options...,
)
}
Expand Down Expand Up @@ -304,17 +341,17 @@ func NewAlertmanagerWithTLS(name string, flags map[string]string, options ...Opt
"-target": "alertmanager",
"-log.level": "warn",
}, flags, getExtraFlags()))
binaryName := getBinaryNameForBackwardsCompatibility()

return NewMimirService(
name,
o.Image,
e2e.NewCommandWithoutEntrypoint(binaryName, e2e.BuildArgs(serviceFlags)...),
"grafana/mimir:latest",
e2e.NewCommandWithoutEntrypoint(mimirBinaryName, e2e.BuildArgs(serviceFlags)...),
e2e.NewTCPReadinessProbe(o.HTTPPort),
o.Environment,
o.HTTPPort,
o.GRPCPort,
o.OtherPorts...,
o.OtherPorts,
nil,
)
}

Expand All @@ -325,10 +362,8 @@ func NewRuler(name string, consulAddress string, flags map[string]string, option
"-target": "ruler",
"-log.level": "warn",
// Configure the ingesters ring backend
"-ingester.ring.store": "consul",
"-ingester.ring.consul.hostname": consulAddress,
"-ingester.partition-ring.store": "consul",
"-ingester.partition-ring.consul.hostname": consulAddress,
"-ingester.ring.store": "consul",
"-ingester.ring.consul.hostname": consulAddress,
},
flags,
options...,
Expand Down Expand Up @@ -363,7 +398,7 @@ type Option func(*Options)
// newOptions creates an Options with default values and applies the options provided.
func newOptions(options []Option) *Options {
o := &Options{
Image: GetDefaultImage(),
Image: GetMimirImageOrDefault(),
MapFlags: NoopFlagMapper,
HTTPPort: httpPort,
GRPCPort: grpcPort,
Expand Down Expand Up @@ -417,7 +452,7 @@ func WithConfigFile(configFile string) Option {
}

// WithNoopOption returns an option that doesn't change anything.
func WithNoopOption() Option { return func(*Options) {} }
func WithNoopOption() Option { return func(options *Options) {} }

// FlagMapper is the type of function that maps flags, just to reduce some verbosity.
type FlagMapper func(flags map[string]string) map[string]string
Expand Down
2 changes: 1 addition & 1 deletion integration/integration_memberlist_single_binary_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,7 @@ func newSingleBinary(name string, servername string, join string, testFlags map[
flags["-ingester.ring.observe-period"] = "0s" // No need to observe tokens because we're going to be the first instance.
}

serv := e2emimir.NewSingleBinary(
serv := e2emimir.NewDebugSingleBinary(
name,
mergeFlags(
DefaultSingleBinaryFlags(),
Expand Down
2 changes: 2 additions & 0 deletions integration/vault_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@ func TestVaultTokenRenewal(t *testing.T) {
nil,
e2e.NewHTTPReadinessProbe(httpPort, "/v1/sys/health", 200, 200),
httpPort,
nil,
nil,
)
vault.SetEnvVars(map[string]string{"VAULT_DEV_ROOT_TOKEN_ID": devToken})
require.NoError(t, s.StartAndWaitReady(vault))
Expand Down
Loading
Loading