
Migrate to github.com/go-redis/redis #235

Closed · wants to merge 13 commits

Conversation

@j178 (Contributor) commented Jul 30, 2022

Closes #210

@j178 changed the title from "[WIP] Migrate to github.com/go-redis/redis" to "Migrate to github.com/go-redis/redis" on Jul 30, 2022
codecov bot commented Jul 30, 2022

Codecov Report

Merging #235 (244daa6) into master (f9fe49d) will increase coverage by 1.20%.
The diff coverage is 87.87%.

@@            Coverage Diff             @@
##           master     #235      +/-   ##
==========================================
+ Coverage   83.06%   84.27%   +1.20%     
==========================================
  Files          36       37       +1     
  Lines        8207     8017     -190     
==========================================
- Hits         6817     6756      -61     
+ Misses       1055      948     -107     
+ Partials      335      313      -22     
Impacted Files Coverage Δ
redis_shard.go 78.04% <77.21%> (+11.68%) ⬆️
broker_redis.go 74.22% <93.10%> (+2.03%) ⬆️
internal/util/unsafe.go 100.00% <100.00%> (ø)
node.go 92.94% <100.00%> (+0.19%) ⬆️
presence_redis.go 75.73% <100.00%> (+0.14%) ⬆️
client.go 83.50% <0.00%> (+0.08%) ⬆️
hub.go 84.92% <0.00%> (+0.26%) ⬆️
... and 2 more


@FZambia (Member) commented Jul 31, 2022

Oh, wow, that's hard work! And it seems great overall 👍 Removing dependencies and the code that manages various connection modes is awesome.

Probably let's collaborate next time before introducing such big changes? Just to make sure we are on the same page about things.

In general, the issue was about researching things - we need to prove that the migration to go-redis/redis makes sense. So some additional understanding is required here to proceed with the changes.

I have not looked at the code closely yet – will do at some point; for now, some top-level things:

  1. What was the reason for the bump to go1.18 in go.mod? We should not depend on go1.18 features for now.
  2. The difference in timeout behavior. In the current implementation we have both read and write connection timeouts set; please describe how timeouts behave in the new implementation. I see lots of uses of context.Background() - do the operations have some underlying timeouts anyway? A description of how timeouts work in the go-redis implementation would be awesome to have in the PR (see the timeout sketch after this list).
  3. Run the existing benchmarks - we have several Redis benchmarks, and they should perform better to justify the migration. Could you run them with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat ? Super curious to look at the results!
  4. Performance in Redis Cluster – this is not covered by the current benchmarks, but we need to understand the performance difference in Cluster. Possibly we need some new benchmarks that measure how the Engine performs in Cluster. It should not be worse than Redigo + mna/redisc either.
  5. Have you tested failover with Sentinel? I think the existing tests do not cover it - they only connect to the current master. Probably we can automate this somehow, but at least manual testing would be awesome, to have some confidence it works in general. Otherwise, go-redis should be stable enough, I believe, to rely on.
  6. I also think we still need to look through the go-redis open issues list and try to find out whether some of the bugs apply to the Engine implementation.
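
To illustrate point 2 - this is the kind of redigo-like global timeout configuration I have in mind, as a minimal sketch assuming the go-redis v8 Options API (the values are placeholders). With these set on the client, each command gets an implicit socket deadline even when called with context.Background():

import (
	"time"

	"github.com/go-redis/redis/v8"
)

// newClient builds a client with global timeouts, mimicking the Redigo
// pool's ReadTimeout/WriteTimeout settings.
func newClient() *redis.Client {
	return redis.NewClient(&redis.Options{
		Addr:         "127.0.0.1:6379",
		DialTimeout:  5 * time.Second, // connection establishment
		ReadTimeout:  3 * time.Second, // per-command socket read deadline
		WriteTimeout: 3 * time.Second, // per-command socket write deadline
	})
}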

Redis Cluster can be run locally with something like this (though to compare performance it's probably better to eventually exclude Docker from the setup):

version: "3.8"

services:
  cluster:
    image: redis:7.0.0-alpine
    entrypoint:
      - /bin/sh
      - -c
      - |
        redis-server --port 7001 --save "" --appendonly no --cluster-enabled yes --cluster-config-file 7001.conf &
        redis-server --port 7002 --save "" --appendonly no --cluster-enabled yes --cluster-config-file 7002.conf &
        redis-server --port 7003 --save "" --appendonly no --cluster-enabled yes --cluster-config-file 7003.conf &
        while ! redis-cli --cluster create 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 --cluster-yes; do sleep 1; done
        wait
    ports:
      - "7001:7001"
      - "7002:7002"
      - "7003:7003"

@j178 (Contributor, Author) commented Aug 1, 2022

Hi, thanks for your comment!

I'm sorry for submitting an important change without any notes. In fact, I really wanted to explain it, but my English is poor enough that it was hard to express things clearly, so I chose to submit the code first.

This PR is only a proof of concept and there is still a lot of work to do before it is finished. I wanted to send it first and hear your opinion before going too far down the wrong path.

Some to-dos before it's ready for review:

  • Test against Sentinel and Cluster
  • Use real contexts (instead of context.Background)
  • Verify timeout settings
  • Add new cluster benchmarks
  • Collect some benchmark numbers

With the current implementation I ran some benchmarks locally, but most of the results are a bit worse than redigo (~10% slower). I'm trying to find out why and do some optimization; hopefully we can get a better result.

@FZambia (Member) commented Aug 1, 2022

Hmm, interesting – I expected better performance than with Redigo. Actually, I did some measurements before opening that issue and the results were quite promising; that was the biggest motivation to try the migration. So I'd say looking at benchmarks is the first step here to decide whether it's worth it or not, since otherwise the current implementation is, I believe, stable enough that changing it for something new needs justification.

@j178 (Contributor, Author) commented Aug 2, 2022

Here are some benchmark results run on my local machine:

goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

Most numbers are fine, except for RedisAddPresence_1Ch, RedisPresence_1Ch and RedisPresence_ManyCh; RedisAddPresence_1Ch in particular became incredibly slow. No idea why, still investigating.

name                     old time/op    new time/op    delta
RedisExtractPushData-12    54.9ns ± 0%    38.3ns ± 5%   -30.12%  (p=0.008 n=5+5)
name                     old alloc/op   new alloc/op   delta
RedisExtractPushData-12     16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
name                     old allocs/op  new allocs/op  delta
RedisExtractPushData-12      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)

// 100 nodes always gets stuck, it's weird; just skipped it for now
name                          old time/op  new time/op  delta
RedisSurvey/1_node_512B-12    1.54µs ±11%  1.30µs ± 6%  -15.77%  (p=0.008 n=5+5)
RedisSurvey/2_nodes_512B-12   49.7µs ± 2%  29.4µs ± 4%  -40.94%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_512B-12   66.0µs ±15%  45.0µs ± 4%  -31.85%  (p=0.008 n=5+5)
RedisSurvey/4_nodes_512B-12   86.6µs ±11%  61.5µs ± 5%  -28.94%  (p=0.008 n=5+5)
RedisSurvey/5_nodes_512B-12    103µs ± 5%    82µs ± 3%  -20.66%  (p=0.008 n=5+5)
RedisSurvey/10_nodes_512B-12   246µs ±16%   200µs ±14%  -18.98%  (p=0.032 n=5+5)
RedisSurvey/1_node_4KB-12     1.86µs ± 8%  1.68µs ± 4%   -9.29%  (p=0.016 n=5+5)
RedisSurvey/2_nodes_4KB-12    77.3µs ±13%  57.4µs ±11%  -25.71%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_4KB-12     180µs ±31%   173µs ±17%     ~     (p=1.000 n=5+5)
RedisSurvey/4_nodes_4KB-12     297µs ±55%   256µs ±29%     ~     (p=0.421 n=5+5)
RedisSurvey/5_nodes_4KB-12     393µs ±42%   328µs ±12%     ~     (p=0.151 n=5+5)
RedisSurvey/10_nodes_4KB-12    835µs ± 8%   473µs ±42%  -43.36%  (p=0.008 n=5+5)

name                     old time/op  new time/op  delta
RedisConsistentIndex-12  55.7ns ± 1%  51.5ns ± 0%  -7.53%  (p=0.016 n=5+4)

name           old time/op  new time/op  delta
RedisIndex-12  43.3ns ± 1%  39.5ns ± 2%  -8.70%  (p=0.008 n=5+5)

name                 old time/op  new time/op  delta
RedisPublish_1Ch-12  2.67µs ±12%  2.81µs ± 2%   ~     (p=0.222 n=5+5)

name                    old time/op  new time/op  delta
RedisPublish_ManyCh-12  2.42µs ± 1%  2.76µs ± 1%  +14.15%  (p=0.008 n=5+5)

name                                 old time/op  new time/op  delta
RedisPublish_History_1Ch/lists-12    27.7µs ± 2%  27.2µs ± 1%  -1.97%  (p=0.008 n=5+5)
RedisPublish_History_1Ch/streams-12  34.1µs ± 1%  33.8µs ± 1%    ~     (p=0.056 n=5+5)

name                                old time/op  new time/op  delta
RedisPub_History_ManyCh/lists-12    28.4µs ± 2%  28.0µs ± 5%   ~     (p=0.310 n=5+5)
RedisPub_History_ManyCh/streams-12  28.7µs ± 2%  28.6µs ± 5%   ~     (p=0.690 n=5+5)

name               old time/op  new time/op  delta
RedisSubscribe-12  2.20µs ± 7%  2.41µs ± 9%   ~     (p=0.056 n=5+5)

name                         old time/op  new time/op  delta
RedisHistory_1Ch/lists-12    23.0µs ± 8%  20.7µs ± 1%  -10.09%  (p=0.008 n=5+5)
RedisHistory_1Ch/streams-12  37.6µs ± 1%  36.8µs ± 0%   -2.17%  (p=0.008 n=5+5)

name                         old time/op  new time/op  delta
RedisRecover_1Ch/lists-12     388µs ± 3%   386µs ± 3%    ~     (p=0.421 n=5+5)
RedisRecover_1Ch/streams-12  48.8µs ± 2%  47.3µs ± 1%  -2.98%  (p=0.008 n=5+5)

name                              old time/op  new time/op  delta
RedisHistoryIteration/lists-12     591ms ± 2%   524ms ± 2%  -11.40%  (p=0.008 n=5+5)
RedisHistoryIteration/streams-12  57.1ms ± 2%  52.5ms ± 4%   -8.13%  (p=0.008 n=5+5)

// RedisAddPresence is awfully slower, still investigating
name                     old time/op  new time/op    delta
RedisAddPresence_1Ch-12  15.8µs ± 0%  4680.2µs ± 1%  +29449.22%  (p=0.016 n=4+5)

name                  old time/op  new time/op  delta
RedisPresence_1Ch-12  13.8µs ± 1%  18.4µs ± 6%  +33.55%  (p=0.008 n=5+5)

name                     old time/op  new time/op  delta
RedisPresence_ManyCh-12  11.7µs ± 0%  15.7µs ± 4%  +33.82%  (p=0.008 n=5+5)

@j178 (Contributor, Author) commented Aug 2, 2022

After fixing the redis.Nil problem, here are the updated benchmark results for RedisPresence:

name                     old time/op  new time/op  delta
RedisAddPresence_1Ch-12  16.2µs ± 2%  16.6µs ±12%   ~     (p=0.690 n=5+5)

name                  old time/op  new time/op  delta
RedisPresence_1Ch-12  13.9µs ± 1%  12.9µs ± 1%  -7.44%  (p=0.008 n=5+5)

name                     old time/op  new time/op  delta
RedisPresence_ManyCh-12  11.9µs ± 1%  11.0µs ± 1%  -7.44%  (p=0.008 n=5+5)

@tony-pang commented
Hi, this is interesting work. For my project I just started my own implementation using https://github.com/rueian/rueidis, which supports the sharded Pub/Sub feature of Redis 7.

@FZambia (Member) commented Aug 3, 2022

I think the bench results look promising. @j178, actually we are interested not only in latency improvements but also in the number of allocations, since that directly affects CPU usage. I suppose that with go-redis allocations are consistently lower.

As the next step I think comparing the performance of go-redis with Redigo in Redis Cluster case would be great.

@tony-pang I used rueidis to implement the Redis engine for Centrifugo PRO. Unfortunately, we can't replace the Centrifuge library's builtin Redis engine with it, since we need to maintain compatibility with Redis instances that work over RESP2. Probably at some point in the future it will be possible to make a rueidis-based implementation the default in the core of Centrifuge too. But for now go-redis seems the only reasonable alternative to redigo for the builtin Engine implementation.

@FZambia (Member) commented Aug 3, 2022

As the next step I think comparing the performance of go-redis with Redigo would be great.

Sorry, I meant performance comparison in the Redis Cluster case - updated the comment :)

@j178 (Contributor, Author) commented Aug 3, 2022

Sorry, I meant performance comparison in the Redis Cluster case - updated the comment :)

Will do it 😄 But first I want to figure out why BenchmarkRedisSurvey hangs when running with 100 nodes on my machine. Could you run this benchmark successfully? Maybe there is something wrong with my local environment.

@FZambia (Member) commented Aug 3, 2022

But first I want to figure out why BenchmarkRedisSurvey hangs when running with 100 nodes on my machine. Could you run this benchmark successfully? Maybe there is something wrong with my local environment.

Tried, works fine for both redigo and go-redis implementations. Is it hanging for both redigo and go-redis on your machine?

@j178 (Contributor, Author) commented Aug 3, 2022

Is it hanging for both redigo and go-redis on your machine?

That's right. Seems like there is something wrong with my setup.

@j178 (Contributor, Author) commented Aug 4, 2022

@FZambia Hi, have you tried running the benchmark multiple times? Like go test -tags integration -bench=BenchmarkRedisSurvey/100_nodes -run=NONE -count=5. In my case, it always hangs in the second or third run.

@FZambia (Member) commented Aug 4, 2022

have you tried running the benchmark multiple times? Like go test -tags integration -bench=BenchmarkRedisSurvey/100_nodes -run=NONE -count=5. In my case, it always hangs in the second or third run.

Yep - reproduced! Could you try #237 - this fixed the issue for me.

@j178 force-pushed the go-redis branch 2 times, most recently from ffcf673 to 3299619 (August 12, 2022)
@j178 (Contributor, Author) commented Aug 12, 2022

Latest benchstat results against the master branch (including cluster mode stats):

name                                         old time/op    new time/op    delta
RedisExtractPushData-16                        71.9ns ± 1%    32.0ns ± 1%   -55.50%  (p=0.008 n=5+5)
RedisSurvey/1_node_512B-16                     1.57µs ± 2%    1.60µs ± 1%    +1.93%  (p=0.032 n=5+5)
RedisSurvey/2_nodes_512B-16                     276µs ± 6%     165µs ± 3%   -40.45%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_512B-16                     308µs ± 0%     177µs ± 3%   -42.54%  (p=0.008 n=5+5)
RedisSurvey/4_nodes_512B-16                     331µs ± 3%     190µs ± 3%   -42.50%  (p=0.008 n=5+5)
RedisSurvey/5_nodes_512B-16                     349µs ± 2%     204µs ± 1%   -41.48%  (p=0.008 n=5+5)
RedisSurvey/10_nodes_512B-16                    433µs ± 2%     273µs ± 1%   -36.97%  (p=0.008 n=5+5)
RedisSurvey/100_nodes_512B-16                  2.54ms ± 5%    2.06ms ± 2%   -18.77%  (p=0.008 n=5+5)
RedisSurvey/1_node_4096B-16                    1.53µs ± 1%    1.52µs ± 1%    -0.88%  (p=0.024 n=5+5)
RedisSurvey/2_nodes_4096B-16                    312µs ± 3%     188µs ± 2%   -39.74%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_4096B-16                    371µs ± 2%     231µs ± 2%   -37.70%  (p=0.008 n=5+5)
RedisSurvey/4_nodes_4096B-16                    423µs ± 3%     275µs ± 1%   -34.86%  (p=0.008 n=5+5)
RedisSurvey/5_nodes_4096B-16                    478µs ± 2%     333µs ± 1%   -30.22%  (p=0.008 n=5+5)
RedisSurvey/10_nodes_4096B-16                   758µs ± 4%     681µs ± 2%   -10.08%  (p=0.008 n=5+5)
RedisSurvey/100_nodes_4096B-16                 10.3ms ± 3%     9.3ms ± 2%    -9.60%  (p=0.008 n=5+5)
RedisSurvey/1_node_512B_cluster-16             1.45µs ± 1%    1.44µs ± 1%    -0.88%  (p=0.032 n=5+5)
RedisSurvey/2_nodes_512B_cluster-16             285µs ± 7%     162µs ± 1%   -43.16%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_512B_cluster-16             310µs ± 3%     177µs ± 1%   -42.98%  (p=0.008 n=5+5)
RedisSurvey/4_nodes_512B_cluster-16             333µs ± 2%     193µs ± 1%   -41.95%  (p=0.008 n=5+5)
RedisSurvey/5_nodes_512B_cluster-16             354µs ± 2%     206µs ± 1%   -41.91%  (p=0.008 n=5+5)
RedisSurvey/10_nodes_512B_cluster-16            442µs ± 1%     277µs ± 2%   -37.45%  (p=0.008 n=5+5)
RedisSurvey/100_nodes_512B_cluster-16          2.51ms ± 2%    2.10ms ± 5%   -16.23%  (p=0.008 n=5+5)
RedisSurvey/1_node_4096B_cluster-16            1.44µs ± 1%    1.45µs ± 2%      ~     (p=0.889 n=5+5)
RedisSurvey/2_nodes_4096B_cluster-16            307µs ± 2%     191µs ± 1%   -37.85%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_4096B_cluster-16            364µs ± 2%     231µs ± 1%   -36.53%  (p=0.008 n=5+5)
RedisSurvey/4_nodes_4096B_cluster-16            416µs ± 1%     277µs ± 1%   -33.37%  (p=0.008 n=5+5)
RedisSurvey/5_nodes_4096B_cluster-16            465µs ± 0%     332µs ± 2%   -28.57%  (p=0.008 n=5+5)
RedisSurvey/10_nodes_4096B_cluster-16           751µs ± 3%     688µs ± 1%    -8.45%  (p=0.008 n=5+5)
RedisSurvey/100_nodes_4096B_cluster-16         10.4ms ± 3%     9.4ms ± 2%    -9.55%  (p=0.008 n=5+5)
RedisConsistentIndex-16                        54.9ns ± 0%    52.6ns ± 0%    -4.25%  (p=0.008 n=5+5)
RedisIndex-16                                  44.5ns ± 2%    39.4ns ± 1%   -11.37%  (p=0.008 n=5+5)
RedisPublish_1Ch/lists-16                      9.91µs ± 1%    9.70µs ± 2%    -2.14%  (p=0.032 n=5+5)
RedisPublish_1Ch/streams-16                    10.0µs ± 3%     9.4µs ± 1%    -5.90%  (p=0.008 n=5+5)
RedisPublish_1Ch/lists_cluster-16              9.00µs ± 2%   10.91µs ± 2%   +21.19%  (p=0.008 n=5+5)
RedisPublish_1Ch/streams_cluster-16            8.87µs ± 0%   10.93µs ± 4%   +23.26%  (p=0.016 n=5+4)
RedisPublish_ManyCh/lists-16                   9.80µs ± 2%    9.48µs ± 1%    -3.27%  (p=0.008 n=5+5)
RedisPublish_ManyCh/streams-16                 9.87µs ± 1%    9.49µs ± 2%    -3.81%  (p=0.008 n=5+5)
RedisPublish_ManyCh/lists_cluster-16           9.10µs ± 1%   10.07µs ± 3%   +10.66%  (p=0.008 n=5+5)
RedisPublish_ManyCh/streams_cluster-16         9.03µs ± 1%    9.99µs ± 1%   +10.62%  (p=0.008 n=5+5)
RedisPublish_History_1Ch/lists-16              43.2µs ± 1%    33.4µs ± 2%   -22.69%  (p=0.008 n=5+5)
RedisPublish_History_1Ch/streams-16            48.5µs ± 1%    38.7µs ± 2%   -20.30%  (p=0.008 n=5+5)
RedisPublish_History_1Ch/lists_cluster-16      45.6µs ± 2%    45.7µs ± 1%      ~     (p=0.841 n=5+5)
RedisPublish_History_1Ch/streams_cluster-16    50.7µs ± 2%    51.2µs ± 1%      ~     (p=0.548 n=5+5)
RedisPub_History_ManyCh/lists-16               42.9µs ± 1%    33.5µs ± 2%   -22.01%  (p=0.008 n=5+5)
RedisPub_History_ManyCh/streams-16             43.0µs ± 2%    33.3µs ± 2%   -22.62%  (p=0.008 n=5+5)
RedisPub_History_ManyCh/lists_cluster-16       23.1µs ± 4%    25.5µs ± 5%   +10.43%  (p=0.008 n=5+5)
RedisPub_History_ManyCh/streams_cluster-16     23.0µs ± 4%    25.2µs ± 4%    +9.43%  (p=0.008 n=5+5)
RedisSubscribe/non_cluster-16                  6.33µs ± 2%    6.39µs ± 1%      ~     (p=0.151 n=5+5)
RedisSubscribe/with_cluster-16                 6.34µs ± 2%    6.38µs ± 2%      ~     (p=0.841 n=5+5)
RedisHistory_1Ch/lists-16                      38.1µs ± 1%    28.5µs ± 0%   -25.26%  (p=0.008 n=5+5)
RedisHistory_1Ch/streams-16                    49.3µs ± 1%    39.8µs ± 1%   -19.16%  (p=0.008 n=5+5)
RedisHistory_1Ch/lists_cluster-16              37.3µs ± 1%    39.5µs ± 3%    +5.91%  (p=0.008 n=5+5)
RedisHistory_1Ch/streams_cluster-16            49.5µs ± 1%    51.2µs ± 2%    +3.46%  (p=0.008 n=5+5)
RedisRecover_1Ch/lists-16                       675µs ± 1%     661µs ± 2%    -2.03%  (p=0.032 n=5+5)
RedisRecover_1Ch/streams-16                    57.9µs ± 1%    49.4µs ± 1%   -14.60%  (p=0.008 n=5+5)
RedisRecover_1Ch/lists_cluster-16               399µs ± 2%     399µs ± 3%      ~     (p=1.000 n=5+5)
RedisRecover_1Ch/streams_cluster-16            56.2µs ± 5%    58.8µs ± 2%    +4.78%  (p=0.032 n=5+5)
RedisHistoryIteration/lists-16                  1.03s ± 1%     0.93s ± 1%   -10.00%  (p=0.008 n=5+5)
RedisHistoryIteration/streams-16                163ms ± 1%     105ms ± 2%   -35.48%  (p=0.008 n=5+5)
RedisHistoryIteration/lists_cluster-16          925ms ± 3%     914ms ± 1%      ~     (p=0.421 n=5+5)
RedisHistoryIteration/streams_cluster-16        101ms ± 1%     101ms ± 1%      ~     (p=0.310 n=5+5)
RedisAddPresence_1Ch/lists-16                  32.7µs ± 1%    23.9µs ± 1%   -26.76%  (p=0.008 n=5+5)
RedisAddPresence_1Ch/streams-16                32.5µs ± 1%    24.2µs ± 3%   -25.61%  (p=0.008 n=5+5)
RedisAddPresence_1Ch/lists_cluster-16          30.1µs ± 2%    31.3µs ± 1%    +4.08%  (p=0.008 n=5+5)
RedisAddPresence_1Ch/streams_cluster-16        31.8µs ± 4%    31.2µs ± 1%      ~     (p=0.310 n=5+5)
RedisPresence_1Ch/lists-16                     32.4µs ± 1%    22.4µs ± 2%   -31.01%  (p=0.008 n=5+5)
RedisPresence_1Ch/streams-16                   31.8µs ± 2%    22.4µs ± 2%   -29.66%  (p=0.008 n=5+5)
RedisPresence_1Ch/lists_cluster-16             29.9µs ± 2%    30.9µs ± 2%    +3.29%  (p=0.032 n=5+5)
RedisPresence_1Ch/streams_cluster-16           30.4µs ± 3%    30.7µs ± 3%      ~     (p=0.421 n=5+5)
RedisPresence_ManyCh/lists-16                  31.6µs ±11%    21.4µs ± 2%   -32.20%  (p=0.008 n=5+5)
RedisPresence_ManyCh/streams-16                31.3µs ±11%    21.4µs ± 2%   -31.57%  (p=0.008 n=5+5)
RedisPresence_ManyCh/lists_cluster-16          17.0µs ± 4%    17.0µs ± 2%      ~     (p=0.690 n=5+5)
RedisPresence_ManyCh/streams_cluster-16        16.7µs ± 4%    16.9µs ± 3%      ~     (p=0.690 n=5+5)

name                                         old alloc/op   new alloc/op   delta
RedisExtractPushData-16                         16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
RedisSurvey/1_node_512B-16                     1.14kB ± 0%    1.14kB ± 0%      ~     (all equal)
RedisSurvey/2_nodes_512B-16                    7.52kB ± 0%    7.14kB ± 0%    -5.12%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_512B-16                    12.6kB ± 0%    12.0kB ± 0%    -5.28%  (p=0.008 n=5+5)
RedisSurvey/4_nodes_512B-16                    17.7kB ± 0%    16.8kB ± 0%    -5.42%  (p=0.008 n=5+5)
RedisSurvey/5_nodes_512B-16                    22.8kB ± 0%    21.6kB ± 0%    -5.45%  (p=0.008 n=5+5)
RedisSurvey/10_nodes_512B-16                   49.2kB ± 0%    46.5kB ± 0%    -5.47%  (p=0.008 n=5+5)
RedisSurvey/100_nodes_512B-16                   544kB ± 0%     505kB ± 0%    -7.05%  (p=0.008 n=5+5)
RedisSurvey/1_node_4096B-16                    1.15kB ± 0%    1.15kB ± 0%      ~     (p=0.643 n=5+5)
RedisSurvey/2_nodes_4096B-16                   28.8kB ± 0%    28.1kB ± 0%    -2.68%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_4096B-16                   54.7kB ± 1%    53.7kB ± 1%    -1.91%  (p=0.008 n=5+5)
RedisSurvey/4_nodes_4096B-16                   82.1kB ± 4%    80.0kB ± 1%    -2.47%  (p=0.008 n=5+5)
RedisSurvey/5_nodes_4096B-16                    109kB ± 3%     107kB ± 2%      ~     (p=0.151 n=5+5)
RedisSurvey/10_nodes_4096B-16                   243kB ± 2%     240kB ± 1%      ~     (p=0.222 n=5+5)
RedisSurvey/100_nodes_4096B-16                 2.68MB ± 0%    2.66MB ± 3%      ~     (p=0.730 n=4+5)
RedisSurvey/1_node_512B_cluster-16             1.15kB ± 0%    1.15kB ± 0%      ~     (p=0.238 n=4+5)
RedisSurvey/2_nodes_512B_cluster-16            10.6kB ±10%     7.8kB ± 5%   -25.95%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_512B_cluster-16            15.4kB ±15%    13.2kB ± 4%   -14.00%  (p=0.008 n=5+5)
RedisSurvey/4_nodes_512B_cluster-16            22.8kB ± 7%    19.4kB ± 7%   -14.87%  (p=0.008 n=5+5)
RedisSurvey/5_nodes_512B_cluster-16            27.0kB ±12%    25.1kB ±11%      ~     (p=0.222 n=5+5)
RedisSurvey/10_nodes_512B_cluster-16           53.4kB ± 4%    50.3kB ± 2%    -5.91%  (p=0.008 n=5+5)
RedisSurvey/100_nodes_512B_cluster-16           562kB ± 3%     515kB ± 1%    -8.35%  (p=0.008 n=5+5)
RedisSurvey/1_node_4096B_cluster-16            1.16kB ± 1%    1.15kB ± 1%      ~     (p=0.087 n=5+5)
RedisSurvey/2_nodes_4096B_cluster-16           33.1kB ± 6%    29.9kB ± 4%    -9.79%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_4096B_cluster-16           57.9kB ± 3%    58.2kB ± 3%      ~     (p=0.690 n=5+5)
RedisSurvey/4_nodes_4096B_cluster-16           86.5kB ± 6%    83.7kB ± 4%      ~     (p=0.222 n=5+5)
RedisSurvey/5_nodes_4096B_cluster-16            113kB ± 2%     110kB ± 3%      ~     (p=0.222 n=5+5)
RedisSurvey/10_nodes_4096B_cluster-16           246kB ± 2%     243kB ± 3%      ~     (p=0.690 n=5+5)
RedisSurvey/100_nodes_4096B_cluster-16         2.85MB ± 3%    2.74MB ± 4%    -3.80%  (p=0.032 n=5+5)
RedisConsistentIndex-16                         8.00B ± 0%     8.40B ± 7%      ~     (p=0.444 n=5+5)
RedisIndex-16                                   8.00B ± 0%     8.00B ± 0%      ~     (all equal)
RedisPublish_1Ch/lists-16                        648B ±10%      670B ± 6%      ~     (p=0.690 n=5+5)
RedisPublish_1Ch/streams-16                      643B ±12%      689B ± 8%      ~     (p=0.278 n=5+5)
RedisPublish_1Ch/lists_cluster-16                760B ±14%      730B ±18%      ~     (p=0.548 n=5+5)
RedisPublish_1Ch/streams_cluster-16              735B ±14%      718B ±19%      ~     (p=0.730 n=5+5)
RedisPublish_ManyCh/lists-16                     697B ± 5%      663B ±14%      ~     (p=0.222 n=5+5)
RedisPublish_ManyCh/streams-16                   719B ±12%      692B ±10%      ~     (p=0.690 n=5+5)
RedisPublish_ManyCh/lists_cluster-16             709B ± 6%      780B ±24%      ~     (p=0.151 n=5+5)
RedisPublish_ManyCh/streams_cluster-16           667B ±10%      734B ±10%      ~     (p=0.310 n=5+5)
RedisPublish_History_1Ch/lists-16              2.10kB ±13%    1.68kB ±16%   -20.16%  (p=0.016 n=5+5)
RedisPublish_History_1Ch/streams-16            2.03kB ±13%    1.75kB ±20%      ~     (p=0.222 n=5+5)
RedisPublish_History_1Ch/lists_cluster-16      2.60kB ± 2%    1.73kB ±20%   -33.61%  (p=0.008 n=5+5)
RedisPublish_History_1Ch/streams_cluster-16    2.45kB ±23%    2.06kB ±26%      ~     (p=0.222 n=5+5)
RedisPub_History_ManyCh/lists-16               2.27kB ±11%    1.74kB ± 9%   -23.29%  (p=0.008 n=5+5)
RedisPub_History_ManyCh/streams-16             2.02kB ±10%    1.85kB ± 9%      ~     (p=0.095 n=5+5)
RedisPub_History_ManyCh/lists_cluster-16       1.84kB ± 8%    1.57kB ±20%   -14.52%  (p=0.032 n=5+5)
RedisPub_History_ManyCh/streams_cluster-16     1.91kB ± 3%    1.67kB ±17%      ~     (p=0.095 n=5+5)
RedisSubscribe/non_cluster-16                  1.44kB ± 4%    1.20kB ± 4%   -16.75%  (p=0.008 n=5+5)
RedisSubscribe/with_cluster-16                 1.53kB ± 5%    1.16kB ± 6%   -24.23%  (p=0.008 n=5+5)
RedisHistory_1Ch/lists-16                      2.53kB ±17%    2.02kB ±17%   -20.20%  (p=0.016 n=5+5)
RedisHistory_1Ch/streams-16                    3.52kB ± 5%    2.70kB ±12%   -23.49%  (p=0.008 n=5+5)
RedisHistory_1Ch/lists_cluster-16              2.80kB ± 9%    2.36kB ± 4%   -15.54%  (p=0.008 n=5+5)
RedisHistory_1Ch/streams_cluster-16            3.45kB ± 7%    3.12kB ±22%      ~     (p=0.151 n=5+5)
RedisRecover_1Ch/lists-16                       173kB ± 3%     142kB ± 3%   -18.29%  (p=0.008 n=5+5)
RedisRecover_1Ch/streams-16                    3.82kB ±19%    3.21kB ±18%      ~     (p=0.222 n=5+5)
RedisRecover_1Ch/lists_cluster-16               167kB ± 2%     140kB ± 0%   -16.51%  (p=0.016 n=5+4)
RedisRecover_1Ch/streams_cluster-16            4.22kB ± 9%    3.51kB ±15%   -16.89%  (p=0.008 n=5+5)
RedisHistoryIteration/lists-16                  184MB ± 4%     148MB ± 2%   -19.44%  (p=0.008 n=5+5)
RedisHistoryIteration/streams-16               5.41MB ±17%    4.66MB ±25%      ~     (p=0.222 n=5+5)
RedisHistoryIteration/lists_cluster-16          178MB ± 4%     145MB ± 3%   -18.65%  (p=0.008 n=5+5)
RedisHistoryIteration/streams_cluster-16       5.25MB ±13%    4.68MB ±23%      ~     (p=0.548 n=5+5)
RedisAddPresence_1Ch/lists-16                  1.57kB ± 5%    1.14kB ±17%   -27.45%  (p=0.008 n=5+5)
RedisAddPresence_1Ch/streams-16                1.29kB ±17%    1.32kB ±17%      ~     (p=0.548 n=5+5)
RedisAddPresence_1Ch/lists_cluster-16          1.80kB ±12%    1.46kB ±25%      ~     (p=0.056 n=5+5)
RedisAddPresence_1Ch/streams_cluster-16        1.75kB ±15%    1.48kB ±20%   -15.53%  (p=0.032 n=5+5)
RedisPresence_1Ch/lists-16                     1.97kB ±16%    1.49kB ±12%   -24.24%  (p=0.008 n=5+5)
RedisPresence_1Ch/streams-16                   2.02kB ±12%    1.59kB ± 9%   -21.28%  (p=0.008 n=5+5)
RedisPresence_1Ch/lists_cluster-16             2.06kB ±14%    1.61kB ±15%   -21.66%  (p=0.032 n=5+5)
RedisPresence_1Ch/streams_cluster-16           2.22kB ± 5%    1.61kB ±27%   -27.28%  (p=0.008 n=5+5)
RedisPresence_ManyCh/lists-16                  1.39kB ± 8%    1.24kB ±18%      ~     (p=0.095 n=5+5)
RedisPresence_ManyCh/streams-16                1.45kB ±14%    1.19kB ±21%   -17.79%  (p=0.032 n=5+5)
RedisPresence_ManyCh/lists_cluster-16          1.49kB ± 8%    1.16kB ±11%   -22.00%  (p=0.008 n=5+5)
RedisPresence_ManyCh/streams_cluster-16        1.53kB ± 3%    1.17kB ±14%   -23.43%  (p=0.008 n=5+5)

name                                         old allocs/op  new allocs/op  delta
RedisExtractPushData-16                          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
RedisSurvey/1_node_512B-16                       15.0 ± 0%      15.0 ± 0%      ~     (all equal)
RedisSurvey/2_nodes_512B-16                      90.0 ± 0%      81.0 ± 0%   -10.00%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_512B-16                       140 ± 0%       124 ± 0%   -11.43%  (p=0.008 n=5+5)
RedisSurvey/4_nodes_512B-16                       190 ± 0%       167 ± 0%   -12.11%  (p=0.008 n=5+5)
RedisSurvey/5_nodes_512B-16                       240 ± 0%       210 ± 0%      ~     (p=0.079 n=4+5)
RedisSurvey/10_nodes_512B-16                      491 ± 0%       426 ± 0%   -13.16%  (p=0.000 n=4+5)
RedisSurvey/100_nodes_512B-16                   5.63k ± 1%     4.66k ± 0%   -17.19%  (p=0.008 n=5+5)
RedisSurvey/1_node_4096B-16                      15.0 ± 0%      15.0 ± 0%      ~     (all equal)
RedisSurvey/2_nodes_4096B-16                     96.8 ± 1%      82.8 ± 1%   -14.46%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_4096B-16                      147 ± 4%       127 ± 3%   -13.99%  (p=0.008 n=5+5)
RedisSurvey/4_nodes_4096B-16                      214 ±16%       180 ± 3%   -16.14%  (p=0.008 n=5+5)
RedisSurvey/5_nodes_4096B-16                      279 ±14%       243 ±12%      ~     (p=0.056 n=5+5)
RedisSurvey/10_nodes_4096B-16                     575 ± 7%       530 ± 7%      ~     (p=0.190 n=5+5)
RedisSurvey/100_nodes_4096B-16                  7.85k ± 1%     6.94k ±11%   -11.62%  (p=0.016 n=4+5)
RedisSurvey/1_node_512B_cluster-16               15.0 ± 0%      15.0 ± 0%      ~     (all equal)
RedisSurvey/2_nodes_512B_cluster-16               129 ±10%        89 ± 6%   -30.75%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_512B_cluster-16               175 ±17%       140 ± 6%   -19.79%  (p=0.008 n=5+5)
RedisSurvey/4_nodes_512B_cluster-16               252 ± 7%       200 ± 8%   -20.84%  (p=0.008 n=5+5)
RedisSurvey/5_nodes_512B_cluster-16               290 ±13%       253 ±14%      ~     (p=0.063 n=5+5)
RedisSurvey/10_nodes_512B_cluster-16              541 ± 5%       472 ± 3%   -12.82%  (p=0.008 n=5+5)
RedisSurvey/100_nodes_512B_cluster-16           5.85k ± 3%     4.79k ± 2%   -18.09%  (p=0.008 n=5+5)
RedisSurvey/1_node_4096B_cluster-16              15.0 ± 0%      15.0 ± 0%      ~     (all equal)
RedisSurvey/2_nodes_4096B_cluster-16              151 ±15%       106 ±13%   -30.25%  (p=0.008 n=5+5)
RedisSurvey/3_nodes_4096B_cluster-16              188 ±12%       184 ±12%      ~     (p=0.722 n=5+5)
RedisSurvey/4_nodes_4096B_cluster-16              271 ±22%       226 ±19%      ~     (p=0.222 n=5+5)
RedisSurvey/5_nodes_4096B_cluster-16              329 ± 7%       282 ±14%   -14.17%  (p=0.008 n=5+5)
RedisSurvey/10_nodes_4096B_cluster-16             612 ±11%       571 ±17%      ~     (p=0.222 n=5+5)
RedisSurvey/100_nodes_4096B_cluster-16          9.93k ±10%     8.00k ±16%   -19.36%  (p=0.016 n=5+5)
RedisConsistentIndex-16                          1.00 ± 0%      1.00 ± 0%      ~     (all equal)
RedisIndex-16                                    1.00 ± 0%      1.00 ± 0%      ~     (all equal)
RedisPublish_1Ch/lists-16                        11.0 ± 0%       9.6 ± 6%   -12.73%  (p=0.000 n=4+5)
RedisPublish_1Ch/streams-16                      10.8 ±11%      10.0 ± 0%      ~     (p=0.095 n=5+4)
RedisPublish_1Ch/lists_cluster-16                12.2 ±10%       9.8 ±18%   -19.67%  (p=0.024 n=5+5)
RedisPublish_1Ch/streams_cluster-16              12.2 ±10%      10.0 ±20%   -18.03%  (p=0.032 n=5+5)
RedisPublish_ManyCh/lists-16                     12.0 ± 0%      10.6 ±13%   -11.67%  (p=0.048 n=5+5)
RedisPublish_ManyCh/streams-16                   12.4 ± 5%      10.8 ±11%   -12.90%  (p=0.032 n=5+5)
RedisPublish_ManyCh/lists_cluster-16             12.4 ± 5%      12.0 ± 0%      ~     (p=0.333 n=5+4)
RedisPublish_ManyCh/streams_cluster-16           12.0 ± 8%      11.0 ± 9%      ~     (p=0.246 n=5+5)
RedisPublish_History_1Ch/lists-16                38.6 ± 9%      31.4 ±11%   -18.65%  (p=0.008 n=5+5)
RedisPublish_History_1Ch/streams-16              37.8 ± 8%      32.4 ±14%      ~     (p=0.056 n=5+5)
RedisPublish_History_1Ch/lists_cluster-16        47.4 ± 1%      33.4 ±14%   -29.54%  (p=0.008 n=5+5)
RedisPublish_History_1Ch/streams_cluster-16      45.6 ±16%      37.6 ±18%      ~     (p=0.056 n=5+5)
RedisPub_History_ManyCh/lists-16                 41.0 ± 7%      32.2 ± 6%   -21.46%  (p=0.008 n=5+5)
RedisPub_History_ManyCh/streams-16               37.6 ± 6%      33.8 ± 7%   -10.11%  (p=0.016 n=5+5)
RedisPub_History_ManyCh/lists_cluster-16         37.6 ± 4%      31.2 ±13%   -17.02%  (p=0.008 n=5+5)
RedisPub_History_ManyCh/streams_cluster-16       38.2 ± 3%      32.4 ±11%   -15.18%  (p=0.008 n=5+5)
RedisSubscribe/non_cluster-16                    28.2 ± 8%      20.4 ± 3%   -27.66%  (p=0.008 n=5+5)
RedisSubscribe/with_cluster-16                   31.0 ± 3%      19.8 ± 6%   -36.13%  (p=0.008 n=5+5)
RedisHistory_1Ch/lists-16                        59.4 ± 9%      48.8 ± 8%   -17.85%  (p=0.008 n=5+5)
RedisHistory_1Ch/streams-16                      96.2 ± 2%      81.2 ± 5%   -15.59%  (p=0.008 n=5+5)
RedisHistory_1Ch/lists_cluster-16                65.4 ± 5%      54.4 ± 1%   -16.82%  (p=0.008 n=5+5)
RedisHistory_1Ch/streams_cluster-16              97.4 ± 3%      88.0 ±10%    -9.65%  (p=0.008 n=5+5)
RedisRecover_1Ch/lists-16                       5.20k ± 1%     4.18k ± 1%   -19.60%  (p=0.008 n=5+5)
RedisRecover_1Ch/streams-16                       110 ± 8%        96 ± 8%   -12.52%  (p=0.008 n=5+5)
RedisRecover_1Ch/lists_cluster-16               5.12k ± 1%     4.15k ± 0%   -18.89%  (p=0.016 n=5+4)
RedisRecover_1Ch/streams_cluster-16               117 ± 4%       101 ± 6%   -13.65%  (p=0.008 n=5+5)
RedisHistoryIteration/lists-16                  5.33M ± 2%     4.27M ± 1%   -19.80%  (p=0.008 n=5+5)
RedisHistoryIteration/streams-16                 164k ± 7%      147k ±10%    -9.83%  (p=0.016 n=5+5)
RedisHistoryIteration/lists_cluster-16          5.26M ± 2%     4.23M ± 1%   -19.48%  (p=0.008 n=5+5)
RedisHistoryIteration/streams_cluster-16         162k ± 5%      147k ± 9%      ~     (p=0.056 n=5+5)
RedisAddPresence_1Ch/lists-16                    26.0 ± 4%      19.8 ±11%   -23.85%  (p=0.008 n=5+5)
RedisAddPresence_1Ch/streams-16                  22.2 ±10%      22.2 ±10%      ~     (p=1.000 n=5+5)
RedisAddPresence_1Ch/lists_cluster-16            31.0 ±10%      25.4 ±17%   -18.06%  (p=0.008 n=5+5)
RedisAddPresence_1Ch/streams_cluster-16          31.2 ± 2%      25.4 ±14%   -18.72%  (p=0.016 n=4+5)
RedisPresence_1Ch/lists-16                       34.0 ±12%      26.8 ± 8%   -21.18%  (p=0.008 n=5+5)
RedisPresence_1Ch/streams-16                     34.8 ± 8%      28.0 ± 7%   -19.54%  (p=0.008 n=5+5)
RedisPresence_1Ch/lists_cluster-16               37.0 ±11%      29.4 ±12%   -20.54%  (p=0.008 n=5+5)
RedisPresence_1Ch/streams_cluster-16             39.4 ± 4%      29.6 ±18%   -24.87%  (p=0.008 n=5+5)
RedisPresence_ManyCh/lists-16                    24.8 ± 5%      22.8 ±12%      ~     (p=0.063 n=5+5)
RedisPresence_ManyCh/streams-16                  25.8 ±11%      22.0 ±14%   -14.73%  (p=0.040 n=5+5)
RedisPresence_ManyCh/lists_cluster-16            27.8 ± 6%      22.6 ± 7%   -18.71%  (p=0.008 n=5+5)
RedisPresence_ManyCh/streams_cluster-16          28.6 ± 2%      23.0 ± 9%   -19.58%  (p=0.008 n=5+5)

@FZambia (Member) commented Aug 13, 2022

I think overall it looks good and is worth moving forward 👍

As the next step, let's concentrate on timeouts? In general, the goal is to avoid calls without a timeout: we need to make sure we never wait indefinitely when sending commands or reading replies. In the Redigo case we achieve this by having both WriteTimeout and ReadTimeout set for the pool.

Possibly go-redis also has a way to set global timeouts. Possibly not - and then we will have to use per-call context-based timeouts. Using per-call context timeouts will definitely hit performance, so I'd prefer mimicking Redigo as a first step if possible.

From time to time I think that all our Engine methods should have context.Context as the first argument (and possibly the Node methods that call the Engine), but that's something we can consider later. Whether using context cancellation/timeouts from the outside is good or not is not obvious to me at this point. Most Go libraries tend to propagate Context in every call where possible, but since we can't really cancel an operation already sent to Redis, I still think global timeouts make sense and are pretty practical.

I will try to look at timeouts too, and at the code here more closely, during the next days - for now I'm a bit busy with various home tasks :) Many thanks for pushing this forward @j178.
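
For contrast, the per-call alternative mentioned above would be something like this sketch (go-redis v8 API; the one-second timeout and the getWithTimeout helper are hypothetical):

import (
	"context"
	"time"

	"github.com/go-redis/redis/v8"
)

// getWithTimeout shows the per-call approach: every operation allocates a
// context with its own deadline - this per-call allocation is the overhead
// concern mentioned above.
func getWithTimeout(client *redis.Client, key string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return client.Get(ctx, key).Result()
}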

@j178 force-pushed the go-redis branch 2 times, most recently from 5573178 to 603025d (August 13, 2022)
@FZambia (Member) commented Aug 23, 2022

Review thread on redis_shard.go (outdated):
shard.subCh = make(chan subRequest)
shard.pubCh = make(chan pubRequest)
shard.dataCh = make(chan *dataRequest)
if !shard.useCluster {
@j178 (Contributor, Author):

go-redis implements a way to run a pipeline in cluster mode too; let's try it out and see whether it works.

@FZambia (Member):

What is this way, btw? Currently in runDataPipeline we receive requests with keys belonging to different key slots, and we construct a single pipeline object from them all. Does go-redis do some sort of remapping in this case?

@j178 (Contributor, Author) commented Aug 29, 2022:

Does go-redis do some sort of remapping in this case?

That's right - go-redis maps the commands in a pipeline to different nodes, in ClusterClient._processPipeline:

https://github.com/go-redis/redis/blob/c561f3ca7e5cf44ce1f1d3ef30f4a10a9c674c8a/cluster.go#L1062
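
From the caller's side this remapping is transparent - a pipeline built against a ClusterClient can mix keys from different slots. A quick sketch (v8 API; the key names are hypothetical):

import (
	"context"

	"github.com/go-redis/redis/v8"
)

// Keys hashing to different slots can go into one pipeline; go-redis
// splits the queued commands per node inside _processPipeline.
func pipelineAcrossSlots(ctx context.Context, cc *redis.ClusterClient) error {
	pipe := cc.Pipeline()
	pipe.Set(ctx, "key-on-node-a", "v1", 0) // may route to node A
	pipe.Set(ctx, "key-on-node-b", "v2", 0) // may route to node B
	_, err := pipe.Exec(ctx)
	return err
}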

@FZambia (Member) commented Aug 29, 2022:

Hmm 🤔 While this leads to better performance, it makes me think that the go-redis approach of splitting commands by node and attempting to retry the entire pipeline in case of an error on one node may increase the chance of duplicate command invocations. For example:

  1. We have 2 "publish with history" requests: the first to node A, the second to node B
  2. Assume A is currently not available, e.g. in the process of failover
  3. Part of the pipeline will be successfully executed on B, but we will get an error since it failed on A
  4. All commands in the batch will get an error at the Centrifuge level and will possibly be re-issued by application code - thus there is a good chance of executing the publish on B again.

I think the scenario above (and some other error scenarios) currently only has a negative effect for publishing with history - i.e. when double invocations as a result of an application-level retry result in duplicate messages sent and duplicate messages saved into a stream.

It seems to me that a better way to manage this is to retry only the failed parts of the pipeline, not the entire one. I think this is done properly in the rueidis library, which has implicit pipelining and gives us granular error management for pieces of the pipeline out of the box. For go-redis I don't think we can simply use pipelining in Cluster mode, given the concerns above ^

@j178 (Contributor, Author) commented Sep 7, 2022:

attempting to retry the entire pipeline in case of an error on one node

I don't think that's right - _processPipeline records failed commands in failedCmds and then attempts to rerun only those, IIUC: https://github.com/go-redis/redis/blob/c561f3ca7e5cf44ce1f1d3ef30f4a10a9c674c8a/cluster.go#L1104

will possibly be re-issued by application code - thus there is a good chance of executing the publish on B again

This is possible, but I don't see it happening in centrifuge. We collect commands and run them in a pipeline at the redis_shard level; if that fails, by no means do we retry the whole batch at this level (since it is dangerous), and the upper level, which knows nothing about the pipelining, can only reissue the single failed command it issued before. So I don't think this issue will happen.

@FZambia (Member):

You are right, but I meant something different. Imagine a Redis Cluster scenario: in this case we only have one Redis shard at the Centrifuge level that operates with the Cluster. Now let's take the case I described above:

  1. We have 2 "publish with history" requests: the first to node A, the second to node B. They are combined into one pipeline and executed together inside runDataPipeline.
  2. Assume A is currently not available, e.g. in the process of failover, while B is healthy.
  3. go-redis will execute part of the pipeline successfully (on B) and part of it with an error (on A). But we will get an error from pipe.Exec
    cmds, err := pipe.Exec(context.Background())
    since it failed on A - because go-redis returns the first observed error: https://github.com/go-redis/redis/blob/c561f3ca7e5cf44ce1f1d3ef30f4a10a9c674c8a/cluster.go#L1107
  4. All commands in the batch will get an error at the Centrifuge level due to this one error on A - see
    if err != nil {
    - while some commands in the batch may have been successful.

Does it make sense? Or am I missing something? Possibly we can change the Centrifuge code a bit to iterate over the pipeline Exec result and manually extract which commands were successful and which failed, extracting cmd results from successful ones only.

Review thread on redis_shard.go (outdated):
}
}
cmds, err := pipe.Exec(context.Background())
if err == redis.Nil {
@FZambia (Member) commented Aug 29, 2022:

It seems we can miss an error here, as go-redis always returns the first non-nil error when executing a Pipeline, and I have not found that it checks for redis.Nil internally.

@j178 (Contributor, Author) commented Sep 7, 2022:

This err == redis.Nil is something I don't know how to handle correctly. For Redis scripts that return nothing, like addPresenceSource, Redis replies with a nil string; go-redis treats that as an error and returns redis.Nil.

If we don't ignore the redis.Nil here, the whole runDataPipeline will exit and rerun - a huge performance loss.

@FZambia (Member):

If we don't ignore the redis.Nil here, the whole runDataPipeline will exit and rerun - a huge performance loss.

Yes, we definitely should not restart the pipeline. To me this looks like a go-redis issue - redis.Nil is not actually an error, so pipe.Exec should not return it as an error at all.

Possibly we should just ignore the error from pipe.Exec and iterate over the returned cmds to handle each error individually 🤔
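
Something like this hypothetical helper (v8 API; execPipeline is an illustrative name, not actual PR code):

import (
	"context"

	"github.com/go-redis/redis/v8"
)

// execPipeline drops the aggregate (first-error) result of pipe.Exec and
// reports errors per command, treating redis.Nil - the empty reply a
// script may legitimately return - as success.
func execPipeline(ctx context.Context, pipe redis.Pipeliner) []error {
	cmds, _ := pipe.Exec(ctx) // aggregate error intentionally ignored
	errs := make([]error, len(cmds))
	for i, cmd := range cmds {
		if err := cmd.Err(); err != nil && err != redis.Nil {
			errs[i] = err // a real failure for this command only
		}
	}
	return errs
}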

@j178 (Contributor, Author):

Pushed a commit that ignores errors from pipe.Exec and defers the redis.Nil judgment to the upper level (PresenceManager.addPresence and removePresence), which knows whether redis.Nil is expected or not.

@FZambia (Member):

Cool, so this should also effectively resolve the concerns in this comment, right?

@j178 (Contributor, Author):

I think so - the current behavior is in line with this statement:

we can change the Centrifuge code a bit to iterate over the pipeline Exec result and manually extract which commands were successful and which failed, extracting cmd results from successful ones only.

@j178 marked this pull request as ready for review on September 7, 2022
@j178 marked this pull request as draft on September 8, 2022
@j178 force-pushed the go-redis branch 2 times, most recently from 73d6364 to 1f62391 (September 13, 2022)