
Using aedes-persistence-redis with Aedes results in slow client connections #10

Open
rasalasantosh opened this issue Jan 6, 2017 · 67 comments

@rasalasantosh

When aedes-persistence-redis is used as the persistence layer with Aedes, client connections are very slow: for 20k client connections it was taking around 10 minutes. But without persistence configured, connections are very fast, around 50k connections in less than a minute.

Please find below the code used to run the Aedes server.

var redis = require('mqemitter-redis');
var aedesPersistenceRedis = require('aedes-persistence-redis');

var mq = redis({
  port: 6379,
  host: '172.31.38.96',
  db: 12
});

var persistence = aedesPersistenceRedis({
  port: 6379,
  host: '172.31.38.96'
});

var aedes = require('aedes')({
  mq: mq,
  persistence: persistence
})

var server = require('net').createServer(aedes.handle)
var port = 1883

server.listen(port, function () {
  console.log('server listening on port', port)
})

@mcollina
Collaborator

mcollina commented Jan 6, 2017

Can you upload a Node.js script to reproduce the problem?

@rasalasantosh
Author

rasalasantosh commented Jan 6, 2017

I don't have a Node.js script; I tried with a Java-based client program. In the program there is a while loop which creates a new client, connects it to the server and, after connecting, subscribes to a topic whose value is the same as the clientId.

@mcollina
Collaborator

mcollina commented Jan 6, 2017

Anything that can reproduce the problem is good; otherwise it will be extremely hard to fix.

@mcollina
Collaborator

mcollina commented Jan 6, 2017

cc @GavinDmello

@GavinDmello
Collaborator

Yea. This is an issue. We have a lot of anonymous functions in the persistence which are preventing optimizations, I think. Also, ResultsHolder from fastseries always seems to be un-optimized.

@mcollina
Collaborator

mcollina commented Jan 7, 2017

@GavinDmello removing the anonymous functions should be feasible, do you want to send a PR? They are mostly forEach/reduce etc. that need to be changed into straight for loops.

ResultsHolder should be optimized if the rest is optimized. There might be a small issue somewhere in there too.

At any rate, we need a script to verify this problem is fixed.
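
For illustration only (this is not code from the actual persistence, and topicsFromSubs is a made-up helper), a minimal sketch of the kind of change being discussed: replacing a forEach with an anonymous callback by a plain for loop.

  function topicsFromSubs (subs) {
    // Before: subs.forEach(function (sub) { topics.push(sub.topic) })
    // A plain for loop avoids allocating a closure per call, which older
    // V8 versions optimized more reliably.
    var topics = []
    for (var i = 0; i < subs.length; i++) {
      topics.push(subs[i].topic)
    }
    return topics
  }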

@GavinDmello
Collaborator

@mcollina Yea. That would be cool. I'll make a PR.
@rasalasantosh Could you share that Java client script? It would be helpful.

@rasalasantosh
Author

@mcollina, kindly share the client script that you used to test 900k connections, so that I can modify it according to my use case and test.

@mcollina
Collaborator

mcollina commented Jan 9, 2017

@rasalasantosh as said elsewhere, I did not do that test myself.

@mcollina
Collaborator

@rasalasantosh can you try version v2.6.1 and see if it improves things?

@rasalasantosh
Author

@mcollina, is it Redis v2.6.1? If so, I will try and let you know.

@GavinDmello
Collaborator

@rasalasantosh v2.6.1 of this module

@rasalasantosh
Author

@GavinDmello, I installed aedes using npm i aedes aedes-persistence-redis@2.6.1 --save and tried again.

It is still taking around 11 minutes for 25k connections, but without persistence it was taking less than a minute for 50k connections.

I also tried mosca with persistence; it was taking around 3 minutes for 25k connections.

@mcollina
Collaborator

@rasalasantosh what options are you connecting with? Are you also subscribing to a topic?

@rasalasantosh
Author

@mcollina, yes, after the connection is successful I am subscribing to a topic as well.

@mcollina
Collaborator

@rasalasantosh you should check v2.6.3, it has Lua script caching, which should improve the throughput.
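
For context, one common way to get Lua script caching with ioredis (a hedged illustration only; this is not necessarily how v2.6.3 implements it, and getRetained is a made-up command name): defineCommand loads the script once and then issues EVALSHA, so the script body is not resent on every call.

  var Redis = require('ioredis')
  var db = new Redis()

  db.defineCommand('getRetained', {
    numberOfKeys: 1,
    lua: "return redis.call('get', KEYS[1])"
  })

  // Subsequent calls reuse the cached script via its SHA1 (EVALSHA).
  db.getRetained('retained:some/topic', function (err, value) {
    console.log(err || value)
  })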

@rasalasantosh
Author

@mcollina, I have tested again, but it is giving the same results, i.e. 11 minutes for 25k connections.

@mcollina
Collaborator

@rasalasantosh what options are you connecting with? Are you also subscribing to a topic?
From our test it takes just 30 seconds to make 20k connections, so maybe something else is happening.

@rasalasantosh
Author

rasalasantosh commented Jan 16, 2017

@mcollina, I am connecting with clean session = true, QoS = 1 and keepalive = 60 sec. After connecting we subscribe to a topic which is the same as the clientId.

Please also share the test script, so I can verify with your script on my side as well.
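
For reference, a minimal Node.js sketch of a client using the options described above (this uses the mqtt package with a made-up broker address and clientId; it is not the Java client from this thread): it connects with clean session, a 60-second keepalive, and subscribes at QoS 1 to a topic equal to its clientId.

  var mqtt = require('mqtt')

  var clientId = 'client-0001'
  var client = mqtt.connect('mqtt://broker.example.com:1883', {
    clientId: clientId,
    clean: true,    // clean session = true
    keepalive: 60   // seconds
  })

  client.on('connect', function () {
    // Subscribe at QoS 1 to a topic whose name equals the clientId.
    client.subscribe(clientId, { qos: 1 }, function (err) {
      if (err) console.error('subscribe failed', err)
    })
  })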

@GavinDmello
Collaborator

@rasalasantosh Could you try it with this? https://gist.github.com/GavinDmello/dda612661df4db11ba84b41097470c95
It's not currently subscribing to a topic, and clean is true.

@rasalasantosh
Author

rasalasantosh commented Jan 16, 2017

@GavinDmello, Aedes is taking 142 sec to connect 50k connections without subscriptions, but 1360 sec with subscriptions.
I tested the same with mosca: it is taking 190 sec without subscriptions and 205 sec with subscriptions.

Please find below the script used to test subscriptions.

client-sub.txt

@mcollina
Collaborator

@rasalasantosh that is with Redis configured as the persistence, correct?

I think createRetainedStream is where the problem is: https://github.com/mcollina/aedes-persistence-redis/blob/master/persistence.js#L98-L105. I'll try to send a PR to improve things.

@mcollina
Collaborator

@rasalasantosh check v2.6.4.

@rasalasantosh
Author

@mcollina with v2.6.4 I am getting the same results.

@GavinDmello
Collaborator

@rasalasantosh Try removing all unwanted keys from Redis. Also, is Redis running on the same box? Maybe you're running out of memory.

@rasalasantosh
Author

@GavinDmello, I tried after flushing Redis, but I am getting the same results. Moreover, Redis, Aedes and the client are running on separate instances.

@mcollina
Collaborator

Mosca is using a hashtable for the retained messages, while we are just using plain keys and SCAN here: https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L241-L267.

Maybe that's the approach we should take here as well. I thought using SCAN was smart.
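
As a hedged illustration of the two storage strategies being compared (the key names below are made up, and this is not the actual persistence code): keeping all retained messages in a single hash lets you read them back with one HGETALL, whereas one key per topic forces a SCAN over the keyspace to enumerate them.

  var Redis = require('ioredis')
  var db = new Redis()

  // Hash approach (Mosca-style): one round trip, no keyspace scan.
  function storeRetainedHash (packet, cb) {
    db.hset('retained', packet.topic, JSON.stringify(packet), cb)
  }
  function readAllRetainedHash (cb) {
    db.hgetall('retained', cb)
  }

  // Per-key approach: needs SCAN with a MATCH pattern to find the topics.
  function storeRetainedKey (packet, cb) {
    db.set('retained:' + packet.topic, JSON.stringify(packet), cb)
  }
  function readAllRetainedKeys (cb) {
    var keys = []
    var stream = db.scanStream({ match: 'retained:*', count: 1000 })
    stream.on('data', function (batch) { keys = keys.concat(batch) })
    stream.on('end', function () { cb(null, keys) })
  }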

@toddtreece
Contributor

@mcollina I'm wondering if this is due to the Lua script. We eventually run into issues in production once the subscription count hits a certain point. I missed this section in the Redis docs when testing the Lua changes: ...while the script is running no other client can execute commands.

from the redis docs:

Redis uses the same Lua interpreter to run all the commands. Also Redis guarantees that a script is executed in an atomic way: no other script or Redis command will be executed while a script is being executed. This semantic is similar to the one of MULTI / EXEC. From the point of view of all the other clients the effects of a script are either still not visible or already completed.

However this also means that executing slow scripts is not a good idea. It is not hard to create fast scripts, as the script overhead is very low, but if you are going to use slow scripts you should be aware that while the script is running no other client can execute commands.
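
A small hedged sketch of the effect described in that quote (ioredis calls, illustrative only; the busy-loop script exists purely to be slow and should not be run against a production Redis): any command issued while the script runs is not answered until the script finishes.

  var Redis = require('ioredis')
  var db = new Redis()

  // A deliberately slow script: ~1e8 Lua iterations.
  var slowScript = 'local i = 0 while i < 1e8 do i = i + 1 end return i'
  db.eval(slowScript, 0, function (err, res) {
    console.log('script finished:', err || res)
  })

  // This PING is only answered after the script completes (and if a script
  // exceeds lua-time-limit, Redis starts replying BUSY to other clients).
  db.ping(function (err, pong) {
    console.log('ping answered after the script:', err || pong)
  })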

@mcollina
Collaborator

@toddtreece that is probably the reason.
So, do you think we should revert #3?

@mcollina
Collaborator

mcollina commented Jun 18, 2017

I'm ok with changes, as long as they are ported to the other persistences as well.

@mcollina mcollina reopened this Jun 18, 2017
@mcollina
Collaborator

(sorry, I was on mobile)

@GavinDmello
Collaborator

GavinDmello commented Jun 18, 2017

@behrad How many subscriptions are you dealing with? I did not use any SCAN/HSCAN implementation of Redis because, if COUNT is not set to an optimal value, the performance isn't good. Also, the Redis server was doing a lot of heavy lifting in terms of CPU when many Aedes instances are connected.

@behrad
Contributor

behrad commented Jun 18, 2017

How many subscriptions are you dealing with?

Maybe a few million in production; however, the current implementation of aedes-persistence-redis will raise issues around a few hundred thousand.

I did not use any SCAN/HSCAN implementation of Redis because

Please check Mosca's Redis implementation here: https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L137

We can't set an upper limit on the client subscription count.

Also, the Redis server was doing a lot of heavy lifting in terms of CPU when many Aedes instances are connected.

This may be an issue with the core architecture/design of multiple processes/instances around a single shared Redis. I believe aedes-persistence-redis should support clustered Redis internally (however, we can test it against a transparent proxy like twemproxy or codis).

Do you think your concern is the same as mcollina/mqemitter-child-process#1, @GavinDmello?

@GavinDmello
Collaborator

I think you can fetch all keys in a list with a -1 arg (LRANGE key 0 -1). My concern is using SCAN in hot paths. The COUNT of 25000 which is present in Mosca may be good enough for a few million keys, but as the subscriptions increase the number will have to be bumped.
Nutcracker can't be used with MULTI at the moment.

@behrad
Contributor

behrad commented Jun 18, 2017

I think you can fetch all keys in a list with a -1 arg

You mean LRANGE-ing a million-sized list is faster than SCAN?
If yes, how much faster? Is it worth the redundancy? (In large deployments Redis memory is valuable.)
And why not a SET? Why a LIST? There would also have to be some dedup logic when using a list; currently it is duplicating subs on each handleSubscribe.

The COUNT of 25000 which is present in Mosca may be good enough for a few million keys, but as the subscriptions increase the number will have to be bumped.

That is the chunk buffer size, and my tests show that it should be lowered, even to 5000 or below.

Nutcracker can't be used with MULTI at the moment.

Regarding MULTI (MULTI is heavy for Redis): what if we sacrifice a bit of consistency for clusterability/performance and remove MULTI or make it optional? @mcollina
We can at least provide a configurable option for this, since subscriptions are idempotent and subsequent client subscriptions can fix any inconsistencies.
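
A hedged sketch of the trade-off raised here (made-up key names, ioredis calls; not the actual persistence code): a MULTI/EXEC block applies the writes atomically but is heavier for Redis and unsupported by Nutcracker, while a plain pipeline batches the same commands without atomicity.

  var Redis = require('ioredis')
  var db = new Redis()

  // Atomic variant (the current behaviour being questioned).
  db.multi()
    .sadd('sub:client', 'client-1')
    .hset('client:client-1', 'some/topic', '{"qos":1}')
    .exec(function (err) { /* executed atomically, with no other commands interleaved */ })

  // Non-atomic variant (the proposed option): same commands, only batched.
  // Since subscriptions are idempotent, a later subscribe can repair a
  // partially applied write.
  db.pipeline()
    .sadd('sub:client', 'client-1')
    .hset('client:client-1', 'some/topic', '{"qos":1}')
    .exec(function (err) { /* no atomicity guarantee across the two writes */ })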

@mcollina
Collaborator

I'm 👍 on removing MULTI, I do not think it would cause any issue in practice.

@GavinDmello
Collaborator

And why not a SET?

A SET would've been a good option. I wasn't aware of the handleSubscribe issue.

That is the chunk buffer size, and my tests show that it should be lowered, even to 5000 or below.

I'm okay with this if Nutcracker support is added.

@GavinDmello
Collaborator

GavinDmello commented Jun 21, 2017

@behrad SCAN is not supported on Nutcracker either.

https://github.com/twitter/twemproxy/blob/master/notes/redis.md

SSCAN, ZSCAN and HSCAN are supported though.

@behrad
Contributor

behrad commented Jun 23, 2017

SCAN is not supported on Nutcracker either.

Unfortunately yes, and it seems we SHOULD have a container for all subs/clients:

  1. A SET is preferred since it does the dedup for topics.
  2. For storage/memory optimization I think we should store only one index, either subs or clients, and calculate/implement the needed abstract persistence interface methods on top of it.

Currently this module is storing both, e.g. a client's subscription is stored both inside the client's hash and inside the topic hash (only the clientId would suffice in the latter).

SSCAN, ZSCAN and HSCAN are supported though.

How do you think SSCAN will compare to SMEMBERS for a large (over a million entries) set, @GavinDmello?
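
For reference, a small hedged sketch (ioredis calls, assuming a set key such as 'sub:client') of the two reads being compared: SMEMBERS returns the whole set in one reply, while SSCAN walks it in COUNT-sized batches so no single huge reply has to be built.

  var Redis = require('ioredis')
  var db = new Redis()

  // One big reply: simple, but memory and latency grow with the set size.
  db.smembers('sub:client', function (err, members) {
    console.log('smembers returned', members.length, 'entries')
  })

  // Incremental walk: COUNT is only a hint to Redis about the batch size.
  var total = 0
  var stream = db.sscanStream('sub:client', { count: 5000 })
  stream.on('data', function (batch) { total += batch.length })
  stream.on('end', function () { console.log('sscan streamed', total, 'entries') })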

@behrad
Contributor

behrad commented Jun 29, 2017

To test further, I migrated my current actual subscription data (under Mosca) to the Aedes format under Redis. When I launch Aedes, memory grows until the process halts. I noticed that _setup has some issues. However, when I took the same code that is inside _setup (to hgetallBuffer all topic keys, but without the processing part) and ran it in isolation in a single Node.js file, it worked! It took 67606.250ms to hgetall all topics.

// Dependencies assumed by this snippet:
var through = require('through2')
var throughv = require('throughv')
var pump = require('pump')
var Redis = require('ioredis')
var ioredisClient = new Redis()

console.time('@@@@@@@@@@@@@@@@@@ READY');

// Push every key returned by SMEMBERS as an individual chunk.
function split (keys, enc, cb) {
  for (var i = 0, l = keys.length; i < l; i++) {
    this.push(keys[i])
  }
  cb()
}
var splitStream = through.obj(split);

var count = 0; var result = 0;

// Issue an HGETALL (buffer variant) for every topic key, in parallel.
var hgetallStream = throughv.obj(function getStream (chunk, enc, cb) {
  ioredisClient.hgetallBuffer(chunk, function (e, r) { result++; cb(); });
  count++;
}, function emitReady (cb) {
  console.timeEnd('@@@@@@@@@@@@@@@@@@ READY');
  console.log('count ', count);
  cb()
});

console.time('smembers');
ioredisClient.smembers('sub:client', function smembersResult (err, results) {
  console.timeEnd('smembers');
  setInterval(function () { console.log('Count=' + count, ' Result=' + result) }, 1000);
  if (err) {
    splitStream.emit('error', err)
  } else {
    splitStream.write(results);
    splitStream.end()
  }
});

pump(splitStream, hgetallStream, function pumpStream (err) {})

Surprisingly, when I moved the same code above (which is not processing the keys) back inside _setup, it didn't work again!
What could be introducing a memory leak here with the same code, guys? @mcollina @GavinDmello

@behrad
Contributor

behrad commented Jun 29, 2017

My heap dump shows it's inside SingleCaller -> throughv's fastparallel usage.

@mcollina
Collaborator

You don't need to use streams here. You will get some speed bump from removing them.

@GavinDmello
Collaborator

@mcollina Could it be Qlobber? We're adding all the decoded packets to Qlobber rather than just the topics.

@mcollina
Collaborator

@GavinDmello not sure, but I don't think so.

@behrad
Contributor

behrad commented Jun 29, 2017

So strange!!! I figured it out: NODE_ENV=development
The difference between the two runs was NODE_ENV! When it's set to development, two things were happening:

  1. throughv was introducing a memory leak (as I mentioned, inside its fastparallel SingleCaller); changing it to through fixed that.
  2. A 10x speed reduction!!! I can't figure this out yet. Some module's development-mode inspection (maybe ioredis) must be causing this.

Could it be Qlobber?

No, the Qlobber part of the code was commented out.

@behrad
Contributor

behrad commented Jun 29, 2017

You don't need to use streams here. You will get some speed bump from removing them.

By removing the streams and putting hgetallBuffer inside a for loop it gets faster; however, it won't scale memory-wise: we need a huge memory footprint on Aedes startup, and the process freezes until the persistence is ready.

@mcollina
Collaborator

By removing the streams and putting hgetallBuffer inside a for loop it gets faster; however, it won't scale memory-wise: we need a huge memory footprint on Aedes startup, and the process freezes until the persistence is ready.

You can use something like http://npm.im/fastq.

@behrad
Contributor

behrad commented Jun 29, 2017

My benchmarks ran just now:

through (stream)             70 secs
for loop                     86 secs
fastq (concurrency=1)       126 secs
fastq (concurrency=100)      67 secs
fastq (concurrency=1000)     84 secs
fastq (concurrency=10000)    91 secs

So it seems changing to fastq won't help that much :)

Before I go and test with the Qlobber code activated, there are some optimizations I want to mention:

  1. By loading from a topic index, we are hgetall-ing a clientId multiple times.
    Instead of a topic index, we'd better load by a client index (as Mosca does) and fetch fewer buffers. I'll try to optimize the Redis key schema for this.

  2. How much would it help if we spawned multiple child processes, each hgetall-ing a subset of the set and returning the topic buffers back to the main setup process? (LRANGE or SSCAN can be used for this.) My idea is to use multiple cores for loading large subscription data; would Redis be saturated then, @mcollina?

  3. Aedes opens its port before the persistence is ready; the persistence queues all callbacks until it is ready, and I don't want a user's offline messages to be forwarded only after 70 seconds. I would prefer that a non-ready instance has no port listening (like Mosca), so that my load balancer relays users to a running instance (see the sketch below). Can we change this to behave the way Mosca does, @mcollina?
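
A minimal hedged sketch of that behaviour, assuming the persistence instance emits a 'ready' event once its setup completes (whether the persistence API actually exposes this is checked later in the thread): the server only starts listening after the persistence reports it is ready.

  var persistence = require('aedes-persistence-redis')({ host: '127.0.0.1' })
  var aedes = require('aedes')({ persistence: persistence })
  var server = require('net').createServer(aedes.handle)

  persistence.on('ready', function () {
    server.listen(1883, function () {
      console.log('persistence ready, now accepting MQTT connections')
    })
  })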

@mcollina
Collaborator

On your results, can I see your code?

On 1: go ahead!

On 2: I'm really against going multi-process. This is very deployment specific.

On 3: Aedes does not listen automatically. That part of the logic in Mosca was super complicated, so it's up to you. If you want to send an update to the README, go ahead. If we are missing some events for this, please send a PR and we'll add them.

@behrad
Contributor

behrad commented Jun 29, 2017

On your results, can I see your code?

  1. The stream code is the same as the snippet above (taken from the current aedes-persistence-redis code).
  2. The for loop version removes through and calls hgetallBuffer inside the smembers callback.
  3. fastq is something like this:
  var queue = require('fastq')(worker, 100)
  queue.drain = function () {
    console.timeEnd('@@@@@@@@@@@@@@@@@@ READY')
    console.log('count ', count)
  }
  function worker (arg, cb) {
    that._db.hgetallBuffer(arg, function (e, r) { result++; cb(); })
    count++;
  }
  this._db.smembers(subscriptionsKey, function smembersResult (err, results) {
    for (var i = 0, l = results.length; i < l; i++) {
      queue.push(results[i])
    }
  })

On 2: I'm really against going multi-process. This is very deployment specific.

So I'll suspend this and go for higher-priority tasks like number 1.

On 3: Aedes does not listen automatically. That part of the logic in Mosca was super complicated, so it's up to you. If you want to send an update to the README, go ahead. If we are missing some events for this, please send a PR and we'll add them.

Great. I should check whether the persistence API already exposes ready.

@mcollina
Collaborator

I think 1. might be the path that leads to the best benefits.

@behrad
Contributor

behrad commented Jun 29, 2017

Yes, sure.

Can you also clarify these extra details for me, @mcollina?

  1. Why are you using throughv instead of through2? The same could've been achieved with through2.

  2. Can we play with the concurrency of throughv.obj by setting a highWaterMark?

  3. Why are you using pump, and how does it help?

@mcollina
Collaborator

throughv parallelizes, through2 does things in series.

You can play with throughv concurrency by setting a highWaterMark.

pump makes sure all streams are closed if one errors.
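
A short hedged sketch of that second answer, assuming throughv keeps the through2-style .obj([options], transform) signature: the stream's highWaterMark is what bounds how many chunks are transformed at the same time.

  var Redis = require('ioredis')
  var throughv = require('throughv')
  var ioredisClient = new Redis()

  // Up to roughly highWaterMark HGETALLs are in flight at once
  // (assumption about throughv's semantics; illustrative only).
  var hgetallStream = throughv.obj({ highWaterMark: 128 }, function (topicKey, enc, cb) {
    ioredisClient.hgetallBuffer(topicKey, cb)
  })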

@behrad
Contributor

behrad commented Jun 29, 2017

thank you man 👍

And have you got any clues as to why fastparallel's SingleCaller is eating memory when NODE_ENV=development?

@behrad
Contributor

behrad commented Jul 9, 2017

This issue should be resolved by #31

@robertsLando
Member

@mcollina @behrad can this be closed?
