
Load-Peaks and still not multidomain-usable #57

Open
tackin opened this issue Mar 20, 2020 · 14 comments

Comments
@tackin

tackin commented Mar 20, 2020

[Screenshot: load peaks]
The gateways Erai and Rustig are using our fork (https://github.com/freifunktrier/mesh-announce) of this repo.
I have had 3 problems:

  1. Load peaks
  2. permanently changing, wrong node data in YANIC
  3. warnings like:
    Mar 20 17:13:49 pegol yanic[9430]: time="2020-03-20T17:13:49.179+01:00" level="warn" msg="override nodeID from 2661965025dc to 266196502501 on MAC address 26:61:96:60:25:05" caller="nodes. go:207 github.com/FreifunkBremen/yanic/runtime.(*Nodes).readIfaces"
    Mar 20 17:13:49 pegol yanic[9430]: time="2020-03-20T17:13:49.208+01:00" level="warn" msg="override nodeID from 266196502504 to 266196502505 on MAC address 26:61:96:60:25:04" caller="nodes. go:207 github.com/FreifunkBremen/yanic/runtime.(*Nodes).readIfaces"
    Mar 20 17:13:49 pegol yanic[9430]: time="2020-03-20T17:13:49.209+01:00" level="warn" msg="override nodeID from 2661965025dc to 266196502501 on MAC address 26:61:96:60:25:05" caller="nodes. go:207 github.com/FreifunkBremen/yanic/runtime.(*Nodes).readIfaces"
    Mar 20 17:13:49 pegol yanic[9430]: time="2020-03-20T17:13:49.211+01:00" level="warn" msg="override nodeID from 266196501003 to 2661965010dc on MAC address 26:61:96:60:10:dc" caller="nodes. go:207 github.com/FreifunkBremen/yanic/runtime.(*Nodes).readIfaces"
    Mar 20 17:13:49 pegol yanic[9430]: time="2020-03-20T17:13:49.216+01:00" level="warn" msg="override nodeID from 266196502504 to 266196502505 on MAC address 26:61:96:60:25:04" caller="nodes. go:207 github.com/FreifunkBremen/yanic/runtime.(*Nodes).readIfaces"
    Mar 20 17:13:49 pegol yanic[9430]: time="2020-03-20T17:13:49.217+01:00" level="warn" msg="override nodeID from 2661965025dc to 266196502501 on MAC address 26:61:96:60:25:05" caller="nodes. go:207 github.com/FreifunkBremen/yanic/runtime.(*Nodes).readIfaces"
    Mar 20 17:13:49 pegol yanic[9430]: time="2020-03-20T17:13:49.218+01:00" level="warn" msg="override nodeID from 266196501003 to 2661965010dc on MAC address 26:61:96:60:10:dc" caller="nodes. go:207 github.com/FreifunkBremen/yanic/runtime.(*Nodes).readIfaces"

I shifted to my older mesh-announce fork from ffda (multicast on ff02:....) and my problems are gone.
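The difference between the forks here is presumably the scope of the IPv6 multicast group the responder listens on (ff02::/16 being link-local). For illustration, a minimal sketch of how a respondd-style responder joins such a group; the group ff02::2:1001, port 1001 and the interface name follow the usual gluon conventions and are assumptions, not values read from either fork:

```python
# Minimal sketch of a respondd-style responder joining its IPv6 multicast
# group. Group, port and interface name are assumed, not taken from the forks.
import socket
import struct

GROUP = "ff02::2:1001"  # link-local scope (assumed)
PORT = 1001             # conventional respondd port (assumed)
IFACE = "bat0"          # example interface name (assumed)

sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("::", PORT))  # ports below 1024 require root

# ipv6_mreq: 16-byte group address followed by the interface index
mreq = socket.inet_pton(socket.AF_INET6, GROUP) + struct.pack(
    "@I", socket.if_nametoindex(IFACE))
sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)

data, addr = sock.recvfrom(2048)
print("request from", addr, data)
```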

@AiyionPrime
Contributor

Your numbers two and three should be resolved by the merge of #58.
Can you confirm that, @tackin?
About the load peaks I cannot say anything yet.

@tackin
Author

tackin commented Apr 4, 2020

I need to install/test it again for nos. 2 and 3 once no. 1 is fixed.
No. 3 is a YANIC thing (may be solved).
For no. 2 it is not clear to me whether it is a YANIC or a mesh-announce bug.

@tackin
Author

tackin commented Apr 4, 2020

@AiyionPrime
Tested:
No. 3 seems to be solved.
No. 2 is not solved.

@tackin
Author

tackin commented Apr 4, 2020

@AiyionPrime
Your PR #58 solves problem no. 2 for me.

@AiyionPrime
Contributor

The load peaks appear in Hannover as well, but they seem to correlate with fastd's CPU usage (likely the context switches) and not with mesh-announce. How can the finding that mesh-announce is the culprit be reproduced?

@tackin
Author

tackin commented Apr 5, 2020

By simply disabling the service and seeing what happens.

See the picture above. The load peaks stopped when I stopped the service on Rustig and Erai.
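For anyone trying to reproduce this, the stop-and-observe cycle can be scripted. A minimal sketch, assuming the systemd unit is called mesh-announce and sampling the 1-minute load average from /proc/loadavg:

```python
# Sample the 1-minute load average before and after stopping the service,
# so the load peaks can be correlated with mesh-announce. The unit name
# "mesh-announce" is an assumption.
import subprocess
import time

def loadavg_1min() -> float:
    with open("/proc/loadavg") as f:
        return float(f.read().split()[0])

def sample(seconds: int, interval: int = 10) -> None:
    for _ in range(seconds // interval):
        print(time.strftime("%H:%M:%S"), loadavg_1min())
        time.sleep(interval)

sample(300)                                               # baseline
subprocess.run(["systemctl", "stop", "mesh-announce"], check=True)
sample(300)                                               # service stopped
subprocess.run(["systemctl", "start", "mesh-announce"], check=True)
```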

@AiyionPrime
Contributor

AiyionPrime commented Apr 5, 2020

Thanks, I will try to reproduce it tonight.

@AiyionPrime
Contributor

First things first: Hannover has the same issue on all four supernodes.
The peaks are always about one hour and 45 minutes apart from each other (averaged over the last day).

One thing to note is that they don't peak or start to spike at the same time.
We watched the load today and could not find anything but fastd and occasionally mesh-announce in the top 10 of htop.

At 20:30 we stopped the mesh-announce service; the resulting graph is this one.

As you can see, this drastically reduces the load, but doesn't prevent the spike altogether.
It appears that mesh-announce is responsible for part of the load, but not for the triggering event itself.
Therefore I can confirm the bug; a workaround that reduces the load is possibly to use the multi-domain feature in one instance.

Trier was likely impacted more by mesh-announce, as they had more instances running.
I'll try that tomorrow; for now our supernodes are being tested busily in order to exclude causes like our monitoring, our Zabbix, or whatever.

Looking over to Trier the load appears to peak in the same frequency:
https://draco.freifunk-trier.starletp9.de:3000/d/Gb1_MoJik/freifunk-trier-uberblick?orgId=1

Quite possibly I can't see the forest for the trees, but I can't figure out what triggers every 105 minutes, independent of when a system booted.
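One way to pin down the 105-minute period from data rather than by eyeballing graphs: log evenly spaced load samples and look for the lag with the strongest autocorrelation. A sketch; the sampling interval and the synthetic test data are assumptions:

```python
# Estimate the dominant period of load spikes from evenly spaced samples
# via autocorrelation.
import numpy as np

def dominant_period(samples: np.ndarray, step_s: int) -> int:
    """Return the lag (in seconds) with the highest autocorrelation."""
    x = samples - samples.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    return (int(np.argmax(acf[1:])) + 1) * step_s  # skip lag 0

# Synthetic example: a 3-minute spike every 105 minutes over one day.
t = np.arange(24 * 60)                 # one sample per minute (assumed)
load = 0.2 + (t % 105 < 3) * 2.0       # hypothetical load trace
print(dominant_period(load, step_s=60) / 60, "minutes")  # -> 105.0 minutes
```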

@AiyionPrime
Contributor

AiyionPrime commented Apr 5, 2020

@moridius just stopped fastd on a supernode; that drastically reduces the spike as well, but not completely, as long as mesh-announce is left running.
@tackin have you already taken dumps of the traffic for two or three period lengths?

@tackin
Author

tackin commented Apr 6, 2020

@tackin have you already taken dumps of the traffic for two or three period lengths?

No, sorry, I have no idea where/what to look for in a dump.
For us, stopping fastd would also drop all tunnels and traffic. That would not make sense in testing, I guess.
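For what it's worth, such a capture doesn't need much forethought: recording all respondd traffic for a few period lengths and then checking whether request bursts line up with the spikes would already help. A sketch, assuming the conventional respondd port 1001 (check the local config):

```python
# Capture respondd traffic for three spike periods so request bursts can
# be lined up with the load graph. UDP port 1001 is the conventional
# respondd port -- an assumption.
import subprocess

PERIOD_S = 105 * 60        # observed spike period (~1 h 45 min)

try:
    subprocess.run(
        ["tcpdump", "-i", "any", "-n", "-w", "respondd.pcap",
         "udp", "port", "1001"],
        timeout=3 * PERIOD_S,  # stop after three period lengths
    )
except subprocess.TimeoutExpired:
    pass  # expected: the timeout is how the capture is ended
```

Opening the resulting file in Wireshark and looking at the packet rate over time should show whether the request volume is periodic.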

@AiyionPrime
Contributor

Well, then.
Yesterday at 20:30 I shut down the first mesh-announce instance, on supernode 09, reducing its temporary load drastically, as seen in the last graph.
This did not change in the last 16(?) hours.

Today at 13:00 I shut down the other mesh-announce instances as well.
They all showed the same result: a drastic reduction of their load in the peak window.

The second shutdown did not affect the load peak on sn09 at all.
My conclusion stands: mesh-announce is responsible for (part of) the load peak, but not for the event triggering it.

Here is the current graph; sn[01,08,09,10] are currently all of our supernodes running mesh-announce. The red dot marks 13:05, when my shutdown of the remaining three instances took effect.

We'll start tcpdumps later this afternoon. I'm now firing up mesh-announce again.

@AiyionPrime
Contributor

I got my non-findings about the event and the resulting load peer-reviewed yesterday.
It's unlikely that tcpdumps will help at this point already. I will determine whether Darmstadt's fork has the issue as well. If not, I'll go back to that fork, confirm the issue wasn't present back then either, and finally bisect when things went south. Will do this after lunch.
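The bisect step can be automated with `git bisect run` and a helper script that exits non-zero on bad revisions. A sketch only: the entry point, observation window and load threshold are all assumptions, and in practice the window would have to cover at least one full spike period:

```python
#!/usr/bin/env python3
# Helper for `git bisect run`: exit 0 if this revision behaves, 1 if it
# shows the load peaks. The check itself is only stubbed out here.
import subprocess
import sys
import time

def load_peak_detected() -> bool:
    proc = subprocess.Popen([sys.executable, "respondd.py"])  # entry point assumed
    try:
        time.sleep(120)  # observation window (assumed; should span a spike period)
        with open("/proc/loadavg") as f:
            return float(f.read().split()[0]) > 2.0  # threshold assumed
    finally:
        proc.terminate()

sys.exit(1 if load_peak_detected() else 0)
```

Used as `git bisect start <bad> <good>` followed by `git bisect run ./check_load.py`.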

@TobleMiner
Member

Does this issue still exist? There have been major changes in mesh-announce and thus additional confirmation on this issue is required. This issue will be closed in a month if there is no further activity.

@tackin
Copy link
Author

tackin commented Mar 22, 2021

@TobleMiner Sorry, I haven't found the time to test it yet. It's not a big issue/problem for us at the moment, so I feel no pressure. ;-) I'll come back to it a.s.a.p.
