Revisit how and when to start the Zenoh router #231

Open
clalancette opened this issue Jul 1, 2024 · 8 comments

@clalancette
Collaborator

There are a couple of different things we can do here:

  1. Assume that the router is already running. This is what we currently do. This has the benefit of being relatively easy to do for rmw_zenoh_cpp. It is also nice if the user wants to run zenohd on their own, through systemd or another management system. However, this means that things don't "just work" out-of-the-box, like they do with the DDS RMWs. This essentially gets us back to a system that looks similar to ROS 1 with the roscore.
  2. Attempt to contact the router, and launch one if we cannot contact it. This has the benefit of "just working" for those who don't know anything about the system. However, implementing this is fraught with peril. While we certainly could come up with a fork/exec pattern on Linux, launching it on Windows is going to be harder. Also, managing the lifetime of that separate zenohd process is unclear. Should it go away when the node that launched it goes away? When the last local user goes away? What system is responsible for killing it off (since it may be the case that the thing that launched it has gone away by this time)? What about race conditions between multiple nodes starting up at the same time and trying to create multiple zenohd routers?
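
To make the startup race in option 2 concrete, here is a minimal sketch (in Python for brevity, not something rmw_zenoh itself would ship) of one common way to elect a single launcher: atomically creating a lock file. The function name `try_become_launcher` and the lock path are hypothetical, chosen only for illustration. Only the process that wins the election would go on to start zenohd; everyone else would wait and retry the connection.

```python
import os


def try_become_launcher(lock_path: str) -> bool:
    """Return True if this process won the right to launch the router.

    Hypothetical helper: O_CREAT | O_EXCL makes file creation atomic, so when
    several nodes start at the same time exactly one of them succeeds.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        # Another process created the lock first; it is the launcher.
        return False
    with os.fdopen(fd, "w") as lock_file:
        # Record the owner so a stale lock can at least be diagnosed later.
        lock_file.write(str(os.getpid()))
    return True
```

This only settles who launches. It does not answer the lifetime questions above: if the winner crashes, the lock file is left behind, and something still has to decide when to remove it and stop the router.
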
@Timple
Contributor

Timple commented Jul 1, 2024

I like the auto launch idea. Not having to start the master is one of the few things that did not get more complicated after the migration 😄

That said, we don't want to end up in a situation like the ros2 daemon: it starts by itself (ok), but its configuration might be outdated (nok), which leads tutorials to tell people to always kill it just in case...

Regarding cleanup: a self destruct if no clients are connected (for 5s) perhaps?
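
As a rough illustration of the "self destruct if no clients are connected (for 5s)" idea, the bookkeeping could look like the sketch below. This is an assumption-laden Python sketch, not anything zenohd provides: the class name `IdleShutdownTimer`, its methods, and the injectable clock are all hypothetical.

```python
import time


class IdleShutdownTimer:
    """Tracks connected clients and reports when the idle timeout has expired."""

    def __init__(self, idle_timeout_s: float = 5.0, clock=time.monotonic):
        self._timeout = idle_timeout_s
        self._clock = clock
        self._clients = 0
        # No clients yet, so the idle period starts immediately.
        self._idle_since = clock()

    def client_connected(self) -> None:
        self._clients += 1
        self._idle_since = None  # Not idle while anyone is connected.

    def client_disconnected(self) -> None:
        if self._clients == 0:
            return  # Ignore spurious disconnects.
        self._clients -= 1
        if self._clients == 0:
            self._idle_since = self._clock()

    def should_shut_down(self) -> bool:
        if self._idle_since is None:
            return False
        return self._clock() - self._idle_since >= self._timeout
```

Something would still have to poll `should_shut_down()` and actually stop the router, and a node that starts up just as the timeout fires would still find the router gone.
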

@clalancette
Collaborator Author

> I like the auto launch idea. Not having to start the master is one of the few things that did not get more complicated after the migration 😄

While I like the idea in theory, I think it is going to be very, very difficult to come up with something cross-platform and reliable to manage it. I'm leaning towards "make the user do it" (i.e. number 1), but we definitely have to improve our story around initial connection to the router before we make a final decision.

@codebot
Member

codebot commented Jul 8, 2024

Although automatic launch would be great, as @clalancette said, both phases seem difficult. When starting a bunch of ROS 2 nodes near-simultaneously, which one should decide to fork zenohd? The shutdown situation is also tricky: does there need to be a "shutdown daemon" that observes that nothing has talked to zenohd for some timeout and then sends it some signal (or the equivalent on Windows)? Then there are the corner cases when the shutdown timeout happens at the same time as another node's startup 😵 Every platform provides elaborate functionality for things like this (systemd, etc.), but hooking into it nicely would require per-platform implementations.

I guess I'd vote for (1) and try to make it easier-to-use via documentation:

1a) setup guides for those who want to have it auto-launch on popular platforms using systemd, etc. Perhaps we could make an Ubuntu package that would automate this, like ros-rolling-zenohd-autostart (name TBD), that people could install if they just don't want to think about it? Installing this package would add some systemd configuration that ensures zenohd is started (and restarted) automatically, whenever the system is running. This would be convenient for some cases, but probably not all.

1b) to be more user-friendly, ideally we could try to detect connection failures to zenohd on startup, and then provide a friendly help-text snippet in loud yellow/green/red text to the console which includes a link to web-based documentation. I'm not sure how hard it would be to detect, however, since it probably needs a timeout to allow for potential network latency, and timeouts are always hard in the general case.
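
A minimal sketch of what 1b could look like, written in Python purely to illustrate the idea (rmw_zenoh itself is C/C++). The function names are hypothetical, and the defaults are assumptions: a router listening on TCP port 7447 on localhost and a 2 second timeout. A successful TCP connect only shows that something is listening on that port, not that it is actually a Zenoh router.

```python
import socket

YELLOW = "\033[33m"  # ANSI escape code for yellow text
RESET = "\033[0m"


def router_reachable(host: str = "localhost", port: int = 7447,
                     timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False


def missing_router_message(host: str, port: int) -> str:
    """Build a loud, hypothetical help text for the console."""
    return (
        f"{YELLOW}Unable to reach a Zenoh router at {host}:{port}. "
        f"Is zenohd running? See the rmw_zenoh documentation for setup help."
        f"{RESET}"
    )
```

A caller would print `missing_router_message(...)` to stderr when `router_reachable(...)` returns False. The fixed timeout is the weak point noted above: too short and slow networks produce false alarms, too long and startup feels hung.
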

@clalancette
Collaborator Author

Agreed, those are both great ideas @codebot .

@Timple
Contributor

Timple commented Jul 22, 2024

Good idea!
Although one has to be careful that it doesn't automatically get pulled in by a package listing it as a dependency 🙂

@nkoenig
Contributor

nkoenig commented Aug 9, 2024

I really like some form of autostart, whether it comes from a separate package or is otherwise built in. Without this, the end user will have to fall back on shell scripts or other out-of-band tricks to get this working. This is a major pain point for me right now, and I'm very selfish.

I'll poke at option 2 (from the original post). There is an autostart concept in the ros2 daemon already. I don't know the details of how it works. I assume autostarting a daemon is acceptable in this case because it's a human at a keyboard that triggers the daemon via the usage of CLI tools? However, what happens if someone creates a launch file that just runs a number of ExecuteProcess commands all using CLI tools (I should try this..)? All that is basically asking the question, "Have we already handled the complexity of autostarting a process/daemon, and can we just re-use that logic?"

@clalancette
Collaborator Author

So there are 2 major problems with the way we currently launch the ros2 daemon, which means it doesn't apply here:

  1. The ros2 daemon is only run "sometimes": it is started only when certain ros2cli commands are run (like ros2 topic echo). If you never use those particular ros2cli commands (which is common, for instance, on a remote robot), then the daemon is never run. For rmw_zenoh, we need the router running all of the time, otherwise nothing works.
  2. When we do launch the ros2 daemon, we do it via Python. That obviously makes it cross-platform, but makes it unsuitable for use down here in rmw_zenoh (which is written in C/C++).

Thus, while we do indeed have cross-platform logic for launching the daemon, I don't think we can reuse it at this layer. I'm happy to be proven wrong about that, though.

@nkoenig
Contributor

nkoenig commented Aug 9, 2024

That makes sense. Thanks for the notes.


4 participants