Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve port handling #124

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ananthakumaran
Copy link

This commit tries to improve two issues

  1. A free port is obtained by setting the port value as zero and the OS will bind to a free port. We immediately close the port and then later create another socket on the same port. The issue with the approach is, OS could allocate the same port to another because we have closed the port. This leads to a situation where more than one bypass server could listen on the same port (this is possible because of SO_REUSEPORT flag). The issue is fixed by not closing the socket.

  2. Bypass exposes a down api, which closes the socket. The issue here is the same as above, the OS is free to allocate the port to others. The current solution tries to fix the issue by keeping track of which test owns which ports and try not to reuse the same ports. This is still not foolproof, there is a small interval during which the socket is active, but better than the old logic.

To set some background context, we have a very large test suite and run into this issue a lot. We have been using this patch for some time and it has definitely improved our CI situation. We still hit the race condition from time to time due to the usage of Bypass.down. I am open to different approaches as well since the fix is not perfect.

Thanks to @akash-akya for helping me with debugging and brainstorming.

This commit tries to improve two issues

1) A free port is obtained by setting the port value as zero and the
OS will bind to a free port. We immediately close the port and then
later create another socket on the same port. The issue with the
approach is, OS could allocate the same port to another because we
have closed the port. This leads to a situation where more than one
bypass server could listen on the same port (this is possible because
of SO_REUSEPORT flag). The issue is fixed by not closing the socket.

2) Bypass exposes a down api, which closes the socket. The issue here
is the same as above, the OS is free to allocate the port to
others. The current solution tries to fix the issue by keeping track
of which test owns which ports and try not to reuse the same
ports. This is still not foolproof, there is a small interval during
which the socket is active, but better than the old logic.
@bszaf
Copy link
Contributor

bszaf commented Jun 1, 2023

Hello @ananthakumaran, sorry for the time you waited to handle it.

This approach looks promising. What about transferring ownership to listening process? Right now it is invoked from

case :ranch_tcp.listen(Utils.so_reuseport() ++ [ip: Utils.listen_ip(), port: 0]) do
in Bypass.FreePort. The listening process ownership should be transferred to Bypass.Instance, what can improve closing ports and even get rid of the race condition you are mentioning. This is due to fact that the socket will be linked with Bypass.Instance process and thus should be closed automatically. Also opening listening socket with {linger, {false, 0}} can improve timing as well. Can you please try that?

Apart from that, a piece of documentation is needed for PR to be accepted.

@ananthakumaran
Copy link
Author

Unfortunately, I have left the company and I no longer have access to the big test suite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants