Allow renewing cert without stopping web server (was: Starting/Stopping services fails when service not available) #5

Open
gronke opened this issue Aug 31, 2016 · 9 comments

Comments

@gronke
Contributor

gronke commented Aug 31, 2016

When attempting to stop a service that is not yet installed on the target system, the letsencrypt role exits with an error. We should therefore check whether a specific service exists before trying to start/stop it.

This becomes important when you decide to first configure your webserver and then install it, so that the default site is never exposed, not even for a short moment. In that case the certificates also need to be in place before installing, so that the server can launch successfully.

@gronke
Contributor Author

gronke commented Aug 31, 2016

I have no idea yet how to approach this. The simplest would probably be to check if the file /etc/init.d/{{service_name}} exists, but what about systems using systemd or upstart? We could look at how the service module selects the right launch daemon, but that would mean we need to follow future changes to that module and reflect them here too. Maybe it's appropriate to take this upstream to Ansible and ask for a soft-fail option on services? .. feedback welcome 😅
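
To make the trade-off concrete, here is a rough shell sketch of such an existence check. It only probes for an init script, an upstart job file, or a systemd unit name; it does not mirror how Ansible's service module picks the init system. The function name `service_exists` and the `INITD_DIR` / `UPSTART_DIR` override variables are assumptions made for illustration, not anything the role provides.

```shell
# Hypothetical helper (not part of the role): returns 0 if a service
# named $1 appears to exist under SysV init, upstart or systemd.
# INITD_DIR and UPSTART_DIR are assumed override variables.
service_exists() {
    name=$1

    # SysV-style init script
    [ -x "${INITD_DIR:-/etc/init.d}/$name" ] && return 0

    # upstart job definition
    [ -f "${UPSTART_DIR:-/etc/init}/$name.conf" ] && return 0

    # systemd unit file, only checked when systemctl is available
    if command -v systemctl >/dev/null 2>&1; then
        systemctl list-unit-files "$name.service" 2>/dev/null \
            | grep -q "^$name\.service" && return 0
    fi

    return 1
}
```

Something like `service_exists apache2 && service apache2 stop` would then skip services that are not installed yet. The drawback mentioned above remains: every new init system needs another branch here.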

@jaywink
Owner

jaywink commented Sep 2, 2016

Well, right now the role only targets Debian based systems, with a promise of Ubuntu 14.04. I think in this case it's safe to check whether Apache2 is installed using commands available on these systems. If someone contributes to make the role work for, say, RPM based Linuxes, then those commands can be expanded to suit.

So something like this and then remove apache2 from the letsencrypt_pause_services list if not installed?
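
A minimal shell sketch of that idea, assuming a Debian based host where `dpkg-query` is available. The names `is_installed`, `filter_installed` and the `CHECK_CMD` override are hypothetical and only illustrate the filtering logic; in the role itself this would be expressed as Ansible tasks.

```shell
# Hypothetical helper: returns 0 if dpkg reports package $1 as fully installed.
is_installed() {
    status=$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null) || return 1
    [ "$status" = "install ok installed" ]
}

# Print, one per line, only those names that pass the check.
# CHECK_CMD is an assumed override so other distributions (or tests)
# can plug in their own check.
filter_installed() {
    check=${CHECK_CMD:-is_installed}
    for svc in "$@"; do
        if "$check" "$svc"; then
            printf '%s\n' "$svc"
        fi
    done
}
```

Calling `filter_installed apache2 nginx` would print only the packages that are actually installed, which could then replace the original letsencrypt_pause_services list. Note that this treats package name and service name as identical, which is not guaranteed.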

@gronke
Contributor Author

gronke commented Sep 3, 2016

That would probably work on all Debian based Linux hosts, and with some conditional tasks we could also add support for other distributions and BSD. We'd need to keep track of package names and their corresponding service names in case the two are not identical. That list should be easy to extend from outside, so that we can deal with unknown services too.

The approach of stopping running services results in downtime of (production) systems, which we could try to improve in the same breath. I could imagine two ways to reduce the downtime:

Option 1: Using the Firewall to redirect traffic

Serving the certbot ACME challenge on a different TCP port and temporarily configuring the firewall to forward incoming traffic to it would allow us to keep all services running, so that we would not have to deal with that. (The downtime would still remain a problem - ideally we would know the source IPs used by LetsEncrypt.)

Edit: Example with iptables

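# Save the current rules so they can be restored afterwards
iptables-save > /tmp/iptables-rules.v4

# Redirect incoming port 80 traffic to port 8080 and allow it through
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT

# (certbot would answer the challenge on port 8080 at this point)

# Reset policies, flush the temporary rules, then restore the saved rules
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X
iptables-restore < /tmp/iptables-rules.v4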

Option 2: Serve .well-known/acme-challenge/ with the running webserver

Many common webservers support global overrides for specific paths on all virtual hosts, as done here with nginx or here with apache2. Reloading the service once after configuring the override and one more time after cleaning up would allow us to switch to the new certificates without any downtime at all.

@jaywink
Owner

jaywink commented Sep 3, 2016

I'm a strong fan of option 2, definitely. Initially I wanted to look at something like that but ended up going for the simple stop/start, which unfortunately does not go well in more demanding environments that can't allow downtime. The SO answer though does have some comments that it might not work on all virtualhosts, but that is probably a tweaking issue :) Also, Apache 2.2 and 2.4 might have some differences - 2.4 at least should be supported here.

@gronke
Contributor Author

gronke commented Sep 10, 2016

I wonder if we should just provide some examples of how to configure nginx, apache2 and some other common webservers to serve the acme-challenges from a common directory that certbot exports to. Personally I do not like the idea of fiddling with existing webserver configs from within this role.

The only thing we need to do is to allow switching off --standalone and add --webroot-path instead.
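
As a rough sketch of that switch in shell: the function name `certbot_auth_args`, its mode argument and the default webroot directory are assumptions for illustration; only `--standalone`, `--webroot` and `--webroot-path` are actual certbot options.

```shell
# Hypothetical helper: print the certbot authenticator arguments for a mode.
#   $1  mode: "standalone" (default) or "webroot"
#   $2  webroot directory (the default path here is an assumption)
certbot_auth_args() {
    mode=${1:-standalone}
    webroot=${2:-/var/www/letsencrypt}
    case "$mode" in
        standalone) printf '%s\n' "--standalone" ;;
        webroot)    printf '%s\n' "--webroot --webroot-path $webroot" ;;
        *)          echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Example invocation (not executed here; example.com is a placeholder):
#   certbot certonly $(certbot_auth_args webroot /var/www/letsencrypt) -d example.com
```

In webroot mode the running webserver has to serve /.well-known/acme-challenge/ from that directory, so no service needs to be stopped.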

@jaywink
Owner

jaywink commented Sep 20, 2016

I would still like the role to be usable without the user having to pre-configure anything for it, so I would prefer the default behaviour to be installing temporary configuration into apache / nginx and then removing it. The configuration of these is pretty standard.

For the edge cases or those who want to not have this role mess with their web server, provide a possibility to disable configuring apache/nginx?

Closing this would allow all the service start/stop code to be removed I guess?

@gronke
Contributor Author

gronke commented Sep 20, 2016

> For the edge cases or those who want to not have this role mess with their web server, provide a possibility to disable configuring apache/nginx?

In more complex setups the last thing I would want is certbot trying to be smart and automatically change configuration of a service that is not running on port 80/443 at all.

I think we should not start maintaining single services to allow continuous service, but instead add a mode where certbot puts the challenge into a configurable output directory (not using --standalone). Doesn't certbot already have the ability to tweak configurations that we could mask behind a configuration option?

> Closing this would allow all the service start/stop code to be removed I guess?

Nope, it works as intended. The services are stopped before certbot runs. In case of failure during certificate creation the service should most probably come up again anyway - renewing a certificate is a common task with this role and you don't want your service to stay down just because LetsEncrypt was not reachable. What you probably still want is that the rest of your playbook is not executed and errors out, so that you become aware of the problem.
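
The intended flow can be sketched in shell like this. It is an illustration of the logic only, not the role's Ansible implementation; `run_with_services_paused`, `stop_service` and `start_service` are hypothetical names.

```shell
# Hypothetical wrappers around the init system; adjust as needed.
stop_service()  { service "$1" stop; }
start_service() { service "$1" start; }

# Stop the given services, run a command, start the services again
# regardless of the outcome, and pass the command's exit status on.
# Usage: run_with_services_paused "apache2 nginx" certbot renew
run_with_services_paused() {
    services=$1
    shift

    for svc in $services; do
        stop_service "$svc" || echo "warning: could not stop $svc" >&2
    done

    # Remember the exit status instead of aborting right away
    rc=0
    "$@" || rc=$?

    # Bring the services back up even if the command failed
    for svc in $services; do
        start_service "$svc" || echo "warning: could not start $svc" >&2
    done

    return "$rc"
}
```

A non-zero return value is what would make the rest of the playbook stop, while the webserver is already running again.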

@jaywink
Owner

jaywink commented Sep 20, 2016

What about the certbot-external plugin? It makes a temporary virtualhost afaict, and we could provide variants of the handler script for both Apache and nginx. This would not need any global config changes - and of course we could allow disabling this via configuration, but this could be an easy to adopt default way of getting certificates without stopping the web server and causing downtime.

@jaywink
Owner

jaywink commented Sep 20, 2016

See also discussion in this issue.

@jaywink jaywink changed the title Starting/Stopping services fails when service not available Allow renewing cert without stopping web server (was: Starting/Stopping services fails when service not available) Sep 22, 2016

2 participants