ejabberdctl restart sometimes fails #4137

Open
nandlab opened this issue Jan 1, 2024 · 10 comments
nandlab commented Jan 1, 2024

Happy New Year!

Environment

  • ejabberd version: 23.10.0
  • Erlang version: 14.1.5
  • OS: Devuan GNU/Linux 5 (daedalus)
  • Installed from: official dpkg repository from ProcessOne

Errors from error.log/crash.log

No errors

ejabberd.log

2023-12-30 03:23:04.611133+01:00 [info] <0.2280.0>@ejabberd_config:load/1:82 Loading configuration from /opt/ejabberd/conf/ejabberd.yml
2023-12-30 03:23:05.926536+01:00 [info] <0.2280.0>@ejabberd_config:load/1:89 Configuration loaded successfully
2023-12-30 03:23:06.333227+01:00 [info] <0.2537.0>@ejabberd_systemd:init/1:94 Got no NOTIFY_SOCKET, notifications disabled
2023-12-30 03:23:06.664343+01:00 [info] <0.2536.0>@gen_mod:start_modules/0:141 Loading modules for zauber.dynv6.net
2023-12-30 03:23:06.887456+01:00 [info] <0.2631.0>@mod_stun_disco:parse_listener/1:616 Going to offer STUN/TURN service: a.b.c.d:3478 (udp)
2023-12-30 03:23:07.338042+01:00 [info] <0.2689.0>@mod_mqtt:init_topic_cache/2:644 Building MQTT cache for zauber.dynv6.net, this may take a while

Bug description

After running sudo ejabberdctl restart, ejabberd wrote the log shown above and then crashed. The server was unavailable for the rest of that day. There were no error messages, and I could not find any information about what went wrong.

Later, after a manual sudo service ejabberd start, ejabberd started normally and wrote the following lines to the log:

[...]
2024-01-01 07:52:48.997734+01:00 [info] <0.591.0>@mod_mqtt:init_topic_cache/2:644 Building MQTT cache for zauber.dynv6.net, this may take a while
2024-01-01 07:52:49.288450+01:00 [info] <0.179.0>@ejabberd_cluster_mnesia:wait_for_sync/1:123 Waiting for Mnesia synchronization to complete
2024-01-01 07:52:49.351047+01:00 [warning] <0.472.0>@ejabberd_pkix:log_warnings/1:393 Invalid certificate in /etc/letsencrypt/live/zauber.dynv6.net/fullchain.pem: at line 58: certificate is signed by unknown CA
2024-01-01 07:52:49.460733+01:00 [info] <0.179.0>@ejabberd_app:start/2:63 ejabberd 23.10.0 is started in the node ejabberd@localhost in 3.41s

Why would sudo ejabberdctl restart bring the server down? What could make ejabberd get stuck or crash while "Building MQTT cache"?

@licaon-kter
Contributor

Is this with loglevel: info?

Can you try this fix: #4109 (comment)?


nandlab commented Jan 1, 2024

Yes, I had loglevel: info. I have now changed it to loglevel: debug.


nandlab commented Jan 1, 2024

I am closing this as a duplicate for now. If the issue still persists after the new commits, I will reopen it.

@nandlab closed this as not planned on Jan 1, 2024
@nandlab reopened this on Jan 4, 2024

nandlab commented Jan 4, 2024

I set EJABBERD_OPTS="" in ejabberdctl.cfg, but ejabberd still crashed on startup.

ejabberd.yml
include_config_file:  # the two included macro files are sketched after this config
  /var/www/html/ipa.yml:
    # 'SELF_IPV4A'
    # 'SELF_IPV6A'
    allow_only:
      - "define_macro"
  /opt/ejabberd/conf/ejabberd-secret-credentials.yml:
    # 'SQL_PASSWORD'
    allow_only:
      - "define_macro"

hosts:
  - zauber.dynv6.net

hide_sensitive_log_data: true
loglevel: debug

sql_type: mysql
sql_server: "localhost"
sql_database: "ejabberd"
sql_username: "ejabberd"
sql_password: 'SQL_PASSWORD'
sql_prepared_statements: true
auth_method: [sql]
default_db: sql
# update_sql_schema: true

certfiles:
  - /etc/letsencrypt/live/zauber.dynv6.net/fullchain.pem
  - /etc/letsencrypt/live/zauber.dynv6.net/privkey.pem

listen:
  -
    port: 5222
    ip: "::"
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5223
    ip: "::"
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    tls: true
  -
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    max_stanza_size: 524288
    shaper: s2s_shaper
  -
    port: 5443
    ip: "::"
    module: ejabberd_http
    tls: true
    request_handlers:
      /admin: ejabberd_web_admin
      /api: mod_http_api
      /bosh: mod_bosh
      /captcha: ejabberd_captcha
      /upload: mod_http_upload
      /ws: ejabberd_http_ws
  -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /admin: ejabberd_web_admin
#      /.well-known/acme-challenge: ejabberd_acme
  -
    port: 3478
    ip: "::"
    transport: udp
    module: ejabberd_stun
    use_turn: true
    ## The server's public IPv4 address:
    turn_ipv4_address: 'SELF_IPV4A'
    ## The server's public IPv6 address:
    # turn_ipv6_address: "2001:db8::3"
  -
    port: 1883
    ip: "::"
    module: mod_mqtt
    backlog: 1000

acme:
    auto: false

s2s_use_starttls: optional

captcha_cmd: /usr/local/bin/captcha.sh
captcha_url: auto

acl:
  admin:
    user: admin1@zauber.dynv6.net
  local:
    user_regexp: ""
  loopback:
    ip:
      - 127.0.0.0/8
      - ::1/128

access_rules:
  local:
    allow: local
  c2s:
    deny: blocked
    allow: all
  announce:
    allow: admin
  configure:
    allow: admin
  muc_create:
    allow: local
  pubsub_createnode:
    allow: local
  trusted_network:
    allow: loopback

api_permissions:
  "console commands":
    from:
      - ejabberd_ctl
    who: all
    what: "*"
  "admin access":
    who:
      access:
        allow:
          - acl: loopback
          - acl: admin
      oauth:
        scope: "ejabberd:admin"
        access:
          allow:
            - acl: loopback
            - acl: admin
    what:
      - "*"
      - "!stop"
      - "!start"
  "public commands":
    who:
      ip: 127.0.0.1/8
    what:
      - status
      - connected_users_number

shaper:
  normal:
    rate: 3000
    burst_size: 20000
  fast: 100000

shaper_rules:
  max_user_sessions: 10
  max_user_offline_messages:
    5000: admin
    100: all
  c2s_shaper:
    none: admin
    normal: all
  s2s_shaper: fast

modules:
  mod_adhoc: {}
  mod_admin_extra: {}
  mod_announce:
    access: announce
  mod_avatar: {}
  mod_blocking: {}
  mod_bosh: {}
  mod_caps: {}
  mod_carboncopy: {}
  mod_client_state: {}
  mod_configure: {}
  mod_disco: {}
  mod_fail2ban: {}
  mod_http_api: {}
  mod_http_upload:
    put_url: https://@HOST@:5443/upload
    custom_headers:
      "Access-Control-Allow-Origin": "https://@HOST@"
      "Access-Control-Allow-Methods": "GET,HEAD,PUT,OPTIONS"
      "Access-Control-Allow-Headers": "Content-Type"
  mod_last: {}
  mod_mam:
    db_type: sql
    assume_mam_usage: true
    default: always
  mod_mqtt: {}
  mod_muc:
    access:
      - allow
    access_admin:
      - allow: admin
    access_create: muc_create
    access_persistent: muc_create
    access_mam:
      - allow
    default_room_options:
      mam: true
  mod_muc_admin: {}
  mod_offline:
    access_max_user_messages: max_user_offline_messages
  mod_ping: {}
  mod_privacy: {}
  mod_private: {}
  mod_proxy65:
    access: local
    max_connections: 5
  mod_pubsub:
    access_createnode: pubsub_createnode
    plugins:
      - flat
      - pep
    force_node_config:
      ## Avoid buggy clients to make their bookmarks public
      storage:bookmarks:
        access_model: whitelist
  mod_push: {}
  mod_push_keepalive: {}
  mod_register:
    captcha_protected: true
  mod_roster:
    versioning: true
  mod_s2s_dialback: {}
  mod_shared_roster: {}
  mod_stream_mgmt:
    resend_on_timeout: if_offline
  mod_stun_disco: {}
  mod_vcard: {}
  mod_vcard_xupdate: {}
  mod_version:
    show_os: false
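
For context, the 'SELF_IPV4A' and 'SQL_PASSWORD' values above are macros pulled in via include_config_file. The real included files are not part of this report; the following is only a minimal sketch of what they might contain (file names taken from the config above, values hypothetical):

## /var/www/html/ipa.yml (hypothetical contents)
define_macro:
  SELF_IPV4A: "203.0.113.10"
  # SELF_IPV6A: "2001:db8::10"

## /opt/ejabberd/conf/ejabberd-secret-credentials.yml (hypothetical contents)
define_macro:
  SQL_PASSWORD: "change-me"
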
ejabberd.log ends with:
2024-01-04 03:17:15.185020+01:00 [info] <0.1870.0>@mod_mqtt:init_topic_cache/2:644 Building MQTT cache for zauber.dynv6.net, this may take a while
2024-01-04 03:17:15.186906+01:00 [debug] <0.1790.0>@supervisor:report_progress/2:1565 PROGRESS REPORT:
    supervisor: {local,ejabberd_gen_mod_sup}
    started: [{pid,<0.1870.0>},
              {id,'mod_mqtt_zauber.dynv6.net'},
              {mfargs,{gen_server,start_link,
                                  [{local,'mod_mqtt_zauber.dynv6.net'},
                                   mod_mqtt,
                                   [<<"zauber.dynv6.net">>,
                                    #{cache_size => 1000,max_queue => 5000,
                                      cache_missed => true,db_type => sql,
                                      queue_type => ram,
                                      cache_life_time => 3600000,
                                      use_cache => true,ram_db_type => mnesia,
                                      max_topic_depth => 8,
                                      match_retained_limit => 1000,
                                      session_expiry => 300000,
                                      access_publish => [],
                                      access_subscribe => [],
                                      max_topic_aliases => 100}],
                                   []]}},
              {restart_type,transient},
              {significant,false},
              {shutdown,60000},
              {child_type,worker}]

2024-01-04 03:17:15.187762+01:00 [debug] <0.1717.0>@gen_mod:start_module/4:164 Loading mod_proxy65 at zauber.dynv6.net
2024-01-04 03:17:15.208536+01:00 [debug] <0.1743.0>@supervisor:report_progress/2:1565 PROGRESS REPORT:
    supervisor: {local,ejabberd_listener}
    started: [{pid,<0.1874.0>},
              {id,{7777,{192,168,178,55},tcp}},
              {mfargs,
                  {ejabberd_listener,start,
                      [{7777,{192,168,178,55},tcp},
                       mod_proxy65_stream,
                       #{access => local,name => <<"SOCKS5 Bytestreams">>,
                         port => 7777,recbuf => 65536,send_timeout => 15000,
                         sndbuf => 65536,hosts => [],
                         host => <<"proxy.zauber.dynv6.net">>,
                         ip => {192,168,178,55},
                         supervisor => true,hostname => undefined,
                         backlog => 128,unix_socket => #{},shaper => none,
                         transport => tcp,max_connections => 5,
                         ram_db_type => mnesia,accept_interval => 0,
                         use_proxy_protocol => false,
                         server_host => <<"zauber.dynv6.net">>,
                         vcard => undefined,auth_type => anonymous}]}},
              {restart_type,transient},
              {significant,false},
              {shutdown,brutal_kill},
              {child_type,worker}]

2024-01-04 03:17:15.209713+01:00 [debug] <0.1758.0>@supervisor:report_progress/2:1565 PROGRESS REPORT:
    supervisor: {local,ejabberd_backend_sup}
    started: [{pid,<0.1875.0>},
              {id,mod_proxy65_mnesia},
              {mfargs,{mod_proxy65_mnesia,start_link,[]}},
              {restart_type,transient},
              {significant,false},
              {shutdown,5000},
              {child_type,worker}]

2024-01-04 03:17:15.210396+01:00 [debug] <0.1877.0>@ejabberd_router:register_route/4:174 Route registered: proxy.zauber.dynv6.net
2024-01-04 03:17:15.210786+01:00 [debug] <0.1876.0>@supervisor:report_progress/2:1565 PROGRESS REPORT:
    supervisor: {local,'ejabberd_mod_proxy65_zauber.dynv6.net'}
    started: [{pid,<0.1877.0>},
              {id,mod_proxy65_service},
              {mfargs,
                  {mod_proxy65_service,start_link,[<<"zauber.dynv6.net">>]}},
              {restart_type,transient},
              {significant,false},
              {shutdown,5000},
              {child_type,worker}]

2024-01-04 03:17:15.211253+01:00 [debug] <0.1790.0>@supervisor:report_progress/2:1565 PROGRESS REPORT:
    supervisor: {local,ejabberd_gen_mod_sup}
    started: [{pid,<0.1876.0>},
              {id,'ejabberd_mod_proxy65_zauber.dynv6.net'},
              {mfargs,{mod_proxy65,start_link,[<<"zauber.dynv6.net">>]}},
              {restart_type,transient},
              {significant,false},
              {shutdown,infinity},
              {child_type,supervisor}]

2024-01-04 03:17:15.211649+01:00 [debug] <0.1717.0>@supervisor:report_progress/2:1565 PROGRESS REPORT:
    supervisor: {local,ejabberd_sup}
    started: [{pid,<0.1790.0>},
              {id,ejabberd_gen_mod_sup},
              {mfargs,{gen_mod,start_link,[]}},
              {restart_type,permanent},
              {significant,false},
              {shutdown,infinity},
              {child_type,supervisor}]


nandlab commented Jan 7, 2024

I noticed this only happens sometimes, and only with ejabberdctl restart. ejabberd starts fine with service ejabberd restart, which internally runs ejabberdctl stop followed by ejabberdctl start.
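
For anyone hitting the same problem in the meantime, here is a minimal workaround sketch that mimics what the init script does (assuming the stock ejabberdctl stop/start/stopped/started commands behave as on a default install):

#!/bin/sh
# Workaround sketch: replace "ejabberdctl restart" with an explicit
# stop/start cycle, the way "service ejabberd restart" does it.
ejabberdctl stop
ejabberdctl stopped    # wait until the node is really down
ejabberdctl start
ejabberdctl started    # wait until the node answers again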

@nandlab changed the title from "Ejabberd crashes on start without any error messages" to "ejabberdctl restart sometimes fails" on Jan 7, 2024
@badlop self-assigned this on Jan 15, 2024

badlop commented Jan 16, 2024

I can reproduce the problem (or at least I think it is the same problem) on one out of every two or three restarts.

The problem appears when using any of the binary installers: master branch, 23.10, 23.04, 22.05.

When running ejabberdctl restart, the functions called are (a rough sketch of this path follows the list):

  • ejabberd_app:start
  • ejabberd_hooks:run(ejabberd_started, [])
  • ejabberd_pkix:ejabberd_started()
  • fast_tls:clear_cache()
  • fast_tls:clear_cache_nif()
  • and from there into the fast_tls C code, which crashes
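
A rough Erlang sketch of that path, paraphrased from the list above (not the literal ejabberd_pkix source):

%% Handler run on the 'ejabberd_started' hook after ejabberd_app:start/2:
ejabberd_started() ->
    %% clears the TLS cache; clear_cache/0 calls clear_cache_nif(),
    %% which lands in the fast_tls C code where the crash happens.
    fast_tls:clear_cache().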

It is very easy to reproduce:

  • Install ejabberd using binary installers (master branch or any recent release)
  • Start that ejabberd. If you use "ejabberdctl live", you will be able to see its behaviour in real time
  • Call ejabberdctl restart
  • If it restarts correctly, call it again. After 2 or 3 restarts, the problem should appear (a small loop to script this is sketched below).
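
A quick way to script that reproduction (sketch only; the sleep time and iteration count are arbitrary):

#!/bin/sh
# Repro sketch: restart in a loop until the node stops answering.
for i in 1 2 3 4 5; do
    ejabberdctl restart
    sleep 15
    ejabberdctl status || { echo "node is down after restart #$i"; break; }
done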

The problem does not appear when using the container, or when running ejabberd compiled from source with Erlang 26.1 (the version used in the installers). In those cases, ejabberdctl restart successfully restarts ejabberd every time.

@badlop assigned prefiks and unassigned badlop on Jan 16, 2024
@nosnilmot
Contributor

Is this seen with OpenSSL 3, and when no dhfile is configured for the listeners in ejabberd.yml? If so, this may be fixed by processone/fast_tls#63
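
For reference, a minimal sketch of what configuring a dhfile on a TLS listener could look like (paths and key size are placeholders; whether this actually changes the crash is exactly what the question above is about):

## Generate DH parameters once, e.g.:
##   openssl dhparam -out /opt/ejabberd/conf/dh.pem 2048
## then point the TLS listeners at them in ejabberd.yml:
listen:
  -
    port: 5223
    ip: "::"
    module: ejabberd_c2s
    tls: true
    dhfile: /opt/ejabberd/conf/dh.pem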


badlop commented Jan 26, 2024

The problem is reproducible when using the binary installers (23.10, older releases, and also master). Those installers are built by tools/make-binaries, which downloads and compiles OpenSSL 1.1.1w:

ssl_vsn='1.1.1w'

As you mentioned, fast_tls got some fixes in git. They are now tagged and used by ejabberd, but installers built with the latest fast_tls still show the same problem.

I guess it's worth trying to update the binary installers to OpenSSL 3, but unfortunately I get compilation problems when I try that.


Neustradamus commented Jan 27, 2024


badlop commented Feb 9, 2024

The newest ejabberd installers use OpenSSL 3.2.1 (since 1962fc8) and fast_tls 1.1.18, which includes the fix processone/fast_tls@da16622. Those installers can be downloaded from https://github.com/processone/ejabberd/actions/runs/7844499079

However, the problem mentioned in this issue is still present.
