From a53f91638229dbd74304ce070343d268ef4a1b2e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?R=C3=A9gis=20Behmo?= Date: Tue, 9 Apr 2024 11:42:55 +0200 Subject: [PATCH] docs: improvements to the "troubleshooting" page I tried to remove as many "you" pronouns as possible. Also, clarify instructions on the "images build" resource issue. --- docs/troubleshooting.rst | 70 +++++++++++++++++++--------------------- 1 file changed, 33 insertions(+), 37 deletions(-) diff --git a/docs/troubleshooting.rst b/docs/troubleshooting.rst index a6594a399a..8e44a8da1e 100644 --- a/docs/troubleshooting.rst +++ b/docs/troubleshooting.rst @@ -6,15 +6,15 @@ Troubleshooting What should you do if you have a problem? .. warning:: - **Do not** create a Github issue! + **Do not** create a GitHub issue! 1. Read the error logs that appear in the console. When running a single server platform as daemon, you can view the logs with the ``tutor local logs`` command. (see :ref:`logging` below) 2. Check if your problem already has a solution right here in the :ref:`troubleshooting` section. -3. Search for your problem in the `open and closed Github issues `_. +3. Search for your problem in the `open and closed GitHub issues `_. 4. Search for your problem in the (now legacy) `Tutor community forums `__. 5. Search for your problem in the `Open edX community forum `__. 6. If despite all your efforts, you can't solve the problem by yourself, you should discuss it in the `Open edX community forum `__. Please give as many details about your problem as possible! As a rule of thumb, **people will not dedicate more time to solving your problem than you took to write your question**. You should tag your topic with "tutor" or the corresponding Tutor plugin name ("tutor-discovery", etc.) in order to notify the maintainers. -7. 
If you are *absolutely* positive that you are facing a technical issue with Tutor, and not with Open edX, not with your server, not your custom configuration, then, and only then, should you open an issue on `Github `__. You *must* follow the instructions from the issue template!!! If you do not follow this procedure, your Github issues will be mercilessly closed 🤯. +7. If you are *absolutely* positive that you are facing a technical issue with Tutor, and not with Open edX, not with your server, not your custom configuration, then, and only then, should you open an issue on `GitHub `__. You *must* follow the instructions from the issue template!!! If you do not follow this procedure, your GitHub issues will be mercilessly closed 🤯. Do you need professional assistance with your Open edX platform? `Edly `__ provides online support as part of its `Open edX installation service `__. @@ -24,7 +24,7 @@ Logging ------- .. note:: - Logs are of paramount importance for debugging Tutor. When asking for help on the `Open edX forum `__, **you should always include the unedited logs of your app**. You can get those with:: + Logs are of paramount importance for debugging Tutor. When asking for help on the `Open edX forum `__, **always include the unedited logs of your app**. They can be obtained with:: tutor local logs --tail=100 -f @@ -36,11 +36,11 @@ To view the logs from just one container, for instance, the webserver:: tutor local logs --follow caddy -The last commands produce the logs since the creation of the containers, which can be a lot. Similar to a ``tail -f``, you can run:: +The last commands produce the logs since the creation of the containers, which may be a lot. To skip past logs and only display new entries, similar to ``tail -f``, run:: - tutor local logs --tail=0 -f + tutor local logs --tail=0 --follow -If you'd rather use a graphical user interface for viewing logs, you are encouraged to try out :ref:`Portainer `.
+Users who are more comfortable with a graphical user interface for viewing logs are encouraged to try out :ref:`Portainer `. .. _webserver: @@ -65,9 +65,9 @@ If the above command does not work, you should fix your Docker installation. Som Open edX requires at least 4 GB RAM, in particular, to run the SQL migrations. If the ``tutor local launch`` command dies after displaying "Running migrations", you most probably need to buy more memory or add swap to your machine. -On macOS, by default, Docker allocates at most 2 GB of RAM to containers. ``launch`` tries to check your current allocation and outputs a warning if it can't find a value of at least 4 GB. You should follow `these instructions from the official Docker documentation `__ to allocate at least 4-5 GB to the Docker daemon. +On macOS, by default, Docker allocates at most 2 GB of RAM to containers. ``launch`` tries to check the current allocation and outputs a warning if it can't find a value of at least 4 GB. Follow `these instructions from the official Docker documentation `__ to allocate at least 4-5 GB to the Docker daemon. -If migrations were killed halfway, there is a good chance that the MySQL database is in a state that is hard to recover from. The easiest way to recover is simply to delete all the MySQL data and restart the launch process. After you have allocated more memory to the Docker daemon, run:: +If migrations were killed halfway, there is a good chance that the MySQL database is in a state that is hard to recover from. The easiest way to recover is to delete all the MySQL data and restart the launch process.
After more memory has been allocated to the Docker daemon, run:: tutor local stop sudo rm -rf "$(tutor config printroot)/data/mysql" @@ -79,26 +79,26 @@ If migrations were killed halfway, there is a good chance that the MySQL databas "Can't connect to MySQL server on 'mysql:3306' (111)" ----------------------------------------------------- -The most common reason this happens is that you are running two different instances of Tutor simultaneously, causing a port conflict between MySQL containers. Tutor will try to prevent you from doing that (for example, it will stop ``local`` containers if you start ``dev`` ones, and vice versa), but it cannot prevent all edge cases. So, as a first step, stop all possible Tutor platform variants:: +The most common reason this happens is that two different instances of Tutor are running simultaneously, causing a port conflict between MySQL containers. Tutor will try to prevent this situation from happening (for example, it will stop ``local`` containers when running ``tutor dev`` commands, and vice versa), but it cannot prevent all edge cases. So, as a first step, stop all possible Tutor platform variants:: tutor dev stop tutor local stop tutor k8s stop -And then run your command(s) again, ensuring you're consistently using the correct Tutor variant (``tutor dev``, ``tutor local``, or ``tutor k8s``). +Then run the command(s) again, consistently using the correct Tutor variant (``tutor dev``, ``tutor local``, or ``tutor k8s``). -If that doesn't work, then check if you have any other Docker containers running that may using port 3306:: +If that does not work, then check if there are any other Docker containers running that may be using port 3306:: docker ps -a -For example, if you have ever used `Tutor Nightly `_, check whether you still have ``tutor_nightly_`` containers running. Conversely, if you're trying to run Tutor Nightly now, check whether you have non-Nightly ``tutor_`` containers running. If so, switch to that other version of Tutor, run ``tutor (dev|local|k8s) stop``, and then switch back to your preferred version of Tutor. +For example, if you have ever used :ref:`Tutor Nightly `, check whether there are still ``tutor_nightly_`` containers running. Conversely, if Tutor Nightly is the version currently in use, check whether there are non-Nightly ``tutor_`` containers running.
If so, switch to that other version of Tutor, run ``tutor (dev|local|k8s) stop``, and then switch back to the preferred version of Tutor. Alternatively, if there are any other non-Tutor containers using port 3306, then stop and remove them:: docker stop docker rm -Finally, if you've ensured that containers or other programs are making use of port 3306, check the logs of the MySQL container itself:: +Finally, if no container or other program is making use of port 3306, check the logs of the MySQL container itself:: tutor (dev|local|k8s) logs mysql @@ -108,17 +108,17 @@ Check whether the MySQL container is crashing upon startup, and if so, what is c Help! The Docker containers are eating all my RAM/CPU/CHEESE ------------------------------------------------------------ -You can identify which containers are consuming most resources by running:: +The containers that consume the most resources can be identified by running:: docker stats In idle mode, the "mysql" container should use ~200MB memory; ~200-300MB for the the "lms" and "cms" containers. -On some operating systems, such as RedHat, Arch Linux or Fedora, a very high limit of the number of open files (``nofile``) per container may cause the "mysql", "lms" and "cms" containers to use a lot of memory: up to 8-16GB. To check whether you might impacted, run:: +On some operating systems, such as Red Hat, Arch Linux or Fedora, a very high limit of the number of open files (``nofile``) per container may cause the "mysql", "lms" and "cms" containers to use a lot of memory: up to 8-16GB.
To check whether a platform is impacted, run:: cat /proc/$(pgrep dockerd)/limits | grep "Max open files" -If the output is 1073741816 or higher, then it is likely that you are affected by `this mysql issue `__. To learn more about the root cause, read `this containerd issue comment `__. Basically, the OS is hard-coding a very high limit for the allowed number of open files, and this is causing some containers to fail. To resolve the problem, you should configure the Docker daemon to enforce a lower value, as described `here `__. Edit ``/etc/docker/daemon.json`` and add the following contents:: +If the output is 1073741816 or higher, then it is likely that the OS is affected by `this MySQL issue `__. To learn more about the root cause, read `this containerd issue comment `__. Basically, the OS is hard-coding a very high limit for the allowed number of open files, and this is causing some containers to fail. To resolve the problem, configure the Docker daemon to enforce a lower value, as described `here `__. Edit ``/etc/docker/daemon.json`` and add the following contents:: { "default-ulimits": { @@ -130,7 +130,7 @@ If the output is 1073741816 or higher, then it is likely that you are affected b } } -Check your configuration is valid with:: +Check that the configuration is valid with:: dockerd --validate @@ -138,7 +138,7 @@ Then restart the Docker service:: sudo systemctl restart docker.service -Launch your Open edX platform again with ``tutor local launch``. You should observe normal memory usage. +Launch the Open edX platform again with ``tutor local launch``. Memory usage should then be back to normal. "Build failed running pavelib.servers.lms: Subprocess return code: 1" ----------------------------------------------------------------------- @@ -149,11 +149,11 @@ Launch your Open edX platform again with ``tutor local launch``. You should obse ... Build failed running pavelib.servers.lms: Subprocess return code: 1`" -This might occur when you run a ``paver`` command.
``/dev/null`` eats the actual error, so you will have to run the command manually. Run ``tutor dev shell lms`` (or ``tutor dev shell cms``) to open a bash session and then:: +This might occur when running a ``paver`` command. ``/dev/null`` swallows the actual error, so the command has to be run manually to reveal it. Run ``tutor dev shell lms`` (or ``tutor dev shell cms``) to open a bash session and then:: python manage.py lms print_setting STATIC_ROOT -The error produced should help you better understand what is happening. +The resulting error message should help clarify what is happening. The chosen default language does not display properly ----------------------------------------------------- @@ -163,40 +163,36 @@ By default, Open edX comes with a `limited set `__. +To learn more, check out `this GitHub issue `__. .. _high_resource_consumption: -High resource consumption on ``tutor images build`` by docker +High resource consumption by Docker on ``tutor images build`` ------------------------------------------------------------- -This issue can occur when building multiple images simultaneously by Docker, issue specifically related to BuildKit. - - -Create a buildkit.toml configuration file with the following contents:: +Some Docker images include many independent layers, which BuildKit builds in parallel. As a consequence, building these images may use up a lot of resources, sometimes even crashing the Docker daemon. To work around this issue, explicitly limit the `maximum parallelism of BuildKit `__. Create a ``buildkit.toml`` configuration file with the following contents:: [worker.oci] max-parallelism = 2 -This configuration file limits the number of layers built concurrently to 2, which can significantly reduce resource consumption. +This configuration file limits the number of layers built concurrently to 2; adjust this value to match the resources of the machine.
-Create a builder that uses this configuration:: +Then, create a builder named "max2cpu" that uses this configuration, and start using it right away:: - docker buildx create --use --name= --driver=docker-container --config=/path/to/buildkit.toml - -Replace with a suitable name for your builder, and ensure that you specify the correct path to the buildkit.toml configuration file. + # don't forget to specify the correct path to the buildkit.toml configuration file + docker buildx create --use --name=max2cpu --driver=docker-container --config=/path/to/buildkit.toml Now build again:: - tutor images build + tutor images build all -All build commands should now make use of the newly configured builder. To later revert to the default builder, run ``docker buildx use default``. +All build commands should now make use of the newly configured builder. To later revert to the default builder, run ``docker buildx use default``. -.. note:: +.. note:: Setting a too low value for maximum parallelism will result in longer build times. fatal: the remote end hung up unexpectedly / fatal: early EOF / fatal: index-pack failed when running ``tutor images build ...`` @@ -204,6 +200,6 @@ fatal: the remote end hung up unexpectedly / fatal: early EOF / fatal: index-pac This issue can occur due to problems with the network connection while cloning edx-platform which is a fairly large repository. -First, try to run the same command once again to see if it works as the network connection can sometimes drop during the build process. +First, try to run the same command once again to see if it works, as the network connection can sometimes drop during the build process. If that does not work, follow the tutorial above for :ref:`High resource consumption ` to limit the number of concurrent build steps so that the network connection is not being shared between multiple layers at once.
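The ``nofile`` check from the troubleshooting page can be scripted. The following is a minimal sketch and is not part of Tutor. The ``check_nofile`` function name is hypothetical. The script assumes the standard Linux ``/proc/<pid>/limits`` layout, where the soft limit is the fourth whitespace-separated field of the "Max open files" row. It compares that soft limit against the 1073741816 threshold quoted in the docs.

```shell
#!/bin/sh
# Hypothetical helper, not part of Tutor: flag a process whose
# "Max open files" soft limit reaches the threshold quoted in the docs.
NOFILE_THRESHOLD=1073741816

# Usage: check_nofile LIMITS_FILE
# Prints "affected" or "ok". Exit status: 0 if affected, 1 if not,
# 2 if the soft limit could not be parsed as a number.
check_nofile() {
    # In /proc/<pid>/limits the row reads:
    #   Max open files   <soft>   <hard>   files
    # so the soft limit is the 4th whitespace-separated field.
    soft=$(awk '/^Max open files/ { print $4 }' "$1")
    case "$soft" in
        '' | *[!0-9]*)
            echo "could not parse 'Max open files' in $1" >&2
            return 2
            ;;
    esac
    if [ "$soft" -ge "$NOFILE_THRESHOLD" ]; then
        echo "affected"
        return 0
    fi
    echo "ok"
    return 1
}
```

On a Linux host running Docker, the function could be invoked as ``check_nofile "/proc/$(pgrep -o dockerd)/limits"``. Here ``pgrep -o`` picks the oldest matching process, in case several processes match.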