Add support of Amazon EFA (#179)
Address issue: EFA Support #167

The upstream JAX container supports only MOFED NICs. The MOFED package
from Mellanox that we use installs the *ibverbs* libraries, which do not
include the libefa*.so libraries required on AWS.
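
One way to see whether a given container already has the EFA libraries is to search the library directories for `libefa*.so*`. The helper below is a minimal sketch: the function name `find_libefa` and the default search directories are assumptions for illustration, not part of this commit.

```shell
# Hypothetical helper: print any libefa shared objects found under the
# given directories and return non-zero when none are found.
find_libefa() {
    # Assumed default locations; pass explicit directories to override.
    [ "$#" -gt 0 ] || set -- /usr/lib /opt/amazon/efa/lib
    # grep . echoes every non-empty line and fails when there is no match.
    find "$@" -name 'libefa*.so*' 2>/dev/null | grep .
}

# Usage:
#   find_libefa || echo "no libefa libraries found"
```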
    
As a temporary solution, the base container now ships a script
(/usr/local/bin/install-efa.sh) that AWS users can run inside the
container to work around this. The script does the following:
1. Removes all *ibverbs* and RDMA-related libraries
2. Downloads the Amazon EFA installer
3. Installs EFA

**How to use**: inside the running container, run the **install-efa.sh** script:
`
root@<container-id> $> install-efa.sh
`
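
The script itself only checks that `fi_info --version` runs. On an EFA-equipped instance, a further check is to confirm that libfabric actually lists the `efa` provider. The sketch below assumes the usual `fi_info` output format, in which each entry starts with a `provider: <name>` line; the function name `has_efa_provider` is hypothetical.

```shell
# Hypothetical helper: reads fi_info output on stdin and succeeds only
# if at least one entry names the efa provider.
has_efa_provider() {
    grep -Eq '^[[:space:]]*provider:[[:space:]]*efa[[:space:]]*$'
}

# Usage (inside the container, after install-efa.sh has finished):
#   /opt/amazon/efa/bin/fi_info -p efa | has_efa_provider && echo "EFA provider found"
```

Reading from stdin keeps the check independent of where `fi_info` is installed, so the same function also works on captured output.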

---------

Co-authored-by: Vladislav Kozlov <vkozlovnvidia.com>
Co-authored-by: Yu-Hang "Maxin" Tang <Tang.Maxin@gmail.com>
DwarKapex and yhtang authored Sep 1, 2023
1 parent 7e76672 commit d7c3fb9
Showing 2 changed files with 46 additions and 0 deletions.
8 changes: 8 additions & 0 deletions .github/container/Dockerfile.base
@@ -37,6 +37,14 @@ RUN install-cudnn.sh
ADD install-ofed.sh /usr/local/bin
RUN install-ofed.sh

##############################################################################
## Amazon EFA support (install-efa.sh must be run separately inside the container)
##############################################################################

ADD install-efa.sh /usr/local/bin
ENV LD_LIBRARY_PATH=/opt/amazon/efa/lib:${LD_LIBRARY_PATH}
ENV PATH=/opt/amazon/efa/bin:${PATH}

###############################################################################
## Emergency fix: nsys not in PATH
###############################################################################
38 changes: 38 additions & 0 deletions .github/container/install-efa.sh
@@ -0,0 +1,38 @@
#!/bin/bash

set -ex

# Refresh the package index
apt-get update

# Install required packages
apt-get install -y curl

# Remove previously installed libraries to avoid conflicts
# with the versions shipped by the Amazon EFA installer
dpkg --purge efa-config efa-profile libfabric openmpi \
ibacm ibverbs-providers ibverbs-utils infiniband-diags \
libibmad-dev libibmad5 libibnetdisc-dev libibnetdisc5 \
libibumad-dev libibumad3 libibverbs-dev libibverbs1 librdmacm-dev \
librdmacm1 rdma-core rdmacm-utils

# Download Amazon EFA package and install
EFA_INSTALLER_VERSION=latest
WORKDIR=$(mktemp -d)

pushd "${WORKDIR}"

AMAZON_EFA_LINK="https://efa-installer.amazonaws.com/aws-efa-installer-${EFA_INSTALLER_VERSION}.tar.gz"
curl -O "$AMAZON_EFA_LINK"
tar -xf "aws-efa-installer-${EFA_INSTALLER_VERSION}.tar.gz" && cd aws-efa-installer
./efa_installer.sh -y -g -d --skip-kmod --skip-limit-conf --no-verify

popd

# Check that the installation succeeded
/opt/amazon/efa/bin/fi_info --version

# Clean up
apt-get clean
rm -rf /var/lib/apt/lists/*
rm -rf "${WORKDIR}"
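
The script always downloads the `latest` installer, so the installed version can change between runs. For reproducible containers, the same URL pattern can be built from a pinned version instead. This is a minimal sketch: the function name `efa_installer_url` is hypothetical, and `1.26.1` in the usage line is only a placeholder, so check AWS's published installer versions before pinning one.

```shell
# Hypothetical helper: build the installer download URL for a given
# version, falling back to "latest" as the committed script does.
efa_installer_url() {
    printf 'https://efa-installer.amazonaws.com/aws-efa-installer-%s.tar.gz\n' "${1:-latest}"
}

# Usage (the version number is a placeholder, not a recommendation):
#   curl -O "$(efa_installer_url 1.26.1)"
```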
