Error while running ML algorithms: No module named numpy #112

Open
shwetamittal019 opened this issue Dec 23, 2020 · 7 comments

@shwetamittal019

File "/usr/bin/spark-3.0.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 26, in <module>
import numpy as np
ModuleNotFoundError: No module named 'numpy'

Please help

@danielschulz

Have you installed a Python dependency manager and installed numpy with it? If not, that is likely your missing step.
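
One quick way to confirm whether numpy is visible is to run a check with the same Python interpreter that Spark uses. The following is a minimal sketch; the helper name `module_available` is hypothetical, and it assumes you can execute a script inside the container with that interpreter.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if this interpreter's import system can locate `name`."""
    # find_spec returns None for a top-level module that cannot be found.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter path matters: numpy installed for a different Python
    # will not be visible to the one that runs the PySpark job.
    print("interpreter:", sys.executable)
    print("numpy importable:", module_available("numpy"))
```

If this reports that numpy is not importable, the package is missing for that particular interpreter, even if it is installed elsewhere on the system.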

@GezimSejdiu
Member

Hi @shwetamittal019 ,

Are you running the example within the python-template, or directly in spark-shell interactively? If via the python-template, you can add numpy as one of the dependencies in your requirements.txt file and it will be installed on build:

ONBUILD COPY requirements.txt /app/
ONBUILD RUN cd /app \
&& pip3 install -r requirements.txt
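
Before rebuilding, it can help to confirm that numpy is actually listed in the requirements.txt that gets copied into the image. The sketch below is only illustrative: the helper name `listed_packages` is hypothetical, and the parsing is deliberately simplistic (it ignores pip options and does not handle URL requirements).

```python
import re


def listed_packages(requirements_text: str) -> set:
    """Return the lower-cased package names found in requirements.txt content."""
    names = set()
    for raw in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # The name is the leading run of letters, digits, '.', '_' and '-',
        # which stops at version specifiers such as "==".
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


if __name__ == "__main__":
    with open("requirements.txt") as handle:
        print("numpy listed:", "numpy" in listed_packages(handle.read()))
```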

Feel free to comment more so that we can help. Or better, feel free to share your use-case so that we can also reproduce.

Best regards,

@j-juric

j-juric commented Apr 23, 2021

Hi @GezimSejdiu, I am also having trouble with this. I added numpy to requirements.txt, yet when starting the container, while the numpy module is being installed, I get this error:

`Step 1/12 : FROM bde2020/spark-python-template:2.4.3-hadoop2.7

Executing 3 build triggers

---> Running in fc96aff6d8d3
Collecting Cython (from -r requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/f6/e3/293d7d18a64dde5e60f809c5c3354ee812af713b1679c74708f88986a6b6/Cython-0.29.23-py2.py3-none-any.whl (978kB)
Collecting numpy==1.18.1 (from -r requirements.txt (line 2))
Downloading https://files.pythonhosted.org/packages/40/de/0ea5092b8bfd2e3aa6fdbb2e499a9f9adf810992884d414defc1573dca3f/numpy-1.18.1.zip (5.4MB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing wheel metadata: started
Preparing wheel metadata: finished with status 'error'
Complete output from command /usr/bin/python3.7 /usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp2ikrbto3:
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_bounded_integers.pyx.in
Processing numpy/random/_common.pyx
Processing numpy/random/_bit_generator.pyx
Processing numpy/random/_generator.pyx
Processing numpy/random/_philox.pyx
Processing numpy/random/mtrand.pyx
Processing numpy/random/_sfc64.pyx
Processing numpy/random/_pcg64.pyx
Processing numpy/random/_mt19937.pyx
Cythonizing sources
blas_opt_info:
blas_mkl_info:
customize UnixCCompiler
libraries mkl_rt not found in ['/usr/local/lib', '/usr/lib']
NOT AVAILABLE

blis_info:
  libraries blis not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

openblas_info:
  libraries openblas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

atlas_3_10_blas_threads_info:
Setting PTATLAS=ATLAS
  libraries tatlas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

atlas_3_10_blas_info:
  libraries satlas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

atlas_blas_threads_info:
Setting PTATLAS=ATLAS
  libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

atlas_blas_info:
  libraries f77blas,cblas,atlas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

accelerate_info:
  NOT AVAILABLE

blas_info:
  libraries blas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

blas_src_info:
  NOT AVAILABLE

  NOT AVAILABLE

/bin/sh: svnversion: not found
non-existing path in 'numpy/distutils': 'site.cfg'
lapack_opt_info:
lapack_mkl_info:
  libraries mkl_rt not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

openblas_lapack_info:
  libraries openblas not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

openblas_clapack_info:
  libraries openblas,lapack not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

flame_info:
  libraries flame not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

atlas_3_10_threads_info:
Setting PTATLAS=ATLAS
  libraries lapack_atlas not found in /usr/local/lib
  libraries tatlas,tatlas not found in /usr/local/lib
  libraries lapack_atlas not found in /usr/lib
  libraries tatlas,tatlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
  NOT AVAILABLE

atlas_3_10_info:
  libraries lapack_atlas not found in /usr/local/lib
  libraries satlas,satlas not found in /usr/local/lib
  libraries lapack_atlas not found in /usr/lib
  libraries satlas,satlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_3_10_info'>
  NOT AVAILABLE

atlas_threads_info:
Setting PTATLAS=ATLAS
  libraries lapack_atlas not found in /usr/local/lib
  libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib
  libraries lapack_atlas not found in /usr/lib
  libraries ptf77blas,ptcblas,atlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_threads_info'>
  NOT AVAILABLE

atlas_info:
  libraries lapack_atlas not found in /usr/local/lib
  libraries f77blas,cblas,atlas not found in /usr/local/lib
  libraries lapack_atlas not found in /usr/lib
  libraries f77blas,cblas,atlas not found in /usr/lib
<class 'numpy.distutils.system_info.atlas_info'>
  NOT AVAILABLE

lapack_info:
  libraries lapack not found in ['/usr/local/lib', '/usr/lib']
  NOT AVAILABLE

lapack_src_info:
  NOT AVAILABLE

  NOT AVAILABLE

running dist_info
running build_src
build_src
building py_modules sources
creating build
creating build/src.linux-x86_64-3.7
creating build/src.linux-x86_64-3.7/numpy
creating build/src.linux-x86_64-3.7/numpy/distutils
building library "npymath" sources
Could not locate executable gfortran
Could not locate executable f95
Could not locate executable ifort
Could not locate executable ifc
Could not locate executable lf95
Could not locate executable pgfortran
Could not locate executable f90
Could not locate executable f77
Could not locate executable fort
Could not locate executable efort
Could not locate executable efc
Could not locate executable g77
Could not locate executable g95
Could not locate executable pathf95
Could not locate executable nagfor
don't know how to compile Fortran code on platform 'posix'
Running from numpy source directory.
setup.py:461: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
  run_build = parse_setuppy_commands()
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1896: UserWarning:
    Optimized (vendor) Blas libraries are not found.
    Falls back to netlib Blas library which has worse performance.
    A better performance should be easily gained by switching
    Blas library.
  if self._calc_info(blas):
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1896: UserWarning:
    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.
  if self._calc_info(blas):
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1896: UserWarning:
    Blas (http://www.netlib.org/blas/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [blas_src]) or by setting
    the BLAS_SRC environment variable.
  if self._calc_info(blas):
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1730: UserWarning:
    Lapack (http://www.netlib.org/lapack/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [lapack]) or by setting
    the LAPACK environment variable.
  return getattr(self, '_calc_info_{}'.format(name))()
/tmp/pip-install-n0yoj555/numpy/numpy/distutils/system_info.py:1730: UserWarning:
    Lapack (http://www.netlib.org/lapack/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [lapack_src]) or by setting
    the LAPACK_SRC environment variable.
  return getattr(self, '_calc_info_{}'.format(name))()
/usr/lib/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'define_macros'
  warnings.warn(msg)
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 207, in <module>
    main()
  File "/usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 197, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 69, in prepare_metadata_for_build_wheel
    return hook(metadata_directory, config_settings)
  File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 166, in prepare_metadata_for_build_wheel
    self.run_setup()
  File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 259, in run_setup
    self).run_setup(setup_script=setup_script)
  File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 150, in run_setup
    exec(compile(code, __file__, 'exec'), locals())
  File "setup.py", line 488, in <module>
    setup_package()
  File "setup.py", line 480, in setup_package
    setup(**metadata)
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/core.py", line 171, in setup
    return old_setup(**new_attr)
  File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/tmp/pip-build-env-obypaer2/overlay/lib/python3.7/site-packages/setuptools/command/dist_info.py", line 31, in run
    egg_info.run()
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/egg_info.py", line 26, in run
    self.run_command("build_src")
  File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 146, in run
    self.build_sources()
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 157, in build_sources
    self.build_library_sources(*libname_info)
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 290, in build_library_sources
    sources = self.generate_sources(sources, (lib_name, build_info))
  File "/tmp/pip-install-n0yoj555/numpy/numpy/distutils/command/build_src.py", line 380, in generate_sources
    source = func(extension, build_dir)
  File "numpy/core/setup.py", line 661, in get_mathlib_info
    raise RuntimeError("Broken toolchain: cannot link a simple C program")
RuntimeError: Broken toolchain: cannot link a simple C program

----------------------------------------

Command "/usr/bin/python3.7 /usr/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp2ikrbto3" failed with error code 1 in /tmp/pip-install-n0yoj555/numpy
You are using pip version 19.0.3, however version 21.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
ERROR: Service 'train-model' failed to build: The command '/bin/sh -c cd /app && pip3 install -r requirements.txt' returned a non-zero code: 1
`

I guess the container is missing gcc (from what I have been able to find on Google) and thus it cannot install this module.
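
That guess can be checked from inside the container by looking for compiler executables on the PATH. This is a rough sketch; the helper name `find_build_tools` and the default list of tool names are assumptions chosen for illustration.

```python
import shutil


def find_build_tools(names=("gcc", "cc", "g++", "gfortran")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in find_build_tools().items():
        print(f"{tool}: {path or 'not found'}")
```

Note that finding a compiler on the PATH does not guarantee that linking works: the "Broken toolchain: cannot link a simple C program" error can also appear when the C library headers are missing.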

@Philip-os

I am having the same issue as well.

@j-juric

j-juric commented Apr 27, 2021

I was able to install numpy by adding this line to my Dockerfile:

RUN apk add --no-cache py3-numpy

@dusanjovanovich

dusanjovanovich commented Jun 20, 2021

You could also extend the spark-submit image and install build dependencies before running pip install. You cannot do this with the Python template image, though, which is why I decided to go with the submit image.

Something like this:

FROM bde2020/spark-submit:3.1.1-hadoop3.2

# Add build dependencies for c-libraries (important for building numpy and other sci-libs)
RUN apk --no-cache add --virtual build-deps musl-dev linux-headers g++ gcc python3-dev

# Copy the requirements.txt first, for separate dependency resolving and downloading
COPY app/requirements.txt /app/
RUN cd /app && pip3 install -r requirements.txt

@devAmoghS

Run this in the shell of each container:

apk --no-cache --update-cache add gcc gfortran python python-dev py-pip build-base wget freetype-dev libpng-dev openblas-dev
