[BUG] Failed to run mig.sh on MIG dataproc-2.1-ubuntu20 #11675

Open
yinqingh opened this issue Oct 30, 2024 · 3 comments
Labels
bug Something isn't working

yinqingh (Collaborator) commented Oct 30, 2024

Describe the bug
The following error occurs when running mig.sh on dataproc-2.1-ubuntu20 with runtime version "2.1.72-ubuntu20" and kernel version "5.15.0-1067-gcp". The NVIDIA kernel module build under DKMS fails at the modpost step:

 make -f ./scripts/Makefile.modpost
   sed 's/\.ko$/\.o/' /var/lib/dkms/nvidia/495.29.05/build/modules.order | scripts/mod/modpost -m -a  -o /var/lib/dkms/nvidia/495.29.05/build/Module.symvers -e -i Module.symvers   -T -
 ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'
 make[2]: *** [scripts/Makefile.modpost:133: /var/lib/dkms/nvidia/495.29.05/build/Module.symvers] Error 1

I also tried some older Dataproc runtime versions. mig.sh works with runtime version "2.1.40-ubuntu20" and kernel version "5.15.0-1049-gcp".
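
For anyone triaging similar build logs, here is a minimal sketch of pulling the relevant fields out of the modpost failure. It assumes the log lines have the same shape as the excerpt above; the function name `summarize_modpost_failure` and both regular expressions are illustrative only and are not part of mig.sh or DKMS.

```python
import re

# Matches the modpost failure line as it appears in the log excerpt above.
MODPOST_GPL_ERROR = re.compile(
    r"ERROR: modpost: GPL-incompatible module (?P<module>\S+) "
    r"uses GPL-only symbol '(?P<symbol>[^']+)'"
)
# Assumption: DKMS build paths look like /var/lib/dkms/nvidia/<driver version>/build/...
DKMS_DRIVER_PATH = re.compile(r"/var/lib/dkms/nvidia/(?P<driver>[0-9.]+)/build")


def summarize_modpost_failure(log_text):
    """Return (module, symbol, driver_version) for the first GPL-only modpost error, or None."""
    error = MODPOST_GPL_ERROR.search(log_text)
    if error is None:
        return None
    driver = DKMS_DRIVER_PATH.search(log_text)
    return (
        error.group("module"),
        error.group("symbol"),
        driver.group("driver") if driver else None,
    )


# Two lines copied from the log in this report.
sample_log = """\
 ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'
 make[2]: *** [scripts/Makefile.modpost:133: /var/lib/dkms/nvidia/495.29.05/build/Module.symvers] Error 1
"""
print(summarize_modpost_failure(sample_log))
```

For the sample lines this reports the module `nvidia.ko`, the symbol `rcu_read_unlock_strict`, and the driver version `495.29.05`.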

Steps/Code to reproduce bug

  1. Create a Dataproc cluster using MIG with an nvidia-tesla-a100 GPU and runtime version "2.1.72-ubuntu20".
  2. SSH to the GPU node.
  3. Download mig.sh.
  4. Run `sudo bash mig.sh`.

Expected behavior
mig.sh runs successfully.

Environment details (please complete the following information)

  • Environment location: Dataproc, version 2.1.72-ubuntu20
yinqingh added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels Oct 30, 2024
pxLi (Collaborator) commented Oct 30, 2024

Thanks for the investigation!

@sameerz This is the reason why mig-on-dataproc-2.1-ubuntu20 has been failing to initialize recently.

yinqingh changed the title from "[BUG] Failed to run mig.sh on dataproc-2.1-ubuntu20" to "[BUG] Failed to run mig.sh on MIG dataproc-2.1-ubuntu20" Oct 30, 2024
sameerz removed the "? - Needs Triage" (Need team to review and classify) label Nov 5, 2024
SurajAralihalli (Collaborator) commented Nov 7, 2024

Hello @yinqingh, I think you're using a different version of the script, /gpu/mig.sh.
Can you try /spark-rapids/mig.sh instead?

I’ll inform the repository maintainers about this inconsistency.

Edit: Created issue GoogleCloudDataproc/initialization-actions#1259

yinqingh (Collaborator, Author) commented Nov 8, 2024

Hi @SurajAralihalli, I tried spark-rapids/mig.sh, but it still failed while installing the NVIDIA driver (535.104.05) with the same error. The Dataproc runtime version is "2.1.73-ubuntu20".

 make -f ./scripts/Makefile.modpost
   sed 's/\.ko$/\.o/' /var/lib/dkms/nvidia/535.104.05/build/modules.order | scripts/mod/modpost -m -a  -o /var/lib/dkms/nvidia/535.104.05/build/Module.symvers -e -i Module.symvers   -T -
 ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'
 make[2]: *** [scripts/Makefile.modpost:133: /var/lib/dkms/nvidia/535.104.05/build/Module.symvers] Error 1
 make[2]: *** Deleting file '/var/lib/dkms/nvidia/535.104.05/build/Module.symvers'
 make[1]: *** [Makefile:1829: modules] Error 2
 make[1]: Leaving directory '/usr/src/linux-headers-5.15.0-1070-gcp'
 make: *** [Makefile:82: modules] Error 2
DKMSKernelVersion: 5.15.0-1070-gcp
Date: Fri Nov  8 09:07:43 2024
Package: nvidia-dkms-535 535.104.05-0ubuntu1
PackageVersion: 535.104.05-0ubuntu1
SourcePackage: nvidia-graphics-drivers-535
Title: nvidia-dkms-535 535.104.05-0ubuntu1: nvidia kernel module failed to build
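
To narrow down where the regression starts, the data points reported so far in this thread can be ordered by kernel release. The sketch below is only a bookkeeping aid: `kernel_key` and the `observations` list are illustrative names, and it ignores the fact that the runs used different driver versions (495.29.05 and 535.104.05).

```python
import re


def kernel_key(release):
    """Turn a release string such as '5.15.0-1067-gcp' into a sortable tuple of ints."""
    return tuple(int(n) for n in re.findall(r"\d+", release))


# (Dataproc runtime, kernel release, driver build succeeded?) as reported in this thread.
observations = [
    ("2.1.40-ubuntu20", "5.15.0-1049-gcp", True),
    ("2.1.72-ubuntu20", "5.15.0-1067-gcp", False),
    ("2.1.73-ubuntu20", "5.15.0-1070-gcp", False),
]

# Newest kernel that built the module, and oldest kernel that failed.
last_good = max((k for _, k, ok in observations if ok), key=kernel_key)
first_bad = min((k for _, k, ok in observations if not ok), key=kernel_key)
print(f"last known good kernel: {last_good}")
print(f"first known bad kernel: {first_bad}")
```

With the three reports above, the failure first appears somewhere after 5.15.0-1049-gcp and no later than 5.15.0-1067-gcp; the kernels in between have not been tested here.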
