CUDA 10.2 #31

Closed
blueberry opened this issue Dec 21, 2019 · 21 comments
Comments

@blueberry

Hi Marco!

You probably already know that Nvidia released a new version update; I'm just opening this issue as a reference point for (I hope) upcoming support for CUDA 10.2 in JCuda. Any plans for working on this? As usual, I'll build (and test via Neanderthal & ClojureCUDA on Linux) the Linux and MacOS binaries.

@jcuda
Owner

jcuda commented Dec 25, 2019

Yes, I already mentioned this at the bottom of https://forum.byte-welt.net/t/why-jcuda-10-1-requires-usr-local-bfm-lib64-libstdc-so-6-version-cxxabi-1-3-8/21225 - and as the thread suggests: There have been some issues with the dependency versions.

As mentioned in the forum thread, there don't seem to be many updates or changes, so on the one hand, this has "low priority" (because there are no important new features), but also "high priority" because it can probably be done quickly - once I have done the update locally. Maybe I have some time for that in the next few days, during the "holidays", but in any case, it's already on the radar.

@jcuda
Owner

jcuda commented Jan 10, 2020

Just a short note: I've started the update locally. There are some changes of which I'm not entirely sure whether and how they can be (sensibly) mapped to the Java world. Things like NvSciBuf/NvSciSync will likely be omitted. Other features like UVA are about to be added, but I have no idea whether they make sense for Java (or whether they are mainly intended for the "External Memory and Semaphore" operations, which aren't supported in Java anyhow). Also, CUSPARSE has a new library interface for Multi-GPU support and some deprecations - I still have to sort this out, but will continue with that during the weekend and beginning of next week.

@jcuda
Owner

jcuda commented Jan 21, 2020

With the "usual" delay, the update is done.

The "Release Candidate" that is supposed to be used for building the natives is tagged version-10.2.0-RC00, which currently matches the master branch.

@blueberry As mentioned in jcuda/jcudnn#4 (comment) , we'll have to sort out this dependency issue for the release. I'll ping the guy who contributed the latest natives at https://forum.byte-welt.net/t/why-jcuda-10-1-requires-usr-local-bfm-lib64-libstdc-so-6-version-cxxabi-1-3-8/21225/20 , maybe he can create them for the new version as well.

(Edit: I wrote him a note at https://forum.byte-welt.net/t/why-jcuda-10-1-requires-usr-local-bfm-lib64-libstdc-so-6-version-cxxabi-1-3-8/21225/21 )


Some sort of "release notes" (or maybe a rant...?) for those who are interested:

There have been some hiccups related to CUSOLVER (not CUSPARSE as I said above). They introduced a new API, cusolverMg, for multi-GPU solvers. It required a (minor) update of the FindCUDA CMake script, and the bindings are basically there, but this is currently totally untested: They did add some snippets showing how to use the Multi-GPU API at https://docs.nvidia.com/cuda/cusolver/index.html#mgsyevd_examples , and I wasted some time trying to port this to Java, but it is ridiculously complicated, and so I deferred this (also because I cannot properly test it anyhow - I only have a single GPU right now).

It seems very unlikely that anybody will use cusolverMg anyhow, and even less from Java (which isn't very motivating, and may explain some of the delay of the update, admittedly...).

Additionally, I noticed that in the CUSOLVER documentation, at some points, they say: "Oh, by the way, this-and-that parameter may be NULL". They don't do this in the description of the parameters, but in the wall of text that describes what the function is supposed to do. I don't see a reasonable way to figure out which parameters may be NULL and which ones may not. I gave it a try by removing all NULL checks, in the hope that the library can figure it out and return a proper error code. But for some parameters, when they are NULL, CUSOLVER bails out with a segfault. So now the NULL checks are back in, which might cause a NullPointerException for some cases even when the parameter can be NULL. I'll have to wait for the bug reports to figure this out. (If someone wants to read the whole CUSOLVER doc, and send me a list of which parameters may or may not be NULL, I'd do an update...)
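The trade-off described above (check every pointer and risk rejecting a legitimately optional argument, or skip the checks and risk a segfault) can be sketched as a per-parameter "nullable" flag. This is only an illustration of the idea, not the JCuda implementation: `CheckStatus`, `PointerArg` and `checkPointerArgs` are hypothetical names, and the flag would have to be filled in by hand from the CUSOLVER documentation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical status codes, for illustration only (not the CUSOLVER enum).
enum class CheckStatus { Ok, NullArgument };

// Describes one pointer argument of a wrapped native function.
struct PointerArg {
    const char* name;   // parameter name, used in the error message
    const void* value;  // the pointer that the caller passed
    bool nullable;      // true only if the documentation explicitly allows NULL
};

// Returns NullArgument (and fills 'message') for the first argument that is
// null but not marked nullable; returns Ok otherwise. A binding could call
// this before forwarding to the native library, so that a bad argument
// becomes a reportable error instead of a segfault.
CheckStatus checkPointerArgs(const PointerArg* args, std::size_t count,
                             std::string& message) {
    for (std::size_t i = 0; i < count; ++i) {
        if (args[i].value == nullptr && !args[i].nullable) {
            message = std::string("Parameter '") + args[i].name + "' is null";
            return CheckStatus::NullArgument;
        }
    }
    message.clear();
    return CheckStatus::Ok;
}
```

With such a table, a parameter that turns out to be optional only needs its flag flipped, rather than a change to the checking logic itself.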

@blueberry
Author

@jcuda Thank you so much Marco for working on this!

I am in the middle of moving to a different apartment, so I can't build this right away, but as soon as I can find a slot, I'll build the Linux + OSX binaries (one of the first days of February most likely).

@blueberry
Author

Hi @jcuda,

I get the following build error when I try to build JCuda 10.2 on Linux:

jcuda.build cmake-gui
➜ jcuda.build make
[ 3%] Building CXX object jcuda/JCudaDriverJNI/bin/bin/CMakeFiles/JCudaCommonJNI.dir/src/JNIUtils.cpp.o
/home/dragan/workspace/java/jcuda/jcuda-common/JCudaCommonJNI/src/JNIUtils.cpp: In function ‘bool initNative(JNIEnv*, jobjectArray, int**&, bool)’:
/home/dragan/workspace/java/jcuda/jcuda-common/JCudaCommonJNI/src/JNIUtils.cpp:538:23: error: ‘nullptr’ was not declared in this scope
    if (javaObject == nullptr)
                      ^
/home/dragan/workspace/java/jcuda/jcuda-common/JCudaCommonJNI/src/JNIUtils.cpp: In function ‘bool releaseNative(JNIEnv*, int**&, jobjectArray, bool)’:
/home/dragan/workspace/java/jcuda/jcuda-common/JCudaCommonJNI/src/JNIUtils.cpp:560:25: error: ‘nullptr’ was not declared in this scope
    if (nativeObject == nullptr)
                        ^
make[2]: *** [jcuda/JCudaDriverJNI/bin/bin/CMakeFiles/JCudaCommonJNI.dir/build.make:63: jcuda/JCudaDriverJNI/bin/bin/CMakeFiles/JCudaCommonJNI.dir/src/JNIUtils.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:351: jcuda/JCudaDriverJNI/bin/bin/CMakeFiles/JCudaCommonJNI.dir/all] Error 2
make: *** [Makefile:84: all] Error 2

I have updated my system's CUDA and cuDNN to 10.2.89 and 7.6.5. (I have also manually built gcc 4.8.5 to support older RHEL, but this is not related to this issue because I get this error anyway).
I have manually set the few missing references to the cuDNN library, include, etc. in cmake-gui.

Everything seems to work well until the make step.

@blueberry
Author

@jcuda FYI changing "nullptr" to "NULL" fixes this build. Was nullptr some sort of typo, or was it intentional (and in that case, how should the build be fixed)?

@blueberry
Author

Linux binaries (with gcc 4.8.5):

jcuda-linux-10.2.zip

I'll build the macOS binaries now and will upload them as soon as they are ready.

When will you have time to wrap it up into a release? I plan to release some dependent libraries when JCuda is ready.

@blueberry
Author

macOS binaries:

jcuda-macos-10.2.zip

IMPORTANT:

  1. Nvidia officially discontinued support for CUDA on macOS. 10.2 is the last release that they provide.
  2. They have already discontinued support for cuDNN on Macs. The latest cuDNN 7.6.5 is not even provided for CUDA 10.2, but only for CUDA 10.1. I have included that release (7.6.5 for 10.1) in this build, so that JCudnn could be built at all. I do not know whether it would actually work. However, I doubt that anyone will be hit by this, since the most recent Nvidia GPUs in Macs are from 2012 or 2013. I guess this is included mostly for completeness' sake. If it works on Macs - great. If not - time to use CUDA via ssh or some other means. Nvidia on Apple is not a viable option any more...

@jcuda
Owner

jcuda commented Feb 5, 2020

Thanks @blueberry !

The issue of nullptr vs. NULL: The nullptr keyword is a replacement for NULL from C++11 onwards (see https://en.cppreference.com/w/cpp/keyword/nullptr ), because NULL is only a macro whose exact definition is left to the implementation, whereas nullptr has a dedicated type. This is fixed in jcuda/jcuda-common@c057670 , and the two should semantically be the same here.
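As background for the build error: older GCC releases such as 4.8 compile in a pre-C++11 dialect by default, so `nullptr` is only recognized there when a flag like `-std=c++11` is passed. The following minimal sketch (with hypothetical function names, unrelated to the JCuda sources) shows the one situation where the literal `0` and `nullptr` behave differently, namely overload resolution. For a plain comparison such as `javaObject == nullptr`, both spellings are equivalent.

```cpp
#include <string>

// Two overloads that a null-like argument could plausibly match.
std::string describe(int)         { return "int"; }
std::string describe(const char*) { return "pointer"; }

// The literal 0 is an int first and a null pointer constant second,
// so overload resolution prefers describe(int).
std::string fromZero()    { return describe(0); }

// nullptr has its own type (std::nullptr_t), which converts to any
// pointer type but never to int, so describe(const char*) is chosen.
std::string fromNullptr() { return describe(nullptr); }

// True when the translation unit is compiled as C++11 or later,
// which is the precondition for the nullptr keyword to exist.
bool hasCpp11() { return __cplusplus >= 201103L; }
```

Since the affected lines in JNIUtils.cpp only compare a pointer against null, falling back to `NULL` keeps the code buildable with compilers that are not switched to C++11.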

The deprecation of MacOS support was already mentioned elsewhere, and we (or rather: The MacOS users) will just have to accept that. I'll try to include the MacOS binaries for the last time, despite the doubts of whether they'll actually work - again, there's not much else that we can do.

Regarding the Linux binaries: I haven't heard back from the contributor who provided the Linux binaries with the older dependencies. I pinged him again in https://forum.byte-welt.net/t/why-jcuda-10-1-requires-usr-local-bfm-lib64-libstdc-so-6-version-cxxabi-1-3-8/21225/22 - if there is no response, I'll try to schedule the update (maybe end of this week, but) not later than beginning of next week, and drop a note here when it's done.

@blueberry
Author

@jcuda regarding the other contributor: you might have missed that, but I already provided binaries that support legacy gcc 4.8.5 that he asked for.

@jcuda
Owner

jcuda commented Feb 5, 2020

Sorry, I noticed that you mentioned gcc 4.8.5, but wasn't aware that this implies that the right (lower) CXXABI-version will be used (I'm not so familiar with the Linux world, obviously...).

Then I'll try to do the release this week, but again, it should not be later than Monday/Tuesday next week (there's some task during the weekend that might block me for some time).

@jcuda
Owner

jcuda commented Feb 9, 2020

JCuda 10.2.0 is on its way into Maven Central, and should be available in a few minutes, under the usual coordinates:

<dependency>
    <groupId>org.jcuda</groupId>
    <artifactId>jcuda</artifactId>
    <version>10.2.0</version>
</dependency>
<dependency>
    <groupId>org.jcuda</groupId>
    <artifactId>jcublas</artifactId>
    <version>10.2.0</version>
</dependency>
<dependency>
    <groupId>org.jcuda</groupId>
    <artifactId>jcufft</artifactId>
    <version>10.2.0</version>
</dependency>
<dependency>
    <groupId>org.jcuda</groupId>
    <artifactId>jcusparse</artifactId>
    <version>10.2.0</version>
</dependency>
<dependency>
    <groupId>org.jcuda</groupId>
    <artifactId>jcusolver</artifactId>
    <version>10.2.0</version>
</dependency>
<dependency>
    <groupId>org.jcuda</groupId>
    <artifactId>jcurand</artifactId>
    <version>10.2.0</version>
</dependency>
<dependency>
    <groupId>org.jcuda</groupId>
    <artifactId>jnvgraph</artifactId>
    <version>10.2.0</version>
</dependency>
<dependency>
    <groupId>org.jcuda</groupId>
    <artifactId>jcudnn</artifactId>
    <version>10.2.0</version>
</dependency>

As usual, a huge thanks @blueberry for the support and for providing the Linux- and Mac binaries!

(I'm curious to see whether Mac people will stumble over the deprecation of cuDNN on Mac via JCudnn... in that case, however, we'd have to point to version 10.1.0 and to complaints@nvidia.com ;-))

If there are no problems reported with this release, I'll close this issue (and update the website and README) in a few days.

@blueberry
Author

Thank you Marco!

@ekokrek

ekokrek commented Feb 18, 2020

Hello Marco and blueberry,

I am trying to build JCuda 10.2 but couldn't succeed.

Ubuntu 18.04.4 LTS
CUDA 10.2
Java 11.0.6
gcc 7.4.0
cmake 3.10.2
cuDNN 7.6

I receive an error like the following:

emel@bsb-workstation:~/jcuda$ cmake ./jcuda-main
-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda-10.2/bin/nvcc
CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find JNI (missing: JAVA_AWT_INCLUDE_PATH)
Call Stack (most recent call first):
  /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.10/Modules/FindJNI.cmake:310 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  ../jcuda-common/JCudaCommon_CMake.txt:33 (find_package)
  CMakeLists.txt:7 (include)


-- Configuring incomplete, errors occurred!
See also "/home/emel/jcuda/CMakeFiles/CMakeOutput.log".
See also "/home/emel/jcuda/CMakeFiles/CMakeError.log".

I looked around and there were several suggestions for
Could NOT find JNI (missing: JAVA_AWT_INCLUDE_PATH)
and other errors; however, I couldn't resolve this problem.

Is there any way that I can solve this situation without reverting to older versions of any of the tools that I listed initially? Like to Java 1.8 or gcc 4.8 etc.

Thanks in advance !

@jcuda
Owner

jcuda commented Feb 18, 2020

(Just curious: Is there any reason why you don't use the release via Maven? (I'll also upload a single ZIP with all JARs on the website, but this currently has low priority)).


The most likely reason for the error message is:

You might have installed a JRE (Java Runtime Environment) instead of a JDK (Java Development Kit).
(I'm not so familiar with Linux/Ubuntu. If this is the case, I'd also have to do some websearches for possible solutions...)

Another possible reason is:

Something might have changed between Java 8 and Java 11 that affects whether CMake can find the required paths. That would be a nuisance.


A drive-by of technical details of CMake that I'm usually not concerned with, but that might be a step towards a solution for both cases: Do you have the directory that is mentioned at https://github.com/Kitware/CMake/blob/master/Modules/FindJNI.cmake#L207 ?
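One way to narrow this down is to check which JNI headers actually exist below the JDK directory. The sketch below assumes a Linux JDK layout (`include/` and `include/linux/`) and uses a hypothetical helper name, `missingJniHeaders`. CMake's FindJNI module searches more locations than a single JDK home, so an empty result here does not guarantee that CMake will find the headers, but a non-empty result points at an incomplete installation.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Returns the JNI-related headers that are missing below 'jdkHome'.
// The relative paths assume a Linux JDK layout; other platforms use a
// different subdirectory for jni_md.h.
std::vector<std::string> missingJniHeaders(const fs::path& jdkHome) {
    const std::vector<std::string> expected = {
        "include/jni.h",          // reported by CMake as JAVA_INCLUDE_PATH
        "include/linux/jni_md.h", // reported by CMake as JAVA_INCLUDE_PATH2
        "include/jawt.h"          // reported by CMake as JAVA_AWT_INCLUDE_PATH
    };
    std::vector<std::string> missing;
    for (const std::string& relative : expected) {
        std::error_code ec;  // treat I/O errors like "not found" instead of throwing
        if (!fs::exists(jdkHome / relative, ec)) {
            missing.push_back(relative);
        }
    }
    return missing;
}
```

Calling it with the directory that `JAVA_HOME` points to and printing the result shows whether `include/jawt.h`, the file behind the `JAVA_AWT_INCLUDE_PATH` error, is present at all. The code requires C++17 for `std::filesystem`.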

@ekokrek

ekokrek commented Feb 18, 2020

Thank you for the quick response !

Actually, when I first checked maven repository upon reading messages here, I got lost a little bit (not familiar with maven) and thought those folders were missing something compared to the zip files you share on the website. Now I realize that libraries are shared individually. I will use them asap!

About jre-jdk: I verified that a JDK is installed by checking the javac version. So, I think that is not the issue.

I should try modification on the FindJNI.cmake file, though.

Thanks a lot for the kind response.

@jcuda
Owner

jcuda commented Feb 18, 2020

A side note: The package on the website is still for CUDA 10.1 - the update for 10.2 was recent, and I haven't (yet) managed to upload the package to the website.

Since JCuda does not have any further dependencies, you could still use the JARs directly. But I'd strongly recommend using Maven: If you only want to use libraries, it's very simple, and you don't have to care about transitive dependencies.

(If you wanted to publish a library on your own in Maven Central, possibly even a library with JNI, I'd say "Welcome to the clusterfLIck of 'convention over configuration'". But if you only want to use Maven libraries, it makes life really simpler....)

@blueberry
Author

blueberry commented Mar 1, 2020

I forgot to close this issue. Everything's been working smoothly with 10.2, and some bugs found in 10.1 (regardless of whether they were introduced by JCuda or present in Nvidia's driver itself) have been fixed in 10.2. Thank you a lot for these great libraries, Marco!

BTW, I've just opened a related thread for JOCL (gpu/JOCL#27)

@jcuda
Owner

jcuda commented Mar 1, 2020

Thanks @blueberry - I appreciate your "heads ups" for new versions, and of course, your contributions. But I'll leave this one open just as a reminder that the READMEs and the website still have to be updated for 10.2.

(It's not much effort, but I've been facing some tight schedules in the past few weeks, and assume that most people will either use Maven+10.1, or figure out that 10.2 is available anyhow, so had to defer this a bit)

Along the same line: Thanks for your pointer to the JOCL issue. In fact, there has been a discussion about HIP support at jcuda/jcuda#5 (and HIP and ROC seem to be the same thing, or at least related).

I'll have to re-read the other issue as a refresher - it's been ~4 years since then, and I'm sure a lot has happened in the meantime. Although the sticky note to consider creating "JHip" is still on my table, I'll have to carefully look at the current projects and their structure to see whether I can even consider carving out an appropriate chunk of my spare time for that - I already have the feeling of neglecting too many of my "spare time" projects...

@jcuda jcuda reopened this Mar 1, 2020
@blueberry
Author

Thanks @jcuda Just to be clear, MIOpen supports OpenCL. I hope it means that it can be supported with the existing JOCL infrastructure, without the need of JHip!

@jcuda
Owner

jcuda commented Sep 7, 2020

This one was still open because the README and website had not been updated with the new version number, but this has become obsolete with the update to CUDA 11.

@jcuda jcuda closed this as completed Sep 7, 2020

3 participants