Support fp16 in new python engine #393

Open
Xp-speit2018 opened this issue Oct 30, 2024 · 3 comments

Comments

@Xp-speit2018

Hi,

I recently encountered an issue where fp16 (half-precision) is no longer supported in the new Python engine of KeOps (my version is 2.2.3). I received the following error:

ValueError: [KeOps] Error : only float and double dtypes are implemented in new python engine for now (error at line 272 in file ...

However, I noticed that fp16 was claimed to be supported since version 1.4, as mentioned in this issue. I'm curious if there is a specific reason why fp16 support was dropped in the new engine introduced in version 2.0.

Given that half-precision is widely used in many prevalent models, I believe it would be beneficial to clarify the current status and potential roadmap for fp16 support in the new engine. This could encourage the community to contribute and help improve the feature.

Thank you for your attention.

@joanglaunes
Contributor

Hello @Xp-speit2018 ,
In fact fp16 is still supported. I looked for this message in the source code, and the only occurrence is when attempting to generate a formula involving infinite numbers. So a formula will fail with float16 only if it involves infinite numbers. I will change this error message to clarify.
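For context, a sum-type reduction with fp16 inputs should still go through, since no infinity constant is needed to initialize it (a minimal sketch assuming a CUDA-capable setup, not taken from the KeOps docs):

import torch
from pykeops.torch import LazyTensor

N, M, D = 1000, 2000, 4

x = torch.randn(N, D).half().cuda()
y = torch.randn(M, D).half().cuda()

x_i = LazyTensor(x[:, None, :])  # (N, 1, D)
y_j = LazyTensor(y[None, :, :])  # (1, M, D)

D_ij = ((x_i - y_j) ** 2).sum(-1)  # symbolic (N, M) squared distances
K_ij = (-D_ij).exp()               # Gaussian kernel
out = K_ij.sum(dim=1)              # sum reduction: no infinity involved
print(out.shape, out.dtype)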

joanglaunes added a commit that referenced this issue Oct 30, 2024
changed error message regarding infinity and float16 ; for issue #393.
@Xp-speit2018
Author

Thanks @joanglaunes for the clarification! Now I have a better idea of where the error originates from.

I created a minimal demo to show this error:

import torch
from pykeops.torch import LazyTensor
N = 10
M = 20
D = 8

X = torch.randn(N, D).half().cuda()
Y = torch.randn(M, D).half().cuda()

# find the L2-nearest neighbors of X in Y

X_i = LazyTensor(X[:, None, :])  # (N, 1, D)
Y_j = LazyTensor(Y[None, :, :])  # (1, M, D)

D_ij = ((X_i - Y_j) ** 2).sum(-1)  # (N, M)
ind_k = D_ij.argmin(dim=1)

print(ind_k)

The infinity is indeed created during argmin to initialize the min value, see:

class Min_ArgMin_Reduction_Base(Reduction):
    """min+argmin reduction : base class"""

    def __init__(self, formula, tagIJ):
        super().__init__(formula, tagIJ)
        # We work with a (values, indices) vector
        self.dimred = 2 * formula.dim  # dimension of inner reduction variables

    def InitializeReduction(self, acc):
        # Returns C++ code to be used at initialization phase of the reduction.
        dim = self.formula.dim
        acc_min, acc_argmin = acc.split(dim, dim)
        return acc_min.assign(infinity(acc.dtype)) + acc_argmin.assign(c_zero_float)
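To spell out why infinity() is needed here: a running min/argmin pass must start its accumulator above any possible value so that the first candidate always wins (a plain-Python illustration, not the KeOps code):

import math

def running_min_argmin(values):
    # Accumulator starts at +inf so the first value always replaces it,
    # mirroring what InitializeReduction emits in C++ for the min part.
    acc_min, acc_argmin = math.inf, 0
    for i, v in enumerate(values):
        if v < acc_min:
            acc_min, acc_argmin = v, i
    return acc_min, acc_argmin

print(running_min_argmin([3.0, 1.5, 2.0]))  # (1.5, 1)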

The tricky part is that on CUDA devices half values are packed into half2. Even if I patch the infinity creation with:

def infinity(dtype):
    if dtype == "float":
        code = "( 1.0f/0.0f )"
    elif dtype == "double":
        code = "( 1.0/0.0 )"
    elif dtype == "half2":
        # not sure if it's logically correct but only used to bypass error checking
        code = "__half2(65504.0f, 65504.0f)"
    else:
        KeOps_Error(
            "only float and double dtypes are implemented in new python engine for now"
        )
    return c_variable(dtype, code)
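For reference, 65504 is the largest finite float16 value, so the patch above saturates the accumulator instead of using a true infinity; this can be checked with numpy:

import numpy as np

# Largest finite float16 value, used above in place of a true infinity.
print(np.finfo(np.float16).max)  # 65504.0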

we still hit the "not implemented" error raised from:

def ReducePairShort(self, acc, xi, ind):
    if xi.dtype == "half2":
        KeOps_Error("not implemented")
        half2_val = c_variable("half2_ind")
        string = half2_val.declare_assign(
            f"__floats2half2_rn(2*{ind()},2*{ind()}+1)"
        )
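To make the half2 packing concrete (a rough illustration of the data layout, not KeOps code): each half2 lane carries two fp16 scalars, so lane k of a packed buffer corresponds to scalar positions 2*k and 2*k+1, which is why the generated code builds index pairs with __floats2half2_rn(2*ind, 2*ind+1):

import numpy as np

def pack_half2(values):
    # Hypothetical helper: view 2*K fp16 scalars as K half2-like lanes.
    v = np.asarray(values, dtype=np.float16)
    return v.reshape(-1, 2)

lanes = pack_half2([0.5, 1.5, 2.5, 3.5])
for k, lane in enumerate(lanes):
    print(f"lane {k}: {lane} -> scalar indices ({2 * k}, {2 * k + 1})")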

Now it's clear that we are lacking support for the half2 type introduced by CUDA.

Interestingly, when I wanted to take a closer look at how to patch half, I created the same demo with numpy:

import numpy as np
from pykeops.numpy import LazyTensor
N = 10
M = 20
D = 8

X = np.random.randn(N, D).astype('float16')
Y = np.random.randn(M, D).astype('float16')

# find the L2-nearest neighbors of X in Y

X_i = LazyTensor(X[:, None, :])  # (N, 1, D)
Y_j = LazyTensor(Y[None, :, :])  # (1, M, D)

D_ij = ((X_i - Y_j) ** 2).sum(-1) 
ind_k = D_ij.argmin(dim=1)

print(ind_k)

I encountered exactly the same error. It seems that we are still compiling via Gpu_link_compile and the dtype reaching argmin is still half2, which I suppose is not expected behavior.

Would it be possible to consider adding support for half2 and half in the reduction methods where creating an infinity value is required? Your patience would be much appreciated.
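In the meantime, a simple workaround (a sketch, assuming the extra memory of float32 is acceptable) is to cast to float32 just for the argmin reduction:

import torch
from pykeops.torch import LazyTensor

N, M, D = 10, 20, 8
X = torch.randn(N, D).half().cuda()
Y = torch.randn(M, D).half().cuda()

# Run the Min/ArgMin reduction in float32, since it is not implemented
# for float16 in the current engine; keep fp16 only for storage.
X_i = LazyTensor(X.float()[:, None, :])  # (N, 1, D)
Y_j = LazyTensor(Y.float()[None, :, :])  # (1, M, D)

D_ij = ((X_i - Y_j) ** 2).sum(-1)  # (N, M)
ind_k = D_ij.argmin(dim=1)
print(ind_k)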

@joanglaunes
Contributor

Hello @Xp-speit2018 ,
Indeed, the Min-Argmin reductions are not implemented for float16; I forgot about this, sorry. Actually this should not be too difficult to do; I think it was left unfinished only for lack of time. Regarding half2: CUDA first introduced support for float16 via the half2 type only, so this is what we used to implement float16 support in KeOps. So the fact that you get the same error with numpy is completely expected behaviour.
