Support fp16 in new python engine #393

Open
Xp-speit2018 opened this issue Oct 30, 2024 · 3 comments

Comments

@Xp-speit2018

Hi,

I recently encountered an issue where fp16 (half-precision) is no longer supported in the new Python engine of KeOps (my version is 2.2.3). I received the following error:

ValueError: [KeOps] Error : only float and double dtypes are implemented in new python engine for now (error at line 272 in file ...

However, I noticed that fp16 was claimed to be supported since version 1.4, as mentioned in this issue. I'm curious if there is a specific reason why fp16 support was dropped in the new engine introduced in version 2.0.

Given that half-precision is widely used in many prevalent models, I believe it would be beneficial to clarify the current status and potential roadmap for fp16 support in the new engine. This could encourage the community to contribute and help improve the feature.

Thank you for your attention.

@joanglaunes
Contributor

Hello @Xp-speit2018 ,
In fact fp16 is still supported. I looked for this message in the source code, and the only occurrence is when attempting to generate a formula involving infinite numbers. So a formula will fail with float16 only if it involves infinite numbers. I will change this error message to clarify.
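For context, a sum-type reduction with fp16 inputs should still go through, since no infinity constant is needed to initialize it (a minimal sketch assuming a CUDA-capable setup, not taken from the KeOps docs):

import torch
from pykeops.torch import LazyTensor

N, M, D = 1000, 2000, 4

x = torch.randn(N, D).half().cuda()
y = torch.randn(M, D).half().cuda()

x_i = LazyTensor(x[:, None, :])  # (N, 1, D)
y_j = LazyTensor(y[None, :, :])  # (1, M, D)

D_ij = ((x_i - y_j) ** 2).sum(-1)  # symbolic (N, M) squared distances
K_ij = (-D_ij).exp()               # Gaussian kernel
out = K_ij.sum(dim=1)              # sum reduction: no infinity involved
print(out.shape, out.dtype)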

joanglaunes added a commit that referenced this issue Oct 30, 2024
changed error message regarding infinity and float16 ; for issue #393.
@Xp-speit2018
Author

Thanks @joanglaunes for the clarification! Now I have a better idea of where the error originates from.

I created a minimal demo to show this error:

import torch
from pykeops.torch import LazyTensor
N = 10
M = 20
D = 8

X = torch.randn(N, D).half().cuda()
Y = torch.randn(M, D).half().cuda()

# find the L2-nearest neighbors of X in Y

X_i = LazyTensor(X[:, None, :])  # (N, 1, D)
Y_j = LazyTensor(Y[None, :, :])  # (1, M, D)

D_ij = ((X_i - Y_j) ** 2).sum(-1)  # (N, M)
ind_k = D_ij.argmin(dim=1)

print(ind_k)

The infinity is indeed created during argmin to initialize the min value, see:

class Min_ArgMin_Reduction_Base(Reduction):
    """min+argmin reduction : base class"""

    def __init__(self, formula, tagIJ):
        super().__init__(formula, tagIJ)
        # We work with a (values, indices) vector
        self.dimred = 2 * formula.dim  # dimension of inner reduction variables

    def InitializeReduction(self, acc):
        # Returns C++ code to be used at initialization phase of the reduction.
        dim = self.formula.dim
        acc_min, acc_argmin = acc.split(dim, dim)
        return acc_min.assign(infinity(acc.dtype)) + acc_argmin.assign(c_zero_float)
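To spell out why infinity() is needed here: a running min/argmin pass must start its accumulator above any possible value so that the first candidate always wins (a plain-Python illustration, not the KeOps code):

import math

def running_min_argmin(values):
    # Accumulator starts at +inf so the first value always replaces it,
    # mirroring what InitializeReduction emits in C++ for the min part.
    acc_min, acc_argmin = math.inf, 0
    for i, v in enumerate(values):
        if v < acc_min:
            acc_min, acc_argmin = v, i
    return acc_min, acc_argmin

print(running_min_argmin([3.0, 1.5, 2.0]))  # (1.5, 1)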

The tricky part is that on CUDA devices half values are packed into half2. Even if I patch the infinity creation with:

def infinity(dtype):
    if dtype == "float":
        code = "( 1.0f/0.0f )"
    elif dtype == "double":
        code = "( 1.0/0.0 )"
    elif dtype == "half2":
        # not sure if it's logically correct but only used to bypass error checking
        code = "__half2(65504.0f, 65504.0f)"
    else:
        KeOps_Error(
            "only float and double dtypes are implemented in new python engine for now"
        )
    return c_variable(dtype, code)
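For reference, 65504 is the largest finite float16 value, so the patch above saturates the accumulator instead of using a true infinity; this can be checked with numpy:

import numpy as np

# Largest finite float16 value, used above in place of a true infinity.
print(np.finfo(np.float16).max)  # 65504.0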

we still hit the "not implemented" error raised from:

def ReducePairShort(self, acc, xi, ind):
    if xi.dtype == "half2":
        KeOps_Error("not implemented")
        half2_val = c_variable("half2_ind")
        string = half2_val.declare_assign(
            f"__floats2half2_rn(2*{ind()},2*{ind()}+1)"
        )
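To make the half2 packing concrete (a rough illustration of the data layout, not KeOps code): each half2 lane carries two fp16 scalars, so lane k of a packed buffer corresponds to scalar positions 2*k and 2*k+1, which is why the generated code builds index pairs with __floats2half2_rn(2*ind, 2*ind+1):

import numpy as np

def pack_half2(values):
    # Hypothetical helper: view 2*K fp16 scalars as K half2-like lanes.
    v = np.asarray(values, dtype=np.float16)
    return v.reshape(-1, 2)

lanes = pack_half2([0.5, 1.5, 2.5, 3.5])
for k, lane in enumerate(lanes):
    print(f"lane {k}: {lane} -> scalar indices ({2 * k}, {2 * k + 1})")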

Now it's clear that we are lacking support for the half2 type introduced by CUDA.

Interestingly, when I wanted to take a closer look at how to patch half, I created the same demo with numpy:

import numpy as np
from pykeops.numpy import LazyTensor
N = 10
M = 20
D = 8

X = np.random.randn(N, D).astype('float16')
Y = np.random.randn(M, D).astype('float16')

# find the L2-nearest neighbors of X in Y

X_i = LazyTensor(X[:, None, :])  # (N, 1, D)
Y_j = LazyTensor(Y[None, :, :])  # (1, M, D)

D_ij = ((X_i - Y_j) ** 2).sum(-1) 
ind_k = D_ij.argmin(dim=1)

print(ind_k)

I encountered exactly the same error. It seems that we are still compiling via Gpu_link_compile and the dtype reaching argmin is still half2, which I suppose is not expected behavior.

Would it be possible to consider adding support for half2 and half in the reduction methods where creating an infinity value is required? Your patience would be much appreciated.
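In the meantime, a simple workaround (a sketch, assuming the extra memory of float32 is acceptable) is to cast to float32 just for the argmin reduction:

import torch
from pykeops.torch import LazyTensor

N, M, D = 10, 20, 8
X = torch.randn(N, D).half().cuda()
Y = torch.randn(M, D).half().cuda()

# Run the Min/ArgMin reduction in float32, since it is not implemented
# for float16 in the current engine; keep fp16 only for storage.
X_i = LazyTensor(X.float()[:, None, :])  # (N, 1, D)
Y_j = LazyTensor(Y.float()[None, :, :])  # (1, M, D)

D_ij = ((X_i - Y_j) ** 2).sum(-1)  # (N, M)
ind_k = D_ij.argmin(dim=1)
print(ind_k)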

@joanglaunes
Contributor

Hello @Xp-speit2018 ,
Indeed, the Min-Argmin reductions are not implemented for float16; I forgot about this, sorry. Actually this should not be too difficult to do; I think it was left unfinished only for lack of time. Regarding half2: CUDA first introduced support for float16 via the half2 type only, so this is what we used to implement float16 support in KeOps. So the fact that you get the same error with numpy is completely expected behaviour.
