Support for dtypes uint32 and int32 #341

Closed
pumelo opened this issue Mar 3, 2021 · 9 comments
Labels
enhancement New feature or request

Comments


pumelo commented Mar 3, 2021

Describe the solution you'd like
Support dtypes int32 and uint32.

Additional context
I receive an array of int32 values from other hardware (an I2S microphone). So far I have not found a more efficient way to use this data with ulab than iterating over the array and converting it to an array of floats. This seems cumbersome and slow, so it would be nice to have int32 supported directly.

This is what I am doing right now:

samples = bytearray(2048)  # bytearray to receive audio samples
data = uctypes.struct(uctypes.addressof(samples), {'arr': (0 | uctypes.ARRAY, 512 | uctypes.INT32)})

np_a = np.zeros(512, dtype=np.float)

# <--  here, get data from hardware into samples buffer

for i in range(512):
    np_a[i] = data.arr[i]

The loop above takes about 90 ms on an esp32s2 @ 240 MHz, which is insanely slow, while the FFT on the same data afterwards takes about 10 ms.

pumelo added the enhancement label Mar 3, 2021
Author

pumelo commented Mar 3, 2021

Just found an "easy" way to get a performance boost (overall I'd still consider this slow 😸):

Putting the conversion into its own function speeds it up by approximately a factor of two. I think this is mostly because the global and attribute lookups are no longer needed inside the loop (data.arr[i] vs. input[i]). I think this is mentioned somewhere in the MicroPython docs.

samples = bytearray(2048)  # bytearray to receive audio samples
data = uctypes.struct(uctypes.addressof(samples), {'arr': (0 | uctypes.ARRAY, 512 | uctypes.INT32)})

np_a = np.zeros(512, dtype=np.float)

def i32tof32(input, output):
    for i in range(512):
        output[i] = input[i]

# <--  here, get data from hardware into samples buffer

i32tof32(data.arr, np_a)
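
The difference between the two variants can be sketched with the standard library alone. This is only an illustration: the `Holder` class is a hypothetical stand-in for the uctypes struct, plain `array` objects replace the ulab ndarray, and no timing is claimed for any particular board.

```python
import array


class Holder:
    """Hypothetical stand-in for the uctypes struct; only the attribute access matters."""

    def __init__(self, values):
        self.arr = values


data = Holder(array.array("i", range(512)))   # pretend int32 samples
out = array.array("f", [0.0] * 512)           # float destination


def convert_global():
    # Every iteration looks up the globals `out` and `data`,
    # and then the attribute `arr` on `data`.
    for i in range(512):
        out[i] = data.arr[i]


def convert_local(src, dst):
    # `src` and `dst` are local names, so the loop body needs
    # neither a global lookup nor an attribute lookup.
    for i in range(len(src)):
        dst[i] = src[i]


convert_local(data.arr, out)
```

Both functions produce the same result; only the cost of the name resolution inside the loop differs.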

Owner

v923z commented Mar 3, 2021

@pumelo

Support dtypes int32 and uint32.

I think the short answer is no. If you want to know the reason, you can read more about the problems here: #306 (comment)

However, if I understand your problem correctly, you do not want complete support, you only want to pass an int32 array into a numpy function. This is a subtle, but significant difference.

Here are two possible solutions: one is that you wait for about a week. I am going to release version 3.0, which will support a mechanism, by which you can attach arbitrary transformers to an ndarray. You would still have to write a transformer in C, but the type of the source array could be detached from the standard dtypes, as long as you can construct a function that converts your type to one of the five supported dtypes. This mechanism would also allow arbitrary type conversion.

The other option is that we add a single function that takes your buffer, and returns an ndarray of dtype float. Afterwards, you could pass this array to the FFT function. This is simpler, and I could implement a prototype before tomorrow. Do you think you could help a bit with the testing?

Owner

v923z commented Mar 3, 2021

@pumelo It would be great if you could comment on #342. The PR is not complete yet. You can do something like this:

from ulab import numpy as np
from ulab import utils

b = bytearray([0, 1, 2, 3, 4, 5, 6, 7])
print(b)
print(utils.from_intbuffer(b))

which then turns the bytearray into two floats:

bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07')
array([50462976.0, 117835012.0], dtype=float64)
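
The two values follow from reading the eight bytes as two little-endian unsigned 32-bit integers. A quick check with the standard `struct` module (this is not ulab code; the unsigned, little-endian interpretation is an assumption that happens to reproduce the output above):

```python
import struct

b = bytearray([0, 1, 2, 3, 4, 5, 6, 7])

# "<2I": two unsigned 32-bit integers, little-endian
words = struct.unpack("<2I", b)

# 0x03020100 and 0x07060504, converted to floats
as_floats = [float(w) for w in words]
print(as_floats)
```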

The offset and count arguments of frombuffer are supported, as well as the byteswap=False keyword argument for the case when the endianness of the peripheral device causes problems. I might add an inplace=False or out keyword argument in the future. That would save you the RAM allocation.
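
What these keywords might do can be modelled in plain Python. The helper below is hypothetical and is not ulab's implementation. It assumes that offset is given in bytes and count in items (as in numpy's frombuffer), that the native byte order is little-endian, and that byteswap reverses the bytes within each 4-byte word.

```python
import struct


def from_intbuffer_model(buf, count=-1, offset=0, byteswap=False):
    """Hypothetical pure-Python model of the conversion, for illustration only."""
    view = memoryview(buf)[offset:]              # skip `offset` bytes
    n = len(view) // 4 if count < 0 else count   # number of 32-bit words to read
    # ">" reads each word big-endian, i.e. byte-swapped on a little-endian host
    fmt = (">" if byteswap else "<") + str(n) + "I"
    return [float(v) for v in struct.unpack_from(fmt, view)]


b = bytearray([0, 1, 2, 3, 4, 5, 6, 7])
print(from_intbuffer_model(b))                     # both words
print(from_intbuffer_model(b, byteswap=True))      # both words, bytes reversed
print(from_intbuffer_model(b, count=1, offset=4))  # second word only
```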

I think, the problem that you described above is going to be quite common in the future, and adding a sub-module with utility functions could be a possible solution.

Author

pumelo commented Mar 4, 2021

@v923z wow, thank you very much for the fast reaction and the solution provided in #342. I'll go and test this.

I think the short answer is no. If you want to know the reason, you can read more about the problems here: #306 (comment)

👍 Okay, got the reason for not implementing int32. But just to mention, the esp32 family of chips is very bad at floating point operations. The esp32-s2 does not even have an FPU. The esp32 is also reported to have very low flops (the esp32-s3 should come with more support for flops). This means there are reasons to stick with int32 rather than float. But anyway, I'll test what I can squeeze out of it 😄

I think the solution you provided will cover my use case. But from my point of view it would be very useful to provide a utility function that could seamlessly use micropython arrays or uctypes arrays, determine the byte order and type from them, and return the requested ndarray with the specific array. This would be the only function required at that point, because you can always create an array or a ctypes array from a bytearray or memoryview where your data is. Probably this could be built on the new functionality you mentioned to be released in 3.0. Especially when using inplace this could be very powerful!

I'll go and test #342 and report back.

Owner

v923z commented Mar 4, 2021

@pumelo

But just to mention, the esp32 family of chips is very bad at floating point operations. The esp32-s2 does not even have an FPU. The esp32 is also reported to have very low flops (the esp32-s3 should come with more support for flops). This means there are reasons to stick with int32 rather than float.

Many operations produce floating point results: standard deviation, mean, all of the vectorised mathematical functions, and FFT, just to name a few. I understand that one could move the FFT to the integer domain, but you would still have difficulties with the other examples. So, I believe, it is practically impossible to get rid of the floats. Adding two new types would significantly increase the firmware size.
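
A tiny illustration of this point, using the standard `statistics` module rather than ulab, with made-up sample values: even for purely integer input, the mean and the standard deviation are generally not integers.

```python
from statistics import mean, pstdev

samples = [1, 2, 4]      # integer samples, e.g. taken from an int32 buffer

m = mean(samples)        # 7/3, which no integer type can represent
s = pstdev(samples)      # population standard deviation, also non-integer

print(m, s)
```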

I think the solution you provided will cover my use case. But from my point of view it would be very useful to provide a utility function that could seamlessly use micropython arrays or uctypes arrays, determine the byte order and type from them, and return the requested ndarray with the specific array. This would be the only function required at that point, because you can always create an array or a ctypes array from a bytearray or memoryview where your data is.

This is a sound idea. I will look into this. Could you, please, write a mock-up as to how this should work? I really mean python pseudo-code. It doesn't have to work, I would only like to see what kind of syntax you have in mind.

Probably this could be built on the new functionality you mentioned to be released in 3.0. Especially when using inplace this could be very powerful!

Here is the background, if you are interested. You can comment there, if you want your voice to be heard: #327

I'll go and test #342 and report back.

Here are the docs: https://github.com/v923z/micropython-ulab/blob/utils/docs/manual/source/ulab-utils.rst

Author

pumelo commented Mar 4, 2021

Just posting this here without further investigation:

Your examples in the doc work on the esp32-s2.

But if I use a larger buffer, e.g. 4096 bytes long, like so:

a = bytearray(1024*4)
x = utils.from_intbuffer(a)

the device crashes with
Write operation at address 0x40020000 not permitted.

Backtrace is this one here:

0x400bca2b: utils_from_intbuffer_helper at /root/micropython/lib/micropython-ulab/code/utils/utils.c:94
0x400bca6a: utils_from_intbuffer at /root/micropython/lib/micropython-ulab/code/utils/utils.c:107
0x4008c78e: fun_builtin_var_call at /root/micropython/py/objfun.c:126
0x40093201: mp_call_function_n_kw at /root/micropython/py/runtime.c:652
0x40093311: mp_call_method_n_kw at /root/micropython/py/runtime.c:668
0x40095bce: mp_execute_bytecode at /root/micropython/py/vm.c:1085
0x4008c868: fun_bc_call at /root/micropython/py/objfun.c:288
0x40093201: mp_call_function_n_kw at /root/micropython/py/runtime.c:652
0x4009322a: mp_call_function_0 at /root/micropython/py/runtime.c:626
0x400a4e27: parse_compile_execute at /root/micropython/lib/utils/pyexec.c:116
0x400a50bd: pyexec_friendly_repl at /root/micropython/lib/utils/pyexec.c:661
0x4008609f: mp_task at /root/micropython/ports/esp32/build_GENERIC_S2/../main.c:125
0x4002e3a5: vPortTaskWrapper at /opt/esp/idf/components/freertos/xtensa/port.c:143

Author

pumelo commented Mar 4, 2021

See the comment in the pull request. That seems to be the cause of the error.

Now working with the data from the mic is okay 💐

Conversion which previously took about 45ms happens now within 0.8 ms 😺 nice speedup

Thank you!

Owner

v923z commented Mar 5, 2021

See the comment in the pull request. That seems to be the cause of the error.

I have fixed that.

Now working with the data from the mic is okay 💐

Great, thanks for the feedback!

Conversion which previously took about 45ms happens now within 0.8 ms 😺 nice speedup

I have moved the if clause one level up, so in the case when you don't swap bytes, the code should run faster. The difference is probably not significant, though.

Do you think you could write a small test script for this function? If so, here is some help: https://github.com/v923z/micropython-ulab#testing

Owner

v923z commented Mar 8, 2021

#342 adds the requested functionality.


2 participants