
Reading dense array doesn't free memory #150

Closed · keenangraham opened this issue May 8, 2019 · 8 comments · Fixed by #151
keenangraham commented May 8, 2019

Hi,

I'm wondering if this is expected behavior, or if you have any tips to fix it. This is on Ubuntu 16, Python 3.7, and tiledb 0.4.1.

Create toy array:

import numpy as np
import psutil  # used below to report memory usage
import tiledb

x = np.ones(10000000)
ctx = tiledb.Ctx()
path = 'test_tile_db'
d1 = tiledb.Dim(
    'test_domain', domain=(0, x.shape[0] - 1), tile=10000, dtype="uint32"
)
domain = tiledb.Domain(d1)
v = tiledb.Attr(
    'test_value',
    dtype="float32",
)
schema = tiledb.ArraySchema(
    domain=domain, attrs=(v,), cell_order="row-major", tile_order="row-major"
)
tiledb.DenseArray.create(path, schema)
values = x.astype(np.float32)
with tiledb.DenseArray(path, mode="w", ctx=ctx) as A:
    A[:] = {'test_value': values}

Read from array:

for i in range(10):
    with tiledb.DenseArray(path, mode='r') as data:
        data[:]
    print('Gigs:', round(psutil.virtual_memory().used / (10**9), 2))
Gigs: 0.84
Gigs: 0.89
Gigs: 0.93
Gigs: 0.97
Gigs: 1.01
Gigs: 1.05
Gigs: 1.1
Gigs: 1.14
Gigs: 1.18
Gigs: 1.22

Basically, the memory never seems to get released, even though I don't assign data[:] to any variable. I've tried forcing garbage collection (import gc; gc.collect()), but Python doesn't seem to be aware of the allocation. I've also tried explicitly closing the DenseArray. Eventually I have to restart the Jupyter notebook to free the memory.

In my real use case I iterate over several TileDB arrays, pull the full array data out of each, apply some transforms, and write new TileDB arrays with the transformed data. This works, except that every read call adds around 2 GB to used memory and never releases it, so the machine eventually runs out of memory. My current workaround is to spin up a new process for every iteration.
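
For reference, a rough sketch of that workaround (the transform and the paths here are placeholders, and it assumes the output arrays already exist with a matching schema); running each read/transform/write in a short-lived child process means any memory the read fails to release is still reclaimed when the process exits:

import multiprocessing as mp
import tiledb

def process_one(in_path, out_path):
    # All TileDB reads/writes happen inside the child process, so memory
    # that is never released gets reclaimed when the process exits.
    with tiledb.DenseArray(in_path, mode='r') as data:
        result = data[:]  # dict mapping attribute names to NumPy arrays
    transformed = {name: values * 2 for name, values in result.items()}  # placeholder transform
    with tiledb.DenseArray(out_path, mode='w') as out:
        out[:] = transformed

if __name__ == '__main__':
    pairs = [('test_tile_db', 'test_tile_db_transformed')]  # placeholder paths
    for in_path, out_path in pairs:
        p = mp.Process(target=process_one, args=(in_path, out_path))
        p.start()
        p.join()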

Thanks!

keenangraham changed the title from "Reading dense array doesn't free data" to "Reading dense array doesn't free memory" May 8, 2019
ihnorton added a commit that referenced this issue May 10, 2019
PyArray_NewFromDescr ignores the NPY_ARRAY_OWNDATA flag, so we
need to set it manually so that NumPy frees the memory.

Fixes #150
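
For context on the NPY_ARRAY_OWNDATA flag mentioned in these commit messages: NumPy only frees an array's data buffer on garbage collection when the array owns that buffer, which is visible from Python as flags.owndata. A minimal illustration:

import numpy as np

# An array that allocated its own buffer owns it; NumPy frees the buffer
# when the array is garbage-collected.
a = np.ones(10)
print(a.flags.owndata)   # True

# A view wraps someone else's buffer and does not own it, so freeing that
# buffer is the responsibility of the base object instead.
v = a[:5]
print(v.flags.owndata)   # False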
ihnorton (Member) commented:

Thanks for the report and the great repro, fix inbound (#151).

ihnorton added a commit that referenced this issue May 14, 2019
Fixes #150

PyArray_NewFromDescr ignores the NPY_ARRAY_OWNDATA flag, so we
need to set it manually so that NumPy frees the memory.

Also switch memory allocation to use PyDataMem_NEW/FREE. This
should generally be the same as PyMem_Malloc, but it could end up
different in a situation where the C ext is not linked against
the same C stdlib. In that case, NumPy would not call the correct
de-allocator when freeing the memory it gains ownership over.
ihnorton added a commit that referenced this issue May 17, 2019
Fixes #150

PyArray_NewFromDescr ignores the NPY_ARRAY_OWNDATA flag, so we
need to set it manually so that NumPy frees the memory.

Because we now don't (can't) set the flag in this call, simplify
construction by using PyArray_SimpleNewFromData.

Now that memory is being freed correctly, we must use the NumPy
allocator (PyDataMem_NEW/FREE) so that de-allocation is matched.
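
With the fix in place, the NumPy arrays handed back by a read should own their buffers, which can be checked directly against the reproducer's array (a rough check, assuming data[:] returns a dict of attribute arrays keyed by name, mirroring the write above):

import tiledb

with tiledb.DenseArray('test_tile_db', mode='r') as data:
    result = data[:]
# owndata=True means NumPy will free each buffer once the arrays are
# garbage-collected, which is what the fix restores.
print({name: arr.flags.owndata for name, arr in result.items()})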
ihnorton added several more commits that referenced this issue May 17, 2019
antalakas pushed commits that referenced this issue Jul 6, 2020

Hoeze commented Dec 8, 2020

Please re-open @ihnorton, I'm experiencing the same issue with current tiledb and sparse arrays in the range of many GBs.

> conda list | grep -E -i "tiledb|numpy"
numpy                     1.19.4           py37h7e9df27_1    conda-forge
tiledb                    2.1.3                h17508cd_0    conda-forge
tiledb-py                 0.7.3            py37h11a8686_0    conda-forge

The reproducer from @keenangraham leaks memory again, so it can be reused for testing.

ihnorton (Member) commented:

I have several basic tests for releasing array and context memory, and have done additional checking on sparse arrays specifically, but I could not reproduce a situation like this where the memory trivially leaks on every iteration. So I'm definitely not ruling it out, but I'm considering it in light of the discussion in #440 right now.

ihnorton commented Dec 10, 2020

Hoeze commented Dec 10, 2020

@ihnorton In your tests you always use the same context.
As pointed out in #440 (comment), I think the memory leak is caused by the ctx not freeing up memory when being garbage-collected.

ihnorton (Member) commented:

This line creates a new Ctx every iteration. However, the test is only checking that we keep the memory usage under 2x the initial usage (because RSS is not very reliable).
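
A minimal sketch of that kind of check, following the description above (fresh Ctx per iteration, loose 2x bound on RSS; the path is the reproducer's and the iteration count is arbitrary):

import psutil
import tiledb

path = 'test_tile_db'                      # array from the reproducer above
initial = psutil.Process().memory_info().rss

for _ in range(100):
    ctx = tiledb.Ctx()                     # fresh context every iteration
    with tiledb.DenseArray(path, mode='r', ctx=ctx) as data:
        data[:]
    # RSS is noisy, so only assert that usage stays under 2x the initial value.
    assert psutil.Process().memory_info().rss < 2 * initial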

Hoeze commented Dec 10, 2020

I think you're right and my test is somehow flawed.
When setting ctx=ctx in the for loop, the memory usage is slightly higher than when leaving it out.
Also, even after 3000 iterations, the memory does not increase by more than ~5-10% in total, no matter which configuration I use.
This is far below the 8 GB of memory usage I observe on my dask workers after reading from the array for the first time.

ihnorton (Member) commented:

Do you mind if we close this one and consolidate in #440? I will respond there.

ihnorton closed this as completed Apr 8, 2021