
Reading dense array doesn't free memory #150

Closed · keenangraham opened this issue May 8, 2019 · 8 comments · Fixed by #151
keenangraham commented May 8, 2019

Hi,

I'm wondering if this is expected behavior, or if you have any tips to fix it. This is on Ubuntu 16, Python 3.7, and tiledb 0.4.1.

Create toy array:

import numpy as np
import psutil  # used below to report memory usage
import tiledb

x = np.ones(10000000)
ctx = tiledb.Ctx()
path = 'test_tile_db'
d1 = tiledb.Dim(
    'test_domain', domain=(0, x.shape[0] - 1), tile=10000, dtype="uint32"
)
domain = tiledb.Domain(d1)
v = tiledb.Attr(
    'test_value',
    dtype="float32",
)
schema = tiledb.ArraySchema(
    domain=domain, attrs=(v,), cell_order="row-major", tile_order="row-major"
)
tiledb.DenseArray.create(path, schema)
values = x.astype(np.float32)
with tiledb.DenseArray(path, mode="w", ctx=ctx) as A:
    A[:] = {'test_value': values}

Read from array:

for i in range(10):
    with tiledb.DenseArray(path, mode='r') as data:
        data[:]
    print('Gigs:', round(psutil.virtual_memory().used / (10**9), 2))
Gigs: 0.84
Gigs: 0.89
Gigs: 0.93
Gigs: 0.97
Gigs: 1.01
Gigs: 1.05
Gigs: 1.1
Gigs: 1.14
Gigs: 1.18
Gigs: 1.22

Basically, the memory never seems to get released, even though I don't assign data[:] to any variable. I've tried forcing garbage collection (import gc; gc.collect()), but Python doesn't seem to be aware of the allocation. I've also tried explicitly closing the DenseArray. Eventually I have to restart the Jupyter notebook to free the memory.

In my real use case I iterate over several TileDB arrays, pull the full array data out of each, apply some transforms, and write new TileDB arrays with the transformed data. This works, except that every read call adds around 2 GB to used memory and never releases it, so the machine eventually runs out of memory. My current workaround is to spin up a new process for every iteration.
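
For reference, a rough sketch of that workaround (the transform and the paths here are placeholders, and it assumes the output arrays already exist with a matching schema); running each read/transform/write in a short-lived child process means any memory the read fails to release is still reclaimed when the process exits:

import multiprocessing as mp
import tiledb

def process_one(in_path, out_path):
    # All TileDB reads/writes happen inside the child process, so memory
    # that is never released gets reclaimed when the process exits.
    with tiledb.DenseArray(in_path, mode='r') as data:
        result = data[:]  # dict mapping attribute names to NumPy arrays
    transformed = {name: values * 2 for name, values in result.items()}  # placeholder transform
    with tiledb.DenseArray(out_path, mode='w') as out:
        out[:] = transformed

if __name__ == '__main__':
    pairs = [('test_tile_db', 'test_tile_db_transformed')]  # placeholder paths
    for in_path, out_path in pairs:
        p = mp.Process(target=process_one, args=(in_path, out_path))
        p.start()
        p.join()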

Thanks!

keenangraham changed the title from "Reading dense array doesn't free data" to "Reading dense array doesn't free memory" May 8, 2019
ihnorton added a commit that referenced this issue May 10, 2019
PyArray_NewFromDescr ignores the NPY_ARRAY_OWNDATA flag, so we
need to set it manually so that NumPy frees the memory.

Fixes #150
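
For context on the NPY_ARRAY_OWNDATA flag mentioned in these commit messages: NumPy only frees an array's data buffer on garbage collection when the array owns that buffer, which is visible from Python as flags.owndata. A minimal illustration:

import numpy as np

# An array that allocated its own buffer owns it; NumPy frees the buffer
# when the array is garbage-collected.
a = np.ones(10)
print(a.flags.owndata)   # True

# A view wraps someone else's buffer and does not own it, so freeing that
# buffer is the responsibility of the base object instead.
v = a[:5]
print(v.flags.owndata)   # False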
ihnorton (Member) commented:

Thanks for the report and the great repro, fix inbound (#151).

ihnorton added a commit that referenced this issue May 14, 2019
Fixes #150

PyArray_NewFromDescr ignores the NPY_ARRAY_OWNDATA flag, so we
need to set it manually so that NumPy frees the memory.

Also switch memory allocation to use PyDataMem_NEW/FREE. This
should generally be the same as PyMem_Malloc, but it could end up
different in a situation where the C ext is not linked against
the same C stdlib. In that case, NumPy would not call the correct
de-allocator when freeing the memory it gains ownership over.
ihnorton added a commit that referenced this issue May 17, 2019
Fixes #150

PyArray_NewFromDescr ignores the NPY_ARRAY_OWNDATA flag, so we
need to set it manually so that NumPy frees the memory.

Because we now don't (can't) set the flag in this call, simplify
construction by using PyArray_SimpleNewFromData.

Now that memory is being freed correctly, we must use the NumPy
allocator (PyDataMem_NEW/FREE) so that de-allocation is matched.
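
With the fix in place, the NumPy arrays handed back by a read should own their buffers, which can be checked directly against the reproducer's array (a rough check, assuming data[:] returns a dict of attribute arrays keyed by name, mirroring the write above):

import tiledb

with tiledb.DenseArray('test_tile_db', mode='r') as data:
    result = data[:]
# owndata=True means NumPy will free each buffer once the arrays are
# garbage-collected, which is what the fix restores.
print({name: arr.flags.owndata for name, arr in result.items()})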
ihnorton added several more commits that referenced this issue May 17, 2019
antalakas pushed commits that referenced this issue Jul 6, 2020

Hoeze commented Dec 8, 2020

Please re-open @ihnorton, I'm experiencing the same issue with current tiledb and sparse arrays in the range of many GBs.

> conda list | grep -E -i "tiledb|numpy"
numpy                     1.19.4           py37h7e9df27_1    conda-forge
tiledb                    2.1.3                h17508cd_0    conda-forge
tiledb-py                 0.7.3            py37h11a8686_0    conda-forge

The reproducer from @keenangraham leaks memory again, so it can be reused for testing.

ihnorton (Member) commented:

I have several basic tests for releasing array and context memory, and have done additional checking on sparse arrays specifically, but I could not reproduce a situation like this where the memory trivially leaks on every iteration. So I'm definitely not ruling it out, but I'm considering it in light of the discussion in #440 right now.

ihnorton commented Dec 10, 2020

Hoeze commented Dec 10, 2020

@ihnorton In your tests you always use the same context.
As pointed out in #440 (comment), I think the memory leak is caused by the ctx not freeing up memory when being garbage-collected.

ihnorton (Member) commented:

This line creates a new Ctx every iteration. However, the test is only checking that we keep the memory usage under 2x the initial usage (because RSS is not very reliable).
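
A minimal sketch of that kind of check, following the description above (fresh Ctx per iteration, loose 2x bound on RSS; the path is the reproducer's and the iteration count is arbitrary):

import psutil
import tiledb

path = 'test_tile_db'                      # array from the reproducer above
initial = psutil.Process().memory_info().rss

for _ in range(100):
    ctx = tiledb.Ctx()                     # fresh context every iteration
    with tiledb.DenseArray(path, mode='r', ctx=ctx) as data:
        data[:]
    # RSS is noisy, so only assert that usage stays under 2x the initial value.
    assert psutil.Process().memory_info().rss < 2 * initial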

Hoeze commented Dec 10, 2020

I think you're right and my test is somehow flawed.
When setting ctx=ctx in the for loop, the memory usage is slightly higher than when leaving it out.
Also, even after 3000 iterations, the memory does not increase by more than ~5-10% in total, no matter which configuration I use.
This is far below the 8 GB of memory usage I observe on my dask workers after reading from the array for the first time.

ihnorton (Member) commented:

Do you mind if we close this one and consolidate in #440? I will respond there.

ihnorton closed this as completed Apr 8, 2021