I experienced the same issue with a MySQL database. Are you also using MySQL as the database for Django?
I have done some testing and attempted to replicate this issue on both MySQL and Postgres databases, and found that Postgres does not run into Out Of Memory issues when dealing with large amounts of data. I have not tested other databases.
I did some investigation and found that the issue stems from Django's iterator on this line:
To get around this issue, I wrote a custom iterator that does the chunking at the application level, which should behave correctly regardless of which database you use: lowdeyou@b43e6d2
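For anyone who wants the general idea without reading the commit, here is a minimal sketch of app-level chunking (the `iterate_in_chunks` name and the `chunk_size` default are illustrative, not taken from the linked commit): it pages through the queryset by primary key, so only one chunk of rows is held in memory at a time, regardless of how the database driver buffers results.

```python
def iterate_in_chunks(queryset, chunk_size=2000):
    """Yield rows from a Django queryset in pk-ordered chunks (sketch only)."""
    queryset = queryset.order_by("pk")
    last_pk = None
    while True:
        # Fetch the next page of rows strictly after the last seen pk.
        page = queryset if last_pk is None else queryset.filter(pk__gt=last_pk)
        rows = list(page[:chunk_size])
        if not rows:
            break
        yield from rows
        last_pk = rows[-1].pk
```

Because each iteration issues a fresh `LIMIT` query, the MySQL client never has to materialize the full result set the way it does with a single large cursor.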
I am trying to re-index more than 100 million documents, which doesn't work due to lack of RAM.
Is it possible that the problem is in the Elasticsearch implementation when executing parallel indexing?
Here is an issue where they talk about the memory leak:
elastic/elasticsearch-py#1101 (comment)
Looks like my memory fills up after this line when using streaming_bulk:
https://github.com/elastic/elasticsearch-py/blob/8d10e1545e2572d3ab1e92cfaf0968085145eb4d/elasticsearch/helpers/actions.py#L232
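For context, this is roughly how I am driving `streaming_bulk` (the client construction, index name, and `generate_actions` mapping below are assumptions for illustration, not code from this project): the `actions` argument is a generator and results are consumed one at a time, so in principle the client should not need to buffer the whole batch.

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch()

def generate_actions(queryset, index_name="my-index"):
    # Hypothetical mapping from model instances to bulk actions.
    for obj in queryset.iterator():
        yield {"_index": index_name, "_id": obj.pk, "_source": {"name": obj.name}}

def reindex(queryset):
    # streaming_bulk yields one (ok, result) tuple per document.
    for ok, result in streaming_bulk(es, generate_actions(queryset), chunk_size=500):
        if not ok:
            # Log failures instead of letting them accumulate in memory.
            print(result)
```

Even with this setup, memory usage keeps growing past the linked line, which is why I suspect something inside the helper rather than the action generator.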