Disk usage increase after moving data #67
Just after posting the above, I found https://hexdocs.pm/cubdb/CubDB.html#current_db_file/1 in the docs. Running that gives me "1A.cub" as the current db file (the last file in the list above). After moving all the other files away, I still have a working CubDB instance with the expected number of records, which seems to confirm that disk usage is OK now (2.4GB after is similar to the 2.5GB before, as it should be).

This does leave one question open: is it a bug in CubDB that unused previous database files aren't removed from the file system? Or is this expected behaviour, and is it up to me as a user to clean up the data directory once in a while?
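As an editorial aside, the manual clean-up described above could be sketched as follows. This is an illustrative sketch, not part of the thread: it assumes a `data_dir` variable, and uses `CubDB.current_db_file/1` (linked above) to avoid removing the file CubDB is actively using.

```elixir
# Sketch: remove stale .cub files, keeping only the one CubDB reports as current.
# Assumes `data_dir` points at the CubDB data directory.
{:ok, db} = CubDB.start_link(data_dir)
current = CubDB.current_db_file(db)

data_dir
|> Path.join("*.cub")
|> Path.wildcard()
|> Enum.reject(&(Path.basename(&1) == Path.basename(current)))
|> Enum.each(&File.rm!/1)
```

Comparing basenames rather than full paths sidesteps any mismatch between relative and absolute paths returned by the two calls.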
Hi @florish, thanks for reporting this. Yes, it looks like a bug: the old database files should have been cleaned up automatically. Meanwhile, if you do notice something suspicious in your logs, please let me know.
I tried to reproduce it manually and with some automated tests, but the old files are always cleaned up upon compaction.

A clean-up of old files can be blocked if some read operation started before the compaction (so it references an old file) and did not yet complete. When the read operation completes, the clean-up should be performed as soon as the first write operation occurs; but if the application is restarted before the clean-up, the next clean-up is triggered upon the next compaction. In your case, you performed a manual compaction, which seems to have completed, yet the old files were not removed.

One possibility, although unlikely, is that a snapshot or read operation was still referencing those old files, preventing clean-up. For example, this would block clean-up of old files indefinitely:

```elixir
# Spawn a process that creates a snapshot of the database, and waits indefinitely:
spawn(fn ->
  CubDB.with_snapshot(db, fn snap ->
    # Wait indefinitely for a message
    receive do
      x -> x
    end
  end)
end)

# This compaction will succeed, but will not clean up the old file,
# because a snapshot still references it
CubDB.compact(db)
```

Do you think it's possible that something like that occurred? As a side note, if you still have this information, it would be interesting to know the creation times of those CubDB files.
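For contrast, here is an editorial sketch (not from the thread) of how the blocked clean-up would resolve, based on the behaviour described above: once the snapshot-holding process finishes and a write occurs, CubDB should be free to remove the old file. The `pid` binding and the `:done` message are illustrative assumptions.

```elixir
# `pid` is the process holding the snapshot open.
pid = spawn(fn ->
  CubDB.with_snapshot(db, fn _snap ->
    receive do
      :done -> :ok
    end
  end)
end)

CubDB.compact(db)           # completes, but the old file stays: the snapshot references it

send(pid, :done)            # release the snapshot
CubDB.put(db, :touch, true) # per the comment above, the first write after release
                            # should trigger the clean-up of old files
```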
Hi @lucaong, thanks for the detailed response!
Here are the creation times:
As you can see, they were all created within a time frame of 12 minutes. As I was actively working on data changes on my local machine, it could very well be that I stopped and started the application multiple times, which could have triggered the situation you describe above. I'm not able to reproduce the exact order of migration and (manual) compaction steps, unfortunately.
I haven't used snapshot functions much, and certainly not with non-standard timeouts, so I'm pretty sure this wasn't the case in my situation.
After migrating the data, I did not perform any writes before posting the issue (I am 100% sure, as this is on my local machine, so no other users could have been writing to CubDB without me knowing about it). If I understand correctly, this could be the reason the old files were still around. I'm not sure how to reproduce this, however.

Hope this information helps. Let me know if you need anything else!
Thank you @florish. Anyway, your solution of manually removing old files is perfectly valid. You might want to make sure that the file reported by `CubDB.current_db_file/1` is not among the ones you remove.
One idea of how it could happen. Suppose you copy over the keys like in this script:

```elixir
{:ok, db} = CubDB.start_link(data_dir)

# Original keys:
CubDB.put_multi(db, (1..1000) |> Enum.map(fn x -> {x, x} end))

# Move each entry to a new key
CubDB.select(db)
|> Stream.each(fn {k, v} ->
  CubDB.put(db, "new-#{k}", v)
  CubDB.delete(db, k)
end)
|> Stream.run()
```

In this case, since the copying is executed in a lazy stream, the select operation is still open while the writes are performed, which can block the clean-up of old files.

What does not convince me is that manually running `CubDB.compact/1` afterwards did not reclaim the disk space.

I would suggest to try calling `CubDB.compact/1` again. In case you observe something out of the ordinary, you can also try to subscribe to debugging events about compaction in your console and see what you get. Normally, this is what you should observe (notice the `:clean_up_started` event):

```elixir
{:ok, db} = CubDB.start_link(data_dir)
CubDB.subscribe(db)
CubDB.compact(db)
# Wait for compaction to end, or call flush repeatedly
flush()
# =>
# :compaction_started
# :compaction_completed
# :clean_up_started
# :catch_up_completed
# :ok
```
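As an editorial aside, an eager variant of the copy script above would avoid the lazy-stream issue described in that comment: materializing the entries first means the read completes before any write begins, so no open select can block clean-up. A sketch under that assumption:

```elixir
# Sketch: materialize all entries first, so the read operation
# completes before any write begins.
entries = CubDB.select(db) |> Enum.to_list()

Enum.each(entries, fn {k, v} ->
  CubDB.put(db, "new-#{k}", v)
  CubDB.delete(db, k)
end)
```

The trade-off is that all entries are held in memory at once, which is fine for ±10K records but may not scale to much larger databases.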
@lucaong I've followed the steps (start_link, subscribe, compact) with my local 2.5GB database, and the debugging events appeared, including `:clean_up_started`. So the clean-up on compaction seems to work as expected. As in my case multiple old files were nonetheless left behind, something else must have prevented the clean-up earlier.
Thanks a lot @florish, this is very useful. There is definitely something odd going on, and I'd love to find it and fix it. I'll investigate more. What version of CubDB, Elixir, and OS are you using?
Great!
CubDB 2.0.2 on macOS 11.7, with Elixir 1.14.2.
Yes!
Hi, not entirely sure if this is a bug, but it felt weird, so here's a report. Feel free to dismiss!

I've been using `cubdb` as a simple HTTP response cache for a while. In my development database, I had ± 10K records with 2.5GB disk space in use (measured using `du -sh` on the CubDB data directory).

Due to a data structure change, I've been copying all data to be stored under a different key within the same CubDB database. Afterwards, the old data has been removed, which means that the net amount of data should not change significantly.

I noticed, however, that the database is now using 22GB (!) of disk space, while the total number of records (measured using `CubDB.size/1`) is as expected, which means the expected data deletion did take place. Manually running `CubDB.compact/1` does not change the disk space used.

As I did not expect this disk usage increase, I'm suspecting some kind of bug, but I'm not sure.

One thing that could be helpful: I did not measure the number of `.cub` files in use before the migration, but afterwards these are the current files with their respective file size:

Please let me know if any other information is needed, I'm happy to help!
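As an editorial aside, a file listing like the one mentioned above could be produced with `du -sh` as described, or from Elixir itself; a small sketch (the `"data"` directory name is an assumption, not from the thread):

```elixir
# Sketch: print each .cub file in the data directory with its size in bytes.
data_dir = "data"

for file <- Path.wildcard(Path.join(data_dir, "*.cub")) do
  size = File.stat!(file).size
  IO.puts("#{Path.basename(file)}: #{size} bytes")
end
```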