Ask about hash expansion #97

KeeProMise · 2022-06-26T08:17:25Z

During the hash expansion, the data in the hash bucket on the disk will be moved to the new bucket. Is the performance affected by the disk IO?

vinniefalco · 2022-06-26T14:44:48Z

I'm not sure I understand the question. NuDB only works with SSD drives, and the performance is near the theroetical maximum for what is possible with a single write queue. Are you asking me if the overall performance of the library is affected by the performance of the SSD drive? Obviously yes, so I am maybe not understanding the question fully.

KeeProMise · 2022-06-26T14:51:45Z

I'm not sure I understand the question. NuDB only works with SSD drives, and the performance is near the theroetical maximum for what is possible with a single write queue. Are you asking me if the overall performance of the library is affected by the performance of the SSD drive? Obviously yes, so I am maybe not understanding the question fully.

I know that you use linear hash to solve hash table expansion. Do you directly use the native linear hash expansion algorithm? In my understanding, the expansion of the native linear hash table will cause the data in some slots to be moved to the new slot. I think this process will involve random io. Have you optimized this process?

vinniefalco · 2022-06-26T14:56:56Z

Do you directly use the native linear hash expansion algorithm?

No. The principle is the same, but since NuDB is block/bucket based, every expansion of the hash table adds an entire block of slots and not just one. NuDB is very fast. You should run your own benchmarks to confirm that the performance meets your requirements.

There's a benchmark program that compares NuDB to other key/value stores:
https://github.com/CPPAlliance/NuDB/tree/master/bench

KeeProMise · 2022-06-26T15:07:39Z

Do you directly use the native linear hash expansion algorithm?

No. The principle is the same, but since NuDB is block/bucket based, every expansion of the hash table adds an entire block of slots and not just one. NuDB is very fast. You should run your own benchmarks to confirm that the performance meets your requirements.

There's a benchmark program that compares NuDB to other key/value stores: https://github.com/CPPAlliance/NuDB/tree/master/bench

Thank you for your reply, I would also like to ask, is your key a fixed length? How many bits do you map the key to, and what is the hash algorithm for the key? What is the approximate number of keys you support? What is the approximate conflict rate?

vinniefalco · 2022-06-26T15:36:20Z

is your key a fixed length?

Yes. You set the key size when the database is created.

How many bits do you map the key to

64 bits

what is the hash algorithm for the key

The default algorithm uses xxhasher which is fine for general purpose use. But this can be changed, if you can take advantage of how your keys are distributed.

What is the approximate number of keys you support

Because of the birthday problem, you can insert 2^32 (over 4 billion) keys before performance will begin to degrade. But since a database can be trivially split up (by dividing the key space by a power of two) the number of keys is essentially unlimited. NuDB is in production use with databases that are over 13 terabytes in size.

What is the approximate conflict rate?

The default settings aim for 50% occupancy in buckets. Less than 1% of buckets require two blocks on disk ("spills").

Remember that each bucket is designed to hold many keys not just one. And the cost for I/O is much higher than the cost for in-memory lookup.

https://en.wikipedia.org/wiki/Birthday_problem

KeeProMise · 2022-06-26T16:00:19Z

Thank you very much for your patient answer. At present, I want to design a kv database, the key is a string, and the value is a list, which is used to store some integer values. Currently, I need to support adding kv operations and adding numbers to a list of a key. And want to get the list of a key as quickly as possible, do you mind? At present, my idea is to use linear hash to solve the expansion of the hash table when adding keys, but adding an integer value to a key to the list may cause the value of a key to be distributed in multiple disk blocks, which requires multiple disk io to obtain the complete value, so the read performance is poor.

vinniefalco · 2022-06-26T16:34:48Z

NuDB is not going to work well for you, because you intend to change the value of already-inserted entries. Modification and deletion of existing values is not possible in NuDB.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ask about hash expansion #97

Ask about hash expansion #97

KeeProMise commented Jun 26, 2022

vinniefalco commented Jun 26, 2022

KeeProMise commented Jun 26, 2022

vinniefalco commented Jun 26, 2022 •

edited

Loading

KeeProMise commented Jun 26, 2022

vinniefalco commented Jun 26, 2022 •

edited

Loading

KeeProMise commented Jun 26, 2022

vinniefalco commented Jun 26, 2022

Ask about hash expansion #97

Ask about hash expansion #97

Comments

KeeProMise commented Jun 26, 2022

vinniefalco commented Jun 26, 2022

KeeProMise commented Jun 26, 2022

vinniefalco commented Jun 26, 2022 • edited Loading

KeeProMise commented Jun 26, 2022

vinniefalco commented Jun 26, 2022 • edited Loading

KeeProMise commented Jun 26, 2022

vinniefalco commented Jun 26, 2022

vinniefalco commented Jun 26, 2022 •

edited

Loading

vinniefalco commented Jun 26, 2022 •

edited

Loading