Open-CAS vs dm-cache, dm-writecache, bcache #1221

Open
mikabytes opened this issue Jun 2, 2022 · 8 comments
@mikabytes

mikabytes commented Jun 2, 2022

Question

Is this project aimed at solving the same problem as caching solutions such as dm-cache, dm-writecache (as used by LVM cache), or bcache? If it is, what need does Open-CAS serve that isn't already met? If not, please help me understand in which situations Open-CAS is preferable.

Motivation

I am currently investigating how to improve my homelab datacenter. So far I've gone through Ceph, DRBD, and GlusterFS, backed by RAIDed drives, plain drives, SSDs, HDDs, and NVMes. It is clear that limited HDD IOPS is a common problem no matter which software strategy I use. I could go for a pure SSD/NVMe solution, but then I'd have all these rotational drives just lying around... It would be good to keep using them at least until they break on their own.

I have yet to fully test any of these mentioned caching solutions.

Thank you.

@mikabytes mikabytes added the question Further information is requested label Jun 2, 2022
@rafalste rafalste self-assigned this Jun 2, 2022
@rafalste
Contributor

rafalste commented Jun 2, 2022

Hi @mikabytes,
we did a comparison of those caching solutions about three years ago. It may be a bit outdated by now, but in general Open CAS is more powerful, configurable, efficient and - paradoxically - easier to use than the other solutions.

To be more precise, here are some details about Open CAS that you may find useful:

  • support for multiple caching modes, each handling data in a particular way (Write-Through, Write-Back, Write-Around, Write-Invalidate, Write-Only, Pass-Through)
  • selectable cache line sizes (4, 8, 16, 32, 64 KiB)
  • separate policies for cache data replacement (promotion: always, nhit; cleaning: ALRU, ACP, NOP; eviction: LRU)
  • sequential cut-off policies (full, always, never, plus a threshold setting)
  • IO classification (filter and redirect IO based on user-defined rules)
  • multi-core support (multiple backend devices, e.g. HDDs, behind one cache)
  • core (backend device) transparency
  • detailed IO statistics
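
For illustration, a minimal sketch of what a write-back setup with a 16 KiB cache line could look like (the device paths /dev/nvme0n1p1 and /dev/sdb are just placeholders; the casadm options mirror the ones used later in this thread):

$ casadm --start-cache --cache-device /dev/nvme0n1p1 --cache-id 1 --cache-mode wb --cache-line-size 16
$ casadm --add-core --cache-id 1 --core-id 1 --core-device /dev/sdb
$ casadm --stats --cache-id 1

After --add-core the HDD is exposed as an exported device (e.g. /dev/cas1-1), which you use in place of the raw backend device, and --stats prints the per-cache IO statistics mentioned above.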

You can find more info in the Open CAS documentation (something the other solutions sometimes lack).

Moreover, Open CAS is actively developed, maintained and tested, and it is used in many commercial production environments, which in general makes it more reliable than the other solutions.

The main disadvantage compared to the alternatives is that Open CAS is not built into the kernel. But the installation should take no more effort than a simple ./configure && make && make install (or you can even generate an RPM/DEB package with our packaging tool by running make rpm or make deb). :)
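
Roughly, assuming a build from the GitHub sources (the submodule step is taken from the repository README):

$ git clone https://github.com/Open-CAS/open-cas-linux
$ cd open-cas-linux
$ git submodule update --init
$ ./configure && make && sudo make install
# or, instead of installing directly, build a package:
$ make rpm    # or: make deb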

Hope this answers your question, but feel free to ask any follow-up if needed. :)

@rafalste
Contributor

rafalste commented Jun 2, 2022

One more thing worth mentioning: the Open CAS "engine", called OCF (Open CAS Framework), is also part of SPDK, which aims to increase storage performance even further by bypassing the kernel and moving all storage operations to userspace.

@mikabytes
Author

Thank you for the detailed answer. That's excellent. Looking forward to giving it a good go once kernel 5.13+ support lands.

@mmichal10
Contributor

Hi @mikabytes,

Open CAS v22.6 was released a few days ago. To see the recent changes, please take a look at the release notes.

@mrpops2ko

Hi, I saw it mentioned in #1414 and #1433 that preempt mode is required in order to use Open CAS. Is that still the case, and are there any plans to change that in the future?

I ask because quite a few Linux kernels now ship with preempt mode enabled by default.

root@choedan-kal:/etc/opencas# grep PREEMPT /boot/config-$(uname -r)
CONFIG_PREEMPT_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
root@choedan-kal:/etc/opencas# casadm -V
╔═════════════════════════╤═════════════════════╗
║ Name                    │       Version       ║
╠═════════════════════════╪═════════════════════╣
║ CAS Cache Kernel Module │ 22.12.0.0843.master ║
║ CAS CLI Utility         │ 22.12.0.0843.master ║
╚═════════════════════════╧═════════════════════╝
root@choedan-kal:/etc/opencas# uname -a
Linux choedan-kal 6.1.34-060134-generic #202306141038 SMP PREEMPT_DYNAMIC Wed Jun 14 10:45:57 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
root@choedan-kal:/etc/opencas# uname -r
6.1.34-060134-generic

This was on an Ubuntu 22.04 server installation.

@TheLinuxGuy

@mikabytes did you ever set this up and what were your findings?

I am also looking to experiment with Open-CAS on my Proxmox node to compare it to bcache performance. I haven't found much information online about anyone setting this up on Proxmox.

@mikabytes
Author

mikabytes commented Jul 31, 2024

Hi @TheLinuxGuy

While I did evaluate the other options, I never got around to trying Open-CAS once the Linux kernel support landed.

I concluded that this kind of caching was an ill fit for my use case. Most of my big data is rarely accessed, and the data that is frequently accessed follows a known pattern. So the ideal solution ended up being a script that retires data to the rotational drives every night, roughly like the sketch below. I'm overlaying the devices with MergerFS, so it's all transparent from the application layer.
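
A minimal sketch of such a nightly job, assuming a MergerFS pool built from an SSD branch and an HDD branch (the paths, the 30-day threshold and the mtime criterion are all made up for illustration):

#!/bin/bash
# Hypothetical nightly "retire" job: move files untouched for 30+ days
# from the SSD branch to the HDD branch of a mergerfs pool.
# Example layout: /mnt/pool is a mergerfs mount of /mnt/ssd and /mnt/hdd.
SSD=/mnt/ssd
HDD=/mnt/hdd

find "$SSD" -type f -mtime +30 -print0 |
  while IFS= read -r -d '' f; do
    rel="${f#$SSD/}"
    mkdir -p "$HDD/$(dirname "$rel")"
    mv "$f" "$HDD/$rel"   # the path seen under /mnt/pool stays the same
  done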

Since my initial post, a few years have passed and the price of SSDs kept dropping. Now, half my storage is already SSD, further decreasing my need for adaptive caching strategies.

Sometimes the simplest solution is the best. I'd look into something like Open-CAS again if I had a large, less predictable dataset, though.

@aedgie

aedgie commented Aug 1, 2024

Hi all,

I just want to share the results of comparing bcache vs. OpenCAS with respect to flush optimization, specifically merging neighbouring dirty sectors and writing them out in one go during flush.

TLDR: OpenCAS is better.

We all know that most of an HDD's latency is spent waiting until the required part of the disk surface moves under the heads. That's why the random IOPS of an ordinary HDD is about 100-200 (even a disk with ideal heads has to wait, on average, half a revolution, which gives 7200 rpm / 60 s × 2 = 240 reads/writes per second). Internal caching and modern firmware can increase this number, but the order of magnitude remains.
So I became curious: can SSD caching convert a set of freshly written random sectors into something close to a sequential flow when dumping them to the HDD? And if so, how close to a sequential write would it be? Better merging decreases HDD load, leaves more headroom for read operations, and therefore improves overall performance. Even if my new system survives a high IO load spike, I want it to stay alive afterwards, when the cache flushes dirty blocks to free space for new data.

My idea was to create a very slow block device using the "delay" target of device-mapper (dm-delay), then use it as the backing device (standing in for an HDD), with a RAM disk as the SSD cache.
dm-delay inserts a delay for every operation, and the delays can be set separately for reads and writes.
So I made a delayed disk with a write delay of 1 second (1000 ms) and no read delay:

$ echo "0 $(blockdev --getsz /dev/loop1) delay /dev/loop1 0 0 /dev/loop1 0 1000" | dmsetup create delayed_disk

Then I confirmed that writing either a 4K or a 128K block to the delayed disk took roughly the same time: 1 second, i.e. the delay does not depend on request size.
Then I built a RAM disk larger than delayed_disk (a sketch of both steps is shown below).
Then I built bcache, and after the test finished I destroyed it and built OpenCAS, both in writeback mode.
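
For reference, the delay check and the RAM disk could look roughly like this (I'm assuming a loop device over a file on tmpfs for the RAM disk; the paths are placeholders and the actual scripts in the attachment may differ):

$ time dd if=/dev/zero of=/dev/mapper/delayed_disk bs=4k count=1 oflag=direct    # ~1 s
$ time dd if=/dev/zero of=/dev/mapper/delayed_disk bs=128k count=1 oflag=direct  # ~1 s
$ mount -t tmpfs -o size=600M tmpfs /mnt/ram
$ truncate -s 512M /mnt/ram/ramdisk.img
$ losetup /dev/loop2 /mnt/ram/ramdisk.img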

bcache:

$ make-bcache -C /dev/loop2 -B /dev/mapper/delayed_disk
$ echo writeback > /sys/block/bcache0/bcache/cache_mode
$ echo 100 > /sys/block/bcache0/bcache/writeback_percent

OpenCAS:

$ casadm --start-cache --cache-device /dev/loop2 --cache-mode wb --cache-line-size 4 --cache-id 999
$ casadm --add-core --core-device /dev/mapper/delayed_disk --cache-id 999 --core-id 888

The test was simple: fio, random writes only (readwrite=randwrite), with a data size relative to the size of delayed_disk:

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=fiotest --bs=4k --iodepth=8 --size=${FIOSIZE} --readwrite=randwrite --runtime=15 --filename=[/dev/bcache0 | /dev/cas999-888]

After the test I started the flush.
For bcache:

$ echo $(( 1024 * 1024 * 1024 / 512 )) > /sys/block/bcache0/bcache/writeback_rate
$ echo 0 > /sys/block/bcache0/bcache/writeback_delay
$ echo writethrough > /sys/block/bcache0/bcache/cache_mode
$ time while [ "x$(cat /sys/block/bcache0/bcache/state)" == "xdirty" ] ; do sleep 0.1; done

For OpenCAS:

time casadm --flush-cache --cache-id 999 --core-id 888

And finally:

Results

For delayed_disk size = 32 MB, write data size = 32 MB (full disk), RAM disk size = 512 MB and write delay = 1000 ms:

Flush time (32MB, 100% fill):
  bcache  ~ 170 seconds
  OpenCAS ~ 15 seconds

For delayed_disk size = 256 MB, write data size = 64 MB (25% of the disk), RAM disk size = 512 MB and write delay = 1000 ms:

Flush time (256MB, 25% fill):
  bcache  ~ 300 seconds
  OpenCAS ~ 35 seconds

OpenCAS is clearly the winner: it merges and flushes neighbouring dirty sectors about 10 times faster than bcache. In other words, OpenCAS issues up to 10 times fewer IOPS to the HDD than bcache in this artificial but illustrative experiment.

Hope these results help someone choose.

PS

Btw, the IOPS reported by fio before the flush also differ a lot:

IOPS (32MB, 100% fill):
  bcache  ~ 10k
  OpenCAS ~ 100k
IOPS (256MB, 25% fill):
  bcache  ~ 18k
  OpenCAS ~ 80k

PPS

If someone wants to repeat the tests, the attached archive contains the scripts I used.
There are three main parameters, all in bytes:

FSIZE - size of delayed disk used as backing device
RSIZE - size of RAM disk used as caching device
FIOSIZE - volume of data written to bcache/cas device by fio test

and one parameter for the write delay of delayed_disk (in milliseconds):

DELAYMS
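
For example, the first configuration from the Results section would correspond to something like (hypothetical assignments matching the sizes listed above):

FSIZE=$((32 * 1024 * 1024))     # 32 MB delayed backing device
RSIZE=$((512 * 1024 * 1024))    # 512 MB RAM disk cache
FIOSIZE=$((32 * 1024 * 1024))   # write the whole backing device
DELAYMS=1000                    # 1000 ms write delay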

linearization_tests.tar.gz
