
drop db_connection #273

Closed · wants to merge 5 commits

Conversation

@ruslandoga (Contributor) commented Dec 22, 2023

Since there is interest in moving db_connection to ecto_sqlite3 I thought I'd PR this change :)

TODOs:

Resolved review threads on README.md (×4), lib/exqlite/nif.ex, and CHANGELOG.md.
@warmwaffles (Member)

The functionality from #271 is "lost"; what should be done about it?

Would it be useful to perhaps lift the callback definition to the same place where after_connect is specified? https://github.com/elixir-ecto/db_connection/blob/fa5f705fa5d272ed28b64ee0954e4275c0260d36/lib/db_connection.ex#L118-L135 SQLite users can't be the only ones who would find a callback that fires before disconnect useful.

@josevalim

> SQLite users can't be the only ones who would find a callback that fires before disconnect useful.

My concern about such callbacks is that we can't ever rely on them running (what if the whole node terminates?), so I worry people may rely on them always executing. It probably makes more sense to set up a monitor process or similar instead.
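A minimal sketch of that monitor idea (the module name and the shape of the cleanup function are hypothetical, not part of db_connection): a small process watches the connection owner and runs cleanup when it goes down, without needing a before-disconnect hook.

```elixir
defmodule CleanupMonitor do
  use GenServer

  # Watch a connection-owning pid and run a cleanup function when it goes
  # down. The cleanup still isn't guaranteed (the whole node can die), but
  # it doesn't require db_connection to expose a before-disconnect callback.
  def start_link({owner_pid, cleanup_fun}) do
    GenServer.start_link(__MODULE__, {owner_pid, cleanup_fun})
  end

  @impl true
  def init({owner_pid, cleanup_fun}) do
    ref = Process.monitor(owner_pid)
    {:ok, %{ref: ref, cleanup: cleanup_fun}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref} = state) do
    # Best-effort cleanup; never reached if the whole node terminates.
    state.cleanup.()
    {:stop, :normal, state}
  end
end
```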

@ruslandoga (Contributor, Author)

Sorry for disappearing without finishing the PR. I'm going to continue working on it later this week.

@warmwaffles (Member)

@ruslandoga It's not a problem. I, too, have had other priorities take over and haven't found the time to carve out to think hard about this.

Comment on lines +1239 to +1247

```c
{"execute", 2, exqlite_execute},
{"changes", 1, exqlite_changes},
{"prepare", 2, exqlite_prepare},
{"columns", 2, exqlite_columns},
{"step", 2, exqlite_step},
{"interrupt", 1, exqlite_interrupt},
{"finalize", 1, exqlite_finalize},
{"last_insert_rowid", 1, exqlite_last_insert_rowid},
{"transaction_status", 1, exqlite_transaction_status},
```
@warmwaffles (Member)

Why are these not going through a dirty NIF? These are IO bound; exqlite_step in particular is definitely IO bound.

@ruslandoga (Contributor, Author) commented Sep 17, 2024

👋 @warmwaffles

Not necessarily. Sometimes exqlite_step can be safely scheduled on a non-dirty thread. This change introduces multiple versions of each NIF, making it possible for the caller to decide which one to use. By default the library would keep using dirty schedulers, but it would also introduce a family of "unsafe" implementations :) E.g. Exqlite.step would use Exqlite.Nif.dirty_io_step and Exqlite.unsafe_step would use Exqlite.Nif.step.
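A sketch of how that split might look on the Elixir side. The `dirty_io_step`/`step` names come from the comment above; the NIF bodies are stubbed here so the sketch compiles standalone (in the real library they would be replaced at NIF load time):

```elixir
defmodule Exqlite.Nif.Sketch do
  # Stubs standing in for the NIF-loaded functions; until the NIF is
  # loaded, both raise :not_loaded, which is the usual Elixir convention.
  def dirty_io_step(_stmt), do: :erlang.nif_error(:not_loaded)
  def step(_stmt), do: :erlang.nif_error(:not_loaded)
end

defmodule Exqlite.Sketch do
  alias Exqlite.Nif.Sketch, as: Nif

  # Default path: scheduled on a dirty IO scheduler, safe even when the
  # step has to touch the disk.
  def step(stmt), do: Nif.dirty_io_step(stmt)

  # Opt-in path: runs on a normal scheduler; only appropriate when the
  # caller knows the step returns well under the ~1ms NIF guideline.
  def unsafe_step(stmt), do: Nif.step(stmt)
end
```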

@ruslandoga (Contributor, Author) commented Sep 17, 2024

I decided to do this split when I noticed that non-dirty exqlite_step is almost twice as performant as a dirty exqlite_multi_step on all data sets I tried them on.

Similarly, I noticed that calling non-dirty bind_* on each parameter from Elixir is faster than the current dirty "bind all" implementation. I think this makes for a cleaner API: it's more intuitive for people familiar with the SQLite C API and doesn't require {:blob, binary} workarounds. The bind_* functions also seem safe to schedule on non-dirty threads. I will name them "unsafe" as well, and keep the current multi-param bind implementation with two changes: renaming it to bind_all and moving it to the dirty CPU scheduler, since it doesn't really touch IO.
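For illustration, the per-parameter style could look like the sketch below. The bind_* names are hypothetical stand-ins mirroring sqlite3_bind_text/_blob/_int64/_null, stubbed so the sketch compiles standalone; in the design described above each would wrap a small non-dirty NIF:

```elixir
defmodule Exqlite.Bind.Sketch do
  # Stubs for the hypothetical type-specific bind NIFs.
  def bind_text(_stmt, _idx, text) when is_binary(text), do: :ok
  def bind_blob(_stmt, _idx, blob) when is_binary(blob), do: :ok
  def bind_integer(_stmt, _idx, i) when is_integer(i), do: :ok
  def bind_null(_stmt, _idx), do: :ok

  # The caller names each parameter's type explicitly, as in the C API,
  # so blobs no longer need a {:blob, binary} tagged tuple.
  def bind_example(stmt) do
    :ok = bind_text(stmt, 1, "alice")
    :ok = bind_blob(stmt, 2, <<1, 2, 3>>)
    :ok = bind_integer(stmt, 3, 42)
    :ok = bind_null(stmt, 4)
  end
end
```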

@ruslandoga (Contributor, Author) commented Sep 17, 2024

As for the other functions, like exqlite_last_insert_rowid, exqlite_transaction_status, and some others, I think they are neither IO nor CPU bound and pretty much always return immediately, well below the 1ms NIF guideline. This will require further testing though :)

@warmwaffles (Member) commented Sep 17, 2024

My understanding of sqlite is that exqlite_step may invoke a read from disk if you reach the end of the results that are buffered, this could take longer than 1ms when the database is large.

> I think they are neither IO nor CPU bound and pretty much always return immediately, well below the 1ms NIF guideline. This will require further testing though :)

I'll look into simulating slow disk access (system under heavy write mode)

EDIT: Leaving this here for later https://serverfault.com/questions/523509/linux-how-to-simulate-hard-disk-latency-i-want-to-increase-iowait-value-withou

@ruslandoga (Contributor, Author) commented Sep 17, 2024

> I'll look into simulating slow disk access (system under heavy write mode)

And I'll try to find their implementation. Right now I think last_insert_rowid, get_autocommit, etc. just read from internal connection state.


UPDATE:

@warmwaffles (Member)

> I decided to do this split when I noticed that non-dirty exqlite_step is almost twice as performant as a dirty exqlite_multi_step on all data sets I tried them on.

I originally did the multi-step implementation because I had a very large sqlite database of timeseries data and it did not have a lot of throughput. But of course that was with the implementation I wrote, and what you have here may actually be better.

@warmwaffles (Member)

@ruslandoga do you have a branch on ecto_sqlite3 for changes made here? I'd like to give it a whirl if you do.

@ruslandoga (Contributor, Author) commented Sep 17, 2024

No, sorry. I haven't touched ecto_sqlite3 yet. I'll try to come up with something tomorrow :)


UPDATE: sorry, I got caught up in the NIF again today and didn't do anything about the Ecto adapter :) Also, I found https://github.com/max-au/sqlite and I think there are some interesting bits there:

  • it mentions that destructors run on the main schedulers and that closing the DB in the resource destructor might block the main schedulers and has a workaround with a monitoring process for it
  • it pre-defines atoms
  • it mentions that in-memory dbs are faster with SQLite's native memory allocators rather than the ones from Erlang

UPDATE: to make it easier to use both the current versions of Exqlite and Ecto.Adapters.SQLite3 and the ones from this PR, I started using a separate namespace for the new experimental approaches:

I'm working on xqlite_readers right now :)

@warmwaffles (Member)

> I'll try to come up with something tomorrow

No rush, I can work with the raw NIF until then.

@warmwaffles (Member) commented Sep 17, 2024

One interesting thing to consider about moving the DB Connection to ecto_sqlite3: I would love to be able to run a repo with :memory: and have it stay alive for as long as the application is up. My use case here is a queryable cache. With the way pooling is done now, I cannot do that easily and have to resort to running a standalone GenServer that takes SQL queries generated by Ecto and rams them in.
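That standalone-GenServer workaround might look roughly like this sketch (module name, `open/1`, and `run/3` are hypothetical stand-ins for the real Exqlite calls): one long-lived process owns the single :memory: connection, so the database survives as long as the process does, unlike a pooled connection that may be closed and reopened.

```elixir
defmodule MemoryCache.Sketch do
  use GenServer

  # Holds one long-lived connection to an in-memory database; closing and
  # reopening (as a pool may do) would discard all :memory: contents.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def query(sql, params \\ []) do
    GenServer.call(__MODULE__, {:query, sql, params})
  end

  @impl true
  def init(_opts) do
    # open/1 stands in for the real Exqlite open call.
    {:ok, open(":memory:")}
  end

  @impl true
  def handle_call({:query, sql, params}, _from, db) do
    {:reply, run(db, sql, params), db}
  end

  # Stubs so the sketch compiles standalone.
  defp open(path), do: {:db, path}
  defp run(_db, _sql, _params), do: {:ok, []}
end
```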

@warmwaffles (Member)

Sorry @ruslandoga, I changed a bit of the code in and around the NIF and sqlite to raise an exception when binding arguments to a statement. I wonder if we can accomplish this move to remove db_connection in parts rather than one big change.

Mainly aiming at, refactoring this in such a way to make that swap over easier to manage.

@ruslandoga (Contributor, Author) commented Oct 8, 2024

> I wonder if we can accomplish this move to remove db connection in parts rather than one big change.

Definitely!

> to raise an exception when binding arguments

I actually also decided to mostly go with exceptions everywhere instead of error tuples :) But I took a different approach to binds which (somewhat surprisingly) 10x-ed bind_all performance, at least according to my benchmarks in https://github.com/ruslandoga/elixir-sqlite-bench.

I think I'll close this PR again (for now) and continue my experiments in https://github.com/ruslandoga/xqlite. Once I feel confident that the "new" API is good enough, I'll start opening small PRs to make it available from Exqlite while also keeping all currently available functionality intact.

@ruslandoga ruslandoga closed this Oct 8, 2024
@warmwaffles (Member)

sounds good.

@warmwaffles (Member) commented Oct 8, 2024

That split out for the bind seems like a reasonable thing to do. Do you think we can backport that bit? I'm currently fighting a bug (elixir-ecto/ecto_sql#638) around binding on ARM.

@ruslandoga (Contributor, Author) commented Oct 8, 2024

Regarding elixir-ecto/ecto_sql#638, with "individual binds" we can do something like this without much performance penalty.

```elixir
# or band(i, 0xFFFFFFFF) == i, whatever ends up being "faster"
def bind_integer(stmt, idx, i) when i >= -2_147_483_648 and i <= 2_147_483_647 do
  bind_int_nif(stmt, idx, i)
end

def bind_integer(stmt, idx, i) do
  bind_int64_nif(stmt, idx, i)
end
```

But I'm not sure this is necessary on 64bit architectures since both enif_get_int and enif_get_int64 would probably use the same underlying type: https://github.com/erlang/otp/blob/9e5a088ca02b196788260aa8dcd6db3972e6031b/erts/emulator/beam/erl_drv_nif.h#L107-L141

@warmwaffles (Member)

I personally think just using int64 for all integers is the safest thing. I don't know if there is a real use case not to.

@ruslandoga (Contributor, Author) commented Oct 8, 2024

enif_get_int actually seems to be 64-bit on 64-bit archs as well: https://github.com/erlang/otp/blob/9e5a088ca02b196788260aa8dcd6db3972e6031b/erts/emulator/beam/erl_drv_nif.h#L127-L128 ... ah, no, wrong: enif_get_int still takes just int, not ErlNapiSInt.

> I don't know if there is a real use case to not do that.

Right, I don't remember when I had to interact with a 32bit system last time. Probably a safe trade-off of (some?) performance for cleaner and easier to understand code.

@warmwaffles (Member)

I don't think there is a real perf difference in checking whether the value falls outside the int32 min and max; the branch logic itself will slow it down. Last I checked, SQLite will auto-size the integer in the database, but for code binding you just need to make sure to bind with 64-bit and move on.

Besides, in this case the migration timestamps are integers that will never fit in 32-bit integers.
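That last point is easy to check: Ecto migration versions are YYYYMMDDHHMMSS-style integers, which are far outside the signed 32-bit range.

```elixir
# A migration version shaped like the ones from around this thread.
version = 20_241_008_120_000
int32_max = 2_147_483_647

# Any timestamp-shaped version overflows a signed 32-bit integer,
# so it has to be bound via sqlite3_bind_int64.
IO.puts(version > int32_max)
```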

@ruslandoga (Contributor, Author)

> That branch logic itself will slow it down.

Right: ruslandoga/xqlite#3

@ruslandoga ruslandoga mentioned this pull request Oct 9, 2024
@ruslandoga ruslandoga deleted the no_db_connection branch October 10, 2024 07:21