
drop db_connection #273

Closed · wants to merge 5 commits

Conversation

@ruslandoga (Contributor) commented Dec 22, 2023

Since there is interest in moving db_connection to ecto_sqlite3 I thought I'd PR this change :)

TODOs:

Resolved review threads on README.md (×4), lib/exqlite/nif.ex, and CHANGELOG.md.
@warmwaffles (Member)

The functionality from #271 is "lost"; what should be done about it?

Would it be useful to perhaps lift the callback definition to the same place where after_connect is specified? https://github.com/elixir-ecto/db_connection/blob/fa5f705fa5d272ed28b64ee0954e4275c0260d36/lib/db_connection.ex#L118-L135 SQLite users can't be the only ones who would find a callback that fires before disconnect useful.

@josevalim

> SQLite users can't be the only ones who would find a callback that fires before disconnect useful.

My concern about such callbacks is that we can't ever rely on them running (what if the whole node terminates?), so I worry people may rely on them always executing. It probably makes more sense to set up a monitor process or similar instead.
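A minimal sketch of that monitor idea (the module name and the shape of the cleanup function are hypothetical, not part of db_connection): a small process watches the connection owner and runs cleanup when it goes down, without needing a before-disconnect hook.

```elixir
defmodule CleanupMonitor do
  use GenServer

  # Watch a connection-owning pid and run a cleanup function when it goes
  # down. The cleanup still isn't guaranteed (the whole node can die), but
  # it doesn't require db_connection to expose a before-disconnect callback.
  def start_link({owner_pid, cleanup_fun}) do
    GenServer.start_link(__MODULE__, {owner_pid, cleanup_fun})
  end

  @impl true
  def init({owner_pid, cleanup_fun}) do
    ref = Process.monitor(owner_pid)
    {:ok, %{ref: ref, cleanup: cleanup_fun}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref} = state) do
    # Best-effort cleanup; never reached if the whole node terminates.
    state.cleanup.()
    {:stop, :normal, state}
  end
end
```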

@ruslandoga (Contributor, Author)

Sorry for disappearing without finishing the PR. I'm going to continue working on it later this week.

@warmwaffles (Member)

@ruslandoga It's not a problem. I, too, have had other priorities take over and haven't found the time to carve out to think hard about this.

Comment on lines +1239 to +1247

```c
{"execute", 2, exqlite_execute},
{"changes", 1, exqlite_changes},
{"prepare", 2, exqlite_prepare},
{"columns", 2, exqlite_columns},
{"step", 2, exqlite_step},
{"interrupt", 1, exqlite_interrupt},
{"finalize", 1, exqlite_finalize},
{"last_insert_rowid", 1, exqlite_last_insert_rowid},
{"transaction_status", 1, exqlite_transaction_status},
```
@warmwaffles (Member)

Why are these not going through a dirty NIF? These are IO bound; exqlite_step in particular is definitely IO bound.

@ruslandoga (Contributor, Author) commented Sep 17, 2024

👋 @warmwaffles

Not necessarily. Sometimes exqlite_step can be safely scheduled on a non-dirty thread. This change introduces multiple versions of each NIF, making it possible for the caller to decide which one to use. By default the library would keep using dirty schedulers, but it would also introduce a family of "unsafe" implementations :) E.g. Exqlite.step would use Exqlite.Nif.dirty_io_step and Exqlite.unsafe_step would use Exqlite.Nif.step.
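A sketch of how that split might look on the Elixir side. The `dirty_io_step`/`step` names come from the comment above; the NIF bodies are stubbed here so the sketch compiles standalone (in the real library they would be replaced at NIF load time):

```elixir
defmodule Exqlite.Nif.Sketch do
  # Stubs standing in for the NIF-loaded functions; until the NIF is
  # loaded, both raise :not_loaded, which is the usual Elixir convention.
  def dirty_io_step(_stmt), do: :erlang.nif_error(:not_loaded)
  def step(_stmt), do: :erlang.nif_error(:not_loaded)
end

defmodule Exqlite.Sketch do
  alias Exqlite.Nif.Sketch, as: Nif

  # Default path: scheduled on a dirty IO scheduler, safe even when the
  # step has to touch the disk.
  def step(stmt), do: Nif.dirty_io_step(stmt)

  # Opt-in path: runs on a normal scheduler; only appropriate when the
  # caller knows the step returns well under the ~1ms NIF guideline.
  def unsafe_step(stmt), do: Nif.step(stmt)
end
```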

@ruslandoga (Contributor, Author) commented Sep 17, 2024

I decided to do this split when I noticed that non-dirty exqlite_step is almost twice as performant as a dirty exqlite_multi_step on all data sets I tried them on.

Similarly, I noticed that calling non-dirty bind_* on each parameter from Elixir is faster than the current dirty "bind all" implementation. I think this makes for a cleaner API: it's more intuitive for people familiar with the SQLite C API and doesn't require {:blob, binary} workarounds. The bind_* functions also seem safe to schedule on non-dirty threads. I will name them "unsafe" as well, and keep the current multi-param bind implementation with two changes: renaming it to bind_all and moving it to the dirty CPU scheduler, since it doesn't really touch IO.
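For illustration, the per-parameter style could look like the sketch below. The bind_* names are hypothetical stand-ins mirroring sqlite3_bind_text/_blob/_int64/_null, stubbed so the sketch compiles standalone; in the design described above each would wrap a small non-dirty NIF:

```elixir
defmodule Exqlite.Bind.Sketch do
  # Stubs for the hypothetical type-specific bind NIFs.
  def bind_text(_stmt, _idx, text) when is_binary(text), do: :ok
  def bind_blob(_stmt, _idx, blob) when is_binary(blob), do: :ok
  def bind_integer(_stmt, _idx, i) when is_integer(i), do: :ok
  def bind_null(_stmt, _idx), do: :ok

  # The caller names each parameter's type explicitly, as in the C API,
  # so blobs no longer need a {:blob, binary} tagged tuple.
  def bind_example(stmt) do
    :ok = bind_text(stmt, 1, "alice")
    :ok = bind_blob(stmt, 2, <<1, 2, 3>>)
    :ok = bind_integer(stmt, 3, 42)
    :ok = bind_null(stmt, 4)
  end
end
```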

@ruslandoga (Contributor, Author) commented Sep 17, 2024

As for the other functions, like exqlite_last_insert_rowid, exqlite_transaction_status, and some others, I think they are neither IO nor CPU bound and pretty much always return immediately, well below the 1ms NIF guideline. This will require further testing though :)

@warmwaffles (Member) commented Sep 17, 2024

My understanding of sqlite is that exqlite_step may invoke a read from disk if you reach the end of the results that are buffered, this could take longer than 1ms when the database is large.

> I think they are neither IO nor CPU bound and pretty much always return immediately, well below the 1ms NIF guideline. This will require further testing though :)

I'll look into simulating slow disk access (system under heavy write mode)

EDIT: Leaving this here for later https://serverfault.com/questions/523509/linux-how-to-simulate-hard-disk-latency-i-want-to-increase-iowait-value-withou

@ruslandoga (Contributor, Author) commented Sep 17, 2024

> I'll look into simulating slow disk access (system under heavy write mode)

And I'll try to find their implementation. Right now I think last_insert_rowid, get_autocommit, etc. just read from internal connection state.


UPDATE:

@warmwaffles (Member)

> I decided to do this split when I noticed that non-dirty exqlite_step is almost twice as performant as a dirty exqlite_multi_step on all data sets I tried them on.

I originally did the multi-step implementation because I had a very large sqlite database of timeseries data and it did not have a lot of throughput. But of course that was with the implementation I wrote, and what you have here may actually be better.

@warmwaffles (Member)

@ruslandoga do you have a branch on ecto_sqlite3 for changes made here? I'd like to give it a whirl if you do.

@ruslandoga (Contributor, Author) commented Sep 17, 2024

No, sorry. I haven't touched ecto_sqlite3 yet. I'll try to come up with something tomorrow :)


UPDATE: sorry, I got caught up in the NIF again today and didn't do anything about the Ecto adapter :) Also, I found https://github.com/max-au/sqlite and I think there are some interesting bits there:

  • it mentions that destructors run on the main schedulers and that closing the DB in the resource destructor might block the main schedulers and has a workaround with a monitoring process for it
  • it pre-defines atoms
  • it mentions that in-memory dbs are faster with SQLite's native memory allocators rather than the ones from Erlang

UPDATE: to make it easier to use both the current versions of Exqlite and Ecto.Adapters.SQLite3 and the ones from this PR, I started using a separate namespace for the new experimental approaches:

I'm working on xqlite_readers right now :)

@warmwaffles (Member)

> I'll try to come up with something tomorrow

No rush, I can work with the raw NIF until then.

@warmwaffles (Member) commented Sep 17, 2024

One interesting thing to consider about moving the DB Connection to ecto_sqlite3: I would love to be able to run a repo with :memory: and have it stay alive for as long as the application is up. My use case here is a queryable cache. With the way pooling is done now, I cannot do that easily and have to resort to running a standalone GenServer that takes SQL queries generated by Ecto and rams them in.
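That standalone-GenServer workaround might look roughly like this sketch (module name, `open/1`, and `run/3` are hypothetical stand-ins for the real Exqlite calls): one long-lived process owns the single :memory: connection, so the database survives as long as the process does, unlike a pooled connection that may be closed and reopened.

```elixir
defmodule MemoryCache.Sketch do
  use GenServer

  # Holds one long-lived connection to an in-memory database; closing and
  # reopening (as a pool may do) would discard all :memory: contents.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def query(sql, params \\ []) do
    GenServer.call(__MODULE__, {:query, sql, params})
  end

  @impl true
  def init(_opts) do
    # open/1 stands in for the real Exqlite open call.
    {:ok, open(":memory:")}
  end

  @impl true
  def handle_call({:query, sql, params}, _from, db) do
    {:reply, run(db, sql, params), db}
  end

  # Stubs so the sketch compiles standalone.
  defp open(path), do: {:db, path}
  defp run(_db, _sql, _params), do: {:ok, []}
end
```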

@warmwaffles (Member)

Sorry @ruslandoga, I changed a bit of the code in and around the NIF and sqlite to raise an exception when binding arguments to a statement. I wonder if we can accomplish this move to remove db_connection in parts rather than one big change.

Mainly aiming at, refactoring this in such a way to make that swap over easier to manage.

@ruslandoga (Contributor, Author) commented Oct 8, 2024

> I wonder if we can accomplish this move to remove db connection in parts rather than one big change.

Definitely!

> to raise an exception when binding arguments

I actually also decided to mostly go with exceptions everywhere instead of error tuples :) But I took a different approach to binds which (somewhat surprisingly) 10x-ed bind_all performance, at least according to my benchmarks in https://github.com/ruslandoga/elixir-sqlite-bench.

I think I'll close this PR again (for now) and continue my experiments in https://github.com/ruslandoga/xqlite. Once I feel confident that the "new" API is good enough, I'll start opening small PRs to make it available from Exqlite while also keeping all currently available functionality intact.

@ruslandoga ruslandoga closed this Oct 8, 2024
@warmwaffles (Member)

sounds good.

@warmwaffles (Member) commented Oct 8, 2024

That split out for the bind seems like a reasonable thing to do. Do you think we can backport that bit? I'm currently fighting a bug (elixir-ecto/ecto_sql#638) around binding on ARM.

@ruslandoga (Contributor, Author) commented Oct 8, 2024

Regarding elixir-ecto/ecto_sql#638, with "individual binds" we can do something like this without much performance penalty.

```elixir
# or band(i, 0xFFFFFFFF) == i, whatever ends up being "faster"
def bind_integer(stmt, idx, i) when i >= -2_147_483_648 and i <= 2_147_483_647 do
  bind_int_nif(stmt, idx, i)
end

def bind_integer(stmt, idx, i) do
  bind_int64_nif(stmt, idx, i)
end
```

But I'm not sure this is necessary on 64bit architectures since both enif_get_int and enif_get_int64 would probably use the same underlying type: https://github.com/erlang/otp/blob/9e5a088ca02b196788260aa8dcd6db3972e6031b/erts/emulator/beam/erl_drv_nif.h#L107-L141

@warmwaffles (Member)

I personally think just using int64 for all integers is the safest thing. I don't know if there is a real use case not to.

@ruslandoga (Contributor, Author) commented Oct 8, 2024

enif_get_int actually seems to be 64-bit on 64-bit archs as well: https://github.com/erlang/otp/blob/9e5a088ca02b196788260aa8dcd6db3972e6031b/erts/emulator/beam/erl_drv_nif.h#L127-L128 ... ah, no, wrong: enif_get_int still takes just int, not ErlNapiSInt.

> I don't know if there is a real use case to not do that.

Right, I don't remember when I had to interact with a 32bit system last time. Probably a safe trade-off of (some?) performance for cleaner and easier to understand code.

@warmwaffles (Member)

I don't think there is a real perf difference in checking whether the value falls outside the int32 min and max; the branch logic itself will slow it down. Last I checked, SQLite will auto-size the integer in the database, but for code binding you just need to make sure to bind with 64-bit and move on.

Besides, in this case the migration timestamps are integers that will never fit in 32-bit integers.
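That last point is easy to check: Ecto migration versions are YYYYMMDDHHMMSS-style integers, which are far outside the signed 32-bit range.

```elixir
# A migration version shaped like the ones from around this thread.
version = 20_241_008_120_000
int32_max = 2_147_483_647

# Any timestamp-shaped version overflows a signed 32-bit integer,
# so it has to be bound via sqlite3_bind_int64.
IO.puts(version > int32_max)
```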

@ruslandoga (Contributor, Author)

> That branch logic itself will slow it down.

Right: ruslandoga/xqlite#3

@ruslandoga ruslandoga mentioned this pull request Oct 9, 2024
@ruslandoga ruslandoga deleted the no_db_connection branch October 10, 2024 07:21