Include a preload joins vs. no joins discussion in the preload docs
I'm not sure if Ecto wants to provide this level of guidance in
the docs, so I understand if this PR won't be merged.

I've just seen many well-meaning people do this wrong (imo) by
defaulting to joining all associations just to preload them,
resulting in a lot more code and potentially worse performance.

I also vaguely remember a tweet from José calling this out,
but couldn't find it again.

Here I tried to summarize what I think I know about the topic,
hoping that it's accurate and helpful.
PragTob committed Oct 2, 2024
1 parent 49ee1fb commit 1175eac
Showing 1 changed file with 28 additions and 0 deletions.
28 changes: 28 additions & 0 deletions lib/ecto/query.ex
@@ -2667,6 +2667,34 @@ defmodule Ecto.Query do
where: l.inserted_at > c.updated_at,
preload: [:author, comments: {c, likes: l}]
## Choosing Between Preloading with Joins vs. Separate Queries

Whether to preload associations via joins in a single large query
(`preload: [comments: c]`) or via separate smaller queries
(`preload: [:comments]`) depends on the specific use case.
Here are some factors to guide your decision:
* **Joins reduce database round trips:** By fetching data in a single
  query, joins can minimize database round trips, potentially reducing
  overall latency.
* **Potential for data duplication:** Joins may lead to duplicated
  data in the result set, because the parent row's columns are repeated
  for every associated row. This requires more processing by Ecto
  and consumes more bandwidth when transmitting the results.
* **Increased database load:** Joins can be taxing on the database,
  increasing the demand on working memory. In some cases, the database
  may need to write intermediate results to disk, which can slow down
  query performance.
* **Parallelism with separate queries:** When using separate queries
  outside of a transaction, Ecto can parallelize the preload queries,
  which can speed up the overall operation.
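The duplication point above can be made concrete with a small, database-free sketch in plain Elixir. The `PreloadRows` module, its data, and the 1,000-byte post body are hypothetical, and the comprehension only mimics the row shape of an inner join; it is not how Ecto or the database executes anything.

```elixir
defmodule PreloadRows do
  # Hypothetical in-memory stand-ins for a posts table and a comments table.
  def posts, do: [%{id: 1, body: String.duplicate("x", 1_000)}]
  def comments, do: Enum.map(1..3, fn n -> %{id: n, post_id: 1} end)

  # Row shape of an inner join: one {post, comment} pair per match, so the
  # post columns repeat once per comment.
  def joined_rows do
    for p <- posts(), c <- comments(), c.post_id == p.id, do: {p, c}
  end

  # Row shape of two separate queries: each post and each comment once.
  def separate_rows do
    ids = Enum.map(posts(), & &1.id)
    {posts(), Enum.filter(comments(), &(&1.post_id in ids))}
  end

  # Bytes of post body carried by the result set(s) for each strategy.
  def post_body_bytes(:joined) do
    joined_rows() |> Enum.map(fn {p, _c} -> byte_size(p.body) end) |> Enum.sum()
  end

  def post_body_bytes(:separate) do
    {post_rows, _comment_rows} = separate_rows()
    post_rows |> Enum.map(&byte_size(&1.body)) |> Enum.sum()
  end
end

IO.inspect(PreloadRows.post_body_bytes(:joined), label: "joined")     # joined: 3000
IO.inspect(PreloadRows.post_body_bytes(:separate), label: "separate") # separate: 1000
```

With one post and three comments, the joined shape carries the post body three times, while the separate shape carries it once. Whether that overhead outweighs the extra round trip depends on row width and association size.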

In general, a good default is to only use joins in preloads if you're
already joining the associations in the main query. For example,
in the last query in the section above, comments and likes are already
joined, so they are included in the preload. However, the author is not
joined in the main query, so it is preloaded via a separate query.

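As a rough sketch of that default, the script below builds (but never runs) two queries. The `Sketch.*` schemas, their fields, and the `term` argument are hypothetical and exist only so the queries can be constructed. It assumes it is run as a standalone script with network access so that `Mix.install/1` can fetch Ecto; no database is required.

```elixir
# Assumption: standalone script; Mix.install/1 fetches Ecto from Hex.
Mix.install([{:ecto, "~> 3.0"}])

# Hypothetical schemas, defined only so the queries below can be built.
defmodule Sketch.Author do
  use Ecto.Schema

  schema "authors" do
    field :name, :string
  end
end

defmodule Sketch.Comment do
  use Ecto.Schema

  schema "comments" do
    field :body, :string
    field :post_id, :integer
  end
end

defmodule Sketch.Post do
  use Ecto.Schema

  schema "posts" do
    field :title, :string
    belongs_to :author, Sketch.Author
    has_many :comments, Sketch.Comment, foreign_key: :post_id
  end
end

defmodule Sketch.Queries do
  import Ecto.Query

  # Comments are joined anyway because the query filters on them, so the
  # existing binding is reused in the preload. The author is not joined
  # and is therefore preloaded with a separate query.
  def filtered_by_comment(term) do
    from p in Sketch.Post,
      join: c in assoc(p, :comments),
      where: c.body == ^term,
      preload: [:author, comments: c]
  end

  # Nothing needs a join here, so both associations use separate queries.
  def plain do
    from p in Sketch.Post, preload: [:author, :comments]
  end
end

IO.inspect(Sketch.Queries.filtered_by_comment("hello"))
IO.inspect(Sketch.Queries.plain())
```

Note that `filtered_by_comment/1` preloads only the comments matched by the `where` clause, which is the intended behavior when the join already exists for filtering.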
## Preload queries
Preload also allows queries to be given, allowing you to filter or
