-
Notifications
You must be signed in to change notification settings - Fork 371
Building indexes
Designing indexes for full-text search queries in PostgreSQL is a delicate art. Here are some tips to help guide developers to find an index definition that works for their application.
Please update this page with any new tips or tricks you might discover.
You must use GIN or GIST indexes for full text search
Essentially you look in a SQL statement for the operator (% or @@) and grab the operand that isn't the query.
Then you do something like this:
CREATE INDEX my_cool_index ON my_cool_table USING gist (([expression]));
or (for trigram only) (Trigram seems to underperform the other types).
CREATE INDEX my_cool_index ON my_cool_table USING gist (([expression]) gist_trgm_ops);
If you are using multiple columns and/or weighted columns then you want to grab the whole expression; that is, you should have one index, not one index per column. You also want to include the setweight
function as part of the index if you are utilizing weighted columns. For example, if you have a Page
model with the following configuration:
pg_search_scope :search_full_text,
:against => {
name: "A",
keywords: "B",
body: "C"
},
:using => {
:tsearch => {
:prefix => true,
:any_word => true,
:dictionary => "english",
}
}
then your index will end up being something like this:
CREATE INDEX "ts_vector_index_on_pages" ON "pages" USING gin (
setweight(to_tsvector('english', coalesce("pages"."name"::text, '')), 'A') ||
setweight(to_tsvector('english', coalesce("pages"."keywords"::text, '')), 'B') ||
setweight(to_tsvector('english', coalesce("pages"."body"::text, '')), 'C')
)
For apps using multisearchable
and a pg_search_documents
table, the index migration would look something like this:
# IMPORTANT you must substitute in the correct to_tsvector
# expression. Look for "to_tsvector" in your your SQL query logs:
add_index :pg_search_documents, YOUR_EXPRESSION_HERE, using: :gin, name: "index_pg_search_documents_on_content"
# For example for a tsvector column
# add_index :pg_search_documents, %[to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))], using: :gin, name: "index_pg_search_documents_on_content"
# For example for a content column where you are `using: :trigram`
execute <<~SQL
CREATE INDEX pg_search_documents_on_content ON pg_search_documents USING gin(coalesce(content, ''::text) gin_trgm_ops)
SQL
Note: not sure about multisearchable
, but for pg_search_documents
, if you’re setting it against
multiple fields, you need to create a multicolumn index that includes all those fields. This blog post describes one approach to this.
First, you add the index like this:
add_index :pg_search_documents, %[to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))], using: :gin, name: "index_pg_search_documents_on_content"
Second, you change your model to multisearch and add the scope:
class User < ApplicationRecord
multisearchable(
against: [:name]
)
scope :full_text_search_for, -> (term) do
joins(:pg_search_document).merge(
PgSearch.multisearch(term).where(searchable_type: klass.to_s)
)
end
end
Then, you can use the scope and the index:
User.full_text_search_for('term')
Alternatively, you can use a trigger to automatically populate a column of type tsvector with the [expression] from above, and then use the :tsvector_column option in pg_search to tell it to use that column instead. Then you'd make a very similar index as the above (option 1) but put the column in instead of [expression]
You'll need enough rows in your table for EXPLAIN ANALYZE to show up using the index, and the index is brittle on the expression, so if you change the pg_search configuration, you'll have to change the index.
Begin by making sure you have a tsvector column in your table:
add_column :posts, :tsv_body, :tsvector
Add an index on the tsv_body column. As of ActiveRecord 4.0, you can add indexes and specify the "type" of index easily:
add_index(:posts, :tsv_body, using: 'gin')
For the triggers, you can use hair_trigger to add database triggers via your migrations:
create_trigger(compatibility: 1).on(:posts).before(:insert, :update) do
"new.tsv_body := to_tsvector('pg_catalog.english', coalesce(new.body,''));"
end
You can also execute plain SQL for adding trigger function in your migration code:
say_with_time("Adding trigger function on posts for updating tsv_body column") do
sql = <<-MIGRATION
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON posts FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(tsv_body, 'pg_catalog.simple', body);
MIGRATION
execute(sql)
end
To update tsv_body
column for existing records, you can simply run
Post.find_each(&:touch)
or
execute("UPDATE posts SET id = id")
The latter will be faster since it won't load each post into memory as a Ruby object.
Don't forget to add the column to your pg_search specification, or the DB won't use your index.
using: {
tsearch: {
tsvector_column: "tsv_body"
}
}
To ensure the DB is using your index, copy the search's SELECT
query from your development log and run an EXPLAIN
(or use Rails' explain
function if you prefer). Look for a bitmap scan on your tsv index. If your index doesn't appear in the EXPLAIN
output, it's not being used.