Status of this project #2

Open
bf opened this issue Jul 13, 2022 · 4 comments

@bf

bf commented Jul 13, 2022

Hello, thanks for your hard work on this feature. I have some questions.

  • What is the new size limit of tsvector2, now that the 1MB limit is no longer relevant?
  • I see you replaced the GIN index with a RUM index. Is a RUM index needed for tsvector2?
  • Did you try to get this merged into the main project?
  • As PostgreSQL is on version 14 now, what is the overall status of this project? Did you find a better solution?

Thank you very much!

@ildus
Collaborator

ildus commented Aug 4, 2022

Hi,

  1. The 1MB limit is not relevant for this type; the limitation is the same as for other toasted types.
  2. RUM is optional. If the extension finds RUM when it is installed, it'll add support functions for it.
  3. There was a patch to postgres, but it didn't get through.
  4. I'm not working on this project anymore. But maybe I'll add support for newer postgres versions some day.

@bf
Author

bf commented Aug 4, 2022

Hi Ildus, thanks for your elaborate answers to these questions.

I have now "circumvented" the 1MB tsvector limitation in PG14 by using array_to_tsvector(string_to_array(_clean_text(text_content), ' ')), where _clean_text is a function that removes special characters. array_to_tsvector returns a tsvector without positional information, and this is what gets me under the 1MB threshold.
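
A minimal sketch of that workaround, for illustration only. The _clean_text body below is a hypothetical stand-in (the real function is not shown in this thread); it assumes "special characters" means anything non-alphanumeric and collapses each run of them into one space, so that splitting on ' ' does not produce empty strings:

```sql
-- Hypothetical stand-in for the _clean_text helper mentioned above;
-- the original definition is not part of this thread.
CREATE OR REPLACE FUNCTION _clean_text(input text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
    -- Replace every run of non-alphanumeric characters with a single space,
    -- then trim leading/trailing spaces so string_to_array yields no empty items.
    SELECT btrim(regexp_replace(input, '[^[:alnum:]]+', ' ', 'g'));
$$;

-- Position-less tsvector: array_to_tsvector stores each distinct word once,
-- with no positions and no stemming.
SELECT array_to_tsvector(
           string_to_array(_clean_text('Hello, big world! Hello again.'), ' ')
       ) AS without_positions;

-- For comparison: to_tsvector records a position for every occurrence,
-- which is what makes the value grow on large documents.
SELECT to_tsvector('simple', 'Hello, big world! Hello again.') AS with_positions;
```

Because no positions are stored, phrase search and position-based ranking are not available on the resulting tsvector.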

Overall, it is still an extremely ugly hack which I don't like. But I also don't understand why there isn't something like bigtsvector with optional stemming so the use case of exact search is also covered in postgresql.

@ildus
Collaborator

ildus commented Aug 5, 2022

Yes, I think that is an expected feature. In the past I was trying to replace the current tsvector type, but maybe a new type like bigtsvector, based on this extension or at least shipped as a contrib extension, could be accepted by the committers.

@mguinness

There was a patch to postgres, but it didn't get through

For reference, the patch was "Remove 1MB size limit for tsvector", but it was returned for feedback.

A new datatype, bigtsvector, was also mentioned in the thread "tsvector field length limitation".


3 participants