[RFC] Schema on Reads #1133
Comments
@rramachand21 are you working on this?
Thanks for putting up this proposal. A downside of this feature: clients will start taking the easy path and use schema on read even for fields that are used frequently. We should think about field usage and guardrails to avoid abuse of the feature.
@imRishN what about the existing runtime fields feature? Looks almost the same that you are proposing here:
@lrynek You are right, runtime fields serve the same purpose (schema on read), but they are a proprietary Elasticsearch feature / implementation. The goal of this RFC is to provide similar functionality on the OpenSearch side (but obviously it cannot be copied as is).
@imRishN Oh, I didn't know that, thanks for the explanation! 👍 // I assumed that, since OpenSearch is a fork of Elasticsearch 7, it would have all the features available in that version too. Is there any reference for such discrepancies between the two projects? It would be awesome...😎
OpenSearch forked at 7.10.2, so anything added in OpenSearch or ES since then is likely different.
@dblock Thanks for explaining! 👍
+1 for having this
@imRishN @rramachand21 @elfisher
@imRishN is this being worked on?
Came from: https://forum.opensearch.org/t/runtime-fields-on-opensearch/9837 I'm bummed that OpenSearch doesn't support runtime fields; they seemed like the solution I needed for my project (was reading about ES, obviously), so I'm disappointed that I'm left without the feature having chosen OS over ES 😞
We will be looking into this and updating this with a more accurate version where this will be available. As usual, open source contributions are welcome :) If there is interest in contributing to this, please do reach out.
@rramachand21 has this made it to the roadmap yet? Can you comment on status?
@rramachand21 do you have a plan to deliver it?
Voting for this feature too. This would massively simplify our task of building an integrated view on distributed data. Currently we manage this with a preprocessing service that resolves references before indexing.
It's crazy that such a feature has still been in the backlog since 2021!
Please contribute!
Please contribute or let us know your plan about this feature.
Reading up on the runtime fields feature, it sounds a lot like adding script fields to the mapping, so that you don't need to specify them at query time. Like script fields, they're still computed at runtime. From the original issue description, it sounds like the problem with script fields is that they're bulky and awkward to inject into a query. You might be able to simplify the syntax with a search request processor, essentially injecting script fields into the query wherever a "runtime field" is specified. (At first I was thinking about this purely from the perspective of adding a field to results, which could also be done with a search response processor, but then it would be too late to use the field in filters and aggregations.)
Yes, if we add script fields that read from index mappings or from a search request runtime field, that should solve the problem. Also, going by the blog here - https://www.elastic.co/blog/getting-started-with-elasticsearch-runtime-fields, in Elasticsearch it's also possible to convert these fields to indexed fields at the time of rollover, which seems like a nice value add.
Yeah -- I experimented a little bit with
We will get some boilerplate code to implement it once #6836, which I'm currently working on, is done.
Is there any timeframe to deliver it?
Here is the breakdown for the implementation:
Runtime field mapping parsing
    "runtime_mappings": {
      "<name>": {
        "type": "keyword",
        "script": {
          "source": "<script>"
        }
      }
    },
QueryBuilder and execution
Aggregation
Scoring
Sorting
@rishabhmaurya thanks for picking it up, I have nothing against
@reta that's a good point. Should we call them prototype fields? They are meant for prototyping and not for permanent use.
I don't think we should name them for what we think they are best used for. I'd hope that they would have essentially zero runtime cost and so not be suitable only for prototypes. There are lots of good names to choose from. I think in DBMSs they are called generated, virtual, computed, calculated, derived, etc. Or is there some critical difference between the DBMS concept and the OpenSearch concept which needs to be emphasized?
I like
+1 for
It would be great to have the same name and interface as Elasticsearch for it
In theory - yes, in practice - we are asking for problems: Elasticsearch is not OSS
Good news - we are done with most of the implementation (#12281) and here is a little documentation (opensearch-project/documentation-website#6943). I encourage folks waiting for it to give it a shot using a snapshot build and see if it meets their needs. Let us know if you have any feedback or suggestions; we are happy to incorporate them, possibly before the next version release.
This feature is released in 2.15 - https://opensearch.org/docs/latest/field-types/supported-field-types/derived/
Problem Statement
By default, OpenSearch supports ‘schema on write’, i.e. the structure is defined at the time of ingest so that it is available for query immediately. However, as use cases for OpenSearch have evolved, a need for greater flexibility has emerged. End users may not be aware of the data structure at ingest time, or may want to query on additional attributes after ingest. This is where ‘schema on read’ is useful. With ‘schema on read’, the query result field can be defined at the time of query. It can also greatly improve the ingest rate, because fields that are not going to be queried right away do not have to be indexed.
Requirements
Existing Solution
Scripting
Scripting is supported in various constructs of the _search request body. In each of these constructs the fundamental mechanism is the same: the script is evaluated at query time, derives one or more values from the indexed fields, and acts on the derived values.
Shortcomings of existing solution
Scripting satisfies most of the requirements listed above, but adding scripts to the request makes it bulky, hard to read and difficult to manage. Even though scripts can be stored and referenced in the query, that does not help readability.
Following example highlights the same:
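As an illustration only (a hypothetical sketch, not the example from the original issue), the Python snippet below builds two `_search` request bodies with the same intent. The field names `bytes` and `kilobytes` and the Painless expression are assumptions made up for this sketch. The first body filters and sorts on an indexed field; the second derives the value at query time and has to repeat the script in the filter, in `script_fields` and in the sort.

```python
import json

# Hypothetical Painless expression; `bytes` is assumed to be an indexed numeric field.
SCRIPT = "doc['bytes'].value / 1024.0"


def regular_request(min_kb):
    """Filter and sort on an indexed field (hypothetical `kilobytes`)."""
    return {
        "query": {"range": {"kilobytes": {"gte": min_kb}}},
        "sort": [{"kilobytes": "desc"}],
    }


def scripted_request(min_kb):
    """Same intent, but the value is derived at query time via scripts."""
    return {
        "query": {"bool": {"filter": {"script": {"script": {
            "lang": "painless",
            "source": SCRIPT + " >= params.min_kb",
            "params": {"min_kb": min_kb},
        }}}}},
        # The script must be repeated to return the value in the hits ...
        "script_fields": {
            "kilobytes": {"script": {"lang": "painless", "source": SCRIPT}}
        },
        # ... and once more to sort on it.
        "sort": [{"_script": {
            "type": "number",
            "order": "desc",
            "script": {"lang": "painless", "source": SCRIPT},
        }}],
    }


if __name__ == "__main__":
    print(json.dumps(regular_request(10), indent=2))
    print(json.dumps(scripted_request(10), indent=2))
```

The scripted body carries the same expression three times, which is the readability and maintenance cost described above.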
Proposed Solution
Regular OpenSearch queries revolve around fields in the schema. With scripting, the query syntax changes a lot.
In the proposed solution, we aim to achieve ease of using schema on read along with all the benefits of scripting.
The proposal includes defining fields in mapping which will be evaluated at query time and behave like regular fields.
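A minimal sketch of what this could look like, written as Python dictionaries for the request bodies. It assumes the `derived` mapping section and the `emit(...)` script convention described in the 2.15 documentation linked in the comments; the index fields `bytes` and `kilobytes` are hypothetical, and the exact syntax should be checked against that documentation.

```python
def index_mapping():
    """Mapping body with one indexed field and one field evaluated at query time.

    Assumption: the `derived` section and `emit(...)` follow the derived field
    documentation linked in the thread; the field names are hypothetical.
    """
    return {
        "mappings": {
            "properties": {
                "bytes": {"type": "long"},  # indexed at ingest time
            },
            "derived": {
                "kilobytes": {  # computed at query time, not indexed
                    "type": "long",
                    "script": {"source": "emit(doc['bytes'].value / 1024)"},
                },
            },
        }
    }


def search_request(min_kb):
    """The query names the query-time field exactly like a regular field."""
    return {"query": {"range": {"kilobytes": {"gte": min_kb}}}}
```

The script lives in one place, the mapping, and the query itself contains no scripting syntax.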