Expose isUnquotedPathCharacter for validation #375

Closed
wants to merge 9 commits into from

Conversation

rui-mo
Collaborator

@rui-mo rui-mo commented Jul 25, 2023

No description provided.

@rui-mo rui-mo force-pushed the token branch 14 times, most recently from e404d87 to 463e601 Compare August 9, 2023 08:45
@rui-mo rui-mo force-pushed the token branch 3 times, most recently from fafcbe0 to abdb894 Compare August 23, 2023 07:08
rui-mo and others added 4 commits September 18, 2023 05:53
Summary:
To improve performance, the result string buffer is pre-allocated and written
directly instead of being assembled from intermediate strings. `std::to_chars`
converts the integers into characters by filling the target range in place.
Rather than computing the exact size from intermediate strings, the buffer is
sized from an upper-bound estimate based on the decimal precision and scale, and
the exact size is set after all characters are written.

The previous implementation used `DecimalUtil::toString`, which produced many
intermediate strings during conversion and was also called just to compute the
string buffer size. The optimized implementation uses `std::to_chars` to convert
the integers and avoids all intermediate strings. As the benchmarks below show,
the cast is about 4.6x faster for short decimals and about 3.8x faster for long
decimals.

Cast from decimal to varchar benchmark | cast##cast_short_decimal | cast##cast_long_decimal
-- | -- | --
previous (DecimalUtil::toString) | 45.43ms | 132.09ms
optimized (std::to_chars) | 9.87ms | 35.00ms
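
The approach described above can be illustrated with a minimal sketch. This is not the Velox implementation: `maxDecimalStringSize` and `decimalToString` are hypothetical names, the sketch handles only 64-bit unscaled values, and it assumes the value fits the declared precision and that `scale <= precision <= 18`. It only shows the pattern of estimating an upper bound, writing in place with `std::to_chars`, and trimming to the exact size at the end.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: upper bound on the characters needed for a decimal
// with the given precision and scale (sign, digits, decimal point, and a
// leading "0" when every digit is fractional).
inline std::size_t maxDecimalStringSize(int precision, int scale) {
  std::size_t size = 1 + static_cast<std::size_t>(precision);  // sign + digits
  if (scale > 0) {
    size += 1;  // decimal point
    if (precision == scale) {
      size += 1;  // leading "0" before the point
    }
  }
  return size;
}

// Hypothetical helper: formats an unscaled 64-bit value as a decimal string.
// Assumes |unscaled| < 10^precision and scale <= precision <= 18.
inline std::string decimalToString(std::int64_t unscaled, int precision, int scale) {
  // Pre-allocate once from the estimate; no intermediate strings are created.
  std::string out(maxDecimalStringSize(precision, scale), '\0');
  char* pos = out.data();
  char* const end = out.data() + out.size();

  // Work on the magnitude as unsigned so that INT64_MIN does not overflow.
  std::uint64_t magnitude = static_cast<std::uint64_t>(unscaled);
  if (unscaled < 0) {
    magnitude = 0 - magnitude;
    *pos++ = '-';
  }

  if (scale == 0) {
    pos = std::to_chars(pos, end, magnitude).ptr;
  } else {
    std::uint64_t divisor = 1;
    for (int i = 0; i < scale; ++i) {
      divisor *= 10;
    }
    const std::uint64_t whole = magnitude / divisor;
    const std::uint64_t fraction = magnitude % divisor;

    pos = std::to_chars(pos, end, whole).ptr;
    *pos++ = '.';

    // Left-pad the fraction with zeros so it has exactly `scale` digits.
    int fractionDigits = 1;
    for (std::uint64_t v = fraction; v >= 10; v /= 10) {
      ++fractionDigits;
    }
    for (int i = fractionDigits; i < scale; ++i) {
      *pos++ = '0';
    }
    pos = std::to_chars(pos, end, fraction).ptr;
  }

  // Set the precise size only after everything has been written.
  out.resize(static_cast<std::size_t>(pos - out.data()));
  return out;
}
```

For example, `decimalToString(12345, 5, 2)` yields `123.45` and `decimalToString(-5, 3, 3)` yields `-0.005`. The buffer is allocated once and may be slightly larger than needed until the final `resize`.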

Pull Request resolved: facebookincubator#6210

Reviewed By: xiaoxmeng

Differential Revision: D49315826

Pulled By: mbasmanova

fbshipit-source-id: 1f419aa9edcb080752c3bed567d390cc7a461cce
Summary:
When Velox was used as a third-party library together with `SIMDJsonExtractor`, the JSON function tests failed. We found that `-DSIMDJSON_THREADS_ENABLED=1` was not set when building libvelox_functions_json.a. The fix changes "simdjson" to "simdjson::simdjson" in target_link_libraries.

Fixes facebookincubator#6564

Pull Request resolved: facebookincubator#6565

Reviewed By: Yuhta

Differential Revision: D49285542

Pulled By: kgpai

fbshipit-source-id: f9bc093b278288a2a73bbb289bb91b5dd7061097
facebookincubator#6599)

Summary:
Pull Request resolved: facebookincubator#6599

When the type kinds are not equal and one of the types is non-primitive, we would
crash by dereferencing a null type pointer returned from a dynamic cast.
The fix is to stop descending the type tree as soon as the type kinds differ.
The bug was introduced when we replaced throw() with log(1) in the type checking code.
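
A minimal sketch of the failure mode and the fix, using hypothetical stand-in types (`Kind`, `Type`, `RowType`, and `sameShape` are illustrative names, not the Velox classes): the kind comparison must return before any `dynamic_cast` result is dereferenced.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified type model for illustration only.
enum class Kind { kBigint, kVarchar, kRow };

struct Type {
  explicit Type(Kind k) : kind(k) {}
  virtual ~Type() = default;
  Kind kind;
};

struct RowType : Type {
  explicit RowType(std::vector<std::shared_ptr<const Type>> c)
      : Type(Kind::kRow), children(std::move(c)) {}
  std::vector<std::shared_ptr<const Type>> children;
};

// Returns true when both type trees have the same kinds at every level.
bool sameShape(const Type& a, const Type& b) {
  // Bail out first: if the kinds differ, at least one of the casts below
  // would return nullptr and must not be dereferenced.
  if (a.kind != b.kind) {
    return false;
  }
  if (a.kind != Kind::kRow) {
    return true;  // equal primitive kinds, nothing to descend into
  }
  const auto* rowA = dynamic_cast<const RowType*>(&a);
  const auto* rowB = dynamic_cast<const RowType*>(&b);
  if (rowA == nullptr || rowB == nullptr) {
    return false;  // defensive: kind tag and dynamic type disagree
  }
  if (rowA->children.size() != rowB->children.size()) {
    return false;
  }
  for (std::size_t i = 0; i < rowA->children.size(); ++i) {
    if (!sameShape(*rowA->children[i], *rowB->children[i])) {
      return false;
    }
  }
  return true;
}
```

If the early return were only a log statement, comparing a `RowType` with a primitive type would fall through to the casts and dereference a null pointer, which matches the crash described above.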

Reviewed By: Yuhta

Differential Revision: D49338549

fbshipit-source-id: 987f1df62016f68d7796f40c0aedfcd1becf5f1e
…okincubator#6404)

Summary:
Pass down the scan table schema to the Parquet column reader.

Details:

Currently, the requestedType, which is available in [ParquetColumnReader.cpp](https://github.com/facebookincubator/velox/blob/517e3e3a0c8308c96ca068444dfeee37204f7773/velox/dwio/parquet/reader/ParquetColumnReader.cpp#L37C60-L37C68), is set based on the schema present in the Parquet file (file data type) instead of the scan table schema.

The issue occurs when the expected output of the TableScan differs from the schema of the parquet file. Spark's data format for some types differs from Parquet's format. Similar to schema evolution, when the type differs, Spark performs an implicit conversion. The conversions that Spark performs can be seen in [ParquetVectorUpdaterFactory.java](https://github.com/apache/spark/blob/6ca45c52b7416e7b3520dc902cb24f060c7c72dd/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterFactory.java#L67C3-L185C6).

This PR fixes the issue by passing the scan table schema data type to the Parquet column reader as `requestedType`.

It is a follow-up to facebookincubator#5786 and addresses the issue according to Yuhta's comments there.

See facebookincubator#5786 for detailed context.

This update is one of the modifications necessary for issue facebookincubator#5770.

Pull Request resolved: facebookincubator#6404

Reviewed By: pedroerp

Differential Revision: D49330580

Pulled By: Yuhta

fbshipit-source-id: bd56bda6efd708691ee35b5b66d5ba9536df525f
xumingming and others added 5 commits September 18, 2023 10:31
Summary:
Fixes facebookincubator#6417

Pull Request resolved: facebookincubator#6463

Reviewed By: amitkdutta

Differential Revision: D49371431

Pulled By: mbasmanova

fbshipit-source-id: 8956b04abe608bfcb76b0a3b49cefd0689284bb2
…n crashes (facebookincubator#6402)

Summary:
Pull Request resolved: facebookincubator#6402

This adds an experimental flag,
'experimental_velox_save_input_on_fatal_signal'. When set to
true, it serializes the input vector data and all the SQL expressions
in the currently executing ExprSet whenever a fatal signal
is encountered. Enabling this flag makes the signal handler
async-signal-unsafe, so it should only be used for debugging purposes.

Reviewed By: kgpai

Differential Revision: D48891649

fbshipit-source-id: 47722d726c76a8602cf436c1840d2a0d720e2c35
…MONTH() and DATE() to avoid copying (facebookincubator#6615)

Summary:
Pull Request resolved: facebookincubator#6615

This removes unnecessary copying in INTERVAL_DAY_TIME(), INTERVAL_YEAR_MONTH() and DATE() calls. Each call returned a copy of a constant shared_ptr, which made these calls very expensive.
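
To show why returning a `shared_ptr` by value is costly, here is a minimal sketch with a hypothetical stand-in type (`DateType`, `dateByValue`, and `dateByRef` are illustrative names, and the sketch assumes the fix is to return a const reference to a static instance; the actual change may differ in detail).

```cpp
#include <memory>

// Hypothetical stand-in for a singleton type object.
struct DateType {};

// Returns by value: every call copies the shared_ptr, which atomically
// increments the reference count and later decrements it again.
inline std::shared_ptr<const DateType> dateByValue() {
  static const auto instance = std::make_shared<const DateType>();
  return instance;
}

// Returns by const reference: callers see the same static shared_ptr and
// no reference-count traffic occurs unless they choose to copy it.
inline const std::shared_ptr<const DateType>& dateByRef() {
  static const auto instance = std::make_shared<const DateType>();
  return instance;
}
```

With `dateByRef()`, a hot loop that only inspects the type never touches the reference count. A caller that needs to keep ownership can still copy the returned reference explicitly.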

Reviewed By: Yuhta, bikramSingh91

Differential Revision: D49347369

fbshipit-source-id: 6930970d9f2807347b16065fc224d7a7f5f57b69
Summary: Pull Request resolved: facebookincubator#6309

Reviewed By: xiaoxmeng

Differential Revision: D49394977

Pulled By: pedroerp

fbshipit-source-id: ba5fa3dda474505093d7d9d2f00aaa8c3d2d7e81
4 participants