
SNOW-1528909: Snowflake CLI cannot handle UTF-16LE encoded text files #1303

Open
sfc-gh-cgorrie opened this issue Jul 10, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@sfc-gh-cgorrie
Contributor

sfc-gh-cgorrie commented Jul 10, 2024

SnowCLI version

2.6.0rc0

Python version

Python 3.11.9

Platform

macOS-14.5-arm64-arm-64bit

What happened

PowerShell redirects (e.g. `command > file`) encode output as UTF-16LE by default (this is the Windows PowerShell 5.1 behavior; PowerShell 7+ defaults to UTF-8 without a BOM). Unfortunately, many code paths in Snowflake CLI assume UTF-8, which breaks common workflows in that shell. Here's an example PR that simply changes the input file for a `snow sql -f` command to that encoding, demonstrating the failure: #1299

Console output

```
src/snowflake/cli/api/commands/snow_typer.py:96: in command_callable_decorator
    result = command_callable(*args, **kw)
src/snowflake/cli/api/commands/decorators.py:158: in wrapper
    return func(**options)
src/snowflake/cli/api/commands/decorators.py:158: in wrapper
    return func(**options)
src/snowflake/cli/plugins/sql/commands.py:82: in execute_sql
    single_statement, cursors = SqlManager().execute(query, files, std_in, data=data)
src/snowflake/cli/plugins/sql/manager.py:60: in execute
    query_from_file = SecurePath(file).read_text(
src/snowflake/cli/api/secure_path.py:157: in read_text
    return self._path.read_text(*args, **kwargs)
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/pathlib.py:1059: in read_text
    return f.read()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <encodings.utf_8.IncrementalDecoder object at 0x7f7815a95f90>
input = b'\xff\xfe/\x00*\x00\n\x00 \x00C\x00o\x00p\x00y\x00r\x00i\x00g\x00h\x00t\x00 \x00(\x00c\x00)\x00 \x002\x000\x002\x004\...00e\x00c\x00t\x00 \x00r\x00o\x00u\x00n\x00d\x00(\x00l\x00n\x00(\x001\x000\x000\x00)\x00,\x00 \x004\x00)\x00;\x00\n\x00'
final = True

>   ???
E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

<frozen codecs>:322: UnicodeDecodeError
```
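The `input` in the traceback begins with `\xff\xfe`, the UTF-16LE byte order mark (BOM). A minimal Python check using only the first few bytes shown (the rest of the buffer is elided in the log) reproduces the same error message and shows that the `utf-16` codec decodes those bytes without trouble:

```python
# First bytes of the `input` buffer from the traceback (the remainder is elided there).
data = b"\xff\xfe/\x00*\x00\n\x00"

# 0xFF is not a valid UTF-8 start byte, so decoding as UTF-8 fails at position 0.
try:
    data.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = str(exc)

# The "utf-16" codec reads the BOM, selects little-endian, and drops the BOM.
decoded = data.decode("utf-16")

print(utf8_error)
print(repr(decoded))
```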


How to reproduce

1. Save a file encoded as UTF-16LE (for example, via a PowerShell `>` redirect)
2. Use it as `snowflake.yml`, as a post-deploy hook, or as the input to `snow sql -f`
3. Observe a `UnicodeDecodeError` from the `utf-8` codec
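Steps 1 and 3 can be approximated in plain Python without invoking `snow` itself. This is a sketch under two assumptions: that the redirected file is a UTF-16LE BOM followed by UTF-16LE text, and that the CLI ends up reading it as UTF-8 (as the `utf_8` decoder in the traceback suggests). The file name `query.sql` and its contents are made up for illustration.

```python
import codecs
import tempfile
from pathlib import Path

SQL = "select 1;\n"  # hypothetical file contents

with tempfile.TemporaryDirectory() as tmp:
    sql_file = Path(tmp) / "query.sql"
    # Assumed PowerShell redirect output: UTF-16LE BOM + UTF-16LE-encoded text.
    sql_file.write_bytes(codecs.BOM_UTF16_LE + SQL.encode("utf-16-le"))

    # Reading as UTF-8 fails on the leading 0xFF byte.
    try:
        sql_file.read_text(encoding="utf-8")
        failed = False
    except UnicodeDecodeError:
        failed = True

    # The same file reads cleanly with a BOM-aware codec.
    text = sql_file.read_text(encoding="utf-16")

print(failed, repr(text))
```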

@github-actions (bot) changed the title from "Snowflake CLI cannot handle UTF-16LE encoded text files" to "SNOW-1528909: Snowflake CLI cannot handle UTF-16LE encoded text files" on Jul 10, 2024
@sfc-gh-turbaszek
Collaborator

We may need to use a detection library such as https://github.com/jawah/charset_normalizer.

@sfc-gh-cgorrie
Contributor Author

I think we could get away with something lighter-weight and more deterministic. BOM detection alone would cover the standard Windows code path, and if we also let users set environment variables (Python-standard ones? *nix locale variables?) to match any encoding overrides they've made on their local system, that coverage should be enough to resolve this ticket.
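A minimal sketch of the BOM-detection idea, with an environment-variable override for files that have no BOM. The variable name `SNOWFLAKE_CLI_FILE_ENCODING` and the helpers `sniff_encoding` / `read_text_sniffing_bom` are hypothetical, not existing Snowflake CLI settings or APIs. UTF-32 BOMs are checked before UTF-16 because the UTF-32LE BOM (`FF FE 00 00`) starts with the UTF-16LE BOM (`FF FE`).

```python
import codecs
import os
from pathlib import Path

# Longest BOMs first: the UTF-32LE BOM begins with the UTF-16LE BOM.
# The mapped codecs ("utf-8-sig", "utf-16", "utf-32") strip the BOM on decode.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)

# Hypothetical override variable; not an existing Snowflake CLI setting.
ENCODING_ENV_VAR = "SNOWFLAKE_CLI_FILE_ENCODING"


def sniff_encoding(head: bytes, default: str = "utf-8") -> str:
    """Choose a codec from a leading BOM, else from the override variable, else `default`."""
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    # No BOM: honour the (hypothetical) override, otherwise keep the current UTF-8 assumption.
    return os.environ.get(ENCODING_ENV_VAR) or default


def read_text_sniffing_bom(path: Path) -> str:
    """Read a text file, picking its encoding from the first four bytes."""
    with path.open("rb") as fh:
        head = fh.read(4)
    # Going through read_text() keeps Python's usual newline translation.
    return path.read_text(encoding=sniff_encoding(head))
```

For example, the bytes at the start of the traceback's `input` (`b'\xff\xfe/\x00*\x00...'`) map to `utf-16`, which decodes them to `/*` with the BOM removed. Using `locale.getpreferredencoding(False)` as the fallback instead of a dedicated variable would follow the *nix locale variables (`LC_ALL`, `LC_CTYPE`, `LANG`) and Python's UTF-8 mode, but on Windows it typically returns a legacy ANSI code page rather than UTF-8.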

sfc-gh-turbaszek added the `bug` (Something isn't working) label on Jul 12, 2024