pybind11 raises a UnicodeDecodeError on non-UTF-8 bytes in terms of sort Bytes #1078

Open
gtrepta opened this issue May 31, 2024 · 1 comment
Labels
bindings LLVM backend bindings to other languages


@gtrepta
Contributor

gtrepta commented May 31, 2024

Terms of sort Bytes and String are both stored as a kore_string_pattern in the AST library and are treated the same way when accessed through the bindings:

py::class_<kore_string_pattern, std::shared_ptr<kore_string_pattern>>(
ast, "StringPattern", pattern_base)
.def(py::init(&kore_string_pattern::create))
.def_property_readonly("contents", &kore_string_pattern::get_contents);

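The problem is that when the contents property is accessed, pybind11 assumes the underlying std::string is valid UTF-8 and decodes it into a Python str. That assumption does not always hold for Bytes terms, whose contents can be arbitrary bytes, and in that case the conversion throws a UnicodeDecodeError.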

pybind11 does support returning a C++ string to Python without conversion (as a bytes object), so we should work out how to do that for the terms that need to be treated that way.

https://pybind11.readthedocs.io/en/stable/advanced/cast/strings.html#returning-c-strings-to-python
