Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the C-ness of the code #7

Open
ndmitchell opened this issue Dec 14, 2016 · 2 comments
Open

Improve the C-ness of the code #7

ndmitchell opened this issue Dec 14, 2016 · 2 comments

Comments

@ndmitchell
Copy link
Owner

@thoughtpolice in https://www.reddit.com/r/haskell/comments/5i2mg1/new_xml_parser_hexml/db5os2h/ says:

Also, IMO, the C library could be improved a bit, too e.g. it should probably be namespaced so everything is under hexml_, and I'm not sure about the usage of document_parse taking an int vs a size_t which is what strlen returns (int and size_t do not have the same sign, making such loose conversions extremely dangerous).

The only reason document_parse handles int is so you can pass -1 so it will call strlen itself, but frankly, I'd just suggest making a totally separate function to do this. Or just not have this 'feature' at all -- users should always be expected to reasonably know the size of the fragment they pass to document_parse, right? It also assumes that the input to the string is actually null terminated, but if it isn't, strlen is going to do something very bad (by returning some massive integer). Assuming the input is hostile means I'm always going to assume someone, somehow, snuck in a non-null terminated string...

@ndmitchell
Copy link
Owner Author

Adding a hexml_ prefix is pure win.

There were good reasons for the int thing originally - you could pass a string which was not null terminated, and use the length. Or you could pass a string that was null terminated, and it wouldn't ever look for the trailing zero in advance, it emerged as a consequence of the operation, so saved you a string walk. As it stands, there's little reason for the -1 nastiness, and thus size_t makes more sense.

@ndmitchell
Copy link
Owner Author

I have prefixed all exported functions with hexml_ and added static annotations to the rest. Should I also be mangling the types? e.g. hexml_str? Should it really be hexml_str_t by convention?

Regarding signed operations - I am using int32_t in the str data type since I want to use less memory, and the document is essentially just str data types. It seems like that should really be uint32_t. For length of the string, size_t is plausible, but for most other contexts should I be using size_t? unsigned? uint32_t? What should I be doing by default, and what should I be converting to/from at the edges?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant