Improve the C-ness of the code #7

ndmitchell · 2016-12-14T22:17:59Z

@thoughtpolice in https://www.reddit.com/r/haskell/comments/5i2mg1/new_xml_parser_hexml/db5os2h/ says:

Also, IMO, the C library could be improved a bit, too e.g. it should probably be namespaced so everything is under hexml_, and I'm not sure about the usage of document_parse taking an int vs a size_t which is what strlen returns (int and size_t do not have the same sign, making such loose conversions extremely dangerous).

The only reason document_parse handles int is so you can pass -1 so it will call strlen itself, but frankly, I'd just suggest making a totally separate function to do this. Or just not have this 'feature' at all -- users should always be expected to reasonably know the size of the fragment they pass to document_parse, right? It also assumes that the input to the string is actually null terminated, but if it isn't, strlen is going to do something very bad (by returning some massive integer). Assuming the input is hostile means I'm always going to assume someone, somehow, snuck in a non-null terminated string...

ndmitchell · 2016-12-14T22:21:34Z

Adding a hexml_ prefix is pure win.

There were good reasons for the int thing originally - you could pass a string which was not null terminated, and use the length. Or you could pass a string that was null terminated, and it wouldn't ever look for the trailing zero in advance, it emerged as a consequence of the operation, so saved you a string walk. As it stands, there's little reason for the -1 nastiness, and thus size_t makes more sense.

ndmitchell · 2016-12-15T09:48:53Z

I have prefixed all exported functions with hexml_ and added static annotations to the rest. Should I also be mangling the types? e.g. hexml_str? Should it really be hexml_str_t by convention?

Regarding signed operations - I am using int32_t in the str data type since I want to use less memory, and the document is essentially just str data types. It seems like that should really be uint32_t. For length of the string, size_t is plausible, but for most other contexts should I be using size_t? unsigned? uint32_t? What should I be doing by default, and what should I be converting to/from at the edges?

ndmitchell added a commit that referenced this issue Dec 15, 2016

#7, every C function should start hexml_ or be static

b88a0ed

ndmitchell added a commit that referenced this issue Dec 15, 2016

#7, avoid camel-case names in the C code

e06e524

ndmitchell added a commit that referenced this issue Dec 15, 2016

#7, rename peekAt to peek_at, for consistency

539ba4e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve the C-ness of the code #7

Improve the C-ness of the code #7

ndmitchell commented Dec 14, 2016

ndmitchell commented Dec 14, 2016

ndmitchell commented Dec 15, 2016

Improve the C-ness of the code #7

Improve the C-ness of the code #7

Comments

ndmitchell commented Dec 14, 2016

ndmitchell commented Dec 14, 2016

ndmitchell commented Dec 15, 2016