150: don't use regex to clean up XML following pretty-printing #681
Closes #150
The default xml.dom.minidom pretty-printer treats every child of a mixed-content element, text nodes included, as a node to be placed on its own line. As a result it adds a newline and indent for each text node and output element, which is ugly and wrong. survey.py then cleans that up with some regex.
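To make the problem concrete, here is a minimal sketch that uses only the standard library. The sample label and the helper name default_pretty are invented for illustration and are not taken from pyxform.

```python
from xml.dom import minidom


def default_pretty(xml_string, indent="  "):
    """Pretty-print with the stock minidom writer (illustrative helper)."""
    doc = minidom.parseString(xml_string)
    # toprettyxml delegates to each node's writexml, which indents and
    # terminates every child of a multi-child element, text nodes included.
    return doc.documentElement.toprettyxml(indent=indent)


# A label mixing plain text with an <output> reference (made-up sample).
sample = '<label>Hello <output value="/data/name"/>, welcome</label>'
print(default_pretty(sample))
```

With the stock writer, each text node and the output element land on their own indented line, so the text of the label gains whitespace that was not in the source.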
This change copies and customises the default writexml implementation from minidom so that it skips the newline and indent for text nodes. It also emits conditional whitespace to match the previous processing, in case Collect or Enketo expect the output to be exactly that way.
Added a test that specifically enumerates the output/text mixture permutations that appear in a variety of existing tests.
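The general idea can be sketched as follows. This is not the code from this PR: the names write_element and pretty_inline are hypothetical, the sketch assumes an element counts as mixed content whenever it has at least one text child, and it leaves out the conditional whitespace that the PR adds for backwards compatibility.

```python
from io import StringIO
from xml.dom import minidom
from xml.dom.minidom import Node
from xml.sax.saxutils import escape, quoteattr


def write_element(node, writer, indent="", addindent="", newl=""):
    """Hypothetical writer: like Element.writexml, but inline for mixed content."""
    writer.write(indent + "<" + node.tagName)
    for name, value in node.attributes.items():
        writer.write(" %s=%s" % (name, quoteattr(value)))
    children = node.childNodes
    if not children:
        writer.write("/>" + newl)
        return
    writer.write(">")
    # Assumption: any text child means the element holds mixed content.
    if any(c.nodeType == Node.TEXT_NODE for c in children):
        for child in children:
            if child.nodeType == Node.TEXT_NODE:
                writer.write(escape(child.data))
            elif child.nodeType == Node.ELEMENT_NODE:
                # Empty indent/newl defaults keep nested elements inline.
                write_element(child, writer)
            else:
                child.writexml(writer)
    else:
        # Element-only content: indent children as the stock writer does.
        writer.write(newl)
        for child in children:
            if child.nodeType == Node.ELEMENT_NODE:
                write_element(child, writer, indent + addindent, addindent, newl)
            else:
                child.writexml(writer, indent + addindent, addindent, newl)
        writer.write(indent)
    writer.write("</%s>%s" % (node.tagName, newl))


def pretty_inline(xml_string, indent="  ", newl="\n"):
    """Parse a string and pretty-print it with the writer above."""
    doc = minidom.parseString(xml_string)
    buffer = StringIO()
    write_element(doc.documentElement, buffer, "", indent, newl)
    return buffer.getvalue()


sample = (
    '<input ref="/data/q">'
    '<label>Hello <output value="/data/name"/>, welcome</label>'
    "</input>"
)
print(pretty_inline(sample))
```

In this sketch the input element is still indented as usual, while the label keeps its text and output reference on a single line, so no regex pass is needed afterwards.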
Why is this the best possible solution? Were any other approaches considered?
I think not outputting the undesirable whitespace in the first place is better than cleaning it up afterwards. Copying writexml adds a bit of complexity to the code base, but I think the trade-off versus the regex is worth it. Being able to do this without a new dependency is preferable. It may also allow more flexibility around users inputting whitespace in their question labels, without pyxform mangling it.
What are the regression risks?
While the added test should provide coverage, it's possible that some combination of text and references doesn't produce exactly the same whitespace as before. If Collect or Enketo are sensitive to that, the change in output might affect them. But that risk seems pretty small. I'm not sure that the whitespace is actually significant in the first place.
Does this change require updates to documentation? If so, please file an issue here and include the link below.
No
Before submitting this PR, please make sure you have:
included test cases in tests
run nosetests and verified all tests pass
run black pyxform tests to format code