Strings are not properly escaped in JUnit XML reports #163

Open
generalmimon opened this issue Jul 25, 2024 · 1 comment

Reproduction code:

test_reproducer.lua

local lu = require('luaunit')

function test_str_compare_null_byte()
    local actual   = "q\000\000\002w\000"
    local expected = "q\000\000\002w\000\000"

    lu.assertEquals(actual, expected)
end

os.exit( lu.LuaUnit.run() )

$ lua test_reproducer.lua --output junit --name report | cat --show-nonprinting
# XML output to report.xml
# Started on 07/25/24 16:34:54
# Starting test: test_str_compare_null_byte
#   Failure:  test_reproducer.lua:7: expected: "q^@^@^Bw^@^@"
#   actual: "q^@^@^Bw^@"
# Ran 1 tests in 0.002 seconds, 0 successes, 1 failure

The problem is that the JUnit XML report contains these control characters unescaped, just like the console output. XML 1.0 does not allow U+0000 or U+0002 anywhere in a document, so the report is not well-formed XML, and the XML parsers I've tried refuse to read it:

$ cat --show-nonprinting report.xml
<?xml version="1.0" encoding="UTF-8" ?>
<testsuites>
    <testsuite name="LuaUnit" id="00001" package="" hostname="localhost" tests="1" timestamp="2024-07-25T16:36:05" time="0.003" errors="0" failures="1" skipped="0">
        <properties>
            <property name="Lua Version" value="Lua 5.3"/>
            <property name="LuaUnit Version" value="3.4"/>
        </properties>
        <testcase classname="[TestFunctions]" name="test_str_compare_null_byte" time="0.002">
            <failure type="test_reproducer.lua:7: expected: &quot;q^@^@^Bw^@^@&quot;
actual: &quot;q^@^@^Bw^@&quot;">
                <![CDATA[stack traceback:
	test_reproducer.lua:7: in function 'test_str_compare_null_byte']]></failure>
        </testcase>
    <system-out/>
    <system-err/>
    </testsuite>
</testsuites>
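
For reference, XML 1.0 permits only tab (U+0009), line feed (U+000A), carriage return (U+000D) and code points from U+0020 upward (excluding surrogates, U+FFFE and U+FFFF). A minimal Python sketch of that rule, which locates the offending characters in the "expected" string from the reproducer; the helper names `is_xml10_char` and `find_forbidden` are my own and not part of LuaUnit:

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_forbidden(text: str) -> list[tuple[int, str]]:
    """List (index, 'U+XXXX') for every character XML 1.0 forbids."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ch)
    ]


# The "expected" string from the reproducer above.
expected = "q\x00\x00\x02w\x00\x00"
print(find_forbidden(expected))
```

Only `q` and `w` pass the check; the other five characters are reported with their index.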

I tried:

  • Ruby

    parse_xml.rb

    require 'rexml/document'
    
    REXML::Document.new(File.read('report.xml'))

    Console output
    $ ruby parse_xml.rb
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:96:in `rescue in parse': #<RuntimeError: Illegal character "\u0000" in raw string "test_reproducer.lua:7: expected: &quot;q\u0000\u0000\u0002w\u0000\u0000&quot;\nactual: &quot;q\u0000\u0000\u0002w\u0000&quot;"> (REXML::ParseException)
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:140:in `block in check'
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `each'
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `check'
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/attribute.rb:175:in `element='
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/element.rb:2384:in `[]='
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:36:in `block in parse'
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `each'
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `parse'
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
    parse_xml.rb:3:in `new'
    parse_xml.rb:3:in `<main>'
    ...
    Illegal character "\u0000" in raw string "test_reproducer.lua:7: expected: &quot;q\u0000\u0000\u0002w\u0000\u0000&quot;\nactual: &quot;q\u0000\u0000\u0002w\u0000&quot;"
    Line: 10
    Position: 581
    Last 80 unconsumed characters:
    
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:21:in `parse'
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
      from parse_xml.rb:3:in `new'
      from parse_xml.rb:3:in `<main>'
    C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:140:in `block in check': Illegal character "\u0000" in raw string "test_reproducer.lua:7: expected: &quot;q\u0000\u0000\u0002w\u0000\u0000&quot;\nactual: &quot;q\u0000\u0000\u0002w\u0000&quot;" (RuntimeError)
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `each'
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/text.rb:136:in `check'
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/attribute.rb:175:in `element='
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/element.rb:2384:in `[]='
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:36:in `block in parse'
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `each'
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/parsers/treeparser.rb:35:in `parse'
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:448:in `build'
      from C:/Ruby32-x64/lib/ruby/gems/3.2.0/gems/rexml-3.2.6/lib/rexml/document.rb:101:in `initialize'
      from parse_xml.rb:3:in `new'
      from parse_xml.rb:3:in `<main>'
  • Python: pip install defusedxml

    parse_xml.py

    from defusedxml.ElementTree import parse
    et = parse('report.xml')

    Console output
    $ python parse_xml.py
    Traceback (most recent call last):
      File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1706, in feed
        self.parser.Parse(data, False)
    xml.parsers.expat.ExpatError: not well-formed (invalid token): line 9, column 67
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "C:\temp\ks-experiments\luaunit-bug\parse_xml.py", line 2, in <module>
        et = parse('report.xml')
            ^^^^^^^^^^^^^^^^^^^
      File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\site-packages\defusedxml\common.py", line 100, in parse
        return _parse(source, parser)
              ^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1204, in parse
        tree.parse(source, parser)
      File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 572, in parse
        parser.feed(data)
      File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1708, in feed
        self._raiseerror(v)
      File "C:\Users\pp\.pyenv\pyenv-win\versions\3.12.4\Lib\xml\etree\ElementTree.py", line 1615, in _raiseerror
        raise err
    xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 9, column 67
    
  • xmllint (manpage, available on Ubuntu in the libxml2-utils package)

    Console output
    $ xmllint report.xml
    report.xml:9: parser error : Char 0x0 out of allowed range
                <failure type="test_reproducer.lua:7: expected: &quot;q
                                                                      ^
    report.xml:9: parser error : AttValue: ' expected
                <failure type="test_reproducer.lua:7: expected: &quot;q
                                                                      ^
    report.xml:9: parser error : attributes construct error
                <failure type="test_reproducer.lua:7: expected: &quot;q
                                                                      ^
    report.xml:9: parser error : Couldn't find end of Start Tag failure line 9
                <failure type="test_reproducer.lua:7: expected: &quot;q
                                                                      ^
    report.xml:9: parser error : Premature end of data in tag testcase line 8
                <failure type="test_reproducer.lua:7: expected: &quot;q
                                                                      ^
    

As you can see, all three reject the XML report with an error, which indicates that it is not well-formed XML.
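
The rejection can also be reproduced without LuaUnit. Below is a small sketch using Python's standard `xml.etree.ElementTree` (the expat-backed parser that appears in the traceback above). It also checks whether writing the NUL as the character reference `&#0;` would help; XML 1.0 forbids that as well, so a backslash-style escape in plain ASCII is the variant that should survive. The helper name `parses` is my own:

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw_nul = '<failure type="q\x00w"/>'    # literal NUL character
ref_nul = '<failure type="q&#0;w"/>'    # NUL as a character reference
escaped = '<failure type="q\\x00w"/>'   # backslash escape, plain ASCII

for label, doc in [("raw", raw_nul), ("reference", ref_nul), ("escaped", escaped)]:
    print(label, parses(doc))
```

Only the escaped variant is expected to be accepted, which suggests the fix has to happen in how the message is rendered rather than in XML-level escaping.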


generalmimon commented Jul 25, 2024

You might want to take a look at, for example, https://github.com/xmlrunner/unittest-xml-reporting from the Python ecosystem to see how it handles this situation:

import unittest

class TestRepro(unittest.TestCase):
    def test_str_compare_null_byte(self):
        actual   = "q\u0000\u0000\u0002w\u0000"
        expected = "q\u0000\u0000\u0002w\u0000\u0000"
        self.assertEqual(actual, expected)

if __name__ == '__main__':
    unittest.main()

Make sure to pip install unittest-xml-reporting first (I'm using the latest version 3.2.0):

$ python -m xmlrunner --outsuffix '' 2>&1 | cat --show-nonprinting

Running tests...
----------------------------------------------------------------------
F
======================================================================
FAIL [0.001s]: test_str_compare_null_byte (test_repro.TestRepro.test_str_compare_null_byte)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\temp\ks-experiments\luaunit-bug\test_repro.py", line 7, in test_str_compare_null_byte
    self.assertEqual(actual, expected)
AssertionError: 'q\x00\x00\x02w\x00' != 'q\x00\x00\x02w\x00\x00'
- q^@^@^Bw^@
+ q^@^@^Bw^@^@
?       +


----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (failures=1)

Generating XML reports...

TEST-test_repro.TestRepro.xml

<?xml version="1.0" encoding="UTF-8"?>
<testsuite name="test_repro.TestRepro" tests="1" file="test_repro.py" time="0.001" timestamp="2024-07-25T17:03:59" failures="1" errors="0" skipped="0">
	<testcase classname="test_repro.TestRepro" name="test_str_compare_null_byte" time="0.001" timestamp="2024-07-25T17:03:59" file="test_repro.py" line="4">
		<failure type="AssertionError" message="'q\x00\x00\x02w\x00' != 'q\x00\x00\x02w\x00\x00'
- qw
+ qw
?       +
"><![CDATA[Traceback (most recent call last):
  File "C:\temp\ks-experiments\luaunit-bug\test_repro.py", line 7, in test_str_compare_null_byte
    self.assertEqual(actual, expected)
AssertionError: 'q\x00\x00\x02w\x00' != 'q\x00\x00\x02w\x00\x00'
- qw
+ qw
?       +

]]></failure>
	</testcase>
</testsuite>

Note that the assertion message is escaped ('q\x00\x00\x02w\x00' != 'q\x00\x00\x02w\x00\x00'), which means that (1) the representation uses only printable ASCII characters, so it causes no problems in the XML report or elsewhere, and (2) the full contents of each string are captured, so the message is always meaningful for debugging.
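
One way to get a similar result is to escape every byte outside printable ASCII before the message goes into the report. Here is a minimal sketch of the idea in Python, under my own assumptions: the `escape_bytes` name and the exact `\xNN` notation are illustrative, not what unittest-xml-reporting or LuaUnit actually implement.

```python
from xml.sax.saxutils import quoteattr


def escape_bytes(data: bytes) -> str:
    """Render data with printable ASCII only; other bytes become \\xNN."""
    out = []
    for b in data:
        if b == 0x5C:                # backslash: double it to stay unambiguous
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:      # printable ASCII passes through
            out.append(chr(b))
        else:                        # control and non-ASCII bytes
            out.append(f"\\x{b:02x}")
    return "".join(out)


actual = b"q\x00\x00\x02w\x00"
message = 'actual: "' + escape_bytes(actual) + '"'
# quoteattr() adds the surrounding quotes and escapes markup characters.
print("<failure type=" + quoteattr(message) + "/>")
```

Doubling the backslash keeps the output reversible: a literal `\x00` in the tested string cannot be confused with an escaped NUL byte.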

It also tries to display a vertical diff (the - qw and + qw lines), but in this case the diff is useless, because the control characters were filtered out of it. That is still better than writing them to the XML, which would make the file invalid, and the real contents of both strings are already clear from the escaped form, so it doesn't matter.
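
Filtering of this kind can be sketched with a regular expression over the code points that XML 1.0 does not allow. This is an illustration under my own naming (`FORBIDDEN`, `strip_forbidden`), not the library's actual code:

```python
import re

# Code points that may not appear in an XML 1.0 document: C0 controls other
# than tab, LF and CR, the surrogate range, and U+FFFE / U+FFFF.
FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def strip_forbidden(text: str) -> str:
    """Remove every character that XML 1.0 forbids; keep everything else."""
    return FORBIDDEN.sub("", text)


print(strip_forbidden("- q\x00\x00\x02w\x00"))  # prints "- qw"
```

The result is always safe to write into the report, but as the diff lines show, it loses exactly the information that distinguished the two strings.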

I checked that the generated TEST-test_repro.TestRepro.xml only contains basic ASCII characters as follows:

$ xxd -ps -c 1 TEST-test_repro.TestRepro.xml | sort -u
09
0a
20
(...)
79

(the (...) marks the part I've omitted; otherwise the listing would be unnecessarily long)

09 is horizontal tab (often written \t), 0a is line feed (often written \n), and everything from 20 to 7e (inclusive) is a printable character (see https://en.wikipedia.org/wiki/ASCII#Printable_characters), so there's nothing problematic.
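
The same check can be done in Python instead of `xxd | sort -u`. A small sketch (the function name `unexpected_bytes` is my own) that reports every distinct byte that is not tab, line feed, or printable ASCII:

```python
def unexpected_bytes(data: bytes) -> list[str]:
    """Return the distinct bytes in data, as two-digit hex, that are not
    tab, line feed, or printable ASCII (0x20 to 0x7e)."""
    allowed = {0x09, 0x0A} | set(range(0x20, 0x7F))
    return [f"{b:02x}" for b in sorted(set(data) - allowed)]


# Stand-in sample; for the real report, read the file in binary mode, e.g.
# open("TEST-test_repro.TestRepro.xml", "rb").read()
sample = b'<failure message="q\\x00w">\n\tok\n</failure>\n'
print(unexpected_bytes(sample))              # escaped text only
print(unexpected_bytes(b"q\x00\x00\x02w"))   # raw control bytes
```

An empty list means the file contains nothing outside the byte values listed above.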
