Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DTD parsing issue resulting in ExpectedClosingQuote error. #60

Merged
merged 2 commits into from
Mar 14, 2018

Conversation

onelson
Copy link
Contributor

@onelson onelson commented Mar 10, 2018

Certain characters (such as /) in the document type declaration (dtd)
could cause the parser to fail.
This updates sxd_document::parser::parse_external_id() to use a
slightly different (more permissive) function to consume the attribute
string.

refs #50, #59

src/parser.rs Outdated
/* without the optional intSubset */
fn parse_document_type_declaration<'a>(pm: &mut XmlMaster<'a>, xml: StringPoint<'a>) -> XmlProgress<'a, Token<'a>> {
let (xml, _) = try_parse!(xml.expect_literal("<!DOCTYPE"));
let (xml, _) = try_parse!(xml.expect_space());
let (xml, _type_name) = try_parse!(xml.consume_name().map_err(|_| SpecificError::ExpectedDocumentTypeName));
let (xml, _external_id) = try_parse!(parse_external_id(pm, xml));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The corresponding grammar here is

'<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'

This means that 0, 1, or 2 of the external ID and an internal subset can be present. The current code only supports exactly one or the other, closer to this grammar:

'<!DOCTYPE' S Name (ext | int) '>'

This PR is still an improvement though! I see two paths:

  1. If you'd be up for adding more support, then we can add tests for the 0 and 2 cases.
  2. If you'd rather stop here, I'd prefer a comment be added to this newly-added alternate stating that the grammar isn't yet fully correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take another quick swing at it. I misunderstood the spec thinking you could have at most one or the other. Interesting.

@onelson
Copy link
Contributor Author

onelson commented Mar 12, 2018

I've been flip-flopping a bit on terminology. Moving towards using "int subset" rather than "internal dtd."

Also added some extra tests to try and cover the various combinations (including explicit checks for whitespace handling towards the end of the node). I think this is much closer to the spec while still skipping the modeling of the individual elements inside the int subset.

@onelson
Copy link
Contributor Author

onelson commented Mar 13, 2018

Patched this fork in today at work, and it seems to be working well for us ;)

Certain characters (such as `/`) in the document type declaration (DTD)
could cause the parser to fail.

This updates `parse_external_id()` to use a slightly more permissive
function to consume the attribute string all the way to the closing
quote.
@shepmaster shepmaster changed the title Fixes dtd parser issue resulting in ExpectedClosingQuote error. Fix DTD parsing issue resulting in ExpectedClosingQuote error. Mar 14, 2018
@shepmaster
Copy link
Owner

As a heads-up, I made some minor changes to your commits: trailing \ aren't needed and using some raw string literals (r#"..."). I also squashed the latter 3 commits into one. I've force-pushed to your branch and will merge once tests re-run.

@shepmaster shepmaster merged commit 4d91a58 into shepmaster:master Mar 14, 2018
@onelson
Copy link
Contributor Author

onelson commented Mar 14, 2018

Of course. I'm just a guest here. This is your house!
I'm just glad to be a part of the process 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants