Doctype gets removed #16

Open
JocelynDelalande opened this issue May 19, 2015 · 2 comments


@JocelynDelalande
Contributor

If I process an HTML document that has a doctype declaration with toronado, the doctype is removed from the output.

import toronado

toronado.from_string(
"""<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"></html>
""")

--> '<html xmlns="http://www.w3.org/1999/xhtml"></html>'
@tkaemming
Collaborator

I confirmed this with an (admittedly brittle) test on the doctype branch: 89148e8.

I don't have any immediate thoughts on a good fix. My first impression is that part of the issue is that lxml.html.fromstring paves over the inconsistencies between handling documents and fragments, abandoning some of the associated root-level metadata (such as the doctype) in the process.

It would probably make sense for the fix to provide a new API method (e.g. toronado.from_document) to avoid any backwards-incompatible implementation changes, unless it's possible to continue supporting both documents and fragments with the same method.

I don't necessarily have time to work on this in the very near future, but am happy to merge any fixes you're able to provide (given that there is test coverage.)

@tkaemming
Collaborator

It looks like the fix could be as simple as adding a toronado.from_document that uses document_fromstring and tostring instead, passing doctype as a keyword argument to tostring.
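
A rough sketch of what that suggestion might look like, using only lxml. This is not toronado's actual code: the function name from_document comes from the comment above, while the transform parameter is a hypothetical stand-in for whatever tree-level inlining step the library would apply. It assumes the doctype parsed by document_fromstring can be read back from the tree's docinfo.

```python
import lxml.html


def from_document(html, transform=None):
    """Parse a full HTML document, optionally transform it, keep its doctype.

    `transform` is a hypothetical hook: any callable that mutates the parsed
    tree in place (for example, a CSS-inlining step).
    """
    # document_fromstring parses a whole document rather than a fragment,
    # so the doctype remains available on the tree's docinfo.
    root = lxml.html.document_fromstring(html)

    if transform is not None:
        transform(root)

    # docinfo.doctype is an empty string when no doctype is recorded;
    # tostring() expects None in that case.
    doctype = root.getroottree().docinfo.doctype or None

    # Serializing an element (not an ElementTree) does not emit the doctype
    # by itself, so it is passed explicitly via the keyword argument.
    return lxml.html.tostring(root, doctype=doctype, encoding="unicode")


if __name__ == "__main__":
    source = (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml"></html>\n'
    )
    print(from_document(source))
```

Keeping this as a separate function leaves from_string untouched for fragment input, which matches the backwards-compatibility concern raised earlier in the thread.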
