Inline HTML handling is sloppy #9
Stumbled upon this exact issue in remarkjs/react-markdown#17.
Because of the way the commonmark parser is built, this one is pretty tricky to solve, unfortunately. Would love some help on this if someone has time.
I guessed that this comes from the parser. Any hint as to where I should look for this issue?
Basically, when encountering
Because of how React works, we can't just inject "half a tag" somewhere. If the parser gave us
Yeah, I see. Seems like I'll have to fall back to the plain Markdown processor and inject the output dangerously (via `dangerouslySetInnerHTML`).
I also ran into this issue in remarkjs/react-markdown#67. Is this an issue in the commonmark parser, or in the way it is used by commonmark-react-renderer? Has anyone looked into alternative CommonMark parsers?
The issue is basically that you'd need to pull in an HTML parser in order to properly handle this. For example, given:

```html
<span id=unquoted class="foo">Some *bold* <span class='outlined'>text</span></span>
```

In an ideal world, it would have to be converted to the following tree:

```js
{
  tag: 'span',
  attrs: {id: 'unquoted', className: 'foo'},
  children: [
    'Some ',
    {tag: 'strong', children: ['bold']},
    ' ',
    {tag: 'span', attrs: {className: 'outlined'}, children: ['text']}
  ]
}
```

This is non-trivial to implement with a parser that only emits tokens without any notion of "depth". You'd have to keep track of the structure you're inside of, so that for the opening
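The bookkeeping described above can be sketched with a stack of open elements. This is a minimal sketch under stated assumptions, not the renderer's actual code: the token shapes (`text`, `node`, `open`, `close`) are hypothetical, it assumes each inline HTML fragment has already been classified as a start or end tag, and it assumes Markdown constructs such as `*bold*` arrive as already-nested nodes. Unmatched end tags are kept as literal text and elements left open are closed at the end of input, which is only one possible policy for broken HTML.

```javascript
// Hypothetical token shapes (not the commonmark.js API):
//   { type: "text", value: "Some " }
//   { type: "node", node: { tag: "strong", children: ["bold"] } }  // Markdown, already nested
//   { type: "open", tag: "span", attrs: { className: "foo" } }     // inline HTML start tag
//   { type: "close", tag: "span" }                                   // inline HTML end tag
function buildTree(tokens) {
  const root = { tag: null, attrs: {}, children: [] };
  const stack = [root]; // the innermost open element is always last

  for (const token of tokens) {
    const current = stack[stack.length - 1];
    switch (token.type) {
      case "text":
        current.children.push(token.value);
        break;
      case "node":
        current.children.push(token.node);
        break;
      case "open": {
        const element = { tag: token.tag, attrs: token.attrs || {}, children: [] };
        current.children.push(element);
        stack.push(element); // following tokens become its children
        break;
      }
      case "close": {
        // Look for the nearest open element with the same tag name.
        let i = stack.length - 1;
        while (i > 0 && stack[i].tag !== token.tag) i--;
        if (i === 0) {
          // No matching start tag: keep the fragment as literal text.
          current.children.push(`</${token.tag}>`);
        } else {
          // Pop the matched element plus any unclosed elements nested in it.
          stack.length = i;
        }
        break;
      }
      default:
        throw new Error(`Unknown token type: ${token.type}`);
    }
  }
  // Elements still on the stack are implicitly closed at the end of input.
  return root.children;
}

// The example from the comment above, tokenized by hand.
const tree = buildTree([
  { type: "open", tag: "span", attrs: { id: "unquoted", className: "foo" } },
  { type: "text", value: "Some " },
  { type: "node", node: { tag: "strong", children: ["bold"] } },
  { type: "text", value: " " },
  { type: "open", tag: "span", attrs: { className: "outlined" } },
  { type: "text", value: "text" },
  { type: "close", tag: "span" },
  { type: "close", tag: "span" },
]);
// tree[0] is the outer span; its children are "Some ", the strong node, " " and the inner span.
```

A stack fits because HTML nesting is last-in, first-out: an end tag normally closes the most recently opened element. Searching down the stack for a matching name, rather than only checking the top, is what lets an inner element with an omitted end tag be closed implicitly.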
What are the objections to pulling in an HTML parser? Is it the size? Does the commonmark parser not help with this? Would changing parsers solve this? The way things are now, inline HTML doesn't work, even for very simple and properly formed HTML:
Here is what gets rendered:
I suggest that the project README reflect this situation until it can be solved, and it should also be mentioned in the
Well, even if you pull in an HTML parser (which will add significant weight), you'll still have to figure out how to deal with all the edge cases, as I outlined above (broken HTML, optional closing tags, etc.). It's not a task I want to venture out on; my time is already stretched way too thin as it is. I'd be more than happy to accept a pull request that improves the situation. Last time I thought about this problem, I came across html-to-react, which would solve much of it if only the commonmark parser returned blocks of HTML instead of separate tokens for individual HTML fragments (start and end tags as individual tokens). As things stand, you'd still be left with matching up start and end tags manually. I'll try to update the READMEs to reflect this state.
Thanks for the explanation, and I understand being stretched too thin. I was hoping to get a clear picture of how you would solve this so that someone else could take on the work and create a PR you would potentially accept. It would also help to know what you don't want to see (adding significant weight to the project, etc.). I am stretched thin too and don't have the time to solve this problem either. I'm working around the issue in my current project, but it keeps rearing its ugly head and I may end up needing to work on a real solution.
Hi there! I'm evaluating this project right now. Honestly, I wouldn't expect the contents of HTML to be parsed as Markdown. Couldn't things be drastically simplified if we say that once you get into HTML-land we no longer parse it as Markdown? Kinda like switching context from JSX to expressions and back again using
Hi Kent! For sure, things would be simpler if inline HTML was just emitted as one big chunk. I don't think this is consistent with other Markdown parsers though, and it might not necessarily be what you want. It also doesn't really solve the problem unless we switch to a different parser. As outlined earlier, the parser itself outputs individual tokens. If it emitted a single token, it would be a different story.
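For comparison, the "one big chunk" approach can be sketched as a post-processing pass that merges everything from an opening tag to its matching closing tag into a single HTML token, which could then be injected as raw HTML instead of being rendered as React children. This is a hypothetical sketch: `collapseInlineHtml` is an invented helper, it assumes each token still carries its original source text in a `raw` field (the thread does not say the parser provides this), and it deliberately does not parse Markdown inside the merged chunk, in line with the suggestion above.

```javascript
// Hypothetical token shape: { type: "html" | "text" | ..., raw: "original source text" }
const OPEN_TAG = /^<([A-Za-z][A-Za-z0-9-]*)\b[^>]*?(\/)?>$/;
const CLOSE_TAG = /^<\/([A-Za-z][A-Za-z0-9-]*)\s*>$/;
// HTML void elements never have an end tag, so they must not start a chunk.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

function collapseInlineHtml(tokens) {
  const out = [];
  let chunk = null; // { tag, depth, raw } while inside an HTML element

  for (const token of tokens) {
    if (chunk) {
      // Inside an element: keep the source text verbatim, Markdown included.
      chunk.raw += token.raw;
      if (token.type !== "html") continue;
      const nestedOpen = OPEN_TAG.exec(token.raw);
      const close = CLOSE_TAG.exec(token.raw);
      if (nestedOpen && !nestedOpen[2] && nestedOpen[1].toLowerCase() === chunk.tag) {
        chunk.depth += 1; // same tag nested inside itself
      } else if (close && close[1].toLowerCase() === chunk.tag) {
        chunk.depth -= 1;
        if (chunk.depth === 0) {
          out.push({ type: "html", raw: chunk.raw });
          chunk = null;
        }
      }
      continue;
    }

    const open = token.type === "html" ? OPEN_TAG.exec(token.raw) : null;
    const tag = open && open[1].toLowerCase();
    if (open && !open[2] && !VOID_ELEMENTS.has(tag)) {
      chunk = { tag, depth: 1, raw: token.raw };
    } else {
      out.push(token);
    }
  }
  // An element that is never closed is still emitted as one chunk.
  if (chunk) out.push({ type: "html", raw: chunk.raw });
  return out;
}
```

Void elements such as `<br>` are excluded explicitly; without that check, a bare `<br>` would open a chunk that never closes and swallow the rest of the paragraph.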
Note that while In that case, Remarkable just takes everything inside as pure HTML, which is why that is output as a single So it's kind of odd to me why the authors of Remarkable did this; it seems kind of inconsistent. They have `span` allowing Markdown inside it, whereas `div` does not.
EDIT: Just realized this project uses the CommonMark parser instead of the Remarkable one; however, I suspect they may have the same difference in handling of
I'm going to try to look into this more in the next two weeks. One question: if we could detect whether the node was an opening/closing tag of inline HTML with something like
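Detecting whether an inline HTML fragment is an opening or closing tag could look roughly like the helper below. It is a hypothetical, regex-based sketch rather than a real HTML tokenizer: `classifyHtmlInline` and its return shape are invented for illustration. It handles double-quoted, single-quoted, unquoted and bare attributes, and renames only `class` and `for` to the React prop names `className` and `htmlFor`.

```javascript
// Hypothetical helper: takes the raw text of one inline HTML fragment.
const ATTR_RENAMES = { class: "className", for: "htmlFor" };
const TAG = /^<(\/)?([A-Za-z][A-Za-z0-9-]*)([^>]*?)(\/)?>$/;
const ATTR = /([^\s"'\/=>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+)))?/g;

function classifyHtmlInline(literal) {
  const match = TAG.exec(literal.trim());
  // Comments, processing instructions, etc. are not tags we can pair up.
  if (!match) return { kind: "other", literal };
  const [, slash, name, attrText, selfClose] = match;
  const tag = name.toLowerCase();
  if (slash) return { kind: "close", tag };

  const attrs = {};
  for (const [, key, dq, sq, bare] of attrText.matchAll(ATTR)) {
    const prop = ATTR_RENAMES[key.toLowerCase()] || key;
    // A bare attribute such as `disabled` gets the value `true`.
    attrs[prop] = dq ?? sq ?? bare ?? true;
  }
  return { kind: selfClose ? "selfClosing" : "open", tag, attrs };
}
```

A regex is enough to illustrate the idea but will misread edge cases such as a `>` inside a quoted attribute value. A real implementation would also need the rest of React's camelCased attribute names, and would have to convert `style` strings into objects, since React expects `style` to be an object.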
First of all, thanks a ton for showing interest in helping with this! <3 It's been a while since I looked at this, so I don't have a clear picture of the issue in my head right now. Having said that, here are the first few thoughts that pop into my head when dealing with this stuff:
As far as I know, once a React element has been created, there isn't a way to add children to it, so you'll have no choice but to group the children together as you suggested. Good luck!
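Since an element can't be given extra children after creation, one way to respect that is to build the tree first and create elements bottom-up, so that each element is created only once all of its children exist. In this sketch `toElements` is a hypothetical helper and the tree shape is the one from the earlier comment; the factory `h` is assumed to have the signature of `React.createElement(type, props, ...children)`, which keeps the sketch free of a React dependency.

```javascript
// Tree node shape, as in the ideal tree above: either a string or { tag, attrs, children }.
// `h` is any factory with the signature of React.createElement(type, props, ...children).
function toElements(nodes, h) {
  return nodes.map((node, index) => {
    if (typeof node === "string") return node;
    // Create the children first: an element can't be handed more children after it
    // exists, so each element is created only once its whole subtree is ready.
    const children = toElements(node.children || [], h);
    return h(node.tag, { ...(node.attrs || {}), key: index }, ...children);
  });
}

// Stand-in factory so the sketch runs without React; with React you would pass
// React.createElement instead.
const h = (type, props, ...children) => ({ type, props, children });

const elements = toElements(
  [{ tag: "span", attrs: { className: "foo" }, children: ["Some ", { tag: "strong", children: ["bold"] }] }],
  h
);
```

With React available, `toElements(tree, React.createElement)` would return an array of elements; the index-based `key` is there because React warns about missing keys when an array of elements is rendered.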
Oh, very interesting. I'm familiar with That
Sure, I'm super busy at work these days, but I'll see if I can find the time to get it up to speed sometime soon.
Huge divergence from master, so I decided to just create a clean branch from master.
Since the current Markdown parsing package has a known issue with rendering inline HTML (rexxars/commonmark-react-renderer#9) that has been open for over 3 years, we need a new Markdown package. The react-markdown package has an option to render links with target="_blank", so this commit allows the component to pass that option through.
The fact that `foo <strong>bar</strong>` gets turned into `foo <span><strong></span>bar<span></strong></span>` is obviously horribly broken. It's tied to the fact that the walker gives `<strong>` as an inline HTML element, then gives `bar` as a text node, followed by `</strong>` as another inline HTML element. Not 100% sure how best to handle this yet.
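The failure mode can be reproduced without the parser at all. The events below are hand-written to mirror what the walker is described as producing for `foo <strong>bar</strong>`; the `type` and `literal` names follow commonmark.js node properties, but the array itself and `renderNaively` are illustrative assumptions, not the renderer's code. Because each inline HTML node is rendered in isolation, each tag fragment gets its own wrapper.

```javascript
// Hand-written events for `foo <strong>bar</strong>`: the two tags arrive as
// separate inline HTML nodes with a text node between them.
const events = [
  { type: "text", literal: "foo " },
  { type: "html_inline", literal: "<strong>" },
  { type: "text", literal: "bar" },
  { type: "html_inline", literal: "</strong>" },
];

// Rendering each inline HTML node on its own, the way a per-node renderer must,
// gives every fragment its own wrapper element.
function renderNaively(nodes) {
  return nodes
    .map((node) => (node.type === "html_inline" ? `<span>${node.literal}</span>` : node.literal))
    .join("");
}

console.log(renderNaively(events));
// foo <span><strong></span>bar<span></strong></span>
```

Each wrapper only ever sees half of the `<strong>` element, so the opening and closing tags can never pair up, no matter how the fragments are later placed in the DOM.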