Inline HTML handling is sloppy #9

rexxars · 2016-01-29T22:14:41Z

The fact that foo bar gets turned into foo bar is obviously horribly broken. It's tied to the fact that the walker gives  as an inline HTML element, then gives bar as a text node, followed by  as another inline html element. Not 100% how to best handle this yet.

The text was updated successfully, but these errors were encountered:

kadishmal · 2016-02-29T15:20:14Z

Stumbled upon this exact issue in remarkjs/react-markdown#17.

rexxars · 2016-02-29T15:21:22Z

Because of the way the commonmark parser is built, this one is pretty tricky to solve, unfortunately. Would love some help on this if someone has time.

kadishmal · 2016-02-29T15:22:45Z

I guessed that this comes from the parser. Any hint where I should look for this issue?

rexxars · 2016-02-29T15:34:31Z

Basically, when encountering foo bar, the parser yields four tokens:

Text with the literal foo
InlineHtml with the literal 
Text with the literal bar
InlineHtml with the literal

Because of how React works, we can't just inject "half a tag" somewhere. If the parser gave us bar, things would be a lot simpler, but this doesn't work because of things like the following: foo *bar*. In other words, markdown inside of inline HTML.

kadishmal · 2016-02-29T16:00:08Z

Yeah, I see. Seems like have to fall back to the plain Markdown processor and inject the output dangerously.

tauren · 2017-03-23T19:41:16Z

I also ran into this issue in remarkjs/react-markdown#67

Is this an issue in the commonmark parser, or the way it is used by commonmark-react-renderer? Has anyone looked into alternative commonmark parsers?

rexxars · 2017-03-28T17:25:30Z

The issue is basically that you'd need to pull in an HTML parser in order to properly handle this.
Let's imagine you have this HTML:

<span id=unquoted class="foo">Some *bold* <span class='outlined'>text</span></span>

In an ideal world, it would have to convert this to the following tree:

{
  tag: 'span',
  attrs: {id: 'unquoted', className: 'foo'},
  children: [
    'Some ',
    {tag: 'strong', children: ['bold']},
    {tag: 'span', attrs: {className: 'outlined'}, children: ['text']}
  ]
}

Which is non-trivial to implement with a parser that only emits tokens without having any notion of "depth". You'd have to keep track of the structure you're inside of, so that for the opening , you'd note that we're now entering a node and need to record every new token as a child, until you run into a new opening tag or a closing tag for the span. Then you'd also have to consider cases like optional closing tags (good old , for instance), incorrect HTML structures (FooMuh) and the likes. It really is a nightmare.

tauren · 2017-03-28T18:39:23Z

What are the objections to pulling in an HTML parser? Is it the size? Does the commonmark parser not help with this? Would changing parsers solve this?

The way things are now, inline html doesn't work, even for very simple and properly formed html:

<span>Hello</span>

Here is what gets rendered:

<p><span><span></span></span><!-- react-text: 89 -->Hello<!-- /react-text --><span></span></p>

I suggest that the project README reflect this situation until it can be solved and it should also be mentioned in the react-markdown readme. I suspect this might save others a bunch of time figuring out why things aren't rendering right.

rexxars · 2017-03-28T19:04:46Z

Well, even if you pull in an HTML parser (which will add significant weight), you'll still have to figure out how to deal with all the edge cases, as I outlined above (broken HTML, optional closing tags etc).

It's not a task I want to venture out on - my time is already stretched way too thin as it is. I'd be more than happy to accept a pull request that would improve the situation. Last time I thought about this problem, I came across html-to-react, which would solve much of this problem if only the commonmark parser returned blocks of HTML instead of just tokens for individual HTML fragments (start and end tags as individual tokens). So you'd still be left with trying to match up start and end tags manually.

I'll try to update the readmes to reflect this state.

tauren · 2017-03-28T19:25:56Z

Thanks for the explanation, and I understand being stretched too thin. I was hoping to get a clear picture of how you would solve this so that someone else could take on the work and create a PR you would potentially accept. It would also help to know what you don't want to see (adding significant weight to project, etc.).

I am stretched thin and don't have the time to solve this problem either. I'm working around the issue in my current project, but it keeps rearing it's ugly head and I may end up needing to work on a real solution.

kentcdodds · 2017-05-09T21:09:07Z

Hi there! I'm evaluating this project right now. Honestly, I wouldn't expect the contents of HTML to be parsed as markdown. Couldn't things be drastically simplified if we say that once you get into HTML-land we no longer parse it as markdown? Kinda like switching context from JSX to expressions and back again using { and }?

rexxars · 2017-05-10T07:58:49Z

Hi Kent!

For sure, things would be simpler if inline HTML was just emitted as one big chunk. I don't think this is consistent with other markdown parsers though, and might not necessarily be what you want. It also doesn't really solve the problem unless we switch to a different parser.

As outlined earlier, the parser itself outputs individual tokens. If it emitted a single token, it would be a different story.

Venryx · 2017-08-13T11:50:12Z

Note that while *Something* is, as you say, parsed by Remarkable as three tokens (so Markdown can be used inside), this is not true for <div>*Something*</div>.

In that case, Remarkable just takes everything inside as pure html -- which is why that is output as a single htmlblock token, rather than htmltag, text, and htmltag.

So it's kind of odd to me why the authors of Remarkable did this -- it seems kind of inconsistent. They have span allowing Markdown inside it, whereas div does not.

EDIT: Just realized this project uses the Commonmark parser instead of the Remarkable one; however, I suspect they may have the same difference in handling of <div> and .

localjo · 2017-10-27T16:50:46Z

I'm going to try to look into this more in the next two weeks. One question; if we could detect whether the node was an opening/closing tag of inline HTML with something like if (!props.isBlock && props.literal.match(/<([A-Z]+)[^>]+>/i)) how would we then skip creating a react element for that node? Or would it be better to group an opening tag, text node and closing tag together and then create a react element for that?

rexxars · 2017-10-28T14:25:11Z

First of all, thanks a ton for showing interest in helping with this! <3

Been a while since I looked at this, so I don't have a clear picture of the issue in my head right now. Having said that, here's the first few thoughts that pops into my head when we're dealing with this stuff:

If you are using react-markdown, beware that I've been wanting to switch to a different parser for some time. I've got a semi-working prototype going that uses remark-parse as the parser. I'd focus my efforts on that branch. If you're using react-commonmark, or just this parser in and of itself, this is the right place to address it.
Simply matching the opening and closing tags might be a good start, but keep in mind that to properly create React elements from the HTML, you'll want to also extract the attributes in the opening tag and apply them to the created element.
Things I'm worried about being difficult to handle properly:
- Invalid HTML (What happens if you put a  in there somewhere, and never close it? What if you open a , then a  and close the tags in the wrong order?)
- Properly parsing HTML without a real HTML parser
Having said that, my first suggestion (without having taken a stab at it) would be to:
1. Keep an array of open tags/nodes that you append to every time you hit an InlineHtml node
  - If you hit an empty element, parse it and produce a react element right away
2. Keep an array of all the other nodes produced since the last open html tag
3. When you encounter an InlineHtml node, parse it, see if it's a closing tag. If so, check if you have a corresponding open tag. If so, close all the open nodes that have been encountered after that tag, and add all those elements along with the array of produced nodes previously recorded as children of the current tag.

As far as I know, once a React element has been created, there isn't a way to add children to it, so you'll have no choice but to group the children together as you suggested.

Good luck!

localjo · 2017-10-30T14:00:48Z

Oh, very interesting. I'm familiar with remark-parse. I used a lot of remark plugins for https://github.com/sparkartgroup/quality-docs

That md-ast branch on react-markdown is 17 ahead and 54 behind master. What are your thoughts on rebasing that and then opening a PR with the current progress that we could use to track discussion?

rexxars · 2017-10-30T14:28:37Z

Sure, I'm super busy at work these days, but I'll see if I can find the time to get it up to speed sometime soon.

rexxars · 2017-11-04T00:00:49Z

Huge divergence from master, so I decided to just do a clean branch from master.
Anyway, check out https://github.com/rexxars/react-markdown/tree/remark if you're interested.

Since the current markdown parsing package has a known issue with rendering inline html (rexxars/commonmark-react-renderer#9) which has been an issue for 3 years+, we need a new markdown package. The react-markdown package has an option to render links with target="_blank" so this commit allows the component to pass that option through

rexxars added the bug label Jan 29, 2016

rexxars added the help wanted label Feb 29, 2016

tauren mentioned this issue Mar 23, 2017

HtmlBlock renderer for JSX remarkjs/react-markdown#67

Closed

tomvej mentioned this issue Apr 22, 2017

Obarvení 2 slov v průvodním textu na Home datoszs/czech-lawyers#120

Closed

This was referenced Oct 27, 2017

Support for remarkjs/react-markdown#16

Closed

Render Markdown Directly in Components nasa-gibs/worldview-components#28

Closed

DanielMSchmidt mentioned this issue Dec 4, 2017

Add link to snack demo styled-components/styled-components-website#180

Merged

dbatten5 mentioned this issue Jan 23, 2020

CHORE Allow markdown component to open links in new tabs appearhere/bloom#486

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inline HTML handling is sloppy #9

Inline HTML handling is sloppy #9

rexxars commented Jan 29, 2016

kadishmal commented Feb 29, 2016

rexxars commented Feb 29, 2016

kadishmal commented Feb 29, 2016

rexxars commented Feb 29, 2016

kadishmal commented Feb 29, 2016

tauren commented Mar 23, 2017 •

edited

Loading

rexxars commented Mar 28, 2017 •

edited

Loading

tauren commented Mar 28, 2017

rexxars commented Mar 28, 2017

tauren commented Mar 28, 2017

kentcdodds commented May 9, 2017

rexxars commented May 10, 2017

Venryx commented Aug 13, 2017 •

edited

Loading

localjo commented Oct 27, 2017

rexxars commented Oct 28, 2017

localjo commented Oct 30, 2017

rexxars commented Oct 30, 2017

rexxars commented Nov 4, 2017

Inline HTML handling is sloppy #9

Inline HTML handling is sloppy #9

Comments

rexxars commented Jan 29, 2016

kadishmal commented Feb 29, 2016

rexxars commented Feb 29, 2016

kadishmal commented Feb 29, 2016

rexxars commented Feb 29, 2016

kadishmal commented Feb 29, 2016

tauren commented Mar 23, 2017 • edited Loading

rexxars commented Mar 28, 2017 • edited Loading

tauren commented Mar 28, 2017

rexxars commented Mar 28, 2017

tauren commented Mar 28, 2017

kentcdodds commented May 9, 2017

rexxars commented May 10, 2017

Venryx commented Aug 13, 2017 • edited Loading

localjo commented Oct 27, 2017

rexxars commented Oct 28, 2017

localjo commented Oct 30, 2017

rexxars commented Oct 30, 2017

rexxars commented Nov 4, 2017

tauren commented Mar 23, 2017 •

edited

Loading

rexxars commented Mar 28, 2017 •

edited

Loading

Venryx commented Aug 13, 2017 •

edited

Loading