Skip to content
This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

Parent covers children invariant is not respected #394

Open
m09 opened this issue Mar 28, 2019 · 7 comments
Open

Parent covers children invariant is not respected #394

m09 opened this issue Mar 28, 2019 · 7 comments

Comments

@m09
Copy link

m09 commented Mar 28, 2019

With the concept of leading comments in the UAST, the simple invariant of parent covering their children is not respected.

I think this might be non-intuitive to many users.

To reproduce:

// This is a comment

var a = 1;

The initial comment is a child of the expression assignment node (on current bblfsh web for example).

@dennwc
Copy link
Member

dennwc commented Mar 28, 2019

It depends on the parsing mode you use.

For Native and Annotated, the AST is not normalized, thus it preserves any form that the JS parser emits. According to your comment, it looks like JS parser does not preserve this invariant.

For Semantic mode this invariant usually is not respected either, but for a different reason. All languages have different AST shapes and the (normalized) UAST structure may break the invariant by moving nodes around.

But for your specific case, this issue is caused by bblfsh/javascript-driver#74 - the UAST pipeline does not process comments correctly, and they are preserved in the same places as in Native mode.

@m09
Copy link
Author

m09 commented Mar 28, 2019

Thanks for the clarifications.

@dennwc dennwc closed this as completed Mar 28, 2019
@vmarkovtsev
Copy link

@dennwc @creachadair I wouldn't really close this one. Although I do understand the implemented logic,

  1. Somebody has to calculate the "correct" node spans and currently, it is us, the ML team.
  2. This behavior is confusing a common user. Tested on all 7 of us, and additionally on Martin's research group in KTH Stockholm. Everybody wants the "correct" positions in Semantic mode.
  3. Nothing prevents us from fixing the positions at the end of the normalization, e.g. by doing an extra pass over all the nodes. Scales linearly.

@dennwc
Copy link
Member

dennwc commented Apr 2, 2019

@vmarkovtsev You have to realize that nodes have a totally different order in Semantic. You just can't have a single universal representation and still have the "correct" node hierarchy in regards to positions for all languages. At least with the tree structures.

It is possible though if we switch to graph representation (#339) because you will be able to jump from Semantic nodes to Native and get positions and the "correct" hierarchy.

@creachadair @bzz As I mentioned in last couple of months, the issues with the current representation (tree structure) start to actually matter. I think we should bump the priority for the transition to the new representation.

@vmarkovtsev
Copy link

Actually, we are likely to use the Annotated mode in our current analyzers, because we need to reconstruct the original token stream byte-to-byte.

@dennwc
Copy link
Member

dennwc commented Apr 2, 2019

Everybody wants the "correct" positions in Semantic mode.

You mentioned Semantic, so I focused the answer on it.

Actually, we are likely to use the Annotated mode in our current analyzers, because we need to reconstruct the original token stream byte-to-byte.

Right, Annotated will work better for this use case, but again, it has a similar issue - some AST does not provide a "correct" hierarchy. This time we cannot fix it because we cannot modify the structure in this mode by definition.

We really need a way to link those trees (+ token stream) in an arbitrary way. I will dedicate some time this week to outline the proposal in regards to the new representation (graphs). It will solve all those issues.

@dennwc
Copy link
Member

dennwc commented Apr 2, 2019

Reopening and moving to SDK.

@dennwc dennwc reopened this Apr 2, 2019
@bzz bzz transferred this issue from bblfsh/javascript-driver Apr 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants