Is it possible to get text after html tag? #355

Closed
sooxt98 opened this issue Oct 23, 2020 · 7 comments

sooxt98 commented Oct 23, 2020

<span></span>1
<span></span>2
<span></span>3

I want to get 1, 2, 3 out.

I tried doc.Contents().Each, but it just returns the whole text at once.

mna commented Oct 23, 2020

Not directly with a selector, but see #287.

sooxt98 commented Oct 23, 2020

@mna I think goquery can't separate that example code into 6 chunks; it just returns one big chunk with all the text inside, so .Contents().Each is useless for me.

mna commented Oct 23, 2020

What do you mean? Did you try the example in the issue I linked?

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

const data = `
<div>
<span></span>1
<span></span>2
<span></span>3
</div>
`

func main() {
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}
	doc.Find("div").Contents().Each(func(i int, s *goquery.Selection) {
		if goquery.NodeName(s) == "#text" {
			fmt.Printf(">>> (%d) >>> %s\n", i, s.Text())
		}
	})
}

// Prints:
// >>> (0) >>> 
// >>> (2) >>> 1
// >>> (4) >>> 2
// >>> (6) >>> 3

sooxt98 commented Oct 23, 2020

@mna please try it without wrapping the spans in a parent div.

mna commented Oct 23, 2020

Just replace the "div" selector (which obviously won't match anything without the div) with "body".

sooxt98 commented Oct 23, 2020

Okay, thanks. I think I might manually add the tag around it.

sooxt98 closed this as completed Oct 23, 2020

mna commented Oct 23, 2020

Oh, you don't have to. If all you have is the three spans, the net/html parser will automatically add the html/head/body tags to make it a proper document (the Go html parser uses the same logic as the official HTML5 parser used in browsers, so it tries hard to "fix" documents to make them valid). When in doubt, print the full HTML document after the call to goquery.NewDocument... (using goquery.OuterHtml(doc.Selection)).
