Create extraction interface #3

Open
rumkin opened this issue Apr 3, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@rumkin
Owner

rumkin commented Apr 3, 2019

Create a content extraction interface. It should receive HTML and return an object with:

  • title
  • content
  • entry point

The list could be extended.
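
A minimal sketch of what such an interface might look like, under stated assumptions: the names `Extracted`, `extract` and `entryPoint` are hypothetical and not part of the library, and the regex-based parsing is only a stand-in for real HTML parsing.

```typescript
// Hypothetical shape of the extraction result; field names are assumptions.
interface Extracted {
  title: string;      // text of the <title> element
  content: string;    // inner HTML of <body>
  entryPoint: string; // selector of the element the content is loaded into
}

// Naive regex-based extraction, good enough for a sketch only.
// A browser implementation would more likely parse the markup with DOMParser.
function extract(html: string, entryPoint: string): Extracted {
  const title = /<title(?:\s[^>]*)?>([\s\S]*?)<\/title>/i.exec(html);
  const body = /<body(?:\s[^>]*)?>([\s\S]*?)<\/body>/i.exec(html);
  return {
    title: title ? title[1].trim() : "",
    content: body ? body[1].trim() : "",
    entryPoint,
  };
}

const result = extract(
  "<html><head><title>Home</title></head><body><p>Hello</p></body></html>",
  "#content"
);
```

Returning a plain object keeps the list easy to extend: new fields can be added without changing the call signature.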

@tamb
Contributor

tamb commented Aug 23, 2019

I agree, but I think it should follow the Response API more closely. https://developer.mozilla.org/en-US/docs/Web/API/Response

So maybe instead of title we have headers.

headers should contain an object with any metadata. All of this should be sanitized so that no functions get through, so it should be stringified to prevent XSS. idk the best method to do this.

body : contains the response body

and then entry point or container or slot or saddle or whatever term for where the content will be loaded into.
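
One way the Response-like shape and the sanitizing step could be sketched, purely as an assumption: `ExtractedResponse`, `escapeHtml` and `sanitizeHeaders` are hypothetical names, and whether HTML escaping is sufficient depends on where the values end up being inserted.

```typescript
// Hypothetical Response-like result; names follow the comment above, not an existing API.
interface ExtractedResponse {
  headers: Record<string, string>; // sanitized metadata
  body: string;                    // the response body
  container: string;               // selector of the element to load content into
}

// Escape characters that are significant in HTML.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Drop functions, symbols and undefined; stringify everything else, then escape.
function sanitizeHeaders(meta: Record<string, unknown>): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const key of Object.keys(meta)) {
    const value = meta[key];
    const kind = typeof value;
    if (kind === "function" || kind === "symbol" || kind === "undefined") continue;
    const text = typeof value === "string" ? value : JSON.stringify(value);
    headers[key] = escapeHtml(text);
  }
  return headers;
}

const response: ExtractedResponse = {
  headers: sanitizeHeaders({ title: "<b>Hi</b>", count: 3, onload: () => {} }),
  body: "<p>Hello</p>",
  container: "#content",
};
```

Because every value ends up as an escaped string, the resulting headers object is also safe to serialize as JSON.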

@rumkin
Owner Author

rumkin commented Aug 25, 2019

In its current form I was trying to avoid collisions between the HTML of two pages and to minimize memory usage; that's why it pulls only the title and the container's data.

I now think it should be improved so that pages with a different structure can be loaded, for example to inject things like the viewport meta tag (<meta name="viewport" ...) and OS-specific links (like <link rel="apple-touch-icon" ...). So I think there will be head and content properties, presented as strings, and an expires property.

Since the expires property could require information from the response headers, the headers will be passed to the extraction callback as an argument and used once to produce static data. Maybe it would also be right to allow defining custom properties for the extracted content. That means we need both extraction and prerendering callbacks. I think the interface could look like this:

type Page = {
  head: String,
  content: String,
  expires: Date,
  props: CustomProps,
}

type CustomProps = {[key:String]: Boolean|Number|String|Object|Array}

type ExtractCallback = (url: String, headers: Map, body: Uint8Array) => Page
type PrerenderCallback = (page: Page) => Page

The Page type is pill's internal representation of the document. It should be convertible to JSON so it can be placed into history. It also has props, where the developer could put data to be used at the prerendering step.
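
A rough TypeScript sketch of an extraction callback and a JSON round trip for such a page. The helper names `extractPage`, `pageToJson` and `pageFromJson` are hypothetical, the regex parsing is a simplification, and treating a missing Expires header as "already expired" is an assumption. One detail worth noting: `JSON.stringify` turns a `Date` into an ISO string, so `expires` has to be revived when reading the page back.

```typescript
type CustomProps = { [key: string]: boolean | number | string | object };

interface Page {
  head: string;
  content: string;
  expires: Date;
  props: CustomProps;
}

type ExtractCallback = (url: string, headers: Map<string, string>, body: Uint8Array) => Page;

// Hypothetical extraction callback: decodes the body and pulls out head and body markup.
const extractPage: ExtractCallback = (url, headers, body) => {
  const html = new TextDecoder("utf-8").decode(body);
  const head = /<head(?:\s[^>]*)?>([\s\S]*?)<\/head>/i.exec(html);
  const content = /<body(?:\s[^>]*)?>([\s\S]*?)<\/body>/i.exec(html);
  const expiresHeader = headers.get("expires");
  return {
    head: head ? head[1] : "",
    content: content ? content[1] : "",
    // Assumption: a missing Expires header means the page is already expired.
    expires: expiresHeader ? new Date(expiresHeader) : new Date(0),
    props: { url },
  };
};

// Serialize for storage; the Date becomes an ISO 8601 string.
function pageToJson(page: Page): string {
  return JSON.stringify(page);
}

// Restore a Page, turning the ISO string back into a Date.
function pageFromJson(json: string): Page {
  const raw = JSON.parse(json);
  return { head: raw.head, content: raw.content, expires: new Date(raw.expires), props: raw.props };
}

const page = extractPage(
  "/about",
  new Map([["expires", "2015-10-21T07:28:00.000Z"]]),
  new TextEncoder().encode("<html><head><title>About</title></head><body><p>Hi</p></body></html>")
);
const restored = pageFromJson(pageToJson(page));
```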

Will it cover your needs?

@tamb
Contributor

tamb commented Aug 25, 2019

So the head would be replaced entirely? That would be fine with me. Technically it's a new page so that would be what's expected.

The issue with custom props, etc. is that you'd have to remove them on unload.

This is tricky. Turbolinks merges the head, but I think that's overkill. It's probably safe to assume that the head will be similar across pages.

Maybe the extraction interface should just contain more data, and let the dev do what they want with it. Keep the content loading fairly unopinionated.

@rumkin rumkin added the enhancement New feature or request label Sep 1, 2019
@rumkin
Owner Author

rumkin commented Sep 1, 2019

CustomProps is just a key-value storage which the developer could use at the rendering stage. Thus only the developer decides what will be stored and rendered, and how. I think this overlaps with your suggestion to let the developer decide what to do. I just want the extracted data to be well structured. In this case the developer could store everything stringified in the head and content props, or as structured data in custom props.
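
To illustrate the idea, here is a small sketch of a prerendering callback that reads structured data from props. The `heading` prop and the `addHeading` function are hypothetical examples, and the prop value is assumed to be trusted text.

```typescript
type CustomProps = { [key: string]: boolean | number | string | object };

interface Page {
  head: string;
  content: string;
  expires: Date;
  props: CustomProps;
}

type PrerenderCallback = (page: Page) => Page;

// Hypothetical prerender step: if a string "heading" prop was stored during
// extraction, prepend it to the content; otherwise return the page unchanged.
// Assumption: the prop holds trusted text, so it is not escaped here.
const addHeading: PrerenderCallback = (page) => {
  const heading = page.props.heading;
  if (typeof heading !== "string") return page;
  return { ...page, content: `<h1>${heading}</h1>${page.content}` };
};

const stored: Page = {
  head: "",
  content: "<p>Body</p>",
  expires: new Date(0),
  props: { heading: "Welcome", visits: 3 },
};
const rendered = addHeading(stored);
```

The callback returns a new object instead of mutating its input, so the stored page stays as extracted.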
