
Stream based blob API #9

Open
martinheidegger opened this issue Sep 1, 2015 · 4 comments

Comments

@martinheidegger
Collaborator

martinheidegger commented Sep 1, 2015

The current blob API is not stream-based. It would be good if it worked like fs.createWriteStream or fs.createReadStream.



@megastef
Owner

megastef commented Sep 2, 2015

I think the problem is that the sha1 checksum needs to be calculated in advance.
See: https://crate.io/docs/stable/blob.html
"To upload a blob the sha1 hash of the blob has to be known upfront since this has to be used as the id of the new blob"

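For illustration, a minimal sketch of that upfront step in Node.js: the file is streamed through `crypto.createHash('sha1')` so it never has to fit in memory, and the hex digest then serves as the blob id in the upload URL. The helper names `sha1OfFile` and `blobUrl`, the base URL and the table name are assumptions; the `/_blobs/<table>/<digest>` path follows the Crate blob documentation linked above.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Compute the hex SHA-1 digest of a file by streaming it, so large files
// never have to be held in memory at once.
function sha1OfFile(path) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('sha1');
    fs.createReadStream(path)
      .on('error', reject)
      .on('data', (chunk) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')));
  });
}

// Hypothetical helper: build the blob URL from a base URL, a blob table
// name and the digest (the digest doubles as the blob's id in Crate).
function blobUrl(baseUrl, table, digest) {
  return `${baseUrl}/_blobs/${table}/${digest}`;
}
```

For example, `sha1OfFile('video.mp4').then((d) => blobUrl('http://localhost:4200', 'myblobs', d))` yields the URL the file would be PUT to. Note that the file has to be read twice, once to hash it and once to upload it, so the upload cannot begin before the last byte is available.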
@martinheidegger
Collaborator Author

I talked with Jodok yesterday, and as far as I understood, this is indeed the case. I think it would still be possible to implement it using an intermediate file-system step:

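```javascript
var crypto = require('crypto');
var fs = require('fs');
var stream = require('stream');

// createHttpBlobRequest(digest) is not defined here: it stands for a writable
// HTTP PUT request to the Crate blob URL for the given sha1 digest.
function createWriteStream() {
  var shasum = crypto.createHash('sha1');
  // PassThrough instead of a bare Duplex, which would need _read/_write implementations
  var input = new stream.PassThrough();
  var fsStream = fs.createWriteStream('tmpfile');
  // hash every chunk while it is being buffered to the temporary file
  input.on('data', function (data) {
    shasum.update(data);
  });
  input.pipe(fsStream);
  // writable streams emit 'finish' (not 'end') once all data has been flushed
  fsStream.on('finish', function () {
    fs.createReadStream('tmpfile').pipe(createHttpBlobRequest(shasum.digest('hex')));
  });
  return input;
}
```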

This would still be better than leaving users to implement it themselves.

@megastef
Owner

megastef commented Sep 2, 2015

+0.5: it solves the problem of uploading files larger than the heap limit, and it could reduce memory usage.
Your example looks easy, but it does not deal with the problems raised by using temporary files for 'streams':

  1. Management of temporary files (naming, location, deletion in various error scenarios, ...)
  2. Delay in upload. I guess streaming only makes sense for large files or real-time data over networks (like IP cams). For live video, the upload might start only after a long delay (at the end of the stream).
  3. 'Endless streams', like video from an IP camera or continuous packet captures, could fill up the disk without sending a single byte to Crate! So the implementation might not meet API users' expectations. This calls for more documentation, limits on file size, and timeouts for streams that provide no data for N minutes (to close the temporary file, ...).
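Points 1 and 3 could be partly addressed by guarding whatever buffers the stream. A minimal sketch, assuming a hypothetical helper `createLimitedStream` (not part of any existing driver): a `Transform` that passes data through unchanged but fails once more than `maxBytes` have been written, or when no chunk arrives for `idleMs` milliseconds.

```javascript
const { Transform } = require('stream');

// Hypothetical helper: pass data through unchanged, but fail the stream once
// more than maxBytes have been written or when no chunk arrives for idleMs ms.
function createLimitedStream({ maxBytes, idleMs }) {
  let total = 0;
  let timer = null;

  const limited = new Transform({
    transform(chunk, encoding, callback) {
      total += chunk.length;
      if (total > maxBytes) {
        clearTimeout(timer);
        callback(new Error(`stream exceeded ${maxBytes} bytes`));
        return;
      }
      resetTimer(); // data arrived, so restart the idle countdown
      callback(null, chunk);
    },
    flush(callback) {
      clearTimeout(timer); // input ended normally
      callback();
    },
    destroy(err, callback) {
      clearTimeout(timer); // never leave a timer running on a destroyed stream
      callback(err);
    },
  });

  function resetTimer() {
    clearTimeout(timer);
    timer = setTimeout(() => {
      limited.destroy(new Error(`no data for ${idleMs} ms`));
    }, idleMs);
  }

  resetTimer(); // start counting from creation
  return limited;
}
```

In the `createWriteStream()` example above it could sit in front of the temporary file, e.g. `input.pipe(createLimitedStream({ maxBytes, idleMs })).pipe(fsStream)`. Deleting the temporary file after such an error is still left to the caller.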

Well, we could start with a simple version, but I would recommend that Crate accept streams without the SHA hash in the URL (or simply allow user-defined IDs). Crate could calculate the hash while receiving the data and return it in the HTTP response. The client driver could calculate the SHA hash during the upload and verify it against the server's value.
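A minimal sketch of the client side of this proposal, assuming Crate were changed to compute the hash during reception and report it back (per this discussion it does not; the response format is hypothetical). `createHashingStream` and `verifyUpload` are hypothetical names: the stream hashes each chunk as it passes through to the upload request, and the resulting digest is compared with the value the server is assumed to report.

```javascript
const crypto = require('crypto');
const { Transform } = require('stream');

// Hypothetical helper: hash every chunk as it flows through to the upload,
// so no temporary file and no upfront pass over the data are needed.
function createHashingStream(algorithm = 'sha1') {
  const hash = crypto.createHash(algorithm);
  let resolveDigest;
  // Resolves with the hex digest once all data has passed through
  // (it stays pending if the stream fails before finishing).
  const digest = new Promise((resolve) => {
    resolveDigest = resolve;
  });
  const stream = new Transform({
    transform(chunk, encoding, callback) {
      hash.update(chunk);
      callback(null, chunk); // forward the chunk unchanged
    },
    flush(callback) {
      resolveDigest(hash.digest('hex'));
      callback();
    },
  });
  return { stream, digest };
}

// Hypothetical check: compare the client-side digest with the one the
// server is assumed to report in its response.
async function verifyUpload(digest, serverDigest) {
  const clientDigest = await digest;
  if (clientDigest !== serverDigest) {
    throw new Error(`digest mismatch: client ${clientDigest}, server ${serverDigest}`);
  }
  return clientDigest;
}
```

Usage would look like `source.pipe(stream).pipe(uploadRequest)` followed by `verifyUpload(digest, serverDigest)` once the response arrives, where `uploadRequest` and `serverDigest` stand in for the hypothetical HTTP request and response field. Hashing inside `transform` rather than in a `'data'` listener keeps backpressure intact and avoids losing chunks emitted before the stream is piped.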

Is there an open issue @crate to support streaming blobs (without the SHA upfront)?

@sairamdevarashetty

testing the graphql

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants