
unresponsive (from error?) #237

Closed · wants to merge 3 commits into from
Conversation

@ihanson (Contributor) commented Mar 21, 2012

aleph.sagemath.org is now unresponsive, and here is the last bit of the stdout log:

InvalidDocument: BSON document too large (21829064 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
ERROR:root:Exception in I/O handler for fd <zmq.core.socket.Socket object at 0xaa9a70>
Traceback (most recent call last):
  File "/padic/scratch/jason/sage-4.7.1-sage.math.washington.edu-x86_64-Linux/local/lib/python2.6/site-packages/zmq/eventloop/ioloop.py", line 330, in start
    self._handlers[fd](fd, events)
  File "/padic/scratch/jason/sage-4.7.1-sage.math.washington.edu-x86_64-Linux/local/lib/python2.6/site-packages/zmq/eventloop/zmqstream.py", line 391, in _handle_events
    self._handle_recv()
  File "/padic/scratch/jason/sage-4.7.1-sage.math.washington.edu-x86_64-Linux/local/lib/python2.6/site-packages/zmq/eventloop/zmqstream.py", line 424, in _handle_recv
    self._run_callback(callback, msg)
  File "/padic/scratch/jason/sage-4.7.1-sage.math.washington.edu-x86_64-Linux/local/lib/python2.6/site-packages/zmq/eventloop/zmqstream.py", line 365, in _run_callback
    callback(*args, **kwargs)
  File "trusted_db.py", line 95, in <lambda>
    stream.on_recv(lambda msgs:callback(db,key,pipe,fs_auth_dict if isFS else db_auth_dict,rep,msgs,isFS), copy=False)
  File "trusted_db.py", line 159, in callback
    db.add_messages([c[0] for c in content])
  File "/padic/scratch/jason/simple-python-db-compute/db_mongo.py", line 112, in add_messages
    self.database.messages.insert(messages)
  File "/padic/scratch/jason/sage-4.7.1-sage.math.washington.edu-x86_64-Linux/local/lib/python2.6/site-packages/pymongo/collection.py", line 310, in insert
    continue_on_error, self.__uuid_subtype), safe)
  File "/padic/scratch/jason/sage-4.7.1-sage.math.washington.edu-x86_64-Linux/local/lib/python2.6/site-packages/pymongo/connection.py", line 807, in _send_message
    (request_id, data) = self.__check_bson_size(message)
  File "/padic/scratch/jason/sage-4.7.1-sage.math.washington.edu-x86_64-Linux/local/lib/python2.6/site-packages/pymongo/connection.py", line 784, in __check_bson_size
    (max_doc_size, self.__max_bson_size))
InvalidDocument: BSON document too large (21829064 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
Traceback (most recent call last):
  File "device_process.py", line 793, in <module>
    keys=keys, resource_limits=resource_limits)
  File "device_process.py", line 343, in device
    for X in db.get_input_messages(device=device_id, limit=-1):
  File "/padic/scratch/jason/simple-python-db-compute/db_zmq.py", line 35, in f
    output=self.socket.recv_pyobj()
  File "socket.pyx", line 801, in zmq.core.socket.Socket.recv_pyobj (zmq/core/socket.c:7113)
cPickle.UnpicklingError: invalid load key, '{'.

@kramer314 (Contributor)

I'm not sure about the pickling error, but for the first error, what version of MongoDB is aleph.sagemath running? A brief Google search on the traceback message suggests that pre-1.8 versions have this issue.

@jasongrout (Member, Author)

It's running mongodb-linux-x86_64-1.8.1.

@kramer314 (Contributor)

Hmm, then it seems like someone's input just generated a massive output message, and the DB complained about it being larger than 16MB (hence the error). Implementing gh-139 could be a good long-term solution, but the quickest solution seems to be to just catch the error when things are added to the database and drop the message. Or, do we want to find a way to let the user know that their output was too large / was rejected?
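The "catch and drop" quick fix suggested above could look roughly like the sketch below. This is a self-contained illustration, not code from the repository: `InvalidDocument` and `FakeCollection` here stand in for `pymongo.errors.InvalidDocument` and the real `self.database.messages` collection, and the shape of the error-notice message is invented for the example.

```python
MAX_BSON_SIZE = 16 * 1024 * 1024  # server default in MongoDB 1.8


class InvalidDocument(Exception):
    """Stand-in for pymongo.errors.InvalidDocument."""


class FakeCollection(object):
    """Stand-in for the real messages collection, so the sketch
    runs without a MongoDB server."""

    def __init__(self):
        self.docs = []

    def insert(self, doc):
        # Crude size check standing in for pymongo's BSON size check.
        if len(doc.get("content", "")) > MAX_BSON_SIZE:
            raise InvalidDocument("BSON document too large")
        self.docs.append(doc)


def add_messages(collection, messages):
    """Insert messages one at a time; replace any oversized message
    with a short error notice so the user learns it was rejected."""
    for m in messages:
        try:
            collection.insert(m)
        except InvalidDocument:
            collection.insert({"msg_type": "error",
                               "content": "Output too large; message dropped."})


coll = FakeCollection()
add_messages(coll, [{"content": "ok"},
                    {"content": "x" * (MAX_BSON_SIZE + 1)}])
print([d.get("msg_type", "ok") for d in coll.docs])
```

Inserting one message at a time is slower than a batch insert, but it guarantees that a single oversized document can no longer abort the whole batch and wedge the workers.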

@jasongrout (Member, Author)

This just halted all of the workers again. As a quick fix, let's just drop the computation (or if there's an easy way, insert a message back to the user saying there was an error).

@jasongrout mentioned this pull request Mar 15, 2012
@jasongrout (Member, Author)

Nice, and I agree with Alex's comment. I'm also really curious: how are you attaching code to an existing issue? I know you can do that from the API; are you using the API?

@ihanson (Contributor) commented Mar 22, 2012

Yes, I've been using the GitHub API.

@jasongrout (Member, Author)

Do you happen to have a shell script or python script you could share with us? I know it's easy enough to write my own, but I thought maybe you might have something already written.

@ihanson (Contributor) commented Mar 22, 2012

Here: https://gist.github.com/2156799
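For context on what such a script does: the GitHub v3 API of that era let you attach a pull request to an existing issue by POSTing to /repos/:owner/:repo/pulls with an issue number in place of a title. The sketch below only builds the request payload (no network call); the function name and branch values are hypothetical, and ihanson's actual script is the gist linked above.

```python
import json


def attach_pr_payload(issue_number, head, base="master"):
    """Build the JSON body for POST /repos/:owner/:repo/pulls that
    converts an existing issue into a pull request (GitHub API v3).
    head is the 'user:branch' holding the commits to attach."""
    return json.dumps({"issue": issue_number,
                       "head": head,
                       "base": base})


# Hypothetical example: attach a branch to issue #237.
print(attach_pr_payload(237, "ihanson:fix-branch"))
```

The payload would then be sent with authenticated POST (e.g. via curl or urllib) to the repository's pulls endpoint.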

"sequence": m["sequence"]})
log("INSERTED: %s"%('\n'.join(str(m) for m in success),))
if len(success) < len(messages):
log("FAILED TO INSERT %d message(s)" % (len(messages) - len(success)))
(inline review comment by a Member on the lines above)
Oops, there is a tab on this line. That needs to be changed to spaces.

jasongrout pushed a commit to jasongrout/sagecell that referenced this pull request Mar 29, 2012
… referee fixes by jasongrout)

The problem was that messages were exceeding the MongoDB limit on record size. Now we just insert an error into the message stream when that happens.
@jasongrout (Member, Author)

I guess that's what I get for adding a few small referee commits and merging; since the merge didn't contain your tab commit, it appears that I didn't merge your branch. But I made the same tab fix, plus a few other fixes related to printing enormous messages in the logs.
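One plausible shape for a log fix like that is clipping huge messages before they reach the logs. This is only a sketch of the idea, not the actual commit: `log_repr` and the `MAX_LOG` threshold are assumed names and values.

```python
MAX_LOG = 1000  # assumed threshold, not a value from the repository


def log_repr(msg, limit=MAX_LOG):
    """Return a log-safe representation of msg, truncating anything
    longer than limit so enormous outputs don't flood the logs."""
    s = str(msg)
    if len(s) > limit:
        return s[:limit] + "... (%d bytes truncated)" % (len(s) - limit)
    return s


print(log_repr("x" * 5000))  # prints 1000 x's plus a truncation note
```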

@jasongrout jasongrout closed this Mar 29, 2012
3 participants