scoop locks up if out of memory #55

Open
joernhees opened this issue Apr 23, 2017 · 0 comments

I'm running a series of experiments with scoop on a slurm cluster.

Tonight some of my tasks seem to have run out of memory:

Traceback (most recent call last):
  File "/software/python/2.7.12/lib/python2.7/logging/__init__.py", line 872, in emit
Bad address (bundled/zeromq/src/tcp.cpp:244)
    stream.write(ufs % msg)
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/codecs.py", line 706, in write
    return self.writer.write(data)
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/codecs.py", line 370, in write
    self.stream.write(data)
IOError: [Errno 12] Cannot allocate memory
...
Traceback (most recent call last):
  File "/software/python/2.7.12/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/software/python/2.7.12/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/site-packages/scoop/bootstrap/__main__.py", line 302, in <module>
    b.main()
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/site-packages/scoop/bootstrap/__main__.py", line 92, in main
    self.run()
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/site-packages/scoop/bootstrap/__main__.py", line 290, in run
    futures_startup()
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/site-packages/scoop/bootstrap/__main__.py", line 271, in futures_startup
    run_name="__main__"
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/site-packages/scoop/futures.py", line 64, in _startup
    result = _controller.switch(rootFuture, *args, **kargs)
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/site-packages/scoop/_control.py", line 231, in runController
    future = execQueue.pop()
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/site-packages/scoop/_types.py", line 320, in pop
    self.updateQueue()
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/site-packages/scoop/_types.py", line 343, in updateQueue
    for future in self.socket.recvFuture():
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/site-packages/scoop/_comm/scoopzmq.py", line 279, in recvFuture
    received = self._recv()
  File "/home/hees/graph-pattern-learner/venv/lib/python2.7/site-packages/scoop/_comm/scoopzmq.py", line 188, in _recv
    thisFuture = pickle.loads(msg[1])
IndexError: list index out of range

The main issue here is that scoop apparently does not terminate after these errors: it keeps running in a locked-up state (0 load) for hours.
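
The final `IndexError` is raised at `pickle.loads(msg[1])` in `_recv`, which suggests the received message had fewer than two parts. Below is a minimal sketch of how such a message could be rejected with an explicit error instead of an `IndexError`, assuming the message arrives as a list of byte frames. The names `decode_future_message` and `MalformedMessageError` are hypothetical and not part of scoop; this is not a patch against scoop's actual code.

```python
import pickle


class MalformedMessageError(Exception):
    """Raised when a multipart message has fewer frames than expected."""


def decode_future_message(msg):
    """Unpickle the payload frame of a multipart message.

    Hypothetical helper: assumes `msg` is a list of byte frames where
    index 1 holds the pickled payload, as in the traceback above.
    """
    if len(msg) < 2:
        # Fail with a descriptive error so the caller can decide to abort
        # the worker rather than continue in an undefined state.
        raise MalformedMessageError(
            "expected at least 2 frames, got %d" % len(msg)
        )
    return pickle.loads(msg[1])
```

A caller could catch `MalformedMessageError` and shut the process down explicitly. Whether that alone would prevent the lock-up described here is not something this sketch can show.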
