
Feature request: dynamically adding checks during execution? #222

Open
pazz opened this issue Jun 28, 2020 · 15 comments

pazz (Contributor) commented Jun 28, 2020

Hello once more, and sorry for spamming you with feature requests without contributing much first.

I am teaching an undergraduate "Intro to OOP" class, with exercises in Java. To facilitate check50 checks for structural inspection (class hierarchy, method signatures), I wrote a small extension that allows one to run JUnit 5 unit tests from check50. At this point its use is rather cumbersome: one has to manually add a check50 check for every single unit test, unless one is fine with one big check covering all unit tests in a test class.

It'd be great if I could dynamically add checks to the dependency tree at execution time.

This would make it possible to first compile a (JUnit) test class, then extract the contained unit tests and interpret them as individual check50 checks for marking. In fact, I can easily run all unit tests in one go and retroactively interpret the outcome, so it would even suffice to make check50 aware of new checks after the results are computed. But for the sake of consistent student feedback, it'd be great to at least make it look as if regular checks had been run.

It looks to me as if check50's runner class computes the dependency DAG of checks before executing them.
Any pointers would be much appreciated.

Thanks again for an awesome tool,
P

Jelleas (Contributor) commented Jun 28, 2020

I might have something to help you here. You're correct that check50 creates a dependency DAG prior to executing checks, and because checks run in a separate process, it's impossible, or at least very difficult, to change the DAG from within a check. However, to create this DAG, check50 imports the checks module, which means any code in that module will run. You can use this to dynamically create checks prior to running them.

In fact I've done so in an extension of my own at:

https://github.com/uva/ci_checks/blob/ace8dc6d45b69b7dcafa399c96d6e28c568596a3/mod1/__init__.py#L67-L95

Here I use one check (exists) to run all unit tests in a Jupyter notebook, but this might as well be JUnit. exists then passes down the results in a large tuple; this can be any data structure that is serializable (i.e., can be pickled).

@check50.check()
def exists():
    ...
    for cell in cells:
        # Execute the cell
        try:
            execute(cell)
            exception = ""
            passed = True
        except check50.Failure as f:
            exception = str(f)
            passed = False

        # If it's a test cell, record result
        if check_jupyter.is_test_cell(cell):
            results.append((check_jupyter.get_cell_id(cell), passed, exception))

    # Pass down the results to the actual checks
    return tuple(results)

Then I dynamically create checks that each depend on exists to receive its results.

for test_id in get_test_ids("module 1.ipynb"):
    check = create_check(test_id)

    # Register the check with check50
    check = check50.check(dependency=exists)(check)

    # Add the check to global module scope
    globals()[test_id] = check

The dynamically created checks themselves just look up their results and behave accordingly.

def check(results):
    for id, passed, exception in results:
        if id == test_id and not passed:
            raise check50.Failure(exception)
    return results

This makes it so that I don't have to write a check50 check for each unit test, the set of unit tests runs exactly once, and from the student's perspective each unit test is just a check50 check.
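For reference, a plausible shape for the create_check factory used above; this body is inferred from the snippets in this thread rather than copied from the gist. The inner check closes over test_id, and setting __name__/__doc__ gives each dynamic check its own identity:

import check50

def create_check(test_id):
    """Build one check50 check for the unit test named test_id."""
    def check(results):
        # results is the tuple passed down by the exists check
        for id, passed, exception in results:
            if id == test_id and not passed:
                raise check50.Failure(exception)
        return results

    # Each check needs a unique __name__ and a human-readable description
    check.__name__ = test_id
    check.__doc__ = f"{test_id} passes"
    return check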

Hopefully this helps!

pazz (Contributor, Author) commented Jun 29, 2020

Thanks @Jelleas this is very close to what I had in mind.

The problem with this for my particular use case is that getting the list of test IDs before running any checks is not possible.

The thing is that the unit tests may not even compile, because the student's submission may not match the expected method signatures. This is why I believe it makes sense to have one check that compiles the unit tests (and possibly runs them all), so as to be able to complain about a non-conformant submission without blowing up check50.
Once the test class has compiled, it is easy to run it and extract all testcases that it considered (these may not be simple to identify syntactically beforehand, e.g. for parametrized tests), but at this point I'm not sure how to insert derived checks any more.

Does that make sense?

At the moment, the only ways around this problem I can think of are to include compiled test classes in the problem set and hope that the produced .class files are compatible with the JVM the student runs, or to hard-code all testcases (perhaps by scripting this at pset-creation time).
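A compile-gate check along these lines might look like the following minimal sketch (the file names, classpath, and timeout are illustrative assumptions, not code from an existing extension):

import check50

@check50.check()
def tests_compile():
    """JUnit test class compiles against the submission"""
    # Hypothetical names: the JUnit jar ships with the pset,
    # ItemTest.java is the distributed test class.
    classpath = "junit-platform-console-standalone.jar:."
    process = check50.run(f"javac -cp {classpath} ItemTest.java")
    # exit() without an expected code returns the actual exit code
    if process.exit(timeout=60) != 0:
        raise check50.Failure("tests do not compile; the submission likely "
                              "does not match the expected signatures")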

pazz (Contributor, Author) commented Jul 4, 2020

I've decided to include compiled tests, run them once, and interpret the JUnit XML report by looking for all testcases, then introduce one check per testcase, as you propose.

I am observing some really strange behaviour trying to reproduce your approach.
Could you comment on this?

Basically, I've hard-coded the list of test cases for now, and am trying to create checks as you do in init().
This almost works, except that for the last check only, I get a file-system error when check50 tries to copy the state of the parent check into the temporary directory for this last child.

The check itself does nothing and has

check.__name__ = "4"
check.__doc__ = "ItemTest__getPrice passes"

The output of check50 --dev --log debug ... is as follows.

:| ItemTest__getPrice passes
    check50 ran into an error while running checks!
    FileExistsError: [Errno 17] File exists: '/tmp/tmpw3sw2n7o/4'
      File "/home/pazz/.local/lib/python3.8/site-packages/check50/runner.py", line 136, in wrapper
    shutil.copytree(src_dir, internal.run_dir)
      File "/usr/lib/python3.8/shutil.py", line 554, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
      File "/usr/lib/python3.8/shutil.py", line 455, in _copytree
    os.makedirs(dst, exist_ok=dirs_exist_ok)
      File "/usr/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)

Now, if I omit line 90 or line 93 in your code, then none of these child checks run. (Both register the check globally; why are both necessary?)

If I include both lines, I see the error above, but only for the last of the generated child checks!
This last check only runs once, and there are no other checks by that name, so I don't see why this folder should already exist. I have also tried with more complicated names to be sure; same effect.

Any help would be much appreciated; Thanks a lot!
P

Jelleas (Contributor) commented Jul 4, 2020

I am observing some really strange behaviour trying to reproduce your approach.
Could you comment on this?

check50 will create a directory (inside a temporary directory) for each check, named after the check. So for check.__name__ = "4", check50 will try to create the directory 4 within a temporary directory (happens to be /tmp/tmpw3sw2n7o in this case). Looks like that's failing now with the message:

FileExistsError: [Errno 17] File exists: '/tmp/tmpw3sw2n7o/4'

Apparently, the directory (or file) 4 already exists. My guess, without seeing the rest of the code, is that there are two checks with the same __name__. In that case, renaming one should fix it.

Now, if I omit line 90 or line 93 in your code, then none of these child-checks run. (both register the check globally, why are both necessary?).

What line 90 does is manually call the check50.check decorator, which would normally be used to mark checks in check50 like so:

@check50.check(dependency=exists)
def my_check():
    ...

But because decorators in Python are essentially just functions that themselves produce functions, we can transform the above into:

def my_check():
    ...
my_check = check50.check(dependency=exists)(my_check)

Line 93 then puts the newly created function into the globals() dictionary. If you're unfamiliar with it, globals() holds everything that exists in global (module) scope. This makes it so that after:

test_id = "my_check"
globals()[test_id] = check

you can call the function check as my_check(), just like you could if it were defined statically.

why are both necessary?

Under the hood, check50 uses the inspect module to grab all checks from a module like so:

checks = inspect.getmembers(check_module, lambda f: hasattr(f, "_check_dependency"))

For this to work, each check needs to have a _check_dependency attribute, which is just something the check50.check decorator adds, and it needs to be a global in the module.
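A standalone illustration of that discovery mechanism, using a throwaway module and a stand-in attribute rather than real check50 checks:

import inspect
import types

# A throwaway module standing in for a checks module
check_module = types.ModuleType("checks")

def my_check():
    pass

# Stand-in for the attribute the check50.check decorator would set
my_check._check_dependency = None

# Make the function a global of the module, as globals()[test_id] = check does
setattr(check_module, "my_check", my_check)

# Mirrors how check50 discovers checks
checks = inspect.getmembers(check_module, lambda f: hasattr(f, "_check_dependency"))
print(checks)  # [('my_check', <function my_check at 0x...>)]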

pazz (Contributor, Author) commented Jul 4, 2020 via email

Jelleas (Contributor) commented Jul 4, 2020

https://gist.github.com/pazz/f109183b56cf8919e1efd358f3b7cc9a

I tried to replicate the error from this snippet, but I couldn't exactly. I did, however, run into another issue that might be the underlying cause: you are running the following inside the module, not inside a function (such as init()):

i = 0
for case in report['testcases']:
    testmethod = case['name']
    check_id = f"{testclass}__{testmethod}"

    check = create_check(testclass, i)
    i += 1

    # Register the check with check50
    check = check50.check(run_tests)(check)

    # Add the check to global module scope
    globals()[check_id] = check

All variables become globals, including the loop variable check. Here check always ends up pointing at the last created check and is, for all intents and purposes, just a regular check50 check. However, the global named by check_id points to the same check through:

globals()[check_id] = check

That means there are now two globals pointing to the same check, check 4 in your case. Both get discovered as checks, both try to create a directory called 4, and that's where the error comes from.

It should be a simple fix: just wrap this bit of code in a function and call that function instead:

def init():
    i = 0
    for case in report['testcases']:
        testmethod = case['name']
        check_id = f"{testclass}__{testmethod}"

        check = create_check(testclass, i)
        i += 1

        # Register the check with check50
        check = check50.check(run_tests)(check)

        # Add the check to global module scope
        globals()[check_id] = check
init()

Jelleas (Contributor) commented Jul 4, 2020

I am not sure how to debug my pset module during set up except by calling check50._api.log,

Oh, looks like I'm blocking a PR that should make this easier. Long story short: a recent release accidentally inverted the logic for check50's verbose mode, and through that --dev now hides anything sent to stdout/stderr. That in turn makes debugging a little tricky!

Jumping back to 3.0.10 should fix this, and you should then be able to just use print():

pip install check50==3.0.10

pazz (Contributor, Author) commented Jul 4, 2020

I tried to replicate the error from this snippet, but I couldn't exactly. I did however run into another issue that might be the underlying cause of this. Because you are running the following inside the module, and not inside a function (such as init()):

Of course! That was exactly what I was missing, and it works just as expected now.
I'll post here once I'm done and have a presentable extension, in the hope that it may be useful to others.
Cheers again for your support!
P

pazz closed this as completed Jul 4, 2020
pazz (Contributor, Author) commented Jul 5, 2020

Could you explain what's going on here?
It looks like every call to check50.check will cause init to be called.

import check50

@check50.check()
def item_exists():
    pass

@check50.check()
def item_still_exists():
    pass

def init():
    print("init called")

It seems that with every additional @check50.check() before the definition of init, that utility function gets called one more time.
Here is the output (Python 3.8 and check50 v3.0.10):

check50 --dev -v ~/repo/teaching/2020-COMP122-OOP/assessment/tmp/resit/pset
init called
init called
init called
Results for /home/pazz/repo/teaching/2020-COMP122-OOP/assessment/tmp/resit/pset generated by check50 v3.0.10
:) item exists
    checking that Item.java exists...
:) item still exists
    checking that Item.java exists...
To see the results in your browser go to file:///tmp/tmp8wkrymjk.html

Jelleas (Contributor) commented Jul 5, 2020

This is semi-true: check50 runs checks in parallel by default through multiprocessing. But because processes are relatively expensive to start, check50 uses a pool of them through futures.ProcessPoolExecutor. That means that, depending on your machine rather than on the number of checks, there could be anywhere from 2 to 64 processes started by check50. Each of these processes has to import your checks module, and will inevitably run init().
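A self-contained way to observe this pooling behavior (plain Python, unrelated to check50's internals): tasks fan out over a small, fixed set of worker processes rather than one process per task:

import os
from concurrent.futures import ProcessPoolExecutor

def which_pid(_):
    return os.getpid()

if __name__ == "__main__":
    # With the "spawn" start method, each worker imports the main module,
    # so module-level code runs once per worker, not once per task.
    with ProcessPoolExecutor(max_workers=4) as pool:
        pids = set(pool.map(which_pid, range(32)))
    print(f"32 tasks ran in {len(pids)} worker process(es): {sorted(pids)}")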

The ability to easily disable multiprocessing has been on our minds for some time, but there hasn't been a compelling enough use case yet to push it up the list of priorities. Perhaps this is one!

For now, if you would like to opt out, you can set the environment variable CHECK50_WORKERS to limit the number of processes (workers) spawned by check50. You can do this prior to running check50, or in your checks module with:

import os
os.environ["CHECK50_WORKERS"] = "1"

Note that this still runs your checks in a separate process from the tool itself. That means your checks module will be imported twice: once by the tool, and once by the process running your checks.

pazz (Contributor, Author) commented Jul 5, 2020

That sounds like a plausible explanation, but your fix seems to have no effect:

[~/tmp] python --version
Python 3.8.3
[~/tmp] check50 --version
check50 3.0.10
[~/tmp] ls -la pset-init
total 16
drwxr-xr-x 2 pazz pazz 4096 Jul  5 16:47 .
drwxr-xr-x 6 pazz pazz 4096 Jul  5 16:44 ..
-rw-r--r-- 1 pazz pazz   14 Jul  5 16:47 .cs50.yml
-rw-r--r-- 1 pazz pazz  205 Jul  5 16:41 __init__.py
[~/tmp] cat pset-init/.cs50.yml  
check50: true
[~/tmp] cat pset-init/__init__.py 
import os
os.environ["CHECK50_WORKERS"] = "1"
import check50

@check50.check()
def item_exists():
    pass

@check50.check()
def item_still_exists():
    pass

def init():
    print("init called")

init()
[~/tmp] CHECK50_WORKERS=1 check50 --dev ~/tmp/pset-init
init called
init called
init called
Results for /home/pazz/tmp/pset-init generated by check50 v3.0.10
:) item exists
:) item still exists
To see the results in your browser go to file:///tmp/tmp4pbloh3a.html

I get the exact same result after removing the first two lines of pset-init/__init__.py and deleting pset-init/__pycache__/.

pazz (Contributor, Author) commented Jul 6, 2020

The bug I described above is somewhat unrelated to the original feature request.
In fact, although the pset module gets interpreted multiple times, the actual checks are run only once each.
I guess you did not notice this in your Jupyter pset because you don't log anything in init, and because it is fast enough even when interpreted multiple times.
In my case, I am executing a costly shell command inside init.

Regarding dynamically adding checks: I noticed that the dependency DAG is in fact a tree, since you cannot declare more than one dependency per check. This means that instead of precomputing a full dependency tree, one could in principle keep only a global todo-queue, containing unfinished parent checks but not their children.
Then, once a check is done, one dynamically adds all its children to that queue. As a byproduct, a check could then also manipulate its own dependency subtree!
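To make the queue idea concrete, here is a minimal single-process sketch (plain Python, not check50's actual runner), in which each check may return child checks discovered at run time, and those simply join the worklist:

from collections import deque

def run_all(root_checks):
    """Run checks from a worklist; a finished check may enqueue new children."""
    todo = deque(root_checks)
    results = {}
    while todo:
        check = todo.popleft()
        try:
            # A check returns an iterable of child checks discovered at run time
            children = check() or []
            results[check.__name__] = "passed"
        except Exception as e:
            children = []  # skip children of a failed parent
            results[check.__name__] = f"failed: {e}"
        todo.extend(children)
    return results

# Example: parent spawns one child while running
def parent():
    def child():
        pass
    return [child]

print(run_all([parent]))  # {'parent': 'passed', 'child': 'passed'}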

Jelleas reopened this Jul 6, 2020
Jelleas (Contributor) commented Jul 6, 2020

You're right: with the notebooks, the time spent on creating the dynamic checks is rather negligible, so I wasn't concerned with how many times it actually runs. I can see how that is different in your situation. The problem at hand is that currently each "check/worker" process needs to have all possible checks in memory, because the "manager/main" process can call for any check to run in any process. This is how the current model with a futures.ProcessPoolExecutor works. That means whatever is created dynamically needs to exist, and effectively be recreated, in each process.

Now, my solution was to rediscover and recreate each check every time. But in your case, perhaps it's best to discover checks once (run the shell command once), then leverage something persistent like the file system to communicate the results of the discovery, and have each check process dynamically create checks from that. Something like: https://gist.github.com/Jelleas/21278df92f40e4804bd74084a2a3dbc0
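The shape of that approach, independent of the gist (the cache file name and the discovery helper here are hypothetical): run the expensive discovery only if no cached result exists yet, so re-imports and other worker processes read the cache instead:

import json
import pathlib

CACHE = pathlib.Path("discovered_tests.json")

def get_test_ids():
    """Discover test IDs once; later (re-)imports read the cached result."""
    if not CACHE.exists():
        ids = discover_tests_via_shell_command()  # hypothetical expensive step
        CACHE.write_text(json.dumps(ids))
    return json.loads(CACHE.read_text())

Note that whether such a path survives check50's per-check temporary directories depends on check50 internals, which is presumably why the run_dir branch below is needed.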

I did have to add support for this in check50; it currently lives in a feature branch over at https://github.com/cs50/check50/tree/run_dir. So to try out my gist, please install check50 via:

pip install git+https://github.com/cs50/check50@run_dir --upgrade

Jelleas (Contributor) commented Jul 6, 2020

As to why init() is called so many times even with CHECK50_WORKERS set: I only just remembered that check50 re-imports the module after every check. This is an effort to remove any state (globals in particular) after the execution of a check, which was important because it is nondeterministic which checks run in which process. But it works against us in our use case here.

Jelleas (Contributor) commented Jul 6, 2020

let a check manipulate its own dependency subtrees!

I would like to make this possible, and I would like to natively support dynamic checks too, such that we don't have to "hack" them in. I also like the idea of being able to create checks only once they become relevant, a collapsible list of sorts, where you only start to show checks for some part of an assignment once students reach that part.

But this all causes a lot of friction with multiprocessing. In particular, once a check starts to dynamically create other checks, we'd need to ensure those checks run in the same process as the check that created them. That's a level of control we don't have with futures.ProcessPoolExecutor, and it's something we'd need to build.

Keeping that in mind, and as a bit of future talk, perhaps I'd propose a new decorator that signals a priori that a check can create other checks and so cannot be run through our current futures.ProcessPoolExecutor model:

@check50.dynamic_check()
def foo():
    """Assignment 2 exists"""

    # Adds bar
    @check50.check()
    def bar():
        pass

    # Alternative method
    def baz():
        pass
    check50.check(baz)

But perhaps baby steps first: I think a lot of the headaches and gotchas in this thread have to do with multiprocessing, and being able to opt out of it would address most of the issues. Something like:

check50.single_process()

Edit: just to note, there would always be two processes (a manager and a worker), to catch any timeouts and any potential crashes.
