
Use less state-full design for collecting metrics #102

Open · izgzhen opened this issue Dec 31, 2018 · 2 comments

Comments

@izgzhen (Collaborator) commented Dec 31, 2018

Discussion started in #93.

@anhnamtran (Collaborator) commented Dec 31, 2018

@Calvin-L @izgzhen I have several questions:

  1. What do you think the stats structure would look like? Could it be a class with fields for all the data it stores, or just a dictionary created by the improve_implementation method?
  2. Should the stats structure handle threading itself, or should improve_implementation handle it? More specifically, since we are updating it in the ImproveQueryJob for-loop, do the jobs share the same stats structure? Or do we somehow make one for each job and then aggregate them into one in improve_implementation? Or maybe there's a better way than either of those.
  3. How would we go about getting the difference in time between each improve loop iteration (the "forever" while-loop) if we want to preserve its responsibility and not have an out-parameter?
  4. As Zhen mentioned in his comment on #93 (Counter for number of improvements done), not messing with improve itself also creates some difficulties in implementing the improvement-number limit.

@Calvin-L (Collaborator) commented Jan 4, 2019

Ultimately the design is up to you, but:

What do you think the stats structure would look like?

It doesn't really matter, as long as it is documented. I would probably do something lightweight, like "a dictionary whose keys are method names and whose values are lists of the timestamps of when improvements were made".
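A minimal sketch of that shape (the names here are illustrative, not taken from the cozy codebase):

import time

# Hypothetical sketch of the proposed stats structure: a dictionary whose keys
# are method names and whose values are lists of improvement timestamps.
stats = {}

def record_improvement(stats, method_name):
    # Append the current time (seconds since the epoch) for this method.
    stats.setdefault(method_name, []).append(time.time())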

since we are updating it in the ImproveQueryJob for-loop, are they sharing the same stats structure?

That would be an example of shared mutable state, so I'm opposed to it. It would be much better for each job to collect its own local statistics and to aggregate them at the end. If you use the dictionary I proposed, each job would store a list and the aggregation would just arrange them into a dictionary by job name.
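As a sketch of that aggregation step (again with hypothetical names), each job would hand back its own list and the caller would only combine them once all jobs are done:

def aggregate_stats(per_job_results):
    # per_job_results: iterable of (job_name, timestamps) pairs, where each
    # timestamp list was collected locally by one job with no shared state.
    return {name: timestamps for name, timestamps in per_job_results}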

How would we go about getting the difference in time between each improve loop iteration

This can be done by the loop in ImproveQueryJob.

How comfortable are you with generators (sometimes called "coroutines") in Python?

def generate():
  i = 0
  while i < 100:
    print("yielding...")
    yield i        # control returns to the caller here until it asks for more
    i += 1

for value in generate():
  print("got {}".format(value))

In the example, the generator generate() and the loop interleave: instead of printing 100 "yielding..." lines followed by 100 "got X" lines, the program alternates between "yielding..." and "got X".

The improve loop is a generator, so the time difference between yields can easily be calculated by the caller. Instead of printing "got X", compute the current time and subtract the previous time from it.
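A rough sketch of what that caller-side timing could look like (the wrapper name is made up; it would wrap whatever generator the job iterates over):

import time

def time_iterations(generator):
    # Measure the wall-clock time between successive yields of a generator
    # and pass its values through unchanged.
    previous = time.time()
    for item in generator:
        now = time.time()
        print("iteration took {:.3f}s".format(now - previous))
        previous = now
        yield item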

You may want to alter improve so that it yields once on every loop iteration, even if an improvement was not found. (In the other thread I wrote up a more detailed description of this idea.)
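Schematically, that change would look something like the following (search_for_improvement is a placeholder, not the real search step in improve):

def improve_sketch():
    # Sketch only: yield on every pass of the "forever" loop, producing None
    # when no improvement was found, so the caller can still time each pass.
    while True:
        improvement = search_for_improvement()  # placeholder for the real step
        yield improvement                        # may be None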

not messing with improve itself also creates some difficulties in implementing improvement number limit

Actually it doesn't! The reason you were messing with improve was to make it check the improvement number and stop when it reached a threshold. However, improve already comes with a super-general stopping mechanism: the stop_callback parameter. All you should need to do is modify the stop callback passed by the ImproveQueryJob.
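For example, a per-job limit might be wrapped around the existing callback roughly like this (assuming for the sketch that the callback takes no arguments and returns a bool; check the real signature in cozy before copying this):

def with_improvement_limit(stop_callback, limit):
    # Wrap an existing stop callback so that it also fires once `limit`
    # improvements have been recorded for this job.
    count = [0]  # simple mutable counter local to this job

    def record_improvement():
        count[0] += 1

    def should_stop():
        return count[0] >= limit or stop_callback()

    return should_stop, record_improvement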

If you intend the limit to be global rather than per-job, then instead you should modify the top-level loop that collects results to call stop_jobs(...) and return when a threshold is reached.
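Sketched out, with collect_results and jobs standing in for whatever the real top-level loop iterates over, that global check would sit roughly here:

def collect_until_limit(jobs, limit):
    # Hypothetical outline of the top-level results loop; only the placement
    # of the threshold check and the stop_jobs(...) call is the point.
    total_improvements = 0
    for improvement in collect_results(jobs):   # placeholder for the real loop
        total_improvements += 1
        if total_improvements >= limit:
            stop_jobs(jobs)                     # stop every outstanding job
            return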
