This repository has been archived by the owner on Sep 12, 2024. It is now read-only.

Submit execution context to TextCortex API #7

Open
4 of 6 tasks
osolmaz opened this issue Oct 3, 2022 · 0 comments
Assignees
Labels
enhancement (New feature or request) · high priority (Issues that need immediate attention)

Comments


osolmaz commented Oct 3, 2022

Currently, requests to the TextCortex API generate code independently for each cell. Without the context of the entire notebook (global variables, etc.), the API returns disparate code, forcing the user to be overly specific in their prompts, e.g. about variable names.

Ideally, the entire execution context, i.e.

  1. inputs of previously executed cells,
  2. code generated from prompts,
  3. outputs of previously executed cells,
  4. names of variables in the global namespace,
  5. values of variables in the global namespace

should all be submitted to the API in each request for the best possible generation.

Bandwidth is a bottleneck for remotely generated code, so the request payload needs to be pruned without losing too much of the context; as a ballpark, it should not exceed 500 kB.

Implementation

Fortunately, IPython caches the inputs and outputs of each cell and stores them in hidden variables in the global namespace, which we can easily access:

https://ipython.readthedocs.io/en/stable/interactive/reference.html#input-caching-system
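As a sketch of reading those caches (assuming the documented `In`/`Out` history variables; the function name `collect_context` and the returned keys are our own, not part of any API):

```python
def collect_context():
    """Read IPython's input/output caches from the interactive namespace.

    Returns None when not running inside IPython. This is a sketch:
    only In, Out and user_ns come from IPython's documented interface.
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return None
    ip = get_ipython()
    if ip is None:  # plain Python interpreter, no kernel running
        return None
    ns = ip.user_ns
    return {
        "inputs": list(ns["In"]),  # (1) inputs of previously executed cells
        "outputs": {k: repr(v) for k, v in ns["Out"].items()},  # (3) outputs
        "names": sorted(k for k in ns if not k.startswith("_")),  # (4) globals
    }
```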

For submission to a remote API, the history variables need to be pruned down to the aforementioned limit. Code generation quality degrades with the amount of discarded context, but we expect it to perform quite well already with only (1), (2) and (4) from above.

  • Implement logic to pack

    • (1)
    • (2)
    • (3)
    • (4)
    • (5)
  • Create a schema to convert the dict into JSON
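A minimal sketch of the packing and pruning step, covering (1), (2) and (4) and dropping the oldest cell inputs until the serialized payload fits the budget (the payload keys and function name are assumptions for illustration, not the TextCortex API's schema):

```python
import json

MAX_PAYLOAD_BYTES = 500_000  # the ~500 kB ballpark from above

def pack_context(inputs, generated, names, limit=MAX_PAYLOAD_BYTES):
    """Pack (1) cell inputs, (2) generated code and (4) global names
    into a JSON payload, discarding the oldest inputs until it fits.
    Sketch only; key names are hypothetical."""
    inputs = list(inputs)
    while True:
        payload = json.dumps({
            "inputs": inputs,
            "generated": generated,
            "names": names,
        })
        if len(payload.encode("utf-8")) <= limit or not inputs:
            return payload
        inputs.pop(0)  # oldest context is the least valuable; drop it first
```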

That JSON would then be included in the payload and processed by the API for each request.

Notes

The JSON schema is to be the same as the Jupyter notebook format, with code-generation-specific data stored in cell metadata.
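One way to read the note above: each cell follows the nbformat v4 code-cell shape, with the generation-specific data (here, the originating prompt) tucked under `metadata`. The `textcortex` metadata namespace is an assumption for illustration:

```python
import json

def make_cell(source, prompt=None):
    """Build an nbformat-v4-style code cell dict; code-generation data
    goes under cell metadata. The 'textcortex' key is hypothetical."""
    cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }
    if prompt is not None:
        cell["metadata"]["textcortex"] = {"prompt": prompt}
    return cell
```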

Future work

  • A more sophisticated pruning algorithm that processes and includes (3) and (5) in the payload
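A first cut at that future pruning might simply cap each output's or variable value's repr at a fixed character budget before inclusion; the function name and truncation marker below are our own:

```python
def truncate_repr(value, budget=512):
    """Render a value's repr capped at `budget` characters, marking the
    cut so the model knows the value was pruned. Sketch for future work
    on including (3) and (5) in the payload."""
    text = repr(value)
    if len(text) <= budget:
        return text
    return text[:budget] + f"... <truncated, {len(text)} chars total>"
```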
@osolmaz osolmaz added the enhancement New feature or request label Oct 3, 2022
@osolmaz osolmaz self-assigned this Oct 3, 2022
@osolmaz osolmaz added the high priority Issues that need immediate attention label Oct 8, 2022
@osolmaz osolmaz pinned this issue Oct 8, 2022
@osolmaz osolmaz unpinned this issue Nov 9, 2022
@osolmaz osolmaz pinned this issue Nov 19, 2022