This repository has been archived by the owner on Sep 12, 2024. It is now read-only.
Currently, requests to the TextCortex API generate code independently for each cell. Without the context of the entire notebook, global variables, etc., the API returns disparate code, forcing the user to be overly specific in their prompts about, e.g., variable names.
Ideally, the entire execution context, i.e.

1. inputs of previously executed cells,
2. code generated from prompts,
3. outputs of previously executed cells,
4. names of variables in the global namespace,
5. values of variables in the global namespace

should all be submitted to the API in each request for the best possible generation.
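The five context items above could be collected into a single dict before submission. Below is a minimal sketch; the function name `gather_context`, its parameters, and the output keys are all hypothetical, chosen only to mirror items (1) to (5). The `inputs` and `outputs` arguments stand in for IPython's `In` list and `Out` dict, and `user_ns` for the user's global namespace.

```python
def gather_context(inputs, outputs, user_ns, generated=None):
    """Collect execution-context items (1)-(5) into a single dict.

    All names here are illustrative, not part of any existing API.
    `inputs` mirrors IPython's `In` list, `outputs` its `Out` dict,
    `user_ns` the user's global namespace.
    """
    # Skip IPython's own bookkeeping names and anything underscore-prefixed.
    hidden = ("In", "Out", "get_ipython", "exit", "quit")
    visible = {k: v for k, v in user_ns.items()
               if not k.startswith("_") and k not in hidden}
    return {
        "cell_inputs": list(inputs),                               # (1)
        "generated_code": list(generated or []),                   # (2)
        "cell_outputs": {k: repr(v) for k, v in outputs.items()},  # (3)
        "variable_names": sorted(visible),                         # (4)
        "variable_values": {k: repr(v) for k, v in visible.items()},  # (5)
    }
```

Using `repr` for outputs and values keeps everything serializable at the cost of fidelity; a real implementation might truncate or summarize large objects instead.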
Bandwidth is a bottleneck for remotely generated code, so the request payload needs to be pruned without losing too much of the context; say, it should not exceed a ballpark of 500 kB.
Implementation
Fortunately, IPython caches the inputs and outputs of each cell and stores them in hidden variables in the global namespace, which we can access directly: https://ipython.readthedocs.io/en/stable/interactive/reference.html#input-caching-system
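For reference, the caches documented at the link above look roughly like this. The dict below simulates a session's user namespace so the access pattern is testable outside IPython; in a live session you would read `ip = get_ipython(); ip.user_ns` instead.

```python
# Simulated IPython user namespace after two executed cells.
# In a real session these variables already exist; `In` is a list
# (In[0] is the empty string), `Out` is a dict keyed by execution count.
user_ns = {
    "In": ["", "x = 40", "x + 2"],  # input cache: In[n] is cell n's source
    "Out": {2: 42},                 # output cache: Out[n] is cell n's result
    "x": 40,                        # a user-defined global
}

last_input = user_ns["In"][-1]
last_output = user_ns["Out"][max(user_ns["Out"])]
```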
For submission to a remote API, these history variables need to be pruned down to the aforementioned limit. Code generation performance degrades with the amount of discarded information, but we expect it to perform fairly well with only (1), (2) and (4) from above.
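A deliberately naive pruning strategy, sketched below under the assumption that recent cells matter most for the next generation: drop the oldest cell inputs until the serialized payload fits the budget. The helper name and the drop-oldest-first policy are illustrative, not a decided design.

```python
import json

MAX_PAYLOAD_BYTES = 500_000  # the ballpark limit discussed above

def prune_to_budget(cell_inputs, budget=MAX_PAYLOAD_BYTES):
    """Keep the most recent cell inputs whose JSON size fits the budget.

    Hypothetical helper: walks the history newest-first, accumulating
    serialized size, and stops once adding another cell would overflow.
    """
    kept = []
    size = 2  # account for the enclosing JSON list brackets
    for cell in reversed(cell_inputs):
        entry = len(json.dumps(cell)) + 1  # + separating comma
        if size + entry > budget:
            break
        kept.append(cell)
        size += entry
    kept.reverse()  # restore chronological order
    return kept
```

A smarter variant could weight cells by relevance rather than recency; that is deferred to the future-work item below.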
- Implement logic to pack (1)-(5) above into a dict
- Create a schema to convert the dict into JSON
That JSON would then be included in the payload and processed by the API for each request.
Notes
The JSON schema is to be the same as the Jupyter notebook format, with code-generation-specific data stored in cell metadata.
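A payload following that note might look like the sketch below: notebook-style code cells with the extension's data tucked under each cell's `metadata` key. The `textcortex` metadata namespace and the `generated` flag are assumptions for illustration; only `cell_type`, `source`, `metadata`, and `outputs` come from the notebook format itself, and a conforming document would carry additional required fields.

```python
import json

def to_notebook_json(cell_inputs, generated_flags):
    """Serialize packed context as Jupyter-notebook-style JSON.

    Hypothetical sketch: each input becomes a code cell, and whether it
    was generated from a prompt is recorded in cell metadata under an
    assumed `textcortex` key.
    """
    cells = [
        {
            "cell_type": "code",
            "source": src,
            "metadata": {"textcortex": {"generated": gen}},
            "outputs": [],
        }
        for src, gen in zip(cell_inputs, generated_flags)
    ]
    return json.dumps({"cells": cells, "nbformat": 4, "nbformat_minor": 5})
```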
Future work
A more sophisticated pruning algorithm that processes and includes (3) and (5) in the payload