DSPy 2.5 + TypedPredictor with List[str] outputs throws during optimization, but inference works #1567
Comments
Thanks @dbczumar! This is actually an interesting problem. The way to debug this is to do When you do that, you see:
In zero-shot mode, this model understands the format of the task well enough to output a list. When you're about to bootstrap, though, the optimizer sees that you have pre-labeled text/tokens pairs, so before bootstrapping it formats them as "labeled" demos, with the goal of producing (good) "bootstrapped" demos. Now, in 2.5, lists are formatted by default as a numbered list. This is handy for input fields. For output fields, I guess we may want to raise an exception if you feed us any non-str value, or maybe call `str(...)` for you, but I'm not sure either is better. Maybe the right thing to do is to check whether the user supplied a type annotation for the output field. If they did, we should format their labels in a way that parses correctly. That's the right thing here, yup.
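A minimal sketch of the annotation-aware formatting suggested above follows. It assumes a list-typed output field. `format_numbered_list` and `format_output_label` are hypothetical helpers, not DSPy APIs. The numbered rendering is only illustrative and is not DSPy's exact format. The point is that a numbered list does not round-trip through a list parser, while a JSON-serialized label does.

```python
import json
from typing import Any, List, get_origin


def format_numbered_list(values: list) -> str:
    # Illustrative numbered-list rendering (an assumption, not DSPy's exact format).
    return "\n".join(f"[{i}] {v}" for i, v in enumerate(values, start=1))


def format_output_label(value: Any, annotation: Any) -> str:
    # Hypothetical: if the output field is annotated as a list and the label is a list,
    # serialize it as JSON so a list parser can read it back.
    if get_origin(annotation) is list and isinstance(value, list):
        return json.dumps(value)
    # Otherwise fall back to plain string formatting.
    return str(value)


label = ["Hello", ",", "world"]
print(format_numbered_list(label))            # a numbered list; json.loads would reject it
print(format_output_label(label, List[str]))  # ["Hello", ",", "world"]
assert json.loads(format_output_label(label, List[str])) == label
```

Keying the decision on the annotation, rather than always calling `str(...)`, means unannotated fields keep their current behavior while typed outputs get labels the parser can accept.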
To illustrate the scope of the problem, here is a temporary fix that does work:

```python
import ast

import dspy

# get_input_text and train_data come from the script in the original issue.
# Storing the gold tokens as a string (e.g. "['a', 'b']") keeps the labeled
# demos from being reformatted as a numbered list.
tokenizer_train_set = [
    dspy.Example(
        text=get_input_text(data_row),
        tokens=str(data_row["tokens"]),  # note the cast here
    ).with_inputs("text")
    for data_row in train_data
]


def validate_tokens(expected_tokens, predicted_tokens, trace=None):
    # Parse the stringified gold label back into a list before comparing.
    return ast.literal_eval(expected_tokens.tokens) == predicted_tokens.tokens
```

An even shorter, independent fix (one that doesn't require the changes above) is to pass But of course, let's figure out how to make this a general fix.

Aside: using BootstrapFewShot with Predict makes for a great bug report. But in general, using BootstrapFS with Predict (i.e., without any intermediate steps like a chain of thought) and with an exact match metric (
So ultimately there are three tasks here:

1. In the very short term, we should be more careful with formatting output fields, especially (only?) if the user supplied a type annotation for them. If the values are already objects of the right type, we should format them in a way that will parse. If they're strings, we should check that they would parse.
2. In the short term, we should think about the right abstractions for formatting fields. I think this will depend heavily on the type annotations (whether the user supplied them, and what the type is), on whether the field is an input or an output, and on whether the value has duck-typed methods like
3. Less critically, in the longer term, we should reconsider how BootstrapFS uses labeled demos under the hood. This is controlled by a flag (max_labeled_demos), but it could be more versatile. For example, if labeled demos lead to errors, could it automatically avoid those examples?
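The first and third tasks can be sketched together, again assuming that only list-typed outputs need special handling. `check_output_label` and `usable_labeled_demos` are hypothetical names, not DSPy functions. The first serializes list labels so they parse and verifies that string labels would parse. The second skips labels that fail instead of aborting the whole run.

```python
import ast
import json
from typing import Any, List, get_origin


def check_output_label(value: Any, annotation: Any) -> str:
    """Hypothetical helper: render a labeled output so it parses back into `annotation`.

    Only list annotations are handled here; other types fall back to str().
    """
    if get_origin(annotation) is not list:
        return str(value)
    if isinstance(value, list):
        # Already the right type: serialize it in a parseable format.
        return json.dumps(value)
    if isinstance(value, str):
        # A string label: verify it would parse as a list before using it.
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError) as exc:
            raise ValueError(f"label {value!r} does not parse as a list") from exc
        if not isinstance(parsed, list):
            raise ValueError(f"label {value!r} is not a list")
        return json.dumps(parsed)
    raise TypeError(f"unsupported label type: {type(value).__name__}")


def usable_labeled_demos(labels: list, annotation: Any) -> list:
    # Longer-term idea from the thread: skip examples whose labels would cause
    # errors instead of failing the whole optimization run.
    kept = []
    for label in labels:
        try:
            kept.append(check_output_label(label, annotation))
        except (ValueError, TypeError):
            continue
    return kept


print(usable_labeled_demos([["a", "b"], "['c']", "not a list"], List[str]))
# ['["a", "b"]', '["c"]']
```

Raising in `check_output_label` keeps the very-short-term fix strict, while the filter reflects the longer-term idea of letting the optimizer route around bad examples.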