Add ability to switch output languages for multilingual models #69 #72

SamDewriter · 2023-08-15T18:41:34Z

This PR contains the few changes I have made, though I am not sure they make sense.

This reverts commit 075c4d3.

SamDewriter · 2023-08-15T18:44:08Z

My questions are:

I have added the target language as a parameter to the Segment class, but I am not sure if it is necessary to implement it in the DataLoader.
Somehow, if implemented in the Segment and the DataLoader classes, the target language will still need to be entered manually before running it.

ibanesh · 2023-08-15T23:10:12Z

You are on the right track.

To keep it simple, for now consider this as the use case for which you are making the changes:

When running the example pipeline (CounterInTargetLanguageAgentPipeline), we specify the target file (reference/en.txt), which has a list of references to compare the pipeline's output against. Let's say that, in addition to this target file, we also specify another file (e.g., target_language.txt) that lists the output language for each corresponding input in the source file.

Contents of target_language.txt:

en

Now we want the dataloader to load this file along with the source and target references, and then use the target language value from that file to set the target language in the policy method of the CounterInTargetLanguage agent class.
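A minimal sketch of the dataloader side of this use case. It assumes target_language.txt holds one language code per line, aligned line by line with the source and reference files. The class `TextWithLangDataloader` and its methods are hypothetical and are not SimulEval's actual dataloader API. The `.strip()` call matters because lines read from a file keep their trailing newline, so `"en\n"` would never compare equal to `"en"`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextWithLangDataloader:
    """Hypothetical dataloader: aligned source, reference and target-language lists."""

    source_list: List[str]
    target_list: List[str]
    tgt_lang_list: List[str]

    def __post_init__(self) -> None:
        # Every instance needs exactly one source, one reference and one language.
        if not (len(self.source_list) == len(self.target_list) == len(self.tgt_lang_list)):
            raise ValueError("source, target and tgt_lang files must have the same number of lines")

    @classmethod
    def from_files(cls, source: str, target: str, tgt_lang: str) -> "TextWithLangDataloader":
        def read_lines(path: str) -> List[str]:
            with open(path, "r", encoding="utf-8") as f:
                # strip() drops the trailing newline so "en\n" becomes "en"
                return [line.strip() for line in f]

        return cls(read_lines(source), read_lines(target), read_lines(tgt_lang))

    def __len__(self) -> int:
        return len(self.source_list)

    def get_tgt_lang(self, index: int) -> str:
        # Per-instance language, later handed to the instance object.
        return self.tgt_lang_list[index]
```

Keeping the language as a per-line list, not a single value, is what later allows each instance to carry its own language.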

ibanesh · 2023-08-15T23:28:22Z

I have added the target language as a parameter to the Segment class, but I am not sure if it is necessary to implement it in the DataLoader.

Given the above use case, some changes to the dataloader will be required.

Somehow, if implemented in the Segment and the DataLoader classes, the target language will still need to be entered manually before running it.

Even in the use case I described, we are not technically changing the target language dynamically. But the changes you make in SimulEval to support this use case will pave the way for dynamically passing in the target language from the demo front end.

In the given use case, instead of passing the target language as a parameter when loading the pipeline (i.e., --tgt-lang en), the target language will be inferred from a file.
When integrating with the demo (seamless-experience repo), the part that loads from a file via the dataloader will be replaced by logic that streams data from the front end. However, the changes you make to pass the target language from the dataloader to the agent policy method will still hold, and they will let us change the target language dynamically.

ibanesh · 2023-08-15T23:37:03Z

simuleval/data/segments.py

@@ -15,6 +15,7 @@ class Segment:
    finished: bool = False
    is_empty: bool = False
    data_type: str = None
+    tgt_lang: str = ""


I can see that you have made some changes to add the target language as a property of the dataloader and segment classes, but I don't see any change that passes the target language from the dataloader to the segment.

In case you are wondering, the instance class (e.g., https://github.com/facebookresearch/SimulEval/blob/main/simuleval/evaluator/instance.py) is the one that loads a specific instance/sample from the dataloader and then creates segments for that instance.
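A minimal sketch of the instance-to-segment hop. The `Segment` dataclass below mirrors the fields in the segments.py diff, including `tgt_lang: str = ""`, plus `index` and `content` fields added for illustration. `TextInstance` and its `send_source` method are hypothetical simplifications, not the real `instance.py` code. The instance receives its language once from the dataloader and stamps it on every segment it creates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    # Fields from the segments.py diff; index/content added for this sketch.
    index: int = 0
    content: str = ""
    finished: bool = False
    is_empty: bool = False
    data_type: Optional[str] = None
    tgt_lang: str = ""


class TextInstance:
    """Hypothetical instance: one dataloader sample, streamed word by word."""

    def __init__(self, index: int, source: str, tgt_lang: str) -> None:
        self.index = index
        self.source_words = source.split()
        # Looked up once per instance from the dataloader (e.g. target_language.txt).
        self.tgt_lang = tgt_lang
        self.step = 0

    def send_source(self) -> Segment:
        # Every segment carries the instance's tgt_lang so downstream agents can see it.
        if self.step >= len(self.source_words):
            return Segment(index=self.step, finished=True, is_empty=True,
                           data_type="text", tgt_lang=self.tgt_lang)
        segment = Segment(
            index=self.step,
            content=self.source_words[self.step],
            finished=self.step == len(self.source_words) - 1,
            data_type="text",
            tgt_lang=self.tgt_lang,
        )
        self.step += 1
        return segment
```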

Thanks so much for the comments, they are super helpful!

SamDewriter · 2023-08-18T12:42:07Z

I made new changes and updated instance.py to pass tgt_lang to the segment when creating it. It seems to work, except that the target language is not detected: when I use the dummy model (counter_in_tgt_lang_agent.py), instances.log shows "unknown" where the target language should be.

I suspect that the target language parameter (args) from the counter_in_tgt_lang_agent is not getting passed to the instance.

ibanesh · 2023-08-18T16:19:06Z

Are you sure all changes have been pushed to this PR?
I'm not seeing any changes to instance.py in this PR even after your latest commit.

ibanesh · 2023-08-18T16:24:55Z

Here is a summary of the changes I think need to be made:

Changes in dataloader to infer/load the target language from a file.
Changes for passing the tgt lang attribute loaded in the dataloader to the instance object for each instance.
Changes for passing the tgt lang attribute from instance object to segment.
Changes for passing the tgt lang attribute from the segment to agent policy through the agent states.

SamDewriter · 2023-08-18T17:10:28Z

Changes in dataloader to infer/load the target language from a file.
Changes for passing the tgt lang attribute loaded in the dataloader to the instance object for each instance.
Changes for passing the tgt lang attribute from instance object to segment.
Changes for passing the tgt lang attribute from the segment to agent policy through the agent states.

I am sure I have touched on all these aspects.

ibanesh · 2023-08-18T22:06:36Z

@SamDewriter
I still don't see any logic added in this PR for passing tgt_lang from the dataloader to the instance object and then to the segment object. That connection is needed to propagate the parameter. Right now you are setting the tgt_lang property in the dataloader from the file, but this property is not used to set the tgt_lang property you added to the instance or segment.

If you are looking for more code pointers, this should help:
https://github.com/facebookresearch/SimulEval/blob/seamless_main/simuleval/evaluator/instance.py#L46-L48
https://github.com/facebookresearch/SimulEval/blob/seamless_main/simuleval/evaluator/instance.py#L278-L283

The final missing piece is passing tgt_lang from the segment to the agent's policy method. This is the main part that will be useful when making changes to the demo.
Pointers:

Segments get pushed to the pipeline - https://github.com/facebookresearch/SimulEval/blob/seamless_main/simuleval/evaluator/evaluator.py#L215-L216
Segments get pushed to the agent modules - https://github.com/facebookresearch/SimulEval/blob/seamless_main/simuleval/agents/pipeline.py#L60-L61
Segment being used to update the state:
- https://github.com/facebookresearch/SimulEval/blame/seamless_main/simuleval/agents/agent.py#L83
- https://github.com/facebookresearch/SimulEval/blob/seamless_main/simuleval/agents/states.py#L43-L47
policy method invoking - https://github.com/facebookresearch/SimulEval/blame/50a0783168f98101a427fe89057a7aacaf1b7e1c/simuleval/agents/agent.py#L112-L115
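The chain described in these pointers (segment, then state update, then policy call) can be sketched as follows. All classes here are hypothetical, simplified stand-ins, not the actual code in `agent.py` or `states.py`. `AgentStates.update_source` copies `tgt_lang` from each incoming segment, and `pushpop` updates the states before calling `policy`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    content: str = ""
    finished: bool = False
    tgt_lang: str = ""


@dataclass
class AgentStates:
    source: List[str] = field(default_factory=list)
    source_finished: bool = False
    tgt_lang: Optional[str] = None

    def update_source(self, segment: Segment) -> None:
        # Append the new source content and remember the latest target language.
        if segment.content:
            self.source.append(segment.content)
        self.source_finished = segment.finished
        if segment.tgt_lang:
            # Only overwrite when the segment carries a language, so the
            # empty default ("") does not erase a previously set value.
            self.tgt_lang = segment.tgt_lang


class EchoLangAgent:
    """Hypothetical agent: update the states from the segment, then run policy."""

    def __init__(self) -> None:
        self.states = AgentStates()

    def pushpop(self, segment: Segment) -> str:
        self.states.update_source(segment)
        return self.policy(self.states)

    def policy(self, states: AgentStates) -> str:
        # The policy only sees tgt_lang through the states it is given.
        return f"{len(states.source)} ({states.tgt_lang})"
```

Because the language lives on the states, a later segment with a different `tgt_lang` changes the policy's output without re-initializing the agent.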

SamDewriter · 2023-08-20T11:35:03Z

In this new commit, I pass the target language only to the SpeechInputInstance class. The tgt_lang property is not added to the parent Instance class; it was commented out in the last commit.

ibanesh · 2023-08-21T17:12:47Z

examples/speech_to_text/counter_in_tgt_lang_agent.py

+        if args is not None:
+            with open(args.tgt_lang, "r") as file:
+                tgt_lang = file.read()
+        self.tgt_lang = tgt_lang


Is this part here as a fallback? It is fine to leave it in as a fallback, but if this is the sole logic for getting tgt_lang in this agent then it won't meet our purpose.

We primarily want to get the tgt lang param dynamically passed to the policy method instead of solely getting set at initialization. You understand that solely getting set at initialization means it will remain static, right?

The reason I added this part is just to preprocess tgt_lang, because we are reading it from a file. I came to this conclusion after some trial and error: before I added this logic, the evaluation worked, but tgt_lang was not being passed.

One thing I'd also like clarified: in counter_in_tgt_lang_agent, tgt_lang is set in the subclass that inherits from the parent class Agent, which means tgt_lang is not implemented in the parent class. I suppose the right thing would be to implement it in the parent class so that all child classes automatically have it.

We don't want to set tgt_lang in the init method by the end of this effort, so don't worry about moving this attribute to the parent class for now.

If the objective is still not clear to you, imagine that the following lines in SimulEval/examples/speech_to_text/counter_in_tgt_lang_agent.py (in c7a1749) will be removed.

Lines 22 to 24:

    parser.add_argument(
        "--tgt-lang", default="en", type=str, choices=["en", "es", "de"]
    )

Line 17:

    self.tgt_lang = args.tgt_lang

We want to be able to get the tgt_lang passed down to the policy method as part of the states, something like:

    def policy(self, states: Optional[AgentStates] = None):
        ...
        tgt_lang = states.tgt_lang
        if tgt_lang == "en":
            prediction += "seconds"
        elif tgt_lang == "es":
            prediction += "segundos"
        elif tgt_lang == "de":
            prediction += "sekunden"
        else:
            prediction += "<unknown>"
        ...
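A self-contained, runnable sketch of that policy. It uses hypothetical minimal `AgentStates` and `CounterInTargetLanguageAgent` stand-ins (not SimulEval's classes) and replaces the `if`/`elif` chain with a dictionary lookup. The prediction format (the source count followed by a space) is an assumption, since the snippet elides how `prediction` starts. The language is read from `states` on every call, not from `self` in `__init__`, so consecutive calls can answer in different languages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Unit words taken from the policy snippet above.
UNIT_BY_LANG: Dict[str, str] = {"en": "seconds", "es": "segundos", "de": "sekunden"}


@dataclass
class AgentStates:
    source: List[str] = field(default_factory=list)
    tgt_lang: Optional[str] = None


class CounterInTargetLanguageAgent:
    """Hypothetical stand-in: counts source items and names the unit in states.tgt_lang."""

    def policy(self, states: Optional[AgentStates] = None) -> str:
        if states is None:
            states = AgentStates()
        # Assumed prediction format: the count followed by a space.
        prediction = f"{len(states.source)} "
        # Language comes from the states on every call, not from __init__.
        prediction += UNIT_BY_LANG.get(states.tgt_lang, "<unknown>")
        return prediction
```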

Oh! Now I understand the goal. Thanks for the clarification!

ibanesh · 2023-08-25T20:52:42Z

The changes in the latest commit look good.
Can you please clean up the PR by removing the unnecessary extra changes and make it ready for review?

Mubaraq Sani added 13 commits July 19, 2023 21:28

Testing Circleci on main (89fbc24)
Merge branch 'main' of https://github.com/SamDewriter/SimulEval (93f9643)
Testing Circleci on main (fd41dc2)
Testing Circleci on main (4becf02)
Testing Circleci on main (cc893ef)
Testing Circleci on main (a6f00f8)
correct Circle config (0f41351)
correct Circle config (78d13d0)
correct Circle config (98ae6ec)
correct Circle config (cdffc8d)
Revert "[demo] s2t + s2s agent pipelines (facebookresearch#58)" (cacfbc9): This reverts commit 075c4d3.
resolve branch changes (f97cdfa)
add target language (4ddf84a)

SamDewriter requested a review from ibanesh August 15, 2023 18:41

SamDewriter self-assigned this Aug 15, 2023

add target language as a parameter (69e5816)

facebook-github-bot added the CLA Signed label Aug 15, 2023

ibanesh reviewed Aug 15, 2023


Test dynamic language (812bcbc)
Switch language dynamically (233aa35)

Mubaraq Sani added 3 commits August 18, 2023 22:15

Add ability to switch output language (facebookresearch#69) (f15634c)
Add tgt language argument (c06fb0c)
Add Namespace to args argument (facebookresearch#69) (5e0165a)

Mubaraq Sani added 3 commits August 18, 2023 22:20

Modify code to read target language from a file (c43e9da)
Add ability to switch input language (bbe6a88)
Add a tgt-lang file to test (03c06b1)
Add tgt_lang to instance (9d97d18)

ibanesh reviewed Aug 21, 2023


Mubaraq Sani added 4 commits August 25, 2023 15:32

Add tgt_lang to AgentStates (3e5fbe6)
States (6d06a8e)
Add tgt_lang from state to test (facebookresearch#69) (50efb42)
Target language to test (1c07b35)

ibanesh closed this Sep 22, 2023
