Changing a dataset's contents (and potentially length) each epoch? #5792
Replies: 2 comments
- I think the only option is to run …
- I ended up just building one giant training dataset: my dataset repeated once per epoch, with each copy shuffled and given its per-epoch changes before being appended.
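A minimal plain-Python sketch of this repeat-and-shuffle approach (the helper name `build_multi_epoch_dataset` and the list-of-examples representation are illustrative stand-ins for a real dataset object, not anything from a specific library):

```python
import random

def build_multi_epoch_dataset(examples, num_epochs, seed=0):
    """One giant training set: the dataset repeated once per epoch,
    each copy shuffled independently before being appended."""
    rng = random.Random(seed)
    combined = []
    for _ in range(num_epochs):
        epoch_copy = list(examples)
        rng.shuffle(epoch_copy)  # fresh order for this "epoch"
        # any other per-epoch changes (e.g. re-chunking) would go here
        combined.extend(epoch_copy)
    return combined
```

You then train for a single pass over `combined`, which is equivalent to `num_epochs` passes with per-epoch shuffling baked in up front.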
-
Hiya,
Is it possible to change the contents of a dataset, and potentially even the length, between epochs?
My use case is that I'm training a LoRA for Llama, and I incidentally found that it performs much better when trained on a monolithic text file of questions and answers, chunked into context-window lengths with some overlap between chunks, than when chunked into self-contained question-and-answer blocks.
I'd like to try a process where, on each epoch, the order of the questions and answers is randomized and they are joined back into one monolithic text, which is then re-chunked to context-window lengths. This changes the overlap between questions, and potentially even the length of the dataset (depending on the chunking and on where words that tokenize to multiple tokens shift the cutoff points).
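The per-epoch shuffle-join-rechunk step described above could be sketched like this (the function name `rechunk_for_epoch` is hypothetical, and `tokenize` is a placeholder for whatever tokenizer the training setup actually uses):

```python
import random

def rechunk_for_epoch(qa_pairs, tokenize, chunk_len, overlap, seed=0):
    """Shuffle the Q&A order, join everything into one monolithic
    token stream, then slice it into overlapping context-window chunks."""
    rng = random.Random(seed)  # vary the seed per epoch for a new order
    order = list(qa_pairs)
    rng.shuffle(order)
    tokens = []
    for qa in order:
        tokens.extend(tokenize(qa))
    stride = chunk_len - overlap
    # the final chunk may be shorter than chunk_len
    return [tokens[i:i + chunk_len] for i in range(0, len(tokens), stride)]
```

Calling this at the start of every epoch with a different seed yields a freshly shuffled, freshly chunked dataset; the number of chunks can differ between epochs if the tokenized length changes.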