Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Last batch policy support for remaining image readers #217

Open
wants to merge 653 commits into
base: develop
Choose a base branch
from

Conversation

fiona-gladwin
Copy link
Contributor

  • Add Last batch policy, pad last batch, stick to shard and shard size support for remaining image readers
  • Add changes to COCO, caffe, caffe2, mxnet, tf and cifar10 readers

fiona-gladwin and others added 30 commits April 11, 2024 14:32
Remove changes to update the
filenames vector incase of padding
Fix the pipeline when pad is off and use idx in a continuous manner
To skip and batch and start with the previously padded idx in last batch
LakshmiKumar23
LakshmiKumar23 previously approved these changes Oct 8, 2024
@kiritigowda
Copy link
Collaborator

@fiona-gladwin can you fix the merge conflicts

@fiona-gladwin fiona-gladwin dismissed LakshmiKumar23’s stale review October 9, 2024 12:09

The merge-base changed after approval.

@fiona-gladwin
Copy link
Contributor Author

@fiona-gladwin can you fix the merge conflicts

Fixed merge conflicts

@@ -68,6 +68,8 @@ class CaffeLMDBRecordReader : public Reader {

CaffeLMDBRecordReader();

size_t last_batch_padded_size() override; // The size of the number of samples padded in the last batch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider merging the caffe and caffe2 classes. Or derive caffe2 from caffe. I see lots of repeated member variables in both classes

@@ -68,6 +68,8 @@ class Caffe2LMDBRecordReader : public Reader {

Caffe2LMDBRecordReader();

size_t last_batch_padded_size() override; // The size of the number of samples padded in the last batch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the same variables and functions are repeated in all the image reader classes. That is why we needs to make so many changes for accomodating something small like last_batch_policy.
We need to consolidate all common functions and variables in a common base class and derive all different readers from that.

return _file_names.size();

int ret = ((int)_file_names.size() - _read_counter);
int ret = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add new base class and move all common stuff to that class

@@ -223,7 +223,11 @@ def video(sequence_length, file_list_frame_num=False, file_root="", image_type=t
"file_list_frame_num": file_list_frame_num} # VideoMetaDataReader
b.videoMetaDataReader(Pipeline._current_pipeline._handle,
*(kwargs_pybind_reader.values()))

RocalShardingInfo = b.RocalShardingInfo()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add these defaults into constructor of RocalShardingInfo()

Copy link
Contributor

@rrawther rrawther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please address review comments.
Update changelog.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci:precheckin enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants