Add Last batch policy support for remaining image readers #217

Open: wants to merge 654 commits into base: develop

Commits
ffdcb0a
Update CMakeLists and README for audio test
fiona-gladwin Apr 11, 2024
b2de5f4
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 11, 2024
d000af0
Update README for audio test
fiona-gladwin Apr 11, 2024
7415447
Minor fix
fiona-gladwin Apr 12, 2024
f6bffef
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 12, 2024
cb034b0
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 12, 2024
6089040
Merge branch 'swbs/audio/pr7' of https://github.com/swetha097/rocAL i…
fiona-gladwin Apr 12, 2024
568ee7e
Fix build errors
fiona-gladwin Apr 12, 2024
2e38233
Fix Copy_Data_2d_ROI
swetha097 Apr 12, 2024
0e51f24
Merge branch 'swbs/audio/pr10' of https://github.com/swetha097/rocAL …
swetha097 Apr 12, 2024
d8031b5
Merge remote-tracking branch 'swe_fork/swbs/audio/pr2' into swbs/audi…
swetha097 Apr 12, 2024
d894aba
Fix merge from PR 2
swetha097 Apr 12, 2024
689c55f
Minor changes shard_count argument name
fiona-gladwin Apr 12, 2024
1079d50
Rename set and get functions of data_info to decoded_data_info
fiona-gladwin Apr 12, 2024
1f63cab
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 12, 2024
ff75ed9
Fix shard_size and audio source evaluation
Apr 15, 2024
94f6754
Changes in file_source_reader - to minimize the I/O operations
Apr 15, 2024
c082e9d
Changes in the variable name
Apr 15, 2024
5db5535
Changes in the variable names of the audio source evalution
Apr 15, 2024
2967b68
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Apr 16, 2024
27429b3
Use set instead of vector
swetha097 Apr 16, 2024
d45967e
Minor bug fixes
swetha097 Apr 16, 2024
18baa2a
Minor fixes
fiona-gladwin Apr 16, 2024
dd2a7df
Fix drop policy without padding
fiona-gladwin Apr 16, 2024
a8f81d9
Fix pytorch iterator - PARTIAL policy
fiona-gladwin Apr 16, 2024
9dab620
Merge branch 'swbs/audio/pr10' of https://github.com/swetha097/rocAL …
fiona-gladwin Apr 16, 2024
36a9516
Merge branch 'audio_pr4' into swbs/audio/pr5
SundarRajan28 Apr 17, 2024
fb7a52b
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Apr 17, 2024
b3823c8
Merge branch 'swbs/audio/pr6' into swbs/audio/pr8
SundarRajan28 Apr 17, 2024
4de03a5
Merge branch 'swbs/audio/pr8' into swbs/audio/pr9
SundarRajan28 Apr 17, 2024
0161204
Merge branch 'swbs/audio/pr9' into swbs/audio/pr7
SundarRajan28 Apr 17, 2024
42d1bb1
Merge remote-tracking branch 'upstream/develop' into swbs/audio/pr2
SundarRajan28 Apr 17, 2024
3375f41
Merge branch 'swbs/audio/pr2' into swbs/audio/pr3
SundarRajan28 Apr 17, 2024
d7c8884
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Apr 17, 2024
513fd78
Merge branch 'audio_pr4' into swbs/audio/pr5
SundarRajan28 Apr 17, 2024
44cefd6
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Apr 17, 2024
c100e80
Merge branch 'swbs/audio/pr6' into swbs/audio/pr8
SundarRajan28 Apr 17, 2024
9698308
Merge branch 'swbs/audio/pr8' into swbs/audio/pr9
SundarRajan28 Apr 17, 2024
23dad87
Merge branch 'swbs/audio/pr9' into swbs/audio/pr7
SundarRajan28 Apr 17, 2024
d928c48
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 17, 2024
c0d2309
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 17, 2024
c01325d
Revert empty line removed in CMakeLists.txt
fiona-gladwin Apr 17, 2024
549def5
Removed prefix original for audio vectors
fiona-gladwin Apr 17, 2024
b8f90a8
Fix PARTIAL
fiona-gladwin Apr 18, 2024
211c4c9
Reduce overall time for audio source evalution
swetha097 Apr 18, 2024
5a313ec
Fix shard_size and stick to shard issue seen with convergence
swetha097 Apr 18, 2024
c1d9cc5
Resolve PR comments
swetha097 Apr 18, 2024
7874f09
Add @params to all args in pytorch.py
swetha097 Apr 18, 2024
ef9a21b
Fix build issue
swetha097 Apr 18, 2024
0f48da9
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 22, 2024
a4de349
Merge branch 'swbs/audio/pr7' of https://github.com/swetha097/rocAL i…
fiona-gladwin Apr 22, 2024
37921de
Minor changes in unit test
swetha097 Apr 22, 2024
96ace00
Merge branch 'swbs/audio/pr2' of https://github.com/swetha097/rocAL i…
swetha097 Apr 22, 2024
6602895
Minor changes
swetha097 Apr 22, 2024
aa13a35
Change ROCAL instaces to rocAL in pytorch.py
swetha097 Apr 22, 2024
2873d8c
Merge branch 'swbs/audio/pr2' into swbs/audio/pr3
fiona-gladwin Apr 22, 2024
2dd31f8
Resolve the PR comments
swetha097 Apr 23, 2024
1cd9779
Merge branch 'swbs/audio/pr3' of https://github.com/swetha097/rocAL i…
swetha097 Apr 23, 2024
d1d5241
Minor changes in decoders.py - Modify the comment for shard_size
swetha097 Apr 23, 2024
f4bcbca
Merge branch 'swbs/audio/pr2' of https://github.com/swetha097/rocAL i…
fiona-gladwin Apr 23, 2024
d152dca
Merge branch 'swbs/audio/pr3' of https://github.com/swetha097/rocAL i…
fiona-gladwin Apr 23, 2024
e4c5788
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 23, 2024
6a98227
Merge branch 'swbs/audio/pr10_training' into swbs/audio/pr10
swetha097 Apr 24, 2024
6279bb1
Fix shard_size
swetha097 Apr 24, 2024
fb33f06
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Apr 24, 2024
be416ef
Minor changes
swetha097 Apr 24, 2024
d9dbd2c
Changes in pipeline.py and decoders.py
swetha097 Apr 24, 2024
b674564
Merge remote-tracking branch 'origin/swbs/audio/pr10_training' into s…
swetha097 Apr 24, 2024
8a7bb3c
Address the PR comments
swetha097 Apr 25, 2024
2021ab9
Address Review comments
swetha097 Apr 25, 2024
f3a1afa
Remove print statement
swetha097 Apr 30, 2024
08033c7
Merge branch 'swbs/audio/pr10' into lbp_fix_pr10
swetha097 May 3, 2024
33b3681
Fix the count_items
swetha097 May 3, 2024
261b2e7
Make Sharding similar to DALI
swetha097 May 3, 2024
bb2bad2
Fix issues with DROP policy by introducing a new vector for padding
swetha097 May 6, 2024
075882e
Minor fixes
swetha097 May 6, 2024
adbc5fd
Comment out print statements
swetha097 May 7, 2024
c304314
Add changes for shard_size LBP testing
swetha097 May 7, 2024
0524019
Fix DROP Policy with shard_size > 0
swetha097 May 7, 2024
a7eef66
Fix Stick_to_Shard=False
swetha097 May 7, 2024
0395ddd
Fix PARTIAL policy and code clean up
swetha097 May 7, 2024
63aacf6
fix last_batch_padded size when shard_size > 0
swetha097 May 8, 2024
0cb2812
Fix Drop policy - we skip the dropped batch in the next epoch
swetha097 May 8, 2024
296e1c6
Fix single shard outputs
swetha097 May 8, 2024
497cd08
Remove the commented code and fix the padding code in open()
swetha097 May 9, 2024
66addb9
Remove div by num_shards in decoders.py
swetha097 May 9, 2024
0c900a9
Introduce Audio layouts
fiona-gladwin May 9, 2024
e75616c
Add layout changes for spectrogram
fiona-gladwin May 9, 2024
e7ed0d8
Fix the unit tests - c++ & python
swetha097 May 9, 2024
528a87a
Merge branch 'swbs/audio/pr5' of https://github.com/swetha097/rocAL i…
fiona-gladwin May 9, 2024
feff5bd
Code clean up and formatting
swetha097 May 9, 2024
9b206a8
Minor code clean up
swetha097 May 9, 2024
4bc0f0d
code clean up in pytorch.py
swetha097 May 9, 2024
2809761
Add layout changes for spectrogram
fiona-gladwin May 10, 2024
8ff55cb
Pass layouts for MelFilterBank
fiona-gladwin May 10, 2024
0993896
Fix ToDecibels
fiona-gladwin May 10, 2024
79d316c
Fix Normalize
fiona-gladwin May 10, 2024
0822f76
Fix build issue
fiona-gladwin May 10, 2024
120dddc
Merge branch 'swbs/audio/pr5_layout' of https://github.com/swetha097/…
fiona-gladwin May 10, 2024
a6cbbe2
Fix python unit test
fiona-gladwin May 10, 2024
8be961b
Merge remote-tracking branch 'swe_fork/swbs/audio/pr7_layout' into sw…
swetha097 May 10, 2024
ab993d0
Minor fix
fiona-gladwin May 10, 2024
1ddfe34
Pass LBP to decoders instead of the Pipeline creation
swetha097 May 13, 2024
d7764e1
Merge branch 'swbs/lbp_fixes_pr10_pass_readers' into swbs/lbp_fixes_pr10
May 13, 2024
484e1bd
Update pipeline.py - Remove commented code
swetha097 May 13, 2024
924ab79
Update pipeline.py - Remove commented out code
swetha097 May 13, 2024
9757256
Adding changes for spec layout changes
SundarRajan28 May 15, 2024
6b2a06c
Merge branch 'swbs/audio/pr6' into swbs/audio/pr8
SundarRajan28 May 15, 2024
df70d39
Merge branch 'swbs/audio/pr8' into swbs/audio/pr7
SundarRajan28 May 15, 2024
5505ed8
Adding changes to MFB and normalize nodes
SundarRajan28 May 15, 2024
e685c37
Update node_slice.cpp
swetha097 May 16, 2024
fc26afd
Update node_slice.h
swetha097 May 16, 2024
4320399
Resolve PR comments
swetha097 May 17, 2024
ce91644
Merge branch 'swbs/audio/pr5_layout' into swbs/audio/pr5
fiona-gladwin May 17, 2024
60133c6
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
fiona-gladwin May 17, 2024
b2c40eb
Merge branch 'swbs/audio/pr6' into swbs/audio/pr8
fiona-gladwin May 17, 2024
84db544
Merge branch 'swbs/audio/pr8' into swbs/audio/pr9
fiona-gladwin May 17, 2024
affe8f3
Merge branch 'swbs/audio/pr9' into swbs/audio/pr7
fiona-gladwin May 17, 2024
c41f363
Merge remote-tracking branch 'open_source/develop' into swbs/audio/pr3
swetha097 May 17, 2024
70e12cd
Merge branch 'swbs/audio/pr3' into audio_pr4
swetha097 May 18, 2024
b858b69
Merge branch 'audio_pr4' into swbs/audio/pr5
swetha097 May 18, 2024
5e79034
Merge remote-tracking branch 'origin/swbs/audio/pr5' into HEAD
swetha097 May 18, 2024
66be5a2
Merge branch 'temp_swbs/audio/pr6' into swbs/audio/pr6
swetha097 May 19, 2024
91c4fa1
Merge branch 'swbs/audio/pr6' into swbs/audio/pr8
swetha097 May 19, 2024
750b286
Merge branch 'swbs/audio/pr8' into swbs/audio/pr9
swetha097 May 19, 2024
5276ec2
Merge branch 'swbs/audio/pr9' into swbs/audio/pr7
swetha097 May 19, 2024
4e39f6a
Merge branch 'swbs/audio/pr7' into swbs/lbp_fixes_pr10
swetha097 May 19, 2024
ed9aae2
Fix downmix failing case and resolve the issue with merge
swetha097 May 19, 2024
74759dc
Fix issue with file_source_reader.cpp when file_list is not used
swetha097 May 20, 2024
a417f17
Resolve PR comments - Sundar
swetha097 May 20, 2024
a0242d9
Fix file_source_reader.cpp
swetha097 May 21, 2024
0f4a590
Fix shuffle issues
SundarRajan98 May 23, 2024
9b5fab1
Merge remote-tracking branch 'upstream/develop' into swbs/audio/pr7
SundarRajan28 Jun 5, 2024
328c41a
Adding comments to all if conditions
sbavasab Jun 5, 2024
90f8125
Add Cmake changes for HIP backend
swetha097 Jun 7, 2024
14e6562
Add LPB & Sharding changes for cifar10 reader
swetha097 Jun 7, 2024
688e76e
Add LBP & Sharding changes to coco-file-source-reader
swetha097 Jun 7, 2024
f669459
Add LBP & Sharding changes to TF Record reader
swetha097 Jun 7, 2024
da43265
Adding LBP & Sharding changes in caffe lmdb reader
swetha097 Jun 7, 2024
1ddee78
Add LBP and sharding changes in caffe2 lmdb reader
swetha097 Jun 7, 2024
2ba51f5
Add support for LBP & Sharding in mxnetIO reader
swetha097 Jun 7, 2024
e166721
Merge branch 'swbs/lbp_fixes_pr10' into swbs/lbp_readers_pr11
swetha097 Jun 7, 2024
6237aaf
Merge branch 'swbs/lbp_fixes_pr10' of https://github.com/swetha097/ro…
swetha097 Jun 7, 2024
e1b7337
Merge branch 'swbs/lbp_fixes_pr10' into swbs/lbp_readers_pr11
swetha097 Jun 7, 2024
11b0f96
Merge remote-tracking branch 'upstream/develop' into swbs/audio/pr7
SundarRajan28 Jun 12, 2024
83cecf5
Merge remote-tracking branch 'open_source/develop' into swbs/audio/pr9
swetha097 Jun 12, 2024
6692974
Fix merge conflicts
SundarRajan28 Jun 13, 2024
0cd21de
Merge remote-tracking branch 'upstream/develop' into swbs/audio/pr7
SundarRajan28 Jun 14, 2024
f7e8826
Merge remote-tracking branch 'open_source/develop' into develop
swetha097 Jun 18, 2024
00cdddb
Merge remote-tracking branch 'open_source/develop' into swbs/audio/pr9
swetha097 Jun 18, 2024
0c74d8b
Merge remote-tracking branch 'upstream/develop' into swbs/audio/pr7
SundarRajan28 Jun 20, 2024
20ef6d6
Resolving review comments
SundarRajan28 Jun 21, 2024
7e9b3ce
Merge branch 'develop' into swbs/audio/pr9
swetha097 Jun 21, 2024
7af1c03
Merge remote-tracking branch 'upstream/develop' into swbs/audio/pr9
SundarRajan28 Jun 25, 2024
e2ef16b
Merge branch 'swbs/audio/pr9' into swbs/audio/pr7
SundarRajan28 Jun 25, 2024
c3f5391
Merge remote-tracking branch 'upstream/develop' into swbs/audio/pr7
SundarRajan28 Jun 29, 2024
5b5348a
Merge remote-tracking branch 'swe_fork/swbs/audio/pr7' into swbs/lbp_…
swetha097 Jul 1, 2024
9b5edec
Fix a minor warning in file source reader
swetha097 Jul 1, 2024
5326625
Merge branch 'develop' into swbs/audio/pr7
LakshmiKumar23 Jul 2, 2024
c6fe840
Resolving review comments
SundarRajan28 Jul 4, 2024
5b4ebc5
LBP comments resolution
swetha097 Jul 4, 2024
c8e1791
Merge branch 'develop' into swbs/audio/pr7
LakshmiKumar23 Jul 8, 2024
704badd
Resolving review comments
SundarRajan28 Jul 9, 2024
e521768
Merge remote-tracking branch 'upstream/develop' into swbs/lbp_fixes_pr10
SundarRajan28 Jul 9, 2024
a902dca
Merge branch 'swbs/audio/pr7' into swbs/lbp_fixes_pr10
SundarRajan28 Jul 9, 2024
97d7077
Formatting changes
SundarRajan28 Jul 11, 2024
0ddea59
Resolving Final Set of PR comments
swetha097 Jul 11, 2024
13a105a
Combine with OR condition
swetha097 Jul 11, 2024
f189738
Remove the pad_last_batch_repeated print statement from decoders.py
swetha097 Jul 11, 2024
4e9ff04
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/lbp…
fiona-gladwin Jul 12, 2024
5d0decf
Add shard_size and stick_to_shard variables in args
swetha097 Jul 12, 2024
0b8f35c
Merge branch 'develop' into swbs/lbp_fixes_pr10
swetha097 Jul 17, 2024
0e5fa5a
Minor spelling fix
swetha097 Jul 17, 2024
895d664
Merge remote-tracking branch 'swe_fork/swbs/lbp_fixes_pr10' into swbs…
swetha097 Jul 18, 2024
364210b
Merge remote-tracking branch 'swe_fork/swbs/lbp_fixes_pr10' into swbs…
swetha097 Jul 18, 2024
8db43ae
Commit for resolving PR comments
swetha097 Jul 22, 2024
d4bbdb7
Commit for resolving PR comments
swetha097 Jul 22, 2024
68479df
Merge branch 'swbs/lbp_readers_pr11' of https://github.com/swetha097/…
swetha097 Jul 22, 2024
8d39cf1
Update cifar10_data_reader.cpp
swetha097 Jul 22, 2024
eec6d65
Update mxnet_recordio_reader.cpp
swetha097 Jul 22, 2024
8f72469
Update cifar10_data_reader.cpp
swetha097 Jul 22, 2024
769010c
Merge branch 'develop' into swbs/lbp_fixes_pr10
kiritigowda Jul 24, 2024
615e12d
Make changes to insert the padded data in the file_names vector
swetha097 Jul 24, 2024
cec5e8a
Merge branch 'swbs/lbp_fixes_pr10' of https://github.com/swetha097/ro…
swetha097 Jul 24, 2024
072ee56
Support to pass the variables fo lbp as struct
swetha097 Jul 24, 2024
408091a
Merge branch 'develop' into swbs/lbp_fixes_pr10
kiritigowda Jul 24, 2024
78c2a3d
Fix segmentation fault
swetha097 Jul 29, 2024
4dcc1fc
Merge branch 'ROCm:develop' into swbs/lbp_fixes_pr10_pr_comments
SundarRajan28 Jul 30, 2024
019e14c
Merge remote-tracking branch 'origin/swbs/lbp_fixes_pr10_pr_comments'…
swetha097 Aug 2, 2024
e3ccc27
Merge branch 'swbs/lbp_fixes_pr10_pr_comments' into swbs/lbp_fixes_pr10
swetha097 Aug 5, 2024
1c5398e
Resolve PR comments
swetha097 Aug 5, 2024
4e003c3
Merge branch 'swbs/lbp_fixes_pr10' of https://github.com/swetha097/ro…
swetha097 Aug 5, 2024
4ffb79b
Resolve PR comments
swetha097 Aug 5, 2024
70220c3
Resolve PR comments
swetha097 Aug 5, 2024
c58ffa9
Use PreComputed start and end indices
swetha097 Aug 5, 2024
6803069
Use precomputed shard_idx start and end in initialize
swetha097 Aug 5, 2024
a5797d5
Merge branch 'develop' into swbs/lbp_fixes_pr10
kiritigowda Aug 9, 2024
68cbee9
Merge branch 'develop' into swbs/lbp_fixes_pr10
kiritigowda Aug 20, 2024
4d83a11
Merge remote-tracking branch 'origin/swbs/lbp_fixes_pr10_pr_comments'…
swetha097 Aug 20, 2024
4a5b55f
Merge branch 'develop' into swbs/lbp_fixes_pr10
kiritigowda Aug 27, 2024
99d232d
Initialize the Sharding info using ShardingInfo()
swetha097 Aug 28, 2024
e9b5d63
Merge branch 'swbs/lbp_fixes_pr10' of https://github.com/swetha097/ro…
swetha097 Aug 28, 2024
612b450
convert the signed to int32_t type
swetha097 Aug 28, 2024
85d2c2a
temp commit for struct changes
swetha097 Aug 28, 2024
1c1dfa8
Fix the struct changes - All the test cases passing
Aug 29, 2024
9e1ce0c
Remove any print statements
Aug 29, 2024
85ef52a
Add support to Pass the decode size policy from the user
Aug 29, 2024
6e1b81f
Add support to Pass the decode size policy from the user
Aug 29, 2024
be92088
Merge branch 'swbs/lbp_fixes_pr10' of https://github.com/swetha097/ro…
swetha097 Aug 29, 2024
51259c3
Rename RocalShardingInfo for ShardingInfo and vice-versa
swetha097 Aug 29, 2024
7b4276f
xywh roi copy
swetha097 Aug 29, 2024
8157375
Fix decoders.py for image decoders
swetha097 Aug 29, 2024
673b5d5
Make stick_to_shard True by default
swetha097 Aug 29, 2024
dea2023
Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/roc…
fgladwin Aug 29, 2024
8ae2e1c
Minor changes to the copy_data function
swetha097 Aug 29, 2024
8e26013
Rename to x_offset and y_offset in copy_data
fgladwin Aug 30, 2024
125c093
Minor changes - remove unused variables
fgladwin Aug 30, 2024
bf6f72f
Minor change - Variable names
fgladwin Aug 30, 2024
4560ffc
Update Doxygen comments and comments of API
fgladwin Sep 3, 2024
81c9d19
Merge branch 'swbs/lbp_fixes_pr10' of https://github.com/swetha097/ro…
fgladwin Sep 3, 2024
f279dc5
Merge branch 'develop' into swbs/lbp_fixes_pr10
LakshmiKumar23 Sep 5, 2024
bf1c23a
Merge branch 'develop' into swbs/lbp_fixes_pr10
kiritigowda Sep 6, 2024
82e276f
Make the rocalShardingInfo as the last param for Audio loaders
Sep 6, 2024
2918a89
Remove unused variables and functions in file_source_reader.cpp & .h …
Sep 6, 2024
0878361
Remove the doctring explanation for unused params
Sep 6, 2024
2c1aee9
Change the explanation according to the newly introduced structure
Sep 6, 2024
753e442
Merge branch 'develop' into swbs/lbp_fixes_pr10
LakshmiKumar23 Sep 6, 2024
fd31a01
Merge branch 'swbs/lbp_fixes_pr10' into swbs/lbp_readers_pr11_merged
Sep 12, 2024
e406477
Review comments from PR10 to PR11
Sep 13, 2024
423feaf
PR comments
Sep 18, 2024
04bd3f2
Merge remote-tracking branch 'upstream/develop' into swbs/lbp_readers…
Sep 18, 2024
0874a24
Add file source reader .h file changes
Sep 24, 2024
f8888e3
First level of comments - LBP
Sep 24, 2024
d22f0f6
LBP comments - 2
Sep 25, 2024
067711c
Fix issues
Sep 25, 2024
62ea27d
caffe2 reader fix
Sep 26, 2024
01c2a5c
Fix LBP file source reader
fgladwin Sep 26, 2024
aa68dcd
Add count items fix for all readers
fgladwin Sep 26, 2024
7da362a
Fix caffe and caffe2 readers
fgladwin Sep 27, 2024
29a7de9
Minor changes
fgladwin Sep 27, 2024
6baf859
Minor changes
fgladwin Sep 30, 2024
88759fe
Minor changes
fgladwin Sep 30, 2024
b6cdbfe
Remove get dataset size function
fgladwin Sep 30, 2024
5f76488
Minor changes
fgladwin Sep 30, 2024
7103426
Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/roc…
fgladwin Sep 30, 2024
62e8112
Merge branch 'develop' into swbs/lbp_readers_pr11_merged
LakshmiKumar23 Oct 8, 2024
c184b53
Merge branch 'develop' into swbs/lbp_readers_pr11_merged
fiona-gladwin Oct 9, 2024
f54eccc
Merge branch 'develop' into swbs/lbp_readers_pr11_merged
kiritigowda Oct 11, 2024
2736455
Merge branch 'develop' into swbs/lbp_readers_pr11_merged
LakshmiKumar23 Oct 12, 2024
8c9605f
Fixing issues with ext src reader example
SundarRajan28 Oct 15, 2024
2 changes: 0 additions & 2 deletions rocAL/include/readers/file_source_reader.h
@@ -85,15 +85,13 @@ class FileSourceReader : public Reader {
unsigned _curr_file_idx;
FILE *_current_fPtr;
unsigned _current_file_size;
unsigned _shard_start_idx;
std::vector<unsigned> _shard_start_idx_vector, _shard_end_idx_vector;
std::string _last_id;
std::string _last_file_name, _last_file_path, _absolute_file_path;
size_t _shard_id = 0;
size_t _shard_count = 1; // equivalent of batch size
int32_t _shard_size = -1;
size_t _batch_size = 1;
size_t _padded_samples = 0;
bool _loop;
bool _shuffle;
int _read_counter = 0;
26 changes: 16 additions & 10 deletions rocAL/include/readers/image/caffe2_lmdb_record_reader.h
@@ -68,6 +68,8 @@ class Caffe2LMDBRecordReader : public Reader {

Caffe2LMDBRecordReader();

size_t last_batch_padded_size() override; // The size of the number of samples padded in the last batch
Review comment (Contributor):

I see the same variables and functions are repeated in all the image reader classes. That is why we need to make so many changes to accommodate something small like last_batch_policy.
We need to consolidate all common functions and variables in a common base class and derive all the different readers from that.
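
One possible reading of this suggestion, as a rough sketch only. The class name `ShardedReaderBase` and the method `configure_sharding` are hypothetical and do not exist in rocAL. The floor-based split with an exclusive end index is also an assumption, since the diff shown here does not include the implementation of `compute_start_and_end_idx_of_all_shards()`. The member and method names otherwise mirror the ones repeated across the reader headers in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical base class (not in rocAL): holds the sharding state that this PR
// repeats in every image reader header.
class ShardedReaderBase {
   public:
    virtual ~ShardedReaderBase() = default;

    // Hypothetical setup hook; a concrete reader would call it once it knows its file count.
    void configure_sharding(std::size_t file_count, std::size_t shard_count, std::size_t shard_id) {
        assert(shard_count > 0 && shard_id < shard_count);
        _file_count_all_shards = file_count;
        _shard_count = shard_count;
        _shard_id = shard_id;
        compute_start_and_end_idx_of_all_shards();
    }

    // Number of files in the current shard, before any padding.
    std::size_t actual_shard_size_without_padding() const {
        assert(_shard_id < _shard_start_idx_vector.size());
        return _shard_end_idx_vector[_shard_id] - _shard_start_idx_vector[_shard_id];
    }

    // Size of the largest shard, before any padding: ceil(file_count / shard_count).
    std::size_t largest_shard_size_without_padding() const {
        return (_file_count_all_shards + _shard_count - 1) / _shard_count;
    }

   protected:
    // Assumed split: shard i covers [N*i/S, N*(i+1)/S), with the end index exclusive.
    void compute_start_and_end_idx_of_all_shards() {
        _shard_start_idx_vector.clear();
        _shard_end_idx_vector.clear();
        for (std::size_t i = 0; i < _shard_count; i++) {
            _shard_start_idx_vector.push_back(
                static_cast<unsigned>((_file_count_all_shards * i) / _shard_count));
            _shard_end_idx_vector.push_back(
                static_cast<unsigned>((_file_count_all_shards * (i + 1)) / _shard_count));
        }
    }

    std::size_t _shard_id = 0;
    std::size_t _shard_count = 1;
    std::size_t _file_count_all_shards = 0;
    std::vector<unsigned> _shard_start_idx_vector, _shard_end_idx_vector;
};
```

With something along these lines, a reader such as `Caffe2LMDBRecordReader` would keep only its format-specific members (LMDB handles, record offsets), and a future change to the sharding logic would touch one class instead of every reader header. Whether these members belong in `Reader` itself or in a separate intermediate class depends on the existing hierarchy.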


private:
//! opens the folder containnig the images
Reader::Status Caffe2_LMDB_reader();
@@ -78,7 +80,7 @@ class Caffe2LMDBRecordReader : public Reader {
DIR* _sub_dir;
struct dirent* _entity;
std::vector<std::string> _file_names;
std::map<std::string, unsigned int> _file_size;
std::map<std::string, unsigned int> _file_size, _all_shard_file_sizes_padded;
unsigned _curr_file_idx;
unsigned _current_file_size;
std::string _last_id;
@@ -87,24 +89,16 @@ class Caffe2LMDBRecordReader : public Reader {
size_t _shard_id = 0;
size_t _shard_count = 1; // equivalent of batch size
bool _last_rec;
//!< _batch_count Defines the quantum count of the images to be read. It's usually equal to the user's batch size.
/// The loader will repeat images if necessary to be able to have images available in multiples of the load_batch_count,
/// for instance if there are 10 images in the dataset and _batch_count is 3, the loader repeats 2 images as if there are 12 images available.
size_t _batch_count = 1;
size_t _file_id = 0;
size_t _batch_size = 1;
size_t _in_batch_read_count = 0;
bool _loop;
bool _shuffle;
int _read_counter = 0;
uint _file_byte_size;
void incremenet_read_ptr();
int release();
size_t get_file_shard_id();
//!< _file_count_all_shards total_number of files in to figure out the max_batch_size (usually needed for distributed training).
size_t _file_count_all_shards;
void incremenet_file_id() { _file_id++; }
void replicate_last_image_to_fill_last_shard();
void replicate_last_batch_to_pad_partial_shard();
void read_image(unsigned char* buff, std::string file_name);
void read_image_names();
std::map<std::string, uint> _image_record_starting;
@@ -116,4 +110,16 @@ class Caffe2LMDBRecordReader : public Reader {
MDB_txn* _read_mdb_txn;
MDB_cursor* _read_mdb_cursor;
void open_env_for_read_image();
int32_t _shard_size = -1;
ShardingInfo _sharding_info = ShardingInfo(); // The members of ShardingInfo determines how the data is distributed among the shards and how the last batch is processed by the pipeline.
std::vector<unsigned> _shard_start_idx_vector, _shard_end_idx_vector;
size_t _last_batch_padded_size = 0;
bool _stick_to_shard = false;
bool _pad_last_batch_repeated = false;
size_t actual_shard_size_without_padding(); // Number of files belonging to a shard (without padding)
size_t largest_shard_size_without_padding(); // Number of files belonging to a shard (with padding)
//!< Used to advance to the next shard's data to increase the entropy of the data seen by the pipeline>
void increment_shard_id();
void increment_curr_file_idx();
void compute_start_and_end_idx_of_all_shards(); // Start Idx of all the Shards
};
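
The `_batch_count` comment removed from this header describes padding by repetition: with 10 images and a batch size of 3, 2 images are repeated so that 12 are available. A minimal sketch of that arithmetic follows. The helper name `samples_to_pad` is hypothetical and is not a rocAL function, and the diff shown here does not reveal how `last_batch_padded_size()` is actually computed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a rocAL function): number of samples that must be
// repeated so that sample_count becomes a multiple of batch_size.
std::size_t samples_to_pad(std::size_t sample_count, std::size_t batch_size) {
    assert(batch_size > 0);
    std::size_t remainder = sample_count % batch_size;
    return remainder == 0 ? 0 : batch_size - remainder;
}
```

For the example in the removed comment, `samples_to_pad(10, 3)` evaluates to 2, and a dataset that is already a multiple of the batch size needs no padding.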
23 changes: 15 additions & 8 deletions rocAL/include/readers/image/caffe_lmdb_record_reader.h
@@ -68,6 +68,8 @@ class CaffeLMDBRecordReader : public Reader {

CaffeLMDBRecordReader();

size_t last_batch_padded_size() override; // The size of the number of samples padded in the last batch
Review comment (Contributor):

I think we should consider merging the caffe and caffe2 classes, or deriving caffe2 from caffe. I see lots of repeated member variables in both classes.


private:
//! opens the folder containnig the images
Reader::Status folder_reading();
@@ -85,10 +87,7 @@ class CaffeLMDBRecordReader : public Reader {
size_t _shard_id = 0;
size_t _shard_count = 1; // equivalent of batch size
bool _last_rec;
//!< _batch_count Defines the quantum count of the images to be read. It's usually equal to the user's batch size.
/// The loader will repeat images if necessary to be able to have images available in multiples of the load_batch_count,
/// for instance if there are 10 images in the dataset and _batch_count is 3, the loader repeats 2 images as if there are 12 images available.
size_t _batch_count = 1;
size_t _batch_size = 1;
size_t _file_id = 0;
size_t _in_batch_read_count = 0;
bool _loop;
@@ -102,17 +101,25 @@ class CaffeLMDBRecordReader : public Reader {
uint _file_byte_size;
void incremenet_read_ptr();
int release();
size_t get_file_shard_id();
//!< _file_count_all_shards total_number of files in to figure out the max_batch_size (usually needed for distributed training).
size_t _file_count_all_shards;
void incremenet_file_id() { _file_id++; }
void replicate_last_image_to_fill_last_shard();
void replicate_last_batch_to_pad_partial_shard();
void read_image(unsigned char* buff, std::string _file_name);
void read_image_names();
std::map<std::string, uint> _image_record_starting;
int _open_env = 1;
int rc;
void open_env_for_read_image();
std::shared_ptr<MetaDataReader> _meta_data_reader = nullptr;
int32_t _shard_size = -1;
std::vector<unsigned> _shard_start_idx_vector, _shard_end_idx_vector;
size_t _last_batch_padded_size = 0;
ShardingInfo _sharding_info = ShardingInfo(); // The members of ShardingInfo determines how the data is distributed among the shards and how the last batch is processed by the pipeline.
bool _stick_to_shard = false;
bool _pad_last_batch_repeated = false;
size_t actual_shard_size_without_padding(); // Number of files belonging to a shard (without padding)
size_t largest_shard_size_without_padding(); // Number of files belonging to a shard (with padding)
//!< Used to advance to the next shard's data to increase the entropy of the data seen by the pipeline>
void increment_shard_id();
void increment_curr_file_idx();
void compute_start_and_end_idx_of_all_shards(); // Start Idx of all the Shards
};
27 changes: 20 additions & 7 deletions rocAL/include/readers/image/cifar10_data_reader.h
@@ -64,6 +64,8 @@ class CIFAR10DataReader : public Reader {

unsigned get_file_index() { return _last_file_idx; }

size_t last_batch_padded_size() override; // The size of the number of samples padded in the last batch

private:
//! opens the folder containing the images
Reader::Status open_folder();
@@ -87,15 +89,26 @@ class CIFAR10DataReader : public Reader {
//!< _raw_file_size of each file to read
const size_t _raw_file_size = (32 * 32 * 3 + 1); // todo:: need to add an option in reader config to take this.
size_t _total_file_size;
//!< _batch_count Defines the quantum count of the images to be read. It's usually equal to the user's batch size.
/// The loader will repeat images if necessary to be able to have images available in multiples of the load_batch_count,
/// for instance if there are 10 images in the dataset and _batch_count is 3, the loader repeats 2 images as if there are 12 images available.
size_t _batch_count = 1;
size_t _file_id = 0;
size_t _in_batch_read_count = 0;
size_t _batch_size = 1;
bool _loop;
bool _shuffle;
int _read_counter = 0;
void incremenet_read_ptr();
int release();
void incremenet_file_id() { _file_id++; }
int32_t _shard_size = -1;
size_t _shard_id = 0;
size_t _shard_count = 1;
std::vector<unsigned> _shard_start_idx_vector, _shard_end_idx_vector;
//!< _file_count_all_shards total_number of files in to figure out the max_batch_size (usually needed for distributed training).
size_t _file_count_all_shards;
ShardingInfo _sharding_info = ShardingInfo(); // The members of ShardingInfo determines how the data is distributed among the shards and how the last batch is processed by the pipeline.
size_t _last_batch_padded_size = 0;
bool _stick_to_shard = false;
bool _pad_last_batch_repeated = false;
size_t actual_shard_size_without_padding(); // Number of files belonging to a shard (without padding)
size_t largest_shard_size_without_padding(); // Number of files belonging to a shard (with padding)
//!< Used to advance to the next shard's data to increase the entropy of the data seen by the pipeline>
void increment_shard_id();
void increment_curr_file_idx();
void compute_start_and_end_idx_of_all_shards();
};
27 changes: 16 additions & 11 deletions rocAL/include/readers/image/coco_file_source_reader.h
@@ -66,6 +66,8 @@ class COCOFileSourceReader : public Reader {

COCOFileSourceReader();

size_t last_batch_padded_size() override; // The size of the number of samples padded in the last batch

private:
std::shared_ptr<MetaDataReader> _meta_data_reader = nullptr;
//! opens the folder containnig the images
@@ -85,23 +87,26 @@ class COCOFileSourceReader : public Reader {
std::string _last_id;
std::string _last_file_name;
size_t _shard_id = 0;
size_t _shard_count = 1; // equivalent of batch size
//!< _batch_count Defines the quantum count of the images to be read. It's usually equal to the user's batch size.
/// The loader will repeat images if necessary to be able to have images available in multiples of the load_batch_count,
/// for instance if there are 10 images in the dataset and _batch_count is 3, the loader repeats 2 images as if there are 12 images available.
size_t _batch_count = 1;
size_t _file_id = 0;
size_t _in_batch_read_count = 0;
size_t _shard_count = 1;
size_t _batch_size = 1;
bool _loop;
bool _shuffle;
int _read_counter = 0;
//!< _file_count_all_shards total_number of files in to figure out the max_batch_size (usually needed for distributed training).
size_t _file_count_all_shards;
void incremenet_read_ptr();
int release();
size_t get_file_shard_id();
void incremenet_file_id() { _file_id++; }
void replicate_last_image_to_fill_last_shard();
void replicate_last_batch_to_pad_partial_shard();
void shuffle_with_aspect_ratios();
void increment_curr_file_idx();
ShardingInfo _sharding_info = ShardingInfo(); // The members of ShardingInfo determines how the data is distributed among the shards and how the last batch is processed by the pipeline.
size_t _last_batch_padded_size = 0;
bool _stick_to_shard = false;
bool _pad_last_batch_repeated = false;
int32_t _shard_size = -1;
std::vector<unsigned> _shard_start_idx_vector, _shard_end_idx_vector;
size_t actual_shard_size_without_padding(); // Number of files belonging to a shard (without padding)
size_t largest_shard_size_without_padding(); // Number of files belonging to a shard (with padding)
//!< Used to advance to the next shard's data to increase the entropy of the data seen by the pipeline>
void increment_shard_id();
void compute_start_and_end_idx_of_all_shards(); // Start Idx of all the Shards
};
27 changes: 17 additions & 10 deletions rocAL/include/readers/image/mxnet_recordio_reader.h
@@ -66,8 +66,10 @@ class MXNetRecordIOReader : public Reader {

MXNetRecordIOReader();

size_t last_batch_padded_size() override; // Number of samples padded in the last batch

private:
//! opens the folder containnig the images
//! opens the folder containing the images
Reader::Status record_reading();
Reader::Status MXNet_reader();
std::string _path;
@@ -83,11 +85,8 @@ class MXNetRecordIOReader : public Reader {
int64_t _last_seek_pos;
int64_t _last_data_size;
size_t _shard_id = 0;
size_t _shard_count = 1; // equivalent of batch size
//!< _batch_count Defines the quantum count of the images to be read. It's usually equal to the user's batch size.
/// The loader will repeat images if necessary to be able to have images available in multiples of the load_batch_count,
/// for instance if there are 10 images in the dataset and _batch_count is 3, the loader repeats 2 images as if there are 12 images available.
size_t _batch_count = 1;
size_t _shard_count = 1;
size_t _batch_size = 1;
size_t _file_id = 0;
size_t _in_batch_read_count = 0;
bool _loop;
@@ -97,10 +96,6 @@ class MXNetRecordIOReader : public Reader {
size_t _file_count_all_shards;
void incremenet_read_ptr();
int release();
size_t get_file_shard_id();
void incremenet_file_id() { _file_id++; }
void replicate_last_image_to_fill_last_shard();
void replicate_last_batch_to_pad_partial_shard();
void read_image(unsigned char* buff, int64_t seek_position, int64_t data_size);
void read_image_names();
uint32_t DecodeFlag(uint32_t rec) { return (rec >> 29U) & 7U; };
@@ -110,4 +105,16 @@ class MXNetRecordIOReader : public Reader {
const uint32_t _kMagic = 0xced7230a;
int64_t _seek_pos, _data_size_to_read;
ImageRecordIOHeader _hdr;
int32_t _shard_size = -1;
ShardingInfo _sharding_info = ShardingInfo(); // The members of ShardingInfo determine how the data is distributed among the shards and how the last batch is processed by the pipeline.
std::vector<unsigned> _shard_start_idx_vector, _shard_end_idx_vector;
bool _stick_to_shard = false;
bool _pad_last_batch_repeated = false;
size_t _last_batch_padded_size = 0;
size_t actual_shard_size_without_padding(); // Number of files belonging to a shard (without padding)
size_t largest_shard_size_without_padding(); // Number of files belonging to the largest shard (without padding)
//!< Used to advance to the next shard's data to increase the entropy of the data seen by the pipeline.
void increment_shard_id();
void increment_curr_file_idx();
void compute_start_and_end_idx_of_all_shards(); // Computes the start and end indices of all the shards
};
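The `_shard_start_idx_vector` and `_shard_end_idx_vector` members suggest that each shard is tracked as an index range into the list of records. A minimal sketch of one way such ranges could be derived is below. The `ShardRange` struct, the `shard_range` function, the half-open `[start, end)` convention, and the even-split rule are all assumptions made for illustration; the diff does not show how `compute_start_and_end_idx_of_all_shards()` is implemented.

```cpp
#include <cstddef>

// Hypothetical value type: half-open index range [start, end) owned by one shard.
struct ShardRange {
    std::size_t start;
    std::size_t end;
};

// ASSUMPTION: records are split as evenly as possible using integer division,
// so consecutive shards are contiguous and together cover [0, file_count).
constexpr ShardRange shard_range(std::size_t shard_id, std::size_t shard_count,
                                 std::size_t file_count) {
    return ShardRange{(shard_id * file_count) / shard_count,
                      ((shard_id + 1) * file_count) / shard_count};
}

// 10 records over 3 shards give the ranges [0,3), [3,6) and [6,10).
static_assert(shard_range(0, 3, 10).start == 0 && shard_range(0, 3, 10).end == 3, "shard 0");
static_assert(shard_range(1, 3, 10).start == 3 && shard_range(1, 3, 10).end == 6, "shard 1");
static_assert(shard_range(2, 3, 10).start == 6 && shard_range(2, 3, 10).end == 10, "shard 2");
```

Precomputing the ranges for every shard, rather than only the reader's own shard, would let a reader move to another shard's range without rescanning the dataset.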
26 changes: 16 additions & 10 deletions rocAL/include/readers/image/tf_record_reader.h
@@ -68,6 +68,8 @@ class TFRecordReader : public Reader {

TFRecordReader();

size_t last_batch_padded_size() override; // Number of samples padded in the last batch

private:
//! opens the folder containing the images
Reader::Status tf_record_reader();
@@ -81,7 +83,7 @@ class TFRecordReader : public Reader {
DIR *_sub_dir;
struct dirent *_entity;
std::vector<std::string> _file_names;
std::map<std::string, unsigned int> _file_size;
std::map<std::string, unsigned int> _file_size, _all_shard_file_sizes_padded;
unsigned _curr_file_idx;
unsigned _current_file_size;
std::string _last_id;
@@ -90,12 +92,8 @@ class TFRecordReader : public Reader {
size_t _shard_id = 0;
size_t _shard_count = 1; // number of shards the dataset is split into
bool _last_rec;
//!< _batch_count Defines the quantum count of the images to be read. It's usually equal to the user's batch size.
/// The loader will repeat images if necessary to be able to have images available in multiples of the load_batch_count,
/// for instance if there are 10 images in the dataset and _batch_count is 3, the loader repeats 2 images as if there are 12 images available.
size_t _batch_count = 1;
size_t _batch_size = 1;
size_t _file_id = 0;
size_t _in_batch_read_count = 0;
bool _loop;
bool _shuffle;
int _read_counter = 0;
@@ -108,11 +106,19 @@ class TFRecordReader : public Reader {
tensorflow::Feature _single_feature;
void incremenet_read_ptr();
int release();
size_t get_file_shard_id();
void incremenet_file_id() { _file_id++; }
void replicate_last_image_to_fill_last_shard();
void replicate_last_batch_to_pad_partial_shard();
Reader::Status read_image(unsigned char *buff, std::string record_file_name, uint file_size);
Reader::Status read_image_names(std::ifstream &file_contents, uint file_size);
std::map<std::string, uint> _image_record_starting;
ShardingInfo _sharding_info = ShardingInfo(); // The members of ShardingInfo determine how the data is distributed among the shards and how the last batch is processed by the pipeline.
size_t _last_batch_padded_size = 0;
bool _stick_to_shard = false;
bool _pad_last_batch_repeated = false;
int32_t _shard_size = -1;
std::vector<unsigned> _shard_start_idx_vector, _shard_end_idx_vector;
void increment_curr_file_idx();
size_t actual_shard_size_without_padding(); // Number of files belonging to a shard (without padding)
size_t largest_shard_size_without_padding(); // Number of files belonging to the largest shard (without padding)
//!< Used to advance to the next shard's data to increase the entropy of the data seen by the pipeline.
void compute_start_and_end_idx_of_all_shards(); // Computes the start and end indices of all the shards
void increment_shard_id();
};
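The comment on `increment_shard_id()` says it advances to the next shard's data, and the header also adds a `_stick_to_shard` flag. One plausible reading is that the reader rotates through shards between passes unless it is told to stay on its own shard. The sketch below shows that reading only; `next_shard_id`, its parameters, and the wrap-around rule are hypothetical and are not taken from this diff.

```cpp
#include <cstddef>

// ASSUMPTION: when stick_to_shard is false, the reader moves to the following shard
// after finishing a pass and wraps around after the last one; when it is true, the
// shard id never changes. Requires shard_count >= 1.
constexpr std::size_t next_shard_id(std::size_t shard_id, std::size_t shard_count,
                                    bool stick_to_shard) {
    return stick_to_shard ? shard_id : (shard_id + 1) % shard_count;
}

// With 4 shards, shard 3 wraps around to shard 0 unless it sticks to its shard.
static_assert(next_shard_id(3, 4, false) == 0, "wraps to the first shard");
static_assert(next_shard_id(1, 4, false) == 2, "advances by one");
static_assert(next_shard_id(3, 4, true) == 3, "stays on the same shard");
```

If that reading is right, rotating means each pipeline instance sees a different slice of the dataset on each pass, which matches the stated goal of increasing the entropy of the data it sees.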
5 changes: 3 additions & 2 deletions rocAL/source/readers/file_source_reader.cpp
@@ -260,9 +260,10 @@ Reader::Status FileSourceReader::generate_file_names() {
ERR("FileReader ShardID [" + TOSTR(_shard_id) + "] Did not load any file from " + _folder_path)

auto dataset_size = _file_count_all_shards;
size_t padded_samples = 0;
// Pad the _file_names with last element of the shard in the vector when _pad_last_batch_repeated is True
_padded_samples = ((_shard_size > 0) ? _shard_size : largest_shard_size_without_padding()) % _batch_size;
_last_batch_padded_size = ((_batch_size > 1) && (_padded_samples > 0 )) ? (_batch_size - _padded_samples) : 0;
padded_samples = ((_shard_size > 0) ? _shard_size : largest_shard_size_without_padding()) % _batch_size;
_last_batch_padded_size = ((_batch_size > 1) && (padded_samples > 0)) ? (_batch_size - padded_samples) : 0;

if (_pad_last_batch_repeated == true) {
// pad the last sample when the dataset_size is not divisible by
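The padding arithmetic in the `generate_file_names()` hunk can be checked in isolation. The sketch below restates the two expressions for `padded_samples` and `_last_batch_padded_size` as a hypothetical free function. The name `last_batch_padded_size` with explicit `shard_size` and `batch_size` parameters is an assumption for illustration, not the reader's member function, and the sketch assumes `batch_size >= 1`.

```cpp
#include <cstddef>

// Hypothetical free function mirroring the expressions in the hunk:
//   padded_samples          = shard_size % batch_size
//   _last_batch_padded_size = (batch_size > 1 && padded_samples > 0)
//                                 ? batch_size - padded_samples : 0
// ASSUMPTION: batch_size >= 1, otherwise the modulo is undefined.
constexpr std::size_t last_batch_padded_size(std::size_t shard_size, std::size_t batch_size) {
    // Samples left over after filling as many whole batches as possible.
    const std::size_t remainder = shard_size % batch_size;
    // Padding is needed only when batching is active and a partial batch exists.
    return (batch_size > 1 && remainder > 0) ? (batch_size - remainder) : 0;
}

// 10 samples with batch size 3 leave a partial batch of 1, so 2 samples are padded (12 total).
static_assert(last_batch_padded_size(10, 3) == 2, "two padded samples");
// A shard that divides evenly needs no padding.
static_assert(last_batch_padded_size(12, 3) == 0, "no padding when divisible");
// Batch size 1 never produces a partial batch.
static_assert(last_batch_padded_size(7, 1) == 0, "no padding for batch size 1");
```

The first case matches the example in the older `_batch_count` comment, where 10 images read 3 at a time are treated as 12.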