More efficient video processing #170

Merged: 3 commits merged into main on Jul 10, 2023

Conversation

@ejolly (Contributor) commented on Jun 22, 2023

This is an attempt to dramatically lower the memory usage of `.detect_video()`. This PR touches #139 and #165.

Previously we relied on torch's `read_video`, which always reads all video frames into memory at once. Our `skip_frames` argument was then used to slice the frame data during processing to speed things up. This approach is reasonable for short videos, lower framerates, or smaller resolutions, but incredibly inefficient for larger files.
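For reference, here's a minimal sketch of that eager approach; the function name and exact plumbing are illustrative, not the actual py-feat internals:

```python
from torchvision.io import read_video

def load_all_frames(video_path, skip_frames=1):
    # frames: (num_frames, height, width, channels) uint8 tensor, fully
    # materialized in RAM before any slicing happens
    frames, _audio, info = read_video(video_path, pts_unit="sec")
    # skip_frames only reduces downstream processing work, not peak memory
    return frames[::skip_frames], info
```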

We had hoped torch would raise an informative error when this process ran out of memory, but multiple user reports (which we reproduced locally) show that the kernel just crashes, hangs, or even freezes the whole computer... hardly a graceful failure.

While torch has a `VideoReader` class, it's currently in beta and using it requires compiling torch from source 🙃.

I first tried storing video frame counts and making repeated calls to `read_video` on a per-frame basis, but getting the `pts` and timing right was non-trivial.

So, to avoid adding another hard-to-install dependency like OpenCV, this PR instead lazy-loads video frames by wrapping and slicing a PyAV generator object; PyAV is the library that torch's `read_video` uses under the hood.
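A minimal sketch of the trick, assuming PyAV is importable as `av` (it's already a torchvision dependency); `frame_generator` and `get_frame` are hypothetical names for illustration, not the PR's actual helpers:

```python
import itertools
import av

def frame_generator(video_path):
    # Decode frames one at a time; nothing is materialized up front
    container = av.open(video_path)
    for frame in container.decode(video=0):
        yield frame.to_ndarray(format="rgb24")  # (H, W, 3) uint8 array

def get_frame(video_path, idx):
    # "Slicing" the generator: decode up to frame idx and keep only that
    # one, so peak memory is a single frame rather than the whole video
    gen = frame_generator(video_path)
    return next(itertools.islice(gen, idx, idx + 1))
```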

I've verified that even extremely long, high-resolution videos work with `.detect_video()`, and that the approach still works with batching, since we're still wrapping our `VideoDataset` in a torch `DataLoader`.
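As a rough illustration of why batching still composes with lazy loading, here's a hedged, self-contained sketch (the real `VideoDataset` lives in this PR; the class name and file name below are placeholders):

```python
import itertools
import av
import torch
from torch.utils.data import Dataset, DataLoader

class LazyVideoDataset(Dataset):
    """Toy stand-in for the PR's VideoDataset: one frame decoded per item."""

    def __init__(self, video_path, num_frames, skip_frames=1):
        self.video_path = video_path
        self.frame_idx = list(range(0, num_frames, skip_frames))

    def __len__(self):
        return len(self.frame_idx)

    def __getitem__(self, i):
        # A fresh generator per call: decode only up to the requested frame
        container = av.open(self.video_path)
        frames = (f.to_ndarray(format="rgb24") for f in container.decode(video=0))
        target = self.frame_idx[i]
        arr = next(itertools.islice(frames, target, target + 1))
        return torch.from_numpy(arr)

# Batches are assembled frame by frame, so peak memory stays at roughly
# batch_size frames instead of the whole video
loader = DataLoader(LazyVideoDataset("clip.mp4", num_frames=3000), batch_size=8)
```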

@ljchang @TiankangXie If you could test this branch on GPU machines or other platforms to see whether we're incurring any unexpected additional overhead, that would be super helpful!

@ejolly (Contributor, Author) commented on Jun 29, 2023

@ljchang This PR now includes approximate times for each processed frame as a column in the Fex output. Also, I realized that parallelization is possible on top of batching because `DataLoader` takes a `num_workers` argument. For this reason I'm keeping the current strategy of creating a new generator for each call to `load_frame` in `VideoDataset`, rather than using a single shared generator, to avoid race conditions (sketched below).
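A hedged sketch of the parallel decode, reusing the toy `LazyVideoDataset` from the sketch above: because each `__getitem__` call opens its own container, worker processes never share decoder state.

```python
from torch.utils.data import DataLoader

# num_workers spawns separate worker processes, each with its own copy of
# the dataset; a fresh generator per call means no shared decoder to race on
loader = DataLoader(
    LazyVideoDataset("clip.mp4", num_frames=3000),
    batch_size=8,
    num_workers=4,
)
for batch in loader:
    pass  # batch shape: (8, H, W, 3), frames decoded in parallel workers
```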

@ejolly merged commit 3973bf1 into main on Jul 10, 2023
1 of 10 checks passed