How to use a single pipeline function to decode a video file (ex: .mp4) into video AND audio tensors #5597

Open
zade-twelvelabs opened this issue Aug 5, 2024 · 1 comment
Labels
question Further information is requested

Comments

@zade-twelvelabs

Describe the question.

  1. fn.readers.video must be used to read and decode video files
  2. fn.readers.file must be used to decode audio files, but does not accept video formats

So if I can't use fn.readers.file to read a video's audio, and fn.readers.video does not decode a video's audio, how do I decode an .mp4 file's audio?

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report
@JanuszL
Contributor

JanuszL commented Aug 5, 2024

Hi @zade-twelvelabs,

Thank you for reaching out. Currently, DALI doesn't support decoding audio from mp4 files. The current audio decoding capabilities (and the flow) are described here.
What you can do is use the external source operator and utilize FFmpeg to load and decode audio from mp4 containers. As audio decoding is not GPU accelerated in DALI, there shouldn't be a substantial perf overhead due to this.
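The suggestion above could look roughly like the following. This is only a minimal sketch, not an official DALI recipe: it assumes an `ffmpeg` binary is available on PATH, and the helper names (`ffmpeg_audio_cmd`, `decode_audio`, `build_audio_pipeline`) as well as the 16 kHz mono defaults are made up for illustration. It covers only the audio side; pairing each audio sample with the frames returned by fn.readers.video is left out.

```python
import subprocess


def ffmpeg_audio_cmd(path, sample_rate=16000, channels=1):
    """Build an FFmpeg command that writes raw float32 PCM to stdout."""
    return [
        "ffmpeg", "-nostdin", "-v", "error",
        "-i", path,
        "-vn",                    # ignore the video stream
        "-f", "f32le",            # raw little-endian float32 output
        "-acodec", "pcm_f32le",
        "-ac", str(channels),     # number of output channels
        "-ar", str(sample_rate),  # output sample rate in Hz
        "pipe:1",                 # write to stdout
    ]


def decode_audio(path, sample_rate=16000, channels=1):
    """Decode the audio track of `path` into a (samples, channels) array."""
    import numpy as np  # imported lazily; only needed when decoding

    raw = subprocess.run(
        ffmpeg_audio_cmd(path, sample_rate, channels),
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    # frombuffer yields a read-only view, so copy to get a writable array.
    return np.frombuffer(raw, dtype="<f4").reshape(-1, channels).copy()


def build_audio_pipeline(files, batch_size=2):
    """Hypothetical CPU-only DALI pipeline fed by FFmpeg-decoded audio."""
    from nvidia.dali import fn, pipeline_def  # requires DALI to be installed

    def audio_source(sample_info):
        # Called once per sample; wrap around the file list at epoch end.
        return decode_audio(files[sample_info.idx_in_epoch % len(files)])

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=None)
    def audio_pipeline():
        # batch=False: the callable returns one sample at a time.
        return fn.external_source(source=audio_source, batch=False)

    pipe = audio_pipeline()
    pipe.build()
    return pipe
```

With DALI and FFmpeg installed, `build_audio_pipeline(["a.mp4", "b.mp4"]).run()` should return a batch of variable-length audio tensors, which can then be processed with the usual DALI audio operators.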
