(mypy) C:\Users\Nawaz-Server\Documents\ml>python myest.py
[C:\vcpkg\buildtrees\mpg123\src\0d8db63f9b-3db975bc05.clean\src\libmpg123\layer3.c:INT123_do_layer3():1801] error: dequantization failed!
{'audio': {'path': 'C:\\Users\\Nawaz-Server\\.cache\\huggingface\\hub\\datasets--fawazahmed0--bug-audio\\snapshots\\fab1398431fed1c0a2a7bff0945465bab8b5daef\\data\\Ghamadi\\037135.mp3', 'array': array([ 0.00000000e+00, -2.86519935e-22, -2.56504911e-21, ...,
-1.94239747e-02, -2.42924765e-02, -2.99104657e-02]), 'sampling_rate': 22050}, 'reciter': 'Ghamadi', 'transcription': 'الا عجوز ا في الغبرين', 'line': 3923, 'chapter': 37, 'verse': 135, 'text': 'إِلَّا عَجُوزࣰ ا فِي ٱلۡغَٰبِرِينَ'}
Traceback (most recent call last):
File "C:\Users\Nawaz-Server\Documents\ml\myest.py", line 5, in <module>
for data in dataset["train"]:
~~~~~~~^^^^^^^^^
File "C:\Users\Nawaz-Server\.conda\envs\mypy\Lib\site-packages\datasets\arrow_dataset.py", line 2372, in __iter__
formatted_output = format_table(
^^^^^^^^^^^^^
File "C:\Users\Nawaz-Server\.conda\envs\mypy\Lib\site-packages\datasets\formatting\formatting.py", line 639, in format_table
return formatter(pa_table, query_type=query_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nawaz-Server\.conda\envs\mypy\Lib\site-packages\datasets\formatting\formatting.py", line 403, in __call__
return self.format_row(pa_table)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nawaz-Server\.conda\envs\mypy\Lib\site-packages\datasets\formatting\formatting.py", line 444, in format_row
row = self.python_features_decoder.decode_row(row)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nawaz-Server\.conda\envs\mypy\Lib\site-packages\datasets\formatting\formatting.py", line 222, in decode_row
return self.features.decode_example(row) if self.features else row
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nawaz-Server\.conda\envs\mypy\Lib\site-packages\datasets\features\features.py", line 2042, in decode_example
column_name: decode_nested_example(feature, value, token_per_repo_id=token_per_repo_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nawaz-Server\.conda\envs\mypy\Lib\site-packages\datasets\features\features.py", line 1403, in decode_nested_example
return schema.decode_example(obj, token_per_repo_id=token_per_repo_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nawaz-Server\.conda\envs\mypy\Lib\site-packages\datasets\features\audio.py", line 184, in decode_example
array, sampling_rate = sf.read(f)
^^^^^^^^^^
File "C:\Users\Nawaz-Server\.conda\envs\mypy\Lib\site-packages\soundfile.py", line 285, in read
with SoundFile(file, 'r', samplerate, channels,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nawaz-Server\.conda\envs\mypy\Lib\site-packages\soundfile.py", line 658, in __init__
self._file = self._open(file, mode_int, closefd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nawaz-Server\.conda\envs\mypy\Lib\site-packages\soundfile.py", line 1216, in _open
raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening <_io.BufferedReader name='C:\\Users\\Nawaz-Server\\.cache\\huggingface\\hub\\datasets--fawazahmed0--bug-audio\\snapshots\\fab1398431fed1c0a2a7bff0945465bab8b5daef\\data\\Ghamadi\\037136.mp3'>: Format not recognised.
Expected behavior
Iterating over the dataset should succeed without errors, since loading the problematic audio file directly with the soundfile package works.
code:
import soundfile as sf
print(sf.read('C:\\Users\\Nawaz-Server\\.cache\\huggingface\\hub\\datasets--fawazahmed0--bug-audio\\snapshots\\fab1398431fed1c0a2a7bff0945465bab8b5daef\\data\\Ghamadi\\037136.mp3'))
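One difference worth checking: the direct call above passes a path string, while the traceback shows datasets calling sf.read(f) on an open file object (an _io.BufferedReader). The sketch below tries both ways on the same file and reports what happens. The helper name compare_open_modes is hypothetical, and this is only a diagnostic aid, not a confirmed explanation of the bug.

```python
from typing import Any, Callable, Dict


def compare_open_modes(path: str, read: Callable[[Any], Any]) -> Dict[str, str]:
    """Call `read` once with a path string and once with an open binary file.

    Returns a dict mapping each mode to "ok" or to the error that was raised.
    `compare_open_modes` is a hypothetical helper, not part of any library.
    """
    results: Dict[str, str] = {}

    # Mode 1: pass the path string, as in the direct soundfile call.
    try:
        read(path)
        results["path"] = "ok"
    except Exception as exc:  # broad on purpose: we only report the failure
        results["path"] = f"{type(exc).__name__}: {exc}"

    # Mode 2: pass an open file object, as the traceback shows datasets doing.
    try:
        with open(path, "rb") as f:
            read(f)
        results["file_object"] = "ok"
    except Exception as exc:
        results["file_object"] = f"{type(exc).__name__}: {exc}"

    return results


# Intended usage (requires the soundfile package):
#   import soundfile as sf
#   print(compare_open_modes(mp3_path, sf.read))
# where mp3_path is the cached 037136.mp3 path from the call above.
```

If the two modes disagree, that would suggest the failure depends on how the file is handed to libsndfile rather than on the file alone.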
Describe the bug
Accessing a value of an audio dataset raises a "Format not recognised" error (soundfile.LibsndfileError).
Steps to reproduce the bug
code:
output:
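The reproduction script is not shown here; the traceback only reveals that line 5 of myest.py is a loop over dataset["train"]. As a rough sketch under that assumption, the helper below walks a split row by row and returns the index of the first row that fails to decode. The name first_failing_row is hypothetical, and the repo id in the usage comment is inferred from the cache path in the traceback.

```python
from typing import Any, Optional, Sequence, Tuple


def first_failing_row(split: Sequence[Any]) -> Optional[Tuple[int, str]]:
    """Return (index, error text) for the first row whose access raises.

    Returns None if every row can be accessed. Works with anything that
    supports len() and integer indexing.
    """
    for index in range(len(split)):
        try:
            split[index]  # accessing a row is what triggers decoding
        except Exception as exc:  # broad on purpose: we only report it
            return index, f"{type(exc).__name__}: {exc}"
    return None


# Intended usage (requires the datasets package and network access):
#   from datasets import load_dataset
#   dataset = load_dataset("fawazahmed0/bug-audio")  # repo id is an assumption
#   print(first_failing_row(dataset["train"]))
```

Indexing inside a try block, rather than using a plain for loop, keeps the failing index available when the exception is raised.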
Environment info
datasets version: 3.0.2
huggingface_hub version: 0.26.2
fsspec version: 2024.10.0