Add map argument to audb.load_table() #447
semantics of "map"
https://github.com/audeering/audb/pull/447/files#top
The map argument in audb has confused me before: In pd.rename, the mapper (or the columns argument) accepts the union of dict-like and function, whereas pd.map or pd.DataFrame.apply only accepts a callable, i.e. a function or a lambda. And audb only accepts a dict.
In other words, audb uses map to designate a hashmap (aka dict) as a data structure, and not a higher-order function as in functional programming, where the function is applied to each element of a collection.
So there seems to be a slight mismatch between audb on one side, and pandas (and also the built-in Python map) on the other. Would it make sense to mention this terminology in the documentation at some level? Ultimately this behavior is implemented in audformat.Column.get, so that would probably be the right location for it. This is not a must, but while reviewing this PR I was surprised by the signature of the load_table arg and the terminology of map.
In the docstring you describe the argument as dict_mapping. In a world in which API breaking were not considered evil, I would find that a better argument name.
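To make the two senses of "map" concrete, here is a minimal sketch in plain Python. The speaker IDs and ages are made-up illustration data, and the lookup only mimics the dict-style meaning; it is not how audb or audformat implement the argument.

```python
# Hypothetical data for illustration only
speaker_ages = {3: 31, 8: 34, 9: 21}  # "map" as a data structure (dict)
speakers = [3, 3, 8, 9]

# Sense 1: map as a higher-order function (built-in map),
# a callable is applied to each element of a collection
doubled = list(map(lambda s: s * 2, speakers))

# Sense 2: map as a dict, each element is looked up in a table
ages = [speaker_ages[s] for s in speakers]

# The two senses meet when the dict's lookup method is passed to map()
ages_via_map = list(map(speaker_ages.get, speakers))

print(doubled)
print(ages)
print(ages_via_map)
```

The last line shows why the naming is easy to confuse: a dict can always be turned into a callable, but not every callable can be expressed as a dict.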
test_load_table_map
Interestingly, this led me to some confusing patterns when testing:
Running a test from the commandline succeeded:
pytest --color=yes 'tests/test_load.py::test_load_table_map'
But when I run the same test through my editor plugin I get this:
FAIL Required test coverage of 100% not reached. Total coverage: 0.00%
======================================= short test summary info ========================================
FAILED tests/test_load.py::test_load_table_map[1.0.0-files-None-expected0] - TypeError: load_table() got an unexpected keyword argument 'map'
FAILED tests/test_load.py::test_load_table_map[1.0.0-files-map1-expected1] - TypeError: load_table() got an unexpected keyword argument 'map'
========================================== 2 failed in 1.12s ===========================================
Normally they converge quite well. I have no explanation on that side.
semantics of "map"
Regarding your question about the allowed type and name of the map argument:
>>> db = audb.load("emodb", version="1.4.1", only_metadata=True, full_path=False, verbose=False)
>>> db.schemes["speaker"].labels
'speaker'
>>> db["speaker"].df.head()
age gender language
speaker
3 31 male deu
8 34 female deu
9 21 female deu
10 32 male deu
11 26 male deu
If you use get() on a column:
>>> db["files"]["speaker"].get(map="age").head()
file
wav/03a01Fa.wav 31
wav/03a01Nc.wav 31
wav/03a01Wa.wav 31
wav/03a02Fc.wav 31
wav/03a02Nc.wav 31
Name: age, dtype: Int64
When you use get() on a table:
>>> db["files"].get(map={"speaker": "age"}).head()
duration transcription age
file
wav/03a01Fa.wav 0 days 00:00:01.898250 a01 31
wav/03a01Nc.wav 0 days 00:00:01.611250 a01 31
wav/03a01Wa.wav 0 days 00:00:01.877812500 a01 31
wav/03a02Fc.wav 0 days 00:00:02.006250 a02 31
wav/03a02Nc.wav 0 days 00:00:01.439812500 a02 31
>>> db["files"].get(map={"speaker": ["age", "gender"]}).head()
duration transcription age gender
file
wav/03a01Fa.wav 0 days 00:00:01.898250 a01 31 male
wav/03a01Nc.wav 0 days 00:00:01.611250 a01 31 male
wav/03a01Wa.wav 0 days 00:00:01.877812500 a01 31 male
wav/03a02Fc.wav 0 days 00:00:02.006250 a02 31 male
wav/03a02Nc.wav 0 days 00:00:01.439812500 a02 31 male
>>> db["files"].get(map={"speaker": ["speaker", "age"]}).head()
duration speaker transcription age
file
wav/03a01Fa.wav 0 days 00:00:01.898250 3 a01 31
wav/03a01Nc.wav 0 days 00:00:01.611250 3 a01 31
wav/03a01Wa.wav 0 days 00:00:01.877812500 3 a01 31
wav/03a02Fc.wav 0 days 00:00:02.006250 3 a02 31
wav/03a02Nc.wav 0 days 00:00:01.439812500 3 a02 31
We thought it would be nice to have the same name for the argument in
You are right, this is indeed not ideal. I'm still not in favor of changing the argument name, but maybe extending the docstring would help.
It was a different problem: putting the cursor in the test parametrization selects the previous test. So it is an editor problem.
All points have been addressed. The meaning of the "map" kwarg departs a little from how it is used elsewhere, but this can be mended in a docstring at a later stage.
Adds the map argument to audb.load_table() to let the user map values of columns if they contain respective scheme labels. Without the newly added argument, a user would have to use db = audb.load() + db.get(map=) instead.
It also fixes a bug inside audb.load_table() so that it only loads misc tables that are needed by a scheme of the selected table. Before, it was downloading all misc tables that were used as labels inside any scheme of the database.
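The bug fix described above can be sketched roughly as follows. This is a simplified illustration with plain dictionaries, not the actual audb code: the helper required_misc_tables, the layout of columns and schemes, and the "other" table with its "other-labels" misc table are all hypothetical.

```python
def required_misc_tables(table, columns, schemes):
    """Return the misc tables needed by schemes of the columns of ``table``.

    columns: table ID -> {column ID -> scheme ID or None}   (hypothetical layout)
    schemes: scheme ID -> misc table ID used as labels, or None
    """
    needed = []
    for scheme_id in columns[table].values():
        if scheme_id is None:
            continue  # column without a scheme
        misc_table = schemes.get(scheme_id)
        if misc_table is not None and misc_table not in needed:
            needed.append(misc_table)
    return needed


# Made-up database layout for illustration
columns = {
    "files": {"duration": None, "speaker": "speaker", "transcription": "transcription"},
    "other": {"label": "other-scheme"},
}
schemes = {
    "speaker": "speaker",            # labels stored in misc table "speaker"
    "transcription": None,           # labels not stored in a misc table
    "other-scheme": "other-labels",  # labels stored in misc table "other-labels"
}

# Behavior before the fix: every misc table used by any scheme
all_misc = sorted({m for m in schemes.values() if m is not None})

# Behavior after the fix: only what the selected table needs
files_misc = required_misc_tables("files", columns, schemes)

print(all_misc)
print(files_misc)
```

In this toy layout, loading the "files" table only needs the "speaker" misc table, whereas collecting labels from all schemes would also pull in "other-labels".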