Enhance output of audformat.Database.description #59

hagenw · 2021-04-06T10:02:26Z

At the moment we get the following:

>>> import audb
>>> db = audb.load('emodb', version='1.1.0')
>>> db.description
'Berlin Database of Emotional Speech. A German database of emotional utterances spoken by actors recorded as a part of the DFG funded research project SE462/3-1 in 1997 and 1999. Recordings took place in the anechoic chamber of the Technical University Berlin, department of Technical Acoustics. It contains about 500 utterances from ten different actors expressing basic six emotions and neutral.'

which is not very nice to read.
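As a minimal sketch of what a more readable output could look like, the description can be wrapped to a fixed width with the standard library's `textwrap` module. The width of 79 and the variable names are assumptions for illustration only; nothing here is part of audformat.

```python
import textwrap

# The emodb description as shown above, as one long string.
description = (
    "Berlin Database of Emotional Speech. A German database of emotional "
    "utterances spoken by actors recorded as a part of the DFG funded "
    "research project SE462/3-1 in 1997 and 1999. Recordings took place in "
    "the anechoic chamber of the Technical University Berlin, department of "
    "Technical Acoustics. It contains about 500 utterances from ten "
    "different actors expressing basic six emotions and neutral."
)

# Wrap at 79 characters (an assumed width); do not break at hyphens
# so that the original text can be recovered by joining the lines.
wrapped = textwrap.fill(description, width=79, break_on_hyphens=False)
print(wrapped)
```

This only changes how the string is displayed; the stored description stays untouched.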

It gets even worse if you have some real formatting in the description string.
For example, for audioset the description contains:

AudioSet ontology categories of the two top hierarchies:

Human sounds            Animal                   Music
|-Human voice           |-Domestic animals, pets |-Musical instrument
|-Whistling             |-Livestock, farm        |-Music genre
|-Respiratory sounds    | animals, working       |-Musical concepts
|-Human locomotion      | animals                |-Music role
|-Digestive             \-Wild animals           \-Music mood
|-Hands
|-Heart sounds,         Sounds of things         Natural sounds
| heartbeat             |-Vehicle                |-Wind
|-Otoacoustic emission  |-Engine                 |-Thunderstorm
\-Human group actions   |-Domestic sounds,       |-Water
                        | home sounds            \-Fire
Source-ambiguous sounds |-Bell
|-Generic impact sounds |-Alarm                  Channel, environment
|-Surface contact       |-Mechanisms             and background
|-Deformable shell      |-Tools                  |-Acoustic environment
|-Onomatopoeia          |-Explosion              |-Noise
|-Silence               |-Wood                   \-Sound reproduction
\-Other sourceless      |-Glass
                        |-Liquid
                        |-Miscellaneous sources
                        \-Specific impact sounds

It would be nice if we could preserve this layout when printing to screen.
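One way to keep such a layout while still wrapping ordinary text could be sketched as follows. The helper `format_description` is hypothetical (it is not part of audformat), and the sketch assumes that preformatted blocks are stored with `\n` line breaks and are separated from normal paragraphs by blank lines.

```python
import textwrap


def format_description(text: str, width: int = 79) -> str:
    """Wrap single-line paragraphs, keep multi-line blocks verbatim.

    Hypothetical helper: paragraphs are assumed to be separated
    by blank lines.
    """
    formatted = []
    for block in text.split("\n\n"):
        if "\n" in block:
            # Block has its own line breaks, e.g. an ASCII tree: keep it.
            formatted.append(block)
        else:
            # Plain paragraph on a single line: wrap it to the given width.
            formatted.append(
                textwrap.fill(block, width=width, break_on_hyphens=False)
            )
    return "\n\n".join(formatted)


tree = (
    "Human sounds            Animal\n"
    "|-Human voice           |-Domestic animals, pets\n"
    "\\-Digestive             \\-Wild animals"
)
text = (
    "AudioSet ontology categories of the two top hierarchies:\n\n"
    + tree
    + "\n\n"
    + " ".join(["A long paragraph without any line breaks."] * 5)
)
print(format_description(text))
```

The design choice here is to treat any block that already contains line breaks as intentional formatting, so only single-line paragraphs are reflowed.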

hagenw · 2022-02-25T07:27:59Z

It's not only the description that is affected:

>>> db = audb.load('iemocap', version='2.2.0', only_metadata=True)
>>> db.schemes["dialog.act"]
description: Dialogue act annotations.Released by https://github.com/sahatulika15/EMOTyDA.Please
  cite the respective paper whenever using it.
dtype: str
labels: {g: greeting, q: question, ans: answer, o: statement-opinion, s: statement-non-opinion,
  ap: apology, ag: agreement, dag: disagreement, a: acknowledgement, b: backchanneling,
  c: command, oth: other}
bibtex: "\n                @inproceedings{saha-etal-2020-towards,\n              \
  \      title = \"Towards Emotion-aided Multi-modal Dialogue Act Classification\"\
  ,\n                    author = \"Saha, Tulika  and\n                    Patra,\
  \ Aditya  and\n                    Saha, Sriparna  and\n                    Bhattacharyya,\
  \ Pushpak\",\n                    booktitle = \"Proceedings of the 58th Annual Meeting\
  \ of the Association for Computational Linguistics\",\n                    month\
  \ = jul,\n                    year = \"2020\",\n                    address = \"\
  Online\",\n                    publisher = \"Association for Computational Linguistics\"\
  ,\n                    url = \"https://www.aclweb.org/anthology/2020.acl-main.402\"\
  ,\n                    doi = \"10.18653/v1/2020.acl-main.402\",\n              \
  \      pages = \"4361--4372\",\n                }\n            "

frankenjoe · 2022-02-28T08:45:15Z

yaml offers the option to modify the way scalars are presented, e.g.:

import audformat


bibtex = '@inproceedings{saha-etal-2020-towards,\n\
    title = "Towards Emotion-aided Multi-modal Dialogue Act Classification"\n\
    author = "Saha, Tulika and\n\
        Patra, Aditya and\n\
        Saha, Sriparna and\n\
        Bhattacharyya, Pushpak"\n\
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",\n\
    month = jul,\n\
    year = "2020",\n\
    address = "Online",\n\
    publisher = "Association for Computational Linguistics"\n\
    url = "https://www.aclweb.org/anthology/2020.acl-main.402"\n\
    doi = "10.18653/v1/2020.acl-main.402",\n\
    pages = "4361--4372"'

db = audformat.testing.create_db(minimal=True)
db.schemes['scheme'] = audformat.Scheme(meta={'bibtex': bibtex})
print(db.schemes['scheme'])

{dtype: str, bibtex: "@inproceedings{saha-etal-2020-towards,\n    title = \"Towards\
    \ Emotion-aided Multi-modal Dialogue Act Classification\"\n    author = \"Saha,\
    \ Tulika and\n        Patra, Aditya and\n        Saha, Sriparna and\n        Bhattacharyya,\
    \ Pushpak\"\n    booktitle = \"Proceedings of the 58th Annual Meeting of the Association\
    \ for Computational Linguistics\",\n    month = jul,\n    year = \"2020\",\n \
    \   address = \"Online\",\n    publisher = \"Association for Computational Linguistics\"\
    \n    url = \"https://www.aclweb.org/anthology/2020.acl-main.402\"\n    doi =\
    \ \"10.18653/v1/2020.acl-main.402\",\n    pages = \"4361--4372\""}
dtype: str

can be prettified to:

import yaml


def str_presenter(dumper, data):
    # Use the literal block style ('|') for multi-line strings
    if len(data.splitlines()) > 1:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)


yaml.add_representer(str, str_presenter)
print(db.schemes['scheme'])

bibtex: |-
  @inproceedings{saha-etal-2020-towards,
      title = "Towards Emotion-aided Multi-modal Dialogue Act Classification"
      author = "Saha, Tulika and
          Patra, Aditya and
          Saha, Sriparna and
          Bhattacharyya, Pushpak"
      booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
      month = jul,
      year = "2020",
      address = "Online",
      publisher = "Association for Computational Linguistics"
      url = "https://www.aclweb.org/anthology/2020.acl-main.402"
      doi = "10.18653/v1/2020.acl-main.402",
      pages = "4361--4372"

For some reason it did not work with the iemocap example from the previous comment, though. The formatting of that string also looks a bit odd.
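One likely cause, offered as an assumption rather than a confirmed diagnosis: PyYAML does not emit block style for a string that ends in whitespace, and falls back to a double-quoted scalar instead. The iemocap `bibtex` entry ends with `\n` followed by spaces. The sketch below uses a shortened, made-up BibTeX string and a `PrettyDumper` subclass (both hypothetical, chosen to avoid changing the global dumper) to show the effect of cleaning the string with `textwrap.dedent()` and `strip()` before dumping.

```python
import textwrap

import yaml


def str_presenter(dumper, data):
    # Request literal block style ('|') for multi-line strings
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


class PrettyDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass so the representer stays local."""


PrettyDumper.add_representer(str, str_presenter)

# Shortened stand-in for the iemocap entry: leading newline, common
# indentation, and trailing spaces after the last line break.
raw = '\n    @inproceedings{key,\n        title = "A",\n    }\n    '

# Remove the common indentation and the surrounding whitespace.
clean = textwrap.dedent(raw).strip()

raw_yaml = yaml.dump({"bibtex": raw}, Dumper=PrettyDumper)
clean_yaml = yaml.dump({"bibtex": clean}, Dumper=PrettyDumper)

print(raw_yaml)    # falls back to a double-quoted scalar
print(clean_yaml)  # uses the literal block style
```

If this is indeed the cause, cleaning up the stored string would fix both the odd formatting and the fallback to quoted output.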

hagenw · 2022-08-05T09:36:02Z

This might indeed be a solution.
For audioset it will not help as the description string does not contain \n at the moment, but we could update it.

hagenw added the "enhancement" (New feature or request) label on Apr 6, 2021
