Enhance output of audformat.Database.description #59

Open
hagenw opened this issue Apr 6, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@hagenw
Member

hagenw commented Apr 6, 2021

At the moment we get the following:

>>> import audb
>>> db = audb.load('emodb', version='1.1.0')
>>> db.description
'Berlin Database of Emotional Speech. A German database of emotional utterances spoken by actors recorded as a part of the DFG funded research project SE462/3-1 in 1997 and 1999. Recordings took place in the anechoic chamber of the Technical University Berlin, department of Technical Acoustics. It contains about 500 utterances from ten different actors expressing basic six emotions and neutral.'

which is not very nice to read.
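
As a minimal sketch of what more readable output could look like, the long description line could be wrapped with the standard library before it is shown. `format_description` is a hypothetical helper, not part of audformat or audb, and the width of 60 is an arbitrary choice. Lines that are already short are left untouched, so pre-formatted text is not reflowed.

```python
import textwrap


def format_description(description: str, width: int = 79) -> str:
    """Wrap long lines of a description, keeping existing line breaks.

    Hypothetical helper for illustration, not part of audformat.
    """
    wrapped = []
    for line in description.splitlines():
        if len(line) > width:
            # Reflow only lines that exceed the requested width
            wrapped.append(
                textwrap.fill(line, width=width, break_on_hyphens=False)
            )
        else:
            # Short (possibly pre-formatted) lines are kept verbatim
            wrapped.append(line)
    return "\n".join(wrapped)


description = (
    "Berlin Database of Emotional Speech. "
    "A German database of emotional utterances spoken by actors "
    "recorded as a part of the DFG funded research project SE462/3-1 "
    "in 1997 and 1999."
)
print(format_description(description, width=60))
```

Whether this should happen inside `description` itself or only in a separate display method is a design question this sketch does not answer.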

It gets even worse if the description string contains actual formatting.
For example, the description of audioset contains:

AudioSet ontology categories of the two top hierarchies:

Human sounds            Animal                   Music
|-Human voice           |-Domestic animals, pets |-Musical instrument
|-Whistling             |-Livestock, farm        |-Music genre
|-Respiratory sounds    | animals, working       |-Musical concepts
|-Human locomotion      | animals                |-Music role
|-Digestive             \-Wild animals           \-Music mood
|-Hands
|-Heart sounds,         Sounds of things         Natural sounds
| heartbeat             |-Vehicle                |-Wind
|-Otoacoustic emission  |-Engine                 |-Thunderstorm
\-Human group actions   |-Domestic sounds,       |-Water
                        | home sounds            \-Fire
Source-ambiguous sounds |-Bell
|-Generic impact sounds |-Alarm                  Channel, environment
|-Surface contact       |-Mechanisms             and background
|-Deformable shell      |-Tools                  |-Acoustic environment
|-Onomatopoeia          |-Explosion              |-Noise
|-Silence               |-Wood                   \-Sound reproduction
\-Other sourceless      |-Glass
                        |-Liquid
                        |-Miscellaneous sources
                        \-Specific impact sounds

It would be nice to preserve this formatting when printing to screen.

@hagenw hagenw added the enhancement New feature or request label Apr 6, 2021
@hagenw
Member Author

hagenw commented Feb 25, 2022

It's not only description:

>>> db = audb.load('iemocap', version='2.2.0', only_metadata=True)
>>> db.schemes["dialog.act"]
description: Dialogue act annotations.Released by https://github.com/sahatulika15/EMOTyDA.Please
  cite the respective paper whenever using it.
dtype: str
labels: {g: greeting, q: question, ans: answer, o: statement-opinion, s: statement-non-opinion,
  ap: apology, ag: agreement, dag: disagreement, a: acknowledgement, b: backchanneling,
  c: command, oth: other}
bibtex: "\n                @inproceedings{saha-etal-2020-towards,\n              \
  \      title = \"Towards Emotion-aided Multi-modal Dialogue Act Classification\"\
  ,\n                    author = \"Saha, Tulika  and\n                    Patra,\
  \ Aditya  and\n                    Saha, Sriparna  and\n                    Bhattacharyya,\
  \ Pushpak\",\n                    booktitle = \"Proceedings of the 58th Annual Meeting\
  \ of the Association for Computational Linguistics\",\n                    month\
  \ = jul,\n                    year = \"2020\",\n                    address = \"\
  Online\",\n                    publisher = \"Association for Computational Linguistics\"\
  ,\n                    url = \"https://www.aclweb.org/anthology/2020.acl-main.402\"\
  ,\n                    doi = \"10.18653/v1/2020.acl-main.402\",\n              \
  \      pages = \"4361--4372\",\n                }\n            "

@frankenjoe
Collaborator

frankenjoe commented Feb 28, 2022

yaml offers the option to modify the way scalars are presented, e.g.:

import audformat


bibtex = '@inproceedings{saha-etal-2020-towards,\n\
    title = "Towards Emotion-aided Multi-modal Dialogue Act Classification"\n\
    author = "Saha, Tulika and\n\
        Patra, Aditya and\n\
        Saha, Sriparna and\n\
        Bhattacharyya, Pushpak"\n\
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",\n\
    month = jul,\n\
    year = "2020",\n\
    address = "Online",\n\
    publisher = "Association for Computational Linguistics"\n\
    url = "https://www.aclweb.org/anthology/2020.acl-main.402"\n\
    doi = "10.18653/v1/2020.acl-main.402",\n\
    pages = "4361--4372"'

db = audformat.testing.create_db(minimal=True)
db.schemes['scheme'] = audformat.Scheme(meta={'bibtex': bibtex})
print(db.schemes['scheme'])
{dtype: str, bibtex: "@inproceedings{saha-etal-2020-towards,\n    title = \"Towards\
    \ Emotion-aided Multi-modal Dialogue Act Classification\"\n    author = \"Saha,\
    \ Tulika and\n        Patra, Aditya and\n        Saha, Sriparna and\n        Bhattacharyya,\
    \ Pushpak\"\n    booktitle = \"Proceedings of the 58th Annual Meeting of the Association\
    \ for Computational Linguistics\",\n    month = jul,\n    year = \"2020\",\n \
    \   address = \"Online\",\n    publisher = \"Association for Computational Linguistics\"\
    \n    url = \"https://www.aclweb.org/anthology/2020.acl-main.402\"\n    doi =\
    \ \"10.18653/v1/2020.acl-main.402\",\n    pages = \"4361--4372\""}
dtype: str

can be prettified to:

import yaml


def str_presenter(dumper, data):
    # Use literal block style ('|') for multi-line strings
    if len(data.splitlines()) > 1:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)


# Register the presenter for all str objects on PyYAML's default Dumper
yaml.add_representer(str, str_presenter)
print(db.schemes['scheme'])
bibtex: |-
  @inproceedings{saha-etal-2020-towards,
      title = "Towards Emotion-aided Multi-modal Dialogue Act Classification"
      author = "Saha, Tulika and
          Patra, Aditya and
          Saha, Sriparna and
          Bhattacharyya, Pushpak"
      booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
      month = jul,
      year = "2020",
      address = "Online",
      publisher = "Association for Computational Linguistics"
      url = "https://www.aclweb.org/anthology/2020.acl-main.402"
      doi = "10.18653/v1/2020.acl-main.402",
      pages = "4361--4372"

For some reason, it did not work with the example above, though. The formatting of that string also looks a bit odd.
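
One possible explanation, not confirmed in this thread: PyYAML only emits the `|` block style when the string has no trailing spaces and no space directly before a line break, and otherwise falls back to a double-quoted scalar. If the example in question is the iemocap scheme, its bibtex string ends with `\n` followed by spaces, which would trigger that fallback. The sketch below assumes this is the cause. `clean_multiline` and `dump_with_block_scalars` are hypothetical helpers, not part of audformat, and the short `raw` string is made up for illustration. The dumper subclass is used so that the representer is not registered globally.

```python
import textwrap


def clean_multiline(text: str) -> str:
    """Remove common indentation and whitespace that prevents '|' style."""
    # Drop the shared leading indentation and surrounding blank space
    text = textwrap.dedent(text).strip()
    # Strip trailing spaces on every line
    return "\n".join(line.rstrip() for line in text.splitlines())


def dump_with_block_scalars(data) -> str:
    """Dump data with PyYAML, using '|' for multi-line strings."""
    import yaml  # requires PyYAML; imported here so the helper above stays stdlib-only

    class BlockDumper(yaml.SafeDumper):
        """Local dumper, so PyYAML's global representers stay untouched."""

    def str_presenter(dumper, value):
        style = "|" if "\n" in value else None
        return dumper.represent_scalar(
            "tag:yaml.org,2002:str", value, style=style
        )

    BlockDumper.add_representer(str, str_presenter)
    return yaml.dump(data, Dumper=BlockDumper, default_flow_style=False)


# Made-up string with the same kind of indentation as the iemocap entry
raw = '\n        @inproceedings{key,\n            year = "2020",\n        }\n    '
print(repr(clean_multiline(raw)))
```

With PyYAML installed, `print(dump_with_block_scalars({'bibtex': clean_multiline(raw)}))` can be used to check whether the cleaned string is now written in block style.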

@hagenw
Member Author

hagenw commented Aug 5, 2022

This might indeed be a solution.
For audioset it will not help, as the description string currently does not contain \n, but we could update it.
