Regression - RepresenterError on dict subclass #142

jaraco · 2018-03-12T15:45:28Z

Due to #126, I'm running pyyaml at master. When I try to run code that passes on released versions of pyyaml, it fails in master:

-> raise RepresenterError("cannot represent an object", data)
(Pdb) type(data)
<class 'pmxbot.dictlib.ConfigDict'>
(Pdb) isinstance(data, dict)
True

As the object is a subclass of dict, it would previously be treated as a dict for the purpose of serialization. Is this an intentional change?

The text was updated successfully, but these errors were encountered:

jaraco · 2018-03-12T16:35:03Z

Here's a more direct example of what's newly failing:

$ .tox/python/bin/python
iPython 3.7.0b2 (tags/v3.7.0b2:b0ef5c979b, Feb 27 2018, 20:38:21)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class mydict(dict):
...   pass
...
>>> import yaml
>>> yaml.dump(mydict(a=1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jaraco/Dropbox/code/yg/public/pmxbot/.tox/python/lib/python3.7/site-packages/yaml/__init__.py", line 217, indump
    return dump_all([data], stream, Dumper=Dumper, **kwds)
  File "/Users/jaraco/Dropbox/code/yg/public/pmxbot/.tox/python/lib/python3.7/site-packages/yaml/__init__.py", line 196, indump_all
    dumper.represent(data)
  File "/Users/jaraco/Dropbox/code/yg/public/pmxbot/.tox/python/lib/python3.7/site-packages/yaml/representer.py", line 26, in represent
    node = self.represent_data(data)
  File "/Users/jaraco/Dropbox/code/yg/public/pmxbot/.tox/python/lib/python3.7/site-packages/yaml/representer.py", line 57, in represent_data
    node = self.yaml_representers[None](self, data)
  File "/Users/jaraco/Dropbox/code/yg/public/pmxbot/.tox/python/lib/python3.7/site-packages/yaml/representer.py", line 229,in represent_undefined
    raise RepresenterError("cannot represent an object", data)
yaml.representer.RepresenterError: ('cannot represent an object', {'a': 1})

jaraco · 2018-03-12T16:39:12Z

Here's the previous behavior, observed on Python 3.6 and PyYAML 3.12:

$ python3.6 -m rwt pyyaml==3.12
Collecting pyyaml==3.12
  Using cached PyYAML-3.12.tar.gz
Installing collected packages: pyyaml
  Running setup.py install for pyyaml ... done
Successfully installed pyyaml-3.12
Python 3.6.4 (v3.6.4:d48ecebad5, Dec 18 2017, 21:07:28)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> class MyDict(dict): pass
...
>>> yaml.dump(MyDict(a=1))
'!!python/object/new:__main__.MyDict\ndictitems: {a: 1}\n'

pradyunsg · 2018-06-06T15:11:29Z

This might be since it seems to have switched to safe-dumping by default.

https://github.com/yaml/pyyaml/blob/master/lib3/yaml/dumper.py#L27

jaraco · 2018-06-06T19:17:32Z

So... does that mean the next release will be backward-incompatible? Can you recommend a workaround that would retain the 3.12 behavior?

pradyunsg · 2018-06-06T19:21:17Z

I'm not a maintainer. Landed here from the pmxbot issue. :)

I'd guess you can try to get the "DangerousDumper" (or similarly named) class and if it's not present, fall back to the existing Dumper class when doing the dump.

That'll work now and in the future even if the next release is backwards incompatible.

IMO, the current behaviour on master is the correct behaviour. By default PyYAML should be using the "safe" dumper and loader.

jaraco · 2019-10-19T14:45:13Z

I tried using the dangerous dumper, but found that creates a YAML file that requires a dangerous loader. What I really want is to instruct YAML to serialize a dict subclass exactly as it would a dict.

dhimmel · 2020-11-24T02:41:01Z

I also ran into this issue trying to dump an instance of a dict subclass.

I found an ugly workaround on StackOverflow that you must run before calling yaml.dump:

yaml.add_representer(
    Dict_Subclass,
    lambda dumper, data: dumper.represent_mapping('tag:yaml.org,2002:map', data.items())
)

Limitations:

Dict_Subclass must be the specific dict subclass. This doesn't fix the issue for all dict subclasses, just those explicitly overriden.
It works with yaml.dump but not yaml.safe_dump.

bluenote10 · 2023-04-18T17:23:52Z

This issue doesn't seem to be limited to dict, but applies to all types that are subclasses of representable types, right? Consider for instance:

import json
import yaml

class StringSubclass(str):
    ...

class IntSubclass(int):
    ...

Both of these fail:

print(yaml.safe_dump(StringSubclass("foo")))
print(yaml.safe_dump(IntSubclass(42)))

This is unfortunate, because it is inconsistent with the behavior of json.dump which can serialize types just have, if their base class is serializable. So these two work just fine:

print(json.dumps(StringSubclass("foo")))
print(json.dumps(IntSubclass(42)))

The difference in behavior is unfortunate, because it means that (untagged "safe") yaml serialization cannot just be used in places where plain json serialization works fine. Imho safe dumping should mirror the behavior of json here and allow serialization if the base type is serializable anyway. Otherwise, the necessary work-around on user side when switching from json to yaml can get pretty ugly.

nitzmahone · 2023-04-19T20:55:56Z

Yep, I've gotten nailed by this myself plenty of times. In addition to the stuff going on in native Python, IIRC the Cython-libyaml CDumper also has some hard-coded type checks that would need to be re-thought in order to enable this kind of fallback behavior. At least for the "Python-aware" dumpers, we'd probably also have to have this kind of thing be an opt-in behavior, or make some changes and add tests to ensure that more-specific types are always selected first (so that folks that are expecting actual round-trippable serialization of arbitrary Python types still get the right type tagged on the output).

This all probably gets a little easier to manage once we merge something like #700, since it's really mostly still just about which types get registered as representers rather than wildly different behavior that would necessitate actual new Dumper subclasses.

bluenote10 · 2023-04-20T06:15:26Z

so that folks that are expecting actual round-trippable serialization of arbitrary Python types still get the right type tagged on the output

At least for the SafeDumper this wouldn't be the expected behavior, right? By definition the output of the safe dumper should be free of tags, and thus roundtrips with custom types isn't expected to work. Nonetheless inheritance is a is a relationship, and if a value is a string the safe representer should represent it as such.

If we consider the main use case of the non-safe dumper to be perfect roundtrips, this dumper should probably stay as it is?

I did a quick experiment: Replacing the following occurrences of add_representer by an add_multi_representer seems to solve this issue:

pyyaml/lib/yaml/representer.py

Lines 236 to 267 in 957ae4d

    
           SafeRepresenter.add_representer(str, 
        
                   SafeRepresenter.represent_str) 
        
           SafeRepresenter.add_representer(bytes, 
        
                   SafeRepresenter.represent_binary) 
        
           SafeRepresenter.add_representer(bool, 
        
                   SafeRepresenter.represent_bool) 
        
           SafeRepresenter.add_representer(int, 
        
                   SafeRepresenter.represent_int) 
        
           SafeRepresenter.add_representer(float, 
        
                   SafeRepresenter.represent_float) 
        
           SafeRepresenter.add_representer(list, 
        
                   SafeRepresenter.represent_list) 
        
           SafeRepresenter.add_representer(tuple, 
        
                   SafeRepresenter.represent_list) 
        
           SafeRepresenter.add_representer(dict, 
        
                   SafeRepresenter.represent_dict) 
        
           SafeRepresenter.add_representer(set, 
        
                   SafeRepresenter.represent_set) 
        
           SafeRepresenter.add_representer(datetime.date, 
        
                   SafeRepresenter.represent_date) 
        
           SafeRepresenter.add_representer(datetime.datetime, 
        
                   SafeRepresenter.represent_datetime)

The downside would be a slightly worse performance, because the logic would then always have to go into the second branch with the iteration here:

pyyaml/lib/yaml/representer.py

Lines 46 to 53 in 957ae4d

    
           data_types = type(data).__mro__ 
        
           if data_types[0] in self.yaml_representers: 
        
               node = self.yaml_representers[data_types[0]](self, data) 
        
           else: 
        
               for data_type in data_types: 
        
                   if data_type in self.yaml_multi_representers: 
        
                       node = self.yaml_multi_representers[data_type](self, data) 
        
                       break

To get back to the performance as it is now, we could add a multi representer to the normal representers as well (i.e., adding it to both dicts in line 75) so that the logic here would again use branch line 48 in the majority of cases.

IIRC the Cython-libyaml CDumper also has some hard-coded type checks that would need to be re-thought in

Oh I wasn't aware of that other implementation, so I'm not sure how the change suggested above would relate to that. If it could work like I suggested, may I open a PR?

nitzmahone · 2023-04-20T13:50:22Z

At least for the SafeDumper this wouldn't be the expected behavior, right?

Correct, but it's really a little more nuanced than that, especially in light of what's going on with #700 and YAML 1.2 support, since the entire concept of "safe" and "not safe" gain a few more dimensions. The problem is making sure that we stack the representers in the right order so that anything that subclasses them and/or adds further multi-representers doesn't get hidden by the fallback. #700 makes that a little more visible and explicit, but still probably not quite enough to prevent folks from being bitten by an overly "greedy" default fallback that masks something they added explicitly.

Feel free to play around with some ideas and impls on a draft PR if you like, but we're probably not going to actually merge anything that changes this behavior until after the YAML 1.2 support is nailed down, since it potentially introduces some new variables to consider. Just want to set expectations 😉

jaraco mentioned this issue Jun 6, 2018

Tests failing on Python 3.7 pmxbot/pmxbot#82

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regression - RepresenterError on dict subclass #142

Regression - RepresenterError on dict subclass #142

jaraco commented Mar 12, 2018

jaraco commented Mar 12, 2018

jaraco commented Mar 12, 2018

pradyunsg commented Jun 6, 2018

jaraco commented Jun 6, 2018

pradyunsg commented Jun 6, 2018

jaraco commented Oct 19, 2019

dhimmel commented Nov 24, 2020

bluenote10 commented Apr 18, 2023 •

edited

Loading

nitzmahone commented Apr 19, 2023

bluenote10 commented Apr 20, 2023

nitzmahone commented Apr 20, 2023 •

edited

Loading

Regression - RepresenterError on dict subclass #142

Regression - RepresenterError on dict subclass #142

Comments

jaraco commented Mar 12, 2018

jaraco commented Mar 12, 2018

jaraco commented Mar 12, 2018

pradyunsg commented Jun 6, 2018

jaraco commented Jun 6, 2018

pradyunsg commented Jun 6, 2018

jaraco commented Oct 19, 2019

dhimmel commented Nov 24, 2020

bluenote10 commented Apr 18, 2023 • edited Loading

nitzmahone commented Apr 19, 2023

bluenote10 commented Apr 20, 2023

nitzmahone commented Apr 20, 2023 • edited Loading

bluenote10 commented Apr 18, 2023 •

edited

Loading

nitzmahone commented Apr 20, 2023 •

edited

Loading