Implement pydantic validators for string format checks #406

Open
MarshalX opened this issue Oct 1, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

Owner

MarshalX commented Oct 1, 2024

Lexicon's string type has a "format" field that restricts the value to a strict string pattern: https://atproto.com/specs/lexicon

The full list of these formats is in the spec: https://atproto.com/specs/lexicon#string-formats

We could make our model validation stricter. That would help anyone who wants to use the SDK's models on the server side, for example to implement a PDS.

snarfed implemented these string validations and kindly shared them with us: https://github.com/snarfed/lexrpc/blob/41a858c2c28ad212df64f347270c3a8092743f1b/lexrpc/base.py#L439-L522

    def _validate_string_format(self, val, format):
        """Validates an ATProto string value against a format.

        https://atproto.com/specs/lexicon#string-formats

        Args:
          val (str)
          format (str): one of the ATProto string formats

        Raises:
          ValidationError: if the value is invalid for the given format
        """
        def check(condition):
            if not condition:
                raise ValidationError(f'is invalid for format {format}')

        check(val)

        # TODO: switch to match once we require Python 3.10+
        if format == 'at-identifier':
            check(DID_RE.match(val) or DOMAIN_RE.match(val.lower()))

        elif format == 'at-uri':
            check(len(val) < 8 * 1024)
            check(AT_URI_RE.match(val))
            check('/./' not in val
                  and '/../' not in val
                  and not val.endswith('/.')
                  and not val.endswith('/..'))

        elif format == 'cid':
            # ideally I'd use CID.decode here, but it doesn't support CIDv5,
            # it's too strict about padding, etc.
            check(CID_RE.match(val))

        elif format == 'datetime':
            check('T' in val)

            orig_val = val
            # timezone is required
            val = re.sub(r'([+-][0-9]{2}:[0-9]{2}|Z)$', '', orig_val)
            check(val != orig_val)

            # strip fractional seconds
            val = re.sub(r'\.[0-9]+$', '', val)

            try:
                datetime.fromisoformat(val)
            except ValueError:
                check(False)

        elif format == 'did':
            check(DID_RE.match(val))

        elif format == 'nsid':
            check(len(val) <= 317)
            check(NSID_RE.match(val) and '.' in val)

        elif format == 'handle':
            check(len(val) <= 253)
            check(DOMAIN_RE.match(val.lower()))

        elif format == 'tid':
            check(TID_RE.match(val))
            # high bit, big-endian, can't be 1
            check(not ord(val[0]) & 0x40)

        elif format == 'record-key':
            check(val not in ('.', '..') and RKEY_RE.match(val))

        elif format == 'uri':
            check(len(val) < 8 * 1024)
            check(' ' not in val)
            parsed = urlparse(val)
            check(parsed.scheme
                  and parsed.scheme[0].lower() in string.ascii_lowercase
                  and (parsed.netloc or parsed.path or parsed.query
                       or parsed.fragment))

        elif format == 'language':
            check(LANG_RE.match(val))

        else:
            raise ValidationError(f'unknown format {format}')

The code above is licensed under CC0 1.0 Universal.
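
The if/elif chain could also be split into one function per format, which is closer to what per-format pydantic validators would need. Below is a minimal sketch that ports only the `datetime` and `uri` branches, since those need no regex constants. The names `VALIDATORS`, `register`, and `validate_string_format` are hypothetical, and the sketch raises the built-in `ValueError` rather than lexrpc's `ValidationError`:

```python
import re
import string
from datetime import datetime
from urllib.parse import urlparse

# Hypothetical registry: format name -> validator function.
VALIDATORS = {}


def register(format_name):
    """Decorator that stores a validator under its format name."""
    def decorator(func):
        VALIDATORS[format_name] = func
        return func
    return decorator


@register('datetime')
def validate_datetime(val):
    if 'T' not in val:
        raise ValueError('is invalid for format datetime')
    # Timezone is required, so stripping it must change the string.
    stripped = re.sub(r'([+-][0-9]{2}:[0-9]{2}|Z)$', '', val)
    if stripped == val:
        raise ValueError('is invalid for format datetime')
    # Strip fractional seconds, then let the stdlib parse the rest.
    stripped = re.sub(r'\.[0-9]+$', '', stripped)
    datetime.fromisoformat(stripped)  # raises ValueError if malformed
    return val


@register('uri')
def validate_uri(val):
    if len(val) >= 8 * 1024 or ' ' in val:
        raise ValueError('is invalid for format uri')
    parsed = urlparse(val)
    if not (parsed.scheme
            and parsed.scheme[0].lower() in string.ascii_lowercase
            and (parsed.netloc or parsed.path or parsed.query
                 or parsed.fragment)):
        raise ValueError('is invalid for format uri')
    return val


def validate_string_format(val, format_name):
    """Dispatch to the validator registered for format_name."""
    if not val:
        raise ValueError(f'is invalid for format {format_name}')
    if format_name not in VALIDATORS:
        raise ValueError(f'unknown format {format_name}')
    return VALIDATORS[format_name](val)
```

With this layout each function returns the value unchanged on success, so it can be reused as a validator callback without a wrapper.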

I see this task as:

  • implement pydantic validators, one per string format
  • tune the models generator to apply the validators to fields
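
As a rough sketch of both steps, assuming pydantic v2: a per-format validator can be attached with `Annotated[str, AfterValidator(...)]`, and the generator can map a lexicon `format` value to the matching alias. `DatetimeStr`, `FORMAT_TO_ANNOTATION`, `field_annotation`, and `ExampleRecord` are hypothetical names, not part of the SDK, and only the `datetime` format is shown:

```python
import re
from datetime import datetime
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError


def _validate_datetime(val: str) -> str:
    """Require a 'T' separator and a timezone, then parse the rest."""
    # Remove the timezone suffix, then any fractional seconds.
    stripped = re.sub(r'([+-][0-9]{2}:[0-9]{2}|Z)$', '', val)
    if 'T' not in val or stripped == val:
        raise ValueError('is invalid for format datetime')
    datetime.fromisoformat(re.sub(r'\.[0-9]+$', '', stripped))
    return val


# Hypothetical reusable alias: a plain str that is checked after
# pydantic's own str validation has passed.
DatetimeStr = Annotated[str, AfterValidator(_validate_datetime)]

# Hypothetical lookup for the models generator:
# lexicon "format" value -> annotation to emit in generated code.
FORMAT_TO_ANNOTATION = {'datetime': 'DatetimeStr'}


def field_annotation(string_def: dict) -> str:
    """Return the annotation source text for a lexicon string definition."""
    return FORMAT_TO_ANNOTATION.get(string_def.get('format'), 'str')


class ExampleRecord(BaseModel):
    """Hypothetical model, shaped like what the generator might emit."""
    created_at: DatetimeStr


if __name__ == '__main__':
    print(ExampleRecord(created_at='2024-10-01T12:00:00Z'))
    try:
        ExampleRecord(created_at='2024-10-01T12:00:00')  # no timezone
    except ValidationError as exc:
        print(exc.error_count(), 'validation error')
    print(field_annotation({'type': 'string', 'format': 'datetime'}))
```

Keeping the value as `str` rather than converting it to `datetime` means serialization round-trips the original text, which may matter for records whose bytes are hashed or signed.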

MarshalX added the enhancement (New feature or request) label Oct 1, 2024