Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

724: raise an error if a pyxform reference is malformed #734

Merged
merged 2 commits into from
Oct 29, 2024

Conversation

lindsay-stevens
Copy link
Contributor

Closes #724

Changes:

  • pyxform_reference.py
    • new module to check pyxform reference syntax
    • unit test generates 2874 unique usage permutations
  • xls2json.py
    • remove extra iteration of sheet data in replace_smart_quotes by combining it with clean_text_values and calling that instead
  • expression.py
    • move expression parser from utils to here
    • update usages of lexer to reference a cached wrapper func instead
  • use compiled regex patterns directly instead of importing re as well

Why is this the best possible solution? Were any other approaches considered?

The approach was to try and consistently apply pyxform reference syntax validation anywhere that references may be used in a XLSForm, ideally where the sheet data was already being iterated over, in order to minimise performance impact. Almost all sheet data was being passed through remove_smart_quotes and/or clean_text_values so I consolidated them with the new check under clean_text_values.

What are the regression risks?

May be validating or cleaning text that wasn't being validated or cleaned before, but if an error is raised then the value is probably invalid.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No. Although on xlsform.org the concept of a pyxform reference is not addressed at all except in passing and I'm not sure even the ODK docs give that concept a name?

Before submitting this PR, please make sure you have:

  • included test cases for core behavior and edge cases in tests
  • run python -m unittest and verified all tests pass
  • run ruff format pyxform tests and ruff check pyxform tests to lint code
  • verified that any code or assets from external sources are properly credited in comments

- pyxform_reference.py
  - new module to check pyxform reference syntax
  - unit test generates 2874 unique usage permutations
- xls2json.py
  - remove extra iteration of sheet data in replace_smart_quotes by
    combining it with clean_text_values and calling that instead
- expression.py
  - move expression parser from utils to here
  - update usages of lexer to reference a cached wrapper func instead
- use compiled regex patterns directly instead of importing re as well
- looks to have been unused since about 2012 so it's probably safe to go

lexicon = [(v, get_tokenizer(k)) for k, v in lexer_rules.items()]
# re.Scanner is undocumented but has been around since at least 2003
# https://mail.python.org/pipermail/python-dev/2003-April/035075.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a masterpiece of work to get inspiration from, but not as a tool to give out to anybody"
-- https://mail.python.org/pipermail/python-dev/2003-April/035070.html

Feels like annoying fence-sitting! https://bugs.python.org/issue5337

Copy link
Contributor

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful, thank you!!

@lognaturel lognaturel merged commit 61bb3c3 into XLSForm:master Oct 29, 2024
10 checks passed
@lindsay-stevens lindsay-stevens deleted the pyxform-724 branch October 30, 2024 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error if there's a ${ with a space or newline before the next }
2 participants