Error parsing xhtml: Unexpected character '>' (code 62) expected '=' #92

Closed
harish0619 opened this issue Jun 21, 2023 · 6 comments


@harish0619

harish0619 commented Jun 21, 2023

I have a Markdown file that I convert to MediaWiki format using pypandoc and then upload to Confluence with md2cf. However, I hit the error pasted in the title every time I run the code.

Can you help me work out what could be happening?

Unfortunately, the file that I'm trying to convert is a proprietary document and I can't share it here.

@galund

galund commented Jun 30, 2023

MediaWiki format is similar to Markdown but not the same, so I wouldn't expect that to work. Why are you not uploading the original Markdown using md2cf?

@harish0619
Author

harish0619 commented Jul 5, 2023

Because it throws the same error as I described in the issue.

b'{"statusCode":400,"data":{"authorized":false,"valid":true,"allowedInReadOnlyMode":true,"errors":[],"successful":false},"message":"Error parsing xhtml: Unexpected character
'>' (code 62) expected '='\n at : [11,26]","reason":"Bad Request"}'

Is there a way to read the XHTML that it generates during the process?

@galund

galund commented Jul 5, 2023

This sounds a bit like the problem I had in #81 - I wonder if you have some image alt text with an angle bracket or quote mark in it, which seems to cause the MD->HTML conversion lib Mistune to produce broken XHTML.

You get more debug information if you add the --debug flag to md2cf.

If you want to see what's in the HTML when it's breaking, you might want to add
import pdb; pdb.set_trace()

after line 440 (just after where it says except HTTPError as e:) and have a poke around in the page object.

You also might like to try my branch that has an updated version of Mistune https://github.com/alphagov/md2cf/tree/upgrade-mistune.
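Once the generated body has been copied out of the debugger, one way to find the offending spot is to run it through a strict XML parser locally. The following is a minimal sketch using only the Python standard library, not part of md2cf. The function name `first_xml_error`, the wrapper element and the placeholder namespace URIs are all made up for illustration, and it assumes the `[11,26]` in the Confluence message is a line and column pair. Python's parser is not the one Confluence uses, so the message wording and column may differ, and HTML entities such as `&nbsp;` will be reported as errors here even if Confluence accepts them.

```python
import xml.etree.ElementTree as ET


def first_xml_error(fragment: str):
    """Return (line, column, message) for the first well-formedness error
    in an XHTML fragment, or None if the fragment parses cleanly.

    The line number is relative to the fragment, starting at 1.
    """
    # Wrap the fragment in a single root element so that several top-level
    # tags are accepted. The ac:/ri: namespace URIs are placeholders that
    # only exist so prefixed Confluence tags do not trip the parser.
    wrapped = (
        '<root xmlns:ac="urn:example:ac" xmlns:ri="urn:example:ri">\n'
        + fragment
        + "\n</root>"
    )
    try:
        ET.fromstring(wrapped)
    except ET.ParseError as exc:
        line, column = exc.position
        # Subtract the wrapper line that was added above the fragment.
        return line - 1, column, str(exc)
    return None


if __name__ == "__main__":
    # A deliberately malformed tag: the attribute "alt" has no value.
    broken = '<p>one</p>\n<img src="x.png" alt>'
    print(first_xml_error(broken))
    print(first_xml_error("<p>fine</p>"))
```

Printing the reported line of the fragment next to the result usually makes the broken tag obvious.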

@harish0619
Author

Thanks for the swift response. I tried your branch but I get the same error. I also tried the --debug flag and am pasting the output below. Can you point out exactly where I should put the debugger trace?

`─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/main.py:402 in main │
│ │
│ [Errno 20] Not a directory: │
│ '/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/main.py' │
│ │
│ /usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/upsert.py:93 in upsert_page │
│ │
│ [Errno 20] Not a directory: │
│ '/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/upsert.py' │
│ │
│ /usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py:176 in create_page │
│ │
│ [Errno 20] Not a directory: │
│ '/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py' │
│ │
│ /usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py:73 in _post │
│ │
│ [Errno 20] Not a directory: │
│ '/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py' │
│ │
│ /usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py:66 in _request │
│ │
│ [Errno 20] Not a directory: │
│ '/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py' │
│ │
│ /usr/local/lib/python3.7/site-packages/requests-2.28.2-py3.7.egg/requests/models.py:1021 in │
│ raise_for_status │
│ │
│ 1018 │ │ │ ) │
│ 1019 │ │ │
│ 1020 │ │ if http_error_msg: │
│ ❱ 1021 │ │ │ raise HTTPError(http_error_msg, response=self) │
│ 1022 │ │
│ 1023 │ def close(self): │
│ 1024 │ │ """Releases the connection back to the pool. Once this method has been │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ http_error_msg = '400 Client Error: for url: │ │
│ │ https://oneconfluence.verizon.com/rest/api/content' │ │
│ │ reason = '' │ │
│ │ self = <Response [400]> │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
HTTPError: 400 Client Error: for url: https://oneconfluence.verizon.com/rest/api/content

📄️ deploy_openstack ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ❌ Error while uploading

Total progress ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

400 Client Error: for url: https://oneconfluence.verizon.com/rest/api/content -
b'{"statusCode":400,"data":{"authorized":false,"valid":true,"allowedInReadOnlyMode":true,"errors":[],"successful":false},"message":"Error parsing xhtml: Unexpected character
'>' (code 62) expected '='\n at : [11,26]","reason":"Bad Request"}'`

@galund

galund commented Jul 5, 2023

Heh, I forgot to give you the file name, didn't I? It's md2cf/main.py, which is where the error is usually caught and where you have a page object that you can look at.

@harish0619
Author

Upon debugging, I found that the issue was with the character '>'. It seems the tool wasn't able to tell it apart from markup, since XHTML uses the same character for its tags.

Maybe this is something that can be fixed in Mistune. You can close the issue and consider it resolved. Thanks once again!
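For anyone landing here with the same error, the general point can be shown with a small standard-library sketch: text that ends up inside an XHTML attribute has to be escaped first. This is not md2cf or Mistune code. The helper names `img_tag` and `is_well_formed` and the sample alt text are hypothetical, and the exact input that broke the original document is not known. Note that a bare '>' inside a quoted attribute is legal in strict XML, so the sample text also contains a double quote, which is what makes the unescaped version fail here.

```python
import html
import xml.etree.ElementTree as ET


def img_tag(alt: str, escape: bool) -> str:
    """Build an <img> tag, optionally escaping the alt text first."""
    # html.escape with quote=True replaces &, <, > and both quote characters.
    value = html.escape(alt, quote=True) if escape else alt
    return f'<img src="diagram.png" alt="{value}" />'


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    alt_text = 'throughput > 10 "fast path"'
    for escape in (False, True):
        tag = img_tag(alt_text, escape)
        print(escape, is_well_formed(tag), tag)
```

If editing the source is an option, rewording the text so that it avoids angle brackets and quotes in image alt text sidesteps the problem without touching the converter.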
