RTL, Arabic Text Marker not rendering properly in PDF #354

Closed
opoudjis opened this issue Dec 13, 2021 · 4 comments

@opoudjis
Contributor

In #166, I decided to use the Right-to-Left Mark (RLM, U+200F) and the Arabic Letter Mark (ALM, U+061C, the "Arabic Text Marker" of the title) to delimit runs of inline Hebrew or Arabic script text within left-to-right text. I chose Unicode control characters rather than markup so that the solution stays compatible with Word, which uses HTML 4 (and thus does not know about directionality markup).
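A minimal sketch of the marker approach described above, for illustration only. It is not the code from #166: the function name `mark_rtl_runs`, the restriction to the basic Hebrew and Arabic blocks, and the choice to place a marker on both sides of each run are all assumptions.

```python
import re

RLM = "\u200F"  # RIGHT-TO-LEFT MARK
ALM = "\u061C"  # ARABIC LETTER MARK

# Assumption: only the basic Hebrew (U+0590..U+05FF) and Arabic (U+0600..U+06FF)
# blocks are considered; U+061C itself is excluded from the Arabic class.
_HEB = "[\u0590-\u05FF]+"
_ARA = "[\u0600-\u061B\u061D-\u06FF]+"
HEBREW_RUN = re.compile(rf"{_HEB}(?:\s+{_HEB})*")
ARABIC_RUN = re.compile(rf"{_ARA}(?:\s+{_ARA})*")


def mark_rtl_runs(text: str) -> str:
    """Wrap each Hebrew run in RLM and each Arabic run in ALM (hypothetical helper)."""
    text = HEBREW_RUN.sub(lambda m: RLM + m.group(0) + RLM, text)
    text = ARABIC_RUN.sub(lambda m: ALM + m.group(0) + ALM, text)
    return text
```

Words separated only by whitespace are treated as one run, so a multi-word Arabic phrase receives a single pair of markers rather than one pair per word.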

This works in HTML and DOC. It does not work in PDF, though for a roundabout reason.

I can go to bidirectional markup instead of control markers if I need to (<bdi> or <xxx direction=rtl>), but I'd like to confirm that I need to, because I'll need to switch that solution out for Word. I am also introducing explicit language and script attributes for these interpolations in IEV, so that you can make sensible font choices.
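For comparison, here is a rough sketch of what the markup alternative could look like: marker-delimited runs are rewritten as `<bdi dir="rtl">` elements. The helper name `markers_to_bdi` is hypothetical, and the sketch assumes each run is opened and closed by the same marker character (RLM or ALM); it is not taken from the project's code.

```python
import re
from html import escape

RLM = "\u200F"  # RIGHT-TO-LEFT MARK
ALM = "\u061C"  # ARABIC LETTER MARK

# Assumption: a run is delimited by the same marker on both sides.
MARKED_RUN = re.compile(f"([{RLM}{ALM}])(.+?)\\1")


def markers_to_bdi(text: str) -> str:
    """Replace marker-delimited runs with <bdi dir="rtl"> elements (hypothetical helper)."""
    parts = []
    pos = 0
    for m in MARKED_RUN.finditer(text):
        parts.append(escape(text[pos:m.start()]))  # text before the run
        parts.append(f'<bdi dir="rtl">{escape(m.group(2))}</bdi>')
        pos = m.end()
    parts.append(escape(text[pos:]))  # trailing text after the last run
    return "".join(parts)
```

Text outside the runs is HTML-escaped, so the result can be placed directly into an HTML fragment.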

Attaching sample document. Note also from the attached that <preferred> should NOT be prefixed with term numbers in PDF; only the first instance of <preferred> should do so, and I am already doing that in the Presentation XML markup.

hebrew-arabic.zip

opoudjis added the bug label on Dec 13, 2021
@Intelligent2013
Contributor

@opoudjis DOC and HTML look different in my environment (Win7 64-bit, Microsoft Office 2019, Arial font 5.22, arial.ttf 774236 bytes):
image

HTML:
image

@Intelligent2013
Contributor

I've tried the examples from https://unicode.org/L2/L2011/11432r-n4180r-alm-form.pdf:

<hr/>
<div>1. النقطة الأولي</div>
<div>&#x061C;1. النقطة الأولي</div>
<hr/>
<div>4 - 1 = 3</div>
<div>&#x061C;4 - 1 = 3</div>
<hr/>
<div>14-6-2011</div>
<div>&#x061C;14-6-2011</div>
<hr/>

and they render differently:
HTML (looks ok):
image

DOC (a box character is shown for the marker, the first example renders incorrectly, and a space is missing in the second):
image

@Intelligent2013
Contributor

Done:

  • IEC xslt updated
  • @language processing added
  • mn2pdf updated to replace U+061C (Arabic Letter Mark) with U+200B (zero-width space)
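
The substitution in the last item can be sketched as follows. This only illustrates the idea and is not the mn2pdf implementation; the function name `replace_alm` and its `replacement` parameter are assumptions.

```python
ALM = "\u061C"   # ARABIC LETTER MARK
ZWSP = "\u200B"  # ZERO WIDTH SPACE


def replace_alm(text: str, replacement: str = ZWSP) -> str:
    """Swap every Arabic Letter Mark for another character (hypothetical helper)."""
    return text.replace(ALM, replacement)
```

Passing a different `replacement`, such as U+200F (RLM), would cover output formats where another substitute is preferred.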

The PDF now renders as follows:
image

Examples from https://unicode.org/L2/L2011/11432r-n4180r-alm-form.pdf:
image

@opoudjis
Contributor Author

Thank you for updating that. I may end up having to replace U+061C with RLM in Word too; the character was added in 2011, and we are using DOC, which dates from before 2008.
