Convert ticket v3 HTML to JSON tickets #22

Open
wants to merge 1 commit into
base: main
from

Conversation

Kuba314

@Kuba314 Kuba314 commented Sep 22, 2024

Port diogotcorreia/lidl-to-grocy's fix for the ticket v2 API change. The fix tries the v2 API first and, if that fails, falls back to the v3 API, which returns the ticket formatted as HTML. The JSON data is then reconstructed from that HTML.
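
The fallback described above could be sketched roughly as follows. This is only an illustration: `fetch_v2`, `fetch_v3_html`, and `html_to_ticket` are hypothetical callables passed in by the caller, not names taken from this PR, and catching a bare `Exception` is a simplification of whatever error the v2 endpoint actually raises.

```python
from typing import Any, Callable, Dict


def get_ticket(
    ticket_id: str,
    fetch_v2: Callable[[str], Dict[str, Any]],
    fetch_v3_html: Callable[[str], str],
    html_to_ticket: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return ticket data, preferring v2 and falling back to v3 HTML."""
    try:
        # Hypothetical v2 call: expected to return JSON-like data directly.
        return fetch_v2(ticket_id)
    except Exception:
        # Hypothetical v3 call: returns the receipt as an HTML string,
        # which is then converted into the same JSON-like shape.
        html = fetch_v3_html(ticket_id)
        return html_to_ticket(html)
```

Passing the three steps in as callables keeps the sketch independent of how the HTTP requests are actually made.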

Credit to https://github.com/diogotcorreia/lidl-to-grocy/blob/master/lidl/src/html_receipt.rs.

Closes #20

This is a draft, because this implementation has not been tested much. Feel free to test it yourself.

@Kuba314
Author

Kuba314 commented Sep 22, 2024

I have noticed that weighted items (at least in cs/cz) do not work currently. @diogotcorreia This should be the case for your implementation at https://github.com/diogotcorreia/lidl-to-grocy/blob/master/lidl/src/html_receipt.rs as well.

I'm seeing the following entries in the HTML for a weighted item (apples):

<span id="purchase_list_line_21">Jablka Gala                        21,11 B</span>
<span id="purchase_list_line_22">N   0,922 kg   x 22,90  Kč/kg </span>
<span id="purchase_list_line_23">PT: 0,002 kg                              </span>

There's no easy way to parse this AFAICS.

One way would be to treat the first non-classed line ending in B as a weighted item, with the number on that line stored as originalAmount, and then expect a following line starting with N from which the weight can optionally be parsed. Technically, though, originalAmount alone should be sufficient.
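
A minimal sketch of that idea, using the three apple lines above as input. The regular expressions, the helper names, and the output keys `weight_kg` and `unit_price` are assumptions made for illustration; only `originalAmount` comes from the discussion, and the comma decimal separator is taken from the cs/cz sample.

```python
import re
from typing import Dict, List, Optional, Union

# Assumed shape: "<name>   <amount> B" with a comma decimal separator.
PRICE_LINE = re.compile(r"^(?P<name>.+?)\s+(?P<amount>\d+,\d{2})\s+B\s*$")
# Assumed shape: "N   <weight> kg   x <unit price>  ...".
WEIGHT_LINE = re.compile(
    r"^N\s+(?P<weight>\d+,\d+)\s*kg\s+x\s+(?P<unit_price>\d+,\d+)"
)


def parse_decimal(text: str) -> float:
    """Convert a comma-separated decimal such as '21,11' to a float."""
    return float(text.replace(",", "."))


def parse_weighted_item(
    lines: List[str],
) -> Optional[Dict[str, Union[str, float]]]:
    """Parse a weighted item from its price line and optional weight line."""
    price = PRICE_LINE.match(lines[0])
    if price is None:
        return None
    item: Dict[str, Union[str, float]] = {
        "name": price["name"],
        "originalAmount": parse_decimal(price["amount"]),
    }
    # The weight line is optional; originalAmount is enough on its own.
    if len(lines) > 1:
        weight = WEIGHT_LINE.match(lines[1])
        if weight is not None:
            item["weight_kg"] = parse_decimal(weight["weight"])
            item["unit_price"] = parse_decimal(weight["unit_price"])
    return item
```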

@Kuba314
Author

Kuba314 commented Sep 22, 2024

I implemented the method for the weighted items that I mentioned in my last comment. I believe this PR is ready for a review.

@Kuba314 Kuba314 marked this pull request as ready for review September 22, 2024 23:56
@diogotcorreia

@Kuba314 Does your receipt not have data- attributes?
This is a sample receipt from my account (in SE): https://github.com/diogotcorreia/lidl-to-grocy/blob/a7856fb5d7369f827d627978bac3460ceab9e0fa/lidl/test/receipt.html
I'm using those data attributes to determine whether an item is weighted, depending on whether the amount has a decimal separator or not (https://github.com/diogotcorreia/lidl-to-grocy/blob/a7856fb5d7369f827d627978bac3460ceab9e0fa/lidl/src/html_receipt.rs#L52). I think your strategy of detecting the B will not work for everyone, since that's country-dependent (it's the VAT type AFAICT).
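
The decimal-separator check described above might look something like this in Python. This is a guess at the idea, not a translation of the linked Rust code; the function name is hypothetical, and the input is assumed to be the raw string value of a quantity data attribute.

```python
def is_weighted_quantity(quantity: str) -> bool:
    """Treat a quantity containing a decimal separator as a weight.

    Assumption: whole-number quantities such as "2" are item counts,
    while values such as "0,922" or "0.922" are weights.
    """
    return "," in quantity or "." in quantity
```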

May I ask, what is the value of type in the v3 ticket response? I took a look inside the APK and there seem to be three possible values, HTML, HTML_OLD (something like that?) and NATIVE.
My guess is that NATIVE would result in the same JSON as the v2 API (because again, in the APK that is still a field in the response), but I have HTML in my receipts (even in the ones that still work on v2).

@Kuba314
Author

Kuba314 commented Sep 23, 2024

Does your receipt not have data- attributes?

@diogotcorreia It does, just not for weighted items. These 3 lines are the only information that I have in the receipt. I'm not seeing what you're seeing. I only see data-art-quantity when it's a whole number (not weighted, just N amount of the same product).

I think your strategy with detecting the B will not work for everyone since that's country-dependent (it's the VAT type AFAICT).

Yeah... this is very possible. Maybe detecting [A-Z] would be better...

May I ask, what is the value of type in the v3 ticket response?

Do you mean ticketType? That's set to HTML, same as you. I fear that the CZ API for lidl is somehow worse than SE, or it's this exact store's issue or I don't know.

@diogotcorreia

These 3 lines are the only information that I have in the receipt.

@Kuba314 That's unfortunate, I'm not sure how you would fix it then, since you also don't have an article number either :/

@Kuba314
Author

Kuba314 commented Sep 24, 2024

I have changed the VAT line detection from what is essentially B$ to [A-Z]$. I hope that this works for everyone. I'm not aware of all the possible VAT types and what their values could be, but I assume it's always an uppercase letter.
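
To make the difference concrete, here is a small comparison of the two patterns. The helper name is hypothetical, and the check is deliberately as loose as described: any line ending in an uppercase letter matches, which may also catch lines that are not items.

```python
import re

OLD_VAT_SUFFIX = re.compile(r"B$")      # previous check: only VAT type "B"
NEW_VAT_SUFFIX = re.compile(r"[A-Z]$")  # new check: any uppercase letter


def ends_with_vat_letter(line: str) -> bool:
    """Return True if the line ends in a single uppercase letter."""
    return NEW_VAT_SUFFIX.search(line) is not None
```

With the sample above, the price line ending in `B` matches, while the `N ...` and `PT: ...` lines do not, because they end in whitespace.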

@vilmosnagy

vilmosnagy commented Oct 1, 2024

FYI: the Hungarian Lidl Plus API broke as well a couple of days ago (some days after 09.21), but this PR solves the issue for me.

Thanks @Kuba314

@salvadorbs

So no barcode, no match with openfoodfacts?

@diogotcorreia

@salvadorbs unfortunately yeah, there's no way to get the barcode now :/

@Fanis10V Fanis10V left a comment

I was able to get receipts that didn't have discounts. I just started using this so can't tell for sure if everything else is working fine. Will continue testing. So far everything else looks good! Thanks!

}
)
elif node.attrib["class"] == "discount":
    discount = abs(parse_float(node.text.split()[-1]))

This throws an IndexError when I run it, because some of the span elements contain only whitespace, so node.text.split() returns an empty list.

Here's the HTML on my receipt:

<span id="purchase_list_line_3" class="discount css_bold" data-promotion-id="promo_id">   Coupon Plus reward</span>
<span id="purchase_list_line_3" class="discount" data-promotion-id="promo_id">      </span>
<span id="purchase_list_line_3" class="discount" data-promotion-id="promo_id">     </span>
<span id="purchase_list_line_3" class="discount css_bold" data-promotion-id="promo_id">-0.69</span>
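
For reference, the failure comes from indexing the result of `str.split()` on a whitespace-only string, which is an empty list. A guarded variant could look like the sketch below; `last_token` is a hypothetical helper name, not something from the PR.

```python
from typing import Optional


def last_token(text: Optional[str]) -> Optional[str]:
    """Return the last whitespace-separated token, or None if there is none."""
    # "      ".split() is [], so [-1] on it would raise IndexError.
    tokens = (text or "").split()
    return tokens[-1] if tokens else None
```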

Author

Man, they just can't be consistent... Thank you for providing another data point with which we can figure out all the formats they use for this! I'll try to implement the format you provided once I have time to actually do this though... You can always suggest changes and I'll be happy to use them of course.

For now I'm thinking a regex searching for something like -\d+[\.,]\d{2}$ would be best.
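
A quick sketch of how that regex could be applied. The function name is hypothetical, and stripping surrounding whitespace before searching is an extra assumption so that trailing padding does not defeat the `$` anchor.

```python
import re
from typing import Optional

# Pattern from the comment above: a negative amount with two decimals
# at the end of the text, using either "." or "," as the separator.
DISCOUNT_RE = re.compile(r"-\d+[\.,]\d{2}$")


def find_discount(text: str) -> Optional[float]:
    """Return the discount as a positive float, or None if none is found."""
    match = DISCOUNT_RE.search(text.strip())
    if match is None:
        return None
    return abs(float(match.group().replace(",", ".")))
```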

Btw, shouldn't the code currently fail when parsing the word "reward" on the first line as a float, rather than with the IndexError from the whitespace split that you're describing?

No rush to implement this. I would have done it myself but I was hesitant to suggest changes cause I'm still trying to understand what's happening. I will for sure once I get more familiar with the project. :)

I think because the class of the first line is discount css_bold instead of just discount it's not parsed at all.

So, does the HTML differ from country to country? Or does it depend on the coupon you use and whether it's a percentage/flat discount? That's the only receipt with a coupon I have so that's my only data point :/

Author

I was hesitant to suggest changes cause I'm still trying to understand what's happening. I will for sure once I get more familiar with the project.

No worries :) This is not even that tied to this specific project, but more to the actual lidl API since AFAIK there's no public documentation for it and people just somehow reverse engineered it.

I think because the class of the first line is discount css_bold instead of just discount it's not parsed at all.

Right, of course, missed that.

So, does the HTML differ from country to country? Or does it depend on the coupon you use and whether it's a percentage/flat discount?

There's definitely some difference for some reason. See diogotcorreia/lidl-to-grocy's lidl/test/receipt.html, which I think uses the same format as the one I saw and implemented in this PR. It's weird that your receipt is different, but we'll probably have to implement common parsing for all possible formats. I'm currently blocked on #23, so I can't verify whether anything changed recently in my receipts, but in my lidl-plus Android app I don't see any discounts in bold, as you probably would.


We could probably do something like this to support both formats:

if ...:
    ...
elif {"discount", "css_bold"}.issubset(node.attrib["class"].split()) and try_parse_float(node.text):
    ...
elif node.attrib["class"] == "discount":
    ...
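
Fleshed out a little, the branching above could be tried against the sample spans like this. Everything here is a sketch under assumptions: `try_parse_float` and `collect_discounts` are hypothetical helpers, the fragment is assumed to be well-formed enough for `xml.etree.ElementTree` (real receipts may not be), and the plain `discount` branch keeps the last-token approach with a guard for whitespace-only text.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def try_parse_float(text: Optional[str]) -> Optional[float]:
    """Parse a float with '.' or ',' as separator; return None on failure."""
    try:
        return float((text or "").strip().replace(",", "."))
    except ValueError:
        return None


def collect_discounts(html_fragment: str) -> List[float]:
    """Collect discount amounts from both span formats seen so far."""
    # Wrap in a root element so several sibling spans parse as one tree.
    root = ET.fromstring(f"<root>{html_fragment}</root>")
    discounts: List[float] = []
    for node in root.iter("span"):
        classes = set(node.attrib.get("class", "").split())
        value = try_parse_float(node.text)
        if {"discount", "css_bold"}.issubset(classes) and value is not None:
            # Format with bold spans: the amount is the whole text.
            discounts.append(abs(value))
        elif classes == {"discount"}:
            # Format with a plain span: the amount is the last token, if any.
            tokens = (node.text or "").split()
            last = try_parse_float(tokens[-1]) if tokens else None
            if last is not None:
                discounts.append(abs(last))
    return discounts
```

Comparing against `None` rather than relying on truthiness avoids silently dropping a parsed value of `0.0`.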

Development

Successfully merging this pull request may close these issues.

HTTP 500 on ticket API
5 participants