W3C SPARQL 1.0 i18n normalization-02 test case fails #203

Open
donpellegrino opened this issue Nov 13, 2023 · 3 comments

See https://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0096 for details on the test case. Note the emphasis on the "." and ".." path segments in the URLs. Test case definition:

:normalization-2 rdf:type mf:QueryEvaluationTest ;
    mf:name    "normalization-02" ;
    dawgt:approval dawgt:Approved ;
    dawgt:approvedBy <http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JulSep/att-0047/31-dawg-minutes> ;
    rdfs:comment
        "Example 1 from http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0096" ;
    mf:action
        [ qt:data   <normalization-02.ttl> ;
          qt:query  <normalization-02.rq> ] ;
    mf:result  <normalization-02-results.ttl>
    .

Defined in https://github.com/w3c/rdf-tests/blob/main/sparql/sparql10/i18n/manifest.ttl

Using hdt-c++:

user@dunx4:~/projects/oxigraph/oxhdt-sys/tests/resources/rdf-tests/sparql/sparql10/i18n$ rdf2hdt /home/user/projects/oxigraph/testsuite/rdf-tests/sparql/sparql10/i18n/normalization-02.ttl normalization-02.hdt
user@dunx4:~/projects/oxigraph/oxhdt-sys/tests/resources/rdf-tests/sparql/sparql10/i18n$ hdtSearch normalization-02.hdt
Predicate Bitmap in 59 us
Count predicates in 8 us
Count Objects in 4 us Max was: 1
Bitmap in 14 us
Bitmap bits: 2 Ones: 2
Object references in 50 us
Sort lists in 8 us
Index generated in 196 us
>> ? ? ?
http://example/vocab#s1 http://example/vocab#p example://a/b/c/%7Bfoo%7D#xyz
http://example/vocab#s2 http://example/vocab#p eXAMPLE://a/./b/../b/%63/%7bfoo%7d#xyz
2 results in 72 us

Using hdt-java:

user@dunx4:~/projects/oxigraph/oxhdt-sys/tests/resources/rdf-tests/sparql/sparql10/i18n$ rdf2hdt.sh /home/user/projects/oxigraph/testsuite/rdf-tests/sparql/sparql10/i18n/normalization-02.ttl normalization-02.hdt
[WARN] base uri not specified, using 'file:///home/user/projects/oxigraph/testsuite/rdf-tests/sparql/sparql10/i18n/normalization-02.ttl'
[INFO] Converting /home/user/projects/oxigraph/testsuite/rdf-tests/sparql/sparql10/i18n/normalization-02.ttl to normalization-02.hdt as TURTLE
[line: 7, col: 8 ] Not advised IRI: <eXAMPLE://a/b/%63/%7bfoo%7d#xyz> Code: 11/LOWERCASE_PREFERRED in SCHEME: lowercase is preferred in this component
File converted in ..... 517 ms 227 us
Total Triples ......... 2
Different subjects .... 2
Different predicates .. 1
Different objects ..... 2
Common Subject/Object . 0
HDT saved to file in .. 3 ms 2 us
user@dunx4:~/projects/oxigraph/oxhdt-sys/tests/resources/rdf-tests/sparql/sparql10/i18n$ hdtSearch.sh normalization-02.hdt
Count Objects in 25 us Max was: 1
Bitmap in 106 us
Object references in 11 ms 581 us
Sort object sublists in 17 us
Count predicates in 22 us
Index generated in 14 ms 129 us
[main] . [          ] 0.00  Creating Predicate bitmap 0 / 2
[main] . [          ] 0.00  Generating predicate references
Count predicates in 216 us
Index generated and saved in 62 ms 416 us
>> ? ? ?
Query: |?| |?| |?|
http://example/vocab#s1 http://example/vocab#p example://a/b/c/%7Bfoo%7D#xyz
http://example/vocab#s2 http://example/vocab#p eXAMPLE://a/b/%63/%7bfoo%7d#xyz
Iterated 2 triples in 10 ms 428 us
>>
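
The two dumps above differ only in the object of `s2`: hdt-c++ keeps `eXAMPLE://a/./b/../b/%63/%7bfoo%7d#xyz`, while hdt-java stores `eXAMPLE://a/b/%63/%7bfoo%7d#xyz`. As a minimal sketch (JDK only, not how hdt-java or Jena implement it), `java.net.URI.normalize()` can be used to check that removing the "." and ".." segments accounts for the whole difference, with the scheme case and percent-escapes left untouched. The class and method names here are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DotSegmentCheck {

    // Removes "." and ".." path segments with the JDK's URI.normalize().
    // This is only a stand-in to illustrate the transformation; it is not
    // the code path used by hdt-java or by Jena's RIOT.
    static String removeDotSegments(String iri) throws URISyntaxException {
        return new URI(iri).normalize().toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Object of :s2 as written in normalization-02.ttl (and as kept by hdt-c++).
        String fromTurtle = "eXAMPLE://a/./b/../b/%63/%7bfoo%7d#xyz";
        // Object of :s2 as reported by hdtSearch.sh (hdt-java).
        String fromHdtJava = "eXAMPLE://a/b/%63/%7bfoo%7d#xyz";

        String normalized = removeDotSegments(fromTurtle);
        System.out.println(normalized);
        // True if dot-segment removal alone explains the hdt-java value.
        System.out.println(normalized.equals(fromHdtJava));
    }
}
```

If the second line printed is `true`, the stored value in hdt-java matches the source IRI after path normalization only, which narrows the search to whichever component removes dot segments during parsing.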

Note that this test case is referenced multiple times in the hdt-java codebase (hdt-jena/testing/DAWG-Final/i18n and hdt-jena/testing/DAWG). However, it was unclear to me where these tests are being run to check their status.

Test system

hdt-c++

rdf2hdt -V
v1.1.2

hdt-java: hdt-java-package-3.0.10

java -version
openjdk version "11.0.20.1" 2023-08-24
OpenJDK Runtime Environment (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

ate47 commented Nov 13, 2023

I think it's due to how RIOT (Jena's parser) handles the Turtle input:

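    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.jena.graph.Triple;
    import org.apache.jena.riot.Lang;
    import org.apache.jena.riot.RDFParser;
    import org.apache.jena.riot.system.StreamRDF;
    import org.apache.jena.sparql.core.Quad;
    import org.junit.Test;
    import org.rdfhdt.hdt.exceptions.ParserException;
    import org.rdfhdt.hdt.util.string.ByteStringUtil;

    // Wrapper class added so the snippet compiles; the name is arbitrary.
    public class I18nNormalizationTest {

        @Test
        public void i18nTest() throws IOException, ParserException {
            // https://github.com/rdfhdt/hdt-java/issues/203

            // Same two triples as normalization-02.ttl
            String data = "@prefix : <http://example/vocab#>.\n" +
                    "\n" +
                    "  :s1 :p <example://a/b/c/%7Bfoo%7D#xyz>.\n" +
                    "  :s2 :p <eXAMPLE://a/./b/../b/%63/%7bfoo%7d#xyz>.\n";

            try (InputStream is = new ByteArrayInputStream(data.getBytes(ByteStringUtil.STRING_ENCODING))) {

                // Parse with RIOT directly, bypassing hdt-java's own code
                RDFParser build = RDFParser.source(is).lang(Lang.TURTLE).build();

                build.parse(new StreamRDF() {

                    // Print each triple exactly as RIOT delivers it
                    @Override
                    public void triple(Triple triple) {
                        System.out.println(triple);
                    }
                    @Override
                    public void start() {}
                    @Override
                    public void quad(Quad quad) {}
                    @Override
                    public void base(String s) {}
                    @Override
                    public void prefix(String s, String s1) { }
                    @Override
                    public void finish() {}
                });

            }
        }
    }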

It prints:

http://example/vocab#s1 @http://example/vocab#p example://a/b/c/%7Bfoo%7D#xyz
http://example/vocab#s2 @http://example/vocab#p eXAMPLE://a/b/%63/%7bfoo%7d#xyz

The parser itself is configured in the org.rdfhdt.hdt.rdf.parsers.RDFParserRIOT#parse() method if you want to take a look.

@donpellegrino

Should this issue be submitted upstream against Jena's RIOT instead of here in hdt-java?


ate47 commented Nov 14, 2023

I don’t know; it might be caused by a missing configuration on our side. It would be better to check that first.
