Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix the data pull problem when use alluxio as fs. #23815

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

awkwardd
Copy link

Description

When use alluxio as hive fs,cant get data from alluxio.
I finaly found that the AlluxioIterator.java didn't generate correct file location as alluxio://192...../dir/filename but alluxio:///dir/filename, so I modify the code.It works for me.Fix the issue#23778

Copy link

cla-bot bot commented Oct 17, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: tanyajun.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@raunaqmorarka
Copy link
Member

Could you add a test for this problem ?

Comment on lines 56 to 58
String filePath = fileStatus.getPath();
String fileName = filePath.substring(filePath.lastIndexOf('/') + 1);
Location location = pathLocation.appendSuffix("/" + fileName);
Copy link
Contributor

@JiamingMai JiamingMai Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change may lead to failure of some other cases and some tests are not able to pass. But the bug does exist. Still need to find a way to fix it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be a more reasonable solution?

Copy link
Contributor

@JiamingMai JiamingMai Oct 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following tests failed to pass with this change, maybe take a look on these tests? https://github.com/trinodb/trino/actions/runs/11380756207/job/31660701721?pr=23815

Error:  Tests run: 24, Failures: 3, Errors: 0, Skipped: 4, Time elapsed: 62.74 s <<< FAILURE! -- in io.trino.filesystem.alluxio.TestAlluxioFileSystem
2024-10-17T02:10:31.425-0600	INFO	main	io.trino.testing.containers.junit.ReportLeakedContainers	ReportLeakedContainers disabled
Error:  io.trino.filesystem.alluxio.TestAlluxioFileSystem.testPaths -- Time elapsed: 0.051 s <<< FAILURE!
org.opentest4j.AssertionFailedError: 

expected: 35L
 but was: 34L
	at io.trino.filesystem.AbstractTestTrinoFileSystem.listPath(AbstractTestTrinoFileSystem.java:1370)
	at io.trino.filesystem.AbstractTestTrinoFileSystem.testPathHierarchical(AbstractTestTrinoFileSystem.java:763)
	at io.trino.filesystem.AbstractTestTrinoFileSystem.testPaths(AbstractTestTrinoFileSystem.java:739)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:507)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1458)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:2034)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:189)

Error:  io.trino.filesystem.alluxio.TestAlluxioFileSystem.testListFiles -- Time elapsed: 0.171 s <<< FAILURE!
org.opentest4j.AssertionFailedError: 

expected: 46L
 but was: 59L
	at io.trino.filesystem.AbstractTestTrinoFileSystem.listPath(AbstractTestTrinoFileSystem.java:1370)
	at io.trino.filesystem.AbstractTestTrinoFileSystem.testListFiles(AbstractTestTrinoFileSystem.java:1070)
	at io.trino.filesystem.AbstractTestTrinoFileSystem.testListFiles(AbstractTestTrinoFileSystem.java:1001)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:507)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1458)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:2034)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:189)

Error:  io.trino.filesystem.alluxio.TestAlluxioFileSystem.testListLexicographicalOrder -- Time elapsed: 1.436 s <<< FAILURE!
org.opentest4j.AssertionFailedError: 

expected: 44L
 but was: 43L
	at io.trino.filesystem.AbstractTestTrinoFileSystem.listPath(AbstractTestTrinoFileSystem.java:1370)
	at io.trino.filesystem.AbstractTestTrinoFileSystem.testListLexicographicalOrder(AbstractTestTrinoFileSystem.java:1207)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:507)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1458)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:2034)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:189)

[INFO] 
[INFO] Results:
[INFO] 
Error:  Failures: 
Error:    TestAlluxioFileSystem>AbstractTestTrinoFileSystem.testListFiles:1001->AbstractTestTrinoFileSystem.testListFiles:1070->AbstractTestTrinoFileSystem.listPath:1370 
expected: 46L
 but was: 59L
Error:    TestAlluxioFileSystem>AbstractTestTrinoFileSystem.testListLexicographicalOrder:1207->AbstractTestTrinoFileSystem.listPath:1370 
expected: 44L
 but was: 43L
Error:    TestAlluxioFileSystem>AbstractTestTrinoFileSystem.testPaths:739->AbstractTestTrinoFileSystem.testPathHierarchical:763->AbstractTestTrinoFileSystem.listPath:1370 
expected: 35L
 but was: 34L
[INFO] 
Error:  Tests run: 27, Failures: 3, Errors: 0, Skipped: 4

Copy link
Contributor

@JiamingMai JiamingMai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@awkwardd Add the following extractSchema method to AlluxioUtils, and update the next() method in AlluxioFileIterator can pass the test successfully. Please help update the code in this PR.

    public static Location extractSchema(Location location)
    {
        String path = location.toString();
        if (path.startsWith("alluxio://")) {
            int index = path.indexOf("/", "alluxio://".length());
            if (index != -1) {
                String schema = path.substring(0, index);
                return Location.of(schema + "/");
            }
            else {
                return location;
            }
        }
        return location;
    }
   public FileEntry next()
            throws IOException
    {
        if (!hasNext()) {
            return null;
        }
        URIStatus fileStatus = files.next();
        String path = fileStatus.getPath();
        if (path.startsWith("/")) {
            path = path.substring(1);
        }
        Location schema = extractSchema(dirLocation);
        Location fileLocation = schema.appendSuffix(path);
        return new FileEntry(
                fileLocation,
                fileStatus.getLength(),
                Instant.ofEpochMilli(fileStatus.getLastModificationTimeMs()),
                Optional.empty());
    }

Copy link

cla-bot bot commented Oct 31, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: tanyajun.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@JiamingMai
Copy link
Contributor

@awkwardd you may also need to sign the CLA https://trino.io/development/process#contribution-process

@awkwardd
Copy link
Author

awkwardd commented Nov 1, 2024

@awkwardd you may also need to sign the CLA https://trino.io/development/process#contribution-process

I have signed the cla file and snet it last night.

@ebyhr
Copy link
Member

ebyhr commented Nov 3, 2024

@cla-bot check

Copy link

cla-bot bot commented Nov 3, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: tanyajun.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

Copy link

cla-bot bot commented Nov 3, 2024

The cla-bot has been summoned, and re-checked this pull request!

@ebyhr
Copy link
Member

ebyhr commented Nov 3, 2024

@awkwardd Please take a look at #23815 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cla-signed
Development

Successfully merging this pull request may close these issues.

4 participants