Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fails whenArrowFile is used #1711

Open
zsaladin opened this issue Oct 10, 2024 · 5 comments
Open

Fails whenArrowFile is used #1711

zsaladin opened this issue Oct 10, 2024 · 5 comments

Comments

@zsaladin
Copy link

  1. Variable size is set
  2. But the size is bound method when ArrowFile is used
  3. Variable total is from the size

iterative/dvc#10583

@martindurant
Copy link
Member

Would you mind expanding the description a bit here? Are we talking about something like this:

--- a/fsspec/implementations/arrow.py
+++ b/fsspec/implementations/arrow.py
@@ -220,7 +220,6 @@ class ArrowFSWrapper(AbstractFileSystem):
         "readable",
         "writable",
         "close",
-        "size",
         "seekable",
     ],
 )
@@ -235,6 +234,10 @@ class ArrowFile(io.IOBase):
         self.blocksize = self.block_size = block_size
         self.kwargs = kwargs

+    @property
+    def size(self):
+        return self.stream.size
+
     def __enter__(self):

A test case showing the failure in pure fsspec code would be very helpful.

@zsaladin
Copy link
Author

@martindurant

Right.

"size" shouldn't be method. I attached test case to show the error.

tqdm callback tries to add the method and a float literal.
the method is <bound method NativeFile.size of <pyarrow.NativeFile closed=False own_file=False is_seekable=False is_writable=False is_readable=True>>

import fsspec
import pyarrow
import pyarrow.fs as pafs
from fsspec.callbacks import TqdmCallback

hdfs_host = "kt"

fs = fsspec.filesystem("hdfs", host=hdfs_host)

from_path = "/path/to/file"
to_path = "./tmp/test.tmp"

fs.get_file(from_path, to_path, callback=TqdmCallback())
Traceback (most recent call last):
  File "./home/user/repo/hello.py", line 13, in <module>
    fs.get_file(from_path, to_path, callback=TqdmCallback())
  File "./home/user/repo/.venv/lib/python3.12/site-packages/fsspec/implementations/arrow.py", line 210, in get_file
    super().get_file(rpath, lpath, **kwargs)
  File "./home/user/repo/.venv/lib/python3.12/site-packages/fsspec/spec.py", line 904, in get_file
    callback.set_size(getattr(f1, "size", None))
  File "./home/user/repo/.venv/lib/python3.12/site-packages/fsspec/callbacks.py", line 97, in set_size
    self.call()
  File "./home/user/repo/.venv/lib/python3.12/site-packages/fsspec/callbacks.py", line 311, in call
    self.tqdm = self._tqdm_cls(total=self.size, **self._tqdm_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1098, in __init__
    self.refresh(lock_args=self.lock_args)
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1347, in refresh
    self.display()
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1495, in display
    self.sp(self.__str__() if msg is None else msg)
            ^^^^^^^^^^^^^^
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1151, in __str__
    return self.format_meter(**self.format_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 534, in format_meter
    if total and n >= (total + 0.5):  # allow float imprecision (#849)
                       ~~~~~~^~~~~
TypeError: unsupported operand type(s) for +: 'method' and 'float'
Exception ignored in: <function tqdm.__del__ at 0x7fffd8707380>
Traceback (most recent call last):
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1148, in __del__
    self.close()
  File "./home/user/repo/.venv/lib/python3.12/site-packages/tqdm/std.py", line 1277, in close
    if self.last_print_t < self.start_t + self.delay:
       ^^^^^^^^^^^^^^^^^
AttributeError: 'tqdm' object has no attribute 'last_print_t'

@martindurant
Copy link
Member

If my diff fixes the problem, would you like to put it in a PR with a test to show it does the right thing?

@zsaladin
Copy link
Author

@martindurant

Sure

Could you provide instructions for writing tests that includes test code location, test code file name etc?

@martindurant
Copy link
Member

fsspec.implementations.tests.test_arrow is the place you would do this. You can copy from any of the tests that include open().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants