AttributeError: module 'pyarrow.lib' has no attribute 'ListViewType' #6985

Closed
firmai opened this issue Jun 19, 2024 · 11 comments

Comments

@firmai

firmai commented Jun 19, 2024

Describe the bug

I have been struggling with this for two days; any help would be appreciated. I am on Python 3.10.

from setfit import SetFitModel
from huggingface_hub import login

access_token_read = "cccxxxccc"

# Authenticate with the Hugging Face Hub
login(token=access_token_read)

# Load the models from the Hugging Face Hub
trainer_relv = SetFitModel.from_pretrained("snowdere/trainer_relevance")
trainer_trust = SetFitModel.from_pretrained("snowdere/trainer_trust")
trainer_sent = SetFitModel.from_pretrained("snowdere/trainer_sent")
trainer_topic = SetFitModel.from_pretrained("snowdere/trainer_topic")


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 from setfit import SetFitModel
      2 from huggingface_hub import login
      4 access_token_read = "ccsddsds"

File /opt/conda/lib/python3.10/site-packages/setfit/__init__.py:7
      4 import os
      5 import warnings
----> 7 from .data import get_templated_dataset, sample_dataset
      8 from .model_card import SetFitModelCardData
      9 from .modeling import SetFitHead, SetFitModel

File /opt/conda/lib/python3.10/site-packages/setfit/data.py:5
      3 import pandas as pd
      4 import torch
----> 5 from datasets import Dataset, DatasetDict, load_dataset
      6 from torch.utils.data import Dataset as TorchDataset
      8 from . import logging

File /opt/conda/lib/python3.10/site-packages/datasets/__init__.py:18
      1 # ruff: noqa
      2 # Copyright 2020 The HuggingFace Datasets Authors and the TensorFlow Datasets Authors.
      3 #
   (...)
     13 # See the License for the specific language governing permissions and
     14 # limitations under the License.
     16 __version__ = "2.19.0"
---> 18 from .arrow_dataset import Dataset
     19 from .arrow_reader import ReadInstruction
     20 from .builder import ArrowBasedBuilder, BeamBasedBuilder, BuilderConfig, DatasetBuilder, GeneratorBasedBuilder

File /opt/conda/lib/python3.10/site-packages/datasets/arrow_dataset.py:76
     73 from tqdm.contrib.concurrent import thread_map
     75 from . import config
---> 76 from .arrow_reader import ArrowReader
     77 from .arrow_writer import ArrowWriter, OptimizedTypedSequence
     78 from .data_files import sanitize_patterns

File /opt/conda/lib/python3.10/site-packages/datasets/arrow_reader.py:29
     26 from typing import TYPE_CHECKING, List, Optional, Union
     28 import pyarrow as pa
---> 29 import pyarrow.parquet as pq
     30 from tqdm.contrib.concurrent import thread_map
     32 from .download.download_config import DownloadConfig

File /opt/conda/lib/python3.10/site-packages/pyarrow/parquet/__init__.py:20
      1 # Licensed to the Apache Software Foundation (ASF) under one
      2 # or more contributor license agreements.  See the NOTICE file
      3 # distributed with this work for additional information
   (...)
     17 
     18 # flake8: noqa
---> 20 from .core import *

File /opt/conda/lib/python3.10/site-packages/pyarrow/parquet/core.py:33
     30 import pyarrow as pa
     32 try:
---> 33     import pyarrow._parquet as _parquet
     34 except ImportError as exc:
     35     raise ImportError(
     36         "The pyarrow installation is not built with support "
     37         f"for the Parquet file format ({str(exc)})"
     38     ) from None

File /opt/conda/lib/python3.10/site-packages/pyarrow/_parquet.pyx:1, in init pyarrow._parquet()

AttributeError: module 'pyarrow.lib' has no attribute 'ListViewType'

setfit: 1.0.3
transformers: 4.41.2
lingua-language-detector: 2.0.2
polars: 0.20.31
lightning: None
google-cloud-bigquery: 3.24.0
shapely: 2.0.4
pyarrow: 16.0.0

Steps to reproduce the bug

I have tried all version combinations of datasets and pyarrow; they all produce the same error, which started a few days ago. It happens across multiple scripts of mine.

Expected behavior

It should just run normally.

Environment info

Python 3.10

@albertvillanova
Member

albertvillanova commented Jun 19, 2024

Please note that the error is raised just at import:

import pyarrow.parquet as pq

Therefore it must be caused by some problem with your pyarrow installation. I would recommend you uninstall and install pyarrow again.
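
Before reinstalling, it may help to see which pyarrow copy the interpreter actually picks up. The following is only a rough sketch: `pyarrow_report` is a hypothetical helper name, and the `ListViewType` check is based solely on the attribute named in the traceback above.

```python
import importlib


def pyarrow_report():
    """Collect basic facts about the pyarrow copy this interpreter imports."""
    try:
        pa = importlib.import_module("pyarrow")
    except Exception as exc:  # a broken install may raise more than ImportError
        return {"installed": False, "error": str(exc)}
    lib = getattr(pa, "lib", None)
    return {
        "installed": True,
        "version": getattr(pa, "__version__", None),
        "location": getattr(pa, "__file__", None),
        # The traceback shows pyarrow._parquet expecting this attribute on pyarrow.lib.
        "lib_has_ListViewType": hasattr(lib, "ListViewType"),
    }


for key, value in pyarrow_report().items():
    print(f"{key}: {value}")
```

If the reported version or location is not what you expect, or `lib_has_ListViewType` is False, that points to the installation rather than to the code that imports it.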

I also see that it seems you use conda to install pyarrow. Please note that pyarrow offers 3 different packages in conda-forge: https://arrow.apache.org/docs/python/install.html#using-conda

conda install -c conda-forge pyarrow

While the pyarrow conda-forge package is the right choice for most users, both a minimal and maximal variant of the package exist, either of which may be better for your use case. See Differences between conda-forge packages.

Please make sure you install the right one: I guess it is either pyarrow or pyarrow-all.

@NicoNicoNico123

I have the same issue. Please downgrade to pyarrow==15.0.2; it seems the datasets library needs to be fixed.

@albertvillanova
Member

It is not a problem with the datasets library: we support the latest version of pyarrow, and our Continuous Integration tests are using pyarrow 16.1.0 without any problem.

The error reported here is raised when importing pyarrow.parquet:

---> 29 import pyarrow.parquet as pq
File /opt/conda/lib/python3.10/site-packages/pyarrow/parquet/__init__.py:20
      1 # Licensed to the Apache Software Foundation (ASF) under one
      2 # or more contributor license agreements.  See the NOTICE file
      3 # distributed with this work for additional information
   (...)
     17 
     18 # flake8: noqa
---> 20 from .core import *

File /opt/conda/lib/python3.10/site-packages/pyarrow/parquet/core.py:33
     30 import pyarrow as pa
     32 try:
---> 33     import pyarrow._parquet as _parquet
     34 except ImportError as exc:
     35     raise ImportError(
     36         "The pyarrow installation is not built with support "
     37         f"for the Parquet file format ({str(exc)})"
     38     ) from None

File /opt/conda/lib/python3.10/site-packages/pyarrow/_parquet.pyx:1, in init pyarrow._parquet()

AttributeError: module 'pyarrow.lib' has no attribute 'ListViewType'

This can only be explained if pyarrow was not properly installed.

If the user just installed pyarrow-core from conda-forge, then its parquet subpackage is not installed and cannot be imported. You can check pyarrow docs:

The pyarrow-core package includes the following functionality:
...
The pyarrow package adds the following:
...
Parquet (i.e., pyarrow.parquet)
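
One way to tell a missing Parquet subpackage apart from one that is present but fails to initialize is to look the modules up without running them. This is a minimal sketch; `module_present` is a hypothetical helper, and I have not verified exactly which files pyarrow-core ships, so both the Python subpackage and the compiled extension are checked.

```python
import importlib.util


def module_present(name):
    """True if `name` can be located; the module itself is not executed,
    although its parent package is imported during the lookup."""
    try:
        return importlib.util.find_spec(name) is not None
    except Exception:
        # Raised when the parent package is missing or fails to import.
        return False


for name in ("pyarrow", "pyarrow.parquet", "pyarrow._parquet"):
    print(name, "->", "found" if module_present(name) else "missing")
```

If all three are found and the import still fails, the files are there and the problem is more likely a mismatch between them than a missing subpackage.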

@RenaLu

RenaLu commented Jul 24, 2024

I'm still seeing the same issue on datasets version 2.20.0. I installed pyarrow version 17.0.0 with pip install. Downgrading to pyarrow==15.0.2 also did not resolve the issue.

@chenmoneygithub

@RenaLu As of UTC time 07/27/2024 23:20:00, I hit the same issue and reinstalling pyarrow==15.0.2 resolved it for me. You may want to check whether your pyarrow was successfully downgraded.
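
A quick way to check whether a downgrade really took effect is to list every installed copy the interpreter can see, using only the standard library. This is a sketch under the assumption that a leftover second copy is a possible cause; `installed_copies` is a hypothetical helper name.

```python
import importlib.metadata as md


def installed_copies(name):
    """Return (version, location) for every installed distribution called `name`."""
    wanted = name.lower().replace("_", "-")
    copies = []
    for dist in md.distributions():
        dist_name = dist.metadata.get("Name")
        if dist_name and dist_name.lower().replace("_", "-") == wanted:
            copies.append((dist.version, str(dist.locate_file(""))))
    return copies


for version, location in installed_copies("pyarrow"):
    print(f"pyarrow {version} at {location}")
```

More than one line of output, or a version other than the one you just installed, would suggest the downgrade did not replace the copy that is being imported.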

@Dev-iL

Dev-iL commented Jul 29, 2024

I can confirm @albertvillanova's analysis & suggestion - pip uninstall pyarrow followed by pip install pyarrow solved it for me.

I suspect this is because pyarrow was initially installed as a pandas extra pandas[...,parquet,...], then pip-upgrading pyarrow resulted in the issue.

@RenaLu did you uninstall pyarrow between changing versions?

@eminemence

eminemence commented Aug 1, 2024

After trying all the above combinations and failing, running the following in the notebook fixed the error!!
!conda install -c conda-forge -y datasets pyarrow libparquet
Note: uninstall any existing datasets and pyarrow installations in the env before executing the above.

@xloem

xloem commented Aug 19, 2024

If on Colab, remember to restart the runtime so the new pyarrow is imported. I also upgraded pip, which is recommended in pyarrow's installation instructions.
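
To see whether a restart is needed, you can compare the version already loaded in the running session with the version currently installed. This is a rough sketch: `stale_import` is a hypothetical helper, and it assumes the import name and the distribution name are the same, which holds for pyarrow.

```python
import sys
import importlib.metadata as md


def stale_import(name):
    """Return (loaded_version, installed_version) if they differ, else None."""
    module = sys.modules.get(name)
    if module is None:
        return None  # not imported in this session, so nothing can be stale
    loaded = getattr(module, "__version__", None)
    try:
        installed = md.version(name)
    except md.PackageNotFoundError:
        installed = None
    if loaded is not None and installed is not None and loaded != installed:
        return (loaded, installed)
    return None


mismatch = stale_import("pyarrow")
if mismatch:
    print(f"loaded {mismatch[0]} but installed {mismatch[1]}: restart the runtime")
else:
    print("no stale pyarrow detected in this session")
```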

@neurafusionai

I fixed it by doing this: !pip install --upgrade datasets

!pip show pyarrow
!pip show datasets
!pip uninstall -y pyarrow
!pip install pyarrow --no-cache-dir
!pip install pyarrow
!pip install transformers
!pip install --upgrade datasets
!pip install datasets
!pip install pyarrow
!pip install pyarrow.parquet
!pip install transformers

# Import necessary libraries

from datasets import load_dataset
import pyarrow.parquet as pq
import pyarrow.lib as lib
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments

@neurafusionai

But now I can't run the test, so I removed it. The notebook will still run, but it will show you this:
ERROR: Could not find a version that satisfies the requirement pyarrow.parquet (from versions: none)
ERROR: No matching distribution found for pyarrow.parquet

@Sherry-zsy

I have the same problem right now with Python 3.12 and transformers 4.44.2, and I have not fixed it yet.
