FR: support units other than nanoseconds #63

Open
tswast opened this issue Jan 27, 2022 · 1 comment
Labels: api: bigquery, type: feature request

tswast (Collaborator) commented Jan 27, 2022

Currently, a DateArray can be constructed with minimal overhead from datetime64[ns] input, but construction fails with a TypeError for any other datetime64 dtype. I suspect TimeArray has the same problem.

    @pytest.fixture
    def data():
>       return DateArray(
            numpy.arange(
                datetime.datetime(1900, 1, 1),
                datetime.datetime(2099, 12, 31),
                datetime.timedelta(days=13),
                # dtype="datetime64[ns]"
            )
        )

tests/unit/test_date_compliance.py:35: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
db_dtypes/core.py:49: in __init__
    values = self.__ndarray(values)
db_dtypes/core.py:57: in __ndarray
    return numpy.array([cls._datetime(scalar) for scalar in scalars], "M8[ns]",)
db_dtypes/core.py:57: in <listcomp>
    return numpy.array([cls._datetime(scalar) for scalar in scalars], "M8[ns]",)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

scalar = numpy.datetime64('1900-01-01T00:00:00.000000'), match_fn = <built-in method match of re.Pattern object at 0x1219b4d40>

    @staticmethod
    def _datetime(
        scalar,
        match_fn=re.compile(r"\s*(?P<year>\d+)-(?P<month>\d+)-(?P<day>\d+)\s*$").match,
    ) -> Optional[numpy.datetime64]:
        # Convert pyarrow values to datetime.date.
        if isinstance(scalar, (pyarrow.Date32Scalar, pyarrow.Date64Scalar)):
            scalar = scalar.as_py()
    
        if pandas.isna(scalar):
            return None
        elif isinstance(scalar, datetime.date):
            return pandas.Timestamp(
                year=scalar.year, month=scalar.month, day=scalar.day
            ).to_datetime64()
        elif isinstance(scalar, str):
            match = match_fn(scalar)
            if not match:
                raise ValueError(f"Bad date string: {repr(scalar)}")
            year = int(match.group("year"))
            month = int(match.group("month"))
            day = int(match.group("day"))
            return pandas.Timestamp(year=year, month=month, day=day).to_datetime64()
        else:
>           raise TypeError("Invalid value type", scalar)
E           TypeError: ('Invalid value type', numpy.datetime64('1900-01-01T00:00:00.000000'))
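
Until other units are supported, one possible caller-side workaround is to cast the input to datetime64[ns] before constructing the array. The sketch below uses only NumPy; `to_nanoseconds` is a hypothetical helper, not part of db_dtypes, and it assumes every value fits in the nanosecond range (roughly years 1678 to 2262).

```python
import numpy

# Day-resolution input: one example of a non-nanosecond datetime64 dtype.
days = numpy.arange(
    numpy.datetime64("1900-01-01"),
    numpy.datetime64("1900-03-01"),
    numpy.timedelta64(13, "D"),
)


def to_nanoseconds(values):
    """Hypothetical helper: cast any datetime64 array to datetime64[ns]."""
    values = numpy.asarray(values)
    if values.dtype.kind != "M":
        raise TypeError(f"expected a datetime64 array, got {values.dtype}")
    # Assumption: all values are representable in nanoseconds.
    return values.astype("datetime64[ns]")


normalized = to_nanoseconds(days)
print(days.dtype, "->", normalized.dtype)
```

The cast copies the data, so it gives up the "minimal overhead" path, and it is not an option for dates outside the nanosecond range.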
product-auto-label bot added the "api: bigquery" label Jan 27, 2022
tswast added the "type: feature request" label Jan 27, 2022

tswast (Collaborator, Author) commented Mar 30, 2023

Looks like we might get some of this for "free" with pandas 2.0. Somehow the actual datetime.date values are equal, even with extreme values post-pandas 2.0. See: googleapis/python-bigquery-pandas#627

Edit: Not quite free. We need to be careful that the numpy.datetime64 objects we create are all of uniform units. Might have to refactor, as _datetime will need to get some units passed to it.
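
A minimal sketch of what passing units through might look like, using only NumPy and the standard library. `to_datetime64` and `to_ndarray` are hypothetical names standing in for `_datetime` and `__ndarray`; string and pyarrow inputs are left out, and this is not the refactor the library actually adopted.

```python
import datetime

import numpy


def to_datetime64(scalar, unit="ns"):
    """Hypothetical converter: return a numpy.datetime64 in `unit`, or None."""
    if scalar is None:
        return None
    if isinstance(scalar, numpy.datetime64):
        if numpy.isnat(scalar):
            return None
        # Re-express an existing datetime64 scalar in the requested unit.
        return scalar.astype(f"datetime64[{unit}]")
    if isinstance(scalar, datetime.date):
        # datetime.datetime subclasses datetime.date; keep only the date part.
        day = datetime.date(scalar.year, scalar.month, scalar.day)
        return numpy.datetime64(day.isoformat(), unit)
    raise TypeError("Invalid value type", scalar)


def to_ndarray(scalars, unit="ns"):
    """Hypothetical helper: build one array whose elements share `unit`."""
    return numpy.array(
        [to_datetime64(scalar, unit) for scalar in scalars], f"M8[{unit}]"
    )


mixed = [
    datetime.date(9999, 12, 31),  # far outside the nanosecond range
    numpy.datetime64("1900-01-01T00:00:00.000000"),  # microsecond input
    numpy.datetime64("2022-01-27"),  # day input
]
dates = to_ndarray(mixed, unit="us")
print(dates.dtype)
```

Because every scalar is converted to the same unit before the array is built, the result has a single dtype regardless of the units of the inputs, and a coarser unit such as microseconds leaves room for extreme dates.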

tswast changed the title from "FR: support inputs of datetime64 dtype with units other than nanoseconds" to "FR: support units other than nanoseconds" Mar 30, 2023
chelsea-lin self-assigned this Apr 19, 2023