Fixing Ruff function-call-in-default-argument (B008) causes test collection errors #6945

Open
smokestacklightnin opened this issue Oct 27, 2024 · 1 comment
@smokestacklightnin (Contributor) commented:

If the bug is related to a specific library below, please raise an issue in the
respective repo directly: TFX

System information

  • Have I specified the code to reproduce the issue (Yes, No): Yes
  • Environment in which the code is executed (e.g., Local (Linux/macOS/Windows),
    Interactive Notebook, Google Cloud, etc.): Linux
  • TensorFlow version: 2.15.1
  • TFX Version: 1.15.1
  • Python version: 3.10
  • Python dependencies (from pip freeze output):
absl-py==1.4.0
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal==1.3.1
alembic==1.13.3
annotated-types==0.7.0
anyio==4.6.2.post1
apache-airflow==2.10.2
apache-airflow-providers-common-compat==1.2.1
apache-airflow-providers-common-io==1.4.2
apache-airflow-providers-common-sql==1.18.0
apache-airflow-providers-fab==1.4.1
apache-airflow-providers-ftp==3.11.1
apache-airflow-providers-http==4.13.1
apache-airflow-providers-imap==3.7.0
apache-airflow-providers-mysql==5.7.2
apache-airflow-providers-smtp==1.8.0
apache-airflow-providers-sqlite==3.9.0
apache-beam==2.60.0
apispec==6.7.0
argcomplete==3.5.1
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
array_record==0.5.1
arrow==1.3.0
asgiref==3.8.1
astunparse==1.6.3
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.2.0
babel==2.16.0
backcall==0.2.0
backports.tarfile==1.2.0
beautifulsoup4==4.12.3
bleach==6.1.0
blinker==1.8.2
cachelib==0.9.0
cachetools==5.5.0
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
chex==0.1.86
click==8.1.7
clickclick==20.10.2
cloudpickle==2.2.1
colorama==0.4.6
colorlog==6.8.2
comm==0.2.2
ConfigUpdater==3.2
connexion==2.14.2
crcmod==1.7
cron-descriptor==1.4.5
croniter==3.0.4
cryptography==43.0.3
debugpy==1.8.7
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.14
dill==0.3.1.1
dm-tree==0.1.8
dnspython==2.7.0
docker==7.1.0
docopt==0.6.2
docstring_parser==0.16
email_validator==2.2.0
etils==1.10.0
exceptiongroup==1.2.2
fastavro==1.9.7
fasteners==0.19
fastjsonschema==2.20.0
Flask==2.2.5
Flask-AppBuilder==4.5.0
Flask-Babel==2.0.0
Flask-Caching==2.3.0
Flask-JWT-Extended==4.6.0
Flask-Limiter==3.8.0
Flask-Login==0.6.3
Flask-Session==0.5.0
Flask-SQLAlchemy==2.5.1
Flask-WTF==1.2.2
flatbuffers==24.3.25
flax==0.8.4
fqdn==1.5.1
frozenlist==1.5.0
fsspec==2024.10.0
gast==0.6.0
google-api-core==2.21.0
google-api-python-client==1.12.11
google-apitools==0.5.31
google-auth==2.35.0
google-auth-httplib2==0.2.0
google-auth-oauthlib==1.2.1
google-cloud-aiplatform==1.70.0
google-cloud-bigquery==3.26.0
google-cloud-bigquery-storage==2.27.0
google-cloud-bigtable==2.26.0
google-cloud-core==2.4.1
google-cloud-datastore==2.20.1
google-cloud-dlp==3.25.0
google-cloud-language==2.15.0
google-cloud-pubsub==2.26.1
google-cloud-pubsublite==1.11.1
google-cloud-recommendations-ai==0.10.13
google-cloud-resource-manager==1.13.0
google-cloud-spanner==3.49.1
google-cloud-storage==2.18.2
google-cloud-videointelligence==2.14.0
google-cloud-vision==3.8.0
google-crc32c==1.6.0
google-pasta==0.2.0
google-re2==1.1.20240702
google-resumable-media==2.7.2
googleapis-common-protos==1.65.0
greenlet==3.1.1
grpc-google-iam-v1==0.13.1
grpc-interceptor==0.15.4
grpcio==1.65.5
grpcio-status==1.48.2
gunicorn==23.0.0
h11==0.14.0
h5py==3.12.1
hdfs==2.7.3
httpcore==1.0.6
httplib2==0.22.0
httpx==0.27.2
idna==3.10
immutabledict==4.2.0
importlib_metadata==8.4.0
importlib_resources==6.4.5
inflection==0.5.1
iniconfig==2.0.0
ipykernel==6.29.5
ipython==7.34.0
ipython-genutils==0.2.0
ipywidgets==7.8.5
isoduration==20.11.0
itsdangerous==2.2.0
jaraco.classes==3.4.0
jaraco.context==6.0.1
jaraco.functools==4.1.0
jax==0.4.23
jaxlib==0.4.23
jedi==0.19.1
jeepney==0.8.0
Jinja2==3.1.4
jmespath==1.0.1
joblib==1.4.2
json5==0.9.25
jsonpickle==3.3.0
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.13.0
jupyter_server_terminals==0.5.3
jupyterlab==4.2.5
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==1.1.11
keras==2.15.0
keras-tuner==1.4.7
keyring==25.4.1
keyrings.google-artifactregistry-auth==1.1.2
kfp==2.5.0
kfp-pipeline-spec==0.2.2
kfp-server-api==2.0.5
kt-legacy==1.0.5
kubernetes==26.1.0
lazy-object-proxy==1.10.0
libclang==18.1.1
limits==3.13.0
linkify-it-py==2.0.3
lockfile==0.12.2
lxml==5.3.0
Mako==1.3.6
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.2
marshmallow==3.23.0
marshmallow-oneofschema==3.1.1
marshmallow-sqlalchemy==0.28.2
matplotlib-inline==0.1.7
mdit-py-plugins==0.4.2
mdurl==0.1.2
methodtools==0.4.7
mistune==3.0.2
ml-dtypes==0.3.2
ml-metadata==1.15.0
more-itertools==10.5.0
msgpack==1.1.0
multidict==6.1.0
mysql-connector-python==9.1.0
mysqlclient==2.2.5
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
nltk==3.9.1
notebook==7.2.2
notebook_shim==0.2.4
numpy==1.26.4
nvidia-cublas-cu12==12.2.5.6
nvidia-cuda-cupti-cu12==12.2.142
nvidia-cuda-nvcc-cu12==12.2.140
nvidia-cuda-nvrtc-cu12==12.2.140
nvidia-cuda-runtime-cu12==12.2.140
nvidia-cudnn-cu12==8.9.4.25
nvidia-cufft-cu12==11.0.8.103
nvidia-curand-cu12==10.3.3.141
nvidia-cusolver-cu12==11.5.2.141
nvidia-cusparse-cu12==12.1.2.141
nvidia-nccl-cu12==2.16.5
nvidia-nvjitlink-cu12==12.2.140
oauth2client==4.1.3
oauthlib==3.2.2
objsize==0.7.0
opentelemetry-api==1.27.0
opentelemetry-exporter-otlp==1.27.0
opentelemetry-exporter-otlp-proto-common==1.27.0
opentelemetry-exporter-otlp-proto-grpc==1.27.0
opentelemetry-exporter-otlp-proto-http==1.27.0
opentelemetry-proto==1.27.0
opentelemetry-sdk==1.27.0
opentelemetry-semantic-conventions==0.48b0
opt_einsum==3.4.0
optax==0.2.2
orbax-checkpoint==0.5.16
ordered-set==4.1.0
orjson==3.10.10
overrides==7.7.0
packaging==23.2
pandas==1.5.3
pandocfilters==1.5.1
parso==0.8.4
pathspec==0.12.1
pendulum==3.0.0
pexpect==4.9.0
pickleshare==0.7.5
pillow==11.0.0
platformdirs==4.3.6
pluggy==1.5.0
portalocker==2.10.1
portpicker==1.6.0
presto-python-client==0.7.0
prison==0.2.1
prometheus_client==0.21.0
promise==2.3
prompt_toolkit==3.0.48
propcache==0.2.0
proto-plus==1.25.0
protobuf==3.20.3
psutil==6.1.0
ptyprocess==0.7.0
pyarrow==10.0.1
pyarrow-hotfix==0.6
pyasn1==0.6.1
pyasn1_modules==0.4.1
pybind11==2.13.6
pycparser==2.22
pydantic==2.9.2
pydantic_core==2.23.4
pydot==1.4.2
pyfarmhash==0.3.2
Pygments==2.18.0
PyJWT==2.9.0
pylint-to-ruff==0.3.0
pymongo==4.10.1
pyparsing==3.2.0
pytest==8.0.0
pytest-subtests==0.13.1
python-daemon==3.1.0
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
python-nvd3==0.16.0
python-slugify==8.0.4
pytz==2024.2
PyYAML==6.0.2
pyzmq==26.2.0
redis==5.2.0
referencing==0.35.1
regex==2024.9.11
requests==2.32.3
requests-oauthlib==2.0.0
requests-toolbelt==0.10.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.9.3
rich-argparse==1.5.2
rouge-score==0.1.2
rpds-py==0.20.0
rsa==4.9
sacrebleu==2.4.3
scikit-learn==1.5.1
scipy==1.12.0
SecretStorage==3.3.3
Send2Trash==1.8.3
setproctitle==1.3.3
shapely==2.0.6
simple-parsing==0.1.6
six==1.16.0
slackclient==2.9.4
sniffio==1.3.1
sounddevice==0.5.1
soupsieve==2.6
SQLAlchemy==1.4.54
SQLAlchemy-JSONField==1.0.2
SQLAlchemy-Utils==0.41.2
sqlparse==0.5.1
struct2tensor==0.46.0
tabulate==0.9.0
tenacity==9.0.0
tensorboard==2.15.2
tensorboard-data-server==0.7.2
tensorflow==2.15.1
tensorflow-cloud==0.1.16
tensorflow-data-validation==1.15.1
tensorflow-datasets==4.9.6
tensorflow-decision-forests==1.8.1
tensorflow-estimator==2.15.0
tensorflow-hub==0.15.0
tensorflow-io-gcs-filesystem==0.37.1
tensorflow-metadata==1.15.0
tensorflow-ranking==0.5.5
tensorflow-serving-api==2.15.1
tensorflow-text==2.15.0
tensorflow-transform==1.15.0
tensorflow_model_analysis==0.46.0
tensorflowjs==4.17.0
tensorstore==0.1.67
termcolor==2.5.0
terminado==0.18.1
text-unidecode==1.3
tflite-support==0.4.4
tfx-bsl==1.15.1
-e git+ssh://git@github.com/smokestacklightnin/tfx.git@4012b82824b1203783b317fe864ace84bcb6ec31#egg=tfx_dev
threadpoolctl==3.5.0
time-machine==2.16.0
tinycss2==1.4.0
toml==0.10.2
tomli==2.0.2
toolz==1.0.0
tornado==6.4.1
tqdm==4.66.5
traitlets==5.14.3
types-python-dateutil==2.9.0.20241003
typing_extensions==4.12.2
tzdata==2024.2
uc-micro-py==1.0.3
unicodecsv==0.14.1
universal_pathlib==0.2.5
uri-template==1.3.0
uritemplate==3.0.1
urllib3==1.26.20
wcwidth==0.2.13
webcolors==24.8.0
webencodings==0.5.1
websocket-client==0.59.0
Werkzeug==2.2.3
widgetsnbextension==3.6.10
wirerope==0.4.7
wrapt==1.14.1
WTForms==3.2.1
wurlitzer==3.1.1
yarl==1.16.0
zipp==3.20.2
zstandard==0.23.0

Describe the current behavior

Fixing Ruff rule B008 (function-call-in-default-argument) causes test collection failures in the following files:

tfx/dsl/component/experimental/decorators_test.py
tfx/dsl/component/experimental/decorators_typeddict_test.py

This linting violation is important to fix because a function call in a default argument can lead to unexpected behavior.

Quoted from the Ruff website:

Any function call that's used in a default argument will only be performed once, at definition time. The returned value will then be reused by all calls to the function, which can lead to unexpected behaviour.
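
For illustration, here is a minimal sketch of the pitfall B008 guards against (not from the TFX codebase; the function and names are made up): the default value is computed once, at definition time, and is then shared by every call.

```python
import time


def log_event(message: str, timestamp: float = time.time()) -> str:  # B008
    # time.time() ran once, when the function was defined, so every call that
    # omits `timestamp` silently reuses that same stale value.
    return f"{timestamp}: {message}"


def log_event_fixed(message: str, timestamp: float | None = None) -> str:
    # The usual fix (and the pattern applied in the diffs below): default to
    # None and perform the call inside the function body.
    if timestamp is None:
        timestamp = time.time()
    return f"{timestamp}: {message}"
```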

Describe the expected behavior

We should be able to apply the diffs below without the test collection errors that are also shown below.

Standalone code to reproduce the issue

diff --git a/tfx/dsl/component/experimental/decorators_test.py b/tfx/dsl/component/experimental/decorators_test.py
--- a/tfx/dsl/component/experimental/decorators_test.py
+++ b/tfx/dsl/component/experimental/decorators_test.py
@@ -14,6 +14,7 @@
 """Tests for tfx.dsl.components.base.decorators."""
 
 
+from __future__ import annotations
 import pytest
 import os
 from typing import Any, Dict, List, Optional
@@ -141,8 +142,10 @@ def verify_beam_pipeline_arg(a: int) -> OutputDict(b=float):  # pytype: disable=
 
 def verify_beam_pipeline_arg_non_none_default_value(
     a: int,
-    beam_pipeline: BeamComponentParameter[beam.Pipeline] = beam.Pipeline(),
+    beam_pipeline: BeamComponentParameter[beam.Pipeline] = None,
 ) -> OutputDict(b=float):  # pytype: disable=invalid-annotation,wrong-arg-types
+  if beam_pipeline is None:
+    beam_pipeline = beam.Pipeline()
   del beam_pipeline
   return {'b': float(a)}
 
diff --git a/tfx/dsl/component/experimental/decorators_typeddict_test.py b/tfx/dsl/component/experimental/decorators_typeddict_test.py
--- a/tfx/dsl/component/experimental/decorators_typeddict_test.py
+++ b/tfx/dsl/component/experimental/decorators_typeddict_test.py
@@ -14,6 +14,7 @@
 """Tests for tfx.dsl.components.base.decorators."""
 
 
+from __future__ import annotations
 import pytest
 import os
 from typing import Any, Dict, List, Optional, TypedDict
@@ -141,8 +142,10 @@ def verify_beam_pipeline_arg(a: int) -> TypedDict('Output6', dict(b=float)):  #
 
 def verify_beam_pipeline_arg_non_none_default_value(
     a: int,
-    beam_pipeline: BeamComponentParameter[beam.Pipeline] = beam.Pipeline(),
+    beam_pipeline: BeamComponentParameter[beam.Pipeline] | None = None,
 ) -> TypedDict('Output7', dict(b=float)):  # pytype: disable=wrong-arg-types
+  if beam_pipeline is None:
+    beam_pipeline = beam.Pipeline()
   del beam_pipeline
   return {'b': float(a)}
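
Note: judging from the tracebacks under "Other info / logs", the collection failures appear to come from the added `from __future__ import annotations` line rather than from the B008 fix itself. With PEP 563, annotations are stored as strings, and the TFX function parser reads `func.__annotations__` directly, so the `OutputDict(...)` / `TypedDict(...)` return annotations arrive as plain strings instead of evaluated objects. A minimal, self-contained sketch of that effect (not TFX code; `OutputDict` below is a hypothetical stand-in for `tfx.dsl.component.experimental.annotations.OutputDict`):

```python
from __future__ import annotations  # PEP 563: annotations are stored as strings


class OutputDict:  # hypothetical stand-in, not the real TFX class
    def __init__(self, **kwargs):
        self.kwargs = kwargs


def injector(foo: int) -> OutputDict(a=int):
    return {"a": foo}


# Because of the future import, the return annotation is the *string*
# 'OutputDict(a=int)', so an isinstance(..., OutputDict) check fails,
# which matches the ValueError raised during collection in the logs below.
print(repr(injector.__annotations__["return"]))  # 'OutputDict(a=int)'
print(isinstance(injector.__annotations__["return"], OutputDict))  # False
```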

Other info / logs

Here are the errors that the above diffs cause when collecting tests with pytest:

_____________________________________________________________ ERROR collecting tfx/dsl/component/experimental/decorators_test.py ______________________________________________________________

    # Copyright 2020 Google LLC. All Rights Reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    """Tests for tfx.dsl.components.base.decorators."""
    
    
    from __future__ import annotations
    import pytest
    import os
    from typing import Any, Dict, List, Optional
    
    import apache_beam as beam
    import tensorflow as tf
    from tfx import types
    from tfx.dsl.component.experimental.annotations import BeamComponentParameter
    from tfx.dsl.component.experimental.annotations import InputArtifact
    from tfx.dsl.component.experimental.annotations import OutputArtifact
    from tfx.dsl.component.experimental.annotations import OutputDict
    from tfx.dsl.component.experimental.annotations import Parameter
    from tfx.dsl.component.experimental.decorators import _SimpleBeamComponent
    from tfx.dsl.component.experimental.decorators import _SimpleComponent
    from tfx.dsl.component.experimental.decorators import BaseFunctionalComponent
    from tfx.dsl.component.experimental.decorators import component
    from tfx.dsl.components.base import base_beam_executor
    from tfx.dsl.components.base import base_executor
    from tfx.dsl.components.base import executor_spec
    from tfx.dsl.io import fileio
    from tfx.orchestration import metadata
    from tfx.orchestration import pipeline
    from tfx.orchestration.beam import beam_dag_runner
    from tfx.types import component_spec
    from tfx.types import standard_artifacts
    from tfx.types.channel_utils import union
    from tfx.types.system_executions import SystemExecution
    
    _TestBeamPipelineArgs = ['--my_testing_beam_pipeline_args=foo']
    
    
    class _InputArtifact(types.Artifact):
      TYPE_NAME = '_InputArtifact'
    
    
    class _OutputArtifact(types.Artifact):
      TYPE_NAME = '_OutputArtifact'
    
    
    class _BasicComponentSpec(component_spec.ComponentSpec):
    
      PARAMETERS = {
          'folds': component_spec.ExecutionParameter(type=int),
      }
      INPUTS = {
          'input': component_spec.ChannelParameter(type=_InputArtifact),
      }
      OUTPUTS = {
          'output': component_spec.ChannelParameter(type=_OutputArtifact),
      }
    
    
    class _InjectorAnnotation(SystemExecution):
    
      MLMD_SYSTEM_BASE_TYPE = 1
    
    
    class _SimpleComponentAnnotation(SystemExecution):
    
      MLMD_SYSTEM_BASE_TYPE = 2
    
    
    class _VerifyAnnotation(SystemExecution):
    
      MLMD_SYSTEM_BASE_TYPE = 3
    
    
    def no_op():
      pass
    
    
    _decorated_no_op = component(no_op)
    _decorated_with_arg_no_op = component()(no_op)
    
    
    @component
>   def injector_1(
        foo: Parameter[int], bar: Parameter[str]
    ) -> OutputDict(a=int, b=int, c=str, d=bytes):  # pytype: disable=invalid-annotation,wrong-arg-types

tfx/dsl/component/experimental/decorators_test.py:94: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

func = <function injector_1 at 0x77aa8d575d80>

    def component(
        func: Optional[types.FunctionType] = None,
        /,
        *,
        component_annotation: Optional[
            Type[system_executions.SystemExecution]
        ] = None,
        use_beam: bool = False,
    ) -> Union[
        BaseFunctionalComponentFactory,
        Callable[[types.FunctionType], BaseFunctionalComponentFactory],
    ]:
      '''Decorator: creates a component from a typehint-annotated Python function.
    
      This decorator creates a component based on typehint annotations specified for
      the arguments and return value for a Python function. The decorator can be
      supplied with a parameter `component_annotation` to specify the annotation for
      this component decorator. This annotation hints which system execution type
      this python function-based component belongs to.
      Specifically, function arguments can be annotated with the following types and
      associated semantics:
    
      * `Parameter[T]` where `T` is `int`, `float`, `str`, or `bool`:
        indicates that a primitive type execution parameter, whose value is known at
        pipeline construction time, will be passed for this argument. These
        parameters will be recorded in ML Metadata as part of the component's
        execution record. Can be an optional argument.
      * `int`, `float`, `str`, `bytes`, `bool`, `Dict`, `List`: indicates that a
        primitive type value will be passed for this argument. This value is tracked
        as an `Integer`, `Float`, `String`, `Bytes`, `Boolean` or `JsonValue`
        artifact (see `tfx.types.standard_artifacts`) whose value is read and passed
        into the given Python component function. Can be an optional argument.
      * `InputArtifact[ArtifactType]`: indicates that an input artifact object of
        type `ArtifactType` (deriving from `tfx.types.Artifact`) will be passed for
        this argument. This artifact is intended to be consumed as an input by this
        component (possibly reading from the path specified by its `.uri`). Can be
        an optional argument by specifying a default value of `None`.
      * `OutputArtifact[ArtifactType]`: indicates that an output artifact object of
        type `ArtifactType` (deriving from `tfx.types.Artifact`) will be passed for
        this argument. This artifact is intended to be emitted as an output by this
        component (and written to the path specified by its `.uri`). Cannot be an
        optional argument.
    
      The return value typehint should be either empty or `None`, in the case of a
      component function that has no return values, or a `TypedDict` of primitive
      value types (`int`, `float`, `str`, `bytes`, `bool`, `dict` or `list`; or
      `Optional[T]`, where T is a primitive type value, in which case `None` can be
      returned), to indicate that the return value is a dictionary with specified
      keys and value types.
    
      Note that output artifacts should not be included in the return value
      typehint; they should be included as `OutputArtifact` annotations in the
      function inputs, as described above.
    
      The function to which this decorator is applied must be at the top level of
      its Python module (it may not be defined within nested classes or function
      closures).
    
      This is example usage of component definition using this decorator:
    
      ``` python
      from tfx import v1 as tfx
    
      InputArtifact = tfx.dsl.components.InputArtifact
      OutputArtifact = tfx.dsl.components.OutputArtifact
      Parameter = tfx.dsl.components.Parameter
      Examples = tfx.types.standard_artifacts.Examples
      Model = tfx.types.standard_artifacts.Model
    
    
      class MyOutput(TypedDict):
          loss: float
          accuracy: float
    
    
      @component(component_annotation=tfx.dsl.standard_annotations.Train)
      def MyTrainerComponent(
          training_data: InputArtifact[Examples],
          model: OutputArtifact[Model],
          dropout_hyperparameter: float,
          num_iterations: Parameter[int] = 10,
      ) -> MyOutput:
          """My simple trainer component."""
    
          records = read_examples(training_data.uri)
          model_obj = train_model(records, num_iterations, dropout_hyperparameter)
          model_obj.write_to(model.uri)
    
          return {"loss": model_obj.loss, "accuracy": model_obj.accuracy}
    
    
      # Example usage in a pipeline graph definition:
      # ...
      trainer = MyTrainerComponent(
          training_data=example_gen.outputs["examples"],
          dropout_hyperparameter=other_component.outputs["dropout"],
          num_iterations=1000,
      )
      pusher = Pusher(model=trainer.outputs["model"])
      # ...
      ```
    
      When the parameter `component_annotation` is not supplied, the default value
      is None. This is another example usage with `component_annotation` = None:
    
      ``` python
      @component
      def MyTrainerComponent(
          training_data: InputArtifact[standard_artifacts.Examples],
          model: OutputArtifact[standard_artifacts.Model],
          dropout_hyperparameter: float,
          num_iterations: Parameter[int] = 10,
      ) -> Output:
          """My simple trainer component."""
    
          records = read_examples(training_data.uri)
          model_obj = train_model(records, num_iterations, dropout_hyperparameter)
          model_obj.write_to(model.uri)
    
          return {"loss": model_obj.loss, "accuracy": model_obj.accuracy}
      ```
    
      When the parameter `use_beam` is True, one of the parameters of the decorated
      function type-annotated by BeamComponentParameter[beam.Pipeline] and the
      default value can only be None. It will be replaced by a beam Pipeline made
      with the tfx pipeline's beam_pipeline_args that's shared with other beam-based
      components:
    
      ``` python
      @component(use_beam=True)
      def DataProcessingComponent(
          input_examples: InputArtifact[standard_artifacts.Examples],
          output_examples: OutputArtifact[standard_artifacts.Examples],
          beam_pipeline: BeamComponentParameter[beam.Pipeline] = None,
      ) -> None:
          """My simple trainer component."""
    
          records = read_examples(training_data.uri)
          with beam_pipeline as p:
              ...
      ```
    
      Experimental: no backwards compatibility guarantees.
    
      Args:
        func: Typehint-annotated component executor function.
        component_annotation: used to annotate the python function-based component.
          It is a subclass of SystemExecution from
          third_party/py/tfx/types/system_executions.py; it can be None.
        use_beam: Whether to create a component that is a subclass of
          BaseBeamComponent. This allows a beam.Pipeline to be made with
          tfx-pipeline-wise beam_pipeline_args.
    
      Returns:
        An object that:
    
          1. you can call like the initializer of a subclass of [`base_component.BaseComponent`][tfx.v1.types.BaseChannel] (or [`base_component.BaseBeamComponent`][tfx.v1.types.BaseBeamComponent]).
          2. has a test_call() member function for unit testing the inner implementation of the component.
    
          Today, the returned object is literally a subclass of [BaseComponent][tfx.v1.types.BaseChannel], so it can be used as a `Type` e.g. in isinstance() checks. But you must not rely on this, as we reserve the right to reserve a different kind of object in the future, which _only_ satisfies the two criteria (1.) and (2.) above without being a `Type` itself.
    
      Raises:
        EnvironmentError: if the current Python interpreter is not Python 3.
      '''
      if func is None:
        # Python decorators with arguments in parentheses result in two function
        # calls. The first function call supplies the kwargs and the second supplies
        # the decorated function. Here we forward the kwargs to the second call.
        return functools.partial(
            component,
            component_annotation=component_annotation,
            use_beam=use_beam,
        )
    
      utils.assert_is_top_level_func(func)
    
      (inputs, outputs, parameters, arg_formats, arg_defaults, returned_values,
       json_typehints, return_json_typehints) = (
>          function_parser.parse_typehint_component_function(func))

tfx/dsl/component/experimental/decorators.py:489: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

func = <function injector_1 at 0x77aa8d575d80>

    def parse_typehint_component_function(
        func: types.FunctionType,
    ) -> ParsedSignature:
      """Parses the given component executor function.
    
      This method parses a typehinted-annotated Python function that is intended to
      be used as a component and returns the information needed about the interface
      (inputs / outputs / returned output values) about that components, as well as
      a list of argument names and formats for determining the parameters that
      should be passed when calling `func(*args)`.
    
      Args:
        func: A component executor function to be parsed.
    
      Returns:
        A ParsedSignature.
      """
      utils.assert_is_functype(func)
    
      # Inspect the component executor function.
      typehints = func.__annotations__  # pytype: disable=attribute-error
      argspec = inspect.getfullargspec(func)  # pytype: disable=module-attr
      subject_message = 'Component declared as a typehint-annotated function'
>     _validate_signature(func, argspec, typehints, subject_message)

tfx/dsl/component/experimental/function_parser.py:320: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

func = <function injector_1 at 0x77aa8d575d80>
argspec = FullArgSpec(args=['foo', 'bar'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={'return': 'OutputDict(a=int, b=int, c=str, d=bytes)', 'foo': 'Parameter[int]', 'bar': 'Parameter[str]'})
typehints = {'bar': 'Parameter[str]', 'foo': 'Parameter[int]', 'return': 'OutputDict(a=int, b=int, c=str, d=bytes)'}, subject_message = 'Component declared as a typehint-annotated function'

    def _validate_signature(
        func: types.FunctionType,
        argspec: inspect.FullArgSpec,  # pytype: disable=module-attr
        typehints: Dict[str, Any],
        subject_message: str,
    ) -> None:
      """Validates signature of a typehint-annotated component executor function."""
      utils.assert_no_varargs_varkw(argspec, subject_message)
    
      # Validate argument type hints.
      for arg in argspec.args:
        if isinstance(arg, list):
          # Note: this feature was removed in Python 3:
          # https://www.python.org/dev/peps/pep-3113/.
          raise ValueError('%s does not support nested input arguments.' %
                           subject_message)
        if arg not in typehints:
          raise ValueError('%s must have all arguments annotated with typehints.' %
                           subject_message)
    
      # Validate return type hints.
>     if return_kwargs := _parse_return_type_kwargs(func, typehints):

tfx/dsl/component/experimental/function_parser.py:108: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

func = <function injector_1 at 0x77aa8d575d80>, typehints = {'bar': 'Parameter[str]', 'foo': 'Parameter[int]', 'return': 'OutputDict(a=int, b=int, c=str, d=bytes)'}

    def _parse_return_type_kwargs(
        func: types.FunctionType, typehints: Dict[str, Any]
    ) -> Optional[Dict[str, Any]]:
      """Parse function return type which should be TypedDict or OutputDict."""
      return_annotation = typehints.get('return')
      if return_annotation is None:
        return None
      elif _is_typeddict(return_annotation):
        return typing.get_type_hints(return_annotation)
      elif isinstance(
          return_annotation, annotations.OutputDict
      ):  # For backward compatibility.
        return return_annotation.kwargs
      else:
>       raise ValueError(
            f'Return type annotation of @component {func.__name__} should be'
            ' TypedDict or None.'
        )
E       ValueError: Return type annotation of @component injector_1 should be TypedDict or None.

tfx/dsl/component/experimental/function_parser.py:81: ValueError
________________________________________________________ ERROR collecting tfx/dsl/component/experimental/decorators_typeddict_test.py _________________________________________________________

    # Copyright 2023 Google LLC. All Rights Reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    """Tests for tfx.dsl.components.base.decorators."""
    
    
    from __future__ import annotations
    import pytest
    import os
    from typing import Any, Dict, List, Optional, TypedDict
    
    import apache_beam as beam
    import tensorflow as tf
    from tfx import types
    from tfx.dsl.component.experimental.annotations import BeamComponentParameter
    from tfx.dsl.component.experimental.annotations import InputArtifact
    from tfx.dsl.component.experimental.annotations import OutputArtifact
    from tfx.dsl.component.experimental.annotations import Parameter
    from tfx.dsl.component.experimental.decorators import _SimpleBeamComponent
    from tfx.dsl.component.experimental.decorators import _SimpleComponent
    from tfx.dsl.component.experimental.decorators import component
    from tfx.dsl.components.base import base_beam_executor
    from tfx.dsl.components.base import base_executor
    from tfx.dsl.components.base import executor_spec
    from tfx.dsl.io import fileio
    from tfx.orchestration import metadata
    from tfx.orchestration import pipeline
    from tfx.orchestration.beam import beam_dag_runner
    from tfx.types import component_spec
    from tfx.types import standard_artifacts
    from tfx.types.channel_utils import union
    from tfx.types.system_executions import SystemExecution
    
    _TestBeamPipelineArgs = ['--my_testing_beam_pipeline_args=foo']
    
    
    class _InputArtifact(types.Artifact):
      TYPE_NAME = '_InputArtifact'
    
    
    class _OutputArtifact(types.Artifact):
      TYPE_NAME = '_OutputArtifact'
    
    
    class _BasicComponentSpec(component_spec.ComponentSpec):
      PARAMETERS = {
          'folds': component_spec.ExecutionParameter(type=int),
      }
      INPUTS = {
          'input': component_spec.ChannelParameter(type=_InputArtifact),
      }
      OUTPUTS = {
          'output': component_spec.ChannelParameter(type=_OutputArtifact),
      }
    
    
    class _InjectorAnnotation(SystemExecution):
      MLMD_SYSTEM_BASE_TYPE = 1
    
    
    class _SimpleComponentAnnotation(SystemExecution):
      MLMD_SYSTEM_BASE_TYPE = 2
    
    
    class _VerifyAnnotation(SystemExecution):
      MLMD_SYSTEM_BASE_TYPE = 3
    
    
    def no_op():
      pass
    
    
    _decoratedno_op = component(no_op)
    _decorated_with_argno_op = component()(no_op)
    
    
    @component
>   def injector_1(
        foo: Parameter[int], bar: Parameter[str]
    ) -> TypedDict('Output1', dict(a=int, b=int, c=str, d=bytes)):  # pytype: disable=wrong-arg-types

tfx/dsl/component/experimental/decorators_typeddict_test.py:88: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

func = <function injector_1 at 0x77aa7d421360>

    def component(
        func: Optional[types.FunctionType] = None,
        /,
        *,
        component_annotation: Optional[
            Type[system_executions.SystemExecution]
        ] = None,
        use_beam: bool = False,
    ) -> Union[
        BaseFunctionalComponentFactory,
        Callable[[types.FunctionType], BaseFunctionalComponentFactory],
    ]:
      '''Decorator: creates a component from a typehint-annotated Python function.
    
      This decorator creates a component based on typehint annotations specified for
      the arguments and return value for a Python function. The decorator can be
      supplied with a parameter `component_annotation` to specify the annotation for
      this component decorator. This annotation hints which system execution type
      this python function-based component belongs to.
      Specifically, function arguments can be annotated with the following types and
      associated semantics:
    
      * `Parameter[T]` where `T` is `int`, `float`, `str`, or `bool`:
        indicates that a primitive type execution parameter, whose value is known at
        pipeline construction time, will be passed for this argument. These
        parameters will be recorded in ML Metadata as part of the component's
        execution record. Can be an optional argument.
      * `int`, `float`, `str`, `bytes`, `bool`, `Dict`, `List`: indicates that a
        primitive type value will be passed for this argument. This value is tracked
        as an `Integer`, `Float`, `String`, `Bytes`, `Boolean` or `JsonValue`
        artifact (see `tfx.types.standard_artifacts`) whose value is read and passed
        into the given Python component function. Can be an optional argument.
      * `InputArtifact[ArtifactType]`: indicates that an input artifact object of
        type `ArtifactType` (deriving from `tfx.types.Artifact`) will be passed for
        this argument. This artifact is intended to be consumed as an input by this
        component (possibly reading from the path specified by its `.uri`). Can be
        an optional argument by specifying a default value of `None`.
      * `OutputArtifact[ArtifactType]`: indicates that an output artifact object of
        type `ArtifactType` (deriving from `tfx.types.Artifact`) will be passed for
        this argument. This artifact is intended to be emitted as an output by this
        component (and written to the path specified by its `.uri`). Cannot be an
        optional argument.
    
      The return value typehint should be either empty or `None`, in the case of a
      component function that has no return values, or a `TypedDict` of primitive
      value types (`int`, `float`, `str`, `bytes`, `bool`, `dict` or `list`; or
      `Optional[T]`, where T is a primitive type value, in which case `None` can be
      returned), to indicate that the return value is a dictionary with specified
      keys and value types.
    
      Note that output artifacts should not be included in the return value
      typehint; they should be included as `OutputArtifact` annotations in the
      function inputs, as described above.
    
      The function to which this decorator is applied must be at the top level of
      its Python module (it may not be defined within nested classes or function
      closures).
    
      This is example usage of component definition using this decorator:
    
      ``` python
      from tfx import v1 as tfx
    
      InputArtifact = tfx.dsl.components.InputArtifact
      OutputArtifact = tfx.dsl.components.OutputArtifact
      Parameter = tfx.dsl.components.Parameter
      Examples = tfx.types.standard_artifacts.Examples
      Model = tfx.types.standard_artifacts.Model
    
    
      class MyOutput(TypedDict):
          loss: float
          accuracy: float
    
    
      @component(component_annotation=tfx.dsl.standard_annotations.Train)
      def MyTrainerComponent(
          training_data: InputArtifact[Examples],
          model: OutputArtifact[Model],
          dropout_hyperparameter: float,
          num_iterations: Parameter[int] = 10,
      ) -> MyOutput:
          """My simple trainer component."""
    
          records = read_examples(training_data.uri)
          model_obj = train_model(records, num_iterations, dropout_hyperparameter)
          model_obj.write_to(model.uri)
    
          return {"loss": model_obj.loss, "accuracy": model_obj.accuracy}
    
    
      # Example usage in a pipeline graph definition:
      # ...
      trainer = MyTrainerComponent(
          training_data=example_gen.outputs["examples"],
          dropout_hyperparameter=other_component.outputs["dropout"],
          num_iterations=1000,
      )
      pusher = Pusher(model=trainer.outputs["model"])
      # ...
      ```
    
      When the parameter `component_annotation` is not supplied, the default value
      is None. This is another example usage with `component_annotation` = None:
    
      ``` python
      @component
      def MyTrainerComponent(
          training_data: InputArtifact[standard_artifacts.Examples],
          model: OutputArtifact[standard_artifacts.Model],
          dropout_hyperparameter: float,
          num_iterations: Parameter[int] = 10,
      ) -> Output:
          """My simple trainer component."""
    
          records = read_examples(training_data.uri)
          model_obj = train_model(records, num_iterations, dropout_hyperparameter)
          model_obj.write_to(model.uri)
    
          return {"loss": model_obj.loss, "accuracy": model_obj.accuracy}
      ```
    
      When the parameter `use_beam` is True, one of the parameters of the decorated
      function type-annotated by BeamComponentParameter[beam.Pipeline] and the
      default value can only be None. It will be replaced by a beam Pipeline made
      with the tfx pipeline's beam_pipeline_args that's shared with other beam-based
      components:
    
      ``` python
      @component(use_beam=True)
      def DataProcessingComponent(
          input_examples: InputArtifact[standard_artifacts.Examples],
          output_examples: OutputArtifact[standard_artifacts.Examples],
          beam_pipeline: BeamComponentParameter[beam.Pipeline] = None,
      ) -> None:
          """My simple trainer component."""
    
          records = read_examples(training_data.uri)
          with beam_pipeline as p:
              ...
      ```
    
      Experimental: no backwards compatibility guarantees.
    
      Args:
        func: Typehint-annotated component executor function.
        component_annotation: used to annotate the python function-based component.
          It is a subclass of SystemExecution from
          third_party/py/tfx/types/system_executions.py; it can be None.
        use_beam: Whether to create a component that is a subclass of
          BaseBeamComponent. This allows a beam.Pipeline to be made with
          tfx-pipeline-wise beam_pipeline_args.
    
      Returns:
        An object that:
    
          1. you can call like the initializer of a subclass of [`base_component.BaseComponent`][tfx.v1.types.BaseChannel] (or [`base_component.BaseBeamComponent`][tfx.v1.types.BaseBeamComponent]).
          2. has a test_call() member function for unit testing the inner implementation of the component.
    
          Today, the returned object is literally a subclass of [BaseComponent][tfx.v1.types.BaseChannel], so it can be used as a `Type` e.g. in isinstance() checks. But you must not rely on this, as we reserve the right to reserve a different kind of object in the future, which _only_ satisfies the two criteria (1.) and (2.) above without being a `Type` itself.
    
      Raises:
        EnvironmentError: if the current Python interpreter is not Python 3.
      '''
      if func is None:
        # Python decorators with arguments in parentheses result in two function
        # calls. The first function call supplies the kwargs and the second supplies
        # the decorated function. Here we forward the kwargs to the second call.
        return functools.partial(
            component,
            component_annotation=component_annotation,
            use_beam=use_beam,
        )
    
      utils.assert_is_top_level_func(func)
    
      (inputs, outputs, parameters, arg_formats, arg_defaults, returned_values,
       json_typehints, return_json_typehints) = (
>          function_parser.parse_typehint_component_function(func))

tfx/dsl/component/experimental/decorators.py:489: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

func = <function injector_1 at 0x77aa7d421360>

    def parse_typehint_component_function(
        func: types.FunctionType,
    ) -> ParsedSignature:
      """Parses the given component executor function.
    
      This method parses a typehinted-annotated Python function that is intended to
      be used as a component and returns the information needed about the interface
      (inputs / outputs / returned output values) about that components, as well as
      a list of argument names and formats for determining the parameters that
      should be passed when calling `func(*args)`.
    
      Args:
        func: A component executor function to be parsed.
    
      Returns:
        A ParsedSignature.
      """
      utils.assert_is_functype(func)
    
      # Inspect the component executor function.
      typehints = func.__annotations__  # pytype: disable=attribute-error
      argspec = inspect.getfullargspec(func)  # pytype: disable=module-attr
      subject_message = 'Component declared as a typehint-annotated function'
>     _validate_signature(func, argspec, typehints, subject_message)

tfx/dsl/component/experimental/function_parser.py:320: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

func = <function injector_1 at 0x77aa7d421360>
argspec = FullArgSpec(args=['foo', 'bar'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotati...return': "TypedDict('Output1', dict(a=int, b=int, c=str, d=bytes))", 'foo': 'Parameter[int]', 'bar': 'Parameter[str]'})
typehints = {'bar': 'Parameter[str]', 'foo': 'Parameter[int]', 'return': "TypedDict('Output1', dict(a=int, b=int, c=str, d=bytes))"}
subject_message = 'Component declared as a typehint-annotated function'

    def _validate_signature(
        func: types.FunctionType,
        argspec: inspect.FullArgSpec,  # pytype: disable=module-attr
        typehints: Dict[str, Any],
        subject_message: str,
    ) -> None:
      """Validates signature of a typehint-annotated component executor function."""
      utils.assert_no_varargs_varkw(argspec, subject_message)
    
      # Validate argument type hints.
      for arg in argspec.args:
        if isinstance(arg, list):
          # Note: this feature was removed in Python 3:
          # https://www.python.org/dev/peps/pep-3113/.
          raise ValueError('%s does not support nested input arguments.' %
                           subject_message)
        if arg not in typehints:
          raise ValueError('%s must have all arguments annotated with typehints.' %
                           subject_message)
    
      # Validate return type hints.
>     if return_kwargs := _parse_return_type_kwargs(func, typehints):

tfx/dsl/component/experimental/function_parser.py:108: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

func = <function injector_1 at 0x77aa7d421360>, typehints = {'bar': 'Parameter[str]', 'foo': 'Parameter[int]', 'return': "TypedDict('Output1', dict(a=int, b=int, c=str, d=bytes))"}

    def _parse_return_type_kwargs(
        func: types.FunctionType, typehints: Dict[str, Any]
    ) -> Optional[Dict[str, Any]]:
      """Parse function return type which should be TypedDict or OutputDict."""
      return_annotation = typehints.get('return')
      if return_annotation is None:
        return None
      elif _is_typeddict(return_annotation):
        return typing.get_type_hints(return_annotation)
      elif isinstance(
          return_annotation, annotations.OutputDict
      ):  # For backward compatibility.
        return return_annotation.kwargs
      else:
>       raise ValueError(
            f'Return type annotation of @component {func.__name__} should be'
            ' TypedDict or None.'
        )
E       ValueError: Return type annotation of @component injector_1 should be TypedDict or None.

tfx/dsl/component/experimental/function_parser.py:81: ValueError
smokestacklightnin added a commit to smokestacklightnin/tfx that referenced this issue Oct 27, 2024
For remaining B008 violations, see [Issue 6945](tensorflow#6945)
@janasangeetha janasangeetha self-assigned this Oct 28, 2024
@janasangeetha (Contributor) commented:

Hi @smokestacklightnin,
Thank you for reporting. I'll take a look and provide an update here.
