- `InferTensorRepresentationsFromSchema`, `TensorAdapter` and `TensorsToRecordBatchConverter` now support `SparseTensor`s with unknown `dense_shape`.
- Depends on `tensorflow>=2.11,<3`
- N/A
- N/A
- `TensorAdapter` now processes `tf.RaggedTensor`s in TF 2 ~10x faster.
- `InferTensorRepresentationsFromSchema` now infers `RaggedTensor`s for `STRUCT` features.
- `TFSequenceExampleRecord` now supports schemas with features not covered or partially covered by `TensorRepresentation`s.
- This is the last version that supports TensorFlow 1.15.x. TF 1.15.x support will be removed in the next version. Please check the TF2 migration guide to migrate to TF2.
- Depends on `tensorflow>=1.15.5,<2` or `tensorflow>=2.10,<3`
- Depends on `protobuf>=3.13,<4`
- Various `TFXIO` implementations now infer `TensorRepresentations` for provided schema `Features` even if some `TensorRepresentations` are provided as well.
- N/A
- N/A
- `ExamplesToRecordBatchDecoder` is now picklable.
- `ParquetTFXIO` can now be used as `RecordBasedTFXIO`.
- Introduces `CreateTfSequenceExampleParserConfig` that takes TFMD schema as input and produces configs for `tf.SequenceExample` parsing.
- `TFSequenceExampleRecord` can now produce an equivalent tf.data.Dataset.
- Introduces an API: `CreateModelHandler`, which produces a model handler suitable for apache_beam.ml.inference.
- Quantiles sketch supports GetQuantilesAndCumulativeWeights, which returns the sum of weights in each quantiles bin along with boundaries.
- Depends on `apache-beam[gcp]>=2.40,<3`.
- Depends on `pyarrow>=6,<7`.
- Depends on `tensorflow-metadata>=1.10,<1.11`.
- Depends on `tensorflow>=1.15.5,<2` or `tensorflow>=2.9,<3`.
- GenerateQuantiles removed from weighted_quantiles_summary.h and replaced with GenerateQuantilesAndCumulativeWeights.
- N/A
- N/A
- Depends on `tensorflow-metadata>=1.9,<1.10`.
- Depends on `tensorflow>=1.15.5,<2` or `tensorflow>=2.9,<3`.
- Depends on `protobuf>=3.13,<3.21`.
- N/A
- N/A
- Introduced `RunInferencePerModel` PTransform, which is a vectorized variant of `RunInference` (useful for ensembles).
- Introduced `ParquetTFXIO` that allows reading data from Parquet files in `pyarrow.RecordBatch` format.
- From this version we will be releasing python 3.9 wheels.
- Depends on `apache-beam[gcp]>=2.38,<3`.
- Depends on `tensorflow-metadata>=1.8,<1.9`.
- N/A
- N/A
- N/A
- Depends on `apache-beam[gcp]>=2.36,<3`.
- Depends on `tensorflow-metadata>=1.7,<1.8`.
- Depends on `tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3`.
- Depends on `tensorflow-serving-api>=1.15,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3`.
- Added a TFXIO where the user defines the beam source.
- N/A
- N/A
- N/A
- Fixes a bug where `TensorsToRecordBatchConverter` could not handle `tf.RaggedTensor`s with uniform inner dimensions in TF 1.15.
- Depends on `apache-beam[gcp]>=2.35,<3`.
- Depends on `tensorflow-metadata>=1.6,<1.7`.
- Depends on `numpy>=1.16,<2`.
- Depends on `absl-py>=0.9,<2.0.0`.
- Depends on `tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,<3`.
- N/A
- N/A
- `TensorsToRecordBatchConverter` can now handle `tf.RaggedTensor`s with uniform inner dimensions.
- Depends on `apache-beam[gcp]>=2.34,<3`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,<3`.
- Depends on `tensorflow-serving-api>=1.15,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,<3`.
- Depends on `tensorflow-metadata>=1.5,<1.6`.
- N/A
- N/A
- Introduces `RecordBatchToExamplesEncoder` that supports encoding nested `pyarrow.large_list()`s representing `tf.RaggedTensor`s.
- Register s2t ops before loading decoder in record_to_tensor_tfxio if struct2tensor is installed.
- Depends on `pyarrow>=1,<6`.
- Depends on `tensorflow-metadata>=1.4,<1.5`.
- N/A
- Deprecated python 3.6 support.
- N/A
- `QuantilesSketch` now ignores NaNs in input values and weights. Previously, NaNs would lead to incorrect quantiles calculation.
- Fixes a bug where `MisraGriesSketch` would discard an excessive number of elements during `AddValues` and `Compress` and output fewer elements than requested.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<3`.
- Depends on `tensorflow-serving-api>=1.15,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<3`.
- N/A
- N/A
- Added support for converting `tf.compat.v1.ragged.RaggedTensorValue`s to `TensorsToRecordBatchConverter`.
- Depends on `apache-beam[gcp]>=2.31,<3`.
- Depends on `tensorflow-metadata>=1.2,<1.3`.
- N/A
- N/A
- N/A
- N/A
- Depends on `google-cloud-bigquery>=1.28.0,<2.21`.
- N/A
- N/A
- Provided the SQL query ability for Apache Arrow RecordBatch. It's not available under Windows.
- Depends on `protobuf>=3.13,<4`.
- Upgraded the protobuf (com_google_protobuf) to `3.13.0`.
- Upgraded the bazel_skylib to `1.0.2` due to the upgrading of protobuf.
- Depends on `tensorflow-metadata>=1.1,<1.2`.
- More documentation is added for the SequenceExample decoder. It's available at `tfx_bsl/coders/README.md`.
- The minimum required OS version for the macOS is 10.14 now.
- N/A
- N/A
- Depends on `apache-beam[gcp]>=2.29,<3`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3`.
- Depends on `tensorflow-serving-api>=1.15,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3`.
- Depends on `tensorflow-metadata>=1.0,<1.1`.
- N/A
- N/A
- Misra-Gries sketch: added support for replacing large string blobs with a configurable placeholder, and replacing invalid utf-8 sequences with a configurable placeholder.
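
The placeholder behaviour described in the Misra-Gries entry can be pictured with a small pure-Python sketch. It is not the sketch's actual implementation: the function name `sanitize_value`, the length threshold, and both placeholder strings are hypothetical, and replacing the whole value (rather than only the offending bytes) is an assumption made here for simplicity.

```python
from typing import Optional


def sanitize_value(
    value: bytes,
    max_length: Optional[int] = 8,  # hypothetical threshold for "large" blobs
    large_placeholder: bytes = b"__LARGE_STRING__",  # hypothetical placeholder
    invalid_placeholder: bytes = b"__INVALID_UTF8__",  # hypothetical placeholder
) -> bytes:
    """Returns `value`, or a placeholder if it is too long or not valid UTF-8."""
    if max_length is not None and len(value) > max_length:
        return large_placeholder
    try:
        value.decode("utf-8")
    except UnicodeDecodeError:
        return invalid_placeholder
    return value


values = [b"cat", b"\xff\xfe", b"a-very-long-string-blob"]
sanitized = [sanitize_value(v) for v in values]
```

Sanitizing before counting keeps a few oversized or malformed values from dominating the memory used by the counters, which is presumably why both placeholders are configurable.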
- Depends on `tensorflow-metadata>=0.30,<0.31`.
- Removed `tfx_bsl.beam.shared`. It is now available in Apache Beam. Use `apache_beam.utils.shared` instead.
- N/A
- Add RawRecordTensorFlowDataset interface to record based tfxios.
- TensorToArrowConverter now can handle generic SparseTensors (>=3-d).
- Added `RecordToTensorTFXIO.DecodeFunction()` to get the decoder as a TF function.
- Depends on `absl-py>=0.9,<0.13`.
- Depends on `tensorflow-metadata>=0.29,<0.30`.
- Bumped the minimum bazel version required to build `tfx_bsl` to 3.7.2.
- N/A
- N/A
- N/A
- Depends on `apache-beam[gcp]>=2.28,<3`.
- N/A
- N/A
- RunInference can now be applied on serialized tf.train.{Example, SequenceExample} for all methods as well as any other kind of serialized structure for the Predict method.
- RunInference can now operate on PCollection[K, V] in a key-forwarding mode (whereby the key is left unchanged while inference is performed on the value).
- RunInference is now more performant.
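
As an illustration of the key-forwarding mode mentioned above, the following pure-Python sketch applies a prediction function to the value of each `(key, value)` pair and passes the key through unchanged. It does not use Beam or `RunInference`; `infer_with_key_forwarding` and `toy_predict` are hypothetical names introduced only for this example.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def infer_with_key_forwarding(
    keyed_inputs: Iterable[Tuple[K, V]],
    predict_fn: Callable[[V], R],
) -> Iterator[Tuple[K, R]]:
    """Runs `predict_fn` on each value and forwards the key untouched."""
    for key, value in keyed_inputs:
        yield key, predict_fn(value)


def toy_predict(x: float) -> float:
    # Stand-in for a real model: doubles its input.
    return 2.0 * x


results = list(infer_with_key_forwarding([("a", 1.0), ("b", 2.5)], toy_predict))
```

Forwarding the key lets a caller join each prediction back to the record it came from without relying on element order.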
- Depends on `numpy>=1.16,<1.20`.
- Depends on `tensorflow-metadata>=0.28,<0.29`.
- N/A
- N/A
- This is a bug fix only version, which modified the dependencies.
- N/A
- Fix in the `tensorflow-serving-api` version constraint.
- N/A
- N/A
- `tfx_bsl.public.tfxio.TFGraphRecordDecoder` is now a public API.
- Depends on `apache-beam[gcp]>=2.27,<3`.
- Depends on `pyarrow>=1,<3`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,<3`.
- Depends on `tensorflow-metadata>=0.27,<0.28`.
- Depends on `tensorflow-serving>=1.15,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,<3`.
- N/A
- N/A
- This is a bug fix only version, which modified the dependencies.
- N/A
- Depends on `apache-beam[gcp]>=2.25,!=2.26.*,<3`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.4.*,<3`.
- Depends on `tensorflow-serving>=1.15,!=2.0.*,!=2.1.*,!=2.2.*,!=2.4.*,<3`.
- N/A
- N/A
- `.TensorFlowDataset` interface is available in RawTfRecord TFXIO.
- Fixed the output keys of TFExampleRecord TFXIO's TensorFlowDataset to match the tensor names in the tensor representations (previously this assumed that the user-provided tensor name was the same as the feature name).
- Add utility in tensor_representation_util.py to get source columns from a tensor representation.
- Depends on `tensorflow-metadata>=0.26,<0.27`.
- N/A
- N/A
- Add `RecordBatches` interface to TFXIO. This interface returns an iterable of record batches, which can be used outside of Apache Beam or TensorFlow to access data.
- From this release TFX-BSL will also be hosting nightly packages on https://pypi-nightly.tensorflow.org. To install the nightly package use the following command:

  `pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple tfx-bsl`

  Note: These nightly packages are unstable and breakages are likely to happen. The fix could often take a week or more depending on the complexity involved for the wheels to be available on the PyPI cloud service. You can always use the stable version of TFX-BSL available on PyPI by running the command `pip install tfx-bsl`.
- TensorToArrow returns LargeListArray/LargeBinaryArray in place of ListArray/BinaryArray.
- array_util.IndexIn now supports LargeBinaryArray inputs.
- Depends on `apache-beam[gcp]>=2.25,<3`.
- Depends on `tensorflow-metadata>=0.25,<0.26`.
- Coders (Example, CSV) do not support outputting ListArray/BinaryArray any more. They can only output LargeListArray/LargeBinaryArray.
- N/A
- N/A
- Depends on `apache-beam[gcp]>=2.24,<3`.
- N/A
- N/A
- You can now build `tfx_bsl` wheel with `python setup.py bdist_wheel`. Note:
  - If you want to build a manylinux2010 wheel you'll still need to use Docker.
  - Bazel is still required.
- You can now build manylinux2010 `tfx_bsl` wheel for Python 3.8.
- From this version we will be releasing python 3.8 wheels.
- Stopped depending on `six`.
- Depends on `absl-py>=0.9,<0.11`.
- Depends on `pandas>=1.0,<2`.
- Depends on `protobuf>=3.9.2,<4`.
- Depends on `tensorflow-metadata>=0.24,<0.25`.
- N/A
- Deprecated py3.5 support.
- Several TFXIO symbols are made public, which means:
  - TFX users (both pipeline and component authors), and TFX libraries (TFDV, TFMA, TFT) users may start using these symbols.
  - We will be subject to semantic versioning once tfx_bsl goes beyond 1.0.
- TFRecord based TFXIO implementations now support reading from multiple file patterns.
- Implemented the TensorFlowDataset() interface for TFExampleRecord TFXIO.
- Starting from this version, `tfx_bsl` has no binary dependency on `pyarrow` (`libarrow.so`). As a result:
  - Package `tfx_bsl` will be able to work with a wider range of pyarrow versions. We will relax the version requirements in setup.py in the next release.
  - Custom built `tfx_bsl` does not have to maintain ABI compatibility with a specific `pyarrow` installation. Custom builds don't need to be manylinux-conformant.
- Starting from this version, the windows wheel will be built with VS 2015.
- `run_all_tests` will fail with exit code -2 if no tests are discovered.
- Stopped requiring `avro-python3`.
- Example coders will ignore duplicate feature names in the TFMD schema (only the first one counts). It is a temporary measure until TFDV can check and prevent duplications. DO NOT rely on this behavior.
- CsvTFXIO now allows skipping CSV headers (set `skip_header_lines`).
- CsvTFXIO now requires `telemetry_descriptors` to construct.
- Depends on `apache-beam[gcp]>=2.23,<3`.
- Depends on `pyarrow>=0.17,<0.18`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<3`.
- Depends on `tensorflow-metadata>=0.23,<0.24`.
- Depends on `tensorflow-serving-api>=1.15,!=2.0.*,!=2.1.*,!=2.2.*,<3`.
- N/A
- Dropped Python 2.x support.
- Note: We plan to remove Python 3.5 support after this release.
- Added SequenceExamplesToRecordBatchDecoder.
- Added a TFXIO implementation for SequenceExamples on TFRecord.
- Added support for TensorAdapter to output tf.RaggedTensors.
- Improved performance of tf.Example and tf.SequenceExample coders.
- Depends on `pandas>=0.24,<2`.
- Depends on `tensorflow>=1.15,!=2.0.*,<3`.
- Depends on `tensorflow-metadata>=0.22.2,<0.23`.
- Removed tensor_to_arrow_test for TF 1.x as it does not support TF 1.x.
- Removed `arrow.table_util.SliceTableByRowIndices` (in favor of `RecordBatchTake`)
- Removed `arrow.table_util.MergeTables` (in favor of `MergeRecordBatches`)
- Moved RunInference API and related protos to tfx_bsl/public directory.
- CSV coder support for multivalent columns.
- tf.Example coder support for producing large types (LargeList, LargeBinary).
- Added TFXIO for CSV
- Depends on `apache-beam[gcp]>=2.20,<3`.
- Depends on `pyarrow>=0.16,<0.17`
- Depends on `tensorflow-metadata>=0.22,<0.23`
- Renamed ModelEndpointSpec to AIPlatformPredictionModelSpec to specify remote model endpoint on Google Cloud Platform.
- Renamed InferenceEndpoint to InferenceSpecType.
- Added a tfxio.telemetry.ProfileRecordBatches, a PTransform to collect telemetry from Arrow RecordBatches.
- Added remote model inference on Google Cloud Platform.
- Added `arrow.table_util.MergeRecordBatches`: similar to `MergeTables` but operates against `pa.RecordBatch`es.
- Added `arrow.table_util.RecordBatchTake`: similar to `SliceTableByRowIndices` but operates against a `pa.RecordBatch`.
- Requires `apache-beam>=2.17,<3`
- Only requires `avro-python3>=1.8.1,!=1.9.2.*,<2.0.0` on Python 3.5 + MacOS
- Requires `google-api-python-client>=1.7.11,<2`
- Requires `apache-beam>=2.17,<2.18`
- Fixed a bug in tfx_bsl.arrow.array_util.GetFlattenedArrayParentIndices that could cause memory corruption.
- Defined an abstract subclass of `TFXIO`, `RecordBasedTFXIO`, to model record based file formats.
- Utilities in `tfx_bsl.arrow.array_util` that:
  - previously took `ListArray` can now also accept `LargeListArray`.
  - previously took StringArray/BinaryArray can now also accept LargeStringArray and LargeBinaryArray.

  As a result:
  - `GetElementLengths` now returns an `Int64Array`.
  - `GetFlattenedArrayParentIndices` may return an `Int64Array` or an `Int32Array` depending on the input type.
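
To make the two utilities named above more concrete, here is a pure-Python sketch that works on nested Python lists instead of Arrow arrays. It assumes, as the names suggest, that element lengths are the lengths of the sub-lists and that parent indices record, for every flattened value, which sub-list it came from. The lowercase function names are hypothetical stand-ins, and nulls are not handled.

```python
from typing import List, Sequence


def get_element_lengths(list_array: Sequence[Sequence[object]]) -> List[int]:
    """Returns the length of each sub-list."""
    return [len(sub_list) for sub_list in list_array]


def get_flattened_parent_indices(
    list_array: Sequence[Sequence[object]],
) -> List[int]:
    """Returns, for each flattened value, the index of its parent sub-list."""
    parent_indices: List[int] = []
    for list_index, sub_list in enumerate(list_array):
        parent_indices.extend([list_index] * len(sub_list))
    return parent_indices


list_array = [[1, 2, 3], [], [4]]
lengths = get_element_lengths(list_array)
parents = get_flattened_parent_indices(list_array)
```

Large list arrays in Arrow use 64-bit offsets, so lengths and indices derived from them can exceed the 32-bit range, which is consistent with the move to `Int64Array` results.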
- Introduced TFXIO, the interface for Standardized TFX Inputs
- Added the first implementation of TFXIO, for tf.Example on TFRecords.
- Added a test_util sub-package that contains a tool to discover and run all the absltests in a dir (like python's unittest discovery).
- Requires `apache-beam>=2.17,<3`
- Requires `pyarrow>=0.15,<0.16`
- Requires `tensorflow>=1.15,<3`
- Requires `tensorflow-metadata>=0.21,<0.22`.
- Requires `apache-beam>=2.16,<2.17` as 2.17 requires a pyarrow version that we don't support yet.
- Behavior of csv_decoder.ColumnTypeInferrer was changed. A new column type, `ColumnType.UNKNOWN`, was added to denote that the inferrer could not determine the type of that column (instead of making a guess of FLOAT). Summary of behavior change (values in the examples are from the same column):
  - `<int>, <empty>`: before: `FLOAT`; after: `INT`
  - `<empty>, ... , <empty>`: before: `FLOAT`; after: `UNKNOWN`
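
The new inference rule can be sketched in a few lines of plain Python. This is only an illustration of the behaviour summarised above, not the code of `csv_decoder.ColumnTypeInferrer`: the `ColumnType` enum below is a local stand-in with assumed numeric values, and `infer_cell_type` / `infer_column_type` are hypothetical helpers.

```python
import enum
from typing import Iterable


class ColumnType(enum.IntEnum):
    # Local stand-in; ordered so that a "wider" type has a larger value.
    UNKNOWN = -1
    INT = 0
    FLOAT = 1
    STRING = 2


def infer_cell_type(cell: str) -> ColumnType:
    """Infers the narrowest type that can represent a single cell."""
    if not cell:
        return ColumnType.UNKNOWN  # An empty cell carries no type information.
    try:
        int(cell)
        return ColumnType.INT
    except ValueError:
        pass
    try:
        float(cell)
        return ColumnType.FLOAT
    except ValueError:
        return ColumnType.STRING


def infer_column_type(cells: Iterable[str]) -> ColumnType:
    """Combines cell types by keeping the widest one seen so far."""
    result = ColumnType.UNKNOWN
    for cell in cells:
        result = max(result, infer_cell_type(cell))
    return result
```

With this rule, a column of `["1", ""]` is inferred as `INT` and a column containing only empty cells stays `UNKNOWN`, matching the "after" behaviour in the list above.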
- Added a (beam) utility to infer column types from a `PCollection[CSVLine]`.
. - Added a utility to parse a CSVLine into cells (conforming to RFC4180).
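
For readers unfamiliar with RFC 4180 quoting, the sketch below splits a single CSV line into cells using Python's standard `csv` module as a stand-in. It is not the tfx_bsl parser, and `parse_csv_line` is a hypothetical helper name; it only shows the quoting rules such a parser has to honour (quoted cells may contain the delimiter, and a doubled quote inside a quoted cell is a literal quote).

```python
import csv
from typing import List


def parse_csv_line(line: str, delimiter: str = ",") -> List[str]:
    """Splits one CSV line into cells, honouring double-quoted cells."""
    reader = csv.reader(
        [line], delimiter=delimiter, quotechar='"', doublequote=True
    )
    return next(reader)


cells = parse_csv_line('1,"hello, world","say ""hi""",')
```

Here `cells` has four entries: the quoted comma stays inside the second cell, the doubled quotes collapse to single quotes in the third, and the trailing delimiter produces an empty final cell.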
- Added dependency on `tensorflow>=1.15,<2.2`. Starting from 1.15, package `tensorflow` comes with GPU support. Users won't need to choose between `tensorflow` and `tensorflow-gpu`.
  - Caveat: `tensorflow` 2.0.0 is an exception and does not have GPU support. If `tensorflow-gpu` 2.0.0 is installed before installing `tfx-bsl`, it will be replaced with `tensorflow` 2.0.0. Re-install `tensorflow-gpu` 2.0.0 if needed.
- Added dependency on `tensorflow-serving-api>=1.15,<3`.
- Added a python PTransform, `tfx_bsl.beam.RunInference`, that enables batch inference.
- Added a tf.Example <-> Arrow coder.
- Added a tf.Example -> `Dict[str, np.ndarray]` coder (this is a legacy format used by some TFX components).
- Added some common Arrow utilities (`tfx_bsl.arrow.array_util`).
- Added a python class, `tfx_bsl.beam.Shared`, that helps sharing a single instance of object across multiple threads.
- Added dependency on `apache-beam[gcp]>=2.16,<3`.
- Added dependency on `tensorflow-metadata>=0.15,<0.16`.
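
The idea behind sharing a single instance of an object across threads, as in the `tfx_bsl.beam.Shared` entry above, can be sketched with the standard library alone. The `SharedHandle` class below is a hypothetical illustration, not the tfx_bsl or Apache Beam implementation; holding the object through a weak reference is an assumption of this sketch.

```python
import threading
import weakref
from typing import Any, Callable, List, Optional


class SharedHandle:
    """Hands out one shared instance, constructing it at most once while alive."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ref: Optional[weakref.ref] = None

    def acquire(self, constructor_fn: Callable[[], Any]) -> Any:
        with self._lock:
            obj = self._ref() if self._ref is not None else None
            if obj is None:
                # First caller (or the previous instance was garbage collected).
                obj = constructor_fn()
                self._ref = weakref.ref(obj)
            return obj


class ExpensiveModel:
    instances_created = 0

    def __init__(self) -> None:
        ExpensiveModel.instances_created += 1


handle = SharedHandle()
acquired: List[ExpensiveModel] = []


def worker() -> None:
    acquired.append(handle.acquire(ExpensiveModel))


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

All four threads end up with the same `ExpensiveModel` object, so a costly resource such as a loaded model is built once per process rather than once per thread.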