Migrate delta's golden table tests: snapshot tests and comparison script #51

Open: zachschuermann wants to merge 3 commits into base: master

@zachschuermann zachschuermann commented Jul 23, 2024

Description

This PR starts adding delta's existing golden table tests to DAT. It introduces just a small first batch (the first 'snapshot' tests) and a comparison utility, which was used to compare the tables produced by the new pyspark code against the tables produced by the old delta golden table code (and persisted in the delta repo).

The tests are first translated to pyspark. They can then be checked by generating the tables and using the comparison utility to confirm that the latest snapshot matches the persisted one (more advanced tests might need manual checking to confirm). I also ran these tests against delta-kernel-rs, and so far all green :)

How was this patch tested?

# after adding the tests, generate tables (and expectations)
make write-generated-tables

# check against the old persisted delta tests (snapshot-vacuum example)
poetry run python util/compare.py out/reader_tests/generated/snapshot-vacuumed/delta <path-to-delta-repo>/connectors/golden-tables/src/main/resources/golden/snapshot-vacuumed

# manually replace the acceptance tests in delta-kernel-rs to run against the new tests
# with cwd delta-kernel-rs/acceptance:
cp -r <path-to-dat>/out/reader_tests/generated tests/dat/out/reader_tests/generated
cargo t
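
The comparison step above boils down to checking that two tables agree on schema and on the rows of their latest snapshot. A minimal, Spark-free sketch of that idea follows. It is not the actual `util/compare.py`; the function name `compare_snapshots` and its plain-Python inputs (schemas as lists of `(name, type)` tuples, rows as tuples) are assumptions made purely for illustration.

```python
from collections import Counter


def compare_snapshots(schema1, rows1, schema2, rows2):
    """Return a list of differences; an empty list means the snapshots match.

    schema*: list of (column_name, type_name) tuples
    rows*:   list of tuples of hashable values, one tuple per row
    """
    problems = []

    # Check schema compatibility (columns and types) before looking at rows.
    if schema1 != schema2:
        problems.append(f"schema mismatch: {schema1} != {schema2}")
        return problems  # comparing rows across different schemas is not meaningful

    # Counters treat the rows as multisets: order is ignored, duplicates count.
    counts1, counts2 = Counter(rows1), Counter(rows2)
    only_in_first = counts1 - counts2
    only_in_second = counts2 - counts1

    if only_in_first:
        problems.append(f"rows only in first table: {list(only_in_first.elements())}")
    if only_in_second:
        problems.append(f"rows only in second table: {list(only_in_second.elements())}")
    return problems


# Hypothetical data: same rows in a different order should count as a match.
schema = [("col1", "int"), ("col2", "int")]
generated = [(2, 2), (0, 0), (1, 1)]
golden = [(0, 0), (1, 1), (2, 2)]
print(compare_snapshots(schema, generated, schema, golden))
```

Comparing rows as multisets rather than ordered lists matters here because two writers can lay out the same snapshot in different files and therefore return rows in a different order.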

@zachschuermann zachschuermann changed the title [wip] Add golden table snapshot tests and comparison script Add golden table snapshot tests and comparison script Jul 23, 2024
@zachschuermann zachschuermann changed the title Add golden table snapshot tests and comparison script Migrate delta's golden table tests: snapshot tests and comparison script Jul 23, 2024

@nicklan nicklan left a comment

nice, mostly lgtm modulo a few comments.

print(df2)

# Check schema compatibility (columns and types)
if df1.schema != df2.schema:

can't we just use assertDataFrameEqual and assertSchemaEqual?
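
For context, `assertDataFrameEqual` and `assertSchemaEqual` live in `pyspark.testing` (PySpark 3.5 and later). A rough sketch of how the comparison could lean on them is below. It assumes a Spark session already configured for Delta; the function name `assert_tables_match` and its parameters are hypothetical and not part of this PR.

```python
def assert_tables_match(spark, actual_path, expected_path):
    """Raise an AssertionError if two Delta tables differ in schema or rows.

    `spark` is assumed to be a SparkSession already configured with the
    Delta Lake extensions; both paths point at Delta table directories.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.testing import assertDataFrameEqual, assertSchemaEqual

    actual = spark.read.format("delta").load(actual_path)
    expected = spark.read.format("delta").load(expected_path)

    # Compare the schemas first so a mismatch produces a focused error message.
    assertSchemaEqual(actual.schema, expected.schema)
    # Row order is ignored by default (checkRowOrder=False).
    assertDataFrameEqual(actual, expected)
```

Both helpers raise with a diff-style message on failure, which would replace the hand-written `if df1.schema != df2.schema` branch.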

.mode("overwrite") \
.save(table_path)

@reference_table(name="snapshot-data0", description="golden tables snapshot-data0 test")

I know these are the names from the golden tests in delta, but they are really not descriptive :) If you know a bit more about what they are testing, maybe we can rename them to add a little more color.
