Upgrade spark dependency to 3.3.0 #824

Merged: 15 commits, Nov 3, 2022

@WeichenXu123 WeichenXu123 commented Aug 15, 2022

Upgrade spark dependency to 3.3.0

Closes #823

@WeichenXu123
Contributor Author

CC @jsleight Could you help review and merge this if it looks good?

@@ -1,7 +1,7 @@
 numpy>=1.8.2
 six>=1.10.0
 scipy>=0.13.0b1
-pandas>=0.18.1, <= 0.24.2
+pandas>=1.0.5
Contributor Author

Spark 3.3 requires pandas>=1.0.5
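
A minimal sketch of how the new lower bound could be checked at runtime. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of MLeap or Spark; the parser only handles plain dotted versions such as `1.0.5`, so pre-release suffixes would need a real version library.

```python
MIN_PANDAS = "1.0.5"  # lower bound taken from the requirements change above


def parse_version(text):
    """Turn a plain dotted version such as '1.0.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum=MIN_PANDAS):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # The old upper bound (0.24.2) fails the new check; 1.0.5 and later pass.
    for candidate in ("0.24.2", "1.0.5", "1.3.5"):
        print(candidate, meets_minimum(candidate))
```

Comparing tuples of integers avoids the string-comparison trap where "1.10.0" would sort before "1.9.0".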

Comment on lines -26 to -28
-.PHONY: py36_test
-py36_test:
-	source scripts/scala_classpath_for_python.sh && make -C python py36_test
Contributor Author

Spark 3.3 does not support python3.6

 lazy val awsSdkVersion = "1.11.1033"
 val tensorflowJavaVersion = "0.4.0" // Match Tensorflow 2.7.0 https://github.com/tensorflow/java/#tensorflow-version-support
 val xgboostVersion = "1.5.2"
-val breezeVersion = "1.0"
+val breezeVersion = "1.2"
Contributor Author

Keep this in line with the Breeze version that Spark 3.3 depends on.

 val scalaTestVersion = "3.0.8"
 val junitVersion = "5.8.2"
 val akkaVersion = "2.6.14"
 val akkaHttpVersion = "10.2.4"
 val springBootVersion = "2.6.2"
 lazy val logbackVersion = "1.2.3"
 lazy val loggingVersion = "3.9.0"
-lazy val slf4jVersion = "1.7.30"
+lazy val slf4jVersion = "1.7.36"
Contributor Author

@WeichenXu123 WeichenXu123 Aug 15, 2022

Old log4j version has conflicts with spark 3.3 dependencies.

-x = y.ix[:,0]
-y = y.ix[:,1]
+x = y.iloc[:,0]
+y = y.iloc[:,1]
Contributor Author

Because the required pandas version is now >= 1.0.5, the .ix indexer is no longer available, so .iloc is used instead.
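
For readers unfamiliar with the replacement, here is a small sketch of position-based selection with `.iloc`. The frame `df` and its column names are made up for illustration; only the `.iloc[:, 0]` / `.iloc[:, 1]` pattern comes from the diff above.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the transformer's input.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# .iloc selects purely by integer position: all rows, then column 0 or 1.
x = df.iloc[:, 0]
y = df.iloc[:, 1]

print(x.tolist())
print(y.tolist())
```

`.iloc` is the position-only counterpart of the old `.ix`; label-based lookups would use `.loc` instead.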

@WeichenXu123
Contributor Author

CC @jsleight Could you help review and merge this if it looks good?

The CI did not run on my last commit. Could you help fix it?

@WeichenXu123
Contributor Author

CC @jsleight Thanks!

@realknorke

Can someone please review this PR and maybe merge it? Pretty please? MLeap is currently blocking Spark upgrades to latest stable…
Thank you! :D

@ancasarb
Member

@WeichenXu123 @realknorke Sorry, we had run out of credits for Travis CI. Can you please push a dummy commit to trigger the build now? I will merge it once it's green.

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Comment on lines +1035 to +1045
+    return pd.DataFrame(np.add(x, y), columns=[self.output_features])
 elif self.transform_type == 'sub':
-    return pd.DataFrame(np.subtract(x, y))
+    return pd.DataFrame(np.subtract(x, y), columns=[self.output_features])
 elif self.transform_type == 'mul':
-    return pd.DataFrame(np.multiply(x, y))
+    return pd.DataFrame(np.multiply(x, y), columns=[self.output_features])
 elif self.transform_type == 'div':
-    return pd.DataFrame(np.divide(x, y))
+    return pd.DataFrame(np.divide(x, y), columns=[self.output_features])
 elif self.transform_type == 'rem':
-    return pd.DataFrame(np.remainder(x, y))
+    return pd.DataFrame(np.remainder(x, y), columns=[self.output_features])
 elif self.transform_type == 'pow':
-    return pd.DataFrame(x**y)
+    return pd.DataFrame(x**y, columns=[self.output_features])
Contributor Author

Note: A fix.
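
To illustrate what the `columns=` argument changes, here is a small sketch under stated assumptions: the input frame and the `output_features` value are hypothetical stand-ins for the transformer's state, and the two input columns have different names, as they would when taken from positions 0 and 1 of one frame.

```python
import numpy as np
import pandas as pd

# Hypothetical input frame; x and y mirror the .iloc selections in the PR.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
x = df.iloc[:, 0]
y = df.iloc[:, 1]

output_features = "a_plus_b"  # stand-in for self.output_features

# Without columns=, the result of combining two differently named
# Series has no name, so the frame gets a default integer column label.
unnamed = pd.DataFrame(np.add(x, y))

# With columns=, the single output column carries the configured name.
named = pd.DataFrame(np.add(x, y), columns=[output_features])

print(list(unnamed.columns))
print(list(named.columns))
```

The values are identical in both frames; only the column label differs, which matters for any downstream step that looks the output up by name.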

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
-assert_frame_equal(res_a, res_b)
+
+# TODO: Deserialization on output_features has some issue. fix this.
+# assert_frame_equal(res_a, res_b)
Contributor Author

@ancasarb

Could you help fix this?

This is an existing bug, but the previous test did not cover it.

It is not related to this PR, though.

Contributor

can you make a github issue for this so we don't forget about it?

Contributor Author

Filed ticket: #830
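
The sketch below does not reproduce the bug tracked in #830. It only shows, with made-up frames, that `assert_frame_equal` compares column labels as well as values, which is one way a mismatch in a deserialized output name could make the disabled assertion fail. The helper `frames_match` is hypothetical.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Two made-up results with identical values but different column labels.
res_a = pd.DataFrame({"a_plus_b": [4.0, 6.0]})
res_b = pd.DataFrame({0: [4.0, 6.0]})


def frames_match(left, right):
    """Return True when assert_frame_equal accepts the pair, else False."""
    try:
        assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


print(frames_match(res_a, res_a.copy()))
print(frames_match(res_a, res_b))
```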

@WeichenXu123
Contributor Author

CC @ancasarb PR is ready for merging :)

@jsleight
Contributor

So before we merge this do we want to do one last mleap release on spark 3.2? There have been a reasonable number of bug fixes since last release and we might want to ship those out before requiring folks to do the spark 3.3 upgrade? cc @ancasarb

@WeichenXu123
Contributor Author

CC @jsleight Can we merge now :) ?

@jsleight
Contributor

Personally I would like to release all the 3.2.x fixes before this upgrade, but I think only @ancasarb has the release keys.

@WeichenXu123
Contributor Author

WeichenXu123 commented Sep 30, 2022

OK fine. Let's wait. CC @ancasarb

@WeichenXu123
Contributor Author

CC @jsleight @ancasarb Any progress ? Thank you.

@jsleight jsleight merged commit aa93bab into combust:master Nov 3, 2022
@WeichenXu123
Contributor Author

@jsleight Thank you!

@realknorke

This topic (Spark3.3) is getting more and more urgent. Would it be possible to at least create a release candidate for the next mleap release?

@WeichenXu123
Contributor Author

@jsleight Can we release it for spark 3.3 ? :)

@jsleight
Contributor

jsleight commented Feb 8, 2023

releasing a new mleap version sgtm. I might have time to do this next week.

@realknorke

Thank you very much, guys! :D

@realknorke

@jsleight Can you please release a new version w/ Spark 3.3.2 support and I'll buy you a beer! ;)

@jsleight
Contributor

Apologies for the delay. I just finished releasing MLeap v0.22.0, which is on Spark 3.3.0. See the release notes for the full changelog. You should see the relevant artifacts on Sonatype and PyPI. Cheers!
