Writing parquet files #16

Open
peterbe opened this issue May 31, 2016 · 4 comments

Comments

@peterbe

peterbe commented May 31, 2016

Hi,
We need to be able to write Python dicts to Parquet. What are the chances that you'll have time to work on this, i.e. a writer class?

My team is totally new to Parquet, so we have a lot to learn. We did see #13, which claims to have writer functionality, but that PR is out of sync and tries to solve a couple of other things at the same time.

Would appreciate your thoughts on this project's near future.

cc @adngdb

@martindurant

If you wish to write dicts, as opposed to tabular data, you may be better off looking at Avro. There are working Python libraries: avro (official, slow), fastavro and cyavro.

@peterbe

peterbe commented Jun 6, 2016

My stats team say they want it stored in parquet (in S3). I have many individual big dicts that I want to store. Most of them are 1-level dicts, so it's quite tabular. All of it needs to happen from CPython, not a JVM.

@martindurant

In that case, you have two options: wait for the ongoing work in the Apache Arrow project to enable converting pandas dataframes to Parquet (so, presumably, any data structure you can store in a dataframe), or, of course, work on the writer in this project. I personally have no plans to work on it in the near future.
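
For anyone landing on this thread later, here is a minimal sketch of the "flat dicts to a table to Parquet" route described above. It is not part of this project. `dicts_to_columns` and `write_parquet` are hypothetical helper names, and the `pyarrow` calls assume a later pyarrow release with Parquet write support installed, which was not available when this issue was opened.

```python
from typing import Any, Dict, Iterable, List


def dicts_to_columns(records: Iterable[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot flat (1-level) dicts into column lists, filling gaps with None."""
    rows = list(records)
    # Collect keys in first-seen order so the column order is stable.
    names: List[str] = []
    seen = set()
    for row in rows:
        for key in row:
            if key not in seen:
                seen.add(key)
                names.append(key)
    # A key missing from a given dict becomes None in that row.
    return {name: [row.get(name) for row in rows] for name in names}


def write_parquet(records: Iterable[Dict[str, Any]], path: str) -> None:
    """Write the records to a Parquet file (assumes pyarrow is installed)."""
    # Imported lazily so the pivot helper above works without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pydict(dicts_to_columns(records))
    pq.write_table(table, path)
```

Something like `write_parquet(my_dicts, "out.parquet")` would then produce a local file. Uploading it to S3 is a separate step that this sketch does not cover. Nested dicts and columns with mixed value types are not handled here and would need flattening or casting first.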

@peterbe

peterbe commented Jun 6, 2016

Thanks! I appreciate the update and tips. I'll try to get a handle on the state of Python support inside Arrow. I see the code's there, but skimming through it, I only see support for reading (and I have no idea how complete that is).
