Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support java.nio.file.FileSystem abstraction #7

Open
RickMoynihan opened this issue Feb 14, 2022 · 3 comments
Open

Support java.nio.file.FileSystem abstraction #7

RickMoynihan opened this issue Feb 14, 2022 · 3 comments

Comments

@RickMoynihan
Copy link

Hi,

Firstly thanks for datoteka I've been using it across several projects for several years.

The main limitation I have encountered in using it however is that it only supports creation of paths on the default filesystem, and doesn't support constructing paths in other java FileSystems; in particular zip files.

In a few projects I've had the need to operate on paths and perform IO reading and writing files in and out of zip files; and consequently I had to work around some of these limitations and develop some custom code for interop with java's ZipFileSystem. Java ships with a ZipFileSystemProvider, which can be used for this purpose.

I was thinking it would be nice to contribute and integrate some of this code into datoteka or as a companion library. Is it something you'd be interested in?

I'm not entirely sure what would be required to maintain backwards compatibility with datoteka yet. However I believe the main change is simply that I added a new path constructor which takes a FileSystem as the first argument (or if no filesystem is provided here it assumes the default filesystem), and it then returns paths in that filesystem. I believe these paths can then work with most of the datoteka functions as they stand; though some may want to also take an optional filesystem argument.

If you're interested what do you think is the best way to proceed? I was thinking I might first extract the code into a separate library, and share it here for discussion on how best to integrate, before working up a PR to attempt it?

There is some additional code I wrote to also handle reading zip files and managing their resources from multiple threads; which is necessary because the underlying java code doesn't do this very well. I don't know whether that belongs here or not yet; but it is very useful if you need to for example serve files out of zip files over http and may have multiple requests accessing the same file at the same time.

Anyway let me know if this is something worth trying to contribute?

Many thanks.

@niwinz
Copy link
Member

niwinz commented Feb 14, 2022

Hi,

Firstly thanks for datoteka I've been using it across several projects for several years.

\o/ thanks, and glad to know it.

The main limitation I have encountered in using it however is that it only supports creation of paths on the default filesystem, and doesn't support constructing paths in other java FileSystems; in particular zip files.

In a few projects I've had the need to operate on paths and perform IO reading and writing files in and out of zip files; and consequently I had to work around some of these limitations and develop some custom code for interop with java's ZipFileSystem. Java ships with a ZipFileSystemProvider, which can be used for this purpose.

I was thinking it would be nice to contribute and integrate some of this code into datoteka or as a companion library. Is it something you'd be interested in?

I'm not entirely sure what would be required to maintain backwards compatibility with datoteka yet. However I believe the main change is simply that I added a new path constructor which takes a FileSystem as the first argument (or if no filesystem is provided here it assumes the default filesystem), and it then returns paths in that filesystem. I believe these paths can then work with most of the datoteka functions as they stand; though some may want to also take an optional filesystem argument.

If you're interested what do you think is the best way to proceed? I was thinking I might first extract the code into a separate library, and share it here for discussion on how best to integrate, before working up a PR to attempt it?

I think it is worth to implement. From my perspective, it depends on how you feel comfortable. For me is ok, with a "draft" PR with quick and dirty changes to show the approximate changes, when we can discuss the particularities. But it also ok for separate repo, or whatever is ok for you and make you spend less time.
As I can deduce reading this issue, the changes are mainly have the ability to pass the filesystem in some functions. That looks reasonable and probably will maintain the backward compatibility without any problem. We can look over the code case by case :D

There is some additional code I wrote to also handle reading zip files and managing their resources from multiple threads; which is necessary because the underlying java code doesn't do this very well. I don't know whether that belongs here or not yet; but it is very useful if you need to for example serve files out of zip files over http and may have multiple requests accessing the same file at the same time.

About this, I'm not clearly sure. It may fit in datoteka and may not. In any case it will probably live in a separated namespace than core. Show me examples with that i can reason about and I give you my opinion. Honestly, I hadn't thought about this until now.

Anyway let me know if this is something worth trying to contribute?

Many thanks.

@RickMoynihan
Copy link
Author

Thanks for the quick response!

As I can deduce reading this issue, the changes are mainly have the ability to pass the filesystem in some functions. That looks reasonable and probably will maintain the backward compatibility without any problem. We can look over the code case by case :D

For now I've just extracted the relevant code into a gist here:

https://gist.github.com/RickMoynihan/af3049c7214077ab8eb1a352e45946e0

It was largely just written to get something working; rather than with integration with datoteka in mind. So it currently defines its own paths function which takes a filesystem as its first argument and returns paths within that filesystem. There are also some functions open-zip-fs, create-zip-fs and make-zip-file which can be used to either create/open a zip file system accordingly, or to write files into a zip according to a desired layout.

The main difference (providing the filesystem for context) is here:

https://gist.github.com/RickMoynihan/af3049c7214077ab8eb1a352e45946e0#file-zip_file-clj-L41-L74

I'll have a look at what needs done to extend datoteka without breaking compatibility.

@niwinz
Copy link
Member

niwinz commented Feb 15, 2022

After reading the gist file, I can confirm that there are many things that can be ported to datoteka with small changes.

Aso, I think the subsystem with locking and reference counting should live outside of datoteka. Looks nice but I think is not "general purpose". And the zip-* related functions for creating the filesystem can be more generic, because on the end they are calling newFileSistem that is generic and not related to zip.

I will need more in deepth looking on it but this is a good start point. If it is ok for you, I will prepare a PR myself taking the gist as base and we review on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants