Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow synchronization of filename attributes #303

Open
tbabej opened this issue Jun 13, 2023 · 4 comments · May be fixed by #304
Open

Allow synchronization of filename attributes #303

tbabej opened this issue Jun 13, 2023 · 4 comments · May be fixed by #304
Labels
bug Something isn't working

Comments

@tbabej
Copy link
Contributor

tbabej commented Jun 13, 2023

Currently, datafiles handles any attributes that are part of the filepath as not being stored in the file itself. Personally I consider this a good design, as duplicating the value would potentially cause issues if the "filename" and "file content" values did not agree.

However, it seems we are unable to preserve any changes to the attributes that are part of the filename, effectively rendering them read-only. Consider a following datafile called JohnDoe.yml:

age: 42

and an associated short script:

@datafiles.datafile("{self.name}.yml", defaults=True)
class Person:
    name: str
    age: int

person = list(Person.objects.all())[0]
print(person)

As expected, this will produce the following output:

Person(name='JohnDoe', age=42)

However, if we continue to modify the model instance:

person.name="BobBuilder"
person.age=24

we can observe on the filesystem that the file has been modified:

age: 24

however, the filename itself has not been changed, it is still JohnDoe.yml, resulting in the following data:

Person(name='JohnDoe', age=24)

which does not align itself with the expected output from the viewpoint of the user. I would argue we should support one of the two options:
a) the "filename" attributes should be frozen and not allowed to be modified
b) or we should allow their modification, but rename the files, effectively preserving their modified value in the filename

If renaming is too drastic for it to be the default, we could perhaps have (a) as the default behaviour and allow (b) via an optional rename=True flag (similar how we handle defaults=True).

@jacebrowning jacebrowning added the bug Something isn't working label Jun 13, 2023
@jacebrowning
Copy link
Owner

Thanks for the clear bug report!

b) or we should allow their modification, but rename the files, effectively preserving their modified value in the filename

I'd worry that having Datafiles rename files will cause all sorts of unforeseen problems. For example, what if two processes have the file open? If manual=True is set then the attribute and filename will potentially disagree.

a) the "filename" attributes should be frozen and not allowed to be modified

On first pass, I think this is preferable and filename attributes should always be frozen.

If a filename attribute is modified, what should happen? Ignore that attribute's changed value or raise an exception? This feels potentially related to #294.

@tbabej
Copy link
Contributor Author

tbabej commented Jun 13, 2023

In general I agree that (a) is preferable as the default option. In particular, your suggested solution of ignoring the change and raising an exception sounds like the right solution here.

That said, I believe there are valid use cases for changing the "filename" attributes and having them reflect in the filename - especially if the behaviour is opt-in via a rename=True flag.

The concern about two processes accessing the same file is valid, but I never felt like that was a fully supported use case - I suspect the users can already mangle the data by having the processes overwriting each the same datafiles (i.e. if using manual=True for performance reasons). We could avoid this issue and the problem with renaming by introducing a filesystem-level locking mechanism via something like fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB), which could prevent both the overwriting and renaming.

If all of this sounds acceptable, I would volunteer to implement the required changes. I already have a preliminary version of (b) implemented, and I can also contribute (a) and potentially file locking if that's what it would take to get the changes merged upstream 🙂

@jacebrowning
Copy link
Owner

I already have a preliminary version of (b) implemented

I'd like to see what that looks like in a PR!

Given that the defaults=True setting combined with modifying filename attributes is already a fairly uncommon scenario, perhaps my concerns over file contention are exaggerated. 😄

@tbabej
Copy link
Contributor Author

tbabej commented Jun 13, 2023

@jacebrowning I put up the preliminary changes, have a look at #304 🙂

Given that the defaults=True setting combined with modifying filename attributes is already a fairly uncommon scenario, perhaps my concerns over file contention are exaggerated.

I wouldn't discount the possibility, I really do think this is a valid concern. But as I mentioned, we currently lack the mechanism to prevent data corruption in the scenario of multiple processes accessing the same data files. I haven't tested it, but I am fairly certain that with sufficiently high amount of simultaneous processes reading and modifying the same data files, even the current state of things would lead to data corruption. But that's perhaps a discussion for a different issue 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants