Add method to move files between tables #83

Open
hagenw opened this issue Jun 8, 2021 · 6 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@hagenw
Member

hagenw commented Jun 8, 2021

Let's say you would like to move a list of files from an existing train table to an existing test table.
The easiest solution I found so far would be:

db['tmp'] = db['train'].pick_files(files)
db['tmp'].split_id = db['test'].split_id
db['train'].drop_files(files, inplace=True)
db['test'].update(db['tmp'])
db.drop_tables('tmp')

so it might be easier to have something like:

db['test'].move_files(db['train'], files)

Of course it will only work if your columns match, but the same is true for update().

@hagenw hagenw added the enhancement New feature or request label Jun 8, 2021
@hagenw
Member Author

hagenw commented Jun 8, 2021

The problem with my syntax above is that it is not obvious from which table to which table the files are moved, so maybe we have to find a better syntax.

@frankenjoe
Collaborator

Maybe simply rename to move_files_to()?
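
Until such a method exists, the five steps from the first comment could be wrapped in a small helper. This is only a sketch under assumptions: move_files_to() is written here as a hypothetical free function rather than an audformat method, the tmp_id parameter and the clash check are additions for illustration, and db is expected to be an audformat.Database whose two tables have matching columns.

```python
from typing import Sequence


def move_files_to(
    db,
    source_id: str,
    target_id: str,
    files: Sequence[str],
    tmp_id: str = 'tmp',
) -> None:
    """Move ``files`` from table ``source_id`` to table ``target_id``.

    Hypothetical helper, not part of audformat.
    It only chains the calls shown in the first comment.
    """
    # Assumption: refuse to overwrite an existing table with the temporary one.
    if tmp_id in db.tables:
        raise ValueError(f"Table '{tmp_id}' already exists.")
    # Copy the selected rows into a temporary table.
    db[tmp_id] = db[source_id].pick_files(files)
    # Assign the split of the target table to the copied rows.
    db[tmp_id].split_id = db[target_id].split_id
    # Remove the rows from the source table.
    db[source_id].drop_files(files, inplace=True)
    # Merge the copied rows into the target table.
    db[target_id].update(db[tmp_id])
    # Remove the temporary table again.
    db.drop_tables(tmp_id)
```

With this signature the direction is spelled out in the call, e.g. move_files_to(db, 'train', 'test', files).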

@hagenw
Member Author

hagenw commented Jan 5, 2022

I found another way of achieving the task without introducing an extra tmp table (see the example below).
This means there are at least two ways of achieving the requested task. If we added an extra method for it, I would say we need to add at least move_files_to() and move_index_to() to stay in line with the current methods like drop_files() and drop_index(). As we would need to cover a lot of special cases in the corresponding tests, I'm not sure it is worth the effort of adding those methods.


Let us first create a dummy database with a test and train table.

import audformat
db = audformat.Database('test')
db.schemes['data'] = audformat.Scheme(int)
db['train'] = audformat.Table(index=audformat.filewise_index(['a', 'b', 'c']))
db['test'] = audformat.Table(index=audformat.filewise_index(['d']))
db['train']['data'] = audformat.Column(scheme_id='data')
db['test']['data'] = audformat.Column(scheme_id='data')
db['train']['data'].set([0, 1, 0])
db['test']['data'].set([1])

This results in

>>> db['train'].df
      data
file      
a        0
b        1
c        0
>>> db['test'].df
      data
file      
d        1

Now let's move file 'c' from the train table to the test table.

index = audformat.filewise_index(['c'])
db['test'] = db['test'].extend_index(index)
db['test']['data'].set(db['train'].df.loc[index, 'data'], index=index)
db['train'] = db['train'].drop_index(index)

This results in

>>> db['train'].df
      data
file      
a        0
b        1
>>> db['test'].df
      data
file      
c        0
d        1
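
The same steps could be generalized to tables with more than one column. The following is a rough sketch, not an existing audformat function: the name move_index_to() is borrowed from the method name floated above, and the loop over db[source_id].columns assumes that both tables contain the same column IDs.

```python
def move_index_to(db, source_id, target_id, index):
    """Move the rows given by ``index`` from one table to another.

    Hypothetical helper, not part of audformat.
    Assumes both tables have the same columns.
    """
    # Add the new rows to the target table.
    db[target_id] = db[target_id].extend_index(index)
    # Copy the values over, one column at a time.
    for column_id in db[source_id].columns:
        values = db[source_id].df.loc[index, column_id]
        db[target_id][column_id].set(values, index=index)
    # Remove the rows from the source table.
    db[source_id] = db[source_id].drop_index(index)
```

For the dummy database above, move_index_to(db, 'train', 'test', audformat.filewise_index(['c'])) would perform the same three steps as the example.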

@hagenw
Member Author

hagenw commented Jan 5, 2022

So maybe as an alternative to providing new methods we add a section to the documentation where we collect a few examples for updating an existing database (e.g. by extending https://audeering.github.io/audformat/update-database.html). There we could also cover stuff like #61

@frankenjoe
Collaborator

add a section to the documentation where we collect a few examples for updating an existing database

Yes, I think that makes sense since moving files from one table to another is not a very common use case.

@frankenjoe
Collaborator

I would even argue that this is not something we should encourage the user to do. Messing around with test and train splits can be dangerous. Though I see that there is sometimes a need to do it when publishing a new version of a database.

@hagenw hagenw added the documentation Improvements or additions to documentation label Jan 5, 2022