Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate file hashes once #639

Merged
merged 1 commit into from
Mar 9, 2020

Conversation

ashleshaAtrey
Copy link
Contributor

Currently, mixer build update performs multiple hash calculations for each file in the chroot.
Every manifest file entry and subtracted file will have their hashes calculated.
It would be a significant performance improvement to calculate the hash one time for each file and re-use it.

fixes #585

Signed-off-by: Ashlesha Atrey ashlesha.atrey@intel.com

@rchiossi
Copy link
Contributor

rchiossi commented Nov 7, 2019

Nice work!

Do you have timing comparison with an all-upstream build?
Also, can we get memory usage comparison as well? I'd like to know how much extra memory we need for this table.

@ashleshaAtrey
Copy link
Contributor Author

For all upstream bundles,
CREATE MANIFESTS 2m51.324s (with the new implementation)
CREATE MANIFESTS 5m20.483s ( Previous implementation)

I will check the memory usage.

@reaganlo reaganlo assigned jwakre and unassigned jwakre Nov 7, 2019
@reaganlo reaganlo requested a review from jwakre November 7, 2019 18:18
@jwakre
Copy link

jwakre commented Nov 7, 2019

Also, can we get memory usage comparison as well? I'd like to know how much extra memory we need for this table.

The scope of this change could be extended to reduce the memory consumption from current levels instead of increasing it. IIRC, there are many duplicate copies of file metadata. We could reduce the memory impact by collecting file metadata once for each file and use pointers to reference it when needed instead of creating multiple copies. Currently, the File struct contains both file and manifest specific metadata. These would need to be decoupled because the file metadata can be shared, but the manifest metadata cannot.

@ashleshaAtrey
Copy link
Contributor Author

I printed runtime stats in addAllManifestFiles after all routines are done running. Found the below results.
Alloc = 6783 MiB TotalAlloc = 47924 MiB Sys = 8597 MiB NumGC = 22 // using new implementation
Alloc = 7859 MiB TotalAlloc = 133757 MiB Sys = 9190 MiB NumGC = 45 // prev implementation

I am actually seeing improvement in memory for this implementation as well. I am still digging more into it.

@jwakre
Copy link

jwakre commented Nov 7, 2019

The scope of this change could be extended to reduce the memory consumption from current levels instead of increasing it. IIRC, there are many duplicate copies of file metadata. We could reduce the memory impact by collecting file metadata once for each file and use pointers to reference it when needed instead of creating multiple copies. Currently, the File struct contains both file and manifest specific metadata. These would need to be decoupled because the file metadata can be shared, but the manifest metadata cannot.

I created #641 to document how the memory usage could be reduced.

@reaganlo
Copy link
Contributor

reaganlo commented Nov 8, 2019

Overall code and initial tests seems to be good. Will test more.

Copy link

@jwakre jwakre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

There is a race condition where multiple files will be calculated at the same time, but the end result will still be the same. Another change can remove the race condition.

@reaganlo
Copy link
Contributor

reaganlo commented Mar 5, 2020

@ashleshaAtrey Can you rebase this with upstream and do a make check
I am getting a seg fault:

--- FAIL: TestCreateManifestsBasic (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x88 pc=0x791650]

Currently, mixer build update performs multiple hash calculations for each file in the chroot.
Every manifest file entry and subtracted file will have their hashes calculated.
It would be a significant performance improvement to calculate the hash one time for each file and re-use it.

fixes clearlinux#585

Signed-off-by: Ashlesha Atrey <ashlesha.atrey@intel.com>
@ashleshaAtrey
Copy link
Contributor Author

@ashleshaAtrey Can you rebase this with upstream and do a make check
I am getting a seg fault:

--- FAIL: TestCreateManifestsBasic (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x88 pc=0x791650]

It was failing due to two versions being generated in test at the same time without changing the global variable map. Updated the code to use full path of file as key.

Copy link
Contributor

@reaganlo reaganlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

@reaganlo reaganlo merged commit 301fb7e into clearlinux:master Mar 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Calculate file hashes once
4 participants