-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement backup descriptor for incremental backup #632
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still working on reviewing the unit tests. but posting these comments here to unblock Qiyang from addressing them.
0 == str.compare(str.size()-suffix.size(), suffix.size(), suffix); | ||
} | ||
|
||
inline int64_t remove_leading_zero(std::string& num) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the function is not expensive prefer functions that don't modify input to avoid accidential bugs.
For example I'd just return the new string and pass num as constant.
@@ -142,20 +170,59 @@ std::shared_ptr<common::S3Util> ApplicationDBBackupManager::createLocalS3Util( | |||
return local_s3_util; | |||
} | |||
|
|||
// convert the string to a backup_descriptor map, if any error happens, it will return an empty map | |||
// TODO: add sst_id_min and sst_id_max parsing | |||
void ApplicationDBBackupManager::parseBackupDesc(const std::string& contents, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of writing our own parser we can just write the backup descriptor in json and parse the json
/// Copyright 2016 Pinterest Inc. |
This will also be very easy to read manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool ApplicationDBBackupManager::backupDBToS3(const std::shared_ptr<ApplicationDB>& db) { | ||
// ::admin::AdminException e; | ||
|
||
common::Timer timer(kS3BackupMs); | ||
LOG(INFO) << "S3 Backup " << db->db_name() << " to " << ensure_ends_with_pathsep(FLAGS_s3_incre_backup_prefix) << db->db_name(); | ||
auto ts = common::timeutil::GetCurrentTimestamp(); | ||
auto local_path = folly::stringPrintf("%ss3_tmp/%s%d/", rocksdb_dir_.c_str(), db->db_name().c_str(), ts); | ||
auto local_path = folly::stringPrintf("%ss3_tmp/%s%ld/", rocksdb_dir_.c_str(), db->db_name().c_str(), ts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add directory prefix incremental to distinguish regular backup from incremental backup
boost::system::error_code remove_last_backup_err; | ||
boost::system::error_code create_last_backup_err; | ||
// if we have not removed the last backup, we do not need to download it | ||
bool need_download = !boost::filesystem::exists(local_path_last_backup + backupDesc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea to
- Only download, avoid looking into local directory.
- When writing, write the backup descriptor last (Since this means when we upload backup descriptor we have completely uploaded last backup with no errors).
Generally having two sources of data means we have to deal with inconsistencies between them that's why I am suggesting only use s3.
|
||
// Create the new backup descriptor file | ||
std::string backup_desc_contents; | ||
makeUpBackupDescString(file_to_ts, sst_id_min, sst_id_max, backup_desc_contents); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to put min_id and max_id is it an optimization or some sort of check?
std::string backup_desc_contents; | ||
makeUpBackupDescString(file_to_ts, sst_id_min, sst_id_max, backup_desc_contents); | ||
common::FileUtil::createFileWithContent(formatted_checkpoint_local_path, backupDesc, backup_desc_contents); | ||
checkpoint_files.emplace_back(backupDesc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no reason to use emplace_back here, just use push_back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this guarantee that backup descriptor will only be uploaded after all files get uploaded?
Addressed all the comments. Could you have another look? @mradwan Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the coments. I left one comment regarding leaving another comment if you can leave it that would be great.
Thank you!
temp_map[item.first] = item.second; | ||
} | ||
contents[fileToTs] = temp_map; | ||
contents[lastBackupTs] = last_ts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
std::unordered_map<std::string, int64_t> file_to_ts; | ||
const string sst_suffix = ".sst"; | ||
|
||
if (last_ts >= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case if machine restarts we wouldnt have last_ts.
Can you leave a comment that we need to handle this code doesn't handle this case well?
Thanks!
Pushed a new version with backup starter |
Follow Qiyang He's design for incremental backup, this diff adds the backup descriptor for each backup.
Generally, it will create a folder for each backup distinguished by the timestamp. Under each folder, there will be newly added SST files (including those from compaction), backup descriptor, CURRENT, MANIFEST and OPTION. The backup descriptor contains a map from SST filename to timestamp and the range of SST files. You can refer to https://docs.google.com/document/d/1aRO1PJ8qUN-yn6iUK7cisk7qSAsKFhnoyig0_T89jTo/edit#heading=h.oxp5c6sx5wxx for more details.