Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement backup descriptor for incremental backup #632

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

qiyanghe1998
Copy link
Contributor

Follow Qiyang He's design for incremental backup, this diff adds the backup descriptor for each backup.

Generally, it will create a folder for each backup distinguished by the timestamp. Under each folder, there will be newly added SST files (including those from compaction), backup descriptor, CURRENT, MANIFEST and OPTION. The backup descriptor contains a map from SST filename to timestamp and the range of SST files. You can refer to https://docs.google.com/document/d/1aRO1PJ8qUN-yn6iUK7cisk7qSAsKFhnoyig0_T89jTo/edit#heading=h.oxp5c6sx5wxx for more details.

@qiyanghe1998 qiyanghe1998 marked this pull request as ready for review August 5, 2022 21:59
Copy link
Contributor

@mradwan mradwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still working on reviewing the unit tests. but posting these comments here to unblock Qiyang from addressing them.

0 == str.compare(str.size()-suffix.size(), suffix.size(), suffix);
}

inline int64_t remove_leading_zero(std::string& num) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the function is not expensive prefer functions that don't modify input to avoid accidential bugs.

For example I'd just return the new string and pass num as constant.

@@ -142,20 +170,59 @@ std::shared_ptr<common::S3Util> ApplicationDBBackupManager::createLocalS3Util(
return local_s3_util;
}

// convert the string to a backup_descriptor map, if any error happens, it will return an empty map
// TODO: add sst_id_min and sst_id_max parsing
void ApplicationDBBackupManager::parseBackupDesc(const std::string& contents,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of writing our own parser we can just write the backup descriptor in json and parse the json

/// Copyright 2016 Pinterest Inc.

This will also be very easy to read manually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool ApplicationDBBackupManager::backupDBToS3(const std::shared_ptr<ApplicationDB>& db) {
// ::admin::AdminException e;

common::Timer timer(kS3BackupMs);
LOG(INFO) << "S3 Backup " << db->db_name() << " to " << ensure_ends_with_pathsep(FLAGS_s3_incre_backup_prefix) << db->db_name();
auto ts = common::timeutil::GetCurrentTimestamp();
auto local_path = folly::stringPrintf("%ss3_tmp/%s%d/", rocksdb_dir_.c_str(), db->db_name().c_str(), ts);
auto local_path = folly::stringPrintf("%ss3_tmp/%s%ld/", rocksdb_dir_.c_str(), db->db_name().c_str(), ts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add directory prefix incremental to distinguish regular backup from incremental backup

boost::system::error_code remove_last_backup_err;
boost::system::error_code create_last_backup_err;
// if we have not removed the last backup, we do not need to download it
bool need_download = !boost::filesystem::exists(local_path_last_backup + backupDesc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to

  1. Only download, avoid looking into local directory.
  2. When writing, write the backup descriptor last (Since this means when we upload backup descriptor we have completely uploaded last backup with no errors).

Generally having two sources of data means we have to deal with inconsistencies between them that's why I am suggesting only use s3.


// Create the new backup descriptor file
std::string backup_desc_contents;
makeUpBackupDescString(file_to_ts, sst_id_min, sst_id_max, backup_desc_contents);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to put min_id and max_id is it an optimization or some sort of check?

std::string backup_desc_contents;
makeUpBackupDescString(file_to_ts, sst_id_min, sst_id_max, backup_desc_contents);
common::FileUtil::createFileWithContent(formatted_checkpoint_local_path, backupDesc, backup_desc_contents);
checkpoint_files.emplace_back(backupDesc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason to use emplace_back here, just use push_back

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this guarantee that backup descriptor will only be uploaded after all files get uploaded?

@qiyanghe1998
Copy link
Contributor Author

Addressed all the comments. Could you have another look? @mradwan Thanks!

Copy link
Contributor

@mradwan mradwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the coments. I left one comment regarding leaving another comment if you can leave it that would be great.

Thank you!

temp_map[item.first] = item.second;
}
contents[fileToTs] = temp_map;
contents[lastBackupTs] = last_ts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

std::unordered_map<std::string, int64_t> file_to_ts;
const string sst_suffix = ".sst";

if (last_ts >= 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case if machine restarts we wouldnt have last_ts.

Can you leave a comment that we need to handle this code doesn't handle this case well?

Thanks!

@qiyanghe1998
Copy link
Contributor Author

Pushed a new version with backup starter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants