
Always use bucket virtual hostname? #464

Closed
semaperepelitsa opened this issue Aug 29, 2018 · 18 comments

semaperepelitsa commented Aug 29, 2018

For security reasons, I'm trying to restrict my app's network access to only certain hosts that I control. One of my app's components is S3. Amazon makes this possible with virtual-hosted-style bucket hostnames - mine is semaperepelitsa-test.s3-eu-west-1.amazonaws.com. So I would like to allow only this host and block all others, but I'm not sure how to make that work with Fog.

Example:

storage = Fog::Storage.new(
  :provider => 'AWS',
  :aws_access_key_id => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'],
  :region => "eu-west-1",
)
# shared hostname: s3-eu-west-1.amazonaws.com
storage.sync_clock

# my hostname: semaperepelitsa-test.s3-eu-west-1.amazonaws.com
storage.get_object("semaperepelitsa-test", "test.txt")

With an explicit hostname, sync_clock works as I want, but get_object breaks.

storage = Fog::Storage.new(
  :provider => 'AWS',
  :aws_access_key_id => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'],
  :region => "eu-west-1",
  :host => "semaperepelitsa-test.s3-eu-west-1.amazonaws.com",
)
# my hostname: semaperepelitsa-test.s3-eu-west-1.amazonaws.com
storage.sync_clock
#=> 2018-08-29 18:47:01 +0000

# bad hostname
storage.get_object("semaperepelitsa-test", "test.txt")
# Excon::Error::Socket: hostname "semaperepelitsa-test.semaperepelitsa-test.s3-eu-west-1.amazonaws.com" does not match the server certificate (OpenSSL::SSL::SSLError)

Is there any option I'm missing, or is this not supported?

Thanks!

geemus (Member) commented Aug 29, 2018

The first one looks more like the usage I would expect, and I think it would result in calls using the desired hostname. Is there something wrong with that usage that you are trying to avoid? The other case ends up using the hostname twice, but I think that is expected given the parameters provided. I'm not totally sure I understand what you are trying to achieve, though. Is there related documentation you could point me at to help me clarify what you are after? Thanks!

semaperepelitsa (Author) commented

Sorry if I wasn't clear. I want both calls (sync_clock and get_object) to be made to my hostname: "semaperepelitsa-test.s3-eu-west-1.amazonaws.com".

I have a security barrier between my server container and the outside network, where I whitelist the hosts the container is allowed to talk to. That way, an attacker who gains access inside the container has very limited ways to communicate with the outside world. At the moment I have to whitelist the whole of Amazon S3, which means an attacker could communicate from inside the container through their own bucket. So I would like to restrict access to just my own bucket.

geemus (Member) commented Aug 30, 2018

Ah, ok. I think fog will usually try to use the bucket name as a subdomain whenever it can, which should mean that the first example tends to do what you want. Are you finding that it is not doing that as you would expect?

semaperepelitsa (Author) commented

Actually, in the first example it is using the shared hostname (s3-eu-west-1.amazonaws.com) for sync_clock, as I noted in the comments. I'm guessing it works this way because the operation is not bucket-specific, so it doesn't bother using the bucket hostname, even though it would work both ways.

I would like a solution similar to the second example, where I can set a custom hostname and it is used as-is for all operations.

geemus (Member) commented Aug 30, 2018

Ah ha. Got it, thanks for talking through it with me. The sync_clock method is really just an alias for get_service, which is not bucket-specific (and is therefore not using the subdomain). Maybe there is a better way to fix this ultimately, but given that Fog::Time uses a class variable, I can give you a workaround for now.

# sync the clock using specified hostname (basically sync against bucket instead of service)
storage = Fog::Storage.new(
  :provider => 'AWS',
  :aws_access_key_id => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'],
  :region => "eu-west-1",
  :host => "semaperepelitsa-test.s3-eu-west-1.amazonaws.com",
)
# my hostname: semaperepelitsa-test.s3-eu-west-1.amazonaws.com
storage.sync_clock
#=> 2018-08-29 18:47:01 +0000

# create new connection without host (subdomain will be set by bucket for operations)
# should still use the sync values from above, as it is for all of fog, not just per-connection
storage = Fog::Storage.new(
  :provider => 'AWS',
  :aws_access_key_id => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'],
  :region => "eu-west-1",
)
# my hostname: semaperepelitsa-test.s3-eu-west-1.amazonaws.com
storage.get_object("semaperepelitsa-test", "test.txt")

It's admittedly a bit awkward, but I think it should work. Does that make sense?

semaperepelitsa (Author) commented

The result is what I want, but I can't easily implement this because I'm using Fog through the Dragonfly library, so I would have to patch it. Anyway, I think I will just leave this as-is for now, since there are no easy solutions, and hopefully return to the issue when I have more time. Thanks!

geemus (Member) commented Aug 30, 2018

Actually, I think you could still do something like this: if you build your own connection and sync the clock, the Dragonfly usage should also pick up the synced value. So if you just create and sync in your setup, I think it would get you what you want. Might be worth a try, anyway?
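
For concreteness, something like the sketch below, reusing the bucket and region from your examples (where it runs - an initializer, say - is just an assumption about your setup):

require 'fog/aws'

# Run once during app boot, before Dragonfly makes any requests. The :host
# option pins this connection to the bucket's virtual hostname, so the
# request behind sync_clock goes to the whitelisted host.
Fog::Storage.new(
  :provider => 'AWS',
  :aws_access_key_id => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'],
  :region => "eu-west-1",
  :host => "semaperepelitsa-test.s3-eu-west-1.amazonaws.com",
).sync_clock

# Dragonfly's own connection, created later without :host, still sees the
# synced clock, because Fog::Time keeps the offset at the class level.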

semaperepelitsa (Author) commented

Dragonfly makes a call to sync_clock explicitly, so I don't think I can do much. Unless you have some kind of a global cache?

https://github.com/markevans/dragonfly-s3_data_store/blob/15ba3f39abfcce47efcc8a27dbe3381a612317a0/lib/dragonfly/s3_data_store.rb#L83-L95

geemus (Member) commented Aug 31, 2018

Yes, the clock settings are a class variable on Fog, so they should be global. If you do the sync first, Dragonfly should pick up the value. Certainly let me know if that does not appear to be the case, though. I guess that would ensure the sync worked, but it wouldn't prevent Dragonfly from re-syncing against the wrong domain... Sorry, perhaps that is a dead end as well; I'm not sure what the best solution would be.
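
To illustrate the class-level behavior (a sketch based on Fog::Time from fog-core; the exact internals may differ):

# Syncing records a global offset between local time and server time...
Fog::Time.now = Time.parse(response.headers['Date'])
# ...and every later request signature, from any connection, reads it back:
Fog::Time.now  #=> local time adjusted by the recorded offset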

github-actions commented

This issue has been marked inactive and will be closed if no further activity occurs.

duckworth (Contributor) commented

Sorry to respond to an old issue, but I am also having a problem related to storage.sync_clock.

The sync_clock method calls get_service, which performs a ListBuckets operation. If the IAM user does not have permission to ListBuckets (which I assume is common when granting bucket-specific permissions to the credentials used for fog), it generates a ton of HTTP 403 AccessDenied errors (which get logged in AWS CloudTrail).

Any thoughts on making sync_clock configurable to use an S3 operation the user has permission to perform? I have currently patched it to do a head_bucket, and it is working fine now.
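
Roughly the shape of my patch (a sketch, not the exact code; the class path and the bucket-name env var are illustrative):

# Monkey-patch sync_clock to use HeadBucket, which only needs permissions
# on the one bucket, instead of GetService, which needs s3:ListAllMyBuckets.
Fog::AWS::Storage::Real.class_eval do
  def sync_clock
    response = begin
      head_bucket(ENV['AWS_BUCKET'])
    rescue Excon::Errors::HTTPStatusError => error
      error.response  # even an error response carries a usable Date header
    end
    Fog::Time.now = Time.parse(response.headers['Date'])
  end
end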

geemus (Member) commented Jul 14, 2022

@duckworth Hey, I hadn't really considered this case. Something like that makes sense to me. Would you be open to working on a patch with the change to help us out?

duckworth (Contributor) commented

@geemus Sure - maybe an option called sync_clock_bucket_name which, if present, would do a head_bucket instead of a get_service?
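
In use it would look something like this (hypothetical - just the shape of the proposal, nothing is implemented yet):

# Hypothetical :sync_clock_bucket_name option, as proposed above.
storage = Fog::Storage.new(
  :provider => 'AWS',
  :aws_access_key_id => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'],
  :region => "eu-west-1",
  :sync_clock_bucket_name => "semaperepelitsa-test",
)
storage.sync_clock  # would issue HeadBucket on the named bucket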

geemus (Member) commented Jul 15, 2022

@duckworth - Hmm, in the little thinking I had done about it, I was considering the different request types and forgetting that we would need to specify which bucket to use for the head, which does make it a little more complicated.

My initial inclination was toward an argument like request_type, which could default to :get_service but also allow :head_bucket. I still kind of like that for clarity (and future flexibility, should we need it), but it doesn't solve the issue of passing the bucket name. We could potentially take the options as a hash, so in addition to request_type it could expect another key, perhaps :bucket for :head_bucket.

Alternatively, we could add a new, distinct method, something like service.sync_clock_via_head_bucket, which takes the bucket name as its argument. That is nice in being explicit, but could get ugly if we need to make additional variations in the future (that said, this is the first time I recall the issue coming up, so this might well be premature optimization).

Do those make sense? Which sounds better to you?
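
Side by side, the two shapes would be roughly (both hypothetical):

# Option 1: an options hash on the existing method (hypothetical)
storage.sync_clock(:request_type => :head_bucket, :bucket => "semaperepelitsa-test")

# Option 2: a separate, explicit method (hypothetical)
storage.sync_clock_via_head_bucket("semaperepelitsa-test")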

duckworth (Contributor) commented Jul 15, 2022

@geemus I am using fog indirectly through the dragonfly-s3_data_store gem, so the only control I have is through the Fog::Storage options. I have a pull request open to make it optional there: markevans/dragonfly-s3_data_store#35

I have researched a bit more, and the best option I could come up with is making a call to the base S3 URL with no authentication at all. An unauthenticated call to s3.amazonaws.com (or any valid S3 endpoint) returns a 307 redirect whose Date header serves the purpose, and it is lightweight. The only challenge I am seeing is that there doesn't seem to be any existing support for, or example of, a request without credentials. Any thoughts on that approach?

geemus (Member) commented Jul 19, 2022

@duckworth Ah, yeah, that makes total sense. Thanks for the added context; that definitely makes it a lot more troublesome to vary the calls.

I really like your idea about an unauthenticated request. As you suggested, that should always work and still get us what we need. I think we could probably drop straight down to Excon for a request like this to keep things simple. I think that could be something like:

# Fog::Time.now= is how fog records the clock offset used when signing
response = Excon.get('https://s3.amazonaws.com')
Fog::Time.now = Time.parse(response.headers['Date'])

I'm writing that off the top of my head, so it might not quite be right, but hopefully it's close. If you need examples or details, it's probably easiest to go directly to the Excon docs. Does that make sense?

Thanks again for continuing to talk through this, I think we are inching toward a much better implementation.

duckworth (Contributor) commented

@geemus OK, I created a PR: #651
I am using the @scheme and @host from Storage so it hits the configured regional S3 URL. I could change it to take an options hash that allows overriding any of the request parameters, if we think more control may be needed.
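
The core of it is roughly this (paraphrased from the PR, so details may differ slightly):

def sync_clock
  # Unauthenticated GET against the configured endpoint; even a 307
  # redirect (or a 403) response carries a Date header to sync against.
  response = Excon.get("#{@scheme}://#{@host}")
  Fog::Time.now = Time.parse(response.headers['Date'])
end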

geemus (Member) commented Jul 21, 2022

@duckworth Awesome, thanks again. Great catch on using the provided scheme/host to be a bit more flexible. I like the idea of keeping it simple for now; we can always add extra parameters later if/when we find a clear need for them. I'll check back on it in a bit and should be able to merge it as long as tests pass.
