Merge branch 'databricks:main' into aws-jd
JDBraun authored Sep 10, 2024
2 parents 535c18b + 8f5356c commit 9d15a35
Showing 7 changed files with 50 additions and 60 deletions.
4 changes: 2 additions & 2 deletions aws-gov/README.md
@@ -1,4 +1,4 @@
-# Security Reference Architecture Template
+# Security Reference Architectures (SRA) - Terraform Templates


## Introduction
@@ -21,7 +21,7 @@ There are four separate operation modes you can choose for the underlying networ

- **Sandbox**: Sandbox or open egress. Selecting 'sandbox' as the operation mode allows traffic to flow freely to the public internet. This mode is suitable for sandbox or development scenarios where data exfiltration protection is of minimal concern, and developers need to access public APIs, packages, and more.

-- **Firewall**: Firewall or limited egress. Choosing 'firewall' as the operation mode permits traffic flow only to a selected list of public addresses. This mode is applicable in situations where open internet access is necessary for certain tasks, but unfiltered traffic is not an option due to the sensitivity of the workloads or data. **NOTE**: Due to a limitation in the AWS Network Firewall's ability to use fully qualified domain names for non-HTTP/HTTPS traffic, an external data source is required for the external Hive metastore. For production scenarios, we recommend using Unity Catalog or self-hosted Hive metastores.
+- **Firewall**: Firewall or limited egress. Choosing 'firewall' as the operation mode permits traffic flow only to a selected list of public addresses. This mode is applicable in situations where open internet access is necessary for certain tasks, but unfiltered traffic is not an option due to the sensitivity of the workloads or data. **NOTE**: Due to a limitation in the AWS Network Firewall's ability to use fully qualified domain names for non-HTTP/HTTPS traffic, an external data source is required for the external Hive metastore. For sensitive production workloads, it is recommended to use isolated operation mode and Unity Catalog, a self-hosted Hive metastore, or to explore other firewall services to address AWS Network Firewall's limitations.

- **Isolated**: Isolated or no egress. Opting for 'isolated' as the operation mode prevents any traffic to the public internet. Traffic is limited to AWS private endpoints, either to AWS services or the Databricks control plane. This mode should be used in cases where access to the public internet is completely unsupported. **NOTE**: Apache Derby Metastore will be required for clusters and non-serverless SQL Warehouses. For more information, please view this [knowledge article](https://kb.databricks.com/metastore/set-up-embedded-metastore).

34 changes: 23 additions & 11 deletions aws-gov/tf/modules/sra/data_plane_hardening/firewall/firewall.tf
@@ -186,13 +186,8 @@ resource "aws_networkfirewall_rule_group" "databricks_fqdn_allowlist" {
   }
 }

-// Data for IP allow list
-data "external" "metastore_ip" {
-  program = ["sh", "${path.module}/metastore_ip.sh"]
-
-  query = {
-    metastore_domain = var.hive_metastore_fqdn
-  }
+data "dns_a_record_set" "metastore_dns" {
+  host = var.hive_metastore_fqdn
 }

 // JDBC Firewall group IP allow list
@@ -205,10 +200,28 @@ resource "aws_networkfirewall_rule_group" "databricks_metastore_allowlist" {
       rule_order = "STRICT_ORDER"
     }
     rules_source {
+      dynamic "stateful_rule" {
+        for_each = toset(data.dns_a_record_set.metastore_dns.addrs)
+        content {
+          action = "PASS"
+          header {
+            destination = stateful_rule.value
+            destination_port = 3306
+            direction = "FORWARD"
+            protocol = "TCP"
+            source = "ANY"
+            source_port = "ANY"
+          }
+          rule_option {
+            keyword = "sid"
+            settings = ["1"]
+          }
+        }
+      }
       stateful_rule {
-        action = "PASS"
+        action = "DROP"
         header {
-          destination = data.external.metastore_ip.result["ip"]
+          destination = "0.0.0.0/0"
           destination_port = 3306
           direction = "FORWARD"
           protocol = "TCP"
@@ -217,7 +230,7 @@ resource "aws_networkfirewall_rule_group" "databricks_metastore_allowlist" {
         }
         rule_option {
           keyword = "sid"
-          settings = ["1"]
+          settings = ["2"]
         }
       }
     }
@@ -250,7 +263,6 @@ resource "aws_networkfirewall_firewall_policy" "databricks_nfw_policy" {
       priority = 2
       resource_arn = aws_networkfirewall_rule_group.databricks_metastore_allowlist.arn
     }
-
   }

   tags = {

This file was deleted.

@@ -3,5 +3,8 @@ terraform {
     aws = {
       source = "hashicorp/aws"
     }
+    dns = {
+      source = "hashicorp/dns"
+    }
   }
}
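
Note on the `hashicorp/dns` provider added in this hunk: it backs the `dns_a_record_set` data source that replaces the `external` data source and `metastore_ip.sh`. The lookup runs wherever Terraform runs, so the allow list reflects what that machine's resolver returns. The following is a minimal sketch, not part of this commit, of guarding against an empty answer. It assumes Terraform 1.2 or later (for `postcondition`); the variable description and error message are illustrative.

```hcl
terraform {
  required_providers {
    dns = {
      source = "hashicorp/dns"
    }
  }
}

variable "hive_metastore_fqdn" {
  type        = string
  description = "FQDN of the external Hive metastore (illustrative description)"
}

data "dns_a_record_set" "metastore_dns" {
  host = var.hive_metastore_fqdn

  lifecycle {
    # Fail the plan early rather than building a rule group
    # that would contain no PASS rule for the metastore.
    postcondition {
      condition     = length(self.addrs) > 0
      error_message = "No A records were returned for the Hive metastore FQDN."
    }
  }
}
```

Whether such a guard is needed depends on how the provider reports a failed lookup in a given environment; treat it as an optional safety net.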
40 changes: 19 additions & 21 deletions aws/tf/modules/sra/data_plane_hardening/firewall/firewall.tf
@@ -186,13 +186,8 @@ resource "aws_networkfirewall_rule_group" "databricks_fqdn_allowlist" {
   }
 }

-// Data for IP allow list
-data "external" "metastore_ip" {
-  program = ["sh", "${path.module}/metastore_ip.sh"]
-
-  query = {
-    metastore_domain = var.hive_metastore_fqdn
-  }
+data "dns_a_record_set" "metastore_dns" {
+  host = var.hive_metastore_fqdn
 }

 // JDBC Firewall group IP allow list
@@ -205,19 +200,22 @@ resource "aws_networkfirewall_rule_group" "databricks_metastore_allowlist" {
       rule_order = "STRICT_ORDER"
     }
     rules_source {
-      stateful_rule {
-        action = "PASS"
-        header {
-          destination = data.external.metastore_ip.result["ip"]
-          destination_port = 3306
-          direction = "FORWARD"
-          protocol = "TCP"
-          source = "ANY"
-          source_port = "ANY"
-        }
-        rule_option {
-          keyword = "sid"
-          settings = ["1"]
+      dynamic "stateful_rule" {
+        for_each = toset(data.dns_a_record_set.metastore_dns.addrs)
+        content {
+          action = "PASS"
+          header {
+            destination = stateful_rule.value
+            destination_port = 3306
+            direction = "FORWARD"
+            protocol = "TCP"
+            source = "ANY"
+            source_port = "ANY"
+          }
+          rule_option {
+            keyword = "sid"
+            settings = ["1"]
+          }
         }
       }
       stateful_rule {
@@ -288,4 +286,4 @@ resource "aws_networkfirewall_firewall" "nfw" {
     Name = "${var.resource_prefix}-${var.region}-databricks-nfw"
     Project = var.resource_prefix
   }
-}
+}
13 changes: 0 additions & 13 deletions aws/tf/modules/sra/data_plane_hardening/firewall/metastore_ip.sh

This file was deleted.

3 changes: 3 additions & 0 deletions aws/tf/modules/sra/data_plane_hardening/firewall/provider.tf
@@ -3,5 +3,8 @@ terraform {
     aws = {
       source = "hashicorp/aws"
     }
+    dns = {
+      source = "hashicorp/dns"
+    }
   }
}
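
Note on the `dynamic "stateful_rule"` block in both `firewall.tf` files: every generated rule carries `sid` `1`. If the metastore FQDN resolves to more than one address, the rule group would contain repeated signature IDs, which Suricata-style rule sets generally expect to be unique. The following is a sketch, not part of this commit, that derives a distinct `sid` from each address's position in the list. The resource label, `name`, and `capacity` are placeholders, and the remaining structure mirrors the hunks above.

```hcl
terraform {
  required_providers {
    aws = { source = "hashicorp/aws" }
    dns = { source = "hashicorp/dns" }
  }
}

variable "hive_metastore_fqdn" {
  type = string
}

data "dns_a_record_set" "metastore_dns" {
  host = var.hive_metastore_fqdn
}

locals {
  # Sort so each address keeps a stable index, and therefore a stable sid.
  metastore_addrs = sort(data.dns_a_record_set.metastore_dns.addrs)
}

# Placeholder label, name, and capacity; not the names used in the repository.
resource "aws_networkfirewall_rule_group" "metastore_allowlist_sketch" {
  capacity = 100
  name     = "metastore-allowlist-sketch"
  type     = "STATEFUL"

  rule_group {
    stateful_rule_options {
      rule_order = "STRICT_ORDER"
    }
    rules_source {
      # One PASS rule per resolved address: key = address, value = list index.
      dynamic "stateful_rule" {
        for_each = { for idx, addr in local.metastore_addrs : addr => idx }
        content {
          action = "PASS"
          header {
            destination      = stateful_rule.key
            destination_port = 3306
            direction        = "FORWARD"
            protocol         = "TCP"
            source           = "ANY"
            source_port      = "ANY"
          }
          rule_option {
            keyword  = "sid"
            settings = [tostring(stateful_rule.value + 1)] # sids 1..N
          }
        }
      }
      # Catch-all DROP for other port 3306 traffic, with the next free sid.
      stateful_rule {
        action = "DROP"
        header {
          destination      = "0.0.0.0/0"
          destination_port = 3306
          direction        = "FORWARD"
          protocol         = "TCP"
          source           = "ANY"
          source_port      = "ANY"
        }
        rule_option {
          keyword  = "sid"
          settings = [tostring(length(local.metastore_addrs) + 1)]
        }
      }
    }
  }
}
```

With a single A record this reduces to the commit's numbering (PASS with `sid` `1`, DROP with `sid` `2`); the difference only appears when several addresses are returned.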
