Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
add beginnings of mdbook for documentation
1 parent
9596e0b
commit 406d3fc
Showing 14 changed files with 1,842 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,6 @@ | ||
/target | ||
/creds/aws | ||
/creds/gcp | ||
.vscode | ||
.vscode | ||
|
||
/docs/book |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
[book] | ||
authors = ["Tim Dikland"] | ||
language = "en" | ||
multilingual = false | ||
src = "src" | ||
title = "Delta Sharing Server" | ||
|
||
[preprocessor] | ||
|
||
[preprocessor.mermaid] | ||
command = "mdbook-mermaid" | ||
|
||
[output] | ||
|
||
[output.html] | ||
additional-js = ["mermaid.min.js", "mermaid-init.js"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
mermaid.initialize({startOnLoad:true}); |
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Introduction | ||
|
||
[Delta Sharing](https://delta.io/sharing/) is an open protocol for secure sharing of large datasets. The protocol enables sharing of data in real time, independent of the computing platform that is used to read the datasets. At the heart of Delta Sharing is the [REST protocol](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md) that defines the API clients use to obtain in-place access to shared datasets. | ||
|
||
TODO |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# Summary | ||
|
||
[Introduction](README.md) | ||
|
||
# User Guide | ||
|
||
- [Getting Started](./user_guide/quickstart.md) | ||
|
||
# Developer Guide | ||
|
||
- [Overview](./developer_guide/overview.md) | ||
- [Protocol router](./developer_guide/overview.md) | ||
- [Authentication](./developer_guide/protocol/authentication.md) | ||
- [Catalog](./developer_guide/protocol/catalog.md) | ||
- [Reader](./developer_guide/protocol/reader.md) | ||
- [Signer](./developer_guide/protocol/signer.md) | ||
- [Admin router](./developer_guide/admin/README.md) | ||
- [Discovery router](./developer_guide/discovery/README.md) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Admin router |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Discovery router |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# Overview | ||
|
||
The Delta Sharing Server first and foremost implements the Delta Sharing protocol. The protocol is a REST API that allows clients to discover and query Delta Tables. The server is responsible for authenticating requests, looking up table details, and creating signed URLs to the data files that contain the relevant table data. | ||
|
||
The high-level process of querying a Delta Table from a Delta Sharing Client is as follows: | ||
|
||
```mermaid | ||
sequenceDiagram | ||
Delta Sharing Client->>Delta Sharing Server: Query Delta Table | ||
Delta Sharing Server->>Delta Sharing Server: Authenticate request | ||
Delta Sharing Server-->>Delta Sharing Client: Unauthorized | ||
Delta Sharing Server->>Delta Sharing Server: Lookup table details | ||
Delta Sharing Server-->>Delta Sharing Client: Not found / Forbidden | ||
Delta Sharing Server->>Object Storage: Read delta log | ||
Object Storage->>Delta Sharing Server: relevant actions | ||
Delta Sharing Server->>Delta Sharing Server: Sign Delta Table actions | ||
Delta Sharing Server->>Delta Sharing Client: Return signed Delta Table actions | ||
Delta Sharing Client->>Object Storage: Fetch data from signed parquet files | ||
Object Storage->>Delta Sharing Client: Return data | ||
``` | ||
|
||
The Delta Sharing Server is thus responsible for the following: | ||
|
||
- Authentication of HTTP requests from Delta Sharing clients (i.e. recipients) | ||
- Querying a repository of shared Delta tables with details including the location of the data files in (cloud) object storage | ||
- Interacting with the object storage to replay the Delta log to find the data files for the requested table | ||
- Generating signed URLs to the data files that contain the requested table data | ||
|
||
The Delta Sharing Server has abstractions for these components that can be implemented to support different authentication mechanisms, storage backends, and table discovery strategies. These abstractions are defined using traits and can be implemented by users to customize the server to their needs. | ||
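
To make the trait-based design concrete, here is a minimal sketch of how such abstractions could compose into the request flow described above. All names in it (`Authenticator`, `Catalog`, `Reader`, `UrlSigner`, `query_table`, `InMemory`) are hypothetical and chosen for illustration; they are not the server's actual traits, and the one-hour expiry is an arbitrary assumption.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Identity of the caller (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub struct Recipient(pub String);

/// Failures corresponding to the early-exit arrows in the sequence diagram.
#[derive(Debug, PartialEq)]
pub enum QueryError {
    Unauthorized,
    NotFound,
}

/// Hypothetical abstraction: maps a bearer token to a recipient.
pub trait Authenticator {
    fn authenticate(&self, token: &str) -> Option<Recipient>;
}

/// Hypothetical abstraction: resolves a shared table to its storage location.
pub trait Catalog {
    fn table_location(&self, recipient: &Recipient, table: &str) -> Option<String>;
}

/// Hypothetical abstraction: replays the Delta log and lists data file URIs.
pub trait Reader {
    fn data_files(&self, location: &str) -> Vec<String>;
}

/// Hypothetical abstraction: turns a storage URI into a time-limited URL.
pub trait UrlSigner {
    fn sign(&self, uri: &str, expires_in: Duration) -> String;
}

/// Authenticate, look up the table, read the log, then sign every data file.
pub fn query_table(
    auth: &dyn Authenticator,
    catalog: &dyn Catalog,
    reader: &dyn Reader,
    signer: &dyn UrlSigner,
    token: &str,
    table: &str,
) -> Result<Vec<String>, QueryError> {
    let recipient = auth.authenticate(token).ok_or(QueryError::Unauthorized)?;
    let location = catalog
        .table_location(&recipient, table)
        .ok_or(QueryError::NotFound)?;
    let expires_in = Duration::from_secs(3600); // assumed expiry, for illustration
    Ok(reader
        .data_files(&location)
        .into_iter()
        .map(|uri| signer.sign(&uri, expires_in))
        .collect())
}

/// In-memory stand-in implementing all four traits, for illustration only.
pub struct InMemory {
    pub tokens: HashMap<String, Recipient>,
    pub tables: HashMap<String, String>,
}

impl Authenticator for InMemory {
    fn authenticate(&self, token: &str) -> Option<Recipient> {
        self.tokens.get(token).cloned()
    }
}

impl Catalog for InMemory {
    fn table_location(&self, _recipient: &Recipient, table: &str) -> Option<String> {
        self.tables.get(table).cloned()
    }
}

impl Reader for InMemory {
    fn data_files(&self, location: &str) -> Vec<String> {
        // Pretends the Delta log references a single data file.
        vec![format!("{}/part-0000.parquet", location)]
    }
}

impl UrlSigner for InMemory {
    fn sign(&self, uri: &str, expires_in: Duration) -> String {
        // Not a real signature; only shows where signing would happen.
        format!("{}?expires_in={}", uri, expires_in.as_secs())
    }
}
```

Because `query_table` only depends on trait objects, each step can be swapped independently, for example a different catalog backend with the same signer.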
|
||
The Delta Sharing Server is implemented in Rust and uses the [Axum](https://github.com/tokio-rs/axum) web framework for handling HTTP requests. The server is designed to be fast and efficient, and can be deployed as a standalone server or as a library in a larger application. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# Authentication middleware | ||
|
||
Like other areas of the Delta Sharing server, it is possible to extend the server by implementing your own authentication middleware. | ||
|
||
## How is authentication/authorization handled? | ||
|
||
The handlers for all of the routes in the Delta Sharing protocol router expect a request extension with the `RecipientId`. If this extension is not set, the handler will return an error response saying the request is unauthenticated. | ||
The `RecipientId` is the type that identifies the client that is calling the server (or is set to `RecipientId::Unknown` if the client could/should not be identified). | ||
Once the request reaches the route handlers, the `RecipientId` is used to determine if the client has the necessary permissions to access the requested data. | ||
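
The handler-side check can be pictured with a small, dependency-free model. This is a sketch under assumptions: `Extensions` here is a simplified stand-in for HTTP request extensions, the `RecipientId` variants and the `handle` function are hypothetical, and the permission rule (unknown recipients may only read a table named `public`) is invented purely for illustration.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Simplified stand-in for the server's `RecipientId` (hypothetical variants).
#[derive(Debug, Clone, PartialEq)]
pub enum RecipientId {
    Unknown,
    Known(String),
}

/// Minimal type-keyed map, modelled on HTTP request extensions.
#[derive(Default)]
pub struct Extensions {
    values: HashMap<TypeId, Box<dyn Any>>,
}

impl Extensions {
    /// Stores a value, replacing any previous value of the same type.
    pub fn insert<T: Any>(&mut self, value: T) {
        self.values.insert(TypeId::of::<T>(), Box::new(value));
    }

    /// Returns the stored value of type `T`, if any.
    pub fn get<T: Any>(&self) -> Option<&T> {
        self.values
            .get(&TypeId::of::<T>())
            .and_then(|value| value.downcast_ref::<T>())
    }
}

#[derive(Debug, PartialEq)]
pub enum HandlerError {
    Unauthenticated,
    Forbidden,
}

/// Route handler: fails without a `RecipientId`, then checks permissions.
pub fn handle(extensions: &Extensions, table: &str) -> Result<String, HandlerError> {
    let recipient = extensions
        .get::<RecipientId>()
        .ok_or(HandlerError::Unauthenticated)?;
    match recipient {
        RecipientId::Known(name) => Ok(format!("{} may read {}", name, table)),
        // Invented rule: unidentified clients only see the `public` table.
        RecipientId::Unknown if table == "public" => Ok(format!("anyone may read {}", table)),
        RecipientId::Unknown => Err(HandlerError::Forbidden),
    }
}
```

The split matters: a missing extension means no middleware ran at all (unauthenticated), whereas a present but insufficient identity is a permission failure.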
|
||
### Example | ||
|
||
An example of custom middleware can be found below. In this example the middleware authenticates requests based on a hardcoded password. If the password is correct, the `RecipientId` is set to `RecipientId::anonymous()` and the request proceeds to the route handler. If the password is incorrect, the middleware returns an unauthorized response. | ||
|
||
```rust
use std::sync::Arc;

use axum::{
    extract::Request,
    http::header::AUTHORIZATION,
    middleware::{self, Next},
    response::Response,
};
use tokio::net::TcpListener;
// `RecipientId`, `ServerError`, `SharingServerState` and
// `build_sharing_server_router` come from the Delta Sharing server crate.

const SUPER_SECRET_PASSWORD: &str = "delta-sharing-is-caring";

async fn auth(mut request: Request, next: Next) -> Result<Response, ServerError> {
    // Read the raw `Authorization` header; a value that is not valid
    // visible ASCII is treated like a missing header instead of panicking.
    let token = request
        .headers()
        .get(AUTHORIZATION)
        .and_then(|value| value.to_str().ok());

    if token == Some(SUPER_SECRET_PASSWORD) {
        // Define the recipient before logging it and attaching it to the request.
        let client_id = RecipientId::anonymous();
        tracing::info!(client_id=%client_id, "authorized");
        request.extensions_mut().insert(client_id);

        let response = next.run(request).await;
        return Ok(response);
    }

    Err(ServerError::unauthorized(""))
}

#[tokio::main]
async fn main() {
    let mut state = SharingServerState::new(...);
    let svc = build_sharing_server_router(Arc::new(state));

    // Add custom authentication middleware here
    let app = svc.layer(middleware::from_fn(auth));

    let listener = TcpListener::bind("127.0.0.1:0")
        .await
        .expect("Could not bind to socket");
    axum::serve(listener, app).await.expect("server error");
}
```
|
||
## What's in the box? | ||
|
||
The Delta Sharing library comes with a pre-built authentication middleware that can be used out of the box. | ||
|
||
// TODO: write about pre-built middleware |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Catalog | ||
|
||
TODO |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Reader | ||
|
||
TODO |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
# Signer | ||
|
||
The signer is a component of the protocol router that is responsible for creating signed URLs to the data files that contain the relevant table data. The signer is responsible for ensuring that the client has the necessary permissions to access the data files and that the URLs are only valid for a limited time. | ||
|
||
## How is signing handled? | ||
|
||
The signer is defined by the following trait: | ||
|
||
```rust | ||
trait Signer: Send + Sync { | ||
fn sign(&self, uri: &str, expires_in: Duration) -> Result<SignedUrl, SignerError>; | ||
} | ||
``` | ||
|
||
Implementing this trait allows users to customize the signing process to their needs. The `sign` method takes a URI, which is typically cloud-specific (e.g. `s3://my-data-bucket/my-table/part1-0000.snappy.parquet`), and a `Duration` for how long the signed URL should be valid. The signer should return a `SignedUrl` that contains the signed URL and the expiration time. | ||
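
As a rough illustration of the shape of an implementation, the sketch below defines a toy signer. It rests on several assumptions: `SignedUrl` and `SignerError` are hypothetical stand-ins whose fields and variants are guesses, the trait is repeated only so the snippet is self-contained, and `DummyS3Signer` performs no cryptographic signing at all (the `example.com` URL and `expires_in` query parameter are placeholders).

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for the library's `SignedUrl`.
#[derive(Debug, Clone, PartialEq)]
pub struct SignedUrl {
    pub url: String,
    pub expires_at: SystemTime,
}

/// Hypothetical stand-in for the library's `SignerError`.
#[derive(Debug, PartialEq)]
pub enum SignerError {
    UnsupportedUri(String),
}

/// Same shape as the trait above, repeated for self-containment.
pub trait Signer: Send + Sync {
    fn sign(&self, uri: &str, expires_in: Duration) -> Result<SignedUrl, SignerError>;
}

/// Toy signer: rewrites `s3://bucket/key` to an HTTPS URL with an expiry
/// parameter. It does NOT produce a real signature.
pub struct DummyS3Signer;

impl Signer for DummyS3Signer {
    fn sign(&self, uri: &str, expires_in: Duration) -> Result<SignedUrl, SignerError> {
        let unsupported = || SignerError::UnsupportedUri(uri.to_string());

        // Split `s3://bucket/key` into its bucket and object key.
        let path = uri.strip_prefix("s3://").ok_or_else(unsupported)?;
        let (bucket, key) = path.split_once('/').ok_or_else(unsupported)?;

        Ok(SignedUrl {
            url: format!(
                "https://{}.example.com/{}?expires_in={}",
                bucket,
                key,
                expires_in.as_secs()
            ),
            expires_at: SystemTime::now() + expires_in,
        })
    }
}
```

A real signer would delegate to the storage provider's presigning mechanism; the point here is only the input and output contract of `sign`.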
|
||
### Example | ||
|
||
// TODO: create good example | ||
|
||
### Configuring multiple signers | ||
|
||
It is possible that tables that are shared using Delta Sharing are stored in different cloud storage services. In this case, the Delta Sharing server can be configured with multiple signers, each responsible for signing URLs for a specific cloud storage service. To make sure that the correct signer is used, one could implement a simple registry and use it to look up the correct signer based on the URI. | ||
|
||
```rust
use std::collections::HashMap;
use std::time::Duration;
// `Signer`, `SignedUrl` and `SignerError` come from the Delta Sharing server crate.

struct SignerRegistry {
    registry: HashMap<String, Box<dyn Signer>>,
}

impl SignerRegistry {
    /// Registers one signer per URI scheme.
    fn new(s3_signer: Box<dyn Signer>, gcs_signer: Box<dyn Signer>) -> Self {
        let mut registry = HashMap::new();
        registry.insert("s3".to_string(), s3_signer);
        registry.insert("gs".to_string(), gcs_signer);
        Self { registry }
    }

    /// Looks up the signer by URI scheme, e.g. `s3` for `s3://bucket/key`.
    fn get_signer(&self, uri: &str) -> Option<&dyn Signer> {
        let (scheme, _) = uri.split_once("://")?;
        self.registry.get(scheme).map(|signer| signer.as_ref())
    }
}

impl Signer for SignerRegistry {
    fn sign(&self, uri: &str, expires_in: Duration) -> Result<SignedUrl, SignerError> {
        // Panics for an unregistered scheme; a production server should
        // return a `SignerError` instead.
        let signer = self
            .get_signer(uri)
            .expect("no signer registered for this URI scheme");
        signer.sign(uri, expires_in)
    }
}
```
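
One design question a registry raises is what to do with a URI whose scheme has no signer. The variant below is a sketch that reports this as an error rather than panicking. Everything in it is an assumption for illustration: the simplified `SignedUrl`, the `NoSignerForScheme` variant, `FallibleRegistry`, and the `TaggingSigner` test double are not part of the library.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Simplified stand-in for the library's `SignedUrl`.
#[derive(Debug, PartialEq)]
pub struct SignedUrl(pub String);

/// Hypothetical error type with a variant for unknown schemes.
#[derive(Debug, PartialEq)]
pub enum SignerError {
    NoSignerForScheme(String),
}

pub trait Signer: Send + Sync {
    fn sign(&self, uri: &str, expires_in: Duration) -> Result<SignedUrl, SignerError>;
}

/// Registry that dispatches on the URI scheme and never panics.
pub struct FallibleRegistry {
    signers: HashMap<String, Box<dyn Signer>>,
}

impl FallibleRegistry {
    pub fn new() -> Self {
        Self { signers: HashMap::new() }
    }

    /// Associates a signer with a scheme such as `s3` or `gs`.
    pub fn register(&mut self, scheme: &str, signer: Box<dyn Signer>) {
        self.signers.insert(scheme.to_string(), signer);
    }
}

impl Signer for FallibleRegistry {
    fn sign(&self, uri: &str, expires_in: Duration) -> Result<SignedUrl, SignerError> {
        // A URI without `://` yields an empty scheme, which matches no signer.
        let scheme = uri.split_once("://").map(|(scheme, _)| scheme).unwrap_or("");
        match self.signers.get(scheme) {
            Some(signer) => signer.sign(uri, expires_in),
            None => Err(SignerError::NoSignerForScheme(scheme.to_string())),
        }
    }
}

/// Test double that prefixes the URI with a fixed label instead of signing.
pub struct TaggingSigner(pub &'static str);

impl Signer for TaggingSigner {
    fn sign(&self, uri: &str, _expires_in: Duration) -> Result<SignedUrl, SignerError> {
        Ok(SignedUrl(format!("{}:{}", self.0, uri)))
    }
}
```

Returning an error lets the route handler translate an unsupported storage location into a proper HTTP error response instead of aborting the request task.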
|
||
## What's in the box? | ||
|
||
The Delta Sharing library comes with pre-built signers for common cloud storage services like S3, GCS, and Azure Blob Storage. These signers implement the `Signer` trait and can be used directly in the Delta Sharing server configuration. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Getting Started | ||
|
||
TODO |