add beginnings of mdbook for documentation
TimDikland-DB committed Mar 28, 2024
1 parent 9596e0b commit 406d3fc
Showing 14 changed files with 1,842 additions and 1 deletion.
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,4 +1,6 @@
/target
/creds/aws
/creds/gcp
.vscode

/docs/book
16 changes: 16 additions & 0 deletions docs/book.toml
@@ -0,0 +1,16 @@
[book]
authors = ["Tim Dikland"]
language = "en"
multilingual = false
src = "src"
title = "Delta Sharing Server"

[preprocessor]

[preprocessor.mermaid]
command = "mdbook-mermaid"

[output]

[output.html]
additional-js = ["mermaid.min.js", "mermaid-init.js"]
1 change: 1 addition & 0 deletions docs/mermaid-init.js
@@ -0,0 +1 @@
mermaid.initialize({startOnLoad:true});
1,648 changes: 1,648 additions & 0 deletions docs/mermaid.min.js

5 changes: 5 additions & 0 deletions docs/src/README.md
@@ -0,0 +1,5 @@
# Introduction

[Delta Sharing](https://delta.io/sharing/) is an open protocol for secure sharing of large datasets. The protocol enables sharing data in real time, independent of the computing platform used to read the datasets. At the heart of Delta Sharing is the [REST protocol](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md), which defines the API that clients use to obtain in-place access to shared datasets.

TODO
18 changes: 18 additions & 0 deletions docs/src/SUMMARY.md
@@ -0,0 +1,18 @@
# Summary

[Introduction](README.md)

# User Guide

- [Getting Started](./user_guide/quickstart.md)

# Developer Guide

- [Overview](./developer_guide/overview.md)
- [Protocol router](./developer_guide/overview.md)
    - [Authentication](./developer_guide/protocol/authentication.md)
    - [Catalog](./developer_guide/protocol/catalog.md)
    - [Reader](./developer_guide/protocol/reader.md)
    - [Signer](./developer_guide/protocol/signer.md)
- [Admin router](./developer_guide/admin/README.md)
- [Discovery router](./developer_guide/discovery/README.md)
1 change: 1 addition & 0 deletions docs/src/developer_guide/admin/README.md
@@ -0,0 +1 @@
# Admin router
1 change: 1 addition & 0 deletions docs/src/developer_guide/discovery/README.md
@@ -0,0 +1 @@
# Discovery router
31 changes: 31 additions & 0 deletions docs/src/developer_guide/overview.md
@@ -0,0 +1,31 @@
# Overview

The Delta Sharing Server first and foremost implements the Delta Sharing protocol: a REST API that allows clients to discover and query Delta tables. The server is responsible for authenticating requests, looking up table details, and creating signed URLs to the data files that contain the relevant table data.
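As a rough illustration of the REST surface involved, the sketch below maps an HTTP method and path to a few endpoints defined in the Delta Sharing protocol (list shares, get share, list schemas, list tables, list all tables, table metadata, and table query). The `Route` enum and `route` function are hypothetical and are not the server's actual Axum router; the sketch assumes the deployment's URL prefix has already been stripped and omits several protocol endpoints.

```rust
/// A subset of the Delta Sharing REST endpoints (hypothetical, not the crate's router).
#[derive(Debug, PartialEq)]
enum Route<'a> {
    ListShares,
    GetShare { share: &'a str },
    ListSchemas { share: &'a str },
    ListTables { share: &'a str, schema: &'a str },
    ListAllTables { share: &'a str },
    TableMetadata { share: &'a str, schema: &'a str, table: &'a str },
    QueryTable { share: &'a str, schema: &'a str, table: &'a str },
}

/// Maps an HTTP method and a prefix-free path to a protocol route,
/// or `None` if the request matches none of the endpoints above.
fn route<'a>(method: &str, path: &'a str) -> Option<Route<'a>> {
    let segments: Vec<&'a str> = path.trim_matches('/').split('/').collect();
    // Matching on the slice place copies each `&str` segment into the bindings.
    let route = match segments[..] {
        ["shares"] => Route::ListShares,
        ["shares", share] => Route::GetShare { share },
        ["shares", share, "schemas"] => Route::ListSchemas { share },
        ["shares", share, "schemas", schema, "tables"] => Route::ListTables { share, schema },
        ["shares", share, "all-tables"] => Route::ListAllTables { share },
        ["shares", share, "schemas", schema, "tables", table, "metadata"] => {
            Route::TableMetadata { share, schema, table }
        }
        ["shares", share, "schemas", schema, "tables", table, "query"] => {
            Route::QueryTable { share, schema, table }
        }
        _ => return None,
    };
    // Table queries are POST requests; the other endpoints in this sketch are GETs.
    let expected_method = match route {
        Route::QueryTable { .. } => "POST",
        _ => "GET",
    };
    (method == expected_method).then_some(route)
}
```

For example, `route("POST", "/shares/s1/schemas/default/tables/t1/query")` yields a `QueryTable` route, while the same path with `GET` is rejected.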

The high-level process of querying a Delta table from a Delta Sharing client is as follows:

```mermaid
sequenceDiagram
    participant Client as Delta Sharing Client
    participant Server as Delta Sharing Server
    participant Storage as Object Storage
    Client->>Server: Query Delta table
    Server->>Server: Authenticate request
    opt Authentication fails
        Server-->>Client: Unauthorized
    end
    Server->>Server: Look up table details
    opt Table missing or access denied
        Server-->>Client: Not found / Forbidden
    end
    Server->>Storage: Read Delta log
    Storage-->>Server: Relevant actions
    Server->>Server: Sign Delta table actions
    Server-->>Client: Return signed Delta table actions
    Client->>Storage: Fetch data from signed Parquet file URLs
    Storage-->>Client: Return data
```

The Delta Sharing Server is thus responsible for the following:

- Authenticating HTTP requests from Delta Sharing clients (i.e., recipients)
- Looking up shared Delta tables in a repository that records table details, including the location of the data files in (cloud) object storage
- Replaying the Delta log in object storage to find the data files for the requested table
- Generating signed URLs to the data files that contain the requested table data
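The log replay step can be illustrated with a heavily simplified sketch: commits are applied in version order, an `add` action makes a file active and a `remove` action retires it. The `Action` enum and `active_files` function are hypothetical; a real Delta log reader also has to handle checkpoints, metadata and protocol actions, deletion vectors, and many more fields per action.

```rust
use std::collections::BTreeSet;

/// Simplified Delta log file actions (real entries carry many more fields).
enum Action {
    Add { path: String },
    Remove { path: String },
}

/// Replays commits in version order and returns the data files that are still active.
fn active_files(commits: &[Vec<Action>]) -> BTreeSet<String> {
    let mut files = BTreeSet::new();
    for commit in commits {
        for action in commit {
            match action {
                // An `add` makes the file part of the current table snapshot.
                Action::Add { path } => {
                    files.insert(path.clone());
                }
                // A `remove` drops the file from the snapshot.
                Action::Remove { path } => {
                    files.remove(path);
                }
            }
        }
    }
    files
}
```

Only the paths that survive the replay would then be handed to the signer.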

The Delta Sharing Server has abstractions for these components that can be implemented to support different authentication mechanisms, storage backends, and table discovery strategies. These abstractions are defined using traits and can be implemented by users to customize the server to their needs.
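To illustrate how trait-based components can be combined, here is a minimal sketch of the request flow from the sequence diagram, with each component behind its own trait. The traits `Authenticator`, `TableRepository`, and `UrlSigner`, the `Server` struct, and the toy implementations are all hypothetical and only mirror the responsibilities listed above; they are not the crate's actual traits. Log replay is stubbed out by passing the file list in directly.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hypothetical component traits; the crate's real names and signatures may differ.
trait Authenticator {
    /// Returns the recipient identity for a valid token.
    fn authenticate(&self, token: Option<&str>) -> Option<String>;
}

trait TableRepository {
    /// Returns the table's storage location if `recipient` may read it.
    fn lookup(&self, recipient: &str, table: &str) -> Result<String, QueryError>;
}

trait UrlSigner {
    fn sign(&self, uri: &str, expires_in: Duration) -> String;
}

#[derive(Debug, PartialEq)]
enum QueryError {
    Unauthorized,
    NotFound,
    Forbidden,
}

/// Generic over its components, so each one can be swapped independently.
struct Server<A, R, S> {
    auth: A,
    repo: R,
    signer: S,
}

impl<A: Authenticator, R: TableRepository, S: UrlSigner> Server<A, R, S> {
    /// `files` stands in for the result of replaying the table's Delta log.
    fn query(&self, token: Option<&str>, table: &str, files: &[&str]) -> Result<Vec<String>, QueryError> {
        let recipient = self.auth.authenticate(token).ok_or(QueryError::Unauthorized)?;
        let location = self.repo.lookup(&recipient, table)?;
        let ttl = Duration::from_secs(15 * 60);
        Ok(files.iter().map(|f| self.signer.sign(&format!("{location}/{f}"), ttl)).collect())
    }
}

// Toy implementations used to exercise the flow.
struct StaticToken(&'static str);

impl Authenticator for StaticToken {
    fn authenticate(&self, token: Option<&str>) -> Option<String> {
        (token == Some(self.0)).then(|| "recipient-1".to_string())
    }
}

/// Maps table name -> (recipient allowed to read it, storage location).
struct InMemoryRepo(HashMap<String, (String, String)>);

impl TableRepository for InMemoryRepo {
    fn lookup(&self, recipient: &str, table: &str) -> Result<String, QueryError> {
        let (allowed, location) = self.0.get(table).ok_or(QueryError::NotFound)?;
        if allowed == recipient {
            Ok(location.clone())
        } else {
            Err(QueryError::Forbidden)
        }
    }
}

struct FakeSigner;

impl UrlSigner for FakeSigner {
    fn sign(&self, uri: &str, expires_in: Duration) -> String {
        format!("{uri}?expires_in={}", expires_in.as_secs())
    }
}
```

Using generics rather than trait objects lets the compiler monomorphize each combination; `Box<dyn Trait>` fields would be the alternative when components are chosen at runtime.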

The Delta Sharing Server is implemented in Rust and uses the [Axum](https://github.com/tokio-rs/axum) web framework for handling HTTP requests. The server is designed to be fast and efficient, and can be deployed as a standalone server or as a library in a larger application.
52 changes: 52 additions & 0 deletions docs/src/developer_guide/protocol/authentication.md
@@ -0,0 +1,52 @@
# Authentication middleware

As with other parts of the Delta Sharing server, you can extend authentication by implementing your own middleware.

## How is authentication/authorization handled?

The handlers for all routes in the Delta Sharing protocol router expect a `RecipientId` request extension. If this extension is not set, the handler returns an error response stating that the request is unauthenticated.
The `RecipientId` identifies the client calling the server (or is set to `RecipientId::Unknown` if the client could not, or should not, be identified).
Once the request reaches the route handlers, the `RecipientId` is used to determine whether the client has the necessary permissions to access the requested data.
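The extension mechanism itself can be sketched without Axum: middleware stores a value keyed by its type, and the handler looks it up and rejects the request if it is missing. This is a std-only sketch; `Extensions` is a hypothetical minimal type map similar in spirit to `http::Extensions`, and `RecipientId::Named`, `HandlerError`, and `list_shares` are hypothetical names, not the server's types.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical stand-in for the server's `RecipientId`.
#[derive(Debug, Clone, PartialEq)]
enum RecipientId {
    Unknown,
    Named(String),
}

/// Minimal type-keyed map: at most one value per type.
#[derive(Default)]
struct Extensions {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl Extensions {
    fn insert<T: 'static>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get<T: 'static>(&self) -> Option<&T> {
        // Deref to `dyn Any` before downcasting to the requested type.
        self.map.get(&TypeId::of::<T>()).and_then(|boxed| (**boxed).downcast_ref::<T>())
    }
}

#[derive(Debug, PartialEq)]
enum HandlerError {
    Unauthenticated,
}

/// A handler refuses to run unless middleware attached a `RecipientId`;
/// deciding what that recipient may see is a separate authorization step.
fn list_shares(extensions: &Extensions) -> Result<String, HandlerError> {
    let recipient = extensions.get::<RecipientId>().ok_or(HandlerError::Unauthenticated)?;
    Ok(match recipient {
        RecipientId::Unknown => "shares visible to unidentified clients".to_string(),
        RecipientId::Named(name) => format!("shares visible to {name}"),
    })
}
```

Note that `RecipientId::Unknown` still counts as authenticated here: the middleware decided to let the request through, and the handler only narrows what it returns.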

### Example

An example of custom middleware is shown below. In this example, the middleware authenticates requests against a hardcoded password. If the password is correct, the middleware sets the `RecipientId` extension to `RecipientId::anonymous()` and passes the request on to the route handler. If the password is incorrect, the middleware returns an unauthorized response.

```rust
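use std::sync::Arc;

use axum::{
    extract::Request,
    http::header::AUTHORIZATION,
    middleware::{self, Next},
    response::Response,
};
use tokio::net::TcpListener;
// `RecipientId`, `ServerError`, `SharingServerState` and
// `build_sharing_server_router` are provided by the Delta Sharing server crate.

const SUPER_SECRET_PASSWORD: &str = "delta-sharing-is-caring";

/// Accepts requests that carry `Authorization: Bearer <SUPER_SECRET_PASSWORD>`,
/// which is how Delta Sharing clients send their bearer token.
async fn auth(mut request: Request, next: Next) -> Result<Response, ServerError> {
    let authorized = request
        .headers()
        .get(AUTHORIZATION)
        // Non-ASCII header values are treated as missing instead of panicking.
        .and_then(|value| value.to_str().ok())
        .and_then(|value| value.strip_prefix("Bearer "))
        .is_some_and(|token| token == SUPER_SECRET_PASSWORD);

    if !authorized {
        return Err(ServerError::unauthorized(""));
    }

    // Attach the caller's identity so the route handlers can authorize the request.
    let client_id = RecipientId::anonymous();
    tracing::info!(client_id=%client_id, "authorized");
    request.extensions_mut().insert(client_id);

    Ok(next.run(request).await)
}

#[tokio::main]
async fn main() {
    let state = SharingServerState::new(...);
    let svc = build_sharing_server_router(Arc::new(state));

    // Add custom authentication middleware here
    let app = svc.layer(middleware::from_fn(auth));

    let listener = TcpListener::bind("127.0.0.1:0")
        .await
        .expect("Could not bind to socket");
    axum::serve(listener, app).await.expect("server error");
}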
```
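Comparing a secret with `==` may return as soon as the first byte differs, which can leak timing information. One possible hardening, sketched here with std only, is to extract the bearer token and compare it without an early exit. The helpers `bearer_token`, `constant_time_eq`, and `is_authorized` are hypothetical; this sketch still reveals the token length and gives no guarantee against compiler optimizations, so a production server would more likely use a vetted crate such as `subtle`.

```rust
/// Extracts the token from an `Authorization: Bearer <token>` header value.
fn bearer_token(header_value: &str) -> Option<&str> {
    header_value.strip_prefix("Bearer ").map(str::trim)
}

/// Compares two byte strings by folding over every byte instead of
/// returning at the first mismatch (the length check still exits early).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Returns true if the header carries the expected bearer token.
fn is_authorized(header_value: &str, expected: &str) -> bool {
    bearer_token(header_value)
        .is_some_and(|token| constant_time_eq(token.as_bytes(), expected.as_bytes()))
}
```

Inside the middleware, `is_authorized(value, SUPER_SECRET_PASSWORD)` could replace the direct string comparison.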

## What's in the box?

The Delta Sharing library comes with a pre-built authentication middleware that can be used out of the box.

// TODO: write about pre-built middleware
3 changes: 3 additions & 0 deletions docs/src/developer_guide/protocol/catalog.md
@@ -0,0 +1,3 @@
# Catalog

TODO
3 changes: 3 additions & 0 deletions docs/src/developer_guide/protocol/reader.md
@@ -0,0 +1,3 @@
# Reader

TODO
57 changes: 57 additions & 0 deletions docs/src/developer_guide/protocol/signer.md
@@ -0,0 +1,57 @@
# Signer

The signer is the component of the protocol router that creates signed URLs to the data files containing the relevant table data. It ensures that the client has the necessary permissions to access those data files and that the URLs are valid for a limited time only.

## How is signing handled?

The signer is defined by the following trait:

```rust
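use std::time::Duration;

/// Creates time-limited, signed URLs for data files in object storage.
trait Signer: Send + Sync {
    /// Signs `uri` so that the resulting URL stays valid for `expires_in`.
    fn sign(&self, uri: &str, expires_in: Duration) -> Result<SignedUrl, SignerError>;
}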
```

Implementing this trait allows users to customize the signing process to their needs. The `sign` method takes a URI, which is typically cloud-specific (e.g. `s3://my-data-bucket/my-table/part1-0000.snappy.parquet`), and a `Duration` that specifies how long the signed URL should remain valid. The signer returns a `SignedUrl` containing the signed URL and its expiration time.

### Example

// TODO: create good example
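Until a fuller example exists, the following toy signer illustrates the contract: it turns an `s3://` URI into an `https` URL that carries an expiry timestamp and a keyed hash, and provides the matching check a storage endpoint would run. The shapes of `SignedUrl` and `SignerError`, the `ToySigner` type, its `base_url`, and the URL query format are all hypothetical. `DefaultHasher` is NOT a cryptographic MAC; a real signer should delegate to the storage provider's pre-signing mechanism.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical shapes for the types used by the `Signer` trait.
#[derive(Debug, Clone, PartialEq)]
struct SignedUrl {
    url: String,
    expires_at: SystemTime,
}

#[derive(Debug, PartialEq)]
enum SignerError {
    UnsupportedScheme(String),
}

trait Signer: Send + Sync {
    fn sign(&self, uri: &str, expires_in: Duration) -> Result<SignedUrl, SignerError>;
}

/// Toy signer for `s3://` URIs (illustration only, not secure).
struct ToySigner {
    secret: String,
    base_url: String,
}

impl ToySigner {
    /// Keyed hash over the secret, object path, and expiry timestamp.
    fn signature(&self, path: &str, expires: u64) -> u64 {
        let mut hasher = DefaultHasher::new();
        (&self.secret, path, expires).hash(&mut hasher);
        hasher.finish()
    }

    /// The check a storage endpoint would perform before serving the file.
    fn verify(&self, path: &str, expires: u64, signature: u64, now: u64) -> bool {
        now <= expires && self.signature(path, expires) == signature
    }
}

impl Signer for ToySigner {
    fn sign(&self, uri: &str, expires_in: Duration) -> Result<SignedUrl, SignerError> {
        let path = uri
            .strip_prefix("s3://")
            .ok_or_else(|| SignerError::UnsupportedScheme(uri.to_string()))?;
        let expires_at = SystemTime::now() + expires_in;
        let expires = expires_at.duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0);
        let signature = self.signature(path, expires);
        Ok(SignedUrl {
            url: format!("{}/{path}?expires={expires}&sig={signature:x}", self.base_url),
            expires_at,
        })
    }
}
```

Because the expiry is part of the signed data, a client cannot extend a URL's lifetime by editing the `expires` parameter without invalidating the signature.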

### Configuring multiple signers

Tables shared through Delta Sharing may be stored in different cloud storage services. In this case, the Delta Sharing server can be configured with multiple signers, each responsible for signing URLs for a specific cloud storage service. To make sure that the correct signer is used, one could implement a simple registry that looks up the signer based on the URI.

```rust
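use std::collections::HashMap;
use std::time::Duration;

/// Routes each URI to the signer registered for its scheme (e.g. `s3`, `gs`).
struct SignerRegistry {
    registry: HashMap<String, Box<dyn Signer>>,
}

impl SignerRegistry {
    fn new(s3_signer: Box<dyn Signer>, gcs_signer: Box<dyn Signer>) -> Self {
        // The explicit type lets signers of different concrete types share one map.
        let mut registry: HashMap<String, Box<dyn Signer>> = HashMap::new();
        registry.insert("s3".to_string(), s3_signer);
        registry.insert("gs".to_string(), gcs_signer);
        Self { registry }
    }

    fn get_signer(&self, uri: &str) -> Option<&dyn Signer> {
        // Use the URI scheme, e.g. "s3" in "s3://bucket/key", as the lookup key.
        let (scheme, _) = uri.split_once("://")?;
        self.registry.get(scheme).map(|signer| signer.as_ref())
    }
}

impl Signer for SignerRegistry {
    fn sign(&self, uri: &str, expires_in: Duration) -> Result<SignedUrl, SignerError> {
        // A production implementation should return a `SignerError` here instead of panicking.
        let signer = self
            .get_signer(uri)
            .unwrap_or_else(|| panic!("no signer registered for {uri}"));
        signer.sign(uri, expires_in)
    }
}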
```

## What's in the box?

The Delta Sharing library comes with pre-built signers for common cloud storage services such as S3, GCS, and Azure Blob Storage. These signers implement the `Signer` trait and can be used directly in the Delta Sharing server configuration.
3 changes: 3 additions & 0 deletions docs/src/user_guide/quickstart.md
@@ -0,0 +1,3 @@
# Getting Started

TODO
