tokio_logdna-rust

Requirements

  • Receive a CSV file as a POST body (but not as a multipart upload through a form).
    • (Initially) it's OK to accept only a very simple CSV: header field names are case-sensitive, no special handling of quotes, no escaping.
  • Return the rows as an array of JSON objects (a handler sketch follows this list).
  • Create tests.
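
A minimal sketch of such a handler, assuming Axum 0.7 on Tokio. The /addresses route matches the Usage section below; everything else (plain comma-splitting, the BTreeMap row type) is illustrative of the simple-CSV requirement, not the repository's actual code.

```rust
use std::collections::BTreeMap;

use axum::{routing::post, Json, Router};

// Parse a simple CSV body (case-sensitive headers, no quote handling,
// no escaping) and return the rows as a JSON array of objects.
async fn addresses(body: String) -> Json<Vec<BTreeMap<String, String>>> {
    let mut lines = body.lines();
    // The first line is the header row.
    let headers: Vec<&str> = lines.next().unwrap_or("").split(',').collect();
    let rows = lines
        .filter(|line| !line.is_empty())
        .map(|line| {
            headers
                .iter()
                .zip(line.split(','))
                .map(|(header, value)| (header.to_string(), value.to_string()))
                .collect()
        })
        .collect();
    Json(rows)
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/addresses", post(addresses));
    let listener = tokio::net::TcpListener::bind("127.0.0.1:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```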

Optional requirements

Position-independent CSV headers. A file is accepted regardless of the order of its columns, as long as their names match. One way to achieve this is sketched below.
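
A sketch of the header mapping, assuming the same plain comma-split parsing as above; the EXPECTED column set and the String error type are illustrative:

```rust
use std::collections::HashMap;

/// Hypothetical expected columns; the real set would depend on the endpoint.
const EXPECTED: &[&str] = &["name", "street", "city"];

/// For each expected column, find its index in the header row, regardless of
/// the order in which the columns appear in the file.
fn column_indices(header_line: &str) -> Result<Vec<usize>, String> {
    let positions: HashMap<&str, usize> = header_line
        .split(',')
        .enumerate()
        .map(|(index, name)| (name, index))
        .collect();
    EXPECTED
        .iter()
        .map(|name| {
            positions
                .get(name)
                .copied()
                .ok_or_else(|| format!("missing column: {name}"))
        })
        .collect()
}
```

Each data row is then split the same way and its fields are picked out via the returned indices, so nothing downstream depends on the file's column order.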

Store in Postgres. Discuss schema designs.

Preferred technologies

Tokio, Axum

Usage

Start the server
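
  • cargo run (presumably; the requests below assume the server listens on 127.0.0.1:8080)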

Submit a request

  • wget --post-file=tests/assets/addresses.csv http://127.0.0.1:8080/addresses or
  • curl -H "Content-Type: text/csv" --data-binary @tests/assets/addresses.csv 127.0.0.1:8080/addresses
    • Use --data-binary instead of --data; otherwise curl strips newlines, which are part of the CSV format.

Debug

curl -w "%{http_code}" -H "Content-Type: text/json" --data-binary @tests/assets/addresses.csv 127.0.0.1:8080/addresses

  • -w "%{http_code}" appends the HTTP status code to the output. The mismatched Content-Type (text/json for a CSV body) is useful for checking how the server reacts to a wrong content type.

Tests

  • export API_KEY=....
  • cargo test
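
A sketch of an integration test in this spirit, assuming the server is already running on 127.0.0.1:8080 and using the reqwest and serde_json crates (the file path matches the Usage section; everything else is illustrative):

```rust
// tests/post_csv.rs (illustrative; assumes a server running on 127.0.0.1:8080)
#[tokio::test]
async fn posts_csv_and_gets_json_array() {
    let csv = std::fs::read("tests/assets/addresses.csv").unwrap();
    let response = reqwest::Client::new()
        .post("http://127.0.0.1:8080/addresses")
        .header("Content-Type", "text/csv")
        .body(csv)
        .send()
        .await
        .unwrap();
    assert!(response.status().is_success());
    // The response body must be a JSON array of objects.
    let rows: Vec<serde_json::Map<String, serde_json::Value>> =
        response.json().await.unwrap();
    assert!(!rows.is_empty());
}
```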

Roadmap

Tradeoffs and Decisions

  • Postgres with https://docs.rs/tokio-postgres/latest/tokio_postgres, chosen because it is part of the Tokio project and therefore considered reliable.
  • OpenAPI generation. It seems useless for a plain CSV body, but if we had parameters in an HTTP query/form, documenting them through OpenAPI would become worthwhile.
  • Write to Postgres
    • Do we have a defined schema for each client/company/data source, shared across their uploads? If yes:
        1. The schema can change over time, so each CSV column's info would need an applicability period (two timestamps, and/or a client/company-specific version number/string). Or:
        2. Every time a client/company changes their schema, we create a new endpoint (and a new DB table).
      3. Alternatively (if not), we generate a dynamic schema from the CSV, independent, per upload.
    • Out of scope: merging with existing data (which would involve flagging conflicting/unmergeable entries and human intervention). Hence:
    • Each upload creates a new subset of entries, all associated with the same "upload" entry, and we return the new ID of that upload.
    • This MVP has only one endpoint, hence option 3 above. All values are treated as text. Four tables (multi-dimensional data, flattened). The descriptions below are Postgres-agnostic; a concrete sketch follows the list:
      • uploads: id (generated primary), uploaded (timestamp)
      • schema_field: id (generated primary), upload_id (foreign), field_name (text), field_max_length (numeric integer)
      • upload_row: id (generated primary), upload_id (foreign)
      • upload_field: id (generated primary), upload_row_id (foreign), schema_field_id (foreign), value (text). Optionally add (redundant) upload_id to simplify queries (if need be).
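
Under these decisions, the tables could be created with tokio-postgres roughly as follows. This is a sketch: the connection parameters and the concrete column types (bigserial, timestamptz, text, etc.) are one possible Postgres rendering of the agnostic descriptions above.

```rust
use tokio_postgres::NoTls;

#[tokio::main]
async fn main() -> Result<(), tokio_postgres::Error> {
    // Connection parameters are illustrative.
    let (client, connection) =
        tokio_postgres::connect("host=localhost user=postgres dbname=uploads", NoTls).await?;
    // The connection object drives the socket I/O on its own task.
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("connection error: {e}");
        }
    });
    client
        .batch_execute(
            "CREATE TABLE uploads (
                 id        bigserial PRIMARY KEY,
                 uploaded  timestamptz NOT NULL DEFAULT now()
             );
             CREATE TABLE schema_field (
                 id               bigserial PRIMARY KEY,
                 upload_id        bigint NOT NULL REFERENCES uploads(id),
                 field_name       text NOT NULL,
                 field_max_length integer NOT NULL
             );
             CREATE TABLE upload_row (
                 id        bigserial PRIMARY KEY,
                 upload_id bigint NOT NULL REFERENCES uploads(id)
             );
             CREATE TABLE upload_field (
                 id              bigserial PRIMARY KEY,
                 upload_row_id   bigint NOT NULL REFERENCES upload_row(id),
                 schema_field_id bigint NOT NULL REFERENCES schema_field(id),
                 value           text NOT NULL
                 -- Optionally a redundant upload_id column to simplify queries.
             );",
        )
        .await?;
    Ok(())
}
```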
