Server Architecture

Bob Evans edited this page Oct 9, 2013 · 7 revisions

Overview of the Paco Server

Introduction

The server is relatively simple because the phone clients do most of the scheduling and network-synchronization work. Here is a high-level set of features of the Paco server.

  • It serves requests via servlets and gwt-rpc. It is currently hosted on AppEngine Java.
  • It uses JDO as the interface to the AppEngine datastore. This is being replaced by the low-level Datastore API.
  • We use GWT as the web client framework.
  • There is a layer of DTOs for sending data from the server to the GWT client over GWT-RPC. This is somewhat awkward, as it duplicates the data definitions. We have moved to a shared set of objects and are removing the JDO objects.
  • The Android client talks to the HTTP interface, issuing GETs and PUTs with JSON payloads.
  • We use AppEngine's UserService to handle authentication.
  • We use backend servers to do report generation on demand.

Details

Data Model Implementation & Changes

The original JDO interface to the datastore has become a problem because it imposes too many constraints on the system as we iterate the design. The implied relationships make it more difficult to morph the data model. We are in the process of migrating to the low-level Datastore API for persistence, and then using the DTO objects to pass data between the various clients, serializing either over GWT-RPC or with Jackson for JSON.
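A minimal sketch of what such a shared transfer object might look like (class and field names here are hypothetical, not Paco's actual code): a plain serializable bean with a no-arg constructor works for both GWT-RPC serialization and Jackson JSON.

```java
import java.io.Serializable;

// Hypothetical shared DTO sketch: a plain bean works for both
// GWT-RPC (needs Serializable + no-arg constructor) and Jackson JSON.
class ExperimentDTO implements Serializable {
  private Long id;
  private String title;
  private boolean published;

  public ExperimentDTO() {}  // required by GWT-RPC and Jackson

  public Long getId() { return id; }
  public void setId(Long id) { this.id = id; }
  public String getTitle() { return title; }
  public void setTitle(String title) { this.title = title; }
  public boolean isPublished() { return published; }
  public void setPublished(boolean published) { this.published = published; }
}
```

Keeping one definition shared across server and clients removes the redundant JDO/DTO duplication described above.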

Experiment objects define the Signaling Mechanism used to trigger the user. They also define the inputs that will be collected from the user as well as the Feedback or Interventions that will be delivered to the user.

We model responses in Event objects that capture metadata and contain a tuple of key/value pairs for the values from the Experiment schema.
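The Event shape described above can be sketched as a plain object (names hypothetical): fixed metadata fields plus an open key/value map whose keys come from the experiment's input definitions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an Event: metadata plus a key/value tuple
// whose keys are the input names defined by the Experiment schema.
class Event {
  private final String who;           // participant identity
  private final long experimentId;
  private final long responseTime;    // epoch millis
  private final Map<String, String> responses = new LinkedHashMap<>();

  Event(String who, long experimentId, long responseTime) {
    this.who = who;
    this.experimentId = experimentId;
    this.responseTime = responseTime;
  }

  void addResponse(String inputName, String value) {
    responses.put(inputName, value);
  }

  String getResponse(String inputName) {
    return responses.get(inputName);
  }

  String getWho() { return who; }
  long getExperimentId() { return experimentId; }
  long getResponseTime() { return responseTime; }
}
```

The open map is what lets one Event schema serve every experiment, since each experiment defines its own input names.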

Access Control

Experiments can be unpublished, in which case, only the creator and other administrators can see the experiment. If an experiment is published, then it can either be published to the world or to a select group of email addresses (Google accounts).

An administrator can view all the data for the experiment. A user can only view their own data.

Previously, this was implemented as a set of permissions on the Experiment objects. This was a temporary solution. Moving forward, we will build an access-control data table that maps users to public and private experiments, to facilitate faster querying and retrieval.
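One way such an access-control table could look, sketched as a hypothetical in-memory stand-in for the planned datastore table:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the planned access-control mapping:
// who may see an experiment (world-published, invited, or admin).
class ExperimentAcl {
  private final Set<Long> publishedToWorld = new HashSet<>();
  private final Map<Long, Set<String>> publishedTo = new HashMap<>();
  private final Map<Long, Set<String>> admins = new HashMap<>();

  void publishToWorld(long experimentId) {
    publishedToWorld.add(experimentId);
  }

  void publishTo(long experimentId, String email) {
    publishedTo.computeIfAbsent(experimentId, k -> new HashSet<>()).add(email);
  }

  void addAdmin(long experimentId, String email) {
    admins.computeIfAbsent(experimentId, k -> new HashSet<>()).add(email);
  }

  // Admins always see the experiment; otherwise it must be published
  // to the world or to this user's email address.
  boolean canView(long experimentId, String email) {
    if (admins.getOrDefault(experimentId, Set.of()).contains(email)) return true;
    if (publishedToWorld.contains(experimentId)) return true;
    return publishedTo.getOrDefault(experimentId, Set.of()).contains(email);
  }
}
```

A table in this shape can be queried directly by user, instead of loading each Experiment object to inspect its embedded permissions.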

Report Generation

Paco has grown to the point where reports can contain more data than AppEngine will let us process before it times out the request. There are two approaches that can be applied to report generation. One approach is to compute stats incrementally every time the data is modified. This works if you can compute the report online, or if you can trigger a background job to recompute every time data is uploaded. In practice, reports may be generated rarely, if ever, particularly for the numerous pilot experiments. The second approach is to treat the request for a report as a task to be completed and to notify the user when it is done. This is the approach we currently take for report generation, but we will most likely employ both strategies in the future.

Currently, when a client asks the EventServlet for a report about a particular experiment, it can specify query options to subsample the events collected, and it can specify a report format: CSV, JSON, or HTML. When the request is received, the server launches a long-running job on the backend servers, marks the job as in-progress in a reportJob data table, and hands a key back to the requestor. When the backend finishes the job, it updates the reportJob table with a link to the generated report in the BlobStore. The client polls for that update and, once the job is done, receives the link to the generated report. A cron process cleans up old report jobs.
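The job lifecycle described above can be sketched as a simple state table (a hypothetical in-memory stand-in for the reportJob datastore table):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the reportJob flow: the frontend records a
// pending job and returns a key; the backend later records the report's
// BlobStore URL; the client polls by key until a URL appears.
class ReportJobTable {
  static final class Job {
    String status = "IN_PROGRESS";
    String reportUrl;  // set by the backend when the report is done
  }

  private final Map<String, Job> jobs = new HashMap<>();

  // Frontend: mark a job in-progress and hand a key back to the client.
  String startJob() {
    String key = UUID.randomUUID().toString();
    jobs.put(key, new Job());
    return key;
  }

  // Backend: record the link to the generated report.
  void completeJob(String key, String reportUrl) {
    Job job = jobs.get(key);
    job.status = "DONE";
    job.reportUrl = reportUrl;
  }

  // Client poll: null until the backend has finished.
  String getReportUrl(String key) {
    Job job = jobs.get(key);
    return job == null || !"DONE".equals(job.status) ? null : job.reportUrl;
  }
}
```

The key is the only state the client needs to hold between the initial request and report delivery, which is what makes the long-running backend job fit within AppEngine's frontend request deadline.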

Request Caching

Right now, the caching strategy is very simple. We cache experiments at the last step (JSON serialization) of a given user request. We also cache a given user's datastore request in the Memcache service. When there is a write to an experiment, we clear the caches. This strategy has served us well, but it is now time for a better one.

The current plan is to change the way requests work so that not as much data is pulled at any one time. Instead, we will rearrange the requests to return paginated data and data requests will be predicated on categories of Experiments. We will also cache all public experiments and just invalidate specific objects.
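The planned per-object invalidation could be sketched like this (a plain-Java stand-in; the real implementation would sit on top of AppEngine's Memcache service):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-object invalidation: cache serialized
// experiment JSON by id, and evict only the experiment that changed
// instead of clearing the whole cache on every write.
class ExperimentCache {
  private final Map<Long, String> jsonById = new ConcurrentHashMap<>();

  String getJson(long experimentId) {
    return jsonById.get(experimentId);
  }

  void putJson(long experimentId, String json) {
    jsonById.put(experimentId, json);
  }

  // Called on a write to a specific experiment.
  void invalidate(long experimentId) {
    jsonById.remove(experimentId);
  }
}
```

Under this scheme, a write to one public experiment no longer throws away the cached JSON for every other experiment.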

Synchronization of Experiment Joined state

We now allow users to join experiments on the web for the purpose of responding on the web. Currently, join events are recorded in the Events table to simplify the storage system. We now find that it would be useful to query just for the experiments a user has joined, without the overhead of scanning the Events table. So, we will implement a Joined table to retrieve this information easily.
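A hypothetical sketch of such a Joined table: a direct user-to-experiments mapping, so lookups avoid scanning Events.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a Joined table: maps a user directly to the
// experiments they have joined, avoiding a scan of the Events table.
class JoinedTable {
  private final Map<String, Set<Long>> joinedByUser = new HashMap<>();

  void recordJoin(String user, long experimentId) {
    joinedByUser.computeIfAbsent(user, k -> new LinkedHashSet<>()).add(experimentId);
  }

  Set<Long> getJoinedExperiments(String user) {
    return joinedByUser.getOrDefault(user, Collections.emptySet());
  }
}
```

This same lookup is what each client (Android, iOS tablet, web) would consult when synchronizing its joined-experiment list.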

We will then use this to allow a user to synchronize their experiments across the various clients they use: Android, iOS tablet, web client. We expect to reuse an existing synchronization mechanism, but will build one if we can't find a suitable one.
