Skip to content

Scaling database queries workshop for CUHacking, uses Nodejs and MongoDB

Notifications You must be signed in to change notification settings

Branch-Bunch/Scalability-Workshop

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

63 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Scaling Database Queries with Nodejs and MongoDB

What we'll be covering:

  1. Atomic Database Actions
  2. Database Queries
  3. Pagination
  4. Aggregation Pipelines

Setting up your environment

See setup.md

  1. Download Nodejs, MongoDB
  2. Run mongod (running as root makes it easier)
  3. Git clone this repo
  4. npm run build
  5. npm test

Atomic Database Actions

  • Lower the amount of requests made, and reduce waiting
  • Eliminate the chance of race conditions, or operating on out of date values

Example: Increasing a property of an object with a specific id

Bad method:

Find object by id, get from database
Modify object on the server
Send updated object to database
Do something after the response
Reveal Bad Implementation
Sortable.findById(id)
  .then((object) => {
    object.property += 5
    return object.save()
  })
  .then(doSomething)
Reveal Good method
Tell the database to increment a certain property of object #id, and respond with the updated object
Do something after the response
Sortable.findByIdAndUpdate(
  id,
  { $inc: { property: 5 } },
  {
    new: true,
  })
  .then(doSomething)

Things to consider:

  1. What happens when multiple people make a request?
  2. How many requests are made in each of them?
Reveal Answer
  1. Atomic actions ensure that the proper data is being modified the proper amount
  2. Finding, modifying, then saving is 2, while finding and updating on the database is just 1

Having two networking requests race can cause a race condition

Useful Database Queries

  • Get data the way you want quickly, and efficiently
  • Do difficult operations off of the server, since the database is fast af

Example: Getting the top 10 Anime betrayals from a database of Anime episodes

Bad method:

Get all of the items from the database
Sort them
Pass on the first 10 items
Make a video with them
Reveal Bad Implementation
function comparison(first, second) {
  // compare the anime episodes
  // return 1 or -1 to get sort to order them
}

AnimeEpisodes.find({})
  .then(animeArray => animeArray.sort(comparison).slice(0, 10)))
  .then(makeVideo)
Reveal Good method
Tell the database to give you the first 10 elements of the sorted array
Do something with the sorted data
AnimeEpisodes.find({})
  .sort({ betrayal: -1 })
  .limit(10)
  .then(makeVideo)

Things to consider:

  1. What happens when you sort on a server that only has one thread?
  2. Why not use the database that is optimized for this?
Reveal Answer
  1. Sorting blocks the entire process, and makes things slow
  2. Might as well use the database as it does it in a faster method.

Pagination

  • Loading items in a set by index
  • There are multiple ways to do this: Skipping, and Ranged Queries
  • There are tradeoffs between both methods

Example: Loading next items by date, newest posts

Skipping through:

Load all of the items
Skip through the items, and load all of the ones along the way
Pass them on to the server
Do something
Reveal Skipping Implementation
  Sortable.find({})
    .skip(10)
    .limit(10)
    .then(doSomething)
Reveal Range Method
Load only the items that match the range
Pass them on to the server
Do Something
  Sortable.find({ createdAt:
    { $lte: 'theLastDateShown' }
    })
    .limit(10)
    .then(doSomething)

Aggregation Pipelines

  • Get data suited for more advanced needs.
  • Common use is when sorting by a single value isn't enough
  • Again, it is faster to do some operations on the database

Example: Getting the 10 most dank memes, where dankness is how recent, and highly voted they are

Bad method:

Get all memes from the database
Sort them using a function to determine their dankness, instead of a simple property
Pass on the first 10 memes
Post the memes
Reveal Bad Implementation
function getDankness(meme) {
  // divides score by how old it is
  return (meme.score / meme.dateField.getTime())
}

function comparison(first, second) {
  // use getDankness to return 1 or -1 for sort to order the memes
}

Memes.find({})
  .then(memesList => memesList.sort(comparison).slice(0, 10))
  .then(postMemes)
Reveal Good Method
Add a temporary field to each of the memes named dankness, and calculate it
Sort all of the memes by dankness, on the database still
Respond to the server with the 10 dankest memes
Post the memes
Memes.aggregate([
  {
    $project: {
      score: '$score',
      value: '$value',
      dankness: {
        $divide: [ '$score', '$value' ]
      }
    }
  },
  { 
    $sort: { dankness: -1} 
  },
])
  .limit(10)
  .then(postMemes)

Things to consider:

  1. Is the data in the database modified?
Reveal Answers
  1. The fields you add are temporary, so it is a good solution

Again, sorting on the database is a good solution in this example

About

Scaling database queries workshop for CUHacking, uses Nodejs and MongoDB

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published