< 20 entries in the collection

cbrunet1;1

Data Discovery Project

Pick a favorite topic that you care about
Find at least 20 datasets for that topic (use, for example, https://toolbox.google.com/datasetsearch). I, for one, collect open source git repositories, so I searched for "git urls".
For each of the 20 datasets you chose, determine whether the underlying data can be accessed (some of these datasets do not provide public access).
Create a MongoDB collection YourNetId within the database fdac19mp2, where you store metadata for each of the 20 datasets: YourTopic, title, license, description, and the url(s) where the data may be retrieved.

import pymongo
client = pymongo.MongoClient(host="da1.eecs.utk.edu")
db = client['fdac19mp2']
coll = db['YourNetId']
# for each dataset
coll.insert_one({'topic': 'YourTopic', 'title': 'Data title', 'license': 'license', 'description': 'Brief data description', 'urls': ['url1', 'url2', ...]})
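
One possible way to check whether the underlying data can be accessed before recording a dataset is to send a HEAD request to each of its urls. The sketch below is only an illustration: the helper name is_accessible and its opener parameter are assumptions made for this example, not part of the assignment. Some servers reject HEAD requests or require a login, so a False result is a hint to look at the url by hand, not a final answer.

```python
import urllib.error
import urllib.request


def is_accessible(url, timeout=10, opener=None):
    """Return True if a HEAD request to `url` gets a non-error response.

    `opener` is a hypothetical hook (defaults to urllib.request.urlopen)
    that lets the function be exercised without touching the network.
    """
    if opener is None:
        opener = urllib.request.urlopen
    try:
        # Request() raises ValueError for strings that are not urls.
        req = urllib.request.Request(url, method="HEAD")
        with opener(req, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are all subclasses of OSError.
        return False
```

With a helper like this, the urls of a dataset could be filtered with [u for u in urls if is_accessible(u)] before calling coll.insert_one.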

To check what is recorded:

import pprint
import pymongo
client = pymongo.MongoClient(host="da1.eecs.utk.edu")
db = client['fdac19mp2']
coll = db['YourNetId']
pp = pprint.PrettyPrinter(indent=1, width=65)
for r in coll.find():
  print(pp.pformat(r))
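
Beyond printing the records, it may help to confirm that the collection holds at least 20 entries and that each entry carries all of the metadata fields. The following is a minimal sketch under that assumption; the function name summarize and the REQUIRED_FIELDS tuple are made up for this example, with the field names taken from the insert_one call above.

```python
# Field names as used in the insert_one example.
REQUIRED_FIELDS = ("topic", "title", "license", "description", "urls")


def summarize(records, minimum=20):
    """Count records and list those with missing or empty fields.

    `records` is any iterable of dicts, e.g. the cursor from coll.find().
    """
    records = list(records)
    incomplete = []
    for r in records:
        # A field counts as missing if it is absent or empty.
        missing = [f for f in REQUIRED_FIELDS if not r.get(f)]
        if missing:
            incomplete.append((r.get("title", "<untitled>"), missing))
    return {
        "count": len(records),
        "enough": len(records) >= minimum,
        "incomplete": incomplete,
    }
```

Calling summarize(coll.find()) on the collection opened above would then report how many entries are stored and which ones still need a license, description, or urls.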
