HKN-Dr-Everitts-Neighborhood/Wiki-Tools

Wiki-Tools

This repo contains the code for several projects for the DEN wiki. It started out as the code for the Curriculum Graph, but the dataset built for that project has found many other uses, so the repo has evolved to also support tech electives by subfield, as well as the less-related data processing for the Engineering Industry Prospectus (aka the corporate survey). Because this repo provides a good set of data and tools for anything that interacts with the wiki, more subprojects are expected to be added in the future.

Requirements

Code in this repo depends on:

  • nodejs for running the JavaScript on the command line
  • python 2.7 for running the python scripts. A little of the code is incompatible with python 3, but this could easily be ported (at the expense of python 2.7 compatibility).
  • graphviz tools (namely, dot)
  • various python packages. Currently these are requests and beautifulsoup4. It is recommended to install them via pip, though downloading them and installing with the included setup.py should work too (hint: 'python setup.py install --user' on EWS machines - to install globally you need sudo privileges).

Organization & Design Decisions

The repo is split up into the following sections:

  • data_scrapers/: tools for gathering data.
  • data/: data management / combination. The end product of these is data.json.
  • data.json: contains all the data input to the other subprojects.

Anything that needs to be used for multiple projects should be computed in the data preparation stage, and the results should be saved in data.json - this helps avoid the need for re-implementing functionality common to multiple projects. This is especially important because the projects in this repo include python and javascript, so having common code live in the data preparation stage means it doesn't need to be implemented twice (once for python and once for javascript).

Data Format of data.json

The data format needs to support all projects. The data described here is all in data.json; note that this is autogenerated by data/prepare.py - if you want to edit the data, edit data/raw_data.json instead - the fields in that file will be carried over into data.json. Note that prepare.py combines data from raw_data.json and other data sources - see the readme in the data directory for more info.

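The data is formatted as proper JSON - any JSON parser should be able to handle it. The format looks like this (the angle-bracket placeholders and // comments are explanatory only and do not appear in the actual file):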

[
  {
    "name": <course number - e.g. ECE 110>,
    "title": <course title - e.g. Introduction to Electrical and Computer Engineering>,
    "link": <link to wiki page>, // OPTIONAL - exists iff there is a corresponding wiki page. It contains the wiki tiny link, which is a permanent link.
    "pagetitle": <title of the corresponding wiki page>, // OPTIONAL - exists iff there is a corresponding wiki page
    "internallink": <title of wiki page>, // MANDATORY. If there is a corresponding wiki page, this is the same as the pagetitle. Otherwise, this is what the page title will be in the future, and can be used to create the red "add page here" links for this course.
    "crosslist": [<name1>, <name2>, ... ], // OPTIONAL
    "nocredit": [<name1>, <name2>, ...], // OPTIONAL
    "type": <string>, // OPTIONAL - use instead of eetype/cetype
    "eetype": <string>, // OPTIONAL - use with cetype
    "cetype": <string>, // OPTIONAL - use with eetype
    "prereqs": [ [<prereq1 name>, <prereq2 name>], ... ],
    "coreqs": [ [<coreq1 name>, <coreq2 name>], ... ],
    "subfield": <subfield the class belongs to>, // should be present for ECE/CS tech electives
    "omit_from_graph": <true or false> // OPTIONAL
  },
  {
    "name": <next course name>,
    "title": <The course's title>, //there is no page for this course, and hence no link
    "prereqs": [ [<prereq1 name>, ... ], ... ],
    "coreqs": [ [<coreq1 name>, <coreq2 name>, ... ], ... ],
    "internallink": <page title>,
    "type": <string> // or use eetype & cetype
  },
  ...
]
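
The field rules in the comments above can be checked mechanically. The following is a minimal sketch and not part of the repo: the helper name check_entry is hypothetical, and it encodes only the rules stated above (internallink is mandatory, link and pagetitle exist only when a wiki page exists, and internallink equals pagetitle when there is one).

```python
def check_entry(course):
    """Return a list of problems found in one data.json entry (hypothetical helper)."""
    problems = []
    name = course.get("name", "<unnamed>")

    # internallink is the one field documented as mandatory.
    if "internallink" not in course:
        problems.append("%s: missing internallink" % name)

    # link and pagetitle both exist iff there is a wiki page,
    # so one should never appear without the other.
    if ("link" in course) != ("pagetitle" in course):
        problems.append("%s: link and pagetitle should appear together" % name)

    # When a wiki page exists, internallink is documented to equal pagetitle.
    if "pagetitle" in course and course.get("internallink") != course["pagetitle"]:
        problems.append("%s: internallink differs from pagetitle" % name)

    return problems
```

Running it over every entry after loading data.json with the standard json module would give a quick sanity check on the output of data/prepare.py.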

This format is still under development. Right now, prereqs and coreqs hold the official prereqs / coreqs; in the future, we may extend the format to allow for displaying "DEN recommendations" about prereqs/coreqs. A note on prereqs / coreqs: the format is a list of lists, where each sublist is one alternative set of prereqs that prepares you for the class. Every course in a sublist is needed, and completing any one sublist suffices. So for example, to say that ECE 210 requires (MATH 286 or MATH 285) and PHYS 212, we'd set the prereqs to

[["MATH 286", "PHYS 212"], ["MATH 285", "PHYS 212"]]
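
To illustrate how this list-of-lists is meant to be read, here is a small sketch (the function name prereqs_satisfied is hypothetical and not part of the repo). It treats each sublist as one acceptable combination; treating an empty prereq list as "no prerequisites" is an assumption.

```python
def prereqs_satisfied(prereqs, completed):
    """True if at least one alternative set of prereqs is fully completed."""
    # Assumption: an empty prereq list means the course has no prerequisites.
    if not prereqs:
        return True
    done = set(completed)
    # Any one sublist suffices, but every course inside it is required.
    return any(all(name in done for name in option) for option in prereqs)


ece210 = [["MATH 286", "PHYS 212"], ["MATH 285", "PHYS 212"]]
print(prereqs_satisfied(ece210, ["MATH 285", "PHYS 212"]))  # True
print(prereqs_satisfied(ece210, ["MATH 286"]))              # False
```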

The crosslist field is a list of the other names under which the course is crosslisted. A valid crosslist might look like this (for ECE 462):

"crosslist": [ "CS 462", "MATH 491"]

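One possible use of crosslist is to look a course up by any of its names. The sketch below is illustrative only: build_name_index is a hypothetical helper, and letting the first entry that claims a name win is an assumption, not documented behaviour.

```python
def build_name_index(courses):
    """Map every course name, including crosslisted names, to its entry."""
    index = {}
    for course in courses:
        # crosslist is optional, so fall back to an empty list.
        for name in [course["name"]] + course.get("crosslist", []):
            # Assumption: the first entry to claim a name wins.
            index.setdefault(name, course)
    return index


courses = [{"name": "ECE 462", "crosslist": ["CS 462", "MATH 491"]}]
index = build_name_index(courses)
print(index["MATH 491"]["name"])  # ECE 462
```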
The nocredit field is a list of classes that you can't get credit for if you take this class; for example, if you take MATH 286, you can't get credit for MATH 285 or MATH 441, so MATH 286 would have

"nocredit": ["MATH 285", "MATH 441"]

Three more fields deserve explanation: type, eetype, and cetype. The basic idea is that "type" tells you what kind of credit a class carries - i.e. whether it is required, an elective, an ECE tech elective, etc.; look at data.js for examples. eetype and cetype are intended to be used when the two majors disagree on what a course counts as - e.g. ECE 391 is a 3 of 5 course for EEs, but for CompE it's required; thus its cetype is "required" whereas its eetype is "3of5".
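
A consumer of data.json might resolve these three fields as follows. This is a sketch under assumptions: credit_type is a hypothetical helper, and preferring the major-specific field over "type" when both are present is a choice made here, since the format expects only one style per course.

```python
def credit_type(course, major):
    """Return the credit type of a course for major 'ee' or 'ce' (hypothetical helper)."""
    # Prefer the major-specific field (eetype / cetype) when present.
    specific = course.get(major + "type")
    if specific is not None:
        return specific
    # Otherwise fall back to the shared "type" field, which may also be absent.
    return course.get("type")


ece391 = {"name": "ECE 391", "eetype": "3of5", "cetype": "required"}
print(credit_type(ece391, "ee"))  # 3of5
print(credit_type(ece391, "ce"))  # required
```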

One open issue is multiple courses sharing the same name - e.g. ECE 498. We will only need to tackle this if we decide these courses belong in the graph - they tend to be outside the normal curriculum, and change from semester to semester. A possible solution would be to add a suffix to the names of these courses - e.g. "ECE 498SL" could be Steve Lumetta's "Engineering Parallel Software" class. It's important that the names be unique.
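
Since names must be unique, a check along the following lines could catch accidental duplicates. It is only a sketch; duplicate_names is a hypothetical helper and is not part of data/prepare.py.

```python
from collections import Counter


def duplicate_names(courses):
    """Return the sorted course names that appear in more than one entry."""
    counts = Counter(course["name"] for course in courses)
    return sorted(name for name, n in counts.items() if n > 1)


courses = [{"name": "ECE 498"}, {"name": "ECE 498"}, {"name": "ECE 498SL"}]
print(duplicate_names(courses))  # ['ECE 498']
```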

Linking to the wiki

data.json now contains several fields to help with linking to the wiki - namely, link, internallink, and pagetitle. The link field is the URL of the wiki tiny link, which has the additional property of being a permalink. The pagetitle is the title of the corresponding wiki page (if it has been successfully matched). These properties are gathered by data/links.py. internallink differs slightly from pagetitle: a class has no pagetitle if it doesn't have a corresponding wiki page, but all classes have an internallink, which is the same as the pagetitle if one exists, and is otherwise based on convention (see data/link_utils.py). This means that internallink can easily be used to generate the brown "add page here"-type internal wiki links.

As such, internallink should be used everywhere we can generate internal links (i.e. when auto-generating wiki pages), and link should be used anywhere else (with the restriction that the page must exist first). In no case should we use the 'canonical' wiki links (such as "https://wiki.engr.illinois.edu/display/HKNDEN/ECE+190+-+Introduction+to+Computing+Systems"), because these break when the page titles change. The consequences of this design are:

  • Since we use internal links wherever possible, the wiki will keep them up to date when page titles change
  • Since we use tiny links instead of canonical links whenever we must link via URLs, our links don't break if the page titles change.

By using a convention to generate the "add page here" internal links to nonexistent pages, we get two advantages:

  • people will be less likely to misname their pages with respect to the convention when properly directed to use the "add page here" links
  • internal links from the autogenerated pages can just "link up" to the new content automatically.
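
The linking rule described in this section can be summarised in a few lines. This is a sketch, not code from the repo: link_target is a hypothetical helper, and returning None when no tiny link exists yet is an assumption about how a caller would want to handle missing pages.

```python
def link_target(course, internal):
    """Pick the field to link with (hypothetical helper).

    internal=True:  generating wiki content, where internal links are possible.
    internal=False: linking by URL from outside the wiki.
    """
    if internal:
        # internallink always exists, even when the page does not yet.
        return course["internallink"]
    # The tiny link only exists once the wiki page does; None means "no link yet".
    return course.get("link")
```

Note that neither branch ever builds a canonical URL from the page title, which is what keeps the links stable when pages are renamed.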

Remaining Failure Modes

  • If links data is out of date (new course reviews since the data was last gathered), then some classes will be missing their "link" field.
  • If pagetitles for existing pages change, then the internal links generated with these pagetitles will be brown "add page here" links instead of pointing to existing pages. However, link should still work.
  • For courses without course reviews, the name, title, and crosslist of the course currently determine the title generated for internallink. If any of these change in data/raw_data.json, the "add page here" links generated with the old data won't automatically link up with a page added at a new link, or vice versa.
  • People who ignore the "add page here" links and title the page themselves might not follow the conventions, in which case the "add page here" links won't automatically link up to the new page.

The first 2 of these 4 issues can be avoided by running data_scrapers/links.py to get up-to-date links data. What absolutely cannot happen, though, is for a link that works at some point to break later because its page was renamed. The only exception is the page being removed from the wiki altogether.

Building & Deploying

The general process is:

  • run any data_scrapers, to fetch up to date data.
  • run data/prepare.py to combine everything into the new version of data.json
  • run projects depending on data.json - see subfolders for details.
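
The first two steps above could be scripted roughly as follows. This is a sketch under assumptions: build_steps and run_steps are hypothetical helpers, the scripts are assumed to be run from the repo root with the "python" on PATH (2.7, per Requirements), and the third step is left to each subproject's own build script.

```python
import subprocess


def build_steps(scrapers):
    """Return the commands for a data rebuild, in the order described above."""
    # Assumption: each scraper is a standalone script run with "python".
    steps = [["python", script] for script in scrapers]
    # data/prepare.py combines everything into the new data.json.
    steps.append(["python", "data/prepare.py"])
    return steps


def run_steps(steps, runner=subprocess.check_call):
    """Run each command in order; check_call raises if a step fails."""
    for command in steps:
        runner(command)
```

For example, run_steps(build_steps(["data_scrapers/links.py"])) would refresh the links data and then regenerate data.json.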

Deploying depends on where the output of the project ends up. Output should be in clearly marked folders, and not interspersed with code.

Subprojects should be easy to run - e.g. curriculum_graph provides a build.sh that can be run to make all the graphs. It even runs data/prepare.py for you.

About

Code for the curriculum graph project for the DEN wiki
