diff --git a/index.html b/index.html
index 5f32f50..436e70b 100644
--- a/index.html
+++ b/index.html
@@ -3,4 +3,4 @@
 <span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
 <a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
 <a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav><div class="container section"><div class=row><div class="col-lg-8 text-center mx-auto"><h1 class="text-white mb-3">CSE 250: Data Science Programming</h1><p class="text-white mb-4">Using pandas, Altair, scikit-learn, and NumPy to program with data</p><div class=position-relative><input id=search class=form-control placeholder="Have a question? Just ask here or enter terms">
-<i class="ti-search search-icon"></i><script>$(function(){var projects=[{value:"Day 2: Project 0",label:"<p>Syllabus Questions?  A note about readings\u0026hellip; Tips for asking for help  Slack Google - acquired discernment   Quarto and tradeoffs Project Submissions: HTML  Are we all on the Slack channel? Follow the Slack invitation that is waiting in your student email. If you don\u0026rsquo;t see an invite, you can join through this link and then ask Brother Cannon to add you to the class channel.\nMethods Checkpoint All the answers will be in the assigned reading or in these slides.\nNotes on Project 0 Installing Packages and Extensions Learn how to install packages by reading the assigned material and by watching the video tutorial on this page.\nThe readings mention a lot of different packages. For Project 0, you need to install at least pandas, altair, numpy, tabulate, and jupyter.\nThe readings will also mention two VS Code extensions you need to install.\nJupyter Notebooks vs. Interactive Python Window Should you decide to use Juypyter Notebooks this semester within VS Code, this is a great guide to get you started.\nOr you can choose to stick with the Python Interactive window like the textbook does.\nUse Your Resources!  Technical documentation Google searches Asking for help on Slack Don\u0026rsquo;t forget the data science lab! (Starts next week.) Question that cannot be answered by the textbook and documentation? Google it. A function you have never seen before? Google it. An error in your code? Google it.  Markdown What is Markdown?  A clean, human readable way to make slick html and pdf documents Used widely among programmers for clean documentation Used widely by Data Scientists to publish results and communicate with stakeholders  Here\u0026rsquo;s a good summary\nQuarto Do your tinkering in interactive Python or Jupyter notebooks. Generate report with finished code, graphs, etc. in Quatro\nQuarto\nNow for some data! 
Let\u0026rsquo;s get this party started Your turn:  Read in the cars data set Work with you your teams to talk through interesting possibilities for a graph Work on Project 0 Questions and Tasks   Any issues with getting Python installed?     Python VS Code Altair in VS Code     Does everyone have pandas, altiar, numpy, scikit-learn installed?     Video tutorial: how to install packages.  One way to install packages:\npip install pandas altair Maybe a better way to do it: run this in an interactive window.\nimport sys !{sys.executable} -m pip install pandas altair    Does everyone have altair-saver working?     altair_saver Video tutorial     ---------------------------------------------------- Why are we using Altair?    It is built on the VEGA and D3 which are fast and web based.  Grammar of Graphics: Vega-Lite   Technical Paper Website Endorsment      What are we not learning in this course?    Indexing, .loc[] and .iloc[] I may not be experienced enough to understand why I should teach you these. I think they all add complexity to what we are learning in the course and we have elected to avoid it. We will use reset_index() a lot. I think MultiIndex features create complication. I have also elected to use .filter() instead of .loc[] because I like it.\nVirtual Environments Virtual Environments appear to be an important tool as you continue to use Python. We will not be teaching these or supporting these in our course.\nmatplotlib (and any tool leveraging it) It feels old, has a bad api, and isn\u0026rsquo;t declarative.\n   ----------------------------- What can Python Interactive do?    Let\u0026rsquo;s review the power of Python Interactive  # %% in my .py script is much better than Jupyter notebooks (.ipynb).  If we hope to have our code work in a production environment then Jupyter is problematic. 
Caching and code chunks are problematic https:\/\/medium.com\/@_orcaman\/jupyter-notebook-is-the-cancer-of-ml-engineering-70b98685ee71       Set-up your py script    Setting up your script A good data science .py script will have packages and data loaded at the top. Usually you have a few short commented sentences that descibe the script purpose.\n# %% # import pandas, altair, numpy import pandas as pd import altair as alt import numpy as np # %% # load data # handgrenade data https:\/\/github.com\/byuidatascience\/data4soils\/blob\/master\/data-raw\/cfbp_handgrenade\/cfbp_handgrenade.csv url = \u0026#39;https:\/\/github.com\/byuidatascience\/data4soils\/raw\/master\/data-raw\/cfbp_handgrenade\/cfbp_handgrenade.csv\u0026#39; dat = pd.read_csv(url)    Make a scatter plot with hmx on the x and rdx on the y    To get you started:\nalt.Chart(dat).encode()    Make a spatial plot with hmx colored     Encode the row and column to the axes. Color the hmx points using the \u0026lsquo;goldorange\u0026rsquo; color scheme. Use mark_square() and make the square sizes 500.     -------------------- Create a histogram of hmx     Encode the x-axis as binned. Encode the y-axis as counts. Configure the title to a fontSize of 20. Use properties to place the title.     ----------------------------- How can I get help?     Make sure you read the reading assignments once or twice or five times. Read the guides on the Course Materials page. Post questions in our #cse250_s21_larson slack channel (and try to help others!) Attend the Data Science Lab. Google is your best friend.     -------------------------- </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/introduction\/day02\/"},{value:"Day 3: Resume Fork and Merge",label:"<p>Remember from last class: pull, add, commit, push. Making edits in another user\u0026rsquo;s repo Breakout Room Activity\nEach student in the breakout room is going to provide feedback on another student\u0026rsquo;s resume. 
The breakout room should begin with a group discussion about the work you\u0026rsquo;ve each done on your resume and any questions the group has. Then follow the steps below.\n fork the other student\u0026rsquo;s resume repository. Now clone that forked repository to your computer. On your local version of the forked repository, do the following;\nA. Create a new file called edits.md and save it in the main folder or the repository.\nB. Make a few recommendations or notes in the edits.md file that will help the other student improve his or her resume.\nC. add, commit, push your edits.\nD. Go to the forked repo on GitHub and check if the edits.md file shows up online. Now, create a pull request to get your edits into the other student\u0026rsquo;s original repo.  Once you\u0026rsquo;ve given another student feedback, accept any pull requests submitted to your own repo. Continue to edit and improve your resume based on the feedback you received.\nCreating a fork in byuids-resumes Fork your own resume repository into the BYU-I Data Science Resumes group.\nIf you change your resume after you create this fork, you will have to submit a pull request to make sure the final version of your resume shows up in the group.\nThese instructions will help you create a pull request.\nOpen time to finalize your resume </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p6\/d4\/"},{value:"Day 4: May the ML columns be with you",label:"<p>Welcome to class! Spiritual Thought Announcements Getting the data ready for machine learning. What are machine learning algorithms expecting to see?  We need to handle missing values and categorical features before feeding the data into a machine learning algorithm, because the mathematics underlying most machine learning models assumes that the data is numerical and contains no missing values. 
To reinforce this requirement, scikit-learn will return an error if you try to train a model using data that contain missing values or non-numeric values when working with models like linear regression and logistic regression. ref\n We have some options when converting categorical features (columns) to numeric.\n If the category contains numeric information (like a range of numbers) we can convert it to a numeric variable by taking the minimum, average, or maximum of the range. Factorization: If the category is an \u0026ldquo;ordinal\u0026rdquo; variable (meaning, there is an order to the categories) we can assign each category to an integer. (For example, good = 1, better = 2, best = 3.) One-hot Encoding or Dummy Variables: If the category is a \u0026ldquo;nominal\u0026rdquo; variable (without an order) then we need to use one-hot encoding (sometimes called \u0026ldquo;dummy variable encoding\u0026quot;). If the category is some version of True\/False or Yes\/No then we can simply convert the values to zeros and ones.  What\u0026rsquo;s our game plan for the Star Wars columns? 1. Break into Groups Strategize \u002b Code \u002b Share  Group 1: How are you going to turn Age, Income and Education into numbers? Group 2: How are you going to encode  Who Shot First Gender Location All the Yes\/No responses   Group 3: How are you going to deal with the character rankings?  2. Combine all the factors into one big X dataframe 3. 
Define Y as those making \u0026gt; $50k First: Limit the data to only people who answered \u0026ldquo;Yes\u0026rdquo; to the question \u0026ldquo;Have you seen any of the 6 films in the Star Wars franchise?\u0026rdquo;.\nThen: Use the table below as a guide to prepare your data for machine learning.\n   Column Original Format Convert To     age category (ordinal, age ranges) number   income category (ordinal, income ranges) number   education category (ordinal, name of degree) number   shot_first category (nominal) one-hot   gender category (nominal) one-hot   location category (nominal) one-hot   fan_star_wars Yes\/No 0\/1   expanded_universe Yes\/No 0\/1   fan_exapanded Yes\/No 0\/1   fan_star_trek Yes\/No 0\/1   seen_i Yes\/No (name of movie\/NaN) 0\/1   seen_ii Yes\/No (name of movie\/NaN) 0\/1   seen_iii Yes\/No (name of movie\/NaN) 0\/1   seen_iv Yes\/No (name of movie\/NaN) 0\/1   seen_v Yes\/No (name of movie\/NaN) 0\/1   seen_vi Yes\/No (name of movie\/NaN) 0\/1   movie rankings number -   character rankings category (ordinal) one-hot or factorize    What functions can we use to convert the categorical columns to numeric?  Range of numbers: str.split() and astype() Ordinal: str.replace() Ordinal: pd.factorize() (can also be used for True\/False) Nominal: pd.get_dummies()  Using the drop_first = True option in get_dummies()    Question: When and why would we drop the first column when we convert a category using pd.get_dummies()?\nAnswer: Whenever your algorithm needs to calculate a matrix inverse.\n The one-hot encoding creates one binary variable for each category.\nThe problem is that this representation includes redundancy. 
For example, if we know that [1, 0, 0] represents \u0026ldquo;blue\u0026rdquo; and [0, 1, 0] represents \u0026ldquo;green\u0026rdquo; we don\u0026rsquo;t need another binary variable to represent \u0026ldquo;red\u0026rdquo;, instead we could use 0 values for both \u0026ldquo;blue\u0026rdquo; and \u0026ldquo;green\u0026rdquo; alone, e.g. [0, 0].\nThis is called a dummy variable encoding, and always represents C categories with C-1 binary variables. In addition to being slightly less redundant, a dummy variable representation is required for some models.\nFor example, in the case of a linear regression model (and other regression models that have a bias term), a one hot encoding will case the matrix of input data to become singular, meaning it cannot be inverted and the linear regression coefficients cannot be calculated using linear algebra. For these types of models a dummy variable encoding must be used instead.\n Source\n   \u0022, \u0022\u0022)) .astype(\u0027float\u0027) .age_min) ``` You can combine the different features (columns) together using [pd.concat()](https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.concat.html). 
```python dat_numeric = pd.concat([ (dat.age .str.split(\u0022-\u0022, expand = True) .rename(columns = {0: \u0027age_min\u0027, 1: \u0027age_max\u0027}) .apply(lambda x: x.str.replace(\u0022 \u0022, \u0022\u0022)) .astype(\u0027float\u0027).age_min), (dat.household_income .str.split(\u0022-\u0022, expand = True) .rename(columns = {0: \u0027income_min\u0027, 1: \u0027income_max\u0027}) .apply(lambda x: x.str.replace(\u0022\\$|,|\\\u002b\u0022, \u0022\u0022)) .astype(\u0027float\u0027).income_min), (dat.education .str.replace(\u0027Less than high school degree\u0027, \u00279\u0027) .str.replace(\u0027High school degree\u0027, \u002712\u0027) .str.replace(\u0027Some college or Associate degree\u0027, \u002714\u0027) .str.replace(\u0027Bachelor degree\u0027, \u002716\u0027) .str.replace(\u0027Graduate degree\u0027, \u002720\u0027) .astype(\u0027float\u0027))], axis = 1 ) ``` Use `pd.get_dummies()` or other functions from these slides to finish preparing the columns for machine learning. Below is one example witih `pd.get_dummies()`. What difference does the `drop_first` option make? ```python dat_onehot = pd.get_dummies(dat.filter([\u0027shot_first\u0027])) dat_onehot = pd.get_dummies(dat.filter([\u0027shot_first\u0027]), drop_first = True) ``` When you\u0027re done, you can use `pd.concat()` again to combine all your features. ```python dat_ml = pd.concat([ # all of the movie rankings (already numbers, no conversion needed), # age, income, and education variables # all the \u0022one-hot\u0022 encoded variables # all the 0\/1 encoded variables ], axis = 1).dropna() ``` ----------------------------------------- Predicting income. Grand Question 4 wants us to \u0026ldquo;build a machine learning model that predicts whether a person makes more than $50k\u0026rdquo;.\nWhat is the target we\u0026rsquo;re interested in?    
Aka, what is our \u0026ldquo;outcome\u0026rdquo; or \u0026ldquo;response\u0026rdquo; that we want to predict?\ndat_ml.income \u0026gt; 50000    How to format the features (x) and target (y)    Remember not to include the answer (income) in your features!\nx = dat_ml.drop([\u0026#39;income\u0026#39;], axis = 1) The response needs to be saved as a 0\/1 variable (at least, for binary classification algorithms).\ny = (dat_ml.income \u0026gt; 50000) \/ 1    One example of a model    First we need to build and train the model.\nfrom sklearn.model_selection import train_test_split from sklearn.tree import DecisionTreeClassifier from sklearn.tree import export_text # split the data (x) and response (y) into training and testing sets x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = .33, random_state = 2020) # build and train the model decision_tree = DecisionTreeClassifier(random_state=0, max_depth=5) decision_tree = decision_tree.fit(x_train, y_train) # what does the decision tree look like? r = export_text(decision_tree, feature_names=x_train.columns.to_list()) print(r) Then we can test it to see how well it does.\nfrom sklearn import metrics # make predictions with the test data predict_y = decision_tree.predict(x_test) # how well did our model do? metrics.plot_confusion_matrix(decision_tree, x_test, y_test) print(metrics.accuracy_score(y_test, predict_y))    ----------------------------------------------- </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p5\/d4\/"},{value:"DS 250 Syllabus",label:"<p> Why is a raven like a writing desk? -Lewis Carroll- v -The Internet-\n Contact Information Instructor Name: Paul Cannon Email: cannonp@byui.edu Phone: (208) 496-7565 Student hours: Ricks 216, TBD Calendly: https:\/\/calendly.com\/cannonp\nOverview This course provides a better understanding of data programming. 
If you have signed up for this class, you are most likely driven by curiosity and interested in how data decisions are made. Possibly, you have a more empathetic approach to how the world works and how problems can be solved. Finally, you have an eye for how society reports and uses data to make impactful decisions.1.\nUpon completing this course, you will be able to use data-driven programming in Python to handle, format, and visualize data. We will introduce you to data wrangling techniques, analytical methods, and the grammar of graphics. Specifically, as a successful learner, you will be able to;\n Use functions, data structures, and other programming constructs efficiently to process and find meaning in data. Programmatically load data from various types of data sources, including files, databases, and remote services. Use data manipulation libraries to perform straightforward analysis, produce charts, and prepare data for machine learning algorithms. Use machine learning libraries to discover insights, make predictions, and interpret the success of these algorithms. Use industry-leading tools to collaborate and share your work.  Principles of DS teaching The course follows these principles of teaching Data Science2\n Organize the course around a set of diverse projects Integrate computing into every aspect of the course Teach abstraction, but minimize reliance on mathematical notation Structure course activities to realistically mimic a data scientist\u0026rsquo;s experience Demonstrate the importance of critical thinking\/skepticism through examples  Competency assumptions This course focuses on programming with data to find insights. The prerequisite for this course is an introductory programming course in Python (CSE 110)3. We recommend taking CSE 111 before or during the same semester you take this course - especially if programming is complicated for you. 
We assume that you do know what the Terminal is and how to execute scripts.\nAn understanding of standard deviation{target=\u0026quot;blank\u0026rdquo;} and variance{target=\u0026quot;blank\u0026rdquo;} will be valuable.\nCourse materials and structure This course focuses on building core data science skills. You will learn to program, but you will also learn how to communicate and collaborate with your peers and mentors.\nCourse communication  How do I talk with my teacher, TA, and other students in this class?\n  We use Slack for most class and one-on-one communication. Don\u0026rsquo;t email or direct message using I-Learn.\nA. Should I paste code snippets in our class Slack channel to get help? Yes.\nB. Should I ask questions about the projects and the readings in our class Slack channel? Yes.\nC. Should I post random quotes or videos in our class Slack channel? No. Use the #random channel. All assignments are submitted in I-Learn. A. Each project submission requires you to submit a short message to the teacher about your work.\nB. We will respond to your message with edits you can make to earn full credit on your resubmit.\nC. Class announcements about the grading of projects are posted in I-Learn.  Online reading materials  Python for Data Science: A port of R for Data Science using the Python packages pandas and Altair. pandas User Guide Altair User Guide Python Data Science Handbook SQL by data.world  Preparation In my experience, getting lectured training outside of college is even more expensive than it is in college. A week\u0026rsquo;s worth of training can cost more than a semester of school here at BYUI. Due to this expense, learning how to digest online material gain understanding before going to the expert with questions is a valuable skill to develop. I expect that you have completed the assigned reading material before class begins.\nSpecifications grading Grading is a nasty side effect of mass learning and academia. 
We are in a class at a university and will have to manage this side effect. However, we don\u0026rsquo;t have to let it control our learning, thinking, or this class. Learning and thinking should motivate each activity.\nAs we team, teacher and student, we have the challenge to become more! We have worked hard to identify the specifications needed for a python user of the pandas and Altair packages. Our goal is to align your grade with the skill specification you have mastered. In other words, the grade you want will determine how much work you will do. We will not score individual tasks in the class on a percentage scale. If your work meets the specified criteria, you will get full credit.\nIn a specifications-grading system, all tasks are evaluated on a high-standards pass\/fail basis using detailed checklists of task requirements and expectations4. You earn your letter grade by earning passing marks on a set of tasks. This system provides various choices and is closer to how learning and work occur in the real world. It will be easy for us to tell if work is complete, done in good faith, and consistent with the requirements.\nGrading Scale and Elements The grading scale describes the amount of work you must put into the grading elements to achieve the your desired grade. The grading scale only lists requirements for A, B, C, and D. You can request half-step adjustments if you fall slightly short or over on some elements.\nYou will need to provide a detailed description of your completed grading elements in your Review and Request Letter (due at the end of the semester) to support your grade request. For example, if a student was in the B range they might write the following:\n\u0026ldquo;I got three fives, one four, and two threes for 29 points on the projects. I met the checkpoint requirements with 4 halfway checkpoints and 4 full-mark methods checkpoints. I regularly attended data science society. I believe my coding challenge will be above a 3. 
I request a B\u002b.\u0026quot;\n  Leader (A)    Element Requirement Description     Projects 34 Points 5 points per project   Mid-project checkpoints 5 completed Full credit   Methods \u0026amp; Calculations checkpoints 6 completed 100% unlimited attempts   DS Community Complete 2 \u0026ndash;   Request and review letter submission \u0026ndash;   Coding challenge At least 3 Score is out of 4    See the competency descriptions in the next section. You can request half-step adjustments if you fall slightly short or over on some elements.\n Supporter (B)    Element Requirement Description     Projects 29 Points 5 points per project   Mid-project checkpoints 3 completed Full credit   Methods \u0026amp; Calculations checkpoints 5 completed 100% unlimited attempts   DS Community Complete 2 items \u0026ndash;   Request and review letter submission \u0026ndash;   Coding challenge At least 3 Score is out of 4    See the competency descriptions in the next section. You can request half-step adjustments if you fall slightly short or over on some elements.\n Listener (C)    Element Requirement Description     Projects 24 Points 5 points per project   Mid-project checkpoints 3 completed Full credit   Methods \u0026amp; Calculations checkpoints 3 completed 100% unlimited attempts   DS Community Complete 1 item \u0026ndash;   Request and review letter submission \u0026ndash;   Coding challenge At least 2 Score is out of 4    See the competency descriptions in the next section. You can request half-step adjustments if you fall slightly short or over on some elements.\n Asleep (D)    Element Requirement Description     Projects 14 Points 5 points per project   Mid-project checkpoints 1 completed Full credit   Methods \u0026amp; Calculations checkpoints 2 completed 100% unlimited attempts   DS Community None \u0026ndash;   Request and review letter None \u0026ndash;   Coding challenge None Score is out of 4    See the competency descriptions in the next section. 
You can request half-step adjustments if you fall slightly short or over on some elements.\n   Competency elements  Projects (Grand questions) Each of the seven projects is worth 5 points and you get one additional submission after the due date. There are 5 two-week projects and two one-week projects to start and end the class.\nGrading Details  1 point: Submission 3 points: submission of a good faith attempt with a statement of work quality. 4 points: High-quality work that addresses each of the Grand Questions and a statement of work quality. 5 points: Addressed reviewer issues and completion of resubmission if needed.   Checkpoints (methods and calculations) These checkpoints are in Canvas and they open when the project starts. They have unlimited attempts and remain open until the end of the semester.\nExamples  Fact-Finding Questions (Calculate descriptive summaries): Fact-finding questions help you with calculations that build into the Grand Questions of the project. These questions have clearly defined answers using Python calculations. You should expect 2-3 problems.   Example: Using the top 10 airports in size, what is the average size? Example: What proportion of flights are delayed at the largest airport?   How the code works questions (Explaining the tools): This part could have direct answer questions or open-ended questions.   Example (direct): What is the recommended function for arranging your data by a variable? What are the outputs after using \u0026lt;FUNCTION\u0026gt;? Example (open): Your client has shown some confusion about NumPy\u0026rsquo;s \u0026lsquo;nan\u0026rsquo; handling in Python. Help them understand by answering the question, \u0026lsquo;How is missing data handled in Pandas?\u0026rsquo;   Checkpoints (Mid-project status) The mid-project checkpoint has a few questions. It opens the first day of the project and closes at 1 am on the 3rd day of class for the project. 
It has the following questions.\nExamples  Have you checked off more than one grand question from the current project? (Yes\/No) Have you spent at least 2 hours using code to tackle problems related to the case study? (Yes\/No). Have you prepared questions you have about the case study to ask in your next meeting? (Yes\/No).   Data science community To earn credit for the DS Community element you must complete two different tasks from the list below. At the end of the semester, you will be asked to report on which tasks you completed and what you learned from them.\n Attend Data Science Society at least once. Sign up for an email newsletter that will teach you more about data science. Data Science Weekly or Data Elixir are good options. Listen to a podcast episode about data science. Build a Career in Data Science has some excellent episodes. Watch a professional presentation on YouTube about data science. Be prepared to share the link and a summary of the video. Reach out to someone who works in a data-related field and ask them for 15 minutes of their time. Use this time to conduct an \u0026ldquo;informational interview\u0026rdquo; and learn more about their responsibilities and career path. Research and apply to at least 5 data-related jobs or internships.   Finishing the semester Submit a request and review letter that includes what you have learned from this class, the next data science course you plan on taking, and the final grade that you are requesting based on the work you have submitted.\n Coding challenge We will have an in-class coding challenge on the ultimate or penultimate day of class. It would be best if you did not view this challenge like a traditional exam. It will cover the general techniques that we have been practicing throughout the course.\nWe expect to have a few practice challenges throughout the semester. We will score the coding challenge on a four-point scale.\n 1 point: At least you tried. 
2 points: You have learned some items from the course, but your work in the coding challenge is deficient. 3 points: Your submission uses proper coding techniques and addresses the objective. 4 points: Exceptional work. Your code can be used as a solution to share with others.       https:\/\/medium.com\/@nikhilbd\/what-makes-a-good-data-scientist-engineer-a8b4d7948a86#.jr80wl98y. I suppose some of you are just taking this class because your degree says you can, and it fits in your schedule. If so, we should chat to make sure this is the right class for you. \u0026#x21a9;\u0026#xfe0e;\n https:\/\/arxiv.org\/ftp\/arxiv\/papers\/1612\/1612.07140.pdf. You will see this pattern in DS 350, DS 460, and Math 488. It will progressively get more realistic. \u0026#x21a9;\u0026#xfe0e;\n We do expect that this is not your first experience with Python and VS Code. If you have done other programming courses, you should be able to succeed in this course. If you have any questions, please ask. \u0026#x21a9;\u0026#xfe0e;\n Making the right checklists can be difficult. Bad checklists could fall in the following categories \u0026ndash; vague and imprecise; too long; hard to use; impractical; too pedantic. Useful checklists are precise, efficient, easy to use and understand. This is the first time this course has been offered, so we will have to work together to ensure the requirements are reasonable. \u0026#x21a9;\u0026#xfe0e;\n   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/syllabus\/"},{value:"Introduction",label:"<p>A competent student should be able to finish the exercises within 60 minutes. You should work through it on your own. This serves as an assessment of your understanding of the assigned readings.\nBefore you start Make sure you have installed VS-code, pandas, and altair on your computer. 
You can install these packages by typing this line in the terminal.\npip install pandas altair\nOR if you have more than one version of python\npip3.9 install pandas altair\npip3.9 indicates the version of python you are installing the packages to.\nPart 1 Get familiar with your tools Programming involves a lot of research. Unlike subjects like Mathematics or History, we are not required to remember every single function and its usage. It is natural for experienced programmers to look for answers on the internet, books, even from other people\u0026rsquo;s code. Programming will be extremely frustrating if we are not allowed to do web searches, so please get familiar with the tools you have and use them often.\nOfficial Documentation This should be your first resort for understanding any code\/function. Scanning the documentation of a function will allow you to get an overview of its usage.\nHere is a link to the documentation of the assign() function:\n(https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.assign.html)\nExample of assign() (as shown in the documentation)\nimport pandas as pd df1 = pd.DataFrame({\u0026#39;temp_c\u0026#39;: [17.0, 25.0]}, index=[\u0026#39;Portland\u0026#39;, \u0026#39;Berkeley\u0026#39;]) df2 = df1.assign(temp_f=df1.temp_c * 9 \/ 5 \u002b 32) Exercise 1: After reading the documentation for assign(), write a short paragraph to explain assign() as if you were talking to someone with zero programming experience (use the example above to help you explain assign()).\n What is the difference between df1 and df2? How was df2 derived from df1?  Online textbook It pains us to see students stay stuck on problems for hours yet refuse to use the textbook. This is another very useful resource since this is designed for this class. 
Link to the textbook: (https:\/\/byuidatascience.github.io\/python4ds\/)\nExercise 2: Locate the section where the textbook talks about query() and answer these questions.\n What function in R\u0026rsquo;s dplyr is equivalent or comparable to query() in pandas (you should include the section number in your answer)? What is the easiest mistake for a Python beginner to make that was shown in the text about query() (you should include the section number in your answer)?  The internet Google is a programmer\u0026rsquo;s friend. Get used to googling things. In fact, you want to be an expert in googling.\n Question that cannot be answered by the textbook and documentation? Google it. A function you have never seen before? Google it. An error in your code? Google it.  Exercise 3: Provide at least 2 extra resources you could find about the pandas function drop() on the internet.\nTutor, TA (through Slack, Zoom, or in-person) We want to help you with your work; we want to answer your questions; but most importantly, we want to help you succeed in this class. That will require you to put in the necessary time in understanding the readings, coding, and debugging. When you ask us a question, we expect that you have read the documentation, searched the textbook, and done your own research. Then we can be most helpful and can provide insights on top of your understanding.\nExamples of bad questions  How does drop() work? We will ask you to read the documentation for drop(). How do you make a table in a markdown file? We will refer you to the textbook. I don\u0026rsquo;t want these columns in my data, how can I drop them? We will ask you if you have found anything on the internet.  Examples of good questions  I am still confused about the syntax of drop(). After reading the documentation, this is my understanding of the function\u0026hellip; . What am I missing? I tried making a table in markdown (show code), it is still not giving me what I want, how can I fix this? 
I am trying to drop these columns in my dataframe, I think drop() is what I am looking for. Am I headed in the right direction? If not, what keywords should I be googling?  Exercise 4:\nUsing the code and tools mentioned above, finish questions 4 and 5 under 3.2.4 in the textbook (use the data in mpg for your plot):\n# library import import pandas as pd import altair as alt # data import url = \u0026quot;https:\/\/github.com\/byuidatascience\/data4python4ds\/raw\/master\/data-raw\/mpg\/mpg.csv\u0026quot; mpg = pd.read_csv(url)   Question 4: Make a scatterplot of hwy vs cyl.\n  Question 5: What happens if you make a scatterplot of class vs drv? Why is the plot not useful?\n  After you have completed this skill builder with your team (or on your own), compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/introduction\/"},{value:"Project 0: Introduction",label:"<p>Background We will complete six projects during the semester that each take about four days of class. On average, a student will spend 2 hours outside of class per hour in class to complete the assigned readings, submit any Canvas items, and complete the project (for a total of 8 hours per project). The instructions for each project will be structured into sections as written on this page.\nThis first Background section provides context for the project. Make sure you read the background carefully to see the big-picture needs and purpose of the project.\n Python and VS Code are tools commonly used in the field of data science. During our first two days of class we will get VS Code prepped for data science programming. Completing Project 0 will set you up for success the rest of the semester.\nData Every data science project should start with data, and our class projects are no different. 
Each project will have \u0026lsquo;Download\u0026rsquo; and \u0026lsquo;Information\u0026rsquo; links like the ones below.\n Download: mpg data\nInformation: Data description\nReadings The Readings section will contain links to reading assignments that are required for each project, as well as optional references. Remember that you are reading this material to build skills. Take the time to comprehend the readings and the skills contained within.\nWe recommend reading through the assigned material once for a general understanding before the first day of each project. You will reread and reference the material multiple times as you complete the project.\n The readings listed below are required for the first two days of class.\n Python for Data Science (P4DS): Introduction P4DS: Data Visualization Sections 3.1 \u0026amp; 3.2 only  Optional References  VS Code user interface Reading Technical Documentation  Questions and Tasks: This section lists the questions and tasks that need to be completed for the project. Your work on the project must be compiled into a report and submitted in Canvas by the weekend following the last day of material for the project.\n  Finish the readings and be prepared with any questions to get your environment working smoothly (class for on-campus and Slack for online) In VS Code, write a Python script to create the example Altair chart from section 3.2.2 of the textbook (part of the assigned readings). Note that you have to type chart to see the Altair chart after you create it. Your final report should also include the markdown table created from the following (assuming you have mpg from question 2).  print(mpg .head(5) .filter([\u0026#34;manufacturer\u0026#34;, \u0026#34;model\u0026#34;, \u0026#34;year\u0026#34;, \u0026#34;hwy\u0026#34;]) .to_markdown(index=False)) Deliverables: Deliverables are “the quantifiable goods or services that must be provided upon the completion of a project”. 
In this class the deliverable for each project is an HTML report created using Quarto. This final section will be the same for each project.\n Use this template to submit your Client Report. The template has three sections (for additional details please see the instructional template):\n A short summary that highlights key insights from the metrics of the project and describes the results and the tools you used (think “elevator pitch”). Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  This is a simple note.\n This is a simple tip.\n This is a simple info.\n -- </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/introduction\/"},{value:"Pull and Merge Forks on GitHub",label:"<p>Create Pull Request   Go to the forked repository in byuids-resumes and click Pull request.    This will bring you to the following page, where you need to click switching the base.    Now you can Create pull request.    Here you can type a note and then actually Create pull request.    Now you need to View pull request.   Merge Request If you have admin access to the forked repository where you are doing the pull request, you can finish the next two steps.\n Click the Merge pull request button.    Now confirm the merge.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/git_github_ds\/pull_merge\/"},{value:"Week 12-13: Project 6 - Github",label:"<p> GitHub is the communication tool for Data Scientists and developers. As students, you will want to curate your creative work on GitHub using Git. GitHub is the place to share your original work, not your homework assignments. Many people store their personal websites, blogs, and project websites on GitHub. Our textbook and course are hosted on GitHub, and you can see J. 
Hathaway\u0026rsquo;s or Ryan Hafen\u0026rsquo;s personal Data Science websites that are hosted on GitHub as well. You will be making your public resume that will be hosted on GitHub for this project.\nDuring this project, we will be learning the process of Git and the tools of GitHub. We will use the Git process to have others in our class edit our resumes. Take the process seriously (pick a suitable username and write a good resume), and you will have the beginning of your social presence in the DS\/CS space.\n Completed Readings: GitHub, a programmer\u0026rsquo;s social media, Join GitHub, Repository Templates, Using Version Control in VS Code, Working with GitHub in VS Code, Git in Visual Studio Code video, New to Git and GitHub? This Essential Beginners Guide is for you, Git vs. GitHub: What is the difference between them?\n Markdown Resume (mdresume) Repository and BYUI Data Science Resumes\n Grand Questions  Join the BYUI Data Science Resumes GitHub organization and use the template repository to make a resume repository under your repositories. A good name might be LASTNAME-Resume. Clone your repository to your computer and build a first draft of your resume. Push your results to GitHub and have another student fork your repository to make edits. Accept the proposed changes from the student review and finish your final version. Make sure your resume is forked by BYU-I Data Science Resumes  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p6\/"},{value:"Day 1: Welcome",label:"<p>Welcome to DS 250!  Teacher: Paul Cannon TA: David Pineda  Announcements  Devotional Computing Lab 4:30PM - 6:30PM all weekdays except Wednesday. Saturday from 10AM-12PM  Slack channel #tutoring_lab   Data Science Society - Wednesdays at 6PM  What is a Data Scientist? 
A Data Scientist has a C\u002b Talent Stack Class Structure  Problem Solving Improved coding skills Effective written\/visual communication Collaboration Timeliness and communication with \u0026ldquo;the boss\u0026rdquo;  Syllabus\nGot Slack? Are we all on the Slack channel? Follow the Slack invitation that is waiting in your student email. If you don\u0026rsquo;t see an invite, you can join through this link and then ask \u0026ldquo;@Paul Cannon\u0026rdquo; to add you to the class channel.\nWho are you?  Introduce yourself and learn the names\/majors\/origin story of your group members. Make a plan to get help this semester. How will you contact each other? Some ideas: Slack, I-Learn, emails, group texts, etc. If you were independently wealthy, what would you be doing right now? Would you change majors? Highlights of 2022  Problem Solving This is not a \u0026ldquo;see and repeat\u0026rdquo; programming class!\nHow would you go about fixing my motorcycle? Learn how to ask for help (1 hr rule)  Getting started on Project 0 Setting up your Programming Environment  Download Visual Studio Code Download Python (v3.10.8)  Be sure to select the \u0026ldquo;Add to Path\u0026rdquo; option during the install process    Install the Python packages and VS Code extensions you need (see this page)  pip install pandas pip install numpy pip install jupyter pip install tabulate pip install altair   Install Quarto CLI Quarto Instructions Start looking at Project 0 Complete the \u0026ldquo;Methods Checkpoint\u0026rdquo;  Installing Packages and Extensions Learn how to install packages by reading the assigned material and by watching the video tutorial on this page.\nThe readings mention a lot of different packages. For Project 0, you need to install at least pandas, altair, numpy, tabulate, and jupyter.\nThe readings will also mention two VS Code extensions you need to install.\nA note on Jupyter Notebooks vs. 
Interactive Python Window The textbook will show you how to use VS Code\u0026rsquo;s interactive Python window and Quarto. Feel free to use Jupyter Notebooks.\nWe will do write-ups in Quarto, though, which can be rendered as a PDF or HTML.\nIntroduction to Brother Cannon    What do you want to know?    What is a data scientist?    Brother Hathaway\u0026rsquo;s definition:\n A blend of programmer, statistician, and communicator that burns with curiosity.\n My definition for DS 250:\n Someone who can extract insights from data and then communicate those insights with clarity.\n Learn more about the BYU-Idaho data science program here.\n   What is data science programming?    Data scientists write code as a means to an end, whereas software developers write code to build things. Data science is inherently different from software development in that data science is an analytic activity, whereas software development has much more in common with traditional engineering.\nData scientists tackle problems such as identifying fraudulent transactions, or predicting which employees are likely to leave a company. Software developers can take the data scientists\u0026rsquo; models and turn them into fully functioning systems with production-quality code. Software developers tackle problems like getting an algorithm to run more efficiently, or building user interfaces.\n   Course Outcomes    Upon completing this course, you will be able to use data-driven programming in Python to handle, format, and visualize data. We will introduce you to data wrangling techniques (pandas), analytical methods (scikit-learn), and the grammar of graphics (Altair). Specifically, as a successful learner, you will be able to:\n Use functions, data structures, and other programming constructs efficiently to process and find meaning in data. Programmatically load data from various types of data sources, including files, databases, and remote services. 
Use data manipulation libraries to perform straightforward analysis, produce charts, and prepare data for machine learning algorithms. Use machine learning libraries to discover insights, make predictions, and interpret the success of these algorithms. Collaborate and share your work with industry-leading tools.     BYU-Idaho Mission Statement     Brigham Young University-Idaho was founded and is supported and guided by The Church of Jesus Christ of Latter-day Saints. Its mission is to develop disciples of Jesus Christ who are leaders in their homes, the Church, and their communities.\n  How would you describe a leader? What makes a leader powerful? What does a leader do with insights?  An example of a good leader.\nWhat (or who) is truth?\n   ## Course Format and Grading How hard is this class going to be?    The reality of CSE 250:\n We have done all we can to ensure that this is a 2-credit course for the average student. That means that we expect 4-6 hours outside of class for the average student to achieve an A. You have to put in the time if you want to build skills. The course is necessarily creative in nature. That fact usually makes it feel more challenging. We will be asking you to learn to write creative data science python code. If you have any concerns, please talk with me!     What is the structure of CSE 250?    The class uses 7 projects to teach data science programming in Python using pandas, Altair, scikit-learn, and numpy.\n Projects Syllabus     How do I get the grade I want?     Specification Grading Grading structure Competency Elements  Introduction Project \u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026gt;\nWhat is the goal?    Completing the introduction project will set you up for success the rest of the semester. 
The workflow followed in the introduction project (loading packages, writing code, saving images, compiling a final report) will be the same for every other project. If you have questions about this project, you need to seek help.   What exactly do I need to submit?    Make sure you carefully read the project instructions.\nYou will submit a single .pdf file to I-Learn. This PDF file should contain a project summary, your answers to the grand questions (including the plot you saved with altair_saver), and an appendix where you copy and paste your commented Python code.\n </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/introduction\/day01\/"},{value:"Day 2: Commit, push, fork, and merge",label:"<p>Welcome to class! Announcements Practice with Git GQ3: add, commit, push and a little pull Let\u0026rsquo;s save the changes we\u0026rsquo;ve made to our resume.\nGQ4: Fork and merge Get into groups of 2 or 3. Then follow the steps below:\n Fork the other student\u0026rsquo;s resume repository. Now clone that forked repository to your computer. On your local version of the forked repository, do the following:\nA. Create a new file called feedback.md B. Make a few recommendations or notes in the feedback.md file that will help the other student improve his or her resume\nC. add, commit, push your edits\nD. Go to the forked repo on GitHub and check if the feedback.md file shows up online Now, create a pull request to get your edits into the other student\u0026rsquo;s original repo.  Once you\u0026rsquo;ve given another student feedback, accept any pull requests submitted to your own repo. 
Continue to edit and improve your resume based on the feedback you received.\nGQ5: Fork into byuids-resumes Fork your own resume repository into the BYU-I Data Science Resumes group.\nIf you change your resume after you create this fork, you will have to submit a pull request to make sure the final version of your resume shows up in the group.\nThese instructions will help you create a pull request.\n</p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p6\/d3\/"},{value:"Day 3: Validating data, cleaning columns",label:"<p>Welcome to class! Announcements Spiritual Thought Let\u0026rsquo;s validate some data! Pick something from the Star Wars article you want to validate (\u0026ldquo;double check\u0026rdquo;).\nMoving from categories to values.   Create an additional column(s) that converts the income ranges to a number. Create an additional column(s) that converts the age ranges to a number. Create an additional column(s) that converts the school groupings to a number.    str.replace(\u0027\u0026rsquo;, \u0026lsquo;9\u0026rsquo;) astype(\u0026lsquo;float\u0026rsquo;) pd.concat(axis=1)  Validating visuals You\u0026rsquo;re going to make a lot of bar charts!\n Simple bar chart tutorial. Make Altair do the counting for you! Tutorials here and here.  Getting started on Question 3 One-hot encoding Project 5 asks you to \u0026ldquo;one-hot encode all columns that have categories\u0026rdquo; and \u0026ldquo;convert all yes\/no responses to 1\/0 numeric\u0026rdquo;.\nThe get_dummies method can be used to create one-hot encoded variables. 
The pd.get_dummies documentation is a great place to start.\nAfter reading the documentation, study the code below and get started on Grand Question #3.\n# %% # When we use machine learning to predict salary, # let\u0026#39;s only look at people that have seen at least # one Star Wars film starwars = starwars.query(\u0026#39;have_seen_any == \u0026#34;Yes\u0026#34;\u0026#39;) # Discuss - what\u0026#39;s a better way to filter out people  # who haven\u0026#39;t seen Star Wars? # %% # Format columns for machine learning # Let\u0026#39;s try this first: convert categories to \u0026#34;one-hot\u0026#34; encodings shot_first_onehot = pd.get_dummies(starwars.shot_first) shot_first_onehot # What\u0026#39;s the difference between the code above # and this? Which one is better? shot_first_onehot = pd.get_dummies(starwars.shot_first, drop_first=True) shot_first_onehot # %% # \u0026#39;get_dummies()\u0026#39; can also be used to convert yes\/no answers to 0\/1 episode_i = pd.get_dummies(starwars.seen_film_i__the_phantom_menace) episode_i # %% episode_i.value_counts() </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p5\/d3\/"},{value:"pandas and Altair",label:"<p>For this skill builder, we are exploring some important functions in the pandas and Altair packages. DS programming requires a lot of data wrangling. Using the proper functions, we can create concise and comprehensive code. You should have been exposed to a few functions through the readings this week.\nYou may want to at least scan the readings before beginning this task since this serves as an assessment of your understanding of the assigned readings. A prepared student should be able to finish the exercises within 60 minutes. You should work through them on your own.\nBefore you start Make sure you have installed VS Code, pandas, and Altair on your computer. 
You can install these packages by typing this line in the terminal:\npip install pandas altair\nOR if you have more than one version of Python:\npip3.9 install pandas altair\npip3.9 indicates the version of Python you are installing the packages to.\nData import Run the following code to import the data we need for this skill builder:\n# package import import numpy as np import pandas as pd import altair as al # data import dat = pd.read_csv(\u0026#34;https:\/\/vincentarelbundock.github.io\/Rdatasets\/csv\/AER\/Guns.csv\u0026#34;) Make sure the variable dat is correctly assigned in your environment and finish the following exercises. You can read the documentation of the data on this page - https:\/\/vincentarelbundock.github.io\/Rdatasets\/doc\/AER\/Guns.html\nExercise 1 One of the first things we can do with a freshly imported dataset is to check its columns. This will help us understand the basic structure of the dataframe (table).\n Using one line of code, select all the column names in dat and assign them to a variable called col_list.\n  Hint Every dataframe has an attribute \u0022columns\u0022. Accessing this attribute will give you a list of all column names  We often want to know the dimensions of a dataframe. How many columns are in the dataset? How many rows are in the dataset?\n Using one line of code, show the number of columns and rows in dat.\n  Hint Every dataframe has an attribute \u0022shape\u0022. Accessing this attribute will give you the dimensions of a dataframe  Now run dat.head(). It will print out the first 5 rows of data in dat.\n Just from looking at the output, what column(s) seem to be redundant with the row number?\n  Hint There is one column that serves as nothing but a row counter; that column is redundant.  Exercise 2 After a brief investigation of the data, we will clean up the data. By cleaning up, we are trying to filter down dat so it only holds the data we need. 
We will first get rid of the extra column we found in the previous exercise.\n Using one line of code, drop the redundant column using the variable col_list (created in exercise 1)\n  Hint Use `drop()`. Understand what \u0026ldquo;axis\u0026rdquo; is as a parameter of drop().\nYour function should look like this:\ndat.drop([col_list[_]], axis = _)\nfill the \u0026ldquo;_\u0026quot;\u0026rsquo;s with the correct values and assign the output to dat.\n Don\u0026rsquo;t forget to save the changes in dat. Run dat.head() to make sure the column is dropped in dat.\nExercise 3 We have filtered dat vertically by dropping a column. Now we will try to filter dat horizontally, meaning we will get rid of some of the rows.\nWe can do that by applying a condition to dat. A condition is an expression that can be evaluated as True\/False. For example, 8 \u0026gt; 5 is an expression that evaluates to True. This is trivial because 8 will always be greater than 5.\nRun the code below:\n What is the difference between exp1 and exp2?\n exp1 = 8 \u0026gt; 5 exp2 = dat.violent \u0026lt; 300  Hint Try type() on each variable OR calling each variable.  Run the code below:\n By putting dat.violent \u0026lt; 300 and the violent column from dat into a dataframe, what is the relationship between the two columns?\n exp = pd.DataFrame({\u0026quot;dat.violent \u0026lt; 300\u0026quot; : exp2, \u0026quot;violent value from dat\u0026quot; : dat.violent}) exp  Hint Try computing `dat.violent[n]  Using query(), filter down dat so that it only contains the data for Idaho\n  Hint query() takes in expressions and filters down data.  Don\u0026rsquo;t forget to save the changes in dat. Run dat.shape (an attribute, so no parentheses) to make sure there are 23 rows and 13 columns.\nExercise 4 Besides filtering, we can manipulate the data by adding new data to it. 
By adding a new column to the data, we assign a new value to each row.\n Using assign(), create a new column that shows the ratio between murder rate and violent rate.\n  Hint Use assign(). You can get the ratio by computing this code:\ndat.murder\/dat.violent\n Exercise 5  Create a scatter plot that shows the relationship between murder rate and violent rate for the state of Idaho. Your chart should show murder rate as the x-axis, violent as the y-axis.\n  Hint Can you mimic this plot? (https:\/\/altair-viz.github.io\/gallery\/scatter_tooltips.html)\n  For an extra push Exercise 6  Using a line of code, filter down the data set so that it only shows the data in years between 1993 and 1997.\n Exercise 7  Create a line chart that shows prisoner numbers for the states of Idaho, Utah, and Oregon.\n Your chart should show year as the x-axis, prisoners as the y-axis, states as different colours, along with an appropriate title.\nExercise 8  Without using query(), finish the data wrangling in questions 2, 5, and 6.\n After you have completed this skill builder with your team (or on your own), compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/pandas_altair\/"},{value:"Project 1: What\u0027s in a name?",label:"<p>Background Early in prehistory, some descriptive names began to be used again and again until they formed a name pool for a particular culture. Parents would choose names from the pool of existing names rather than invent new ones for their children.\nWith the rise of Christianity, certain trends in naming practices manifested. Christians were encouraged to name their children after saints and martyrs of the church. These early Christian names can be found in many cultures today, in various forms. These were spread by early missionaries throughout the Mediterranean basin and Europe.\nBy the Middle Ages, the Christian influence on naming practices was pervasive. 
Each culture had its pool of names, which were a combination of native names and early Christian names that had been in the language long enough to be considered native. [ref]\nData Download: names_year.csv\nInformation: data.md\nReadings  Python for Data Science (P4DS): Data Visualization P4DS: Graphics for Communication P4DS: Markdown P4DS: 5.2 Filter rows with .query() P4DS: Chapter 10 DataFrame  Optional References  The query method  Questions and Tasks For Project 1 the answer to each question should include a chart and a written response. The year labels on your charts should not include a comma. At least two of your charts must include reference marks.\n How does your name at your birth year compare to its use historically? If you talked to someone named Brittany on the phone, what is your guess of his or her age? What ages would you not guess? Mary, Martha, Peter, and Paul are all Christian names. From 1920 to 2000, compare the name usage of each of the four names. What trends do you notice? Think of a unique name from a famous movie. Plot the usage of that name and see how changes line up with the movie release. Does it look like the movie had an effect on usage?  Deliverables Use this template to submit your Client Report. The template has three sections (for additional details please see the instructional template):\n A short summary that highlights key insights from the metrics of the project and describes the results and the tools you used (think “elevator pitch”). Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-1\/"},{value:"Python for Data Science",label:"<p>Python for Data Science is a port of R for Data Science into Python. 
We are keeping Garrett Grolemund and Hadley Wickham’s writing and examples as much as possible while demonstrating Python instead of R. We have focused on pandas and Altair in our Python code snippets.\nThis book will teach you how to do data science with Python: You’ll learn how to get your data into Python, get it into the most useful structure, transform it, visualise it and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides. These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with Python. You’ll learn how to use the grammar of graphics, literate programming, and reproducible research to save time. You’ll also learn how to manage cognitive resources to facilitate discoveries when wrangling, visualising, and exploring data.\nInstalling and Importing Packages We want to install the following three packages:\n pandas numpy scikit-learn. It is still more difficult to install on Apple Silicon. You can use the following links to get it installed - Link 1, Link 2, Link 3.  We can get packages installed for this course using one of the two methods below.\nUsing your terminal # default way pip install numpy pandas scikit-learn If you are using a Mac\n# Mac method with Python 2 and 3 installed pip3 install numpy pandas scikit-learn Using your interactive Python (Jupyter server) import sys !{sys.executable} -m pip install numpy pandas scikit-learn    </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/python-for-data-science\/"},{value:"Week 10-11: Project 5 - Star Wars",label:"<p> A significant portion of a data scientist\u0026rsquo;s job is data cleaning. During these two weeks we will not hide the data munging from you. We will practice data cleaning using a Star Wars survey from FiveThirtyEight. 
Survey data is notoriously difficult to handle. Even when the data is recorded cleanly, the options for ‘write in questions’, ‘choose from multiple answers’, ‘pick all that are right’, and ‘multiple choice questions’ make storing the data in a tidy format difficult.\n </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p5\/"},{value:"Day 1: Git and Github",label:"<p>Welcome to class! Spiritual Thought Announcements   Project 5 Comment\n Feature Importance and Model discussion    The last day of DSS is next Wednesday, Dec 6th at 6:00PM in STC 394\n  Extra credit for creating and uploading cheat sheet (2 points for projects or checkpoints)\n  Coding Challenge date?\n  The technical aspects of Project 6 will be done mostly in class. Resume prep\/MD outside\n  Git and GitHub \u0026ldquo;Web developers\u0026rsquo; social media platform\u0026rdquo;  This is GitHub, the world’s largest code repository platform online. A platform used by some 50 million software developers to host their coding projects, most of them open-source — meaning others can access their codes and modify them to create better versions if they feel like.\nMost of the internet is produced or hosted on GitHub in the form of code. “What Gmail is to email, GitHub is to writing software,” says Kiran Jonnalagadda, cofounder of HasGeek, a platform to build and discover peer groups. Source\n  Don\u0026rsquo;t: post code for assignments that hundreds of other students have done. Do: post unique code using skills from your classes.  I would also recommend using private repos to manage your course work.\nIs it going to hurt? Answer: Yes.\nIt feels weird at first but quickly becomes second nature. If you plan on taking more data science classes, you should know that DS 350 students are required to submit all coursework via GitHub. This is a major topic in class and office hours for the first two weeks. Then we practically never discuss it again.\nMore bad news. 
Do you use GitHub to work with other people or to coordinate your own work from multiple computers? If so, after you recover from the initial setup, Git will crush you again with merge conflicts. And this is not one-time pain, this could be a dull ache for a long time.\n Managing a project via Git\/GitHub is much like the Google Doc scenario and enjoys many of the same advantages. It is definitely more complicated than collaborating on a Google Doc, but this puts you in the right mindset. Source\n Step 1: Download and install Follow steps 1-4 of this tutorial.\nThen:\n Request access to the BYU-I Resumes page at Request Access Respond to the auto-generated email Wait a few minutes for authorization Join our GitHub organization - byuids-resumes.  If you are on a Mac, you may need:  Mac fix with paths Download Xcode and update (10 gig download) VSCode path selection (scroll down to step 1)  Step 2: Create a repository from the resume template and connect to the BYUI Step 3: Publish your resume to GitHub Pages  Go to settings for your repo. Scroll down to the GitHub Pages section. Under source select the box which says None and pick master. Now select the \/docs folder and click save. Copy your site URL at the top of the \/settings\/pages location. Add your link to the About section of your repository. Edit the readme.md in the base repo to not show the resume directions.  Step 4: Clone repo into VS Code Analytics Vidhya reading\nStep 5: Make your resume look good Examples:\n Undergraduate DS resumes Hathaway\u0026rsquo;s resume  You may also find these articles helpful:\n How to Write a Great Data Science Resume How to Build an Effective Data Science Resume How to Write the Perfect Data Scientist Resume  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p6\/d2\/"},{value:"Day 2: Star Wars and strings",label:"<p>Welcome to class! Announcements What\u0026rsquo;s something you\u0026rsquo;re grateful for today? 
The .str functions in pandas   .str.strip: Strip white space .str.replace: replace one string of characters with another. .str.split: Separate a character string into two values. .str.join: Join two lists together Python for Data Science: Strings Pandas Documentation   .str.strip() s = pd.Series([\u0026#39;1. Ant. \u0026#39;, \u0026#39;2. Bee!\\n\u0026#39;, \u0026#39;3. Cat?\\t\u0026#39;, \u0026#39;4. Beat?\\t\u0026#39;, np.nan]) s.str.strip() s.str.strip(\u0026#39;123.!? \\n\\t\u0026#39;) s.str.strip(\u0026#39;1234.!? \\n\\t\u0026#39;) \n.str.replace() s.str.replace(\u0026#39;Ant.\u0026#39;, \u0026#39;Man\u0026#39;) s.str.replace(\u0026#39;a\u0026#39;, 8) s.str.replace(\u0026#39;a\u0026#39;, \u0026#39;8\u0026#39;) s.str.replace(\u0026#39;a\u0026#39;, \u0026#39;8\u0026#39;, case = False) s.str.replace(\u0026#39;a|e\u0026#39;, \u0026#39;8\u0026#39;, case = False) s.str.replace(\u0026#39;\\d\u0026#39;, \u0026#39;\u0026#39;, case = False) \n.str.split() s2 = pd.Series([\u0026#39;1-20\u0026#39;, \u0026#39;21-50\u0026#39;, \u0026#39;51-80\u0026#39;, \u0026#39;81-100\u0026#39;, np.nan]) s3 = pd.Series( [ \u0026#34;this is a regular sentence\u0026#34;, \u0026#34;https:\/\/docs.python.org\/3\/tutorial\/index.html\u0026#34;, np.nan ] ) s2.str.split() s3.str.split() s2.str.split(pat=\u0026#34;-\u0026#34;) \n.str.join() or .str.cat() two_columns = s2.str.split(\u0026#34;-\u0026#34;, expand = True).rename( columns = {0: \u0026#39;minimum\u0026#39;, 1: \u0026#39;maximum\u0026#39;}) two_columns.fillna(\u0026#34;\u0026#34;).agg(\u0026#34;__\u0026#34;.join, axis = 1) two_columns.minimum.str.cat(two_columns.maximum, sep = \u0026#34;__\u0026#34;) \nFixing the column names Here is some code to get you started:\nurl = \u0027https:\/\/github.com\/fivethirtyeight\/data\/raw\/master\/star-wars-survey\/StarWars.csv\u0027 starwars_data = pd.read_csv(url, encoding = \u0026quot;ISO-8859-1\u0026quot;, skiprows = 2, header = None) starwars_cols = pd.read_csv(url, encoding = 
\u0026quot;ISO-8859-1\u0026quot;, nrows = 2, header = None) starwars_cols.iloc[0,:].str.upper().str.replace(\u0026quot; \u0026quot;, \u0026quot;!\u0026quot;) \nValidating statistical summaries len(), .query(), and .value_counts() will be your friends.\nValidating visuals You\u0026rsquo;re going to make a lot of bar charts!\n Simple bar chart tutorial. Make Altair do the counting for you! Tutorials here and here.  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p5\/d2\/"},{value:"JSONs \u0026 missing",label:"<p>UFO Sightings Data Link to json file\nExercise 1 Read in the json file as a pandas dataframe. After reading in the data, you\u0026rsquo;ll want to explore it and gain some intuition. Exploring data is a very important step — the more you know about your data the better! Answer the following questions to gain some insight into this dataset.\n How many rows are there? How many columns? What does a row represent in this dataset? What are the different ways missing values are encoded? How many np.nan in each column?  Some useful code for exploring data\n# Object\/Categorical Columns data.column_name.value_counts(dropna=False) data.column_name.unique() # Numeric Columns data.column_name.describe() # Counting missing values data.isna().sum() # Creates boolean dataframe and sums each column  Exercise 2 After learning different ways our data encodes missing values, now we will neatly manage them. There are many techniques we can use to handle missing values; for example, we can drop all rows that contain a missing value, impute with mean or median, or replace missing values with a new missing category. We will use some of these techniques in this exercise.\n shape_reported - replace missing values with missing string. distance_reported - change -999 values to np.nan. (-999 is a typical way of encoding missing values.) distance_reported - fill in missing values with the mean (imputation) were_you_abducted - replace - string with missing string.  
The first 10 rows of your data should look like this after completion of the above steps.\n    city shape_reported distance_reported were_you_abducted estimated_size     0 Ithaca TRIANGLE 8521.9 yes 5033.9   1 Willingboro OTHER 7438.64 no 5781.03   2 Holyoke OVAL 7438.64 no 697203   3 Abilene DISK 7438.64 no 5384.61   4 New York Worlds Fair LIGHT 6615.78 missing 3417.58   5 Valley City DISK 7438.64 no 4280.1   6 Crater Lake CIRCLE 7377.89 no 528289   7 Alma DISK 7438.64 missing 4772.75   8 Eklutna CIGAR 5214.95 no 4534.03   9 Hubbard CYLINDER 8220.34 missing 4653.72    Some useful code for filling in missing data\ndata.column_name.replace(..., ..., inplace=True) data.column_name.fillna(..., inplace=True)  Exercise 3 Create a table that contains the following summary statistics.\n median estimated size by shape mean distance reported by shape count of reports belonging to each shape  Your table should look like this:\n   shape_reported median_est_size mean_distance_reported group_count     CIGAR 5899.68 6520.21 3   CIRCLE 266002 7408.26 2   CYLINDER 4550.58 8039.49 2   DISK 4581.8 7516.39 16   FIREBALL 5407.22 7097.78 3   FLASH 6108.34 7438.64 1   FORMATION 5104.4 8708.32 2   LIGHT 3850.25 7636.09 2   OTHER 4699.4 7473.98 4   OVAL 4943.63 7787.24 4   RECTANGLE 3668.1 6054.62 2   SPHERE 5076.78 7206.55 6   TRIANGLE 5033.9 8521.9 1   missing 250153 7438.64 2    Some useful code for grouping and getting summary statistics\n(data.groupby(...) .agg(..., ..., ...))  Exercise 4 The cities listed below reported their estimated size in square inches, not square feet. Create a new column named estimated_size_sqft in the dataframe, that has all the estimated sizes reported as sqft. 
(Hint: divide by 144 to go from sqin -\u0026gt; sqft)\n Holyoke Crater Lake Los Angeles San Diego Dallas  The head of your data should look like this.\n    city shape_reported distance_reported were_you_abducted estimated_size estimated_size_sqft     0 Ithaca TRIANGLE 8521.9 yes 5033.9 5033.9   1 Willingboro OTHER 7438.64 no 5781.03 5781.03   2 Holyoke OVAL 7438.64 no 697203 4841.69   3 Abilene DISK 7438.64 no 5384.61 5384.61   4 New York Worlds Fair LIGHT 6615.78 missing 3417.58 3417.58   5 Valley City DISK 7438.64 no 4280.1 4280.1   6 Crater Lake CIRCLE 7377.89 no 528289 3668.68   7 Alma DISK 7438.64 missing 4772.75 4772.75   8 Eklutna CIGAR 5214.95 no 4534.03 4534.03   9 Hubbard CYLINDER 8220.34 missing 4653.72 4653.72    Some useful code to fix the rows reported in sqin\nnp.where(..., # Condition ..., # If condition is true ...) # If condition is false  After you have completed this skill builder with your team (or on your own) then compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/json_missing\/"},{value:"Project 2: Late flights and missing data (JSON files)",label:"<p>Background Delayed flights are not something most people look forward to. In the best case scenario you may only wait a few extra minutes for the plane to be cleaned. However, those few minutes can stretch into hours if a mechanical issue is discovered or a storm develops. Arriving hours late may result in you missing a connecting flight, job interview, or your best friend’s wedding.\nIn 2003 the Bureau of Transportation Statistics (BTS) began collecting data on the causes of delayed flights. The categories they use are Air Carrier, National Aviation System, Weather, Late-Arriving Aircraft, and Security. You can visit the BTS website to read definitions of these categories.\nThe JSON file for this project contains information on delays at 7 airports over 10 years. 
Your task is to clean the data, search for insights about flight delays, and communicate your results using the provided template. If you have completed the checkpoints for Unit 5, then you are ready to answer the Grand Questions listed below. Refer to the readings for additional help.\nData Download: JSON File\nInformation: Data Description\nReadings  P4DS: Section 12.1 \u0026amp; 12.2 Tidy data P4DS: Chapter 5 Data transformation P4DS: Section 7.4 Missing Values Python Data Science Handbook: Missing Data Wikipedia Missing Data  Optional References  isin method where method np.where method replace method An introduction to JSON (May need to open in incognito to read.) The key word in \u0026lsquo;Data Science\u0026rsquo; is not Data\u0026hellip; How to Handle Missing Data (May need to open in incognito to read.)  Questions and Tasks   Which airport has the worst delays? Discuss the metric you chose, and why you chose it to determine the “worst” airport. Your answer should include a summary table that lists (for each airport) the total number of flights, total number of delayed flights, proportion of delayed flights, and average delay time in hours.\n  What is the best month to fly if you want to avoid delays of any length? Discuss the metric you chose and why you chose it to calculate your answer. Include one chart to help support your answer, with the x-axis ordered by month. (To answer this question, you will need to remove any rows that are missing the Month variable.)\n  According to the BTS website, the “Weather” category only accounts for severe weather delays. Mild weather delays are not counted in the “Weather” category, but are actually included in both the “NAS” and “Late-Arriving Aircraft” categories. Your job is to create a new column that calculates the total number of flights delayed by weather (both severe and mild). You will need to replace all the missing values in the Late Aircraft variable with the mean. 
Show your work by printing the first 5 rows of data in a table. Use these three rules for your calculations:\n 100% of delayed flights in the Weather category are due to weather  30% of all delayed flights in the Late-Arriving category are due to weather. From April to August, 40% of delayed flights in the NAS category are due to weather. The rest of the months, the proportion rises to 65%.    Using the new weather variable calculated above, create a barplot showing the proportion of all flights that are delayed by weather at each airport. Discuss what you learn from this graph.\n  Fix all of the varied missing data types in the data to be consistent (all missing values should be displayed as “NaN”). In your report include one record example (one row) from your new data, in the raw JSON format. Your example should display the \u0026ldquo;NaN\u0026rdquo; for at least one missing value.\n  Deliverables Use this template to submit your Client Report. The template has three sections (for additional details please see the instructional template):\n A short summary that highlights the key insights from the project\u0026rsquo;s metrics and the tools you used (think “elevator pitch”). Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-2\/"},{value:"Day 1: The war with Star Wars",label:"<p>Welcome to class! Spiritual Thought Announcements  Project 4 thoughts  Feature Importances - Sorted Bar Graph, not unsorted tables Suppress warnings And the winner is\u0026hellip;    The Star Wars data Load the Star Wars data # %% import pandas as pd import altair as alt import numpy as np url = \u0026#39;https:\/\/github.com\/fivethirtyeight\/data\/raw\/master\/star-wars-survey\/StarWars.csv\u0026#39; dat = pd.read_csv(url) \n??? 
What do the data look like? Take the time to understand how the current data is organized.\nFirst things first\u0026hellip; Each group should answer these questions:\n Where are the column names? What does each row represent? What does each column represent?  What do we want the data to look like? Each group should answer these questions:\n What is the goal of this project, and how does that affect what we want from the data? What do we want each row to represent? What do we want each column to look like? Pick a few columns from the dataset and try creating an example in excel.  Cleaning data takes time Maybe not 80% of your time, but it does take time!\n Data science is frequently about doing bespoke analysis which means creating and labelling unique datasets. No matter how cleanly formatted or standardized a dataset is, it likely needs some work.\nI would argue that spending time working with data to transform, explore and understand it better is absolutely what data scientists should be doing. This is the medium they are working in. Understand the material better and you\u0026rsquo;ll get better insights. ref\n Structure your project, structure your thinking Tableau on tidying data  Think about your data holistically Know the basic structure of your data Keep track of your steps Spot check throughout  Compartmentalize and organize your scripts and data  Best practices for organizing data science projects How to organize your Python data science project Cookiecutter Data Science Data Science Project Folder Structure  - [BYU=I DSS](https:\/\/github.com\/BYUIDSS\/blank_project_repository) ----- What are codecs and encodings?  UTF-8 Python Unicode Basics pd.read_csv() ISO-8859-1  The .str functions in pandas  .strip: Strip white space .replace: replace one string of characters with another. .split: Separate a character string into two values. 
.join: Join two lists together Python for Data Science: Strings Pandas Documentation  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p5\/d1\/"},{value:"Machine Learning",label:"<p>Introduction Everyone seems to have a slightly different take on the differences between Artificial Intelligence, Machine Learning, and Data Science. The following four articles cover some of the most common definitions.\nAs you read them, think about the differences and similarities of the definitions. Given the backgrounds of the various authors, whose opinions might you give more weight to?\n Michael Copeland writing for NVidia Bernard Marr writing for Forbes Vincent Granville writing for Data Science Central Simply Statistics Blog - The key word in \u0026ldquo;Data Science\u0026rdquo; is not Data, it is Science  Of particular note is this quote from the Granville article:\n Earlier in my career (circa 1990) I worked on image remote sensing technology, among other things to identify patterns (or shapes or features, for instance lakes) in satellite images and to perform image segmentation: at that time my research was labeled as computational statistics, but the people doing the exact same thing in the computer science department next door in my home university, called their research artificial intelligence. 
Today, it would be called data science or artificial intelligence, the sub-domains being signal processing, computer vision or IoT.\n As with most things in the realm of science, there tends to be a wide gap between how the media, government, and business sectors view a particular technology compared to how it\u0026rsquo;s viewed by the engineers and scientists using that technology.\nFor our purposes in this course, we\u0026rsquo;ll define these terms as follows:\n Artificial Intelligence: The study of man-made \u0026ldquo;agents\u0026rdquo; that perceive their environment and take actions that maximize their chances of success at some goal.1\nMachine Learning: A subfield within Artificial Intelligence that gives \u0026ldquo;computers the ability to learn without being explicitly programmed.\u0026quot;2\nData Science: The study and use of the techniques, statistics, algorithms, and tools needed to extract knowledge and insights from data.3\n MORAVEC\u0026rsquo;S PARADOX In the 1980\u0026rsquo;s, Hans Moravec made the following observation, which came to be known as Moravec\u0026rsquo;s Paradox:\n \u0026hellip;as the number of demonstrations has mounted, it has become clear that it is comparatively easy to make computers exhibit adult-level performance in solving problems on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility.4\n So, while AI and machine learning algorithms can accomplish many tasks much better than humans can, any toddler can outperform even the most state-of-the-art neural network in picking out photos of their parents or pet cat.\n5\nEven though Moravec wrote about this over thirty years ago, the same sentiment persists in AI research today. In a 2016 interview, Dr. 
Sean Holden an AI researcher at Cambridge University, discussed the differences between human intelligence and artificial intelligence:\n “Most AI researchers don’t try to solve the whole problem because it’s too hard. They take some specific problem and do it better. That’s not to say that the way humans think isn’t useful to AI, but working out how brains do things is hard. And there’s a difference in scale. Brains are doing things that are in some senses quite different from what AI researchers are currently attacking – I’d be ecstatic, for example, if I could build a robot that could put on a duvet cover.”6\n Dr. Fumiya Iida, from the Machine Intelligence Lab at Cambridge, adds:\n “We have hundreds of thousands of muscles in our body, so how can the brain control this? A computer can’t. Every fraction of a second you have to co-ordinate hundreds of muscles just to grab a cup, for example.”6\n PREDICTION VS. INFERENCE In machine learning, we are typically interested in doing one of two things: making inferences, or making predictions.\n Inference: Given a set of data you want to infer how the output is generated as a function of the data.\nPrediction: Given a new measurement, you want to use an existing data set to build a model that reliably chooses the correct identifier from a set of outcomes.7\n This example explains the differences between those two goals:\n Inference: You want to find out what the effect of Age, Passenger Class and, Gender has on surviving the Titanic Disaster. You can put up a logistic regression and infer the effect each passenger characteristic has on survival rates.\nPrediction: Given some information on a Titanic passenger, you want to choose from the set {lives,dies} and be correct as often as possible.7\n Classification Algorithms Imagine that you\u0026rsquo;re a big fan of comic books. 
Over the years, you\u0026rsquo;ve read enough Marvel and DC comics that if I asked you to \u0026ldquo;classify\u0026rdquo; which universe Superman belonged to, you\u0026rsquo;d be able to confidently say, \u0026ldquo;The DC Universe\u0026rdquo;.\nOr, let\u0026rsquo;s say you\u0026rsquo;ve eaten a lot of chocolate in your life. If I were to have you close your eyes and take a bite of chocolate, you might be able to accurately tell me if it was white chocolate, milk chocolate, semi-sweet, or dark.\nThese are both classification problems. Based on your prior knowledge or training regarding different groups, you can take an item and sort it into the correct group.\nIn machine learning, classification algorithms, (or classifiers), need to be trained before they can classify things on their own. We can train an algorithm by providing it with lots of examples from each group and telling it which attributes of those samples are important. The more examples we use to train our algorithm, the more accurate the classification of new items will be.\nIn the example below, we’re telling the algorithm “this is what a blue circle looks like\u0026rdquo;, or \u0026ldquo;this is what a green circle looks like\u0026rdquo;, etc\u0026hellip;\nOnce an algorithm has been trained, we can see how well it performs by providing it with test data consisting of new items it hasn\u0026rsquo;t seen yet, and checking to see if it can correctly predict which group the new items belong to.\nThe Iris Dataset ABOUT THE DATA For this example, we will use Fisher\u0026rsquo;s Iris Data.\nThe Iris dataset contains the length and width of the sepals and petals from 150 iris flowers across three different species of iris: Iris setosa, Iris versicolor, and Iris virginica.\nEach row in the Iris dataset represents the measurements of a single flower. We refer to each of these as a sample, observation, or instance.\nEach column in the Iris dataset represents a particular thing being measured about each flower. 
From left to right we have (in centimeters) the sepal length, the sepal width, the petal length, and the petal width. Each of these is referred to as a feature, attribute, measurement, or dimension.\nThe final column in the dataset is the species of the flower. This final column is often referred to as the target or class of the sample.\nClassifiers Classifier algorithms generally follow the same set of steps. Our goal is to create a classifier that can be provided with the measurements of petals and sepals, and then use that information to predict the species of iris flower we\u0026rsquo;re measuring.\nLoad data The first thing we need to do is load our data. In most cases, there is some pre-processing that has to be done on the data in order to get it to the point where we can start working with it. Often you will need to normalize and encode variables.\n Normalization reading Encoding reading  In this case however, the data is provided to you in the exact format you need:\n sepal_length sepal_width petal_length petal_width species 0 5.1 3.5 1.4 0.2 Iris-setosa 1 4.9 3.0 1.4 0.2 Iris-setosa 2 4.7 3.2 1.3 0.2 Iris-setosa 3 4.6 3.1 1.5 0.2 Iris-setosa 4 5.0 3.6 1.4 0.2 Iris-setosa .. ... ... ... ... ... 145 6.7 3.0 5.2 2.3 Iris-virginica 146 6.3 2.5 5.0 1.9 Iris-virginica 147 6.5 3.0 5.2 2.0 Iris-virginica 148 6.2 3.4 5.4 2.3 Iris-virginica 149 5.9 3.0 5.1 1.8 Iris-virginica The csv file for the iris data can be found here. There are many ways to load data from a csv file, but one handy way is to use the read_csv function from the Pandas library:\nimport pandas as pd url = \u0026quot;https:\/\/byuistats.github.io\/DS250-Course\/skill_builders\/ml_sklearn\/machine_learning.csv\u0026quot; data = pd.read_csv(url) Split data Next, we\u0026rsquo;ll randomly divide all of the samples into two groups. The first group will consist of our training data, or the samples we\u0026rsquo;ll use to train our classifier. 
The second group will consist of our test data, the data we\u0026rsquo;ll use to test our classifier.\nThere are many ways to do this, but if we have our features (sepal and petal measurements) and targets (species names) in separate arrays, we can use the train_test_split function of the sklearn library to do this for us:\nNote that if you use pandas to load the csv file, you\u0026rsquo;ll have the data in a single pandas Data Frame. At some point you\u0026rsquo;ll need to split that data frame into two numpy arrays, one containing the features, and the other containing the targets.\nTake a look at the Indexing and Selecting Data page in the Pandas user guide for more details on splitting the data, and the to_numpy function for converting to a numpy array.\nNotice the transformation can be completed before the data is divided into test and training sets. Two numpy arrays can be passed to the train_test_split function to get two sets of arrays back. Alternatively, the data frame can be passed to train_test_split, and then the test and training data is split into their feature and target components.\nThe following examples assume you\u0026rsquo;ve split the data into features and targets before passing it to train_test_split.\nfrom sklearn.model_selection import train_test_split # features = ... select the feature columns from the data frame # targets = ... select the target column from the data frame # Randomize and split the samples into two groups. # 30% of the samples will be used for testing. # The other 70% will be used for training. train_data, test_data, train_targets, test_targets = train_test_split(features, targets, test_size=.3) You could also use Python\u0026rsquo;s built-in libraries to randomly shuffle the data, and then use array slicing to split the data into test and training subsets. 
However, if you do, make sure you do it in such a way that you still know which species goes with each set of measurements.\nTrain classifier By providing the algorithm with training data, we allow it to create relationships between the features of a sample and its class. In the case of the Iris data set, we\u0026rsquo;re training our algorithm on how a given set of sepal and petal measurements correlate to the flower\u0026rsquo;s species.\nsklearn has a classifier called GaussianNB which we can use to demonstrate this. GaussianNB is a \u0026ldquo;Naïve Bayes\u0026rdquo; classifier that assumes two things about our data:\n  That the underlying features follow a continuous, normal distribution. (The Gaussian part) That each feature is statistically independent of every other feature. (The Naïve part)   Do you think both of these assumptions are true for the Iris data?\nTo train our classifier, first we create an instance of it, then we use the fit method to teach it about our data:\nfrom sklearn.naive_bayes import GaussianNB classifier = GaussianNB() classifier.fit(train_data, train_targets) Test classifier Now that our classifier has been trained on how to classify iris flowers, it\u0026rsquo;s time to test it to see if it can correctly predict the species of flower from a set of measurements.\nNote that it\u0026rsquo;s very important when testing our algorithm that we only test it on data that was not used to train it. Otherwise, we\u0026rsquo;re only testing its ability to remember training data. This is why we split the data into two groups.\nTo test our classifier, we\u0026rsquo;ll use the predict method and provide it with our test data. 
This method will return a list of predicted targets, one for each sample in the test data.\nIn our case, we\u0026rsquo;ll give it a list of petal and sepal measurements it has never seen before, and it will return a list of species predictions, one prediction for each sample in our test data:\ntargets_predicted = classifier.predict(test_data) Assess classifier performance Since we already know which type of iris each sample in the test data corresponds to, we can compare the predictions made by the classifier to the sample\u0026rsquo;s actual species and calculate how well our algorithm performs.\nIf m is the number of correct predictions made, and n is the total number of samples in our test data, then accuracy can be calculated as:\naccuracy = m\/n\nSo if our test data has 20 samples and the classifier predicts the correct flower species for 15 of them, then we would say our algorithm has an accuracy of 75%.\n(Note that accuracy isn\u0026rsquo;t the best metric to use for evaluating classification algorithms. We\u0026rsquo;ll be looking at a few alternatives in the future.)\nSummary To summarize: we take our dataset and divide it in two parts: training data and test data. 
We use the training data to train the classifier to make classifications, then we use the test data to test how well our classifier performs.\nIf we have a classifier that performs well, we can use it with new data, samples whose groups we don\u0026rsquo;t know ahead of time, and the accuracy metric will give us some idea of how reliable those predictions are.\nIf our classifier performs poorly, we either need to provide it with more training data, modify or replace it, or select a different set of attributes to use as features.\nCSE 450: Machine Learning \u0026amp; Data Mining is the class where you can build depth in Machine Learning and its applications.\nREFERENCES   Artificial Intelligence: A Modern Approach by Russell and Norvig (Prentice Hall, 2009).↩ \u0026#x21a9;\u0026#xfe0e;\n Some Studies in Machine Learning Using the Game of Checkers, by Arthur L. Samuel (IBM Journal, Vol 3, No 3, 1959).↩ \u0026#x21a9;\u0026#xfe0e;\n Wikipedia article on Data Science.↩ \u0026#x21a9;\u0026#xfe0e;\n Mind Children, by Hans Moravec (Harvard University Press, 1988).↩ \u0026#x21a9;\u0026#xfe0e;\n XKCD 1425: Tasks.↩ \u0026#x21a9;\u0026#xfe0e;\n Cambridge Alumni Magazine, Issue 79, pg 19.↩ \u0026#x21a9;\u0026#xfe0e;\n Cross Validated: Prediction vs Inference.↩ \u0026#x21a9;\u0026#xfe0e;\n   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/machine-learning\/"},{value:"Project 3: Finding relationships in baseball.",label:"<p>Background When you hear the word “relationship” what is the first thing that comes to mind? Probably not baseball. But a relationship is simply a way to describe how two or more objects are connected. There are many relationships in baseball such as those between teams and managers, players and salaries, even stadiums and concession prices. 
The graphs on Data Visualizations from Best Tickets show many other relationships that exist in baseball.\nFor this project, your client would like you to develop SQL queries that they can use to retrieve data for use on their website without needing Python. They would also like to see example Altair charts.\nData Data Connection: lahmansbaseballdb\nConnection Instructions: See SQL for Data Science\nReadings  SQL for Data Science Readings (read all links)  Optional References  Why SQL is beating NoSQL, and what this means for the future of data Lahman Data Dictionary  Questions and Tasks   Write an SQL query to create a new dataframe about baseball players who attended BYU-Idaho. The new table should contain five columns: playerID, schoolID, salary, and the yearID\/teamID associated with each salary. Order the table by salary (highest to lowest) and print out the table in your report.\n  This three-part question requires you to calculate batting average (number of hits divided by the number of at-bats)\n Write an SQL query that provides playerID, yearID, and batting average for players with at least 1 at bat that year. Sort the table from highest batting average to lowest, and then by playerid alphabetically. Show the top 5 results in your report. Use the same query as above, but only include players with at least 10 at bats that year. Print the top 5 results. Now calculate the batting average for players over their entire careers (all years combined). Only include players with at least 100 at bats, and print the top 5 results.    Pick any two baseball teams and compare them using a metric of your choice (average salary, home runs, number of wins, etc). Write an SQL query to get the data you need, then make a graph in Altair to visualize the comparison. What do you learn?\n  Deliverables Use this template to submit your Client Report. 
The template has three sections (for additional details please see the instructional template):\n A short summary that highlights the key results, describing insights from the metrics of the project and the tools you used (think “elevator pitch”). Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-3\/"},{value:"SQL \u0026 databases",label:"<p>Skill builder (relational database) For this skill builder, we are exploring some important topics in relational databases. This exercise will require you to create SQL queries through Python. You may want to at least scan the readings before beginning this task since this serves as an assessment of your understanding of the assigned readings.\nA competent student should be able to finish the exercises within 75 minutes.\nBefore you start Make sure you have installed VS-code, pandas, and Altair on your computer.\nAlso make sure you have gone through the tutorial under course materials called SQL for Data Science: we assume that you have a connection to your data.\nExercise 1 Readme file A database can consist of more than one table\/data set. A relational database consists of tables\/data sets that share columns. These shared columns then establish the relationship between the tables, thus the name relational database. 
The relations are sometimes not easily found and they require careful investigation.\nTo understand what is in a relational database, we can start with understanding the tables and the columns within.\nHere is a link to the readme file of the baseball database.\n What is the name of the table that records data about pitchers in the regular seasons?\n  What do the HR and HBP columns mean in that table respectively?\n Exercise 2 SELECT and FROM The simplest SQL query is a query with SELECT and FROM. These are the keywords you will see again and again in SQL. Usually, when constructing a more complex query, it is easier to identify what goes into these two clauses first.\n Create a query that shows all columns from the table you found in Exercise 1, and save the dataframe in a variable \u0026ldquo;pitch\u0026rdquo;\n Your script should look something like:\nresults = pd.read_sql_query( \u0027SELECT _______ FROM _______\u0027, con) results Exercise 2 WHERE The WHERE keyword allows us to filter down the table horizontally (fewer rows).\nIt goes after SELECT and FROM.\n Using a SQL query, select all rows in the same table where HR is less than 10 and gs is greater than 25.\n  Find out what the columns mean and explain your query in words\n Exercise 3 ORDER BY ORDER BY sorts the table you select by one or more columns and goes after WHERE\n Using the same query in exercise 2, edit it so that the table is ordered by the year of the season (most recent to oldest) and the player ID (alphabetically).\n Exercise 4 Joins Joins are used when you wish to create a new table from two different tables. 
Keep in mind that you have to identify the relationship between two tables before you can correctly join them.\nJOIN goes between FROM and WHERE.\n Identify the shared columns (keys) and join the table in exercise 2 with the salaries table, then filter the data so that it shows only pitchers in the year 1986.\n You should get a dataframe with 306 rows.\nExercise 5 Group by Group by is a keyword we use to lower the level of granularity of a table, meaning we combine rows into one by the given column(s).\nCreate a query that captures the number of pitchers the Washington Nationals used in each year, then sort the table by year\nYou should get a dataframe with 23 rows.\nFor the overachievers Exercise 6 Research the order of operations for SQL and put the following keywords in that order.\n SELECT FROM JOIN WHERE HAVING ORDER BY GROUP BY LIMIT  After you have completed this skill builder with your team (or on your own) then compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/relational_data\/"},{value:"Course Materials",label:"<p>We will be relying on a few resources for this course. You will find the pertinent readings attached to each of the projects. Those readings will be culled from:\n Python for Data Science: A port of R for Data Science using the Python packages pandas and Altair. pandas User Guide Altair User Guide scikit-learn User Guide scikit-learn tutorials Python Data Science Handbook A Whirlwind Tour of Python SQL  Wes McKinney\u0026rsquo;s pandas code for his book Python for Data Analysis is a useful reference as well: https:\/\/github.com\/wesm\/pydata-book\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/"},{value:"Machine Learning",label:"<p>Intro to Titanic Machine Learning Skill Builder Link to data\nFor this skill builder, we\u0026rsquo;ll be putting our machine learning hats on. 
We\u0026rsquo;ll be creating a model that predicts whether a passenger survived. With machine learning, there is a lot of jargon! It can be quite overwhelming at times. This skill builder attempts to keep things basic and simple. With that being said, there are some terms that are important to understand. Let\u0026rsquo;s look at the first few rows of our dataset before proceeding with the definitions.\nThe titanic dataset will be used for examples of each definition.\n   survived pclass sex age siblings_spouses_aboard parents_children_aboard fare     0 3 1 22 1 0 7.25   1 1 0 38 1 0 71.2833   1 3 0 26 0 0 7.925   1 1 0 35 1 0 53.1   0 3 1 35 0 0 8.05    Important Terms:  features: measurable property of the object you\u0026rsquo;re trying to predict. We use this information to predict our target of interest.  Example: pclass, sex, age, siblings_spouses_aboard , parents_children_aboard, fare columns are all examples of different features. Synonyms: attributes, explanatory variables, independent variables, variables, X\u0026rsquo;s, covariates   target: the feature that you are wanting to gain more insight into. The thing you are trying to predict.  Example: in the titanic dataset our target is survived Synonyms: label, dependent variable, y   train set: Usually 70% of the rows from the original dataset are randomly sampled to create this training data. It\u0026rsquo;s used by the algorithm, to determine, or learn, the optimal combinations of variables that will generate a good predictive model  Example: Random sample of 70% of the original titanic dataset rows Synonyms: training data, train data, X_train, y_train   test set: Usually the remaining 30% of the rows in the original dataset are used to create this dataset. The testing data is a set of rows used only to assess the performance (i.e. generalization) of a model. To do this, the final model is used to predict classifications of examples in the test set. 
Those predictions are compared to the examples\u0026rsquo; true classifications to assess the model\u0026rsquo;s accuracy.  Example: Random sample of 30% of the original titanic dataset rows Synonyms: testing data, test data, X_test, y_test   evaluation metrics: A statistic that tells you how well your predictions align with the actual values. In other words, it tells you how good your model is.  Example: Accuracy, Precision, Recall, MSE, MAE, R-squared Synonyms: performance metric    Again, this is a very light and oversimplified treatment of machine learning. The purpose of this project is to help you understand the main concepts of ML and walk you through the process of building a machine learning model. A simplified workflow of a machine learning project is shown below. Spend some time getting familiar with this flow, as you are about to code it\u0026hellip; Exciting!\nNote: in order to do this skill builder you will need to have scikit-learn installed on your machine. Run the following command in your terminal if you haven\u0026rsquo;t already.\npip install scikit-learn\nData Link to csv file\nExercise 0 (Imports and Loading in data) # Loading in packages import pandas as pd import numpy as np import altair as alt from sklearn.model_selection import train_test_split from sklearn.ensemble import RandomForestClassifier from sklearn.metrics import accuracy_score # Loading in data data = pd.read_csv(___)  Exercise 1 Create a chart exploring the relationship between age and survived in the titanic dataset. A strip plot, density plot, or boxplot might be useful here. Below is an example of a density plot. Feel free to replicate this chart or create your own.\nThe purpose of making this chart is to explore the relationships between a feature and the target. We want to see if the feature contains predictive information about the target. This is a large part of machine learning called Exploratory Data Analysis that should never be skipped! 
Spend time getting to know your features and how they interact with other features and the target.\n Exercise 2 Build a random forest model that is able to predict whether a passenger survived. This exercise is the bulk of the skill builder and contains several steps.\nStep 0: Split the data into X and y variables The X variable will contain all your features\n# Removes the target and keeps all features X = data.drop(___, axis=1) The y variable will hold the target\n# Selects the target column y = data[\u0026#39;___\u0026#39;] Step 1: Split data into train and test sets The train_test_split function is useful for this task. Review the train_test_split function documentation\n# Splitting X and y variables into train and test sets using stratified sampling X_train, X_test, y_train, y_test = train_test_split(___, ___, test_size=0.3, random_state=24, stratify=y) Step 2: Train the model Explore the RandomForestClassifier documentation for the RandomForestClassifier. It\u0026rsquo;s not necessary to understand the inner workings of the Random Forest algorithm for this class - just learn the syntax of fitting the model.\n# Creating random forest object rf = RandomForestClassifier(random_state=24) # Fit with the training data rf.fit(___, ___) Step 3: Use test set to make predictions # Using the features in the test set to make predictions y_pred = rf.predict(___) Step 4: Compare test set predictions to actual values. Calculate the accuracy. # Comparing predictions to actual values accuracy_score(___, ___)  Exercise 3 What is the most important feature in making predictions? Why do you think this is?\nCreate a table that shows the feature importances in descending order. The random forest classifier has a feature importances attribute. It can be accessed by rf.feature_importances_. 
The table should look something like this.\n   feature names importances     fare 0.288051   sex 0.281853   age 0.266491   pclass 0.0814224   siblings_spouses_aboard 0.0475633   parents_children_aboard 0.034619    After you have completed this skill builder with your team (or on your own) then compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/ml_sklearn\/"},{value:"Project 4: Can you predict that?",label:"<p>Background The Clean Air Act of 1970 was the beginning of the end for the use of asbestos in home building. By 1976, the U.S. Environmental Protection Agency (EPA) was given authority to restrict the use of asbestos in paint. Homes built during and before this period are known to have materials with asbestos. You can read more about this ban.\nThe state of Colorado has a large portion of their residential dwelling data that is missing the year built and they would like you to build a predictive model that can classify whether a house was built before 1980.\nColorado gave you home sales data for the city of Denver from 2013 on which to train your model. They said all the column names should be descriptive enough for your modeling and that they would like you to use the latest machine learning methods.\nData Download: dwellings_denver.csv, dwellings_ml.csv, dwellings_neighborhoods_ml.csv\nInformation: Data description\nReadings  Machine Learning Introduction A visual introduction to machine learning How to choose a good evaluation metric for your Machine learning model  Optional References  Decision Tree Classification in Python Boosted algorithms in scikit-learn scikit-plot package  Grand Questions  Create 2-3 charts that evaluate potential relationships between the home variables and before1980. Explain what you learn from the charts that could help a machine learning algorithm. Build a classification model labeling houses as being built “before 1980” or “during or after 1980”. 
Your goal is to reach or exceed 90% accuracy. Explain your final model choice (algorithm, tuning parameters, etc) and describe what other models you tried. Justify your classification model by discussing the most important features selected by your model. This discussion should include a chart and a description of the features. Describe the quality of your classification model using 2-3 different evaluation metrics. You also need to explain how to interpret each of the evaluation metrics you use.  Deliverables Use this template to submit your Client Report. The template has three sections (for additional details please see the instructional template):\n A short summary that highlights the key results, describing insights from the metrics of the project and the tools you used (think “elevator pitch”). Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-4\/"},{value:"Project 5: The war with Star Wars",label:"<p>Background Survey data is notoriously difficult to munge. Even when the data is recorded cleanly the options for ‘write in questions’, ‘choose from multiple answers’, ‘pick all that are right’, and ‘multiple choice questions’ make storing the data in a tidy format difficult.\nIn 2014, FiveThirtyEight surveyed over 1000 people to write the article titled, America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters). 
They have provided the data on GitHub.\nFor this project, your client would like to use the Star Wars survey data to figure out if they can predict an interviewing job candidate’s current income based on a few responses about Star Wars movies.\nData Download: StarWars.csv\nInformation: Article\nReadings  Python for Data Science: Tidy Data Python for Data Science: Graphics for Communication Python for Data Science: Strings  Questions and Tasks  Shorten the column names and clean them up for easier use with pandas. Provide a table or list that exemplifies how you fixed the names. Clean and format the data so that it can be used in a machine learning model. As you format the data, you should complete each item listed below. In your final report provide example(s) of the reformatted data with a short description of the changes made. Filter the dataset to respondents that have seen at least one film. Create a new column that converts the age ranges to a single number. Drop the age range categorical column. Create a new column that converts the education groupings to a single number. Drop the school categorical column. Create a new column that converts the income ranges to a single number. Drop the income range categorical column. Create your target (also known as “y” or “label”) column based on the new income range column. One-hot encode all remaining categorical columns.   Validate that the data provided on GitHub lines up with the article by recreating 2 of the visuals from the article. Build a machine learning model that predicts whether a person makes more than $50k. Describe your model and report the accuracy.  Deliverables Use this template to submit your Client Report. The template has three sections (for additional details please see the instructional template):\n A short summary that highlights the key results, describing insights from the metrics of the project and the tools you used (think “elevator pitch”). 
Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-5\/"},{value:"SQL for Data Science",label:"<p>There are many flavors of SQL but most flavors have the same base commands. SQL queries are typed in the following pattern;\nSELECT -- \u0026lt;columns\u0026gt; and \u0026lt;column calculations\u0026gt; FROM -- \u0026lt;table name\u0026gt;  JOIN -- \u0026lt;table name\u0026gt;  ON -- \u0026lt;columns to join\u0026gt; WHERE -- \u0026lt;filter condition on rows\u0026gt; GROUP BY -- \u0026lt;subsets for column calculations\u0026gt; HAVING -- \u0026lt;filter conditions on groups\u0026gt; ORDER BY -- \u0026lt;how the output is returned in sequence\u0026gt; LIMIT -- \u0026lt;number of rows to return\u0026gt; Introductory SQL links  SQL Guide SELECT and FROM clauses WHERE and comparison operators ORDER BY Joins Aggregations GROUP BY  import pandas as pd import altair as alt import numpy as np import sqlite3 # %% # careful to list your path to the file. sqlite_file = \u0026#39;lahmansbaseballdb.sqlite\u0026#39; con = sqlite3.connect(sqlite_file) results = pd.read_sql_query( \u0026#39;SELECT * FROM allstarfull LIMIT 5\u0026#39;, con) results You can see the list of tables available in the database;\ntable = pd.read_sql_query( \u0026#34;SELECT * FROM sqlite_master WHERE type=\u0026#39;table\u0026#39;\u0026#34;, con) print(table.filter([\u0026#39;name\u0026#39;])) print(\u0026#39;\\n\\n\u0026#39;) # 8 is collegeplaying print(table.sql[8]) </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/sql-for-data-science\/"},{value:"Munging data",label:"<p>Intro to cleaning movies data Link to the data\nThis skill builder focuses on munging (formatting) data into a machine learning ready dataset. We will be using an IMDB Ratings dataset. 
It contains columns that are categorical. Sklearn cannot handle columns that are strings, so we need to convert these into a numerical representation. We accomplish this by either one hot encoding, label encoding, or taking just one value of the range provided. There are many other ways to represent these columns as numbers, but they are beyond the scope of this course.\nOnce you\u0026rsquo;ve converted all columns to numeric, in an intelligent way, you will be asked to recreate a graph using altair. Here is the head of the data you will be working with. Enjoy!\n   star_rating content_rating genre duration box_office_rev major_hit     9.3 R Crime 142 €1924521976 - €1925521976 no   9.2 R Crime 175 €177034987 - €178034987 no   9.1 R Crime 200 €2617541398 - €2618541398 no   9 PG-13 Action 152 €996115723 - €997115723 no   8.9 R Crime 154 €1172054364 - €1173054364 no    Data Link to csv file: ...\n Exercise 0  Grab the high range value for each movie and put it into a new column called high_range_rev.  Make sure the data type of this new column is numeric!!   Remove the box_office_rev column from the dataset.  The .str.split() and .astype() methods might be of use! Also, to get the euro sign just copy it from here, €, and put it in your code.\nThe first 5 rows of the resulting dataframe should look like this\n   star_rating content_rating genre duration major_hit high_range_rev     9.3 R Crime 142 no 2345444803   9.2 R Crime 175 no 2182412593   9.1 R Crime 200 no 1604872807   9 PG-13 Action 152 no 284317976   8.9 R Crime 154 yes 1791932201     Exercise 1 Convert the major_hit column to 1\/0\u0026rsquo;s. yes -\u0026gt; 1 and no -\u0026gt; 0. Again, there are several ways to accomplish this. 
Using our old friend np.where is probably the easiest though.\nThe first 5 rows of the resulting dataframe should look like this\n   star_rating content_rating genre duration major_hit high_range_rev     9.3 R Crime 142 0 1925521976   9.2 R Crime 175 0 178034987   9.1 R Crime 200 0 2618541398   9 PG-13 Action 152 0 997115723   8.9 R Crime 154 0 1173054364     Exercise 2 Convert the content_rating column using label encoding. We\u0026rsquo;re using label encoding in this case because the movie ratings already have a natural ordering to them. We will replace each rating with a number in its natural ascending order.\nTo be more specific, here is how we will do it.\n G: 0 PG: 1 PG-13: 2 R: 3  A dictionary and the .map() method could be useful for this exercise. There are other ways of tackling this problem though. Be creative!\nThe first 5 rows of the resulting dataframe should look like\n   star_rating content_rating genre duration major_hit high_range_rev     9.3 3 Crime 142 0 1925521976   9.2 3 Crime 175 0 178034987   9.1 3 Crime 200 0 2618541398   9 2 Action 152 0 997115723   8.9 3 Crime 154 0 1173054364     Exercise 3 The last column that we need to take care of is genre. We will use one hot encoding for this. Make sure to ONLY one hot encode the genre column!\nA useful function for one hot encoding is pd.get_dummies(). 
I recommend checking out the documentation.\nThe resulting dataframe should look like the following example; don\u0026rsquo;t worry if your high_range_rev column turned into scientific notation—Pandas does this sometimes.\n    star_rating content_rating duration major_hit high_range_rev genre_Action genre_Adventure genre_Animation genre_Biography genre_Comedy genre_Crime genre_Drama genre_Family genre_Fantasy genre_Horror genre_Mystery genre_Sci-Fi genre_Thriller genre_Western     0 9.3 3 142 0 1.92552e\u002b09 0 0 0 0 0 1 0 0 0 0 0 0 0 0   1 9.2 3 175 0 1.78035e\u002b08 0 0 0 0 0 1 0 0 0 0 0 0 0 0   2 9.1 3 200 0 2.61854e\u002b09 0 0 0 0 0 1 0 0 0 0 0 0 0 0   3 9 2 152 0 9.97116e\u002b08 1 0 0 0 0 0 0 0 0 0 0 0 0 0   4 8.9 3 154 0 1.17305e\u002b09 0 0 0 0 0 1 0 0 0 0 0 0 0 0     Exercise 4 Recreate this graph as best you can. You\u0026rsquo;ll need to use the original data that specifies the actual rating.\nAfter you have completed this skill builder with your team (or on your own) then compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/munging\/"},{value:"Project 6: Git your resume online",label:"<p>Background GitHub is an online platform where data scientists and developers can communicate and share work. As students, you will want to curate your creative work on GitHub using a program called Git. GitHub is the place to share your original work, not your homework assignments.\nMany people store their personal websites, blogs, and project websites on GitHub. Our textbook and course are hosted on GitHub, and you can see J. Hathaway\u0026rsquo;s or Ryan Hafen\u0026rsquo;s personal Data Science websites that are hosted on GitHub as well. For this project, you will be making a public resume that will be hosted on GitHub.\nDuring this project you will learn the process of Git and the tools of GitHub. We will use Git to have others in our class to edit your resume. 
Take the process seriously (pick a suitable username and write a good resume), and you will have the beginning of your social presence in the DS\/CS space.\nData Repository: Markdown Resume (mdresume) Repository\nInformation: BYUI Data Science Resumes\nReadings  New to Git and GitHub? This Essential Beginners Guide is for you Git vs. GitHub: What is the difference between them? Using Version Control in VS Code Git in Visual Studio Code video  Questions and Tasks  Join GitHub. Pick a username you would be ok sharing with a potential employer. Join the BYUI Data Science Resumes GitHub organization and use the template repository to make a resume repository under your own GitHub account. A good name might be “Lastname-Resume” Clone your repository to your computer and build a first draft of your resume. Include a link to your resume in the \u0026ldquo;About\u0026rdquo; page. In Canvas, submit the live link to your resume\u0026rsquo;s website hosted on GitHub.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-6\/"},{value:"VS Code for Data Science",label:"<p> What if my interactive Python window in VS Code is not using the same version of Python as my terminal?    You can set your Python version in VS Code by opening a .py script and then clicking on the Python text in the bottom left corner as shown below.\nOnce you click, VS Code will open the command palette where you can select your installation of Python that you would like to use with this workspace.\nThis setting will not fix what version your interactive Python window is using. You can get there by opening settings by using the ⌘, shortcut.\nYou can then search your settings for jupyter and you should see a section that has Jupyter Command Line Arguments. Click on the Edit in settings.json.\nHere you can set the jupyter path to Python to match the one you picked for your Terminal. 
An example for a Mac computer is shown below.\n\t\u0026quot;python.pythonPath\u0026quot;: \u0026quot;\/usr\/local\/opt\/python\/bin\/python3\u0026quot;,    What if I am not able to read in files from the GitHub links using read_csv()?    Most likely your Python SSL certificates are not installed. Follow the answer in this post   How do I use VS Code to collaborate?    Microsoft\u0026rsquo;s Live Share extension documentation says, \u0026lsquo;Live Share enables you to quickly collaborate with a friend, classmate, or professor on the same code without the need to sync code or to configure the same development tools, settings, or environment.\u0027 You can follow their guide or use our course-created video.\n     </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/vs-code\/"},{value:"Altair for Charts",label:"<p>Altair Visualization We will be using Altair in our course. It is a declarative visualization package in Python that is based on Vega-Lite which leverages the grammar of graphics.\n User Guide Data Visualization Curriculum or the Quarto version (https:\/\/jjallaire.github.io\/visualization-curriculum\/) P4DS Data Visualization Chapter  Rendering Altair Charts in Quarto We use Quarto to render Altair images automagically into our HTML reports. The process should simply work.\nHowever, read the following section if you need to export one of your images as a .png or another image format. Saving Altair Charts Just installing altair and altair_saver will not allow you to leverage the .save() method to save your chart. The javascript visualization you see in your interactive python window needs additional external applications to allow .save(\u0027chart.png\u0027) to work.\nWe will go through a few ways for us to save our Altair plots.\n  1. Saving altair plots programmatically Let\u0026rsquo;s say we want to save the above plot as a PNG file. 
Assuming we have already installed the altair library, we need to install the altair_saver.\n1.1 Installing the altair_saver Within your interactive python window execute the following command.\nimport sys !{sys.executable} -m pip install altair_saver 1.2 Additional tool for saving plots We suggest the NodeJS path. However, you are more than welcome to study Selenium for further understanding. In the GitHub repository for altair_saver, the developers explicitly tell us to install additional tools.\nNodeJS Installation\n Install the NodeJS for your platform Run the following in your Terminal (Mac) or PowerShell (Windows) to install all the packages we need from NodeJS.  npm install -g vega-lite vega-cli canvas M1 Mac Altair Solution  Install selenium using the chromedriver package from this link: https:\/\/chromedriver.chromium.org. Unzip the file and move the file to your chrome path \/usr\/local\/bin\/chromedriver  See the selenium_fix.py script for an example.\nNote: This process will run a local server on your computer that opens the chart as a PNG file in Chrome and downloads the file to the folder in which that VSCode file is located on your computer.\n1.3 Saving a plot using altair_saver It might require you to restart VScode and import everything again for this to work. Please note that the plot will be saved in the same folder as the script.\nchart = alt.Chart(\u0026lt;data\u0026gt;).\u0026lt;chart_methods\u0026gt; chart.save(\u0026#39;name_of_chart.png\u0026#39;) 2. Save as PNG method The method only requires us to have the Altair library. Whenever we output a plot, we will see a button with three dots at the top right corner of the plot.\nClicking Save as PNG will bring us to a window to save our plot.\n3. 
Screenshot method If all else fails and we need to save a plot, the snip \u0026amp; sketch (Windows) or taking a screenshot (MacOS) will be our last resort.\n</p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/altair\/"},{value:"GitHub and git",label:"<p>Complete the Hello World GitHub Guide\n</p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/git_github\/"},{value:"Markdown for DS",label:"<p>Markdown Markdown is a plain text formatting syntax aimed at making writing more accessible. The philosophy behind Markdown is that plain text documents should be readable without tags making a mess, but there should still be ways to add text modifiers like lists, bold, italics, etc. It is an alternative to WYSIWYG (what you see is what you get) editors, which use rich text that later gets converted to proper HTML.1\nVSCode Markdown Extensions We prefer the following extensions;\n Markdown Preview Enhanced - This extension previews your Markdown and provides access to converting your markdown document to a pdf document. vscode-pdf - With this extension you will be able to view pdf files in VSCode. Markdown\u002bMath - Now, you can use LaTex math within your markdown file.  Markdown Preview Enhanced VSCode has its own Markdown previewer that displays the same icon in the top right corner of VSCode. You will need to hover over each to see which is Markdown Preview Enhanced (MPE). You will know that you are using MPE when your side view renders with a solid white background. Once you can view your rendered document, you can convert it to a pdf (after saving your file) by right-clicking on the preview. We recommend that you use Chrome (Puppeteer) \u0026gt; PDF to create a pdf document.\nReport Creating Process   Markdown Examples You can read the full syntax guide at the daringfireball.net website. 
The code chunk below highlights the standard syntax2\n*This text will be italic* _This will also be italic_ **This text will be bold** __This will also be bold__ _You **can** combine them_ You can make bulleted lists. * Item 1 * Item 2 * Item 2a * Item 2b Or numbered lists. 1. Item 1 1. Item 2 1. Item 3 1. Item 3a 1. Item 3b Place an image in the document. ![GitHub Logo](\/images\/logo.png) or a link in a document [GitHub](http:\/\/github.com) You can even blockquote Kanye West said: \u0026gt; We\u0026#39;re living the future so \u0026gt; the present is our past.  Finally, you can create tables. Check out `print(df.to_markdown())` to get tables from pandas. First Header | Second Header ------------ | ------------- Content from cell 1 | Content from cell 2 Content in the first column | Content in the second column Every once in a while, you may want strikethrough. ~~this~~ Getting tables out of Pandas You can create tables using Markdown in your reports. You can use the .to_markdown() method on your DataFrame object. You would use print(df.to_markdown(index=False)) to get tables from pandas. They would print out in your interactive window as;\nname | gender ----- | ------ J. | Male Katie | Female You would then copy the output from your interactive window and paste it into your .md report.\nClass template We have built a template to provide an example of how you will submit your project reports. The template has three sections (for additional details please see the instructional template). As you use the template, the following items may help you understand how to write your report.\n The template is a guide. Every line that does not have a hashtag (#) in front of it is guidance. Don\u0026rsquo;t feel responsible for including it. The technical details section has the grand questions as subsections. You should include any work, explanation, charts, or tables that address each grand question under its subsection. 
We have provided example descriptions before the grand question so you can see how to write in Markdown. Your appendix should have properly highlighted Python code that doesn\u0026rsquo;t run off the page (other than file paths).    https:\/\/www.ultraedit.com\/company\/blog\/community\/what-is-markdown-why-use-it.html \u0026#x21a9;\u0026#xfe0e;\n https:\/\/guides.github.com\/features\/mastering-markdown\/ \u0026#x21a9;\u0026#xfe0e;\n   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/markdown\/"},{value:"Quarto for Data Science",label:"<p>Quarto Quarto is an open-source scientific and technical publishing system built on Pandoc. You can create dynamic content with Python, R, Julia, and Observable.\nWe use this perfect union of Jupyter Notebooks and RMarkdown for reporting on our projects. It leverages Markdown and Python code chunks to create dynamic HTML content.\nMarkdown Markdown is a plain text formatting syntax aimed at making writing more accessible. The philosophy behind Markdown is that plain text documents should be readable without tags making a mess, but there should still be ways to add text modifiers like lists, bold, italics, etc. It is an alternative to WYSIWYG (what you see is what you get) editors, which use rich text that later gets converted to proper HTML.1\nQuarto Basics You will need to install the Quarto CLI and then go through the VS Code directions on using Quarto with Python.\n Install Quarto CLI Setup your VS Code Really read the VS Code setup entirely  Class template We have built a template to provide an example of you will submit your project reports (for additional details please see the instructional template). As you use the template, the following items may help you understand how to write your report.\n The template is a guide. Don\u0026rsquo;t feel responsible for including every item beyond sections for each question. 
Your appendix should have properly highlighted Python code that doesn\u0026rsquo;t run off the page (other than file paths). You can see examples of the html output here and here  Markdown Examples You can read the complete syntax guide at the daringfireball.net website. The code chunk below highlights the standard syntax2\n*This text will be italic* _This will also be italic_ **This text will be bold** __This will also be bold__ _You **can** combine them_ You can make bulleted lists. * Item 1 * Item 2 * Item 2a * Item 2b Or numbered lists. 1. Item 1 1. Item 2 1. Item 3 1. Item 3a 1. Item 3b Place an image in the document. ![GitHub Logo](\/images\/logo.png) or a link in a document [GitHub](http:\/\/github.com) You can even blockquote Kanye West said: \u0026gt; We\u0026#39;re living the future so \u0026gt; the present is our past.  Finally, you can create tables. Check out `print(df.to_markdown())` to get tables from pandas. First Header | Second Header ------------ | ------------- Content from cell 1 | Content from cell 2 Content in the first column | Content in the second column Every once in a while, you may want strikethrough. ~~this~~ Getting tables out of Pandas You can create tables using Markdown in your reports. You can use the .to_markdown() method on your DataFrame object. You would use print(df.to_markdown(index=False)) to get tables from pandas. They would print out in your interactive window as;\nname | gender ----- | ------ J. | Male Katie | Female You would then copy the output from your interactive window and paste it into your .md report.\n  https:\/\/www.ultraedit.com\/company\/blog\/community\/what-is-markdown-why-use-it.html \u0026#x21a9;\u0026#xfe0e;\n https:\/\/guides.github.com\/features\/mastering-markdown\/ \u0026#x21a9;\u0026#xfe0e;\n   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/quarto-for-data-science\/"},{value:"Git and GitHub for DS",label:"<p>Git what? 
Git is a distributed version control tool that can manage a development project\u0026rsquo;s source code history, while GitHub is a cloud based platform built around the Git tool. Git is a tool a developer installs locally on their computer, while GitHub is an online service that stores code pushed to it from computers running the Git tool. The key difference between Git and GitHub is that Git is an open-source tool developers install locally to manage source code, while GitHub is an online service to which developers who use Git can connect and upload or download resources.1\nGit? The Git tool is popular with developers because is stays true to its purpose of versioning source code, managing commit histories and making it possible to share code between developers without deviating into peripheral fields. There is no feature bloat with Git. It does what it does, it does nothing else, and it makes no apologies for that fact.1\nGithub? We’ve established that Git is a version control system, similar but better than the many alternatives available. So, what makes GitHub so special? Git is a command-line tool, but the center around which all things involving Git revolve is the hub—GitHub.com—where developers store their projects and network with like minded people.2\nSteps related to Git and Github for our final project.   Make sure you have git on your computer.\nA. Note that Mac users have a few extra concerns.3 B. Mac fix with paths ls \/usr\/local C. Download Xcode and update 10 gig download.\nD. VSCode path selection settings Git: path    Create a GitHub account and use an appropriate username    Connect to our BYU-I organizations.\nA. BYU-I DS Resumes need teacher to admit you B. BYU-I Data Science Society need teacher to admit you    Creat your own resume repo from our template (some directions)[https:\/\/github.blog\/2019-06-06-generate-new-repositories-with-repository-templates\/]    Publish your repo on GitHub pages.\nA. Go to settings for your repo.\nB. 
Scroll down to the GitHub Pages section.\nC. Under source select the box which says None and pick master.\nD. Now select the \/docs folder and click save.    Check your published site settings and copy your site URL.    Update your repository landing page to include your pages URL.    Edit the readme.md in the base repo to not show the resume directions if your repo is public.    Fork your repo back into the BYU-I DS Resumes    Merge a pull request with any changes in your personal repository (see pull and merge on GitHub Guide).     https:\/\/www.theserverside.com\/video\/Git-vs-GitHub-What-is-the-difference-between-them#:~:text=The%20key%20difference%20between%20Git,and%20upload%20or%20download%20resources. \u0026#x21a9;\u0026#xfe0e;\n https:\/\/www.howtogeek.com\/180167\/htg-explains-what-is-github-and-what-do-geeks-use-it-for\/ \u0026#x21a9;\u0026#xfe0e;\n https:\/\/stackoverflow.com\/questions\/29971624\/visual-studio-code-cannot-detect-installed-git \u0026#x21a9;\u0026#xfe0e;\n   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/git_github_ds\/"},{value:"Week 1: Introduction",label:"<p>  Introduction Project Syllabus   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/introduction\/"},{value:"DS250",label:"<p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/"},{value:"Frequently Asked Questions",label:"<p> What do you mean by data science programming?    Most likely, you have had 1-2 courses of programming before you have taken CSE 250. Unlike traditional computer science courses, CSE 250 uses Python in an interactive mode instead of building programs. The data provider usually has some big questions that need answering; However, there are hundreds of little issues and responses along the way. We use programming to facilitate this investigation.\nThere are similarities with User Experience Designers. In our case, we don\u0026rsquo;t get to ask users about their experience. 
We use programming to ask data about its background, and each data set has its own history. We want our analysis to mold to that experience. You can think of data science programming like a first date with your data. You can\u0026rsquo;t write one long program nieve of the issues and nuances each living data set provides.\n   How does CSE 250 compare to CSE 350 or Math 335?    The two courses have similarities. You could think of CSE 250 as an introduction to data wrangling and visualization. Both classes use real-world data and are built around data science projects. There are some critical differences between the two courses.\n In this course, we use Python, and CSE 350 uses R. We are introducing the principles of data science programming in CSE 250. The course is only 2-credits. CSE 250 is intended to introduce visualization, wrangling, and modeling.     How does CSE 250 prepare me for CSE 350, Math 335 and CSE 450?    You will be comfortable with interactive programming and have an introduction to the principles of data formats for data science applications. You will be introduced to principles related to machine learning, data wrangling, and data visualization.   What programming languages do we use in this course?    The course is done using Python. We focus on the pandas and Altair packages.   What are the prerequisites for this course?    Using the new courses at BYU-I, the prerequisite is CSE 110. However, if you have experience programming from other classes, you most likely are prepared for this course.   Why Python instead of R?    The computer science and software engineering programs at BYU-I use Python as their foundational courses. The standard student will have some experience with Python before CSE 250. Python is an essential programming language for data scientists, and we already have CSE 350\/Math 335, which is taught in R.   What is pandas?    pandas is the foundational data science package in Python. 
If you are using tabular data you will be in pandas.   Why are we using Altair instead of Seaborn or Matplotlib?    Matplotlib was the first visualization package to gain a following in Python. Seaborn is built on top of Matplotlib. Many data scientists use both in their work—neither leverage the grammar of graphics as developed by Leland Wilkinson. Altair is built on Vega-Lite, which uses the Vega visualization grammar. It is declarative and actively developed. We expect that it will become the predominant visualization package in Python (https:\/\/youtu.be\/FytuB8nFHPQ and https:\/\/youtu.be\/vTingdk_pVM).   </p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/faq\/"},{value:"Projects",label:"<p>We will be relying on a few resources for this course. You will find the pertinant readings attached to each of the projects. Those readings will be culled from;\n Python for Data Science: A port of R for Data Science using the Python packages pandas and Altair. pandas User Guide Altair User Guide scikit-learn learn User Guide scikit-learn tutorials Python Data Science Handbook A Whirlwind Tour of Python SQL  Wes McKinney\u0026rsquo;s pandas code for his book Python for Data Analysis is a useful reference as well: https:\/\/github.com\/wesm\/pydata-book\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/"},{value:"Skill Builders",label:"<p>These short activites are provided for you to gain some additional skills to help with the class projects.\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/"},{value:"Slack",label:"<p>If you haven\u0026rsquo;t already, please join Slack. 
This will be a lifesaver.\nhttps:\/\/join.slack.com\/t\/byuidss\/signup\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slack\/"},{value:"Slides",label:"<p>Use the navigation pane on the left to review the class slides.\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/"},{value:"",label:"<p>Details Your coding challenge will help you demonstrate the skills you have developed this semester. Here are a few essential items.\n Your goal is to demonstrate your data science coding abilities. Get through as many items with a rough implementation as possible. Get your code to match our outputs as close as possible, but don\u0026rsquo;t stress over minute details. Keep most of the code you type. If you end up not using specific parts, comment them out and include them at the bottom. Use the entire hour and may not finish. Submit a .md and a .pdf report with your output and code for each challenge.  Please use the challenge template to submit your work.\nimport pandas as pd import altair as alt import numpy as np from sklearn.model_selection import train_test_split from sklearn import tree from sklearn.ensemble import GradientBoostingClassifier from sklearn import metrics Challenge 1 Split Entry houses are a failed building experiment in the United States. Use the data from our Denver homes project, as shown below, to recreate the following graphic.\nurl = \u0026#39;https:\/\/github.com\/byuidatascience\/data4dwellings\/raw\/master\/data-raw\/dwellings_denver\/dwellings_denver.csv\u0026#39; dat_home = pd.read_csv(url).sample(n=4500, random_state=15) Challenge 2 Our computations can\u0026rsquo;t be done with missing values. Programmatically replace all the lost values with 125 and make a box-plot.\nmister = pd.Series([\u0026#34;lost\u0026#34;, 15, 22, 45, 31, \u0026#34;lost\u0026#34;, 85, 38, 129, 80, 21, 2]) Challenge 3 Our computations can\u0026rsquo;t be done with missing values. 
Programmatically replace all the lost values with 125 and report the mean rounded to two decimals.\nmister = pd.Series([\u0026#34;lost\u0026#34;, 15, 22, 45, 31, \u0026#34;lost\u0026#34;, 85, 38, 129, 80, 21, 2]) Challenge 4 Programmatically read in the following JSON file, keep only the cases column and return a markdown table that has country in the rows and cases for 1999 and 2000 in the columns. Your table will have six cells with values.\nurl = \u0026#39;https:\/\/github.com\/byuidatascience\/data4python4ds\/raw\/master\/data-raw\/table1\/table1.json\u0026#39; Challenge 5 Use our cleaned example of the star wars data from project 6 to predict the gender of the respondent to the survey. Report your precision and a feature importance plot.\n Use test_size = .20 and random_state = 2020 in train_test_split() Use the GradientBoostingClassifier() method.  url = \u0026#34;http:\/\/byuistats.github.io\/CSE250-Course\/data\/clean_starwars.csv\u0026#34; dat = pd.read_csv(url) </p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/final_coding_challenge\/sp22\/"},{value:"Categories",label:"<p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/categories\/"},{value:"Final_coding_challenges",label:"<p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/final_coding_challenge\/"},{value:"Office Hours",label:"<p>Schedule a visit with Brother Cannon at an available time. 
https:\/\/calendly.com\/cannonp\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/contact\/"},{value:"Tags",label:"<p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/tags\/"},];$("#search").autocomplete({source:projects}).data("ui-autocomplete")._renderItem=function(ul,item){return $("<li>").append("<a href="+item.url+" + \" &quot;\" +  >"+item.value+"</a>"+item.label).appendTo(ul);};});</script></div></div></div></div></header><section class=section><div class=container><div class="row justify-content-center"><div class="col-12 text-center"><h2 class=section-title></h2></div><div class="col-lg-4 col-sm-6 mb-4"><a href=https://byuistats.github.io/DS250-Cannon/course-materials/ class="px-4 py-5 bg-white shadow text-center d-block match-height"><i class="ti-blackboard icon text-primary d-block mb-4"></i><h3 class="mb-3 mt-0">Course Materials</h3><p class=mb-0>Additional Readings and Guidance</p></a></div><div class="col-lg-4 col-sm-6 mb-4"><a href=https://byuistats.github.io/DS250-Cannon/projects/ class="px-4 py-5 bg-white shadow text-center d-block match-height"><i class="ti-bar-chart icon text-primary d-block mb-4"></i><h3 class="mb-3 mt-0">Projects</h3><p class=mb-0>Project details (the work you will do)</p></a></div><div class="col-lg-4 col-sm-6 mb-4"><a href=https://byuistats.github.io/DS250-Cannon/skill_builders/ class="px-4 py-5 bg-white shadow text-center d-block match-height"><i class="ti-ruler-pencil icon text-primary d-block mb-4"></i><h3 class="mb-3 mt-0">Skill Builders</h3><p class=mb-0>Build skills for the projects.</p></a></div><div class="col-lg-4 col-sm-6 mb-4"><a href=https://byuistats.github.io/DS250-Cannon/slack/ class="px-4 py-5 bg-white shadow text-center d-block match-height"><i class="https://img.shields.io/badge/slack-@oresoftware/npp-yellow.svg?logo=slack icon text-primary d-block mb-4"></i><h3 class="mb-3 mt-0">Slack</h3><p class=mb-0>Link to Slack signup</p></a></div><div class="col-lg-4 col-sm-6 mb-4"><a 
href=https://byuistats.github.io/DS250-Cannon/slides/ class="px-4 py-5 bg-white shadow text-center d-block match-height"><i class="ti-layout-slider-alt icon text-primary d-block mb-4"></i><h3 class="mb-3 mt-0">Slides</h3><p class=mb-0>Class material for every day.</p></a></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
+<i class="ti-search search-icon"></i><script>$(function(){var projects=[{value:"Day 2: Project 0",label:"<p>Syllabus Questions?  A note about readings\u0026hellip; Tips for asking for help  Slack Google - acquired discernment   Quarto and tradeoffs Project Submissions: HTML  Are we all on the Slack channel? Follow the Slack invitation that is waiting in your student email. If you don\u0026rsquo;t see an invite, you can join through this link and then ask Brother Cannon to add you to the class channel.\nMethods Checkpoint All the answers will be in the assigned reading or in these slides.\nNotes on Project 0 Installing Packages and Extensions Learn how to install packages by reading the assigned material and by watching the video tutorial on this page.\nThe readings mention a lot of different packages. For Project 0, you need to install at least pandas, altair, numpy, tabulate, and jupyter.\nThe readings will also mention two VS Code extensions you need to install.\nJupyter Notebooks vs. Interactive Python Window Should you decide to use Juypyter Notebooks this semester within VS Code, this is a great guide to get you started.\nOr you can choose to stick with the Python Interactive window like the textbook does.\nUse Your Resources!  Technical documentation Google searches Asking for help on Slack Don\u0026rsquo;t forget the data science lab! (Starts next week.) Question that cannot be answered by the textbook and documentation? Google it. A function you have never seen before? Google it. An error in your code? Google it.  Markdown What is Markdown?  A clean, human readable way to make slick html and pdf documents Used widely among programmers for clean documentation Used widely by Data Scientists to publish results and communicate with stakeholders  Here\u0026rsquo;s a good summary\nQuarto Do your tinkering in interactive Python or Jupyter notebooks. Generate report with finished code, graphs, etc. in Quatro\nQuarto\nNow for some data! 
Let\u0026rsquo;s get this party started Your turn:  Read in the cars data set Work with you your teams to talk through interesting possibilities for a graph Work on Project 0 Questions and Tasks   Any issues with getting Python installed?     Python VS Code Altair in VS Code     Does everyone have pandas, altiar, numpy, scikit-learn installed?     Video tutorial: how to install packages.  One way to install packages:\npip install pandas altair Maybe a better way to do it: run this in an interactive window.\nimport sys !{sys.executable} -m pip install pandas altair    Does everyone have altair-saver working?     altair_saver Video tutorial     ---------------------------------------------------- Why are we using Altair?    It is built on the VEGA and D3 which are fast and web based.  Grammar of Graphics: Vega-Lite   Technical Paper Website Endorsment      What are we not learning in this course?    Indexing, .loc[] and .iloc[] I may not be experienced enough to understand why I should teach you these. I think they all add complexity to what we are learning in the course and we have elected to avoid it. We will use reset_index() a lot. I think MultiIndex features create complication. I have also elected to use .filter() instead of .loc[] because I like it.\nVirtual Environments Virtual Environments appear to be an important tool as you continue to use Python. We will not be teaching these or supporting these in our course.\nmatplotlib (and any tool leveraging it) It feels old, has a bad api, and isn\u0026rsquo;t declarative.\n   ----------------------------- What can Python Interactive do?    Let\u0026rsquo;s review the power of Python Interactive  # %% in my .py script is much better than Jupyter notebooks (.ipynb).  If we hope to have our code work in a production environment then Jupyter is problematic. 
Caching and code chunks are problematic https:\/\/medium.com\/@_orcaman\/jupyter-notebook-is-the-cancer-of-ml-engineering-70b98685ee71       Set-up your py script    Setting up your script A good data science .py script will have packages and data loaded at the top. Usually you have a few short commented sentences that descibe the script purpose.\n# %% # import pandas, altair, numpy import pandas as pd import altair as alt import numpy as np # %% # load data # handgrenade data https:\/\/github.com\/byuidatascience\/data4soils\/blob\/master\/data-raw\/cfbp_handgrenade\/cfbp_handgrenade.csv url = \u0026#39;https:\/\/github.com\/byuidatascience\/data4soils\/raw\/master\/data-raw\/cfbp_handgrenade\/cfbp_handgrenade.csv\u0026#39; dat = pd.read_csv(url)    Make a scatter plot with hmx on the x and rdx on the y    To get you started:\nalt.Chart(dat).encode()    Make a spatial plot with hmx colored     Encode the row and column to the axes. Color the hmx points using the \u0026lsquo;goldorange\u0026rsquo; color scheme. Use mark_square() and make the square sizes 500.     -------------------- Create a histogram of hmx     Encode the x-axis as binned. Encode the y-axis as counts. Configure the title to a fontSize of 20. Use properties to place the title.     ----------------------------- How can I get help?     Make sure you read the reading assignments once or twice or five times. Read the guides on the Course Materials page. Post questions in our #cse250_s21_larson slack channel (and try to help others!) Attend the Data Science Lab. Google is your best friend.     -------------------------- </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/introduction\/day02\/"},{value:"Day 3: Resume Fork and Merge",label:"<p>Remember from last class: pull, add, commit, push. Making edits in another user\u0026rsquo;s repo Breakout Room Activity\nEach student in the breakout room is going to provide feedback on another student\u0026rsquo;s resume. 
The breakout room should begin with a group discussion about the work you\u0026rsquo;ve each done on your resume and any questions the group has. Then follow the steps below.\n fork the other student\u0026rsquo;s resume repository. Now clone that forked repository to your computer. On your local version of the forked repository, do the following;\nA. Create a new file called edits.md and save it in the main folder or the repository.\nB. Make a few recommendations or notes in the edits.md file that will help the other student improve his or her resume.\nC. add, commit, push your edits.\nD. Go to the forked repo on GitHub and check if the edits.md file shows up online. Now, create a pull request to get your edits into the other student\u0026rsquo;s original repo.  Once you\u0026rsquo;ve given another student feedback, accept any pull requests submitted to your own repo. Continue to edit and improve your resume based on the feedback you received.\nCreating a fork in byuids-resumes Fork your own resume repository into the BYU-I Data Science Resumes group.\nIf you change your resume after you create this fork, you will have to submit a pull request to make sure the final version of your resume shows up in the group.\nThese instructions will help you create a pull request.\nOpen time to finalize your resume </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p6\/d4\/"},{value:"DS 250 Syllabus",label:"<p> Why is a raven like a writing desk? -Lewis Carroll- v -The Internet-\n Contact Information Instructor Name: Paul Cannon Email: cannonp@byui.edu Phone: (208) 496-7565 Student hours: Ricks 216, TBD Calendly: https:\/\/calendly.com\/cannonp\nOverview This course provides a better understanding of data programming. If you have signed up for this class, you are most likely driven by curiosity and interested in how data decisions are made. Possibly, you have a more empathetic approach to how the world works and how problems can be solved. 
Finally, you have an eye for how society reports and uses data to make impactful decisions.1.\nUpon completing this course, you will be able to use data-driven programming in Python to handle, format, and visualize data. We will introduce you to data wrangling techniques, analytical methods, and the grammar of graphics. Specifically, as a successful learner, you will be able to;\n Use functions, data structures, and other programming constructs efficiently to process and find meaning in data. Programmatically load data from various types of data sources, including files, databases, and remote services. Use data manipulation libraries to perform straightforward analysis, produce charts, and prepare data for machine learning algorithms. Use machine learning libraries to discover insights, make predictions, and interpret the success of these algorithms. Use industry-leading tools to collaborate and share your work.  Principles of DS teaching The course follows these principles of teaching Data Science2\n Organize the course around a set of diverse projects Integrate computing into every aspect of the course Teach abstraction, but minimize reliance on mathematical notation Structure course activities to realistically mimic a data scientist\u0026rsquo;s experience Demonstrate the importance of critical thinking\/skepticism through examples  Competency assumptions This course focuses on programming with data to find insights. The prerequisite for this course is an introductory programming course in Python (CSE 110)3. We recommend taking CSE 111 before or during the same semester you take this course - especially if programming is complicated for you. We assume that you do know what the Terminal is and how to execute scripts.\nAn understanding of standard deviation{target=\u0026quot;blank\u0026rdquo;} and variance{target=\u0026quot;blank\u0026rdquo;} will be valuable.\nCourse materials and structure This course focuses on building core data science skills. 
You will learn to program, but you will also learn how to communicate and collaborate with your peers and mentors.\nCourse communication  How do I talk with my teacher, TA, and other students in this class?\n  We use Slack for most class and one-on-one communication. Don\u0026rsquo;t email or direct message using I-Learn.\nA. Should I paste code snippets in our class Slack channel to get help? Yes.\nB. Should I ask questions about the projects and the readings in our class Slack channel? Yes.\nC. Should I post random quotes or videos in our class Slack channel? No. Use the #random channel. All assignments are submitted in I-Learn. A. Each project submission requires you to submit a short message to the teacher about your work.\nB. We will respond to your message with edits you can make to earn full credit on your resubmit.\nC. Class announcements about the grading of projects are posted in I-Learn.  Online reading materials  Python for Data Science: A port of R for Data Science using the Python packages pandas and Altair. pandas User Guide Altair User Guide Python Data Science Handbook SQL by data.world  Preparation In my experience, getting lectured training outside of college is even more expensive than it is in college. A week\u0026rsquo;s worth of training can cost more than a semester of school here at BYUI. Due to this expense, learning how to digest online material gain understanding before going to the expert with questions is a valuable skill to develop. I expect that you have completed the assigned reading material before class begins.\nSpecifications grading Grading is a nasty side effect of mass learning and academia. We are in a class at a university and will have to manage this side effect. However, we don\u0026rsquo;t have to let it control our learning, thinking, or this class. Learning and thinking should motivate each activity.\nAs we team, teacher and student, we have the challenge to become more! 
We have worked hard to identify the specifications needed for a python user of the pandas and Altair packages. Our goal is to align your grade with the skill specification you have mastered. In other words, the grade you want will determine how much work you will do. We will not score individual tasks in the class on a percentage scale. If your work meets the specified criteria, you will get full credit.\nIn a specifications-grading system, all tasks are evaluated on a high-standards pass\/fail basis using detailed checklists of task requirements and expectations4. You earn your letter grade by earning passing marks on a set of tasks. This system provides various choices and is closer to how learning and work occur in the real world. It will be easy for us to tell if work is complete, done in good faith, and consistent with the requirements.\nGrading Scale and Elements The grading scale describes the amount of work you must put into the grading elements to achieve the your desired grade. The grading scale only lists requirements for A, B, C, and D. You can request half-step adjustments if you fall slightly short or over on some elements.\nYou will need to provide a detailed description of your completed grading elements in your Review and Request Letter (due at the end of the semester) to support your grade request. For example, if a student was in the B range they might write the following:\n\u0026ldquo;I got three fives, one four, and two threes for 29 points on the projects. I met the checkpoint requirements with 4 halfway checkpoints and 4 full-mark methods checkpoints. I regularly attended data science society. I believe my coding challenge will be above a 3. 
I request a B\u002b.\u0026quot;\n  Leader (A)    Element Requirement Description     Projects 34 Points 5 points per project   Mid-project checkpoints 5 completed Full credit   Methods \u0026amp; Calculations checkpoints 6 completed 100% unlimited attempts   DS Community Complete 2 \u0026ndash;   Request and review letter submission \u0026ndash;   Coding challenge At least 3 Score is out of 4    See the competency descriptions in the next section. You can request half-step adjustments if you fall slightly short or over on some elements.\n Supporter (B)    Element Requirement Description     Projects 29 Points 5 points per project   Mid-project checkpoints 3 completed Full credit   Methods \u0026amp; Calculations checkpoints 5 completed 100% unlimited attempts   DS Community Complete 2 items \u0026ndash;   Request and review letter submission \u0026ndash;   Coding challenge At least 3 Score is out of 4    See the competency descriptions in the next section. You can request half-step adjustments if you fall slightly short or over on some elements.\n Listener (C)    Element Requirement Description     Projects 24 Points 5 points per project   Mid-project checkpoints 3 completed Full credit   Methods \u0026amp; Calculations checkpoints 3 completed 100% unlimited attempts   DS Community Complete 1 item \u0026ndash;   Request and review letter submission \u0026ndash;   Coding challenge At least 2 Score is out of 4    See the competency descriptions in the next section. You can request half-step adjustments if you fall slightly short or over on some elements.\n Asleep (D)    Element Requirement Description     Projects 14 Points 5 points per project   Mid-project checkpoints 1 completed Full credit   Methods \u0026amp; Calculations checkpoints 2 completed 100% unlimited attempts   DS Community None \u0026ndash;   Request and review letter None \u0026ndash;   Coding challenge None Score is out of 4    See the competency descriptions in the next section. 
You can request half-step adjustments if you fall slightly short or over on some elements.\n   Competency elements  Projects (Grand questions) Each of the seven projects is worth 5 points and you get one additional submission after the due date. There are 5 two-week projects and two one-week projects to start and end the class.\nGrading Details  1 point: Submission 3 points: submission of a good faith attempt with a statement of work quality. 4 points: High-quality work that addresses each of the Grand Questions and a statement of work quality. 5 points: Addressed reviewer issues and completion of resubmission if needed.   Checkpoints (methods and calculations) These checkpoints are in Canvas and they open when the project starts. They have unlimited attempts and remain open until the end of the semester.\nExamples  Fact-Finding Questions (Calculate descriptive summaries): Fact-finding questions help you with calculations that build into the Grand Questions of the project. These questions have clearly defined answers using Python calculations. You should expect 2-3 problems.   Example: Using the top 10 airports in size, what is the average size? Example: What proportion of flights are delayed at the largest airport?   How the code works questions (Explaining the tools): This part could have direct answer questions or open-ended questions.   Example (direct): What is the recommended function for arranging your data by a variable? What are the outputs after using \u0026lt;FUNCTION\u0026gt;? Example (open): Your client has shown some confusion about NumPy\u0026rsquo;s \u0026lsquo;nan\u0026rsquo; handling in Python. Help them understand by answering the question, \u0026lsquo;How is missing data handled in Pandas?\u0026rsquo;   Checkpoints (Mid-project status) The mid-project checkpoint has a few questions. It opens the first day of the project and closes at 1 am on the 3rd day of class for the project. 
It has the following questions.\nExamples  Have you checked off more than one grand question from the current project? (Yes\/No) Have you spent at least 2 hours using code to tackle problems related to the case study? (Yes\/No). Have you prepared questions you have about the case study to ask in your next meeting? (Yes\/No).   Data science community To earn credit for the DS Community element you must complete two different tasks from the list below. At the end of the semester, you will be asked to report on which tasks you completed and what you learned from them.\n Attend Data Science Society at least once. Sign up for an email newsletter that will teach you more about data science. Data Science Weekly or Data Elixir are good options. Listen to a podcast episode about data science. Build a Career in Data Science has some excellent episodes. Watch a professional presentation on YouTube about data science. Be prepared to share the link and a summary of the video. Reach out to someone who works in a data-related field and ask them for 15 minutes of their time. Use this time to conduct an \u0026ldquo;informational interview\u0026rdquo; and learn more about their responsibilities and career path. Research and apply to at least 5 data-related jobs or internships.   Finishing the semester Submit a request and review letter that includes what you have learned from this class, the next data science course you plan on taking, and the final grade that you are requesting based on the work you have submitted.\n Coding challenge We will have an in-class coding challenge on the ultimate or penultimate day of class. It would be best if you did not view this challenge like a traditional exam. It will cover the general techniques that we have been practicing throughout the course.\nWe expect to have a few practice challenges throughout the semester. We will score the coding challenge on a four-point scale.\n 1 point: At least you tried. 
2 points: You have learned some items from the course, but your work in the coding challenge is deficient. 3 points: Your submission uses proper coding techniques and addresses the objective. 4 points: Exceptional work. Your code can be used as a solution to share with others.       https:\/\/medium.com\/@nikhilbd\/what-makes-a-good-data-scientist-engineer-a8b4d7948a86#.jr80wl98y. I suppose some of you are just taking this class because your degree says you can, and it fits in your schedule. If so, we should chat to make sure this is the right class for you. \u0026#x21a9;\u0026#xfe0e;\n https:\/\/arxiv.org\/ftp\/arxiv\/papers\/1612\/1612.07140.pdf. You will see this pattern in DS 350, DS 460, and Math 488. It will progressively get more realistic. \u0026#x21a9;\u0026#xfe0e;\n We do expect that this is not your first experience with Python and VS Code. If you have done other programming courses, you should be able to succeed in this course. If you have any questions, please ask. \u0026#x21a9;\u0026#xfe0e;\n Making the right checklists can be difficult. Bad checklists could fall in the following categories \u0026ndash; vague and imprecise; too long; hard to use; impractical; too pedantic. Useful checklists are precise, efficient, easy to use and understand. This is the first time this course has been offered, so we will have to work together to ensure the requirements are reasonable. \u0026#x21a9;\u0026#xfe0e;\n   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/syllabus\/"},{value:"Introduction",label:"<p>A competent student should be able to finish the exercises within 60 minutes. You should work through it on your own. This serves as an assessment of your understanding of the assigned readings.\nBefore you start Make sure you have installed VS-code, pandas, and altair on your computer. 
You can install these packages by typing this line in the terminal.\npip install pandas altair\nOR if you have more than one version of python\npip3.9 install pandas altair\npip3.9 indicates the version of python you are installing the packages to.\nPart 1 Get familiar with your tools Programming involves a lot of research. Unlike subjects like Mathematics or History, we are not required to remember every single function and its usage. It is natural for experienced programmers to look for answers on the internet, books, even from other people\u0026rsquo;s code. Programming will be extremely frustrating if we are not allowed to do web searches, so please get familiar with the tools you have and use them often.\nOfficial Documentation This should be your first resort for understanding any code\/function. Scanning the documentation of a function will allow you to get an overview of its usage.\nHere is a link to the documentation of the assign() function:\n(https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.assign.html)\nExample of assign() (as shown in the documentation)\nimport pandas as pd df1 = pd.DataFrame({\u0026#39;temp_c\u0026#39;: [17.0, 25.0]}, index=[\u0026#39;Portland\u0026#39;, \u0026#39;Berkeley\u0026#39;]) df2 = df1.assign(temp_f=df1.temp_c * 9 \/ 5 \u002b 32) Exercise 1: After reading the documentation for assign(), write a short paragraph to explain assign() as if you were talking to someone with zero programming experience (use the example above to help you explain assign()).\n What is the difference between df1 and df2? How was df2 derived from df1?  Online textbook It pains us to see students stay stuck on a problem for hours rather than use the textbook. This is another very useful resource since it is designed for this class. 
Link to the textbook: (https:\/\/byuidatascience.github.io\/python4ds\/)\nExercise 2: Locate the section where the textbook talks about query() and answer these questions.\n What function in R\u0026rsquo;s dplyr is equivalent or comparable to query() in pandas (You should include the section number in your answer)? What is the easiest mistake for a Python beginner to make that was shown in the text about query() (You should include the section number in your answer)?  The internet Google is a programmer\u0026rsquo;s friend. Get used to googling things; in fact, you want to become an expert at googling.\n Question that cannot be answered by the textbook and documentation? Google it. A function you have never seen before? Google it. An error in your code? Google it.  Exercise 3: Provide at least 2 extra resources you could find about the pandas function drop() on the internet.\nTutor, TA (Through Slack, Zoom, or in-person) We want to help you with your work; we want to answer your questions; but most importantly, we want to help you succeed in this class. That will require you to put in the necessary time in understanding the readings, coding and debugging. When you ask us a question, we expect that you have read the documentation, searched the textbook, and done your own research. Then we can be most helpful and can provide insights on top of your understanding.\nExamples of bad questions  How does drop() work? We will ask you to read the documentation for drop(). How do you make a table in a markdown file? We will refer you to the textbook. I don\u0026rsquo;t want these columns in my data, how can I drop them? We will ask you if you have found anything on the internet.  Examples of good questions  I am still confused about the syntax of drop(). After reading the documentation, this is my understanding of the function\u0026hellip; . What am I missing? I tried making a table in markdown (show code), it is still not giving me what I want, how can I fix this? 
I am trying to drop these columns in my dataframe, I think drop() is what I am looking for. Am I in the right direction? If not, what keywords should I be googling?  Exercise 4:\nUsing the code and tools mentioned above, finish questions 4 and 5 under 3.2.4 in the textbook (use the data in mpg for your plot):\n# library import import pandas as pd import altair as alt # data import url = \u0026quot;https:\/\/github.com\/byuidatascience\/data4python4ds\/raw\/master\/data-raw\/mpg\/mpg.csv\u0026quot; mpg = pd.read_csv(url)   Question 4: Make a scatterplot of hwy vs cyl.\n  Question 5: What happens if you make a scatterplot of class vs drv? Why is the plot not useful?\n  After you have completed this skill builder with your team (or on your own) then compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/introduction\/"},{value:"Project 0: Introduction",label:"<p>Background We will complete six projects during the semester that each take about four days of class. On average, a student will spend 2 hours outside of class per hour in class to complete the assigned readings, submit any Canvas items, and complete the project (for a total of 8 hours per project). The instruction for each project will be structured into sections as written on this page.\nThis first Background section provides context for the project. Make sure you read the background carefully to see the big picture needs and purpose of the project.\n Python and VS Code are tools commonly used in the field of data science. During our first two days of class we will get VS Code prepped for data science programming. Completing Project 0 will set you up for success the rest of the semester.\nData Every data science project should start with data, and our class projects are no different. 
Each project will have \u0026lsquo;Download\u0026rsquo; and \u0026lsquo;Information\u0026rsquo; links like the ones below.\n Download: mpg data\nInformation: Data description\nReadings The Readings section will contain links to reading assignments that are required for each project, as well as optional references. Remember that you are reading this material to build skills. Take the time to comprehend the readings and the skills contained within.\nWe recommend reading through the assigned material once for a general understanding before the first day of each project. You will reread and reference the material multiple times as you complete the project.\n The readings listed below are required for the first two days of class.\n Python for Data Science (P4DS): Introduction P4DS: Data Visualization Section 3.1 \u0026amp; 3.2 Only  Optional References  VS Code user interface Reading Technical Documentation  Questions and Tasks: This section lists the questions and tasks that need to be completed for the project. Your work on the project must be compiled into a report and submitted in Canvas by the weekend following the last day of material for the project.\n  Finish the readings and be prepared with any questions to get your environment working smoothly (class for on-campus and Slack for online) In VS Code, write a Python script to create the example Altair chart from section 3.2.2 of the textbook (part of the assigned readings). Note that you have to type chart to see the Altair chart after you create it. Your final report should also include the markdown table created from the following (assuming you have mpg from question 2).  print(mpg .head(5) .filter([\u0026#34;manufacturer\u0026#34;, \u0026#34;model\u0026#34;,\u0026#34;year\u0026#34;, \u0026#34;hwy\u0026#34;]) .to_markdown(index=False)) Deliverables: Deliverables are “the quantifiable goods or services that must be provided upon the completion of a project”. 
In this class the deliverable for each project is an HTML report created using Quarto. This final section will be the same for each project.\n Use this template to submit your Client Report. The template has three sections (for additional details please see the instructional template):\n A short summary that highlights the key results, describing insights from the metrics of the project and the tools you used (Think “elevator pitch”). Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  This is a simple note.\n This is a simple tip.\n This is a simple info.\n -- </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/introduction\/"},{value:"Pull and Merge Forks on GitHub",label:"<p>Create Pull Request   Go to the forked repository in byuids-resumes and click Pull request.    This will bring you to the following page, where you need to click switching the base.    Now you can Create pull request.    Here you can type a note and then actually Create pull request.    Now you need to View pull request.   Merge Request If you have admin access to the forked repository where you are doing the pull request, you can finish the next two steps.\n Click the Merge pull request button.    Now confirm the merge.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/git_github_ds\/pull_merge\/"},{value:"Week 12-13: Project 6 - Github",label:"<p> GitHub is the communication tool for Data Scientists and developers. As students, you will want to curate your creative work on GitHub using Git. GitHub is the place to share your original work, not your homework assignments. Many people store their personal websites, blogs, and project websites on GitHub. Our textbook and course are hosted on GitHub, and you can see J. 
Hathaway\u0026rsquo;s or Ryan Hafen\u0026rsquo;s personal Data Science websites that are hosted on GitHub as well. You will be making your public resume that will be hosted on GitHub for this project.\nIn the process of this project, we will be learning the process of Git and the tools of GitHub. We will use the Git process to have others in our class edit our resumes. Take the process seriously (pick a suitable username and write a good resume), and you will have the beginning of your social presence in the DS\/CS space.\n Completed Readings: GitHub, a programmer\u0026rsquo;s social media, Join GitHub, Repository Templates, Using Version Control in VS Code, Working with GitHub in VS Code, Git in Visual Studio Code video, New to Git and GitHub? This Essential Beginners Guide is for you, Git vs. GitHub: What is the difference between them?\n Markdown Resume (mdresume) Repository and BYUI Data Science Resumes\n Grand Questions  Join the BYUI Data Science Resumes GitHub organization and use the template repository to make a resume repository under your repositories. A good name might be LASTNAME-Resume. Clone your repository to your computer and build a first draft of your resume. Push your results to GitHub and have another student fork your repository to make edits. Accept the proposed changes from the student review and finish your final version. Make sure your resume is forked by BYU-I Data Science Resumes  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p6\/"},{value:"Day 1: Welcome",label:"<p>Welcome to DS 250!  Teacher: Paul Cannon TA: David Pineda  Announcements  Devotional Computing Lab 4:30PM - 6:30PM all weekdays except Wednesday. Saturday from 10AM-12PM  Slack channel #tutoring_lab   Data Science Society - Wednesdays at 6PM  What is a Data Scientist? 
A Data Scientist has a C\u002b Talent Stack Class Structure  Problem Solving Improved coding skills Effective written\/visual communication Collaboration Timeliness and communication with \u0026ldquo;the boss\u0026rdquo;  Syllabus\nGot Slack? Are we all on the Slack channel? Follow the Slack invitation that is waiting in your student email. If you don\u0026rsquo;t see an invite, you can join through this link and then ask \u0026ldquo;@Paul Cannon\u0026rdquo; to add you to the class channel.\nWho are you?  Introduce yourself and learn the names\/majors\/origin story of your group members. Make a plan to get help this semester. How will you contact each other? Some ideas: Slack, I-Learn, emails, group texts, etc. If you were independently wealthy, what would you be doing right now? Would you change majors? Highlights of 2022  Problem Solving This is not a \u0026ldquo;see and repeat\u0026rdquo; programming class!\nHow would you go about fixing my motorcycle? Learn how to ask for help (1 hr rule)  Getting started on Project 0 Setting up your Programming Environment  Download Visual Studio Code Download Python (v3.10.8)  Be sure to select the \u0026ldquo;Add to Path\u0026rdquo; option during the install process    Install the Python packages and VS Code extensions you need (see this page)  pip install pandas pip install numpy pip install jupyter pip install tabulate pip install altair   Install Quarto CLI Quarto Instructions Start looking at Project 0 Complete the \u0026ldquo;Methods Checkpoint\u0026rdquo;  Installing Packages and Extensions Learn how to install packages by reading the assigned material and by watching the video tutorial on this page.\nThe readings mention a lot of different packages. For Project 0, you need to install at least pandas, altair, numpy, and jupyter.\nThe readings will also mention two VS Code extensions you need to install.\nA note on Jupyter Notebooks vs. 
Interactive Python Window The textbook will show you how to use VS Code\u0026rsquo;s interactive Python windows and Quarto. Feel free to use Jupyter Notebooks.\nWe will do write-ups in Quarto, though, which can be rendered as a PDF or HTML.\nIntroduction to Brother Cannon    What do you want to know?    What is a data scientist?    Brother Hathaway\u0026rsquo;s definition:\n A blend of programmer, statistician, and communicator that burns with curiosity.\n My definition for DS 250:\n Someone who can extract insights from data and then communicate those insights with clarity.\n Learn more about the BYU-Idaho data science program here.\n   What is data science programming?    Data scientists write code as a means to an end, whereas software developers write code to build things. Data science is inherently different from software development in that data science is an analytic activity, whereas software development has much more in common with traditional engineering.\nData scientists tackle problems such as identifying fraudulent transactions, or predicting which employees are likely to leave a company. Software developers can take the data scientists\u0026rsquo; models and turn them into fully functioning systems with production-quality code. Software developers tackle problems like getting an algorithm to run more efficiently, or building user interfaces.\n   Course Outcomes    Upon completing this course, you will be able to use data-driven programming in Python to handle, format, and visualize data. We will introduce you to data wrangling techniques (pandas), analytical methods (scikit-learn), and the grammar of graphics (Altair). Specifically, as a successful learner, you will be able to:\n Use functions, data structures, and other programming constructs efficiently to process and find meaning in data. Programmatically load data from various types of data sources, including files, databases, and remote services. 
Use data manipulation libraries to perform straightforward analysis, produce charts, and prepare data for machine learning algorithms. Use machine learning libraries to discover insights, make predictions, and interpret the success of these algorithms. Collaborate and share your work with industry-leading tools.     BYU-Idaho Mission Statement     Brigham Young University-Idaho was founded and is supported and guided by The Church of Jesus Christ of Latter-day Saints. Its mission is to develop disciples of Jesus Christ who are leaders in their homes, the Church, and their communities.\n  How would you describe a leader? What makes a leader powerful? What does a leader do with insights?  An example of a good leader.\nWhat (or who) is truth?\n   ## Course Format and Grading How hard is this class going to be?    The reality of CSE 250:\n We have done all we can to ensure that this is a 2-credit course for the average student. That means that we expect 4-6 hours outside of class for the average student to achieve an A. You have to put in the time if you want to build skills. The course is necessarily creative in nature. That fact usually makes it feel more challenging. We will be asking you to learn to write creative data science python code. If you have any concerns, please talk with me!     What is the structure of CSE 250?    The class uses 7 projects to teach data science programming in Python using pandas, Altair, scikit-learn, and numpy.\n Projects Syllabus     How do I get the grade I want?     Specification Grading Grading structure Competency Elements  Introduction Project \u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026mdash;\u0026gt;\nWhat is the goal?    Completing the introduction project will set you up for success the rest of the semester. 
The workflow followed in the introduction project (loading packages, writing code, saving images, compiling a final report) will be the same for every other project. If you have questions about this project, you need to seek help.   What exactly do I need to submit?    Make sure you carefully read the project instructions.\nYou will submit a single .pdf file to I-Learn. This pdf file should contain a project summary, your answers to the grand questions (including the plot you saved with altair_saver), and an appendix where you copy and paste your commented Python code.\n   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/introduction\/day01\/"},{value:"Day 2: Commit, push, fork, and merge",label:"<p>Welcome to class! Announcements Practice with Git GQ3: add, commit, push and a little pull Let\u0026rsquo;s save the changes we\u0026rsquo;ve made to our resume.\nGQ4: Fork and merge Get into groups of 2 or 3. Then follow the steps below:\n Fork the other student\u0026rsquo;s resume repository. Now clone that forked repository to your computer. On your local version of the forked repository, do the following:\nA. Create a new file called feedback.md B. Make a few recommendations or notes in the feedback.md file that will help the other student improve his or her resume\nC. add, commit, push your edits\nD. Go to the forked repo on GitHub and check if the feedback.md file shows up online Now, create a pull request to get your edits into the other student\u0026rsquo;s original repo.  Once you\u0026rsquo;ve given another student feedback, accept any pull requests submitted to your own repo. 
Continue to edit and improve your resume based on the feedback you received.\nGQ5: Fork into byuids-resumes Fork your own resume repository into the BYU-I Data Science Resumes group.\nIf you change your resume after you create this fork, you will have to submit a pull request to make sure the final version of your resume shows up in the group.\nThese instructions will help you create a pull request.\n</p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p6\/d3\/"},{value:"pandas and Altair",label:"<p>For this skill builder, we are exploring some important functions in the pandas and Altair packages. DS programming requires a lot of data wrangling. Using the proper functions, we can write concise and comprehensive code. You should be exposed to a few functions through the readings this week.\nYou may want to at least scan the readings before beginning this task since this serves as an assessment of your understanding of the assigned readings. A prepared student should be able to finish the exercises within 60 minutes. You should work through it on your own.\nBefore you start Make sure you have installed VS Code, pandas, and Altair on your computer. You can install these packages by typing this line in the terminal:\npip install pandas altair\nOR if you have more than one version of python:\npip3.9 install pandas altair\npip3.9 indicates the version of python you are installing the packages to.\nData import Run the following code to import the data we need for this skill builder:\n# package import import numpy as np import pandas as pd import altair as al # data import dat = pd.read_csv(\u0026#34;https:\/\/vincentarelbundock.github.io\/Rdatasets\/csv\/AER\/Guns.csv\u0026#34;) Make sure the variable dat is correctly assigned in your environment and finish the following exercises. 
You can read the documentation of the data on this page - https:\/\/vincentarelbundock.github.io\/Rdatasets\/doc\/AER\/Guns.html\nExercise 1 One of the first things we can do with freshly imported data is to check its columns. This will help us understand the basic structure of the dataframe (table).\n Using one line of code, select all the columns in dat, assign it to a variable called col_list.\n  Hint Every dataframe has an attribute \u0022columns\u0022. Accessing this attribute will give you a list of all column names  We often want to know the dimension of a dataframe. How many columns are in the dataset? How many rows are in the dataset?\n Using one line of code, show the number of columns and rows in dat.\n  Hint Every dataframe has an attribute \u0022shape\u0022. Accessing this attribute will give you the dimension of a dataframe  Now run dat.head(). It will print out the first 5 rows of data in dat.\n Just from looking at the output, what column(s) seems to be redundant with the row number?\n  Hint There is one column that serves as nothing but a row counter, that column is redundant.  Exercise 2 After a brief investigation of the data, we will clean up the data. By cleaning up, we are trying to filter down dat so this only holds data we need. We will first get rid of the extra column we found in the previous exercise.\n Using one line of code, drop the redundant column using the variable col_list (created in exercise 1)\n  Hint Use `drop()`. Understand what \u0026ldquo;axis\u0026rdquo; is as a parameter of drop().\nYour function should look like this:\ndat.drop([col_list[_]], axis = _)\nfill the \u0026ldquo;_\u0026quot;\u0026rsquo;s with the correct values and assign the output to dat.\n Don\u0026rsquo;t forget to save the changes in dat. Run dat.head() to make sure the column is dropped in dat.\nExercise 3 We have filtered dat vertically by dropping a column. 
Now we will try to filter dat horizontally, meaning we will get rid of some of the rows.\nWe can do that by applying a condition to dat. A condition is an expression that can be evaluated as True\/False. For example, 8 \u0026gt; 5 is an expression that evaluates to be True. This is trivial because 8 will always be greater than 5.\nRun the code below:\n what is the difference between exp1 and exp2?\n exp1 = 8 \u0026gt; 5 exp2 = dat.violent \u0026lt; 300  Hint Try type() on each variable OR calling each variable.  Run this code below:\n By putting dat.violent \u0026lt; 300, and the violent column from dat into a dataframe, what is the relationship between the two columns?\n exp = pd.DataFrame({\u0026quot;dat.violent \u0026lt; 300\u0026quot; : exp2, \u0026quot;violent value from dat\u0026quot; : dat.violent}) exp  Hint Try computing `dat.violent[n]  Using query(), filter down dat so that it only contains the data for Idaho\n  Hint query() takes in expressions and filters down data.  Don\u0026rsquo;t forget to save the changes in dat. Check dat.shape (an attribute, not a method) to make sure there are 23 rows and 13 columns.\nExercise 4 Besides filtering, we can manipulate the data by adding new data to it. By adding a new column to the data, we assign a new value to each row.\n Using assign(), create a new column that shows the ratio between murder rate and violent rate.\n  Hint Use assign() You can get the ratio by computing this code:\ndat.murder\/dat.violent\n Exercise 5  Create a scatter plot that shows the relationship between murder rate and violent rate for the state of Idaho. Your chart should show murder rate as the x-axis, violent as the y-axis.\n  Hint Can you mimic this plot? 
(https:\/\/altair-viz.github.io\/gallery\/scatter_tooltips.html)\n  For an extra push Exercise 6  Using a line of code, filter down the data set so that it only shows the data in years between 1993 and 1997.\n Exercise 7  Create a line chart that shows prisoner numbers for the states of Idaho, Utah, and Oregon.\n Your chart should show year as the x-axis, prisoner as the y-axis, states as different colours, along with an appropriate title.\nExercise 8  Without using query(), finish the data wrangling in questions 2, 5, and 6.\n After you have completed this skill builder with your team (or on your own) then compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/pandas_altair\/"},{value:"Project 1: What\u0027s in a name?",label:"<p>Background Early in prehistory, some descriptive names began to be used again and again until they formed a name pool for a particular culture. Parents would choose names from the pool of existing names rather than invent new ones for their children.\nWith the rise of Christianity, certain trends in naming practices manifested. Christians were encouraged to name their children after saints and martyrs of the church. These early Christian names can be found in many cultures today, in various forms. These were spread by early missionaries throughout the Mediterranean basin and Europe.\nBy the Middle Ages, the Christian influence on naming practices was pervasive. Each culture had its pool of names, which were a combination of native names and early Christian names that had been in the language long enough to be considered native. 
[ref]\nData Download: names_year.csv\nInformation: data.md\nReadings  Python for Data Science (P4DS): Data Visualization P4DS: Graphics for Communication P4DS: Markdown P4DS: 5.2 Filter rows with .query() P4DS: Chapter 10 DataFrame  Optional References  The query method  Questions and Tasks For Project 1 the answer to each question should include a chart and a written response. The year labels on your charts should not include a comma. At least two of your charts must include reference marks.\n How does your name at your birth year compare to its use historically? If you talked to someone named Brittany on the phone, what is your guess of his or her age? What ages would you not guess? Mary, Martha, Peter, and Paul are all Christian names. From 1920 - 2000, compare the name usage of each of the four names. What trends do you notice? Think of a unique name from a famous movie. Plot the usage of that name and see how changes line up with the movie release. Does it look like the movie had an effect on usage?  Deliverables Use this template to submit your Client Report. The template has three sections (for additional details please see the instructional template):\n A short summary that highlights the key results, describing insights from the metrics of the project and the tools you used (Think “elevator pitch”). Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-1\/"},{value:"Python for Data Science",label:"<p>Python for Data Science is a port of R for Data Science into Python. We are keeping Garrett Grolemund and Hadley Wickham’s writing and examples as much as possible while demonstrating Python instead of R. 
We have focused on pandas and Altair in our Python code snippets.\nThis book will teach you how to do data science with Python: You’ll learn how to get your data into Python, get it into the most useful structure, transform it, visualise it and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides. These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with Python. You’ll learn how to use the grammar of graphics, literate programming, and reproducible research to save time. You’ll also learn how to manage cognitive resources to facilitate discoveries when wrangling, visualising, and exploring data.\nInstalling and Importing Packages We want to install the following three packages;\n pandas numpy scikit-learn. The Apple Silicon is still more difficult to get installed. You can use the following links to get it installed - Link 1, Link 2, Link 3.  We can get packages installed for this course using one of the two methods below.\nUsing your terminal # default way pip install numpy pandas scikit-learn If you are using a Mac\n# Mac method with Python 2 and 3 installed pip3 install numpy pandas scikit-learn Using your interactive Python (Jupyter server) import sys !{sys.executable} -m pip install numpy pandas scikit-learn    </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/python-for-data-science\/"},{value:"Day 1: Git and Github",label:"<p>Welcome to class! Spiritual Thought Announcements   Project 5 Comment\n Feature Importance and Model discussion    The last day of DSS is next Wednesday, Dec 6th at 6:00PM in STC 394\n  Extra credit for creating and uploading cheat sheet (2 points for projects or checkpoints)\n  Coding Challenge date?\n  The technical aspects of Project 6 will be done mostly in class. 
Resume prep\/MD outside\n  Git and GitHub \u0026ldquo;Web developers\u0026rsquo; social media platform\u0026rdquo;  This is GitHub, the world’s largest code repository platform online. A platform used by some 50 million software developers to host their coding projects, most of them open-source — meaning others can access their codes and modify them to create better versions if they feel like.\nMost of the internet is produced or hosted on GitHub in the form of code. “What Gmail is to email, GitHub is to writing software,” says Kiran Jonnalagadda, cofounder of HasGeek, a platform to build and discover peer groups. Source\n  Don\u0026rsquo;t: post code for assignments that hundreds of other students have done. Do: post unique code using skills from your classes.  I would also recommend using private repos to manage your course work.\nIs it going to hurt? Answer: Yes.\nIt feels weird at first but quickly becomes second nature. If you plan on taking more data science classes, you should know that DS 350 students are required to submit all coursework via GitHub. This is a major topic in class and office hours for the first two weeks. Then we practically never discuss it again.\nMore bad news. Do you use GitHub to work with other people or to coordinate your own work from multiple computers? If so, after you recover from the initial setup, Git will crush you again with merge conflicts. And this is not one-time pain, this could be a dull ache for a long time.\n Managing a project via Git\/GitHub is much like the Google Doc scenario and enjoys many of the same advantages. It is definitely more complicated than collaborating on a Google Doc, but this puts you in the right mindset. Source\n Step 1: Download and install Follow steps 1-4 of this tutorial.\nThen:\n Request access tothe BYU-I Resumes page at Request Access Respond to the auto-generated email Wait a few minutes for authorization Join our GitHub organization - byuids-resumes.  
If you are on a Mac, you may need:  Mac fix with paths Download Xcode and update (10 gig download) VSCode path selection (scroll down to step 1)  Step 2: Create a repository from the resume template and connect to the BYUI Step 3: Publish your resume to GitHub Pages  Go to settings for your repo. Scroll down to the GitHub Pages section. Under source select the box which says None and pick master. Now select the \/docs folder and click save. Copy your site URL at the top of the \/settings\/pages location. Add your link to the About section of your repository. Edit the readme.md in the base repo to not show the resume directions.  Step 4: Clone repo into VS Code Analytics Vidhya reading\nStep 5: Make your resume look good Examples:\n Undergraduate DS resumes Hathaway\u0026rsquo;s resume  You may also find these articles helpful:\n How to Write a Great Data Science Resume How to Build an Effective Data Science Resume How to Write the Perfect Data Scientist Resume  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/p6\/d2\/"},{value:"JSONs \u0026 missing",label:"<p>UFO Sightings Data Link to json file\nExercise 1 Read in the json file as a pandas dataframe. After reading in the data, you\u0026rsquo;ll want to explore it and gain some intuition. Exploring data is a very important step — the more you know about your data the better! Answer the following questions to gain some insight into this dataset.\n How many rows are there? How many columns? What does a row represent in this dataset? What are the different ways missing values are encoded? How many np.nan in each column?  Some useful code for exploring data\n# Object\/Categorical Columns data.column_name.value_counts(dropna=False) data.column_name.unique() # Numeric Columns data.column_name.describe() # Counting missing values data.isna().sum() # Creates boolean dataframe and sums each column  Exercise 2 After learning different ways our data encodes missing values, now we will neatly manage them. 
There are many techniques we can use to handle missing values; for example, we can drop all rows that contain a missing value, impute with mean or median, or replace missing values with a new missing category. We will use some of these techniques in this exercise.\n shape_reported - replace missing values with missing string. distance_reported - change -999 values to np.nan. (-999 is a typical way of encoding missing values.) distance_reported - fill in missing values with the mean (imputation) were_you_abducted - replace - string with missing string.  The first 10 rows of your data should look like this after completion of the above steps.\n    city shape_reported distance_reported were_you_abducted estimated_size     0 Ithaca TRIANGLE 8521.9 yes 5033.9   1 Willingboro OTHER 7438.64 no 5781.03   2 Holyoke OVAL 7438.64 no 697203   3 Abilene DISK 7438.64 no 5384.61   4 New York Worlds Fair LIGHT 6615.78 missing 3417.58   5 Valley City DISK 7438.64 no 4280.1   6 Crater Lake CIRCLE 7377.89 no 528289   7 Alma DISK 7438.64 missing 4772.75   8 Eklutna CIGAR 5214.95 no 4534.03   9 Hubbard CYLINDER 8220.34 missing 4653.72    Some useful code for filling in missing data\ndata.column_name.replace(..., ..., inplace=True) data.column_name.fillna(..., inplace=True)  Exercise 3 Create a table that contains the following summary statistics.\n median estimated size by shape mean distance reported by shape count of reports belonging to each shape  Your table should look like this:\n   shape_reported median_est_size mean_distance_reported group_count     CIGAR 5899.68 6520.21 3   CIRCLE 266002 7408.26 2   CYLINDER 4550.58 8039.49 2   DISK 4581.8 7516.39 16   FIREBALL 5407.22 7097.78 3   FLASH 6108.34 7438.64 1   FORMATION 5104.4 8708.32 2   LIGHT 3850.25 7636.09 2   OTHER 4699.4 7473.98 4   OVAL 4943.63 7787.24 4   RECTANGLE 3668.1 6054.62 2   SPHERE 5076.78 7206.55 6   TRIANGLE 5033.9 8521.9 1   missing 250153 7438.64 2    Some useful code for grouping and getting summary 
statistics\n(data.groupby(...) .agg(..., ..., ...))  Exercise 4 The cities listed below reported their estimated size in square inches, not square feet. Create a new column named estimated_size_sqft in the dataframe, that has all the estimated sizes reported as sqft. (Hint: divide by 144 to go from sqin -\u0026gt; sqft)\n Holyoke Crater Lake Los Angeles San Diego Dallas  The head of your data should look like this.\n    city shape_reported distance_reported were_you_abducted estimated_size estimated_size_sqft     0 Ithaca TRIANGLE 8521.9 yes 5033.9 5033.9   1 Willingboro OTHER 7438.64 no 5781.03 5781.03   2 Holyoke OVAL 7438.64 no 697203 4841.69   3 Abilene DISK 7438.64 no 5384.61 5384.61   4 New York Worlds Fair LIGHT 6615.78 missing 3417.58 3417.58   5 Valley City DISK 7438.64 no 4280.1 4280.1   6 Crater Lake CIRCLE 7377.89 no 528289 3668.68   7 Alma DISK 7438.64 missing 4772.75 4772.75   8 Eklutna CIGAR 5214.95 no 4534.03 4534.03   9 Hubbard CYLINDER 8220.34 missing 4653.72 4653.72    Some useful code to fix the rows reported in sqin\nnp.where(..., # Condition ..., # If condition is true ...) # If condition is false  After you have completed this skill builder with your team (or on your own) then compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/json_missing\/"},{value:"Project 2: Late flights and missing data (JSON files)",label:"<p>Background Delayed flights are not something most people look forward to. In the best case scenario you may only wait a few extra minutes for the plane to be cleaned. However, those few minutes can stretch into hours if a mechanical issue is discovered or a storm develops. Arriving hours late may result in you missing a connecting flight, job interview, or your best friend’s wedding.\nIn 2003 the Bureau of Transportation Statistics (BTS) began collecting data on the causes of delayed flights. 
The categories they use are Air Carrier, National Aviation System, Weather, Late-Arriving Aircraft, and Security. You can visit the BTS website to read definitions of these categories.\nThe JSON file for this project contains information on delays at 7 airports over 10 years. Your task is to clean the data, search for insights about flight delays, and communicate your results using the provided template. If you have completed the checkpoints for Unit 5, then you are ready to answer the Grand Questions listed below. Refer to the readings for additional help.\nData Download: JSON File\nInformation: Data Description\nReadings  P4DS: Section 12.1 \u0026amp; 12.2 Tidy data P4DS: Chapter 5 Data transformation P4DS: Section 7.4 Missing Values Python Data Science Handbook: Missing Data Wikipedia Missing Data  Optional References  isin method where method np.where method replace method An introduction to JSON (May need to open in incognito to read.) The key word in \u0026lsquo;Data Science\u0026rsquo; is not Data\u0026hellip; How to Handle Missing Data (May need to open in incognito to read.)  Questions and Tasks   Which airport has the worst delays? Discuss the metric you chose, and why you chose it to determine the “worst” airport. Your answer should include a summary table that lists (for each airport) the total number of flights, total number of delayed flights, proportion of delayed flights, and average delay time in hours.\n  What is the best month to fly if you want to avoid delays of any length? Discuss the metric you chose and why you chose it to calculate your answer. Include one chart to help support your answer, with the x-axis ordered by month. (To answer this question, you will need to remove any rows that are missing the Month variable.)\n  According to the BTS website, the “Weather” category only accounts for severe weather delays. 
Mild weather delays are not counted in the “Weather” category, but are actually included in both the “NAS” and “Late-Arriving Aircraft” categories. Your job is to create a new column that calculates the total number of flights delayed by weather (both severe and mild). You will need to replace all the missing values in the Late Aircraft variable with the mean. Show your work by printing the first 5 rows of data in a table. Use these three rules for your calculations:\n 100% of delayed flights in the Weather category are due to weather  30% of all delayed flights in the Late-Arriving category are due to weather. From April to August, 40% of delayed flights in the NAS category are due to weather. The rest of the months, the proportion rises to 65%.    Using the new weather variable calculated above, create a barplot showing the proportion of all flights that are delayed by weather at each airport. Discuss what you learn from this graph.\n  Fix all of the varied missing data types in the data to be consistent (all missing values should be displayed as “NaN”). In your report include one record example (one row) from your new data, in the raw JSON format. Your example should display the \u0026ldquo;NaN\u0026rdquo; for at least one missing value.\n  Deliverables Use this template to submit your Client Report. The template has three sections (for additional details please see the instructional template):\n A short summary that highlights key insights from the metrics of the project and the tools you used (Think “elevator pitch”). Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  
</p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-2\/"},{value:"Machine Learning",label:"<p>Introduction Everyone seems to have a slightly different take on the differences between Artificial Intelligence, Machine Learning, and Data Science. The following four articles cover some of the most common definitions.\nAs you read them, think about the differences and similarities of the definitions. Given the backgrounds of the various authors, whose opinions might you give more weight to?\n Michael Copeland writing for NVidia Bernard Marr writing for Forbes Vincent Granville writing for Data Science Central Simply Statistics Blog - The key word in \u0026ldquo;Data Science\u0026rdquo; is not Data, it is Science  Of particular note is this quote from the Granville article:\n Earlier in my career (circa 1990) I worked on image remote sensing technology, among other things to identify patterns (or shapes or features, for instance lakes) in satellite images and to perform image segmentation: at that time my research was labeled as computational statistics, but the people doing the exact same thing in the computer science department next door in my home university, called their research artificial intelligence. 
Today, it would be called data science or artificial intelligence, the sub-domains being signal processing, computer vision or IoT.\n As with most things in the realm of science, there tends to be a wide gap between how the media, government, and business sectors view a particular technology compared to how it\u0026rsquo;s viewed by the engineers and scientists using that technology.\nFor our purposes in this course, we\u0026rsquo;ll define these terms as follows:\n Artificial Intelligence: The study of man-made \u0026ldquo;agents\u0026rdquo; that perceive their environment and take actions that maximize their chances of success at some goal.1\nMachine Learning: A subfield within Artificial Intelligence that gives \u0026ldquo;computers the ability to learn without being explicitly programmed.\u0026quot;2\nData Science: The study and use of the techniques, statistics, algorithms, and tools needed to extract knowledge and insights from data.3\n MORAVEC\u0026rsquo;S PARADOX In the 1980\u0026rsquo;s, Hans Moravec made the following observation, which came to be known as Moravec\u0026rsquo;s Paradox:\n \u0026hellip;as the number of demonstrations has mounted, it has become clear that it is comparatively easy to make computers exhibit adult-level performance in solving problems on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility.4\n So, while AI and machine learning algorithms can accomplish many tasks much better than humans can, any toddler can outperform even the most state-of-the-art neural network in picking out photos of their parents or pet cat.\n5\nEven though Moravec wrote about this over thirty years ago, the same sentiment persists in AI research today. In a 2016 interview, Dr. 
Sean Holden an AI researcher at Cambridge University, discussed the differences between human intelligence and artificial intelligence:\n “Most AI researchers don’t try to solve the whole problem because it’s too hard. They take some specific problem and do it better. That’s not to say that the way humans think isn’t useful to AI, but working out how brains do things is hard. And there’s a difference in scale. Brains are doing things that are in some senses quite different from what AI researchers are currently attacking – I’d be ecstatic, for example, if I could build a robot that could put on a duvet cover.”6\n Dr. Fumiya Iida, from the Machine Intelligence Lab at Cambridge, adds:\n “We have hundreds of thousands of muscles in our body, so how can the brain control this? A computer can’t. Every fraction of a second you have to co-ordinate hundreds of muscles just to grab a cup, for example.”6\n PREDICTION VS. INFERENCE In machine learning, we are typically interested in doing one of two things: making inferences, or making predictions.\n Inference: Given a set of data you want to infer how the output is generated as a function of the data.\nPrediction: Given a new measurement, you want to use an existing data set to build a model that reliably chooses the correct identifier from a set of outcomes.7\n This example explains the differences between those two goals:\n Inference: You want to find out what the effect of Age, Passenger Class and, Gender has on surviving the Titanic Disaster. You can put up a logistic regression and infer the effect each passenger characteristic has on survival rates.\nPrediction: Given some information on a Titanic passenger, you want to choose from the set {lives,dies} and be correct as often as possible.7\n Classification Algorithms Imagine that you\u0026rsquo;re a big fan of comic books. 
Over the years, you\u0026rsquo;ve read enough Marvel and DC comics that if I asked you to \u0026ldquo;classify\u0026rdquo; which universe Superman belonged to, you\u0026rsquo;d be able to confidently say, \u0026ldquo;The DC Universe\u0026rdquo;.\nOr, let\u0026rsquo;s say you\u0026rsquo;ve eaten a lot of chocolate in your life. If I were to have you close your eyes and take a bite of chocolate, you might be able to accurately tell me if it was white chocolate, milk chocolate, semi-sweet, or dark.\nThese are both classification problems. Based on your prior knowledge or training regarding different groups, you can take an item and sort it into the correct group.\nIn machine learning, classification algorithms, (or classifiers), need to be trained before they can classify things on their own. We can train an algorithm by providing it with lots of examples from each group and telling it which attributes of those samples are important. The more examples we use to train our algorithm, the more accurate the classification of new items will be.\nIn the example below, we’re telling the algorithm “this is what a blue circle looks like\u0026rdquo;, or \u0026ldquo;this is what a green circle looks like\u0026rdquo;, etc\u0026hellip;\nOnce an algorithm has been trained, we can see how well it performs by providing it with test data consisting of new items it hasn\u0026rsquo;t seen yet, and checking to see if it can correctly predict which group the new items belong to.\nThe Iris Dataset ABOUT THE DATA For this example, we will use Fisher\u0026rsquo;s Iris Data.\nThe Iris dataset contains the length and width of the sepals and petals from 150 iris flowers across three different species of iris: Iris setosa, Iris versicolor, and Iris virginica.\nEach row in the Iris dataset represents the measurements of a single flower. We refer to each of these as a sample, observation, or instance.\nEach column in the Iris dataset represents a particular thing being measured about each flower. 
From left to right we have (in centimeters) the sepal length, the sepal width, the petal length, and the petal width. Each of these is referred to as a feature, attribute, measurement, or dimension.\nThe final column in the dataset is the species of the flower. This final column is often referred to as the target or class of the sample.\nClassifiers Classifier algorithms generally follow the same set of steps. Our goal is to create a classifier that can be provided with the measurements of petals and sepals, and then use that information to predict the species of iris flower we\u0026rsquo;re measuring.\nLoad data The first thing we need to do is load our data. In most cases, there is some pre-processing that has to be done on the data in order to get it to the point where we can start working with it. Often you will need to normalize and encode variables.\n Normalization reading Encoding reading  In this case however, the data is provided to you in the exact format you need:\n sepal_length sepal_width petal_length petal_width species 0 5.1 3.5 1.4 0.2 Iris-setosa 1 4.9 3.0 1.4 0.2 Iris-setosa 2 4.7 3.2 1.3 0.2 Iris-setosa 3 4.6 3.1 1.5 0.2 Iris-setosa 4 5.0 3.6 1.4 0.2 Iris-setosa .. ... ... ... ... ... 145 6.7 3.0 5.2 2.3 Iris-virginica 146 6.3 2.5 5.0 1.9 Iris-virginica 147 6.5 3.0 5.2 2.0 Iris-virginica 148 6.2 3.4 5.4 2.3 Iris-virginica 149 5.9 3.0 5.1 1.8 Iris-virginica The csv file for the iris data can be found here. There are many ways to load data from a csv file, but one handy way is to use the read_csv function from the Pandas library:\nimport pandas as pd url = \u0026quot;https:\/\/byuistats.github.io\/DS250-Course\/skill_builders\/ml_sklearn\/machine_learning.csv\u0026quot; data = pd.read_csv(url) Split data Next, we\u0026rsquo;ll randomly divide all of the samples into two groups. The first group will consist of our training data, or the samples we\u0026rsquo;ll use to train our classifier. 
The second group will consist of our test data, the data we\u0026rsquo;ll use to test our classifier.\nThere are many ways to do this, but if we have our features (sepal and petal measurements) and targets (species names) in separate arrays, we can use the train_test_split function of the sklearn library to do this for us:\nNote that if you use pandas to load the csv file, you\u0026rsquo;ll have the data in a single pandas DataFrame. At some point you\u0026rsquo;ll need to split that data frame into two numpy arrays, one containing the features, and the other containing the targets.\nTake a look at the Indexing and Selecting Data page in the Pandas user guide for more details on splitting the data, and the to_numpy function for converting to a numpy array.\nNotice the transformation can be completed before the data is divided into test and training sets. Two numpy arrays can be passed to the train_test_split function to get two sets of arrays back. Alternatively, the data frame can be passed to train_test_split, and then the test and training data are each split into their feature and target components.\nThe following examples assume you\u0026rsquo;ve split the data into features and targets before passing it to train_test_split.\nfrom sklearn.model_selection import train_test_split # features = ... select the feature columns from the data frame # targets = ... select the target column from the data frame # Randomize and split the samples into two groups. # 30% of the samples will be used for testing. # The other 70% will be used for training. train_data, test_data, train_targets, test_targets = train_test_split(features, targets, test_size=.3) You could also use Python\u0026rsquo;s built-in libraries to randomly shuffle the data, and then use array slicing to split the data into test and training subsets. 
However, if you do, make sure you do it in such a way that you still know which species goes with each set of measurements.\nTrain classifier By providing the algorithm with training data, we allow it to create relationships between the features of a sample and its class. In the case of the Iris data set, we\u0026rsquo;re training our algorithm on how a given set of sepal and petal measurements correlates to the flower\u0026rsquo;s species.\nsklearn has a classifier called GaussianNB which we can use to demonstrate this. GaussianNB is a \u0026ldquo;Naïve Bayes\u0026rdquo; classifier that assumes two things about our data:\n  That the underlying features follow a continuous, normal distribution. (The Gaussian part) That each feature is statistically independent of every other feature, given the class. (The Naïve part)   Do you think both of these assumptions are true for the Iris data?\nTo train our classifier, first we create an instance of it, then we use the fit method to teach it about our data:\nfrom sklearn.naive_bayes import GaussianNB classifier = GaussianNB() classifier.fit(train_data, train_targets) Test classifier Now that our classifier has been trained on how to classify iris flowers, it\u0026rsquo;s time to test it to see if it can correctly predict the species of flower from a set of measurements.\nNote that it\u0026rsquo;s very important when testing our algorithm that we only test it on data that was not used to train it. Otherwise, we\u0026rsquo;re only testing its ability to remember training data. This is why we split the data into two groups.\nTo test our classifier, we\u0026rsquo;ll use the predict method and provide it with our test data. 
This method will return a list of predicted targets, one for each sample in the test data.\nIn our case, we\u0026rsquo;ll give it a list of petal and sepal measurements it has never seen before, and it will return a list of species predictions, one prediction for each sample in our test data:\ntargets_predicted = classifier.predict(test_data) Assess classifier performance Since we already know which type of iris each sample in the test data corresponds to, we can compare the predictions made by the classifier to the sample\u0026rsquo;s actual species and calculate how well our algorithm performs.\nIf m is the number of correct predictions made, and n is the total number of samples in our test data, then accuracy can be calculated as:\naccuracy = m\/n\nSo if our test data has 20 samples and the classifier predicts the correct flower species for 15 of them, then we would say our algorithm has an accuracy of 75%.\n(Note that accuracy isn\u0026rsquo;t the best metric to use for evaluating classification algorithms. We\u0026rsquo;ll be looking at a few alternatives in the future.)\nSummary To summarize: we take our dataset and divide it in two parts: training data and test data. 
We use the training data to train the classifier to make classifications, then we use the test data to test how well our classifier performs.\nIf we have a classifier that performs well, we can use it with new data, samples whose groups we don\u0026rsquo;t know ahead of time, and the accuracy metric will give us some idea of how reliable those predictions are.\nIf our classifier performs poorly, we either need to provide it with more training data, modify or replace it, or select a different set of attributes to use as features.\nCSE 450: Machine Learning \u0026amp; Data Mining is the class where you can build depth in Machine Learning and its applications.\nREFERENCES   Artificial Intelligence: A Modern Approach by Russell and Norvig (Prentice Hall, 2009).↩ \u0026#x21a9;\u0026#xfe0e;\n Some Studies in Machine Learning Using the Game of Checkers, by Arthur L. Samuel (IBM Journal, Vol 3, No 3, 1959).↩ \u0026#x21a9;\u0026#xfe0e;\n Wikipedia article on Data Science.↩ \u0026#x21a9;\u0026#xfe0e;\n Mind Children, by Hans Moravec (Harvard University Press, 1988).↩ \u0026#x21a9;\u0026#xfe0e;\n XKCD 1425: Tasks.↩ \u0026#x21a9;\u0026#xfe0e;\n Cambridge Alumni Magazine, Issue 79, pg 19.↩ \u0026#x21a9;\u0026#xfe0e;\n Cross Validated: Prediction vs Inference.↩ \u0026#x21a9;\u0026#xfe0e;\n   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/machine-learning\/"},{value:"Project 3: Finding relationships in baseball.",label:"<p>Background When you hear the word “relationship” what is the first thing that comes to mind? Probably not baseball. But a relationship is simply a way to describe how two or more objects are connected. There are many relationships in baseball such as those between teams and managers, players and salaries, even stadiums and concession prices. 
The graphs on Data Visualizations from Best Tickets show many other relationships that exist in baseball.\nFor this project, your client would like you to develop SQL queries that they can use to retrieve data for use on their website without needing Python. They would also like to see example Altair charts.\nData Data Connection: lahmansbaseballdb\nConnection Instructions: See SQL for Data Science\nReadings  SQL for Data Science Readings (read all links)  Optional References  Why SQL is beating NoSQL, and what this means for the future of data Lahman Data Dictionary  Questions and Tasks   Write an SQL query to create a new dataframe about baseball players who attended BYU-Idaho. The new table should contain five columns: playerID, schoolID, salary, and the yearID\/teamID associated with each salary. Order the table by salary (highest to lowest) and print out the table in your report.\n  This three-part question requires you to calculate batting average (number of hits divided by the number of at-bats)\n Write an SQL query that provides playerID, yearID, and batting average for players with at least 1 at bat that year. Sort the table from highest batting average to lowest, and then by playerid alphabetically. Show the top 5 results in your report. Use the same query as above, but only include players with at least 10 at bats that year. Print the top 5 results. Now calculate the batting average for players over their entire careers (all years combined). Only include players with at least 100 at bats, and print the top 5 results.    Pick any two baseball teams and compare them using a metric of your choice (average salary, home runs, number of wins, etc). Write an SQL query to get the data you need, then make a graph in Altair to visualize the comparison. What do you learn?\n  Deliverables Use this template to submit your Client Report. 
The template has three sections (for additional details please see the instructional template):\n A short summary that highlights key insights from the metrics of the project and the tools you used (Think “elevator pitch”). Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-3\/"},{value:"SQL \u0026 databases",label:"<p>Skill builder (relational database) For this skill builder, we are exploring some important topics in relational databases. This exercise will require you to create SQL queries through Python. You may want to at least scan the readings before beginning this task since this serves as an assessment of your understanding of the assigned readings.\nA competent student should be able to finish the exercises within 75 minutes.\nBefore you start Make sure you have installed VS Code, pandas, and Altair on your computer.\nAlso make sure you have gone through the tutorial under course materials called SQL for Data Science: we assume that you have a connection to your data.\nExercise 1 Readme file A database can consist of more than one table\/data set. A relational database consists of tables\/data sets that share columns. These shared columns then establish the relationship between the tables, thus the name relational database. 
The relations are sometimes not easily found and they require careful investigation.\nTo understand what is in a relational database, we can start with understanding the tables and the columns within.\nHere is a link to the readme file of the baseball database.\n What is the name of the table that records data about pitchers in the regular seasons?\n  What do the HR and HBP columns mean in that table respectively?\n Exercise 2 SELECT and FROM The simplest SQL query is a query with SELECT and FROM. These are the keywords you will see again and again in SQL. Usually, when constructing a more complex query, it is easier to identify what goes into these two clauses first.\n Create a query that shows all columns from the table you found in Exercise 1, save the dataframe in a variable \u0026ldquo;pitch\u0026rdquo;\n Your script should look something like:\nresults = pd.read_sql_query( \u0027SELECT _______ FROM _______\u0027, con) results Exercise 2 WHERE The WHERE keyword allows us to filter down the table horizontally (fewer rows).\nIt goes after SELECT and FROM.\n Using a SQL query, select all rows in the same table where HR is less than 10 and gs is greater than 25.\n  Find out what the columns mean and explain your query in words\n Exercise 3 ORDER BY ORDER BY sorts the table you select by one or more columns and goes after WHERE\n Using the same query in exercise 2, edit it so that the table is ordered by the year of the season (most recent to oldest) and the player ID (alphabetically).\n Exercise 4 Joins Joins are used when you wish to create a new table through two different tables. 
Keep in mind that you have to identify the relationship between two tables before you can correctly join them.\nJOIN goes between FROM and WHERE.\n Identify the shared columns (keys) and join the table in exercise 2 with the salaries table, then filter the data so that it shows only pitchers in the year 1986.\n You should get a dataframe with 306 rows.\nExercise 5 Group by Group by is a keyword we use to lower the level of granularity of a table, meaning we combine rows into one by the given column(s).\nCreate a query that captures the number of pitchers the Washington Nationals used in each year, then sort the table by year\nYou should get a dataframe with 23 rows.\nFor the overachievers Exercise 6 Research the order of operations for SQL and put the following keywords in that order.\n SELECT FROM JOIN WHERE HAVING ORDER BY GROUP BY LIMIT  After you have completed this skill builder with your team (or on your own) then compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/relational_data\/"},{value:"Course Materials",label:"<p>We will be relying on a few resources for this course. You will find the pertinent readings attached to each of the projects. Those readings will be culled from:\n Python for Data Science: A port of R for Data Science using the Python packages pandas and Altair. pandas User Guide Altair User Guide scikit-learn User Guide scikit-learn tutorials Python Data Science Handbook A Whirlwind Tour of Python SQL  Wes McKinney\u0026rsquo;s pandas code for his book Python for Data Analysis is a useful reference as well: https:\/\/github.com\/wesm\/pydata-book\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/"},{value:"Machine Learning",label:"<p>Intro to Titanic Machine Learning Skill Builder Link to data\nFor this skill builder, we\u0026rsquo;ll be putting our machine learning hats on. 
We\u0026rsquo;ll be creating a model that predicts whether a passenger survived. With machine learning, there is a lot of jargon! It can be quite overwhelming at times. This skill builder attempts to keep things basic and simple. With that being said, there are some terms that are important to understand. Let\u0026rsquo;s look at the first few rows of our dataset before proceeding with the definitions.\nThe titanic dataset will be used for examples of each definition.\n   survived pclass sex age siblings_spouses_aboard parents_children_aboard fare     0 3 1 22 1 0 7.25   1 1 0 38 1 0 71.2833   1 3 0 26 0 0 7.925   1 1 0 35 1 0 53.1   0 3 1 35 0 0 8.05    Important Terms:  features: measurable property of the object you\u0026rsquo;re trying to predict. We use this information to predict our target of interest.  Example: pclass, sex, age, siblings_spouses_aboard , parents_children_aboard, fare columns are all examples of different features. Synonyms: attributes, explanatory variables, independent variables, variables, X\u0026rsquo;s, covariates   target: the feature that you are wanting to gain more insight into. The thing you are trying to predict.  Example: in the titanic dataset our target is survived Synonyms: label, dependent variable, y   train set: Usually 70% of the rows from the original dataset are randomly sampled to create this training data. It\u0026rsquo;s used by the algorithm, to determine, or learn, the optimal combinations of variables that will generate a good predictive model  Example: Random sample of 70% of the original titanic dataset rows Synonyms: training data, train data, X_train, y_train   test set: Usually the remaining 30% of the rows in the original dataset are used to create this dataset. The testing data is a set of rows used only to assess the performance (i.e. generalization) of a model. To do this, the final model is used to predict classifications of examples in the test set. 
Those predictions are compared to the examples\u0026rsquo; true classifications to assess the model\u0026rsquo;s accuracy.  Example: Random sample of 30% of the original titanic dataset rows Synonyms: testing data, test data, X_test, y_test   evaluation metrics: A statistic that tells you how well your predictions align with the actual values. In other words, it tells you how good your model is.  Example: Accuracy, Precision, Recall, MSE, MAE, Rsquared Synonyms: performance metric    Again, this is a very light and oversimplified treatment of machine learning. The purpose of this project is to help you understand the main concepts of ML and walk you through the process of building a machine learning model. A simplified workflow of a machine learning project is shown below. Spend some time getting familiar with this flow, as you are about to code it\u0026hellip; Exciting!\nNote that in order to do this skill builder you will need to have scikit-learn installed on your machine. Run the following command in your terminal if you haven\u0026rsquo;t already.\npip install scikit-learn\nData Link to csv file\nExercise 0 (Imports and Loading in data) # Loading in packages import pandas as pd import numpy as np import altair as alt from sklearn.model_selection import train_test_split from sklearn.ensemble import RandomForestClassifier from sklearn.metrics import accuracy_score # Loading in data data = pd.read_csv(___)  Exercise 1 Create a chart exploring the relationship between age and survived in the titanic dataset. A strip plot, density plot, or boxplot might be useful here. Below is an example of a density plot. Feel free to replicate this chart or create your own.\nThe purpose of making this chart is to explore the relationships between a feature and the target. We want to see if the feature contains predictive information about the target. This is a large part of machine learning called Exploratory Data Analysis that should never be skipped! 
Spend time getting to know your features and how they interact with other features and the target.\n Exercise 2 Build a random forest model that is able to predict whether a passenger survived. This exercise is the bulk of the skill builder and contains several steps.\nStep 0: Split the data into X and y variables The X variable will contain all your features\n# Removes the target and keeps all features X = data.drop(___, axis=1) The y variable will hold the target\n# Selects the target column y = data[\u0026#39;___\u0026#39;] Step 1: Split data into train and test sets The train_test_split function is useful for this task. Review the train_test_split function documentation\n# Splitting X and y variables into train and test sets using stratified sampling X_train, X_test, y_train, y_test = train_test_split(___, ___, test_size=0.3, random_state=24, stratify=y) Step 2: Train the model Explore the RandomForestClassifier documentation for the RandomForestClassifier. It\u0026rsquo;s not necessary to understand the inner workings of the Random Forest algorithm for this class - just learn the syntax of fitting the model.\n# Creating random forest object rf = RandomForestClassifier(random_state=24) # Fit with the training data rf.fit(___, ___) Step 3: Use test set to make predictions # Using the features in the test set to make predictions y_pred = rf.predict(___) Step 4: Compare test set predictions to actual values. Calculate the accuracy. # Comparing predictions to actual values accuracy_score(___, ___)  Exercise 3 What is the most important feature in making predictions? Why do you think this is?\nCreate a table that shows the feature importances in descending order. The random forest classifier has a feature importances attribute. It can be accessed by rf.feature_importances_. 
The table should look something like this.\n   feature names importances     fare 0.288051   sex 0.281853   age 0.266491   pclass 0.0814224   siblings_spouses_aboard 0.0475633   parents_children_aboard 0.034619    After you have completed this skill builder with your team (or on your own) then compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/ml_sklearn\/"},{value:"Project 4: Can you predict that?",label:"<p>Background The Clean Air Act of 1970 was the beginning of the end for the use of asbestos in home building. By 1976, the U.S. Environmental Protection Agency (EPA) was given authority to restrict the use of asbestos in paint. Homes built during and before this period are known to have materials with asbestos. You can read more about this ban.\nThe state of Colorado has a large portion of their residential dwelling data that is missing the year built and they would like you to build a predictive model that can classify if a house is built pre 1980.\nColorado gave you home sales data for the city of Denver from 2013 on which to train your model. They said all the column names should be descriptive enough for your modeling and that they would like you to use the latest machine learning methods.\nData Download: dwellings_denver.csv, dwellings_ml.csv, dwellings_neighborhoods_ml.csv\nInformation: Data description\nReadings  Machine Learning Introduction A visual introduction to machine learning How to choose a good evaluation metric for your Machine learning model  Optional References  Decision Tree Classification in Python Boosted algorithms in scikit-learn scikit-plot package  Grand Questions  Create 2-3 charts that evaluate potential relationships between the home variables and before1980. Explain what you learn from the charts that could help a machine learning algorithm. Build a classification model labeling houses as being built “before 1980” or “during or after 1980”. 
Your goal is to reach or exceed 90% accuracy. Explain your final model choice (algorithm, tuning parameters, etc) and describe what other models you tried. Justify your classification model by discussing the most important features selected by your model. This discussion should include a chart and a description of the features. Describe the quality of your classification model using 2-3 different evaluation metrics. You also need to explain how to interpret each of the evaluation metrics you use.  Deliverables Use this template to submit your Client Report. The template has three sections (for additional details please see the instructional template):\n A short summary that highlights the key results, describing insights from the project metrics and the tools you used (think “elevator pitch”). Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-4\/"},{value:"Project 5: The war with Star Wars",label:"<p>Background Survey data is notoriously difficult to munge. Even when the data is recorded cleanly the options for ‘write in questions’, ‘choose from multiple answers’, ‘pick all that are right’, and ‘multiple choice questions’ make storing the data in a tidy format difficult.\nIn 2014, FiveThirtyEight surveyed over 1000 people to write the article titled, America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters). 
They have provided the data on GitHub.\nFor this project, your client would like to use the Star Wars survey data to figure out if they can predict an interviewing job candidate’s current income based on a few responses about Star Wars movies.\nData Download: StarWars.csv\nInformation: Article\nReadings  Python for Data Science: Tidy Data Python for Data Science: Graphics for Communication Python for Data Science: Strings  Questions and Tasks  Shorten the column names and clean them up for easier use with pandas. Provide a table or list that exemplifies how you fixed the names. Clean and format the data so that it can be used in a machine learning model. As you format the data, you should complete each item listed below. In your final report provide example(s) of the reformatted data with a short description of the changes made. Filter the dataset to respondents that have seen at least one film. Create a new column that converts the age ranges to a single number. Drop the age range categorical column. Create a new column that converts the education groupings to a single number. Drop the school categorical column. Create a new column that converts the income ranges to a single number. Drop the income range categorical column. Create your target (also known as “y” or “label”) column based on the new income range column. One-hot encode all remaining categorical columns.   Validate that the data provided on GitHub lines up with the article by recreating 2 of the visuals from the article. Build a machine learning model that predicts whether a person makes more than $50k. Describe your model and report the accuracy.  Deliverables Use this template to submit your Client Report. The template has three sections (for additional details please see the instructional template):\n A short summary that highlights the key results, describing insights from the project metrics and the tools you used (think “elevator pitch”). 
Answers to the questions from the \u0026ldquo;Questions and Tasks\u0026rdquo; section above. Each answer should include a written description of your results, code snippets, charts, and tables.  </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-5\/"},{value:"SQL for Data Science",label:"<p>There are many flavors of SQL but most flavors have the same base commands. SQL queries are typed in the following pattern;\nSELECT -- \u0026lt;columns\u0026gt; and \u0026lt;column calculations\u0026gt; FROM -- \u0026lt;table name\u0026gt;  JOIN -- \u0026lt;table name\u0026gt;  ON -- \u0026lt;columns to join\u0026gt; WHERE -- \u0026lt;filter condition on rows\u0026gt; GROUP BY -- \u0026lt;subsets for column calculations\u0026gt; HAVING -- \u0026lt;filter conditions on groups\u0026gt; ORDER BY -- \u0026lt;how the output is returned in sequence\u0026gt; LIMIT -- \u0026lt;number of rows to return\u0026gt; Introductory SQL links  SQL Guide SELECT and FROM clauses WHERE and comparison operators ORDER BY Joins Aggregations GROUP BY  import pandas as pd import altair as alt import numpy as np import sqlite3 # %% # careful to list your path to the file. sqlite_file = \u0026#39;lahmansbaseballdb.sqlite\u0026#39; con = sqlite3.connect(sqlite_file) results = pd.read_sql_query( \u0026#39;SELECT * FROM allstarfull LIMIT 5\u0026#39;, con) results You can see the list of tables available in the database;\ntable = pd.read_sql_query( \u0026#34;SELECT * FROM sqlite_master WHERE type=\u0026#39;table\u0026#39;\u0026#34;, con) print(table.filter([\u0026#39;name\u0026#39;])) print(\u0026#39;\\n\\n\u0026#39;) # 8 is collegeplaying print(table.sql[8]) </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/sql-for-data-science\/"},{value:"Munging data",label:"<p>Intro to cleaning movies data Link to the data\nThis skill builder focuses on munging (formatting) data into a machine learning ready dataset. We will be using an IMDB Ratings dataset. 
It contains columns that are categorical. Sklearn cannot handle columns that are strings, so we need to convert these into a numerical representation. We accomplish this by either one hot encoding, label encoding, or taking just one value of the range provided. There are many other ways to represent these columns as numbers, but they are beyond the scope of this course.\nOnce you\u0026rsquo;ve converted all columns to numeric, in an intelligent way, you will be asked to recreate a graph using altair. Here is the head of the data you will be working with. Enjoy!\n   star_rating content_rating genre duration box_office_rev major_hit     9.3 R Crime 142 €1924521976 - €1925521976 no   9.2 R Crime 175 €177034987 - €178034987 no   9.1 R Crime 200 €2617541398 - €2618541398 no   9 PG-13 Action 152 €996115723 - €997115723 no   8.9 R Crime 154 €1172054364 - €1173054364 no    Data Link to csv file: ...\n Exercise 0  Grab the high range value for each movie and put it into a new column called high_range_rev.  Make sure the data type of this new column is numeric!!   Remove the box_office_rev column from the dataset.  The .str.split() and .astype() methods might be of use! Also, to get the euro sign just copy it from here, €, and put it in your code.\nThe first 5 rows of the resulting dataframe should look like this\n   star_rating content_rating genre duration major_hit high_range_rev     9.3 R Crime 142 no 2345444803   9.2 R Crime 175 no 2182412593   9.1 R Crime 200 no 1604872807   9 PG-13 Action 152 no 284317976   8.9 R Crime 154 yes 1791932201     Exercise 1 Convert the major_hit column to 1\/0\u0026rsquo;s. yes -\u0026gt; 1 and no -\u0026gt; 0. Again, there are several ways to accomplish this. 
Using our old friend np.where is probably the easiest though.\nThe first 5 rows of the resulting dataframe should look like this\n   star_rating content_rating genre duration major_hit high_range_rev     9.3 R Crime 142 0 1925521976   9.2 R Crime 175 0 178034987   9.1 R Crime 200 0 2618541398   9 PG-13 Action 152 0 997115723   8.9 R Crime 154 0 1173054364     Exercise 2 Convert the content_rating column using label encoding. We\u0026rsquo;re using label encoding in this case because the movie ratings already have a natural ordering to them. We will replace each rating with a number in its natural ascending order.\nTo be more specific, here is how we will do it.\n G: 0 PG: 1 PG-13: 2 R: 3  A dictionary and the .map() method could be useful for this exercise. There are other ways of tackling this problem though. Be creative!\nThe first 5 rows of the resulting dataframe should look like\n   star_rating content_rating genre duration major_hit high_range_rev     9.3 3 Crime 142 0 1925521976   9.2 3 Crime 175 0 178034987   9.1 3 Crime 200 0 2618541398   9 2 Action 152 0 997115723   8.9 3 Crime 154 0 1173054364     Exercise 3 The last column that we need to take care of is genre. We will use one hot encoding for this. Make sure to ONLY one hot encode the genre column!\nA useful function for one hot encoding is pd.get_dummies(). 
I recommend checking out the documentation.\nThe resulting dataframe should look like the following example; don\u0026rsquo;t worry if your high_range_rev column turned into scientific notation—Pandas does this sometimes.\n    star_rating content_rating duration major_hit high_range_rev genre_Action genre_Adventure genre_Animation genre_Biography genre_Comedy genre_Crime genre_Drama genre_Family genre_Fantasy genre_Horror genre_Mystery genre_Sci-Fi genre_Thriller genre_Western     0 9.3 3 142 0 1.92552e\u002b09 0 0 0 0 0 1 0 0 0 0 0 0 0 0   1 9.2 3 175 0 1.78035e\u002b08 0 0 0 0 0 1 0 0 0 0 0 0 0 0   2 9.1 3 200 0 2.61854e\u002b09 0 0 0 0 0 1 0 0 0 0 0 0 0 0   3 9 2 152 0 9.97116e\u002b08 1 0 0 0 0 0 0 0 0 0 0 0 0 0   4 8.9 3 154 0 1.17305e\u002b09 0 0 0 0 0 1 0 0 0 0 0 0 0 0     Exercise 4 Recreate this graph as best you can. You\u0026rsquo;ll need to use the original data that specifies the actual rating.\nAfter you have completed this skill builder with your team (or on your own) then compare your work to our script    See the script.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/munging\/"},{value:"Project 6: Git your resume online",label:"<p>Background GitHub is an online platform where data scientists and developers can communicate and share work. As students, you will want to curate your creative work on GitHub using a program called Git. GitHub is the place to share your original work, not your homework assignments.\nMany people store their personal websites, blogs, and project websites on GitHub. Our textbook and course are hosted on GitHub, and you can see J. Hathaway\u0026rsquo;s or Ryan Hafen\u0026rsquo;s personal Data Science websites that are hosted on GitHub as well. For this project, you will be making a public resume that will be hosted on GitHub.\nDuring this project you will learn the process of Git and the tools of GitHub. We will use Git to have others in our class to edit your resume. 
Take the process seriously (pick a suitable username and write a good resume), and you will have the beginning of your social presence in the DS\/CS space.\nData Repository: Markdown Resume (mdresume) Repository\nInformation: BYUI Data Science Resumes\nReadings  New to Git and GitHub? This Essential Beginners Guide is for you Git vs. GitHub: What is the difference between them? Using Version Control in VS Code Git in Visual Studio Code video  Questions and Tasks  Join GitHub. Pick a username you would be ok sharing with a potential employer. Join the BYUI Data Science Resumes GitHub organization and use the template repository to make a resume repository under your own GitHub account. A good name might be “Lastname-Resume” Clone your repository to your computer and build a first draft of your resume. Include a link to your resume in the \u0026ldquo;About\u0026rdquo; page. In Canvas, submit the live link to your resume\u0026rsquo;s website hosted on GitHub.   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/project-6\/"},{value:"VS Code for Data Science",label:"<p> What if my interactive Python window in VS Code is not using the same version of Python as my terminal?    You can set your Python version in VS Code by opening a .py script and then clicking on the Python text in the bottom left corner as shown below.\nOnce you click, VS Code will open the command palette where you can select the installation of Python that you would like to use with this workspace.\nThis setting will not fix what version your interactive Python window is using. You can get there by opening settings by using the ⌘, shortcut.\nYou can then search your settings for jupyter and you should see a section that has Jupyter Command Line Arguments. Click on the Edit in settings.json.\nHere you can set the jupyter path to Python to match the one you picked for your Terminal. 
An example for a Mac computer is shown below.\n\t\u0026quot;python.pythonPath\u0026quot;: \u0026quot;\/usr\/local\/opt\/python\/bin\/python3\u0026quot;,    What if I am not able to read in files from the GitHub links using read_csv()?    Most likely your Python SSL certificates are not installed. Follow the answer in this post   How do I use VS Code to collaborate?    Microsoft\u0026rsquo;s Live Share extension documentation says, \u0026lsquo;Live Share enables you to quickly collaborate with a friend, classmate, or professor on the same code without the need to sync code or to configure the same development tools, settings, or environment.\u0027 You can follow their guide or use our course created video.\n     </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/vs-code\/"},{value:"Altair for Charts",label:"<p>Altair Visualization We will be using Altair in our course. It is a declarative visualization package in Python that is based on Vega-Lite which leverages the grammar of graphics.\n User Guide Data Visualization Curriculum or the Quarto version https:\/\/jjallaire.github.io\/visualization-curriculum\/) P4DS Data Visualization Chapter  Rendering Altair Charts in Quarto We use Quarto to render Altair images automagically into our HTML reports. The process should simply work.\nHowever, read the following section IF you need to export one of your images as a .png or another image format. Saving Altair Charts Just installing altair and altair_saver will not allow you to leverage the .save() method to save your chart. The javascript visualization you see in your interactive python window needs additional external applications to allow .save(\u0027chart.png\u0027) to work.\nWe will go through a few ways for us to save our Altair plots.\n  1. Saving altair plots programmatically Let\u0026rsquo;s say we want to save the above plot as a PNG file. 
Assuming we have already installed the altair library, we need to install the altair_saver.\n1.1 Installing the altair_saver Within your interactive python window execute the following command.\nimport sys !{sys.executable} -m pip install altair_saver 1.2 Additional tool for saving plots We suggest the NodeJS path. However, you are more than welcome to study Selenium for further understanding. In the GitHub repository for altair_saver, the developers explicitly tell us to install additional tools.\nNodeJS Installation\n Install the NodeJS for your platform Run the following in your Terminal (Mac) or PowerShell (Windows) to install all the packages we need from NodeJS.  npm install -g vega-lite vega-cli canvas M1 Mac Altair Solution  Install selenium using the chromedriver package from this link: https:\/\/chromedriver.chromium.org. Unzip the file and move the file to your chrome path \/usr\/local\/bin\/chromedriver  See the selenium_fix.py script for an example.\nNote: This process will run a local server on your computer that opens the chart as a PNG file in chrome and downloads the file to the folder in which that VSCode file is located on your computer.\n1.3 Saving a plot using altair_saver It might require you to restart VScode and import everything again for this to work. Please note that the plot will be saved in the same folder as the script.\nchart = alt.Chart(\u0026lt;data\u0026gt;).\u0026lt;chart_methods\u0026gt; chart.save(\u0026#39;name_of_chart.png\u0026#39;) 2. Save as PNG method The method only requires us to have the Altair library. Whenever we output a plot, we will see a button with three dots at the top right corner of the plot.\nClicking Save as PNG will bring us to a window to save our plot.\n3. 
Screenshot method If all else fails and we need to save a plot, Snip \u0026amp; Sketch (Windows) or taking a screenshot (MacOS) will be our last resort.\n</p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/altair\/"},{value:"GitHub and git",label:"<p>Complete the Hello World GitHub Guide\n</p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/git_github\/"},{value:"Markdown for DS",label:"<p>Markdown Markdown is a plain text formatting syntax aimed at making writing more accessible. The philosophy behind Markdown is that plain text documents should be readable without tags making a mess, but there should still be ways to add text modifiers like lists, bold, italics, etc. It is an alternative to WYSIWYG (what you see is what you get) editors, which use rich text that later gets converted to proper HTML.1\nVSCode Markdown Extensions We prefer the following extensions;\n Markdown Preview Enhanced - This extension previews your Markdown and provides access to converting your markdown document to a pdf document. vscode-pdf - With this extension you will be able to view pdf files in VSCode. Markdown\u002bMath - Now, you can use LaTeX math within your markdown file.  Markdown Preview Enhanced VSCode has its own Markdown previewer that displays the same icon in the top right corner of VSCode. You will need to hover over each to see which is Markdown Preview Enhanced (MPE). You will know that you are using MPE when your side view renders with a solid white background. Once you can view your rendered document, you can convert it to a pdf (after saving your file) by right-clicking on the preview. We recommend that you use Chrome (Puppeteer) \u0026gt; PDF to create a pdf document.\nReport Creating Process   Markdown Examples You can read the full syntax guide at the daringfireball.net website. 
The code chunk below highlights the standard syntax2\n*This text will be italic* _This will also be italic_ **This text will be bold** __This will also be bold__ _You **can** combine them_ You can make bulleted lists. * Item 1 * Item 2 * Item 2a * Item 2b Or numbered lists. 1. Item 1 1. Item 2 1. Item 3 1. Item 3a 1. Item 3b Place an image in the document. ![GitHub Logo](\/images\/logo.png) or a link in a document [GitHub](http:\/\/github.com) You can even blockquote Kanye West said: \u0026gt; We\u0026#39;re living the future so \u0026gt; the present is our past.  Finally, you can create tables. Check out `print(df.to_markdown())` to get tables from pandas. First Header | Second Header ------------ | ------------- Content from cell 1 | Content from cell 2 Content in the first column | Content in the second column Every once in a while, you may want strikethrough. ~~this~~ Getting tables out of Pandas You can create tables using Markdown in your reports. You can use the .to_markdown() method on your DataFrame object. You would use print(df.to_markdown(index=False)) to get tables from pandas. They would print out in your interactive window as;\nname | gender ----- | ------ J. | Male Katie | Female You would then copy the output from your interactive window and paste it into your .md report.\nClass template We have built a template to provide an example of how you will submit your project reports. The template has three sections (for additional details please see the instructional template). As you use the template, the following items may help you understand how to write your report.\n The template is a guide. Every line that does not have a hashtag (#) in front of it is guidance. Don\u0026rsquo;t feel responsible for including it. The technical details section has the grand questions as subsections. You should include any work, explanation, charts, or tables that address each grand question under its subsection. 
We have provided example descriptions before the grand question so you can see how to write in Markdown. Your appendix should have properly highlighted Python code that doesn\u0026rsquo;t run off the page (other than file paths).    https:\/\/www.ultraedit.com\/company\/blog\/community\/what-is-markdown-why-use-it.html \u0026#x21a9;\u0026#xfe0e;\n https:\/\/guides.github.com\/features\/mastering-markdown\/ \u0026#x21a9;\u0026#xfe0e;\n   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/markdown\/"},{value:"Quarto for Data Science",label:"<p>Quarto Quarto is an open-source scientific and technical publishing system built on Pandoc. You can create dynamic content with Python, R, Julia, and Observable.\nWe use this perfect union of Jupyter Notebooks and RMarkdown for reporting on our projects. It leverages Markdown and Python code chunks to create dynamic HTML content.\nMarkdown Markdown is a plain text formatting syntax aimed at making writing more accessible. The philosophy behind Markdown is that plain text documents should be readable without tags making a mess, but there should still be ways to add text modifiers like lists, bold, italics, etc. It is an alternative to WYSIWYG (what you see is what you get) editors, which use rich text that later gets converted to proper HTML.1\nQuarto Basics You will need to install the Quarto CLI and then go through the VS Code directions on using Quarto with Python.\n Install Quarto CLI Set up your VS Code Really read the VS Code setup entirely  Class template We have built a template to provide an example of how you will submit your project reports (for additional details please see the instructional template). As you use the template, the following items may help you understand how to write your report.\n The template is a guide. Don\u0026rsquo;t feel responsible for including every item beyond sections for each question. 
Your appendix should have properly highlighted Python code that doesn\u0026rsquo;t run off the page (other than file paths). You can see examples of the html output here and here  Markdown Examples You can read the complete syntax guide at the daringfireball.net website. The code chunk below highlights the standard syntax2\n*This text will be italic* _This will also be italic_ **This text will be bold** __This will also be bold__ _You **can** combine them_ You can make bulleted lists. * Item 1 * Item 2 * Item 2a * Item 2b Or numbered lists. 1. Item 1 1. Item 2 1. Item 3 1. Item 3a 1. Item 3b Place an image in the document. ![GitHub Logo](\/images\/logo.png) or a link in a document [GitHub](http:\/\/github.com) You can even blockquote Kanye West said: \u0026gt; We\u0026#39;re living the future so \u0026gt; the present is our past.  Finally, you can create tables. Check out `print(df.to_markdown())` to get tables from pandas. First Header | Second Header ------------ | ------------- Content from cell 1 | Content from cell 2 Content in the first column | Content in the second column Every once in a while, you may want strikethrough. ~~this~~ Getting tables out of Pandas You can create tables using Markdown in your reports. You can use the .to_markdown() method on your DataFrame object. You would use print(df.to_markdown(index=False)) to get tables from pandas. They would print out in your interactive window as;\nname | gender ----- | ------ J. | Male Katie | Female You would then copy the output from your interactive window and paste it into your .md report.\n  https:\/\/www.ultraedit.com\/company\/blog\/community\/what-is-markdown-why-use-it.html \u0026#x21a9;\u0026#xfe0e;\n https:\/\/guides.github.com\/features\/mastering-markdown\/ \u0026#x21a9;\u0026#xfe0e;\n   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/quarto-for-data-science\/"},{value:"Git and GitHub for DS",label:"<p>Git what? 
Git is a distributed version control tool that can manage a development project\u0026rsquo;s source code history, while GitHub is a cloud based platform built around the Git tool. Git is a tool a developer installs locally on their computer, while GitHub is an online service that stores code pushed to it from computers running the Git tool. The key difference between Git and GitHub is that Git is an open-source tool developers install locally to manage source code, while GitHub is an online service to which developers who use Git can connect and upload or download resources.1\nGit? The Git tool is popular with developers because is stays true to its purpose of versioning source code, managing commit histories and making it possible to share code between developers without deviating into peripheral fields. There is no feature bloat with Git. It does what it does, it does nothing else, and it makes no apologies for that fact.1\nGithub? We’ve established that Git is a version control system, similar but better than the many alternatives available. So, what makes GitHub so special? Git is a command-line tool, but the center around which all things involving Git revolve is the hub—GitHub.com—where developers store their projects and network with like minded people.2\nSteps related to Git and Github for our final project.   Make sure you have git on your computer.\nA. Note that Mac users have a few extra concerns.3 B. Mac fix with paths ls \/usr\/local C. Download Xcode and update 10 gig download.\nD. VSCode path selection settings Git: path    Create a GitHub account and use an appropriate username    Connect to our BYU-I organizations.\nA. BYU-I DS Resumes need teacher to admit you B. BYU-I Data Science Society need teacher to admit you    Creat your own resume repo from our template (some directions)[https:\/\/github.blog\/2019-06-06-generate-new-repositories-with-repository-templates\/]    Publish your repo on GitHub pages.\nA. Go to settings for your repo.\nB. 
Scroll down to the GitHub Pages section.\nC. Under source select the box which says None and pick master.\nD. Now select the \/docs folder and click save.    Check your published site settings and copy your site URL.    Update your repository landing page to include your pages URL.    Edit the readme.md in the base repo to not show the resume directions if your repo is public.    Fork your repo back into the BYU-I DS Resumes    Merge a pull request with any changes in your personal repository (see pull and merge on GitHub Guide).     https:\/\/www.theserverside.com\/video\/Git-vs-GitHub-What-is-the-difference-between-them#:~:text=The%20key%20difference%20between%20Git,and%20upload%20or%20download%20resources. \u0026#x21a9;\u0026#xfe0e;\n https:\/\/www.howtogeek.com\/180167\/htg-explains-what-is-github-and-what-do-geeks-use-it-for\/ \u0026#x21a9;\u0026#xfe0e;\n https:\/\/stackoverflow.com\/questions\/29971624\/visual-studio-code-cannot-detect-installed-git \u0026#x21a9;\u0026#xfe0e;\n   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/course-materials\/git_github_ds\/"},{value:"Week 1: Introduction",label:"<p>  Introduction Project Syllabus   </p><p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/introduction\/"},{value:"DS250",label:"<p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/"},{value:"Frequently Asked Questions",label:"<p> What do you mean by data science programming?    Most likely, you have had 1-2 courses of programming before you have taken CSE 250. Unlike traditional computer science courses, CSE 250 uses Python in an interactive mode instead of building programs. The data provider usually has some big questions that need answering; However, there are hundreds of little issues and responses along the way. We use programming to facilitate this investigation.\nThere are similarities with User Experience Designers. In our case, we don\u0026rsquo;t get to ask users about their experience. 
We use programming to ask data about its background, and each data set has its own history. We want our analysis to mold to that experience. You can think of data science programming like a first date with your data. You can\u0026rsquo;t write one long program nieve of the issues and nuances each living data set provides.\n   How does CSE 250 compare to CSE 350 or Math 335?    The two courses have similarities. You could think of CSE 250 as an introduction to data wrangling and visualization. Both classes use real-world data and are built around data science projects. There are some critical differences between the two courses.\n In this course, we use Python, and CSE 350 uses R. We are introducing the principles of data science programming in CSE 250. The course is only 2-credits. CSE 250 is intended to introduce visualization, wrangling, and modeling.     How does CSE 250 prepare me for CSE 350, Math 335 and CSE 450?    You will be comfortable with interactive programming and have an introduction to the principles of data formats for data science applications. You will be introduced to principles related to machine learning, data wrangling, and data visualization.   What programming languages do we use in this course?    The course is done using Python. We focus on the pandas and Altair packages.   What are the prerequisites for this course?    Using the new courses at BYU-I, the prerequisite is CSE 110. However, if you have experience programming from other classes, you most likely are prepared for this course.   Why Python instead of R?    The computer science and software engineering programs at BYU-I use Python as their foundational courses. The standard student will have some experience with Python before CSE 250. Python is an essential programming language for data scientists, and we already have CSE 350\/Math 335, which is taught in R.   What is pandas?    pandas is the foundational data science package in Python. 
If you are using tabular data you will be in pandas.   Why are we using Altair instead of Seaborn or Matplotlib?    Matplotlib was the first visualization package to gain a following in Python. Seaborn is built on top of Matplotlib. Many data scientists use both in their work—neither leverage the grammar of graphics as developed by Leland Wilkinson. Altair is built on Vega-Lite, which uses the Vega visualization grammar. It is declarative and actively developed. We expect that it will become the predominant visualization package in Python (https:\/\/youtu.be\/FytuB8nFHPQ and https:\/\/youtu.be\/vTingdk_pVM).   </p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/faq\/"},{value:"Projects",label:"<p>We will be relying on a few resources for this course. You will find the pertinant readings attached to each of the projects. Those readings will be culled from;\n Python for Data Science: A port of R for Data Science using the Python packages pandas and Altair. pandas User Guide Altair User Guide scikit-learn learn User Guide scikit-learn tutorials Python Data Science Handbook A Whirlwind Tour of Python SQL  Wes McKinney\u0026rsquo;s pandas code for his book Python for Data Analysis is a useful reference as well: https:\/\/github.com\/wesm\/pydata-book\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/projects\/"},{value:"Skill Builders",label:"<p>These short activites are provided for you to gain some additional skills to help with the class projects.\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/skill_builders\/"},{value:"Slack",label:"<p>If you haven\u0026rsquo;t already, please join Slack. 
This will be a lifesaver.\nhttps:\/\/join.slack.com\/t\/byuidss\/signup\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slack\/"},{value:"Slides",label:"<p>Use the navigation pane on the left to review the class slides.\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/slides\/"},{value:"",label:"<p>Details Your coding challenge will help you demonstrate the skills you have developed this semester. Here are a few essential items.\n Your goal is to demonstrate your data science coding abilities. Get through as many items with a rough implementation as possible. Get your code to match our outputs as close as possible, but don\u0026rsquo;t stress over minute details. Keep most of the code you type. If you end up not using specific parts, comment them out and include them at the bottom. Use the entire hour and may not finish. Submit a .md and a .pdf report with your output and code for each challenge.  Please use the challenge template to submit your work.\nimport pandas as pd import altair as alt import numpy as np from sklearn.model_selection import train_test_split from sklearn import tree from sklearn.ensemble import GradientBoostingClassifier from sklearn import metrics Challenge 1 Split Entry houses are a failed building experiment in the United States. Use the data from our Denver homes project, as shown below, to recreate the following graphic.\nurl = \u0026#39;https:\/\/github.com\/byuidatascience\/data4dwellings\/raw\/master\/data-raw\/dwellings_denver\/dwellings_denver.csv\u0026#39; dat_home = pd.read_csv(url).sample(n=4500, random_state=15) Challenge 2 Our computations can\u0026rsquo;t be done with missing values. Programmatically replace all the lost values with 125 and make a box-plot.\nmister = pd.Series([\u0026#34;lost\u0026#34;, 15, 22, 45, 31, \u0026#34;lost\u0026#34;, 85, 38, 129, 80, 21, 2]) Challenge 3 Our computations can\u0026rsquo;t be done with missing values. 
Programmatically replace all the lost values with 125 and report the mean rounded to two decimals.\nmister = pd.Series([\u0026#34;lost\u0026#34;, 15, 22, 45, 31, \u0026#34;lost\u0026#34;, 85, 38, 129, 80, 21, 2]) Challenge 4 Programmatically read in the following JSON file, keep only the cases column and return a markdown table that has country in the rows and cases for 1999 and 2000 in the columns. Your table will have six cells with values.\nurl = \u0026#39;https:\/\/github.com\/byuidatascience\/data4python4ds\/raw\/master\/data-raw\/table1\/table1.json\u0026#39; Challenge 5 Use our cleaned example of the star wars data from project 6 to predict the gender of the respondent to the survey. Report your precision and a feature importance plot.\n Use test_size = .20 and random_state = 2020 in train_test_split() Use the GradientBoostingClassifier() method.  url = \u0026#34;http:\/\/byuistats.github.io\/CSE250-Course\/data\/clean_starwars.csv\u0026#34; dat = pd.read_csv(url) </p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/final_coding_challenge\/sp22\/"},{value:"Categories",label:"<p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/categories\/"},{value:"Final_coding_challenges",label:"<p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/final_coding_challenge\/"},{value:"Office Hours",label:"<p>Schedule a visit with Brother Cannon at an available time. 
https:\/\/calendly.com\/cannonp\n</p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/contact\/"},{value:"Tags",label:"<p></p>",url:"https:\/\/byuistats.github.io\/DS250-Cannon\/tags\/"},];$("#search").autocomplete({source:projects}).data("ui-autocomplete")._renderItem=function(ul,item){return $("<li>").append("<a href="+item.url+" + \" &quot;\" +  >"+item.value+"</a>"+item.label).appendTo(ul);};});</script></div></div></div></div></header><section class=section><div class=container><div class="row justify-content-center"><div class="col-12 text-center"><h2 class=section-title></h2></div><div class="col-lg-4 col-sm-6 mb-4"><a href=https://byuistats.github.io/DS250-Cannon/course-materials/ class="px-4 py-5 bg-white shadow text-center d-block match-height"><i class="ti-blackboard icon text-primary d-block mb-4"></i><h3 class="mb-3 mt-0">Course Materials</h3><p class=mb-0>Additional Readings and Guidance</p></a></div><div class="col-lg-4 col-sm-6 mb-4"><a href=https://byuistats.github.io/DS250-Cannon/projects/ class="px-4 py-5 bg-white shadow text-center d-block match-height"><i class="ti-bar-chart icon text-primary d-block mb-4"></i><h3 class="mb-3 mt-0">Projects</h3><p class=mb-0>Project details (the work you will do)</p></a></div><div class="col-lg-4 col-sm-6 mb-4"><a href=https://byuistats.github.io/DS250-Cannon/skill_builders/ class="px-4 py-5 bg-white shadow text-center d-block match-height"><i class="ti-ruler-pencil icon text-primary d-block mb-4"></i><h3 class="mb-3 mt-0">Skill Builders</h3><p class=mb-0>Build skills for the projects.</p></a></div><div class="col-lg-4 col-sm-6 mb-4"><a href=https://byuistats.github.io/DS250-Cannon/slack/ class="px-4 py-5 bg-white shadow text-center d-block match-height"><i class="https://img.shields.io/badge/slack-@oresoftware/npp-yellow.svg?logo=slack icon text-primary d-block mb-4"></i><h3 class="mb-3 mt-0">Slack</h3><p class=mb-0>Link to Slack signup</p></a></div><div class="col-lg-4 col-sm-6 mb-4"><a 
href=https://byuistats.github.io/DS250-Cannon/slides/ class="px-4 py-5 bg-white shadow text-center d-block match-height"><i class="ti-layout-slider-alt icon text-primary d-block mb-4"></i><h3 class="mb-3 mt-0">Slides</h3><p class=mb-0>Class material for every day.</p></a></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/sitemap.xml b/sitemap.xml
index 49a56f1..8e8947b 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/</loc><lastmod>2020-09-17T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p6/d4/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p5/d4/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/syllabus/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/introduction/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/introduction/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/git_github_ds/pull_merge/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p6/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p6/d3/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p5/d3/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/pandas_altair/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/project-1/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/python-for-data-science/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byui
stats.github.io/DS250-Cannon/slides/p5/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p6/d2/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p5/d2/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/json_missing/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/project-2/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p5/d1/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/machine-learning/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/project-3/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/relational_data/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/ml_sklearn/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/project-4/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/project-5/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/sql-for-data-science/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/munging/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/project-6/</loc><lastmod>2020-09-15
T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/vs-code/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/altair/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/git_github/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/markdown/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/quarto-for-data-science/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/git_github_ds/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/introduction/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/faq/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slack/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/</loc><lastmod>2020-10-06T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/final_coding_challenge/sp22/</loc></url><url><loc>https://byuistats.github.io/DS250-Cannon/categories/</loc></url><url><loc>https://byuistats.github.io/DS250-Cannon/final_coding_challenge/</loc></url><url><loc>https://byuistats.github.io/DS250-Cannon/contact/</loc></url><url><loc>https://byui
stats.github.io/DS250-Cannon/tags/</loc></url></urlset>
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/</loc><lastmod>2020-09-17T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p6/d4/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/syllabus/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/introduction/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/introduction/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/git_github_ds/pull_merge/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p6/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p6/d3/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/pandas_altair/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/project-1/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/python-for-data-science/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/p6/d2/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/json_missing/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><l
oc>https://byuistats.github.io/DS250-Cannon/projects/project-2/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/machine-learning/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/project-3/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/relational_data/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/ml_sklearn/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/project-4/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/project-5/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/sql-for-data-science/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/munging/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/project-6/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/vs-code/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/altair/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/git_github/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/markdown/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io
/DS250-Cannon/course-materials/quarto-for-data-science/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/course-materials/git_github_ds/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/introduction/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/</loc><lastmod>2020-10-12T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/faq/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/projects/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/skill_builders/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slack/</loc><lastmod>2020-09-15T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/slides/</loc><lastmod>2020-10-06T10:42:26+06:00</lastmod></url><url><loc>https://byuistats.github.io/DS250-Cannon/final_coding_challenge/sp22/</loc></url><url><loc>https://byuistats.github.io/DS250-Cannon/categories/</loc></url><url><loc>https://byuistats.github.io/DS250-Cannon/final_coding_challenge/</loc></url><url><loc>https://byuistats.github.io/DS250-Cannon/contact/</loc></url><url><loc>https://byuistats.github.io/DS250-Cannon/tags/</loc></url></urlset>
\ No newline at end of file
diff --git a/slides/index.html b/slides/index.html
index b84fcae..2d0868e 100644
--- a/slides/index.html
+++ b/slides/index.html
@@ -3,5 +3,5 @@
 <span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
 <a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
 <a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class="sidelist
-active"><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 
1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Slides</h2><div class=content><p>Use the navigation pane on the left to review the class slides.</p></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 06 Oct 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slack/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Slack</span></a>
+active"><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Slides</h2><div class=content><p>Use the navigation pane on the left to review the class slides.</p></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 06 Oct 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" 
href=https://byuistats.github.io/DS250-Cannon/slack/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Slack</span></a>
 <a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/p6/><span class="d-none d-md-block">Week 12-13: Project 6 - Github</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/introduction/day01/index.html b/slides/introduction/day01/index.html
index e816b5c..59caca7 100644
--- a/slides/introduction/day01/index.html
+++ b/slides/introduction/day01/index.html
@@ -2,7 +2,7 @@
 <button class="navbar-toggler border-0" type=button data-toggle=collapse data-target=#navigation aria-controls=navigation aria-expanded=false aria-label="Toggle navigation">
 <span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
 <a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
-<a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a 
href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class="sidelist
+<a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class="sidelist
 active"><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 1: Welcome</h2><div class=content><h2 id=welcome-to-ds-250>Welcome to DS 250!</h2><ul><li>Teacher: Paul Cannon</li><li>TA: David Pineda</li></ul><br><h2 id=announcements>Announcements</h2><ol><li>Devotional</li><li><a href=https://byuidatascience.github.io/lab/>Computing Lab</a> 4:30PM - 6:30PM all weekdays except Wednesday. Saturday from 10AM-12PM<ul><li>Slack channel #tutoring_lab</li></ul></li><li>Data Science Society - Wednesdays at 6PM</li></ol><h2 id=what-is-a-data-scientist>What is a Data Scientist?</h2><p><img src=KLR2.png alt>
 <img src=ds_4venn.png alt></p><h2 id=a-data-scientist-has-a-c-talent-stack>A Data Scientist has a C+ Talent Stack</h2><h2 id=class-structure>Class Structure</h2><ol><li>Problem Solving</li><li>Improved coding skills</li><li>Effective written/visual communication</li><li>Collaboration</li><li>Timeliness and communication with &ldquo;the boss&rdquo;</li></ol><p><a href=https://byuistats.github.io/DS250-Cannon/course-materials/syllabus/>Syllabus</a></p><h2 id=got-slack>Got Slack?</h2><h4 id=are-we-all-on-the-slack-channel>Are we all on the Slack channel?</h4><p>Follow the Slack invitation that is waiting in your student email. If you don&rsquo;t see an invite, you can join through <a href=https://join.slack.com/t/byuidss/signup>this link</a> and then ask &ldquo;@Paul Cannon&rdquo; to add you to the class channel.</p><br><h2 id=who-are-you>Who are you?</h2><ol><li>Introduce yourself and learn the names/majors/origin story of your group members.</li><li>Make a plan to get help this semester. How will you contact each other? Some ideas: Slack, I-Learn, emails, group texts, etc.</li><li>If you were independently wealthy, what would you be doing right now? 
Would you change majors?</li><li>Highlights of 2022</li></ol><h2 id=problem-solving>Problem Solving</h2><p>This is not a &ldquo;see and repeat&rdquo; programming class!</p><h3 id=how-would-you-go-about-fixing-my-motorcycle>How would you go about fixing my motorcycle?</h3><p><img src=PXL_20221015_211101230.jpg alt></p><h4 id=learn-how-to-ask-for-help-1-hr-rule>Learn how to ask for help (1 hr rule)</h4><br><hr><br><p><img src=googleit.jpg alt></p><h2 id=getting-started-on-project-0>Getting started on Project 0</h2><h4 id=setting-up-your-programming-snvironment>Setting up your Programming Environment</h4><ol><li>Download <a href=https://code.visualstudio.com/>Visual Studio Code</a></li><li>Download <a href=https://www.python.org/downloads/>Python</a> v <a href=https://www.python.org/downloads/release/python-3108/>(3.10.8)</a><ul><li>Be sure to select the <em>&ldquo;Add to Path&rdquo;</em> option during the install process</li><li><img src=image.png alt></li></ul></li><li>Install the Python packages and VS Code extensions you need (see <a href=https://byuistats.github.io/DS250-Cannon/course-materials/python-for-data-science/>this page</a>)<ul><li>pip install pandas</li><li>pip install numpy</li><li>pip install jupyter</li><li>pip install tabulate</li><li>pip install altair</li></ul></li><li>Install Quarto CLI <a href=https://byuistats.github.io/DS250-Cannon/course-materials/quarto-for-data-science/>Quarto Instructions</a></li><li>Start looking at Project 0</li><li>Complete the &ldquo;Methods Checkpoint&rdquo;</li></ol><h4 id=installing-packages-and-extensions>Installing Packages and Extensions</h4><p>Learn how to install packages by reading the assigned material and by watching the video tutorial on <a href=https://byuistats.github.io/DS250-Cannon/course-materials/python-for-data-science/>this page</a>.</p><p>The readings mention a lot of different packages. 
For Project 0, you need to install at least <code>pandas</code>, <code>altair</code>, <code>numpy</code>, <code>tabulate</code>, and <code>jupyter</code>.</p><p>The readings will also mention two VS Code extensions you need to install.</p><h4 id=a-note-on-jupyter-notebooks-vs-interactive-python-window>A note on Jupyter Notebooks vs. Interactive Python Window</h4><p>The textbook will show you how to use VS Code&rsquo;s Python Interactive window and Quarto. <strong>Feel free to use Jupyter Notebooks.</strong><br>We will do write-ups in Quarto, though, which can be rendered as a PDF or HTML.</p><br></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 15 Sep 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Day 2: Project 0</span></a>
 <a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/final_coding_challenge/><span class="d-none d-md-block">Final_coding_challenges</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/introduction/day02/index.html b/slides/introduction/day02/index.html
index c0ce4fd..468ec85 100644
--- a/slides/introduction/day02/index.html
+++ b/slides/introduction/day02/index.html
@@ -2,6 +2,6 @@
 <button class="navbar-toggler border-0" type=button data-toggle=collapse data-target=#navigation aria-controls=navigation aria-expanded=false aria-label="Toggle navigation">
 <span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
 <a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
-<a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a 
href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class="sidelist
+<a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class="sidelist
 active"><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 2: Project 0</h2><div class=content><h4 id=syllabus-questions>Syllabus Questions?</h4><ul><li>A note about readings&mldr;</li><li>Tips for asking for help<ul><li>Slack</li><li>Google - acquired discernment</li></ul></li><li>Quarto and tradeoffs</li><li>Project Submissions: HTML</li></ul><h4 id=are-we-all-on-the-slack-channel>Are we all on the Slack channel?</h4><p>Follow the Slack invitation that is waiting in your student email. If you don&rsquo;t see an invite, you can join through <a href=https://join.slack.com/t/byuidss/signup>this link</a> and then ask Brother Cannon to add you to the class channel.</p><br><h2 id=methods-checkpoint>Methods Checkpoint</h2><p>All the answers will be in the assigned reading or in these slides.</p><br><h2 id=notes-on-project-0>Notes on Project 0</h2><h4 id=installing-packages-and-extensions>Installing Packages and Extensions</h4><p>Learn how to install packages by reading the assigned material and by watching the video tutorial on <a href=https://byuistats.github.io/DS250-Cannon/course-materials/python-for-data-science/>this page</a>.</p><p>The readings mention a lot of different packages. For Project 0, you need to install at least <code>pandas</code>, <code>altair</code>, <code>numpy</code>, <code>tabulate</code>, and <code>jupyter</code>.</p><p>The readings will also mention two VS Code extensions you need to install.</p><h4 id=jupyter-notebooks-vs-interactive-python-window>Jupyter Notebooks vs. 
Interactive Python Window</h4><p>Should you decide to use Jupyter Notebooks this semester within VS Code, <a href=https://code.visualstudio.com/docs/datascience/jupyter-notebooks>this is a great guide</a> to get you started.</p><p>Or you can choose to stick with the <a href=https://code.visualstudio.com/docs/python/jupyter-support-py>Python Interactive window</a> like the textbook does.</p><h4 id=use-your-resources>Use Your Resources!</h4><ul><li>Technical documentation</li><li>Google searches</li><li>Asking for help on Slack</li><li>Don&rsquo;t forget the <a href=https://byuidatascience.github.io/lab.html>data science lab</a>! (Starts next week.)</li><li>A question that cannot be answered by the textbook and documentation? Google it.</li><li>A function you have never seen before? Google it.</li><li>An error in your code? Google it.</li></ul><h4 id=markdown>Markdown</h4><h5 id=what-is-markdown>What is Markdown?</h5><ul><li>A clean, human-readable way to make slick HTML and PDF documents</li><li>Used widely among programmers for clean documentation</li><li>Used widely by Data Scientists to publish results and communicate with stakeholders</li></ul><p><a href=https://byuistats.github.io/DS250-Cannon/course-materials/markdown/>Here&rsquo;s a good summary</a></p><h4 id=quarto>Quarto</h4><p>Do your tinkering in interactive Python or Jupyter notebooks. Generate your report with finished code, graphs, etc. 
in Quarto.</p><p><a href=https://quarto.org/>Quarto</a></p><h2 id=now-for-some-data>Now for some data!</h2><h4 id=lets-get-this-party-started>Let&rsquo;s get this party started</h4><h4 id=your-turn>Your turn:</h4><ol><li>Read in the cars data set</li><li>Work with your team to talk through interesting possibilities for a graph</li><li>Work on Project 0 Questions and Tasks</li></ol><br></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 17 Sep 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/introduction/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Week 1: Introduction</span></a>
 <a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/><span class="d-none d-md-block">Day 1: Welcome</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/introduction/index.html b/slides/introduction/index.html
index c4a6e90..d259dcb 100644
--- a/slides/introduction/index.html
+++ b/slides/introduction/index.html
@@ -2,6 +2,6 @@
 <button class="navbar-toggler border-0" type=button data-toggle=collapse data-target=#navigation aria-controls=navigation aria-expanded=false aria-label="Toggle navigation">
 <span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
 <a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
-<a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a 
href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class="sidelist
-active"><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Week 1: Introduction</h2><div class=content><blockquote><ul><li><a href=../../projects/project-0>Introduction Project</a></li><li><a href=../../course-materials/syllabus>Syllabus</a></li></ul></blockquote></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 15 Sep 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Day 1: The war with Star Wars</span></a>
+<a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class="sidelist
+active"><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Week 1: Introduction</h2><div class=content><blockquote><ul><li><a href=../../projects/project-0>Introduction Project</a></li><li><a href=../../course-materials/syllabus>Syllabus</a></li></ul></blockquote></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 15 Sep 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Day 1: Git and Github</span></a>
 <a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/><span class="d-none d-md-block">Day 2: Project 0</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/p5/d1/clean_workflow.png b/slides/p5/d1/clean_workflow.png
deleted file mode 100644
index ed2364c..0000000
Binary files a/slides/p5/d1/clean_workflow.png and /dev/null differ
diff --git a/slides/p5/d1/index.html b/slides/p5/d1/index.html
deleted file mode 100644
index 2ae5f57..0000000
--- a/slides/p5/d1/index.html
+++ /dev/null
@@ -1,16 +0,0 @@
-<!doctype html><html lang=en-us><head><meta charset=utf-8><title>Day 1: The war with Star Wars</title><meta name=generator content="Hugo 0.74.3"><meta name=viewport content="width=device-width,initial-scale=1,maximum-scale=1"><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/bootstrap/bootstrap.min.css><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/themify-icons/themify-icons.css><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/highlight/hybrid.css><link rel=icon href=https://byuistats.github.io/DS250-Cannon/images/favicon.png type=image/x-icon><link href="https://fonts.googleapis.com/css?family=Roboto:300,400,700&display=swap" rel=stylesheet><style>:root{--primary-color:#02007e;--body-color:#f9f9f9;--text-color:#636363;--text-color-dark:#242738;--white-color:#ffffff;--light-color:#f8f9fa;--font-family:Roboto}</style><link href=https://byuistats.github.io/DS250-Cannon/css/style.min.css rel=stylesheet media=screen><script src=https://byuistats.github.io/DS250-Cannon/plugins/jquery/jquery-1.12.4.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/jquery/jquery-ui.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/bootstrap/bootstrap.min.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/match-height/jquery.matchHeight-min.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/highlight/highlight.pack.js></script><script>hljs.initHighlightingOnLoad();</script><script type=application/javascript>var doNotTrack=false;if(!doNotTrack){(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');ga('create','UA-132356198-4','auto');ga('send','pageview');}</script></head><body><header class="shadow-bottom sticky-top bg-white"><nav class="navbar navbar-expand-md navbar-light"><div class=container><a class="navbar-brand px-2" href=/DS250-Cannon>DS250</a>
-<button class="navbar-toggler border-0" type=button data-toggle=collapse data-target=#navigation aria-controls=navigation aria-expanded=false aria-label="Toggle navigation">
-<span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
-<a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
-<a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a 
href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class="sidelist
-active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 1: The war with Star Wars</h2><div class=content><h2 id=welcome-to-class>Welcome to class!</h2><h4 id=spiritual-thought>Spiritual Thought</h4><h4 id=announcements>Announcements</h4><ol><li>Project 4 thoughts<ul><li>Feature Importances - Sorted Bar Graph, not unsorted tables</li><li>Suppress warnings</li><li>And the winner is&mldr;</li></ul></li></ol><br><h2 id=the-star-wars-data>The Star Wars data</h2><h2 id=load-the-star-wars-data>Load the Star Wars data</h2><div class=highlight><pre style=color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python><span style=color:#75715e># %%</span>
-<span style=color:#f92672>import</span> pandas <span style=color:#f92672>as</span> pd 
-<span style=color:#f92672>import</span> altair <span style=color:#f92672>as</span> alt
-<span style=color:#f92672>import</span> numpy <span style=color:#f92672>as</span> np
-
-url <span style=color:#f92672>=</span> <span style=color:#e6db74>&#39;https://github.com/fivethirtyeight/data/raw/master/star-wars-survey/StarWars.csv&#39;</span>
-
-dat <span style=color:#f92672>=</span> pd<span style=color:#f92672>.</span>read_csv(url)
-
-</code></pre></div><br><h4 id=heading>???</h4><h3 id=what-do-the-data-look-like>What do the data look like?</h3><p><strong>Take the time to understand how the current data are organized.</strong></p><h4 id=first-things-first>First things first&mldr;</h4><p>Each group should answer these questions:</p><ol><li>Where are the column names?</li><li>What does each row represent?</li><li>What does each column represent?</li></ol><br><h3 id=what-do-we-want-the-data-to-look-like>What do we <em>want</em> the data to look like?</h3><p>Each group should answer these questions:</p><ol><li>What is the goal of this project, and how does that affect what we want from the data?</li><li>What do we want each row to represent?</li><li>What do we want each column to look like? Pick a few columns from the dataset and try creating an example in Excel.</li></ol><br><h2 id=cleaning-data-takes-time>Cleaning data takes time</h2><p><strong>Maybe not 80% of your time, but it does take time!</strong></p><blockquote><p>Data science is frequently about doing bespoke analysis which means creating and labelling unique datasets. No matter how cleanly formatted or standardized a dataset is, it likely needs some work.</p><p>I would argue that spending time working with data to transform, explore and understand it better is absolutely what data scientists should be doing. This is the medium they are working in. Understand the material better and you&rsquo;ll get better insights. 
<a href=https://blog.ldodds.com/2020/01/31/do-data-scientists-spend-80-of-their-time-cleaning-data-turns-out-no/>ref</a></p></blockquote><br><h2 id=structure-your-project-structure-your-thinking>Structure your project, structure your thinking</h2><h3 id=tableau-on-tidying-data>Tableau on tidying data</h3><ol><li><a href=https://www.tableau.com/learn/whitepapers/data-prep-best-practices#think>Think about your data holistically</a></li><li><a href=https://www.tableau.com/learn/whitepapers/data-prep-best-practices#know>Know the basic structure of your data</a></li><li><a href=https://www.tableau.com/learn/whitepapers/data-prep-best-practices#track>Keep track of your steps</a></li><li><a href=https://www.tableau.com/learn/whitepapers/data-prep-best-practices#spot>Spot check throughout</a></li></ol><br><h3 id=compartmentalize-and-organize-your-scripts-and-data>Compartmentalize and organize your scripts and data</h3><ul><li><a href=https://www.thinkingondata.com/how-to-organize-data-science-projects/>Best practices for organizing data science projects</a></li><li><a href=https://gist.github.com/ericmjl/27e50331f24db3e8f957d1fe7bbbe510#directory-structure>How to organize your Python data science project</a></li><li><a href=https://drivendata.github.io/cookiecutter-data-science/#directory-structure>Cookiecutter Data Science</a></li><li><a href=https://dzone.com/articles/data-science-project-folder-structure>Data Science Project Folder Structure</a></li></ul><br><h3 id=what-are-codecs-and-encodings>What are codecs and encodings?</h3><ul><li><a href=https://en.wikipedia.org/wiki/UTF-8>UTF-8</a></li><li><a href=http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html#unicode-basics>Python Unicode Basics</a></li><li><a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html>pd.read_csv()</a></li><li><a href=https://en.wikipedia.org/wiki/ISO/IEC_8859-1>ISO-8859-1</a></li></ul><br><h3 id=the-str-functions-in-pandas>The 
<code>.str</code> functions in pandas</h3><ul><li><code>.strip</code>: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.strip.html>Strip leading and trailing white space</a></li><li><code>.replace</code>: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html>Replace one string of characters with another</a></li><li><code>.split</code>: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html>Split a character string into pieces around a delimiter</a></li><li><code>.join</code>: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.join.html#pandas.Series.str.join>Join the elements of each list into a single string, using a delimiter</a></li><li><a href=https://byuidatascience.github.io/python4ds/strings.html>Python for Data Science: Strings</a></li><li><a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.strip.html>Pandas Documentation</a></li></ul></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 12 Oct 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Day 2: Star Wars and strings</span></a>
-<a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/introduction/><span class="d-none d-md-block">Week 1: Introduction</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/p5/d1/index.xml b/slides/p5/d1/index.xml
deleted file mode 100644
index ea67e10..0000000
--- a/slides/p5/d1/index.xml
+++ /dev/null
@@ -1 +0,0 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Day 1: The war with Star Wars on DS250</title><link>https://byuistats.github.io/DS250-Cannon/slides/p5/d1/</link><description>Recent content in Day 1: The war with Star Wars on DS250</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>J. Hathaway and BYU-I ©</copyright><lastBuildDate>Fri, 01 May 2020 11:02:05 +0600</lastBuildDate><atom:link href="https://byuistats.github.io/DS250-Cannon/slides/p5/d1/index.xml" rel="self" type="application/rss+xml"/></channel></rss>
\ No newline at end of file
diff --git a/slides/p5/d2/clean_workflow.png b/slides/p5/d2/clean_workflow.png
deleted file mode 100644
index ed2364c..0000000
Binary files a/slides/p5/d2/clean_workflow.png and /dev/null differ
diff --git a/slides/p5/d2/index.html b/slides/p5/d2/index.html
deleted file mode 100644
index 6cb64fd..0000000
--- a/slides/p5/d2/index.html
+++ /dev/null
@@ -1,48 +0,0 @@
-<!doctype html><html lang=en-us><head><meta charset=utf-8><title>Day 2: Star Wars and strings</title><meta name=generator content="Hugo 0.74.3"><meta name=viewport content="width=device-width,initial-scale=1,maximum-scale=1"><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/bootstrap/bootstrap.min.css><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/themify-icons/themify-icons.css><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/highlight/hybrid.css><link rel=icon href=https://byuistats.github.io/DS250-Cannon/images/favicon.png type=image/x-icon><link href="https://fonts.googleapis.com/css?family=Roboto:300,400,700&display=swap" rel=stylesheet><style>:root{--primary-color:#02007e;--body-color:#f9f9f9;--text-color:#636363;--text-color-dark:#242738;--white-color:#ffffff;--light-color:#f8f9fa;--font-family:Roboto}</style><link href=https://byuistats.github.io/DS250-Cannon/css/style.min.css rel=stylesheet media=screen><script src=https://byuistats.github.io/DS250-Cannon/plugins/jquery/jquery-1.12.4.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/jquery/jquery-ui.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/bootstrap/bootstrap.min.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/match-height/jquery.matchHeight-min.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/highlight/highlight.pack.js></script><script>hljs.initHighlightingOnLoad();</script><script type=application/javascript>var doNotTrack=false;if(!doNotTrack){(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');ga('create','UA-132356198-4','auto');ga('send','pageview');}</script></head><body><header class="shadow-bottom sticky-top bg-white"><nav class="navbar navbar-expand-md navbar-light"><div class=container><a class="navbar-brand px-2" href=/DS250-Cannon>DS250</a>
-<button class="navbar-toggler border-0" type=button data-toggle=collapse data-target=#navigation aria-controls=navigation aria-expanded=false aria-label="Toggle navigation">
-<span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
-<a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
-<a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a 
href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class="sidelist
-active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 2: Star Wars and strings</h2><div class=content><h2 id=welcome-to-class>Welcome to class!</h2><h4 id=announcements>Announcements</h4><h4 id=whats-something-youre-grateful-for-today>What&rsquo;s something you&rsquo;re grateful for today?</h4><br><h2 id=the-str-functions-in-pandas>The <code>.str</code> functions in pandas</h2><blockquote><ul><li><code>.str.strip</code>: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.strip.html>Strip white space</a></li><li><code>.str.replace</code>: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html>replace one string of characters with another.</a></li><li><code>.str.split</code>: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html>Separate a character string into two 
values.</a></li><li><code>.str.join</code>: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.join.html#pandas.Series.str.join>Join the elements of each list into a single string, using a delimiter</a></li><li><a href=https://byuidatascience.github.io/python4ds/strings.html>Python for Data Science: Strings</a></li><li><a href=https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html#method-summary>Pandas Documentation</a></li></ul></blockquote><br><h4 id=strstrip><code>.str.strip()</code></h4><div class=highlight><pre style=color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python>s <span style=color:#f92672>=</span> pd<span style=color:#f92672>.</span>Series([<span style=color:#e6db74>&#39;1. Ant.  &#39;</span>, <span style=color:#e6db74>&#39;2. Bee!</span><span style=color:#ae81ff>\n</span><span style=color:#e6db74>&#39;</span>, <span style=color:#e6db74>&#39;3. Cat?</span><span style=color:#ae81ff>\t</span><span style=color:#e6db74>&#39;</span>, <span style=color:#e6db74>&#39;4. Beat?</span><span style=color:#ae81ff>\t</span><span style=color:#e6db74>&#39;</span>, np<span style=color:#f92672>.</span>nan])
-
-s<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>strip()
-
-s<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>strip(<span style=color:#e6db74>&#39;123.!? </span><span style=color:#ae81ff>\n\t</span><span style=color:#e6db74>&#39;</span>)
-
-s<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>strip(<span style=color:#e6db74>&#39;1234.!? </span><span style=color:#ae81ff>\n\t</span><span style=color:#e6db74>&#39;</span>)
-
-</code></pre></div><br><h4 id=strreplace><code>.str.replace()</code></h4><div class=highlight><pre style=color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python>s<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>replace(<span style=color:#e6db74>&#39;Ant.&#39;</span>, <span style=color:#e6db74>&#39;Man&#39;</span>)
-s<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>replace(<span style=color:#e6db74>&#39;a&#39;</span>, <span style=color:#ae81ff>8</span>)  <span style=color:#75715e># raises TypeError: repl must be a string or callable</span>
-s<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>replace(<span style=color:#e6db74>&#39;a&#39;</span>, <span style=color:#e6db74>&#39;8&#39;</span>)
-s<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>replace(<span style=color:#e6db74>&#39;a&#39;</span>, <span style=color:#e6db74>&#39;8&#39;</span>, case <span style=color:#f92672>=</span> False)
-s<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>replace(<span style=color:#e6db74>&#39;a|e&#39;</span>, <span style=color:#e6db74>&#39;8&#39;</span>, case <span style=color:#f92672>=</span> False, regex <span style=color:#f92672>=</span> True)
-
-s<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>replace(<span style=color:#e6db74>r&#39;\d&#39;</span>, <span style=color:#e6db74>&#39;&#39;</span>, regex <span style=color:#f92672>=</span> True)
-
-</code></pre></div><br><h4 id=strsplit><code>.str.split()</code></h4><div class=highlight><pre style=color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python>s2 <span style=color:#f92672>=</span> pd<span style=color:#f92672>.</span>Series([<span style=color:#e6db74>&#39;1-20&#39;</span>, <span style=color:#e6db74>&#39;21-50&#39;</span>, <span style=color:#e6db74>&#39;51-80&#39;</span>, <span style=color:#e6db74>&#39;81-100&#39;</span>, np<span style=color:#f92672>.</span>nan])
-s3 <span style=color:#f92672>=</span> pd<span style=color:#f92672>.</span>Series(
-    [
-        <span style=color:#e6db74>&#34;this is a regular sentence&#34;</span>,
-        <span style=color:#e6db74>&#34;https://docs.python.org/3/tutorial/index.html&#34;</span>,
-        np<span style=color:#f92672>.</span>nan
-    ]
-)
-
-s2<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>split()
-s3<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>split()
-s2<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>split(pat<span style=color:#f92672>=</span><span style=color:#e6db74>&#34;-&#34;</span>)
-</code></pre></div><br><h4 id=strjoin-or-strcat><code>.str.join()</code> or <code>.str.cat()</code></h4><div class=highlight><pre style=color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python>two_columns <span style=color:#f92672>=</span> s2<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>split(<span style=color:#e6db74>&#34;-&#34;</span>, expand <span style=color:#f92672>=</span> True)<span style=color:#f92672>.</span>rename(
-   columns <span style=color:#f92672>=</span> {<span style=color:#ae81ff>0</span>: <span style=color:#e6db74>&#39;minimum&#39;</span>, <span style=color:#ae81ff>1</span>: <span style=color:#e6db74>&#39;maximum&#39;</span>})
-
-two_columns<span style=color:#f92672>.</span>fillna(<span style=color:#e6db74>&#34;&#34;</span>)<span style=color:#f92672>.</span>agg(<span style=color:#e6db74>&#34;__&#34;</span><span style=color:#f92672>.</span>join, axis <span style=color:#f92672>=</span> <span style=color:#ae81ff>1</span>)
-
-two_columns<span style=color:#f92672>.</span>minimum<span style=color:#f92672>.</span>str<span style=color:#f92672>.</span>cat(two_columns<span style=color:#f92672>.</span>maximum, sep <span style=color:#f92672>=</span> <span style=color:#e6db74>&#34;__&#34;</span>)
-
-</code></pre></div><br><h2 id=fixing-the-column-names>Fixing the column names</h2><p>Here is some code to get you started:</p><pre><code class=language-{python} data-lang={python}>url = 'https://github.com/fivethirtyeight/data/raw/master/star-wars-survey/StarWars.csv'
-
-starwars_data = pd.read_csv(url, encoding = &quot;ISO-8859-1&quot;, skiprows = 2, header = None)
-starwars_cols = pd.read_csv(url, encoding = &quot;ISO-8859-1&quot;, nrows = 2, header = None)
-
-starwars_cols.iloc[0,:].str.upper().str.replace(&quot; &quot;, &quot;!&quot;)
-</code></pre><br><h2 id=validating-statistical-summaries>Validating statistical summaries</h2><p><code>len()</code>, <code>.query()</code>, and <code>.value_counts()</code> will be your friends.</p><br><h2 id=validating-visuals>Validating visuals</h2><p>You&rsquo;re going to make a lot of bar charts!</p><ul><li><a href=https://altair-viz.github.io/gallery/simple_bar_chart.html>Simple bar chart</a> tutorial.</li><li>Make Altair do the counting for you! Tutorials <a href=https://altair-viz.github.io/user_guide/transform/aggregate.html>here</a> and <a href=https://stackoverflow.com/questions/62405935/altair-pandas-value-counts-horizontal-bar-chart>here</a>.</li></ul></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 12 Oct 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Day 3: Validating data, cleaning columns</span></a>
-<a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/><span class="d-none d-md-block">Day 1: The war with Star Wars</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/p5/d2/index.xml b/slides/p5/d2/index.xml
deleted file mode 100644
index cc82763..0000000
--- a/slides/p5/d2/index.xml
+++ /dev/null
@@ -1 +0,0 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Day 2: Star Wars and strings on DS250</title><link>https://byuistats.github.io/DS250-Cannon/slides/p5/d2/</link><description>Recent content in Day 2: Star Wars and strings on DS250</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>J. Hathaway and BYU-I ©</copyright><lastBuildDate>Fri, 01 May 2020 11:02:05 +0600</lastBuildDate><atom:link href="https://byuistats.github.io/DS250-Cannon/slides/p5/d2/index.xml" rel="self" type="application/rss+xml"/></channel></rss>
\ No newline at end of file
diff --git a/slides/p5/d3/clean_workflow.png b/slides/p5/d3/clean_workflow.png
deleted file mode 100644
index ed2364c..0000000
Binary files a/slides/p5/d3/clean_workflow.png and /dev/null differ
diff --git a/slides/p5/d3/index.html b/slides/p5/d3/index.html
deleted file mode 100644
index 41f9b78..0000000
--- a/slides/p5/d3/index.html
+++ /dev/null
@@ -1,36 +0,0 @@
-<!doctype html><html lang=en-us><head><meta charset=utf-8><title>Day 3: Validating data, cleaning columns</title><meta name=generator content="Hugo 0.74.3"><meta name=viewport content="width=device-width,initial-scale=1,maximum-scale=1"><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/bootstrap/bootstrap.min.css><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/themify-icons/themify-icons.css><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/highlight/hybrid.css><link rel=icon href=https://byuistats.github.io/DS250-Cannon/images/favicon.png type=image/x-icon><link href="https://fonts.googleapis.com/css?family=Roboto:300,400,700&display=swap" rel=stylesheet><style>:root{--primary-color:#02007e;--body-color:#f9f9f9;--text-color:#636363;--text-color-dark:#242738;--white-color:#ffffff;--light-color:#f8f9fa;--font-family:Roboto}</style><link href=https://byuistats.github.io/DS250-Cannon/css/style.min.css rel=stylesheet media=screen><script src=https://byuistats.github.io/DS250-Cannon/plugins/jquery/jquery-1.12.4.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/jquery/jquery-ui.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/bootstrap/bootstrap.min.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/match-height/jquery.matchHeight-min.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/highlight/highlight.pack.js></script><script>hljs.initHighlightingOnLoad();</script><script type=application/javascript>var doNotTrack=false;if(!doNotTrack){(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');ga('create','UA-132356198-4','auto');ga('send','pageview');}</script></head><body><header class="shadow-bottom sticky-top bg-white"><nav class="navbar navbar-expand-md navbar-light"><div class=container><a class="navbar-brand px-2" href=/DS250-Cannon>DS250</a>
-<button class="navbar-toggler border-0" type=button data-toggle=collapse data-target=#navigation aria-controls=navigation aria-expanded=false aria-label="Toggle navigation">
-<span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
-<a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
-<a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class="sidelist
-active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 3: Validating data, cleaning columns</h2><div class=content><h2 id=welcome-to-class>Welcome to class!</h2><h4 id=announcements>Announcements</h4><h4 id=spiritual-thought>Spiritual Thought</h4><h2 id=lets-validate-some-data>Let&rsquo;s validate some data!</h2><p>Pick something from <a href=https://fivethirtyeight.com/features/americas-favorite-star-wars-movies-and-least-favorite-characters/>the Star Wars article</a> you want to validate (&ldquo;double check&rdquo;).</p><br><h2 id=moving-from-categories-to-values>Moving from categories to values.</h2><blockquote><ol><li><strong>Create an additional column(s) that converts the income ranges 
to a number.</strong></li><li><strong>Create an additional column(s) that converts the age ranges to a number.</strong></li><li><strong>Create an additional column(s) that converts the school groupings to a number.</strong></li></ol></blockquote><ul><li><a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html>str.replace('&rsquo;, &lsquo;9&rsquo;)</a></li><li><a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html>astype(&lsquo;float&rsquo;)</a></li><li><a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html>pd.concat(axis=1)</a></li></ul><br><h2 id=validating-visuals>Validating visuals</h2><p>You&rsquo;re going to make a lot of bar charts!</p><ul><li><a href=https://altair-viz.github.io/gallery/simple_bar_chart.html>Simple bar chart</a> tutorial.</li><li>Make Altair do the counting for you! Tutorials <a href=https://altair-viz.github.io/user_guide/transform/aggregate.html>here</a> and <a href=https://stackoverflow.com/questions/62405935/altair-pandas-value-counts-horizontal-bar-chart>here</a>.</li></ul><br><h2 id=getting-started-on-question-3>Getting started on Question 3</h2><h3 id=one-hot-encoding>One-hot encoding</h3><p>Project 5 asks you to <strong>&ldquo;one-hot encode all columns that have categories&rdquo;</strong> and <strong>&ldquo;convert all yes/no responses to 1/0 numeric&rdquo;</strong>.</p><p>The <code>get_dummies</code> method can be used to create one-hot encoded variables. The <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html>pd.get_dummies documentation</a> is a great place to start.</p><p>After reading the documentation, study the code below and get started on Grand Question #3.</p><div class=highlight><pre style=color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python><span style=color:#75715e>#%%</span>
-<span style=color:#75715e># When we use machine learning to predict salary,</span>
-<span style=color:#75715e># let&#39;s only look at people who have seen at least</span>
-<span style=color:#75715e># one Star Wars film</span>
-starwars <span style=color:#f92672>=</span> starwars<span style=color:#f92672>.</span>query(<span style=color:#e6db74>&#39;have_seen_any == &#34;Yes&#34;&#39;</span>)
-
-<span style=color:#75715e># Discuss - what&#39;s a better way to filter out people </span>
-<span style=color:#75715e># who haven&#39;t seen Star Wars?</span>
-
-<span style=color:#75715e># %%</span>
-<span style=color:#75715e># Format columns for machine learning</span>
-
-<span style=color:#75715e># Let&#39;s try this first: convert categories to &#34;one-hot&#34; encodings</span>
-shot_first_onehot <span style=color:#f92672>=</span> pd<span style=color:#f92672>.</span>get_dummies(starwars<span style=color:#f92672>.</span>shot_first)
-shot_first_onehot
-
-<span style=color:#75715e># What&#39;s the difference between the code above,</span>
-<span style=color:#75715e># and this? Which one is better?</span>
-shot_first_onehot <span style=color:#f92672>=</span> pd<span style=color:#f92672>.</span>get_dummies(starwars<span style=color:#f92672>.</span>shot_first, drop_first<span style=color:#f92672>=</span>True)
-shot_first_onehot
-
-<span style=color:#75715e># %%</span>
-<span style=color:#75715e># &#39;get_dummies()&#39; can also be used to convert yes/no answers to 0/1</span>
-
-episode_i <span style=color:#f92672>=</span> pd<span style=color:#f92672>.</span>get_dummies(starwars<span style=color:#f92672>.</span>seen_film_i__the_phantom_menace)
-episode_i
-
-<span style=color:#75715e># %%</span>
-episode_i<span style=color:#f92672>.</span>value_counts()
-</code></pre></div></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 12 Oct 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Day 4: May the ML columns be with you</span></a>
-<a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/><span class="d-none d-md-block">Day 2: Star Wars and strings</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/p5/d3/index.xml b/slides/p5/d3/index.xml
deleted file mode 100644
index 1c42823..0000000
--- a/slides/p5/d3/index.xml
+++ /dev/null
@@ -1 +0,0 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Day 3: Validating data, cleaning columns on DS250</title><link>https://byuistats.github.io/DS250-Cannon/slides/p5/d3/</link><description>Recent content in Day 3: Validating data, cleaning columns on DS250</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>J. Hathaway and BYU-I ©</copyright><lastBuildDate>Fri, 01 May 2020 11:02:05 +0600</lastBuildDate><atom:link href="https://byuistats.github.io/DS250-Cannon/slides/p5/d3/index.xml" rel="self" type="application/rss+xml"/></channel></rss>
\ No newline at end of file
diff --git a/slides/p5/d4/index.html b/slides/p5/d4/index.html
deleted file mode 100644
index 816ad57..0000000
--- a/slides/p5/d4/index.html
+++ /dev/null
@@ -1,10 +0,0 @@
-<!doctype html><html lang=en-us><head><meta charset=utf-8><title>Day 4: May the ML columns be with you</title><meta name=generator content="Hugo 0.74.3"><meta name=viewport content="width=device-width,initial-scale=1,maximum-scale=1"><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/bootstrap/bootstrap.min.css><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/themify-icons/themify-icons.css><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/highlight/hybrid.css><link rel=icon href=https://byuistats.github.io/DS250-Cannon/images/favicon.png type=image/x-icon><link href="https://fonts.googleapis.com/css?family=Roboto:300,400,700&display=swap" rel=stylesheet><style>:root{--primary-color:#02007e;--body-color:#f9f9f9;--text-color:#636363;--text-color-dark:#242738;--white-color:#ffffff;--light-color:#f8f9fa;--font-family:Roboto}</style><link href=https://byuistats.github.io/DS250-Cannon/css/style.min.css rel=stylesheet media=screen><script src=https://byuistats.github.io/DS250-Cannon/plugins/jquery/jquery-1.12.4.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/jquery/jquery-ui.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/bootstrap/bootstrap.min.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/match-height/jquery.matchHeight-min.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/highlight/highlight.pack.js></script><script>hljs.initHighlightingOnLoad();</script><script type=application/javascript>var doNotTrack=false;if(!doNotTrack){(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');ga('create','UA-132356198-4','auto');ga('send','pageview');}</script></head><body><header class="shadow-bottom sticky-top bg-white"><nav class="navbar navbar-expand-md navbar-light"><div class=container><a class="navbar-brand px-2" href=/DS250-Cannon>DS250</a>
-<button class="navbar-toggler border-0" type=button data-toggle=collapse data-target=#navigation aria-controls=navigation aria-expanded=false aria-label="Toggle navigation">
-<span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
-<a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
-<a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class="sidelist
-active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 4: May the ML columns be with you</h2><div class=content><h2 id=welcome-to-class>Welcome to class!</h2><h4 id=spiritual-thought>Spiritual Thought</h4><h4 id=announcements>Announcements</h4><br><h2 id=getting-the-data-ready-for-machine-learning>Getting the data ready for machine learning.</h2><br><h3 id=what-are-machine-learning-algorithms-expecting-to-see>What are machine learning algorithms 
expecting to see?</h3><blockquote><p>We need to handle missing values and categorical features before feeding the data into a machine learning algorithm, because the mathematics underlying most machine learning models assumes that the data is numerical and contains no missing values. To reinforce this requirement, scikit-learn will return an error if you try to train a model using data that contain missing values or non-numeric values when working with models like linear regression and logistic regression. <a href=https://www.dataquest.io/blog/machine-learning-preparing-data/>ref</a></p></blockquote><p>We have some options when converting categorical features (columns) to numeric.</p><ul><li>If the <strong>category contains numeric information</strong> (like a range of numbers) we can convert it to a numeric variable by taking the minimum, average, or maximum of the range.</li><li><strong>Factorization:</strong> If the category is an <strong>&ldquo;ordinal&rdquo;</strong> variable (meaning, <a href="https://www.questionpro.com/blog/nominal-ordinal-interval-ratio/#:~:text=Nominal%20scale%20is%20a%20naming,each%20of%20its%20variable%20options.">there is an order to the categories</a>) we can assign each category to an integer. (For example, good = 1, better = 2, best = 3.)</li><li><strong>One-hot Encoding or Dummy Variables:</strong> If the category is a <strong>&ldquo;nominal&rdquo;</strong> variable (without an order) then we need to use one-hot encoding (sometimes called &ldquo;<a href=https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/>dummy variable encoding</a>").</li><li>If the <strong>category is some version of True/False or Yes/No</strong> then we can simply convert the values to zeros and ones.</li></ul><br><h1 id=whats-our-game-plan-for-the-star-wars-columns>What&rsquo;s our game plan for the Star Wars columns?</h1><h3 id=1-break-into-groups>1. 
Break into Groups</h3><h4 id=strategize--code--share>Strategize + Code + Share</h4><ul><li>Group 1: How are you going to turn Age, Income and Education into numbers?</li><li>Group 2: How are you going to encode<ul><li>Who Shot First</li><li>Gender</li><li>Location</li><li>All the Yes/No responses</li></ul></li><li>Group 3: How are you going to deal with the character rankings?</li></ul><h3 id=2-combine-all-the-factors-into-one-big-x-dataframe>2. Combine all the factors into one big X dataframe</h3><h3 id=3-define-y-as-those-making--50k>3. Define Y as those making > $50k</h3><p><strong>First:</strong> Limit the data to only people who answered &ldquo;Yes&rdquo; to the question &ldquo;Have you seen any of the 6 films in the Star Wars franchise?&rdquo;.</p><p><strong>Then:</strong> Use the table below as a guide to prepare your data for machine learning.</p><table><thead><tr><th>Column</th><th>Original Format</th><th>Convert To</th></tr></thead><tbody><tr><td>age</td><td>category (ordinal, age ranges)</td><td>number</td></tr><tr><td>income</td><td>category (ordinal, income ranges)</td><td>number</td></tr><tr><td>education</td><td>category (ordinal, name of degree)</td><td>number</td></tr><tr><td>shot_first</td><td>category (nominal)</td><td>one-hot</td></tr><tr><td>gender</td><td>category (nominal)</td><td>one-hot</td></tr><tr><td>location</td><td>category (nominal)</td><td>one-hot</td></tr><tr><td>fan_star_wars</td><td>Yes/No</td><td>0/1</td></tr><tr><td>expanded_universe</td><td>Yes/No</td><td>0/1</td></tr><tr><td>fan_exapanded</td><td>Yes/No</td><td>0/1</td></tr><tr><td>fan_star_trek</td><td>Yes/No</td><td>0/1</td></tr><tr><td>seen_i</td><td>Yes/No (name of movie/NaN)</td><td>0/1</td></tr><tr><td>seen_ii</td><td>Yes/No (name of movie/NaN)</td><td>0/1</td></tr><tr><td>seen_iii</td><td>Yes/No (name of movie/NaN)</td><td>0/1</td></tr><tr><td>seen_iv</td><td>Yes/No (name of movie/NaN)</td><td>0/1</td></tr><tr><td>seen_v</td><td>Yes/No (name of 
movie/NaN)</td><td>0/1</td></tr><tr><td>seen_vi</td><td>Yes/No (name of movie/NaN)</td><td>0/1</td></tr><tr><td>movie rankings</td><td>number</td><td>-</td></tr><tr><td>character rankings</td><td>category (ordinal)</td><td>one-hot or factorize</td></tr></tbody></table><br><h3 id=what-functions-can-we-use-to-convert-the-categorical-columns-to-numeric>What functions can we use to convert the categorical columns to numeric?</h3><ul><li>Range of numbers: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html>str.split()</a> and <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html>astype()</a></li><li>Ordinal: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html>str.replace()</a></li><li>Ordinal: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.factorize.html>pd.factorize()</a> (can also be used for True/False)</li><li>Nominal: <a href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html>pd.get_dummies()</a></li></ul><div class="card mb-4 rounded-0 shadow border-0"><div class="card-header rounded-0 bg-white border p-0 border-0"><a class="card-link h4 d-flex tex-dark mb-0 py-3 px-4 justify-content-between" data-toggle=collapse href=#using-the-drop_first-true-option-in-get_dummies><span>Using the <code>drop_first = True</code> option in <code>get_dummies()</code></span> <i class="ti-plus text-primary text-right"></i></a></div><div id=using-the-drop_first-true-option-in-get_dummies class=collapse data-parent=#accordion><div class="card-body font-secondary text-color"><p>Question: When and why would we drop the first column when we convert a category using <code>pd.get_dummies()</code>?</p><p>Answer: Whenever your algorithm needs to calculate a matrix inverse.</p><blockquote><p>The one-hot encoding creates one binary variable for each category.</p><br><p>The problem is that this representation 
includes redundancy. For example, if we know that [1, 0, 0] represents &ldquo;blue&rdquo; and [0, 1, 0] represents &ldquo;green&rdquo; we don&rsquo;t need another binary variable to represent &ldquo;red&rdquo;, instead we could use 0 values for both &ldquo;blue&rdquo; and &ldquo;green&rdquo; alone, e.g. [0, 0].</p><br><p>This is called a dummy variable encoding, and always represents C categories with C-1 binary variables. In addition to being slightly less redundant, a dummy variable representation is required for some models.</p><br><p>For example, in the case of a linear regression model (and other regression models that have a bias term), a one hot encoding will case the matrix of input data to become singular, meaning it cannot be inverted and the linear regression coefficients cannot be calculated using linear algebra. For these types of models a dummy variable encoding must be used instead.</p></blockquote><p><a href=https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/>Source</a></p></div></div></div><br><h2 id=predicting-income>Predicting income.</h2><p><strong>Grand Question 4</strong> wants us to &ldquo;build a machine learning model that predicts whether a person makes more than $50k&rdquo;.</p><div class="card mb-4 rounded-0 shadow border-0"><div class="card-header rounded-0 bg-white border p-0 border-0"><a class="card-link h4 d-flex tex-dark mb-0 py-3 px-4 justify-content-between" data-toggle=collapse href=#what-is-the-target-were-interested-in><span>What is the target we&rsquo;re interested in?</span> <i class="ti-plus text-primary text-right"></i></a></div><div id=what-is-the-target-were-interested-in class=collapse data-parent=#accordion><div class="card-body font-secondary text-color"><p>Aka, what is our &ldquo;outcome&rdquo; or &ldquo;response&rdquo; that we want to predict?</p><div class=highlight><pre style=color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python 
data-lang=python>dat_ml<span style=color:#f92672>.</span>income <span style=color:#f92672>&gt;</span> <span style=color:#ae81ff>50000</span>
-</code></pre></div></div></div></div><div class="card mb-4 rounded-0 shadow border-0"><div class="card-header rounded-0 bg-white border p-0 border-0"><a class="card-link h4 d-flex tex-dark mb-0 py-3 px-4 justify-content-between" data-toggle=collapse href=#how-to-format-the-features-x-and-target-y><span>How to format the features (x) and target (y)</span> <i class="ti-plus text-primary text-right"></i></a></div><div id=how-to-format-the-features-x-and-target-y class=collapse data-parent=#accordion><div class="card-body font-secondary text-color"><p>Remember not to include the answer (income) in your features!</p><div class=highlight><pre style=color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python>x <span style=color:#f92672>=</span> dat_ml<span style=color:#f92672>.</span>drop([<span style=color:#e6db74>&#39;income&#39;</span>], axis <span style=color:#f92672>=</span> <span style=color:#ae81ff>1</span>)
-</code></pre></div><p>The response needs to be saved as a 0/1 variable (at least, for binary classification algorithms).</p><div class=highlight><pre style=color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4><code class=language-python data-lang=python>y <span style=color:#f92672>=</span> (dat_ml<span style=color:#f92672>.</span>income <span style=color:#f92672>&gt;</span> <span style=color:#ae81ff>50000</span>)<span style=color:#f92672>.</span>astype(int)
-</code></pre></div></div></div></div><br></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 12 Oct 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p5/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Week 10-11: Project 5 - Star Wars</span></a>
-<a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/><span class="d-none d-md-block">Day 3: Validating data, cleaning columns</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/p5/d4/index.xml b/slides/p5/d4/index.xml
deleted file mode 100644
index 62b4a32..0000000
--- a/slides/p5/d4/index.xml
+++ /dev/null
@@ -1 +0,0 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Day 4: May the ML columns be with you on DS250</title><link>https://byuistats.github.io/DS250-Cannon/slides/p5/d4/</link><description>Recent content in Day 4: May the ML columns be with you on DS250</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>J. Hathaway and BYU-I ©</copyright><lastBuildDate>Fri, 01 May 2020 11:02:05 +0600</lastBuildDate><atom:link href="https://byuistats.github.io/DS250-Cannon/slides/p5/d4/index.xml" rel="self" type="application/rss+xml"/></channel></rss>
\ No newline at end of file
diff --git a/slides/p5/index.html b/slides/p5/index.html
deleted file mode 100644
index 0e97e11..0000000
--- a/slides/p5/index.html
+++ /dev/null
@@ -1,7 +0,0 @@
-<!doctype html><html lang=en-us><head><meta charset=utf-8><title>Week 10-11: Project 5 - Star Wars</title><meta name=generator content="Hugo 0.74.3"><meta name=viewport content="width=device-width,initial-scale=1,maximum-scale=1"><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/bootstrap/bootstrap.min.css><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/themify-icons/themify-icons.css><link rel=stylesheet href=https://byuistats.github.io/DS250-Cannon/plugins/highlight/hybrid.css><link rel=icon href=https://byuistats.github.io/DS250-Cannon/images/favicon.png type=image/x-icon><link href="https://fonts.googleapis.com/css?family=Roboto:300,400,700&display=swap" rel=stylesheet><style>:root{--primary-color:#02007e;--body-color:#f9f9f9;--text-color:#636363;--text-color-dark:#242738;--white-color:#ffffff;--light-color:#f8f9fa;--font-family:Roboto}</style><link href=https://byuistats.github.io/DS250-Cannon/css/style.min.css rel=stylesheet media=screen><script src=https://byuistats.github.io/DS250-Cannon/plugins/jquery/jquery-1.12.4.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/jquery/jquery-ui.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/bootstrap/bootstrap.min.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/match-height/jquery.matchHeight-min.js></script><script src=https://byuistats.github.io/DS250-Cannon/plugins/highlight/highlight.pack.js></script><script>hljs.initHighlightingOnLoad();</script><script type=application/javascript>var doNotTrack=false;if(!doNotTrack){(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');ga('create','UA-132356198-4','auto');ga('send','pageview');}</script></head><body><header class="shadow-bottom sticky-top bg-white"><nav class="navbar navbar-expand-md navbar-light"><div class=container><a class="navbar-brand px-2" href=/DS250-Cannon>DS250</a>
-<button class="navbar-toggler border-0" type=button data-toggle=collapse data-target=#navigation aria-controls=navigation aria-expanded=false aria-label="Toggle navigation">
-<span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
-<a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
-<a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class="sidelist
-active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Week 10-11: Project 5 - Star Wars</h2><div class=content><div class="notices info"><p>A significant portion of a data scientist&rsquo;s job is data cleaning. 
during these two weeks we will not hide the data munging from you. We will practice data cleaning using a Star Wars survey from FiveThirtEight. Survey data is notoriously difficult to handle. Even when the data is recorded cleanly the options for ‘write in questions’, ‘choose from multiple answers’, ‘pick all that are right’, and ‘multiple choice questions’ makes storing the data in a tidy format difficult.</p></div></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 15 Sep 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Day 1: Git and Github</span></a>
-<a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/><span class="d-none d-md-block">Day 4: May the ML columns be with you</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/p5/index.xml b/slides/p5/index.xml
deleted file mode 100644
index 354f1f4..0000000
--- a/slides/p5/index.xml
+++ /dev/null
@@ -1 +0,0 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Week 10-11: Project 5 - Star Wars on DS250</title><link>https://byuistats.github.io/DS250-Cannon/slides/p5/</link><description>Recent content in Week 10-11: Project 5 - Star Wars on DS250</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>J. Hathaway and BYU-I ©</copyright><lastBuildDate>Fri, 01 May 2020 11:02:05 +0600</lastBuildDate><atom:link href="https://byuistats.github.io/DS250-Cannon/slides/p5/index.xml" rel="self" type="application/rss+xml"/></channel></rss>
\ No newline at end of file
diff --git a/slides/p6/d2/index.html b/slides/p6/d2/index.html
index 1896c0b..e4f8d92 100644
--- a/slides/p6/d2/index.html
+++ b/slides/p6/d2/index.html
@@ -3,5 +3,5 @@
 <span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
 <a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
 <a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class="sidelist
-active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div 
class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 1: Git and Github</h2><div class=content><h2 id=welcome-to-class>Welcome to class!</h2><h4 id=spiritual-thought>Spiritual Thought</h4><h4 id=announcements>Announcements</h4><ol><li><p>Project 5 Comment</p><ul><li>Feature Importance and Model discussion</li></ul></li><li><p>The last day of DSS is next Wednesday, Dec 6th at 6:00PM in STC 394</p></li><li><p>Extra credit for creating and uploading cheat sheet (2 points for projects or checkpoints)</p></li><li><p>Coding Challenge date?</p></li><li><p>The technical aspects of Project 6 will be done mostly in class. Resume prep/MD outside</p></li></ol><br><h2 id=git-and-github>Git and GitHub</h2><h3 id=web-developers-social-media-platform>&ldquo;Web developers&rsquo; social media platform&rdquo;</h3><blockquote><p>This is GitHub, the world’s largest code repository platform online. A platform used by some 50 million software developers to host their coding projects, most of them open-source — meaning others can access their codes and modify them to create better versions if they feel like.</p><br><p>Most of the internet is produced or hosted on GitHub in the form of code. “What Gmail is to email, GitHub is to writing software,” says Kiran Jonnalagadda, cofounder of HasGeek, a platform to build and discover peer groups. 
<a href="https://economictimes.indiatimes.com/internet/inside-github-web-developers-social-media-platform/articleshow/77096752.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst"><em>Source</em></a></p></blockquote><ul><li><strong>Don&rsquo;t:</strong> post code for assignments that hundreds of other students have done.</li><li><strong>Do:</strong> post unique code using skills from your classes.</li></ul><p>I would also recommend using private repos to manage your course work.</p><br><h3 id=is-it-going-to-hurt>Is it going to hurt?</h3><p><strong>Answer: Yes.</strong></p><p>It feels weird at first but quickly becomes second nature. If you plan on taking more data science classes, you should know that DS 350 students are required to submit all coursework via GitHub. This is a major topic in class and office hours for the first two weeks. Then we practically never discuss it again.</p><p>More bad news. Do you use GitHub to work with other people or to coordinate your own work from multiple computers? If so, after you recover from the initial setup, Git will crush you again with merge conflicts. And this is not one-time pain, this could be a dull ache for a long time.</p><p><img src=https://imgs.xkcd.com/comics/git.png alt></p><blockquote><p>Managing a project via Git/GitHub is much like the Google Doc scenario and enjoys many of the same advantages. It is definitely more complicated than collaborating on a Google Doc, but this puts you in the right mindset. 
<a href=https://happygitwithr.com/big-picture.html><em>Source</em></a></p></blockquote><br><h3 id=step-1-download-and-install>Step 1: Download and install</h3><p>Follow steps 1-4 of <a href=https://www.jcchouinard.com/install-git-in-vscode/>this tutorial</a>.</p><p>Then:</p><ol><li>Request access tothe BYU-I Resumes page at <a href=https://posit.byui.edu/github_orgs/>Request Access</a></li><li>Respond to the auto-generated email</li><li>Wait a few minutes for authorization</li><li>Join our GitHub organization - <a href=https://github.com/byuids-resumes>byuids-resumes</a>.</li></ol><br><h5 id=if-you-are-on-a-mac-you-may-need>If you are on a Mac, you may need:</h5><ul><li><a href=https://modulesunraveled.com/installing-git/updating-git-if-you-have-only-version-comes-xcode-or-command-line-developer-tools>Mac fix with paths</a></li><li><a href=https://developer.apple.com/xcode/>Download Xcode and update</a> (10 gig download)</li><li><a href=../../../course-materials/git_github_ds/>VSCode path selection</a> (scroll down to step 1)</li></ul><br><h3 id=step-2-create-a-repository-from-the-resume-template-and-connect-to-the-byui>Step 2: Create a repository from the resume template and connect to the BYUI</h3><p><img src=template_github.png alt></p><br><h3 id=step-3-publish-your-resume-to-github-pages>Step 3: Publish your resume to GitHub Pages</h3><ul><li>Go to settings for your repo.</li><li>Scroll down to the GitHub Pages section.</li><li>Under source select the box which says None and pick master.</li><li>Now select the /docs folder and click save.</li><li>Copy your site URL at the top of the /settings/pages location.</li><li>Add your link to the About section of your repository.</li><li>Edit the readme.md in the base repo to not show the resume directions.</li></ul><br><h3 id=step-4-clone-repo-into-vs-code>Step 4: Clone repo into VS Code</h3><p><a href=https://www.analyticsvidhya.com/blog/2020/05/git-github-essential-guide-beginners/>Analytics Vidhya 
reading</a></p><p><img src=https://cdn.analyticsvidhya.com/wp-content/uploads/2020/05/image37.png alt></p><br><h3 id=step-5-make-your-resume-look-good>Step 5: Make your resume look good</h3><p>Examples:</p><ul><li><a href=https://byuidatascience.github.io/resume_example.html>Undergraduate DS resumes</a></li><li><a href=http://jhathaway.io/extra/hathaway.pdf>Hathaway&rsquo;s resume</a></li></ul><p>You may also find these articles helpful:</p><ul><li><a href=https://www.dataquest.io/blog/how-data-science-resume-cv/>How to Write a Great Data Science Resume</a></li><li><a href=https://www.analyticsvidhya.com/blog/2019/07/how-to-build-effective-data-science-resume-4-key-aspects>How to Build an Effective Data Science Resume</a></li><li><a href=https://elitedatascience.com/resume-tips>How to Write the Perfect Data Scientist Resume</a></li></ul></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 12 Oct 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Day 2: Commit, push, fork, and merge</span></a>
-<a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/p5/><span class="d-none d-md-block">Week 10-11: Project 5 - Star Wars</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
+active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 1: Git and Github</h2><div class=content><h2 id=welcome-to-class>Welcome to class!</h2><h4 id=spiritual-thought>Spiritual Thought</h4><h4 id=announcements>Announcements</h4><ol><li><p>Project 5 Comment</p><ul><li>Feature Importance and Model discussion</li></ul></li><li><p>The last day of DSS is next Wednesday, Dec 6th at 6:00PM in STC 394</p></li><li><p>Extra credit for creating and uploading cheat sheet (2 points for projects or checkpoints)</p></li><li><p>Coding Challenge date?</p></li><li><p>The technical aspects of Project 6 will be done mostly in class. Resume prep/MD outside</p></li></ol><br><h2 id=git-and-github>Git and GitHub</h2><h3 id=web-developers-social-media-platform>&ldquo;Web developers&rsquo; social media platform&rdquo;</h3><blockquote><p>This is GitHub, the world’s largest code repository platform online. 
A platform used by some 50 million software developers to host their coding projects, most of them open-source — meaning others can access their codes and modify them to create better versions if they feel like.</p><br><p>Most of the internet is produced or hosted on GitHub in the form of code. “What Gmail is to email, GitHub is to writing software,” says Kiran Jonnalagadda, cofounder of HasGeek, a platform to build and discover peer groups. <a href="https://economictimes.indiatimes.com/internet/inside-github-web-developers-social-media-platform/articleshow/77096752.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst"><em>Source</em></a></p></blockquote><ul><li><strong>Don&rsquo;t:</strong> post code for assignments that hundreds of other students have done.</li><li><strong>Do:</strong> post unique code using skills from your classes.</li></ul><p>I would also recommend using private repos to manage your course work.</p><br><h3 id=is-it-going-to-hurt>Is it going to hurt?</h3><p><strong>Answer: Yes.</strong></p><p>It feels weird at first but quickly becomes second nature. If you plan on taking more data science classes, you should know that DS 350 students are required to submit all coursework via GitHub. This is a major topic in class and office hours for the first two weeks. Then we practically never discuss it again.</p><p>More bad news. Do you use GitHub to work with other people or to coordinate your own work from multiple computers? If so, after you recover from the initial setup, Git will crush you again with merge conflicts. And this is not one-time pain, this could be a dull ache for a long time.</p><p><img src=https://imgs.xkcd.com/comics/git.png alt></p><blockquote><p>Managing a project via Git/GitHub is much like the Google Doc scenario and enjoys many of the same advantages. It is definitely more complicated than collaborating on a Google Doc, but this puts you in the right mindset. 
<a href=https://happygitwithr.com/big-picture.html><em>Source</em></a></p></blockquote><br><h3 id=step-1-download-and-install>Step 1: Download and install</h3><p>Follow steps 1-4 of <a href=https://www.jcchouinard.com/install-git-in-vscode/>this tutorial</a>.</p><p>Then:</p><ol><li>Request access to the BYU-I Resumes page at <a href=https://posit.byui.edu/github_orgs/>Request Access</a></li><li>Respond to the auto-generated email</li><li>Wait a few minutes for authorization</li><li>Join our GitHub organization - <a href=https://github.com/byuids-resumes>byuids-resumes</a>.</li></ol><br><h5 id=if-you-are-on-a-mac-you-may-need>If you are on a Mac, you may need:</h5><ul><li><a href=https://modulesunraveled.com/installing-git/updating-git-if-you-have-only-version-comes-xcode-or-command-line-developer-tools>Mac fix with paths</a></li><li><a href=https://developer.apple.com/xcode/>Download Xcode and update</a> (10 gig download)</li><li><a href=../../../course-materials/git_github_ds/>VSCode path selection</a> (scroll down to step 1)</li></ul><br><h3 id=step-2-create-a-repository-from-the-resume-template-and-connect-to-the-byui>Step 2: Create a repository from the resume template and connect to the BYUI</h3><p><img src=template_github.png alt></p><br><h3 id=step-3-publish-your-resume-to-github-pages>Step 3: Publish your resume to GitHub Pages</h3><ul><li>Go to settings for your repo.</li><li>Scroll down to the GitHub Pages section.</li><li>Under source select the box which says None and pick master.</li><li>Now select the /docs folder and click save.</li><li>Copy your site URL at the top of the /settings/pages location.</li><li>Add your link to the About section of your repository.</li><li>Edit the readme.md in the base repo to not show the resume directions.</li></ul><br><h3 id=step-4-clone-repo-into-vs-code>Step 4: Clone repo into VS Code</h3><p><a href=https://www.analyticsvidhya.com/blog/2020/05/git-github-essential-guide-beginners/>Analytics Vidhya 
reading</a></p><p><img src=https://cdn.analyticsvidhya.com/wp-content/uploads/2020/05/image37.png alt></p><br><h3 id=step-5-make-your-resume-look-good>Step 5: Make your resume look good</h3><p>Examples:</p><ul><li><a href=https://byuidatascience.github.io/resume_example.html>Undergraduate DS resumes</a></li><li><a href=http://jhathaway.io/extra/hathaway.pdf>Hathaway&rsquo;s resume</a></li></ul><p>You may also find these articles helpful:</p><ul><li><a href=https://www.dataquest.io/blog/how-data-science-resume-cv/>How to Write a Great Data Science Resume</a></li><li><a href=https://www.analyticsvidhya.com/blog/2019/07/how-to-build-effective-data-science-resume-4-key-aspects>How to Build an Effective Data Science Resume</a></li><li><a href=https://elitedatascience.com/resume-tips>How to Write the Perfect Data Scientist Resume</a></li></ul></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 12 Oct 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Day 2: Commit, push, fork, and merge</span></a>
+<a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/introduction/><span class="d-none d-md-block">Week 1: Introduction</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/p6/d3/index.html b/slides/p6/d3/index.html
index bdccfd8..07d3f5d 100644
--- a/slides/p6/d3/index.html
+++ b/slides/p6/d3/index.html
@@ -3,6 +3,6 @@
 <span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
 <a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
 <a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class="sidelist
-active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li 
data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 2: Commit, push, fork, and merge</h2><div class=content><h2 id=welcome-to-class>Welcome to class!</h2><h4 id=announcements>Announcements</h4><br><h2 id=practice-with-git>Practice with Git</h2><h4 id=gq3-add-commit-push-and-a-little-pull>GQ3: <code>add, commit, push</code> and a little <code>pull</code></h4><p>Let&rsquo;s save the changes we&rsquo;ve made to our resume.</p><br><h4 id=gq4-fork-and-merge>GQ4: Fork and merge</h4><p>Get into groups of 2 or 3. Then follow the steps below:</p><ol><li><code>fork</code> the other student&rsquo;s resume repository.</li><li>Now clone that forked repository to your computer.</li><li>On your local version of the forked repository, do the following:<br>A. Create a new file called <code>feedback.md</code>
+active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 2: Commit, push, fork, and merge</h2><div class=content><h2 id=welcome-to-class>Welcome to class!</h2><h4 id=announcements>Announcements</h4><br><h2 id=practice-with-git>Practice with Git</h2><h4 id=gq3-add-commit-push-and-a-little-pull>GQ3: <code>add, commit, push</code> and a little <code>pull</code></h4><p>Let&rsquo;s save the changes we&rsquo;ve made to our resume.</p><br><h4 id=gq4-fork-and-merge>GQ4: Fork and merge</h4><p>Get into groups of 2 or 3. Then follow the steps below:</p><ol><li><code>fork</code> the other student&rsquo;s resume repository.</li><li>Now clone that forked repository to your computer.</li><li>On your local version of the forked repository, do the following:<br>A. Create a new file called <code>feedback.md</code>
 B. Make a few recommendations or notes in the <code>feedback.md</code> file that will help the other student improve his or her resume<br>C. <code>add, commit, push</code> your edits<br>D. Go to the forked repo on GitHub and check if the <code>feedback.md</code> file shows up online</li><li>Now, create a <code>pull request</code> to get your edits into the other student&rsquo;s original repo.</li></ol><p>Once you&rsquo;ve given another student feedback, accept any pull requests submitted to your own repo. Continue to edit and improve your resume based on the feedback you received.</p><br><h4 id=gq5-fork-into-byuids-resumeshttpsgithubcombyuids-resumes>GQ5: Fork into <a href=https://github.com/byuids-resumes>byuids-resumes</a></h4><p>Fork your own resume repository into the <a href=https://github.com/byuids-resumes>BYU-I Data Science Resumes</a> group.</p><p>If you change your resume after you create this fork, you will have to submit a pull request to make sure the final version of your resume shows up in the group.</p><p><a href=../../../course-materials/git_github_ds/pull_merge/>These instructions</a> will help you create a pull request.</p><br></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 12 Oct 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Day 3: Resume Fork and Merge</span></a>
 <a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/><span class="d-none d-md-block">Day 1: Git and Github</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/p6/d4/index.html b/slides/p6/d4/index.html
index e6124f6..a0e7f31 100644
--- a/slides/p6/d4/index.html
+++ b/slides/p6/d4/index.html
@@ -3,5 +3,5 @@
 <span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
 <a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
 <a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class="sidelist
-active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li 
data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 3: Resume Fork and Merge</h2><div class=content><h2 id=remember-from-last-class-pull-add-commit-push>Remember from last class: pull, add, commit, push.</h2><br><h2 id=making-edits-in-another-users-repo>Making edits in another user&rsquo;s repo</h2><p><strong>Breakout Room Activity</strong></p><p>Each student in the breakout room is going to provide feedback on another student&rsquo;s resume. The breakout room should begin with a group discussion about the work you&rsquo;ve each done on your resume and any questions the group has. Then follow the steps below.</p><ol><li><code>fork</code> the other student&rsquo;s resume repository.</li><li>Now clone that forked repository to your computer.</li><li>On your local version of the forked repository, do the following;<br>A. Create a new file called <code>edits.md</code> and save it in the main folder or the repository.<br>B. Make a few recommendations or notes in the <code>edits.md</code> file that will help the other student improve his or her resume.<br>C. <code>add, commit, push</code> your edits.<br>D. Go to the forked repo on GitHub and check if the <code>edits.md</code> file shows up online.</li><li>Now, create a <code>pull request</code> to get your edits into the other student&rsquo;s original repo.</li></ol><p>Once you&rsquo;ve given another student feedback, accept any pull requests submitted to your own repo. 
Continue to edit and improve your resume based on the feedback you received.</p><br><h2 id=creating-a-fork-in-byuids-resumes>Creating a fork in byuids-resumes</h2><p>Fork your own resume repository into the <a href=https://github.com/byuids-resumes>BYU-I Data Science Resumes</a> group.</p><p>If you change your resume after you create this fork, you will have to submit a pull request to make sure the final version of your resume shows up in the group.</p><p><a href=../../../course-materials/git_github_ds/pull_merge/>These instructions</a> will help you create a pull request.</p><br><h2 id=open-time-to-finalize-your-resume>Open time to finalize your resume</h2></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 12 Oct 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p6/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Week 12-13: Project 6 - Github</span></a>
+active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Day 3: Resume Fork and Merge</h2><div class=content><h2 id=remember-from-last-class-pull-add-commit-push>Remember from last class: pull, add, commit, push.</h2><br><h2 id=making-edits-in-another-users-repo>Making edits in another user&rsquo;s repo</h2><p><strong>Breakout Room Activity</strong></p><p>Each student in the breakout room is going to provide feedback on another student&rsquo;s resume. The breakout room should begin with a group discussion about the work you&rsquo;ve each done on your resume and any questions the group has. 
Then follow the steps below.</p><ol><li><code>fork</code> the other student&rsquo;s resume repository.</li><li>Now clone that forked repository to your computer.</li><li>On your local version of the forked repository, do the following;<br>A. Create a new file called <code>edits.md</code> and save it in the main folder or the repository.<br>B. Make a few recommendations or notes in the <code>edits.md</code> file that will help the other student improve his or her resume.<br>C. <code>add, commit, push</code> your edits.<br>D. Go to the forked repo on GitHub and check if the <code>edits.md</code> file shows up online.</li><li>Now, create a <code>pull request</code> to get your edits into the other student&rsquo;s original repo.</li></ol><p>Once you&rsquo;ve given another student feedback, accept any pull requests submitted to your own repo. Continue to edit and improve your resume based on the feedback you received.</p><br><h2 id=creating-a-fork-in-byuids-resumes>Creating a fork in byuids-resumes</h2><p>Fork your own resume repository into the <a href=https://github.com/byuids-resumes>BYU-I Data Science Resumes</a> group.</p><p>If you change your resume after you create this fork, you will have to submit a pull request to make sure the final version of your resume shows up in the group.</p><p><a href=../../../course-materials/git_github_ds/pull_merge/>These instructions</a> will help you create a pull request.</p><br><h2 id=open-time-to-finalize-your-resume>Open time to finalize your resume</h2></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 12 Oct 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/p6/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Week 12-13: Project 6 - Github</span></a>
 <a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/><span class="d-none d-md-block">Day 2: Commit, push, fork, and merge</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file
diff --git a/slides/p6/index.html b/slides/p6/index.html
index d2bc580..ff69f15 100644
--- a/slides/p6/index.html
+++ b/slides/p6/index.html
@@ -3,5 +3,5 @@
 <span class=navbar-toggler-icon></span></button><div class="collapse navbar-collapse text-center" id=navigation><ul class="navbar-nav ml-auto"><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon>Home</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/projects>Projects</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/contact>Contact</a></li><li class=nav-item><a class="nav-link text-dark" href=/DS250-Cannon/course-materials>Materials</a></li><li class="nav-item dropdown"><a class="nav-link dropdown-toggle text-dark" href=# role=button data-toggle=dropdown aria-haspopup=true aria-expanded=false>Navigate</a><div class=dropdown-menu><a class=dropdown-item href=/DS250-Cannon/slides>Slides</a>
 <a class=dropdown-item href=/DS250-Cannon/course-materials/syllabus/>Syllabus</a>
 <a class=dropdown-item href=/DS250-Cannon/faq>FAQ</a></div></li></ul></div></div></nav></header><section class="single section-sm pb-0"><div class=container><div class=row><div class=col-lg-3><div class=sidebar><ul class=list-styled><a class=back-btn href=/DS250-Cannon></a><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/ title=Slides class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/>Slides</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/ title="Week 12-13: Project 6 - Github" class="sidelist
-active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/ title="Week 10-11: Project 5 - Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/>Week 10-11: Project 5 - Star Wars</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/ title="Day 4: May the ML columns be with you" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d4/>Day 4: May the ML columns be with you</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/ title="Day 3: Validating data, cleaning columns" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d3/>Day 3: Validating data, cleaning columns</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/ title="Day 2: Star Wars and strings" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d2/>Day 2: Star Wars and strings</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/ title="Day 1: The war with Star Wars" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p5/d1/>Day 1: The war with Star Wars</a></li></ul></li><li 
data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Week 12-13: Project 6 - Github</h2><div class=content><div class="notices info"><p><p>GitHub is the communication tool for Data Scientists and developers. As students, you will want to curate your creative work on GitHub using Git. GitHub is the place to share your original work, not your homework assignments. Many people store their personal websites, blogs, and project websites on GitHub. Our textbook and course are hosted on GitHub, and you can see <a href=http://jhathaway.io/>J. Hathaway&rsquo;s</a> or <a href=https://ryanhafen.com/>Ryan Hafen&rsquo;s</a> personal Data Science websites that are hosted on GitHub as well. You will be making your public resume that will be hosted on GitHub for this project.</p><p>In the process of this project, we will be learning the process of Git and the tools of GitHub. We will use the Git process to have others in our class to edit our resumes. 
Take the process seriously (pick a suitable username and write a good resume), and you will have the beginning of your social presence in the DS/CS space.</p></p></div><div class="notices note"><p><strong>Completed Readings:</strong> <a href=https://tech.economictimes.indiatimes.com/news/internet/inside-github-web-developers-social-media-platform/77096752>GitHub, a programmer&rsquo;s social media</a>, <a href=https://github.com/join>Join GitHub</a>, <a href=https://github.blog/2019-06-06-generate-new-repositories-with-repository-templates/>Repository Templates</a>, <a href=https://code.visualstudio.com/docs/editor/versioncontrol>Using Version Control in VS Code</a>, <a href=https://code.visualstudio.com/docs/editor/github>Working with GitHub in VS Code</a>, <a href="https://www.youtube.com/watch?v=wMqukSKYcvU">Git in Visual Studio Code video</a>, <a href=https://www.analyticsvidhya.com/blog/2020/05/git-github-essential-guide-beginners/>New to Git and GitHub? This Essential Beginners Guide is for you</a>, <a href="https://www.theserverside.com/video/Git-vs-GitHub-What-is-the-difference-between-them#:~:text=The%20key%20difference%20between%20Git,and%20upload%20or%20download%20resources.">Git vs. GitHub: What is the difference between them?</a></p></div><div class="notices tip"><p><a href=https://github.com/byuids-resumes/mdresume>Markdown Resume (mdresume) Repository</a> and <a href=https://github.com/byuids-resumes>BYUI Data Science Resumes</a></p></div><h3 id=grand-questions>Grand Questions</h3><ol><li><strong>Join the <a href=https://github.com/byuids-resumes>BYUI Data Science Resumes</a> GitHub organization and use the template repository to make a resume repository under your repositories. 
A good name might be LASTNAME-Resume.</strong></li><li><strong>Clone your repository to your computer and build a first draft of your resume.</strong></li><li><strong>Push your results to GitHub and have another student fork your repository to make edits.</strong></li><li><strong>Accept the proposed changes from the student review and finish your final version.</strong></li><li><strong>Make sure your resume is forked by <a href=https://github.com/byuids-resumes>BYU-I Data Science Resumes</a></strong></li></ol></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 15 Sep 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Slides</span></a>
+active"><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/>Week 12-13: Project 6 - Github</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/ title="Day 3: Resume Fork and Merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/>Day 3: Resume Fork and Merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/ title="Day 2: Commit, push, fork, and merge" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d3/>Day 2: Commit, push, fork, and merge</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/ title="Day 1: Git and Github" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/p6/d2/>Day 1: Git and Github</a></li></ul></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/ title="Week 1: Introduction" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/>Week 1: Introduction</a><ul><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/ title="Day 2: Project 0" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day02/>Day 2: Project 0</a></li><li data-nav-id=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/ title="Day 1: Welcome" class=sidelist><a href=https://byuistats.github.io/DS250-Cannon/slides/introduction/day01/>Day 1: Welcome</a></li></ul></li></ul></li></ul></div></div><div class=col-lg-9><div class="p-lg-5 p-4 bg-white"><h2 class=mb-5>Week 12-13: Project 6 - Github</h2><div class=content><div class="notices info"><p><p>GitHub is the communication tool for Data Scientists and developers. As students, you will want to curate your creative work on GitHub using Git. GitHub is the place to share your original work, not your homework assignments. Many people store their personal websites, blogs, and project websites on GitHub. 
Our textbook and course are hosted on GitHub, and you can see <a href=http://jhathaway.io/>J. Hathaway&rsquo;s</a> or <a href=https://ryanhafen.com/>Ryan Hafen&rsquo;s</a> personal Data Science websites that are hosted on GitHub as well. You will be making your public resume that will be hosted on GitHub for this project.</p><p>In the process of this project, we will be learning the process of Git and the tools of GitHub. We will use the Git process to have others in our class to edit our resumes. Take the process seriously (pick a suitable username and write a good resume), and you will have the beginning of your social presence in the DS/CS space.</p></p></div><div class="notices note"><p><strong>Completed Readings:</strong> <a href=https://tech.economictimes.indiatimes.com/news/internet/inside-github-web-developers-social-media-platform/77096752>GitHub, a programmer&rsquo;s social media</a>, <a href=https://github.com/join>Join GitHub</a>, <a href=https://github.blog/2019-06-06-generate-new-repositories-with-repository-templates/>Repository Templates</a>, <a href=https://code.visualstudio.com/docs/editor/versioncontrol>Using Version Control in VS Code</a>, <a href=https://code.visualstudio.com/docs/editor/github>Working with GitHub in VS Code</a>, <a href="https://www.youtube.com/watch?v=wMqukSKYcvU">Git in Visual Studio Code video</a>, <a href=https://www.analyticsvidhya.com/blog/2020/05/git-github-essential-guide-beginners/>New to Git and GitHub? This Essential Beginners Guide is for you</a>, <a href="https://www.theserverside.com/video/Git-vs-GitHub-What-is-the-difference-between-them#:~:text=The%20key%20difference%20between%20Git,and%20upload%20or%20download%20resources.">Git vs. 
GitHub: What is the difference between them?</a></p></div><div class="notices tip"><p><a href=https://github.com/byuids-resumes/mdresume>Markdown Resume (mdresume) Repository</a> and <a href=https://github.com/byuids-resumes>BYUI Data Science Resumes</a></p></div><h3 id=grand-questions>Grand Questions</h3><ol><li><strong>Join the <a href=https://github.com/byuids-resumes>BYUI Data Science Resumes</a> GitHub organization and use the template repository to make a resume repository under your repositories. A good name might be LASTNAME-Resume.</strong></li><li><strong>Clone your repository to your computer and build a first draft of your resume.</strong></li><li><strong>Push your results to GitHub and have another student fork your repository to make edits.</strong></li><li><strong>Accept the proposed changes from the student review and finish your final version.</strong></li><li><strong>Make sure your resume is forked by <a href=https://github.com/byuids-resumes>BYU-I Data Science Resumes</a></strong></li></ol></div><p class="post-meta border-bottom pb-3 mb-0 mt-3">Updated on 15 Sep 2020</p><nav class="pagination mt-3"><a class="nav nav-prev" href=https://byuistats.github.io/DS250-Cannon/slides/><i class="ti-arrow-left mr-2"></i><span class="d-none d-md-block">Slides</span></a>
 <a class="nav nav-next" href=https://byuistats.github.io/DS250-Cannon/slides/p6/d4/><span class="d-none d-md-block">Day 3: Resume Fork and Merge</span><i class="ti-arrow-right ml-2"></i></a></nav></div></div></div></div></section><footer class="section pb-4"><div class=container><div class="row align-items-center"><div class="col-md-8 text-md-left text-center"><p class="mb-md-0 mb-4">J. Hathaway and BYU-I ©</p></div><div class="col-md-4 text-md-right text-center"><ul class=list-inline><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://github.com/byuidatascience><i class=ti-github></i></a></li><li class=list-inline-item><a class="text-color d-inline-block p-2" href=https://www.linkedin.com/groups/13537407/><i class=ti-linkedin></i></a></li></ul></div></div></div></footer><script src=https://byuistats.github.io/DS250-Cannon/js/script.min.js></script></body></html>
\ No newline at end of file

Column	Original Format	Convert To
age	category (ordinal, age ranges)	number
income	category (ordinal, income ranges)	number
education	category (ordinal, name of degree)	number
shot_first	category (nominal)	one-hot
gender	category (nominal)	one-hot
location	category (nominal)	one-hot
fan_star_wars	Yes/No	0/1
expanded_universe	Yes/No	0/1
fan_exapanded	Yes/No	0/1
fan_star_trek	Yes/No	0/1
seen_i	Yes/No (name of movie/NaN)	0/1
seen_ii	Yes/No (name of movie/NaN)	0/1
seen_iii	Yes/No (name of movie/NaN)	0/1
seen_iv	Yes/No (name of movie/NaN)	0/1
seen_v	Yes/No (name of movie/NaN)	0/1
seen_vi	Yes/No (name of movie/NaN)	0/1
movie rankings	number	-
character rankings	category (ordinal)	one-hot or factorize
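
The conversions in the table above can be sketched with pandas roughly as follows. This is a minimal illustration, not the course's reference solution: the sample rows, the category spellings (`"18-29"`, `"Han"`, and so on), and the numbers chosen for each age range are assumptions made up for the example, and only four of the listed columns are shown.

```python
import pandas as pd

# Hypothetical sample rows; the real survey values may be spelled differently.
df = pd.DataFrame({
    "age": ["18-29", "30-44", "> 60"],
    "shot_first": ["Han", "Greedo", "Han"],
    "fan_star_wars": ["Yes", "No", "Yes"],
    "seen_i": ["Star Wars: Episode I", None, None],
})

# Ordinal category -> number: map each range to an assumed lower bound.
age_map = {"18-29": 18, "30-44": 30, "45-60": 45, "> 60": 61}
df["age"] = df["age"].map(age_map)

# Yes/No -> 0/1
df["fan_star_wars"] = df["fan_star_wars"].map({"Yes": 1, "No": 0})

# Movie name or NaN -> 1/0 (any non-missing value counts as "seen")
df["seen_i"] = df["seen_i"].notna().astype(int)

# Nominal category -> one-hot columns (shot_first_Greedo, shot_first_Han)
df = pd.get_dummies(df, columns=["shot_first"], dtype=int)

print(df)
```

The same patterns would repeat for the other rows of the table: `income` and `education` follow the `age` mapping, `gender` and `location` follow the `shot_first` one-hot step, and the remaining Yes/No and `seen_*` columns follow the 0/1 steps.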