From 77b4089ffced22947d0ea5def7b945828ce3e39a Mon Sep 17 00:00:00 2001
From: hathawayj
Date: Thu, 29 Feb 2024 16:25:57 +0000
Subject: [PATCH] deploy: c5e7875711029a098860069fc6ccd61684e54029

---
 index.html              |  2 +-
 slides/p4/d2/index.html | 21 ++++++++++++---------
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/index.html b/index.html
index 72206af..cca0a65 100644
--- a/index.html
+++ b/index.html
@@ -3,4 +3,4 @@

CSE 250: Data Science Programming

Using pandas, Altair, scikit-learn, and NumPy to program with data

-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/slides/p4/d2/index.html b/slides/p4/d2/index.html
index cd5c9a4..a483aa3 100644
--- a/slides/p4/d2/index.html
+++ b/slides/p4/d2/index.html
@@ -3,7 +3,7 @@

Day 2: Intro to Machine Learning

Welcome to class!

Announcements

Spiritual thought

Are facts true?

  • How do you distinguish between truth and error?
  • Joshua and Caleb

Building a Decision Tree

Day 2: Intro to Machine Learning

Welcome to class!


Shire Reckoning

Announcements

  1. Coding Challenge Practice - Thursday, March 7

Spiritual thought

Are facts true?

  • How do you distinguish between truth and error?
  • Joshua and Caleb

Building a Decision Tree

Splitting the Data

1. Start with packages and data set

We’ll be using parts of the scikit-learn package and the Seaborn package.

# If you haven't already, install scikit-learn and seaborn
 pip install scikit-learn seaborn
 
from types import GeneratorType
 import pandas as pd
@@ -30,17 +30,20 @@
 

4. Split into training and testing sets

What does the “train_test_split()” function do?

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = #???, random_state = #???)
 

Read the documentation and tell me what is returned.

Function documentation

Why do we use “test_size” and “random_state”?
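
To make these questions concrete, here is a minimal sketch on made-up data (the arrays below are invented for illustration and are not the class data set). It assumes `test_size=0.2` and `random_state=42` purely as example values; the slide leaves those choices to you.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 10 rows, 2 feature columns
x = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# test_size=0.2 holds out 20% of the rows for testing;
# random_state fixes the shuffle so the split is repeatable
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Four objects come back: features and targets, each split in two
print(x_train.shape, x_test.shape)  # (8, 2) (2, 2)
print(y_train.shape, y_test.shape)  # (8,) (2,)

# Repeating the call with the same random_state gives the same split
_, x_test_again, _, _ = train_test_split(x, y, test_size=0.2, random_state=42)
same_split = np.array_equal(x_test, x_test_again)
print("Same split on repeat:", same_split)
```

Without a fixed `random_state`, each run would shuffle differently, so accuracy numbers would change from run to run even if nothing else did.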

What are “x” and “y” in the above function example?

We need to take our data and build the feature and target data objects.

What columns should we remove from our features (X)?

What column should we use as our target (y)?

x = dwellings_ml.filter([#what variables will you use as "features"?])
 y = dwellings_ml[#what variable is the "target"?]
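
As a rough illustration of the same pattern, the sketch below uses a small stand-in table rather than `dwellings_ml`. The frame `homes` and every column name in it are hypothetical, so treat it as one possible shape of the answer, not the answer for the class data.

```python
import pandas as pd

# Hypothetical stand-in for the class data; all names here are made up
homes = pd.DataFrame({
    "id": ["a1", "a2", "a3", "a4"],   # identifier: carries no predictive signal
    "sqft": [900, 1500, 2100, 1200],
    "bedrooms": [2, 3, 4, 3],
    "is_old": [1, 0, 0, 1],           # the column we want to predict
})

# Features: keep only the columns the model may learn from.
# The identifier and the target itself are left out.
x = homes.filter(["sqft", "bedrooms"])

# Target: the single column the model should predict
y = homes["is_old"]

print(list(x.columns))
print(y.name)
```

The general idea is that anything that identifies a row, or that directly reveals the target, should stay out of the features.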
-


Training a Classifier

Decision Tree Example

# create the model
-classifier = DecisionTreeClassifier()
+


Training a Classifier

Decision Tree Example


+#%%
+# Create a decision tree
+classifier_DT = DecisionTreeClassifier(max_depth = 4)
 
-# train the model
-classifier.fit(x_train, y_train)
+# Fit the decision tree
+classifier_DT.fit(x_train, y_train)
 
-# make predictions
-y_predictions = classifier.predict(x_test)
+# Test the decision tree (make predictions)
+y_predicted_DT = classifier_DT.predict(x_test)
+
+# Evaluate the decision tree
+print("Accuracy:", metrics.accuracy_score(y_test, y_predicted_DT))
 
-# test how accurate predictions are
-metrics.accuracy_score(y_test, y_predictions)
 

How to Improve Accuracy

To improve the accuracy of your model, you could:

  • Change what variables are used in the features (x) data set
  • Change what type of model you are using
  • Tune (aka, “change” or “tweak”) the parameters of the model
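
One way to try the third option is to fit the same model with several parameter values and compare test accuracy. The sketch below does this for `max_depth` on synthetic data from `make_classification`; the data set and the depth values `[2, 4, 8]` are assumptions chosen for illustration only.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used here only so the example is self-contained
x, y = make_classification(n_samples=500, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

# Fit one tree per depth and record its accuracy on the test set
scores = {}
for depth in [2, 4, 8]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(x_train, y_train)
    scores[depth] = metrics.accuracy_score(y_test, model.predict(x_test))

for depth, score in scores.items():
    print(f"max_depth={depth}: accuracy={score:.3f}")
```

A deeper tree is not automatically better: it can memorize the training rows and then do worse on the test rows.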

Other Classification Models

Here are some other models you could try.

from sklearn.naive_bayes import GaussianNB
 from sklearn.ensemble import RandomForestClassifier
 from sklearn.ensemble import GradientBoostingClassifier
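
These three classifiers share the same `fit` / `predict` interface as the decision tree, so they can be swapped in with little change. The sketch below compares them on synthetic data; the data set and the `n_estimators=50` setting are assumptions for illustration, not recommendations.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data, used here only so the example is self-contained
x, y = make_classification(n_samples=300, n_features=6, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=1
)

models = {
    "GaussianNB": GaussianNB(),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=1),
    "GradientBoosting": GradientBoostingClassifier(n_estimators=50, random_state=1),
}

# Same workflow for every model: fit, predict, score
results = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    results[name] = metrics.accuracy_score(y_test, model.predict(x_test))

for name, score in results.items():
    print(f"{name}: accuracy={score:.3f}")
```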