various build fixes
brownsarahm committed Oct 18, 2024
1 parent 790efb2 commit 073e102
Showing 7 changed files with 25 additions and 7 deletions.
2 changes: 1 addition & 1 deletion _toc.yml
@@ -68,4 +68,4 @@ parts:
title: Advice from FA2020 Students
- url: https://rhodyprog4ds.github.io/BrownFall21/letters/
title: Advice from FA2021 Students
- file: letters/index
# - file: letters/index
2 changes: 1 addition & 1 deletion assignments/04-prepare.md
@@ -5,7 +5,7 @@ __Due: 2023-10-03__

Eligible skills:
- prepare 1
- access 1
- access 2
- python 1,2


2 changes: 1 addition & 1 deletion assignments/06-evaluate.md
@@ -12,7 +12,7 @@ Eligible skills:

## Related notes

- [](../notes/2023-10-12)
- [](../notes/2024-10-10)
<!-- - [](../notes/2023-03-02) -->


2 changes: 1 addition & 1 deletion notes/2024-09-26.md
@@ -440,7 +440,7 @@ I would like to show a histogram here, but for some reason it broke. The output i
```

```{code-cell} ipython3
:tags:["hide-output"]
:tags: ["hide-input"]
pd.cut(coffee_df_bags['Number.of.Bags'],bins=3).hist()
```

18 changes: 17 additions & 1 deletion notes/2024-10-01.md
@@ -118,12 +118,20 @@ here I suppressed the output in class by looking only at the first few character
cs_people_html[:100]
```

<!--
```{code-cell} ipython3
:tags: ["hide-cell"]
# save html to a file to read it in as parts via notebook features
with open('cs_people.html','w') as f:
    f.write(cs_people_html)
```
```{literalinclude} https://web.uri.edu/cs/people/
```{literalinclude} cs_people.html
:start-at: Department Chair
:end-before: Directors
:lineno-match:
```
-->


But we do not need to manually write search tools; that's what [`BeautifulSoup`](https://beautiful-soup-4.readthedocs.io/en/latest/) is for.
@@ -396,3 +404,11 @@ Technically you could manually edit a copy of it.
Web scraping is *for* when the website is not in tabular form. It should be structured, but the structure does not need to come from a single page. It could be that there are many pages structured similarly and you build most of the columns from the other pages, not the starting page.

For example, from the [teams page of the NBA](https://www.nba.com/teams) you can get to a page with info about each team that includes all-time records and the current rosters. On these individual pages, most info is an actual table, so you can use `pd.read_html` for those, but the crawling part from the first page would count.
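
The crawl-then-collect pattern described above might look roughly like the sketch below. It is only an illustration under stated assumptions: the `PAGES` dictionary, its paths, and the team data are hypothetical stand-ins for downloaded pages, and the standard library's `html.parser` is used in place of `BeautifulSoup` and `pd.read_html` so the example runs without a network connection or extra packages.

```python
from html.parser import HTMLParser

# Hypothetical stand-in pages; a real crawl would download these over HTTP.
PAGES = {
    "/teams": '<a href="/team/a">Team A</a><a href="/team/b">Team B</a>',
    "/team/a": "<table><tr><th>wins</th><th>losses</th></tr>"
               "<tr><td>10</td><td>5</td></tr></table>",
    "/team/b": "<table><tr><th>wins</th><th>losses</th></tr>"
               "<tr><td>7</td><td>8</td></tr></table>",
}


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # the first text after an opening <a> is treated as the link text
        if self._href is not None:
            self.links.append((self._href, data.strip()))
            self._href = None


class TableCollector(HTMLParser):
    """Collect the text of each table cell, one list per row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1].append(data.strip())


def crawl(start):
    """Follow each link on the start page and build one record per linked page."""
    collector = LinkCollector()
    collector.feed(PAGES[start])
    records = []
    for href, name in collector.links:
        table = TableCollector()
        table.feed(PAGES[href])
        header, values = table.rows[0], table.rows[1]
        record = {"team": name}  # column taken from the starting page
        record.update(zip(header, values))  # columns taken from the linked page
        records.append(record)
    return records


records = crawl("/teams")
```

Here only the `team` column comes from the starting page; the remaining columns are built from the linked pages, which is the part that makes this scraping rather than a single table read.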


```{code-cell} ipython3
:tags: ["hide-cell"]
# delete temp file
import os
# os.remove('cs_people.html')
```
Empty file added notes/cs_people.html
Empty file.
6 changes: 4 additions & 2 deletions resources/glossary.md
@@ -13,9 +13,11 @@
[BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
a Python library used to assist in web scraping; it pulls data from HTML and XML files, which can then be parsed in a variety of ways using different methods.
conditional
a logical control to do something, conditioned on something else, for example `if`, `elif`, and `else`
corpus
(NLP) a set of documents for analysis
@@ -60,7 +62,7 @@ kernel
in the jupyter environment, [the kernel](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html#kernel) is a language specific computational engine
[lambda](https://docs.python.org/3.9/reference/expressions.html#lambda)
the keyword used to define an anonymous function; lambda functions are defined with a compact syntax `<name> = lambda <parameters>: <body> `
the keyword used to define an anonymous function; lambda functions are defined with a compact syntax `<name> = lambda <parameters>: <body>`
numpy array
a type provided by [numpy]() to represent matrices, used by `pd.DataFrame.values` [doc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.values.html) and accessed by `pd.DataFrame.to_numpy` [doc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy)
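
As a small illustration of the `lambda` entry above, the sketch below defines the same function twice, once with `def` and once with the compact lambda syntax, and then passes a lambda as a sort key. The names `add_one_def`, `add_one`, and `words` are made up for this example.

```python
# a regular named function
def add_one_def(x):
    return x + 1

# the same behavior using the compact lambda syntax
add_one = lambda x: x + 1

# lambdas are often passed directly as arguments, for example as a sort key
words = ["pear", "fig", "banana"]
by_length = sorted(words, key=lambda w: len(w))
```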
