-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Approx median fixes #20
base: main
Are you sure you want to change the base?
Conversation
Added a quick fix for #24 |
README.rst
Outdated
.. code-block:: python | ||
|
||
>>> moe_example = [ | ||
dict(min=math.nan, max=9999, n=6, moe=1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to advise leaving first min and the last max null, we need to be totally consistent and do that in every single example. I also think that we should the default built in None
type variable rather than nan
so that we're hewing as closely as possible to the standard Python library.
README.rst
Outdated
@@ -131,25 +131,27 @@ Approximating medians | |||
|
|||
Estimate a median and approximate the margin of error. Follows the U.S. Census Bureau's official guidelines for estimation. Useful for generating medians for measures like household income and age when aggregating census geographies. | |||
|
|||
Expects a list of dictionaries that divide the full range of data values into continuous categories. Each dictionary should have three keys: | |||
Expects a list of dictionaries that divide the full range of data values into continuous categories. Each dictionary should have three keys with an optional fourth key for margin of error inputs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we first describe this input. We need to be emphatically clear that the first min and the last max must be None
type objects and briefly explain why.
README.rst
Outdated
dict(min=200000, max=math.nan, n=18, moe=10) | ||
] | ||
>>> import numpy | ||
>>> numpy.random.seed(711355) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need the seed in example unless we are the point about reproducibility.
Find the value for the dataset you are estimating by referring to `the bureau's reference material <https://www.census.gov/programs-surveys/acs/technical-documentation/pums/documentation.html>`_. | ||
|
||
If you have an associated "jam values" for your dataset provided in the `American Community Survey's technical documentation <https://www.documentcloud.org/documents/6165752-2017-SummaryFile-Tech-Doc.html#document/p20/a508561>`_, input the pair as a list to the `jam_values` keyword argument. | ||
Then if the median falls in the first or last bin, the jam value will be returned instead of `None`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a jam values example here that does not use the simulation method.
README.rst
Outdated
] | ||
>>> import numpy | ||
>>> numpy.random.seed(711355) | ||
>>> census_data_aggregator.approximate_median(moe_example, design_factor=1, sampling_percentage=5*2.5, simulations=50, jam_values=[2499, 200001]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example probably doesn't need the jam value inputs
|
||
.. code-block:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we believe that the moe-based method is superior, we should make it the first and default example.
Deals with #17 and #18