Merge pull request #14 from DeepLcom/improved-docs

Improved docs

Somtom authored Nov 3, 2023
2 parents 7fdbb7a + 3aea102 commit 97c3c8e
Showing 103 changed files with 11,545 additions and 598 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -161,3 +161,9 @@ cython_debug/

# Google cloud test credentials
google-cloud-credentials.json

# Jupyter notebooks
*.ipynb

# MAC
.DS_Store
31 changes: 31 additions & 0 deletions Makefile
@@ -0,0 +1,31 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= poetry run sphinx-build
SOURCEDIR = ./docsource
BUILDDIR = ./docs/_build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Build the docs
build-docs-local:
	poetry run sphinx-apidoc ./src/sql_mock -o "$(SOURCEDIR)"
	@$(SPHINXBUILD) -M html "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

build-docs-github:
	@make build-docs-local
	@cp -a docs/_build/html/. ./docs

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


289 changes: 8 additions & 281 deletions README.md
@@ -6,13 +6,17 @@

The primary purpose of this library is to simplify the testing of SQL data models and queries by allowing users to mock input data and create tests for various scenarios. It provides a consistent and convenient way to test the execution of your query without the need to process a massive amount of data.

# Documentation

Full documentation can be found [on the documentation page](https://deeplcom.github.io/sql-mock/).

The library currently supports the following databases.
* BigQuery
* Clickhouse

## Installation

The library can be installed from [PyPI](https://pypi.org/project/sql-mock/) using pip:

```shell
# BigQuery
@@ -28,283 +32,6 @@ If you need to modify this source code, install the dependencies using poetry:
poetry install --all-extras
```


## Usage

### How it works

Before diving into specific database scenarios, let's start with a simplified example of how SQL Mock works behind the scenes.


1. You have an original SQL query, for instance:
```sql
-- path/to/query_for_result_table.sql
SELECT id FROM data.table1
```


2. Using SQL Mock, you define mock tables. You can use the built-in column types provided by SQL Mock. Available column types include `Int`, `String`, `Date`, and more. Each database has its own column types. Define your tables by subclassing a mock table class that fits your database (e.g. `BigQueryMockTable`) and specifying the column types along with default values. In our example we use the `ClickHouseTableMock` class.
```python
from sql_mock.clickhouse import column_mocks as col
from sql_mock.clickhouse.table_mocks import ClickHouseTableMock, table_meta


@table_meta(table_ref='data.table1')
class Table(ClickHouseTableMock):
    id = col.Int(default=1)
    name = col.String(default='Peter')


@table_meta(table_ref='data.result_table', query_path='path/to/query_for_result_table.sql')
class ResultTable(ClickHouseTableMock):
    id = col.Int(default=1)
```

3. **Creating mock data:** Define mock data for your tables using dictionaries. Each dictionary represents a row in the table, with keys corresponding to column names. Table column keys that don't get a value will use the default.
```python
user_data = [
    {},  # This will use the defaults for both id and name
    {'id': 2, 'name': 'Martin'},
    {'id': 3},  # This will use the default for name
]

input_table_mock = Table.from_dicts(user_data)
```
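The defaulting behaviour above can be pictured as a simple dict merge. A minimal stdlib sketch (the `fill_defaults` helper and `DEFAULTS` mapping are illustrative only, not part of SQL Mock's API):

```python
# Illustrative only: SQL Mock's column classes handle the defaulting internally.
DEFAULTS = {'id': 1, 'name': 'Peter'}


def fill_defaults(rows, defaults):
    """Merge each partial row with the column defaults (row values win)."""
    return [{**defaults, **row} for row in rows]


user_data = [{}, {'id': 2, 'name': 'Martin'}, {'id': 3}]
print(fill_defaults(user_data, DEFAULTS))
```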
4. **Getting results for a table mock:** Use the `from_mocks` method of the table mock object to generate mock query results based on your mock data.
```python
res = ResultTable.from_mocks(input_data=[input_table_mock])
```
5. Behind the scenes, SQL Mock replaces table references (e.g. `data.table1`) in your query with Common Table Expressions (CTEs) filled with the mocked data. The result can roughly be compared to something like this:
```sql
WITH data__table1 AS (
    -- Mocked inputs
    SELECT
        cast('1' AS 'String') AS id,
        cast('Peter' AS 'String') AS name
    UNION ALL
    SELECT
        cast('2' AS 'String') AS id,
        cast('Martin' AS 'String') AS name
    UNION ALL
    SELECT
        cast('3' AS 'String') AS id,
        cast('Peter' AS 'String') AS name
),

result AS (
    -- Original query with replaced references
    SELECT id FROM data__table1
)

SELECT
    cast(id AS 'String') AS id
FROM result
```
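The substitution itself can be pictured as a plain string replacement. A minimal sketch, assuming the CTE name is derived by replacing `.` with `__` (the `mock_query` helper is illustrative, not SQL Mock's internals):

```python
def mock_query(query, table_ref, rows):
    """Replace a table reference with a CTE of mocked rows (illustrative sketch)."""
    cte_name = table_ref.replace('.', '__')
    selects = [
        'SELECT ' + ', '.join(f"cast('{v}' AS 'String') AS {k}" for k, v in row.items())
        for row in rows
    ]
    cte = f"WITH {cte_name} AS (\n" + "\nUNION ALL\n".join(selects) + "\n)\n"
    # Point the original query at the mocked CTE instead of the real table.
    return cte + query.replace(table_ref, cte_name)


print(mock_query("SELECT id FROM data.table1", "data.table1", [{"id": 1}, {"id": 2}]))
```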
6. Finally, you can compare your results to some expected results using the `assert_equal` method.
```python
expected = [{'id': '1'}, {'id': '2'}, {'id': '3'}]
res.assert_equal(expected)
```
### Defining your table mocks

When you want to provide mocked data to test your SQL model, you need to create mock table classes for all upstream data that your model uses, as well as for the model you want to test. Those mock tables can be created by inheriting from the `BaseMockTable` class of the database provider you are using (e.g. `BigQueryMockTable`).

**We recommend keeping a central `models.py` file where you create those models so that you can easily reuse them across your tests.**
```python
# models.py
from sql_mock.bigquery import column_mocks as col
from sql_mock.bigquery.table_mocks import BigQueryMockTable, table_meta


# The models you are going to use as inputs need to have a `table_ref` specified
@table_meta(table_ref='data.table1')
class Table(BigQueryMockTable):
    id = col.Int(default=1)
    name = col.String(default='Peter')


@table_meta(table_ref='data.result_table', query_path='path/to/query_for_result_table.sql')
class ResultTable(BigQueryMockTable):
    id = col.Int(default=1)
```
Some important things to mention:

#### The models you are going to use as inputs need to have a `table_ref` specified.

The `table_ref` is how the table is referenced in your production database (usually a pattern like `<schema>.<table>`).
#### The model needs to have a query.

There are currently two ways to provide a query to the model:

1. Pass a path to your query file in the class definition using the `table_meta` decorator. This allows you to specify it only once.
```python
@table_meta(table_ref='data.result_table', query_path='path/to/query_for_result_table.sql')
class ResultTable(BigQueryMockTable):
...
```
2. Pass it as the `query` argument to the `from_mocks` method when you use the model in your test. This also overrides whatever query was read from the `query_path` in the `table_meta` decorator.
### Assert a specific result for a CTE

When we build more complicated data models, they often include a number of CTEs that map to separate logical steps.
When we unit test such models, we want to check not only the final result but also the intermediate steps.
To do this, you can use the `assert_cte_equal` method.

Let's assume we have the following query:

```sql
WITH subscriptions_per_user AS (
    SELECT
        count(sub.user_id) AS subscription_count,
        users.user_id
    FROM data.users AS users
    LEFT JOIN data.subscriptions AS sub ON sub.user_id = users.user_id
    GROUP BY user_id
),

users_with_multiple_subs AS (
    SELECT
        *
    FROM subscriptions_per_user
    WHERE subscription_count >= 2
)

SELECT user_id FROM users_with_multiple_subs
```

Now we can test the CTE logic separately like this:

```python
import datetime

from sql_mock.bigquery import column_mocks as col
from sql_mock.bigquery.table_mocks import BigQueryMockTable
from sql_mock.table_mocks import table_meta


@table_meta(table_ref="data.users")
class UserTable(BigQueryMockTable):
    user_id = col.Int(default=1)
    user_name = col.String(default="Mr. T")


@table_meta(table_ref="data.subscriptions")
class SubscriptionTable(BigQueryMockTable):
    subscription_id = col.Int(default=1)
    period_start_date = col.Date(default=datetime.date(2023, 9, 5))
    period_end_date = col.Date(default=datetime.date(2023, 9, 5))
    user_id = col.Int(default=1)


@table_meta(query_path="./examples/test_query.sql")
class MultipleSubscriptionUsersTable(BigQueryMockTable):
    user_id = col.Int(default=1)


def test_model():
    users = UserTable.from_dicts([{"user_id": 1}, {"user_id": 2}])
    subscriptions = SubscriptionTable.from_dicts(
        [
            {"subscription_id": 1, "user_id": 1},
            {"subscription_id": 2, "user_id": 1},
            {"subscription_id": 2, "user_id": 2},
        ]
    )

    subscriptions_per_user__expected = [
        {"user_id": 1, "subscription_count": 2},
        {"user_id": 2, "subscription_count": 1},
    ]
    users_with_multiple_subs__expected = [{"user_id": 1, "subscription_count": 2}]
    end_result__expected = [{"user_id": 1}]

    res = MultipleSubscriptionUsersTable.from_mocks(input_data=[users, subscriptions])

    # Check the results of the subscriptions_per_user CTE
    res.assert_cte_equal('subscriptions_per_user', subscriptions_per_user__expected)
    # Check the results of the users_with_multiple_subs CTE
    res.assert_cte_equal('users_with_multiple_subs', users_with_multiple_subs__expected)
    # Check the end result
    res.assert_equal(end_result__expected)
```

### Recommended Setup for Pytest
If you are using pytest, make sure to add a `conftest.py` file to the root of your project.
In the file add the following lines:
```python
import pytest

pytest.register_assert_rewrite('sql_mock')
```
This allows you to get a rich comparison when using the `.assert_equal` method on the table mock instances.

We also recommend using [pytest-icdiff](https://github.com/hjwp/pytest-icdiff) for better visibility on diffs of failed tests.

### Examples
You can find some examples in the [examples folder](examples/).


## FAQ

### My database system is not supported yet but I want to use SQL Mock. What should I do?

We are planning to add more and more supported database systems. However, if your system is not supported yet, you can still use SQL Mock. There are only two things you need to do:

#### Create your `MockTable` class

First, you need to create a `MockTable` class for your database system that inherits from `sql_mock.table_mocks.BaseMockTable`.

That class needs to implement the `_get_results` method, which should fetch the results of a query (e.g. produced by `self._generate_query()`) and return them as a list of dictionaries.

Look at one of the existing client libraries to see how this could work (e.g. [BigQueryMockTable](https://github.com/DeepLcom/sql-mock/blob/main/src/sql_mock/bigquery/table_mocks.py)).

You might want to create a settings class as well in case you need some specific connection settings to be available within the `_get_results` method.
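To illustrate the contract `_get_results` has to fulfil, here is a self-contained sketch with a stand-in base class and a fake client. None of these names come from sql_mock; in real code you would inherit from `sql_mock.table_mocks.BaseMockTable` and talk to your actual database client:

```python
# Self-contained sketch -- the base class and client below are stand-ins,
# not sql_mock's real implementation.
class FakeBaseMockTable:
    def _generate_query(self):
        # In the real library, this renders the query with mocked CTEs.
        return "SELECT 1 AS id"


class FakeClient:
    def execute(self, query):
        # Pretend we ran the query and got one row back.
        return [{"id": 1}]


class MyDatabaseMockTable(FakeBaseMockTable):
    """What a new database integration needs to provide."""

    client = FakeClient()

    def _get_results(self):
        # Run the generated query and return rows as a list of dicts.
        return self.client.execute(self._generate_query())


print(MyDatabaseMockTable()._get_results())
```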

#### Create your `ColumnMocks`

Your database system might support specific database types. In order to make them available as column types, you can use the `sql_mock.column_mocks.ColumnMock` class as a base and inherit your specific column types from it.
For most of your column mocks you might only need to specify the `dtype` that should be used to parse the inputs.

A good practice is to create a `ColumnMock` class that is specific to your database and inherit all your column types from it, e.g.:

```python
from sql_mock.column_mocks import ColumnMock


class MyFancyDatabaseColumnMock(ColumnMock):
    # In case you need some specific logic that overwrites the default behavior, you can do so here
    pass


class Int(MyFancyDatabaseColumnMock):
    dtype = "Integer"


class String(MyFancyDatabaseColumnMock):
    dtype = "String"
```
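Conceptually, the `dtype` ends up in the cast expression that SQL Mock generates for each mocked value. A rough stdlib sketch (the `to_cast_expr` helper is illustrative, not part of the library):

```python
def to_cast_expr(value, dtype, alias):
    """Render a mocked value as a cast expression (illustrative sketch)."""
    return f"cast('{value}' AS {dtype}) AS {alias}"


# e.g. an Int column with default 1, mocked for a column named `id`
print(to_cast_expr(1, "Integer", "id"))
```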

#### Contribute your database setup

There will definitely be folks in the community who need support for the database system you just created the setup for.
Feel free to create a PR on this repository so that we can start supporting your database system!


### I am missing a specific ColumnMock type for my model fields

We implemented some basic column types, but it could happen that you don't find the one you need.
Luckily, you can easily create those with the tools provided.
The only thing you need to do is inherit from the `ColumnMock` class that is specific to your database system (e.g. `BigQueryColumnMock`) and write classes for the column mocks you are missing. Usually you only need to set the correct `dtype`, which is later used in the `cast(col AS <dtype>)` expression.
```python
# Replace the import with the database system you are using
from sql_mock.bigquery.column_mock import BigQueryColumnMock


class MyFancyMissingColType(BigQueryColumnMock):
    dtype = "FancyMissingColType"

    # In case you need to implement additional logic for casting, you can do so here
    ...
```
**Don't forget to create a PR if you feel that your column mock type could be useful for the community!**


## Contributing

We welcome contributions to improve and enhance this open-source project. Whether you want to report issues, suggest new features, or directly contribute to the codebase, your input is valuable. To ensure a smooth and collaborative experience for both contributors and maintainers, please follow these guidelines:
4 changes: 4 additions & 0 deletions docs/.buildinfo
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 6c6eb42f440f1f4c07a591de3a56995e
tags: 645f666f9bcd5a90fca523b33c5a78b7
Empty file added docs/.nojekyll
Empty file.
