Convert object storage connector docs to md
mosabua committed Aug 22, 2023
1 parent 0663f67 commit 79931e7
Showing 17 changed files with 2,374 additions and 2,388 deletions.


16 changes: 16 additions & 0 deletions docs/src/main/sphinx/connector/hive-alluxio.md
# Hive connector with Alluxio

The {doc}`hive` can read and write tables stored in the [Alluxio Data Orchestration
System](https://www.alluxio.io/),
leveraging Alluxio's distributed block-level read/write caching functionality.
The tables must be created in the Hive metastore with the `alluxio://`
location prefix (see [Running Apache Hive with Alluxio](https://docs.alluxio.io/os/user/stable/en/compute/Hive.html)
for details and examples).

Trino queries then transparently retrieve and cache files or objects from
disparate storage systems, including HDFS and S3.
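
As a hedged illustration of the `alluxio://` location requirement, the following
sketch maps existing ORC data through the Hive catalog; the catalog name `hive`,
the schema, table, and column names, and the Alluxio master address
`alluxio-master:19998` are assumptions for the example, not values from this page.

```
-- Hypothetical example: catalog, schema, columns, and the Alluxio master
-- address are assumptions; adjust them to your environment.
CREATE TABLE hive.default.events (
    event_id BIGINT,
    event_time TIMESTAMP,
    payload VARCHAR
)
WITH (
    external_location = 'alluxio://alluxio-master:19998/data/events',
    format = 'ORC'
);
```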

## Setting up Alluxio with Trino

For information on how to set up, configure, and use Alluxio, refer to [Alluxio's
documentation on using their platform with Trino](https://docs.alluxio.io/ee/user/stable/en/compute/Trino.html).
21 changes: 0 additions & 21 deletions docs/src/main/sphinx/connector/hive-alluxio.rst

This file was deleted.

# Hive connector with Azure Storage

The {doc}`hive` can be configured to use [Azure Data Lake Storage (Gen2)](https://azure.microsoft.com/products/storage/data-lake-storage/). Trino
supports Azure Blob File System (ABFS) to access data in ADLS Gen2.

Trino also supports [ADLS Gen1](https://learn.microsoft.com/azure/data-lake-store/data-lake-store-overview)
and Windows Azure Storage Blob driver (WASB), but we recommend [migrating to
ADLS Gen2](https://learn.microsoft.com/azure/storage/blobs/data-lake-storage-migrate-gen1-to-gen2-azure-portal),
as ADLS Gen1 and WASB are legacy options that will be removed in the future.
Learn more from [the official documentation](https://docs.microsoft.com/azure/data-lake-store/data-lake-store-overview).

## Hive connector configuration for Azure Storage credentials

To configure Trino to use the Azure Storage credentials, set the following
configuration properties in the catalog properties file. It is best to use this
type of configuration for the exact type of storage your catalog uses. The
specific configuration depends on the type of storage and uses the
properties from the following sections in the catalog properties file.

For more complex use cases, such as configuring multiple secondary storage
accounts using Hadoop's `core-site.xml`, see the
{ref}`hive-azure-advanced-config` options.

### ADLS Gen2 / ABFS storage

To connect to ABFS storage, you may use either the storage account's access
key or a service principal. Do not use both sets of properties at the
same time.

```{eval-rst}
.. list-table:: ABFS Access Key
  :widths: 30, 70
  :header-rows: 1

  * - Property name
    - Description
  * - ``hive.azure.abfs-storage-account``
    - The name of the ADLS Gen2 storage account
  * - ``hive.azure.abfs-access-key``
    - The decrypted access key for the ADLS Gen2 storage account
```

```{eval-rst}
.. list-table:: ABFS Service Principal OAuth
  :widths: 30, 70
  :header-rows: 1

  * - Property name
    - Description
  * - ``hive.azure.abfs.oauth.endpoint``
    - The service principal's OAuth 2.0 token endpoint.
  * - ``hive.azure.abfs.oauth.client-id``
    - The service principal's client/application ID.
  * - ``hive.azure.abfs.oauth.secret``
    - A client secret for the service principal.
```

When using a service principal, it must have the Storage Blob Data Owner,
Contributor, or Reader role on the storage account you are using, depending on
which operations you would like to use.
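
For example, a minimal catalog properties sketch for the access key approach
might look like the following; the `connector.name` and `hive.metastore.uri`
lines are assumed boilerplate for a Hive catalog, and the account name and key
are placeholder values.

```text
connector.name=hive
hive.metastore.uri=thrift://example-metastore:9083
hive.azure.abfs-storage-account=abfsexample
hive.azure.abfs-access-key=examplekey...
```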

### ADLS Gen1 (legacy)

While it is advised to migrate to ADLS Gen2 whenever possible, if you still
choose to use ADLS Gen1, you need to include the following properties in your
catalog configuration.

:::{note}
Credentials for the filesystem can be configured using `ClientCredential`
type. To authenticate with ADLS Gen1 you must create a new application
secret for your ADLS Gen1 account's App Registration, and save this value
because you won't be able to retrieve the key later. Refer to the Azure
[documentation](https://docs.microsoft.com/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory)
for details.
:::

```{eval-rst}
.. list-table:: ADLS properties
  :widths: 30, 70
  :header-rows: 1

  * - Property name
    - Description
  * - ``hive.azure.adl-proxy-host``
    - Proxy host and port in ``host:port`` format. Use this property to connect
      to an ADLS endpoint via a SOCKS proxy.
```
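
As a sketch, assuming the ADLS Gen1 credentials are configured separately,
routing requests through a SOCKS proxy only requires the proxy property; the
host and port below are placeholders.

```text
hive.azure.adl-proxy-host=socks-proxy.example.com:1080
```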

### WASB storage (legacy)

```{eval-rst}
.. list-table:: WASB properties
  :widths: 30, 70
  :header-rows: 1

  * - Property name
    - Description
  * - ``hive.azure.wasb-storage-account``
    - Storage account name of Azure Blob Storage
  * - ``hive.azure.wasb-access-key``
    - The decrypted access key for the Azure Blob Storage
```
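
As a sketch only, the WASB properties might be set in the catalog like this;
both values are placeholders.

```text
hive.azure.wasb-storage-account=wasbexample
hive.azure.wasb-access-key=examplekey...
```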

(hive-azure-advanced-config)=

### Advanced configuration

All of the configuration properties for the Azure storage driver are stored in
the Hadoop `core-site.xml` configuration file. When there are secondary
storage accounts involved, we recommend configuring Trino using a
`core-site.xml` containing the appropriate credentials for each account.

The path to the file must be configured in the catalog properties file:

```text
hive.config.resources=<path_to_hadoop_core-site.xml>
```

One way to find your account key is to ask for the connection string for the
storage account. The `abfsexample.dfs.core.windows.net` account refers to the
storage account. The connection string contains the account key:

```text
az storage account show-connection-string --name abfsexample
{
  "connectionString": "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=abfsexample;AccountKey=examplekey..."
}
```

When you have the account access key, you can add it to your `core-site.xml`
or Java cryptography extension (JCEKS) file. Alternatively, you can have your
cluster management tool set the option
`fs.azure.account.key.STORAGE-ACCOUNT` to the account key value:

```text
<property>
  <name>fs.azure.account.key.abfsexample.dfs.core.windows.net</name>
  <value>examplekey...</value>
</property>
```
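
If you prefer the JCEKS option mentioned above, a hedged sketch using the Hadoop
credential CLI might look like the following; the keystore path is a placeholder
and the exact flags may vary by Hadoop version, so treat this as an assumption
rather than a verified command. The provider then needs to be referenced from
`core-site.xml` through the `hadoop.security.credential.provider.path` property.

```text
# Hypothetical sketch: store the ABFS account key in a JCEKS keystore
hadoop credential create fs.azure.account.key.abfsexample.dfs.core.windows.net \
    -provider jceks://file/etc/trino/azure.jceks \
    -value examplekey...
```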

For more information, see [Hadoop Azure Support: ABFS](https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html).

## Accessing Azure Storage data

### URI scheme to reference data

Consistent with other FileSystem implementations within Hadoop, the Azure
Standard Blob and Azure Data Lake Storage Gen2 (ABFS) drivers define their own
URI schemes for addressing resources. Following are example URIs for the
different systems.

ABFS URI:

```text
abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/<path>/<file_name>
```
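
For instance, assuming a hypothetical `datasets` container in the `abfsexample`
account used elsewhere on this page, a resolved URI using the secure `abfss`
variant could look like:

```text
abfss://datasets@abfsexample.dfs.core.windows.net/warehouse/orders/part-00000.orc
```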

ADLS Gen1 URI:

```text
adl://<data_lake_storage_gen1_name>.azuredatalakestore.net/<path>/<file_name>
```

Azure Standard Blob URI:

```text
wasb[s]://<container>@<account_name>.blob.core.windows.net/<path>/<path>/<file_name>
```

### Querying Azure Storage

You can query tables already configured in the Hive metastore used by your Hive
catalog. To access Azure Storage data that is not yet mapped in the Hive
metastore, you need to provide the schema of the data, the file format, and the
data location.

For example, if you have ORC or Parquet files in an ABFS `file_system`, you
need to execute a query:

```
-- select schema in which the table is to be defined, must already exist
USE hive.default;

-- create table
CREATE TABLE orders (
    orderkey BIGINT,
    custkey BIGINT,
    orderstatus VARCHAR(1),
    totalprice DOUBLE,
    orderdate DATE,
    orderpriority VARCHAR(15),
    clerk VARCHAR(15),
    shippriority INTEGER,
    comment VARCHAR(79)
) WITH (
    external_location = 'abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/<path>/',
    format = 'ORC' -- or 'PARQUET'
);
```

Now you can query the newly mapped table:

```
SELECT * FROM orders;
```

## Writing data

### Prerequisites

Before you attempt to write data to Azure Storage, make sure you have configured
everything necessary to read data from the storage.

### Create a write schema

If the Hive metastore contains schema(s) mapped to Azure storage filesystems,
you can use them to write data to Azure storage.

If you don't want to use existing schemas, or there are no appropriate schemas
in the Hive metastore, you need to create a new one:

```
CREATE SCHEMA hive.abfs_export
WITH (location = 'abfs[s]://file_system@account_name.dfs.core.windows.net/<path>');
```

### Write data to Azure Storage

Once you have a schema pointing to a location where you want to write the data,
you can issue a `CREATE TABLE AS` statement and select your desired file
format. The data will be written to one or more files within the
`abfs[s]://file_system@account_name.dfs.core.windows.net/<path>/my_table`
namespace. Example:

```
CREATE TABLE hive.abfs_export.orders_abfs
WITH (format = 'ORC')
AS SELECT * FROM tpch.sf1.orders;
```