[feature-16396] Tencent Cloud COS Storage Plugin
Mighten committed Sep 21, 2024
1 parent c989973 commit 696dcf0
Showing 17 changed files with 586 additions and 5 deletions.
20 changes: 19 additions & 1 deletion docs/docs/en/guide/resource/configuration.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# Resource Center Configuration

- You could use `Resource Center` to upload text files and other task-related files.
- You could configure `Resource Center` to use distributed file system like [Hadoop](https://hadoop.apache.org/docs/r2.7.0/) (2.6+), [MinIO](https://github.com/minio/minio) cluster or remote storage products like [AWS S3](https://aws.amazon.com/s3/), [Alibaba Cloud OSS](https://www.aliyun.com/product/oss), [Huawei Cloud OBS](https://support.huaweicloud.com/obs/index.html) etc.
- You could configure `Resource Center` to use distributed file system like [Hadoop](https://hadoop.apache.org/docs/r2.7.0/) (2.6+), [MinIO](https://github.com/minio/minio) cluster or remote storage products like [AWS S3](https://aws.amazon.com/s3/), [Alibaba Cloud OSS](https://www.aliyun.com/product/oss), [Huawei Cloud OBS](https://support.huaweicloud.com/obs/index.html), [Tencent Cloud COS](https://cloud.tencent.com/product/cos), etc.
- You could configure `Resource Center` to use local file system. If you deploy `DolphinScheduler` in `Standalone` mode, you could configure it to use local file system for `Resource Center` without the need of an external `HDFS` system or `S3`.
- Furthermore, if you deploy `DolphinScheduler` in `Cluster` mode, you could use [S3FS-FUSE](https://github.com/s3fs-fuse/s3fs-fuse) to mount `S3` or [JINDO-FUSE](https://help.aliyun.com/document_detail/187410.html) to mount `OSS` to your machines and use the local file system for `Resource Center`. In this way, you could operate remote files as if on your local machines.

@@ -96,3 +96,21 @@ resource.huawei.cloud.obs.endpoint=obs.cn-southwest-2.huaweicloud.com
> * If you want to use the resource upload function, the deployment user in [installation and deployment](../installation/standalone.md) must have the relevant operation permissions.
> * If you are using a Hadoop cluster with HA, you need to enable HDFS resource upload and copy the `core-site.xml` and `hdfs-site.xml` from the Hadoop cluster to `worker-server/conf` and `api-server/conf`; otherwise, skip this copy step.
## connect COS

If you want to upload resources to a `Resource Center` backed by `COS`, you need to configure `api-server/conf/common.properties` and `worker-server/conf/common.properties`. You can refer to the following:

Configure the following fields:

```properties
# access key id, required if you set resource.storage.type=COS
resource.tencent.cloud.access.key.id=<your-access-key-id>
# access key secret, required if you set resource.storage.type=COS
resource.tencent.cloud.access.key.secret=<your-access-key-secret>
# cos bucket name, required if you set resource.storage.type=COS
resource.tencent.cloud.cos.bucket.name=dolphinscheduler
# cos bucket region, required if you set resource.storage.type=COS, refer to https://cloud.tencent.com/document/product/436/6224
resource.tencent.cloud.cos.region=ap-nanjing
```
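As a quick sanity check, the four required keys can be validated before starting the servers. The sketch below is illustrative only (the `CosConfigCheck` class and its method are not part of DolphinScheduler); it loads a properties snippet with `java.util.Properties` and reports any missing COS keys:

```java
import java.io.StringReader;
import java.util.List;
import java.util.Properties;

public class CosConfigCheck {

    // The four keys that must be set when resource.storage.type=COS
    static final List<String> REQUIRED_COS_KEYS = List.of(
            "resource.tencent.cloud.access.key.id",
            "resource.tencent.cloud.access.key.secret",
            "resource.tencent.cloud.cos.bucket.name",
            "resource.tencent.cloud.cos.region");

    // Returns the required COS keys that are absent or blank in props.
    static List<String> missingCosKeys(Properties props) {
        return REQUIRED_COS_KEYS.stream()
                .filter(key -> props.getProperty(key, "").isBlank())
                .toList();
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // resource.tencent.cloud.cos.region is deliberately omitted here
        props.load(new StringReader(
                "resource.storage.type=COS\n"
                        + "resource.tencent.cloud.access.key.id=id\n"
                        + "resource.tencent.cloud.access.key.secret=secret\n"
                        + "resource.tencent.cloud.cos.bucket.name=dolphinscheduler\n"));
        // prints [resource.tencent.cloud.cos.region]
        System.out.println(missingCosKeys(props));
    }
}
```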

18 changes: 17 additions & 1 deletion docs/docs/zh/guide/resource/configuration.md
@@ -1,7 +1,7 @@
# Resource Center Configuration

- The Resource Center is typically used for operations such as uploading files and managing task groups.
- The Resource Center can connect to distributed file storage systems such as [Hadoop](https://hadoop.apache.org/docs/r2.7.0/) (2.6+) or a [MinIO](https://github.com/minio/minio) cluster, or to remote object storage such as [AWS S3](https://aws.amazon.com/s3/), [Alibaba Cloud OSS](https://www.aliyun.com/product/oss), [Huawei Cloud OBS](https://support.huaweicloud.com/obs/index.html), etc.
- The Resource Center can connect to distributed file storage systems such as [Hadoop](https://hadoop.apache.org/docs/r2.7.0/) (2.6+) or a [MinIO](https://github.com/minio/minio) cluster, or to remote object storage such as [AWS S3](https://aws.amazon.com/s3/), [Alibaba Cloud OSS](https://www.aliyun.com/product/oss), [Huawei Cloud OBS](https://support.huaweicloud.com/obs/index.html), [Tencent Cloud COS](https://cloud.tencent.com/product/cos), etc.
- The Resource Center can also connect directly to the local file system. In standalone mode, you can conveniently use the local file system without depending on an external storage system such as `Hadoop` or `S3`.
- In addition, in cluster-mode deployments, you can use [S3FS-FUSE](https://github.com/s3fs-fuse/s3fs-fuse) to mount `S3`, or [JINDO-FUSE](https://help.aliyun.com/document_detail/187410.html) to mount `OSS`, to the local file system, and then operate on files in remote object storage by pointing the Resource Center at the local file system.

@@ -90,3 +90,19 @@ resource.huawei.cloud.obs.endpoint=obs.cn-southwest-2.huaweicloud.com
> * If you use the resource upload function, the deployment user in [installation and deployment](../installation/standalone.md) must have the relevant operation permissions.
> * If the NameNode of the Hadoop cluster is configured with HA, you need to enable HDFS resource upload and copy the `core-site.xml` and `hdfs-site.xml` from the Hadoop cluster to `worker-server/conf` and `api-server/conf`; skip this step if NameNode HA is not enabled.
## Connect Tencent Cloud COS

If you want to upload resources to the Resource Center via COS, you need to configure the following files: `api-server/conf/common.properties` and `worker-server/conf/common.properties`. You can refer to the following:

```properties
# access key id, required if you set resource.storage.type=COS
resource.tencent.cloud.access.key.id=<your-access-key-id>
# access key secret, required if you set resource.storage.type=COS
resource.tencent.cloud.access.key.secret=<your-access-key-secret>
# cos bucket name, required if you set resource.storage.type=COS
resource.tencent.cloud.cos.bucket.name=dolphinscheduler
# cos bucket region, required if you set resource.storage.type=COS, refer to https://cloud.tencent.com/document/product/436/6224
resource.tencent.cloud.cos.region=ap-nanjing
```

17 changes: 17 additions & 0 deletions dolphinscheduler-bom/pom.xml
@@ -117,6 +117,7 @@
<azure-sdk-bom.version>1.2.10</azure-sdk-bom.version>
<protobuf.version>3.17.2</protobuf.version>
<esdk-obs.version>3.23.3</esdk-obs.version>
<qcloud-cos.version>5.6.231</qcloud-cos.version>
<system-lambda.version>1.2.1</system-lambda.version>
<zeppelin-client.version>0.10.1</zeppelin-client.version>
<testcontainer.version>1.19.3</testcontainer.version>
@@ -934,6 +935,22 @@
</exclusions>
</dependency>

<dependency>
<groupId>com.qcloud</groupId>
<artifactId>cos_api</artifactId>
<version>${qcloud-cos.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>com.github.stefanbirkner</groupId>
<artifactId>system-lambda</artifactId>
@@ -124,6 +124,9 @@ public final class Constants {
public static final String HUAWEI_CLOUD_OBS_BUCKET_NAME = "resource.huawei.cloud.obs.bucket.name";
public static final String HUAWEI_CLOUD_OBS_END_POINT = "resource.huawei.cloud.obs.endpoint";

public static final String TENCENT_CLOUD_COS_BUCKET_NAME = "resource.tencent.cloud.cos.bucket.name";
public static final String TENCENT_CLOUD_COS_REGION = "resource.tencent.cloud.cos.region";

/**
* fetch applicationId way
*/
12 changes: 11 additions & 1 deletion dolphinscheduler-common/src/main/resources/common.properties
@@ -21,7 +21,7 @@ data.basedir.path=/tmp/dolphinscheduler
# resource view suffixs
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js

# resource storage type: LOCAL, HDFS, S3, OSS, GCS, ABS, OBS. LOCAL type is default type, and it's a specific type of HDFS with "resource.hdfs.fs.defaultFS = file:///" configuration
# resource storage type: LOCAL, HDFS, S3, OSS, GCS, ABS, OBS, COS. LOCAL type is default type, and it's a specific type of HDFS with "resource.hdfs.fs.defaultFS = file:///" configuration
# please note that LOCAL mode does not support reading and writing in distributed mode, which means you can only use your resources on one machine, unless you
# use a shared file mount point
resource.storage.type=LOCAL
@@ -73,6 +73,16 @@ resource.huawei.cloud.obs.bucket.name=dolphinscheduler
resource.huawei.cloud.obs.endpoint=obs.cn-southwest-2.huaweicloud.com


# tencent cloud access key id, required if you set resource.storage.type=COS
resource.tencent.cloud.access.key.id=<your-access-key-id>
# tencent cloud access key secret, required if you set resource.storage.type=COS
resource.tencent.cloud.access.key.secret=<your-access-key-secret>
# cos bucket name, required if you set resource.storage.type=COS
resource.tencent.cloud.cos.bucket.name=dolphinscheduler
# cos bucket region, required if you set resource.storage.type=COS, see: https://cloud.tencent.com/document/product/436/6224
resource.tencent.cloud.cos.region=ap-nanjing
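As a rough illustration of how the region value is used: Tencent COS service endpoints follow the `cos.<region>.myqcloud.com` scheme (see the region list linked in the comment above). The helper below is a hypothetical sketch, not DolphinScheduler code:

```java
public class CosEndpoint {

    // Builds the COS service endpoint host for a region such as
    // "ap-nanjing", following the cos.<region>.myqcloud.com scheme.
    static String endpointFor(String region) {
        if (region == null || region.isBlank()) {
            throw new IllegalArgumentException(
                    "resource.tencent.cloud.cos.region must be set when resource.storage.type=COS");
        }
        return "cos." + region + ".myqcloud.com";
    }

    public static void main(String[] args) {
        // prints cos.ap-nanjing.myqcloud.com
        System.out.println(endpointFor("ap-nanjing"));
    }
}
```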


# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
resource.hdfs.root.user=hdfs
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
13 changes: 12 additions & 1 deletion dolphinscheduler-common/src/test/resources/common.properties
@@ -27,7 +27,7 @@ data.basedir.path=/tmp/dolphinscheduler
# resource view suffixs
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js

# resource storage type: LOCAL, HDFS, S3, OSS, GCS, ABS, OBS. LOCAL type is default type, and it's a specific type of HDFS with "resource.hdfs.fs.defaultFS = file:///" configuration
# resource storage type: LOCAL, HDFS, S3, OSS, GCS, ABS, OBS, COS. LOCAL type is default type, and it's a specific type of HDFS with "resource.hdfs.fs.defaultFS = file:///" configuration
# please note that LOCAL mode does not support reading and writing in distributed mode, which means you can only use your resources on one machine, unless you
# use a shared file mount point
resource.storage.type=LOCAL
@@ -83,6 +83,17 @@ resource.azure.blob.storage.account.name=<your-account-name>
# abs connection string, required if you set resource.storage.type=ABS
resource.azure.blob.storage.connection.string=<your-connection-string>


# tencent cloud access key id, required if you set resource.storage.type=COS
resource.tencent.cloud.access.key.id=<your-access-key-id>
# tencent cloud access key secret, required if you set resource.storage.type=COS
resource.tencent.cloud.access.key.secret=<your-access-key-secret>
# cos bucket name, required if you set resource.storage.type=COS
resource.tencent.cloud.cos.bucket.name=dolphinscheduler
# cos bucket region, required if you set resource.storage.type=COS, see: https://cloud.tencent.com/document/product/436/6224
resource.tencent.cloud.cos.region=ap-nanjing


# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
resource.hdfs.root.user=hdfs
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
1 change: 1 addition & 0 deletions dolphinscheduler-dist/release-docs/LICENSE
@@ -713,6 +713,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
azure-core-management 1.10.1: https://mvnrepository.com/artifact/com.azure/azure-core-management/1.10.1, MIT
azure-storage-blob 12.21.0: https://mvnrepository.com/artifact/com.azure/azure-storage-blob/12.21.0, MIT
azure-storage-internal-avro 12.6.0: https://mvnrepository.com/artifact/com.azure/azure-storage-internal-avro/12.6.0, MIT
cos_api 5.6.231: https://mvnrepository.com/artifact/com.qcloud/cos_api/5.6.231, MIT

========================================================================
MPL 1.1 licenses
@@ -62,6 +62,11 @@
<artifactId>dolphinscheduler-storage-obs</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.dolphinscheduler</groupId>
<artifactId>dolphinscheduler-storage-cos</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>

</project>
@@ -29,7 +29,11 @@ public enum StorageType {

ABS(5, "ABS"),

OBS(6, "OBS");
OBS(6, "OBS"),

COS(7, "COS"),

;

private final int code;
private final String name;
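The new `COS(7, "COS")` constant follows the existing code/name pattern of the enum. The standalone sketch below mirrors that pattern (the codes assumed for the earlier constants and the `ofCode` lookup helper are illustrative, not part of the commit):

```java
import java.util.Arrays;
import java.util.Optional;

// Simplified stand-in for DolphinScheduler's StorageType enum; the
// earlier constants' codes are assumed and ofCode is an illustrative
// helper, not part of the commit.
enum StorageType {
    LOCAL(0, "LOCAL"), HDFS(1, "HDFS"), OSS(2, "OSS"), S3(3, "S3"),
    GCS(4, "GCS"), ABS(5, "ABS"), OBS(6, "OBS"), COS(7, "COS");

    private final int code;
    private final String name;

    StorageType(int code, String name) {
        this.code = code;
        this.name = name;
    }

    public int getCode() {
        return code;
    }

    public String getName() {
        return name;
    }

    // Looks a storage type up by its numeric code.
    public static Optional<StorageType> ofCode(int code) {
        return Arrays.stream(values()).filter(t -> t.code == code).findFirst();
    }
}
```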
@@ -0,0 +1,47 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.dolphinscheduler</groupId>
<artifactId>dolphinscheduler-storage-plugin</artifactId>
<version>dev-SNAPSHOT</version>
</parent>

<artifactId>dolphinscheduler-storage-cos</artifactId>

<dependencies>
<dependency>
<groupId>com.qcloud</groupId>
<artifactId>cos_api</artifactId>
</dependency>

<dependency>
<groupId>org.apache.dolphinscheduler</groupId>
<artifactId>dolphinscheduler-storage-api</artifactId>
<version>${project.version}</version>
</dependency>

<dependency>
<groupId>org.apache.dolphinscheduler</groupId>
<artifactId>dolphinscheduler-task-api</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</project>
