Skip to content

sanjuthomas/kafka-connect-marklogic

Repository files navigation

Overview

Kafka Connect MarkLogic is a sink-only connector that pulls messages from Kafka and stores them in MarkLogic as JSON documents. Consider using the Kafka MarkLogic Sync Connector for the following use cases.

  1. Near real-time ingestion requirements.
  2. Regulate the traffic towards MarkLogic.
  3. Requirement to maintain the order of the messages.
  4. A Kafka ecosystem exists.
  5. Replay messages using an offset.

High-Level Architecture Diagram

Kafka Connect MarkLogic

This project is about the component marked in green. Refer here to see the available list of source connectors.

Prerequisites

Apache ZooKeeper and Apache Kafka are installed and running on your machine. Please refer to respective sites to download and start ZooKeeper and Kafka.

What is MarkLogic?

The MarkLogic is a multi-model schemaless NoSQL database to store, manage, and search JSON, XML, and RDF triples. For more details, please refer to MarkLogic's official website.

What is Apache Kafka?

Apache Kafka is an open-source stream processing platform written in Scala and Java by the Apache Software Foundation. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. For more details, please refer to the kafka home page.

Implementation details

To send data to MarkLogic, this connector uses MarkLogic REST API. By default, the /v1/documents endpoint at port 8000 is used. You may change that in the marklogic-sink.properties file. You may use your own REST/Service extension instead of the out-of-the-box document API to transform anything on the way in.

To listen to multiple topics, please add the topic name in the marklogic-sink.properties file. The current implementation will take the topic name and use it as the MarkLogic document collection name.

Data Mapping

MarkLogic is a schemaless document store/NoSQL database. Since we are working with plain JSON data, we don't need a schema to serialize and deserialize the messages.

For stand-alone mode, please copy kafka_home/config/connect-standalone.properties to create kafka_home/config/marklogic-connect-standalone.properties file. Open kafka_home/config/marklogic-connect-standalone.properties and set the following properties to false.

key.converter.schemas.enable=false
value.converter.schemas.enable=false

For distributed mode, please copy kafka_home/config/connect-distributed.properties to create kafka_home/config/marklogic-connect-distributed.properties file. Open kafka_home/config/marklogic-connect-distributed.properties and set the following properties to false.

key.converter.schemas.enable=false
value.converter.schemas.enable=false

In distributed mode, if you run more than one worker per host, the rest.port settings must have different values for each instance. By default REST interface is available at 8083.

How do you deploy the connector in Kafka?

This is a maven project. To create an uber jar, execute the following maven goals.

mvn clean compile package shade:shade install

Copy the artifact kafka-connect-marklogic-1.0.jar to kakfa_home/lib folder.

Copy the marklogic-sink.properties file into kafka_home/config folder. Update the content of the property file according to your environment.

Alternatively, you may keep the kafka-connect-marklogic-1.0.jar in another directory and export that directory into Kafka class path before starting the connector.

How to start the connector in stand-alone mode?

Open a shell prompt, move to kafka_home, and execute the following.

bin/connect-standalone.sh config/marklogic-connect-standalone.properties config/marklogic-sink.properties

How to start the connector in distributed mode?

Open a shell prompt, move to kafka_home, and execute the following.

bin/connect-distributed.sh config/marklogic-connect-distributed.properties config/marklogic-sink.properties

How to produce some messages?

Some examples of message producers are available here.

Contact

Create an issue in the GitHub.

Contribute

Please refer to CONTRIBUTING.md

License

The project is licensed under the MIT license.

Looking for a connector that supports XML messages?

Drop me a line - ml@sanju.org