Class ksb.csle.component.pipe.stream.reader.KafkaPipeReader

class KafkaPipeReader extends BasePipeReader[DataFrame, StreamPipeReaderInfo, SparkSession]

:: ApplicationDeveloperApi ::

Reader that reads data from Kafka and pipelines it to the next operator.

Linear Supertypes
BasePipeReader[DataFrame, StreamPipeReaderInfo, SparkSession], BaseGenericPipeOperator[Int, Int, DataFrame, StreamPipeReaderInfo, SparkSession], BaseGenericMutantOperator[StreamPipeReaderInfo, Int, (Int) ⇒ DataFrame], BaseDoer, Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new KafkaPipeReader(o: StreamPipeReaderInfo, session: SparkSession)

    o

    Object that contains the message ksb.csle.common.proto.DatasourceProto.KafkaPipeReaderInfo. KafkaPipeReaderInfo contains the following attributes (a construction sketch follows the message definition below):

    • bootStrapServers: Address of the Kafka server (required)
    • zooKeeperConnect: Address of the Kafka Zookeeper (required)
    • topic: Topic from which to fetch data (required)
    • addTimestamp: Flag for adding a timestamp column automatically
    • timestampName: Column name for the timestamp field
    • watermark: Time slot in seconds or minutes; the event-time column and the threshold on how late the data is expected to be in terms of event time.
    • sampleJsonPath: Path of a JSON file containing a sample dataset; the system gets a hint for the record format from it.
    • failOnDataLoss: Determines whether a streaming query should fail if it is possible that data has been lost (e.g., topics are deleted or offsets are out of range). It is important to monitor your streaming queries, especially with temporal infrastructure like Kafka. Offsets typically go out of range when Kafka's log cleaner activates. If a specific streaming query cannot process data quickly enough, it may fall behind the earliest offsets after the log cleaner rolls a log segment. Sometimes failOnDataLoss may be a false alarm; you can disable it if it does not work as expected for your use case. Refer to the following site for more information: https://github.com/vertica/PSTL/wiki/Kafka-Source

    KafkaPipeReaderInfo

    message KafkaPipeReaderInfo {
      required string bootStrapServers = 1;
      required string zooKeeperConnect = 2;
      required string topic = 3;
      // FIXME: Avoid the case that the addTimestamp is 'true' with no setting of timestampName
      required bool addTimestamp = 4 [default = true];
      optional string timestampName = 5;
      optional string watermark = 6;
      optional string sampleJsonPath = 7;
      required bool failOnDataLoss = 8 [default = false];
    }
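
    The snippet below is a minimal usage sketch, not taken from the source documentation: it builds a KafkaPipeReaderInfo with the standard protobuf Java builder generated from the message above and then (hypothetically) wires it into the constructor. The StreamPipeReaderInfo wrapper field name, the example option values, and the "5 minutes" watermark format are assumptions.

    import org.apache.spark.sql.SparkSession
    import ksb.csle.common.proto.DatasourceProto.KafkaPipeReaderInfo

    val kafkaInfo = KafkaPipeReaderInfo.newBuilder()
      .setBootStrapServers("localhost:9092")     // required: Kafka broker address
      .setZooKeeperConnect("localhost:2181")     // required: Zookeeper address
      .setTopic("sensor-events")                 // required: topic to fetch data from
      .setAddTimestamp(true)                     // add a timestamp column automatically
      .setTimestampName("event_time")            // name of the added timestamp column
      .setWatermark("5 minutes")                 // lateness threshold on the event-time column
      .setSampleJsonPath("/path/to/sample.json") // sample records hinting the record format
      .setFailOnDataLoss(false)                  // do not fail the query on possible data loss
      .build()

    // Hypothetical wiring; the actual StreamPipeReaderInfo field for the Kafka reader
    // is not shown on this page.
    // val readerInfo = StreamPipeReaderInfo.newBuilder().setKafkaPipeReader(kafkaInfo).build()
    // val session = SparkSession.builder().appName("KafkaPipeReaderExample").getOrCreate()
    // val reader = new KafkaPipeReader(readerInfo, session)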

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def close: Unit

    Definition Classes
    KafkaPipeReader → BasePipeReader
  7. val df: DataStreamReader

  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. val logger: Logger

    Attributes
    protected
    Definition Classes
    Logging
  15. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. final def notify(): Unit

    Definition Classes
    AnyRef
  17. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  18. val o: StreamPipeReaderInfo

    Object that contains the message ksb.csle.common.proto.DatasourceProto.KafkaPipeReaderInfo. KafkaPipeReaderInfo contains the following attributes:

    • bootStrapServers: Address of the Kafka server (required)
    • zooKeeperConnect: Address of the Kafka Zookeeper (required)
    • topic: Topic from which to fetch data (required)
    • addTimestamp: Flag for adding a timestamp column automatically
    • timestampName: Column name for the timestamp field
    • watermark: Time slot in seconds or minutes; the event-time column and the threshold on how late the data is expected to be in terms of event time.
    • sampleJsonPath: Path of a JSON file containing a sample dataset; the system gets a hint for the record format from it.
    • failOnDataLoss: Determines whether a streaming query should fail if it is possible that data has been lost (e.g., topics are deleted or offsets are out of range). It is important to monitor your streaming queries, especially with temporal infrastructure like Kafka. Offsets typically go out of range when Kafka's log cleaner activates. If a specific streaming query cannot process data quickly enough, it may fall behind the earliest offsets after the log cleaner rolls a log segment. Sometimes failOnDataLoss may be a false alarm; you can disable it if it does not work as expected for your use case. Refer to the following site for more information: https://github.com/vertica/PSTL/wiki/Kafka-Source

    KafkaPipeReaderInfo

    message KafkaPipeReaderInfo {
      required string bootStrapServers = 1;
      required string zooKeeperConnect = 2;
      required string topic = 3;
      // FIXME: Avoid the case that the addTimestamp is 'true' with no setting of timestampName
      required bool addTimestamp = 4 [default = true];
      optional string timestampName = 5;
      optional string watermark = 6;
      optional string sampleJsonPath = 7;
      required bool failOnDataLoss = 8 [default = false];
    }
  19. final def operate(in: Int): (Int) ⇒ DataFrame

    Definition Classes
    BasePipeReader → BaseGenericPipeOperator → BaseGenericMutantOperator
  20. val p: KafkaPipeReaderInfo

  21. def read(): DataFrame

    Reads data from a Kafka topic. TODO: Define and use parameters to set the watermark and timestamp column. (An illustrative sketch follows this member list.)

    returns

    dataframe

    Definition Classes
    KafkaPipeReader → BasePipeReader
  22. val schema: StructType

  23. val session: SparkSession

  24. def stop: Unit

    Definition Classes
    BasePipeReader → BaseGenericPipeOperator → BaseGenericMutantOperator
  25. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  26. def toString(): String

    Definition Classes
    AnyRef → Any
  27. val topic: String

  28. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
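
Below is a rough sketch of the kind of Spark Structured Streaming query that read() produces from the attributes above. It is illustrative only, not the actual implementation; the option values, column names, and the assumption that the Kafka value payload is JSON are placeholders, and the TODO on read() suggests the watermark and timestamp handling may not yet be driven by these parameters.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, current_timestamp, from_json}

    def readKafkaStream(session: SparkSession): DataFrame = {
      // Infer the record schema from the sample dataset (sampleJsonPath).
      val schema = session.read.json("/path/to/sample.json").schema

      val raw = session.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092") // bootStrapServers
        .option("subscribe", "sensor-events")                // topic
        .option("failOnDataLoss", "false")                   // failOnDataLoss
        .load()

      raw
        .selectExpr("CAST(value AS STRING) AS json")          // Kafka value bytes -> JSON string
        .select(from_json(col("json"), schema).as("record"))  // parse using the inferred schema
        .select("record.*")
        .withColumn("event_time", current_timestamp())        // addTimestamp / timestampName
        .withWatermark("event_time", "5 minutes")             // watermark
    }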

Inherited from BasePipeReader[DataFrame, StreamPipeReaderInfo, SparkSession]

Inherited from BaseGenericPipeOperator[Int, Int, DataFrame, StreamPipeReaderInfo, SparkSession]

Inherited from BaseGenericMutantOperator[StreamPipeReaderInfo, Int, (Int) ⇒ DataFrame]

Inherited from BaseDoer

Inherited from Logging

Inherited from Serializable

Inherited from Serializable

Inherited from AnyRef

Inherited from Any
