ksb.csle.didentification.privacy

RecordReductionOperator

class RecordReductionOperator extends BasePrivacyAnonymizer

:: ApplicationDeveloperApi ::

Operator that implements the record reduction module of the Data Reduction algorithm. It detects outliers (the boxplot and z-score methods are supported) and then replaces the rows that contain them with a replacement character (blank, star, or underbar).
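
The z-score path described above can be illustrated with a minimal sketch on plain Scala collections rather than Spark DataFrames. The object `ZScoreRowMasking`, its functions, and the `threshold` parameter are hypothetical names introduced for illustration; the threshold the operator actually uses and its exact row handling are not stated in this documentation.

```scala
object ZScoreRowMasking {
  // Population mean and standard deviation of a non-empty sequence.
  def meanAndStd(xs: Seq[Double]): (Double, Double) = {
    val mean = xs.sum / xs.size
    val variance = xs.map(x => math.pow(x - mean, 2)).sum / xs.size
    (mean, math.sqrt(variance))
  }

  // Row indices whose absolute z-score exceeds the (assumed) threshold.
  def outlierIndices(xs: Seq[Double], threshold: Double): Set[Int] = {
    val (mean, std) = meanAndStd(xs)
    if (std == 0.0) Set.empty[Int] // constant column: nothing can be an outlier
    else xs.zipWithIndex.collect {
      case (x, i) if math.abs(x - mean) / std > threshold => i
    }.toSet
  }

  // Replaces every cell of a flagged row with the mask string.
  def maskRows(rows: Seq[Seq[String]], flagged: Set[Int], mask: String): Seq[Seq[String]] =
    rows.zipWithIndex.map { case (row, i) =>
      if (flagged(i)) row.map(_ => mask) else row
    }
}
```

With a population standard deviation, the largest possible absolute z-score among n values is sqrt(n - 1), so a small sample needs a correspondingly low threshold before any row can be flagged.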

Linear Supertypes
BasePrivacyAnonymizer, DataFrameCheck, BaseDataOperator[StreamOperatorInfo, DataFrame], BaseGenericOperator[StreamOperatorInfo, DataFrame], BaseGenericMutantOperator[StreamOperatorInfo, DataFrame, DataFrame], BaseDoer, Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new RecordReductionOperator(o: StreamOperatorInfo)

    o

    Object that contains the message ksb.csle.common.proto.StreamDidentProto.RecordReductionInfo. RecordReductionInfo has the following attributes:

    • selectedColumnId: IDs of the columns to which the record reduction function is applied
    • method: the special character that replaces detected outlier data
    • columnHandlePolicy: whether to consider the given columns all together or individually
    • outlierMethod: the method used to detect outliers
    • fieldInfo: information about column attributes (identifier, sensitive, ...)
    • check: the method used to verify the anonymized data

    RecordReductionInfo

    enum ReplaceValueMethod {
      BLANK = 0;
      STAR = 1;
      UNDERBAR = 2;
    }
    enum ColumnHandlePolicy {
      ONEBYONE = 0;
      ALL = 1;
    }
    enum OutlierMethod {
      ZSCORE = 0;
      BOXPLOT = 1;
    }
    message RecordReductionInfo {
      repeated int32 selectedColumnId = 1;
      required ReplaceValueMethod method = 2 [default = STAR];
      required ColumnHandlePolicy columnHandlePolicy = 3 [default = ONEBYONE];
      required OutlierMethod outlierMethod = 4 [default = ZSCORE];
      repeated FieldInfo fieldInfo = 5;
      optional PrivacyCheckInfo check = 6;
    }

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def anonymize(src: DataFrame): DataFrame

    Anonymizes the src dataframe

    src

    Dataframe to anonymize

    returns

    DataFrame Anonymized dataframe

  5. def anonymize(src: DataFrame, columnNames: Array[String]): DataFrame

    Definition Classes
    BasePrivacyAnonymizer
  6. def anonymize(src: DataFrame, columnName: String): DataFrame

    Anonymizes the specified column of the src dataframe using a generic 'Type' method. The 'Type' is determined by the inheriting module.

    src

    Dataframe to anonymize

    columnName

    Column to be anonymized

    returns

    DataFrame The dataframe which replaces original column with anonymized column

    Definition Classes
    BasePrivacyAnonymizer
  7. def anonymizeColumn(src: DataFrame, columnName: String): DataFrame

    Performs record reduction on a column of the src dataframe.

    src

    Dataframe to anonymize

    columnName

    the column of dataframe to apply record reduction

    returns

    DataFrame Anonymized dataframe

    Definition Classes
    RecordReductionOperator → BasePrivacyAnonymizer
  8. def anonymizeNumericColumn(src: DataFrame, columnName: String, repValueType: ReplaceValueMethod, outlierType: OutlierMethod): DataFrame

    Performs record reduction on a column of numerical values. It detects outliers from the column's statistical information, using the given outlier method, and replaces them with 'repValueType'.

    src

    Dataframe to anonymize

    columnName

    the (numerical) column of dataframe to apply record reduction

    repValueType

    The value that replaces outliers (e.g., blank, _, or *)

    outlierType

    the method used to detect outliers (e.g., boxplot, z-score)

    returns

    DataFrame Anonymized dataframe
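
The boxplot method mentioned for numeric columns can be sketched as follows, using plain Scala sequences instead of a DataFrame. This is only an illustration under stated assumptions: the object `BoxplotColumnMasking`, the linear-interpolation quantile, and the conventional fence multiplier `k = 1.5` are choices made here, not details confirmed by this documentation.

```scala
object BoxplotColumnMasking {
  // Linear-interpolation quantile of an already sorted, non-empty sequence.
  def quantile(sorted: IndexedSeq[Double], q: Double): Double = {
    val pos = q * (sorted.size - 1)
    val lo = math.floor(pos).toInt
    val hi = math.ceil(pos).toInt
    sorted(lo) + (pos - lo) * (sorted(hi) - sorted(lo))
  }

  // Lower and upper fences: Q1 - k * IQR and Q3 + k * IQR.
  def fences(xs: Seq[Double], k: Double = 1.5): (Double, Double) = {
    val sorted = xs.sorted.toIndexedSeq
    val q1 = quantile(sorted, 0.25)
    val q3 = quantile(sorted, 0.75)
    val iqr = q3 - q1
    (q1 - k * iqr, q3 + k * iqr)
  }

  // Values outside the fences are replaced by the mask; others are kept as text.
  def maskOutliers(xs: Seq[Double], mask: String): Seq[String] = {
    val (low, high) = fences(xs)
    xs.map(x => if (x < low || x > high) mask else x.toString)
  }
}
```

Because the fences depend on quartiles rather than on the mean, a single extreme value does not widen them the way it inflates a standard deviation.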

  9. def anonymizeStringColumn(src: DataFrame, columnName: String, repValueType: ReplaceValueMethod, outlierType: OutlierMethod): DataFrame

  10. def anonymizeStringColumn(src: DataFrame, columnName: String): DataFrame

    Performs record reduction on a column of string values. It detects outliers on the basis of their frequencies, using the given outlier method, and replaces them with 'repValueType'.

    src

    Dataframe to anonymize

    columnName

    the (string) column of the dataframe to apply record reduction to

    returns

    DataFrame Anonymized dataframe
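
Frequency-based detection for string columns might look like the sketch below, which applies a z-score to the occurrence counts of the distinct values. It assumes that only unusually rare values count as outliers and that a caller supplies the threshold; neither point is confirmed by this documentation, and `FrequencyOutlierMasking` with its functions is a hypothetical name.

```scala
object FrequencyOutlierMasking {
  // Distinct values whose occurrence count lies more than `threshold`
  // population standard deviations below the mean count.
  def rareValues(values: Seq[String], threshold: Double): Set[String] = {
    val counts: Map[String, Int] =
      values.groupBy(identity).map { case (v, vs) => v -> vs.size }
    val freqs = counts.values.map(_.toDouble).toSeq
    val mean = freqs.sum / freqs.size
    val std = math.sqrt(freqs.map(f => math.pow(f - mean, 2)).sum / freqs.size)
    if (std == 0.0) Set.empty[String] // all values equally frequent
    else counts.collect { case (v, c) if (mean - c) / std > threshold => v }.toSet
  }

  // Replaces every occurrence of a rare value with the mask string.
  def maskRare(values: Seq[String], threshold: Double, mask: String): Seq[String] = {
    val rare = rareValues(values, threshold)
    values.map(v => if (rare(v)) mask else v)
  }
}
```

For example, in a column holding five values that each occur ten times and a sixth value that occurs once, the singleton's count sits about 2.24 standard deviations below the mean count, so a threshold of 2.0 masks it.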

  11. def anonymizedAll(src: DataFrame, columnNames: Array[String]): DataFrame

    Performs record reduction on the given array of columns simultaneously. This function detects outliers by considering the given columns together, and then replaces them together.

    src

    Dataframe to anonymize

    returns

    DataFrame Anonymized dataframe

  12. def anonymizedOneByOne(src: DataFrame, columnNames: Array[String]): DataFrame

    Performs the record reduction on the given array of columns one by one.

    src

    Dataframe to anonymize

    returns

    DataFrame Anonymized dataframe
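
The difference between the two column-handling policies can be sketched as follows. This reading is an assumption: under ONEBYONE each column is masked only where that column has an outlier, while under ALL a row flagged in any selected column is masked in every selected column. The object `ColumnHandlePolicySketch`, the `Detector` type, and the function names are hypothetical and use plain Scala collections.

```scala
object ColumnHandlePolicySketch {
  // A detector maps a numeric column to the row indices it considers outliers.
  type Detector = Seq[Double] => Set[Int]

  // Replaces the cells at the flagged row indices with the mask string.
  private def maskCells(xs: Seq[Double], flagged: Set[Int], mask: String): Seq[String] =
    xs.zipWithIndex.map { case (x, i) => if (flagged(i)) mask else x.toString }

  // ONEBYONE (assumed semantics): each column uses only its own outliers.
  def oneByOne(cols: Map[String, Seq[Double]], detect: Detector, mask: String): Map[String, Seq[String]] =
    cols.map { case (name, xs) => name -> maskCells(xs, detect(xs), mask) }

  // ALL (assumed semantics): the union of flagged rows is masked in every column.
  def all(cols: Map[String, Seq[Double]], detect: Detector, mask: String): Map[String, Seq[String]] = {
    val flagged: Set[Int] = cols.values.flatMap(xs => detect(xs)).toSet
    cols.map { case (name, xs) => name -> maskCells(xs, flagged, mask) }
  }
}
```

Any detector with this shape can be plugged in, for instance a z-score or boxplot rule that returns the indices of the rows it flags.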

  13. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  14. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  15. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  17. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  18. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  19. def getColumnName(src: DataFrame, columnId: Int): String

    Returns the name of the column in the src dataframe identified by the column ID defined in the protobuf message.

    src

    dataframe to get names of columns.

    columnId

    column ID to anonymize.

    returns

    String.

    Definition Classes
    DataFrameCheck
  20. def getColumnNames(src: DataFrame, columnIDs: Array[Int]): Array[String]

    Returns the names of the columns in the src dataframe specified by column IDs. Columns with invalid IDs are ignored.

    src

    dataframe to get names of columns.

    returns

    Array[String].

    Definition Classes
    DataFrameCheck
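
The "invalid IDs are ignored" behaviour can be illustrated with a small sketch. It assumes that a column ID is a zero-based position in the dataframe's column list, which this documentation does not state, and it works on a plain array of column names; `ColumnIdLookup` and its functions are hypothetical stand-ins for the documented methods.

```scala
object ColumnIdLookup {
  // An ID is treated as valid when it is a position inside the column array.
  def isValidColumnId(columns: Array[String], id: Int): Boolean =
    id >= 0 && id < columns.length

  // Resolves IDs to names, silently dropping IDs that are out of range.
  def columnNames(columns: Array[String], ids: Array[Int]): Array[String] =
    ids.filter(isValidColumnId(columns, _)).map(columns(_))
}
```

Dropping invalid IDs instead of throwing means a misconfigured `selectedColumnId` would be skipped without an error under this sketch, so a caller may want to check validity first.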
  21. def getQuasiColumnIDs(fieldInfos: Array[FieldInfo]): Array[Int]

    Definition Classes
    DataFrameCheck
  22. def getSensColumnIDs(fieldInfos: Array[FieldInfo]): Array[Int]

    Definition Classes
    DataFrameCheck
  23. def getValidColumnIDs(src: DataFrame, columnIDs: Array[Int]): Array[Int]

    Definition Classes
    DataFrameCheck
  24. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  25. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  26. def isValidColumnID(src: DataFrame, columnID: Int): Boolean

    Checks whether the given column ID is valid.

    src

    dataframe to get names of columns.

    returns

    Boolean.

    Definition Classes
    DataFrameCheck
  27. def isValidColumnName(src: DataFrame, columnName: String): Boolean

    Checks whether the given column name is valid.

    src

    dataframe to get names of columns.

    columnName

    column Name.

    returns

    Boolean.

    Definition Classes
    DataFrameCheck
  28. val logger: Logger

    Attributes
    protected
    Definition Classes
    Logging
  29. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  30. final def notify(): Unit

    Definition Classes
    AnyRef
  31. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  32. def operate(df: DataFrame): DataFrame

    Runs the record reduction module for basic de-identification.

    df

    Input dataframe

    returns

    DataFrame Anonymized dataframe

    Definition Classes
    RecordReductionOperator → BaseGenericOperator → BaseGenericMutantOperator
  33. val p: RecordReductionInfo

  34. val privacy: PrivacyCheckInfo

    Definition Classes
    BasePrivacyAnonymizer
  35. def stop: Unit

    Definition Classes
    BaseGenericOperator → BaseGenericMutantOperator
  36. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  37. def toString(): String

    Definition Classes
    AnyRef → Any
  38. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
