ksb.csle.didentification.privacy

RandomNoiseOperator

class RandomNoiseOperator extends BasePrivacyAnonymizer

:: ApplicationDeveloperApi ::

Operator that implements the random noise module of the Data Masking algorithm. It inserts random noise into the original data.

• If the given column is of string type, random noise composed of digits, letters, or both is inserted at a specific position.
• If the given column is of numerical type, a noise value (specified, randomly chosen, or drawn from a normal distribution) is combined with each value of that column by addition, subtraction, multiplication, or division.
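
The string case can be illustrated with a minimal plain-Scala sketch that runs without Spark. It is not the operator's actual implementation: the object `StringNoiseSketch`, the `Method` values, and the 0-based, clamped `position` are all assumptions made for illustration.

```scala
import scala.util.Random

object StringNoiseSketch {
  // Hypothetical stand-in for the RandomMethod enum; only its default MIXED
  // is visible in this documentation, the other two names are assumed.
  sealed trait Method
  case object Numeric extends Method
  case object Alphabet extends Method
  case object Mixed extends Method

  private val digits = ('0' to '9').mkString
  private val letters = (('a' to 'z') ++ ('A' to 'Z')).mkString

  /** Builds `length` random characters drawn from the pool selected by `method`. */
  def randomString(length: Int, method: Method, rng: Random): String = {
    val pool = method match {
      case Numeric  => digits
      case Alphabet => letters
      case Mixed    => digits + letters
    }
    Seq.fill(length)(pool(rng.nextInt(pool.length))).mkString
  }

  /** Inserts the noise at `position` (assumed 0-based, clamped to the string bounds). */
  def addNoise(value: String, position: Int, length: Int, method: Method, rng: Random): String = {
    val at = math.max(0, math.min(position, value.length))
    value.substring(0, at) + randomString(length, method, rng) + value.substring(at)
  }
}
```

For example, `StringNoiseSketch.addNoise("abcdef", 2, 3, StringNoiseSketch.Numeric, new Random())` returns a nine-character string that starts with `ab`, continues with three random digits, and ends with `cdef`.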

Linear Supertypes
BasePrivacyAnonymizer, DataFrameCheck, BaseDataOperator[StreamOperatorInfo, DataFrame], BaseGenericOperator[StreamOperatorInfo, DataFrame], BaseGenericMutantOperator[StreamOperatorInfo, DataFrame, DataFrame], BaseDoer, Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new RandomNoiseOperator(o: StreamOperatorInfo)

    o

    Object that contains the message ksb.csle.common.proto.StreamDidentProto.RandomNoiseInfo. RandomNoiseInfo contains the following attributes:

    • selectedColumnId: IDs of the columns to which the random noise function is applied
    • strHandle: how noise is added to string values
    • numHandle: how noise is added to numerical values
    • fieldInfo: information about the column attributes (identifier, sensitive, etc.)
    • check: how the performance of the anonymized data is verified

    RandomNoiseInfo

    message StringHandle {
    	required int32 position = 1;
    	required int32 length = 2;
    	required RandomMethod randMethod = 3 [default = MIXED];
    }
    enum RandomType {
    	FIXED = 0;
    	RANDOM = 1;
    	GAUSSIAN = 2;
    }
    enum NoiseOperator {
    	NOISE_SUM = 0;
    	NOISE_MINUS = 1;
    	NOISE_MULTIPLY = 2;
    	NOISE_DIVIDE = 3;
    }
    message NormalDistInfo {
    	required double mu = 1 [default = 0.0];
    	required double std = 2 [default = 1.0];
    }
    message NumericHandle {
    	required RandomType isRandom = 1;
    	required NoiseOperator operator = 2 [default = NOISE_SUM];
    	optional double value = 3;
    	optional NormalDistInfo normalDist = 4;
    }
    message RandomNoiseInfo {
      repeated int32 selectedColumnId = 1;
      optional StringHandle strHandle = 2;
      optional NumericHandle numHandle = 3;
      repeated FieldInfo fieldInfo = 4;
      optional PrivacyCheckInfo check = 5;
    }
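
One plausible reading of the NumericHandle fields is sketched below in plain Scala, without Spark or the generated protobuf classes. Everything in `NumericNoiseSketch` is a hypothetical stand-in. It assumes that FIXED uses `value`, that RANDOM draws uniformly from [0, 1), and that GAUSSIAN draws from the distribution described by NormalDistInfo. The real operator may interpret these fields differently.

```scala
import scala.util.Random

object NumericNoiseSketch {
  // Plain-Scala mirrors of the RandomType and NoiseOperator enums.
  sealed trait RandomType
  case object Fixed extends RandomType
  case object RandomValue extends RandomType
  case object Gaussian extends RandomType

  sealed trait NoiseOperator
  case object NoiseSum extends NoiseOperator
  case object NoiseMinus extends NoiseOperator
  case object NoiseMultiply extends NoiseOperator
  case object NoiseDivide extends NoiseOperator

  // Stand-in for NumericHandle, with NormalDistInfo flattened into mu and std.
  final case class NumericHandle(
      isRandom: RandomType,
      operator: NoiseOperator = NoiseSum,
      value: Option[Double] = None,
      mu: Double = 0.0,
      std: Double = 1.0)

  /** Picks the noise term for one cell. */
  def noiseValue(h: NumericHandle, rng: Random): Double = h.isRandom match {
    case Fixed       => h.value.getOrElse(0.0)           // assumption: a missing value means 0
    case RandomValue => rng.nextDouble()                 // assumption: uniform in [0, 1)
    case Gaussian    => h.mu + h.std * rng.nextGaussian()
  }

  /** Combines the original value with the noise term using the chosen operator. */
  def applyNoise(x: Double, h: NumericHandle, rng: Random): Double = {
    val n = noiseValue(h, rng)
    h.operator match {
      case NoiseSum      => x + n
      case NoiseMinus    => x - n
      case NoiseMultiply => x * n
      case NoiseDivide   => x / n // a zero noise term yields Infinity or NaN for doubles
    }
  }
}
```

With a fixed noise value the result is deterministic: applying `NumericHandle(Fixed, NoiseMultiply, Some(2.0))` to `10.0` gives `20.0`.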

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def anonymize(src: DataFrame, columnNames: Array[String]): DataFrame

    Definition Classes
    BasePrivacyAnonymizer
  5. def anonymize(src: DataFrame, columnName: String): DataFrame

    Anonymizes the specified column of the src dataframe using a generic 'Type' method. The 'Type' is decided by the inheriting module.

    src

    Dataframe to anonymize

    columnName

    Column to be anonymized

    returns

    DataFrame. The dataframe in which the original column is replaced by the anonymized column

    Definition Classes
    BasePrivacyAnonymizer
  6. def anonymizeColumn(src: DataFrame, columnName: String): DataFrame

    Performs random noise operations on the given column of the src dataframe.

    src

    Dataframe to anonymize

    columnName

    Column to be anonymized

    returns

    DataFrame Anonymized dataframe

    Definition Classes
    RandomNoiseOperator → BasePrivacyAnonymizer
  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def getColumnName(src: DataFrame, columnId: Int): String

    Returns the name of the column in the src dataframe that corresponds to the column ID defined in the protobuf message.

    src

    dataframe to get names of columns.

    columnId

    column ID to anonymize.

    returns

    String.

    Definition Classes
    DataFrameCheck
  14. def getColumnNames(src: DataFrame, columnIDs: Array[Int]): Array[String]

    Returns the names of the columns in the src dataframe specified by the column IDs. Columns with invalid IDs are ignored.

    src

    dataframe to get names of columns.

    returns

    Array[String].

    Definition Classes
    DataFrameCheck
  15. def getQuasiColumnIDs(fieldInfos: Array[FieldInfo]): Array[Int]

    Definition Classes
    DataFrameCheck
  16. def getSensColumnIDs(fieldInfos: Array[FieldInfo]): Array[Int]

    Definition Classes
    DataFrameCheck
  17. def getValidColumnIDs(src: DataFrame, columnIDs: Array[Int]): Array[Int]

    Definition Classes
    DataFrameCheck
  18. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  19. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  20. def isValidColumnID(src: DataFrame, columnID: Int): Boolean

    Checks whether the given column ID is valid.

    src

    dataframe to get names of columns.

    returns

    Boolean.

    Definition Classes
    DataFrameCheck
  21. def isValidColumnName(src: DataFrame, columnName: String): Boolean

    Checks whether the given column name is valid.

    src

    dataframe to get names of columns.

    columnName

    column name to check.

    returns

    Boolean.

    Definition Classes
    DataFrameCheck
  22. val logger: Logger

    Attributes
    protected
    Definition Classes
    Logging
  23. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  24. def noiseNumericColumn(src: DataFrame, columnName: String, numHandle: NumericHandle): DataFrame

    Performs random noise operations on the given numerical column in the src dataframe. numHandle contains the information about how to generate the noise.

    src

    Dataframe to anonymize

    columnName

    Column to be anonymized

    numHandle

    The method to generate noises

    returns

    DataFrame Anonymized dataframe

  25. def noiseNumericStringColumn(src: DataFrame, columnName: String, numHandle: NumericHandle): DataFrame

    Performs random noise operations on the given string column when the column is composed of both numerical and string data. In this case, the function extracts only the numerical data, adds noise to it, and then recombines it with the remaining string data.

    src

    Dataframe to anonymize

    columnName

    Column to be anonymized

    numHandle

    The method to generate noises

    returns

    DataFrame Anonymized dataframe

  26. def noiseStringColumn(src: DataFrame, columnName: String): DataFrame

    Performs random noise operations on the given string column. The column may be composed of string data only, or of both numerical and string data.

    src

    Dataframe to anonymize

    columnName

    Column to be anonymized

    returns

    DataFrame Anonymized dataframe

  27. def noiseStringOnlyColumn(src: DataFrame, columnName: String, position: Int, length: Int, randMethod: RandomMethod): DataFrame

    Same as noiseStringOnlyColumn(src, columnName, strHandle), but takes position, length, and randMethod as separate parameters instead of a StringHandle.

    src

    Dataframe to anonymize

    columnName

    Column to be anonymized

    position

    The position to add noises

    length

    The length of generated noises

    randMethod

    How to make the random string

    returns

    DataFrame Anonymized dataframe

  28. def noiseStringOnlyColumn(src: DataFrame, columnName: String, strHandle: StringHandle): DataFrame

    Performs random noise operations on the given string column when the column is composed of string data only. In this case, random noise is inserted at a specific position.

    src

    Dataframe to anonymize

    columnName

    Column to be anonymized

    strHandle

    The method to generate noises

    returns

    DataFrame Anonymized dataframe

  29. final def notify(): Unit

    Definition Classes
    AnyRef
  30. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  31. def operate(df: DataFrame): DataFrame

    Runs the random noise module for basic de-identification.

    df

    Input dataframe

    returns

    DataFrame Anonymized dataframe

    Definition Classes
    RandomNoiseOperator → BaseGenericOperator → BaseGenericMutantOperator
  32. val p: RandomNoiseInfo

  33. val privacy: PrivacyCheckInfo

    Definition Classes
    BasePrivacyAnonymizer
  34. def stop: Unit

    Definition Classes
    BaseGenericOperator → BaseGenericMutantOperator
  35. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  36. def toString(): String

    Definition Classes
    AnyRef → Any
  37. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
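
The mixed-column handling described for noiseNumericStringColumn (extract the numerical data, add noise, recombine with the remaining string data) can be sketched in plain Scala on a single string value. This is an illustration only, not the operator's code: `MixedColumnSketch`, `noiseNumbers`, and the regular expression that decides what counts as "numerical data" are assumptions.

```scala
import scala.util.matching.Regex

object MixedColumnSketch {
  // Assumption: "numerical data" means runs of decimal digits with an optional fraction.
  private val number: Regex = """\d+(\.\d+)?""".r

  /** Applies `noise` to every numeric run in `value` and leaves all other characters untouched. */
  def noiseNumbers(value: String, noise: Double => Double): String =
    number.replaceAllIn(
      value,
      m => Regex.quoteReplacement(noise(m.matched.toDouble).toString))
}
```

For instance, `MixedColumnSketch.noiseNumbers("room12b", _ + 5)` yields `"room17.0b"`. The noise function is passed in as a parameter, so any rule for choosing and combining the noise value can be plugged in.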
