ksb.csle.didentification.privacy
Object that contains the message ksb.csle.common.proto.StreamDidentProto.RandomNoiseInfo. RandomNoiseInfo contains the following attributes:
message StringHandle {
  required int32 position = 1;
  required int32 length = 2;
  required RandomMethod randMethod = 3 [default = MIXED];
}
enum RandomType {
  FIXED = 0;
  RANDOM = 1;
  GAUSSIAN = 2;
}
enum NoiseOperator {
  NOISE_SUM = 0;
  NOISE_MINUS = 1;
  NOISE_MULTIPLY = 2;
  NOISE_DIVIDE = 3;
}
message NormalDistInfo {
  required double mu = 1 [default = 0.0];
  required double std = 2 [default = 1.0];
}
message NumericHandle {
  required RandomType isRandom = 1;
  required NoiseOperator operator = 2 [default = NOISE_SUM];
  optional double value = 3;
  optional NormalDistInfo normalDist = 4;
}
message RandomNoiseInfo {
  repeated int32 selectedColumnId = 1;
  optional StringHandle strHandle = 2;
  optional NumericHandle numHandle = 3;
  repeated FieldInfo fieldInfo = 4;
  optional PrivacyCheckInfo check = 5;
}
Anonymizes the specified column of the src dataframe using a generic 'Type' method. The 'Type' is determined by the module that inherits this object.
Dataframe to anonymize
Column to be anonymized
DataFrame The dataframe in which the original column is replaced with the anonymized column
Performs random noise operations on the given src dataframe.
Dataframe to anonymize
Column to be anonymized
DataFrame Anonymized dataframe
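The dispatch this operation performs can be sketched in plain Python. The sketch rests on three assumptions. A table is modelled as a list of row dicts instead of a Spark DataFrame. A column is treated as numeric when all of its values are numbers. The function name `noise_column` is hypothetical and is not the library's API. The two noise functions are passed in as callables so the routing logic stays separate from how the noise itself is generated.

```python
from typing import Callable, Dict, List, Union

Value = Union[int, float, str]
Row = Dict[str, Value]


def noise_column(rows: List[Row], column: str,
                 numeric_noise: Callable[[float], float],
                 string_noise: Callable[[str], str]) -> List[Row]:
    """Return a copy of rows in which `column` has been perturbed.

    Hypothetical helper: numeric columns go to numeric_noise, all others
    to string_noise. The input rows are not modified.
    """
    # Assumed rule: the column is numeric only if every value is a number.
    is_numeric = all(isinstance(r[column], (int, float)) for r in rows)
    out = []
    for r in rows:
        new_row = dict(r)  # other columns are copied unchanged
        v = r[column]
        new_row[column] = numeric_noise(v) if is_numeric else string_noise(str(v))
        out.append(new_row)
    return out


rows = [{"id": "a1", "age": 30}, {"id": "b2", "age": 41}]
print(noise_column(rows, "age", lambda x: x + 5, lambda s: s + "#"))
# [{'id': 'a1', 'age': 35}, {'id': 'b2', 'age': 46}]
```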
Returns the name of the column in the src dataframe specified by the column ID defined in the protobuf message.
Dataframe from which to get column names.
Column ID to anonymize.
String The column name.
Returns the names of the columns in the src dataframe specified by the column IDs. Note that invalid column IDs are ignored.
Dataframe from which to get column names.
Array[String] The column names.
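The two lookups above can be sketched as follows. The sketch assumes that a column ID is a 0-based position in the dataframe's column list. The library may map IDs differently, for example through FieldInfo. The helper names are hypothetical. The multi-ID variant skips IDs that fall outside the column range, which matches the note that invalid IDs are ignored.

```python
from typing import List, Sequence


def get_column_name(columns: Sequence[str], column_id: int) -> str:
    """Name of the column at position column_id (0-based IDs are an assumption)."""
    return columns[column_id]


def get_column_names(columns: Sequence[str], column_ids: Sequence[int]) -> List[str]:
    """Names for column_ids; IDs outside the column range are silently skipped."""
    return [columns[i] for i in column_ids if 0 <= i < len(columns)]


cols = ["name", "age", "zip"]
print(get_column_name(cols, 1))                # age
print(get_column_names(cols, [2, 7, -1, 0]))   # ['zip', 'name']
```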
Checks whether the given column ID is valid.
Dataframe from which to get column names.
Boolean True if the column ID is valid.
Checks whether the given column name is valid.
Dataframe from which to get column names.
Column name.
Boolean True if the column name is valid.
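The two validity checks can be sketched in the same simplified model. The sketch assumes that IDs are 0-based positions and that names are matched exactly, including case. The library's actual matching rules are not stated here, and the function names are hypothetical.

```python
from typing import Sequence


def is_valid_column_id(columns: Sequence[str], column_id: int) -> bool:
    # Valid if the ID indexes an existing column (0-based IDs are an assumption).
    return 0 <= column_id < len(columns)


def is_valid_column_name(columns: Sequence[str], column_name: str) -> bool:
    # Valid if the name matches an existing column exactly (case-sensitive here).
    return column_name in columns


cols = ["name", "age", "zip"]
print(is_valid_column_id(cols, 2), is_valid_column_id(cols, 3))               # True False
print(is_valid_column_name(cols, "age"), is_valid_column_name(cols, "AGE"))   # True False
```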
Performs random noise operations on the given numerical column of the src dataframe. The NumericHandle specifies how the noise is generated.
Dataframe to anonymize
Column to be anonymized
The method used to generate noise
DataFrame Anonymized dataframe
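The following sketch shows how the NumericHandle fields could combine. The enums mirror RandomType and NoiseOperator from the protobuf definition. Two interpretations are assumptions. First, FIXED is taken to reuse `value` for every row. Second, RANDOM is taken to draw uniformly from [0, value], because the documentation does not state a range. GAUSSIAN samples from N(mu, std²) using the NormalDistInfo defaults. The function names are hypothetical. A seeded `random.Random` is used so that results can be reproduced.

```python
import random
from enum import Enum
from typing import List, Optional


class RandomType(Enum):
    FIXED = 0
    RANDOM = 1
    GAUSSIAN = 2


class NoiseOperator(Enum):
    NOISE_SUM = 0
    NOISE_MINUS = 1
    NOISE_MULTIPLY = 2
    NOISE_DIVIDE = 3


def draw_noise(kind: RandomType, value: Optional[float], mu: float, std: float,
               rng: random.Random) -> float:
    if kind is RandomType.GAUSSIAN:
        return rng.gauss(mu, std)  # sample from N(mu, std^2)
    if value is None:
        raise ValueError("FIXED and RANDOM noise need a value")
    if kind is RandomType.FIXED:
        return value  # same user-specified value for every row
    return rng.uniform(0.0, value)  # RANDOM: assumed uniform in [0, value]


def apply_operator(x: float, op: NoiseOperator, noise: float) -> float:
    if op is NoiseOperator.NOISE_SUM:
        return x + noise
    if op is NoiseOperator.NOISE_MINUS:
        return x - noise
    if op is NoiseOperator.NOISE_MULTIPLY:
        return x * noise
    return x / noise  # NOISE_DIVIDE; raises ZeroDivisionError if noise == 0


def noise_numeric(values: List[float], kind: RandomType, op: NoiseOperator,
                  value: Optional[float] = None, mu: float = 0.0,
                  std: float = 1.0, seed: Optional[int] = None) -> List[float]:
    rng = random.Random(seed)
    # One noise draw per row (for FIXED every draw is the same value).
    return [apply_operator(x, op, draw_noise(kind, value, mu, std, rng)) for x in values]


print(noise_numeric([10.0, 20.0], RandomType.FIXED, NoiseOperator.NOISE_MULTIPLY, value=1.5))
# [15.0, 30.0]
```

Drawing fresh noise per row, rather than once per column, is a design choice in this sketch. It makes RANDOM and GAUSSIAN noise differ between rows.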
Performs random noise operations on the given string column. Note that this column contains both numerical and string data. In this case, the function extracts only the numerical data, adds noise to it, and then recombines it with the remaining string data.
Dataframe to anonymize
Column to be anonymized
The method used to generate noise
DataFrame Anonymized dataframe
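The extract, perturb, and recombine step described above can be sketched with a regular expression. The sketch assumes that each maximal run of digits is one integer to perturb and that every other character is left in place. The helper name and the integer-only treatment are illustrative assumptions.

```python
import re
from typing import Callable

_DIGITS = re.compile(r"\d+")


def noise_mixed_string(text: str, numeric_noise: Callable[[int], int]) -> str:
    # Each run of digits is parsed as an int, perturbed by numeric_noise,
    # and written back at the same place; non-digit characters are kept.
    return _DIGITS.sub(lambda m: str(numeric_noise(int(m.group()))), text)


print(noise_mixed_string("Apt 12, Bldg 305", lambda n: n + 3))  # Apt 15, Bldg 308
```

Because the digits are round-tripped through `int`, leading zeros are not preserved; for example, "007" plus 3 becomes "10".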
Performs random noise operations on the given string column. Note that this column may contain only string data, or both numerical and string data.
Dataframe to anonymize
Column to be anonymized
DataFrame Anonymized dataframe
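This method must decide which of the two string cases applies. The documentation does not say how it does this. A minimal sketch can assume that a column containing any digit is treated as mixed. The rule and the name `classify_string_column` are hypothetical.

```python
import re
from typing import Iterable

_HAS_DIGIT = re.compile(r"\d")


def classify_string_column(values: Iterable[str]) -> str:
    """Return 'mixed' if any value contains a digit, else 'string_only' (assumed rule)."""
    return "mixed" if any(_HAS_DIGIT.search(v) for v in values) else "string_only"


print(classify_string_column(["Seoul", "Busan"]))   # string_only
print(classify_string_column(["Apt 12", "Lobby"]))  # mixed
```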
Same as noiseStringOnlyColumn(src, columnName, strHandle), but the StringHandle settings are passed as separate parameters.
Dataframe to anonymize
Column to be anonymized
The position at which to insert noise
The length of the generated noise
The method used to generate the random string
DataFrame Anonymized dataframe
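Inserting a random string of a given length at a given position can be sketched as shown below. Only MIXED is confirmed, as the default of `randMethod`. The NUMERIC and ALPHABET members are hypothetical stand-ins for the "numerical" and "alphabet" noise named in the class description. The sketch also assumes that `position` is a 0-based character index.

```python
import random
import string
from enum import Enum
from typing import Optional


class RandomMethod(Enum):
    # Only MIXED appears in the proto excerpt; NUMERIC and ALPHABET are
    # hypothetical names for digit-only and letter-only noise.
    NUMERIC = "numeric"
    ALPHABET = "alphabet"
    MIXED = "mixed"


_ALPHABETS = {
    RandomMethod.NUMERIC: string.digits,
    RandomMethod.ALPHABET: string.ascii_letters,
    RandomMethod.MIXED: string.ascii_letters + string.digits,
}


def insert_noise(text: str, position: int, length: int,
                 method: RandomMethod = RandomMethod.MIXED,
                 rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    noise = "".join(rng.choice(_ALPHABETS[method]) for _ in range(length))
    # Slicing clamps out-of-range positions, so a position past the end appends.
    return text[:position] + noise + text[position:]


out = insert_noise("Kim", position=1, length=3, method=RandomMethod.NUMERIC,
                   rng=random.Random(0))
print(len(out), out[0], out[-2:], out[1:4].isdigit())  # 6 K im True
```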
Performs random noise operations on the given string column. Note that this column contains only string data. In this case, random noise is inserted at a specific position.
Dataframe to anonymize
Column to be anonymized
The method used to generate noise
DataFrame Anonymized dataframe
Runs the random noise module for basic de-identification.
Input dataframe
DataFrame Anonymized dataframe
:: ApplicationDeveloperApi ::
Operator that implements the random noise module of the Data Masking algorithm. It inserts random noise into the original data.
- If the given column is of string type, random noise composed of digits, letters, or both is inserted at a specific position.
- If the given column is of numerical type, each value of that column is combined with a noise value (user-specified, randomly chosen, or drawn from a normal distribution) by addition, subtraction, multiplication, or division.