ksb.csle.didentification.privacy
Object that contains the message ksb.csle.common.proto.StreamDidentProto.RecordReductionInfo. RecordReductionInfo contains the following attributes:
enum ReplaceValueMethod {
  BLANK = 0;
  STAR = 1;
  UNDERBAR = 2;
}

enum ColumnHandlePolicy {
  ONEBYONE = 0;
  ALL = 1;
}

enum OutlierMethod {
  ZSCORE = 0;
  BOXPLOT = 1;
}

message RecordReductionInfo {
  repeated int32 selectedColumnId = 1;
  required ReplaceValueMethod method = 2 [default = STAR];
  required ColumnHandlePolicy columnHandlePolicy = 3 [default = ONEBYONE];
  required OutlierMethod outlierMethod = 4 [default = ZSCORE];
  repeated FieldInfo fieldInfo = 5;
  optional PrivacyCheckInfo check = 6;
}
Anonymizes the src dataframe
Dataframe to anonymize
DataFrame Anonymized dataframe
Anonymizes the specified column of the src dataframe using a generic 'Type' method. The 'Type' is determined by the inheriting object.
Dataframe to anonymize
Column to be anonymized
DataFrame The dataframe which replaces original column with anonymized column
Performs the record reduction on column of src dataframe.
Dataframe to anonymize
the column of dataframe to apply record reduction
DataFrame Anonymized dataframe
Performs the record reduction on a column of numerical values. It discriminates outliers on the basis of the column's statistical information, using the given outlier method, and replaces them with 'repValueType'.
Dataframe to anonymize
the (numerical) column of dataframe to apply record reduction
The value that replaces outliers (e.g., blank, _, and *)
The method used to discriminate outliers (e.g., boxplot, z-score)
DataFrame Anonymized dataframe
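The numerical case above can be illustrated with a minimal sketch in plain Python rather than the module's own DataFrame API. The function names, the z-score cutoff of 3.0, and the boxplot factor of 1.5 are assumptions chosen for illustration; the source does not state which thresholds the module uses.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Indices of values whose absolute z-score exceeds `threshold` (assumed cutoff)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # all values identical: nothing can be an outlier
        return set()
    return {i for i, v in enumerate(values) if abs(v - mean) / std > threshold}

def boxplot_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed factor k)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {i for i, v in enumerate(values) if v < low or v > high}

def reduce_numeric(values, rep_value="*", method="zscore"):
    """Return a copy of `values` with detected outliers replaced by `rep_value`."""
    detect = {"zscore": zscore_outliers, "boxplot": boxplot_outliers}[method]
    flagged = detect(values)
    return [rep_value if i in flagged else v for i, v in enumerate(values)]

# Hypothetical column: fifteen ordinary values and one extreme value.
ages = [10, 11, 12] * 5 + [1000]
masked = reduce_numeric(ages, rep_value="*", method="boxplot")
```

In this sketch both methods flag only the last value, so `masked` keeps the first fifteen entries and ends with the replacement value.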
Performs the record reduction on a column of string values. It discriminates outliers on the basis of their frequencies, using the given outlier method, and replaces them with 'repValueType'.
Dataframe to anonymize
the (string) column of dataframe to apply record reduction
DataFrame Anonymized dataframe
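A frequency-based reduction for string values might look like the following sketch. It is written in plain Python, not the module's API, and makes two assumptions that the source does not confirm: the boxplot rule is applied to the per-value counts, and only unusually rare values (below the lower fence) are treated as outliers. The function names are hypothetical.

```python
from collections import Counter
import statistics

def rare_values(values, k=1.5):
    """Values whose frequency falls below Q1 - k*IQR of all frequencies.

    Needs at least two distinct values, since quartiles of a single
    count are undefined.
    """
    counts = Counter(values)
    q1, _, q3 = statistics.quantiles(counts.values(), n=4, method="inclusive")
    low = q1 - k * (q3 - q1)
    return {value for value, count in counts.items() if count < low}

def reduce_strings(values, rep_value="*"):
    """Return a copy of `values` with rare values replaced by `rep_value`."""
    rare = rare_values(values)
    return [rep_value if v in rare else v for v in values]

# Hypothetical column: five common surnames and one value seen only once.
names = (["kim"] * 10 + ["lee"] * 11 + ["park"] * 9
         + ["choi"] * 10 + ["jung"] * 10 + ["zzyzx"])
masked_names = reduce_strings(names)
```

Here the counts are 10, 11, 9, 10, 10 and 1, so only the value that occurs once falls below the lower fence and is replaced.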
Performs the record reduction on the given array of columns simultaneously. This function discriminates outliers by considering the given array of columns simultaneously, and then replaces them together.
Dataframe to anonymize
DataFrame Anonymized dataframe
Performs the record reduction on the given array of columns one by one.
Dataframe to anonymize
DataFrame Anonymized dataframe
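The difference between the two column handling policies (ALL and ONEBYONE) can be sketched as follows. This is one plausible reading, not the module's confirmed behavior: under the one-by-one policy only the offending cell of each column is replaced, while under the simultaneous policy every selected column is replaced in any row that holds an outlier in at least one of them. The function names, the row representation as dictionaries, and the z-score cutoff of 3.0 are assumptions.

```python
import statistics

def zscore_outlier_rows(rows, column, threshold=3.0):
    """Row indices whose value in `column` has |z-score| > threshold (assumed cutoff)."""
    values = [row[column] for row in rows]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mean) / std > threshold}

def reduce_one_by_one(rows, columns, rep_value="*"):
    """Handle each column independently: mask only the outlying cell."""
    out = [dict(row) for row in rows]
    for col in columns:
        for i in zscore_outlier_rows(rows, col):
            out[i][col] = rep_value
    return out

def reduce_all(rows, columns, rep_value="*"):
    """Handle the columns together: mask all selected columns in every flagged row."""
    flagged = set()
    for col in columns:
        flagged |= zscore_outlier_rows(rows, col)
    out = [dict(row) for row in rows]
    for i in flagged:
        for col in columns:
            out[i][col] = rep_value
    return out

# Hypothetical data: row 15 has an extreme age, row 3 an extreme income.
ages = [30, 31, 32] * 5 + [200]
incomes = [50, 51, 52, 900] + [50, 51, 52] * 4
rows = [{"age": a, "income": m} for a, m in zip(ages, incomes)]

one = reduce_one_by_one(rows, ["age", "income"])
together = reduce_all(rows, ["age", "income"])
```

With this data, `one` masks the age in row 15 and the income in row 3 only, whereas `together` masks both columns in both rows. The input rows are left unchanged in either case.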
Returns column name from src dataframe specified by the column ID defined by protobuf.
dataframe to get names of columns.
column ID to anonymize.
String.
Returns column names from src dataframe specified by column IDs. Note that columns with invalid IDs are ignored.
dataframe to get names of columns.
Array[String].
Checks whether the given column ID is valid.
dataframe to get names of columns.
Boolean.
Checks whether the given column name is valid.
dataframe to get names of columns.
column Name.
Boolean.
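The column lookup and validity helpers above can be sketched in plain Python as shown below. The sketch assumes that a column ID is a zero-based position in the dataframe's column list; the source does not say whether IDs start at 0 or 1, and the function names are hypothetical.

```python
def is_valid_column_id(columns, column_id):
    """True if `column_id` is a position inside `columns` (zero-based, assumed)."""
    return 0 <= column_id < len(columns)

def is_valid_column_name(columns, name):
    """True if `name` is one of the dataframe's column names."""
    return name in columns

def get_column_names(columns, column_ids):
    """Map IDs to names, silently skipping invalid IDs."""
    return [columns[i] for i in column_ids if is_valid_column_id(columns, i)]

# Hypothetical schema and selection: IDs 5 and -1 do not exist and are ignored.
columns = ["name", "age", "zip"]
selected = get_column_names(columns, [0, 2, 5, -1])
```

The explicit range check matters in Python because a negative index would otherwise silently select a column from the end of the list.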
Operates record reduction module for basic de-identification
Input dataframe
DataFrame Anonymized dataframe
:: ApplicationDeveloperApi ::
Operator that implements the record reduction module in the Data Reduction algorithm. It discriminates outliers (boxplot and z-score methods are supported) and then replaces the rows that contain these outliers with a blank (or star).