

Extending the Identity Anonymization Tool

Warning

THE CONTENT ON THIS PAGE IS STILL A WORK IN PROGRESS.

To remove references to deleted user identities from additional relational databases and log files, you can extend the Identity Anonymization tool to include them. You can also extend the tool to remove such references from modules other than relational databases or log files.

Before you begin

Check out the source of the Identity Anonymization tool and build the tool. For detailed instructions on how to build the tool, see Building the Identity Anonymization tool.
Once you build the source, extract the identity-anonymization-tool-SNAPSHOT directory. The path to this directory is referred to as <TOOL_HOME> throughout this section.

Extending the tool to remove references from additional relational databases

Follow the steps below if you want to extend the Identity Anonymization tool to include an additional relational database from which you want to remove references to deleted user identities:

Create a new directory in <TOOL_HOME>/conf/sql/, with an appropriate name based on the relational database table from which you want to remove references.

Tip

If there is an existing directory that serves this purpose, you should use the existing directory instead of creating a new directory.

For example, if you want to remove references to deleted user identities in a relational database where customer information is stored, you can create a directory named customer.

Create an SQL file that includes the required commands, and save the file with an appropriate name in the directory that you created in step 1.

Tip

Ensure that the file is saved with the .sql extension.

The SQL statements should either be UPDATE or DELETE statements. The following variables can be used to replace respective values at the time of execution.

Variable	Value to replace
`pseudonym`	The pseudonym that should be used to replace a deleted user's identity.
`username`	The user name that should be replaced with the pseudonym.
`user_store_domain`	The user store domain.
`tenant_domain`	The tenant domain.

Following is a sample SQL statement that can be used:

UPDATE IDN_ASSOCIATED_ID
SET USER_NAME = `pseudonym`
WHERE USER_NAME = `username`
    AND DOMAIN_NAME = `user_store_domain`
    AND TENANT_ID = (SELECT UM_ID
                     FROM UM_TENANT
                     WHERE UM_DOMAIN_NAME = `tenant_domain`
                     ORDER BY UM_ID DESC LIMIT 1)

Note

When you run the Identity Anonymization tool to remove references to a deleted user's identity from relational databases, the same user name can be stored in different forms across databases. For example, a user name can be stored as user1 or [email protected] or PRIMARY/user1. To handle such scenarios, assign a query type to each SQL query used with the Identity Anonymization tool: create a sql.properties file corresponding to the SQL query, and specify the query type in the properties file as follows:

Tip

For example, if the SQL file is customer-access-token.sql, then the corresponding properties file would be customer-access-token.sql.properties.

type=<Query-Type>

For example, type=DOMAIN_APPENDED

Following are the query types that you can use together with the description of each query type:

Query type	Description
`DOMAIN_SEPARATED`	You can use this query type when the query refers to a user name that is not appended with any domain name (i.e., when a user name such as `user1` is not appended with a user store domain name or tenant domain name). This is the default query type, so you do not need to define it in a properties file. All queries without a `sql.properties` file use this query type by default.
`DOMAIN_APPENDED`	You can use this query type when the query refers to a user name that is appended with a user store domain name as follows: `PRIMARY/user1`.
`TENANT_APPENDED`	You can use this query type when the query refers to a user name that is appended either with a tenant domain such as `[email protected]`, or with the super tenant domain (carbon.super) such as `user2@carbon.super`.
`TENANT_SPECIFIC_APPENDED`	You can use this query type when the query refers to a user name that is appended with the tenant domain, but not when the super tenant domain is used (for example, `user2@carbon.super` or `user2`).
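
The four query types in the table above can be read as four ways of writing the same user name. The following Java sketch illustrates that reading only. The class, enum, and method names are hypothetical and are not part of the tool, `example.com` is a made-up tenant domain, and the handling of the super tenant under `TENANT_SPECIFIC_APPENDED` (no suffix) is an assumption based on the description above.

```java
// Illustration only: hypothetical names, not the tool's own classes.
class QueryTypeSketch {

    enum QueryType { DOMAIN_SEPARATED, DOMAIN_APPENDED, TENANT_APPENDED, TENANT_SPECIFIC_APPENDED }

    static final String SUPER_TENANT = "carbon.super";

    /** Returns the form of the user name that a query of the given type is expected to refer to. */
    static String storedForm(QueryType type, String username, String userStoreDomain, String tenantDomain) {
        switch (type) {
            case DOMAIN_APPENDED:
                // User store domain prefix, e.g. PRIMARY/user1
                return userStoreDomain + "/" + username;
            case TENANT_APPENDED:
                // Tenant domain suffix, including the super tenant domain
                return username + "@" + tenantDomain;
            case TENANT_SPECIFIC_APPENDED:
                // Assumption: super tenant user names carry no tenant suffix
                return SUPER_TENANT.equals(tenantDomain) ? username : username + "@" + tenantDomain;
            case DOMAIN_SEPARATED:
            default:
                // Plain user name, the default query type
                return username;
        }
    }

    public static void main(String[] args) {
        for (QueryType type : QueryType.values()) {
            System.out.println(type + " -> " + storedForm(type, "user1", "PRIMARY", "example.com"));
        }
    }
}
```

If a query's WHERE clause compares against a column that holds one of these forms, the matching query type is the one to declare in the corresponding `sql.properties` file.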

Provide a datasource definition in the <TOOL_HOME>/conf/datasources directory that maps to the directory you created in step 1 (i.e., <TOOL_HOME>/conf/sql/customer).
Following is a sample datasource definition:

<datasources-configuration>
  <datasources>
      <datasource>
          <name>customer</name>
          <description>The datasource used for customer</description>
          <definition type="RDBMS">
              <configuration>
                  <url>jdbc:mysql://localhost:3306/userdb</url>
                  <username>root</username>
                  <password>root</password>
                  <driverClassName>com.mysql.jdbc.Driver</driverClassName>
                  <maxActive>50</maxActive>
                  <maxWait>60000</maxWait>
                  <testOnBorrow>true</testOnBorrow>
                  <validationQuery>SELECT 1</validationQuery>
                  <validationInterval>30000</validationInterval>
                  <defaultAutoCommit>false</defaultAutoCommit>
              </configuration>
          </definition>
      </datasource>
  </datasources>
</datasources-configuration>

Here, the datasource name <name>customer</name> maps to the directory that you created in step 1. This datasource can be reused by any of the scripts provided within the <TOOL_HOME>/conf/sql/customer directory.

Note

The Identity Anonymization tool does not support JNDI. Therefore, if there are JNDI configuration sections, be sure to remove them.

Add the corresponding JDBC driver to the <TOOL_HOME>/lib directory if it is not already there.

Extending the tool to remove references from additional log files

Follow the steps below if you want to extend the Identity Anonymization tool to include an additional log file from which you want to remove references to deleted user identities:

Open the <TOOL_HOME>/conf/config.json file, go to the directories section, and add a new directory entry. You need to specify the following details when adding a new directory entry:
- dir : The directory where the regex replacement patterns are defined. The specified directory can have multiple regex pattern files.
- type : Deprecated.
- processor : The name of the processor.
- log-file-path : The directory where the log files are located.
- log-file-name-regex : The regular expression used to filter log files. Specify this when there are rolling log files and you have to define which files need processing with the specified replacement logic.
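
To make the role of log-file-name-regex concrete, the following Java sketch shows how a file name regex can pick a log file and its rolled-over copies out of a directory listing. It is an illustration, not the tool's code: the class and method names, the file names, and the regex are hypothetical, and it assumes the expression must match the whole file name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: hypothetical names, not the tool's own classes.
class LogFileFilterSketch {

    /** Keeps only the file names that the regular expression matches in full. */
    static List<String> selectLogFiles(List<String> fileNames, String nameRegex) {
        Pattern pattern = Pattern.compile(nameRegex);
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            // Assumption: the whole file name has to match, not just a part of it
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("app.log", "app.log.2024-01-01", "audit.log");
        // Matches app.log and any rolled file such as app.log.<suffix>
        System.out.println(selectLogFiles(names, "app\\.log(\\..*)?"));
    }
}
```

A narrow expression keeps the replacement logic away from unrelated files that happen to share the same log directory.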

Open the <TOOL_HOME>/conf/patterns.xml file, and define the required patterns using regular expressions to find and replace references to a deleted user's identity. Following are the variables that you can use in a regular expression:

Variable	Description
${userstoreDomain}	Replaces the user store domain name of a deleted user with a specified pseudonym.
${username}	Replaces the user name of a deleted user with a specified pseudonym.
${tenantId}	Replaces the tenant ID of a deleted user with a specified pseudonym.
${tenantDomain}	Replaces the tenant domain of a deleted user.

Following is a sample pattern that you can define in the <TOOL_HOME>/conf/patterns.xml file:

<patterns xmlns="patterns.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="patterns.xsd">
  <pattern key="pattern4">
      <detectPattern> (.)*(Initiator : )(.)*${username}</detectPattern>
      <replacePattern>${username}</replacePattern>
  </pattern>

  ...

</patterns>

Here, the <detectPattern> element contains the pattern to detect in log file entries. The <replacePattern> element contains the variable that should be replaced with the pseudonym.
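
The interplay of the two elements can be sketched in plain Java with java.util.regex. This is a simplified illustration under stated assumptions, not the tool's implementation: it assumes the `${username}` placeholder is substituted with the literal user name before matching, that only lines matching the detect pattern are rewritten, and that every occurrence of the replace pattern in such a line is swapped for the pseudonym. The class name, method name, and sample log lines are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: hypothetical names, not the tool's own classes.
class LogPatternSketch {

    /** Rewrites a log line only if it matches the detect pattern for the given user. */
    static String anonymizeLine(String line, String detectTemplate, String replaceTemplate,
                                String username, String pseudonym) {
        // Quote the user name so that regex metacharacters in it are matched literally
        String quotedUser = Pattern.quote(username);
        // String.replace is a literal replacement, so "${username}" is not read as a regex
        Pattern detect = Pattern.compile(detectTemplate.replace("${username}", quotedUser));
        if (!detect.matcher(line).find()) {
            return line; // Lines that do not match are left untouched
        }
        Pattern replace = Pattern.compile(replaceTemplate.replace("${username}", quotedUser));
        return replace.matcher(line).replaceAll(Matcher.quoteReplacement(pseudonym));
    }

    public static void main(String[] args) {
        String detect = "(.)*(Initiator : )(.)*${username}";
        String replace = "${username}";
        System.out.println(anonymizeLine("INFO Initiator : user1 | Action : Login",
                detect, replace, "user1", "a1b2c3"));
        System.out.println(anonymizeLine("INFO user1 logged in",
                detect, replace, "user1", "a1b2c3"));
    }
}
```

In this sketch the second line is left unchanged because it lacks the `Initiator : ` prefix, which shows why the detect pattern should be as specific as the log format allows.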

Extending the tool to remove references from additional modules other than relational databases or log files

Follow the steps below if you want to extend the tool to include additional modules other than relational databases or log files:

Import the following Maven dependency:

    <dependency>
        <groupId>org.wso2.carbon.privacy</groupId>
        <artifactId>org.wso2.carbon.privacy.forgetme.api</artifactId>
    </dependency>

Implement the ForgetMeInstruction interface and the InstructionReader interface. Keep the following in mind when you implement the interfaces:
- The ForgetMeInstruction interface that you implement should contain a single execution of a single artifact. For example, a single log file or a single datasource that can be processed with a single instruction.
- The InstructionReader interface should generate a list of instructions related to a given artifact type in the config directory. With regard to RDBMS, the InstructionReader interface is responsible for treating a single SQL file as a single instruction.
  - Make sure you specify a distinctive name for the result of getType(). This will be the name of the processor.
```
            public String getType() { return MY_PROCESSOR_NAME; }
```
  Tip
  An instruction is considered to be atomic, i.e., an instruction should either be completely processed or not be processed at all.
Follow the steps below to register the InstructionReader interface with the Java 8 SPI (Service Provider Interface).
1. Create a file named META-INF/services/org.wso2.carbon.privacy.forgetme.api.runtime.InstructionReader in the resource directory, so that it is packed inside the JAR.
2. Add the fully qualified class name of the implementation of InstructionReader inside the SPI file.
```
org.wso2.carbon.privacy.forgetme.logs.instructions.LogFileInstructionReader
```
Compile and build the JAR.
Add the built JAR to the <TOOL_HOME>/lib directory.
Alternatively, you can build your own distribution of the tool and include it in the product itself.
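
The SPI registration described above relies on the standard java.util.ServiceLoader mechanism, which reads the provider file under META-INF/services and instantiates each listed class. The following sketch shows that lookup in isolation. The `Reader` interface is a hypothetical stand-in that models only getType(); it is not the tool's InstructionReader, whose other methods are not covered here, and the processor name `my-processor` is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Illustration only: hypothetical stand-ins, not the tool's own API.
class SpiLookupSketch {

    /** Hypothetical stand-in for InstructionReader; only getType() is modelled. */
    public interface Reader {
        String getType();
    }

    /** Example provider; the value returned by getType() would act as the processor name. */
    public static class CustomReader implements Reader {
        static final String MY_PROCESSOR_NAME = "my-processor";

        @Override
        public String getType() {
            return MY_PROCESSOR_NAME;
        }
    }

    /** Collects the type names of all providers registered for Reader on the classpath. */
    static List<String> discoverTypes() {
        List<String> types = new ArrayList<>();
        // ServiceLoader instantiates every class listed in the matching META-INF/services file
        for (Reader reader : ServiceLoader.load(Reader.class)) {
            types.add(reader.getType());
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println("Discovered processors: " + discoverTypes());
    }
}
```

Because the lookup is driven entirely by the provider file, a JAR that lacks the file (or lists the wrong class name) contributes no providers, which is worth checking first if a custom processor is not picked up.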
