Story Behind the Data Migration of WSO2 Analytics Solutions from DAS to SP

Roland Hewage
8 min read · Mar 21, 2020

What are WSO2 Analytics Solutions?

Analytics Solutions refer to WSO2 SP solutions or WSO2 DAS solutions pre-designed to work with other WSO2 products. The following solutions are currently supported.

  1. APIM-Analytics
  2. EI-Analytics
  3. IS-Analytics

You can read more about each analytics solution in my post “What are WSO2 Analytics Solutions?”. Here I’m going to explain the behind-the-scenes effort that went into the data migration of the WSO2 Analytics solutions, from an architectural point of view.

DAS based Analytics Solutions

The old DAS based Analytics consists of the following components, all accessible through the DAS Management Console: Data Agents, Event Receivers, the Analytics REST API, the Data Store, Siddhi Event Processors, Analytics Spark, Data Indexing, Event Publishers, the Analytics Dashboard, and Event Sinks. You can read more about each of these components in my blog post “Architecture of WSO2 Data Analytics Server (WSO2 DAS)”.

WSO2 DAS Event Flow

  1. Event Receivers and the Analytics REST API send data to the DAS server.
  2. Received data is stored through the data layer in the underlying Data Store (Cassandra, RDBMS, HBase, etc.).
  3. A background data indexing process fetches the data from the Data Store and performs the indexing operations.
  4. The analyzer engine, powered by Apache Spark, or the real-time Siddhi based Event Processors analyze this data according to defined analytic queries. This usually follows a pattern of retrieving data from the Data Store, performing a data operation such as an addition, and storing the data back in the Data Store.
  5. The Analytics Dashboard queries the Data Store for the analyzed data and displays it graphically.

The Data Access Layer (DAL) persists the data processed by WSO2 DAS and consists of two components named Analytics Record Store and Analytics File System. The Analytics Record Store handles the storage of records that are received by WSO2 DAS in the form of events. The Analytics File System handles the storage of index data.

Data Persistence in DAS based Analytics

Analytics Record Store

The Analytics Record Store handles the storage of the records that WSO2 DAS receives in the form of events. This store contains the raw event data in a tabular form so that it can be retrieved later.

There are two record stores configured by default in DAS.

  1. Primary Store (EVENT_STORE): This record store is used to persist the incoming events of WSO2 DAS. It contains raw data in a tabular structure that can be used later.
  2. Processed Data Store (PROCESSED_DATA_STORE): This record store is used to store summarized event data.

Analytics File System

By default, WSO2 DAS starts its indexing operation when the server is started. All index data is stored in the file system, partitioned into units known as shards. You can read more about indexing in the Data Access Layer in my blog post “Data Access Layer (DAL) of WSO2 Data Analytics Server (WSO2 DAS)”.

DAS supports data persistence through the Data Access Layer (DAL) with implementations for different database types.

  1. Configuring data persistence with Cassandra
  2. Configuring data persistence with HBase or HDFS
  3. Configuring data persistence with RDBMS

WSO2 DAS supports several RDBMS types for its underlying Data Access Layer (DAL), such as H2, MySQL, PostgreSQL, Oracle, Microsoft SQL Server, and DB2. To use the RDBMS DAL component, a pre-configured installation of an RDBMS is required. The RDBMS implementation of the DAS DAL contains the implementation of the Analytics Record Store.

Usually the Analytics Record Store holds all the analytics data. Most RDBMSs have a fixed schema and do not support storing arbitrary key-value pairs, so in DAS the data fields of a record are serialized into a BLOB and stored together with the record ID and the timestamp of the record. Storing data as BLOBs has one drawback: database-level indexing cannot be used to search BLOBs by fields. To overcome this, records are sent through the indexer before they are persisted, so that the indexer can keep a searchable copy of the record data along with the record ID. The indexer does not store the field values; it only stores the record ID and keeps an index of the data fields. When a search query is sent to DAS, the indexer identifies the matching records, gets their record IDs, and sends them to the persistence layer/record store. At the record store level, the record BLOBs are deserialized and returned as record objects from the AnalyticsDataService implementation.
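To make this concrete, a DAS record store table in an RDBMS looks roughly like the sketch below. This is illustrative only; the actual table and column names are generated by the DAL and vary by implementation.

-- Illustrative sketch of a BLOB-backed DAS record store table
CREATE TABLE AN_EVENT_STORE_SAMPLE (
  record_id VARCHAR(50) NOT NULL PRIMARY KEY, -- unique ID of the record
  timestamp BIGINT NOT NULL,                  -- event timestamp
  data      BLOB                              -- serialized data fields of the record
);

-- A field such as house_id exists only inside the serialized BLOB,
-- so a query like the following cannot be answered at the database level:
-- SELECT * FROM AN_EVENT_STORE_SAMPLE WHERE house_id = 10;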

SP based Analytics Solutions

The new SP based Analytics profile consists of two components: the Worker and the Dashboard. The Worker is the server that processes the data streams and publishes the statistics to a database. The Dashboard reads the statistics published by the Worker and displays them. The Worker and the Dashboard are connected through the database, which holds all the analytics data, both raw event data and processed event data. WSO2 SP supports several RDBMS types such as H2, MySQL, PostgreSQL, Oracle, Microsoft SQL Server, and DB2.

Data Persistence in SP based Analytics
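
In SP, both raw and summarized event data live as ordinary relational rows, so they can be read with plain SQL and indexed at the database level. A minimal sketch, assuming a hypothetical summary table written by the Worker (all names here are illustrative):

-- Hypothetical summary table written by an SP worker
CREATE TABLE API_REQUEST_SUMMARY (
  api_name      VARCHAR(100),
  api_version   VARCHAR(20),
  request_count INT,
  time_bucket   BIGINT
);

-- Ordinary database-level indexes work, unlike with the DAS BLOB layout:
CREATE INDEX IDX_API_NAME ON API_REQUEST_SUMMARY (api_name);

-- The Dashboard (or any third-party tool) can read the statistics directly:
SELECT api_name, SUM(request_count) FROM API_REQUEST_SUMMARY GROUP BY api_name;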

Data Migration of WSO2 Analytics Solutions from DAS to SP

Data migration of the Analytics Solutions from DAS to SP means migrating the analytics data persisted in the Analytics Record Store of the old DAS product to the data store that is specified by the user and used by the Worker and the Dashboard of the new SP product, for all Analytics Solutions.

Architectural view of Data Migration of IS Analytics from DAS to SP

But we can’t simply replace the RDBMS data store in the new SP based product with the RDBMS data store in the old DAS based product as-is.

Why can’t we replace the RDBMS data store in the new SP based product with the RDBMS data store in the old DAS based product as it is?

  • Data is stored as BLOBs in the old DAS product, in an encoded binary format that is not human readable.
  • Database-level indexing cannot be used to search the BLOBs by fields.
  • There are logical changes in the database schemas used by the old and new Analytics products.

As noted above, the BLOB data of the old DAS product is accessible in a human-readable format only through the Data Explorer in the Management Console of DAS, since all access goes through the Data Access Layer. Outside the DAL, database-level indexing cannot be used to search the BLOBs by fields.

So How Can We Migrate?

After collecting and storing data, the next step is to analyze it to produce meaningful information. WSO2 DAS’s analyzer engine retrieves data from the data stores and performs various analysis operations on it according to defined analytic queries. WSO2 DAS supports the following types of analytics.

  1. Interactive Analytics
  2. Batch Analytics using Spark SQL
  3. Realtime Analytics using Siddhi
  4. Predictive Analytics using WSO2 ML

As described in the event flow above, the analyzer engine is powered by Apache Spark or by the real-time Siddhi based Event Processors. Of these, batch analytics is the mode that matters for data migration: Apache Spark is used to perform batch analytics operations on the data stored in Event Stores, using analytics scripts written in Spark SQL.

Interactive SQL (Structured Query Language) queries are widely used by business intelligence users for exploring and analyzing data. WSO2 DAS 3.0.0 ships with the ability to run SQL queries on the underlying data sources specified in the DAS Data Access Layer (DAL). It uses Spark SQL as the query engine, which succeeds Apache Hive from WSO2 DAS 3.0.0 onwards and provides a powerful integration with the rest of the Spark analytics engine. Spark SQL follows the standard SQL format, although some query types of the standard SQL format are not supported. WSO2 DAS inherits its query parsing options from Spark SQL’s native query parser.

The query syntax to register a temporary table in the Spark environment using data from a provider class is as follows.

CREATE TEMPORARY TABLE <table_name>
USING <provider_name>
OPTIONS ( <options> )
AS <alias>;

The provider used to create the temporary table in WSO2 DAS can be Carbon Analytics, Carbon JDBC, Compressed Event Analytics or other Relation Providers.

Creating the table using Carbon Analytics as the provider

The following query can be used to create a table in the Spark environment (if it does not already exist), using data from Carbon Analytics. Carbon Analytics refers to either the built-in H2 database or any external database that is connected through the WSO2 Data Access Layer (WSO2 DAL).

CREATE TEMPORARY TABLE <temp_table>
USING CarbonAnalytics
OPTIONS (tableName "<table name>",
schema "<schema>" [, primaryKeys "<primaryKeys>"]
);

For example:

CREATE TEMPORARY TABLE plugUsage
USING CarbonAnalytics
OPTIONS (tableName "plug_usage",
schema "house_id INT, household_id INT, plug_id INT, usage FLOAT -sp, composite FACET -i",
primaryKeys "household_id, plug_id"
);
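
Here, the -i option marks a field as indexed and -sp marks it as a score parameter in the DAS indexing layer. These options are consumed by the indexer described earlier, not by the underlying database.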

Creating the table using Carbon JDBC as the provider

The Carbon JDBC provider can be used to interact directly with a relational database, bypassing the DAS Data Access Layer. As a result, the data is stored in the format native to the relational storage mechanism in use without compression or encoding. This is useful in scenarios where the results of a Spark query need to be stored in a relational database to be accessed by third parties such as reporting tools or dashboards, or if the results need to be readable outside WSO2 DAS.

CREATE TEMPORARY TABLE <temp_table>
USING CarbonJDBC
OPTIONS (dataSource "<datasource name>",
tableName "<table name>",
schema "<schema>" [, primaryKeys "<primaryKeys>"]
);
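
For example, the plugUsage data above could be written to a plain relational table as follows. The datasource name WSO2_MIGRATION_DB is illustrative; it has to match a datasource defined in the DAS configuration.

CREATE TEMPORARY TABLE plugUsageJDBC
USING CarbonJDBC
OPTIONS (dataSource "WSO2_MIGRATION_DB",
tableName "PLUG_USAGE",
schema "house_id INTEGER, household_id INTEGER, plug_id INTEGER, usage FLOAT",
primaryKeys "household_id, plug_id"
);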

We have to write a Spark script that migrates the analytics data in the Analytics Record Store from the CarbonAnalytics provider to a relational database specified by the user, through the CarbonJDBC provider. However, database-level indexing was not possible through the CarbonJDBC provider, so we also have to write a Java client (or another migration client) that creates the analytics tables, with the indexes required by the new SP product, prior to any data migration. This migration client is given to the user: currently, a shell script migration client which internally calls a Java client was developed for Linux users, and a batch script migration client was developed for Windows users. The data migration client itself is then the Spark script. Before writing any code, we had to explore the database schemas of the analytics data stores of each Analytics Solution in both the old DAS product and the new SP product, map the tables and the relevant fields, and find common solutions for the issues faced during data mapping and data migration.
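
Putting the pieces together, the heart of such a Spark migration script is a pair of temporary tables and an INSERT ... SELECT between them. The following is a minimal sketch with hypothetical table, field, and datasource names (ORG_WSO2_SAMPLE_STATS, MIGRATED_STATS, WSO2_MIGRATION_DB); the real scripts map each Analytics Solution’s tables and fields individually, and the Java client creates the destination tables with their indexes beforehand.

-- Source: a DAS event store table, read through the DAL (BLOB-backed)
CREATE TEMPORARY TABLE sourceStats
USING CarbonAnalytics
OPTIONS (tableName "ORG_WSO2_SAMPLE_STATS",
schema "api_name STRING, request_count INTEGER, time_bucket LONG"
);

-- Destination: the plain relational table used by the new SP product
CREATE TEMPORARY TABLE destinationStats
USING CarbonJDBC
OPTIONS (dataSource "WSO2_MIGRATION_DB",
tableName "MIGRATED_STATS",
schema "api_name STRING, request_count INTEGER, time_bucket LONG"
);

-- Copy (and, where necessary, reshape) the data
INSERT INTO TABLE destinationStats
SELECT api_name, request_count, time_bucket FROM sourceStats;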


Roland Hewage

(Born to Code) | Software Engineer (Ecosystem Engineering) at WSO2 | Bachelor of Computer Science (Special) Degree Graduate at University of Ruhuna, Sri Lanka