Cassandra Data Analytics

Abstract

This project will look at how we can do more advanced data analytics on data held in Cassandra using the RapidMiner framework.
RapidMiner is a dashboard which can orchestrate data analytics against multiple data sources, and it can connect directly to Cassandra. However, the way OpenNMS stores data using Newts makes that data difficult to access from any other framework, and in particular from widely used data analytics frameworks. The key problems include:

  • The values are stored as Java blobs rather than as recognised data types
  • The resource is identified by a long string which is not easily searchable in Cassandra
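
The second problem can be illustrated with a small in-memory sketch. The resource strings, the colon separator, and the metric names below are hypothetical and are only there to show the access pattern; real Newts resource identifiers may look different. Looking up a resource by its full string is cheap because the whole string is the key, but finding every resource that shares one segment means scanning and splitting every key.

```python
from typing import Dict, List

# Hypothetical store: full resource string -> metric names recorded for it.
# These identifiers are invented for illustration, not taken from Newts.
STORE: Dict[str, List[str]] = {
    "snmp:node1:interface:eth0": ["ifInOctets", "ifOutOctets"],
    "snmp:node1:cpu": ["loadavg5"],
    "snmp:node2:interface:eth0": ["ifInOctets"],
}


def lookup_exact(resource: str) -> List[str]:
    """Cheap: the whole string is the key, much like a lookup by full key."""
    return STORE.get(resource, [])


def find_by_segment(segment: str) -> List[str]:
    """Expensive: every key has to be scanned and split to match one segment."""
    return sorted(r for r in STORE if segment in r.split(":"))


if __name__ == "__main__":
    print(lookup_exact("snmp:node1:cpu"))
    print(find_by_segment("eth0"))
```

An index over the individual segments would remove the need for the full scan in `find_by_segment`.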

In this project we will try to identify use cases for data analytics, and to identify improvements or workarounds which can be implemented using user-defined functions and indexing in Cassandra:

  • A UDF (user-defined function) to extract number type data from the Cassandra data value ‘blob’
  • Cassandra indexing with Lucene to search resource values for available data types across resources
  • Constructing some searches and machine learning examples using RapidMiner
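
As a rough sketch of what the blob-decoding step might do, the Python below unpacks a number from a byte string. The byte layout is an assumption made purely for illustration (one type-tag byte followed by an 8-byte big-endian payload) and is not taken from the Newts source; `TYPE_INT`, `TYPE_FLOAT`, `encode_value` and `decode_value` are hypothetical names. A real Cassandra UDF would have to follow the actual Newts encoding.

```python
import struct

# Hypothetical type tags; the real Newts encoding may use different values.
TYPE_INT = 1
TYPE_FLOAT = 2


def encode_value(value) -> bytes:
    """Pack a number into the assumed layout: 1 tag byte + 8 payload bytes."""
    if isinstance(value, int):
        return struct.pack(">bq", TYPE_INT, value)   # signed 64-bit integer
    return struct.pack(">bd", TYPE_FLOAT, value)     # IEEE 754 double


def decode_value(blob: bytes) -> float:
    """Return the number held in a blob, or raise ValueError if it is unknown."""
    if len(blob) != 9:
        raise ValueError("expected 9 bytes, got %d" % len(blob))
    tag = blob[0]
    if tag == TYPE_INT:
        return float(struct.unpack(">q", blob[1:])[0])
    if tag == TYPE_FLOAT:
        return struct.unpack(">d", blob[1:])[0]
    raise ValueError("unknown type tag %d" % tag)


if __name__ == "__main__":
    print(decode_value(encode_value(42)))
    print(decode_value(encode_value(1.5)))
```

Once values come back as plain numbers like this, an analytics tool can treat the column as numeric without knowing anything about the storage format.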

Folks

Technology

OpenNMS Components

  • Data collection
  • Newts

Talk