OSRC Research

Data mining and big data management:

Big data is commonly characterised by the “5 Vs”: volume (large amounts of data), variety (highly diverse data), velocity (data that changes very fast), veracity (data that is hard to fully validate) and value. Data mining is an exploratory data-analytic process that detects interesting, novel patterns within one or more (usually large) data sets. It employs a variety of techniques, including machine-learning and standard multivariate statistical techniques.

Aims:

We are living in an era of big data, in which new data is generated every second. This data is increasingly varied and complex in structure (unstructured or semi-structured), and indexing, sorting, searching, analysing and visualising it are major challenges for today’s organisations.

Methods and applications:

Big data is defined by its 5 V characteristics: volume, velocity, veracity, variety and value. Almost every big data model depends on these characteristics. A large body of research has addressed velocity and volume, but a complete and efficient solution for variety is still not available on the market. Analysing huge data sets to gain insights and find patterns is called big data analytics. Every corporate and state-of-the-art organisation needs big data analytics to look ahead and make useful decisions. This is especially true of healthcare systems.

Our research applications will:

  • Detect complex patterns in chronic diseases.
  • Improve the organisation of data so that it can easily be rendered and analysed.

Study of chronic pain patterns to decipher the symptoms and investigate their association with the outcomes

The project is in the design phase.

Contact us if you have knowledge or experience of the subject and wish to get involved.

Linking patient reported outcome measures with the clinical picture, blood investigation and daily charts

The project is in the design phase.

Contact us if you have knowledge or experience of the subject and wish to get involved.

Using smartphones to collect follow-up data after hospital admission

The project is in the design phase.

Contact us if you have knowledge or experience of the subject and wish to get involved.

There is a lot of hype around data mining research, but in open source research collaboration we believe that posing the right question is the most important step in achieving good results.

It is crucial to select a data mining method that fits the business need or the problem statement.

The story below, from the Second World War, shows how the interpretation of data can be more important than the data itself.

Methods to conduct data mining

  • Association
  • Classification
  • Clustering analysis
  • Prediction
  • Sequential patterns or pattern tracking
  • Decision trees
  • Outlier analysis or anomaly analysis
  • Neural networks
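
As an illustration of the first method in the list, association, the short Python sketch below counts how often pairs of items occur together and derives two standard measures: support (the share of records containing both items) and confidence (the share of records containing the first item that also contain the second). The symptom names, the records, the `pair_rules` function and the `min_support` threshold are all made up for illustration; they are not taken from any of our projects.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each set lists the symptoms reported at one visit.
visits = [
    {"pain", "fatigue", "poor_sleep"},
    {"pain", "poor_sleep"},
    {"fatigue", "headache"},
    {"pain", "fatigue", "poor_sleep"},
    {"pain", "headache"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    # How many records contain each single item.
    item_counts = Counter(item for t in transactions for item in t)
    # How many records contain each unordered pair of items.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # skip pairs that occur too rarely
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Strongest rules (highest confidence) first.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(visits)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence only says that two items tend to appear together in the records. Whether that link is meaningful still depends on the question being asked.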

Interpretation of the data is more important than the data itself

During World War II, fighter planes would come back from battle with bullet holes. The Allies mapped the areas that were most commonly hit by enemy fire and sought to strengthen these parts of the planes to reduce the number that were shot down.

A mathematician, Abraham Wald, pointed out that perhaps there was another way to look at the data. Perhaps the reason certain areas of the planes weren’t covered in bullet holes was that planes that were shot in those areas did not return. This insight led to the armour being reinforced on the parts of the plane where there were no bullet holes.

Interpretation of the data is more important than the data itself. Or, more precisely, the reason why certain pieces of data are missing may be more meaningful than the data we have.
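
The effect in the story can be reproduced with a small simulation. The Python sketch below is only an illustration under invented assumptions: the four sections, the loss probabilities, the number of planes and the hits per plane are all hypothetical, not historical figures. Every section is hit equally often, but the share of engine hits seen on returning planes comes out much lower, because planes hit in the engine are less likely to come back.

```python
import random
from collections import Counter

SECTIONS = ["wings", "fuselage", "tail", "engine"]
# Assumed (not historical) probability that a single hit downs the plane.
LOSS_PROBABILITY = {"wings": 0.05, "fuselage": 0.05, "tail": 0.10, "engine": 0.60}


def simulate(num_planes=10_000, hits_per_plane=3, seed=0):
    """Count hits per section for all planes and for returning planes only."""
    rng = random.Random(seed)
    all_hits, returned_hits = Counter(), Counter()
    for _ in range(num_planes):
        # Every section is equally likely to be hit.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() > LOSS_PROBABILITY[s] for s in hits)
        if survived:
            returned_hits.update(hits)
    return all_hits, returned_hits


def shares(counter):
    """Convert hit counts into each section's share of the total."""
    total = sum(counter.values())
    return {s: counter[s] / total for s in SECTIONS}


all_hits, returned_hits = simulate()
all_share, returned_share = shares(all_hits), shares(returned_hits)
for s in SECTIONS:
    print(f"{s:9s} all planes: {all_share[s]:.2f}   "
          f"returning planes: {returned_share[s]:.2f}")
```

An analyst who sees only the returning planes would conclude that engines are rarely hit. The missing planes are the ones that carry the information.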
