Striving for the best: Process mining and machine learning are a team

Process mining and machine learning both seek to obtain a representation of a system in order to gain further insights into it. The difference lies in the type of representation and the kind of insights sought.

What represents “meaning” in data?

Understanding a world implies creating an internal model of that world, based on a set of features representative of what is observed.

“Meaning” is derived from: 

The decision of what features are representative resides 

How to retrieve meaning from data? 

| ML method | In brief | Use cases (examples) | Algorithms (examples) |
|---|---|---|---|
| Supervised classification | Relies on labelled samples of various categories to decide which category new samples belong to. | Prediction of tumors based on MRI image datasets pre-labelled by qualified medical personnel. | Logistic Regression, Decision Trees, Boosting (AdaBoost), Support Vector Machines, etc. |
| Unsupervised classification (clustering) | Clusters the data into as many groups as needed. There are methods to determine the optimal number of groups the data is representative of, but what exactly distinguishes the groups can only be described after a close inspection of the clustering algorithm output. | Separation of normal versus abnormal status quo (good / bad average vitals). | Nearest neighbour, K-means (with various distance metrics), Gaussian Mixture Models (fitted e.g. with Expectation-Maximization), Hierarchical Clustering, etc. |
| Regression | Predicts future behaviour (numeric values) based on knowledge of past behaviour in a similar context. | Forecasting a sleep score (based on quality of exercise and nutrition, alcohol intake, activity duration, etc.); predicting healthcare costs or treatment duration (based on patient age, blood tests, exercise level, primary diagnosis, etc.). | Linear (multiple) regression, ridge regression, kernel regression, ElasticNet, etc. |
| Time Series Analysis | Detects or forecasts short- and long-term changes in the data. | Unusual heart activity in an ECG recording. | ARIMA, Exponential Smoothing, Seasonal Decomposition, Distributed Lags Analysis, Interrupted Time Series, spectral analysis (Fourier, wavelets) |
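To make the unsupervised case above more tangible, here is a minimal sketch of K-means on one-dimensional data, written in plain Python. The function name `kmeans_1d`, the deterministic initialisation and the heart-rate values are assumptions made up for illustration; a real project would more likely rely on an established library.

```python
from statistics import mean

def kmeans_1d(values, k=2, iterations=20):
    """Plain k-means on one-dimensional data (k >= 2); returns (centroids, assignments)."""
    ordered = sorted(values)
    # Deterministic start (an assumption of this sketch): spread the
    # initial centroids evenly across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        assignments = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignments

# Hypothetical average resting heart rates (beats per minute), invented for this example.
resting_heart_rates = [58, 61, 60, 63, 59, 97, 102, 99, 62, 101]
centroids, assignments = kmeans_1d(resting_heart_rates, k=2)
print(centroids)
print(assignments)
```

Note that the algorithm only returns group indices (0 and 1). Deciding which group means "normal" and which "abnormal" still requires looking at the centroids, which is exactly the manual inspection step mentioned above.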

With / without neural networks?

All problems above are addressable with or without neural networks. The differences are that, with neural networks:

  1. There is no need for the user to predefine which features are relevant to decision making in the task at hand: a sufficiently large network can approximate a very broad class of functions, and it learns from the data itself what is relevant and what is not.
  2. More data and a larger network can generally be expected to give better results, whereas the performance of classic algorithms tends to plateau beyond a certain point, irrespective of the amount of data they are fed.

Flavors of neural networks include Convolutional Neural Networks (the typical go-to for image classification tasks), Recurrent Neural Networks (classification and/or prediction of sequences, time series forecasting), and autoencoders (for anomaly detection).

Where does process mining fit in the picture?

Process mining algorithms look to discover flows of activities in the data and to cluster these flows into patterns of repetitive behaviour (sequences of activities). New data can be replayed on top of the discovered models to understand where and how reality deviates from the expected behaviour.
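The discover-then-replay idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not a full process discovery algorithm: the "model" is just a directly-follows graph (a count of which activity immediately follows which), and replay only flags steps the model has never seen. The function names and the event log are hypothetical.

```python
from collections import Counter

def discover_directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def replay(trace, dfg):
    """Return the steps of a trace that never occur in the discovered model."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in dfg]

# Hypothetical event log: one ordered list of activities per patient case.
event_log = [
    ["register", "triage", "consult", "discharge"],
    ["register", "triage", "consult", "lab test", "consult", "discharge"],
    ["register", "triage", "consult", "discharge"],
]
model = discover_directly_follows(event_log)

# A new case that skips triage and is discharged straight after the lab test.
new_case = ["register", "consult", "lab test", "discharge"]
deviations = replay(new_case, model)
print(deviations)
```

Here the replay reports the two steps that the log never contained: going from registration directly to a consultation, and being discharged right after a lab test.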

Process mining and machine learning 

ML (whether classic algorithms or neural networks) can build on top of the output of process mining to gain further understanding of these flows:

Process flow categorization seeks to group the pathways of an entity type through system events and to give them semantics. The layer of meaning (e.g. unusual / disruptive / regular behaviour) is added based on the similarity of the discovered pathways to previously labelled pathways, or based on combined performance indicator values of the clustered pathways. Autoencoders can be used for automatic labelling of the process behavioural patterns resulting from a process discovery algorithm. The expected behaviour of the system as a whole could be forecast based on previous patterns of cyclic behaviour.
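As an illustration of the first option, labelling by similarity to previously labelled pathways, the sketch below assigns a new pathway the label of its most similar known pathway. The similarity measure (the `ratio()` of `difflib.SequenceMatcher` from the Python standard library), the helper names and the example pathways are all assumptions chosen for brevity. Labelling by performance indicators or with autoencoders is not covered here.

```python
from difflib import SequenceMatcher

def similarity(path_a, path_b):
    """Similarity of two activity sequences, between 0.0 and 1.0."""
    return SequenceMatcher(None, path_a, path_b).ratio()

def categorize(pathway, labelled_pathways):
    """Return (label, score) of the most similar previously labelled pathway."""
    best_path, best_label = max(
        labelled_pathways, key=lambda item: similarity(pathway, item[0])
    )
    return best_label, similarity(pathway, best_path)

# Hypothetical pathways that were labelled earlier, e.g. by a domain expert.
labelled_pathways = [
    (["register", "triage", "consult", "discharge"], "regular"),
    (["register", "triage", "triage", "consult", "cancel"], "disruptive"),
]

new_pathway = ["register", "triage", "consult", "lab test", "discharge"]
label, score = categorize(new_pathway, labelled_pathways)
print(label, round(score, 2))
```

A nearest-match rule like this is the simplest possible choice. With many labelled pathways, a classifier trained on features of the pathways would be a natural next step.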