Names: Love Kush Tak, Harshita Agarwal, Rishabh Sharaff and Shantanu Chhaparia
Department: Dept of Civil Engineering
Abstract—This document applies several machine learning models to vibration signals collected from bearings in different health conditions under time-varying rotational speed, and compares the accuracies of the models. The aim is to deploy the most accurate model in real-life settings to determine the health status of machines. This would save industries considerable money and time lost to machine breakdowns, and early fault detection also enables timely repair and maintenance. The paper further discusses feature extraction and selection, data visualization, and the selection of the best hyper-parameters for the models.
Index Terms—healthy, inner race fault, outer race fault, fault detection, accuracy
I. INTRODUCTION
Fault detection in machines using vibration monitoring has been growing due to the easy availability of contact and contactless vibration sensors. Machine breakdown is a problem for users, and for specialized machines it is even harder to obtain fast repairs: such machines require highly qualified technicians, which leads to high repair costs and long downtime. Early fault detection plays a key role here, since it can prevent breakdowns by allowing repair and maintenance to be carried out in time, saving users considerable money and time. Real-life vibration data is a combination of many frequencies, which makes it difficult to distinguish healthy from faulty machines. Machine learning is therefore considered a strong alternative for predicting health status from the parameters obtained by training on the data. Accuracy must be very high in such applications: if a healthy machine is predicted as faulty, users waste time and money on unnecessary inspections by technicians.
II. DATASET EXPLANATION
The bearing vibration data are collected under time-varying rotational speed conditions for different bearing health conditions, and are central to bearing fault detection. The health conditions considered are healthy, inner race fault and outer race fault. The operating rotational speed conditions in the dataset include increasing speed, decreasing speed, increasing then decreasing speed, and decreasing then increasing speed. An accelerometer was used to collect the vibration data and an incremental encoder to collect the rotational data. The data were recorded in Ottawa, Canada, and their source is mentioned here. The original data consisted of only one column, the vibration signal, so features had to be extracted from it before models could be applied. The features identified were 12 different statistical measures such as mean, RMS, standard deviation and kurtosis. The data were therefore divided into one thousand batches of two thousand vibration samples each, from which these statistical measures were computed. The sampling frequency of the dataset was 200 kHz and the total recording time was 10 s, so each data point corresponds to 5 microseconds.
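The batching described above can be sketched as follows. This is an illustrative example, not the authors' code: the variable names are assumptions, and random numbers stand in for the recorded accelerometer signal.

```python
import numpy as np

FS = 200_000       # sampling frequency in Hz (200 kHz, as in the dataset)
DURATION = 10      # total recording time in seconds
BATCH_SIZE = 2000  # vibration samples per batch

# Stand-in for the single-column accelerometer signal (2,000,000 points)
signal = np.random.randn(FS * DURATION)

# Split into 1000 batches of 2000 samples each
batches = signal.reshape(-1, BATCH_SIZE)   # shape: (1000, 2000)

# Example per-batch statistical measures
batch_mean = batches.mean(axis=1)
batch_rms = np.sqrt((batches ** 2).mean(axis=1))

# Each data point spans 1/FS seconds = 5 microseconds
dt_us = 1e6 / FS
```

Reshaping the flat signal into a `(1000, 2000)` array lets all per-batch statistics be computed with vectorized NumPy reductions along axis 1.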
III. LITERATURE REVIEW
Bearing failure is one of the major causes of equipment downtime in any manufacturing industry using rotating equipment. Many of these bearing failures can cause the whole system to perform erroneously, resulting in economic and human losses (Li et al. 2018; Goyal and Pabla 2015). The vibration signature is a widely used method for machine health monitoring, the accelerometer being the most commonly used vibration sensor. After data acquisition, (Goyal 2019) adopted the Discrete Wavelet Transform (DWT) and the Mahalanobis Distance (MD) criterion for signal denoising and for feature selection from time-domain vibration features, respectively. Most time-domain features are statistical features such as the mean, root mean square, standard deviation, kurtosis, skewness and peak-to-peak value. Time-domain and frequency-domain features are usually easy to calculate and effective in capturing the characteristics of the original rotating-machinery signals, and so are commonly used as classification features. However, most vibration signals are non-stationary. The wavelet transform, one of the most useful signal analysis methods, has proved superior to conventional Fourier analysis in handling non-stationary signals (Konar and Chattopadhyay 2011; Zhang et al. 2013). Another method for handling non-stationarity is Empirical Mode Decomposition (EMD), in which the vibration signal of a rotating machine is decomposed into a set of intrinsic mode functions (IMFs). Each IMF may be considered a basis function of the signal, and EMD energy entropy is calculated from the first few IMFs, which contain most of the energy. With these features extracted, classification models are trained for diagnosis. The Mahalanobis distance is a useful statistical measure for evaluating the resemblance of an unknown data set to a known one (Lebaroud and Clerc 2008; Niu et al. 2011; Wu et al. 2013). Hu et al. and Sreejith et al. combined time-domain features with an artificial neural network (ANN) for bearing fault diagnosis. In related work, time-domain and frequency-domain features were combined using information fusion and an ANN model was trained for fault diagnosis. In (2018 Elsevier B.V.), three techniques (statistical analysis, the Fast Fourier Transform (FFT), and Variational Mode Decomposition (VMD)) were adopted to extract multi-domain feature content aimed at fully revealing the intrinsic properties of the raw signal. The Laplacian Score (LS) algorithm was then used to evaluate the classification sensitivity of the extracted features and rearrange the feature space into a low-dimensional feature set. Finally, a particle swarm optimization-based support vector machine (PSO-SVM) classification model was presented to distinguish the fault categories. Cao et al. trained an SVM model with feature extraction using the PCA method. Ali et al. adopted EMD energy entropy to extract a feature vector as the input of an ANN for automatic detection of bearing fault severity.
The Random Forest Classifier has also been widely used by researchers (Bo-Suk Yang, et al.) because of its fast execution and high performance, and because it is an ensemble of decision trees. Increasing the number of trees generally yields better accuracy than a single decision tree. For classification problems it predicts by majority vote.
In (Sanyam Shukla, et al.), it was observed that the change in vibration is larger in faulty machines than in healthy machines, which can be seen by plotting the Root Mean Square (RMS) or standard deviation for healthy and faulty data. Also, to extract statistical features, the data are divided into samples: the more samples, the more feature vectors are available for training and the better the resulting accuracy. However, too many samples increase computational time, so it is necessary to choose an optimal number of samples.
The Convolutional Neural Network (CNN) has widespread application in fault diagnosis and is comparatively easier to train than a Deep Neural Network (DNN). A CNN can extract useful and robust features from the monitored signals. A 1D-CNN can be applied directly to the raw signals, but noise interference components reduce the feature-extraction capability of the CNN and increase its training cost (Ince et al.). Time-frequency analysis methods such as Empirical Mode Decomposition (EMD), the Wavelet Transform (WT), the Hilbert-Huang transform, and the S-transform (ST) have been used for fault detection; these convert the 1D data into 2D representations to which a CNN can be applied [20-24].
IV. PROCEDURE & EXPERIMENT
A. Data Visualization
Vibration data were plotted as scatter plots for all three bearing conditions, i.e., healthy, inner fault and outer fault. The visualizations were examined for the four speed cases: increasing speed, decreasing speed, increasing then decreasing speed, and decreasing then increasing speed. They led to the conclusion that the inner race fault can be easily identified from vibration data under the increasing rotational speed condition.
The amplitude was plotted in the time and frequency domains for the inner fault vs. healthy and outer fault vs. healthy conditions, for each of the four rotational speed conditions. The visualizations clearly showed that the amplitudes differed across health conditions, supporting the expectation that the motor's health can be predicted using machine learning models.
In frequency domain, the plots obtained showed similar conclusions as that in the time domain.
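A minimal sketch of how such time- and frequency-domain amplitude plots can be produced is given below. It is not the authors' plotting code: the two synthetic sinusoids merely stand in for the healthy and inner-fault signals, and the file name is illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for script use
import matplotlib.pyplot as plt

FS = 200_000                      # 200 kHz sampling frequency
t = np.arange(0, 0.1, 1 / FS)     # a 0.1 s window of the signal

# Stand-ins for the recorded signals: a fault adds a high-frequency component
healthy = 0.5 * np.sin(2 * np.pi * 50 * t)
faulty = healthy + 0.8 * np.sin(2 * np.pi * 3000 * t)

fig, (ax_t, ax_f) = plt.subplots(2, 1, figsize=(8, 6))

# Time domain: amplitude vs. time
ax_t.plot(t, faulty, color="blue", label="Inner fault")
ax_t.plot(t, healthy, color="red", label="Healthy")
ax_t.set_xlabel("Time (s)")
ax_t.legend()

# Frequency domain: magnitude spectrum via the real FFT
freqs = np.fft.rfftfreq(t.size, 1 / FS)
ax_f.plot(freqs, np.abs(np.fft.rfft(faulty)), color="blue")
ax_f.plot(freqs, np.abs(np.fft.rfft(healthy)), color="red")
ax_f.set_xlabel("Frequency (Hz)")

fig.savefig("amplitude_plots.png")
```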
B. Feature Extraction
Feature extraction is the process of estimating measures that characterize a signal. To use the raw accelerometer data for classification, we extracted a fairly wide set of statistical features, viz. mean, standard deviation, peak-to-peak amplitude, energy, entropy, crest factor, kurtosis, skewness, margin factor, impulse factor, variance and RMS, as depicted in Table I.
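The twelve features listed above can be computed per batch roughly as follows. This is a sketch using common definitions; the paper's exact formulas (notably the entropy estimate, which here uses a 64-bin histogram) may differ.

```python
import numpy as np
from scipy import stats

def extract_features(batch: np.ndarray) -> dict:
    """Compute twelve statistical features for one batch of vibration data.

    Definitions follow common conventions and are assumptions; Table I of
    the paper may define some of them slightly differently.
    """
    rms = np.sqrt(np.mean(batch ** 2))
    peak = np.max(np.abs(batch))
    abs_mean = np.mean(np.abs(batch))

    # Crude entropy estimate from a normalized histogram (assumption)
    hist, _ = np.histogram(batch, bins=64, density=True)
    hist = hist[hist > 0]
    entropy = -np.sum(hist * np.log(hist))

    return {
        "mean": np.mean(batch),
        "std": np.std(batch),
        "variance": np.var(batch),
        "rms": rms,
        "peak_to_peak": np.ptp(batch),
        "energy": np.sum(batch ** 2),
        "kurtosis": stats.kurtosis(batch),
        "skewness": stats.skew(batch),
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
        "margin_factor": peak / np.mean(np.sqrt(np.abs(batch))) ** 2,
        "entropy": entropy,
    }
```

Applying `extract_features` to each of the 1000 batches yields the feature matrix used as model input.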
Fig. 2. Amplitude in time domain for Increasing Speed of Inner Fault (blue) and Healthy (red)
Fig. 5. Amplitude in time domain for Increasing Speed of Inner Fault (blue) and Healthy (red)
V. APPLIED MODELS
We implemented Support Vector Classification, a Random Forest Classifier and an Artificial Neural Network. We also implemented an ensemble of these three models.
A. Support Vector Machine (SVM)
The basic principle of the SVM is the search for an optimal separating hyperplane such that the classification problem becomes linearly separable. In two dimensions the hyperplane is a line; in an n-dimensional space it has n-1 dimensions and separates the space into two parts. In the SVM algorithm, the points of each class closest to the hyperplane are the support vectors, and the distance between the hyperplane and the support vectors is the margin. The hyperplane for which the margin is maximum is the optimal hyperplane. C and gamma are tuning parameters that are passed when creating the classifier.
The feature matrix was calculated for two different combinations of training and testing data: a) training data extracted in batches from Trial 1 and testing data from Trial 2; b) training data extracted in batches from Trials 2 and 3 and testing data from Trial 1. The input matrix for the SVM, with 9 features, was reduced to 4 features using Principal Component Analysis (PCA). The model was trained for different values of the tuning parameters (C, gamma, kernel). Table III shows the accuracy scores on the testing data for both combinations.
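The PCA-plus-SVM setup with a search over (C, gamma, kernel) can be sketched with scikit-learn as below. The random arrays stand in for the actual feature matrices, and the parameter grid values are assumptions, as the paper does not list the exact values tried.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Stand-in data: 300 batches, 9 features, 3 classes
# (healthy / inner fault / outer fault)
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 9))
y = rng.integers(0, 3, 300)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=4)),   # reduce 9 features to 4 components
    ("svm", SVC()),
])

# Illustrative grid over the tuning parameters mentioned in the text
param_grid = {
    "svm__C": [1, 10, 100],
    "svm__gamma": ["scale", 0.1, 0.01],
    "svm__kernel": ["rbf", "linear"],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
best = search.best_params_
```

Fitting PCA inside the pipeline ensures the projection is learned only from the training folds, avoiding leakage into the test folds during the grid search.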
B. Random Forest Classifier
The Random Forest Classifier can be considered an extended version of decision trees: we train more than one decision tree on the data and then assign to each sample the class with the maximum number of votes.
For training, we used the number of trees, the split criterion, the maximum tree depth, the maximum number of features and the minimum leaf size as hyperparameters. Across different values of these parameters an accuracy of about 98% was achieved, generally varying between 97.5% and 98.1%. For the inner fault 100% accuracy was achieved, while for the healthy and outer fault conditions accuracies of 96.85% and 97.15% were achieved.
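The hyperparameters named above map directly onto scikit-learn's `RandomForestClassifier` arguments, as in the sketch below. The particular values and the random stand-in data are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in feature matrix: 300 batches, 12 statistical features, 3 classes
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 12))
y = rng.integers(0, 3, 300)

rf = RandomForestClassifier(
    n_estimators=200,       # number of trees
    criterion="gini",       # split criterion
    max_depth=10,           # maximum depth of each tree
    max_features="sqrt",    # features considered at each split
    min_samples_leaf=2,     # minimum leaf size
    random_state=0,
)
rf.fit(X, y)
preds = rf.predict(X)       # class with the maximum number of tree votes
```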
C. Artificial Neural Network
An artificial neural network is essentially comprised of node layers, containing an input layer, one or more hidden layers, and an output layer. Each node in a layer is connected to every node of the previous and next layer. Each connection between an input and an output node is assigned a weight which represents its relative importance.
We implemented the artificial neural network using the PyTorch library. It consists of an input layer, three hidden layers and an output layer (Table IV). The activation function used in the hidden layers is ReLU, due to its faster convergence; other activation functions such as tanh and sigmoid gave similar results. The optimizer used is SGD (Stochastic Gradient Descent), and the loss function is CrossEntropyLoss, which combines LogSoftmax and NLLLoss in a single class. From the literature review of ANNs previously applied in this field and from our own experiments, we arrived at the architecture shown in Table IV.
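A PyTorch sketch of such a network is shown below. The hidden-layer widths (64, 32, 16) and the learning rate are assumptions, since Table IV is not reproduced here; only the overall shape (three ReLU hidden layers, SGD, CrossEntropyLoss) follows the text.

```python
import torch
import torch.nn as nn

class BearingANN(nn.Module):
    """Input layer, three ReLU hidden layers, output layer (cf. Table IV).
    Layer widths here are illustrative assumptions."""

    def __init__(self, n_features: int = 12, n_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            # raw logits: CrossEntropyLoss applies LogSoftmax internally
            nn.Linear(16, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = BearingANN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()  # combines LogSoftmax and NLLLoss

# One illustrative training step on stand-in data
x = torch.randn(8, 12)
y = torch.randint(0, 3, (8,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```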
Result: An overall accuracy of 95% (11418/12000) is achieved using the Artificial Neural Network, with the class-wise distribution shown in Table V.
D. Ensemble
An ensemble classifier trains several models and predicts the output as the class chosen by the majority of them, i.e., it predicts the output class by voting. The idea is to create separate models and predict the output based on the combined majority of the individual models' outputs.
We used the majority of votes from the individual models to predict the output, obtaining an accuracy score of 98.09%. Table II shows the confusion matrix of the ensemble for the healthy, inner race fault and outer race fault conditions respectively.
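The hard-voting rule described above can be implemented in a few lines. The example predictions are made up to show the mechanics; function and variable names are illustrative.

```python
import numpy as np
from collections import Counter

def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """Hard-voting ensemble.

    predictions has shape (n_models, n_samples); for each sample, return
    the class predicted by the most models (ties broken by first seen).
    """
    return np.array(
        [Counter(col).most_common(1)[0][0] for col in predictions.T]
    )

# Hypothetical per-model predictions (e.g. SVM, RFC, ANN) for five samples,
# with classes 0 = healthy, 1 = inner race fault, 2 = outer race fault
preds = np.array([
    [0, 1, 2, 1, 0],   # SVM
    [0, 1, 1, 1, 0],   # RFC
    [0, 2, 2, 1, 1],   # ANN
])
ensemble_pred = majority_vote(preds)   # -> [0, 1, 2, 1, 0]
```

With three models, a sample is misclassified by the ensemble only when at least two of the three individual models agree on the wrong class, which is why the ensemble matches the best single model here.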
VI. RESULTS AND CONCLUSION
In the graphs we can see sudden changes in the data points, which could be due to noise (white noise, brown noise, etc.) from the sensor during recording. The noise has not been removed in our models, in order to keep them robust and to save computation time. The frequency- and time-domain plots for increasing then decreasing speed and decreasing then increasing speed look similar to those for decreasing speed. From the time- and frequency-domain graphs of the inner fault we can infer that it is easily recognized, owing either to its high amplitude (as in the increasing-speed case) or to its high variability (in the other three speed categories), whereas the outer fault overlaps with the healthy data, which makes it harder to predict.
The highest accuracy was achieved by the Random Forest Classifier (98%) and the Ensemble (98%), followed by the Artificial Neural Network (94%) and Support Vector Classification (82%). 100% accuracy was achieved for the inner race fault, which is also supported by the plots in the time and frequency domains as well as by features such as the mean and standard deviation of the healthy, inner race defect and outer race defect data.
The RFC could be implemented on real-time data from machines. The prediction could be made on one second of data, which at 200 kHz gives 200,000 data points, i.e., 100 batches. Even if the model achieved only 90% accuracy (less than what we obtained), i.e., 90 batches predicted healthy and 10 unhealthy or vice versa, we could state the health status of the machine with near certainty. We therefore consider one second the minimum period of data required to predict the machine's health status; decreasing the duration of data reduces certainty, and vice versa. Increasing the sensor's sampling frequency would reduce the time the model needs to make a prediction, so the sensor frequency plays a major role in determining prediction speed.
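The one-second decision rule above can be sketched as follows. The majority-vote aggregation over per-batch predictions is an assumed formalization of the "90 times healthy vs. 10 times unhealthy" argument in the text.

```python
FS = 200_000        # sensor sampling frequency in Hz
BATCH_SIZE = 2000   # vibration samples per batch

# One second of data yields 200,000 points, i.e., 100 batches
batches_per_second = FS // BATCH_SIZE

def machine_status(batch_predictions):
    """Declare the machine's status from per-batch predictions over one
    second of data, by simple majority vote (assumed decision rule)."""
    healthy_votes = sum(1 for p in batch_predictions if p == "healthy")
    return "healthy" if healthy_votes > len(batch_predictions) / 2 else "faulty"

# Even at only 90% per-batch accuracy on a healthy machine,
# 90 of the 100 votes agree, so the verdict is unambiguous:
votes = ["healthy"] * 90 + ["faulty"] * 10
status = machine_status(votes)   # -> "healthy"
```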
VII. ACKNOWLEDGMENT
We would like to thank Prof. Amit Sethi (Department of Electrical Engineering, IIT Bombay) for teaching us machine learning concepts, and Prof. Siddharth Tallur (Department of Electrical Engineering, IIT Bombay) and his students Vaibhav Malviya (B.Tech) and Indrani Mukherjee (M.Tech) for guiding us in the project. We would also like to thank "Mendeley Data, ELSEVIER" for providing the data.