

Condition-based monitoring (CbM): method and system design

Anyone familiar with the necessity of equipment maintenance knows how much a machine's sound and vibration reveal about its condition. Appropriate health monitoring based on sound and vibration can cut maintenance costs in half and double service life. Acquiring and analyzing acoustic data in real time is an important condition-based monitoring (CbM) method.
We can learn what a device sounds like when it is running normally; when the sound changes, we know something is wrong, and we can then work out what the problem is and connect that sound to the specific fault. Recognizing that something is abnormal may take only a few minutes of training, but learning to combine sounds, vibrations, and causes into a diagnosis can take a lifetime. Experienced technicians and engineers may have this knowledge, but they are a scarce resource. Identifying a problem from sound alone can be quite difficult, even when using recordings, descriptions, or personal training.
For the past 20 years, therefore, the ADI team has worked to understand how humans interpret sound and vibration. Our goal is to build a system that can learn the sounds and vibrations of a device and decipher their meaning, so that it can detect abnormal behavior and support diagnosis. This article describes the architecture of OtoSense, a device health monitoring system that embodies what we call computer hearing, allowing a computer to understand the main indicators of device behavior: sound and vibration.
The system works with any device and runs in real time without a network connection. It has been used in industrial applications to support scalable, efficient equipment health monitoring.
This article first discusses the principles that guided the development of OtoSense and the role human hearing played in its design. It then discusses how sound and vibration features are designed, how meaning is derived from those features, and how OtoSense continuously learns and improves so that it can perform increasingly complex diagnoses with ever better precision.
Guiding principles
To make OtoSense durable, agnostic (independent of any particular device), and efficient, its design philosophy follows several guiding principles:
Draw inspiration from human neurology. Humans can learn to understand any sound they hear in a very energy-efficient way.
Be able to learn both steady and transient sounds. This requires constantly adjustable features and continuous monitoring.
Recognize events at the terminal, close to the sensor. There should be no need to connect to a remote server over the network to make a decision.
Interact with and learn from the people who know the devices best, while interfering as little as possible with their daily work and keeping the process as pleasant as possible.
The human auditory system and OtoSense interpretation
Hearing is a sense built for survival: it is the overall perception of distant, unseen events, and it matures before birth.
The process by which humans perceive sound can be described in four familiar steps: analog acquisition of the sound, digital conversion, feature extraction, and interpretation. At each step, we compare the human ear with the OtoSense system.
Analog acquisition and digitization. Membranes and levers in the middle ear capture the sound and adjust the impedance to transmit the vibration into a fluid-filled cavity, where another membrane is displaced selectively according to the spectral components present in the signal. This in turn bends elastic cells, which emit digital signals that reflect the degree and strength of the bending. These individual signals are then transmitted to the primary auditory cortex through parallel nerves arranged by frequency.
In OtoSense, this work is done by sensors, amplifiers, and codecs. The digitization process uses a fixed sampling rate, adjustable between 250 Hz and 196 kHz; the waveform is encoded with 16 bits and stored in buffers of 128 to 4096 samples.
Feature extraction happens in the primary cortex: frequency-domain features such as the dominant frequency, harmonics, and spectral shape, and time-domain features such as impulses, intensity variations, and the main frequency components over a time window of roughly 3 seconds.
OtoSense uses a time window, which we call a block, that moves forward in fixed steps. The size and step of this block range from 23 ms to 3 s, depending on the events to be recognized and the sampling rate at which features are extracted at the terminal. The next section explains the features extracted by OtoSense in more detail.
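To make the block-and-step idea concrete, here is a minimal sketch (not OtoSense code) of splitting a sampled waveform into fixed-size blocks that advance in fixed steps; the block and step durations used are illustrative assumptions.

import numpy as np

def make_blocks(waveform, sample_rate_hz, block_seconds=0.046, step_seconds=0.023):
    """Yield overlapping analysis blocks of a sampled waveform."""
    block_len = int(block_seconds * sample_rate_hz)
    step_len = int(step_seconds * sample_rate_hz)
    for start in range(0, len(waveform) - block_len + 1, step_len):
        yield waveform[start:start + block_len]

# Example: one second of 16-bit audio sampled at 44.1 kHz
fs = 44100
signal = np.random.randint(-2**15, 2**15, size=fs, dtype=np.int16)
blocks = list(make_blocks(signal, fs))
print(len(blocks), "blocks of", len(blocks[0]), "samples each")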
Interpretation happens in the associative cortex, which merges all perceptions and memories and gives meaning to sound, for example through language, which itself plays a role in shaping perception. Interpretation organizes our description of events and goes well beyond simply naming them. Naming an item, a sound, or an event allows us to give it a larger, more layered meaning. For experts, names and meanings let them make better sense of their environment.
This is why the interaction between OtoSense and people begins with a visual, unsupervised mapping of sound inspired by human neurology. OtoSense uses a graphical representation of all the sounds or vibrations it hears, arranged by similarity, without trying to create fixed categories. This lets the experts organize and name the groups they see on screen without artificially forcing bounded categories onto them. They can build a semantic map based on their own knowledge, perception, and expectations about the final output of OtoSense. The same soundscape may be split, organized, and labeled differently by an auto mechanic, an aeronautical engineer, or someone who works on cold forging presses, or even by people in the same field who come from different companies. OtoSense gives meaning with the same bottom-up approach used in shaping the meaning of language.
From sound and vibration to features
A feature is a single number assigned over a period of time (the time window, or block, described above) that describes a given attribute or quality of the sound or vibration during that time. The principles for selecting features on the OtoSense platform are as follows:
Taken together, and in both the frequency domain and the time domain, the features should describe the environment as completely and in as much detail as possible. They must describe steady hums as well as clicks, rattles, squeals, and any momentary change.
Features should form a set that is as orthogonal as possible. If one feature is defined as the "average amplitude on the block," there should not be another feature strongly correlated with it, such as the "total spectral energy on the block." Perfect orthogonality may never be achieved, but no feature should be expressible as a combination of the others: each feature must carry a single piece of information.
Features should be cheap to compute. Our brains only know how to add, compare, and reset to zero. Most OtoSense features are designed to be incremental, so that each new sample updates the feature with a simple operation, without recomputing over the complete buffer or, worse, over the whole block (a minimal sketch of this idea follows below). This frugality also means that standard physical units can be ignored: for example, it makes no sense to try to express intensity as a value in dBA inside the feature itself. If a dBA value is needed, it can be computed at output time.
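As a hedged illustration of the incremental idea (not the OtoSense implementation), the running amplitude feature below is updated one sample at a time using only additions and comparisons, and is reset to zero at block boundaries.

class RunningAmplitude:
    """Incrementally updated amplitude feature: add, compare, reset to zero."""
    def __init__(self):
        self.count = 0
        self.sum_abs = 0.0
        self.peak = 0.0

    def update(self, sample):
        self.count += 1
        self.sum_abs += abs(sample)   # addition only
        if abs(sample) > self.peak:   # comparison only
            self.peak = abs(sample)

    def average(self):
        return self.sum_abs / self.count if self.count else 0.0

    def reset(self):
        # "Reset to zero" when a new block starts.
        self.count, self.sum_abs, self.peak = 0, 0.0, 0.0

# Usage: feed samples as they arrive; nothing is recomputed over the buffer.
feature = RunningAmplitude()
for s in (0.10, -0.40, 0.25):
    feature.update(s)
print(feature.average(), feature.peak)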
Of the 2 to 1024 features used by the OtoSense platform, some describe the time domain. They are extracted either directly from the waveform or from the evolution of any other feature over the block. These include the average amplitude and its variation, a complexity measure obtained from the linear length of the waveform, amplitude changes, the presence of impulses and their characteristics, stability expressed as the similarity from one buffer to the next, and slight autocorrelations or variations of the main spectral peaks.
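Below is a minimal, hedged sketch of a few such time-domain descriptors for a single block; the exact definitions, in particular the impulse indicator based on a peak-to-RMS ratio, are assumptions for illustration rather than OtoSense's formulas.

import numpy as np

def time_domain_features(block):
    """Return a few illustrative time-domain descriptors for one block."""
    block = np.asarray(block, dtype=float)
    avg_amplitude = np.mean(np.abs(block))
    amplitude_variation = np.std(np.abs(block))
    # "Complexity" from the linear length of the waveform: total sample-to-sample travel.
    linear_length = np.sum(np.abs(np.diff(block)))
    # Crude impulse indicator (an assumption): ratio of peak amplitude to RMS level.
    rms = np.sqrt(np.mean(block ** 2)) + 1e-12
    impulse_ratio = np.max(np.abs(block)) / rms
    return {
        "avg_amplitude": avg_amplitude,
        "amplitude_variation": amplitude_variation,
        "linear_length": linear_length,
        "impulse_ratio": impulse_ratio,
    }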
The frequency-domain features are extracted from the FFT. The FFT is computed on each buffer, producing between 128 and 2048 individual frequency bins. From these, the process builds a vector with the required number of dimensions, much smaller than the FFT output but still describing the environment in detail. OtoSense initially uses an agnostic approach, creating buckets of equal size on a logarithmic spectrum. Then, depending on the environment and the events to be identified, these buckets are focused on the spectral regions with the highest information density, either from an unsupervised perspective that maximizes entropy, or from a semi-supervised perspective that uses labeled events as a guide. This mimics the cellular structure of our inner ear, where detail is densest wherever speech information is dense.
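The sketch below shows one hedged reading of the agnostic starting point: an FFT per buffer condensed into equally sized buckets on a logarithmic frequency axis. The bucket count, bucket edges, and mean-magnitude aggregation are illustrative assumptions.

import numpy as np

def log_spectral_buckets(buffer, sample_rate_hz, n_buckets=32):
    """Condense an FFT magnitude spectrum into buckets spaced evenly
    on a logarithmic frequency axis (illustrative sketch)."""
    spectrum = np.abs(np.fft.rfft(buffer))
    freqs = np.fft.rfftfreq(len(buffer), d=1.0 / sample_rate_hz)
    # Skip the DC bin; spread bucket edges log-uniformly up to Nyquist.
    edges = np.logspace(np.log10(freqs[1]), np.log10(freqs[-1]), n_buckets + 1)
    buckets = np.zeros(n_buckets)
    for i in range(n_buckets):
        mask = (freqs >= edges[i]) & (freqs < edges[i + 1])
        buckets[i] = spectrum[mask].mean() if mask.any() else 0.0
    return buckets

# Example: a 1 kHz test tone in a 1024-sample buffer
fs = 44100
buffer = np.sin(2 * np.pi * 1000 * np.arange(1024) / fs)
print(log_spectral_buckets(buffer, fs).round(2))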
Architecture: supporting the terminal and local data
OtoSense performs anomaly detection and event recognition at the terminal, without using any remote equipment. This architecture ensures that the system is not affected by network failures, and that there is no need to send all of the raw data blocks out for analysis. A terminal device running OtoSense is a self-contained system that describes the behavior of the device it listens to in real time.
The OtoSense server that runs the AI and the HMI is generally hosted locally. A cloud architecture can make sense for aggregating the meaningful data streams output by multiple OtoSense devices, but cloud hosting makes little sense for an AI dedicated to processing large amounts of data and interacting with hundreds of devices on a single site.

Figure 1. OtoSense system
From features to anomaly detection
The normal/abnormal assessment requires little interaction with the technicians and engineers: they only need to help establish a baseline that defines what the device's normal sounds and vibrations are. This baseline is then converted into an anomaly model on the OtoSense server before being pushed to the device.
Then, we use two different strategies to evaluate whether the incoming sound or vibration is normal:
The first strategy is what we call "normality": for any new sound entering the feature space, we check its surroundings, its distance from the baseline points and clusters, and the size of those clusters. The greater the distance and the smaller the clusters, the more unusual the new sound and the higher its outlier score. When this outlier score exceeds a defined threshold, the corresponding block is flagged as unusual and sent to the server for review.
The second strategy is very simple: any incoming block with a feature value above the maximum or below the minimum defined for that feature by the baseline is flagged as "extreme" and sent to the server.
Together, the unusual and extreme strategies cover abnormal sounds and vibrations well, and they perform well at detecting both progressive wear and sudden, brutal failures. A minimal sketch of both checks follows.
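Here is a hedged sketch of both checks, assuming the baseline is summarized by cluster centroids with radii and by per-feature minima and maxima; the scoring formula and threshold are illustrative assumptions, not the OtoSense model.

import numpy as np

def outlier_score(x, centroids, radii):
    """'Normality' check: distance to the nearest baseline cluster,
    scaled by that cluster's size (illustrative)."""
    d = np.linalg.norm(centroids - x, axis=1)
    i = np.argmin(d)
    return d[i] / (radii[i] + 1e-12)

def is_extreme(x, feature_min, feature_max):
    """'Extremes' check: any feature outside the baseline min/max range."""
    return np.any(x < feature_min) | np.any(x > feature_max)

def assess_block(x, baseline, outlier_threshold=3.0):
    unusual = outlier_score(x, baseline["centroids"], baseline["radii"]) > outlier_threshold
    extreme = is_extreme(x, baseline["min"], baseline["max"])
    return {"unusual": bool(unusual), "extreme": bool(extreme)}

# Example with a toy two-feature baseline
baseline = {
    "centroids": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "radii": np.array([0.5, 0.3]),
    "min": np.array([-1.0, -1.0]),
    "max": np.array([2.0, 2.0]),
}
print(assess_block(np.array([4.0, 4.0]), baseline))  # flagged as unusual and extreme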
From features to event recognition
Features belong to the realm of physics; meaning belongs to human cognition. To link features and meaning, interaction between the OtoSense AI and humans is required. We spent a great deal of time studying customer feedback and developing a human-machine interface (HMI) that lets engineers interact with OtoSense efficiently and design event recognition models. This HMI allows them to explore data, label it, create anomaly models and sound recognition models, and test those models.
The OtoSense Sound Platter (also known as Splatter) allows sounds to be explored and labeled through a complete overview of the dataset. Splatter selects the most interesting and representative sounds from the full dataset and displays them as a 2D similarity map mixing labeled and unlabeled sounds.

Figure 2. 2D splatter sound map in OtoSense Sound Platter.
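The article does not specify how the 2D similarity layout is computed; purely as an illustration, the sketch below projects per-sound feature vectors to two dimensions with PCA, which stands in for whatever unsupervised mapping Splatter actually uses.

import numpy as np

def similarity_map_2d(feature_matrix):
    """Project per-sound feature vectors (n_sounds x n_features) to 2D."""
    X = feature_matrix - feature_matrix.mean(axis=0)   # center the features
    _, _, vt = np.linalg.svd(X, full_matrices=False)   # principal axes via SVD
    return X @ vt[:2].T                                # (n_sounds, 2) plot coordinates

# Example: 200 random 64-dimensional feature vectors
coords = similarity_map_2d(np.random.rand(200, 64))
print(coords.shape)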
Any sound or vibration, together with its context, can be visualized in many different ways, for example with the Sound Widget (also known as the Swidget).
Figure 3. OtoSense sound widget (swidget).

At any time, an anomaly model or an event recognition model can be created. The event recognition model is displayed as a circular confusion matrix that allows OtoSense users to explore confused events.

Figure 4. Event recognition models can be created based on the required events
Anomalies can be inspected and labeled through an interface that displays all unusual and extreme sounds.

Figure 5. Sound analysis over time in the OtoSense anomaly visualization interface.
The continuous learning process: from anomaly detection to increasingly complex diagnoses
OtoSense was designed from the start to learn from many people and, over time, to perform more and more complex diagnoses. The typical process is a loop between OtoSense and the experts who use it:
The anomaly model and the event recognition model both run at the terminal. These models output the probabilities of potential events along with an outlier score.
An abnormal sound or vibration that exceeds the defined threshold triggers an anomaly notification. The technicians and engineers who use OtoSense can review the sound together with its surrounding context.
They then label the abnormal event.
New recognition and anomaly models incorporating this new information are computed and pushed to the terminal devices.
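Purely as an illustration of this loop, the sketch below wires the steps together; every function name here is a hypothetical placeholder supplied by the caller, not an OtoSense API.

def learning_cycle(feature_blocks, score_block, notify_and_get_label,
                   retrain, deploy, threshold=3.0):
    """Run models on incoming blocks, collect expert labels for anomalies,
    retrain, and push updated models back to the terminal."""
    labeled = []                                        # expert-labeled examples
    for block in feature_blocks:
        score = score_block(block)                      # outlier score from current models
        if score > threshold:
            label = notify_and_get_label(block, score)  # engineer reviews the context
            labeled.append((block, label))
            models = retrain(labeled)                   # new anomaly + recognition models
            deploy(models)                              # pushed to the terminal device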
Conclusion
OtoSense technology from ADI is designed to make knowledge of sounds and vibrations continuously available on any device, performing anomaly detection and event recognition without a network connection. In aerospace, automotive, and industrial monitoring applications, this technology is increasingly used for equipment health monitoring, and it has shown good performance in scenarios that once required expert knowledge as well as in embedded applications, especially for complex devices.
