Here's the way to think about using negative entropy to find an optimal solution. Consider autonomous vs. non-autonomous differential equations. One way to think about the distinction is that the transfer function for the non-autonomous case depends only on the present input. Thus, it acts like an op-amp with infinite bandwidth: below saturation it gives perfectly linear amplification.

![](https://pbs.twimg.com/media/EyUEt_2U8AIhkhg.png)

In contrast, for an autonomous formulation, the amplification depends on prior values, so it requires a time-domain convolution or a frequency-domain transfer function.

![](https://pbs.twimg.com/media/EyUGgCeU8AEeg0r.png)

Yet there are many other non-autonomous formulations that aren't linear, for example a companding transfer that takes the square root of the input (used for compressing the dynamic range of a signal).

![](https://pbs.twimg.com/media/EyUHXfVVcAEvEkV.png)

What does this have to do with entropy? Well, that transfer function can get very strange yet still possess underlying order. That order or pattern may be difficult to discern without adequate information. So consider the case where the non-autonomous transfer function itself is something odd, such as an unknown and potentially complex sinusoidal modulation. This occurs in Mach-Zehnder modulation. The effect is to distort the input enough to fold the amplitude at certain points.

![](https://pbs.twimg.com/media/EyUIsZRVIAUcfYY.png)
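To make the distinction concrete, here is a minimal sketch of the three memoryless (non-autonomous) transfers discussed above, applied to an arbitrary two-sinusoid input. The amplitudes, frequencies, and modulation depth are made-up values for illustration, not the ones behind the figures:

```python
import numpy as np

# Hypothetical two-sinusoid input forcing (amplitudes and frequencies are made up)
t = np.linspace(0.0, 10.0, 2000)
x = 1.2 * np.sin(2 * np.pi * 0.7 * t) + 0.8 * np.sin(2 * np.pi * 1.9 * t)

# 1. Perfectly linear amplification: output depends only on the present input
y_linear = 3.0 * x

# 2. Companding transfer: square-root compression of the dynamic range
y_compand = np.sign(x) * np.sqrt(np.abs(x))

# 3. Mach-Zehnder-style sinusoidal modulation: once the drive pushes past a
#    peak of the sine, the output amplitude folds back on itself
y_mz = np.sin(2.5 * x)  # 2.5 is an arbitrary modulation depth
```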

The difficulty is that if we have little knowledge of the input forcing or the modulation, we will not be able to decode anything. But with a measure such as negative Shannon entropy, we can see how far we can get with limited information.

So consider this output waveform, which we are told is due to Mach-Zehnder modulation of an unknown input:

![](https://imagizer.imageshack.com/img922/3888/lLoVr3.png)

All we know is that there may be a basis forcing consisting of a couple of sinusoids, and that there is an obvious non-autonomous complex modulation generating the above waveform.

The idea is that we test out various combinations of sinusoidal parameters and then maximize the negative Shannon entropy of the *power spectrum* of the transfer from input to output (see the citation in the previous post). We can do this by calculating a discrete Fourier transform or an FFT and multiplying by the complex conjugate to get the power spectrum. For perfectly linear amplification, as in the first example, the power spectrum is essentially a delta function at zero frequency, indicating maximum order and a maximum in negative Shannon entropy. And for a single sinusoidal frequency modulation, the power spectrum would be a delta *shifted* to the frequency of the modulation. Again this is a maximally-ordered amplification, and again a maximum in negative Shannon entropy.

Yet, in practical terms, perhaps something such as a Renyi or Tsallis entropy measure would work even better than Shannon entropy. The [Tsallis entropy](https://en.wikipedia.org/wiki/Tsallis_entropy) is actually close to describing a mean-square variance error in a signal, whereby it exaggerates clusters or strong excursions when compared against a constant background.
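As a rough sketch of the entropy measure, here is one way to compute the power spectrum (FFT times its complex conjugate) and the negative Shannon entropy of its normalized distribution. Treating the transfer series as the pointwise ratio of output to trial input is my assumption here; the paper cited in the previous post may define the transfer differently:

```python
import numpy as np

def neg_shannon_entropy(x_trial, y_out, eps=1e-12):
    """Negative Shannon entropy of the power spectrum of the input-to-output transfer."""
    g = y_out / (x_trial + eps)              # candidate transfer series (assumed pointwise ratio)
    G = np.fft.rfft(g)                       # discrete Fourier transform
    power = (G * np.conj(G)).real            # power spectrum: F multiplied by its complex conjugate
    p = power / power.sum()                  # normalize to a probability distribution
    entropy = -np.sum(p * np.log(p + eps))   # Shannon entropy of the spectrum
    return -entropy                          # negative entropy: closer to 0 = more ordered (delta-like)
```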

So this is what I have used, and it works quite well. I essentially maximize the normalized mean-square variance of the power spectrum:

$$\frac{\sum_\omega \left(F(\omega)-\bar{F}\right)^2}{\sum_\omega F(\omega)}$$
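Here $F(\omega)$ is the power spectrum and $\bar{F}$ its mean. A minimal sketch of this metric (the function name `spectral_variance` is just for illustration):

```python
import numpy as np

def spectral_variance(power):
    """Normalized mean-square variance of a power spectrum F(omega):
    sum((F - mean(F))**2) / sum(F).  Peaky, delta-like spectra score high;
    flat, noise-like spectra score low."""
    F = np.asarray(power, dtype=float)
    return np.sum((F - F.mean()) ** 2) / np.sum(F)
```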

The result of a search algorithm over input sinusoidal factors to maximize this power-spectrum variance value is the following power spectrum:

![](https://imagizer.imageshack.com/img924/5228/w54jkW.png)

which stems from this optimal input forcing:

![](https://imagizer.imageshack.com/img923/3659/wE7Gon.png)

Note that this is not the transfer modulation, which we still need to extract from the power spectrum.

As a result, this negative-entropy algorithm is able to deconstruct or decode a Mach-Zehnder modulation of two sinusoidal factors that encodes an input forcing of another pair of sinusoidal factors. So essentially we are able to find 4 unknown factors (or 8 if both amplitude and phase are included) by only searching on 2 factors (or 4 if amplitude and phase are included). But how is that possible? It's actually not a free lunch, because the power spectrum calculation is essentially testing all possible modulations in parallel, and the negative-entropy calculation is keeping track of the frequency components that maximize the delta functions in the spectrum. That is, the mean-square variance weights strong excursions more heavily than a flat, highly random background.
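To show how few knobs the outer search actually turns, here is a schematic of a brute-force random search over just the two trial frequencies, reusing the ratio-based transfer and variance metric sketched above; amplitudes and phases could be added to the trial vector in the same way. This is an illustration of the approach, not the exact search algorithm used for the figures:

```python
import numpy as np

def decode_forcing(t, y_out, n_trials=20000, f_max=5.0, seed=0):
    """Random search over two trial sinusoid frequencies; keeps the pair whose
    input-to-output transfer has the most sharply peaked power spectrum."""
    rng = np.random.default_rng(seed)
    best_score, best_freqs = -np.inf, None
    for _ in range(n_trials):
        f1, f2 = rng.uniform(0.0, f_max, size=2)          # trial frequencies for the basis forcing
        x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)
        g = y_out / (x + 1e-9)                            # assumed transfer series (pointwise ratio)
        power = np.abs(np.fft.rfft(g)) ** 2               # power spectrum of the transfer
        score = np.sum((power - power.mean()) ** 2) / np.sum(power)
        if score > best_score:
            best_score, best_freqs = score, (f1, f2)
    return best_freqs, best_score
```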

From the paper, this is the general idea. For negative entropy we are looking for the upper spectrum, not the lower one, which corresponds to maximum entropy:

![](https://imagizer.imageshack.com/img922/6891/XKauf7.png)


Good luck; this works well for certain applications. It may even work better in a search algorithm than a pure RMS minimization that fits the 4 sinusoidal factors directly against the output, as it may not fall into local minima as easily. Doing the power spectrum helps to immediately broaden the search, I think.