
Consider an empirical "random variable," like the temperature T2000 at the North Pole on Jan 1, 2000.

To what extent can this be truly modeled as a random variable, in the technical sense of probability theory? For that we need to have a sample space S consisting of "experimental outcomes," a sigma-algebra of events (subsets of S), and a probability measure defined on those events; a random variable then has to be a measurable function on S.

So what's the probability space underlying our variable T2000? Would S consist of all "conceivable" histories of the world, and T2000 the function which picks off the temperature at that point in space and time? But this would be a purely fictional construction -- who's to say what's in S and what's not -- and even more artificial would be the assignment of a probability measure to the events in S.

Yet without an underlying probability space, there's no way that we could speak of, say, the variance of T2000.

## Comments

I understand that the formalism works well in many situations -- where you have a model of the sample space to begin with, like a uniform probability distribution on the space of all permutations of a deck of cards.
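To make the well-behaved case concrete, here is a minimal sketch of a fully specified probability space, assuming a toy four-card deck in place of a full one. The names `deck`, `position_of_ace`, `expectation` and `variance` are illustrative only. Once the sample space and measure are written down, the variance of a random variable is a mechanical computation, which is exactly what is missing for T2000.

```python
from fractions import Fraction
from itertools import permutations

# Sample space S: every ordering of a toy 4-card deck (assumed for illustration).
deck = ("A", "K", "Q", "J")
sample_space = list(permutations(deck))

# Probability measure: uniform, so each outcome has mass 1/|S|.
prob = {outcome: Fraction(1, len(sample_space)) for outcome in sample_space}


def position_of_ace(outcome):
    """A random variable: a function from outcomes to numbers."""
    return outcome.index("A") + 1


def expectation(rv):
    """Sum of rv(outcome) weighted by the probability of each outcome."""
    return sum(rv(w) * p for w, p in prob.items())


def variance(rv):
    """Expected squared deviation of rv from its mean."""
    mean = expectation(rv)
    return sum((rv(w) - mean) ** 2 * p for w, p in prob.items())


mean_pos = expectation(position_of_ace)
var_pos = variance(position_of_ace)
```

Exact fractions are used so that no rounding obscures the result.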


Hello David,

The problems with the traditional answers to your questions were:

1. The a priori existence of probability measures: what if there is a space that has no such measure, e.g. the integers and nothing else?
2. A regular n-sequence 101010101010... and a random n-sequence 10101000010001111... had the same probability.

So Kolmogorov solved these problems with his innovation of Kolmogorov complexity, which assumes only integers and computable functions and no a priori probability measures.

I posted the references for the latter in this forum, but Kolmogorov, may he rest in peace, did provide a fantastic answer.

Here is, for example, a common starting point for mathematical finance: consider the returns of a stock as a stochastic process, with variables R1, R2, ...

Then the assumption might be later introduced that each of the returns Ri is normally distributed.

Again, what does it mean to speak of the probability distribution of the return of IBM on a specific day?


It would be clear if someone were to spell out the assumptions of their probability model for the sample space of stock-price processes, but to say without further qualification "let R1, R2, ... be the process of the returns" is to presuppose that the model has already been identified, and this seems to imply that this model has an objective status.


> presuppose that the model has already been identified

You have just stated the trouble Kolmogorov had, which he solved through his complexity theory.

If you have a sequence of bits and want to know how random it is, compress it: the smaller the compressed file, the less random the sequence was.

If the sequence is truly random and quite large, then no compression algorithm can reduce its size by more than some fixed constant (there is a more rigorous statement of this by Kolmogorov).

Note that in all the above there is no need for an a priori probability model (measure).
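A minimal sketch of this compression test, assuming Python's standard `zlib` as the compressor and `os.urandom` as a stand-in for a "truly random" source. The function name `compression_ratio` is made up for illustration. A real compressor only gives an upper bound on Kolmogorov complexity, which is itself uncomputable, so this is a rough proxy rather than a measurement.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means more regular)."""
    return len(zlib.compress(data, 9)) / len(data)


n = 100_000
regular = b"10" * (n // 2)    # the 101010... pattern, as ASCII characters
random_bytes = os.urandom(n)  # bytes from the operating system's entropy source

regular_ratio = compression_ratio(regular)
random_ratio = compression_ratio(random_bytes)
```

The regular sequence shrinks to a tiny fraction of its size, while the random bytes do not shrink at all. No probability measure appears anywhere in the computation.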


Daily temperature variations are low complexity and substantially deterministic. Seasonal temperature variations are the same. Oceanic temperature variations à la ENSO belong to the same class of deterministically forced phenomena. The time series can be encoded as a differential equation with a few periodic forcing functions. The complexity is IMO somewhat higher than that of algorithmically predicting tides.

Just some observations jotted down from my phone.


> Daily temperature variations are low complexity and substantially deterministic.

Yes, they compress well, which means they are not as random as people might think.

If you want to find the random part of a signal, you might denoise it, subtract the trend, and keep the high-frequency part. Even that is still not as random as we would like, since the instruments that record the signal are primitive and record with a certain bias, which erases much of the randomness.

You might be able to find such almost-random signals if you bounce a laser off a particle undergoing Brownian motion in some gas or liquid at micro or nano scales.
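One simple way to sketch the "subtract the trend, keep the high-frequency part" step is a centered moving average. Everything here is an assumption for illustration: the synthetic series (a slow yearly cycle plus Gaussian fluctuations), the 31-point window, and the helper name `moving_average`. A wavelet or other filter could be swapped in.

```python
import math
import random

random.seed(1)
n = 365

# Synthetic series (assumed): slow seasonal cycle plus a fast fluctuation.
signal = [10 * math.sin(2 * math.pi * t / 365) + random.gauss(0, 1) for t in range(n)]


def moving_average(values, window):
    """Centered moving average; the window shrinks near the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


trend = moving_average(signal, window=31)
high_freq = [s - t for s, t in zip(signal, trend)]
```

Most of the variance sits in the trend, and what remains in `high_freq` is the candidate "random part".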

What I find interesting is that the seasonal temperature variations have a significant periodic solar fraction and the "noise" on top of this is at least partly attributable to the quasi-periodic ENSO contribution. So when the total variation is decomposed according to known forcings, the random fraction is reduced. The same can be done for other known forcings, such as due to volcanic eruptions and total solar irradiance variation (aka fluctuating sunspot numbers). And don't forget anthropogenic excess CO2, of course!

It is entirely possible that the amount of noise is actually very small. I have gone through this exercise in the past and I have it documented in what I call the CSALT model. http://contextearth.com/context_salt_model/


Dara wrote:

> So Kolmogorov solved these problems with his innovation of Kolmogorov Complexity which assume only integers and computable functions and no a priori probability measures.

Note that Kolmogorov is the one credited with the introduction of the modern-day formalism for probability spaces. My question was basically about how truly applicable they are to concrete empirical situations.

Now isn't Kolmogorov complexity an entirely different animal, being a measure of the computational complexity required to represent a given string of digits?

Again, for concreteness, let's consider the standard starting point for (a certain branch of) mathematical finance: let S1, S2, ... be the random variables which represent the prices of a stock on day 1, 2, ..., taken as part of a stochastic process. This calls into force the whole machinery of Kolmogorov probability spaces.

What I'm asking for is a justification for why this is a valid mathematical construct for organizing the empirical investigation of the stock price movement. (Clearly there are other approaches which would begin with a structural analysis of the economic relationships involved, but let's leave that aside, and assume that we are taking a phenomenological approach.)

This is the entire premise of, e.g., the Black-Scholes analysis.

Also, how could Kolmogorov complexity, which deals with integers, have anything to say about this starting point for (phenomenological) mathematical finance, which deals with real-valued variables?

Here are some finance applications. I have not read the papers, but they seem to be genuine:

[Economic law of increase of Kolmogorov complexity, Transition from financial crisis 2008 to the zero-order phase transition (social explosion)](http://arxiv.org/pdf/0812.4737v1.pdf)

Using estimates for the complexity of a time series:

[Estimating the Complexity Function of Financial Time Series: An Estimation Based on Predictive Stochastic Complexity](http://www.aiecon.org/staff/shc/pdf/estimating.pdf)

This might have a practical immediate use:

[Algorithmic complexity of financial motions](http://www2.lifl.fr/~delahaye/dnalor/BrandouyDelahayeMaZenilAlgComplFinMotions2012.pdf)

There are many more you could dig up from databases of papers.

> Also, how could Kolmogorov Complexity, which does deal with integers, have anything to say about this starting point for (phenomenological) mathematical finance, which deals with real-valued variables?

In most cases, e.g. finance, you care about up to 3 digits after the decimal point, so you truncate and get a finite representation of the numbers, which is easily transformed to integers (a simple shift of the decimal point to the right by some fixed number of digits).

The same is true of any numerical computation: there is a finite limit to number representations.
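The truncate-and-shift step can be sketched as follows, assuming prices arrive as decimal strings. The helper name `to_scaled_int` and the sample prices are made up for illustration. Python's `decimal` module is used so that the truncation is exact rather than subject to binary floating-point rounding.

```python
from decimal import Decimal, ROUND_DOWN


def to_scaled_int(value: str, digits: int = 3) -> int:
    """Truncate a decimal string to `digits` places and return it as an integer."""
    step = Decimal(1).scaleb(-digits)     # e.g. 0.001 when digits=3
    truncated = Decimal(value).quantize(step, rounding=ROUND_DOWN)
    return int(truncated.scaleb(digits))  # shift the decimal point to the right


# Hypothetical prices, not taken from any real data set.
prices = ["187.2349", "187.9", "186.05551"]
as_integers = [to_scaled_int(p) for p in prices]
```

After this step the series is a list of integers, which is the kind of object that complexity estimates work on.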

Paul wrote:

> It is entirely possible that the amount of noise is actually very small.

What is the AMOUNT OF NOISE?

> What I'm asking for is a justification for why this is a valid mathematical construct for organizing the empirical investigation of the stock price movement.

I don't think there is a justification of the kind you are looking for. One could ask the same type of question about differential equations. We use this or that mathematical machinery because we find it useful, or because it works better than anything else we can think of.

Dara asked:

> What is AMOUNT OF NOISE?

I consider noise to be the fraction of the time series that cannot be attributed to some known forcing or resonance. And when or if that fraction is eventually attributed, it is no longer considered noise.

This is an example of a global temperature CSALT fit of the GISS time series, with a training interval up to 1950. It uses primarily the CO2, SOI, volcanic eruption, TSI, and LOD signals as forcings to project extrapolated values post-1950.

![csalt](http://imageshack.com/a/img534/2/isd.gif)

The difference, or residual, between model and data is the noise. Given how well the two correlate, this may indicate that the model has captured a significant fraction of the climate signal.
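Here is a toy sketch of that definition of noise as the unattributed residual. It is not the CSALT model: the data are synthetic, there is a single made-up periodic forcing, and the fit is ordinary least squares. The names `fit_line` and `noise_fraction` are illustrative.

```python
import math
import random

random.seed(0)

# Synthetic data (assumed, not the CSALT inputs): one known periodic forcing
# plus a small component that the forcing does not explain.
n = 240
forcing = [math.sin(2 * math.pi * t / 12) for t in range(n)]
series = [0.8 * f + random.gauss(0.0, 0.1) for f in forcing]


def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


slope, intercept = fit_line(forcing, series)
residual = [y - (slope * f + intercept) for f, y in zip(forcing, series)]

# Share of the total variance left unattributed: one reading of "amount of noise".
noise_fraction = variance(residual) / variance(series)
```

Adding a further known forcing to the fit can only lower `noise_fraction` or leave it unchanged, which matches the point that attributed variation stops counting as noise.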

> The fraction of the time-series that can not be attributed to some known forcing or resonance I consider noise.

Paul, noise is different from error or deviation. Noise has to be of much higher frequency in order to be clearly discernible from the trend; otherwise it is error or deviation. See the write-up John did on wavelets.

Of course you could define it as you like, but it will get mixed with the trend if it is low frequency.

Graham wrote:

> I don't think there is a justification of the kind you are looking for. One could ask the same type of question about differential equations. We use this or that mathematical machinery because we find it useful, or because it works better than anything else we can think of.

I agree that it's too much to expect a proof of why a model is the right one to use. But there's an important difference between the use of a model based on differential equations and the stochastic model that I described. Differential equations can be part of a well-defined model of reality, in which everything is continuous and smooth. It's an idealization which will match reality, or fail to match it, to whatever extent it does -- but the point here is that it is clearly defined, and we can perform experiments to see how good or bad the model is.

On the other hand, if I only say, let R1, R2, ... be the returns of a stock, viewed as a 'stochastic process,' then here the model itself is grossly underspecified -- what is the sample space, the event structure, and the probability measure? Certain things have to be defined in order for the model to even make sense. For differential equations we do expect the field of scalars and the underlying manifold to be specified.

> On the other hand, if I only say, let R1, R2, ... be the returns of a stock, viewed as a 'stochastic process,' then here the model itself is grossly underspecified -- what is the sample space, the event structure, and the probability measure?

Actually not exactly true, since e.g. I have done 1000s of stock forecasts using e.g. SVR, and it had an exact solution (with some error, of course, similar to differential equations). SVR actually had a closed-form parametrization. You do not even need to have any a priori probability, and you need not assume any stochasticity. Generally speaking, probability measures do not apply to systems of stocks (or else everyone would know how to make money).

As for what you refer to as **grossly underspecified**: is the **specification human-readable** or **human-cognizable**? The answer is, most of the time, no! Because of our very limited education and exposure to actual dynamical systems, we tend to know only simple systems which have simple systems of equations. In reality they are all of similar forms, whether modeled by differential equations or stochastic processes, but our view of them is somehow prejudiced.

Our **pencil and paper** education tells us to simplify a system in order to be able to deal with it, but we no longer need to do that, since we can now use symbolic and numeric software to do huge investigations of dynamical systems without any simplifications! The best example was what Paul did with his symbolic modeling of the differential equations on actual climate data.

> On the other hand, if I only say, let R1, R2, ... be the returns of a stock, viewed as a 'stochastic process,' then here the model itself is grossly underspecified -- what is the sample space, the event structure, and the probability measure? Certain things have to be defined in order for the model to even make sense. For differential equations we do expect the field of scalars and the underlying manifold to be specified.

I'll replace R1, R2, ... with $n$ random variables $X_1, \dots, X_n$. (I want a finite number of them for simplicity, and want $R$ to be the reals.) The sample space is $R^n$, with its Borel sets as the events. Let $X = (X_1, \dots, X_n)$. A stochastic process can provide a density $f_X$ on $R^n$, which tells you the probability that $X$ belongs to a region $A$ of $R^n$ in the usual way:

$$ \Pr(X \in A) = \int_A f_X(x_1, \dots, x_n) \, dx_1 \dots dx_n $$

Typically $f_X$ is not an explicit formula, but is implicitly defined by some process.

A simple example: you could assume that $\log(X_{i+1}) = \log(X_i) + Y_i$, where the $Y_i$ are independent and identically distributed normals, $Y_i \sim N(0.00005, 0.01^2)$.
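The simple example above defines $f_X$ implicitly, and drawing one outcome from it takes only a few lines. This is a minimal sketch using Python's standard `random` module. The starting price `x0=100.0`, the path length of 250, and the function name `simulate_prices` are assumptions for illustration. The drift and volatility are the ones from the example.

```python
import math
import random


def simulate_prices(x0: float, n: int, mu: float = 0.00005, sigma: float = 0.01,
                    seed: int = 0) -> list[float]:
    """Draw X_1..X_n with log(X_{i+1}) = log(X_i) + Y_i, Y_i ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    log_x = math.log(x0)
    prices = [x0]
    for _ in range(n - 1):
        log_x += rng.gauss(mu, sigma)  # one independent normal increment
        prices.append(math.exp(log_x))
    return prices


# One point of the sample space R^n, here with n = 250 (assumed).
path = simulate_prices(x0=100.0, n=250)
log_returns = [math.log(b / a) for a, b in zip(path, path[1:])]
```

Each call with a different `seed` yields a different point of $R^n$. The frequency with which such points land in a region $A$ estimates $\Pr(X \in A)$ under this model.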

Make Graham's f a constant, for a uniform distribution. It then says, e.g. for a coin, that P(TTTTTTTTTTTTTTTTT...) = P(THTTTHTHTTTHHHTHTTTHT...) for two sequences of the same length n, one very regular and one random. This was considered to be an anomaly for probability theory, and Kolmogorov then resolved it by introducing his complexity.

Basically, the shortest program that generates the first sequence, all tails, is quite short: the length of T plus the number of digits in n (log n). However, the shortest program that generates the second, random sequence is quite long, almost as long as n itself.
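A small sketch of the comparison above, assuming a fair coin. The Python expression `'T'*n` stands in, informally, for "the shortest program" that prints the all-tails sequence. The names `fair_coin_probability` and `description_of_all_tails` are illustrative.

```python
from fractions import Fraction

mixed = "THTTTHTHTTTHHHTHTTTHT"  # the irregular prefix quoted above
all_tails = "T" * len(mixed)     # a regular sequence of the same length


def fair_coin_probability(sequence: str) -> Fraction:
    """Probability of one exact sequence of independent fair flips."""
    return Fraction(1, 2) ** len(sequence)


def description_of_all_tails(n: int) -> str:
    """A short expression that evaluates to n tails; its length grows like log n."""
    return f"'T'*{n}"


same_probability = fair_coin_probability(all_tails) == fair_coin_probability(mixed)
description_lengths = {n: len(description_of_all_tails(n)) for n in (10, 1_000, 1_000_000)}
```

The two sequences get identical probability, yet the description of the all-tails sequence grows by only a few characters as n goes from ten to a million. For a typical irregular sequence, no comparably short description is expected to exist.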


Recommended reading is Murray Gell-Mann's The Quark and the Jaguar. I read the half of the book on complexity theory and never got around to the particle stuff, as his treatment of complex processes whetted my appetite sufficiently.


Stanford's coin flipping machine:

[DYNAMICAL BIAS IN THE COIN TOSS](http://statweb.stanford.edu/~susan/papers/headswithJ.pdf)

> Conclusion: The discussion also highlights the true difficulty of carefully studying random phenomena. If we can find this much trouble analyzing a common coin toss, the reader can imagine the difficulty we have with interpreting typical stochastic assumptions in an econometric analysis.

They built a coin flipping machine, and they showed that under strict control they could flip the coin same side up every time. But that does not happen in a **natural flip**, e.g. a flip by hand. There is a bias.

They also found what they called **the pragmatic uncertainty principle**, i.e. attaching any device to the coin, e.g. a ribbon for still photography (to detect some trace of motion), changes the experiment completely!

My conclusion when I read these papers was: randomness is a concept that requires extreme delicacy to study and understand. It is, however, taught from high school into the college years with a degree of carelessness, and this has occluded our vision of dynamical systems in nature.

For example, we could have TTTTTTTTTT... for a coin, but we do not find that in nature. That does not mean TTTTTTTT... is impossible or even improbable to find; it just means **we did not observe it in our daily physical setup**.

Graham, thanks, your comment 19 is helpful.
