I guess $C(x)$ gives the base climatology (pre-industrial or whatever), but then $W(x,t)$ must contain a lot of the climate change signal. I think a minimal assumption for a "climate" term would include some time dependence, such as a separable product $C(x) \theta(t)$, with temperatures treated as anomalies. This separable form is known as "pattern scaling". Maybe that's all built into $W$.
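
To make the separable form concrete, here is a minimal sketch of pattern scaling on synthetic data. Everything in it is an assumption for illustration, not anything taken from the model under discussion: the grid size, the linear warming series `theta`, the "polar amplified" `true_pattern`, and the white-noise weather are all made up. It estimates the spatial pattern by regressing each location's anomaly series on $\theta(t)$ through the origin.

```python
import numpy as np

def fit_pattern(anom, theta):
    """Least-squares fit of anom[x, t] ~ pattern[x] * theta[t] (no intercept)."""
    theta = np.asarray(theta, dtype=float)
    return anom @ theta / (theta @ theta)

rng = np.random.default_rng(0)
n_x, n_t = 5, 150

# Hypothetical global-mean warming series theta(t), in K.
theta = np.linspace(0.0, 1.0, n_t)

# Hypothetical spatial pattern: local warming per unit global warming,
# increasing toward the last grid point (a crude stand-in for polar amplification).
true_pattern = np.linspace(0.8, 2.5, n_x)

# Additive white-noise "weather" (assumed, for illustration only).
weather = 0.3 * rng.standard_normal((n_x, n_t))

# Synthetic anomalies: separable climate term plus weather.
anom = true_pattern[:, None] * theta[None, :] + weather

est_pattern = fit_pattern(anom, theta)
print(np.round(est_pattern, 2))
```

With real data you would replace `anom` with gridded temperature anomalies and `theta` with a smoothed global-mean series; the regression step stays the same.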

It's reasonable to assume that the spatial pattern of the climatology is additive with the mean if you define that to be some baseline climate state, and not too bad to assume that the "weather" is also additive. (If you get a little more sophisticated and propagate additive weather noise through linear dynamics, you'll get an additive term with autocorrelation.) This will break down eventually as you force the system more strongly and nonlinearities become more apparent, but it's probably OK for the instrumental record. But you do need something, such as a pattern scaling approach, that can capture regional climate change effects like polar amplification, which change the spatial climatology (e.g. by decreasing meridional gradients). The model also needs to encode some of the space-time correlations adequately (I don't know what covariance model they're using).
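
As a toy illustration of the parenthetical about linear dynamics, the sketch below passes white noise through the simplest linear system I can think of, an AR(1) recursion $w_t = \phi\, w_{t-1} + \varepsilon_t$. The choice of AR(1), the value $\phi = 0.7$, and the function names are my assumptions, not anything from the model being discussed. The output is still an additive term, but its lag-1 autocorrelation is close to $\phi$ rather than zero.

```python
import numpy as np

def ar1(phi, n, rng, sigma=1.0):
    """Simulate w[t] = phi * w[t-1] + eps[t] with white-noise eps (requires |phi| < 1)."""
    eps = rng.normal(0.0, sigma, n)
    w = np.empty(n)
    # Start from the stationary distribution so there is no spin-up transient.
    w[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        w[t] = phi * w[t - 1] + eps[t]
    return w

def lag1_autocorr(w):
    """Sample lag-1 autocorrelation of a 1-D series."""
    w = w - w.mean()
    return (w[:-1] @ w[1:]) / (w @ w)

rng = np.random.default_rng(1)
white = ar1(0.0, 20000, rng)   # phi = 0: plain white noise
red = ar1(0.7, 20000, rng)     # phi = 0.7: noise filtered by linear dynamics

r_white = lag1_autocorr(white)
r_red = lag1_autocorr(red)
print(f"lag-1 autocorrelation: white={r_white:.3f}, AR(1)={r_red:.3f}")
```

A spatial version would need a covariance across grid points as well, which is the space-time correlation structure mentioned above.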