27 Minimum Contrast Estimation
27.1 Motivation
In the previous chapter, we saw that the likelihood of a Cox process takes the form
\[ L(\theta;X) = \mathbb{E}_\Lambda\left[ \exp\left(-\int_W \Lambda(u)\,du\right) \prod_{i=1}^n \Lambda(x_i) \right]. \]
Although this expression is well-defined, it involves an expectation over an entire random field \(\Lambda(u)\), which is generally difficult to evaluate in practice.
This motivates the search for alternative estimation procedures.
27.2 Replacing the likelihood
Rather than working with the full likelihood, we may instead attempt to match certain summary characteristics of the process.
Recall that a point process can be described through:
- first-order structure (intensity \(\lambda(u)\))
- second-order structure (pair correlation function \(g(r)\) or \(K\)-function)
The key idea behind minimum contrast estimation is:
Instead of maximizing a likelihood, we choose parameters so that a model-based summary function matches its empirical estimate as closely as possible.
27.3 General minimum contrast framework
Let:
- \(T(r;\theta)\) be a theoretical summary function, depending on parameters \(\theta\)
- \(\hat{T}(r)\) be an empirical estimate computed from the observed point pattern
We define a contrast function:
\[ C(\theta) = \int_{r_{\min}}^{r_{\max}} w(r)\, \left[ \hat{T}(r) - T(r;\theta) \right]^2 \,dr, \]
where:
- \(w(r)\) is a weight function
- \([r_{\min}, r_{\max}]\) is a chosen fitting range
The minimum contrast estimator is then
\[ \hat{\theta} = \arg\min_\theta C(\theta). \]
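In computation, the integral defining \(C(\theta)\) is usually replaced by a quadrature rule on a grid of distances. The following is a minimal sketch, assuming the trapezoidal rule and a grid spanning \([r_{\min}, r_{\max}]\); the function name `contrast` and its arguments are hypothetical and do not correspond to any particular library.

```python
def contrast(r, t_hat, t_model, weight=lambda r: 1.0):
    """Approximate C(theta) by the trapezoidal rule on the grid r.

    r       : increasing distances spanning [r_min, r_max]
    t_hat   : empirical summary values T_hat(r) on that grid
    t_model : theoretical values T(r; theta) on that grid
    weight  : weight function w(r); uniform weighting by default
    """
    # Integrand w(r) * [T_hat(r) - T(r; theta)]^2 at each grid point.
    f = [weight(ri) * (th - tm) ** 2 for ri, th, tm in zip(r, t_hat, t_model)]
    # Trapezoidal rule over consecutive grid points.
    return sum(0.5 * (f[i] + f[i + 1]) * (r[i + 1] - r[i])
               for i in range(len(r) - 1))


# Toy values on a three-point grid (illustrative numbers, not real data).
r = [0.1, 0.2, 0.3]
print(contrast(r, [2.0, 1.5, 1.2], [1.8, 1.5, 1.1]))
```

Passing a different `weight`, for example `lambda r: r ** 2`, changes which distances dominate the fit without changing the rest of the computation.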
27.4 Interpretation
This is essentially a functional least squares problem.
- \(\hat{T}(r)\) plays the role of observed data
- \(T(r;\theta)\) is the model prediction
- we choose \(\theta\) to make the two curves as close as possible
Minimum contrast estimation replaces the likelihood with a distance between functions.
27.5 Choice of summary function
The most common choices are:
- the \(K\)-function
- the pair correlation function (PCF)
Both describe second-order structure, but differ in form:
- \(K(r)\) accumulates interaction over distances \(\le r\)
- \(g(r)\) describes interaction at distance exactly \(r\)
In this work, we focus primarily on the pair correlation function, since it provides a more direct description of clustering at specific scales.
27.6 PCF-based minimum contrast
Let:
- \(\hat{g}(r)\) be the estimated PCF
- \(g(r;\theta)\) be the theoretical PCF under the model
Then the contrast function becomes:
\[ C(\theta) = \int_{r_{\min}}^{r_{\max}} w(r)\, \left[ \hat{g}(r) - g(r;\theta) \right]^2 \,dr. \]
27.7 Log-transformed contrast
In practice, it is often advantageous to compare transformed versions of the two functions.
A common choice is the logarithm, which requires \(\hat{g}(r) > 0\) throughout the fitting range:
\[ C(\theta) = \int_{r_{\min}}^{r_{\max}} w(r)\, \left[ \log \hat{g}(r) - \log g(r;\theta) \right]^2 \,dr. \]
27.7.1 Why log-transform?
For a clustered process, the PCF typically satisfies:
- \(g(r) \to 1\) as \(r \to \infty\)
- large, rapidly changing values near \(r=0\)
The log transform:
- stabilizes variance
- reduces the influence of large values
- emphasizes relative differences
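The effect on relative differences can be checked with a small numerical comparison. The sketch below uses made-up PCF values (not estimates from data) in which the empirical value overshoots the model by 10% at two different scales; the helper names are hypothetical.

```python
import math


def sq_diff(g_hat, g_model):
    # Squared discrepancy on the original scale.
    return (g_hat - g_model) ** 2


def sq_log_diff(g_hat, g_model):
    # Squared discrepancy on the log scale (requires positive values).
    return (math.log(g_hat) - math.log(g_model)) ** 2


# The same 10% overshoot where g is large and where g is close to 1.
pairs = [(11.0, 10.0), (1.21, 1.1)]
for g_hat, g_model in pairs:
    print(g_hat, g_model, sq_diff(g_hat, g_model), sq_log_diff(g_hat, g_model))
```

On the original scale the discrepancy at the large PCF value dominates, while on the log scale both points contribute equally, because the log difference depends only on the ratio \(\hat{g}(r)/g(r;\theta)\).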
27.8 Weight functions
The choice of weight function \(w(r)\) can have a significant impact on the resulting estimates.
Common choices include:
- \(w(r) = 1\) (uniform weighting)
- \(w(r) = r^p\) for some power \(p\)
- weights based on variability of \(\hat{g}(r)\)
In practice, weighting is often used to:
- downweight noisy regions (typically large \(r\))
- avoid instability near \(r=0\)
27.9 Choice of fitting range
The interval \([r_{\min}, r_{\max}]\) determines which distances inform the fit, so it must be chosen with care.
27.9.1 Small distances
Near \(r=0\):
- \(\hat{g}(r)\) is highly variable
- kernel estimates of the PCF are most affected by smoothing bias
So we typically take \(r_{\min} > 0\).
27.9.2 Large distances
At large \(r\):
- \(g(r) \approx 1\)
- estimates become noisy
- little information about clustering remains
So we choose a moderate \(r_{\max}\).
The choice of fitting range can strongly influence parameter estimates.
27.10 Connection to Cox processes
For many Cox process models, the PCF is available in closed form.
For example:
27.10.1 LGCP
\[ g(r) = \exp(C(r)), \]
where \(C(r)\) is the covariance function of the Gaussian field (not to be confused with the contrast function \(C(\theta)\)).
27.10.2 CSCP (single-component, mean-zero)
\[ g(r) = 1 + A \rho(r)^2, \]
where:
- \(\rho(r)\) is the correlation function
- \(A\) controls clustering strength
This makes minimum contrast particularly attractive:
- the theoretical PCF is simple
- the empirical PCF is readily estimated
- no integration over random fields is required
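The two closed forms above are cheap to evaluate. The sketch below assumes an exponential correlation \(\rho(r) = \exp(-r/\phi)\) and, for the LGCP, a covariance \(C(r) = \sigma^2 \rho(r)\); these parametric choices and all function names are illustrative assumptions, not part of the models' definitions.

```python
import math


def exp_correlation(r, phi):
    """Assumed example correlation: rho(r) = exp(-r / phi)."""
    return math.exp(-r / phi)


def pcf_lgcp(r, sigma2, phi):
    """LGCP: g(r) = exp(C(r)), here with C(r) = sigma2 * rho(r)."""
    return math.exp(sigma2 * exp_correlation(r, phi))


def pcf_cscp(r, A, phi):
    """CSCP (single-component, mean-zero): g(r) = 1 + A * rho(r)^2."""
    return 1.0 + A * exp_correlation(r, phi) ** 2


# Evaluate both PCFs on a few distances (parameter values are arbitrary).
for r in (0.01, 0.05, 0.2):
    print(r, pcf_lgcp(r, sigma2=1.0, phi=0.05), pcf_cscp(r, A=2.0, phi=0.05))
```

Both functions decay to 1 as \(r\) grows, in line with the typical behaviour of the PCF for clustered processes.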
27.11 What information is being used?
Minimum contrast estimation based on the PCF uses only second-order information.
That is:
- it depends on the model only through the second-order product density \(\lambda^{(2)}(u,v)\), normalized by the intensity
- it does not use higher-order structure
- it does not use the full likelihood
Minimum contrast estimators based on the PCF cannot distinguish between models that share the same second-order structure.
This point will be central to our later analysis.
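As an illustration of how this can happen, equate the LGCP and CSCP forms of the PCF from Section 27.10:
\[ \exp\left(C(r)\right) = 1 + A\,\rho(r)^2 \quad\Longleftrightarrow\quad C(r) = \log\left(1 + A\,\rho(r)^2\right). \]
Whenever the right-hand side is a valid covariance function (an assumption that has to be checked for the correlation model at hand), an LGCP with that covariance and a CSCP with parameters \(A\) and \(\rho\) have identical PCFs. The PCF-based contrast then takes the same value under both models for every observed pattern, so it cannot favour one over the other.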
27.12 Practical implementation
In practice, the procedure consists of four steps:
- Estimate \(\hat{g}(r)\) from the observed point pattern
- Specify a theoretical form \(g(r;\theta)\)
- Choose \(w(r)\) and \([r_{\min}, r_{\max}]\)
- Minimize \(C(\theta)\) numerically
In R, this is implemented via the mincontrast function of the spatstat package.
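The four steps can be sketched end to end as follows. This is not the spatstat implementation: it is a minimal standard-library illustration that assumes an LGCP with exponential covariance \(C(r) = \sigma^2 \exp(-r/\phi)\), uniform weights, the log-transformed contrast, and a crude grid search in place of a numerical optimizer. The "empirical" PCF is a synthetic stand-in (the model PCF with a small deterministic perturbation), not an estimate from a real point pattern.

```python
import math


def pcf_lgcp(r, sigma2, phi):
    # Step 2: theoretical PCF g(r; theta) = exp(sigma2 * exp(-r / phi)).
    return math.exp(sigma2 * math.exp(-r / phi))


def log_contrast(theta, r, g_hat):
    # Trapezoidal approximation of the log-transformed contrast, w(r) = 1.
    sigma2, phi = theta
    f = [(math.log(gh) - math.log(pcf_lgcp(ri, sigma2, phi))) ** 2
         for ri, gh in zip(r, g_hat)]
    return sum(0.5 * (f[i] + f[i + 1]) * (r[i + 1] - r[i])
               for i in range(len(r) - 1))


# Step 3: fitting range [r_min, r_max] = [0.01, 0.20] on a regular grid.
r = [0.01 + 0.005 * i for i in range(39)]

# Step 1 (stand-in): synthetic "empirical" PCF, model values perturbed by 1%.
true_theta = (1.0, 0.05)
g_hat = [pcf_lgcp(ri, *true_theta) * (1 + 0.01 * math.sin(40 * ri)) for ri in r]

# Step 4: minimize the contrast over a coarse parameter grid.
sigma2_grid = [0.5 + 0.05 * i for i in range(21)]  # 0.5 to 1.5
phi_grid = [0.02 + 0.005 * j for j in range(17)]   # 0.02 to 0.10
theta_hat = min(((s, p) for s in sigma2_grid for p in phi_grid),
                key=lambda th: log_contrast(th, r, g_hat))
print(theta_hat)
```

A grid search keeps the example dependency-free; in real use a general-purpose optimizer started from several initial values would normally replace it, and \(\hat{g}(r)\) would come from an edge-corrected PCF estimator applied to the data.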
27.13 Summary
Minimum contrast estimation provides a practical alternative to likelihood-based inference by replacing the likelihood with a distance between summary functions.
For Cox processes, it is particularly convenient because:
- theoretical PCFs are often available
- empirical PCFs are easy to compute
- no integration over latent fields is required
However, this convenience comes at a cost:
- only second-order information is used
- different models may become indistinguishable under this framework
This limitation will play an important role in the comparison between LGCP and CSCP models.