I seem to keep confusing the partition function with the evidence. When they appear together, I can tell them apart. But when they appear in different contexts, I get confused, as if falling for an optical illusion.

Given data $latex x_D$ and parameter $latex \theta$, the evidence is simply $latex p(x_D;\theta)$.

And $latex p(x;\theta) = \frac{\hat{p}(x;\theta)}{\int \hat{p}(x;\theta) dx}$, where the denominator $latex Z(\theta)\triangleq\int \hat{p}(x;\theta) dx$ is the partition function.
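
As a small worked example (my own choice of model, not tied to anything above): assume the unnormalized density is $latex \hat{p}(x;\theta)=e^{-\theta x}$ for $latex x\ge 0$ and $latex \theta>0$. Then $latex Z(\theta)=\int_0^\infty e^{-\theta x} dx=\frac{1}{\theta}$, so $latex p(x;\theta)=\theta e^{-\theta x}$. Note that $latex Z(\theta)$ integrates over $latex x$ and remains a function of $latex \theta$.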

How could I get confused? I guess it is because when we consider the MAP estimate of $latex \theta$, we have

$latex p(\theta|x_D)=\frac{p(x_D|\theta)p(\theta)}{\int p(x_D|\theta)p(\theta) d\theta}$, and the denominator is $latex p(x_D)$. Indeed, the two do look a bit similar. But computing $latex p(x_D)$ is not necessary for this kind of inference, contrary to what I confusedly thought earlier, because it does not depend on $latex \theta$. It is needed for model class selection, though. That is, does the current class of parametrized models fit the data well?
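
To convince myself that the MAP estimate does not need $latex p(x_D)$, here is a minimal sketch on a grid. Everything in it is an assumption made for illustration: a Bernoulli (coin flip) likelihood, a Beta(2, 2) prior, and made-up counts of 7 heads and 3 tails. Dividing by the evidence rescales the posterior but should not move its maximizer.

```python
# Grid over theta in (0, 1): 0.001, 0.002, ..., 0.999.
n_grid = 999
d_theta = 1.0 / (n_grid + 1)
thetas = [(i + 1) * d_theta for i in range(n_grid)]

# Hypothetical data x_D: 7 heads and 3 tails.
heads, tails = 7, 3


def likelihood(theta):
    """p(x_D | theta) for a Bernoulli model (assumed for illustration)."""
    return theta ** heads * (1 - theta) ** tails


def prior(theta):
    """Beta(2, 2) density, p(theta) = 6 * theta * (1 - theta) (assumed)."""
    return 6 * theta * (1 - theta)


# Numerator of Bayes' rule: p(x_D | theta) * p(theta).
joint = [likelihood(t) * prior(t) for t in thetas]

# Evidence p(x_D), approximated by a Riemann sum over the grid.
evidence = sum(joint) * d_theta

# Properly normalized posterior p(theta | x_D) on the grid.
posterior = [j / evidence for j in joint]

# The maximizer is the same with or without dividing by the evidence.
map_unnormalized = thetas[max(range(n_grid), key=lambda i: joint[i])]
map_normalized = thetas[max(range(n_grid), key=lambda i: posterior[i])]

print(map_unnormalized, map_normalized, evidence)
```

Under these assumptions the posterior is a Beta(9, 5), whose mode is 2/3, so both estimates should land on the grid point closest to that value. The evidence only becomes interesting when comparing this model class against another one.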

On the other hand, we need to know the partition function even just to select a model (a matching parameter) from the current model class, since maximizing $latex p(x_D;\theta)$ over $latex \theta$ already involves $latex Z(\theta)$.
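
A minimal sketch of why $latex Z(\theta)$ cannot be dropped here, again under assumptions of my own: the unnormalized model is $latex \hat{p}(x;\theta)=e^{-\theta x}$ on $latex x\ge 0$, for which $latex Z(\theta)=1/\theta$ in closed form, the data points are made up and treated as i.i.d., and the search is a plain grid search rather than any particular optimizer.

```python
import math

# Hypothetical i.i.d. data x_D (mean 1.25).
x_D = [0.5, 1.0, 1.5, 2.0]


def log_p_hat(x, theta):
    """Log of the unnormalized density exp(-theta * x), x >= 0 (assumed model)."""
    return -theta * x


def log_Z(theta):
    """Log partition function: Z(theta) = 1 / theta for this assumed model."""
    return -math.log(theta)


def unnormalized_ll(theta):
    """Sum of log p_hat over the data; ignores Z(theta)."""
    return sum(log_p_hat(x, theta) for x in x_D)


def normalized_ll(theta):
    """True log-likelihood: each data point contributes a -log Z(theta) term."""
    return unnormalized_ll(theta) - len(x_D) * log_Z(theta)


# Candidate parameters: 0.01, 0.02, ..., 5.00.
thetas = [0.01 * k for k in range(1, 501)]

# Ignoring Z(theta) pushes theta toward the smallest value on the grid.
theta_without_Z = max(thetas, key=unnormalized_ll)

# Including Z(theta) recovers the usual estimate n / sum(x) = 0.8.
theta_ml = max(thetas, key=normalized_ll)

print(theta_without_Z, theta_ml)
```

Without the $latex -\log Z(\theta)$ term the objective just rewards making $latex \hat{p}$ large everywhere, so the maximizer runs off to the edge of the grid. In this toy case $latex Z(\theta)$ is available analytically; for models where it is not, this is exactly where the difficulty sits.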