I ran a quick test, just to be sure ;-)
My data is well within the typical spec - see below.
What you may see in your configuration can be caused by temperature change/drift over time.
Also take into account that this kind of measurement should be done without JTAG connected.
Also take into account that directly after flashing a new program, the part needs a short time to return
to normal die temperature before you can do this type of qualification.
Can you provide a graph for e.g. 100 samples?
This may show the details.
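If it helps with preparing such a graph, here is a minimal Python sketch of one way to compute the sample-to-sample difference and the peak-to-peak spread from a capture. The function names and the example codes are made up for illustration and are not measured data; the result is in raw ADC codes, so converting to volts would still need your reference voltage and gain.

```python
def sample_to_sample_diff(samples):
    """Return the difference between each sample and the one before it."""
    return [b - a for a, b in zip(samples, samples[1:])]


def peak_to_peak(values):
    """Return the spread between the largest and smallest value."""
    return max(values) - min(values)


# Hypothetical raw ADC codes (illustration only, not real measurements).
codes = [8388610, 8388612, 8388609, 8388611, 8388608]

diffs = sample_to_sample_diff(codes)
print(diffs)                # [2, -3, 2, -3]
print(peak_to_peak(codes))  # 4
```

Plotting the list of differences over the sample index makes slow temperature drift easy to tell apart from random noise: drift shows up as a trend in the raw samples, while the sample-to-sample differences stay centered around zero.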
Below is a graph for 100 samples, for which I calculated the sample-to-sample difference (peak-to-peak). It shows that with your configuration, even when using an external reference and the RTD (not a short), the noise is within the typical spec.