(Poster: Femr2)

A quick repost of the basic pure sub-pixel tracing example...


I generated the following animation of a simple circle moving in a circle itself...




I then downscaled it until the circle is a small blob...




If we look at a blow-up of the shrunken image, such that we can see the individual pixels, it looks like this...





Sub-pixel feature tracking essentially works by enlarging the area around the feature you want to track, and applying an interpolation filter to it. Lots of different filters can be used with varying results.

Applying a Lanczos3 filter to the previous GIF, to smooth the colour information between each pixel, results in the following...




I think you will see that there will be no problem for a computer to locate the centre of the circle quite accurately in that smoothed GIF, even though the circle in the original tiny image was simply a random-looking collection of pixels. This process of upscaling and filtering arguably generates more accurate results than simply looking at inter-pixel intensities.

The position determined is therefore clearly sub-pixel when translated back into the units of the original tiny source.
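For anyone who wants to play with the idea, here's a minimal sketch of the upscale-filter-locate step in Python (Pillow + NumPy). It is not the tool chain I actually use for the traces, and the intensity-weighted centroid is just a stand-in for proper pattern matching (it assumes a dark blob on a light background), but it shows how an *8 Lanczos upscale exposes sub-pixel position information...

[code]
# Minimal sketch of the upscale-and-locate idea (Pillow + NumPy).
# NOT the exact filter chain used for my traces; the centroid step is a
# stand-in for pattern matching and assumes a dark blob on a light background.
import numpy as np
from PIL import Image

SCALE = 8  # upscale factor, as in the examples above

def blob_centre(tiny_frame):
    """Return the blob centre in original-pixel units (sub-pixel)."""
    big = tiny_frame.resize(
        (tiny_frame.width * SCALE, tiny_frame.height * SCALE),
        resample=Image.LANCZOS,          # Lanczos interpolation filter
    )
    a = np.asarray(big.convert("L"), dtype=float)
    a = a.max() - a                      # invert: dark blob becomes bright
    ys, xs = np.indices(a.shape)
    cy = (ys * a).sum() / a.sum()
    cx = (xs * a).sum() / a.sum()
    return cx / SCALE, cy / SCALE        # back into source-pixel units
[/code]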

It is a side effect of aliasing that small movements of the object cause slight variation in inter-pixel intensity, saturation and colour.

Tracing the position of the small blob on the tiny version results in the following...




The raw data is here...
http://femr2.ucoz.com/SubPixelTracing.xls


The graph shows accurate (not perfect) sub-pixel location data for the position of the small blob.

I could go into more detail, but hope that clarifies.

Here's another test using exactly the same small-blob rescale, but extended so that it takes 1000 frames to complete the circular movement. This makes the movement between frames much, much smaller, and will give you an idea of how accurate the method can be...



(click to zoom)


Would you believe it eh.

Here's the first few samples...


0 0
-0.01 0
-0.024 -0.001
-0.039 -0.001
-0.057 -0.001
-0.08 -0.002
-0.106 -0.002
-0.136 -0.002
-0.167 -0.002
-0.194 -0.004
-0.214 -0.005
-0.234 -0.005
-0.251 -0.007
-0.269 -0.008
-0.289 -0.009
-0.31 -0.009
-0.337 -0.012
-0.365 -0.014
-0.402 -0.015
-0.431 -0.018
-0.455 -0.019
-0.48 -0.02


For this example, I'll quite confidently state that the 3rd decimal place is required, as accuracy under 0.01 pixels is clear. There are other sources of distortion, such as the little wobbles in the trace, which are caused by side-effects of the smoothing and upscaling when pixels cross certain boundaries. This reduces the *effective* accuracy. It can be quantified by graphing the difference between the *perfect* path and the traced location, but I'm not sure how much it matters.

Now, obviously this level of accuracy does not directly apply to the video traces, as they contain all manner of other noise and distortion sources.




Get SynthEyes (or equivalent).
Get the video.
Unfold it.
Trace the NW corner in both frames (fields if you will).
Export the traces.
Open in Excel.
Save.
Upload.



Okay, new blob test variance results...



Quite interesting. It shows the error across pixel boundaries (which makes sense), and the *drift* given circular movement (which is slightly surprising, but about a third of the pixel boundary scale, so may also make sense). It also shows variation in the oscillating frequency dependent upon rotation angle (which makes sense), and flattened, non-oscillating regions at 180 degree intervals (which again makes sense).

A useful image, and should assist in defining trace accuracy considerably.

Will look at the same thing with a square object, and then again with linear movement, rather than circular.



Behaviour for square and linear movement is very similar to that of circular movement.

So, from simple observation of the variance graph, I would suggest...

a) The highest accuracy is attained when movement is parallel to the axis being traced.
b) The highest accuracy is maintained when on-axis movement is less than 1/4 of the perpendicular-to-axis movement (a gradient of < 1/4). Within this margin, the example stays within +/- 0.01 pixel accuracy.
c) The highest *drift* is attained when movement is at maximum velocity.
d) *drift* is recovered when velocity reduces.
e) On such small regions (49 pixels) inter-pixel transitions can result in oscillating positional error of up to 0.06 pixels. It is expected that this will reduce as region size increases (and will be tested)
f) Pixel transition error oscillation period is obviously related to movement velocity.
g) Error does not appear to favour an axis.
h) For pure on-axis movement, for a 7*7 region, minimum positional error lies within +/- 0.005 pixels.
i) Interestin' :)




Thought it would be useful to test a trace of a box corner in perfect freefall...

It's near enough the same scale as the Dan Rather drop distance (both fields). The sample rate is 59.94 fps in relative terms.




The x position of the test does not change at all, so the variance is purely SE *noise*...




Previous observations seem to hold true...

a) Vertical variance of around 0.06 pixels.
b) Variance narrows around frame 350, which I imagine is due to the velocity, and so inter-pixel state harmonics.
c) Vertical variance oscillation rate increases as velocity increases, as expected.
d) Drift shifts *downwards* as velocity increases. Will have to double-check if that is a leading or trailing shift, as it will either increase or decrease apparent velocity.
e) Horizontal variance is pure *noise*. +/- 0.001 pixels. No obvious pattern.





For the purpose of tracing...

Interlaced video MUST be deinterlaced.

Deinterlacing CANNOT use blend, weave or any other technique which attempts to merge data from the separated fields together.

Restoring aspect ratio by line doubling CANNOT be performed.


a) Vertical variance of around 0.06 pixels.
This seems to be fairly constant in numerous test conditions.

I'll hazard a guess at a reason...

It's clearly linked to pixel transition, as can be seen from its oscillatory nature and the fact that its frequency increases with velocity.

Assuming a base hpix (half-pixel) start point, it is also of note that *8 upscaling is applied to the trace region.

0.5/8 = 0.0625

It's conjecture, but it does seem possible that the blending between two pixels can cause cyclic lagging which would be related to the upscale multiplier.

It's not a baseless assertion, and it's useful to see how upscaled pixel transitions actually look before dismissing the suggestion. Here's an animated GIF to (hopefully) illustrate the point.







Here's a look at the data with multiple graphs based on sample interval.




The image alternates between fields.

Each image includes three graphs, each with a 3-sample interval.

It shows the clear difference between field traces, and as each interval graph is very consistent, it shows [I]why[/I] I trace each field separately.

The vertical shift between fields is as expected, roughly half a pixel.

It also shows a few jumps on one of the fields (which are not the fault of SynthEyes, but of the video quality).

I'll sort out the velocity and acceleration derived views later.

Will be wanting to move over to the better quality Cam#3 footage pretty soon, but happy to respond to technical queries beforehand.

DeJitter processing has been deliberately very simple to date, to keep the released spreadsheet data simple, but I may begin using more advanced techniques. No complaints about more complex spreadsheet data once I do though please ;)





a) position error, in pixels

Dan Rather footage - +/- 0.2 pixels

b) scaling metric error, in ft/pixel (footage dependent)

+/- 1 pixel

For WTC 7 there is limited building measurement data available, so with the caveat of accepting the scant NIST-provided values...

Vertical scaling metric 3.41 to 3.47 ft/pixel
Horizontal scaling metric 1.64 to 1.66 ft/pixel

Note that these are global metrics over the full distance, and do not affect the positional error metric.

Scaling metrics for the Cam#3 footage are of higher accuracy, as the footage is of higher quality and resolution (which is why I stated to tfk at the beginning of our discussion that I'd prefer to use that footage).

c) velocity error, in ft/s (footage dependent)

There has been no agreement of noise reduction or smoothing process. Until there is *some* agreement, it's too early to state.

d) acceleration error, in ft/s^2 (footage dependent)

There has been no agreement of noise reduction or smoothing process. Until there is *some* agreement, it's too early to state.







There's zero impact from increasing the sampling rate at all.

What does change is the way that data must be treated.


We've previously discussed the process of static point extraction, which is the tracing of features on separate buildings and the subtraction of that data from moving-feature traces, to eliminate whole-frame factors such as deinterlace jitter and camera shake.
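In case it helps, here's a tiny sketch of what I mean by that subtraction (array names are illustrative only, not my actual spreadsheet layout)...

[code]
import numpy as np

def remove_frame_motion(moving_xy, static_xy):
    """Subtract whole-frame motion (camera shake, deinterlace jitter)
    from a moving-feature trace, using a static-feature trace.

    moving_xy, static_xy: (N, 2) arrays of per-frame pixel positions.
    """
    moving_xy = np.asarray(moving_xy, dtype=float)
    static_xy = np.asarray(static_xy, dtype=float)
    # Reference the static trace to its own start point so the moving
    # trace keeps its absolute starting position.
    frame_motion = static_xy - static_xy[0]
    return moving_xy - frame_motion
[/code]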


Haven't included scaling in the cam#3 data yet. It's all in pixels. Will be sorting per-feature scaling metrics over the weekend hopefully.

Wouldn't be able to use the same scaling for features closer or further from the camera though. Would have to make some attempt at accounting for frame position and relative camera orientation.


It is an absolute MUST to take account of deinterlace jitter when using the data. We've been through that before, but can do so again. The alternate-frame *bouncing* is not a problem with the tracing technique; it's a direct effect of deinterlacing. Not a problem, it just has to be handled properly.


If you're looking at the deinterlace jitter, then no, it's not a software algorithm problem; it's a matter of data knowledge.



As stated earlier for the blob example, the 3rd decimal place is justified there, as accuracy under 0.01 pixels is clear, with the smoothing/upscaling side-effects reducing the *effective* accuracy somewhat.

Now, obviously that level of accuracy does not directly apply to the video traces, as they contain all manner of other noise and distortion sources. For previous descent traces I've estimated +/- 0.2 pixels, taking account of noise.


It's possible to extract additional resolution using algorithms such as those within SuperResolution video plugins, but that's not what we're doing. We're performing sub-pixel-accurate feature position tracing, which in my case utilises *8 upscaling and Lanczos3 filtering, followed by area-based pattern matching to zero in on specified features.



The accuracy of each individual positional location does not change when increasing the sample rate; it is purely how the resultant data is subsequently treated that has any effect upon derived metrics.

Performing first and second order derivations from noisy data using near-adjacent samples will, of course, result in extreme amplification of that noise.



I'll upload a spreadsheet with my local Dan Rather data soon (I'll make it a bit more presentable first).

It includes...

* NW Corner Trace
* Static Point extraction
* Deinterlace Jitter treatment
* Velocity and Acceleration derivatives by both wide-band symmetric differencing and least squares.
* A global single-cell scaling metric, so you can modify the scaling as you wish.
* Various graphs of the above.




Here's a copy of my position/time data for Dan Rather (zoomed)...







Have to do all sorts of jiggery-pokery to stabilise the footage if the camera is not essentially static.



I assume we agree that generation of derived metrics must not use near-adjacent samples.


The techniques used are critical. Using, say, differencing over a 19-sample-wide window does not result in such extreme noise amplification.
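A rough sketch of what I mean by that (NumPy; the parameter names are mine, not from any particular tool)...

[code]
import numpy as np

def wide_symmetric_diff(y, dt, half_width=9):
    """First derivative by symmetric differencing over a wide window.

    half_width=9 gives a 19-sample window: v[i] = (y[i+9] - y[i-9]) / (18*dt),
    which amplifies positional noise far less than adjacent-sample differencing.
    """
    y = np.asarray(y, dtype=float)
    k = half_width
    v = np.full(y.shape, np.nan)
    v[k:-k] = (y[2 * k:] - y[:-2 * k]) / (2 * k * dt)
    return v
[/code]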


We're not looking for *jolts* in this context, so there's quite a lot of legroom (headroom) in terms of smoothing the data without negatively affecting the results too much.



To generate the 59.94 fps data, it is necessary to combine the deinterlaced video footage into *bob-doubled* format.

There are inherent differences between alternate frames as a result of this, and I use simple averaging to account for it.


There are two options available...

We can:

a) Use the 59.94fps data, and be aware of the inherent deinterlace jitter

-or-

b) Use the TWO sets of deinterlaced trace data at 29.97fps


The latter will allow the generation of two separate drop/velocity/accel curves, with the caveat that they are offset by 1/59.94 of a second, but with the jitter removed.



To save some time, I've uploaded a simple copy of the latest Dan Rather Data.

I imagine we'll be using totally different processing techniques on the data, so best not to pollute the given data with too much of mine...

Download at this link


Columns:

C/D - Raw pixel data for NW corner

F/G - Raw pixel data for static building

I/J - Pixel data for normalised NW corner (Static point data subtracted)

L/M - Pixel data for de-jittered NW corner (simple two point rolling average)

S/T - De-jittered NW corner data in feet (Uses variable scaling metric source values)


The scaling metrics can be modified, but are based on the building width and the visible distance between NW corner and the building it becomes obscured by.

The vertical distance is based upon NIST values, and I'm inclined to spend some more time on it. It's simply a matter of changing the scalar, so I've uploaded the data now. We can agree on refined scaling metrics shortly.
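For anyone not wanting to dig through the spreadsheet, the L/M and S/T columns boil down to something like this (the 3.44 ft/pixel value below is just an illustrative mid-point of the 3.41 to 3.47 range; in the spreadsheet it's a single adjustable cell)...

[code]
import numpy as np

FT_PER_PIXEL_VERTICAL = 3.44   # illustrative mid-value; adjustable in the spreadsheet

def dejitter(y):
    """Simple two-point rolling average: suppresses alternate-field jitter."""
    y = np.asarray(y, dtype=float)
    out = y.copy()
    out[1:] = 0.5 * (y[1:] + y[:-1])
    return out

def to_feet(y_pixels):
    """Apply the global scaling metric, relative to the start position."""
    y_pixels = np.asarray(y_pixels, dtype=float)
    return (y_pixels - y_pixels[0]) * FT_PER_PIXEL_VERTICAL
[/code]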





The vertical axis is in pixels and spans +/- 1 pixel, and the horizontal axis spans 13 seconds.

Is there any remaining doubt that the positional data is accurate sub-pixel ?





For reference, here are the trace locations for the data provided...





The original source video can be found here...

Download at this link

Bear in mind it is in interlaced form, and if you choose to use it, the first thing to do is unfold each interlace field.

Any processing of the video should use a lossless codec such as RAW or HuffYUV.



A very *loose* curve fit.

The initial polynomial is simply done with Excel, then a quick plot of the 2nd-order derivative in Maple...



Probably useless, but a *start point*. I *very* rarely use Maple, so will dig in and see whether I can use the raw data rather than the poxy Excel poly.


The initial equation is a polynomial curve fit of the position/time data within Excel.

The graph is the second-order derivative of that curve - acceleration.

x - time (s)
y - ft/s^2

Time is 10 to 17 seconds in the supplied data.


I said it's a very loose curve (though it's not *that* far off - have you plotted the equation itself?), and the graph is probably useless (as it's very low order due to maxing out the poxy Excel poly fit. Am looking at an order-56 plot at the moment, but there's no rush).


As the initial equation is a 5th-order poly, the accel derivation is only order 3, which is why it's the shape it is.
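For anyone without Maple or the plugin, the same idea can be sketched with NumPy (just an illustration, not the actual Excel/Maple workflow), and it makes clear why a 5th-order position fit can only ever give a 3rd-order acceleration curve...

[code]
import numpy as np

def accel_from_polyfit(t, y, order=5):
    """Fit position/time with a polynomial and differentiate twice.

    A 5th-order position fit yields a 3rd-order acceleration curve,
    hence the limited shape of the acceleration graph.
    """
    p = np.polynomial.Polynomial.fit(t, y, deg=order)
    accel = p.deriv(2)          # second derivative -> acceleration
    return accel(t)             # evaluated at the sample times
[/code]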





A draft vertical trace...




There is not enough contrast on the background building, and the trace drifts quite badly. So I've omitted it.



Q: the main reason we are down this track of detailed methodology is because a lot of truther claims are based on the premise that "free fall acceleration" == "demolition"


A: Not from my perspective. My reason for performing the tracing is to determine accurate metrics on building movement. Application of the same methods to early motion of WTC1 has proven very insightful and has highlighted numerous inaccuracies with the NIST analyses. Understanding actual movement can only be helpful in understanding the actual events. Performing similar for WTC7 will help clarify numerous behaviours, such as the timing between corner release points.



The data does seem to indicate periods of over-G acceleration. A couple of factors not presented are...

a) Scaling metrics. We're at the mercy of very scant building measurements, and even minor error in scaling could result in significant change to derived acceleration data.

b) Camera Perspective. Again scaling related, but there is no treatment of the data wrt perspective. From the Dan Rather viewpoint it won't make a *huge* difference, but if included it would make *some* difference.

c) Other unknowns, such as whether the video framerate is *exact*. A very slight difference between the actual *original* framerate and the available video could skew results.


Do you still have any doubts about whether the positional data is sub-pixel accurate ? I'm still comfortable with the suggested +/- 0.2 pixel accuracy.




Am still looking at scaling metrics, but the biggest stumbling block is a ridiculous lack of building measurement data. The next biggest stumbling block is the low resolution of the videos, which limits the accuracy possible when determining feature measurements. That's further compounded by the fact that we cannot really use sub-pixel positional methods, as they are only useful for determining changes in position, not static position. I've developed procedures to drastically improve static image quality (stacked image summation for example), but there are limitations of course.



A first stab...

Graph of original distance/time data and order 50 poly fit...



A 5964*3720 pixel version so you can actually see what's occurin...

http://femr2.ucoz.com/WTC7DanRatherDrop.png

Derived Velocity (uses dydx function of PFit Excel Plugin)




Derived Acceleration (uses ddydx function of PFit Excel Plugin)





Notes on scaling...

The Dan Rather data is scaled relative to this measurement...





I've set it to 344ft in the spreadsheet data provided, which is based upon the only distance metrics available (to me).

Again, if anyone has information which will allow a more precise distance to be determined, fire away.

I use a metric provided by NIST (242ft - roofline to top of windows on floor 29) with an addition of a multiple of their stated general floor height (12ft 9in) to account for the increased portion visible at the West edge.
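For what it's worth, the 344 ft figure is consistent with adding exactly eight of those floor heights to the NIST value: 242 ft + 8 × 12 ft 9 in = 242 ft + 102 ft = 344 ft. Whether eight floor heights is the right allowance for the extra portion visible at the West edge is exactly the sort of thing better measurement data would pin down.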


There's no way to avoid amplification of raw data error when performing first and second order derivations of that data, I'm afraid. What we can do, if we can agree on an amount of error for the position/time data, is generate a good estimation of error magnitude at the first and second order derivation levels.
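As a rough sketch of how such an estimation could go (my notation; assuming independent per-sample position error σ_y, sample interval Δt, and symmetric differencing over a half-width of k samples):

v_i ≈ (y_(i+k) − y_(i−k)) / (2·k·Δt)   →   σ_v = √2 · σ_y / (2·k·Δt)
a_i ≈ (y_(i+k) − 2·y_i + y_(i−k)) / (k·Δt)²   →   σ_a = √6 · σ_y / (k·Δt)²

For example, σ_y = 0.2 px with k = 9 and Δt = 1/59.94 s gives σ_v ≈ 0.94 px/s per estimate, and the acceleration error is considerably larger again, which is exactly why wider windows and/or smoothing are needed before taking second-order derivatives.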


Quick additional note whilst I remember: 59.94 Hz is only a conventional approximation; the mathematically exact field rate is 60 Hz × 1000/1001 (≈ 59.9401 Hz).




The vertical trace remains within ~+/- 0.2 pixel margin for 12s.

The test video (the blob) showed that SynthEyes is technically capable of determining position to the third decimal place.

The West edge trace on the Cam#3 footage was comparable with NIST's (imo) dodgy moire method...





Note the vertical axis on my data is from a *2 upscaled video, so it's actually +/- 1 pixel range, not +/- 2. Very similar to the NIST results (better imo) and well sub-pixel.

So, what is the question? Is it whether SynthEyes is capable of tracking feature position to sub-pixel accuracy, or whether the noise in the video results in a margin of over 1 pixel?

I think it's been made clear that SynthEyes is more than capable of sub-pixel positional accuracy, and the variance in traces of WTC7 is low. There are all manner of sources of noise which could result in the trace position varying, but they are nothing to do with SynthEyes. It simply determines the location which best matches its reference image. An example of what I'm talking about there is the draft NE corner trace. That showed the NE corner gradually descending over a long period. In reality the corner didn't move, but the *bleed* of the image data reduced. As SynthEyes was told to track that location, it did exactly what it said on the tin. Changing the trace location to the windows near the NE corner negated that particular video artifact.


When compared to the methods others have used to generate positional data from video, I think it's fair to say my data is of clearly superior quality, and comparable with the NIST moire method in attainable accuracy.

That the NIST moire method can only be used for one single point, while my method can be used anywhere on the frame, leads me to suggest that I'm generating the best-quality positional data for the other building locations that has been presented thus far. It is not clear at all how NIST produced their roofline positional data, but its description suggests it was not great quality (their described initial point cannot be determined accurately, and there's no way to determine where the roofline actually is near their described location). There's then the fact that they used a point somewhere around the mid-point of the building width in the Cam#3 footage, which implies that they've interpreted the flexing and twisting motion as vertical movement, and ignored perspective correction. All these issues can be cleared up by using better, and public, trace data.

If you can accept the validity of the tracing methods, we can move forward into interpretation of the various traces that have been performed.

For WTC7 that will mean quantifying things like...

a) Was there vertical kink of the roofline ?
b) Did the NE and NW corners release at the same time ?
c) How early did movement begin ?
d) How early did vertical movement begin ?
e) Implications for NIST report...
etc, etc.

Following that, it'll be the turn of WTC1, with a plethora of traceable observations possible.



Graphs have been presented which are in good agreement with the NIST moire method, showing horizontal movement down to inch accuracy. Inches, not feet.

The focus on second order derivations of the data is fine up to a point, as that is where noise can be amplified if the data is not treated appropriately.

However, the majority of points I intend on analysing do not require derived data plots, but instead use position/time data comparisons between various points to describe order and scale of movement, not rate of movement. An example of this would be using trace data to determine the angle through which *tilt* of WTC1 progressed before *release* of all four corners (and so the transition to vertical drop).

Worth noting the strange additional focus on my *analysis* of the data...which I haven't really done here yet. The extent of *claims* at this point is simply suggesting a +/- 0.2 pixel accuracy for the Dan Rather position/time data. Of course that value will vary depending upon what footage is being traced.





SynthEyes is used in a mode where a rectangular region is traced, rather than a single pixel.

Focus on a particular feature, such as the NW corner, is obtained by moving the initial region such that the NW corner is in the centre of the region (though it doesn't have to be, and there are sometimes good reasons not to make it so.).

Region placement can be performed to sub-pixel accuracy.

SE then scans a larger defined region centred on the same location, using pattern matching of the region with *8 upscaling and Lanczos3 filtering applied, for a best-fit match in the next frame.

It provides a Figure of Merit metric for each frame's region match (0 = perfect, 1 = no match), which is included in the data made available.

Next frame region is of course automatically re-centered on the newly found sub-pixel location.

Subsequent matches are performed using the initial pattern, or, when keyframing is selected, SE updates the base pattern every (n) frames to handle time-based variations. I can explain the benefits/drawbacks of using keyframing if necessary.
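To make the region-matching step a little more concrete, here's a rough stand-in in Python using scikit-image's normalised cross-correlation. It is NOT SynthEyes' internal algorithm, and note the score convention is flipped relative to SE's Figure of Merit (here 1.0 is a perfect match)...

[code]
# Rough stand-in for the region-matching step, NOT SynthEyes' internal code.
# Pattern-matches an 8x Lanczos-upscaled reference patch against the next
# frame and reports the best-fit location back in source-pixel units.
import numpy as np
from PIL import Image
from skimage.feature import match_template

SCALE = 8

def upscale_grey(img):
    big = img.resize((img.width * SCALE, img.height * SCALE), resample=Image.LANCZOS)
    return np.asarray(big.convert("L"), dtype=float)

def track_region(reference_patch, next_frame):
    """Return (x, y) of the best match in source-pixel units, plus a score."""
    patch = upscale_grey(reference_patch)
    frame = upscale_grey(next_frame)
    ncc = match_template(frame, patch)     # normalised cross-correlation surface
    iy, ix = np.unravel_index(np.argmax(ncc), ncc.shape)
    # (ix, iy) is the top-left of the best-match window, in upscaled pixels
    return ix / SCALE, iy / SCALE, float(ncc[iy, ix])
[/code]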

Where I'm going with this is...

As a region is used, SE can use numerous pixels to determine feature location. As the region is upscaled, this translates to many pixels, even with a small region size.

I use regions as large as is practical for the purpose in hand. This is a subjective process, and region size selection and shape is pretty intuitive with experience of performing the traces.

For example, I use a larger region size when tracing static features, and a smaller region size when tracing features such as the NW corner to provide more *focus* and negate the effect of background features and suchlike.

The localised effect of elements such as heat and smoke is, imo, minimised by...

a) Using region based tracing in the first place

and

b) Making the regions as large as possible.

Here is an example image which shows relative region sizes...






Consider a single white pixel feature on a black background.

If that feature moves left by one pixel, gradually, the aliasing end result is that the intensity of the original pixel drops, and the intensity of the adjacent pixel increases.

Assuming simple 8-bit greyscale colour depth, that alone allows for detection of 255 positions, translating to 1/255th of a pixel (0.0039 pixel accuracy if you will).

Ramp this up with...

a) Full 24bit RGB colour (3 planes of 8 bit data)
b) Region based pattern matching (normally involving well over 64 separate pixels. Hundreds in the case of static point traces)
c) *8 upscaling
d) Lanczos3 filtering

...and I hope you can appreciate that potential technical sub-pixel position change determination accuracy can be...awesome.
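A trivially small worked example of the two-pixel case (idealised and noise-free, just to show the numbers)...

[code]
# Idealised two-pixel example: as a white feature slides across, intensity
# transfers between neighbouring 8-bit pixels, and the intensity-weighted
# centroid recovers its position in steps of roughly 1/255 of a pixel.
def centroid_of_pair(left, right):
    """Sub-pixel x position: 0.0 = left pixel centre, 1.0 = right pixel centre."""
    return right / (left + right)

print(centroid_of_pair(255, 0))    # 0.0    feature centred on the left pixel
print(centroid_of_pair(127, 128))  # ~0.502 feature roughly halfway between
print(centroid_of_pair(0, 255))    # 1.0    feature has moved one full pixel
[/code]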





A hack at static point trace variance...






I've applied some processing to the data (there's a rough code sketch of these steps after the list).

1) Aligned separate traces by either +0.1 or -0.1 pixels relative to t9/16
2) Normal 2 point moving average for jitter treatment.
3) Average of all four traces taken as root.
4) Variance as subtraction of each trace from root.
5) 7 point moving average on variance data (just to simplify the display).
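Here's that sketch (NumPy; the array and parameter names are mine, and the offsets are the +0.1/-0.1 pixel alignments mentioned in step 1)...

[code]
import numpy as np

def moving_average(y, n):
    """Simple n-point moving average (same length; leading edge left unsmoothed)."""
    y = np.asarray(y, dtype=float)
    out = y.copy()
    out[n - 1:] = np.convolve(y, np.ones(n) / n, mode="valid")
    return out

def trace_variance(traces, offsets):
    """Steps 1-5 above: align, de-jitter, take the mean trace as root,
    subtract to get per-trace variance, then smooth for display."""
    aligned = [moving_average(np.asarray(t, float) + o, 2)   # steps 1 and 2
               for t, o in zip(traces, offsets)]             # offsets: +0.1 / -0.1 px
    root = np.mean(aligned, axis=0)                          # step 3
    return [moving_average(a - root, 7) for a in aligned]    # steps 4 and 5
[/code]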




The animation shows the effect of increasing the order of the poly fit (in steps of 2, if I recall)...




I think the animation shows that a significant increase in order does not significantly change the shape of the graph, and so I would suggest my version is a little *truer* to reality.






Here's a comparison between the NIST WTC 2 horizontal movement data following impact and mine...





Whilst there's clear similarity, it is obvious that the NIST data includes FAR more noise and error.

(My data has not been filtered in any way.)




Whilst NIST is on the slab...

NCSTAR 1-9 Vol 2 Appendix C - VIDEO ANALYSIS OF WTC 7 BUILDING VIBRATIONS BEFORE COLLAPSE

A few quotes:




If you have been following the detail of our *discussion*, it should be immediately clear to you why I have highlighted the quotes above.







