Robert,
Could explain why this is a race condition in the timer? I too
am trying abosrb what you are describing. Excellent description, I
could feel the hair pulling, especially with the pulse behind the
graticle! Ugh!
Richard
--- In lpc2100@yahoogroups.com, Robert Adsett <subscriptions@a...>
wrote:
> I just spent a day tracing down a problem. I thought I'd share
the, umm
> adventure, in the hopes that the information might be useful (or
> entertaining in a "thank god it happened to him not me" way).
>
> I've been updating my timing routines and testing and documenting
> them. One of the final steps was to test the one-wire
communication
> library I've been working on. Now this library is a port of the
Dallas
> library adapted for the LPC and they've just updated it so thought
I'd
> bring over the changes at the same time. That was probably a
mistake.
>
> Transferring the changes went without too many problems. (It did
reveal an
> apparent bug in PC-Lint but I haven't managed to make a small
enough test
> case that shows the problem). The first test of the functionality
went
> without problem.
>
> Feeling confident I moved on to the second test. Compile, load
run. Looks
> OK. Run first step, that's odd where's the output. No output, I
figured
> something had changed in the way the routines worked or the display
> routines and I hadn't noticed. Some time later and I hadn't found
anything.
>
> Well, next step is to add debugging prints to figure out where it
stops
> behaving as expected. Add a few and narrow it down to an area where
the
> devices are being enumerated.
>
> Add debug prints at coarse level to enumeration. Uh Oh! It
suddenly starts
> working. Now it's looking like some sort of memory problem or
timing issue.
>
> Feeling confident in the timing routines (I'd just finished testing
them
> after all) I starting down the road looking for memory issues, such
as
> overflow, bad pointers (especially in newly modified code) and even
code
> generation. Hours later no progress except rather more puzzlement.
>
> Oscilloscope time. The puzzle deepens. With the debug prints in
the
> enumeration process is clearly proceeding with each device
responding in
> turn. Take out the prints and the process collapses into something
that is
> not even consistent between tests. Eeekk!
>
> Replace debug prints with calls to WaitUs (a delay routine). With
large
> waits between enumerations (5mS) everything appears to work. I
didn't see
> why that should be necessary, time to explore further. Shorten the
delay,
> the whole thing collapses again. Hmm, that's pointing back to
timing again.
>
> Look more carefully at the collapse. Things make less sense than
> before. A wait after the enumeration appears to affect the
> enumeration. IE enumerate(), wait for 5ms works, but enumerate
wait for
> 1ms fails. The program is managing to effects events after they
> occur? Obviously I'm missing something. Time to stop trying to
solve the
> problem and just start gathering information, something I should
have
> started earlier.
>
> Add pin toggles around the individual enumerations so that the
process is
> easy to correlate on the oscilloscope. Looks right on the
> oscilloscope. The enumeration bit pattern pauses at the pause
between
> enumerations.
>
> Remove the delay between enumerations. It still works. What
the...! The
> pin toggle appears not to happen between enumerations. After a few
more
> tests and fooling around I find it. The pin toggle is just narrow
enough
> to hide behind an oscilloscope graticule.
>
> Remove the toggle between the enumerations. It collapses again.
Good, at
> least it hasn't just disappeared on me.
>
> Now, let's zoom in on the enumeration and see if I can't find out
exactly
> where it's breaking down. Gather up code to trace along the path
as I
> manually decode. (Note that each bit signal is 80-90uS long and
there are
> 70 odd bits so seeing each bit does involve going to a higher level
of
> detail, and as it turns out the devil is in the detail). Repeat
several
> measurements at higher resolution, the first 1/2 dozen or so bits
are not
> consistently repeatable, they seen to fall into two patterns. Ahh,
now I'm
> are getting somewhere.
>
> Freeze a sample and look in greater detail (thank heavens for
DSOs). Reset
> OK, bit 1 OK, bit 2 OK, bit 3 OK, bit 4 OK. Wait a minute what's
that
> glitch doing there and why is bit 4 so much longer than the others?
>
> Take a break. An important part of the process, probably too long
delayed.
>
> Come back to problem. Take several measurements. Save the first
one as a
> reference and take others until I have one that is different from
the first
> for comparison. Hmm the two measurements are very close. Where I
have a
> glitch on the first one, I have a full rise on my second
> measurement. Check the timing and indeed the glitch occurs where
the
> output should normally rise and the extended low period covers the
low
> period for two bits. It appears something is killing the output of
the
> timers match register as it occurs.
>
> Check, read code, check documentation. Some time later I notice
the
> following sequence and realize its possible significance
>
> - Set output low
> - Set timer match to set output high in 75 uS
> - Wait 15 uS
> - Sample input
> - Wait 60 uS
> - Disable timer match (1)
> - Wait additional time
>
> The disable at (1) is there to prevent the match from occurring on
wrap
> around after it's no longer needed. Unfortunately if the waits are
> accurate (and I just spent a fair amount of time ensuring they
were) that
> disable happens at approximately the same time that the match
actually
> occurs and cancels it just as it starts to affect the output. It
appears
> there is a race condition in the timer module. The fix is simple
> enough. Move the disable to later in the process.
>
> I don't think I could hit this consistently if I tried (I expect
the timing
> would change if you looked at it crosswise), so why am I
(relatively)
> confident that this is the source of the problem? A couple of
> reasons. The sensitivity of the effect to code changes. Almost
any change
> in code position or timing eliminates the problem. Changing the
timing
> using the fix suggested works as does adding time to the waits
before the
> disable. Also the glitch coming out of the CPU is only 30nS wide
and I've
> not run across anything in the micro that can produce a pulse that
narrow
> under normal circumstances (hmm if that could be made repeatable it
might
> be useful). Finally reducing the wait time in the sequence above
> eliminates the rise entirely.
>
> I don't suppose anyone can confirm that the timer has this apparent
> glitch? I know whenever I see an issue like that mentioned I
wonder "how
> did anyone ever run across that?". Well now I know :)
>
> Robert
>
>
> " 'Freedom' has no meaning of itself. There are always
restrictions,
> be they legal, genetic, or physical. If you don't believe me, try
to
> chew a radio signal. "
>
> Kelvin Throop, IIIMessage
Re: A tale of timing and short pulses
2004-02-07 by Richard
Attachments
- No local attachments were found for this message.