Synth-DIY Yahoo! Groups Archives

Stack checking

2004-01-26 by Michael Pont

I'm using the GNU C ARM compiler, version 3.3.1, to compile for various members of the LPC21xx family.

The option -mapcs-stack-check is recognised.  However, I've found various messages on the WWW to suggest that this option (although recognised and documented) has not been implemented for ARM.

Does anyone know what the current situation is?

Thanks,

Michael.

Re: [lpc2100] Stack checking

2004-01-26 by Ben Dooks

... last time I looked (2.95.3) it seemed to be producing correct output for stack checking

-- 
Ben

Q: What's a light-year?
A: One-third less calories

Re: [lpc2100] Stack checking

2004-01-26 by Lewin A.R.W. Edwards

Michael Pont wrote:
> The option -mapcs-stack-check is recognised.  However, I've found various messages on the WWW to suggest that this option (although recognised and documented) has not been implemented for ARM.

Just IMHO, it is not desirable to rely on runtime stack checking in an 
embedded application. What is the program supposed to do when a stack 
overrun is detected? Hand the user a box of candy? Explode?

It is preferable to analyze stack requirements beforehand, and monitor 
them during worst-case usage testing, if possible. Sometimes I add code 
to my watchdog-kicking code that checks for a pre-RTL-init signature at 
the bottom of the stack; if it's not there, I write a stack overflow 
diagnostic to the product log and stop kicking the dog to force a reset.
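
For illustration, the signature check described above might look like the following C sketch. It is not the poster's actual code: the signature value, the guard size, and the names `stack_guard_init`, `watchdog_service`, `kick_watchdog` and `log_stack_overflow` are all assumptions, and the two hooks are stand-ins that only count calls so the logic can be exercised off-target. It also assumes a descending stack, so the "bottom" is the lowest address of the stack region.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SIGNATURE 0xDEADBEEFu  /* arbitrary fill pattern (assumption) */
#define GUARD_WORDS     4u           /* words checked at the stack's low end */

/* Stand-ins for the real watchdog feed and product log (hypothetical). */
static unsigned wdt_kicks;
static unsigned overflow_logs;
static void kick_watchdog(void)      { wdt_kicks++; }
static void log_stack_overflow(void) { overflow_logs++; }

/* Write the signature before the C runtime starts using the stack. */
void stack_guard_init(volatile uint32_t *stack_bottom)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        stack_bottom[i] = STACK_SIGNATURE;
    }
}

/* True while every guard word still holds the signature. */
bool stack_guard_intact(const volatile uint32_t *stack_bottom)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (stack_bottom[i] != STACK_SIGNATURE) {
            return false;
        }
    }
    return true;
}

/* Kick the watchdog only if the guard is intact. Otherwise log the
 * overflow and return without kicking, so the watchdog forces a reset.
 * Returns true if the watchdog was kicked. */
bool watchdog_service(const volatile uint32_t *stack_bottom)
{
    if (!stack_guard_intact(stack_bottom)) {
        log_stack_overflow();
        return false;
    }
    kick_watchdog();
    return true;
}
```

On a real target, `stack_guard_init` would be called from startup code with the lowest address of the stack region, and `kick_watchdog` would perform the part's watchdog feed sequence.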

-- 
  -- Lewin A.R.W. Edwards (http://www.zws.com/)
Learn how to develop high-end embedded systems on a tight budget!
http://www.amazon.com/exec/obidos/ASIN/0750676094/zws-20

RE: [lpc2100] Stack checking

2004-01-26 by Paul Curtis

Lewin,

> Michael Pont wrote:
> > The option  -mapcs-stack-check is recognised.  However, I've found 
> > various messages on the WWW to suggest that this option (although 
> > recognised and documented) has not been implemented for ARM.
> 
> Just IMHO, it is not desirable to rely on runtime stack checking in an
> embedded application. What is the program supposed to do when a stack
> overrun is detected? Hand the user a box of candy? Explode?

If you're developing, what's the harm in having some extra diagnostics
in your toolset?  Modula-2 and Pascal had this for years, with the
ability to turn stack checking off if you required it.  For heaven's
sake, our Modula-2 compilers even had stack checking in coroutines
allowing you to stop and diagnose a problem when an interrupt comes in
and an IOTRANSFER activates.  The debugger was well integrated and
didn't screw the stack up on stack overflow, so you can see where things
went wrong.

IMO, stack checking stops the headscratcher where you see variables
overwritten for no reason.  Or worse, the heap and the stack collide,
causing delayed havoc.

--
Paul Curtis, Rowley Associates Ltd http://www.rowley.co.uk
CrossWorks for MSP430, ARM, and (soon) Atmel AVR processors

Re: [lpc2100] Stack checking

2004-01-26 by Ben Dooks

... you can do several things:

1) warn the user that the stack is overflowing (better than an exception)
2) extend the stack from somewhere (Acorn used this for

Re: [lpc2100] Stack checking

2004-01-27 by Michael J. Pont

----- Original Message -----

From: "Lewin A.R.W. Edwards"
>
> Just IMHO, it is not desirable to rely on runtime stack checking in an
> embedded application. What is the program supposed to do when a stack
> overrun is detected? Hand the user a box of candy? Explode?

I couldn't agree more.

However, my thought is that - if I can get the feature to work - it may
prove useful during development.

Michael.

Re: [lpc2100] Stack checking

2004-01-27 by Michael J. Pont

From: "Ben Dooks" 
> last time i looked (2.95.3) it seemed to be producing correct
> output for stack checking 

Thanks - I'll explore this again.

Michael.

Re: Stack checking

2004-01-27 by Peter

--- In lpc2100@yahoogroups.com, "Lewin A.R.W. Edwards" <larwe@l...> wrote:
> It is preferable to analyze stack requirements beforehand, and monitor
> them during worst-case usage testing, if possible. Sometimes I add code
> to my watchdog-kicking code that checks for pre-RTL-init signature at
> the bottom of stack; if it's not there, I write a stack overflow
> diagnostic to the product log and stop kicking the dog to force a reset.

I have to agree with Lewin. In the ARM tools at least, there's a 
significant performance penalty for turning on software stack 
checking; I have seen a 15% performance drop. This is due to 
additional code per function call, and to having a register dedicated 
to holding the stack limit, which leads to more spill-out onto the 
stack.

The big problem with enabling software stack checking during testing 
then disabling for production is that you should really run all your 
tests again - otherwise you're not releasing the product that you 
tested.

In my experience, unless someone is trying to instantiate a 20K C++ 
object on the stack, it's usually an interrupt mode stack that 
overflows rather than the application stack. Checking for signatures 
at the bottom of the stack area can be extended to cover FIQ and IRQ 
mode stacks, and is a relatively low-overhead operation.

One nice feature about this is that generally interrupt code keeps 
running long after application code has found new and exciting ways 
to occupy itself in non-existent memory, so placing the stack 
checking here can rescue the whole system.
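
As a rough sketch of extending the signature check to several stacks, the C fragment below walks a table of stack regions and reports the first one whose signature has been overwritten. The type `stack_region_t`, the function names, and the signature value are assumptions made for illustration. On a real ARM7 part the `lowest_word` pointers would come from the linker script symbols for the application, IRQ and FIQ stacks.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SIGNATURE 0xA5A5A5A5u  /* arbitrary pattern (assumption) */

/* One entry per processor-mode stack to be monitored. */
typedef struct {
    const char        *name;         /* used only for diagnostics */
    volatile uint32_t *lowest_word;  /* lowest address of the stack area */
} stack_region_t;

/* Write the signature at the low end of every stack; call once at startup. */
void stacks_mark(const stack_region_t *regions, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        *regions[i].lowest_word = STACK_SIGNATURE;
    }
}

/* Returns the index of the first stack whose signature is gone,
 * or -1 if all signatures are intact. Cheap enough to call from a
 * periodic interrupt handler. */
int stacks_first_overflow(const stack_region_t *regions, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (*regions[i].lowest_word != STACK_SIGNATURE) {
            return (int)i;
        }
    }
    return -1;
}
```

A timer interrupt could call `stacks_first_overflow` on each tick and log `regions[i].name` when the result is not -1.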

Incidentally... I've seen watchdog timer reload code in interrupt 
handlers, but for the reason I've mentioned above that may not 
protect against dead application code. Isn't it better to reload the 
watchdog timer at strategic points in the application? That way, the 
watchdog event kicks in if the application locks up.

Peter.

Re: [lpc2100] Re: Stack checking

2004-01-27 by Lewin A.R.W. Edwards

>>them during worst-case usage testing, if possible. Sometimes I add 
>>to my watchdog-kicking code that checks for pre-RTL-init signature 

> Incidentally... I've seen watchdog timer reload code in interrupt 
> handlers, but for the reason I've mentioned above that may not 
> protect against dead application code. Isn't it better to reload the 
> watchdog timer at strategic points in the application? That way, the 

The care and feeding of watchdogs is a topic of many PhDs. The "correct" 
approach depends on the application and the hardware's idiosyncrasies. 
However, a couple of things:

1. I didn't actually say above that I was kicking the dog in an ISR, 
though it might have sounded that way :)

2. One method I have used with tolerable success [depends on being able 
to analyze relative task time requirements] is to have a short buffer 
(size depends on number of concurrent tasks) and a timestamp variable. I 
code each task so that at an "appropriate" juncture (i.e. not in an 
inner loop), it acquires a semaphore controlling access to the variables 
aforementioned.  It places the current time in the timestamp variable, 
and adds a task ID byte to the head of the buffer, pushing down the 
remaining contents. It then releases the semaphore.

An ISR checks the buffer to make sure that it contains a signature for 
every known task, and no unknown task [which would mean memory 
corruption]. It also checks that the timestamp is "reasonably 
up-to-date", by which I usually mean no older than 10x-20x the operating 
system's scheduler tick. That multiplier might be altered if there are device 
drivers that evilly take long timeslices away from normal tasks. If both 
these conditions are satisfied, the WDT is kicked. If one or both are 
unsatisfied, the ISR writes the event to the device log and returns 
without kicking the dog. This gives the system a second bite at the 
cherry to recover the dead task (but now the WDT is growling). 
Optionally, a supervisory task could see the "task 1234 suspected dead" 
log entry and kill/restart the task. But if the system doesn't normalize 
within the WDT's expiry time, it will reboot.

If one task dies, it will stop putting its ID into the buffer, and that 
ID will soon cycle out as other tasks displace it ->WDT reset. If 
something evil happens and all tasks block on the watchdog buffer 
semaphore, the timestamp freezes and the ISR assumes gross system 
failure and explicitly halts to wait for the dog to bite.
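
The scheme above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the code from the original system: the names `health_checkin` and `health_ok`, the buffer length, the tick limit, and the convention that task IDs run from 1 to `NUM_TASKS` (with 0 meaning an unused slot) are all invented here. The semaphore is omitted, so `health_checkin` is assumed to be called with it already held.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_TASKS     3u   /* valid task IDs are 1..NUM_TASKS (assumption) */
#define RUN_BUF_LEN   6u   /* sizing is application-specific (assumption) */
#define MAX_AGE_TICKS 20u  /* upper end of the 10x-20x range in the post */

static uint8_t  run_buf[RUN_BUF_LEN];  /* most recent check-ins, newest first */
static uint32_t last_checkin;          /* time of the newest check-in */

/* Called by each task at an appropriate juncture, semaphore held. */
void health_checkin(uint8_t task_id, uint32_t now)
{
    /* Push the existing contents down one slot, dropping the oldest. */
    memmove(&run_buf[1], &run_buf[0], RUN_BUF_LEN - 1u);
    run_buf[0]   = task_id;
    last_checkin = now;
}

/* Called from the ISR. True means the watchdog may be kicked. */
bool health_ok(uint32_t now)
{
    bool seen[NUM_TASKS] = { false };

    for (unsigned i = 0; i < RUN_BUF_LEN; i++) {
        uint8_t id = run_buf[i];
        if (id == 0u) {
            continue;              /* slot not filled yet */
        }
        if (id > NUM_TASKS) {
            return false;          /* unknown ID: memory corruption */
        }
        seen[id - 1u] = true;
    }
    for (unsigned i = 0; i < NUM_TASKS; i++) {
        if (!seen[i]) {
            return false;          /* a task has cycled out of the buffer */
        }
    }
    /* Unsigned subtraction stays correct across counter wrap-around. */
    return (uint32_t)(now - last_checkin) <= MAX_AGE_TICKS;
}
```

In this sketch `health_ok` also returns false until every task has checked in once, so the watchdog timeout has to cover startup. Logging and the optional task restart would hang off the false result.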

Re: Stack checking

2004-01-27 by Peter

--- In lpc2100@yahoogroups.com, "Lewin A.R.W. Edwards" <larwe@l...> wrote:
> 1. I didn't actually say above that I was kicking the dog in an ISR,
> though it might have sounded that way :)

Nope... didn't mean to imply that - sorry, my mind went off at a 
tangent :) This was the subject of some debate at ARM, where the 
standard watchdog peripheral kicked a reset if two consecutive 
interrupts generated by the watchdog timer were not serviced. Unless 
there was some additional interaction with the application code, it 
could entirely fail to detect a locked system.

> 2. One method I have used with tolerable success (depends on being able
[extensive snip - readers are strongly recommended to see the original post]

Very cunning - I like it! I did have some implementation 
observations/questions but I can see I'm going way off-topic ;)

Peter.

Re: [lpc2100] Re: Stack checking

2004-01-27 by Robert Adsett

At 10:04 AM 1/27/04 -0500, you wrote:
> >>them during worst-case usage testing, if possible. Sometimes I add
> >>to my watchdog-kicking code that checks for pre-RTL-init signature
>
> > Incidentally... I've seen watchdog timer reload code in interrupt
> > handlers, but for the reason I've mentioned above that may not
> > protect against dead application code. Isn't it better to reload the
> > watchdog timer at strategic points in the application? That way, the
>
>The care and feeding of watchdogs is a topic of many PhDs. The "correct"
>approach depends on the application and the hardware's idiosyncrasies.
>However, a couple of things:
>
>1. I didn't actually say above that I was kicking the dog in an ISR,
>though it might have sounded that way :)
>
>2. One method I have used with tolerable success [depends on being able
>to analyze relative task time requirements] is to have a short buffer
>(size depends on number of concurrent tasks) and a timestamp variable. I
>code each task so that at an "appropriate" juncture (i.e. not in an
>inner loop), it acquires a semaphore controlling access to the variables
>aforementioned.  It places the current time in the timestamp variable,
>and adds a task ID byte to the head of the buffer, pushing down the
>remaining contents. It then releases the semaphore.

<snip>
Sounds very similar to a technique I have used in the past.  In my case, 
like yours, the watchdog was serviced in a timer interrupt, and that interrupt 
used additional data to watch the health of the system.

I was usually running several control loops, and each loop would set an 
ECC-protected value when it ran.  This data would contain a timeout period for 
that loop.  The timer interrupt would count down each ECC-protected value 
and stop resetting the watchdog if any one of them reached 0.  To allow 
startup to proceed w/o incident, the loop data could exist in two states, 
'not started' and 'running', with no mechanism provided to shut them down 
(turning them on was just a matter of performing the normal update on them).

This manages to protect against
         - Timer interrupt failure (rare, but I have seen it happen, usually
           because someone created a section of code that left interrupts
           completely disabled)
         - memory overwrites.  These would invalidate the ECC code and
           trigger an immediate reset.
         - loop failures or loops taking too long to complete.
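
A minimal C sketch of the per-loop countdown follows. It makes several assumptions that are not in the original post: a stored bitwise complement stands in for the real ECC code, the value 0xFFFF is used to mark 'not started', and the names `loops_init`, `loop_ran` and `timer_tick_ok` are invented for illustration. Where the post describes an immediate reset on corruption, the sketch simply returns false so the caller stops resetting the watchdog.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LOOPS        2u
#define LOOP_NOT_STARTED 0xFFFFu  /* sentinel state (assumption) */

/* Countdown value plus its complement as a simple integrity check.
 * The complement is a stand-in for the ECC code in the post. */
typedef struct {
    uint16_t ticks_left;
    uint16_t check;
} loop_timer_t;

static loop_timer_t loops[NUM_LOOPS];

static void store(loop_timer_t *t, uint16_t v)
{
    t->ticks_left = v;
    t->check      = (uint16_t)~v;
}

static bool valid(const loop_timer_t *t)
{
    return t->check == (uint16_t)~t->ticks_left;
}

/* Put every loop in the 'not started' state; call once at startup. */
void loops_init(void)
{
    for (unsigned i = 0; i < NUM_LOOPS; i++) {
        store(&loops[i], LOOP_NOT_STARTED);
    }
}

/* Called by control loop i on every pass. The first call starts
 * monitoring. timeout_ticks must be in 1..0xFFFE. */
void loop_ran(unsigned i, uint16_t timeout_ticks)
{
    store(&loops[i], timeout_ticks);
}

/* Called from the timer interrupt. True means the watchdog may be reset. */
bool timer_tick_ok(void)
{
    for (unsigned i = 0; i < NUM_LOOPS; i++) {
        loop_timer_t *t = &loops[i];
        if (!valid(t)) {
            return false;                  /* memory overwrite detected */
        }
        if (t->ticks_left == LOOP_NOT_STARTED) {
            continue;                      /* not monitored yet */
        }
        if (t->ticks_left > 0u) {
            store(t, (uint16_t)(t->ticks_left - 1u));
        }
        if (t->ticks_left == 0u) {
            return false;                  /* loop missed its deadline */
        }
    }
    return true;
}
```

Because there is no call that returns a loop to 'not started', a loop that has begun running stays monitored, matching the two-state design described above.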

What they didn't provide, however, was the capability to log the failing 
task.  I generally only had a few bytes of non-volatile memory available 
for that error logging anyway and often the time to guarantee a write would 
exceed the watchdog period.

Robert

" 'Freedom' has no meaning of itself.  There are always restrictions,
be they legal, genetic, or physical.  If you don't believe me, try to
chew a radio signal. "

                         Kelvin Throop, III

Re: [lpc2100] Re: Stack checking

2004-01-27 by Lewin A.R.W. Edwards

> interrupts generated by the watchdog timer were not serviced. Unless 
> there was some additional interaction with the application code, it 
> could entirely fail to detect a locked system.

Well, that's the problem with any WDT implementation, really - you've 
got an exceedingly dumb peripheral that needs to have your intentions 
communicated to it, and the intention you need to communicate is "reset 
the system if it's not working as I intended". Quite a tall order.

> Very cunning - I like it! I did have some implementation 
> observations/questions but I can see I'm going way off-topic ;)

I don't think it's OT for this listserver. WDT usage in multitasking 
environments is highly relevant to ARM in general. And in particular for 
parts like the LPC21xx, which are designed for deeply embedded control 
functions, system reliability and fast recovery time are potentially 
very interesting and on-topic issues!

The method I described is imperfect, of course - I haven't heard of a 
perfect method. One of the main downsides is that you need to perform a 
lot of real-world code analysis to determine exactly how large to make 
the "run buffer". You also need to choose very carefully exactly where 
in each task you should update the task running buffer. These analyses 
need to be revisited when you update the code, also.

Tasks that can block for a long time (e.g. waiting for a keystroke) are 
hard to monitor with this system, unless you add additional logic to see 
if the task is blocked on an OS service [but then how do you know the 
underlying driver hasn't crashed? The solution I prefer is always to 
use asynchronous I/O and periodically poll for completion, updating 
system health indications every time I poll to indicate that I'm not 
really dead, just waiting].

But I think that once it's properly tuned, it's quite capable. It also 
allows the system to recover individual tasks, if possible - which means 
increased availability on a device that takes a long time to reboot.

Re: [lpc2100] Re: Stack checking

2004-01-27 by Lewin A.R.W. Edwards

Interesting...

> Sounds very similar to a technique I have used in the past.  In my case 

I can't claim to have invented it, but I used this method for the first 
time in some code running on a Z-80 system, probably developed in 1989? 
I know I was talking to a lot of my mentors (other teenagers :) at the 
time; it wasn't an independently developed idea.

> I was usually running several control loops and each loop would set an ECC 
> protected value when it ran.  This data would contain a timeout period for 

I didn't think of the ECC protection, I should definitely add more 
protection of this sort to the task buffer.

> What they didn't provide, however, was the capability to log the failing 
> task.  I generally only had a few bytes of non-volatile memory available 
> for that error logging anyway and often the time to guarantee a write would 
> exceed the watchdog period.

I didn't really start to refine my technique until I started working on 
my submarine. In this case it is very interesting to me to know what 
tasks are failing and why. If the system gets launched with buggy 
software, it will only be money being lost, but it's MY money, which 
makes it much more important ;) Anyway, I really need every possible 
byte of health info to be logged.

The board on which this WDT code I've been talking about is running is 
an x86-based SBC (once the design is refined, this board will be 
replaced with something ARM, probably a custom board). This SBC is 
onboard to do "heavy lifting" tasks (machine vision, bulk storage of 
image and audio data, etc) and tasks that are complicated to do on 
microcontrollers (USB camera interface, 802.11b networking). These are 
all tasks that are really hard to prototype quickly, and also hard to 
estimate CPU requirements for. So it's much easier to buy off-the-shelf 
consumer parts, use an overkill CPU, and see how it performs. Then, ONLY 
once you have a good idea of actual cycles-n-bytes system requirements, 
you can build an optimized custom circuit. The problem is, a lot of 
these drivers and support tasks are subject to intermittent failure. 
(802.11b drivers in particular - if the signal fluctuates, you often 
need to unload the driver and reset the hardware).

The sub is actually driven by an ATmega128, which has an RS232 link to 
the above SBC, a second RS232 link to a GPS receiver, and an SPI link to 
various peripherals (propulsion motor controllers, accelerometer, 
stepper motor controllers, power management module, etc.; all of 
these have individual AVR microcontrollers - again, this is an 
ease-of-prototyping thing).

watchdogs, was Re: [lpc2100] Re: Stack checking

2004-01-27 by Matthias Weingart

On Tue, Jan 27, 2004 at 11:28:31AM -0500, Lewin A.R.W. Edwards wrote:
> The method I described is imperfect, of course - I haven't heard of a 
> perfect method. One of the main downsides is that you need to perform a 

Interesting article about watchdogs:

http://www.ganssle.com/watchdogs.pdf

        Matthias

Re: watchdogs, was Re: [lpc2100] Re: Stack checking

2004-01-28 by Lewin A.R.W. Edwards

> Interesting article about watchdogs:
> 
> http://www.ganssle.com/watchdogs.pdf

Yep, it's a good article. Ganssle has written/compiled a book 
[including that article, and many other snippets of great interest], 
which I had the privilege to proofread late last year. I would expect 
to see it on the shelves in a few months. It's listed on Amazon 
already, though not available for order. ISBN 075067606X. Heartily 
recommended. "The Firmware Handbook" is the title.
