[sdiy] Re: 1970's again? Now DSP assembly
Cornutt, David K
david.k.cornutt at boeing.com
Tue Feb 1 16:53:06 CET 2005
Since we're talking about assembly code and loop efficiency,
I'll pass along this sort-of-related story from my extremely
brief and long-ago career as an instruction set designer...
Back in the mid-'80s, when I worked at Gould, we were working
on the NP1, which was going to be the company's first highly
pipelined RISC processor. Well, the instruction set turned out
to be not so "reduced"; there was a basic set of instructions
which were all-hardware, but then there were additional
complex instructions implemented in microcode to do things
like memory search, string manipulation, etc.: the kind of stuff
that CPU designers were fond of putting in back in those
days for "compiler support".
At one point we had a little contest. Three assembly language
programmers, including myself, were challenged to write some
assembly-language vector math code that would be faster than
what the compiler turned out. We couldn't do it; no matter
how much use we made of the vector instructions, the compiled
code was always faster -- significantly faster. Finally, we
got to see the assembler output of the compiler. We were
astounded when we saw that it didn't use the vector instructions!
The compiler designers had already found out what we didn't know:
coding the operations as partially-unrolled loops of scalar
instructions took maximum advantage of the instruction pipelining
and memory caching, whereas the microcoded vector instructions
caused pipeline flushes and cache misses that totally defeated the
purpose of doing the operations in microcode.
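
For anyone who hasn't seen it, here is a rough sketch in C of what
"partially unrolled" means for a simple vector multiply-add. This is
only an illustration, not the NP1 compiler's actual output: the
function names and the unroll factor of four are my own assumptions.
The unrolled loop body holds several independent scalar operations,
so a pipelined CPU has more work it can overlap per loop branch.

```c
#include <stddef.h>

/* Plain scalar loop: one multiply-add and one loop branch per element. */
void saxpy_simple(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Same operation, partially unrolled by four (hypothetical factor).
 * The four statements in the body do not depend on one another, so
 * they can overlap in the pipeline, and the loop branch is taken
 * only once per four elements. */
void saxpy_unrolled4(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }

    /* Cleanup loop for the leftover 0 to 3 elements. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

Both functions compute the same result; whether the unrolled one is
actually faster depends on the processor and the compiler, which was
pretty much the point of the story.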