I recently came across Revisiting the Past: A Look Back at the X200 Series by Chris Carson at LowEndMac.com. The article was about a series of Power Macintosh and Performa computers that Apple shipped in the mid 1990’s, all based on the same motherboard with 603 or 603e processors. Their model numbers were in the 5200/5300 and 6200/6300 range (before the 6360), and they are often referred to as the x200 series.
The article portrays this series in a rather poor light, claiming that it was “…horribly crippled by an incredibly bizarre architecture…” I was surprised to read this. In the 1990s I used a Performa 6300CD for a couple of years and never thought of it as a bad or slow computer. Curious as to how this line got such a poor reputation, I followed the links to read both Power Mac and Performa x200, Road Apples by Dan Knight, and Performa and Power Mac x200 Issues by Scott L. Barber.
I was immersed in Macintosh programming in the late 1990’s and I’m familiar with both the PowerPC architecture and Apple’s motherboard designs. I don’t know what the source of information was behind these articles. But just about everything they say is wrong.
Some of it is bang-your-head-on-the-keyboard wrong.
I think Low End Mac is a great resource and I intend no disrespect towards the authors above. But these articles are a source of myth and confusion about the x200 series. In this rebuttal I will focus on and quote Mr. Barber’s piece which appears to be the foundation for later articles.
Addendum: Ted Hodges followed up on my article with a new article at Low End Mac: The Golden Road Apple: How I Discovered that the Worst Mac Ever Wasn’t.
While there are other links below the most important references are:
- IBM’s PowerPC 603 RISC Microprocessor Technical Summary
- IBM’s PowerPC 603e RISC Microprocessor Technical Summary
- Apple’s Developer Note Power Macintosh 5200/75 and Power Macintosh 6200/75 Computers
The first x200 models used the PowerPC 603 processor which had a 16K on-chip cache (8K data + 8K instruction). This cache was too small for Apple’s 68K emulator to work efficiently. Even PowerPC native apps were affected because large portions of the OS were still 68K. This was not the case with the 603e used by later models as it had a 32K on-chip cache (16K+16K).
Other than the cache difference, everything I say below in reference to the 603 applies to the 603e as well.
“What this means to the home user is simple: Because the memory path is only 32 bits wide and the processor uses 64-bit commands, it takes four processor cycles before an instruction can be completely sent to the processor.”
The 603 implemented the 32-bit PowerPC architecture. Instructions, addresses, and general purpose registers were 32-bit and could be loaded with a single read over a 32-bit bus. The floating point registers could be treated as 32-bit single precision or 64-bit double precision values.
Put simply, the 603 did not need additional cycles to load a command on a 32-bit bus and therefore did not take “…four times longer to process anything.”
What about 64-bit values? The 603 was a superscalar processor with an independent Load Store Unit (LSU). This means the other units could work on data in the registers while the LSU accessed main memory. Loading a 64-bit value would take more time over a 32-bit bus. But it would neither pause the 603 nor require extra cycles to compose the value. As such the load times given in both articles were incorrect. (DRAM access timing is beyond the scope of this article. I will likely address load times in a later article.)
The LSU was not limited to floating point values in 8-byte transfers. It could move two instructions or integer values. It could also perform 32-byte burst transfers. These involved multiple data bus accesses but were streamlined in other ways. So while Mr. Barber’s specific claims are false, the general claim that a 603 would be faster on a 64-bit bus is true.
How much faster would depend in large part on the presence and speed of an L2 cache. Without one the difference would be somewhat less than 2x due to DRAM latency. But with a cache the difference would be much smaller.
The x200 models had a 256K L2 cache on a 64-bit bus clocked at processor speed (please see the 40 MHz addendum under x200 Motherboard). Because of this the impact of the 32-bit memory path was minimal.
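To make the reasoning about bus width and the L2 cache concrete, here is a toy model of how long it takes to fetch one 32-byte line. Every timing number in it (the DRAM latency, the L2 latency, the 90% hit rate, one beat per bus cycle) is an assumption chosen for illustration, not a measured figure for any x200 board, and the model ignores the different clock speeds of the buses.

```python
# Toy model of cache-line fetch cost. All timing values are illustrative
# assumptions, not measurements of any real x200 motherboard.

LINE_BYTES = 32  # size of a 603 burst transfer, as described in the text


def line_fill_cycles(bus_bytes, latency_cycles, cycles_per_beat=1):
    """Cycles to move one line: a fixed latency plus one beat per bus-width chunk."""
    beats = LINE_BYTES // bus_bytes
    return latency_cycles + beats * cycles_per_beat


def average_cycles(l2_hit_rate, l2_cycles, dram_cycles):
    """Average cost per line when some fetches hit the L2 and the rest go to DRAM."""
    return l2_hit_rate * l2_cycles + (1.0 - l2_hit_rate) * dram_cycles


DRAM_LATENCY = 6    # assumed fixed DRAM setup cost, in bus cycles
L2_LATENCY = 1      # assumed L2 setup cost, in bus cycles
L2_HIT_RATE = 0.9   # assumed fraction of fetches served by the L2 cache

dram_32 = line_fill_cycles(bus_bytes=4, latency_cycles=DRAM_LATENCY)  # 8 beats
dram_64 = line_fill_cycles(bus_bytes=8, latency_cycles=DRAM_LATENCY)  # 4 beats
l2_fill = line_fill_cycles(bus_bytes=8, latency_cycles=L2_LATENCY)

# Without an L2 cache every fetch pays the DRAM cost, so the ratio is below 2x
# because the fixed latency is the same on both bus widths.
no_cache_ratio = dram_32 / dram_64

# With an L2 cache only the misses see the narrower bus.
with_cache_ratio = (
    average_cycles(L2_HIT_RATE, l2_fill, dram_32)
    / average_cycles(L2_HIT_RATE, l2_fill, dram_64)
)

print(f"32-bit vs 64-bit DRAM bus, no L2:   {no_cache_ratio:.2f}x")
print(f"32-bit vs 64-bit DRAM bus, with L2: {with_cache_ratio:.2f}x")
```

With these made-up inputs the gap shrinks once the cache is in the picture. The exact figures depend entirely on the assumed latency and hit rate, so only the direction of the effect should be taken from this sketch.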
Evaluating the 32-bit Memory Path
The standard benchmarking tool of the time, MacBench, used real world algorithms which included a good mix of memory I/O. The tests were affected by data buses and caches like any real program. MacBench scores should therefore give us both a reliable indication of overall performance and a way to estimate the impact of the bus.
| Model | CPU | CPU Speed | RAM Bus Width | MacBench 4 CPU Score |
|---|---|---|---|---|
| Performa 6300 | 603e | 100 MHz | 32-bit | 137 |
| Power Mac 8100/80 | 601 | 80 MHz | 64-bit | 142 |
| Power Mac 7500/100 | 601 | 100 MHz | 64-bit | 164 |
These three Macs are very close in processor, memory, and bus speed, and they all have a 256K L2 cache. The 603e integer unit was a little faster than the 601 integer unit. But this is about as close as we can get to equalizing other factors given the equipment Apple shipped.
As you can see, the 7500/100 outscores the 6300 by about 20% (equivalently, the 6300 score is about 16% lower), and the 6300 is very close to the 8100/80. While specific algorithms might see more or less of an impact, I think it’s fair to say that the overall speed penalty of the 32-bit bus was roughly 20-25% versus 601-based Macs. This is a far cry from 4x or even 2x.
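The comparison can be checked with a few lines of arithmetic. This sketch uses only the three MacBench 4 CPU scores from the table. The helper names are mine, and the two functions are there because "percent lower" and "percent higher" give different numbers for the same pair of scores.

```python
# MacBench 4 CPU scores copied from the table above.
scores = {
    "Performa 6300": 137,
    "Power Mac 8100/80": 142,
    "Power Mac 7500/100": 164,
}


def percent_lower(score, reference):
    """How far `score` falls below `reference`, as a percentage of `reference`."""
    return (reference - score) / reference * 100.0


def percent_higher(score, reference):
    """How far `score` rises above `reference`, as a percentage of `reference`."""
    return (score - reference) / reference * 100.0


p6300 = scores["Performa 6300"]
for name in ("Power Mac 8100/80", "Power Mac 7500/100"):
    other = scores[name]
    print(
        f"6300 is {percent_lower(p6300, other):.1f}% below the {name}; "
        f"the {name} is {percent_higher(other, p6300):.1f}% above the 6300"
    )
```

Either way the gap is measured, it stays far from the 2x to 4x range.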
Mr. Barber claimed that these Macs scored well on benchmarks yet were “absolutely terrible” in the real world. This may have been true with 603 based models due to the emulation issue. But there’s no reason to doubt the predictive value of the benchmarks when it comes to the 603e or to PowerPC code.
(Please see the Addendum at the end of this blog post for a real world comparative speed test performed using Photoshop 4.)
Advanced Hardware Issues with the x200 Series
In the Advanced Hardware Issues section Mr. Barber described what he believed to be the motherboard design of the x200 series. The highlights include:
- The 64-bit CPU bus split into two 32-bit buses, “Left32” and “Right32”, with main memory on one half of the bus.
- The CPU forced to act as a bridge between the two buses.
- Components arranged to fill out the bits of a bus and described as conflicting where they overlap.
- A range of mismatched component speeds derived from one clock signal.
- No multiplexers or buffers. Mr. Barber stressed his belief that all multiplexing was handled via software.
Not to put too fine a point on it, but everything Mr. Barber said in this section was false. Such a motherboard would never work.
To begin with you cannot split a PowerPC data bus in the manner he described. When the processor is on a 64-bit bus it’s going to use all the data pins. If half those lines went to RAM and half simultaneously went somewhere else then an 8-byte or 32-byte transaction would result in a hard crash. In the 32-bit mode there was one single 32-bit bus, not a split bus.
Just as important, the bits of the data bus were not split between devices in order to fill out the bus. All components were connected to the same data lines, as with any personal computer data bus. They did not interfere with each other because they were address mapped and managed by a strict, hardware-enforced arbitration protocol. That protocol is a necessity rather than a nicety: if two components were to drive the bus at the same time, their outputs would be tied together, which could lead to electrical damage.
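As a rough software analogy for address mapping and arbitration, consider the sketch below. The `Bus` class, the device names, and the address ranges are all hypothetical and do not reflect the real x200 memory map. The point is only that every device shares the same data lines, the address selects which device responds, and only one master holds the bus at a time.

```python
class Bus:
    """Shared bus model: all devices see the same data lines; the address picks one."""

    def __init__(self):
        self.regions = []  # list of (start, end, device_name), end exclusive
        self.owner = None  # current bus master, or None when the bus is idle

    def map_device(self, name, start, size):
        """Assign a device an address range; overlapping ranges are rejected."""
        end = start + size
        for s, e, other in self.regions:
            if start < e and s < end:
                raise ValueError(f"{name} overlaps {other}")
        self.regions.append((start, end, name))

    def decode(self, address):
        """Return the one device that responds to this address, if any."""
        for s, e, name in self.regions:
            if s <= address < e:
                return name
        return None

    def request(self, master):
        """Grant the bus only when it is idle; otherwise the requester must wait."""
        if self.owner is None:
            self.owner = master
            return True
        return False

    def release(self, master):
        """Give up the bus so the next requester can be granted."""
        if self.owner == master:
            self.owner = None


# Hypothetical address ranges, chosen only for illustration.
bus = Bus()
bus.map_device("RAM", 0x0000_0000, 0x0400_0000)
bus.map_device("video", 0xF000_0000, 0x0010_0000)

print(bus.decode(0x0000_1000))  # the RAM range contains this address
print(bus.request("cpu"))       # bus is idle, so the CPU is granted
print(bus.request("dma"))       # bus is busy, so this requester waits
```

On real hardware the arbitration is done by dedicated logic with request and grant signals rather than by method calls, but the invariant is the same: at most one driver at a time.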
With the article description in mind let’s look at the actual design as documented by Apple.
- The 603 processor, 256K external L2 cache, and ROM shared a 64-bit bus clocked at the same speed as the 603.
- Addendum: I have conflicting information indicating that this ‘local bus’ actually ran at a maximum of 40 MHz. This is almost certainly true. It doesn’t change my overall conclusions, because the observed performance is what it is, and Apple was still clearly trying to improve performance with a faster 64-bit bus to the L2 cache and ROM. In retrospect I should have dug deeper on this point. The idea that this bus ran at CPU speed, or would have had an L2 cache fast enough to take advantage of those speeds, should have raised a red flag for me.
- A custom chip, Capella, linked the 603 bus to a 32-bit 68040 bus running at 37.5 MHz on boards with the 75 MHz CPU, and 40 MHz on later boards with faster processors. Capella provided 64-bit/32-bit translation and bus arbitration. The ‘040 bus connected the memory controller and graphics controller making Capella similar to a northbridge chip.
- Another custom chip, PrimeTime II, linked the ‘040 bus in a hierarchy to a 16 MHz 32-bit 68030 bus. This bus connected sound, ADB, the floppy disk, and the LC PDS slot which dictated the clock speed and ‘030 protocol. PrimeTime II was similar to a southbridge chip. It had I/O buffers, its own clock, and hardware support for multiplexing both 8-bit and 16-bit devices to the 32-bit bus.
A few other important details to note:
- SCSI, IDE, and serial port I/O were handled via controllers embedded in the F108 ASIC on the ‘040 bus.
- The IDE controller was 16-bit with an I/O buffer.
- There was no on board network controller. The comm slot was just a proprietary expansion slot.
Evaluating the x200 Motherboard
The one design choice I see that traded performance for lower cost was placing the memory controller and RAM on a 32-bit bus. Thanks to the L2 cache this enabled Apple to use more of their existing component designs for a small speed penalty. Given the higher clock speeds and lower pricing of these models it was a fair trade-off.
Apple could have reduced the cost of Capella by setting the 603 to its 32-bit external data bus mode. Instead they retained a 64-bit bus close to the processor to speed up the 256K cache and ROM. And they designed Capella so that the 603 could continue to execute instructions from the cache and ROM while the ‘040 bus was busy.
Mr. Barber’s article implies that multiple buses and speeds compromised these Macs. In fact most personal computers from that era to today have multiple buses for the same reason that the x200 did: it allows the motherboard to mix slower and faster chips without compromising the fast ones.
Conflicting I/O & Multitasking in the 90’s
Mr. Barber gave numerous examples of devices conflicting with other devices, such as typing vs. audio and networking vs. graphics, which he blamed on overlapping bits. As explained above the data bus bits were not split between devices, and devices could not simply talk over or block each other at the hardware level.
This does not mean that you would never experience problems like those he listed. In the classic Mac OS, code was cooperatively scheduled. Busy or errant code could retain control of the CPU and block other processes. Every classic Mac could experience paused and dropped typing, broken audio, screen redraw pauses, and general unresponsiveness while under a heavy load. Windows 9x suffered from the same issues. These types of problems were eliminated with modern OS architectures.
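Cooperative scheduling is easy to demonstrate with a minimal sketch. The example below uses Python generators as stand-ins for applications. It is not how the classic Mac OS was implemented, and the task names are invented. Each task runs until it voluntarily yields, so a task that does all of its work before yielding stalls everything else.

```python
def well_behaved(name, steps, log):
    """A task that hands control back after every unit of work."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # voluntarily return control to the scheduler


def hog(name, steps, log):
    """A task that finishes all of its work before yielding even once."""
    for i in range(steps):
        log.append(f"{name} step {i}")
    yield


def run(tasks):
    """Round-robin scheduler: each task runs until it chooses to yield."""
    queue = list(tasks)
    while queue:
        task = queue.pop(0)
        try:
            next(task)          # run the task up to its next yield
            queue.append(task)  # it yielded, so it gets another turn later
        except StopIteration:
            pass                # the task has finished


log = []
run([well_behaved("audio", 3, log), hog("render", 3, log)])
for line in log:
    print(line)
```

The log shows the "audio" task getting one step in, then waiting while "render" runs all three of its steps back to back. The scheduler cannot interrupt a task, which is why the same stalls appeared on every classic Mac regardless of motherboard.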
While using a 6300 at home I used various Power Macs at work. I do not recall the 6300 being better or worse than its peers when it came to these issues. And I do not recall Mr. Barber’s specific examples at all. This does not mean that they never occurred for anyone. But they would have been due to software issues and not the motherboard.
Slow Internet Handling
“One of the biggest complaints about the x200 series is slow Internet handling. For one thing, looking at the chart above, all data from either the ports or the ethernet controller must pass through the processor to get to memory, then be processed, sent to the IDE controller for cache saving, and then interpreted for graphics display.”
The developer note indicates that the SCSI and serial port controllers had direct memory access (DMA). It does not specify whether or not other components had DMA. I would guess most components required the CPU to manage I/O. This has nothing to do with multiple buses. A component either has DMA or the CPU must manage the I/O.
But even if the CPU had to relay traffic for an Ethernet card it would not have impacted Internet handling. A 10 Mbps card downloading at full speed would create bus traffic of 1.25 MB/s to the CPU and then to RAM. The ‘030, ‘040, and 603 buses could carry 64 MB/s, 150 MB/s, and 600 MB/s respectively. Relaying the data would require so little of the CPU’s time that the impact would not be human observable.
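The figures above follow from clock speed times bus width. The sketch below reproduces them using the clock speeds and widths given earlier for the 75 MHz boards. The function name is mine, and the results are theoretical peaks that ignore wait states and arbitration overhead.

```python
def peak_mb_per_s(clock_mhz, width_bits):
    """Theoretical peak: one transfer per clock, width_bits / 8 bytes per transfer."""
    return clock_mhz * width_bits / 8


# (clock in MHz, width in bits) as described for the 75 MHz boards.
buses = {
    "'030 bus": (16, 32),
    "'040 bus": (37.5, 32),
    "603 bus": (75, 64),
}

ethernet_mb_per_s = 10 / 8  # 10 Mbps expressed in MB/s

for name, (clock, width) in buses.items():
    peak = peak_mb_per_s(clock, width)
    share = ethernet_mb_per_s / peak * 100.0
    print(f"{name}: {peak:.0f} MB/s peak, Ethernet at full speed uses {share:.2f}%")
```

If the 603 bus ran at 40 MHz, as the addendum under the motherboard description suggests, the same formula gives 320 MB/s for that bus, which does not change the conclusion.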
If the 603 models were noticeably slow on the Internet then it would have been due to the 16K cache and legacy 68K code. As I recall the 6300CD handled the Internet about the same as Power Macs like the 6100 and 8100.
“Apple scrimped on the port controller. There is no hardware handshaking in the ports, therefore an external modem faster than 9600 baud is useless.”
An 8530 SCC serial port controller was embedded in the F108 ASIC. The developer note documents the handshake pins and specifically notes that the GPi pin for each port is connected to the controller and available for use. The GPi pin is what’s missing from earlier entry level Macs that limits their transfer speeds.
Adrian Winnard’s tests confirm that later models supported speeds greater than 9,600 baud. I remember removing the internal 28.8K modem from my 6300 and replacing it with an external 56K modem. I consistently connected at near maximum speed. Other users have emailed Low End Mac to say the same thing.
It’s possible that a bug in early hardware or software broke handshaking on the first models. It’s also possible this confusion stems from cabling. Whatever the cause behind some users not achieving maximum serial port speeds, Apple did not leave out the serial port controller or hardware handshaking on any x200 model. And they could not have saved any money by doing so.
VRAM & L2 Cache
“Neither video RAM nor L2 cache are upgradable, but in the case of this machine they would only serve to further slow it down.”
As previously noted the x200 Macs had a fast L2 cache (for the time). Increasing it would have only improved performance. But it sat on a DIMM card with the ROM which precluded any future upgrades.
The video buffer was actually composed of 60ns DRAM chips on a private 32-bit bus connected to the Valkyrie video controller. Valkyrie could not offload drawing from the CPU. But it had multiple I/O buffers so that the CPU could move on to other work instead of waiting for writes to complete. A larger video buffer would not have slowed the computer down except for the obvious implication that there would be more for the CPU to draw.
In light of the corrections above do these Macs warrant being called road apples and the worst Macs ever?
Not when it comes to models based on the 603e. The desktop models had the same form factor and expansion options that Low End Mac praised in the Quadra 630, which they called one of the 25 Most Important Macs and “…probably the most flexible consumer Mac ever made.”
Eighteen months after the 630 was introduced the 6300 raised the bar with 4x the integer performance, 20x the floating point performance, and a doubling of both hard disk and CD-ROM speeds. Performas based on the 603e were comparable in speed to professional Power Macs released just a year prior. They were excellent Macs for the consumer market.
I never used the models based on the 75 MHz 603 so I have no frame of reference to evaluate their real world performance relative to the few benchmarks still available online. Unlike the 603e the 603 benchmark scores may not accurately reflect their performance because benchmarks did not stress 68K emulation. Still, I can’t help but wonder if a relatively small issue grew into a legend with time. I am now very curious to restore one and see first hand the impact of the smaller cache.
Whatever their performance, the motherboard was not to blame. If Apple’s 75 MHz 603 based Performas deserve their terrible reputation, they deserve it for the lack of a mere 16 KB of on-chip memory.
Addendum: 6300CD vs 6100/66 DOS Using Photoshop 4
After writing the initial blog post I found my old 6300CD and purchased a 6100/66 DOS. I always intended to write a follow-up post speed testing the two across a range of PowerPC and 68K applications. I never got around to doing it while I had both machines. I still have the 6300CD. So at some point in the future if I get another NuBus Power Macintosh with a 601 processor I may follow up with that post.
For now I’m adding a set of Photoshop 4 times to this post. These are the only tests which I carefully performed while I had both machines (i.e., checking all settings for speed impact, performing multiple runs, etc.). It should be noted that the Performa has been upgraded with a much larger and newer IDE drive which would likely improve operations involving disk I/O. Nevertheless the tests showed what I expected. Going by clock speed alone, the 100 MHz 603e should be roughly 50% faster than the 66 MHz 601. Instead it’s roughly 20-30% faster, and sometimes comes close to 50% faster. Occasionally it falls behind. I’m not sure why the resize test ran faster on the 6100/66, but the result was consistent. The motherboard imposes a small speed penalty, but nothing remotely close to the 4x speed penalty suggested by Mr. Barber. All times are in seconds.
| Task | Power Mac 6100/66 (s) | Performa 6300CD (s) | 6300CD Speed Relative to 6100/66 |
|---|---|---|---|
| Unsharp Mask (50%/1px/0 Th) | 10 | 8 | 125% |
| Gaussian Blur 2px | 12 | 9 | 133% |
| Resize to 480x300 | 9 | 12 | 75% |
| Save As TIFF (Mac LZW) | 27 | 22 | 123% |
- Power Mac 6100/66 DOS w/256K L2 cache card, 264 MB RAM, and 350 MB SCSI HDD.
- Performa 6300CD w/256K L2 cache, 64 MB RAM, and a 40 GB IDE HDD.
- Both running System 7.5.5 with Modern Memory Manager on, Virtual Memory off, and 512K of disk cache.
- Photoshop 4 had 40 MB of RAM assigned to it.
- Test file was the 1920×1200 El Capitan Desktop JPEG. All tests were started from the original image.
- No other applications were running and each machine had a minimal set of extensions loaded.
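For anyone who wants to check the last column of the table, it is the 6100/66 time divided by the 6300CD time, so values above 100% mean the 6300CD finished sooner. The sketch below recomputes it from the times in the table and compares it with the ratio one would expect from clock speed alone. The variable names are mine.

```python
# Times in seconds, copied from the table: (Power Mac 6100/66, Performa 6300CD).
times = {
    "Unsharp Mask (50%/1px/0 Th)": (10, 8),
    "Gaussian Blur 2px": (12, 9),
    "Resize to 480x300": (9, 12),
    "Save As TIFF (Mac LZW)": (27, 22),
}

# Relative speed of the 6300CD: a shorter time means a higher percentage.
relative_speed = {
    task: round(t_6100 / t_6300 * 100)
    for task, (t_6100, t_6300) in times.items()
}

# What clock speed alone would predict (100 MHz vs 66 MHz).
clock_only = 100 / 66 * 100

for task, pct in relative_speed.items():
    print(f"{task}: {pct}% (clock speed alone would suggest about {clock_only:.0f}%)")
```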
While I had both machines I performed additional informal comparisons. I always found the 6300CD to be a little faster. Blocking tasks would block on either one. Background tasks which were well behaved and did not block on the 6100 also did not block on the 6300CD. I tried my hardest to disrupt audio by typing on the 6300CD (one of Mr. Barber’s specific examples to prove motherboard component interference), but I never could. Network downloads always ran faster on the 6300CD, even with System 7.5 and MacTCP. It should be noted that my 6300CD has an LC PDS Ethernet card, and that this may (or may not) be faster than the comm slot cards available at the time. Regardless, it shows that networking was not bottlenecked by the motherboard design.