If Voodoo Graphics had any weakness, it was its 3D-only implementation. Since 3dfx did not yet have enough resources to develop 2D internally, they had to team up with other companies. The first partnership was formed with Orchid's Micronix as early as spring 1996, but their 2D chip was far from complete and 3dfx searched for other options. The world was just realizing how strong Voodoo was when 3dfx and Alliance together announced a complete graphics solution: the AT3D single-chip accelerator was going to be combined with the Voodoo Rush chipset on a single card. At the beginning of 1997 the first board, by Hercules, came out, but the response was far from positive. What happened? A bad sign was the rumor that Direct3D would be handled by the AT3D while the Voodoo chipset would be used only for Glide games. That did not inspire much faith in Rush performance, and it horribly underestimated the suckiness of the AT3D. Of course, all 3D ended up on the 3dfx chipset, but performance unexpectedly lagged behind Voodoo Graphics. The Micronix chip did not help the Rush either; in fact it was worse than the Alliance.
Note how the FJR chip is now smaller than the pixel engine of Voodoo Graphics and has lost 32 pins as well. The TMU has the usual 2 MB of memory and the frame buffer is 4 MB, but it is shared between the 2D and 3D engines. The 3dfx parts run at 50 MHz, so this card is well suited for comparison with the classic Voodoo Graphics. The AT25 is clocked at 72 MHz and the memory cannot keep up with it, creating artifacts on the desktop (!). Setting the frequency one or two MHz lower solves it. 3D resolutions of the Rush are limited by the frame buffer size and the refresh capabilities of the 2D chip. Alliance chips have a 175 MHz RAMDAC, Micronix only 160 MHz.
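To get a feel for how a shared 4 MB frame buffer limits 3D resolutions, here is a rough back-of-the-envelope sketch. It assumes 16-bit color and 16-bit depth with a front, back and depth buffer, and it ignores alignment, tiling overhead and any driver reservations; the function names and the optional `desktop_bytes` parameter are made up for illustration, not taken from any 3dfx documentation.

```python
# Rough frame buffer budget for a 4 MB board shared by the 2D and 3D engines.
# Assumptions (not from the original text): 2 bytes per pixel, three
# full-screen surfaces (front color, back color, depth), no padding.

FRAME_BUFFER_BYTES = 4 * 1024 * 1024  # 4 MB, as on the tested card


def buffer_bytes(width, height, bytes_per_pixel=2, buffers=3):
    """Bytes needed for `buffers` full-screen surfaces at the given mode."""
    return width * height * bytes_per_pixel * buffers


def fits(width, height, desktop_bytes=0):
    """True if the 3D buffers plus a hypothetical 2D desktop reservation fit."""
    return buffer_bytes(width, height) + desktop_bytes <= FRAME_BUFFER_BYTES


if __name__ == "__main__":
    for w, h in [(512, 384), (640, 480), (800, 600), (1024, 768)]:
        need = buffer_bytes(w, h)
        print(f"{w}x{h}: {need / 2**20:.2f} MB needed, fits: {fits(w, h)}")
```

Under these assumptions 800x600 still fits with room to spare, while 1024x768 with a depth buffer does not. Whatever the 2D side keeps resident (passed here as `desktop_bytes`) eats directly into that headroom.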
The Rush was a reworked Voodoo chipset developed in the year after the original and therefore codenamed SST-96. What was the time spent on? Not on full triangle setup. Just like its older sibling, the chip can start working on triangles with subpixel correction. A triangle takes 16 clocks with correction compared to 7 without it, but since this is a separate unit from triangle rasterization it practically does not hold the pipeline back. Besides float-to-int parameter conversions, that is all this "setup" unit does. The basic specification was identical to the original Voodoo Graphics, except that early boards were clocked at 45 MHz, 10% less. That alone does not explain much, since the Rush performance deficiency is quite noticeable. What else is going on? First, some rumors and speculations:
1) The bus fight: the 2D chip and the Rush compete with each other for PCI.
2) The chip fight: various rumors of incompatibilities were floating around; I trust none of them. But there is certainly some (negligible) CPU overhead for managing the memory regions of the 2D and 3D engines.
3) 3dfx actually changed the pixel pipeline, and for a weaker one. The chip is not marked FBI as in Voodoo Graphics but FJR (which should stand for FBI Junior?). The package size is reduced to that of the TREX TMU, but the actual die size cannot be determined from a wirebond package.
4) Performance hit from new additions. Support for 3D rendering in a window requires more clipping and separate addressing for different buffers. The Rush supports configurable chroma range keys and stereo 3D rendering and has its own 2D blitter and ROP functions; it is nearly a complete video device.
Now let's get real.
1) Frame buffer fight: the Rush features a Pseudo Unified Memory Architecture (PUMA), which connects the 2D and 3D devices through the pins of the frame buffer memory; the 3D control interface defines the signals which coordinate 2D/3D operation. The 2D device is the requester and the 3D device is the grantor. Tiling a shared buffer has some overhead, and I can imagine additional bandwidth and capacity constraints.
2) Memory organization is a big deal
The performance loss is a result of the different packing and tiling organization of pixels within the frame buffer. SST-96 packs pixel quads of color or depth onto separately mapped pages, while SST-1 packs pairs of color and depth within a memory word. Additionally, SST-96 tiles pixels of color/depth onto linear strides of memory, while SST-1 tiles them onto rectangular strides. These differences result in lower access efficiency. As polygons get smaller and the pixels/triangle ratio drops, the Rush should suffer from more page misses between color and depth accesses. My test, however, shows the biggest weakness in a scenario averaging 50 pixels per triangle; perhaps with a high-speed host we are hitting another bottleneck.
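The general effect of linear versus rectangular tiling can be illustrated with a toy model. This is only a sketch: the page size, tile shape and screen width below are assumptions picked for round numbers, not the real SST-96 or SST-1 parameters, and the helper names are hypothetical. It counts how many memory pages a small block of pixels touches under each layout.

```python
# Toy comparison of linear vs. rectangular (tiled) pixel addressing.
# All constants are assumed values chosen for illustration only.

SCREEN_WIDTH = 640        # pixels per scanline
PAGE_PIXELS = 1024        # assumed pixels held by one memory page
TILE_W, TILE_H = 32, 32   # assumed tile shape; one tile fills one page


def linear_page(x, y):
    """Page index when pixels are laid out scanline after scanline."""
    return (y * SCREEN_WIDTH + x) // PAGE_PIXELS


def tiled_page(x, y):
    """Page index when each rectangular tile maps to its own page."""
    tiles_per_row = SCREEN_WIDTH // TILE_W
    return (y // TILE_H) * tiles_per_row + (x // TILE_W)


def pages_touched(page_of, x0, y0, w, h):
    """Number of distinct pages covered by a w x h block of pixels."""
    return len({page_of(x, y)
                for y in range(y0, y0 + h)
                for x in range(x0, x0 + w)})


def page_switches(page_of, x0, y0, w, h):
    """Page changes while walking the block in scanline order."""
    switches, current = 0, None
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            page = page_of(x, y)
            if current is not None and page != current:
                switches += 1
            current = page
    return switches


if __name__ == "__main__":
    block = (0, 0, 8, 8)  # roughly the footprint of a small triangle
    print("linear:", pages_touched(linear_page, *block), "pages")
    print("tiled: ", pages_touched(tiled_page, *block), "pages")
```

In this model an 8x8 block stays within a single page when tiled rectangularly but spreads over several pages when addressed linearly, because consecutive scanlines sit a full screen width apart in memory. Keeping color and depth on separately mapped pages would add further switches on top of that.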
I was wondering how ATI's 1997 flagship, the Rage Pro, compares with the Voodoos when running final drivers. A card with the same amount of memory and the same bus should make a good comparison.
Blows are traded, but after averaging, this Rage Pro is in relative numbers only a hair slower than Voodoo Graphics. The Voodoo Rush trails behind by almost 20%. Differences in minimum fps mirror those in average frame rate.
Image quality is identical to Voodoo Graphics, so the gallery is really the same for both. Again, please be aware that those screenshots may not represent real quality, as the Rush also dithers 24-bit color to its native 16-bit RGB buffer using a 4x4 or 2x2 ordered dither matrix, and I do not know how to capture the actual image after RAMDAC filtering. The Rush supports resolutions below 512x384 and can play Direct3D games which the first Voodoo could render only via Glide. So there are more results this time, though some are still missing. Myth 2 did not work, Turok and Ultimate Race Pro glitched too much under the new Direct3D driver and, strangely, in Tomb Raider the Rush does not perform bilinear filtering. With so few exceptions the Voodoo Rush now exhibits excellent compatibility, which is a very different story from the earlier drivers. It took at least a year to get rid of basic bugs. No wonder Rush users were disappointed.
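For readers unfamiliar with ordered dithering, the idea can be sketched as follows. This is a minimal illustration using the textbook Bayer matrices and an RGB565 target; the matrices and rounding that the 3dfx hardware actually uses may differ, and the function names here are invented for the example.

```python
# Ordered dithering of 8-bit channels down to a 16-bit RGB565 pixel.
# The Bayer matrices below are the standard textbook ones and are an
# assumption; they are not taken from 3dfx documentation.

BAYER_4X4 = (
    (0, 8, 2, 10),
    (12, 4, 14, 6),
    (3, 11, 1, 9),
    (15, 7, 13, 5),
)
BAYER_2X2 = (
    (0, 2),
    (3, 1),
)


def dither_channel(value, bits, x, y, matrix=BAYER_4X4):
    """Quantize an 8-bit value to `bits` bits using a position-based threshold."""
    n = len(matrix)
    cells = n * n
    levels = (1 << bits) - 1
    m = matrix[y % n][x % n]
    # floor(value * levels / 255 + (2m + 1) / (2 * cells)) in integer math
    q = (value * levels * 2 * cells + (2 * m + 1) * 255) // (255 * 2 * cells)
    return min(q, levels)


def dither_rgb565(r, g, b, x, y, matrix=BAYER_4X4):
    """Pack a dithered 24-bit color into a 16-bit RGB565 word."""
    r5 = dither_channel(r, 5, x, y, matrix)
    g6 = dither_channel(g, 6, x, y, matrix)
    b5 = dither_channel(b, 5, x, y, matrix)
    return (r5 << 11) | (g6 << 5) | b5


if __name__ == "__main__":
    # A flat mid-gray area turns into a repeating pattern of two nearby levels.
    for y in range(4):
        print([dither_channel(128, 5, x, y) for x in range(4)])
```

A flat 8-bit value that falls between two 5-bit levels becomes a fixed pattern mixing both levels, which averages out close to the original intensity. This is why a raw frame buffer dump shows a visible dither pattern that any filtering on the output side would soften.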
Throughout 1997 the competition was catching up with Voodoo Graphics. 3dfx of course wanted to stay the performance leader, securing the big margins of the high-end market. Looking back, it seems they became victims of their own success. 3dfx laid out an ambitious 2D/3D core codenamed Rampage, but it could not be finished in time before losing the crown. 3dfx went on to tweak the existing Voodoo architecture and created Voodoo2 as a stopgap solution for 1998. Voodoo2 was very impressive upon its introduction, but looking back the board was rather a desperate measure. This 3D-only multitexturing chipset consisted of three chips, each still with its own memory interface. The competition caught up with such a complicated solution within several months. 3dfx then decided to use the completed 2D part of Rampage and finally created an all-in-one single-chip graphics solution. In the second half of the year the company reached out for the bigger value market with the cost-effective Banshee. Delays on Rampage, however, caused uncertainty and 3dfx decided to go with the old Voodoo architecture into 1999 as well. Voodoo3 was born, and while it was a good and effective product in its own right, the leadership of 3dfx was gone. Developers were getting seriously disappointed with 3dfx for not delivering the promised feature-rich architecture and for pressing Voodoo limitations onto them, mainly small textures and 16-bit output. While work on Rampage continued, a new parallel project was launched to update the Voodoo architecture to new standards. The company, quickly losing revenue, welcomed the year 2000 with workforce cuts. Yet in March 2000 3dfx bought Gigapixel, developer of a tile-based rendering architecture targeting, among others, the upcoming Xbox. However, the Xbox deal went to (surprise) arch rival Nvidia. The last iteration of Voodoo, the Napalm chip, was delayed. With 14 million transistors it was a tough nut to crack for 0.25 micron manufacturing.
Execution of new products struggled: the monstrous four-chip Voodoo5 6k was never released and, more importantly, the single-chip Voodoo4 was delayed too much and was smashed by GeForce 2 MX, Kyros and cheap Radeons. In November 3dfx decided to quit the card manufacturing business. At the end of 2000 3dfx, despite finishing Rampage, was forced to admit bitter defeat. All assets were sold to Nvidia, and after much enticement needed to overcome the long and heated rivalry, most 3dfx engineers went there as well. Many fans kept cheering for the company long after its demise, yet 3dfx was obviously not perfect. They lost the Dreamcast contract to PowerVR, tried some patent trolling with multitexturing, and alienated board manufacturers by making their Voodoo3 exclusive. 3dfx took a lot of undeserved punishment for the lack of 32-bit color or AGP features. Now everyone knows Voodoos often worked internally with higher color precision than others and that 16-bit output was used only to save memory bandwidth. And AGP had little to no use for high-performance solutions. The inability to market their advantages cost 3dfx dearly. But there is no excuse for horrible execution. The big lesson is not to miss product cycles just because your new architecture is not as great as you hoped it would be. Sticking with the old one is not going to be better.