PowerVR PCX1 review

From arcades to PC

PowerVR's first consumer incarnation, the Midas 3 produced only for Compaq's Presario line, was a very limited affair. In the second half of 1996 Videologic/NEC delivered the first integrated chip for the whole market. It was called PCX1 and put together a single ISP, TSP and bus bridge, and coupled them with a lot more texture memory. It claimed sustained performance of ~250,000 100-pixel triangles per second at 640x480 resolution, and 30 frames per second in true color with every pixel mip-mapped, textured, fogged, lit and shaded. And with the improved PowerVR architecture, those numbers should be closer to real-world performance than competitors' claims.

If only it were so simple

The card

PCX boards are the most minimalistic. The memory of the PCI interface is gone, and not missed. The clock is taken directly from the crystal, the texturing SDRAM is connected through a narrow 32-bit memory bus, and there are only a few SMDs around the 0.5-micron chip hiding all that smart technology. No output is needed, since images are sent to the primary card over the PCI bus. While this means lower cost, easier installation, and none of the signal-quality concerns of Voodoo pass-through dongles, it also makes the PCX potentially sensitive to PCI bandwidth and to the primary card. I have done my own investigation into whether PCI bandwidth can impact performance in a measurable way, and the answer is negative. Also, the PCX does not work as an optional renderer; by default its hardware abstraction layer is attached to the primary card, making the two virtually a single device. Since the primary card serves as the framebuffer for the PCX, the available 3D modes depend on the amount of video memory on the primary card. With, for example, 2 MB of primary video memory the PCX can still reach its maximum 1024x768 resolution, but only single-buffered, which may cause serious tearing.

The first PCX ticks at 60 MHz. On top is Videologic's Apocalypse 3D. It is no accident that their boards are ready for SGRAM: PCX1 supports one-megabyte SGRAM modules, but all the cards I've seen featured 4 MB of SDRAM.

Below is a not so well known I-O Data board.

Architecture

The chip's design has three main highlights: Infinite Planes, "tiled deferred" rendering and a full 32-bit floating point on-chip Z-buffer. What lies behind the planes? PCX can work with more than triangles and quadrilaterals: geometry can also be defined merely by two vectors, creating a plane which can, for example, efficiently represent a large surface such as ground or sky. What is more, by combining several planes, three-dimensional convex models can be "carved out" and dynamically changed. Important procedures such as collision detection work on planes as well. Thanks to volumes defined by infinite planes, shadows and lights can be cast from any object onto any surface. If the geometry of the object casting a volume would create too many planes, a simplified hidden object can be used instead of the visible model to project a more cheaply computed volume. SGL also supports static volumes made of planes: anything within such a volume is precalculated as lit/shadowed. Volumes do not need to be projected ad infinitum; planes in the scene can be defined solely to limit volumes.
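The "carved out" convex volumes can be pictured as intersections of half-spaces. A minimal sketch of the idea, with my own function names and sign convention (SGL's actual API is not shown here): a point is inside the volume when it lies on the negative side of every bounding plane.

```python
# Hypothetical illustration of convex volumes built from planes.
# Each plane is (nx, ny, nz, d); "inside" means dot(n, p) + d <= 0
# for all planes. Names and conventions are assumptions, not SGL's.

def inside(point, planes):
    """True if the point is within the convex intersection of half-spaces."""
    px, py, pz = point
    return all(nx * px + ny * py + nz * pz + d <= 0
               for nx, ny, nz, d in planes)

# A unit cube carved out of six axis-aligned planes:
cube = [( 1, 0, 0, -1), (-1, 0, 0, 0),   # x <= 1, x >= 0
        ( 0, 1, 0, -1), ( 0,-1, 0, 0),   # y <= 1, y >= 0
        ( 0, 0, 1, -1), ( 0, 0,-1, 0)]   # z <= 1, z >= 0

inside((0.5, 0.5, 0.5), cube)   # center of the cube -> True
inside((1.5, 0.5, 0.5), cube)   # beyond the +x face -> False
```

The same test per pixel, with the camera ray's depth along each plane, is what makes shadow and light volumes cheap on this architecture.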
Tile rendering divides the screen into small tiles; the chip draws one tile completely, dumps it to the primary card and then moves on to the next. Tiles are small enough to fit into the fast cache of the ISP (12 kB for the first generation), where hidden surfaces can be easily removed. PowerVR first gathers all the geometry for a given frame, discards hidden surfaces and then for each pixel finds the closest surface, from which the color will be determined. Only visible surfaces are textured, shaded and lit. If a surface is translucent, its color values are stored in an accumulation buffer, which can store up to 16 values for the operations determining the final pixel color. This technique, which traces the depth of the scene per pixel, goes hand in hand with the infinite-planes special effects, since calculating the volumes for those effects needs such tracing as well. The Z-buffer with its hassles is completely eliminated, leaving more memory capacity and bandwidth for texturing. Insensitive to scene complexity, PCX chips can achieve impressive performance with an anemic 32-bit memory bus.
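The process above can be sketched in a toy model. Everything here is my own simplification for illustration (the real ISP/TSP pipeline is fixed-function hardware): the whole frame's display list is collected first, each tile is resolved independently, and only the nearest covering surface per pixel is ever shaded.

```python
# Toy model of tile-based deferred rendering. A "surface" is a tuple
# (covers, depth, color) of coverage test, depth function and material.
# Hidden surfaces are never shaded, mimicking deferred texturing.

def render_tiled(width, height, surfaces, tile=4):
    frame = {}
    for ty in range(0, height, tile):          # one small tile at a time,
        for tx in range(0, width, tile):       # small enough to stay on-chip
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    best = None
                    for covers, depth, color in surfaces:
                        if covers(x, y):
                            z = depth(x, y)
                            if best is None or z < best[0]:
                                best = (z, color)
                    if best:                   # shade only the winner
                        frame[(x, y)] = best[1]
    return frame

# Two overlapping full-screen surfaces: the nearer one (z=1) hides the other,
# so the "blue" surface is never textured at all.
near = (lambda x, y: True, lambda x, y: 1.0, "red")
far  = (lambda x, y: True, lambda x, y: 2.0, "blue")
render_tiled(8, 8, [far, near])
```

With per-pixel depth resolved inside the tile, no external Z-buffer reads or writes are needed, which is where the bandwidth saving comes from.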

This image comes from Videologic's promotional material, but it is not far from reality.

On the negative side, gathering all the data for the display list requires some extra PowerVR video memory. Also, data for triangles overlapping multiple tiles has to be sent to the chip multiple times, but the additional traffic is much lower than what deferred tiling saves. PCX1 features per-pixel mip mapping, which adaptively reduces texture sampling with distance, effectively suppressing texture shimmering. Perspective correction of textures looks flawless. But here the list of nice features ends. Missing are specular highlights and texture interpolation. Fog does not work under Direct3D. Only basic alpha blending is supported, and under Direct3D even color-key transparency can fail. Needless to say, PCX1 suffers from transparency artifacts in quite a few Direct3D games. PowerVR was not the only company developing a deferred renderer, but nobody else managed to bring one to the consumer market.

Testing

Infinite-planes effects were not part of open APIs, and the full capabilities of PowerVR can be exploited only through the proprietary SGL. The API also supported data instancing to save memory by avoiding duplicates of materials, transformations and objects. I got three miniGL libraries: version 1.0.0.1 from the Apocalypse 3Dx CD for Quake 1 games, version 1.0.1.9 which came with Quake 2, and finally 1.0.6.0 which came with Sin and seems fastest. Those OpenGL games usually have textures of high enough resolution not to miss bilinear filtering, but lightmaps are too blocky. Still, the results are impressive considering the difficulty of making the PCX architecture behave like an OpenGL accelerator. The latest official drivers and miniGL 1.0.6.0 were used for both PCX chips. I tried several primary cards to check their impact on PCX speed. My test system does not have an ISA card, which would be fun to choke the PCX with. But there is little to no difference between PCI and AGP primary cards. A Riva 128 AGP was faster than a Millennium PCI, though usually within the margin of error. A Trident 9750 EDO was very close. Then I tried an A50 SiS 6326 AGP card and got 10% slower GLQuake. The primary card has to do the final framebuffer output, so some performance difference should be expected. But I was really surprised to see Quake scores drop by 20% with the Trident Blade3D and Verite 2. I cannot derive any rule for which primary cards do well with the PCX. In the end I chose a Virge GX2; it may not be the fastest companion for the PCX1, but it reliably provides high results across all games and resolutions. Newer 3D cards often refuse to give up their priority and reject Direct3D input from the PCX. A last note about installation: my PCX boards are spoiled babies demanding specific resources, and I often have to try them and their primary cards in all PCI slots until they kindly decide to operate.


Lighting in Quake cannot be rendered properly on the PCX
View PCX screenshot gallery

Options

AFAIK there is no software way to overclock PCX cards, but there is an easy crystal mod. The base clock comes directly from the crystal, so its frequency equals the core/memory clock. I am not touching my cards, since they are nowadays hard to find. In the PCX datasheet I found separate clock inputs for core and memory, so perhaps asynchronous clocks are possible with an extra crystal. The control panel offers a 24-bit mode, and PCX1 not only supports this framebuffer output but does the mixing of texture, lighting, and shading in 24 bits internally. This also means better-than-usual 16-bit output. Such color precision in 1996 is exceptional. But true color was considered an unreasonable framerate killer until the end of the decade; how does it impact performance?

Negligibly. Having most of the core working at high precision, plus the memory-bandwidth-saving tile rendering, made this possible. On the other hand, the quality difference between good 16-bit and 24-bit output in such old games is quite small. Also, support for texture formats is far from optimal.

Gaming with PCX1

SGL games are usually a blast: Tomb Raider is playable at 1024x768, and Ultimate Race with its day/night transitions shows off the lighting/shading effects of Infinite Planes.


Actua Ice Hockey 2 features an SGL renderer up to 1024x768 (full size on click).
It is an example of SGL being too heavy for PCX1 to achieve a pleasant framerate at such a high resolution.

Turok's custom renderer, however, displays some limitations, namely no fog and problematic transparencies, just like the Direct3D version, which is a bit slower. Unreal with the last patch has SGL broken; stick to older versions.

DirectX compatibility, after long fiddling, appears to be acceptable. Several titles failed with the optimized renderer. Switching to standard helps some incompatibilities but sacrifices performance. Fifa RTWC requires standard, and so does NHL 99, but it is far from playable. Objects in SOTE are rendered properly only with the standard option, which runs one third slower. Even games working with the optimized setting sometimes ran too slow for a chip with such high performance figures. The unusual rendering technique often needed extra driver effort for games written with only immediate renderers in mind, and by far not everything runs properly. There are Z-artifacts in CART Precision Racing: polygons under the track pop up, and transparency and sky are broken. Expendable needs forced 16-bit textures for proper colors, and blending is a mess. Some textures in Incoming are missing. Results in Wing Commander Prophecy shouldn't be taken seriously due to the inability to texture space. MDK in Direct3D is unusable. Shadows in Myth II are broken and various other artifacts appear. Ultimate Race Pro (Direct3D) suffers from many minor artifacts. The fence in Viper Racing warps frenetically. Texture formats without alpha have to be set in Forsaken; to avoid transparency issues, the special PowerVR path has to be selected, and even then performance is nothing to write home about.

5650 textures are not available; with alpha formats, green can only have five bits. But some games have problems with 5550 too, and then four-bits-per-channel textures are needed, causing heavy banding. Vertex fog under Direct3D is impossible. Overall, PCX1 is a bit disappointing in DirectX. The trend toward open APIs fell hard on this chip. It must have hurt PowerVR even more since Microsoft's deferred-rendering API was never released despite being developed first.
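The banding problem with the texture formats can be demonstrated with a little bit-packing. The packing code below is my own illustration (only the bit layouts come from the text): with 16 steps per channel in a 4444 format, color gradients that a 565 format keeps distinct collapse into the same value.

```python
# Illustrative 16-bit texel packing. Channel inputs are 0-255.

def pack4444(r, g, b, a):
    """ARGB4444: alpha, but only 16 steps per color channel."""
    return ((a >> 4) << 12) | ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)

def pack565(r, g, b):
    """RGB565: no alpha, but 32/64/32 steps per channel."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

# Two greens only 4 apart: the 6-bit green of 565 keeps them distinct,
# the 4-bit green of 4444 merges them into one band.
pack565(0, 200, 0) != pack565(0, 204, 0)              # distinct in 565
pack4444(0, 200, 0, 255) == pack4444(0, 204, 0, 255)  # merged in 4444
```

A 5-bit green channel (as in 1555/5550 layouts) would also merge this pair, which is why alpha textures on PCX1 band noticeably more than opaque ones.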


Without texture filtering one can sometimes doubt whether the accelerator is being used at all

Conclusion

PowerVR was confident in its PC product and created an "Ultimate Certification" for games of the highest quality, not only in technical capabilities but also in gameplay and market appeal. Games in this category had to achieve a minimum frame rate of 30 fps at 800x600 resolution, render with a minimum of 16.7 million colors, and take advantage of all the advanced features of PowerVR. Certified titles included Virtua On by Sega Entertainment, Rave Racer, Tekken 2 and Air Combat 22 by Namco, and Ultim@te Race by Kalisto Entertainment. Namco, however, cancelled PC development, hurting PowerVR at the start. High dependency on CPU speed did not help either. When a retro gamer looks at PowerVR hardware now, there are no killer games dedicated to it; even Ultim@te Race was later released in an improved version with Direct3D support. Most interesting are a few Sega games which looked better than on the console, and that's about it.
PCX1 was a strong and cost-effective contender, but being the only tiling architecture on the market, it did not receive much consideration from game developers. Performance is not completely convincing, though still plenty for the price. Direct3D compatibility, however, leaves something to be desired. Infinite-planes technologies seemed out of place while bilinear filtering became a must. The lack of games fully exploiting the chip made it look inferior, and as time went by things were only getting worse. Videologic/NEC, however, quickly released the much improved PCX2.

continue to PCX2 review