Recently I have been trying to create a bootable HelenOS CD for the Serengeti machine, which contains the UltraSPARC III processor. I decided to use SILO as the bootloader. The Sunfire machine with the UltraSPARC II processor was able to boot from the CD without any problems. On Serengeti, however, the boot process ended with an error. SILO did not even print its banner.


SmartFirmware, Copyright (C) 1996-2001. All rights reserved.
Boot path: /ssm@0,0/pci@19,700000/scsi@2/disk@6,0:f Boot args:
ERROR: Illegal instruction
debugger entered.
ok

I then tried booting different operating systems on the Serengeti machine. With Aurora Linux, I got exactly the same error. OpenBSD and NetBSD printed their banners but failed with an error later. OpenSolaris (marTux distribution) booted without any problems.

As Linux and HelenOS (both using SILO) were unable to boot due to the same error, I suspected two things:

  • the SILO bootloader, and
  • the way the bootable CD is made.

After checking the mkisofs manual pages, I was almost convinced that the bootable CD was being made correctly. Therefore, SILO seemed to be the most probable cause. The problem was that the SILO website had been down since January; it seemed that the SILO community had not paid for the sparc-boot.org domain. I found an archived version of the website at http://web.archive.org, but it offered no solution to my problem. I also wrote to the sparc-linux mailing list but, to my disappointment, got no answer.

My colleagues advised me to explore the binary contents of the CD, find the SILO code inside it, and then find out whether the processor really executes that code or some rubbish instead. I learned how to use the hexdump and dd utilities (see http://helenos.pavel-rimsky.cz/doku.php?id=handy_shell_commands for my notes) and found the location of SILO’s isofs.b file inside the CD image. Finding out whether the CPU was executing the right instructions was an easy task, as I tested the CD on the Simics simulator rather than on real hardware (see http://helenos.pavel-rimsky.cz/doku.php?id=handy_simics_commands for some handy Simics commands).
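The search for isofs.b inside the CD image can also be sketched in Python instead of hexdump and dd. This is only an illustration, not the procedure I actually used: the function name and the file names in the usage comment are made up, and it assumes the file is stored in the image contiguously and unmodified.

```python
def find_embedded(image: bytes, blob: bytes, probe_len: int = 64) -> int:
    """Return the byte offset of blob inside image, or -1 if it is not found."""
    # Skip the leading zero padding of the blob: a run of zeros would
    # match almost anywhere in a CD image.
    pad = len(blob) - len(blob.lstrip(b"\x00"))
    probe = blob[pad:pad + probe_len]
    if not probe:
        return -1  # blob is empty or consists only of zeros

    hit = image.find(probe)
    while hit != -1:
        offset = hit - pad
        # Accept the candidate only if the whole blob matches at that offset.
        if offset >= 0 and image[offset:offset + len(blob)] == blob:
            return offset
        hit = image.find(probe, hit + 1)
    return -1


# Hypothetical usage (file names are assumptions):
#   from pathlib import Path
#   image = Path("image.iso").read_bytes()
#   blob = Path("isofs.b").read_bytes()
#   print(hex(find_embedded(image, blob)))
```

The returned offset is the position from which the bytes can then be inspected, for example by comparing them with what the simulator reports as being executed.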

I was a little confused. The 2nd to 8th instructions of the isofs.b file were illegal trap (ILLTRAP) instructions. These were the instructions that made the Serengeti machine fail! Firstly, how come ILLTRAP instructions are in the isofs.b file at all? Secondly, how come it works without any problems on the US-II machine, even though these instructions are illegal on US-II as well?
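To show what "being an ILLTRAP instruction" means at the bit level, here is a small Python sketch. In the SPARC V9 encoding, ILLTRAP is a word whose op field (bits 31:30) and op2 field (bits 24:22) are both zero. The function names are mine, and the sample words in the usage comment are made up for illustration; they are not the actual contents of isofs.b.

```python
import struct


def is_illtrap(word: int) -> bool:
    """True if a 32-bit SPARC V9 instruction word decodes as ILLTRAP."""
    op = (word >> 30) & 0x3   # bits 31:30
    op2 = (word >> 22) & 0x7  # bits 24:22
    return op == 0 and op2 == 0


def classify_words(code: bytes, count: int = 8):
    """Decode the first `count` big-endian words and flag the ILLTRAP ones.

    `code` must contain at least 4 * count bytes.
    """
    words = struct.unpack(f">{count}I", code[:4 * count])
    return [is_illtrap(w) for w in words]


# Hypothetical usage with made-up words (a NOP, a small constant, a SAVE):
#   sample = struct.pack(">3I", 0x01000000, 0x00000020, 0x9DE3BF40)
#   classify_words(sample, 3)
```

Any small integer stored as a 32-bit word has its upper ten bits clear, so it falls into this encoding. Data that is not code at all can therefore easily look like a run of ILLTRAP instructions.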

I wrote to the HelenOS mailing list to discuss it with more experienced hackers. More experiments showed that the Sunfire (US-II) machine does not execute the first eight instructions of the isofs.b file; its firmware jumps to the ninth instruction. The Serengeti (US-III) machine’s firmware, on the other hand, jumps to the first instruction in the isofs.b file and thus ends with an Illegal Instruction trap. What the hell do those eight instructions mean?

Jakub Jermář suggested (and verified) that those eight “instructions” are in fact the ELF header of the SPARC executable. While the Sunfire (US-II) firmware understands this format and loads the isofs.b file into memory without this eight-word header, the Serengeti firmware loads the whole contents of the isofs.b file into memory and jumps to its first word. The first eight words are not instructions but a header, so the Illegal Instruction trap occurs.

The solution is pretty easy: just remove the first eight words from the isofs.b file. (To be more precise, those eight words follow the initial 512-byte sequence of zeros in isofs.b; the firmware ignores that sequence of zeros, however.)
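The removal itself can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the bash script mentioned below. The constants follow the numbers given above (512 bytes of zeros, then eight 4-byte words); the function name and the file names in the usage comment are assumptions.

```python
ZERO_PREFIX = 512  # leading bytes of zeros in isofs.b, as described above
HEADER_WORDS = 8   # words to remove
WORD_SIZE = 4      # bytes per SPARC instruction word


def strip_header(data: bytes) -> bytes:
    """Return data with the eight words after the zero prefix removed."""
    header_end = ZERO_PREFIX + HEADER_WORDS * WORD_SIZE
    if len(data) < header_end:
        raise ValueError("file is too short to contain the header")
    if any(data[:ZERO_PREFIX]):
        raise ValueError("expected 512 leading zero bytes")
    # Keep the zero prefix, drop the header, keep everything after it.
    return data[:ZERO_PREFIX] + data[header_end:]


# Hypothetical usage (file names are assumptions):
#   from pathlib import Path
#   patched = strip_header(Path("isofs.b").read_bytes())
#   Path("isofs.b.patched").write_bytes(patched)
```

The two checks are there so that the sketch refuses to touch a file that does not look like the layout described here, rather than silently cutting 32 bytes out of something else.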

I wrote a bash script which downloads SILO from the web and patches it. You can download it from http://helenos.pavel-rimsky.cz/doku.php?id=download_and_patch_silo. Enjoy!