HelenOS Blog

Tetris on Niagara

pavel — Thu, 19 Mar 2009 22:59:30 +0000

During the last three months I have been trying to port HelenOS to the sun4v UltraSPARC architecture (Niagara). The sun4v architecture is very interesting from the system programmer’s point of view, so I will definitely put a link to my master’s thesis here (but I have to write the thesis first, of course).

For the time being, you can just enjoy the screenshots of Tetris running on a (Simics-simulated) T2000 server and on a (remotely accessed) real T1000 server.

The only big feature missing is support for SMP. This is what I will play with in the following weeks, because Niagara without multiprocessing would be like… (If you have an idea for a good simile, post it as a comment).

Tetris on Simics T2000 server

Tetris on T1000

Two plugins I can’t imagine Vim without

pavel — Thu, 19 Mar 2009 21:41:20 +0000

I like Vim. Especially for editing C code. I would like to introduce two extensions I especially like.

The first one is a Vim binding for Cscope. Cscope is a tool which makes an index from your source tree and uses the index to quickly tell you

where a function with a given name is defined,
where a function with a given name is called from,
where a given string occurs in your source files,
etc.

Cscope can be used either as a standalone command-line application, or via a binding into a text editor. Using the Vim binding for Cscope is very easy. Imagine you would like to get a list of all places from which a function is called. All you have to do is move the cursor onto the function name and press a certain key combination (Ctrl+\ c). You can use the Vim command line as well (:cs find c my_function). Once you have the list, you just type the number of the occurrence you are interested in and Vim will jump to it.

Since this is a blog about HelenOS, I will mention how to use Cscope for HelenOS development. In our Makefile we define a target called cscope. The target creates the Cscope index. Just cd to the HelenOS root directory and type

make cscope

and an index file called cscope.out will be generated for you. Now when you start Vim from the HelenOS root directory, it will automatically connect to Cscope so that the above key shortcuts (or Vim commands) can be used. A tutorial on using Cscope with Vim can be found at http://cscope.sourceforge.net/cscope_vim_tutorial.html.

The second Vim extension I appreciate is called Mini Buffer Explorer (http://vim.sourceforge.net/scripts/script.php?script_id=159). I don’t know exactly what is meant by ‘buffer’ in Vim… So let me simply say that this extension allows you to switch between opened files using tabs. It is especially useful in combination with the Cscope binding. If you jump from a location in file a.c to some location in file b.c, the latest edit location in file a.c will not be lost. Instead, two ‘tabs’ will be shown above the edit area: [a.c] and [b.c]. When you are done with investigating the contents of the b.c file, you will just switch back to a.c using the tabs.

With the Mini Buffer Explorer extension you can have as many files open at once in one running Vim instance as you want. To open a new file, just use the standard Vim command :E. To close the current file, use the :bd command (:q would close Vim completely).

Your favorite Linux distribution will probably contain a package called ‘cscope’. Installing the two extensions is just a matter of copying a couple of files to the ~/.vim/plugin directory.

Mini Buffer Explorer with two tabs representing two concurrently opened files

Looking up a place where a function is defined or declared

Opening a file using the :E command

Debugging using stripes

pavel — Sat, 15 Nov 2008 18:08:59 +0000

After several months of playing with a simulated Serengeti machine I have started to have some fun with real hardware containing an UltraSPARC III CPU. It is a Sun Blade 1500 workstation which I’ve been lent by Sun Microsystems’ globalization division.

A Spanish (or French?) keyboard is connected to it (you know, globalization division…), but this is not the most interesting challenge. The most interesting thing is how to debug the kernel on such a machine.

The kernel crashes before the output gets initialized, so it is not possible to use printf to identify the exact point of failure. With some help from the OpenBoot PROM, however, it is possible to find an area in the physical address space to which the screen is mapped (pixel by pixel) and draw something on the screen by writing to that area.

The code that does this can look like this:

static void sb1500stripe(int color, int position)
{
    int i;
    for (i = 0; i < 0xA00; i++)
        asm volatile ("stba %0, [%1] 0x15 \n" ::
            "r" ((color)),
            "r" (0x7f708000000 + (position)*0x1400 + i));
}

This will draw a thin horizontal stripe. The value 0x7f708000000 is the physical address where the framebuffer starts, and position determines the distance from the top of the screen.
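To make the address arithmetic in sb1500stripe concrete, here is a small Python sketch that computes the range of physical addresses a given stripe touches. The three constants are copied from the C code above; the helper name stripe_range is made up for this illustration and is not part of HelenOS.

```python
# Constants copied from the sb1500stripe() code above.
FB_BASE = 0x7f708000000   # physical address where the framebuffer starts
STRIPE_STEP = 0x1400      # bytes between the starts of two consecutive stripes
STRIPE_LEN = 0xA00        # bytes written per stripe (the loop bound)


def stripe_range(position):
    """Return (first, last) physical address written for the given stripe.

    Hypothetical helper, used only to illustrate the arithmetic.
    """
    first = FB_BASE + position * STRIPE_STEP
    last = first + STRIPE_LEN - 1
    return first, last


for pos in range(3):
    first, last = stripe_range(pos)
    print("stripe %d: 0x%x .. 0x%x" % (pos, first, last))
```

Since the stripe length is exactly half of the step, stripes at consecutive positions never overlap and an untouched band of the same size is left between them.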

The basic idea of how to use this for debugging is simple: call sb1500stripe from several places in a function you suspect of causing the crash, each time passing it a different value of position. The last stripe that appears on the screen then tells you which line in your source code is to blame.

The idea is simple, yet it does not always work. There are basically two problems with this approach. The first one is that some functions do not cause the crash in general, but only when called with some special parameters. If such a function is called n times before it causes the crash, it succeeds n - 1 times, each time drawing all the stripes. From the programmer’s point of view all the stripes have been drawn, so it seems that the function has not caused any crash at all. The solution is straightforward: at the beginning, the function must clear (blank) the area where it is about to draw the stripes.

The second problem is much trickier. When I first encountered it, it seemed that malloc failed when calling slab_alloc, but the slab_alloc function seemed to proceed all the way to the return statement. I was quite confused. What the hell can break during the return? I checked the value of the CANRESTORE register (maybe some strange error during a register window fill?), but its value was 3. Then I tried to find (using binary search) the value of the %i7 register (which holds the return address during the function call), which was complicated by the fact that this value depends on the kernel binary and hence changes whenever you modify the kernel code (which I was doing). After several hours I found out that even the %i7 register was correct.

Actually, the return statement was causing no failure at all! It was the debugging method that was wrong. What happened was that the slab_alloc function called itself recursively. The nested call succeeded (painted all the stripes), but the call which made the recursive call did not. Blanking the area for the stripes did not help here, as all the stripes were painted during the recursive call (i.e. after the blanking took place).

The solution which I came up with, and which works just fine for me, is to define a static variable (call_order) in the suspect function, which is incremented (and copied to a local variable call_order_copy) at the beginning of each call. The position of a particular stripe then depends not only on the place within the function from which it is painted, but also on the value of the call_order_copy variable. So every call of the suspect function has its own set of stripes. From that, the exact point of failure within the suspect function can be determined.
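The call_order trick can be tried out without any hardware. What follows is only a toy Python model, not the kernel code: the "screen" is a set of painted stripe positions, a module-level counter stands in for the static variable, and the function suspect (a made-up name) recurses once and then "crashes" in the outer call, which is the situation described above.

```python
# A toy model of the stripe technique. The "screen" is just a set of
# painted stripe positions; all names are made up for this illustration.
STRIPES_PER_CALL = 3      # number of checkpoints inside the suspect function

screen = set()            # positions of stripes painted so far
call_order = 0            # plays the role of the static variable


def stripe(position):
    screen.add(position)


def suspect(depth):
    """Recursive function that 'crashes' in the outer call after recursing."""
    global call_order
    call_order += 1
    call_order_copy = call_order          # local copy, stable across recursion
    base = (call_order_copy - 1) * STRIPES_PER_CALL

    stripe(base + 0)                      # checkpoint 0
    if depth > 0:
        suspect(depth - 1)                # nested call paints its own stripes
        raise RuntimeError("crash")       # the outer call dies here
    stripe(base + 1)                      # checkpoint 1
    stripe(base + 2)                      # checkpoint 2


try:
    suspect(1)
except RuntimeError:
    pass

print(sorted(screen))
```

The first call owns positions 0 to 2 and the nested call owns positions 3 to 5. Only stripe 0 of the first set is painted while the second set is complete, so the failure must lie in the outer call between its first and second checkpoint.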

Standard output to Simics CLI window

pavel — Fri, 19 Sep 2008 15:35:00 +0000

Simics is a brilliant tool, but what it lacks is a simple way to produce output from the simulated code. On some simulators, such as MSIM, the simulated code can print a character by writing its code to a special memory address. It is a priceless feature - debugging output is possible without having to write a driver for a framebuffer or a serial console.

I’ve had some trouble making the graphical console work on the Serengeti machine. On the Simics forum (https://www.simics.net/mwf/forum_show.pl) a guy from Virtutech told me that it is theoretically possible to configure the simulated Simics machine to support a graphical console, but…

First, why do you want the graphical console in the serengeti machine? The Serengeti architecture does not have any serial ports where mouse and keyboards can be connected, i.e. even if you get the graphics card to work, it can only be used to view output. No interactive programs can be run using it. The older SunFire architecture has better support for a graphical device with keyboard/mouse input.

The graphics card was never tested as output device for OBP. Instead the machine was booted as usual and then an X server was started with no mouse/keyboard configured. This required some manual setup and since this was several years ago, I don’t know if it works with more modern Solaris versions.

As the HelenOS UltraSPARC port does not support the serial console yet, I will have to write a serial console driver one day. It is, however, difficult to debug a kernel with no possibility of debugging output, and I am not in the mood for writing the serial console driver now. So I feel a little bit envious of those who can debug their kernels in MSIM-like simulators. Wait a moment… Is it really impossible to get some simple form of output in Simics? With a bit of inventiveness, one can achieve it!

Simics API (see http://www.cs.sfu.ca/~fedorova/Tech/simics-guides-3.0.26/simics-reference-manual-public-all.pdf to get its full description) can be used by Python scripts, which can be included directly in the Simics configuration file.

An API function called SIM_realtime_event exists, through which you can register a Python routine to be called after a given number of milliseconds. It is therefore possible to run a background Python routine regularly while the simulation is running.

Such a routine can use other Simics API functions, such as SIM_read_register, SIM_write_register, SIM_read_phys_memory or SIM_write_phys_memory to “communicate” with the simulated code. It can also use standard Python output routines to write to the Simics CLI window.

The final solution uses a buffer the simulated code writes to and the Python routine reads from (bytes to which the simulated code is free to write are set to zero). The address of the buffer is transmitted to the Python script by

setting the g3 register to the buffer address, and
writing a magic value to the g2 register.

The code on the HelenOS side looks like this:

/* all buffer positions are free at the beginning */
uint16_t i;
for (i = 0; i < BUFSIZE; i++) {
	buffer[i] = '\0';
}

/*
 * pass the address of the buffer to the Simics' Python script
 *   - write it to the %g3 register
 *   - write the magic value to the %g2 register
 *     (so that the script knows that the value in %g3 is valid)
 *   - loop until the value is read
 *     (the script notifies us by setting %g2 to 0)
 */
asm volatile (
	"or %0, 0, %%g3\n"

	"set 0x18273645, %%g2\n"

	"0: cmp %%g2, 0\n"
	"bnz 0b\n"
	"nop"
	:: "r" (buffer)
);

And on the Python side:

if ((buf == 0) and (register2Value == 0x18273645)):
	buf = SIM_read_register(SIM_current_processor(), register3Number);
	SIM_write_register(SIM_current_processor(), register2Number, 0);
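The handshake can be followed step by step in a plain Python model that needs no Simics at all. This is only a sketch under simplifying assumptions: a dictionary stands in for the %g2 and %g3 registers, the function names guest_announce and host_poll are made up, and the buffer address is an arbitrary example value.

```python
MAGIC = 0x18273645

# Simulated registers of the guest CPU (stand-ins for %g2 and %g3).
regs = {"g2": 0, "g3": 0}
buf = 0   # buffer address as known to the script; 0 means "not yet known"


def guest_announce(buffer_addr):
    """Guest side: publish the buffer address, then raise the magic flag."""
    regs["g3"] = buffer_addr
    regs["g2"] = MAGIC


def host_poll():
    """Host side: one run of the periodically called routine."""
    global buf
    if buf == 0 and regs["g2"] == MAGIC:
        buf = regs["g3"]
        regs["g2"] = 0        # releases the guest from its busy loop


host_poll()                   # nothing announced yet, buf stays 0
guest_announce(0x40008000)    # arbitrary example address
host_poll()
print(hex(buf), regs["g2"])
```

The order of the two writes on the guest side matters: the address goes to %g3 first, so that whenever the script sees the magic value in %g2, the content of %g3 is already valid.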

Once the Python script knows the location of the buffer, the simulated code can write to the buffer…

/** Writes a single character to the Simics CLI.
 *
 * The character is not written immediately, but it is stored to the first free
 * position in the buffer, waiting for Simics' Python routine to pick it
 * and print it.
 */
static void simics_putchar(struct chardev * cd, char c)
{
	/* the first free position in the buffer */
	static uint16_t current = 0;

	/* '\0' terminates a contiguous block of characters to be printed! */
	if (c == '\0')
		return;

	/* wait till buffer is non-full and other processors aren't writing to it */
	while (1) {
		while (buffer[current] != 0)
			;
		if (spinlock_trylock(&simics_buf_lock))
			break;
	}

	buffer[current] = c;

	current = (current + 1) % BUFSIZE;
	membar();

	spinlock_unlock(&simics_buf_lock);
}

… and the Python script can read from the buffer, printing the characters to the CLI window:

byte = SIM_read_phys_memory(SIM_current_processor(), buf + offset, 1);
while byte != 0:
	SIM_putchar(byte);
	SIM_flush();
	SIM_write_phys_memory(SIM_current_processor(), buf + offset, 0, 1);
	offset = (offset + 1) % 512;
	byte = SIM_read_phys_memory(SIM_current_processor(), buf + offset, 1);

This mechanism is easy and significantly faster than writing the output to the graphical console. Moreover, as the Simics CLI runs in a Unix terminal window, the output can be processed by standard Unix means (the tee command, pipes, etc.). It will be necessary to implement a serial console driver in the future, but for the time being, the mechanism described in this post is quite sufficient.
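For readers who want to experiment with the buffer protocol itself, here is a self-contained Python model of both sides. It is a simplification made up for this illustration, not the HelenOS or Simics code: there is a single writer (so no spinlock), the buffer is deliberately tiny, and instead of busy-waiting the writer simply lets the reader run whenever the next slot is still occupied.

```python
BUFSIZE = 8               # small on purpose; the script in the post uses 512

buffer = bytearray(BUFSIZE)   # a zero byte means "slot is free"
write_pos = 0                 # the guest's 'current'
read_pos = 0                  # the script's 'offset'


def guest_putchar(c):
    """Guest side: store c in the next slot; return False if it is occupied."""
    global write_pos
    if c == 0:
        return True           # zero is reserved as the free marker
    if buffer[write_pos] != 0:
        return False          # the real code busy-waits here instead
    buffer[write_pos] = c
    write_pos = (write_pos + 1) % BUFSIZE
    return True


def host_drain():
    """Host side: collect and free every pending character."""
    global read_pos
    out = bytearray()
    while buffer[read_pos] != 0:
        out.append(buffer[read_pos])
        buffer[read_pos] = 0
        read_pos = (read_pos + 1) % BUFSIZE
    return out.decode("ascii")


text = b"Hello, HelenOS!"
printed = ""
for ch in text:
    while not guest_putchar(ch):
        printed += host_drain()   # buffer full: let the host catch up
printed += host_drain()
print(printed)
```

Because a zero byte marks a free slot, neither side needs to know the other's position: the writer stops at the first occupied slot and the reader stops at the first free one.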

Bootable CD for Serengeti

pavel — Tue, 12 Aug 2008 21:50:18 +0000

Recently I have been trying to create a bootable HelenOS CD for the Serengeti machine containing the UltraSPARC III processor. I decided to use SILO as the bootloader. The Sunfire machine with the UltraSPARC II processor was able to boot from the CD without any problems. On Serengeti, however, the boot process ended with an error. SILO did not even print its banner.

I was trying to boot different operating systems on the Serengeti machine. With Aurora Linux, I got exactly the same error. OpenBSD and NetBSD printed their banners, but ended with an error later. OpenSolaris (marTux distribution) booted without any problems.

As Linux and HelenOS (both using SILO) were unable to boot due to the same error, I was suspicious about two things:

the SILO bootloader, and
the way the bootable CD is made.

After checking the mkisofs manual pages, I was almost convinced that the way the bootable CD is made is all right. Therefore, SILO seemed to be the most probable cause. The problem was that the SILO website had been down since January; it seemed that the SILO community had not paid for the sparc-boot.org domain. I found an archived version of the website at http://web.archive.org, but found no solution to my problem. I also wrote to the sparc-linux mailing list, but (to my great disappointment) got no answer.

My colleagues advised me to explore the binary contents of the CD, find the SILO code inside it and then find out whether the processor really executes that code or whether it executes some rubbish. I learned how to use the hexdump and dd utilities (see http://helenos.pavel-rimsky.cz/doku.php?id=handy_shell_commands for my notes) and found the location of SILO’s isofs.b file inside the image of the CD. Finding out whether the CPU was executing the right instructions was an easy task, as I tested the CD on the Simics simulator, not on a real piece of hardware (see http://helenos.pavel-rimsky.cz/doku.php?id=handy_simics_commands for some handy Simics commands).
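The same kind of search can also be done with a few lines of Python instead of hexdump and dd. The sketch below is not taken from my notes: the helper name find_in_image is made up, and the "image" is a synthetic byte string rather than a real CD image, with a 32-byte payload placed at the start of the fourth 2048-byte sector.

```python
def find_in_image(image, needle):
    """Return every offset at which needle occurs in image (both bytes)."""
    offsets = []
    start = image.find(needle)
    while start != -1:
        offsets.append(start)
        start = image.find(needle, start + 1)
    return offsets


# Synthetic stand-in for a CD image: 2048-byte sectors, payload in sector 3.
SECTOR = 2048
payload = bytes(range(1, 33))
image = bytes(3 * SECTOR) + payload + bytes(SECTOR - len(payload))

for off in find_in_image(image, payload):
    print("offset %d (0x%x), sector %d" % (off, off, off // SECTOR))
```

With a real image, the needle would be the first few dozen bytes of the file you are looking for, and the reported offset is what you would then pass to dd or to the simulator.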

I was a little bit confused. The 2nd to 8th instructions of the isofs.b file were illegal trap (ILLTRAP) instructions. These were the instructions that made the Serengeti machine fail! Firstly, how come the ILLTRAP instructions are there in the isofs.b file? Secondly, how come it works without any problems on the US-II machine, even though the instructions are illegal on US-II as well?

I wrote to the HelenOS mailing list to discuss it with more experienced hackers. More experiments showed that the Sunfire (US-II) machine does not execute the first eight instructions of the isofs.b file; its firmware jumps to the ninth instruction. On the other hand, the Serengeti (US-III) machine’s firmware jumps to the first instruction contained in the isofs.b file, thus ending with an Illegal Instruction trap. What the hell do those eight instructions mean?

Jakub Jermář suggested (and verified) that those eight “instructions” are in fact the ELF header of the SPARC executable. While the Sunfire (US-II) machine understands this format and its firmware loads the isofs.b file into memory without this eight-word header, the Serengeti machine loads the whole contents of the isofs.b file into memory and jumps to its first word. The first eight words are not instructions, but an ELF header, so the Illegal Instruction trap occurs.

The solution is pretty easy. Just remove the first eight words from the isofs.b file (to be more precise, those eight words come after the initial 512-byte sequence of zeros in isofs.b; the sequence of zeros is ignored by the firmware, however).
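The patching step itself is tiny. The following Python sketch shows one way it could be done; it is not the bash script mentioned in this post, the function names are made up, and it assumes that the 512 zero bytes are kept in place and that a word means a 32-bit SPARC instruction word, so 32 bytes are cut out in total.

```python
PAD = 512          # leading zero bytes that the firmware ignores
WORD = 4           # a SPARC instruction word is 32 bits
HEADER_WORDS = 8   # the eight header words described above


def strip_header(data):
    """Return data with the eight header words after the padding removed."""
    if len(data) < PAD + HEADER_WORDS * WORD:
        raise ValueError("file too short to contain the header")
    if any(data[:PAD]):
        raise ValueError("expected %d zero bytes at the start" % PAD)
    return data[:PAD] + data[PAD + HEADER_WORDS * WORD:]


def patch_file(src, dst):
    """Read src, strip the header and write the result to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(strip_header(data))
```

Calling patch_file("isofs.b", "isofs.b.patched") would write the patched copy next to the original; the check for the zero padding is there only to avoid mangling a file that does not have the expected layout.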

I wrote a bash script which downloads SILO from the web and patches it. You can download it from http://helenos.pavel-rimsky.cz/doku.php?id=download_and_patch_silo. Enjoy!