Famicom Emulator Tape Formats

This is just a quick dump of how I see various emulators handle the Famicom Tape Recorder data.

On real hardware, data is written to a memory address (which?) and that is converted to analog audio via the keyboard's port (or via that other accessory that can do it). In the case of emulators, that whole analog step is skipped and the digital data written to the memory address is simply stored into a file so that it can be played back later.

They all essentially work the same way: watch that memory address and record its value every n cycles. What varies is the rate at which they sample the data and the way that they go about storing that data.

Playback is just the same process in reverse, so I won't bother explicitly going over that.
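To make the idea concrete, here is a minimal Python sketch of that sample-every-n-cycles loop and its reverse. It is not taken from any of the emulators below; the function names and the idea of feeding in one line level per CPU cycle are my own simplifications.

```python
def record_samples(levels_per_cycle, cycles_per_sample):
    """Keep one value out of a per-cycle stream every `cycles_per_sample` cycles.

    `levels_per_cycle` is a hypothetical sequence holding the tape output
    level (0 or 1) for every CPU cycle. `cycles_per_sample` may be fractional.
    """
    samples = []
    next_sample_at = 0.0
    for cycle, level in enumerate(levels_per_cycle):
        if cycle >= next_sample_at:
            samples.append(1 if level else 0)
            next_sample_at += cycles_per_sample
    return samples


def playback_level(samples, cycles_per_sample, cycle):
    """Reverse direction: which recorded level is 'under the head' at `cycle`?"""
    index = int(cycle // cycles_per_sample)
    # Past the end of the tape we simply report a low level (an assumption).
    return samples[index] if index < len(samples) else 0


if __name__ == "__main__":
    line = [0] * 88 + [1] * 88 + [0] * 88
    tape = record_samples(line, 88)
    print(tape)
    print(playback_level(tape, 88, 100))
```

A fractional `cycles_per_sample` is allowed because some of the emulators below derive the interval from a fixed sample rate rather than a whole number of cycles.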

I will note that in the past I had mentioned that some emulators (which?) will store the tape data in WAV format without a header. Having done more research, I can say that this is not correct. It just so happens that the way some emulators store the data can be interpreted as a rather basic waveform (if you can pick a sample rate close enough to the emulator's sample rate). Another way to put it is that digitizing audio by taking rapid samples of the analog signal is pretty much the exact same approach the emulator takes to record the tape data, so it is not surprising that the two formats can be, in some cases, interchangeable.

Unless otherwise noted, I am deriving the formats used by emulators by analyzing their source code (so I'm not just guessing!).

Mesen

References: Core/FamilyBasicDataRecorder.h

Takes a sample every 88 cycles. This would be approximately 20000 samples per second.

More specifically, it is: $$ \frac{1789773}{88} = 20338.3295 $$ So, 20338.3295 samples per second (the ClockRateNtsc constant, 1789773, divided by 88).

Stores each sample as a single bit (0 for low, 1 for high), packing eight samples into each byte, least-significant bit first (the first sample goes into bit 0).

The clever compression being done here prevents the format from being read as a wave file. (Unless it is possible to store more than one sample into a single byte within a wave file, but this did not seem possible to me according to my research).
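The packing described above can be sketched in a few lines of Python. This only illustrates the eight-samples-per-byte idea with the first sample in the lowest bit; the function names are mine, and zero-padding a partial final byte is an assumption rather than something I have confirmed in Mesen's code.

```python
def pack_lsb_first(samples):
    """Pack a list of 0/1 samples into bytes, first sample in bit 0."""
    out = bytearray()
    for start in range(0, len(samples), 8):
        byte = 0
        for bit, sample in enumerate(samples[start:start + 8]):
            if sample:
                byte |= 1 << bit
        out.append(byte)  # a short final group is zero-padded (assumption)
    return bytes(out)


def unpack_lsb_first(data, count=None):
    """Expand packed bytes back into a list of 0/1 samples."""
    samples = [(byte >> bit) & 1 for byte in data for bit in range(8)]
    return samples if count is None else samples[:count]


if __name__ == "__main__":
    samples = [1, 1, 0, 0, 0, 0, 0, 1, 1]
    packed = pack_lsb_first(samples)
    print(packed.hex())
    print(unpack_lsb_first(packed, len(samples)) == samples)
```

Passing the original sample count to `unpack_lsb_first` drops the padding bits in the last byte.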

Nestopia

References: Nestopia source code, specifically:

core/input/NstInpFamilyKeyboard.cpp (around line 487)
core/NstBase.hpp (around line 301)

Writes 0x90 for high and 0x70 for low.

Defines a TAPE_CLOCK constant as 32000, so that is likely the sample rate.

Takes a bit of math, but they say clock speed is 1789772.7272: $$ \begin{align} \frac{(39375000 \times 6)}{11} = 21477272.7272\\\\ \frac{21477272.7272}{12} = 1789772.7272 \end{align} $$ which is right where you'd expect it to be for NTSC.

So that works out to be a sample every 55.9304 cycles: $$ \frac{1789772.7272}{32000} \approx 55.9304 $$
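If you want to redo this arithmetic without accumulating rounding along the way, exact fractions are handy. The sketch below uses only the constants quoted above; the variable names are mine and are not Nestopia's identifiers (apart from TAPE_CLOCK).

```python
from fractions import Fraction

TAPE_CLOCK = 32000  # the constant mentioned above

# Master clock, then the CPU clock derived from it, kept as exact fractions.
master_clock = Fraction(39375000 * 6, 11)
cpu_clock = master_clock / 12

# CPU cycles that elapse between two tape samples.
cycles_per_sample = cpu_clock / TAPE_CLOCK

if __name__ == "__main__":
    print(float(master_clock))
    print(float(cpu_clock))
    print(float(cycles_per_sample))
```

Keeping the values as `Fraction` objects until the final `float()` call means the repeating decimals never get cut short partway through the calculation.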

VirtuaNES

References: VirtuaNES source code: NES/Nes.cpp (around line 3671)

The sampling interval (in CPU cycles) is the Famicom clock speed divided by 32000.00.

They define the NTSC clock speed as 1789772.5 (NES/Nes.h).

So a sample is taken every 55.9304 cycles: $$ \frac{1789772.5}{32000.00} \approx 55.9304 $$

Code comments suggest that the developer tried a sample rate of 22000, but ran into some unspecified issues.

Will write out 0x90 for high and 0x70 for low.

Given that a full byte is used for each sample, this format can be interpreted as a wave file (with sample rate of 32000).
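As a rough illustration, Python's standard `wave` module can wrap such a dump in a WAV header. This sketch assumes the tape file contains nothing but the sample bytes (one byte per sample, no header of its own); the function name is made up for the example. It helps that 8-bit PCM in a WAV file is unsigned with 0x80 as the midpoint, so 0x90 and 0x70 sit symmetrically above and below center.

```python
import io
import wave


def raw_tape_to_wav(raw, sample_rate=32000):
    """Wrap one-byte-per-sample tape data in a mono, 8-bit WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(1)        # 1 byte per sample (unsigned 8-bit PCM)
        wav.setframerate(sample_rate)
        wav.writeframes(raw)
    return buffer.getvalue()


if __name__ == "__main__":
    # A tiny fake recording: alternating high (0x90) and low (0x70) samples.
    raw = bytes([0x90, 0x70] * 4)
    wav_bytes = raw_tape_to_wav(raw)
    print(len(wav_bytes), wav_bytes[:4])
```

Going the other way is the same idea in reverse: open the WAV with `wave.open(..., "rb")` and take the result of `readframes()` as the tape data, provided the file is already mono, 8-bit, and at the right sample rate.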

As an aside, the logic for reading tape data looks for values greater than 0x8C or lower than 0x74 (as opposed to looking for exactly 0x90 or 0x70), which means that it is a bit forgiving when fed a properly prepared wave file from another source. (Nestopia does this too... I wonder if one used the other as a reference, or maybe they both cribbed from the same source.)
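Here is a small Python sketch of that kind of threshold check. The two thresholds come from the text above; what happens to bytes that fall between them is my assumption (this version just keeps the previous level), and the function name is hypothetical.

```python
def decode_levels(raw, high_above=0x8C, low_below=0x74, initial=0):
    """Turn one-byte-per-sample tape data into a list of 0/1 levels."""
    level = initial
    levels = []
    for value in raw:
        if value > high_above:
            level = 1
        elif value < low_below:
            level = 0
        # Otherwise the byte is in the dead zone: hold the previous level
        # (an assumption, not confirmed against either emulator).
        levels.append(level)
    return levels


if __name__ == "__main__":
    print(decode_levels(bytes([0x90, 0x80, 0x70, 0x80, 0x8D, 0x8C])))
```

The gap between the two thresholds is what makes the reader tolerant of slightly noisy or rescaled audio from another source.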

Nintaco

References: Nintaco source code, specifically:

TVSystem.java
DataRecorderMapper.java
FamilyBasicPrefs.java
BitList.java

By default, takes a sample every 88 cycles, but this is configurable in the GUI. Their .tape format does not store this rate anywhere, so if you deviate from the default you'll have to know what rate was used in order to load it properly. But assuming those defaults, we can take its calculated NTSC clock speed and work out the sample rate:

$$ \begin{align} \frac{19687500}{11} = 1789772.7272 \\\\ \frac{1789772.7272}{88} = 20338.3264 \end{align} $$

That puts us at 20338.3264 samples per second -- just like Mesen.

Nintaco stores tape data in a proprietary .tape format. It starts with three 32-bit integers. If you take a look at the BitList.java file, you can see more clearly what these values are and how they are used. Briefly, though, they are:

capacity - The length of the backing Java int[] array that holds the tape data in Nintaco, multiplied by 32 (since it stores 32 bits per int in the array). This is often larger than the data itself.
size - The number of bits (or number of samples) written during the recording.
length - The number of Java ints it would take to hold all of the bits in the recording.
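A reader for those three values might look something like the Python sketch below. The field order follows the list above; the big-endian byte order is an assumption on my part, as are the names `TapeHeader`, `read_tape_header` and `ints_needed`.

```python
import struct
from typing import NamedTuple


class TapeHeader(NamedTuple):
    capacity: int  # backing array length * 32
    size: int      # number of samples (bits) recorded
    length: int    # number of 32-bit ints holding those bits


def read_tape_header(data):
    """Parse the three leading 32-bit integers of a .tape file."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes for the header")
    # ">3i" = three big-endian signed 32-bit ints (byte order is assumed).
    return TapeHeader(*struct.unpack_from(">3i", data, 0))


def ints_needed(size):
    """Number of 32-bit ints required to hold `size` bits."""
    return (size + 31) // 32


if __name__ == "__main__":
    fake = struct.pack(">3i", 64, 40, 2) + bytes(8)
    header = read_tape_header(fake)
    print(header, ints_needed(header.size) == header.length)
```

Comparing `length` against `ints_needed(size)` is a cheap sanity check that the header was read with the right byte order and field order.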

When it comes to storing the actual tape data, Nintaco takes a similar approach to Mesen, in that it packs the samples into single bits. However, Nintaco uses groups of 32 bits rather than groups of 8 bits, so the approach to reading/writing them varies a little bit.

Here's another way of describing the difference:

Assume the following stream of 32 samples, each of which we've given a name, starting from 0 and going through x. We'll call this the natural bit order -- the order in which you would encounter them if you were playing them on a tape:

$$ \begin{array}{cccc} \texttt{0 1 2 3 4 5 6 7} \\ \texttt{a b c d e f g h} \\ \texttt{i j k l m n o p} \\ \texttt{q r s t u v w x} \end{array} $$

So for Mesen .fbt, the bits are packed like this into four bytes:

$$ \begin{array}{cccc} \text{Byte 0} & \text{Byte 1} & \text{Byte 2} & \text{Byte 3} \\ \texttt{76543210} & \texttt{hgfedcba} & \texttt{ponmlkji} & \texttt{xwvutsrq} \end{array} $$

Versus Nintaco .tape, which would do it this way:

$$ \begin{array}{cccc} \text{Byte 0} & \text{Byte 1} & \text{Byte 2} & \text{Byte 3} \\ \texttt{xwvutsrq} & \texttt{ponmlkji} & \texttt{hgfedcba} & \texttt{76543210} \end{array} $$

So in both cases, samples are converted to bits and are grouped by filling from right to left -- the first bit in the stream goes on the right, and each new bit shifts leftward in the group. The difference is that Mesen is doing it in groups of 8 bits (uint8) and Nintaco is doing it in groups of 32 bits (Java int).
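Going by the byte layouts shown above, converting the packed sample data from one layout to the other comes down to reversing every group of four bytes. The Python sketch below covers only the sample bits (no file headers); the function names are mine, and zero-padding the last group is an assumption.

```python
import struct


def mesen_to_nintaco_bytes(packed):
    """Reorder 8-bit-group packing into 32-bit-group packing (and back)."""
    padded = packed + bytes(-len(packed) % 4)  # pad to a multiple of 4 bytes
    return b"".join(padded[i:i + 4][::-1] for i in range(0, len(padded), 4))


def unpack_32bit_groups(data, count):
    """Read samples from 32-bit groups stored most-significant byte first."""
    samples = []
    for (word,) in struct.iter_unpack(">I", data):
        # Within each group the first sample sits in the lowest bit.
        samples.extend((word >> bit) & 1 for bit in range(32))
    return samples[:count]


if __name__ == "__main__":
    mesen = bytes([0x01, 0x00, 0x00, 0x80])  # sample 0 and sample 31 are high
    nintaco = mesen_to_nintaco_bytes(mesen)
    print(nintaco.hex())
    print(unpack_32bit_groups(nintaco, 32))
```

Because reversing four bytes twice gives back the original order, the same function converts in either direction for data whose length is a multiple of four bytes.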