This is just a quick dump of how I see various emulators handle the Famicom Tape Recorder data.
On real hardware, data is written to a memory address (which?) and that is converted to analog audio via the keyboard's port (or via that other accessory that can do it). In the case of emulators, that whole analog step is skipped and the digital data written to the memory address is simply stored into a file so that it can be played back later.
They all essentially work the same way: watch that memory address and record its value every n cycles. What varies is the rate at which they sample the data and the way that they go about storing that data.
Playback is just the same process in reverse, so I won't bother explicitly going over that.
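The "record the value every n cycles" idea can be sketched roughly as follows. This is a hypothetical illustration, not code taken from any emulator: `TapeRecorder`, `on_cpu_cycle`, `record`, and `cycles_per_sample` are names invented here, the line level is passed in by the caller rather than read from a real memory address, and the sampling interval is assumed to be a whole number of cycles.

```python
class TapeRecorder:
    """Toy recorder: samples a 1-bit level every `cycles_per_sample` CPU cycles."""

    def __init__(self, cycles_per_sample):
        # ASSUMPTION: the interval is an integer number of cycles.
        self.cycles_per_sample = cycles_per_sample
        self.cycle_count = 0
        self.samples = []  # recorded levels, 0 = low, 1 = high

    def on_cpu_cycle(self, level):
        """Call once per emulated CPU cycle with the current output level."""
        if self.cycle_count % self.cycles_per_sample == 0:
            self.samples.append(1 if level else 0)
        self.cycle_count += 1


def record(levels, cycles_per_sample):
    """Feed one level per cycle through the recorder and return the samples."""
    recorder = TapeRecorder(cycles_per_sample)
    for level in levels:
        recorder.on_cpu_cycle(level)
    return recorder.samples


def playback_level(samples, cycle, cycles_per_sample):
    """Reverse direction: which recorded sample applies at a given cycle."""
    return samples[cycle // cycles_per_sample]
```

An emulator with a fractional interval would need an accumulator instead of the modulo test used here.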
I will note that in the past I had mentioned that some emulators (which?) will store the tape data in WAV format without a header. Having done more research, I can say that this is not correct. It just so happens that the way some emulators store the data can be interpreted as a rather basic waveform (if you can pick a sample rate close enough to the emulator's sample rate). Another way to put it is that digitizing audio by taking rapid samples of the analog data is pretty much the exact same approach the emulator takes to record the tape data, so it is not surprising that the two formats can be, in some cases, interchangeable.
Unless otherwise noted, I am deriving the formats used by emulators by analyzing their source code (so I'm not just guessing!).
References: https://github.com/SourMesen/Mesen/blob/master/Core/FamilyBasicDataRecorder.h
Mesen takes a sample every 88 cycles. This would be approximately 20000 samples per second.
More specifically, it is:
$$
\frac{1789773}{88} \approx 20338.3295
$$
So, about 20338.3295 samples per second (the ClockRateNtsc constant divided by 88).
Stores each sample as a 0 or 1 bit (for low or high, respectively), compressing eight samples into a single byte (little-endian).
The clever compression being done here prevents the format from being read as a wave file (unless it is possible to store more than one sample in a single byte within a wave file, but this did not seem possible to me according to my research).
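A minimal sketch of packing eight 1-bit samples into each byte might look like this. It is not Mesen's code: the function names are made up for illustration, and it assumes that "little-endian" here means the first sample of each group goes into the least significant bit. A final group shorter than eight samples is padded with zero bits.

```python
def pack_samples(samples):
    """Pack 0/1 samples into bytes, eight samples per byte.

    ASSUMPTION: the first sample of each group lands in bit 0 (LSB first).
    """
    out = bytearray()
    for start in range(0, len(samples), 8):
        byte = 0
        for bit_pos, sample in enumerate(samples[start:start + 8]):
            byte |= (sample & 1) << bit_pos
        out.append(byte)
    return bytes(out)


def unpack_samples(data, count=None):
    """Expand packed bytes back into a list of 0/1 samples (same bit order)."""
    samples = [(byte >> bit_pos) & 1 for byte in data for bit_pos in range(8)]
    return samples if count is None else samples[:count]
```

Unpacking and then writing one full byte per sample would be one way to get such data into a form that a wave file can hold.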
References:
Nestopia source code: core/input/NstInpFamilyKeyboard.cpp (around line 487) and core/NstBase.hpp (around line 301)
Nestopia writes 0x90 for high and 0x70 for low.
It defines a TAPE_CLOCK constant as 32000, so that is likely the sample rate.
It takes a bit of math, but they say the clock speed is 1789772.7272:
$$ \begin{align} \frac{39375000 \times 6}{11} &\approx 21477272.7272\\ \frac{21477272.7272}{12} &\approx 1789772.7272 \end{align} $$
which is right where you'd expect it to be for NTSC.
So that works out to be a sample roughly every 55.9304 cycles: $$ \frac{1789772.7272}{32000} \approx 55.9304 $$
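The arithmetic can be double-checked with exact fractions. This is only a verification sketch: `master_clock`, `cpu_clock`, and `cycles_per_sample` are names chosen here for readability and are not claimed to match identifiers in the Nestopia source; only TAPE_CLOCK and the numeric constants come from the notes above.

```python
from fractions import Fraction

# Constants as quoted in the notes; variable names are illustrative only.
master_clock = Fraction(39375000 * 6, 11)  # NTSC master clock in Hz
cpu_clock = master_clock / 12              # CPU clock in Hz
TAPE_CLOCK = 32000                         # presumed tape sample rate in Hz

# Number of CPU cycles that elapse between two tape samples.
cycles_per_sample = cpu_clock / TAPE_CLOCK

print(f"master clock:      {float(master_clock):.4f} Hz")
print(f"CPU clock:         {float(cpu_clock):.4f} Hz")
print(f"cycles per sample: {float(cycles_per_sample):.4f}")
```

Using `Fraction` avoids carrying rounding error from one step into the next, which matters when comparing figures to four decimal places.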
References:
VirtuaNES source code: NES/Nes.cpp (around line 3671)
In VirtuaNES, the sampling interval is the Famicom clock speed divided by 32000.00.
They define the NTSC clock speed as 1789772.5 (NES/Nes.h).
So a sample would be taken roughly every 55.9304 cycles: $$ \frac{1789772.5}{32000.00} \approx 55.9304 $$
Code comments suggest that the developer tried a sample rate of 22000, but ran into some unspecified issues.
It writes out 0x90 for high and 0x70 for low.
Given that a full byte is used for each sample, this format can be interpreted as a wave file (with sample rate of 32000).
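As a rough sketch of that interpretation, byte-per-sample tape data can be wrapped in a WAV container with Python's standard `wave` module. The function name `raw_tape_to_wav` is made up here, and the choice of mono, 8-bit PCM at 32000 Hz is an assumption based on the notes above rather than anything the emulator itself declares. Since 8-bit PCM in a WAV file is unsigned with 0x80 as the midpoint, 0x90 and 0x70 sit symmetrically above and below center.

```python
import io
import wave


def raw_tape_to_wav(raw, sample_rate=32000):
    """Wrap byte-per-sample tape data in a WAV container.

    ASSUMPTION: mono, 8-bit unsigned PCM, one tape sample per frame.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(1)        # 1 byte per sample (8-bit PCM)
        wav.setframerate(sample_rate)
        wav.writeframes(raw)
    return buffer.getvalue()
```

For example, `open("tape.wav", "wb").write(raw_tape_to_wav(data))` would produce a file that an audio editor should be able to open, where `tape.wav` and `data` are placeholders.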
As an aside, the logic for reading in tape data looks for values greater than 0x8C or lower than 0x74 (as opposed to looking for exactly 0x90 or 0x70), which means that it is a bit forgiving when fed a properly prepared wave file from another source. (Nestopia is also doing this... I wonder if one used the other as a reference, or maybe they cribbed from the same reference.)
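The forgiving read logic could be sketched like this. The two thresholds come from the description above; everything else is an assumption made for illustration, in particular the function name `decode_levels` and the choice to keep the previous level for values that fall between the thresholds, which the notes do not specify.

```python
HIGH_THRESHOLD = 0x8C  # values above this are treated as high
LOW_THRESHOLD = 0x74   # values below this are treated as low


def decode_levels(raw, initial=0):
    """Turn byte-per-sample tape data into a list of 0/1 levels."""
    level = initial
    levels = []
    for value in raw:
        if value > HIGH_THRESHOLD:
            level = 1
        elif value < LOW_THRESHOLD:
            level = 0
        # ASSUMPTION: values between the thresholds keep the previous level.
        levels.append(level)
    return levels
```

With thresholds like these, a wave file whose peaks are louder or quieter than exactly 0x90 and 0x70 would still decode, as long as the samples clear the dead zone around 0x80.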