Jan Newmarch's blog: May 2012

Wednesday, 30 May 2012

Found low latency for Karaoke with PulseAudio

As a followup to the last post: I can now play Midi files and sing along with no noticeable latency using PulseAudio. To direct the (default) microphone to the (default) speaker, load module-loopback (thanks to rusty0101):

    pactl load-module module-loopback latency_msec=1

Everything you speak/sing/holler will then be played on the speaker.

For Midi file playback I discovered FluidSynth. The following program will play a Midi file:

// See http://fluidsynth.sourceforge.net/api/

// Run by ./PlayFile /usr/share/soundfonts/FluidR3_GM.sf2 ../54150.mid

#include <fluidsynth.h>

int main(int argc, char** argv)
{
    int i;
    fluid_settings_t* settings;
    fluid_synth_t* synth;
    fluid_player_t* player;
    fluid_audio_driver_t* adriver;
    settings = new_fluid_settings();
    synth = new_fluid_synth(settings);
    player = new_fluid_player(synth);

    fluid_settings_setstr(settings, "audio.driver", "pulseaudio");
// Use paman to find output device's name
    fluid_settings_setstr(settings, "audio.pulseaudio.device",
            "alsa_output.pci-0000_00_1b.0.analog-stereo");

    adriver = new_fluid_audio_driver(settings, synth);
    /* process command line arguments */
    for (i = 1; i < argc; i++) {
        if (fluid_is_soundfont(argv[i])) {
            /* load the soundfont found at this position, not always argv[1] */
            fluid_synth_sfload(synth, argv[i], 1);
        }
        if (fluid_is_midifile(argv[i])) {
            fluid_player_add(player, argv[i]);
        }
    }
    /* play the midi files, if any */
    fluid_player_play(player);
    /* wait for playback termination */
    fluid_player_join(player);
    /* cleanup */
    delete_fluid_audio_driver(adriver);
    delete_fluid_player(player);
    delete_fluid_synth(synth);
    delete_fluid_settings(settings);
    return 0;
}

And whadda-you-know? It all works fine.

There's just the matter of hooking up a GUI, and a few thousand lines of code. But at least I'm starting from a good base, and I now realise the Java Sound framework can't give me that, sad to say.

Tuesday, 29 May 2012

In search of (low) latency

This is a followup to my investigations into playing my Songken DVD DKD files on my laptop. In an earlier blog I described how to decode the DKD files into Midi or Midi+WMA files. The intent was then to build a Midi player that would show both the notes of the melody and the notes the singer was singing.

Well, I did all that. Java Sound has a Midi player. Java Sound has a Sampled API to handle sounds from the microphone to the loudspeaker. Java has a GUI for showing stuff. TarsosDSP by Joren Six has implemented a number of pitch detection algorithms such as YIN and they can be pulled in to give an estimate of the pitch sung. Java can convert characters from language encodings such as GB2312 to Unicode and display them so I can see Chinese and other characters. So it's all there....

... but latency still kills it. The Midi player introduces latency somehow into the sampled sounds, but even if you work around it - even if you just do sampled data alone - then there is still that little delay. Here are my Java source files. Maybe I will write up an explanation of what I was doing with them later. I'm going to stop work on them right now till I get the latency sorted out.
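As an aside on the character conversion mentioned above: decoding GB2312 bytes in Java is essentially a one-liner. This is only a small sketch; the class and method names are made up for illustration, and it assumes the JDK in use ships the GB2312 charset (the usual desktop JDKs do, but it is not one of the charsets every Java platform is required to support).

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not taken from the Karaoke source files.
class Gb2312Decode {
    /** Decodes GB2312-encoded bytes (e.g. lyric text) into a Java (Unicode) String. */
    static String decode(byte[] gb2312Bytes) {
        // Charset.forName throws UnsupportedCharsetException if GB2312 is missing
        return new String(gb2312Bytes, Charset.forName("GB2312"));
    }

    public static void main(String[] args) {
        // Two GB2312 characters: 0xC4E3 and 0xBAC3
        byte[] raw = { (byte) 0xC4, (byte) 0xE3, (byte) 0xBA, (byte) 0xC3 };
        System.out.println(decode(raw));
    }
}
```

Once the text is a String, Swing or any other Java GUI can display it, provided a font with Chinese glyphs is installed.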

The standard audio system for (consumer) sound on Linux is PulseAudio. But as Lennart Poettering explained at the Linux Audio Conference 2010, pro audio has different aims to consumer audio, and this project is closer to pro audio than consumer audio (although to think of Karaoke singers as pros is stretching it a bit :-). In consumer audio, latencies of up to 2 seconds may be permissible, while pro audio sets an upper limit of 20 milliseconds.

Java Sound is estimated to have a 50msec delay: "These measurements suggest that the latency introduced by buffers in the "Java Sound Audio Engine" is about 50 ms, independant of the sample rate." Now that's on old equipment, but it means there is an uphill struggle.

The sound quality of the builtin soundcard HDA Intel PCH (STAC92xx) on my Dell laptop is appalling. That has to be overcome too. This laptop doesn't have a microphone input, so I started looking at USB sound cards. My first attempt was with an AnPu Portable USB 3D Virtual 5.1 Audio Sound Card Adapter Blue from Dino Direct. Dino was good: delivery post-free within 2 weeks. But the card was cheap (A$4) and broke when I inadvertently yanked it out of the USB slot.

My second attempt was with Swamp Industries for an XLR to USB Adapter. That was about A$20 but I got it with a microphone as well. The service was good again. Well, the card's okay for input, but still has to go out through the onboard soundcard.

The third attempt was with a Sound Blaster X-Fi Surround 5.1 Pro at A$70. It's a USB 1.1 device (Linux still has issues with USB 2 devices, apparently). PulseAudio only recognises it as an input device, not as an output device, so it didn't seem to improve things.

PulseAudio is an audio layer above Alsa (OSS was used before Alsa). Alsa could see the device fine:

$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: STAC92xx Analog [STAC92xx Analog]
Subdevices: 0/1
Subdevice #0: subdevice #0
card 2: Pro [SB X-Fi Surround 5.1 Pro], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0

and

$ aplay -l
...
card 2: Pro [SB X-Fi Surround 5.1 Pro], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 2: Pro [SB X-Fi Surround 5.1 Pro], device 1: USB Audio [USB Audio #1]
Subdevices: 1/1
Subdevice #0: subdevice #0

Now about this time I went off on what turned out to be a wild goose chase (at least so far) by looking at Jack: "JACK is [a] system for handling real-time, low latency audio (and MIDI)". Jack currently also uses Alsa. Now that looks good - but Java Sound and Jack don't play together.

Java Sound has a couple of weird bits where "obviously equivalent" things aren't. I hit this first with volume control in playing a Midi file: you can't set the volume on the default device but you can if you iterate through the devices and select the default one. Then you can set the volume on it. Huh? Thanks to Greg Donahue for solving that one. You hit similar problems trying to find the sound cards and you end up either with

Java Sound not playing to your default card; or
Java Sound throwing an exception, when you explicitly select the default card, saying that its PulseAudio drivers can't find it.

So after all that, where are we?

Java Sound has latency problems
The inbuilt soundcard is crap
PulseAudio can't properly find the USB soundcard
Java Sound uses PulseAudio
PulseAudio has latency issues
Jack is ignored by Java Sound
Alsa and Jack can find the USB soundcards

Is it possible to have latency-free sound on Linux? Well, Jack claims to be latency-free, but then it has to go through the Alsa layer. Can the Alsa layer be latency-free? Not completely, but I finally figured out the following test:

arecord -f dat -B 4 -D hw:0| aplay -B 4 -D hw:2 -f dat -

i.e. record at DAT standard (16 bits, 48k samples per second, stereo) from the builtin mike (hw:0) and play it on the USB soundcard (hw:2), with the buffer time set to 4. (I took that to be 4 msec, but the man page gives the unit of -B as microseconds, so Alsa is presumably rounding it up to the smallest buffer the hardware allows.) And hey! It works! No latency that my poor ear can hear. This simple pipeline isn't perfect: any overrun introduces latency into the pipeline, but that can be handled in code by dropping samples. The buffer time can be increased and it still sounds okay - 4 was the lowest I could take it.
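The "dropping samples" idea could look roughly like this in Java: before each read, check how much captured audio is already queued, and throw away anything beyond a small backlog. This is a sketch only, not code from this project. The class name, method names and the backlog figure are made up for illustration; the only Java Sound calls used are available(), getFormat() and read() on a TargetDataLine.

```java
import javax.sound.sampled.TargetDataLine;

// Hypothetical helper: keeps the capture backlog short by discarding stale audio.
class BacklogTrimmer {
    /**
     * Number of bytes to discard so that at most maxBacklogBytes stay queued.
     * The result is rounded down to a whole number of frames.
     */
    static int bytesToDrop(int availableBytes, int maxBacklogBytes, int frameSize) {
        int excess = availableBytes - maxBacklogBytes;
        if (excess <= 0) {
            return 0;
        }
        return (excess / frameSize) * frameSize;
    }

    /** Reads and throws away the excess bytes waiting on an open capture line. */
    static void trim(TargetDataLine line, int maxBacklogBytes) {
        int frameSize = line.getFormat().getFrameSize();
        int drop = bytesToDrop(line.available(), maxBacklogBytes, frameSize);
        if (drop > 0) {
            byte[] scratch = new byte[drop];
            line.read(scratch, 0, drop);
        }
    }

    public static void main(String[] args) {
        // 4096 bytes queued, allow 768 bytes of backlog, 4-byte frames (16-bit stereo)
        System.out.println(bytesToDrop(4096, 768, 4));
    }
}
```

Dropped audio is audible as a small glitch, so the backlog limit is a trade-off between the occasional click and a delay that keeps growing.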

Conclusion:

the top-down approach through Java works but has latency issues
the bottom-up approach through Alsa handles latency

I just need to combine the two...

Saturday, 5 May 2012

java sound: midi and sampled streams played together

I've been playing around with my home Karaoke system. This has included decoding my Songken DKD disk. Once I did that, I wanted to emulate the behaviour of my Malata Karaoke player: playing the Midi files, showing the lyrics and also showing in a bar graph the notes that should be sung and the notes the performer is actually singing.

This requires processing files of Midi data, handling the soundcard microphone input and speaker output and using a GUI to show everything. Java Sound looks like a perfect choice for this as it can do all these things. (Although Oracle's custodianship of Java and their outrageous API copyright claims make it increasingly difficult to justify starting a new project using Java.)
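To compare what is sung against the Midi melody, a detected pitch in Hz has to be turned into a Midi note number. The standard equal-temperament mapping is note = 69 + 12 * log2(f / 440), with A4 = 440 Hz as note 69. A small sketch of that formula follows; the class and method names are made up for illustration and are not part of Java Sound or any pitch detection library.

```java
// Hypothetical helper: maps a detected frequency to the nearest Midi note number.
class PitchToNote {
    /** Nearest Midi note for a frequency in Hz, assuming A4 = 440 Hz = note 69. */
    static int frequencyToMidiNote(double hz) {
        if (hz <= 0) {
            throw new IllegalArgumentException("frequency must be positive");
        }
        // Distance from A4 in semitones: 12 semitones per doubling of frequency
        double semitonesFromA4 = 12.0 * Math.log(hz / 440.0) / Math.log(2.0);
        return (int) Math.round(69 + semitonesFromA4);
    }

    public static void main(String[] args) {
        System.out.println(frequencyToMidiNote(440.0));   // A4
        System.out.println(frequencyToMidiNote(261.63));  // middle C
    }
}
```

Rounding to the nearest note hides how sharp or flat the singer is; keeping the fractional part would allow a finer display.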

Playing a file of Midi data is easy:

    try {
        Sequence sequence = MidiSystem.getSequence(midiFile);
        Sequencer sequencer = MidiSystem.getSequencer();
        sequencer.open();
        sequencer.setSequence(sequence);
        sequencer.start();
    } catch (Exception e) {...}
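Showing the notes that should be sung means looking inside the Sequence as well as playing it. A rough sketch of one way to do that is to walk the tracks and collect the NOTE_ON events for one channel. The class name, the choice of channel 0 and the in-memory demo sequence are all assumptions for illustration; which channel carries the melody in a real file has to be found out separately.

```java
import java.util.ArrayList;
import java.util.List;
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiEvent;
import javax.sound.midi.MidiMessage;
import javax.sound.midi.Sequence;
import javax.sound.midi.ShortMessage;
import javax.sound.midi.Track;

// Hypothetical helper: pulls note numbers out of a Midi Sequence.
class MelodyNotes {
    /** Note numbers of all NOTE_ON events with velocity > 0 on the given channel. */
    static List<Integer> noteOns(Sequence sequence, int channel) {
        List<Integer> notes = new ArrayList<>();
        for (Track track : sequence.getTracks()) {
            for (int i = 0; i < track.size(); i++) {
                MidiMessage msg = track.get(i).getMessage();
                if (msg instanceof ShortMessage) {
                    ShortMessage sm = (ShortMessage) msg;
                    // NOTE_ON with velocity 0 is conventionally a note off, so skip it
                    if (sm.getCommand() == ShortMessage.NOTE_ON
                            && sm.getChannel() == channel
                            && sm.getData2() > 0) {
                        notes.add(sm.getData1());
                    }
                }
            }
        }
        return notes;
    }

    /** Builds a two-note sequence in memory so the sketch runs without a Midi file. */
    static Sequence demoSequence() {
        try {
            Sequence seq = new Sequence(Sequence.PPQ, 480);
            Track track = seq.createTrack();
            track.add(new MidiEvent(new ShortMessage(ShortMessage.NOTE_ON, 0, 60, 90), 0));
            track.add(new MidiEvent(new ShortMessage(ShortMessage.NOTE_OFF, 0, 60, 0), 480));
            track.add(new MidiEvent(new ShortMessage(ShortMessage.NOTE_ON, 0, 64, 90), 480));
            track.add(new MidiEvent(new ShortMessage(ShortMessage.NOTE_OFF, 0, 64, 0), 960));
            return seq;
        } catch (InvalidMidiDataException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(noteOns(demoSequence(), 0));
    }
}
```

With a real file, the Sequence returned by MidiSystem.getSequence(midiFile) can be passed in place of the demo sequence.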

Copying sound from the microphone to the speaker is a bit harder. You have to set up a TargetDataLine to read bytes from the microphone, set up a SourceDataLine to send bytes to the speaker and then copy bytes from the target to the source (yes, that's the correct way though the nomenclature is strange, copying from the target of the input mixer to the source of the output mixer) [based on code by Matthias Pfisterer]:

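    // Needs: import javax.sound.sampled.*; and import java.io.IOException;
    // FRAMES_PER_BUFFER is an int constant defined elsewhere in the class.
    private static AudioFormat getAudioFormat(){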
        float sampleRate = 44100.0F;
        //8000,11025,16000,22050,44100
        int sampleSizeInBits = 16;
        //8,16
        int channels = 1;
        //1,2
        boolean signed = true;
        //true,false
        boolean bigEndian = false;
        //true,false
        return new AudioFormat(sampleRate,
                   sampleSizeInBits,
                   channels,
                   signed,
                   bigEndian);
    }//end getAudioFormat

    public void playAudio() throws Exception {
        AudioFormat audioFormat;
        TargetDataLine targetDataLine;

        audioFormat = getAudioFormat();
        DataLine.Info dataLineInfo =
            new DataLine.Info(
                  TargetDataLine.class,
                  audioFormat);
        targetDataLine = (TargetDataLine)
            AudioSystem.getLine(dataLineInfo);

        targetDataLine.open(audioFormat,
                audioFormat.getFrameSize() * FRAMES_PER_BUFFER);
        targetDataLine.start();

        playAudioStream(new AudioInputStream(targetDataLine));
    } // playAudio

    /** Plays audio from the given audio input stream. */
    public void playAudioStream( AudioInputStream audioInputStream ) {
        // Audio format provides information like sample rate, size, channels.
        AudioFormat audioFormat = audioInputStream.getFormat();

        // Open a data line to play our type of sampled audio.
        // Use SourceDataLine for play and TargetDataLine for record.
        DataLine.Info info = new DataLine.Info( SourceDataLine.class,
             audioFormat );
        if ( !AudioSystem.isLineSupported( info ) ) {
            System.out.println( "Play.playAudioStream does not handle this type of audio on this system." );
            return;
        }

        try {
            SourceDataLine dataLine = (SourceDataLine) AudioSystem.getLine( info );

            dataLine.open( audioFormat,
               audioFormat.getFrameSize() * FRAMES_PER_BUFFER);

            // Allows the line to move data in and out to a port.
            dataLine.start();

            // Create a buffer for moving data from the audio stream to the line.
            int bufferSize = audioFormat.getFrameSize() * FRAMES_PER_BUFFER;
            // See http://docs.oracle.com/javase/6/docs/technotes/guides/sound/programmer_guide/chapter5.html
            // for recommendation about buffer size
            byte [] buffer = new byte[bufferSize / 5];

            // Move the data until done or there is an error.
            try {
                int bytesRead = 0;
                while ( bytesRead >= 0 ) {
                    bytesRead = audioInputStream.read( buffer, 0, buffer.length );
                    if ( bytesRead >= 0 ) {
                        // write() returns the number of bytes (not frames) written
                        dataLine.write( buffer, 0, bytesRead );
                    }
                } // while
            } catch ( IOException e ) {
                e.printStackTrace();
            }
            dataLine.drain();
            dataLine.close();
        } catch ( LineUnavailableException e ) {
            e.printStackTrace();
        }
    } // playAudioStream

Now both of those work okay, picking up default devices, mixers, data lines, etc. You have to be careful running the copy code from microphone to speaker - you can set up a howling feedback loop between your laptop's microphone and speaker if you don't use, say, headphones.

There is a detectable latency (delay between the sounds) between talking/singing into the microphone and getting sound out of the speaker, but it is acceptable. But when you put the two pieces of code in the same program - even in different threads - then the latency blows out and the result isn't acceptable after all. There is a distinct delay between the input and the output sounds. Processing the Midi data somehow interferes with processing the sampled data and introduces additional delays which make it unusable.
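To put rough numbers on the delay: the latency contributed by one buffer is its length in frames divided by the frame rate. The sketch below uses the same format as getAudioFormat() above. The class and method names are made up for illustration, and the real delay will be larger because the sound server and the hardware add buffers of their own.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper: converts between buffer sizes and the delay they imply.
class BufferLatency {
    /** Delay in milliseconds represented by a buffer of the given number of frames. */
    static double bufferMillis(AudioFormat format, int frames) {
        return 1000.0 * frames / format.getFrameRate();
    }

    /** Smallest number of frames that covers the given duration. */
    static int framesFor(AudioFormat format, double millis) {
        return (int) Math.ceil(millis * format.getFrameRate() / 1000.0);
    }

    /** Buffer size in bytes for the given duration, as open() expects it. */
    static int bytesFor(AudioFormat format, double millis) {
        return framesFor(format, millis) * format.getFrameSize();
    }

    public static void main(String[] args) {
        // 44.1 kHz, 16 bit, mono, signed, little-endian
        AudioFormat format = new AudioFormat(44100.0F, 16, 1, true, false);
        System.out.println("441 frames = " + bufferMillis(format, 441) + " ms");
        System.out.println("20 ms = " + framesFor(format, 20.0) + " frames, "
                + bytesFor(format, 20.0) + " bytes");
    }
}
```

The byte count is what would be passed as the second argument of open() on a TargetDataLine or SourceDataLine, in place of audioFormat.getFrameSize() * FRAMES_PER_BUFFER.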

I looked around on the Web, and read all the Sun/Oracle documentation that I could find, but couldn't find anything talking about this problem in the context of the Java Sound API. I've now found a solution (even if it isn't totally portable) so that is why I'm writing this blog.

The above code leaves almost everything to defaults. So the Midi code must be re-setting some default used by the sampled data code. The most likely candidate is the output Mixer, but you can't get from the SourceDataLine to its Mixer, and the Midi API nowhere gives you access to things like Mixers. Digging around in the OpenJDK source code showed lots of interesting things such as the Midi code setting its Midi-processing thread loop to a very high priority but I ran out of steam before finding the link between the two processing streams. The com.sun.media.sound package has a bunch of software mixers - the answer is probably in there somewhere.

So I looked at the mixers available. The following function shows how:

public void listMixers() {
    try{
        Mixer.Info[] mixerInfo =
            AudioSystem.getMixerInfo();
        System.out.println("Available mixers:");
        for(int cnt = 0; cnt < mixerInfo.length; cnt++){
            System.out.println(mixerInfo[cnt].getName());
        }//end for loop
    } catch(Exception e) {
        e.printStackTrace();
    }
}

On my laptop running Fedora 16 this lists

Available mixers:
PulseAudio Mixer
default [default]
PCH [plughw:0,0]
NVidia [plughw:1,3]
NVidia [plughw:1,7]
NVidia [plughw:1,8]
Port PCH [hw:0]
Port NVidia [hw:1]

There's a default mixer which I don't want, several hardware mixers and a PulseAudio one. PulseAudio is the audio system on most current Linux systems so I get the best (Linux) portability by choosing that one, while avoiding whatever default Java Sound gives me.

Do these mixers support source lines and target lines? This shows the full list:

            System.out.println("Available mixers:");
            for(int cnt = 0; cnt < mixerInfo.length; cnt++){
                System.out.println(mixerInfo[cnt].getName());

                Mixer mixer = AudioSystem.getMixer(mixerInfo[cnt]);
                Line.Info[] sourceLines = mixer.getSourceLineInfo();
                for (Line.Info s: sourceLines) {
                    System.out.println(" Source line: " + s.toString());
                }
                Line.Info[] targetLines = mixer.getTargetLineInfo();
                for (Line.Info t: targetLines) {
                    System.out.println(" Target line: " + t.toString());
                }
            }//end for loop

This shows results like

PulseAudio Mixer
    Source line: interface SourceDataLine supporting 42 audio formats, and buffers of 0 to 1000000 bytes
    Source line: interface Clip supporting 42 audio formats, and buffers of 0 to 1000000 bytes
    Target line: interface TargetDataLine supporting 42 audio formats, and buffers of 0 to 1000000 bytes

(Note that you have to ask for mixer.getSourceLineInfo() - asking for mixer.getSourceLines() only shows the open lines and there will be none of those till you open them!)

I leave the Midi code alone. I don't need to mess with it. The sampled data I handle this way:

            Mixer.Info[] mixerInfo = AudioSystem.getMixerInfo();
            Mixer mixer = null;
            for(int cnt = 0; cnt < mixerInfo.length; cnt++){
                if (mixerInfo[cnt].getName().equals("PulseAudio Mixer")) {
                    mixer = AudioSystem.getMixer(mixerInfo[cnt]);
                    break;
                }
            }//end for loop
            if (mixer == null) {
                System.out.println("can't find a PulseAudio mixer");
            } else {
                Line.Info[] lines = mixer.getSourceLineInfo();
                if (lines.length >= 1) {
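                    try {
                        // Ask the PulseAudio mixer itself for the line, so that
                        // no other mixer can be picked instead
                        dataLine = (SourceDataLine) mixer.getLine(lines[0]);
                        System.out.println("Got a Pulse Audio source line");
                    } catch(Exception e) {
                        e.printStackTrace();
                    }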
                } else {
                    System.out.println("no source lines for this mixer " +
                                                     mixer.toString());
                }
            }

And that's it! I can now write to this SourceDataLine and my sampled data is going straight to the Linux sound mixer, bypassing whatever the Java Sound Midi system is doing. Latency problem solved.
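The same idea can be packaged as a pair of helpers that also pin down the format and buffer size instead of taking the first line the mixer offers. This is a sketch under assumptions: the class and method names are made up, the 1024-frame buffer is just an example value, and the mixer name "PulseAudio Mixer" is the one reported on my system and may differ elsewhere.

```java
import java.util.Optional;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.Mixer;
import javax.sound.sampled.SourceDataLine;

// Hypothetical helper: opens an output line on a mixer chosen by name.
class NamedMixerLine {
    /** Finds the first mixer whose name matches exactly, if there is one. */
    static Optional<Mixer> findMixer(String name) {
        for (Mixer.Info info : AudioSystem.getMixerInfo()) {
            if (info.getName().equals(name)) {
                return Optional.of(AudioSystem.getMixer(info));
            }
        }
        return Optional.empty();
    }

    /** Opens a SourceDataLine on the given mixer with an explicit buffer size. */
    static SourceDataLine openOutput(Mixer mixer, AudioFormat format, int bufferFrames)
            throws LineUnavailableException {
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
        // mixer.getLine throws IllegalArgumentException if the format is unsupported
        SourceDataLine line = (SourceDataLine) mixer.getLine(info);
        line.open(format, bufferFrames * format.getFrameSize());
        return line;
    }

    public static void main(String[] args) throws LineUnavailableException {
        AudioFormat format = new AudioFormat(44100.0F, 16, 1, true, false);
        Optional<Mixer> mixer = findMixer("PulseAudio Mixer");
        if (mixer.isPresent()) {
            SourceDataLine line = openOutput(mixer.get(), format, 1024);
            System.out.println("Opened with buffer of " + line.getBufferSize() + " bytes");
            line.close();
        } else {
            System.out.println("can't find a PulseAudio mixer");
        }
    }
}
```

The line may not honour the requested buffer size exactly, which is why the sketch prints getBufferSize() after opening.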

Now on to the next steps...