Comparison of Streaming Formats

The latest version of SecuritySpy supports new streaming formats which significantly enhance compatibility with new and existing network cameras. The following information about these formats will be useful when making purchasing decisions and setting up video surveillance systems based upon SecuritySpy.

Network Streaming Protocols
There are two main protocols used for carrying video and audio data over IP networks: HTTP and RTSP. Using these protocols, it is possible to transmit video and audio in various compression formats (JPEG, MPEG-4, H.264, AAC etc.).

HTTP is the foundation of communication on the World Wide Web, used for transferring web pages and other associated files between web servers and clients. HTTP has long been established as a method of transmitting JPEG video streams – based on a data streaming method invented by Netscape in 1998, and since then adopted as an unofficial standard and supported by most web browsers. This is the primary format upon which SecuritySpy was built prior to version 3.0.
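
To give a feel for what a JPEG-over-HTTP stream looks like to the receiving software, here is a minimal Python sketch that splits a buffer of received bytes into individual JPEG images by scanning for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers. This is an illustration under simplifying assumptions, not a description of how SecuritySpy works: real receivers normally rely on the stream's boundary lines and length headers, and the function name split_jpeg_frames is hypothetical.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_jpeg_frames(buffer: bytes):
    """Return (frames, remainder) from a buffer holding a JPEG stream.

    Naive marker scan for illustration only. It assumes the images carry no
    embedded thumbnails, which would contain their own SOI/EOI markers.
    """
    frames = []
    pos = 0
    while True:
        start = buffer.find(SOI, pos)
        if start == -1:
            # No further frame starts; keep a trailing 0xFF in case a
            # marker was split across two network reads.
            tail = buffer[pos:]
            return frames, (tail[-1:] if tail.endswith(b"\xff") else b"")
        end = buffer.find(EOI, start + 2)
        if end == -1:
            # Incomplete frame: hand it back so the caller can append
            # the next chunk of data and try again.
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        pos = end + 2
```

Each returned frame is a complete, independent image, which is why a receiver can display or discard any one of them without affecting the others.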

1998 was apparently a good year for video streaming, because it was also the year that the RTSP protocol was invented. While HTTP is designed for general web content, RTSP is designed specifically for media streaming. Unlike HTTP, RTSP is not limited to carrying just JPEG video data; it can conceivably carry video and audio of any compression format.

Despite the long-standing availability of RTSP as a standard, it has only been relatively recently that it has become widely employed in network cameras. For many years, the processing capabilities of network cameras were not good enough to produce the more CPU-intensive compression formats MPEG-4 and H.264, so manufacturers tended to stick to the simplest format with the widest web-browser compatibility, which is JPEG-over-HTTP. But now, with improvements in embedded processing capability, and customer demand for more sophisticated and mobile-friendly compression formats, there has been an explosion in MPEG-4-over-RTSP and H.264-over-RTSP implementation in network cameras.

One drawback of RTSP is the fact that it operates on a different port to HTTP (554 compared to 80), and that it is sometimes blocked by proxy servers and firewalls. In response to this, Apple have described a method for tunnelling RTSP data within an HTTP connection, which resolves these issues. This method, called RTSP tunnelling or RTSP-over-HTTP, is supported by many network cameras, and I would expect its use to become more dominant especially now it has been endorsed by the ONVIF standard.
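
Like HTTP, RTSP 1.0 is a text-based protocol, so the difference in default ports is easy to show in a few lines. The Python sketch below resolves the port for a camera URL and builds a minimal RTSP OPTIONS request. The camera addresses in the usage note and the helper names endpoint and build_options_request are made up for illustration.

```python
from urllib.parse import urlsplit

# Default ports mentioned above: RTSP on 554, HTTP on 80.
DEFAULT_PORTS = {"rtsp": 554, "http": 80}


def endpoint(url: str):
    """Return (host, port), falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


def build_options_request(url: str, cseq: int = 1) -> bytes:
    """Build a minimal RTSP/1.0 OPTIONS request as raw bytes."""
    lines = [f"OPTIONS {url} RTSP/1.0", f"CSeq: {cseq}", "", ""]
    return "\r\n".join(lines).encode("ascii")
```

For a hypothetical camera at rtsp://192.168.1.20/stream1, endpoint() gives port 554, which is the port a firewall would need to allow unless the camera's RTSP-over-HTTP option is used instead.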

In response to this changing landscape, SecuritySpy now supports the following formats:

HTTP
Video: JPEG, JPEG 2000
Audio: PCM, G.711 (µ-law and a-law), G.726

RTSP
Video: JPEG, MPEG-4, H.264
Audio: PCM, G.711 (µ-law and a-law), G.726, AAC, AMR

This covers the vast majority of formats that are produced by modern network cameras.

Which format is best?
When making decisions about which video compression format to use (JPEG vs. MPEG-4 vs. H.264), it is important to bear in mind that one is not necessarily “better” than the others; they all have their advantages and disadvantages.

MPEG-4 and H.264 differ significantly from JPEG in that they are both temporally compressed formats; that is, the video sequence comprises one I-frame (key frame), which encodes one entire image, followed by multiple P-frames (delta frames), which encode only changes in the image since the previous frame. This strategy results in a much lower data rate compared with JPEG, especially for video surveillance footage where the majority of the image often remains the same. The more P-frames that exist between the I-frames (known as the I-frame rate, key frame rate, or GOV length), the lower the data rate will be.
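
The effect of the GOV length on data rate is simple arithmetic, sketched below in Python. The frame sizes in the usage note are invented purely for illustration, and I am assuming here that the GOV length counts the I-frame plus the P-frames that follow it.

```python
def average_frame_bytes(i_frame_bytes: float, p_frame_bytes: float,
                        gov_length: int) -> float:
    """Average bytes per frame for one I-frame plus (gov_length - 1) P-frames."""
    if gov_length < 1:
        raise ValueError("gov_length must be at least 1")
    total = i_frame_bytes + (gov_length - 1) * p_frame_bytes
    return total / gov_length
```

With a hypothetical 50,000-byte I-frame and 5,000-byte P-frames, a GOV length of 1 (every frame a key frame) averages 50,000 bytes per frame, a GOV length of 10 averages 9,500, and a GOV length of 20 averages 7,250. The saving grows with the GOV length, but with diminishing returns.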

There is often an impression that MPEG-4 and H.264 are therefore “lower quality” formats compared to JPEG, due to this extra temporal compression. However, this is not the case: SecuritySpy provides an adjustable quality setting for each format, at the lower end of which the image will look quite degraded, but at the higher end of which the image will be indistinguishable from the original uncompressed image. At all points where the quality of the JPEG video is perceived to be equivalent to that of the MPEG-4/H.264 video, the data rate of the MPEG-4/H.264 video will be much lower than that of the JPEG video. As a (very) rough rule of thumb, the data rate of MPEG-4 is around a fifth that of JPEG video, and the data rate of H.264 is around half that of MPEG-4, at equivalent perceived quality.
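
Expressed as numbers, this rule of thumb puts MPEG-4 at roughly one fifth and H.264 at roughly one tenth of the JPEG data rate. The Python sketch below simply applies those ratios; they are only as accurate as the rule of thumb itself, and the 20 Mbit/s figure in the usage note is an arbitrary example.

```python
# Approximate data rate relative to JPEG at equivalent perceived quality,
# taken from the (very) rough rule of thumb above.
RELATIVE_RATE = {"JPEG": 1.0, "MPEG-4": 1 / 5, "H.264": 1 / 10}


def estimate_rate(jpeg_rate_mbps: float, fmt: str) -> float:
    """Estimate the data rate in Mbit/s for fmt, given the JPEG rate."""
    return jpeg_rate_mbps * RELATIVE_RATE[fmt]
```

A camera producing 20 Mbit/s of JPEG video would, by this estimate, need around 4 Mbit/s as MPEG-4 or around 2 Mbit/s as H.264.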

H.264 achieves this extra saving in part by employing B-frames in addition to P-frames. B-frames depend not only on the previous image in the sequence, but also on the next image. As you can imagine, this increase in complexity has costs in terms of the processing power required to encode and decode the data.

So, SecuritySpy can receive incoming JPEG, MPEG-4, or H.264 video data, and can additionally be set to save that data to disk directly, without any further encoding, or it can re-encode the video to JPEG, MPEG-4, or H.264 formats. This gives users ultimate control over how the data is processed and saved. Here are the main considerations you need to bear in mind:

Disk space for captured files
Saving JPEG data results in the largest files; H.264 the smallest. If you have no need to transmit the captured files over the internet, it is best to implement a large storage capacity (large-capacity hard disks are inexpensive these days), and have SecuritySpy receive and directly save JPEG data to disk. JPEG is the quickest format to process, resulting in the lowest CPU utilisation and therefore the highest performance (in terms of the total number of frames per second that SecuritySpy can process from all cameras).

If SecuritySpy is receiving JPEG data from a camera, but you need small captured file sizes, it is best to set SecuritySpy to encode this data as MPEG-4 (H.264 can also be used, but it is only viable for a small number of cameras due to its very high memory and CPU requirements).
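
To size the storage for a given setup, the data rate can be converted into disk space with a little arithmetic. This Python sketch assumes continuous recording at a constant average data rate and uses decimal gigabytes; the function names and the figures in the usage note are illustrative only.

```python
def storage_gb(rate_mbps: float, hours: float, cameras: int = 1) -> float:
    """Disk space in GB (10**9 bytes) for continuous recording."""
    bytes_per_second = rate_mbps * 1_000_000 / 8
    return bytes_per_second * hours * 3600 * cameras / 1e9


def retention_days(disk_gb: float, rate_mbps: float, cameras: int = 1) -> float:
    """Days of continuous footage that fit on a disk of the given size."""
    return disk_gb / storage_gb(rate_mbps, 24, cameras)
```

For example, one camera at 4 Mbit/s fills about 43.2 GB per day, so four such cameras would fill a 1,000 GB disk in a little under six days.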

CPU usage
Saving the incoming data directly to disk without re-encoding typically results in the lowest CPU usage and highest quality possible. If SecuritySpy is receiving MPEG-4 or H.264 data from a camera, it makes sense to save this data directly to disk without re-encoding. However, there are two disadvantages in doing this. The first is that SecuritySpy cannot modify the images in any way, so it cannot apply a text overlay (date and time) or transformation (rotation or flipping), or change the compression quality. Therefore all such settings must be set in the camera itself, not SecuritySpy. Secondly, for MPEG-4 and H.264 data, being temporally compressed, SecuritySpy cannot change the frame rate of the video; all video capture must be at whatever rate is being supplied by the camera.

Hence with temporally-compressed formats such as MPEG-4 and H.264, SecuritySpy must decode/use every single frame in order to maintain the integrity of the video stream. With JPEG data, however, SecuritySpy need decode/use only those frames that are required by the current conditions and can safely ignore those that aren’t (this is another reason why processing JPEG data generally requires less CPU power than MPEG-4 or H.264). Furthermore, if the computer becomes overloaded and has to discard frames, JPEG captures will slow down but otherwise be unaffected, whereas MPEG-4 and H.264 captures will become corrupt (for the period until the next key frame). Therefore, when using these formats, it is very important to make sure the computer is not overloaded.
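
The difference in behaviour when frames are discarded can be shown with a toy model in Python. It treats a stream as a string of "I" (key) and "P" (delta) frames and assumes each P-frame depends only on the frame immediately before it. This is a simplification of real decoders, and the function name is hypothetical.

```python
def decodable_frames(frame_types: str, dropped: set) -> list:
    """Return the indices of frames that can still be decoded.

    frame_types is a string such as "IPPPIPPP". A P-frame is decodable
    only if the frame before it was decoded; an I-frame always is.
    """
    ok = []
    chain_intact = False
    for index, kind in enumerate(frame_types):
        if index in dropped:
            chain_intact = False  # everything up to the next I-frame is lost
            continue
        if kind == "I":
            chain_intact = True
        if chain_intact:
            ok.append(index)
    return ok
```

Dropping frame 1 from "IPPPIPPP" leaves only frames 0, 4, 5, 6 and 7 usable, because frames 2 and 3 have lost their reference. A JPEG stream behaves like "IIIIIIII": dropping frame 1 loses that frame and nothing else.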

Compared to video streams, the data rate of audio streams is generally much lower, so which audio format to choose is a much less important consideration. Network cameras typically supply one channel of audio (mono) and use a sample rate of 8kHz, which satisfactorily encodes speech at a low data rate.

PCM data is basically uncompressed, so will result in the highest data rate. G.711 uses a very fast encoding scheme that reduces this data rate by a factor of two without losing too much quality. G.726, AMR and AAC offer the lowest data rates of the bunch (albeit with higher CPU usage), but while G.726 and AMR are limited to low-fidelity audio (they were primarily designed for encoding speech for telephone networks), AAC is capable of high-fidelity audio, so is the best choice when quality is the primary consideration. I would generally recommend setting SecuritySpy to capture the audio from the camera directly to disk with no re-encoding, except when the incoming audio data is PCM, in which case I would use G.711 compression in SecuritySpy due to its low CPU usage.
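
The factor-of-two saving from G.711 can be checked with a quick calculation in Python. G.711 stores 8 bits per sample; the 16 bits per sample used for PCM here is my assumption of a typical value, as cameras may differ.

```python
def audio_bit_rate(sample_rate_hz: int, bits_per_sample: int,
                   channels: int = 1) -> int:
    """Data rate in bits per second for uncompressed or per-sample coded audio."""
    return sample_rate_hz * bits_per_sample * channels


pcm_rate = audio_bit_rate(8000, 16)   # assumed 16-bit PCM, mono, 8 kHz
g711_rate = audio_bit_rate(8000, 8)   # G.711 uses 8 bits per sample
```

This gives 128,000 bits per second for PCM and 64,000 bits per second for G.711, both of which are tiny next to a typical video stream.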

It is difficult to speak in general terms because there is so much variation in customers’ requirements and budgets, however if I were to set up a general-purpose video surveillance system, I would use the following:

  • Employ cameras capable of sending H.264 video data over RTSP (plus audio in any format).
  • Set up a text overlay of the date and time in the camera itself (ideally linked to a network time server – for example – for accuracy).
  • Set up the correct frame rate for the video in the camera itself (around 10fps, with a key frame rate of 10-20; a value of between 1x and 2x the frame rate is generally good).
  • Set SecuritySpy to receive these H.264 streams, and record them directly to disk.
  • Test the system to make sure the computer can comfortably handle the incoming video; if it’s overloaded, reduce the frame rates in the cameras.
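
The key frame guideline in the list above (between 1x and 2x the frame rate) can be written out as a small Python helper. The function names are hypothetical, and the result is a starting point to enter into the camera's settings rather than a fixed rule.

```python
def key_frame_interval_range(fps: float):
    """Suggested key frame interval in frames: 1x to 2x the frame rate."""
    return round(fps), round(2 * fps)


def seconds_between_key_frames(interval_frames: int, fps: float) -> float:
    """Time between key frames for a given interval and frame rate."""
    return interval_frames / fps
```

At 10fps this gives an interval of 10 to 20 frames, which means a key frame arrives every one to two seconds. That is also roughly how long a capture would stay corrupt if a frame were lost.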

I hope this post helps to make sense of what can be a complex topic. Please email us with any questions.

11 thoughts on “Comparison of Streaming Formats”

  1. Matt

    I have a Vivotek 360 IP camera (5 megapixels) and am extremely happy to see native format H264/AAC/MPEG4 streamed capture in version 3.

    I have gone from Motion JPEG -> MPEG on-the-fly conversion, consuming up to 120GB per 6-hour segment without audio and at a variable frame rate, to native 15fps 1920×1920 H.264 with audio, at 25GB per 6-hour segment.

    I believe this is the only Mac solution to offer native h264/mpeg support. Highly recommended !

  2. Nunuv Yurbiz

    If I’m recording in MPEG4 format, which stream is best to connect to, the camera’s MPEG4 stream or H264 stream (assuming I am not streaming straight to disk but rather SecuritySpy is encoding)?

    1. bensoftware Post author

      In this case the better stream to receive from the camera is MPEG-4, as it’s faster for the computer to decompress compared to H.264. Better still would be to receive JPEG format video from the camera, as this is faster still, being a simpler format (although higher-bandwidth so not ideal for a slow network). Best would be to receive the MPEG-4 video and have SecuritySpy capture this directly to disk, but there may be good reasons why your system design doesn’t allow this.

  3. Gavin Kopp

    If I were to set up approximately 12 megapixel cameras and wanted to view via iphone, would it be best to be set up in jpeg?

    1. bensoftware Post author

      Hi Gavin,

      I would say it’s best to get cameras that support either MPEG-4 or H.264 compression with SecuritySpy, and set SecuritySpy to use either of these streams. Use the advice in our user manual to capture the data from the cameras directly to disk with no re-encoding. This will result in the best performance and quality.

      SecuritySpy will deliver compatible streams to the iPhone, no matter what the input formats to SecuritySpy are. The above setup will result in much lower CPU usage and network bandwidth usage compared to using JPEG from the cameras.

      Hope this helps,

  4. Marc

    Hello, I was hoping that you could provide some guidance on a proper setup for my scenario. We have a location with 18 cameras; they are a mix of Y-Cam VGA Bullet and Y-Cam POE Range and a few Y-Cam Bullets 720P. Currently using a Mac mini Server Mid 2011 on Mavericks with 16GB RAM, 2.0GHz Core i7 and 750GB for storage. All cameras are connected via POE. What would be the ideal camera and SecuritySpy settings, or what system would you recommend if you feel the Mac mini will be overwhelmed? Thank you!

    1. bensoftware Post author

      Hi Marc,

      A Mac mini should be ideal for this setup. It sounds like most of your cameras are standard-resolution (VGA), and this isn’t very taxing on the computer’s resources. Check out our System Requirements Calculator, which will give you a good idea of how powerful the Mac mini will need to be.

      Hope this helps,

  5. Ken

    Hi, thanks for your blog post, it’s been really helpful while I’m trying to understand the whole landscape of video streaming. I’m trying to set up a video stream for a mobile website that would be able to run on both Android and iOS phones, and I’m wondering what kind of setup (protocols and types of codecs and video containers) you would suggest, or if you have recommendations on related topics to look into?

    Thank you!

    a little more context:
    I’m live streaming video of plants to help gardeners monitor their plants so I would expect a small number (less than 30) of users/clients (the owner(s) of the garden and maybe workers) to be requesting the live stream of their plant from their phones. And I also read that iOS is only compatible with HLS protocol but saw from your post that you can use rtsp tunneling as well. I’m running a usb camera through a raspberry pi but would be open to using ip cameras too if that’s going to be too much for the pi to handle.

    1. bensoftware Post author

      Hi Ken, please check out our Adding live video to any web page blog post, which details a few methods for adding live video from your SecuritySpy server into one of your own web pages. I would recommend the “JavaScript JPEG” method for your case, as it is compatible with virtually all viewing clients. If you’re planning to stream directly from your pi to the clients (rather than via SecuritySpy), the same method will work as long as the pi can provide a URL for accessing the camera’s current JPEG. However, having multiple clients viewing simultaneously will put a significant load on the server, which SecuritySpy will be able to comfortably handle but the pi might not, so I would recommend going through SecuritySpy in any case.

  6. Tom

    I am just learning about IP cameras, and I’m a little confused about the terminology for MPEG-4. When you refer to MPEG-4, e.g. “which video compression format to use (JPEG vs. MPEG-4 vs. H.264)”, are you talking about MPEG-4 encoding with something like Xvid, i.e. MPEG-4 Visual encoding, or something else? I ask because I’ve not seen any cameras that support MPEG-4 encoding (though I saw something that supports MPEG-2, i.e. MPEG-2 Part 2).

    I am finding this all very confusing…. 😉

    1. Ben Software Post author

      It’s almost designed to be confusing 🙂

      When I say MPEG-4 in the above context, I’m referring to the MPEG-4 Part 2 video codec, also called “MPEG-4 Visual”.

      H.264 is a more recent and more advanced codec. It is also part of the MPEG-4 specification (specifically MPEG-4 Part 10). It’s also sometimes called “Advanced Video Coding” or “MPEG-4 AVC”.

      Even more confusingly, “MPEG-4” can also refer to the MP4 container format (i.e. movie files with the extension .mp4 or .m4v). This comes from part 14 of the specification, and basically describes the format of the metadata (e.g. frame durations and positions in the file). MP4 files can contain a number of different video and audio compression formats, though these are usually H.264 for video and AAC for audio. The MP4 container format is basically the same as the QuickTime file format.

      So basically “MPEG-4” is a big suite of specifications that is made up of many parts, so specifically what is meant by “MPEG-4” depends upon the context in which the term is used. In the context of video encoding formats (codecs), the way I use the terms MPEG-4 (to refer to the MPEG-4 Part 2 video codec) and H.264 (to refer to the MPEG-4 Part 10 video codec) mirrors the way that Apple refers to these codecs.

