The latest version of SecuritySpy supports new streaming formats which significantly enhance compatibility with new and existing network cameras. The following information about these formats will be useful when making purchasing decisions and setting up video surveillance systems based upon SecuritySpy.
Network Streaming Protocols
There are two main protocols used for carrying video and audio data over IP networks: HTTP and RTSP. Using these protocols, it is possible to transmit video and audio in various compression formats (JPEG, MPEG-4, H.264, AAC etc.).
HTTP is the foundation of communication on the World Wide Web, used for transferring web pages and other associated files between web servers and clients. HTTP has long been established as a method of transmitting JPEG video streams, based on a data streaming method invented by Netscape in 1998 that has since been adopted as an unofficial standard and is supported by most web browsers. Prior to version 3.0, this was the primary format upon which SecuritySpy was built.
1998 was apparently a good year for video streaming, because it was also the year that the RTSP protocol was invented. While HTTP is designed for general web content, RTSP is designed specifically for media streaming. Unlike HTTP, RTSP is not limited to carrying just JPEG video data; it can conceivably carry video and audio of any compression format.
Despite the long-standing availability of RTSP as a standard, it has only relatively recently become widely employed in network cameras. For many years, the processing capabilities of network cameras were not good enough to produce the more CPU-intensive compression formats MPEG-4 and H.264, so manufacturers tended to stick to the simplest format with the widest web-browser compatibility, which is JPEG-over-HTTP. But now, with improvements in embedded processing capability and customer demand for more sophisticated and mobile-friendly compression formats, there has been an explosion in MPEG-4-over-RTSP and H.264-over-RTSP implementations in network cameras.
One drawback of RTSP is that it operates on a different port from HTTP (554 rather than 80), and so it is sometimes blocked by proxy servers and firewalls. In response to this, Apple have described a method for tunnelling RTSP data within an HTTP connection, which resolves these issues. This method, called RTSP tunnelling or RTSP-over-HTTP, is supported by many network cameras, and I would expect its use to become more dominant, especially now that it has been endorsed by the ONVIF standard.
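To make the port difference concrete, here is a minimal Python sketch that works out which TCP port a camera URL implies and builds the simplest RTSP request (OPTIONS). The camera addresses are made-up examples and the helper names are my own; nothing is actually sent over the network, and this is not how SecuritySpy itself is implemented.

```python
from urllib.parse import urlsplit

# Default ports mentioned above: RTSP uses 554, HTTP uses 80.
DEFAULT_PORTS = {"rtsp": 554, "http": 80}


def endpoint(url):
    """Return (host, port) for a camera URL, applying the protocol default."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


def options_request(url, cseq=1):
    """Build the bytes of a bare RTSP OPTIONS request for the given URL."""
    return f"OPTIONS {url} RTSP/1.0\r\nCSeq: {cseq}\r\n\r\n".encode("ascii")


# Hypothetical camera addresses, for illustration only.
print(endpoint("rtsp://camera.example/stream"))       # RTSP default port
print(endpoint("http://camera.example:8080/video"))   # explicit port wins
print(options_request("rtsp://camera.example/stream"))
```

A firewall that only allows port 80 would block the first endpoint, which is the problem RTSP-over-HTTP tunnelling is meant to work around.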
In response to this changing landscape, SecuritySpy now supports the following formats:
HTTP
Video: JPEG, JPEG 2000
Audio: PCM, G.711 (µ-law and a-law), G.726
RTSP / RTSP-over-HTTP
Video: JPEG, MPEG-4, H.264
Audio: PCM, G.711 (µ-law and a-law), G.726, AAC, AMR
This covers the vast majority of formats that are produced by modern network cameras.
Which format is best?
When making decisions about which video compression format to use (JPEG vs. MPEG-4 vs. H.264), it is important to bear in mind that one is not necessarily “better” than the others; they all have their advantages and disadvantages.
MPEG-4 and H.264 differ significantly from JPEG in that they are both temporally compressed formats; that is, the video sequence comprises one I-frame (key frame), which encodes one entire image, followed by multiple P-frames (delta frames), which encode only the changes in the image since the previous frame. This strategy results in a much lower data rate compared with JPEG, especially for video surveillance footage, where the majority of the image often remains the same. The more P-frames that exist between the I-frames (a setting known as the I-frame rate, key frame rate, or GOV length), the lower the data rate will be.
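The effect of the GOV length on data rate can be sketched with some simple arithmetic. The frame sizes below are invented purely for illustration (real sizes depend on the camera, resolution, and scene), and I am taking the GOV length to mean the total number of frames in each group: one I-frame plus the P-frames that follow it.

```python
def average_frame_bytes(i_frame_bytes, p_frame_bytes, gov_length):
    """Average size of a frame in a group of one I-frame and
    (gov_length - 1) P-frames."""
    if gov_length < 1:
        raise ValueError("gov_length must be at least 1")
    total = i_frame_bytes + (gov_length - 1) * p_frame_bytes
    return total / gov_length


def data_rate_kbps(i_frame_bytes, p_frame_bytes, gov_length, fps):
    """Approximate stream data rate in kilobits per second."""
    return average_frame_bytes(i_frame_bytes, p_frame_bytes, gov_length) * fps * 8 / 1000


# Assumed sizes: a 50 kB I-frame and 5 kB P-frames, at 10 frames per second.
for gov in (1, 10, 20):
    print(gov, data_rate_kbps(50_000, 5_000, gov, fps=10))
```

With a GOV length of 1 every frame is a key frame, which is effectively what a JPEG stream is; lengthening the group pulls the average towards the size of a P-frame.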
There is often an impression that MPEG-4 and H.264 are therefore “lower quality” formats compared to JPEG, due to this extra temporal compression. This is not the case: SecuritySpy provides an adjustable quality setting for each format, at the lower end of which the image will look quite degraded, but at the higher end of which the image will be indistinguishable from the original uncompressed image. At any point where the quality of the JPEG video and the MPEG-4/H.264 video is perceived to be equivalent, the data rate of the MPEG-4/H.264 video will be much lower than that of the JPEG video. As a (very) rough rule of thumb, the data rate of MPEG-4 is around a fifth that of JPEG video, and the data rate of H.264 is around half that of MPEG-4, at equivalent perceived quality.
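Applying that rule of thumb is just multiplication, but it can help to see it worked through. The sketch below takes a JPEG data rate and estimates the equivalent MPEG-4 and H.264 rates, along with the disk space a day of continuous recording would need. The 10 Mbit/s starting figure is an arbitrary example, and the ratios are only the rough ones quoted above.

```python
# Rough relative data rates at equivalent perceived quality:
# MPEG-4 is about a fifth of JPEG, and H.264 about half of MPEG-4.
RELATIVE_RATE = {"JPEG": 1.0, "MPEG-4": 1 / 5, "H.264": 1 / 10}

SECONDS_PER_DAY = 24 * 60 * 60


def estimated_rate_mbps(jpeg_rate_mbps, video_format):
    """Estimate the data rate of a format from the JPEG rate (Mbit/s)."""
    return jpeg_rate_mbps * RELATIVE_RATE[video_format]


def gigabytes_per_day(rate_mbps):
    """Disk space for 24 hours of continuous capture at the given rate."""
    bytes_per_second = rate_mbps * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1_000_000_000


# Example: a camera whose JPEG stream runs at an assumed 10 Mbit/s.
for fmt in ("JPEG", "MPEG-4", "H.264"):
    rate = estimated_rate_mbps(10, fmt)
    print(fmt, round(rate, 2), "Mbit/s,", round(gigabytes_per_day(rate), 1), "GB/day")
```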
H.264 achieves this extra saving by employing B-frames in addition to P-frames. B-frames depend not only on the previous image in the sequence, but also on the next image. As you can imagine, this increase in complexity has costs in terms of the processing power required to encode and decode the data.
SecuritySpy can receive incoming JPEG, MPEG-4, or H.264 video data, and can either save that data directly to disk without any further encoding, or re-encode the video to JPEG, MPEG-4, or H.264. This gives users ultimate control over how the data is processed and saved. Here are the main considerations you need to bear in mind:
Disk space for captured files
Saving JPEG data results in the largest files; H.264 the smallest. If you have no need to transmit the captured files over the internet, it is best to implement a large storage capacity (large-capacity hard disks are inexpensive these days), and have SecuritySpy receive and directly save JPEG data to disk. JPEG is the quickest format to process, resulting in the lowest CPU utilisation and therefore the highest performance (in terms of the total number of frames per second that SecuritySpy can process from all cameras).
If SecuritySpy is receiving JPEG data from a camera, but you need small captured file sizes, it is best to set SecuritySpy to encode this data as MPEG-4 (H.264 can also be used, but it is only viable for a small number of cameras due to its very high memory and CPU requirements).
Saving the incoming data directly to disk without re-encoding typically results in the lowest CPU usage and highest quality possible. If SecuritySpy is receiving MPEG-4 or H.264 data from a camera, it makes sense to save this data directly to disk without re-encoding. However, there are two disadvantages in doing this. The first is that SecuritySpy cannot modify the images in any way, so it cannot apply a text overlay (date and time) or transformation (rotation or flipping), or change the compression quality. Therefore all such settings must be set in the camera itself, not in SecuritySpy. The second is that, because MPEG-4 and H.264 data is temporally compressed, SecuritySpy cannot change the frame rate of the video; all video capture must be at whatever rate is being supplied by the camera.
Hence, with temporally compressed formats such as MPEG-4 and H.264, SecuritySpy must decode/use every single frame in order to maintain the integrity of the video stream. With JPEG data, however, SecuritySpy need decode/use only those frames that are required by the current conditions and can safely ignore those that aren’t (this is another reason why processing JPEG data generally requires less CPU power than MPEG-4 or H.264). Furthermore, if the computer becomes overloaded and has to discard frames, JPEG captures will slow down but otherwise be unaffected, whereas MPEG-4 and H.264 captures will become corrupt (for the period until the next key frame). When using these formats, it is therefore very important to make sure the computer is not overloaded.
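The difference in how the two kinds of stream cope with discarded frames can be shown with a small simulation. This is only a toy model under my own assumptions (frames are labelled "I" or "P", there are no B-frames, and a lost frame breaks everything up to the next key frame); it is not a description of SecuritySpy's decoder.

```python
def usable_frames(frame_types, dropped, temporal):
    """Return the indices of frames that can still be displayed.

    frame_types: string such as "IPPPIPPP" ("I" = key frame, "P" = delta frame)
    dropped:     set of frame indices that were discarded
    temporal:    True for MPEG-4/H.264-style streams, False for JPEG
    """
    usable = []
    broken = False  # True while the chain of delta frames is corrupt
    for index, frame_type in enumerate(frame_types):
        if frame_type == "I":
            broken = False  # a key frame restores a clean picture
        if index in dropped:
            if temporal:
                broken = True  # later delta frames now lack their reference
            continue
        if temporal and broken:
            continue
        usable.append(index)
    return usable


# Dropping frame 1 from an eight-frame sequence:
print(usable_frames("IPPPIPPP", {1}, temporal=True))   # gap lasts to next key frame
print(usable_frames("IIIIIIII", {1}, temporal=False))  # JPEG loses one frame only
```

The shorter the key frame interval, the sooner a temporally compressed stream recovers, at the cost of a higher data rate.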
Compared to video streams, the data rate of audio streams is generally much lower, so which audio format to choose is a much less important consideration. Network cameras typically supply one channel of audio (mono) and use a sample rate of 8kHz, which satisfactorily encodes speech at a low data rate.
PCM data is basically uncompressed, so will result in the highest data rate. G.711 uses a very fast encoding scheme that reduces this data rate by a factor of two without losing too much quality. G.726, AMR and AAC offer the lowest data rates of the bunch (albeit with higher CPU usage), but while G.726 and AMR are limited to low-fidelity audio (they were primarily designed for encoding speech for telephone networks), AAC is capable of high-fidelity audio, so is the best choice when quality is the primary consideration. I would generally recommend setting SecuritySpy to capture the audio from the camera directly to disk with no re-encoding, except when the incoming audio data is PCM, in which case I would use G.711 compression in SecuritySpy due to its low CPU usage.
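For a sense of scale, the PCM and G.711 data rates can be worked out directly. The sketch below assumes 16-bit PCM samples (a common choice, but an assumption on my part) and uses the fact that G.711 stores each sample in 8 bits, which is where the factor of two comes from.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Data rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def g711_bit_rate(sample_rate_hz, channels):
    """Data rate of G.711 audio, which uses 8 bits per sample."""
    return sample_rate_hz * 8 * channels


# Typical network camera audio: mono at 8 kHz (16-bit PCM is assumed here).
pcm = pcm_bit_rate(8_000, 16, 1)
g711 = g711_bit_rate(8_000, 1)
print(pcm / 1000, "kbit/s PCM")
print(g711 / 1000, "kbit/s G.711")
print(pcm / g711, "times smaller with G.711")
```

Either figure is small next to a typical video stream, which is why the choice of audio format matters much less.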
It is difficult to speak in general terms because there is so much variation in customers’ requirements and budgets. However, if I were to set up a general-purpose video surveillance system, I would use the following:
- Employ cameras capable of sending H.264 video data over RTSP (plus audio in any format).
- Set up a text overlay of the date and time in the camera itself (ideally linked to a network time server – for example time.apple.com – for accuracy).
- Set up the correct frame rate for the video in the camera itself (around 10fps, with a key frame rate of 10-20; a value of between 1x and 2x the frame rate is generally good).
- Set SecuritySpy to receive these H.264 streams, and record them directly to disk.
- Test the system to make sure the computer can comfortably handle the incoming video; if it’s overloaded, reduce the frame rates in the cameras.
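The key frame guideline in the list above (between 1x and 2x the frame rate) is easy to turn into a quick check when configuring cameras. The function names here are hypothetical helpers of my own, not settings found in any camera or in SecuritySpy.

```python
def suggested_key_frame_range(fps):
    """Return the (minimum, maximum) key frame interval suggested above:
    between one and two times the frame rate."""
    return fps, 2 * fps


def key_frame_interval_ok(fps, key_frame_interval):
    """True if the interval falls inside the suggested range."""
    low, high = suggested_key_frame_range(fps)
    return low <= key_frame_interval <= high


print(suggested_key_frame_range(10))   # range for a 10fps camera
print(key_frame_interval_ok(10, 15))   # inside the range
print(key_frame_interval_ok(10, 60))   # a key frame only every 6 seconds
```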
I hope this post helps to make sense of what can be a complex topic. Please email us with any questions.