SecuritySpy 
Multi-camera video surveillance software for the Mac
SecuritySpy Movie File Metadata
This document describes the custom metadata atoms embedded in QuickTime/MP4 movie files.
1. Overview
SecuritySpy stores custom metadata inside the standard QuickTime/MP4 user data container. The location within the file hierarchy is:
File (ftyp, mdat, ...)
└── moov
    ├── mvhd
    ├── trak (video)
    ├── trak (audio, if present)
    └── udta
        ├── Sver (always present)
        ├── Mtyp (always present)
        ├── Mtma (optional)
        └── Evnt (optional)
Each item is a standard QuickTime atom (also called a box in ISO Base Media File Format terminology). An atom begins with a 4-byte big-endian size followed by a 4-byte ASCII type code.
General rules
- All multi-byte integer fields are stored in big-endian (network) byte order unless explicitly noted otherwise.
- Structures are tightly packed with no padding between fields.
- The atom size field always includes the 8 bytes of the atom header itself.
- The movie timescale is 600 (i.e. one timescale unit = 1/600th of a second).
2. Sver — Software Version
Records the version of SecuritySpy that created the movie file. Always present.
Layout (16 bytes)
| Offset | Size | Field | Type | Byte Order | Description |
|---|---|---|---|---|---|
| 0 | 4 | size | uint32 | Big | Always 0x00000010 (16) |
| 4 | 4 | type | ASCII | — | 'Sver' (0x53766572) |
| 8 | 8 | version | char[8] | — | Null-terminated ASCII version string, e.g. "6.0.1" |
The version string occupies exactly 8 bytes. Shorter strings are null-terminated; any remaining bytes after the null terminator should be ignored.
3. Mtyp — Movie Type
Indicates how the movie was recorded. Always present.
Layout (12 bytes)
| Offset | Size | Field | Type | Byte Order | Description |
|---|---|---|---|---|---|
| 0 | 4 | size | uint32 | Big | Always 0x0000000C (12) |
| 4 | 4 | type | ASCII | — | 'Mtyp' (0x4D747970) |
| 8 | 1 | movieType | uint8 | — | See values below |
| 9 | 3 | (unused) | — | — | Reserved, currently zero |
Movie type values
| Value | Meaning |
|---|---|
| 0 | Motion capture — recording was triggered by detected motion or other events |
| 1 | Continuous — recording is part of a continuous recording schedule |
4. Mtma — Time Mapping
Maps positions within the movie timeline to real wall-clock times. This is essential because a single movie file may span gaps in recording (e.g. if the camera went offline briefly or if motion-capture segments are concatenated). Each entry marks the start of a contiguous segment and records the wall-clock time at which that segment began.
This atom is optional — it is present only when time-mapping information was recorded.
Atom header (8 bytes)
| Offset | Size | Field | Type | Byte Order | Description |
|---|---|---|---|---|---|
| 0 | 4 | size | uint32 | Big | 8 + (N × 36) where N is the number of entries |
| 4 | 4 | type | ASCII | — | 'Mtma' (0x4D746D61) |
Immediately following the 8-byte header is an array of N time-mapping entries.
Each entry (36 bytes)
| Offset | Size | Field | Type | Byte Order | Description |
|---|---|---|---|---|---|
| 0 | 2 | year | int16 | Big | Calendar year (e.g. 2026) |
| 2 | 2 | month | int16 | Big | 1–12 |
| 4 | 2 | day | int16 | Big | 1–31 |
| 6 | 2 | hour | int16 | Big | 0–23 (local time) |
| 8 | 2 | minute | int16 | Big | 0–59 |
| 10 | 2 | second | int16 | Big | 0–59 |
| 12 | 2 | subSec600 | int16 | Big | Sub-second precision in 1/600ths of a second (0–599) |
| 14 | 2 | (reserved) | int16 | Big | Reserved for future use |
| 16 | 8 | startTimeValue | int64 | Big | Start position of this segment in the movie, in timescale units (timescale 600) |
| 24 | 8 | endTimeValue | int64 | Big | End position of this segment in the movie, in timescale units (timescale 600) |
| 32 | 2 | (reserved) | int16 | Big | Reserved for future use |
| 34 | 2 | (reserved) | int16 | Big | Reserved (in files created before v5.2.5 this field holds the timescale value, 600) |
Interpreting time-mapping data
To convert a movie timeline position to a wall-clock time:
- Find the Mtma entry for which startTimeValue ≤ position < endTimeValue.
- Compute the offset: offset = position − startTimeValue.
- Convert the offset to seconds: seconds = offset / 600.0.
- Add that number of seconds to the wall-clock time given by the entry's date/time fields (year, month, day, hour, minute, second, subSec600).
If the position falls in a gap between segments (i.e. between one entry's endTimeValue and the next entry's startTimeValue), there is no valid wall-clock mapping for that position.
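The lookup above can be sketched in Python as follows. This is a minimal illustration rather than SecuritySpy's own code: the function names and the `wall_start`, `start` and `end` dictionary keys are hypothetical, and the result is a naive local-time datetime because the entry fields carry no time zone.

```python
from datetime import datetime, timedelta

TIMESCALE = 600  # movie timescale units per second, per the general rules


def entry_wall_start(year, month, day, hour, minute, second, sub_sec_600):
    """Build a naive local datetime from an entry's date/time fields."""
    micros = sub_sec_600 * 1_000_000 // TIMESCALE
    return datetime(year, month, day, hour, minute, second, micros)


def position_to_wall_clock(entries, position):
    """Map a movie timeline position to a wall-clock datetime.

    `entries` is a list of dicts with hypothetical keys 'wall_start'
    (datetime), 'start' and 'end' (timescale units). Returns None when
    the position falls in a gap between segments.
    """
    for entry in entries:
        if entry['start'] <= position < entry['end']:
            offset = position - entry['start']
            # Integer microseconds avoid floating-point rounding
            delta = timedelta(microseconds=offset * 1_000_000 // TIMESCALE)
            return entry['wall_start'] + delta
    return None
```

Returning None for positions in a gap mirrors the rule that no valid wall-clock mapping exists there.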
5. Evnt — Event Data
Contains a per-frame record of motion detection, AI classification, and presence detection events. Only frames with notable activity are recorded here — not every frame in the movie has an event entry.
This atom is optional — it is present only when event data was recorded.
Atom header (12 bytes)
| Offset | Size | Field | Type | Byte Order | Description |
|---|---|---|---|---|---|
| 0 | 4 | size | uint32 | Big | 12 + (N × recordSize) |
| 4 | 4 | type | ASCII | — | 'Evnt' (0x45766E74) |
| 8 | 1 | recordSize | uint8 | — | Size of each entry in bytes (currently 32) |
| 9 | 1 | version | uint8 | — | Format version (currently 1) |
| 10 | 1 | (reserved) | uint8 | — | 0 |
| 11 | 1 | (reserved) | uint8 | — | 0 |
The recordSize field in the header tells you the size of each entry. Future versions may extend the entry beyond 32 bytes (up to 252). When reading, always use the recordSize from the header to step through entries, and ignore any trailing bytes you don't recognise. If recordSize is smaller than expected (from an older file), treat missing fields as zero.
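One way to follow this rule is to step through the payload by recordSize and normalise each record to the 32 bytes a version-1 reader understands. The sketch below assumes the payload is the atom content after the 8-byte atom header; `iter_evnt_records` and `KNOWN_RECORD_SIZE` are hypothetical names. Zero-padding only helps when a shorter record is a prefix of the known layout, which is not the case for the differently arranged version-0 records.

```python
KNOWN_RECORD_SIZE = 32  # size of the version-1 entry this reader understands


def iter_evnt_records(payload):
    """Yield 32-byte records from an Evnt payload (bytes after the atom header).

    Records longer than 32 bytes are truncated (unknown trailing bytes are
    ignored); shorter records are zero-padded (missing fields read as zero).
    """
    if len(payload) < 4:
        return
    record_size = payload[0]
    if record_size == 0:
        return  # malformed header; stepping by zero would never advance
    body = payload[4:]  # skip recordSize, version and two reserved bytes
    for off in range(0, len(body) - record_size + 1, record_size):
        rec = body[off:off + record_size]
        yield rec[:KNOWN_RECORD_SIZE].ljust(KNOWN_RECORD_SIZE, b'\x00')
```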
Each entry — version 1 (32 bytes)
| Offset | Size | Field | Type | Byte Order | Description |
|---|---|---|---|---|---|
| 0 | 4 | movieTime | uint32 | Big | Position in the movie timeline, in timescale units (timescale 600) |
| 4 | 1 | triggerFlag | bool | — | 1 if this frame caused a motion trigger (i.e. motion duration exceeded the trigger threshold), 0 otherwise |
| 5 | 3 | (reserved) | — | — | Reserved, currently zero |
| 8 | 1 | classifiedFlag | bool | — | 1 if this frame was classified by the AI model, 0 if not |
| 9 | 1 | probH | uint8 | — | Probability of human presence (0–100%). Only valid when classifiedFlag is 1 |
| 10 | 1 | probV | uint8 | — | Probability of vehicle presence (0–100%). Only valid when classifiedFlag is 1 |
| 11 | 1 | probA | uint8 | — | Probability of animal presence (0–100%). Only valid when classifiedFlag is 1 |
| 12 | 2 | mdRect.x | uint16 | Big | Bounding rectangle of detected motion, normalised to a 65535×65535 coordinate space. Valid only when triggerFlag is 1. To convert to pixel coordinates: pixelX = mdRect.x × imageWidth / 65535 |
| 14 | 2 | mdRect.y | uint16 | Big | |
| 16 | 2 | mdRect.w | uint16 | Big | |
| 18 | 2 | mdRect.h | uint16 | Big | |
| 20 | 4 | seqId | uint32 | Big | Sequence identifier — a random number shared by all frames belonging to the same continuous motion event. A new sequence starts when there is a gap of more than 4 seconds since the previous motion frame |
| 24 | 2 | arrivedObjects | uint16 | Little | Bitfield indicating which object types have arrived in the scene on this frame (see object type bits below) |
| 26 | 2 | departedObjects | uint16 | Little | Bitfield indicating which object types have departed from the scene on this frame |
| 28 | 1 | presenceRect.x | uint8 | — | Bounding rectangle of the presence detection zone, normalised to a 32×32 coordinate space. To convert to pixel coordinates: pixelX = presenceRect.x × imageWidth / 32 |
| 29 | 1 | presenceRect.y | uint8 | — | |
| 30 | 1 | presenceRect.w | uint8 | — | |
| 31 | 1 | presenceRect.h | uint8 | — | |
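Both rectangles convert to pixels with the same scaling, differing only in the normalisation constant (65535 for mdRect, 32 for presenceRect). The helper below is a small sketch with a hypothetical name. The table gives only the formula for x; scaling y and h by the image height is an assumption, and floor division is just one possible rounding choice.

```python
def rect_to_pixels(rect, scale, image_width, image_height):
    """Convert a normalised (x, y, w, h) rectangle to pixel units.

    `scale` is 65535 for mdRect and 32 for presenceRect. Using
    image_height for y and h is an assumption of this sketch.
    """
    x, y, w, h = rect
    return (
        x * image_width // scale,
        y * image_height // scale,
        w * image_width // scale,
        h * image_height // scale,
    )
```

For example, `rect_to_pixels((16, 8, 8, 4), 32, 1920, 1080)` scales a presenceRect for a 1920×1080 frame.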
Object type bitfield values
Used in the arrivedObjects and departedObjects fields:
| Bit | Mask | Object Type |
|---|---|---|
| 0 | 0x0001 | Human |
| 1 | 0x0002 | Vehicle |
| 2 | 0x0004 | Animal |
Bits 3–15 are reserved for future object types.
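As a brief illustration, the bitfield can be decoded by testing each mask in turn. The names `OBJECT_TYPE_BITS` and `decode_object_types` are hypothetical; reserved bits are simply ignored in this sketch. The two fields are little-endian, so they are unpacked with `<H` rather than `>H`.

```python
import struct

# Masks from the object type table above
OBJECT_TYPE_BITS = {0x0001: 'human', 0x0002: 'vehicle', 0x0004: 'animal'}


def decode_object_types(bitfield):
    """Return the object type names whose bits are set; reserved bits are ignored."""
    return [name for mask, name in OBJECT_TYPE_BITS.items() if bitfield & mask]


def unpack_object_fields(raw):
    """Unpack arrivedObjects and departedObjects from 4 raw little-endian bytes."""
    arrived, departed = struct.unpack('<HH', raw)
    return decode_object_types(arrived), decode_object_types(departed)
```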
Understanding event fields
- movieTime — use this value with the Mtma time mapping to determine the real wall-clock time of the event.
- triggerFlag — indicates the frame at which SecuritySpy determined that motion had persisted long enough to constitute a genuine trigger event. Frames with motion that hasn't yet exceeded the trigger duration will have triggerFlag = 0.
- classifiedFlag / probH / probV / probA — when the AI classification model has been run on a frame, classifiedFlag is set to 1 and the three probability fields indicate the confidence (0–100%) that the frame contains a human, vehicle, or animal respectively. When classifiedFlag is 0, the probability fields should be ignored.
- seqId — groups related frames into motion sequences. All event entries that are part of the same continuous motion event share the same seqId value. A new random seqId is generated when motion resumes after a gap of more than 4 seconds.
- arrivedObjects / departedObjects — indicate transitions in presence detection. When an object type first appears in the scene, the corresponding bit is set in arrivedObjects. When it leaves, the bit is set in departedObjects. On most frames these fields will be zero.
- presenceRect — the region of the frame in which presence detection is operating, normalised to a 32×32 grid.
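To show how seqId might be used in practice, the sketch below groups already-parsed event entries into motion sequences and measures how long each one spans on the movie timeline. It assumes each entry is a dictionary with `seqId` and `movieTime` keys; the function names are hypothetical.

```python
TIMESCALE = 600  # movie timescale units per second


def group_by_sequence(entries):
    """Group event entries by seqId, preserving first-seen order."""
    sequences = {}
    for entry in entries:
        sequences.setdefault(entry['seqId'], []).append(entry)
    return sequences


def sequence_span_seconds(frames):
    """Length of a sequence on the movie timeline, in seconds."""
    times = [frame['movieTime'] for frame in frames]
    return (max(times) - min(times)) / TIMESCALE
```

Because seqId values are random, they are only meaningful as grouping keys, not as an ordering.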
6. Additional Notes
Older file formats
Files created by SecuritySpy versions prior to the introduction of the version-1 Evnt format may contain a version-0 event record with a smaller recordSize (16 bytes). The version-0 layout is:
| Offset | Size | Field | Type | Byte Order | Description |
|---|---|---|---|---|---|
| 0 | 4 | movieTime | uint32 | Big | Position in movie timeline |
| 4 | 1 | probH | int8 | — | Human probability (0–100) or −1 if not classified |
| 5 | 1 | probV | int8 | — | Vehicle probability (0–100) or −1 if not classified |
| 6 | 1 | (reserved) | uint8 | — | 0 |
| 7 | 1 | motionValue | uint8 | — | Motion intensity (0–180) |
| 8 | 8 | mdRect | IntRectU16 | Big | Motion detection location (normalised to 65535×65535) |
Always check the recordSize field in the Evnt atom header to determine which version you are reading.
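A version-0 record could be decoded along these lines. This is a sketch under stated assumptions: the fields listed in the table occupy the first 16 bytes of the record, IntRectU16 is taken to be four uint16 values in x, y, w, h order (matching the version-1 mdRect), and the function name and dictionary keys are hypothetical.

```python
import struct

V0_FIELDS_SIZE = 16  # bytes covered by the fields in the version-0 table


def parse_evnt_v0_record(rec):
    """Decode one version-0 event record (first 16 bytes are used)."""
    if len(rec) < V0_FIELDS_SIZE:
        raise ValueError('version-0 record is too short')
    movie_time, prob_h, prob_v, _reserved, motion = struct.unpack('>IbbBB', rec[0:8])
    # Assumed field order for IntRectU16: x, y, w, h
    md_rect = struct.unpack('>4H', rec[8:16])
    return {
        'movieTime': movie_time,
        # -1 marks "not classified" in the version-0 layout
        'probHuman': prob_h if prob_h >= 0 else None,
        'probVehicle': prob_v if prob_v >= 0 else None,
        'motionValue': motion,
        'mdRect': md_rect,
    }
```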
Reading the udta atom
To locate the udta atom, parse the moov atom's children by walking through atoms sequentially (read size, read type, skip to next). The udta atom will have type code 0x75647461. Then walk its children the same way to find each metadata atom by its type code.
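The walk described above can be sketched over an in-memory byte buffer as follows. The function names are hypothetical. The sketch also handles two standard QuickTime size conventions that often appear on top-level atoms such as mdat: a size of 1 means a 64-bit size follows the type code, and a size of 0 means the atom runs to the end of its container. Loading a whole movie into memory is impractical for large files, so treat this as an illustration of the traversal only.

```python
import struct


def iter_child_atoms(buf, start, end):
    """Yield (type, payload_start, payload_end) for each atom in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, atype = struct.unpack_from('>I4s', buf, pos)
        header = 8
        if size == 1:  # 64-bit extended size follows the type code
            if pos + 16 > end:
                break
            size = struct.unpack_from('>Q', buf, pos + 8)[0]
            header = 16
        elif size == 0:  # atom extends to the end of the container
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed atom; stop rather than guess
        yield atype, pos + header, pos + size
        pos += size


def find_child(buf, start, end, target):
    """Return (payload_start, payload_end) of the first child of type `target`."""
    for atype, p_start, p_end in iter_child_atoms(buf, start, end):
        if atype == target:
            return p_start, p_end
    return None


def find_udta_children(buf):
    """Return {type: payload} for the atoms inside moov/udta, or {} if absent."""
    moov = find_child(buf, 0, len(buf), b'moov')
    if moov is None:
        return {}
    udta = find_child(buf, moov[0], moov[1], b'udta')
    if udta is None:
        return {}
    return {t: buf[s:e] for t, s, e in iter_child_atoms(buf, udta[0], udta[1])}
```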
Complete udta structure summary
udta (variable size)
├── Sver 16 bytes (fixed) — always present
├── Mtyp 12 bytes (fixed) — always present
├── Mtma 8 + N×36 bytes — optional
└── Evnt 12 + N×recordSize — optional
Example: parsing in Python
```python
import struct


def read_atom(f):
    """Read an atom header; returns (type, size, data_offset)."""
    hdr = f.read(8)
    if len(hdr) < 8:
        return None, 0, 0
    size, atype = struct.unpack('>I4s', hdr)
    return atype, size, f.tell()


def find_atom(f, parent_offset, parent_size, target_type):
    """Find a child atom of the given type within a parent atom.

    Returns (payload_offset, payload_size), or (None, 0) if not found.
    """
    f.seek(parent_offset)
    end = parent_offset + parent_size
    while f.tell() < end:
        atype, size, data_off = read_atom(f)
        if atype is None or size < 8:
            break
        if atype == target_type:
            return data_off, size - 8
        f.seek(data_off + size - 8)  # skip this atom's payload
    return None, 0


def parse_sver(data):
    """Parse an Sver atom's payload (8 bytes after header)."""
    return data[:8].split(b'\x00')[0].decode('ascii')


def parse_mtyp(data):
    """Parse an Mtyp atom's payload (4 bytes after header)."""
    return 'continuous' if data[0] == 1 else 'motion-capture'


def parse_mtma(data):
    """Parse Mtma entries from the payload after the 8-byte atom header."""
    entries = []
    entry_size = 36  # sum of the field sizes in the entry table
    count = len(data) // entry_size
    for i in range(count):
        e = data[i*entry_size:(i+1)*entry_size]
        year, month, day, hour, minute, second, subsec = struct.unpack('>7h', e[0:14])
        start, end = struct.unpack('>qq', e[16:32])
        millis = subsec * 1000 // 600  # subSec600 is in 1/600ths of a second
        entries.append({
            'time': f'{year:04d}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}:{second:02d}.{millis:03d}',
            'start': start, 'end': end
        })
    return entries


def parse_evnt(data):
    """Parse version-1 Evnt entries from the atom payload (after the 8-byte atom header)."""
    record_size = data[0]
    version = data[1]
    if record_size < 32:
        # Older (version-0) records use a different layout and are not handled here
        raise ValueError(f'unsupported Evnt record: version {version}, recordSize {record_size}')
    entries = []
    payload = data[4:]  # skip recordSize, version and the two reserved bytes
    count = len(payload) // record_size
    for i in range(count):
        # Step by record_size; bytes beyond offset 32 are ignored
        e = payload[i*record_size:(i+1)*record_size]
        movie_time = struct.unpack('>I', e[0:4])[0]
        trigger = bool(e[4])
        classified = bool(e[8])
        prob_h, prob_v, prob_a = e[9], e[10], e[11]
        md_x, md_y, md_w, md_h = struct.unpack('>4H', e[12:20])
        seq_id = struct.unpack('>I', e[20:24])[0]
        # arrivedObjects / departedObjects are the only little-endian fields
        arrived, departed = struct.unpack('<HH', e[24:28])
        pr_x, pr_y, pr_w, pr_h = e[28], e[29], e[30], e[31]
        entries.append({
            'movieTime': movie_time,
            'trigger': trigger,
            'classified': classified,
            'probHuman': prob_h if classified else None,
            'probVehicle': prob_v if classified else None,
            'probAnimal': prob_a if classified else None,
            'mdRect': (md_x, md_y, md_w, md_h),
            'seqId': seq_id,
            'arrivedObjects': arrived,
            'departedObjects': departed,
            'presenceRect': (pr_x, pr_y, pr_w, pr_h),
        })
    return entries
```