Why is my sound faster than my video?

It’s a common experience when watching videos or live streams online that the audio seems slightly ahead of the video. This phenomenon is often described as the sound being “faster” than the video. While subtle, it can be annoying and distracting for viewers. The underlying reasons have less to do with the physical speeds of sound and light than with how audio and video signals are encoded, buffered, and processed during transmission and playback. This article provides an overview of why we perceive sound as arriving before synchronized video, along with some potential solutions to align audio and video.

Speed of Sound

The speed of sound refers to how fast sound waves travel through a medium like air. Specifically, it is defined as the distance traveled per unit of time by a sound wave propagating through an elastic medium. At sea level and room temperature of 20°C (68°F), the speed of sound in air is approximately 343 meters per second or 1,235 kilometers per hour.[1]

The speed of sound depends on the medium it is traveling through as well as environmental factors like temperature and pressure. In air, sound travels faster when the air is warmer since the molecules have more kinetic energy and vibrate more quickly. For example, at 0°C the speed of sound in air is 331 m/s, while at room temperature of 20°C it is 343 m/s.[2]
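
The temperature dependence can be checked with a short calculation. The sketch below uses the standard ideal-gas approximation v = 331.3 × √(1 + T/273.15) m/s (T in °C), which reproduces the figures cited above:

```python
import math

def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at a temperature in Celsius.

    Uses the ideal-gas relation v = 331.3 * sqrt(1 + T/273.15), which matches
    the commonly cited values of ~331 m/s at 0 degC and ~343 m/s at 20 degC.
    """
    return 331.3 * math.sqrt(1 + temp_c / 273.15)

for t in (0, 20, 30):
    print(f"{t:>3} degC: {speed_of_sound(t):.1f} m/s")
# 0 degC: 331.3 m/s, 20 degC: 343.2 m/s, 30 degC: 349.0 m/s
```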

In summary, sound travels quickly through air, covering over 340 meters every second at common temperatures and pressures. This is far slower than the speed of light, but over short distances it is still fast enough that humans perceive sound as effectively instantaneous.

Speed of Light

Light travels incredibly fast. The speed of light in a vacuum is exactly 299,792,458 meters per second, or about 186,000 miles per second (https://en.wikipedia.org/wiki/Speed_of_light; https://www.space.com/15830-light-speed.html). To put that in perspective, as explained at https://www.grc.nasa.gov/www/k-12/Numbers/Math/Mathematical_Thinking/how_fast_is_the_speed.htm, a beam of light could circle the Earth’s equator more than 7 times in a single second. The speed of light is constant and finite, though incredibly fast on a human scale.
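
A quick back-of-the-envelope check of that claim, using the Earth’s equatorial circumference of roughly 40,075 km:

```python
SPEED_OF_LIGHT_M_S = 299_792_458   # exact, by definition of the metre
EARTH_EQUATOR_M = 40_075_017       # equatorial circumference, approximate

laps_per_second = SPEED_OF_LIGHT_M_S / EARTH_EQUATOR_M
print(f"Light could circle the equator {laps_per_second:.2f} times per second")
# -> roughly 7.48 laps per second
```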

Video Encoding

Video encoding is the process of compressing and formatting raw video into a digital file or format that can be efficiently transmitted or stored and later decoded for viewing. Common video codecs and formats used for encoding include H.264/MPEG-4 AVC, HEVC/H.265, VP9, AV1, and MPEG-2. During encoding, the video is analyzed frame by frame, broken into pixel blocks, and algorithms are applied to remove spatial and temporal redundancy, allowing the video to be compressed significantly without major losses in visual quality. The encoded video is then wrapped into a container format like MP4, MOV, or MKV along with audio and metadata to produce the final video file.

For transmission, the compressed video is further prepared for delivery over the internet or cable systems: it is broken into small packets, and additional compression or data-reduction techniques may be applied. Packets also receive headers carrying the information needed for proper decoding and synchronization. The video can then be multiplexed and transmitted efficiently over networks, where a decoder unpacks the packets, rebuilds the frames, and displays the video for viewing.
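
To illustrate the synchronization information mentioned above, here is a minimal sketch of how presentation timestamps (PTS) might be assigned to video frames. MPEG systems carry PTS values on a 90 kHz clock; the 30 fps frame rate below is an assumption chosen for the example:

```python
# Minimal sketch: assigning presentation timestamps so a decoder can
# schedule frames. MPEG systems use a 90 kHz clock for these timestamps.
PTS_CLOCK_HZ = 90_000
FRAME_RATE = 30  # frames per second (assumed for this example)

def video_pts(frame_index: int) -> int:
    """Timestamp of a frame, in 90 kHz clock ticks."""
    return frame_index * PTS_CLOCK_HZ // FRAME_RATE

for i in range(4):
    ms = video_pts(i) / PTS_CLOCK_HZ * 1000
    print(f"frame {i}: pts={video_pts(i)} ticks ({ms:.1f} ms)")
# frame 0: pts=0, frame 1: pts=3000 (33.3 ms), frame 2: pts=6000 (66.7 ms), ...
```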

Audio Encoding

Audio encoding refers to the process of converting analog audio signals into digital format for transmission or storage. This is done through an audio codec or encoder. According to the documentation from Cloud Speech-to-Text, “An audio encoding refers to the manner in which audio data is stored and transmitted.”

Some common audio codecs used for encoding include MP3, AAC, Ogg Vorbis, FLAC, PCM, and more. These compress the audio data to reduce file size while maintaining quality. The analog audio input first gets sampled and quantized into PCM digital audio. This PCM audio then gets compressed by the codec into the desired format based on lossy or lossless algorithms.
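
The sampling-and-quantization step described above can be sketched in a few lines. This example generates signed 16-bit PCM for a 440 Hz tone at the CD sampling rate of 44.1 kHz; the tone, duration, and parameter values are arbitrary choices for illustration:

```python
import math

SAMPLE_RATE = 44_100   # CD-quality sampling rate (samples per second)
BIT_DEPTH = 16         # signed 16-bit quantization
FREQ_HZ = 440.0        # the analog tone being "recorded" (concert A)

max_amplitude = 2 ** (BIT_DEPTH - 1) - 1  # 32767 for 16-bit signed PCM

def pcm_samples(duration_s: float) -> list[int]:
    """Sample and quantize a sine tone into signed 16-bit PCM values."""
    n = int(SAMPLE_RATE * duration_s)
    return [
        round(max_amplitude * math.sin(2 * math.pi * FREQ_HZ * i / SAMPLE_RATE))
        for i in range(n)
    ]

samples = pcm_samples(0.001)  # one millisecond of audio
print(len(samples), "samples:", samples[:5])
```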

For example, the MP3 format uses perceptual audio coding and psychoacoustic models to compress the data by removing frequencies that are less audible to the human ear. The compressed MP3 audio can then be transmitted over the internet or stored digitally on devices. As the Audio Encoding 101 guide notes, uncompressed audio takes up far more digital space, so compressed files are better for storing and transmitting audio where file size matters.

Buffering

Buffering occurs when a video player downloads a certain amount of video and audio data before starting playback. This buffer acts as a reservoir to ensure smooth, uninterrupted playback in case there are momentary drops in network speed (Cloudflare).

Buffers allow streaming to adapt to changes in network conditions. A larger buffer means more data is stored before starting playback. This minimizes stalling and rebuffering events, but can increase start-up delay. A smaller buffer reduces start-up delay, but is more prone to rebuffering events when the network slows (Kaltura).

Overall, buffering enables smooth streaming by storing data to cover gaps caused by network fluctuations. However, improper buffer settings can negatively impact the viewer experience with excessive start-up delays or rebuffering events.
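
That trade-off between start-up delay and rebuffering can be illustrated with a toy simulation. The numbers below (delivery rate, clip length, pre-roll sizes) are invented for illustration and are not measurements of any real player:

```python
import random

def simulate(preroll_s: float, clip_s: float = 60.0, seed: int = 7):
    """Toy playout-buffer model: returns (start-up delay, stalled seconds).

    Each second of wall-clock time the network delivers between 0.5 and
    1.5 seconds of media (purely illustrative numbers).
    """
    random.seed(seed)
    buffered = 0.0   # seconds of media waiting in the buffer
    played = 0.0     # seconds of media already played
    startup = 0.0    # seconds spent waiting before playback began
    stalled = 0      # seconds spent rebuffering after playback began
    playing = False
    while played < clip_s:
        buffered += random.uniform(0.5, 1.5)  # one second of downloading
        if not playing and buffered < preroll_s:
            startup += 1.0                    # still filling the pre-roll buffer
            continue
        playing = True
        if buffered >= 1.0:
            buffered -= 1.0                   # play one second of media
            played += 1.0
        else:
            stalled += 1                      # buffer ran dry: rebuffering

    return startup, stalled

for preroll in (1.0, 5.0, 15.0):
    delay, stalls = simulate(preroll)
    print(f"pre-roll {preroll:>4.1f} s -> start-up {delay:.0f} s, stalled {stalls} s")
```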

Latency

There can be latency, or delays, between audio and video signals for several reasons. It is tempting to blame the physics: sound travels at about 343 m/s (767 mph) according to Wikipedia, while light travels at about 186,000 miles per second. Inside electronic equipment, however, both audio and video move as electrical or optical signals at comparable speeds, so the speed of sound only matters for the final hop from the speaker to your ears, which adds just a few milliseconds across a typical room. The more significant delays come from how the two signals are processed.
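
A quick calculation shows how small that acoustic delay really is compared to the tens or hundreds of milliseconds that processing and buffering can introduce:

```python
SPEED_OF_SOUND_M_S = 343.0  # in air at 20 degC

for distance_m in (1.0, 3.0, 10.0):
    delay_ms = distance_m / SPEED_OF_SOUND_M_S * 1000
    print(f"{distance_m:>4.1f} m from the speaker: {delay_ms:.1f} ms of acoustic delay")
# ~2.9 ms at 1 m, ~8.7 ms at 3 m, ~29.2 ms at 10 m
```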

As Allion explains, screens have to process audio and video signals separately, which takes a small amount of time and can introduce latency between the two. Video encoding and decoding add further delay, since complex video compression algorithms require significant processing.

Buffering, network delays, and other transmission latency can also pull audio and video timing apart. Overall, separate processing paths, the heavier computational demands of video, and transmission delays are the main causes of latency differences between audio and video.

Synchronization

The synchronization of audio and video refers to their relative timing during playback. A common issue occurs when audio playback runs ahead of the video, so the two become progressively out of sync. This is often noticeable in streaming videos or recordings, where the audio is heard before the corresponding video.

There are a few key reasons why the audio and video can become unsynchronized:

  • Audio is processed and encoded much faster than video, and audio packets can be sent and buffered more quickly than video frames. This means the audio often gets ahead of the video playback.
  • Network latency and jitter can affect the video stream more than the audio. Any lag or variation in packet delivery time will likely delay the video playback while audio continues uninterrupted.
  • The audio and video streams are compressed and encoded separately. The relative timing may drift, particularly if variable frame rates/sampling rates are used.
  • Buffering differences between audio and video can build up over time. For example, audio may begin playback while video is still buffering, allowing audio to get ahead.

Overall, the challenge is that audio and video timing drift apart during processing, streaming, and playback. Solutions are needed to detect and correct this drift for proper lip sync and synchronization.
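
As a sketch of how such correction might work, the snippet below compares an audio clock against video presentation timestamps and decides whether to drop or repeat frames. Many players treat the audio clock as the master; the 40 ms tolerance here is an illustrative value, roughly one frame at typical frame rates:

```python
def av_offset_ms(audio_clock_s: float, video_pts_s: float) -> float:
    """Signed offset between the audio clock and the current video frame.

    Positive: audio is ahead of video (the familiar 'sound before picture').
    """
    return (audio_clock_s - video_pts_s) * 1000

THRESHOLD_MS = 40  # illustrative tolerance, roughly one video frame

def correct(audio_clock_s: float, video_pts_s: float) -> str:
    """Decide how a player might chase the audio clock."""
    offset = av_offset_ms(audio_clock_s, video_pts_s)
    if offset > THRESHOLD_MS:
        return f"audio leads by {offset:.0f} ms -> drop/skip video frames"
    if offset < -THRESHOLD_MS:
        return f"video leads by {-offset:.0f} ms -> repeat frames or wait"
    return f"offset {offset:+.0f} ms -> in sync"

print(correct(10.120, 10.000))  # audio 120 ms ahead of the picture
print(correct(10.000, 10.010))  # within tolerance
```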

Solutions

There are a few tips to help fix audio and video synchronization issues:

  • Connect your sound system directly to the video source rather than passing audio through the TV. According to The Master Switch, this avoids delays caused by TV processing.
  • Check the audio delay/sync settings on both the source device and the TV. There may be an option to adjust the timing and get audio back in sync.
  • Update the drivers for video cards and sound cards on computers experiencing sync problems. Outdated drivers can cause latency.
  • Adjust the TV’s processing settings: disable any extra audio effects or ‘enhancements’ that could slow down the audio.
  • Factory reset the TV to clear any problematic customized settings.
  • Use the source device’s audio output rather than the TV speakers if possible. TV speakers can lag behind, while a device’s own or directly connected speakers may stay in better sync.
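
Most of these adjustments boil down to the same operation: shifting the audio later in time until it lines up with the picture. Here is a minimal sketch of that idea, assuming mono PCM samples at 48 kHz (the function and values are hypothetical, chosen for illustration only):

```python
def delay_pcm(samples: list[int], delay_ms: float,
              sample_rate: int = 48_000) -> list[int]:
    """Delay mono PCM audio by prepending silence.

    The software equivalent of a receiver's 'audio delay' / 'lip sync'
    setting: audio is shifted later so it lines up with the video.
    """
    pad = int(sample_rate * delay_ms / 1000)
    return [0] * pad + samples

audio = [100, 200, 300]                    # stand-in for real PCM data
print(len(delay_pcm(audio, delay_ms=50)))  # 2400 silent samples + 3 originals
```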

Conclusion

In summary, the impression that sound is “faster” than video comes from differences in how audio and video data are encoded and buffered. Although sound physically travels far slower than light, an audio stream carries much less data and is therefore faster to encode and decode than video. The difference is exaggerated when audio playback begins while video is still buffering. To resynchronize, one solution is to intentionally delay audio playback to match the video. For livestreams and video calls, timestamping both streams against a common clock and managing latency across the network is key. An awareness of these technical factors helps explain, and fix, this common A/V sync issue.
