A Video Compression Scheme Using Audio Frequency Analysis,

or Voice Recognition

Nathan Hemenway


Various pattern-matching and signal-processing techniques now exist whereby physical, psycho-acoustic events can be detected in any given audio source. These methods include Blind Source Separation, multivariate Mixtures of Gaussians, Hidden Markov Models, and simpler approaches such as spectral density analysis or volume-level change detection. A plethora of digital signal processing techniques exist.
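Of the techniques listed, volume-level change detection is the simplest to illustrate. The sketch below is a minimal, hypothetical detector (the function name, frame length, and threshold are illustrative assumptions, not taken from this paper): it flags moments where the short-time energy of the signal jumps sharply, which can serve as candidate audio events.

```python
import numpy as np

def detect_audio_events(samples, sample_rate, frame_ms=20, threshold_db=6.0):
    """Return timestamps (seconds) where short-time energy jumps by more
    than threshold_db over the previous frame -- a simple volume-level
    change detector. All parameters here are illustrative assumptions."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Short-time energy per frame, in decibels (floor avoids log(0)).
    energy_db = 10 * np.log10(np.maximum((frames ** 2).mean(axis=1), 1e-10))
    # An "event" is a frame whose energy exceeds its predecessor's by the threshold.
    jumps = np.flatnonzero(np.diff(energy_db) > threshold_db) + 1
    # Convert frame indices back to timestamps in seconds.
    return jumps * frame_len / sample_rate
```

For example, half a second of silence followed by a tone yields a single detected event at roughly the 0.5-second mark.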

This paper describes a video compression scheme that uses audio frequency analysis, or voice recognition, as the first layer in an algorithm that achieves demonstrably improved psycho-acoustic performance.



Video compression algorithms have historically used sophisticated analysis methods to reduce the bit rate and throughput of each frame of video. The reason is simple: reduce the amount of data transferred per frame, and you reduce the total size of the data transferred; the per-frame overhead of rendering each frame is also reduced. The result is an efficient mechanism for delivering video over media with limited bandwidth. These techniques almost always rely on complex image analysis whereby each video frame is reduced to a lowest common denominator of color, motion, and detail.

Given any video segment, certain frames are chosen for use in these analyses. Most often these frames, or key frames, are chosen at constant intervals, or at moments in the sequence where motion or change is detected. Too often, no key frame is selected at the crucial moments where audio events are happening. This frequently results in artifacts we perceive as audio and video falling out of sync, or as a mismatch between the two.
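One way to sketch the idea of audio-aware key-frame placement is to merge the usual constant-interval key frames with extra key frames snapped to detected audio-event times. The policy below is a hypothetical illustration under assumed parameters (the function name, interval, and minimum gap are not taken from this paper):

```python
def place_key_frames(duration_s, fps, event_times, interval_s=2.0, min_gap_s=0.25):
    """Return sorted video frame indices for key frames: a constant-interval
    baseline grid, plus one key frame per audio event unless a key frame
    already lands nearby. Parameters are illustrative assumptions."""
    # Baseline: one key frame every interval_s seconds.
    chosen = [i * interval_s for i in range(int(duration_s // interval_s) + 1)]
    # Add a key frame at each audio event, unless one is already close by.
    for t in sorted(event_times):
        if t <= duration_s and all(abs(t - k) >= min_gap_s for k in chosen):
            chosen.append(t)
    # Snap every key-frame time to the nearest video frame index.
    return sorted({round(t * fps) for t in chosen})
```

With a 10-second clip at 30 fps and audio events at 3.1 s and 4.0 s, the 4.0 s event coincides with a grid key frame and is absorbed, while the 3.1 s event gets its own key frame.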