Fwd: Re: [AVT] sender report rtp timestamp
Robert Olson
olson at mcs.anl.gov
Thu Jan 2 14:16:58 CST 2003
FYI on RTP timestamp calculations.
>Date: Mon, 30 Dec 2002 12:42:04 -0800 (PST)
>From: Stephen Casner <casner at acm.org>
>To: Matthew Heaney <mheaney at on2.com>
>cc: avt at ietf.org
>Subject: Re: [AVT] sender report rtp timestamp
>
>On Mon, 30 Dec 2002, Matthew Heaney wrote:
>
> > Section 6.4.1 of the RFC describes the RTP timestamp of the RTCP sender
> > report this way:
> >
> > RTP timestamp: 32 bits
> > Corresponds to the same time as the NTP timestamp (above), but in
> > the same units and with the same random offset as the RTP timestamps
> > in data packets.
> >
> > Does anyone know how to calculate this? I understand that an RTP
> > timestamp has a start time (the initial value is random, and
> > assigned at session-creation time), but how do you convert an NTP
> > value to RTP timestamp format?
>
>
>I'm working on a note to describe how I think intra-media and
>inter-media synchronization (e.g., audio/video lip sync) should be
>implemented in RTP. It is not complete, but below are the parts that
>address the method for calculating the RTP timestamp corresponding to
>the NTP timestamp in an RTCP SR packet.
>
>An introduction to some of the notation (which is partially borrowed):
>
> First letter
>
> T = reference timestamp, often wall-clock (absolute) time, either
> in 64-bit fixed-point NTP format or in double-precision
> floating-point format
>
> M = media timestamp, i.e., the 32-bit RTP timestamp that ticks at
> the natural rate of the medium
>
> O = a media timestamp offset (difference between two timestamps)
>
> R = the clock rate for a media timestamp, in Hz
>
> Second or third letter
>
> l = local (that is L, not one)
>
> r = remote
>
> a = audio
>
> v = video
>
>For example, Mla is the local audio clock, which represents both the
>audio sample being captured at any instant and the sample being
>rendered at the same instant, if the audio is full-duplex on the same
>clock, or just the sample being rendered if the application is
>receive-only. Similarly, Mlv is the local video clock, but it may be
>useful only for the capture side because the video capture and video
>rendering may not be synchronous (even different frame rates). This
>is one reason why audio is more likely to be the master -- the audio
>needs to be continuous, and video can be more flexible. Per the RTP
>spec, the Ml clocks are initialized with random values.
>
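The reference timestamp T above may be held either as a 64-bit fixed-point NTP value or as a double. As a minimal sketch of moving between the two, assuming the usual 32.32 layout (32 bits of seconds, 32 bits of fraction) and with function names that are purely illustrative and not part of the original note:

```python
# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_EPOCH_DELTA = 2_208_988_800


def ntp64_to_seconds(ntp64: int) -> float:
    """Convert a 64-bit fixed-point NTP timestamp (32.32) to float seconds."""
    whole = ntp64 >> 32
    frac = ntp64 & 0xFFFFFFFF
    return whole + frac / 2**32


def seconds_to_ntp64(seconds: float) -> int:
    """Convert float seconds to a 64-bit fixed-point NTP timestamp (32.32)."""
    whole = int(seconds)
    frac = int((seconds - whole) * 2**32)  # truncates the sub-2**-32 remainder
    return ((whole & 0xFFFFFFFF) << 32) | (frac & 0xFFFFFFFF)


def unix_to_ntp_seconds(unix_seconds: float) -> float:
    """Shift a Unix-epoch time (e.g. from time.time()) to the NTP epoch."""
    return unix_seconds + NTP_UNIX_EPOCH_DELTA
```

A double carries 53 bits of mantissa, so with present-day NTP second counts the fractional part is resolved to well under a microsecond, which is likely sufficient for everything except the sample-accurate case discussed at the end of the note.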
> - For each medium, maintain the relationship between local reference
> clock Tl (NTP timestamps) and local media clock Ml (RTP
> timestamps) for media clock rate R. The result is an offset Ol,
> normally kept as a 32-bit integer:
>
> Ola = Mla - (Tl - Da) * Ra
> Olv = Mlv - (Tl - Dv) * Rv
>
> This is implemented by obtaining timestamp pairs Ml (the timestamp
> of the first sample in the block) and Tl (reference/system time)
> at each media input interrupt. D is the (configured) estimate of
> the delay from capture of the first sample to the interrupt, in
> reference clock units (presumably a fraction of a second). After
> multiplying by R, the whole part of the partial result is
> truncated to 32 bits (i.e., mod 2**32) to be subtracted from Ml.
>
> Since there is some interrupt jitter, we need to use a minimum
> filter to track the earliest interrupts, which all have delay
> values close to each other (that's D). This minimum filter is
> relatively easy to implement because the time constant can be
> short, and it is robust because there are no shifts (assuming no
> overruns). Ol changes (slowly) to track the skew between the
> local reference clock and media clock.
>
> - Since the local media clock is not directly readable, the offset
> Ol can be used to calculate the current value of the media clock
> at any time using the readable reference clock (usually the system
> clock):
>
> Mla = Tl * Ra + Ola
> Mlv = Tl * Rv + Olv
>
> Again here, the result of the multiplication is truncated to a
> 32-bit integer (i.e., mod 2**32). We don't have to worry about
> timestamp wraparound because this is all modulo arithmetic.
>
> This equation is used to calculate the RTP timestamps in RTCP
> Sender Report (SR) packets. Whenever it is time to send an SR,
> the NTP timestamp for the SR is Tl and the RTP timestamp is Ml as
> calculated. Note that Ml is generally NOT equal to the RTP
> timestamp in the last data packet transmitted since that timestamp
> would correspond to a reference timestamp at least D earlier (the
> capture delay).
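The two steps above (maintain Ol at each input interrupt, then derive Ml from Tl when an SR is due) can be sketched roughly as follows. This is an illustration under assumptions, not the note's implementation: Tl is taken as float seconds, the class name OffsetTracker and the window size are hypothetical, and the minimum filter is one possible reading of the text. Since a late interrupt inflates Tl and therefore lowers Ol, the sketch keeps the largest recent Ol, compared as a signed 32-bit difference so that wraparound is harmless.

```python
from collections import deque

MASK32 = 0xFFFFFFFF


def media_offset(ml: int, tl: float, d: float, rate: int) -> int:
    """Ol = Ml - (Tl - D) * R, with the product truncated and taken mod 2**32."""
    return (ml - int((tl - d) * rate)) & MASK32


def media_clock(tl: float, rate: int, ol: int) -> int:
    """Ml = Tl * R + Ol, with the product truncated and taken mod 2**32."""
    return (int(tl * rate) + ol) & MASK32


def signed_diff32(a: int, b: int) -> int:
    """a - b interpreted as a signed 32-bit quantity (wraparound-safe)."""
    return ((a - b + 0x80000000) & MASK32) - 0x80000000


class OffsetTracker:
    """Hypothetical helper: tracks Ol for one medium over a short window."""

    def __init__(self, rate: int, delay: float, window: int = 8):
        self.rate = rate      # R, media clock rate in Hz
        self.delay = delay    # D, configured capture-to-interrupt delay (s)
        self.recent = deque(maxlen=window)

    def on_interrupt(self, ml: int, tl: float) -> None:
        """Record the (Ml, Tl) pair obtained at a media input interrupt."""
        self.recent.append(media_offset(ml, tl, self.delay, self.rate))

    def offset(self) -> int:
        """Ol from the earliest (least delayed) interrupt in the window."""
        best = self.recent[0]
        for ol in self.recent:
            if signed_diff32(ol, best) > 0:
                best = ol
        return best

    def sr_rtp_timestamp(self, tl: float) -> int:
        """RTP timestamp to place in an SR whose NTP timestamp is tl."""
        return media_clock(tl, self.rate, self.offset())


# Example: 8 kHz audio, D = 0.125 s, media clock near the 32-bit wrap point.
tracker = OffsetTracker(rate=8000, delay=0.125)
tracker.on_interrupt(ml=4294967000, tl=1000.125)  # on-time interrupt
tracker.on_interrupt(ml=4294969000, tl=1000.5)    # interrupt 0.125 s late
sr_rtp = tracker.sr_rtp_timestamp(1001.0)
```

In this example the late interrupt is ignored by the filter, and the SR timestamp comes out past the wrap point without any special handling, since every step is reduced mod 2**32.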
>
>To achieve sample-accurate synchronization, the reference time
>corresponding to the media sample time must be obtained very
>accurately. In addition, the truncation of a fraction of a sample
>time must be avoided in the calculations of Ol and Ml above. It is
>possible to keep Ol as a floating point number, but when calculating
>the integer Ml it is necessary to fudge the value of Tl so that no
>fraction is truncated. This will affect the RTT calculation that can
>be done with RTCP SR timestamps, but in most cases not significantly.
>This level of accuracy is not necessary for typical audio/video
>synchronization, but is necessary when synchronizing multiple channels
>of audio sent in separate streams.
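One way to read the adjustment described above: keep Ol as a float, compute the exact (fractional) media time, round it down to an integer Ml, and then move Tl back by the dropped fraction so that the pair written into the SR matches exactly. The sketch below assumes float-second timestamps, and the function name is hypothetical.

```python
import math


def sr_timestamp_pair(tl: float, rate: int, ol: float) -> tuple[float, int]:
    """Return (adjusted Tl, 32-bit Ml) with no fraction of a sample lost.

    ol is the floating-point media offset; the returned Tl is nudged earlier
    by the fraction of a sample that truncation would otherwise discard.
    """
    ml_exact = tl * rate + ol              # media time, possibly fractional
    ml = math.floor(ml_exact)              # integer sample count
    tl_adjusted = tl - (ml_exact - ml) / rate
    return tl_adjusted, ml & 0xFFFFFFFF


# Example: 8 kHz clock, Tl falls 0.8125 of a sample past a sample boundary.
tl_adj, ml = sr_timestamp_pair(tl=1.5 + 2**-10, rate=8000, ol=100.0)
```

The adjustment is always less than one sample period (125 microseconds at 8 kHz), which is why its effect on RTT estimates is small in most cases. With full-size NTP second counts and high clock rates, a double leaves only a few bits for the fraction of a sample, so a real implementation would probably want fixed-point or extended-precision arithmetic here.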
>
> -- Steve
>