Note for Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP (1)
Level: IEEE Communications Surveys & Tutorials 2019
Traditional non-HAS IP-based streaming
The client receives media that is typically pushed by a media server using a connection-oriented protocol such as the Real-Time Messaging Protocol (RTMP, over TCP) or a connectionless protocol such as the Real-time Transport Protocol (RTP, over UDP).
The Real-Time Streaming Protocol (RTSP) is commonly used to control media servers: it sets up the streaming session and keeps state information for its duration, but it does not deliver the media itself (that is the task of a protocol such as RTP).
The media server performs rate adaptation and data delivery scheduling based on the RTP Control Protocol (RTCP) reports sent by the client.
Traversing NATs and firewalls requires additional protocols or configuration during session establishment.
These characteristics result in complex and expensive servers. The resulting scalability and vendor-dependency issues, together with high maintenance costs, have made protocols like RTSP challenging to deploy.
Around 2005, HTTP adaptive streaming (HAS) became popular and then dominant. It treats media content like regular Web content and delivers it in small pieces over HTTP. Its main characteristics are:
- HAS uses HTTP as the application-layer protocol and TCP as the transport-layer protocol.
- The client pulls the data from a standard HTTP server, which simply hosts the media content.
- HAS solutions employ dynamic adaptation with respect to varying network conditions to provide a seamless streaming experience.
- The original file/stream is partitioned into segments (also called chunks) of equal playback duration. An encoder or transcoder generates multiple versions (also called representations) of each segment that differ in bitrate, resolution, or quality.
- The server generates an index file, a manifest that lists the available representations, including the HTTP URLs that identify the segments and their availability times.
- The client first receives the manifest that contains the metadata for video, audio, subtitles and other features, then constantly measures certain parameters: available network bandwidth, buffer status, battery and CPU levels, etc. According to these parameters, the HAS client repeatedly fetches the most suitable next segment among the available representations from the server.
- It uses HTTP to deliver video segments, which simplifies traversal of NATs and firewalls.
- At the server side, it uses conventional Web servers or caches available within the networks of ISPs and CDNs.
- At the client side, it requests and fetches each segment independently of the others and maintains the playback session state, whereas the server is not required to maintain any state.
- It doesn’t require a persistent connection between the client and server, which improves system scalability and reduces implementation and deployment costs.
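The client-side selection step described above can be sketched as a simple throughput-based rule. This is a minimal illustration, not the method of any particular HAS player: the bitrate ladder, the `safety` factor of 0.8, and the URL layout in `segment_url` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """One encoded version of the content, as it might be listed in a manifest."""
    rep_id: str
    bitrate_kbps: int


def select_representation(representations, throughput_kbps, safety=0.8):
    """Pick the highest bitrate that fits within safety * measured throughput.

    Falls back to the lowest bitrate when even that one exceeds the budget.
    The safety factor is a hypothetical tuning parameter.
    """
    ordered = sorted(representations, key=lambda r: r.bitrate_kbps)
    budget = safety * throughput_kbps
    chosen = ordered[0]
    for rep in ordered:
        if rep.bitrate_kbps <= budget:
            chosen = rep
    return chosen


def segment_url(base_url, rep, index):
    """Build a hypothetical segment URL; real manifests define their own templates."""
    return f"{base_url}/{rep.rep_id}/segment_{index}.m4s"


# Hypothetical bitrate ladder standing in for the representations in a manifest.
LADDER = [
    Representation("240p", 400),
    Representation("480p", 1200),
    Representation("720p", 2500),
    Representation("1080p", 5000),
]
```

With a measured throughput of 4000 kbps, the budget is 3200 kbps, so the rule picks the 2500 kbps representation. A real client would also weigh buffer level and the other parameters listed above.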
Multi-Client Competition/Stability Issues
A centralized management controller can enhance the overall video quality and improve QoE.
A robust HAS scheme should achieve 3 main objectives:
- Stability: HAS clients should avoid frequent bitrate switching.
- Fairness: Multiple HAS clients competing for available bandwidth should equally share network resources based on viewer, content and device characteristics.
- High Utilization: While the clients attempt to be stable and fair, network resources should be used as efficiently as possible.
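One way to put numbers on these three objectives is sketched below. The choice of measures is an assumption of this example and is not taken from the survey: stability is approximated by counting bitrate switches, fairness by Jain's fairness index over the clients' bandwidth shares, and utilization by the fraction of link capacity in use. Jain's index treats an equal split as perfectly fair, so it ignores the viewer, content, and device characteristics mentioned above.

```python
def switch_count(bitrates):
    """Number of times the selected bitrate changes between consecutive segments."""
    return sum(1 for prev, cur in zip(bitrates, bitrates[1:]) if prev != cur)


def jain_fairness(shares):
    """Jain's fairness index: 1.0 for an equal split, 1/n when one client gets everything."""
    total = sum(shares)
    squares = sum(x * x for x in shares)
    if squares == 0:
        return 1.0  # nobody receives anything, which is trivially equal
    return (total * total) / (len(shares) * squares)


def utilization(shares, capacity):
    """Fraction of the link capacity consumed by all clients together."""
    return sum(shares) / capacity
```

For example, two clients receiving 2000 kbps each on a 4000 kbps link give a fairness index of 1.0 and a utilization of 1.0, while shares of 4000 and 0 give a fairness index of 0.5.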
A streaming session consists of 2 states: buffer-filling state and steady state.
The buffer-filling state aims to fill the playback buffer and reach a certain threshold where the playback can be initiated or resumed.
The steady state is to keep the buffer level above a minimum threshold despite bandwidth fluctuation or interruptions. The steady state consists of 2 activity periods referred to as ON and OFF.
The client requests a segment every $T_s$ time units, where $T_s$ is the playback duration of each segment, and the sum of the ON and OFF period durations equals $T_s$.
- ON period: the client downloads the current segment and records the achieved throughput, which is later used to select the bitrate of future segments.
- OFF period: the client is temporarily idle.
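The split of $T_s$ into ON and OFF periods can be worked out from the segment bitrate and the download throughput. The sketch below assumes a constant throughput during the download and a segment size equal to bitrate times $T_s$; the function name and the units are choices made for this example.

```python
def on_off_durations(segment_bitrate_kbps, segment_duration_s, throughput_kbps):
    """Return (on_s, off_s) for one segment in the steady state.

    The segment holds segment_bitrate_kbps * segment_duration_s kilobits, so at a
    constant throughput the download (ON period) takes size / throughput seconds.
    Whatever remains of the segment duration is the idle OFF period. If the
    download takes longer than the segment duration, there is no OFF period.
    """
    segment_size_kbit = segment_bitrate_kbps * segment_duration_s
    on_s = segment_size_kbit / throughput_kbps
    off_s = max(0.0, segment_duration_s - on_s)
    return on_s, off_s
```

With a 2500 kbps representation, $T_s = 4$ s, and 5000 kbps of throughput, the ON period is 2 s and the OFF period is 2 s. When the throughput falls below the segment bitrate, the OFF period disappears and the buffer starts to drain.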
Several cases arise when clients compete for bandwidth.
If the ON periods of the clients do not overlap during the current segment download, each client overestimates the available bandwidth and selects a higher bitrate. The resulting longer download times cause the initially non-overlapping ON periods to eventually start overlapping.
As the amount of overlap increases, the clients obtain lower bandwidth estimates and start selecting segments with lower bitrates. These segments take less time to download, so the overlap among the ON periods progressively shrinks until the process reverts to its initial situation.
The cycle then repeats, causing periodic up and down shifts in the selected bitrates and leading to unstable video quality, unfairness, and underutilization.
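The two extremes of this cycle can be illustrated with a small calculation. The sketch assumes that overlapping downloads split the link capacity equally and that a client picks the highest bitrate not exceeding its estimate; the ladder and capacity values used in the note after the code are hypothetical.

```python
def pick_bitrate(ladder_kbps, estimate_kbps):
    """Highest ladder bitrate not exceeding the estimate, else the lowest one."""
    candidates = [b for b in ladder_kbps if b <= estimate_kbps]
    return max(candidates) if candidates else min(ladder_kbps)


def estimated_throughput(capacity_kbps, n_clients, overlapping):
    """Per-client estimate in the two extreme cases.

    Without overlap each client sees the whole link during its ON period.
    With full overlap the link is assumed to be shared equally.
    """
    return capacity_kbps / n_clients if overlapping else capacity_kbps


def aggregate_demand(ladder_kbps, capacity_kbps, n_clients, overlapping):
    """Total bitrate requested when every client selects from the same estimate."""
    estimate = estimated_throughput(capacity_kbps, n_clients, overlapping)
    return n_clients * pick_bitrate(ladder_kbps, estimate)
```

With a ladder of 400, 1200, 2500, and 5000 kbps and two clients on a 6000 kbps link, non-overlapping ON periods lead each client to pick 5000 kbps, for a total demand of 10000 kbps that the link cannot sustain. Fully overlapping ON periods lead each to pick 2500 kbps, for a total of 5000 kbps that leaves capacity unused.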
The relationship between video bitrate and perceptual quality is non-linear:
- Different video content types have unique characteristics.
- Scene complexity differs both across streams (inter-stream) and within a stream (intra-stream).
QoE Optimization and Measurement
A HAS scheme uses an application-layer control loop, which also interacts with a lower-layer control loop (such as TCP congestion control). It plays a key role in determining the viewer's QoE.
Factors influencing QoE are categorized as:
- Perceptual, directly perceived by the viewer.
- Technical, indirectly affecting the QoE.
Perceptual factors include the video image quality, initial delay, stalling duration and frequency.
The impact of these factors is subjective and differs from user to user.
Most users consider initial delays less critical than stalling.
Technical factors include the algorithms, parameters, and hardware/software used in the streaming system.
Specifically, the technical factors are:
- Server side: encoding parameters, video qualities and segment size.
- Client side: adaptation parameters and the environment in which the client resides.
QoE can be measured with the following kinds of metrics:
- Objective metrics: Peak Signal-to-Noise Ratio (PSNR), Structural SIMilarity (SSIM and SSIMplus), Perceived Video Quality (PVQ) and Statistically Indifferent Quality Variation (SIQV).
- Subjective metrics: Mean Opinion Score (MOS).
- Quality-of-Service (QoS)-derived metrics: startup delay, average video bitrate, quality switches and rebuffering events.
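Of the objective metrics listed above, PSNR is the simplest to compute. The sketch below applies the standard definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, to two frames given as flat lists of pixel values. The 8-bit peak value of 255 and the flat-list input format are assumptions made for the example.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel sequences.

    Returns infinity when the sequences are identical (zero mean squared error).
    """
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)
```

A higher value means the distorted frame is closer to the reference. PSNR compares pixels only and does not model human perception, which is one reason the perceptual metrics in the same list exist.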
Trying to optimize every metric at once is difficult because the metrics can conflict with one another.
Inter-Destination Multimedia Synchronization
Online communities are drifting towards watching online videos together in a synchronized manner.
Having multiple streaming clients distributed across different geographical locations makes it challenging to deliver video content simultaneously while keeping the playback state of every client the same.
Typically, inter-destination multimedia synchronization (IDMS) solutions involve a master node to which clients synchronize their playout.
Rainer et al. proposed an IDMS architecture for DASH that uses a distributed control scheme in which peers communicate and negotiate a reference playback timestamp in each session.
In another work, Rainer et al. conducted a crowdsourced subjective evaluation to find an asynchronism threshold below which QoE was not significantly affected.
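The master-based approach can be sketched as computing, for each client, how far its playback position is from the master's and correcting only when the gap exceeds an asynchronism threshold. This is a simplified illustration, not the scheme proposed by Rainer et al.: the 0.5 s threshold is a placeholder rather than the value found in their evaluation, and the sketch ignores network delay and clock differences between nodes.

```python
def playout_corrections(master_position_s, client_positions_s, threshold_s=0.5):
    """Return the playback adjustment in seconds for each client.

    A positive value means the client is behind the master and should skip
    ahead. A negative value means it is ahead and should wait or slow down.
    Offsets within the threshold are left uncorrected (0.0).
    """
    corrections = {}
    for client_id, position_s in client_positions_s.items():
        offset_s = master_position_s - position_s
        corrections[client_id] = offset_s if abs(offset_s) > threshold_s else 0.0
    return corrections
```

For a master at 100.0 s, a client at 98.0 s gets a correction of 2.0 s, a client at 101.5 s gets a correction of -1.5 s, and a client at 100.2 s is left alone because its offset is within the threshold.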