8.5 Reviewing VoIP Quality Assessment Factors

During the VoIP Quality assessment, Vivinet Assessor calculates Call Quality based on a set of factors known to affect the perceived quality of voice over IP transmissions. Just as device and link utilization is measured and rated for VoIP readiness, Call Quality is also rated, and the results of the ratings are broken out in detail in the final VoIP Readiness report.

A subjective factor is necessarily part of evaluating VoIP because a listener must be able to understand the received transmission, and both talkers must be able to tolerate the amount of delay between speaking and being heard (called “the walkie-talkie effect”), lost or fractured syllables, and echo that often impede the conversation.

To determine the relative quality of each simulated VoIP call made during a VoIP Quality assessment, Vivinet Assessor measures the following quality impairment factors:

  • Delay (see Section 8.5.2)

  • Jitter (see Section 8.5.3)

  • Lost data (see Section 8.5.5)

Each factor is measured, and the results are evaluated for VoIP readiness. You can determine how any of these factors are rated by changing the result ranges for the VoIP Quality assessment. For more information, see Section 7.8.1, Setting Result Ranges.

8.5.1 Mean Opinion Score

The chief unit of measurement for Call Quality in Vivinet Assessor VoIP Quality results is an estimated Mean Opinion Score, or MOS. The E-Model, ITU Standard G.107, quantifies what is essentially a subjective judgment: a user’s opinion of the perceived quality of a voice transmission. After much study, the ITU determined which impairment factors produced the strongest user perceptions of lower quality. The E-Model thus includes factors for equipment and impairments and takes into account typical users’ perceptions of voice transmissions affected by jitter, lost data, and delay.

In its calculations, Vivinet Assessor modifies the E-Model slightly and adds three call quality categories, Good, Acceptable, and Poor, to help you determine how well VoIP performs on your network. You can change the way the categories are applied to conform to your own quality standards.

Readiness Assessment reports also show how many calls could not be completed (“Unavailable” calls) and give you the objective measurements of lost data, jitter, and delay so that you can judge call quality for yourself. See Section 11.2, VoIP Quality Assessment Errors.

Vivinet Assessor uses a modified version of the ITU G.107 standard E-Model equation to calculate a Mean Opinion Score (MOS) estimate for each call group.

The E-Model, developed by the European Telecommunications Standards Institute (ETSI), has become ITU standard G.107. This algorithm is used to evaluate the quality of a transmission by factoring in the “mouth-to-ear” characteristics of a speech path. The output of an E-Model calculation is a single scalar, the “R-value,” which is derived from voice quality impairment factors. The R-value output is then mapped to an estimated MOS.

In calculating the MOS, Vivinet Assessor modifies the E-Model slightly, using the following factors:

| Factor | Description |
|---|---|
| End-to-End Delay | Delayed datagrams are perhaps the single greatest hindrance to VoIP call quality. This value includes all network (or one-way) delay, packetization delay, and jitter buffer delay between the endpoints. For more information, see Section 8.5.2, Delay. |
| Jitter Buffer Loss | Jitter occurs when there are variations in datagram arrival times within a single transmission. When jitter exceeds jitter buffer capacity, datagram loss occurs, reducing call clarity. For more information, see Section 8.5.3, Jitter. |
| Lost Data | Datagrams that never arrived at the receiver. When a datagram is lost, you can lose an entire syllable, and the more datagrams that are lost consecutively, the more the clarity suffers. For more information, see Section 8.5.5, Lost Data. |

A MOS of 5 is excellent; a MOS of 1 is unacceptably poor. The following table (taken from ITU G.107) summarizes the relationship between the MOS and user satisfaction:

| MOS (lower limit) | User Satisfaction |
|---|---|
| 4.34 | Very satisfied |
| 4.03 | Satisfied |
| 3.60 | Some users dissatisfied |
| 3.10 | Many users dissatisfied |
| 2.58 | Nearly all users dissatisfied |
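The mapping from R-value to estimated MOS can be illustrated with the conversion published in ITU-T G.107. The following is a minimal sketch of that standard conversion only. Vivinet Assessor uses a modified E-Model, so its exact calculation may differ, and the function name `r_to_mos` is hypothetical. The lower limits in the table above correspond approximately to R-values of 90, 80, 70, 60, and 50.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-Model R-value to an estimated MOS.

    Uses the standard ITU-T G.107 conversion. This is an illustration,
    not Vivinet Assessor's modified calculation.
    """
    if r <= 0:
        return 1.0   # floor of the estimated MOS scale
    if r >= 100:
        return 4.5   # ceiling of the estimated MOS scale
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Print the estimated MOS for the R-values behind the satisfaction bands.
for r_value in (90, 80, 70, 60, 50):
    print(f"R = {r_value}: estimated MOS = {r_to_mos(r_value):.2f}")
```

Note that the standard conversion tops out at an estimated MOS of 4.5, which is why even an unimpaired call does not reach 5.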

By default, Vivinet Assessor maps MOS estimates to the three readiness ratings categories as follows:

| MOS Range | Score | Meaning |
|---|---|---|
| 5.0 - 4.03 | Good (green) | Most or nearly all users satisfied |
| 4.02 - 3.60 | Acceptable (yellow) | Some users satisfied |
| Below 3.60 | Poor (red) | Most or nearly all users dissatisfied |
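The default mapping above can be expressed as a simple threshold check. This is a sketch for illustration only: the function and constant names are hypothetical, and treating any value from 4.03 upward as Good (so that values between 4.02 and 4.03 fall into Acceptable) is an assumption about how the boundaries are applied.

```python
# Default lower limits taken from the table above.
GOOD_LOWER_LIMIT = 4.03
ACCEPTABLE_LOWER_LIMIT = 3.60


def rate_mos(mos: float,
             good: float = GOOD_LOWER_LIMIT,
             acceptable: float = ACCEPTABLE_LOWER_LIMIT) -> str:
    """Return the readiness rating for an estimated MOS."""
    if mos >= good:
        return "Good"
    if mos >= acceptable:
        return "Acceptable"
    return "Poor"


print(rate_mos(4.20))             # default ranges
print(rate_mos(3.80, good=3.70))  # stricter or looser ranges can be passed in
```

Passing different limits mirrors the effect of changing the result ranges to match your own quality standards.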

If the MOS value calculated for a call group during an assessment is less than the maximum possible for the codec being used, the “Factors Affecting Call Quality” charts in the report will show which impairment factors contributed to the degradation in the MOS, and how much they contributed. These charts therefore provide rudimentary guidelines for improving call quality on your network.

8.5.2 Delay

The end-to-end delay, or latency, as measured between the endpoints is a key factor in determining VoIP call quality. Vivinet Assessor calculates the end-to-end delay for calls between the endpoints in a single direction by adding the following factors:

| Delay Type | How Calculated |
|---|---|
| Network (or one-way) delay | Datagram's RTP timestamp subtracted from the time it was received by the receiving endpoint. Includes propagation delay (time spent on the actual network) and transport delay (time spent getting through intermediate network devices, such as routers and switches). |
| Packetization delay | Fixed value; dependent on codec selected. For more information, see Section 7.11.2, Reviewing Codec Types. |
| Jitter buffer delay | Fixed value; dependent on type and size of jitter buffer configured by user. For more information, see Section 7.11.6, Understanding Jitter Buffers. |
| Additional fixed delay | Fixed value; user-configured. For more information, see Section 7.10.1, Adding a Call Script. |

There is a distinction between the delay impairment factor and the delay statistic. In charts reporting delay statistics, the calculations include all the factors shown in the table above. However, in charts showing call-quality impairment factors, the packetization delay is included in the Codec impairment value, not in the Delay impairment value.

Most callers notice round-trip delays in excess of 250 ms. ITU-T standard G.114 specifies 150 ms as the maximum one-way delay that is tolerable for high-quality VoIP, so you should consider these factors when determining your delay budget.
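As a rough illustration of a delay budget, the sketch below adds the four delay types from the table and compares the total with the 150 ms one-way limit from G.114. The class name, field names, and sample values are hypothetical and are not taken from Vivinet Assessor.

```python
from dataclasses import dataclass

G114_ONE_WAY_LIMIT_MS = 150.0  # maximum one-way delay for high-quality VoIP


@dataclass
class DelayComponents:
    """One-way delay components, all in milliseconds."""
    network_ms: float
    packetization_ms: float
    jitter_buffer_ms: float
    additional_fixed_ms: float = 0.0

    def end_to_end_ms(self) -> float:
        """Sum of all components, as used for the delay statistic."""
        return (self.network_ms + self.packetization_ms
                + self.jitter_buffer_ms + self.additional_fixed_ms)

    def within_budget(self, limit_ms: float = G114_ONE_WAY_LIMIT_MS) -> bool:
        """True if the total one-way delay does not exceed the limit."""
        return self.end_to_end_ms() <= limit_ms


# Hypothetical call: 60 ms network, 20 ms packetization, 40 ms jitter buffer.
call = DelayComponents(network_ms=60, packetization_ms=20, jitter_buffer_ms=40)
print(call.end_to_end_ms(), call.within_budget())
```

Because the packetization and jitter buffer delays are fixed by the call script, the network delay is usually the component to watch when the budget is exceeded.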

How Endpoints Calculate Delay

To provide a useful measurement of delay for VoIP, the endpoints in each call group must continuously synchronize their high-precision clocks. The endpoints maintain virtual (software) clocks for each partner involved in a VoIP test. These virtual clocks consist of the offset between the microsecond clocks maintained by the two endpoints.
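To see why the offset matters, consider the following sketch of a one-way delay calculation between two unsynchronized clocks. It is an assumption-laden illustration: the function name, the sign convention for the offset (receiver clock minus sender clock), and the sample values are all hypothetical, and the way the endpoints actually measure and maintain the offset is not described here.

```python
def one_way_delay_us(send_timestamp_us: int,
                     receive_time_us: int,
                     offset_us: int) -> int:
    """Return the one-way delay in microseconds.

    offset_us is assumed to be (receiver clock - sender clock), so adding
    it to the sender's timestamp expresses the send time on the
    receiver's clock.
    """
    send_time_on_receiver_clock = send_timestamp_us + offset_us
    return receive_time_us - send_time_on_receiver_clock


# Hypothetical values: the receiver's clock runs 250,000 us ahead of the sender's.
offset = 250_000
delay = one_way_delay_us(send_timestamp_us=2_000_000,
                         receive_time_us=2_280_000,
                         offset_us=offset)
print(delay)  # delay in microseconds
```

Without the offset correction, the same datagram would appear to take 280,000 microseconds, which shows how clock error translates directly into delay error.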

A high-resolution microsecond clock is maintained independently of the operating system’s system clock. The endpoints paired with each other in a call group compare their respective versions of the clocks prior to the start of each set of simulated calls and periodically during the calls. They also measure clock synchronization and drift between sets of calls to establish a track record for the expected delay. If an error occurs in this process, you will see error message CHR0359:

An error was detected in the high precision timer.

The Windows 98 and Windows Me operating systems do not support high-precision timing very well. You will see CHR0384 if you try to use endpoints on these operating systems in your VoIP Quality assessment.

8.5.3 Jitter

As simulated calls run during a VoIP Quality assessment, the endpoints calculate jitter, a factor known to adversely affect call quality. Jitter is also called delay variation, and it indicates the differences in arrival times among all datagrams sent during a simulated voice over IP call.

When a datagram is sent, the sender (one of the endpoints) gives it a timestamp. When it is received, the receiver adds another timestamp. These two timestamps are used to calculate the datagram’s transit time. If the transit times for datagrams within the same call are different, the call contains jitter. In a telephone call, the effects of jitter may be similar to the effects of packet loss: some words may be missing or garbled.

The amount of jitter in a call depends on the degree of difference between the datagrams’ transit times. If the transit time for all datagrams is the same—no matter how long it took for the datagrams to arrive—the call contains no jitter. If the transit times differ slightly, the call contains some jitter. And if there is any jitter detected for a call, Vivinet Assessor measures jitter buffer loss as well.
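The relationship between transit times and jitter can be sketched as follows. The exact formula Vivinet Assessor uses for its jitter average is not given here, so this example assumes one common definition: the mean absolute difference between the transit times of consecutive datagrams. The function names are hypothetical, and the timestamps are assumed to be already expressed on a common clock.

```python
def transit_times(datagrams):
    """Return the transit time for each (sent, received) timestamp pair."""
    return [received - sent for sent, received in datagrams]


def average_jitter(datagrams):
    """Mean absolute difference between consecutive transit times."""
    transits = transit_times(datagrams)
    if len(transits) < 2:
        return 0.0
    diffs = [abs(later - earlier)
             for earlier, later in zip(transits, transits[1:])]
    return sum(diffs) / len(diffs)


# Timestamps in milliseconds (hypothetical data).
steady = [(0, 30), (20, 50), (40, 70)]    # every transit time is 30 ms
uneven = [(0, 30), (20, 55), (40, 70)]    # transit times are 30, 35, 30 ms
print(average_jitter(steady), average_jitter(uneven))
```

The first call has no jitter even though every datagram takes 30 ms to arrive, which matches the point that a constant transit time, however long, produces no jitter.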

Vivinet Assessor reports show jitter as an average. But to calculate call quality scores, it uses the statistic for datagrams lost due to the size of the jitter buffer (“jitter buffer loss”) in the call script.

8.5.4 Jitter Buffers and Datagram Loss

VoIP equipment typically has a jitter buffer, either frame-based or absolute. A frame-based jitter buffer holds a given number of voice datagrams, whereas an absolute jitter buffer is based on time. They both smooth out VoIP network jitter. All call scripts used in VoIP Quality assessments by default emulate a frame-based jitter buffer of two datagrams. For more information, see Section 7.11.6, Understanding Jitter Buffers.

Although jitter buffers smooth out jitter by feeding datagrams to the application in a steady stream, they also exacerbate data loss: datagrams that are not contained by the jitter buffer are discarded. Thus, Vivinet Assessor’s jitter buffer lost datagrams statistic includes:

  • jitter buffer overruns: datagrams that had a delay variation greater than the jitter buffer size or were delayed too long. For example, a datagram with a delay of 50 ms would not be contained in a jitter buffer set to 40 ms. Or if five datagrams were delayed sequentially, a frame-based jitter buffer set to two datagrams would discard three datagrams.

  • jitter buffer underruns—datagrams that arrived too quickly while the jitter buffer was still full.
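The overrun case for an absolute (time-based) jitter buffer can be sketched as shown below. This is a simplified model under stated assumptions: delay variation is measured against the shortest transit time in the call, underruns and frame-based buffers are not modeled, and the function names are hypothetical rather than part of Vivinet Assessor.

```python
def jitter_buffer_overruns(transit_ms, buffer_ms):
    """Count datagrams whose delay variation exceeds the buffer size.

    Delay variation is taken relative to the shortest transit time in
    the call (an assumption made for this sketch).
    """
    if not transit_ms:
        return 0
    baseline = min(transit_ms)
    return sum(1 for transit in transit_ms if transit - baseline > buffer_ms)


def loss_percent(lost, sent):
    """Express lost datagrams as a percentage of datagrams sent."""
    return 100.0 * lost / sent if sent else 0.0


# Hypothetical call: one datagram arrives 50 ms later than the rest.
transits = [30, 30, 80, 30, 30]
lost = jitter_buffer_overruns(transits, buffer_ms=40)
print(lost, loss_percent(lost, len(transits)))
```

A larger buffer would contain the late datagram, but at the cost of additional jitter buffer delay for every datagram in the call.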

Jitter buffers may be static or dynamic. Each type of buffer has its strengths, but it is in the nature of IP networks to exact a trade-off. One assumption used in calculating call quality is that buffering not only causes loss, but also adds delay, which can offset the positive effects of smoothing out jitter.

In charts reporting jitter statistics, the calculations are based on jitter buffer loss, but the jitter average is also shown in the accompanying table. Similarly, in charts showing call-quality impairment factors, the jitter impairment factor reflects jitter buffer loss for the call scripts used.

When jitter is detected, jitter buffer loss totals are factored into the Mean Opinion Score estimate and also reported separately. You can determine what levels of jitter buffer loss are acceptable on your network by configuring result ranges. For more information, see Section 5.2, Determining Assessment Result Ranges.

Jitter buffer loss in excess of 0.5% of all datagrams sent in a call can adversely affect call quality.

8.5.5 Lost Data

In VoIP Readiness Assessment reports, Vivinet Assessor includes statistics on lost packets, expressed as a percentage of all data sent in the relevant calls. For example, in charts indicating lost data by call group, lost data is expressed as a percentage of all data sent between the endpoints in the call group over the course of the entire assessment. Other charts might show data loss as a percentage of data sent at a certain time of day, averaged over the course of all days in the assessment.

When a packet is lost during a VoIP transmission, you can lose an entire syllable or word in a conversation. Obviously, packet loss can severely impair call quality. Vivinet Assessor therefore includes data loss as a call quality impairment factor in calculating the MOS of each simulated VoIP call. Industry-wide, greater than 5% packet loss is considered a problem for VoIP.

To measure data loss, one computer in each call group keeps track of how many bytes of data it sent. The sending endpoint reports to the receiving endpoint how many bytes it sent, and the receiver compares that value to the amount received to determine lost data.
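The comparison described above amounts to a simple percentage calculation. The following sketch uses hypothetical function names and sample byte counts. The 5% threshold is the industry guideline mentioned earlier in this section, not a Vivinet Assessor setting.

```python
PROBLEM_THRESHOLD_PERCENT = 5.0  # industry guideline cited in this section


def lost_data_percent(bytes_sent: int, bytes_received: int) -> float:
    """Return lost data as a percentage of the data sent."""
    if bytes_sent <= 0:
        return 0.0
    lost = max(bytes_sent - bytes_received, 0)
    return 100.0 * lost / bytes_sent


def is_loss_a_problem(bytes_sent: int, bytes_received: int) -> bool:
    """True if loss is greater than the 5% guideline."""
    return lost_data_percent(bytes_sent, bytes_received) > PROBLEM_THRESHOLD_PERCENT


# Hypothetical call group: 200,000 bytes sent, 188,000 bytes received.
print(lost_data_percent(200_000, 188_000), is_loss_a_problem(200_000, 188_000))
```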

In Analysis Console, you can see values for Maximum Consecutive Datagram Loss, which is not included in the reports. Consecutive loss has a greater negative impact on call quality than simple datagram loss. Analysis Console also shows how many datagrams were received out of order over the course of the assessment. For more information, see Section 9.0, Working with Analysis Console.
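To illustrate what these two statistics capture, the sketch below derives them from a list of received sequence numbers. It is an illustration only: the function names are hypothetical, sequence numbers are assumed to start at 0, and a datagram is counted as out of order when it arrives after a datagram with a higher sequence number. Vivinet Assessor may define these values differently.

```python
def max_consecutive_loss(received_seq, total_sent):
    """Return the longest run of consecutive sequence numbers never received."""
    received = set(received_seq)
    longest = current = 0
    for seq in range(total_sent):
        if seq in received:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def out_of_order_count(received_seq):
    """Count datagrams that arrived after a higher-numbered datagram."""
    count = 0
    highest_seen = -1
    for seq in received_seq:
        if seq < highest_seen:
            count += 1
        else:
            highest_seen = seq
    return count


# Hypothetical arrival order for 10 datagrams sent (3, 4, and 8 never arrive).
arrivals = [0, 1, 2, 6, 7, 5, 9]
print(max_consecutive_loss(arrivals, 10), out_of_order_count(arrivals))
```

In this example three datagrams are lost in total, but the longest gap is two datagrams, and it is that gap that is most likely to be heard as a missing syllable.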

If you have packet loss concealment (PLC) enabled for the G.711 codecs, call quality will be better than it would be without PLC whenever data is lost during the assessment. PLC may make the codec itself more expensive to manufacture, but it does not add delay or have other bad side effects. The VoIP equipment you plan to purchase probably uses PLC, which has become an industry standard. For more information, see Section 7.11.2, Reviewing Codec Types.

8.5.6 Time Zone Considerations

If you run a VoIP Quality assessment across various time zones, be aware that you will need to “translate” some of your results to understand the actual times they were recorded. Vivinet Assessor creates graphs and reports for all results in the Executive Summary or Complete Report as if they have occurred in the local time zone—the time zone where the Console is located.

Unless you take time zones into account, you may misinterpret some results. Here is an example:

VoIP quality may improve from noon to 1 PM for call groups located on the East Coast as data traffic on the network is reduced during the lunch hour. However, this same time slot on the West Coast (that is, 9 AM to 10 AM) is likely to be a time of heavy traffic, and VoIP quality may be much lower for these call groups. Yet Vivinet Assessor reports will graph the 9 AM to 10 AM West Coast call groups in the same time slot as the 12 to 1 PM call groups on the East Coast. In graphs that show average quality broken out by time of day, be aware of the role that time zones are playing.
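If you need to translate a reported time slot into the local time of a remote call group, a small conversion such as the following sketch can help. It assumes a Console on the East Coast and uses fixed UTC offsets that ignore daylight saving time; the function name and the date are hypothetical, and a real script would more likely use named time zones.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets (assumption: daylight saving time is ignored).
EASTERN = timezone(timedelta(hours=-5), "EST")
PACIFIC = timezone(timedelta(hours=-8), "PST")


def to_endpoint_local(console_time: datetime, endpoint_tz: timezone) -> datetime:
    """Convert a time reported in the Console's zone to an endpoint's zone."""
    return console_time.astimezone(endpoint_tz)


# A result graphed in the noon slot by a Console on the East Coast.
reported = datetime(2024, 1, 15, 12, 0, tzinfo=EASTERN)
local = to_endpoint_local(reported, PACIFIC)
print(local.strftime("%H:%M %Z"))
```

For the noon slot in the example above, the converted time for West Coast call groups is 9 AM, which explains why their results may reflect morning traffic rather than lunch-hour traffic.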

In addition, keep in mind that Vivinet Assessor is using the localization (date and time) settings of the operating system when it generates charts and graphs in Analysis Console, as well as reports. The Scheduler service also uses these settings. If you change the Windows default setting to Automatically adjust clock for daylight saving changes on the Console computer, you need to reboot the computer or restart the Scheduler service so that it will use the same settings. Otherwise, you may see a negative value for the “% complete” reading during Verification.