Lossless and Lossy Compression of Screen Content Data Using Machine Learning

Overview

This project explores a novel approach to the compression of screen content data, taking into account different facets of the image properties and the specific objective of the compression. Unlike traditional photo or broadcast video data, the content of screen content images is very diverse in terms of its statistical properties. Often, certain regions within the images are characterized by two typical properties: a limited number of colors and repeating patterns. Studies have shown that conventional compression methods are not able to store screen content data efficiently, even when special coding tools are included. A method based on ideal data coding is considerably more successful; its performance depends on optimal modeling of the probability distributions of the pixel symbols. For this new compression method, a model approach has already been implemented (prototype), but it is still limited to lossless compression and achieves high compression efficiency only for image content with certain properties. The project investigates new methods for better modeling of the probability distributions of the pixel symbols using machine learning. Overall, the project pursues several approaches:

  • The existing prototype uses only global image information in some processing stages for modeling the probability distributions. Significant gains in compression can be expected from estimation techniques that incorporate more prior knowledge, e.g., of a local nature. Alternative learning methods are being considered for this purpose; in general, the focus is on learning from few examples. This approach can achieve higher compression ratios than classical methods, which first segment the image into regions of different types and then switch to conventional compression methods where necessary.
  • The project will further investigate how a suitable rate-distortion optimization can extend the method to near-lossless or lossy compression. This optimization can be parameterized by a dedicated image analysis in order to take perceptual models of human vision into account.
  • An extension of the model approach to image sequence compression is possible in principle and is being pursued. Taking the temporal component into account can, depending on the situation, lead to improved modeling of the probability distributions and must therefore be explored. In image sequences, changes in signal statistics typically occur due to newly appearing content or scene changes. It must be specifically investigated how the existing or new learning procedures can be supplemented with suitable mechanisms for forgetting or relearning.
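The central idea above, that compression efficiency hinges on how well the probability distributions of the pixel symbols are modeled, can be illustrated with a short sketch: an ideal arithmetic coder driven by a model p spends about -log2 p(s) bits per symbol s, so a model that exploits local structure such as repeating patterns lowers the bit cost. The toy scanline and the order-1 context model below are illustrative assumptions, not the project's actual prototype:

```python
import math
from collections import Counter, defaultdict

def ideal_bits(symbols, prob_fn):
    """Ideal arithmetic-coding cost: -sum(log2 p) over all symbols."""
    return -sum(math.log2(prob_fn(i, s)) for i, s in enumerate(symbols))

# Toy screen-content scanline: few colors, a repeating pattern.
pixels = [0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 2, 2]
n = len(pixels)

# Model A: global (order-0) frequencies, i.e. only global image information.
freq = Counter(pixels)
def p_global(i, s):
    return freq[s] / n

# Model B: order-1 context model, conditioning on the previous pixel.
# This stands in for incorporating local prior knowledge.
ctx = defaultdict(Counter)
for prev, cur in zip(pixels, pixels[1:]):
    ctx[prev][cur] += 1
def p_context(i, s):
    if i == 0:
        return freq[s] / n          # no context for the first pixel
    c = ctx[pixels[i - 1]]
    return c[s] / sum(c.values())

bits_global = ideal_bits(pixels, p_global)      # 18.00 bits on this toy line
bits_context = ideal_bits(pixels, p_context)    # noticeably fewer bits
print(f"global: {bits_global:.2f} bits, context: {bits_context:.2f} bits")
```

On this toy scanline the context model roughly halves the ideal bit cost, which mirrors why better distribution modeling translates directly into compression gains.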


Enhanced color palette modeling for lossless screen content

Och, Hannah; Uddehal, Shabhrish; Strutz, Tilo; Kaup, André (2024)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'24), 14-19 April 2024, Seoul, South Korea, pp. 3670-3674.
DOI: 10.1109/ICASSP48485.2024.10446445


Peer Reviewed
 

Soft context formation is a lossless image coding method for screen content. It encodes images pixel by pixel via arithmetic coding by collecting statistics for probability distribution estimation. Its main pipeline includes three stages, namely a context model based stage, a color palette stage and a residual coding stage. Each stage is employed only if the previous stage cannot be applied because the necessary statistics, e.g. colors or contexts, have not been learned yet. We propose the following enhancements: First, information from previous stages is used to remove redundant palette entries and prediction errors in subsequent stages. Additionally, implicitly known stage decision signals are no longer explicitly transmitted. These enhancements lead to an average bit rate decrease of 1.16% on the evaluated data. Compared to FLIF and HEVC, the proposed method needs roughly 0.28 and 0.17 bits per pixel less on average for 24-bit screen content images, respectively.
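The three-stage fall-through described in the abstract can be sketched as a simple selection rule: try the context model, fall back to the palette, and use residual coding only for unseen colors. The function and data below are hypothetical simplifications for illustration; the real SCF coder maintains pattern statistics and full probability distributions rather than plain lookups:

```python
def encode_pixel(color, context_stats, palette):
    """Illustrative stage selection for one pixel (not the actual SCF code).

    Stage 1: context model, if statistics for this color's pattern exist.
    Stage 2: color palette, if the color is known but no context matches.
    Stage 3: residual coding for colors never observed before.
    """
    if color in context_stats:                 # pattern statistics available
        return ("context", context_stats[color])
    if color in palette:                       # color known, pattern unknown
        return ("palette", palette.index(color))
    return ("residual", color)                 # brand-new color

# Hypothetical coder state after encoding part of an image:
context_stats = {(10, 20, 30): 0.8}           # colors seen with similar patterns
palette = [(10, 20, 30), (0, 0, 0)]

print(encode_pixel((0, 0, 0), context_stats, palette))       # -> ('palette', 1)
print(encode_pixel((255, 255, 255), context_stats, palette)) # -> ('residual', (255, 255, 255))
```

The proposed enhancements in the paper act exactly at these hand-offs: later stages reuse what earlier stages already ruled out, and the stage decision itself is transmitted only when it is not already implied.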


Improved screen content coding in VVC using soft context formation

Och, Hannah; Uddehal, Shabhrish; Strutz, Tilo; Kaup, André (2024)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'24), 14-19 April 2024, Seoul, South Korea, pp. 3685-3689.
DOI: 10.1109/ICASSP48485.2024.10447125


Peer Reviewed
 

Screen content images typically contain a mix of natural and synthetic image parts. Synthetic sections usually consist of uniformly colored areas and repeating colors and patterns. In the VVC standard, these properties are exploited using Intra Block Copy and Palette Mode. In this paper, we show that pixel-wise lossless coding can outperform lossy VVC coding in such areas. We propose an enhanced VVC coding approach for screen content images using the principle of soft context formation. First, the image is separated into two layers in a block-wise manner using a learning-based method with four block features. Synthetic image parts are coded losslessly using soft context formation, the rest with VVC. We modify the available soft context formation coder to incorporate information gained by the decoded VVC layer for improved coding efficiency. Using this approach, we achieve Bjontegaard-Delta-rate gains of 4.98% on the evaluated data sets compared to VVC.


Image Segmentation for Improved Lossless Screen Content Compression

Uddehal, Shabhrish; Strutz, Tilo; Och, Hannah; Kaup, André (2023)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'23), 4-10 June 2023, Rhodes Island, Greece.


Peer Reviewed
 

In recent years, it has been found that screen content images (SCI) can be effectively compressed based on appropriate probability modelling and suitable entropy coding methods such as arithmetic coding. The key objective is determining the best probability distribution for each pixel position. This strategy works particularly well for images with synthetic (textual) content. However, screen content images usually consist not only of synthetic but also of pictorial (natural) regions. These images require diverse models of probability distributions to be optimally compressed. One way to achieve this goal is to separate synthetic and natural regions. This paper proposes a segmentation method that identifies natural regions, enabling better adaptive treatment. It supplements a compression method known as Soft Context Formation (SCF) and operates as a pre-processing step. If at least one natural segment is found within the SCI, it is split into two subimages (natural and synthetic parts) and the process of modelling and coding is performed separately for both. For SCIs with natural regions, the proposed method achieves a bit-rate reduction of up to 11.6% and 1.52% with respect to HEVC and the previous version of SCF.
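As a rough illustration of such block-wise separation, even a single feature like the number of distinct colors already discriminates many synthetic blocks from natural ones. The threshold rule and toy blocks below are assumptions for illustration only; the paper's segmentation uses a more refined decision over block statistics:

```python
def classify_block(block, max_colors=12):
    """Label a block 'synthetic' or 'natural' by one toy feature:
    the number of distinct colors in the block. A fixed threshold is
    only an illustration of the feature-based separation idea."""
    distinct = len(set(block))
    return "synthetic" if distinct <= max_colors else "natural"

# 8x8 blocks flattened to lists of color indices (toy data):
text_block = [0, 1] * 32              # two colors, e.g. black text on white
photo_block = list(range(64))         # 64 distinct colors, photo-like

print(classify_block(text_block))     # -> synthetic
print(classify_block(photo_block))    # -> natural
```

Blocks labeled synthetic would then be routed to the SCF-style coder, while natural blocks keep a conventional transform-based path.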


Optimization of Probability Distributions for Residual Coding of Screen Content

Och, Hannah; Strutz, Tilo; Kaup, André (2021)

IEEE Visual Communications and Image Processing (VCIP 2021), Munich, 5-8 December 2021.
DOI: 10.1109/VCIP53242.2021.9675326


Peer Reviewed
 

Probability distribution modeling is the basis for most competitive methods for lossless coding of screen content. One such state-of-the-art method is known as soft context formation (SCF). For each pixel to be encoded, a probability distribution is estimated based on the neighboring pattern and the occurrence of that pattern in the already encoded image. Using an arithmetic coder, the pixel color can thus be encoded very efficiently, provided that the current color has been observed before in association with a similar pattern. If this is not the case, the color is instead encoded using a color palette or, if it is still unknown, via residual coding. Both palette-based coding and residual coding have significantly worse compression efficiency than coding based on soft context formation. In this paper, the residual coding stage is improved by adaptively trimming the probability distributions for the residual error. Furthermore, an enhanced probability modeling for indicating a new color depending on the occurrence of new colors in the neighborhood is proposed. These modifications result in a bitrate reduction of up to 2.9% on average. Compared to HEVC (HM-16.21 + SCM-8.8) and FLIF, the improved SCF method saves on average about 11% and 18% rate, respectively.
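The trimming idea can be sketched as follows: residuals that would push the reconstructed value outside the valid pixel range cannot occur, so their probability mass can be removed and the remainder renormalized, which shortens the codes for the residuals that are still possible. This is a simplified illustration, not the paper's exact adaptive scheme:

```python
def trim_and_renormalize(probs, prediction, max_value=255):
    """Drop probabilities of residuals that cannot occur because the
    reconstructed value prediction + residual must lie in [0, max_value],
    then renormalize the remaining distribution."""
    trimmed = {r: p for r, p in probs.items()
               if 0 <= prediction + r <= max_value}
    total = sum(trimmed.values())
    return {r: p / total for r, p in trimmed.items()}

# Uniform prior over residuals -4..4; a prediction near the upper bound
# makes large positive residuals impossible.
prior = {r: 1 / 9 for r in range(-4, 5)}
posterior = trim_and_renormalize(prior, prediction=253)
print(sorted(posterior))              # only residuals -4..2 remain
```

Because the surviving residuals now carry more probability mass each, their ideal code lengths shrink, which is the mechanism behind the reported bitrate reduction in the residual stage.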


Screen content compression based on enhanced soft context formation

Strutz, Tilo; Möller, Phillip (2020)

IEEE Transactions on Multimedia 22 (5), pp. 1126-1138.
DOI: 10.1109/TMM.2019.2941270


Peer Reviewed
 

The compression of screen content has attracted the interest of researchers in the last years as the market for transferring data from computer displays is growing. It has already been shown that especially those methods can effectively compress screen content which are able to predict the probability distribution of next pixel values. This prediction is typically based on a kind of learning process. The predictor learns the relationship between probable pixel colours and surrounding texture. Recently, an effective method called 'soft context formation' (SCF) was proposed which achieves much lower bitrates for images with less than 8 000 colours than other state-of-the-art compression schemes.
This paper presents an enhanced version of SCF. The average lossless compression performance has increased by about 5% in application to images with less than 8 000 colours and about 10% for images with up to 90 000 colours. In comparison to FLIF, FP8v3, and HEVC (HM-16.20 + SCM-8.8), it achieves savings of about 33%, 4%, and 11% on average. The improvements compared to the original version result from various modifications. The largest contribution is achieved by the local estimation of the probability distribution for unpredictable colours in stage II of the compression scheme.



Project staff

Shabhrish Uddehal
T +49 9561 317 8014
Shabhrish.Uddehal[at]hs-coburg.de

External project contributors

Prof. Dr.-Ing. André Kaup

Hannah Och

Project duration

2021-10-01 - 2025-02-28

Project partners

Friedrich-Alexander-Universität Erlangen-Nürnberg

Project funding

Funding programme

DFG - Deutsche Forschungsgemeinschaft