Bilddatenkompression

Strutz, Tilo (2025)


 

Preface to the sixth edition


The previous editions of "Bilddatenkompression" regularly reflected new developments in video compression through additional sections in the chapter on standards for image-sequence compression. In the meantime, new standards have again emerged in both still-image and image-sequence compression, and many earlier concepts have been extended and supplemented. I have therefore used the new edition for a substantial restructuring. Standards for video compression are no longer discussed separately. Instead, the essential methods now appear in the thematically appropriate chapters, with many passages considerably tightened or removed entirely where they are no longer relevant to current compression systems. The chapter on standards for still-image compression, on the other hand, has been expanded significantly. New additions include an overview of the most important JPEG and other activities after JPEG-2000 (JPEG-XR, -XT, -XS, -XL, -AI, HEIC) and a comparison of these methods for lossy and lossless compression applied to various colour and greyscale images.


The discussion in Chapter 1 is now preceded by a section that specifically explains the difference between data and information.
The chapter "Entropy Coding" has been extended with detailed descriptions of further coding methods that offer a compression efficiency similar to conventional arithmetic coding but are faster under certain conditions: range coding and coding based on asymmetric numeral systems. The material on move-to-front and incremental frequency count has been moved to the chapter on precoding methods.
The section on quality measures (Section 2.5.2.1) now explains a structural similarity measure (SSIM) in addition to the peak signal-to-noise ratio. The section on prediction now also covers the block-based methods used in video compression standards.


In addition to the previous material on perception and colour, Chapter 7 now contains a section on various properties of images that should be taken into account when selecting or developing a compression method. In many places throughout the book, statements have been updated or made more precise, the clarity improved, and minor corrections applied. Worked examples have been moved from the running text into separate boxes, and additional exercises have been added. Printed source code in the appendix has been omitted in this edition.

The book is organised as follows:
Chapter 1 introduces the reader to the problem of data transmission and the necessity of compression. Chapter 2 then covers the fundamentals of data compression. It explains why compression is possible and how the performance of a compression algorithm can be evaluated. Chapters 3 and 4 deal with coding methods. Using simple examples, the coding of individual symbols and the adaptation to the statistical properties of the signal to be processed are described first. Precoding methods are then explained, which exploit the relationships between the symbols of a signal to increase compression.
Chapter 5 deals with data reduction, i.e. the omission of (irrelevant) information. Sampling-rate conversion and quantisation methods are discussed.
After an introduction to the topic of correlation, Chapter 6 covers three essential methods for decorrelating signal values. It begins with techniques for the prediction of signal values. The fundamentals of discrete transforms are then explained and various types of transform are presented (DCT, DST, WHT, integer DCT, DWT). The discrete wavelet transform and the fractal transform occupy a special position here. The third part of this chapter covers the fundamentals of filter banks. They are helpful for understanding the practical implementation of wavelet transforms. A fast algorithm for certain types of filter banks (the lifting scheme) is presented.
Chapter 7 takes up this scheme again, since it is also used for the decorrelation of colour components. In addition, Chapter 7 describes the properties of the human eye and brightness and colour perception, and deals with essential properties of images. Understanding visual perception is a prerequisite for the meaningful development of image-compression algorithms.
All modern colour spaces and the corresponding transforms are explained.
Chapter 8 deals with methods for the compression of single images. Using the standards JPEG-1, JPEG-LS and JPEG-2000, it is shown in detail how data-compression techniques and methods can be combined into powerful systems. The aim is not a complete exposition of the standards, but a presentation of the concepts with reference to the fundamentals described in the preceding chapters. Furthermore, an overview of more recent standardisation activities is given and the performance of the various standards and proprietary methods is compared.
Chapter 9 presents fundamental methods for image-sequence compression. This essentially concerns the reduction of temporal correlation through motion-compensation methods and the compensation of the compression artefacts that arise in the process. The discussion is preceded by an overview of standardisation activities in the field of image-sequence compression.
The appendix contains the test images used, the mathematical derivations of both the discrete cosine and the discrete sine transform (DCT, DST), a complete example of decoding a JPEG-1 bitstream, an example of coding in JPEG-2000, and solutions to selected test questions.


Coburg, September 2025, Tilo Strutz


Investigations on algorithm selection for interval-based coding methods

Strutz, Tilo; Schreiber, Nico (2025)

Multimedia Tools and Applications.
DOI: 10.1007/s11042-025-20971-3


Peer Reviewed
 

There is a class of entropy-coding methods which do not substitute symbols by code words (such as Huffman coding), but operate on intervals or ranges and thus allow a better approximation of the data entropy. This class includes three prominent members: conventional arithmetic coding, range coding, and coding based on asymmetric numeral systems. To determine the correct symbol in the decoder, each of these methods requires the comparison of a state variable with subinterval boundaries.

In adaptive operation, considering varying symbol statistics, an array of interval boundaries must additionally be kept up to date. The larger the symbol alphabet, the more time-consuming both the search for the correct subinterval and the updating of interval borders become. These entropy coding methods play an important role in all data transmission and storage applications, and optimising speed can be crucial.

Based on detailed pseudo-code, different known and proposed approaches are discussed to speed up the symbol search in the decoder and the adaptation of the array of interval borders, both depending on the chosen alphabet size. It is shown that reducing the big-O complexity in practical implementations does not necessarily lead to an acceleration, especially if the alphabet size is too small. For example, the symbol determination at the decoder shows an expected low CPU-clock ratio (O(log n) algorithm versus O(n) algorithm) of about 0.62 for an alphabet with 256 symbols. However, for an alphabet with only 4 symbols, this ratio is 1.05, i.e. the algorithm with the lower theoretical complexity actually executes slightly slower here. In adaptive compression mode, the binary indexing (BI) method proves to be superior when considering the overall processing time. Although the symbol search (in the decoder) takes longer than with other algorithms (e.g. the CPU-clock ratio BI/O(log n) is 1.57), the faster updating of the array of interval borders more than compensates for this disadvantage (the total ratio BI/O(log n) is 0.72). A variant of the binary indexing method is proposed, which is more flexible and has a partially lower complexity than the original approach. Specifically, the rescaling of cumulative counts can be reduced in complexity from O(4n + [log2(n) − 2]·n/2) to O(3n).
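To make the trade-off concrete, the following minimal Python sketch shows a binary-indexed (Fenwick) cumulative-count table of the kind the binary-indexing discussion refers to: symbol counts are updated in O(log n), and the decoder can locate the symbol whose cumulative interval contains a given slot value in O(log n). The class and method names are illustrative; this is not the implementation evaluated in the paper.

class CumulativeCounts:
    # Fenwick (binary-indexed) tree over symbol counts for adaptive
    # interval-based coding: O(log n) count update, O(log n) symbol search.
    def __init__(self, num_symbols, init_count=1):
        self.n = num_symbols
        self.tree = [0] * (num_symbols + 1)          # 1-based internal array
        for s in range(num_symbols):
            self.update(s, init_count)               # start with uniform counts

    def update(self, symbol, delta):
        # add delta to the count of `symbol`
        i = symbol + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def lower_border(self, symbol):
        # cumulative count of all symbols smaller than `symbol`
        i, total = symbol, 0
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)
        return total

    def find_symbol(self, slot):
        # decoder search: symbol whose interval [lower, lower + count) contains `slot`
        pos, step = 0, 1 << self.n.bit_length()
        while step > 0:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= slot:
                slot -= self.tree[nxt]
                pos = nxt
            step >>= 1
        return pos                                   # 0-based symbol index

A plain linear search over an array of interval borders is O(n) per decoded symbol but touches very little memory, which is why it can still win for small alphabets, as the clock ratios above indicate.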


Improved screen content coding in VVC using soft context formation

Och, Hannah; Uddehal, Shabhrish; Strutz, Tilo; Kaup, André (2024)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'24), 14-19 April 2024, Seoul, South Korea, pp. 3685-3689.
DOI: 10.1109/ICASSP48485.2024.10447125


Peer Reviewed
 

Screen content images typically contain a mix of natural and synthetic image parts. Synthetic sections usually consist of uniformly colored areas and repeating colors and patterns. In the VVC standard, these properties are exploited using Intra Block Copy and Palette Mode. In this paper, we show that pixel-wise lossless coding can outperform lossy VVC coding in such areas. We propose an enhanced VVC coding approach for screen content images using the principle of soft context formation. First, the image is separated into two layers in a block-wise manner using a learning-based method with four block features. Synthetic image parts are coded losslessly using soft context formation, the rest with VVC. We modify the available soft context formation coder to incorporate information gained by the decoded VVC layer for improved coding efficiency. Using this approach, we achieve Bjontegaard-Delta-rate gains of 4.98% on the evaluated data sets compared to VVC.


Enhanced color palette modeling for lossless screen content

Och, Hannah; Uddehal, Shabhrish; Strutz, Tilo; Kaup, André (2024)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'24), 14-19 April 2024, Seoul, South Korea, pp. 3670-3674.
DOI: 10.1109/ICASSP48485.2024.10446445


Peer Reviewed
 

Soft context formation is a lossless image coding method for screen content. It encodes images pixel by pixel via arithmetic coding by collecting statistics for probability distribution estimation. Its main pipeline includes three stages, namely a context model based stage, a color palette stage and a residual coding stage. Each stage is only employed if the previous stage is impossible since necessary statistics, e.g. colors or contexts, have not been learned yet. We propose the following enhancements: First, information from previous stages is used to remove redundant palette entries and prediction errors in subsequent stages. Additionally, implicitly known stage decision signals are no longer explicitly transmitted. These enhancements lead to an average bit rate decrease of 1.16% on the evaluated data. Compared to FLIF and HEVC, the proposed method needs roughly 0.28 and 0.17 bits per pixel less on average for 24-bit screen content images, respectively.


Rescaling of Symbol Counts for Adaptive rANS Coding

Strutz, Tilo (2023)

31st European Signal Processing Conference (EUSIPCO), September 4-8, 2023, Helsinki, Finland, pp. 585-589.


Peer Reviewed
 

The abbreviation rANS stands for a relatively new method of arithmetic coding based on asymmetric numeral systems (ANS) which combines the advantages of arithmetic coding in terms of performance and the advantages of Huffman coding in terms of speed.
Compared to conventional arithmetic coding methods, the mathematical apparatus is slightly different, with the consequence that the decoding order is the reverse of the encoding order, i.e. the processing follows the last-in-first-out principle.
This makes it somewhat difficult to design the coding process to adapt to changing symbol statistics, and therefore rANS coding has so far only been applied in settings with fixed statistics.
In particular, the frequent rescaling of statistics required to reduce the influence of old symbols becomes a problem when the order of processing is different on the encoder and decoder sides.

This paper proposes a new method that allows adaptive coding within the framework of rANS coding and additionally offers the possibility of rescaling the symbol frequencies. Investigations show that this method enables the same compression performance for rANS as for conventional arithmetic coding.
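As a brief illustration of the last-in-first-out behaviour described above, here is a minimal rANS sketch in Python with a fixed frequency table and an unbounded integer state (no renormalisation or streaming); the table values are arbitrary examples, and adaptive operation with rescaling, the actual subject of the paper, is not shown.

freqs = {'a': 3, 'b': 1, 'c': 4}         # example symbol counts, total M = 8
M = sum(freqs.values())
cum, c = {}, 0
for s in freqs:                           # cumulative lower borders
    cum[s] = c
    c += freqs[s]

def encode(symbols, x=1):
    # push symbols into the integer state x
    for s in symbols:
        f = freqs[s]
        x = (x // f) * M + cum[s] + (x % f)
    return x

def decode(x, n):
    # pop n symbols; the most recently encoded symbol comes out first
    out = []
    for _ in range(n):
        slot = x % M
        s = next(t for t in freqs if cum[t] <= slot < cum[t] + freqs[t])
        x = freqs[s] * (x // M) + slot - cum[s]
        out.append(s)
    return out[::-1]                      # reverse to restore encoding order

msg = list("abcacca")
assert decode(encode(msg), len(msg)) == msg

Because decoding recovers the most recently encoded symbol first, any count update or rescaling performed during encoding must be mirrored in reverse order on the decoder side, which is exactly the difficulty the proposed method addresses.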


Image Segmentation for Improved Lossless Screen Content Compression

Uddehal, Shabhrish; Strutz, Tilo; Och, Hannah; Kaup, André (2023)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'23), 4-10 June 2023, Rhodes Island, Greece.


Peer Reviewed
 

In recent years, it has been found that screen content images (SCI) can be effectively compressed based on appropriate probability modelling and suitable entropy coding methods such as arithmetic coding. The key objective is determining the best probability distribution for each pixel position. This strategy works particularly well for images with synthetic (textual) content. However, usually screen content images not only consist of synthetic but also pictorial (natural) regions. These images require diverse models of probability distributions to be optimally compressed. One way to achieve this goal is to separate synthetic and natural regions. This paper proposes a segmentation method that identifies natural regions enabling better adaptive treatment. It supplements a compression method known as Soft Context Formation (SCF) and operates as a pre-processing step. If at least one natural segment is found within the SCI, it is split into two subimages (natural and synthetic parts) and the process of modelling and coding is performed separately for both. For SCIs with natural regions, the proposed method achieves a bit-rate reduction of up to 11.6% and 1.52% with respect to HEVC and the previous version of the SCF.


Re-Designing the Wheel for Systematic Travelling Salesmen

Strutz, Tilo (2023)

Algorithms 16 (2), 91.
DOI: 10.3390/a16020091


Open Access Peer Reviewed
 

This paper investigates the systematic and complete usage of k-opt permutations with k = 2…6 applied to the local optimization of symmetric two-dimensional instances with up to 10⁷ points. The proposed method utilizes several techniques for accelerating the processing, such that good tours can be achieved in limited time: candidate selection based on Delaunay triangulation, precomputation of a sparse distance matrix, a two-level data structure, and parallel processing based on multithreading. The proposed approach finds good tours (excess of 0.72–8.68% over the best-known tour) in a single run within 30 min for instances with more than 10⁵ points, and specifically 3.37% for the largest examined tour containing 10⁷ points. The new method proves to be competitive with a state-of-the-art approach based on the Lin–Kernighan–Helsgaun method (LKH) when applied to clustered instances.
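For readers unfamiliar with k-opt moves, the following Python sketch shows the simplest family member, a plain 2-opt local search; it illustrates the principle only and omits all of the accelerations listed above (Delaunay candidates, sparse distance matrix, two-level data structure, multithreading).

import math

def two_opt(points, tour):
    # repeatedly reverse a tour segment whenever exchanging two edges
    # shortens the tour, until no improving move remains
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue                      # same pair of edges, skip
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                gain = (math.dist(a, b) + math.dist(c, d)
                        - math.dist(a, c) - math.dist(b, d))
                if gain > 1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

One full pass already costs O(n²) edge-exchange checks, which is why candidate restriction (e.g. via Delaunay neighbours) is essential before such moves become practical for instances with millions of points.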


Optimization of Probability Distributions for Residual Coding of Screen Content

Och, Hannah; Strutz, Tilo; Kaup, André (2021)

VCIP 2021, Munich, 5-8 December 2021.
DOI: 10.1109/VCIP53242.2021.9675326


Peer Reviewed
 

Probability distribution modeling is the basis for most competitive methods for lossless coding of screen content. One such state-of-the-art method is known as soft context formation (SCF). For each pixel to be encoded, a probability distribution is estimated based on the neighboring pattern and the occurrence of that pattern in the already encoded image. Using an arithmetic coder, the pixel color can thus be encoded very efficiently, provided that the current color has been observed before in association with a similar pattern. If this is not the case, the color is instead encoded using a color palette or, if it is still unknown, via residual coding. Both palette-based coding and residual coding have significantly worse compression efficiency than coding based on soft context formation. In this paper, the residual coding stage is improved by adaptively trimming the probability distributions for the residual error. Furthermore, an enhanced probability modeling for indicating a new color depending on the occurrence of new colors in the neighborhood is proposed. These modifications result in a bitrate reduction of up to 2.9% on average. Compared to HEVC (HM-16.21 + SCM-8.8) and FLIF, the improved SCF method saves on average about 11% and 18% rate, respectively.


The Distance Transform and its Computation - An Introduction -

Strutz, Tilo (2021)

Technical paper, June 2021, TECH/2021/06, arxiv.org/abs/2106.03503v1.
DOI: 10.48550/arXiv.2106.03503


 

Distance transformation is an image processing technique used for many different applications. Related to a binary image, the general idea is to determine the distance of all background points to the nearest object point (or vice versa). In this tutorial, different approaches are explained in detail and compared using examples. Corresponding source code is provided to facilitate the reader's own investigations. A particular objective of this tutorial is to clarify the difference between arbitrary distance transforms and exact Euclidean distance transformations.
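As a small illustration of the kind of algorithm covered, the following Python sketch implements a classic two-pass 3-4 chamfer transform, i.e. one of the approximate (non-exact-Euclidean) approaches the tutorial contrasts with exact methods; the array layout and weights follow the common textbook formulation rather than the tutorial's accompanying source code.

import numpy as np

def chamfer_distance_transform(binary):
    # approximate distance of every background pixel to the nearest object
    # pixel (non-zero entry) using two raster scans with chamfer weights
    # 3 (axial step) and 4 (diagonal step)
    INF = 10**9
    h, w = binary.shape
    d = np.where(binary != 0, 0, INF).astype(np.int64)

    for y in range(h):                       # forward pass: top-left to bottom-right
        for x in range(w):
            if x > 0:
                d[y, x] = min(d[y, x], d[y, x - 1] + 3)
            if y > 0:
                d[y, x] = min(d[y, x], d[y - 1, x] + 3)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y - 1, x - 1] + 4)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y - 1, x + 1] + 4)

    for y in range(h - 1, -1, -1):           # backward pass: bottom-right to top-left
        for x in range(w - 1, -1, -1):
            if x < w - 1:
                d[y, x] = min(d[y, x], d[y, x + 1] + 3)
            if y < h - 1:
                d[y, x] = min(d[y, x], d[y + 1, x] + 3)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y + 1, x + 1] + 4)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y + 1, x - 1] + 4)

    return d / 3.0                           # scale so one axial step costs about 1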


Traveling Santa Problem: Optimization of a Million-Households Tour Within One Hour

Strutz, Tilo (2021)

Frontiers in Robotics and AI, 8:652417.
DOI: 10.3389/frobt.2021.652417


Open Access Peer Reviewed
 

Finding the shortest tour visiting all given points at least once is still one of the most famous optimization problems (TSP, the travelling salesman problem). Optimal solutions exist for many problems with up to several tens of thousands of points. The major difficulty in solving larger problems is the required computational complexity. This shifts the research from finding the optimum without time limitation to approaches that find good but sub-optimal solutions in a pre-defined, limited time. This paper proposes a new approach for two-dimensional symmetric problems with more than a million coordinates that is able to create good initial tours within a few minutes. It is based on a hierarchical clustering strategy and supports parallel processing. In addition, a method is proposed that can correct unfavourable paths with moderate computational complexity. The new approach is superior to state-of-the-art methods when applied to TSP instances with non-uniformly distributed coordinates.


Spatial Resolution-Independent CNN-based Person Detection in Agricultural Image Data

Strutz, Tilo; Leipnitz, Alexander; Jokisch, Oliver (2020)

5th Int. Conf. on Interactive Collaborative Robotics, ICR.


Peer Reviewed
 

Advanced object detectors based on Convolutional Neural Networks (CNNs) offer high detection rates for many application scenarios, but only within their respective training, validation and test data. Recent studies show that such methods provide a limited generalization ability for unknown data, even for small image modifications including a limited scale invariance. Reliable person detection with aerial robots (Unmanned Aerial Vehicles, UAVs) is an essential task to fulfill high security requirements or to support robot control, communication, and human-robot interaction. Particularly in an agricultural context, persons need to be detected from a long distance and a high altitude to allow the UAV an adequate and timely response. While UAVs are able to produce high-resolution images that enable the detection of persons from a longer distance, typical CNN input-layer sizes are comparably low. The inevitable scaling of images to match the input-layer size can lead to a further reduction in person sizes. We investigate the reliability of different YOLOv3 architectures for person detection with regard to these input-scaling effects. The popular VisDrone data set with its varying image resolutions and relatively small depiction of humans is used, as well as high-resolution UAV images from an agricultural data set. To overcome the scaling problem, an algorithm is presented for segmenting high-resolution images into overlapping tiles that match the input-layer size. The number and overlap of the tiles are dynamically determined based on the image resolution. It is shown that the detection rate of very small persons in high-resolution images can be improved using this tiling approach.
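The following Python sketch illustrates the general idea of such a tiling step. The minimum-overlap parameter, the 608-pixel detector input size in the usage example, and the function name are assumptions for illustration and are not taken from the paper.

import math

def tile_positions(image_size, tile_size, min_overlap=0.2):
    # top-left tile coordinates along one image dimension, chosen so that
    # neighbouring tiles overlap by at least `min_overlap` of the tile size
    if image_size <= tile_size:
        return [0]
    max_stride = tile_size * (1.0 - min_overlap)
    num = math.ceil((image_size - tile_size) / max_stride) + 1
    stride = (image_size - tile_size) / (num - 1)    # spread tiles evenly
    return [round(i * stride) for i in range(num)]

# e.g. splitting a 4000 x 3000 pixel UAV image for a 608 x 608 detector input
tiles = [(x, y, 608, 608)
         for y in tile_positions(3000, 608)
         for x in tile_positions(4000, 608)]

Detections from the individual tiles would then typically be mapped back to full-image coordinates and merged, e.g. by non-maximum suppression.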


Screen content compression based on enhanced soft context formation

Strutz, Tilo; Möller, Phillip (2020)

IEEE Transactions on Multimedia 22 (5), 1126 - 1138.
DOI: 10.1109/TMM.2019.2941270


Peer Reviewed
 

The compression of screen content has attracted the interest of researchers in the last years as the market for transferring data from computer displays is growing. It has already been shown that especially those methods can effectively compress screen content which are able to predict the probability distribution of the next pixel values. This prediction is typically based on a kind of learning process. The predictor learns the relationship between probable pixel colours and the surrounding texture. Recently, an effective method called 'soft context formation' (SCF) has been proposed which achieves much lower bitrates for images with fewer than 8 000 colours than other state-of-the-art compression schemes.
This paper presents an enhanced version of SCF. The average lossless compression performance has increased by about 5% in application to images with fewer than 8 000 colours and by about 10% for images with up to 90 000 colours. In comparison to FLIF, FP8v3, and HEVC (HM-16.20 + SCM-8.8), it achieves savings of about 33%, 4%, and 11% on average. The improvements compared to the original version result from various modifications. The largest contribution is achieved by the local estimation of the probability distribution for unpredictable colours in stage II of the compression scheme.


Comparison of Light-Weight Multi-Scale CNNs for Texture Regression in Agricultural Context

Strutz, Tilo; Leipnitz, Alexander (2020)

28th European Signal Processing Conference (EUSIPCO) 2020.
DOI: 10.23919/Eusipco47968.2020.9287758


Peer Reviewed

Bilddatenkompression

Strutz, Tilo (2017)

Grundlagen, Codierung, Wavelets, JPEG, MPEG, H.264, HEVC. 5th edition.



Data Fitting and Uncertainty: A practical introduction to weighted least squares and beyond

Strutz, Tilo (2016)

2nd edition.



Prof. Dr.-Ing. habil. Tilo Strutz


Hochschule Coburg

Fakultät Elektrotechnik und Informatik (FEI)
Friedrich-Streib-Str. 2
96450 Coburg

T +49 9561 317 529
Tilo.Strutz[at]hs-coburg.de

ORCID iD: 0000-0001-5063-6515