static char license_1[16][70] = {
"LICENSE NOTICE AND DISCLAIMER",
"This is a modified version of software provided by Jill",
"Goldschneider at the University of Washington Data Compression",
"Lab. The modifications were made by the",
"Environmental Research Institute of Michigan",
"1975 Green Road",
"Ann Arbor, Michigan 48107",
"Telephone: (313)994-1200",
"under U.S. government contract DLA900-88-D-0392.",
"There is no warranty, either express or implied, for these",
"modifications and documentation, including, without limitation,",
"warranty of merchantibility and warranty of fitness for a ",
"particular purpose.",
"These modifications may be used for any non-commercial",
"purpose, such as research, provided this notice is kept",
"intact."
};
/******************************************************************************
Name: mem_tsvqe_util.c
Date: Jan. 31, 1996
Marko Slyz created mem_tsvqe_util.c by modifying a version of tsvqe_util.c,
which was written by Jill Goldschneider and last revised in February 1994.
The routines in mem_tsvqe_util have the following features:
1) The encoder generates an actual compressed bit stream, which consists of
indices. The decoder uses these indices to construct an approximation to
the original data. Calling P2I_image_encode() and then feeding the
resulting output to I2P_image_decode() should give the same result as a
single call to tsvqe_util.c's image_encode().
2) They compress and decompress to and from memory instead of disk.
3) The encoder can insert restart markers in the output bit stream. Then, if
that bit stream becomes corrupted, the decoder uses these restart markers
to recover at least some of the original data.
DESCRIPTION OF THE BASIC TECHNIQUE
The restart marker scheme is similar to the one in the JPEG standard [1,
Section 7.4]. Start by picking some natural number RI_len, then take the
stream of (possibly variable-length) indices and put a marker after every
RI_len of them. A marker consists of two bytes. The first is always
7F. The second is a number that cycles through markers[] (which is defined
in mem_bits.c). The entire first marker is (7F markers[1]), the
(NUM_MARKERS-1)th is (7F markers[NUM_MARKERS - 1]) and the NUM_MARKERSth
marker is (7F markers[1]) again.
At the receiver, parse the data into segments delimited by markers. Each
segment's marker determines the location in the image to begin placing the
pixels corresponding to that segment's indices. This will work perfectly
if the markers never get corrupted.
One complication is that 7F's, which indicate markers, can also appear
naturally in the bits composing the indices. This is bypassed by stuffing
a 00 byte after each natural occurrence of 7F. The receiver will then need
to remember that 7F 00 needs to be translated to just 7F before the
decompressor sees it.
WHAT HAPPENS WHEN SUBSTITUTION AND INSERTION/DELETION ERRORS OCCUR
Unfortunately, the markers can get corrupted. One coping strategy is to
remember the index (i.e. the position in markers[]) of the last decoded
marker. Then look for the next marker whose index is greater. If its index
isn't exactly one greater then the receiver knows that an error has been
made, but for simplicity doesn't try to estimate the missing marker's
location.
With this strategy, if an error occurs in a marker then the entire segment
corresponding to that marker is lost. Also, if an error occurs in the
indices, then at most the rest of the segment is lost.
COMPARISONS WITH EXISTING TECHNIQUES
7F is used instead of the FF used in the JPEG scheme because, no matter
what bits surround 7F, it is always possible to tell where the 7F
starts. This is not true for FF; consider the bits stream "01" as an
example. Suppose that an FF marker code was appended at the end of the bit
stream to produce: "0111111111". The decoder would not be able to tell
whether the marker begins at bit position 2 or 3, since FF's start at both
positions.
The rationale for using markers[] -- as opposed to the numbers from 1 to
NUM_MARKERS-1 -- arises from choosing the codes in markers[] such that no
single insertion, deletion or substitution error can create another valid
entry in markers[]. So any single error in a marker will create an invalid
codeword. This is useful since it's better to have a codeword that's known
to be wrong than to have a codeword that's wrong but is thought to be ok;
the latter can screw-up not just the current interval but the succeeding
ones as well.
The restart marker scheme used here is also similar to comma codes, which
are discussed in, for example, [2, Section 12.5].
REFERENCES
[1] Pennebaker, Mitchell, _JPEG_Still_Image_Data_compression_Standard_,
Van Nostrand Reinhold, New York, 1993.
[2] J. Stiffler, "Theory of Synchronous Communication", Prentice-Hall,
Englewood Cliffs, 1971.
*****************************************************************************/
#include
#include
#include "tsvq.h"
#include "protect.h"
#include "mem_bits.h"
#define round(x) floor((x) + 0.5)
/* -------------------------------------------------- */
/* An external variable that should be put into a header file: */
extern int dim;
/* -------------------------------------------------- */
/* This encodes the vector pointed to by "vector" using the tree rooted at
"node" and saves the index in the array pointed to by "(*output_p)". It
also sets "*tempbits" and "*tempdistortion".
Note that only if a tree consists of one node is it possible for a zero-bit
codeword to occur. Moreover, in that case _all_ of the codewords will have
zero bits since there is never a need to disambiguate two possibilities,
except for EOF/not-EOF, and that's handled by the initial transmission of
the number of codewords. */
static void mem_encode(TreeNode *node, DATATYPE *vector, int *tempbits,
DISTTYPE *tempdistortion)
{
DISTTYPE left_dist, right_dist;
node->avmse += (*tempdistortion = dist(vector, node->data));
*tempbits = 0;
while ((node->left_child != NULL) && (node->right_child != NULL)) {
node->count++;
left_dist = dist(vector, node->left_child->data);
right_dist = dist(vector, node->right_child->data);
/* Go to the next node: */
if (left_dist < right_dist) {
(*tempbits) += mem_output_zero();
node = node->left_child;
node->avmse += (*tempdistortion = left_dist);
} else {
(*tempbits) += mem_output_one();
node = node->right_child;
node->avmse += (*tempdistortion = right_dist);
}
}
assert((node->left_child == NULL) && (node->right_child == NULL));
node->count++;
}
#if 0
static void print_blocked(DATATYPE *data, int num_in)
{
int i, j;
FILE *fp;
OPEN(fp, "mem_data_jjj", a);
for (i = 0; i < num_in; i++) {
/* fprintf(fp, "%x: ", i); */
for (j = 0; j < dim; j++)
fprintf(fp, "%x ", data[i*dim + j]);
fprintf(fp, "\n");
}
CLOSE(fp);
}
remove("mem_data_jjj"); /* so that print_blocked() starts fresh */
print_blocked(input, num_in);
#endif
/* P2I_image_encode(): This translates pixels to indices.
INPUT:
root: codebook tree
input: input[i*dim] is the first pixel in the ith input vector
output_p: (*output_p) should be a pointer to uchar. Upon return it will
point to an array (ceil(*bits/8.0) long) of indices.
num_in: the number of input vectors
rate: An array of num_in uchars. It'll contain the number of
bits per vector.
RI_len: The length of a restart interval, i.e. the number of indices
to output between restart markers. If this is zero then
no restart markers are used.
OUTPUT:
bits: the number of bits used to encode the image
maxbits: the longest codeword used
numpixels: the number of pixels encoded
*/
DISTTYPE P2I_image_encode(TreeNode *root, long *bits, int *maxbits,
long *numpixels, DATATYPE *input, uchar **output_p,
int num_in, uchar *rate, int RI_len)
{
int i; /* the index of the current vector */
int tempbits; /* length of codeword to encode current vector */
DISTTYPE tempdistortion; /* distortion of current encoded vector */
DISTTYPE distortion; /* distortion of encoded image */
distortion = 0.0;
*maxbits = 0;
*numpixels = 0;
init_mem_bits(output_p, RI_len);
for (i = 0; i < num_in; i++) {
/* Put a marker after every RI_len indices, and don't have one before the
zeroth index. */
if ((RI_len > 0) && (i % RI_len == 0) && (i > 0)) {
rate[i] = MARKER_LEN;
insert_marker();
} else
rate[i] = 0;
mem_encode(root, input+i*dim, &tempbits, &tempdistortion);
distortion += tempdistortion;
rate[i] += tempbits;
if (tempbits > *maxbits)
*maxbits = tempbits;
*numpixels += dim;
}
*bits = number_of_bits_output();
normalize_avmse(root);
return distortion;
}
/* -------------------------------------------------- */
/* The decoding routines: */
#define BAD_INTERVAL -1
static int min(int a, int b) { if (a < b) return a; else return b; }
/* Some debugging code: */
#ifdef DBITS
/* Define a place to record the beginnings of input[]'s intervals. (note
that interval_start[] records the same thing but for clean_input[]): */
#define MAX_INTERVAL_START 250000
static int input_interval_start[MAX_INTERVAL_START];
/* Define places to record the locations of the markers that precede stuffed
zero bytes that occur in "input" and "clean_input": */
#define MAX_STUFFED 50000
static int input_stuffed[MAX_STUFFED];
static int clean_stuffed[MAX_STUFFED];
static int stuffed_pos; /* the current location in input_stuffed[] and
clean_stuffed[] */
static int input_pos; /* the current location in
input_interval_start[] */
/* This is called right after a stuffed byte has been detected. */
static void RECORD_STUFFED(int bit, int k)
{
if (stuffed_pos < MAX_STUFFED) {
input_stuffed[stuffed_pos] = bit - 16;
clean_stuffed[stuffed_pos] = k - 8;
} else
pr_error("Recompile with a larger value of MAX_STUFFED");
stuffed_pos++;
}
/* This is called right after an interval start marker has been detected. */
static void RECORD_INPUT_INTERVAL_START(int bit, int i, int num_RIs)
{
if (i < MAX_INTERVAL_START) {
for ( ; input_pos < i ; input_pos++)
input_interval_start[input_pos] = BAD_INTERVAL;
input_interval_start[input_pos] = bit - 16;
if (input_pos < num_RIs) input_pos++;
} else
pr_error("Recompile with a larger value of MAX_INTERVAL_START");
}
/* INPUTS:
This prints all the bits in Xinput. While it's doing this, it prints a
"X#1(#2):" right before each bit position indicated by Xinterval_start,
and a "Y#1(#2):" right before each bit position indicated by
Xstuffed. Here #1 is the number of that marker, while #2 is its
location.
in_len: The number of bits in Xinput.
max_Xint, max_Xstuff: The number of valid elements in Xinterval_start
and Xstuffed respectively.
fname: The name of the file in which to print the data. */
static void DISPLAY_BITS(uchar *Xinput, int *Xinterval_start, int *Xstuffed,
int in_len, int max_Xint, int max_Xstuff,
char *fname)
{
int i, j, k;
int bad_count; /* the number of bad intervals before this one */
FILE *fp;
OPEN(fp, fname, w);
j = k = bad_count = 0;
for (i = 0; i < in_len; i++) {
if (j <= max_Xint && Xinterval_start[j] == i) {
for (; bad_count > 0; bad_count--)
fprintf(fp, "\n\nX%d: BAD_INTERVAL", j - bad_count);
/* Note that Xinterval_start[0] is always 0, so the data can't
start with bad intervals. */
fprintf(fp, "\n\nX%d(%d): ", j, i);
j++;
while (j < max_Xint-1 && Xinterval_start[j] == BAD_INTERVAL)
j++, bad_count++;
}
if (k < max_Xstuff && Xstuffed[k] == i) {
fprintf(fp, "\nY%d(%d): ", k, i);
if (k < max_Xstuff) k++;
}
if (BITTEST(Xinput, i))
fputc('1', fp);
else
fputc('0', fp);
/* Put spaces after printing each group of 8 bits: */
if (j > 1 && (i - Xinterval_start[j-1-bad_count] - 7) % 8 == 0)
fputc(' ', fp);
}
CLOSE(fp);
}
static void PRINT_MARKERS(int *interval_start, int num_RIs, char *fname)
{
int i;
FILE *fp;
OPEN(fp, fname, w);
if (input_pos != num_RIs)
fprintf(fp, "warning: (input_pos = %d) != (num_RIs = %d)\n\n",
input_pos, num_RIs);
fprintf(fp, "i \t\t interval_start \t\t input_interval_start\n");
for (i = 0; i <= num_RIs; i++)
fprintf(fp, "%d \t\t %d \t\t %d \t\t 0\n", i, interval_start[i],
input_interval_start[i]);
fprintf(fp, "\n\n\n");
fprintf(fp, "input_stuffed \t\t\t clean_stuffed\n");
for (i = 0; i < stuffed_pos; i++)
fprintf(fp, "%d \t\t\t %d \t\t\t 1\n", input_stuffed[i],
clean_stuffed[i]);
CLOSE(fp);
}
static void INIT_DEBUG(void)
{
input_pos = stuffed_pos = 0;
}
#else
#define RECORD_STUFFED(x,y)
#define RECORD_INPUT_INTERVAL_START(x,y,z)
#define stuffed_pos x
#define DISPLAY_BITS(a, b, c, d, e, f, g)
#define PRINT_MARKERS(a, b, c)
#define INIT_DEBUG()
#endif
/* This gets rid of the FF already copied to the output. */
static void back_up_8_bits(uchar *clean_input, int *k)
{
int j;
for (j = 0; j < 8; j++) {
(*k)--;
BITUNSET(clean_input, *k);
}
}
/* next_marker_num_with_copy(): This copies bits from input[] to
clean_input[]. It goes until either the next 0x7F byte (inclusive) in
input[], or the end of the array.
It then packs the next 8 bits into an int and returns the index of the
element in markers[] that is equal to it. If the next 8 bits do not
correspond to an element in markers[], it just returns NUM_MARKERS, and if
there aren't enough bits left in the data stream to assemble a marker, it
returns -1. */
static int next_marker_num_with_copy(uchar *input, int *bit, int in_len,
uchar *clean_input, int *k)
{
int mask, t;
int have_starting_zero, consecutive_ones;
have_starting_zero = consecutive_ones = 0;
/* Find the next marker prefix: */
while (*bit < in_len && consecutive_ones < 7) {
if (BITTEST(input, *bit)) {
BITSET(clean_input, *k);
if (have_starting_zero) consecutive_ones++;
} else {
have_starting_zero = 1;
consecutive_ones = 0;
}
(*bit)++; (*k)++;
}
/* If the file ends before the marker number then return: */
if ((*bit)+8 > in_len)
return -1;
/* Otherwise, pull it out: */
for (t = 0, mask = 0x80; mask > 0; mask >>= 1, (*bit)++)
if (BITTEST(input, *bit))
t |= mask;
return marker_search_table[t];
}
/* next_marker_num(): This finds the next marker in "input" and returns its
number. It ignores stuffed 0 bytes. */
static int next_marker_num(uchar *input, int bit, int in_len)
{
int mask, t;
int have_starting_zero, consecutive_ones;
do {
have_starting_zero = consecutive_ones = 0;
/* Find the next marker prefix: */
while (bit < in_len && consecutive_ones < 7) {
if (BITTEST(input, bit)) {
if (have_starting_zero) consecutive_ones++;
} else {
have_starting_zero = 1;
consecutive_ones = 0;
}
bit++;
}
/* If the file ends before the marker number then quit: */
if (bit+8 > in_len)
return -1;
/* Otherwise pull out the marker number: */
for (t = 0, mask = 0x80; mask > 0; mask >>= 1, bit++)
if (BITTEST(input, bit))
t |= mask;
} while (t == 0); /* ignore stuffed 0 bytes */
return marker_search_table[t];
}
/* If the previous marker and the next marker are consistent with each other
(i.e. the next marker is two larger than the previous marker), then
CHECK_NEXT() makes parse() ignore the current marker.
Using CHECK_NEXT() is optional, but parse() will work better with it if
seldom are two consecutive markers corrupted to create a spurious marker
sequence. Error rates where this is common would probably be too high for
parse() anyway.
Note that (previous_marker%NUM_MARKERS + 1) is the current marker,
and ((previous_marker + 1)%NUM_MARKERS + 1) is the next marker.
*/
#define CHECK_NEXT() \
if ((valid > i) && (next_marker_num(input, bit, in_len) == \
(previous_marker + 1)%NUM_MARKERS + 1)) \
valid = i;
#define RECORD_START_AND_MOVE_ON() \
CHECK_NEXT(); \
for (; i < valid; i++) /* Flag all the intervals */ \
interval_start[i] = BAD_INTERVAL; /* after the last good one. */ \
interval_start[i] = k; \
RECORD_INPUT_INTERVAL_START(bit, i, num_RIs); \
i++; \
previous_marker = t;
/* parse():
INPUTS:
input: All of the input data indices.
in_len: The number of valid bits in "input".
num_RIs: The original number of restart intervals in "input".
OUTPUTS:
clean_input (the return value): This contains the data in "input" with
all of the markers and stuffed zero bytes removed.
interval_start: This should have been allocated in the calling routine to
point to (num_RIs + 1) ints. Upon return, interval_start[i] is the
number of the first bit in the ith restart interval.
parse() assumes that input[] can be generated by a grammar whose terminal
symbols are {0,1} and that has the following rules:
data -> restart_interval data
data -> e
restart_interval -> indices marker
indices -> no_stuffed stuffed indices
indices -> e
; no_stuffed can expand to any string that doesn't contain 01111111:
no_stuffed -> I
I -> 1 I, I -> e, I -> 0 Z
Z -> 0 Z, Z -> e,
Z -> 1 N1, Z -> 11 N2, ...., Z -> 111111 N6
Ni -> 0 Z, Ni -> e for i in {1,2,...,6}
stuffed -> 01111111 00000000
stuffed -> e
marker -> 01111111 u u in markers[] (see mem_bits.c)
(where e is the empty string). */
static uchar *parse(uchar *input, int *interval_start, int num_RIs, int in_len)
{
int i; /* the number of the current restart interval */
int valid; /* the next valid restart interval number */
int k; /* the bit position in the "clean input" array */
int bit; /* the bit position in the "input" array */
uchar *clean_input; /* the input array with the markers and stuffed
bytes removed */
int previous_marker; /* the number of the last marker segment */
int t; /* the current marker's number (in {0, ...,
NUM_MARKERS}). 0 implies a byte stuffed 7F. */
init_marker_search_table();
bit = k = 0;
interval_start[0] = 0; /* The 0th restart interval always */
i = 1; /* starts at bit position 0. */
RECORD_INPUT_INTERVAL_START(16, 0, num_RIs);
clean_input = pr_alloc(sizeof(uchar) * ceil((double)in_len/CHAR_BIT), 0);
t = next_marker_num_with_copy(input, &bit, in_len, clean_input, &k);
previous_marker = 0;
while (t != -1 && i < num_RIs) {
/* LOOP INVARIANT: locations 0 to i-1 of interval_start point to the
beginnings of the first i restart intervals. */
/* Figure out what to do depending on what it was: */
if (t < 0) break; /* the data stream ended too soon */
else if (t == 0) /* this was a stuffed 7F byte */
RECORD_STUFFED(bit,k);
else if (t > previous_marker && t < NUM_MARKERS) {
back_up_8_bits(clean_input, &k); /* delete the marker prefix */
valid = min(i+(t-previous_marker-1), num_RIs);
RECORD_START_AND_MOVE_ON();
} else if (t <= previous_marker) {
back_up_8_bits(clean_input, &k);
valid = min(i + ((NUM_MARKERS-1)-previous_marker) + (t-1), num_RIs);
RECORD_START_AND_MOVE_ON();
} else bit -= 8; /* Assume that a bit error generated a spurious 7F and
copy everything to clean_input. */
/* Move on to the next marker: */
t = next_marker_num_with_copy(input, &bit, in_len, clean_input, &k);
assert(k <= in_len);
}
for (; i < num_RIs; i++)
interval_start[i] = BAD_INTERVAL;
interval_start[num_RIs] = k;
RECORD_INPUT_INTERVAL_START(bit+16, num_RIs, num_RIs);
return clean_input;
}
static void mem_decode(TreeNode *node, uchar *input, DATATYPE *output,
int *bit, int max_bit)
{
int j;
if (*bit == BAD_INTERVAL) {
for (j = 0; j < dim; j++)
output[j] = (DATATYPE)0;
return;
}
while ((node->left_child != NULL) && (node->right_child != NULL)
&& (*bit < max_bit)) {
if (BITTEST(input, *bit))
node = node->right_child;
else
node = node->left_child;
(*bit)++;
}
for (j = 0; j < dim; j++) /* for double data take out the round */
output[j] = (DATATYPE) round(node->data[j]);
/* output[j] = (DATATYPE) node->data[j]; */
}
/* I2P_image_decode(): This translates indices to pixels.
INPUT:
root: codebook tree
input: points to a bit array of indices
num_in: the number of indices coded-up in "input"
RI_len: the length of the restart intervals (expressed in
"number of indices")
in_len: the length of the input data (expressed in bits)
OUTPUT:
image_vectors: Upon return this points to an array, num_in*dim long,
of DATATYPEs filled with the decoded vectors.
*/
DATATYPE *I2P_image_decode(TreeNode *root, uchar *input, int num_in,
int RI_len, int in_len)
{
int i, j, k;
int bit; /* the current location in the clean_input bit array */
int num_RIs; /* the number of RI's in the image */
int current_len; /* the length of the current restart interval */
uchar *clean_input; /* same as "input", except without restart markers */
DATATYPE *image_vectors; /* will hold the decoded image */
int *interval_start; /* interval_start[i] will be the location, expressed
in bits, of the beginning of the ith restart
interval in clean_input[]. */
image_vectors = pr_alloc(sizeof(DATATYPE) * dim * num_in, NOF);
if (RI_len > 0) {
INIT_DEBUG();
num_RIs = (int)ceil((double)num_in/RI_len);
interval_start = pr_alloc(sizeof(int)*(num_RIs+1), NOF);
clean_input = parse(input, interval_start, num_RIs, in_len);
PRINT_MARKERS(interval_start, num_RIs, "markers");
DISPLAY_BITS(input, input_interval_start, input_stuffed, in_len,
num_RIs, stuffed_pos, "input_bits");
DISPLAY_BITS(clean_input, interval_start, clean_stuffed, in_len,
num_RIs, stuffed_pos, "clean_bits");
for (i = 0; i < num_RIs; i++) {
current_len = min(RI_len, num_in - i*RI_len);
bit = interval_start[i]; /* first valid bit in this interval */
k = i + 1;
while (interval_start[k] == BAD_INTERVAL) k++; /* last valid bit */
for (j = 0; j < current_len; j++)
mem_decode(root, clean_input, image_vectors + (i*RI_len+j)*dim, &bit,
interval_start[k]);
}
pr_free(interval_start);
pr_free(clean_input);
} else {
bit = 0;
for (j = 0; j < num_in; j++)
mem_decode(root, input, image_vectors + j*dim, &bit, in_len);
}
return image_vectors;
}