IT용어위키



Huffman Code

Huffman Code is a lossless data compression algorithm that assigns variable-length binary codes to input characters based on their frequency. It is widely used in data compression applications such as file compression and encoding.

Concept

Huffman coding works by assigning shorter codes to more frequent characters and longer codes to less frequent characters, minimizing the total number of bits used to represent the data.

Algorithm

The Huffman coding algorithm follows these steps:

  1. Calculate Frequency: Count the occurrence of each character in the input data.
  2. Build a Priority Queue: Create a min-heap where each node represents a character and its frequency.
  3. Construct a Huffman Tree:
    • Extract the two nodes with the lowest frequency.
    • Merge them into a new node with their combined frequency.
    • Repeat until only one node remains, which becomes the root of the tree.
  4. Assign Binary Codes: Traverse the tree and assign binary values (0 for left, 1 for right) to generate unique Huffman codes for each character.

Example

Given the input string hello world, the frequency table and corresponding Huffman codes might be:

Character Frequency Huffman Code
h 1 1100
e 1 1101
l 3 10
o 2 00
w 1 1110
r 1 1111
d 1 01

The compressed representation replaces each character with its Huffman code.

Implementation

A simple implementation of Huffman coding in Python:

import heapq
from collections import Counter, namedtuple

class Node(namedtuple("Node", ["char", "freq", "left", "right"])):
    def __lt__(self, other):
        return self.freq < other.freq

def build_huffman_tree(text):
    freq = Counter(text)
    heap = [Node(char, freq, None, None) for char, freq in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = Node(None, left.freq + right.freq, left, right)
        heapq.heappush(heap, merged)

    return heap[0]

def generate_codes(node, prefix="", codebook={}):
    if node:
        if node.char:
            codebook[node.char] = prefix
        generate_codes(node.left, prefix + "0", codebook)
        generate_codes(node.right, prefix + "1", codebook)
    return codebook

text = "hello world"
tree = build_huffman_tree(text)
codes = generate_codes(tree)
print(codes)

Advantages

  • Lossless Compression
    • Ensures no data loss in reconstruction.
  • Efficient Encoding
    • Optimized for frequent characters, reducing overall storage size.
  • Used in Many Applications
    • Commonly applied in ZIP, GZIP, JPEG, and other compression formats.

Limitations

  • Requires Frequency Table
    • The table must be transmitted alongside the encoded data.
  • Not Optimal for Small Texts
    • Overhead can be significant for short messages.
  • Sensitive to Input Data
    • Performance depends on character frequency distribution.

Applications

  • File Compression
    • Used in ZIP and GZIP formats.
  • Multimedia Encoding
    • Applied in JPEG and MP3 compression.
  • Data Transmission
    • Efficient encoding in communication protocols.

See Also


  출처: IT위키(IT위키에서 최신 문서 보기)
  * 본 페이지는 공대위키에서 미러링된 페이지입니다. 일부 오류나 표현의 누락이 있을 수 있습니다. 원본 문서는 공대위키에서 확인하세요!