
ESP H.264 Practical Usage Guide

Multimedia H.264 Performance Tuning ESP32-P4 ESP32-S3
Author
Hou haiyan
Embedded Software Engineer at Espressif
This article introduces Espressif’s esp_h264 component, a lightweight H.264 codec optimized for embedded devices. It shows how to leverage hardware acceleration, implement efficient video processing, and optimize performance for various applications.

Overview

What is ESP H.264?

Espressif has recently launched the esp_h264 component for ESP32 series microcontrollers. Through hardware acceleration, dynamic scheduling, and lightweight algorithms, it balances computing power against power consumption for video encoding and decoding.


Key Features

  • Hardware Acceleration: Leverages ESP32-P4 for hardware encoding and high-speed decoding, with single-instruction, multiple-data (SIMD) acceleration on ESP32-S3 for enhanced efficiency
  • Memory Optimization: Implements advanced algorithms to minimize memory usage, ensuring stable operation on resource-constrained devices
  • Dynamic Configuration: Flexible parameter adjustment for real-time optimization of performance, resource allocation, and video quality
  • Advanced Encoding: Supports Baseline profile, high-quality I/P frame generation, ROI encoding, and bitrate control
  • Efficient Decoding: Software-based parsing of standard H.264 streams for smooth video playback

Target Applications

The main applications of esp_h264 are:

  • Video surveillance systems
  • Remote meetings and communication
  • Mobile streaming applications
  • IoT video processing

CODEC specifications

Encoding

| Platform | Type | Max Resolution | Max Performance | Advanced Features |
| --- | --- | --- | --- | --- |
| ESP32-S3 | Software | Any | 320×240@11fps | Basic encoding |
| ESP32-P4 | Hardware | ≤1080P | 1920×1080@30fps | Dual encoding, ROI optimization, motion vector output |

Decoding

| Platform | Type | Max Resolution | Max Performance |
| --- | --- | --- | --- |
| ESP32-S3 | Software | Any | 320×240@19fps |
| ESP32-P4 | Software | Any | 1280×720@10fps |
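
The resolution and frame-rate figures in the two tables can be put on one scale by multiplying them into a pixel rate. The sketch below is only arithmetic on the published figures; `pixel_rate` is a hypothetical helper name, not an esp_h264 function.

```c
#include <stdint.h>

// Pixels processed per second for a given resolution and frame rate.
// 64-bit result so that large resolutions cannot overflow.
static uint64_t pixel_rate(uint32_t width, uint32_t height, uint32_t fps)
{
    return (uint64_t)width * height * fps;
}
```

With the table values, `pixel_rate(320, 240, 11)` is 844,800 pixels/s for ESP32-S3 software encoding, and `pixel_rate(1920, 1080, 30)` is 62,208,000 pixels/s for ESP32-P4 hardware encoding.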

Getting Started

Basic Workflow

The standard hardware encoding flow can be summarized in four core operations:

Single Hardware encoder use flow

  1. Initialize: Create encoder with configuration parameters
  2. Start: Open the encoder for processing
  3. Process: Execute frame-by-frame encoding in a loop
  4. Cleanup: Release resources and destroy encoder object

Quick Start Example

// Hardware single-stream encoding configuration example
esp_h264_enc_cfg_hw_t cfg = {
    .pic_type = ESP_H264_RAW_FMT_O_UYY_E_VYY,
    .gop = 30,
    .fps = 30,
    .res = {.width = 640, .height = 480},
    .rc = {
        .bitrate = (640 * 480 * 30) / 100,
        .qp_min = 26,
        .qp_max = 30
    }
};

// Initialize encoder
esp_h264_enc_handle_t enc = NULL;
esp_h264_enc_hw_new(&cfg, &enc);

// Allocate input/output buffers (1.5 bytes per pixel, in integer arithmetic)
esp_h264_enc_in_frame_t in_frame = {.raw_data.len = 640 * 480 * 3 / 2};
in_frame.raw_data.buffer = esp_h264_aligned_calloc(128, 1,
                                                   in_frame.raw_data.len,
                                                   &in_frame.raw_data.len,
                                                   ESP_H264_MEM_INTERNAL);
// Output buffer: sized like the input here as a conservative assumption
esp_h264_enc_out_frame_t out_frame = {.raw_data.len = 640 * 480 * 3 / 2};
out_frame.raw_data.buffer = esp_h264_aligned_calloc(128, 1,
                                                    out_frame.raw_data.len,
                                                    &out_frame.raw_data.len,
                                                    ESP_H264_MEM_INTERNAL);

// Start encoding
esp_h264_enc_open(enc);

// Encoding loop (capture_frame and send_packet are application-defined placeholders)
while (capture_frame(in_frame.raw_data.buffer)) {
    esp_h264_enc_process(enc, &in_frame, &out_frame);
    send_packet(out_frame.raw_data.buffer);
}

// Resource release
esp_h264_enc_close(enc);
esp_h264_enc_del(enc);
esp_h264_free(in_frame.raw_data.buffer);
esp_h264_free(out_frame.raw_data.buffer);
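
The buffer-size and bitrate expressions in the example above can be factored into small helpers so they stay consistent when the resolution changes. This is a minimal sketch: the helper names `yuv420_frame_size` and `bitrate_from_resolution` are hypothetical and not part of esp_h264, and the divisor of 100 simply mirrors the heuristic used in the example.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not part of esp_h264): bytes needed for one raw
// frame at 1.5 bytes per pixel, matching the sizing used in the example.
static uint32_t yuv420_frame_size(uint32_t width, uint32_t height)
{
    // Integer arithmetic avoids the float-to-integer conversion of "* 1.5".
    return width * height * 3u / 2u;
}

// Hypothetical helper: the (width * height * fps) / divisor bitrate
// heuristic from the example; divisor = 100 reproduces its value.
static uint32_t bitrate_from_resolution(uint32_t width, uint32_t height,
                                        uint32_t fps, uint32_t divisor)
{
    assert(divisor > 0);
    // Widen to 64 bits before multiplying so large frames cannot overflow.
    return (uint32_t)(((uint64_t)width * height * fps) / divisor);
}
```

For the 640×480, 30 fps configuration, `yuv420_frame_size(640, 480)` is 460,800 bytes and `bitrate_from_resolution(640, 480, 30, 100)` is 92,160 bps.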

API Reference

The following section provides a brief overview of the available functions.

These functions are thread-safe and can be called at any time during the encoder lifecycle.

Encoding functions

| Function | Description | Platform Support |
| --- | --- | --- |
| esp_h264_enc_sw_new | Create single-stream software encoder | ESP32-S3, ESP32-P4 |
| esp_h264_enc_hw_new | Create single-stream hardware encoder | ESP32-P4 only |
| esp_h264_enc_dual_hw_new | Create dual-stream hardware encoder | ESP32-P4 only |
| esp_h264_enc_open | Start encoder | All platforms |
| esp_h264_enc_process | Execute encoding for a single frame and output compressed data | All platforms |
| esp_h264_enc_close | Stop encoder | All platforms |
| esp_h264_enc_del | Release encoder resources | All platforms |

Decoding functions

| Function | Description | Platform Support |
| --- | --- | --- |
| esp_h264_dec_sw_new | Create software decoder | ESP32-S3, ESP32-P4 |
| esp_h264_dec_open | Start decoder | All platforms |
| esp_h264_dec_process | Execute decoding for a single frame and output raw data | All platforms |
| esp_h264_dec_close | Stop decoder | All platforms |
| esp_h264_dec_del | Release decoder resources | All platforms |

Dynamic Parameter Control

| Function | Description | Typical Use Cases |
| --- | --- | --- |
| esp_h264_enc_get_resolution | Get resolution information | Display configuration |
| esp_h264_enc_get/set_fps | Dynamically adjust frame rate | Network bandwidth adaptation |
| esp_h264_enc_get/set_gop | Dynamically adjust GOP size | Quality vs. bandwidth balance |
| esp_h264_enc_get/set_bitrate | Dynamically adjust bitrate | Network bandwidth adaptation |

Advanced Features

This section highlights advanced capabilities of the H.264 encoder that offer greater control and flexibility for specialized use cases. These features include region-based quality adjustments, motion vector extraction for video analysis, and dual-stream encoding support on the ESP32-P4.

Region of Interest (ROI) Encoding

ROI encoding allows you to allocate more bits to important areas of the frame while reducing quality in less critical regions.

ROI Configuration

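// Obtain the hardware parameter handle from the encoder
esp_h264_enc_param_hw_handle_t param_hd = NULL;
ESP_H264_CHECK(esp_h264_enc_hw_get_param_hd(enc, &param_hd));

// Set the center area for high-priority encoding
esp_h264_enc_roi_cfg_t roi_cfg = {
    .roi_mode = ESP_H264_ROI_MODE_DELTA_QP,
    .none_roi_delta_qp = 10  // Increase QP by 10 for non-ROI region
};
ESP_H264_CHECK(esp_h264_enc_hw_cfg_roi(param_hd, roi_cfg));

// Define the centered region (half the width and half the height,
// i.e. 1/4 of the frame area) as ROI
esp_h264_enc_roi_reg_t roi_reg = {
    .x = width / 4, .y = height / 4,
    .len_x = width / 2, .len_y = height / 2
};
ESP_H264_CHECK(esp_h264_enc_hw_set_roi_region(param_hd, roi_reg));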
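
The centered-region arithmetic above generalizes to other region sizes. The following is a minimal sketch that compiles without the esp_h264 headers: `roi_rect_t` is a hypothetical stand-in for the `x`, `y`, `len_x`, and `len_y` fields, and `centered_roi` is an illustrative helper, not a library function. It uses the same units as the `width` and `height` values passed in.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in for the x / y / len_x / len_y fields used above.
typedef struct {
    uint32_t x, y, len_x, len_y;
} roi_rect_t;

// Centered rectangle spanning 1/divisor of each dimension.
// divisor = 2 reproduces the "center 1/4 area" region from the example.
static roi_rect_t centered_roi(uint32_t width, uint32_t height, uint32_t divisor)
{
    assert(divisor >= 1);
    roi_rect_t r;
    r.len_x = width / divisor;
    r.len_y = height / divisor;
    // Split the remaining space evenly on both sides.
    r.x = (width - r.len_x) / 2;
    r.y = (height - r.len_y) / 2;
    return r;
}
```

`centered_roi(640, 480, 2)` gives x = 160, y = 120, len_x = 320, len_y = 240, the same values as `width / 4`, `height / 4`, `width / 2`, and `height / 2` for a 640×480 frame.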

ROI API Functions

| Function | Description | Use Cases |
| --- | --- | --- |
| esp_h264_enc_cfg_roi | Configure ROI parameters | Key encoding for faces, license plates |
| esp_h264_enc_get_roi_cfg_info | Get current ROI configuration | Status monitoring |
| esp_h264_enc_set_roi_region | Define ROI regions | Specific area enhancement |
| esp_h264_enc_get_roi_region | Get ROI region information | Configuration verification |

Motion Vector Extraction

Extract motion vector data for video analysis and post-processing applications.

Motion Vector API Functions

| Function | Description | Use Cases |
| --- | --- | --- |
| esp_h264_enc_cfg_mv | Configure motion vector output | Video analysis setup |
| esp_h264_enc_get_mv_cfg_info | Get motion vector configuration | Configuration verification |
| esp_h264_enc_set_mv_pkt | Set motion vector packet buffer | Data collection |
| esp_h264_enc_get_mv_data_len | Get motion vector data length | Buffer management |

Dual-Stream Encoding (ESP32-P4 Only)

ESP32-P4 supports simultaneous encoding of two independent video streams with different parameters.

// Main stream 1080P storage, sub-stream 480P transmission
esp_h264_enc_cfg_dual_hw_t dual_cfg = {
    .cfg0 = {.res = {.width = 1920, .height = 1080}, .rc = {.bitrate = 4000000}},  // Main stream
    .cfg1 = {.res = {.width = 640, .height = 480}, .rc = {.bitrate = 1000000}},    // Sub-stream
};
ESP_H264_CHECK(esp_h264_enc_dual_hw_new(&dual_cfg, &enc));
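
H.264 codes each frame in 16×16 macroblocks, so it can help to check how each stream's resolution maps onto the macroblock grid. This is a minimal sketch with hypothetical helper names; whether esp_h264 itself requires 16-aligned dimensions is not stated in this article and should be checked against the component documentation.

```c
#include <stdint.h>

// H.264 macroblocks are 16x16 luma samples.
#define H264_MB_SIZE 16u

// Round a dimension up to the next multiple of the macroblock size.
static uint32_t align_up_to_mb(uint32_t value)
{
    return (value + H264_MB_SIZE - 1u) / H264_MB_SIZE * H264_MB_SIZE;
}

// Number of macroblocks needed to cover a frame of the given size.
static uint32_t macroblock_count(uint32_t width, uint32_t height)
{
    return (align_up_to_mb(width) / H264_MB_SIZE) *
           (align_up_to_mb(height) / H264_MB_SIZE);
}
```

For the two streams above, 1920×1080 rounds up to 1920×1088 and covers 8,160 macroblocks per frame, while 640×480 is already aligned and covers 1,200.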

Application Scenarios & Best Practices

The following examples demonstrate how to apply advanced encoding features to meet specific use-case requirements. Each scenario outlines an optimal configuration strategy, showcasing how ROI, bitrate control, and motion vectors can be tailored for performance, privacy, or adaptability.

1. Video Surveillance

In video surveillance applications, it’s critical to maintain high visual fidelity in regions that contain important details—such as faces, license plates, or motion-detected areas—while conserving bandwidth and storage elsewhere. ROI (Region of Interest) encoding allows the encoder to prioritize such regions by allocating more bits, thereby enhancing clarity where it matters most.

Optimal Configuration:

  • Enable ROI encoding to enhance key visual areas.
  • GOP = 30 produces a keyframe every 30 frames (once per second at 30 fps, or every 1.2 s at the 25 fps configured below), balancing video seekability and compression.
  • QP range: [20–35] provides a controlled balance between compression efficiency and perceptual quality, especially in bandwidth-constrained environments.

// Surveillance optimized configuration
esp_h264_enc_cfg_hw_t surveillance_cfg = {
    .gop = 30,
    .fps = 25,
    .res = {1280, 720},
    .rc = {
        .bitrate = 2000000,
        .qp_min = 20,
        .qp_max = 35
    }
};
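
The relationship between GOP, frame rate, and bitrate in this configuration comes down to two divisions. The helpers below are a hedged sketch with hypothetical names (`keyframe_interval_ms`, `avg_bits_per_frame`); they only do the arithmetic and do not call esp_h264.

```c
#include <assert.h>
#include <stdint.h>

// Milliseconds between keyframes for a given GOP length and frame rate.
static uint32_t keyframe_interval_ms(uint32_t gop, uint32_t fps)
{
    assert(fps > 0);
    return gop * 1000u / fps;
}

// Average number of bits available per frame at a target bitrate.
static uint32_t avg_bits_per_frame(uint32_t bitrate_bps, uint32_t fps)
{
    assert(fps > 0);
    return bitrate_bps / fps;
}
```

With the values above, `keyframe_interval_ms(30, 25)` is 1200 ms and `avg_bits_per_frame(2000000, 25)` is 80,000 bits per frame on average; at 30 fps the same GOP gives a 1000 ms keyframe interval.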

ROI Setup for Key Areas: To further refine quality, specific regions—such as the center of the frame or areas flagged by motion detection—can be configured for lower quantization parameters (QPs), resulting in better detail preservation.

  • Reduce QP in key regions by up to 25%, improving clarity for facial recognition or license plate reading.
  • Leverage motion vector data to dynamically track and adapt ROI regions for intelligent, resource-efficient surveillance.

2. Privacy Protection

In scenarios where privacy is a concern—such as public-facing cameras or indoor monitoring—specific regions of the video may need to be intentionally blurred. This can be achieved by strategically increasing the quantization parameter (QP) in those regions, reducing detail without additional processing overhead.

Implementation Strategy:

  • Increase QP by 25% in ROI areas to achieve a blur effect
  • Use a fixed GOP to prevent the mosaic area from spreading

// Privacy protection ROI configuration
esp_h264_enc_roi_cfg_t privacy_cfg = {
    .roi_mode = ESP_H264_ROI_MODE_DELTA_QP,
    .none_roi_delta_qp = -5  // Better quality for non-sensitive areas
};
ESP_H264_CHECK(esp_h264_enc_hw_cfg_roi(param_hd, privacy_cfg));

// Blur sensitive area
esp_h264_enc_roi_reg_t blur_region = {
    .x = sensitive_x, .y = sensitive_y,
    .len_x = sensitive_width, .len_y = sensitive_height,
    .qp = 15  // Applied as a QP delta in DELTA_QP mode: +15 coarsens this region
};
ESP_H264_CHECK(esp_h264_enc_hw_set_roi_region(param_hd, blur_region));
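
Because the ROI mode shown here works with QP deltas, a percentage such as "increase QP by 25%" has to be converted into a delta against some base QP. The sketch below assumes the percentage is taken relative to a known base QP (for example, a value inside the configured `qp_min`/`qp_max` range); that interpretation and the helper name `qp_delta_for_percent` are assumptions, not something the component defines.

```c
#include <assert.h>

// H.264 quantization parameters range from 0 to 51 for 8-bit video.
#define H264_QP_MAX 51

// Delta that moves base_qp by the given percentage (positive = coarser,
// negative = finer), rounded to nearest and clamped to the valid QP range.
static int qp_delta_for_percent(int base_qp, int percent)
{
    assert(base_qp >= 0 && base_qp <= H264_QP_MAX);
    int scaled = base_qp * (100 + percent);
    int target = (scaled + 50) / 100;  // round to nearest integer QP
    if (target < 0) {
        target = 0;
    }
    if (target > H264_QP_MAX) {
        target = H264_QP_MAX;
    }
    return target - base_qp;
}
```

For a base QP of 28, a 25% increase gives a delta of +7 (QP 35), and a 25% reduction, as suggested for key regions in the surveillance scenario, gives −7 (QP 21). Near the top of the range the clamp takes over: a base QP of 45 yields +6 rather than +11.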

3. Network Adaptive Streaming

For real-time video applications operating over variable or constrained networks, maintaining a stable and responsive stream is essential. By dynamically adjusting encoding parameters such as bitrate and frame rate based on current bandwidth conditions, the encoder can optimize video quality while minimizing buffering and transmission failures.

Strategy:

  • Enable dynamic bitrate control (CBR/VBR)
  • Adjust parameters based on network conditions

// Network adaptation function
// Tier values are illustrative; in practice, keep the target bitrate below the measured bandwidth
void adapt_to_network_conditions(esp_h264_enc_handle_t enc, uint32_t available_bandwidth) {
    esp_h264_enc_param_hw_handle_t param_hd;
    esp_h264_enc_hw_get_param_hd(enc, &param_hd);
    
    if (available_bandwidth < 1000000) {  // < 1 Mbps
        esp_h264_enc_set_bitrate(&param_hd->base, 800000);
        esp_h264_enc_set_fps(&param_hd->base, 15);
    } else if (available_bandwidth < 3000000) {  // < 3 Mbps
        esp_h264_enc_set_bitrate(&param_hd->base, 2000000);
        esp_h264_enc_set_fps(&param_hd->base, 25);
    } else {  // >= 3 Mbps
        esp_h264_enc_set_bitrate(&param_hd->base, 4000000);
        esp_h264_enc_set_fps(&param_hd->base, 30);
    }
}
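
One way to keep the tier decision testable on a host machine is to separate it from the encoder calls. The sketch below is an assumption-laden illustration: `stream_tier_t`, `k_tiers`, and `select_tier` are hypothetical names, and the thresholds and values are copied from the function above rather than recommended settings.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical tier table mirroring the thresholds in the function above.
typedef struct {
    uint32_t min_bandwidth_bps;  // tier applies at or above this bandwidth
    uint32_t bitrate_bps;        // encoder bitrate to request
    uint32_t fps;                // encoder frame rate to request
} stream_tier_t;

// Ordered from the highest threshold to the lowest.
static const stream_tier_t k_tiers[] = {
    { 3000000, 4000000, 30 },
    { 1000000, 2000000, 25 },
    {       0,  800000, 15 },
};

// Pick the first (highest) tier whose threshold the bandwidth meets.
static const stream_tier_t *select_tier(uint32_t available_bandwidth)
{
    const size_t n = sizeof(k_tiers) / sizeof(k_tiers[0]);
    for (size_t i = 0; i < n; i++) {
        if (available_bandwidth >= k_tiers[i].min_bandwidth_bps) {
            return &k_tiers[i];
        }
    }
    return &k_tiers[n - 1];  // not reached: the last tier starts at 0
}
```

The selected tier's `bitrate_bps` and `fps` can then be passed to `esp_h264_enc_set_bitrate` and `esp_h264_enc_set_fps` as in the function above. Keeping the table separate also makes it easy to adjust the values so that each tier's bitrate stays under its bandwidth threshold.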

Resources and Support

Development Resources

Technical Support

Conclusion

Espressif’s lightweight H.264 codec component, esp_h264, is designed for efficient video processing on resource-constrained devices. This guide has covered its technical features, API, and application scenarios to help developers unlock the potential of embedded video encoding and decoding.

Whether you’re building a surveillance system, implementing video streaming, or developing innovative multimedia applications, ESP H.264 offers the tools and performance needed to succeed in resource-constrained environments.
