SEMI E5 (SECS-II) and E30 (GEM) are the foundational communication standards for semiconductor equipment automation. On paper, they describe a well-defined message structure: equipment events generate SECS messages, a host (usually your MES) receives and responds to them. In practice, every equipment vendor interprets the standard slightly differently, the documentation gaps are significant, and the message structures you actually receive from a KLA Surfscan differ meaningfully from what you receive from an ASML optical scanner or a Lam etch chamber. This guide covers what to expect and how to deal with the differences.
The SECS/GEM Stack in Brief
SECS/GEM is not a single protocol. It is a stack of standards. SEMI E4 (SECS-I) defines the physical layer — originally RS-232 serial, still in use on older tools. SEMI E37 (HSMS) defines the TCP/IP transport layer used by virtually all equipment installed after approximately 2000. SEMI E5 (SECS-II) defines the message structure — how data items are encoded in a nested binary format (SML, the SECS Message Language, is the human-readable text notation for these messages, not the wire format). SEMI E30 (GEM) defines the application layer: which event types equipment should report, how equipment state models should work, and how recipes and alarms are structured.
For yield analytics integration, the relevant messages are primarily S6F11 (Event Report Send), which carries equipment event data in real time, and S9 messages (error handling). The specific variables reported in S6F11 events depend on what the equipment vendor has chosen to include and how they have structured the variable lists. This is where most integration work happens.
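As a concrete illustration, an S6F11 body decodes to a three-element list: DATAID, CEID, and a list of reports, each pairing an RPTID with its variable values. A minimal sketch of walking that structure, assuming your SECS library has already decoded the binary message into nested Python lists (the function name and sample values are illustrative, not from any particular library):

```python
def decode_s6f11(body):
    """Extract CEID and report variables from a decoded S6F11 body.

    S6F11 structure (per SEMI E5):
      L[3]
        DATAID
        CEID
        L[n]            -- one entry per report
          L[2]
            RPTID
            L[m]        -- variable values, in configured order
    """
    dataid, ceid, reports = body
    out = {"dataid": dataid, "ceid": ceid, "reports": {}}
    for rptid, values in reports:
        # Variable values arrive positionally; the per-tool config layer
        # is what maps each position to a named variable.
        out["reports"][rptid] = list(values)
    return out

# Example: a hypothetical ProcessingComplete event carrying one report.
body = [1001, 2200, [[301, ["LOT123", "WFR07", 42]]]]
event = decode_s6f11(body)
print(event["ceid"])            # 2200
print(event["reports"][301])    # ['LOT123', 'WFR07', 42]
```

Note that the values in each report are positional: without the report definition configured on the tool, you cannot tell which value is the defect count and which is the slot number.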
Variable Reports: The First Integration Challenge
GEM equipment maintains a list of "collection events" — named events like ProcessingStarted, ProcessingComplete, AlarmOccurred — each associated with a set of variables (status variables, trace data variables, or equipment constants). When an event fires, the equipment sends those variables in an S6F11 message.
The challenge is that the list of available events and the variables associated with each event are not standardized beyond the GEM-required minimums. A KLA eDR-7000 review SEM will report ProcessingComplete with a different variable set than a KLA Surfscan SP7, even though both tools are from the same vendor. The variable names are often vendor-specific (e.g., "DefectCount" versus "NumberOfDefects" versus "DEFECT_CNT"), the units are not always documented, and the data types (ASCII, integer, float) can vary between firmware versions.
The practical consequence: you cannot write a single SECS/GEM parser that handles all tools without a per-tool configuration layer. Every integration project starts with pulling the GEM Equipment Constant Dictionary (VID list) and Event/Collection list from the tool and mapping them to your analytics schema. KLA provides this documentation reasonably well for their Defect Review and Optical Inspection tools. Lam and Applied Materials toolsets vary significantly — some tools have well-documented VID lists; older chamber versions may have undocumented variables that only surface when you actually query the tool (e.g., with an S1F3 Selected Equipment Status Request) and examine what comes back.
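That per-tool configuration layer can be as simple as a mapping table kept outside the code, so a firmware update that renames a variable requires a config change rather than a release. A sketch, with tool names and variable names invented for illustration:

```python
# Per-tool mapping from vendor-specific variable names to the analytics
# schema. All tool names and VID names below are illustrative.
TOOL_VID_MAP = {
    "KLA_SP7":    {"DefectCount": "defect_count", "LotID": "lot_id"},
    "KLA_EDR7K":  {"NumberOfDefects": "defect_count", "LOT_ID": "lot_id"},
    "LAM_ETCH_3": {"DEFECT_CNT": "defect_count", "LotId": "lot_id"},
}

def normalize(tool, raw_vars):
    """Map vendor-specific variable names onto the analytics schema.

    Unmapped variables are preserved under a 'raw' key so that a
    post-firmware-update rename shows up as a new unmapped name
    instead of silently disappearing.
    """
    mapping = TOOL_VID_MAP.get(tool, {})
    out, raw = {}, {}
    for name, value in raw_vars.items():
        if name in mapping:
            out[mapping[name]] = value
        else:
            raw[name] = value
    out["raw"] = raw
    return out

print(normalize("KLA_SP7", {"DefectCount": 42, "LotID": "LOT123"}))
```

Keeping the unmapped leftovers visible is the cheap way to catch a vendor rename: an alert on any nonempty `raw` dict for a previously clean tool flags the firmware change immediately.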
Equipment State Model Inconsistencies
GEM defines a Control State Model (CSM) that describes when equipment accepts host commands and when it is under operator control. The model distinguishes OFF-LINE from ON-LINE, and within ON-LINE, a LOCAL substate (operator-initiated scenarios) from a REMOTE substate (host-initiated scenarios). In a fully automated fab, you want all tools in ON-LINE/REMOTE so the host can send commands and receive full event reporting. In practice, equipment frequently drops to LOCAL (or OFF-LINE) during maintenance, PM cycles, or alarm conditions — and does not always send a reliable notification when it does.
For yield analytics, the equipment state matters because events that fire while the tool is in a non-standard state may not reflect normal production conditions. An inspection event that runs while an engineer is in LOCAL mode doing a maintenance qualification should be flagged differently than a production inspection run. Most tools do report control state as a variable, but the variable name and encoding vary. Build your state tracking logic to be defensive: assume equipment state transitions can be missed and include state validation as part of your event processing pipeline.
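A minimal sketch of that defensive state tracking. The state names follow the GEM control state model; the classification logic and function names are assumptions about how a pipeline might use them, not part of the standard:

```python
from enum import Enum

class ControlState(Enum):
    OFFLINE = "offline"
    ONLINE_LOCAL = "local"
    ONLINE_REMOTE = "remote"
    UNKNOWN = "unknown"   # default until a transition is actually observed

def classify_event(last_known_state, event_state_var):
    """Decide how much to trust an incoming event.

    Prefer the control state reported inside the event itself (if the
    tool includes one); fall back to the last observed transition; and
    treat anything that is not confirmably ON-LINE/REMOTE as
    non-production-grade, since transitions can be missed.
    """
    state = event_state_var or last_known_state or ControlState.UNKNOWN
    return {
        "state": state,
        "production_grade": state is ControlState.ONLINE_REMOTE,
    }
```

The important design choice is the default: an event with no recoverable state information is flagged as non-production rather than assumed good, which matches the "assume transitions can be missed" posture above.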
Clock Synchronization and Timestamp Reliability
Timestamp alignment is critical for yield correlation. If you are trying to correlate an etch chamber event to an inspection result that ran 20 minutes later, you need both timestamps to be accurate and synchronized to the same reference. This is harder to achieve than it sounds. Equipment clocks drift. Older tools may not support NTP synchronization. Some tools report timestamps in local time without timezone offset information; others use UTC; a few use Unix epoch times, but with seconds precision rather than milliseconds.
A common problem: two tools in the same fab running the same lot report events with timestamps that differ by 90 seconds because one was never updated to daylight saving time after the last maintenance window. For inspection-to-process correlation that depends on minute-level precision, that offset produces false correlations and missed correlations. The fix is to include timestamp validation as a standard onboarding step: run a reference lot with known process times, collect SECS event timestamps from all connected tools, and calculate the offset correction for each tool. Correct at ingestion time rather than storing the raw (incorrect) timestamps.
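A sketch of ingestion-time correction, assuming the per-tool offsets have already been measured during the reference-lot run (the tool names and offset values below are invented for illustration):

```python
from datetime import datetime, timedelta, timezone

# Offset = tool timestamp minus MES reference timestamp, measured once
# per tool during the reference-lot validation. Values are illustrative.
TOOL_CLOCK_OFFSET = {
    "ETCH_03": timedelta(seconds=90),   # stale DST setting, fast by 90 s
    "INSP_01": timedelta(seconds=-4),   # ordinary clock drift, slow by 4 s
}

def correct_timestamp(tool, raw_ts):
    """Subtract the measured per-tool offset at ingestion time.

    Store the corrected UTC value; an unknown tool gets a zero offset,
    which should itself be flagged during onboarding.
    """
    return raw_ts - TOOL_CLOCK_OFFSET.get(tool, timedelta(0))

raw = datetime(2024, 3, 12, 10, 31, 30, tzinfo=timezone.utc)
print(correct_timestamp("ETCH_03", raw))   # 2024-03-12 10:30:00+00:00
```

Re-measure the offsets after any maintenance window or firmware update, since both of the failure modes above (DST misconfiguration, drift) are introduced by exactly those events.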
Message Sequencing and Recovery
HSMS (TCP/IP) transport introduces message sequencing requirements that do not exist in SECS-I (serial). Each primary message carries System Bytes in its header that the reply must echo so the transaction can be matched. The equipment tracks outstanding transactions and will stop sending if the host does not reply within the T3 timeout (typically 45 seconds). If the host goes offline for maintenance, or if the HSMS connection is interrupted by a network event, the equipment may queue events internally or drop them, depending on the vendor implementation.
Recovery after a connection interruption is one of the most common sources of data gaps in yield analytics. Equipment typically retries the connection at 10-30 second intervals (governed by the T5 timer), but events generated during the disconnection are handled inconsistently. KLA inspection tools generally buffer events and replay them on reconnect. Lam etch systems often do not buffer, so events from the disconnection period are lost. Applied Materials CMP tools have a configurable buffer with a default size of 500 events, which is sufficient for brief outages but not for multi-hour maintenance windows.
The integration implication: your analytics system must maintain a per-tool connection state and flag any data gaps caused by disconnections. Do not attempt to fill gaps by extrapolation. Flag the gap clearly so that correlation analyses spanning the disconnection period are treated with appropriate uncertainty.
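A minimal sketch of per-tool connection tracking with explicit gap records (the class and method names are illustrative, not from any specific framework):

```python
from datetime import datetime, timezone

class ToolConnection:
    """Tracks one tool's HSMS connection state and records data gaps."""

    def __init__(self, tool):
        self.tool = tool
        self.connected = False
        self.disconnected_at = None
        self.gaps = []   # list of (start, end) disconnection windows

    def on_disconnect(self, ts):
        self.connected = False
        self.disconnected_at = ts

    def on_reconnect(self, ts):
        if self.disconnected_at is not None:
            # Record the gap explicitly rather than interpolating;
            # downstream correlation treats overlapping windows as
            # uncertain, per the "do not extrapolate" rule.
            self.gaps.append((self.disconnected_at, ts))
        self.connected = True
        self.disconnected_at = None

    def overlaps_gap(self, start, end):
        """True if the analysis window [start, end) touches any gap."""
        return any(s < end and start < e for s, e in self.gaps)
```

A correlation query then calls `overlaps_gap` for its time window and either excludes the tool or annotates the result as gap-affected.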
KLARF as the Inspection Data Complement
SECS/GEM provides real-time equipment event data — process state changes, recipe start/end, alarm events. It does not carry the full inspection result data. The actual defect coordinates, CDSEM image references, and defect classification results come in KLARF files (KLA Results File format), which are written to a shared file location or transferred via FTP when an inspection run completes.
The integration therefore requires two parallel streams: SECS/GEM for real-time event tracking, and KLARF file monitoring for inspection result ingestion. The two streams must be joined on lot ID and wafer ID to build a complete picture of each inspection event. This join is usually straightforward when lot IDs are consistent — which they are in most modern MES configurations — but becomes complicated when lot splits, merges, or rework steps are involved, because the lot ID may change downstream of certain operations.
Track the wafer serial number (the physical wafer identifier, usually etched or laser-scribed on the wafer itself) as a secondary join key alongside lot ID. Wafer serial numbers persist through lot operations that change lot IDs, and they enable retrospective analysis that lot IDs alone cannot support.
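A sketch of the two-stream join with wafer serial as the fallback key. The field names and record shapes are illustrative assumptions, not a KLARF or SECS schema:

```python
def join_streams(secs_events, klarf_results):
    """Join SECS events to KLARF results.

    Primary key: (lot_id, wafer_serial). Fallback: wafer_serial alone,
    for the case where a split, merge, or rework step changed the lot
    ID downstream of the process step. Unmatched events pair with None.
    """
    by_key = {(k["lot_id"], k["wafer_serial"]): k for k in klarf_results}
    by_serial = {k["wafer_serial"]: k for k in klarf_results}
    joined = []
    for ev in secs_events:
        k = by_key.get((ev["lot_id"], ev["wafer_serial"]))
        if k is None:
            # Lot ID diverged; the physical wafer identifier still holds.
            k = by_serial.get(ev["wafer_serial"])
        joined.append((ev, k))
    return joined
```

In production you would also want to flag fallback matches distinctly, since a serial-only match is the signature of a lot operation worth auditing.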
Practical Onboarding Sequence
When onboarding a new tool to the SynthKernel integration, our standard sequence is:

1. Establish HSMS connectivity, confirm communications with S1F13 (Establish Communications Request), and verify bidirectional communication using a loopback test.
2. Request and document the full VID list and collection event list from the tool using S1F11 (Status Variable Namelist Request) and S1F23 (Collection Event Namelist Request).
3. Configure variable reports for the events relevant to yield analytics (ProcessingComplete, AlarmOccurred, SubstrateMapSend for inspection tools).
4. Run a reference lot and verify that event timestamps match process records from the MES.
5. For inspection tools, validate KLARF file ingestion and confirm that the KLARF defect coordinates are in the expected wafer coordinate system.
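One way to make that sequence auditable is to encode it as data and gate production use on completion. A small sketch, with step names invented to match the sequence above:

```python
# Onboarding checklist as data: step names are illustrative labels for
# the five-step sequence described above, not SynthKernel identifiers.
ONBOARDING_STEPS = [
    "hsms_connectivity_verified",
    "vid_and_event_lists_documented",
    "variable_reports_configured",
    "reference_lot_timestamps_validated",
    "klarf_ingestion_validated",        # inspection tools only
]

def onboarding_complete(done, inspection_tool=True):
    """True only when every required step for this tool class is done."""
    required = ONBOARDING_STEPS if inspection_tool else ONBOARDING_STEPS[:-1]
    return all(step in done for step in required)

print(onboarding_complete({"hsms_connectivity_verified"}))   # False
```

Gating the data stream on this flag is what prevents the "seemed to be working" failure mode described below: connectivity alone never marks a tool production-ready.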
That sequence takes two to three days for a well-documented tool with a cooperative equipment support team, and up to two weeks for older tools with limited documentation where the VID list has to be reconstructed empirically. Building in buffer time for the older toolsets is essential for any realistic integration project plan.
Common Integration Failures and How to Avoid Them
The three most common integration failures we encounter are:

1. Clock drift on older tools causing timestamp misalignment. Fix: always run a reference lot as the first integration validation step.
2. Variable name changes after a firmware update. Fix: build VID mapping as a configuration layer that can be updated without code changes, and monitor for unexpected variable-absent events after any tool firmware update.
3. KLARF coordinate system mismatches between the inspection tool and the analytics platform. Fix: include a coordinate registration verification step using a patterned reference wafer with known defect positions before going into production use.
None of these failures are exotic. All of them have caused multi-week data quality problems at fabs that skipped the validation steps because the integration "seemed to be working" after initial connectivity was established. The integration is not finished when the HSMS connection is green. It is finished when you have a validated, time-aligned data stream with documented variable mappings and confirmed coordinate registration.