What Is in a KLARF File and Why It Is the Foundation of Yield Data Integration

KLARF is a 30-year-old file format. It was defined by KLA Instruments in the early 1990s and standardized as SEMI E89. It is still the primary output format for wafer inspection data from KLA optical and SEM-based inspection tools, and it is the format that most yield management systems were built to ingest. Understanding its structure — what it contains, what it omits, and where it is ambiguous — is essential for anyone building yield analytics on inspection data.

File Structure Overview

A KLARF file is a structured ASCII text file organized into records. Each record begins with a record identifier, followed by a list of values and a terminating semicolon. The most important records for yield analytics are: FileVersion (usually 1.8 for modern KLA tools), InspectionStationID (tool identifier), SampleType (wafer or substrate), WaferID (usually lot ID and wafer number), InspectionOrientation (notch direction and coordinate-system origin), and DefectList (the core data: the list of detected defects with coordinates and attributes).
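Because records are whitespace-separated tokens terminated by a semicolon, a first-pass tokenizer is only a few lines of Python. The sketch below is illustrative; the record names are real KLARF record types, but the sample values are invented, not output from any actual tool:

```python
def parse_klarf_records(text: str) -> list[tuple[str, list[str]]]:
    """Split KLARF-style ASCII text into (record_name, values) pairs.

    Splitting on ';' rather than on newlines also handles multi-line
    records such as DefectList, whose rows all precede one semicolon.
    """
    records = []
    for chunk in text.split(";"):
        tokens = chunk.split()
        if not tokens:
            continue
        records.append((tokens[0], tokens[1:]))
    return records

# Invented sample values for illustration only
sample = """FileVersion 1 8;
InspectionStationID "KLA" "2920" "TOOL01";
SampleType WAFER;
WaferID "LOT123-07";
"""

for name, values in parse_klarf_records(sample):
    print(name, values)
```

A production parser needs more than this (quoted-string handling, version-specific record layouts), but the record-per-semicolon framing is the backbone.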

A DefectList record contains one line per detected defect. Each line includes: defect ID (sequential integer), X and Y coordinates in microns in the wafer coordinate system, X and Y size of the defect bounding box, a defect class (integer code from the tool's classification bins), an image count (number of review images associated with this defect), and several additional attributes whose content varies by tool and firmware version.
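A minimal sketch of parsing one defect line into named fields, assuming the column order described above. The example line and field names are invented; a real parser must take the column order from the file's own column-specification record rather than hard-coding it, since the trailing attributes vary by tool and firmware:

```python
def parse_defect_line(line: str) -> dict:
    """Parse one DefectList data line, assuming a fixed column order:
    ID, X, Y, XSIZE, YSIZE, CLASS, IMAGECOUNT, then tool-specific extras.
    """
    tok = line.split()
    return {
        "defect_id": int(tok[0]),
        "x_um": float(tok[1]),        # wafer coordinate system, microns
        "y_um": float(tok[2]),
        "xsize_um": float(tok[3]),    # bounding box of the defect
        "ysize_um": float(tok[4]),
        "class_code": int(tok[5]),    # recipe-specific bin number
        "image_count": int(tok[6]),   # associated review images
        "extra": tok[7:],             # tool/firmware-specific attributes
    }

d = parse_defect_line("17 10234.5 -8812.0 0.42 0.38 3 1 0.91 64")
print(d["defect_id"], d["class_code"], d["extra"])
```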

The Coordinate System Problem

KLARF specifies that defect coordinates are in the "design coordinate system" relative to the wafer origin, but it does not fully specify where the origin is or how the coordinate axes are oriented for all wafer sizes and notch types. Different KLA tool generations have made different implementation choices. KLA Surfscan systems use a coordinate origin at the wafer center with the positive Y axis pointing toward the notch. KLA optical patterned wafer inspection tools (e.g., the 2920 and subsequent systems) typically use a different orientation convention. ASML HMI tools use yet another convention.

The practical consequence: defect coordinates from two different tools in the same fab may use different coordinate systems, and comparing them requires a coordinate transformation. If you overlay defect maps from a KLA Surfscan and a KLA eDR-7000 review SEM on the same wafer without applying the correct transformation, the apparent spatial correlation between the two maps is meaningless. This is a common source of misleading correlation results in fabs that have not implemented coordinate normalization.
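The transformation itself is simple once the per-tool convention is known; the hard part is knowing the convention. A sketch of mapping coordinates into a common frame, where the rotation and flip parameters for any particular tool are assumptions to be filled in from its documented convention:

```python
import math

def to_common_frame(x: float, y: float,
                    rotation_deg: float = 0.0,
                    flip_y: bool = False) -> tuple[float, float]:
    """Rotate about the wafer center, then optionally flip the Y axis.

    The (rotation_deg, flip_y) pair encodes one tool's coordinate
    convention relative to the chosen common frame; the values used
    for any real tool must come from its documentation, not guesswork.
    """
    r = math.radians(rotation_deg)
    xr = x * math.cos(r) - y * math.sin(r)
    yr = x * math.sin(r) + y * math.cos(r)
    return (xr, -yr if flip_y else yr)

# Hypothetical example: a tool whose +Y points away from the notch
# needs a 180-degree rotation into a notch-up common frame.
print(to_common_frame(1000.0, 2000.0, rotation_deg=180.0))
```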

SynthKernel normalizes all ingested KLARF files to a standard wafer coordinate system (origin at the wafer center, notch at the top with positive Y pointing toward it, and a right-handed coordinate system consistent with the SEMI M1 wafer specification) as a required step during ingestion. This normalization uses the InspectionOrientation record in the KLARF header, combined with a per-tool coordinate-system lookup table maintained in the integration configuration. Without this step, cross-tool overlay analysis is unreliable.

Defect Class Codes: The Classification Tower of Babel

The DefectList record includes a class code for each defect — typically an integer between 0 and 99. KLARF specifies that class codes are defined by the ClassificationList record earlier in the file, which maps each code to a text description. This sounds well-structured. In practice, the classification systems are completely non-standard across tools, vendors, and even tool generations from the same vendor.

KLA optical inspection tools classify defects into bins defined by the recipe running on the tool. A recipe for a post-etch inspection step might define bins: 1=particle_large, 2=particle_small, 3=bridge, 4=scratch, 5=nuisance, 0=unclassified. A different recipe for the same tool running a different process layer might use completely different bin numbers and descriptions. The class code in the KLARF file is only interpretable with reference to the specific recipe that generated it.
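In code, this means a class code can only be resolved against the ClassificationList carried in the same file. A sketch using the illustrative post-etch bins from the paragraph above (these bin numbers are an example, not a standard):

```python
# Illustrative ClassificationList for one hypothetical post-etch recipe.
# A real pipeline must rebuild this mapping from each KLARF file's own
# ClassificationList record; reusing it across recipes is exactly the
# mistake this section warns against.
CLASSIFICATION_LIST = {
    0: "unclassified",
    1: "particle_large",
    2: "particle_small",
    3: "bridge",
    4: "scratch",
    5: "nuisance",
}

def class_name(code: int) -> str:
    """Resolve a bin code against this file's ClassificationList.

    Codes absent from the list are reported as unknown rather than
    silently mapped, since there is no cross-recipe translation schema.
    """
    return CLASSIFICATION_LIST.get(code, f"unknown({code})")

print(class_name(3))   # resolves via this recipe's list
print(class_name(42))  # a code this recipe never defined
```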

This is one of the primary reasons that automated defect classification using the rule-based bins in the KLARF file is unreliable across tool changes, recipe changes, or process layer changes. The bin definitions change, the bin numbers change, and there is no machine-readable schema for translating between bin systems. AI-based classification that works directly from defect images — bypassing the rule-based bins entirely — sidesteps this problem. The classification is derived from the image content, not the tool's recipe-specific bin code, and it remains valid across tool and recipe changes.

ImageList and the Path to Review Images

When defects are reviewed on a separate SEM review tool (like the KLA eDR-7000), the review results may be appended to the original KLARF file or written to a separate review KLARF file. The ImageList record in the review KLARF contains references to the SEM image files, typically as file paths relative to the review tool's local storage. These paths are tool-specific, often using Windows UNC paths to a network share on the review workstation.

For yield analytics systems that need to process the actual SEM images — to run AI classification on the defect morphology rather than relying on the rule-based bin codes — the ImageList path handling is a significant integration concern. Image file paths often break when the network share moves, when the review tool is replaced, or when images are archived. Building robust image retrieval requires either a content-addressed image archive (where images are indexed by a hash of the KLARF defect record rather than by path) or a regularly validated path registry that detects broken references before they cause silent data loss.
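The content-addressed idea can be sketched in a few lines: derive the archive key from identifying fields of the KLARF defect record, so image retrieval never depends on a fragile path. The choice of key fields here is an assumption for illustration:

```python
import hashlib

def defect_image_key(lot_id: str, wafer_id: str, defect_id: int) -> str:
    """Derive a stable, path-independent archive key for a defect's
    review images from identifying fields of its KLARF defect record.

    The field choice (lot, wafer, defect ID) is an illustrative
    assumption; a real system would include enough fields to make the
    key unique across re-inspections of the same wafer.
    """
    record = f"{lot_id}|{wafer_id}|{defect_id}"
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

key = defect_image_key("LOT123", "07", 17)
print(key[:16])  # stable prefix, usable as an archive lookup key
```

Because the key is derived from the defect record rather than a storage location, moving the network share or replacing the review tool does not invalidate it.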

What KLARF Does Not Contain

Understanding what KLARF omits is as important as understanding what it contains. KLARF does not include: wafer process history (which equipment touched the wafer before inspection), recipe parameters used at the inspection step (sensitivity settings, scan resolution), post-inspection review decisions by human reviewers, or any information about the die map or circuit design. It is purely a list of detected anomalies with coordinates and tool-assigned class codes.

All of the additional context needed for yield-correlated analysis — process equipment IDs, recipe parameters, design layer information, probe yield at the same die location — comes from other data sources that must be joined to the KLARF data by lot ID, wafer ID, and coordinate. The KLARF file is the anchor; the value of yield analytics comes from what you build on top of it.
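The join itself usually needs a coordinate-derived die index in addition to lot and wafer IDs, since probe yield is reported per die. A sketch of building such a join key, with invented die-pitch values for illustration:

```python
# Hypothetical die pitch for illustration; real values come from the
# product's die map, which KLARF itself does not contain.
DIE_PITCH_X_UM = 8000.0
DIE_PITCH_Y_UM = 6000.0

def die_index(x_um: float, y_um: float) -> tuple[int, int]:
    """Map wafer coordinates (microns) to a (col, row) die index.

    Assumes defect coordinates have already been normalized to the
    same frame and origin the die map uses.
    """
    return (int(x_um // DIE_PITCH_X_UM), int(y_um // DIE_PITCH_Y_UM))

def join_key(lot_id: str, wafer_id: str, x_um: float, y_um: float):
    """Key for joining a KLARF defect to per-die probe-yield records."""
    return (lot_id, wafer_id, die_index(x_um, y_um))

print(join_key("LOT123", "07", 10234.5, -8812.0))
```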

KLARF Version Compatibility

KLARF version 1.8 is the current standard and is what modern KLA tools produce. KLARF 1.6 and 1.7, produced by older tool generations, differ in several record structures: the InspectionOrientation record format changed between 1.6 and 1.8, and the DefectList attribute columns available in 1.6 are a subset of those in 1.8. Fabs with a mixed installed base of tool vintages often receive KLARF files in multiple versions from different tools in the same process flow. A KLARF parser must handle version-specific variations in record format and gracefully handle missing attributes in older versions rather than crashing on absent fields.
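One way to make missing attributes non-fatal is to read defect fields by name against a version-specific column list and return None for anything an older version omits. The column names below are illustrative placeholders, not the official 1.6/1.8 specifications:

```python
# Illustrative column lists; a real parser reads these from each file's
# own column-specification record, which differs by KLARF version.
SPEC_16 = ["DEFECTID", "XREL", "YREL", "CLASSNUMBER"]
SPEC_18 = ["DEFECTID", "XREL", "YREL", "XSIZE", "YSIZE",
           "CLASSNUMBER", "IMAGECOUNT"]

def defect_attrs(tokens: list[str], spec: list[str]) -> dict:
    """Read defect attributes by name against a version-specific spec.

    Attributes absent in the older spec come back as None instead of
    raising, so downstream analytics can degrade gracefully rather
    than crash on a 1.6-era file.
    """
    row = dict(zip(spec, tokens))
    return {name: row.get(name) for name in SPEC_18}

old = defect_attrs("17 10234.5 -8812.0 3".split(), SPEC_16)
print(old["CLASSNUMBER"], old["IMAGECOUNT"])
```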

SEMREF and the Emerging Standards Gap

SEMI has been working on a successor to KLARF that addresses the coordinate system ambiguity and the classification code non-standardization. The SEMREF standard (SEMI E116) adds explicit coordinate system metadata and supports richer defect attribute structures. Adoption has been slow: KLA's newer platforms support E116 output, but many installed KLA tools still produce KLARF 1.8 as their primary output, and yield management software at most fabs was built to KLARF, not E116.

For the next five to seven years, KLARF will remain the dominant inspection data format in most production fabs, and any yield analytics system that depends on inspection data must be built to handle it well. The coordinate normalization, classification code abstraction, and image path management challenges described in this post are not going away. They are the engineering work that stands between raw inspection data and actionable yield intelligence, and building robust solutions to them is what makes the difference between a yield analytics demonstration and a production-quality system.