SECOM Data Set
Data summary:
Data from a semi-conductor manufacturing process. A complex modern semi-conductor manufacturing process is normally under consistent surveillance via the monitoring of signals/variables collected from sensors and/or process measurement points. However, not all of the signals are equally valuable in a specific monitoring system. The measured signals contain a combination of useful information, irrelevant information and noise. It is often the case that useful information is buried in the latter two. Engineers typically have a much larger number of signals than are actually required. If we consider each type of signal as a feature, then feature selection may be applied to identify the most relevant signals. The Process Engineers may then use these signals to determine key factors contributing to yield excursions downstream in the process. This will enable an increase in process throughput, a decreased time to learning and reduced per-unit production costs.
English keywords:
semi-conductor, manufacturing process, monitoring system, signals, production costs
Chinese keywords:
半导体, 制造过程, 监控系统, 信号, 生产成本
Data format:
TEXT
Data usage:
Classification, Causal-Discovery
Detailed description:
SECOM Data Set
Abstract: Data from a semi-conductor manufacturing process
Source:
Authors: Michael McCann, Adrian Johnston
Data Set Information:
A complex modern semi-conductor manufacturing process is normally under consistent surveillance via the monitoring of signals/variables collected from sensors and/or process measurement points. However, not all of the signals are equally valuable in a specific monitoring system. The measured signals contain a combination of useful information, irrelevant information and noise. It is often the case that useful information is buried in the latter two. Engineers typically have a much larger number of signals than are actually required. If we consider each type of signal as a feature, then feature selection may be applied to identify the most relevant signals. The Process Engineers may then use these signals to determine key factors contributing to yield excursions downstream in the process. This will enable an increase in process throughput, a decreased time to learning and reduced per-unit production costs.
To enhance current business improvement techniques, the application of feature selection as an intelligent systems technique is being investigated.
The dataset presented in this case represents a selection of such features, where each example represents a single production entity with associated measured features, and the labels represent a simple pass/fail yield for in-house line testing (figure 2) together with an associated date time stamp. Here -1 corresponds to a pass and 1 corresponds to a fail, and the date time stamp is for that specific test point.
Using feature selection techniques, it is desired to rank features according to their impact on the overall yield for the product; causal relationships may also be considered with a view to identifying the key features.
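As an illustration of what ranking features might look like, here is a minimal sketch of a signal-to-noise (S2N) score, one of the baseline methods named in this description. The formula used, |mean(fail) - mean(pass)| / (std(fail) + std(pass)), is a common definition and is an assumption here; the description does not say which variant the authors applied. The function names and the toy values are hypothetical.

```python
import math
import statistics

def s2n_score(values, labels):
    """Signal-to-noise score of one feature (assumed definition, see lead-in)."""
    # Split observed (non-NaN) values by class: 1 = fail, -1 = pass.
    fails = [v for v, y in zip(values, labels) if y == 1 and not math.isnan(v)]
    passes = [v for v, y in zip(values, labels) if y == -1 and not math.isnan(v)]
    if len(fails) < 2 or len(passes) < 2:
        return 0.0  # not enough data to estimate a spread
    spread = statistics.stdev(fails) + statistics.stdev(passes)
    if spread == 0.0:
        return 0.0  # constant within both classes, treated as uninformative
    return abs(statistics.mean(fails) - statistics.mean(passes)) / spread

def rank_features(rows, labels, top_k=40):
    """Return indices of the top_k features, highest score first."""
    n_features = len(rows[0])
    scores = [s2n_score([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:top_k]

# Toy data (made-up values): feature 0 separates the classes, the others do not.
toy_rows = [
    [5.0, 1.0, 2.0],
    [5.2, 0.0, 2.0],
    [1.0, 1.0, 2.0],
    [1.1, 0.0, 2.0],
    [0.9, 0.5, 2.0],
]
toy_labels = [1, 1, -1, -1, -1]
top = rank_features(toy_rows, toy_labels, top_k=1)
```

On the real data, `top_k=40` would match the number of features selected in the baseline experiments.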
Results may be submitted in terms of feature relevance for predictability using error rates as our evaluation metrics. It is suggested that cross validation be applied to generate the results. Some baseline results are shown below for basic feature selection techniques using a simple kernel ridge classifier and 10-fold cross validation.
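A minimal sketch of how the 10 folds might be built. Stratifying by class is an assumption made here because only 104 of the 1567 examples are fails; the description does not state whether the original folds were stratified. The function name `stratified_folds` and the seed are hypothetical.

```python
import random

def stratified_folds(labels, n_folds=10, seed=0):
    """Split example indices into n_folds lists, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds

# Class counts taken from this description: 104 fails (1) and 1463 passes (-1).
labels = [1] * 104 + [-1] * 1463
folds = stratified_folds(labels)
fails_per_fold = [sum(1 for i in fold if labels[i] == 1) for fold in folds]
```

Each fold then serves once as the test set while the remaining nine are used for feature selection and training, so that the selected features never see the test examples.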
Baseline Results: Pre-processing objects were applied to the dataset simply to standardize the data and remove the constant features, and then a number of different feature selection objects, each selecting the 40 highest ranked features, were applied with a simple classifier to achieve some initial results. 10-fold cross validation was used and the balanced error rate (BER) generated as our initial performance metric to help investigate this dataset.
SECOM Dataset: 1567 examples, 591 features, 104 fails
FS method (40 features)   BER %         True + %       True - %
S2N (signal to noise)     34.5 +-2.6    57.8 +-5.3     73.1 +-2.1
Ttest                     33.7 +-2.1    59.6 +-4.7     73.0 +-1.8
Relief                    40.1 +-2.8    48.3 +-5.9     71.6 +-3.2
Pearson                   34.1 +-2.0    57.4 +-4.3     74.4 +-4.9
Ftest                     33.5 +-2.2    59.1 +-4.8     73.8 +-1.8
Gram Schmidt              35.6 +-2.4    51.2 +-11.8    77.5 +-2.3
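The balanced error rate is not defined in this description. Assuming the usual definition (the mean of the per-class error rates), and assuming the "True +" and "True -" columns are per-class accuracies in percent, the BER column can be reproduced from the other two to within rounding. The sketch below shows both forms; the function names are hypothetical.

```python
def balanced_error_rate(y_true, y_pred, fail_label=1, pass_label=-1):
    """Mean of the error rates on the fail class and on the pass class."""
    fail_preds = [p for t, p in zip(y_true, y_pred) if t == fail_label]
    pass_preds = [p for t, p in zip(y_true, y_pred) if t == pass_label]
    if not fail_preds or not pass_preds:
        raise ValueError("both classes must be present in y_true")
    fail_error = sum(1 for p in fail_preds if p != fail_label) / len(fail_preds)
    pass_error = sum(1 for p in pass_preds if p != pass_label) / len(pass_preds)
    return 0.5 * (fail_error + pass_error)

def ber_from_rates(true_pos_pct, true_neg_pct):
    """BER in percent from per-class accuracies in percent."""
    return 100.0 - 0.5 * (true_pos_pct + true_neg_pct)

# Ttest row of the table: True + = 59.6 %, True - = 73.0 %, reported BER = 33.7 %.
ttest_ber = ber_from_rates(59.6, 73.0)

# Tiny made-up prediction vector: 1 of 2 fails and 1 of 4 passes misclassified.
toy_ber = balanced_error_rate([1, 1, -1, -1, -1, -1], [1, -1, -1, -1, -1, 1])
```

Unlike the plain error rate, this metric is not dominated by the majority class: a classifier that labels every example as a pass would score a BER of 50 % here, even though it would be correct on 1463 of 1567 examples.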
Attribute Information:
Key facts: Data Structure: The data consists of 2 files: the dataset file SECOM, consisting of 1567 examples each with 591 features (a 1567 x 591 matrix), and a labels file containing the classifications and date time stamp for each example.
As with any real-life data situation, this data contains null values, varying in intensity depending on the individual features. This needs to be taken into consideration when investigating the data, either through pre-processing or within the technique applied.
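One possible way to handle the null values at the pre-processing stage is sketched below, combined with the standardization and constant-feature removal mentioned for the baseline results. Replacing each null with the feature mean (which is 0 after standardization) is an assumption chosen for simplicity, not the authors' stated method; the function name `preprocess` is hypothetical.

```python
import math
import statistics

def preprocess(rows):
    """Drop constant features, standardize the rest, and mean-impute NaN.

    Returns (kept_indices, matrix), where matrix has one row per example.
    """
    n_features = len(rows[0])
    kept, columns = [], []
    for j in range(n_features):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        if len(observed) < 2:
            continue  # feature is (almost) entirely null
        mu = statistics.mean(observed)
        sigma = statistics.pstdev(observed)
        if sigma == 0.0:
            continue  # constant feature carries no information
        # After standardization the feature mean is 0, so NaN becomes 0.0.
        columns.append(
            [0.0 if math.isnan(r[j]) else (r[j] - mu) / sigma for r in rows]
        )
        kept.append(j)
    matrix = [list(values) for values in zip(*columns)]
    return kept, matrix

# Made-up values: feature 1 is constant, feature 2 has one null.
nan = float("nan")
kept, matrix = preprocess([[1.0, 7.0, nan], [2.0, 7.0, 4.0], [3.0, 7.0, 6.0]])
```

In a cross-validation setting, the means and standard deviations would normally be estimated on the training folds only and then applied to the held-out fold.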
The data is represented in a raw text file, each line representing an individual example and the features separated by spaces. The null values are represented by the 'NaN' value as per MatLab.
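A minimal sketch of reading this format with the Python standard library. It relies on the fact that Python's `float()` accepts the token `NaN` directly. The function names and the sample values are hypothetical; for a real file, the lines could come from iterating over an open file handle.

```python
import math

def parse_rows(lines):
    """Parse space-separated rows; 'NaN' tokens become float('nan')."""
    rows = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        rows.append([float(token) for token in tokens])
    return rows

def missing_fraction_per_feature(rows):
    """Fraction of null values in each feature column."""
    n_examples = len(rows)
    n_features = len(rows[0])
    return [
        sum(1 for r in rows if math.isnan(r[j])) / n_examples
        for j in range(n_features)
    ]

# Three made-up lines in the described format.
sample = ["3030.93 2564.0 NaN", "3095.78 NaN NaN", "2932.61 2559.94 0.5"]
rows = parse_rows(sample)
fractions = missing_fraction_per_feature(rows)
```

Checking the per-feature fraction of nulls first gives a quick view of how unevenly the missing values are spread across features.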
Relevant Papers:
N/A
Citation Request:
Please refer to the Machine Learning Repository's citation policy.
Data preview: