9-Data Mining

Updated: 2023-07-12 18:35:04

Outline
New Words
Phrases
Abbreviations
Text
Questions for Discussion
New Words(1)
New Words(3)
Phrases(1)
Phrases(3)
Abbreviations
We are deluged by data: scientific data, medical data, demographic data, financial data, and marketing data. People have no time to look at the data. Human attention has become the precious resource. So, we must find ways to automatically analyze the data, to automatically classify it, to automatically summarize it, to automatically discover and characterize trends in it, and to automatically flag anomalies.
This is one of the most active and exciting areas of the database research community. Researchers in areas including statistics, visualization, artificial intelligence, and machine learning are contributing to this field. Data mining is a multidisciplinary field, drawing work from areas including machine learning, statistics, pattern recognition, information retrieval, neural networks, knowledge-based systems, artificial intelligence, high-performance computing, and data visualization.
Data mining emerged during the late 1980s, made great strides during the 1990s, and continues to flourish into the new millennium. Data mining has attracted a great deal of attention in the information industry and in society as a whole in recent years, due to the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge.
The information and knowledge gained can be used for applications ranging from market analysis, fraud detection, and customer retention, to production control and science exploration.
The Evolution of Database System Technology
Data mining can be viewed as a result of the natural evolution of information technology. The database system industry has witnessed an evolutionary path in the development of the following functionalities (Figure 11.1): data collection and database creation, data management (including data storage and retrieval, and database transaction processing), and advanced data analysis (involving data warehousing and data mining). For instance, the early development of data collection and database creation mechanisms served as a prerequisite for later development of effective mechanisms for data storage and retrieval, and query and transaction processing.
Figure 11.1 The evolution of database system technology
With numerous database systems offering query and transaction processing as common practice, advanced data analysis has naturally become the next target.
Data can now be stored in many different kinds of databases and information repositories. One data repository architecture that has emerged is the data warehouse, a repository of multiple heterogeneous data sources organized under a unified schema at a single site in order to facilitate management decision making.
Data warehouse technology includes data cleaning, data integration, and on-line analytical processing (OLAP), that is, analysis techniques with functionalities such as summarization, consolidation, and aggregation as well as the ability to view information from different angles.
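The OLAP functionalities named above (summarization, aggregation, and viewing information from different angles) can be illustrated with a minimal Python sketch. The sales records, the dimension names, and the roll_up helper are hypothetical and invented purely for illustration; a real data warehouse would perform this inside an OLAP engine rather than in application code.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, quarter, sales amount).
SALES = [
    ("North", "TV", "Q1", 100),
    ("North", "Phone", "Q1", 150),
    ("South", "TV", "Q1", 80),
    ("North", "TV", "Q2", 120),
    ("South", "Phone", "Q2", 200),
]

# Names of the dimensions, in the order they appear in each record.
DIMENSIONS = ("region", "product", "quarter")


def roll_up(records, *dims):
    """Sum the sales measure, grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[i] for i in idx)
        totals[key] += rec[-1]  # last field is the sales amount
    return dict(totals)


# The same data viewed from three different angles.
by_region = roll_up(SALES, "region")
by_product_quarter = roll_up(SALES, "product", "quarter")
grand_total = roll_up(SALES)  # no dimensions: one overall summary
```

Choosing fewer dimensions gives a coarser summary, and choosing none collapses everything into a single total; picking a different set of dimensions corresponds to looking at the same data from a different angle.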
Although OLAP tools support multidimensional analysis and decision making, additional data analysis tools are required for in-depth analysis, such as data classification, clustering, and the characterization of data changes over time. In addition, huge volumes of data can be accumulated beyond databases and data warehouses.
Typical examples include the World Wide Web and data streams, where data flow in and out like streams, as in applications like video surveillance, telecommunication, and sensor networks. The effective and efficient analysis of data in such different forms becomes a challenging task.
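One reason stream data is hard to analyze is that it usually cannot be stored and rescanned, so statistics have to be maintained in a single pass. The sketch below is one possible illustration, not a method taken from the text: it keeps a running mean and variance with Welford's online algorithm and flags readings that lie far from the running mean. The StreamMonitor class, the threshold of three standard deviations, and the sensor readings are all assumptions.

```python
import math


class StreamMonitor:
    """One-pass running mean/variance (Welford's method) over a data stream."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # assumed cut-off, in standard deviations
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def observe(self, value):
        """Return True if value looks anomalous, then fold it into the stats."""
        anomalous = False
        if self.count >= 2:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) > self.threshold * std:
                anomalous = True
        # Welford update: no past values need to be kept in memory.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return anomalous


monitor = StreamMonitor()
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 10.1]  # hypothetical sensor values
flags = [monitor.observe(x) for x in readings]
```

In this sketch a flagged value is still folded into the running statistics, which inflates the variance afterwards; whether to exclude flagged values is a design choice that depends on the application.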
The abundance of data, coupled with the need for powerful data analysis tools, has been described as a data rich but information poor situation. The fast-growing, tremendous amount of data, collected and stored in large and numerous data repositories, has far exceeded our human ability for comprehension without powerful tools.
As a result, data collected in large data repositories become “data tombs”, that is, data archives that are seldom visited.
Consequently, important decisions are often made based not on the information-rich data stored in data repositories, but rather on a decision maker’s intuition, simply because the decision maker does not have the tools to extract the valuable knowledge embedded in the vast amounts of data.
In addition, consider expert system technologies, which typically rely on users or domain experts to manually input knowledge into knowledge bases. Unfortunately, this procedure is prone to bias and errors, and is extremely time-consuming and costly.
What Is Data Mining?
Data mining techniques perform data analysis and may uncover important data patterns, contributing greatly to business strategies, knowledge bases, and scientific and medical research. The widening gap between data and information calls for a systematic development of data mining techniques that will turn “data tombs” into “golden nuggets” of knowledge.
Simply stated, data mining refers to extracting or “mining” knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, data mining should have been more appropriately named “knowledge mining from data,” which is unfortunately somewhat long. “Knowledge mining,” a shorter term, may not reflect the emphasis on mining from large amounts of data.
Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets from a great deal of raw material. Thus, such a misnomer that carries both “data” and “mining” became a popular choice. Many other terms carry a similar or slightly different meaning to data mining, such as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.
Many people treat data mining as a synonym for another popularly used term, Knowledge Discovery from Databases, or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery. Knowledge discovery as a process is depicted in Figure 11.2 and consists of an iterative sequence of the following steps:
Figure 11.2 Data mining as a step in the process of knowledge discovery
1. Data cleaning (to remove noise and inconsistent data).
2. Data integration (where multiple data sources may be combined).
3. Data selection (where data relevant to the analysis task are retrieved from the database).
4. Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance).
5. Data mining (an essential process where intelligent methods are applied in order to extract data patterns).
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures).
7. Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user).
Steps 1 to 4 are different forms of data preprocessing, where the data are prepared for mining. The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user and may be stored as new knowledge in the knowledge base. Note that according to this view, data mining is only one step in the entire process, albeit an essential one because it uncovers hidden patterns for evaluation.
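The seven steps above can be sketched as a chain of small Python functions, one per step. This is only an illustrative toy under stated assumptions: the purchase records, the field names, the choice of item-pair counting as the "mining" method, and the 0.5 support threshold are all hypothetical and do not come from the text.

```python
from collections import Counter

# Two hypothetical sources of customer purchase records.
STORE_A = [
    {"customer": "c1", "item": "milk", "amount": 2.0},
    {"customer": "c2", "item": "bread", "amount": None},  # missing value (noise)
    {"customer": "c2", "item": "milk", "amount": 2.0},
    {"customer": "c1", "item": "bread", "amount": 1.5},
]
STORE_B = [
    {"customer": "c3", "item": "milk", "amount": 2.1},
    {"customer": "c3", "item": "bread", "amount": 1.4},
    {"customer": "c4", "item": "tea", "amount": -5.0},  # inconsistent value
]


def clean(records):
    """Step 1: drop records with missing or inconsistent amounts."""
    return [r for r in records if r["amount"] is not None and r["amount"] > 0]


def integrate(*sources):
    """Step 2: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records, items):
    """Step 3: keep only the records relevant to the analysis task."""
    return [r for r in records if r["item"] in items]


def transform(records):
    """Step 4: consolidate records into one item set (basket) per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets


def mine(baskets):
    """Step 5: count how many baskets contain each pair of items."""
    pairs = Counter()
    for items in baskets.values():
        ordered = sorted(items)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs[(a, b)] += 1
    return pairs


def evaluate(pairs, n_baskets, min_support):
    """Step 6: keep pairs whose support meets the interestingness threshold."""
    return {p: c / n_baskets for p, c in pairs.items() if c / n_baskets >= min_support}


def present(patterns):
    """Step 7: render the surviving patterns as text for the user."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]


data = integrate(clean(STORE_A), clean(STORE_B))
baskets = transform(select(data, {"milk", "bread", "tea"}))
patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
report = present(patterns)
```

Here steps 1 to 4 only prepare the data, matching the preprocessing role described above; only the mine function looks for patterns, and evaluate and present decide which of them reach the user.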
We agree that data mining is a step in the knowledge discovery process. However, in industry, in media, and in the database research milieu, the term data mining is becoming more popular than the longer term of knowledge discovery from data. We adopt a broad view of data mining functionality: data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories.
Architecture of a Typical Data Mining System
Based on the view above, the architecture of a typical data mining system may have the following major components (Figure 11.3):
1. Database, data warehouse, World Wide Web, or other information repository: This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.
2. Database or data warehouse server: The database or data warehouse server is responsible for fetching the relevant data, based on the user’s data mining request.
3. Knowledge base: This is the domain knowledge that is used to guide the search or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction.
4. Data mining engine: This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association and correlation analysis, classification, prediction, cluster analysis, outlier analysis, and evolution analysis.
5. Pattern evaluation module: This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search toward interesting patterns. It may use interestingness thresholds to filter out discovered patterns. Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used.
For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process so as to confine the search to only the interesting patterns.
6. User interface: This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.
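The recommendation under the pattern evaluation module, pushing the interestingness check into the mining process itself, can be sketched with a simplified level-wise frequent-itemset search. The example assumes support (the fraction of transactions containing an itemset) as the interestingness measure and 0.5 as the threshold; the transactions and function names are hypothetical. It is a simplification of Apriori-style mining, since it omits the full subset check when generating candidates.

```python
from itertools import combinations

# Hypothetical transactions; each is a set of purchased items.
TRANSACTIONS = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search that discards infrequent itemsets before extending them."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    result = {}
    while level:
        # The threshold is applied inside the search, not after it.
        kept = {}
        for candidate in level:
            sup = support(candidate, transactions)
            if sup >= min_support:
                kept[candidate] = sup
        result.update(kept)
        # Build next-level candidates only from itemsets that survived.
        survivors = list(kept)
        level = list({a | b for a, b in combinations(survivors, 2)
                      if len(a | b) == len(a) + 1})
    return result


frequent = frequent_itemsets(TRANSACTIONS, min_support=0.5)
```

Because candidates at each level are built only from itemsets that already met the threshold, uninteresting itemsets are never extended, which is what confines the search. Filtering only after enumerating every possible itemset would return the same patterns at a much higher cost on realistic data.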
From a data warehouse perspective, data mining can be viewed as an advanced stage of on-line analytical processing (OLAP). However, data mining goes far beyond the narrow scope of summarization-style analytical processing of data warehouse systems by incorporating more advanced techniques for data analysis. Although there are many “data mining systems” on the market, not all of them can perform true data mining.
A data analysis system that does not handle large amounts of data should be more appropriately categorized as a machine learning system, a statistical data analysis tool, or an experimental system prototype.
A system that can only perform data or information retrieval, including finding aggregate values, or that performs deductive query answering in large databases should be more appropriately categorized as a database system, an information retrieval system, or a deductive database system.
Overall, data mining involves an integration of techniques from multiple disciplines such as database and data warehouse technology, statistics, machine learning, high-performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial or temporal data analysis.
By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles.
The discovered knowledge can be applied to decision making, process control, information management, and query processing. Therefore, data mining is considered one of the most important frontiers in database and information systems and one of the most promising interdisciplinary developments in information technology.
Questions for Discussion
Is data mining a simple transformation of technology developed from databases, statistics, and machine learning?
Explain how the evolution of database technology led to data mining.
Describe the steps involved in data mining when viewed as a process of knowledge discovery.
