What is data mining? An analysis of data mining technology

Data mining is the process of automatically searching large volumes of data for hidden information that has particular relevance. Computer storage holds enormous amounts of unused data, and the volume is still growing rapidly. This data is like a gold mine waiting to be excavated, yet the number of scientists, engineers, and analysts available to analyze it has remained relatively small. This gap is the main reason data mining emerged. Data mining is a multidisciplinary field that draws on neural networks, genetic algorithms, regression, statistical analysis, machine learning, cluster analysis, and special group analysis. It develops algorithms and systems for mining large-scale, multi-dimensional data sets, and it develops appropriate privacy and security models to improve the usability of data systems.

Data mining differs from traditional statistics. Statistical inference is hypothesis-driven: it forms hypotheses and then tests them against data. Data mining is data-driven: it automatically extracts patterns and hypotheses from the data. The goal of data mining is to extract qualitative models that can easily be converted into logical rules or visual representations, which makes its results more human-oriented than those of traditional statistics.

Data Mining Technology Brief

There are many data mining techniques, and they can be classified in different ways depending on the criteria used. The following focuses on thirteen commonly used techniques: statistical techniques, association rules, history-based analysis, genetic algorithms, aggregation detection, connection analysis, decision trees, neural networks, rough sets, fuzzy sets, regression analysis, differential analysis, and concept description.

1. Statistical techniques

Data mining draws on many fields of science and technology, statistics among them. The main idea of statistical data mining is to assume a distribution or probability model (for example, a normal distribution) for a given data set and then mine the data with methods appropriate to that model.
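As one illustration of mining under an assumed model, the sketch below treats the data as roughly normally distributed and flags values that lie far from the mean. The function name `zscore_outliers`, the sample readings, and the threshold of 2.0 are all assumptions made for this example; they do not come from the article.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical sensor readings with one suspicious value.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]
outliers = zscore_outliers(readings, threshold=2.0)
print(outliers)
```

If the data does not follow the assumed distribution, the rule loses its meaning, which is why the choice of model comes first.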

2. Association rules

Data association is an important class of discoverable knowledge in a database. If the values of two or more variables show some regularity, the relationship is called an association. Associations can be divided into simple associations, temporal associations, and causal associations. The purpose of association analysis is to find the network of associations hidden in the database. The association function of the data is sometimes unknown, and even when it is known it is uncertain, so the rules produced by association analysis carry a confidence measure.
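The confidence attached to a rule is usually reported together with its support. The sketch below computes both for a rule of the form "antecedent implies consequent" over a small list of transactions. The shopping baskets and the function names are hypothetical, chosen only to illustrate the two measures.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # share of baskets with both items
print(confidence(baskets, {"bread"}, {"milk"}))  # how often milk accompanies bread
```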

3. History-based analysis: MBR (Memory-Based Reasoning)

Memory-based reasoning (MBR) first looks for similar past situations based on empirical knowledge and then applies the information from those situations to the current case. In practice, MBR first finds the neighbors most similar to a new record and then uses those neighbors to classify and estimate the new data. Using MBR raises three main problems: selecting the historical data to use; deciding the most efficient way to represent that data; and determining the distance function, the combination function, and the number of neighbors.
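The three choices named above (distance function, combination function, number of neighbors) map directly onto a nearest-neighbor classifier. The following is a minimal sketch, assuming Euclidean distance and a majority vote as the combination function; the records and labels are invented for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Distance function: straight-line distance between two numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mbr_classify(history, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest historical records."""
    neighbors = sorted(history, key=lambda rec: distance(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)  # combination function
    return votes.most_common(1)[0][0]

# Hypothetical historical records: (features, known label).
history = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.2), "high"),
    ((5.1, 4.9), "high"),
]

print(mbr_classify(history, (1.1, 1.0)))
print(mbr_classify(history, (5.0, 5.0)))
```

Swapping in a different `distance` or a weighted vote changes the behavior without touching the stored history, which is the main practical appeal of the approach.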

4. Genetic algorithms (GA)

Genetic algorithms are optimization techniques based on evolutionary theory that use genetic recombination, mutation, and natural selection. The main idea follows the principle of survival of the fittest: a new population is formed from the fittest rules in the current population together with the descendants of those rules. Typically, the fitness of a rule is measured by its classification accuracy on the training sample set.
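To make the loop of selection, recombination, and mutation concrete, here is a deliberately small sketch. It assumes each "rule" is a single numeric threshold (predict class 1 when x exceeds it) and uses training accuracy as the fitness. The averaging crossover, the Gaussian mutation, and all names and data are assumptions for illustration, not a prescribed design.

```python
import random

def accuracy(threshold, samples):
    """Fitness: fraction of samples the rule 'x > threshold => class 1' classifies correctly."""
    correct = sum(1 for x, label in samples if int(x > threshold) == label)
    return correct / len(samples)

def evolve(population, samples, generations=30, mutation_scale=0.5, rng=None):
    """Evolve threshold rules; expects a population of at least two candidates."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    population = list(population)
    for _ in range(generations):
        # Natural selection: the fitter half survives unchanged.
        population.sort(key=lambda t: accuracy(t, samples), reverse=True)
        survivors = population[: max(2, len(population) // 2)]
        children = []
        while len(survivors) + len(children) < len(population):
            mother, father = rng.sample(survivors, 2)
            child = (mother + father) / 2            # recombination
            child += rng.gauss(0.0, mutation_scale)  # mutation
            children.append(child)
        population = survivors + children
    return max(population, key=lambda t: accuracy(t, samples))

# Hypothetical training samples: (feature value, class label).
training = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
initial = [0.0, 10.0, 9.0, 0.5]
best = evolve(initial, training)
print(best, accuracy(best, training))
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next.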

5. Aggregation detection (clustering)

The process of grouping a collection of physical or abstract objects into classes of similar objects is called clustering. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. The degree of dissimilarity is computed from the attribute values that describe the objects, and distance is a frequently used measure.
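A distance-based grouping of this kind can be sketched with k-means, which is one common clustering method and is used here as an assumption, since the article does not name a specific algorithm. The points and starting centroids are invented, and `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move centroids to the cluster means."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Dissimilarity is measured as Euclidean distance.
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical two-dimensional objects and starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
clusters, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```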

6. Connection analysis (link analysis)

Link analysis is founded on graph theory. The guiding idea is to look for an algorithm that produces good but not perfect results, rather than one that finds perfect solutions; link analysis accepts imperfect results as long as they are workable. With link analysis, patterns can be derived from the behavior of some users, and the resulting concepts can then be applied to a wider user community.
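A very simple form of this idea is to turn observed interactions into a graph and rank nodes by how many distinct neighbors they have. The sketch below assumes undirected links and a hypothetical list of call records; degree counting is only one basic measure, chosen here for brevity.

```python
from collections import defaultdict

def build_graph(links):
    """Build an undirected graph (node -> set of neighbors) from pairs of linked nodes."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def most_connected(graph, top=1):
    """Return the `top` nodes with the most neighbors, ties broken alphabetically."""
    ranked = sorted(graph, key=lambda node: (-len(graph[node]), node))
    return ranked[:top]

# Hypothetical call records between users.
calls = [("ann", "bob"), ("ann", "cy"), ("ann", "dee"), ("bob", "cy")]
graph = build_graph(calls)
print(most_connected(graph, top=2))
```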

7. Decision trees

Decision trees provide a way to present rules of the form: under which conditions which value is obtained.
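The link between a tree and its rules can be shown by walking every path from the root to a leaf. In the sketch below the tree is written by hand as nested dictionaries. The attributes, values, and outcomes are hypothetical, and no tree-learning algorithm is involved.

```python
def tree_to_rules(node, conditions=()):
    """Walk a decision tree and return one (conditions, outcome) rule per leaf."""
    if not isinstance(node, dict):
        return [(list(conditions), node)]  # leaf: the outcome for this path
    rules = []
    for value, child in node["branches"].items():
        rules.extend(tree_to_rules(child, conditions + ((node["attribute"], value),)))
    return rules

# Hypothetical hand-written tree: inner nodes test an attribute, leaves hold an outcome.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "windy", "branches": {"yes": "stay in", "no": "play"}},
        "rainy": "stay in",
    },
}

rules = tree_to_rules(tree)
for conditions, outcome in rules:
    clause = " and ".join(f"{attr} = {val}" for attr, val in conditions)
    print(f"if {clause} then {outcome}")
```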
