What Is Data Mining?

Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. It uses sophisticated mathematical algorithms to segment the data and to estimate the probability of future events.
In other words, data mining is the process of extracting data, analyzing it from many dimensions or perspectives, and summarizing the results in a useful form that identifies relationships within the data.

There are two types of data mining:

Automated discovery of previously unknown patterns (Descriptive)
· Gives information about existing data.
How? Data mining tools sweep through databases and identify previously hidden patterns in one step.

Automated prediction of trends and behaviors (Predictive)
· Makes predictions based on the data.
How? Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data.
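
The difference between the two types can be illustrated with a minimal sketch. The data, the function names (most_common_pair, fit_line), and the choice of techniques (pair counting and a least-squares line) are assumptions made for illustration only; real data mining tools use far more elaborate methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]

def most_common_pair(baskets):
    """Descriptive: find the item pair that appears together most often."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        counts.update(combinations(sorted(basket), 2))
    return counts.most_common(1)[0]

def fit_line(xs, ys):
    """Predictive: least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Descriptive result: a pattern that already exists in the data.
pair, count = most_common_pair(transactions)

# Predictive result: an estimate for a month that has not happened yet.
months = [1, 2, 3, 4]          # hypothetical month numbers
sales = [100, 110, 120, 130]   # hypothetical monthly sales
slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept

print(pair, count)
print(forecast)
```

The first result describes what is already in the baskets, while the second uses past values to estimate a future one.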


Databases can be larger in both depth and breadth:

More columns - Analysts must often limit the number of variables they examine in hands-on analysis because of practical constraints. Yet variables that are discarded because they seem unimportant may carry information about unknown patterns.

More rows - Larger samples yield lower estimation errors and variance, and allow users to make inferences about small but important segments of a population.
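
The effect of more rows can be sketched with the standard error of the mean, which is the sample standard deviation divided by the square root of the sample size. The population below is simulated, and its parameters (mean 50, standard deviation 10) and the sample sizes are assumptions chosen only for illustration.

```python
import math
import random
import statistics

def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Simulated population; the seed makes the run repeatable.
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(100_000)]

small_sample = rng.sample(population, 100)      # few rows
large_sample = rng.sample(population, 10_000)   # many rows

se_small = standard_error(small_sample)
se_large = standard_error(large_sample)

print(f"standard error with 100 rows:    {se_small:.3f}")
print(f"standard error with 10,000 rows: {se_large:.3f}")
```

Because the error shrinks with the square root of the row count, a sample 100 times larger gives an estimate roughly 10 times more precise.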



Data Mining System Classification

A data mining system can be classified according to the following criteria:

· Database Technology

- The core technology, with links to information management and processing.
(Modern retailers use advanced data mining techniques to determine trends in sales and consumer preference to optimize stock control, retail performance, customer convenience and profit.)


· Machine Learning

- A type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed.
(Machine learning focuses on the development of computer programs that can change when exposed to new data.)


· Information Science

- The study of processes for storing and retrieving information, especially scientific or technical information.
(Concerned with the analysis, collection, classification, manipulation, storage, retrieval, movement, dissemination, and protection of information)


· Visualization

- A general term for any effort to help people understand the significance of data by placing it in a visual context.

Based on the Databases Mined
We can classify a data mining system according to the kind of databases mined. Database systems can themselves be classified according to different criteria, such as data models or types of data.






Based on the Kind of Knowledge Mined

We can classify a data mining system according to the kind of knowledge mined, such as:
· Characterization

· Discrimination

· Association and Correlation Analysis

- Association refers to any general relationship between two random variables, while correlation refers to a more or less linear relationship between them.


· Classification
- Systematic arrangement in groups or categories according to established criteria.
(The act of forming into a class or classes; a distribution into groups, such as classes or orders.)


· Prediction

· Outlier Analysis
- Outliers are data objects that do not comply with the general behavior or model of the data. Such objects differ markedly from, or are inconsistent with, the rest of the data set.


· Evolution Analysis
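
Two of the knowledge types listed above, correlation analysis and outlier analysis, can be sketched in a few lines. The data and the function names (pearson, iqr_outliers) are hypothetical. The 1.5 x interquartile range rule used to flag outliers is a common convention assumed here for illustration; it is only one of many possible outlier tests.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient; it measures linear relationships only."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def iqr_outliers(values, k=1.5):
    """Flag values lying more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # linear relationship with xs
squared = [x * x for x in xs]      # fully determined by xs, but not linear

r_linear = pearson(xs, linear)     # close to 1: strong correlation
r_squared = pearson(xs, squared)   # close to 0: associated, yet uncorrelated

readings = [10, 12, 11, 13, 12, 11, 10, 12, 95]  # hypothetical measurements
flagged = iqr_outliers(readings)

print(r_linear, r_squared)
print(flagged)
```

The squared series shows why association is the broader term: the second variable depends entirely on the first, yet the linear correlation is essentially zero.
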

Based on the Applications Adapted
We can classify a data mining system according to the applications adapted. These applications are as follows:
· Finance
· Telecommunications
· DNA
· Stock Markets
· E-mail

Advantages and Disadvantages of Data Mining
Data mining brings many benefits to businesses, society, governments, and individuals. However, privacy, security, and misuse of information become serious problems if they are not addressed and resolved properly.


Researchers:


Dan Robert Pagaduan
Brille Anne Reyes
Charlene Faye Napat
John Seve Insigne
