*Rani of Hearts-Reena Kapoor Paristan#55*The Cave Woman

Masooma Bukhari

@MasoomaBukhari

Achiever

+ 4

Posted: 5 years ago

#921

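Advantages of DBSCAN:

Separates high-density clusters from the low-density regions around them.
Handles outliers well: points in low-density regions are labeled as noise instead of being forced into a cluster.

Disadvantages of DBSCAN:

Does not work well when clusters have widely varying densities, because one neighborhood radius and one minimum point count are applied to the whole dataset.
Struggles with high-dimensional data, where distances between points become less informative.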
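To make the density idea concrete, here is a minimal pure-Python sketch of DBSCAN. The names `dbscan`, `eps`, and `min_pts` and the sample points are my own assumptions for illustration, not taken from any particular library.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be relabeled as a border point
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:     # j is also a core point, keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these values the two squares get separate cluster ids and the isolated point at (5, 5) is labeled -1 (noise). Because `eps` and `min_pts` are global, a much sparser third cluster would need different settings, which is the varying-density weakness noted above.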

Masooma Bukhari

Posted: 5 years ago

#922

A data object is a global outlier if it deviates significantly from the rest of the data set.

A data object is a contextual outlier if it deviates significantly with respect to a specific context of the object. Contextual outliers are also known as conditional outliers because they are conditional on the selected context. “The temperature today is 28°C. Is it exceptional (i.e., an outlier)?” It depends, for example, on the time and location!

An object in a data set is a local outlier if its density deviates significantly from that of the local area in which it occurs.

A subset of data objects forms a collective outlier if the objects as a whole deviate significantly from the entire data set.
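The 28°C question can be sketched in code. This is only an illustration: the readings are made up, the season grouping is my assumed "context", and a z-score with a cutoff of 2 is just one simple way to measure deviation.

```python
from statistics import mean, pstdev

def zscore(value, sample):
    """How many standard deviations `value` lies from the mean of `sample`."""
    return (value - mean(sample)) / pstdev(sample)

# Hypothetical daily highs in °C, grouped by season (the context).
readings = {
    "summer": [27, 29, 28, 30, 26, 28],
    "winter": [2, 4, 3, 1, 5, 28],   # the 28 here is the suspicious reading
}

all_values = [t for temps in readings.values() for t in temps]

global_z = zscore(28, all_values)          # compared with every reading
summer_z = zscore(28, readings["summer"])  # compared within the summer context
winter_z = zscore(28, readings["winter"])  # compared within the winter context

THRESHOLD = 2  # assumed cutoff for "deviates significantly"
is_global_outlier = abs(global_z) > THRESHOLD
is_winter_outlier = abs(winter_z) > THRESHOLD
```

Here 28°C is unremarkable against all readings and perfectly normal for summer, but it stands out once the context is restricted to winter, so it is a contextual outlier and not a global one.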

Edited by MasoomaBukhari - 5 years ago

Masooma Bukhari

Posted: 5 years ago

#923

Proximity-based methods assume that an object is an outlier if the nearest neighbors of the object are far away in feature space, that is, the proximity of the object to its neighbors significantly deviates from the proximity of most of the other objects to their neighbors in the same data set.
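One simple way to turn this into a score is the distance to the k-th nearest neighbor. The sketch below assumes that scoring rule; `knn_distance_scores`, `k`, and the points are hypothetical names and values.

```python
import math

def knn_distance_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical data: a tight group plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = knn_distance_scores(points, k=2)

# The point with the largest score is the strongest outlier candidate.
outlier_index = max(range(len(points)), key=scores.__getitem__)
```

The four grouped points all have a score of 1.0, while (8, 8) has a score above 10, so its proximity to its neighbors deviates clearly from everyone else's.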

Masooma Bukhari

Posted: 5 years ago

#924

Clustering-based methods assume that the normal data objects belong to large and dense clusters, whereas outliers belong to small or sparse clusters, or do not belong to any cluster.
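A rough sketch of that assumption: cluster the data, then flag every point whose cluster is too small. The clustering used here (linking points that lie within a radius of each other) is a stand-in I picked for brevity, and `radius` and `min_size` are assumed parameters.

```python
import math

def threshold_clusters(points, radius):
    """Group points into connected components; points within `radius` are linked."""
    labels = [None] * len(points)
    cluster = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j, q in enumerate(points):
                if labels[j] is None and math.dist(points[i], q) <= radius:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels

def small_cluster_outliers(points, radius, min_size):
    """Return indices of points whose cluster has fewer than `min_size` members."""
    labels = threshold_clusters(points, radius)
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    return [i for i, lab in enumerate(labels) if sizes[lab] < min_size]

# Hypothetical data: two clusters of four, one lone point, one pair.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5), (20, 0), (20, 1)]
outliers = small_cluster_outliers(points, radius=1.5, min_size=3)
```

The lone point and the pair are flagged (indices 8, 9, 10), while the two groups of four count as normal.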

Masooma Bukhari

Posted: 5 years ago

#925

Because back before the midterms we started making memes in the middle of making notes 😡 so now we have to make the notes.

Masooma Bukhari

Posted: 5 years ago

#926

Principal components analysis (PCA) searches for k n-dimensional orthogonal vectors that can best be used to represent the data, where k ≤ n. The original data are thus projected onto a much smaller space, resulting in dimensionality reduction. PCA “combines” the essence of attributes by creating an alternative, smaller set of variables. The initial data can then be projected onto this smaller set.
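A small sketch for the case n = 2, k = 1. It finds only the first principal component, using power iteration on the covariance matrix; that method and the function names are my own choices for a dependency-free example, and the data is made up. Power iteration can fail if the starting vector happens to be orthogonal to the answer, so treat this as a teaching aid only.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (mean, unit vector) for the direction of greatest variance."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[c] for r in rows) / n for c in range(d)]
    centered = [[r[c] - mean[c] for c in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(rows, mean, v):
    """Project each centered row onto v, giving one number per row."""
    return [sum((r[c] - mean[c]) * v[c] for c in range(len(v))) for r in rows]

# Hypothetical 2-D data lying on the line y = 2x.
rows = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
mean, v = first_principal_component(rows)
projected = project(rows, mean, v)   # 2 attributes reduced to 1
```

Since the points lie on a line, the single component points along (1, 2) and the one-dimensional projection loses nothing. With real, noisy data some variance is left behind in the discarded components.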

Masooma Bukhari

Posted: 5 years ago

#927

Strategies for data transformation include the following:

1. Smoothing, which works to remove noise from the data. Techniques include binning, regression, and clustering.

2. Attribute construction (or feature construction), where new attributes are constructed and added from the given set of attributes to help the mining process.

3. Aggregation, where summary or aggregation operations are applied to the data. For example, the daily sales data may be aggregated so as to compute monthly and annual total amounts.

4. Normalization, where the attribute data are scaled so as to fall within a smaller range, such as −1.0 to 1.0, or 0.0 to 1.0.

5. Discretization, where the raw values of a numeric attribute (e.g., age) are replaced by interval labels (e.g., 0–10, 11–20, etc.) or conceptual labels (e.g., youth, adult, senior).

6. Concept hierarchy generation for nominal data, where attributes such as street can be generalized to higher-level concepts, like city or country.
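Three of the strategies above (smoothing by binning, aggregation, and discretization) can be sketched in a few lines. All data values, the bin size, and the age cut points are assumptions made up for illustration.

```python
from collections import defaultdict

def smooth_by_bin_means(values, bin_size):
    """Smoothing: sort, split into equal-size bins, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([sum(bin_values) / len(bin_values)] * len(bin_values))
    return smoothed

def discretize_age(age):
    """Discretization: map a numeric age to a conceptual label (assumed cut points)."""
    if age < 20:
        return "youth"
    if age < 60:
        return "adult"
    return "senior"

# Smoothing: nine hypothetical prices in bins of three.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)

# Aggregation: hypothetical daily sales rolled up to monthly totals.
daily_sales = [("2024-01-03", 120.0), ("2024-01-17", 80.0), ("2024-02-02", 50.0)]
monthly_totals = defaultdict(float)
for date, amount in daily_sales:
    monthly_totals[date[:7]] += amount   # "YYYY-MM" is the month key

# Discretization: ages replaced by labels.
age_labels = [discretize_age(a) for a in (15, 35, 70)]
```

Bin means replace the nine prices with just three distinct values (9, 22, 29), the sales collapse to one total per month, and the ages become youth, adult, senior.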

Masooma Bukhari

Posted: 5 years ago

#928

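Min-max normalization formula:

v' = ( (v - min) / (max - min) ) * (new max - new min) + new min

Here v is the original value, min and max are the smallest and largest values of the attribute, and new min and new max are the ends of the target range.

See p. 151 of the PDF.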
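The formula translates directly into code. A minimal sketch, where `min_max_normalize` and the sample values are my own names and numbers; the guard for the all-equal case is an addition, since the formula would divide by zero there.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; min-max normalization is undefined")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

# Hypothetical incomes scaled to the range 0.0 to 1.0.
incomes = [12000, 73600, 98000]
scaled = min_max_normalize(incomes)

# The same function with the range -1.0 to 1.0.
centered = min_max_normalize([0, 5, 10], new_min=-1.0, new_max=1.0)
```

The smallest income maps to 0.0, the largest to 1.0, and 73600 lands at about 0.716. A new value outside the original min and max would fall outside the target range, so the observed bounds matter.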

Masooma Bukhari

Posted: 5 years ago

#929

The constraints can include the following:

Knowledge type constraints: These specify the type of knowledge to be mined, such as association, correlation, classification, or clustering.
Data constraints: These specify the set of task-relevant data.
Dimension/level constraints: These specify the desired dimensions (or attributes) of the data, the abstraction levels, or the level of the concept hierarchies to be used in mining.
Interestingness constraints: These specify thresholds on statistical measures of rule interestingness such as support, confidence, and correlation.
Rule constraints: These specify the form of, or conditions on, the rules to be mined.
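As an illustration of interestingness constraints, the sketch below filters candidate association rules by minimum support and confidence. The transactions, the candidate rules, and the two thresholds are all made up; real miners generate the candidates themselves.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

# Interestingness constraints (assumed thresholds).
MIN_SUPPORT = 0.5
MIN_CONFIDENCE = 0.6

# Candidate rules written as (antecedent, consequent).
candidates = [
    ({"bread"}, {"milk"}),
    ({"butter"}, {"milk"}),
    ({"milk"}, {"bread"}),
]

kept = [
    (a, c) for a, c in candidates
    if support(a | c) >= MIN_SUPPORT and confidence(a, c) >= MIN_CONFIDENCE
]
```

The rule butter ⇒ milk is dropped because only one of four transactions contains both items, which is below the support threshold; the other two rules pass both checks.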