R Language Study Notes: Outlier Detection

star2017 · 1 year ago · 10154 views
Outlier Detection
This page shows an example of outlier detection with the LOF (Local Outlier Factor) algorithm.
The LOF algorithm
LOF (Local Outlier Factor) is an algorithm for identifying density-based local outliers [Breunig et al., 2000]. With LOF, the local density of a point is compared with that of its neighbors. If the former is significantly lower than the latter (giving an LOF value greater than one), the point lies in a sparser region than its neighbors, which suggests that it is an outlier.
Function lofactor(data, k) in packages DMwR and dprep calculates local outlier factors using the LOF algorithm, where k is the number of neighbors used in the calculation of the local outlier factors.
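To make the density comparison concrete, here is a minimal base-R sketch of the LOF computation as defined by Breunig et al. It is not the DMwR or dprep implementation: the function name simple_lof and the toy data set are assumptions made for illustration, and the code computes a full distance matrix, so it only suits small data sets without many duplicate points.

```r
# Hypothetical helper (not part of DMwR or dprep): plain base-R LOF.
simple_lof <- function(data, k) {
  d <- as.matrix(dist(data))   # pairwise Euclidean distances
  n <- nrow(d)
  diag(d) <- Inf               # a point is not its own neighbor
  # k-distance: distance from each point to its k-th nearest neighbor
  kdist <- apply(d, 1, function(row) sort(row)[k])
  # k-neighborhood: every point within the k-distance (ties included)
  nbrs <- lapply(seq_len(n), function(i) which(d[i, ] <= kdist[i]))
  # local reachability density: inverse of the mean reachability distance,
  # where reach-dist(p, o) = max(k-distance(o), d(p, o))
  lrd <- sapply(seq_len(n), function(i) {
    o <- nbrs[[i]]
    1 / mean(pmax(kdist[o], d[i, o]))
  })
  # LOF: mean density of the neighbors relative to the point's own density
  sapply(seq_len(n), function(i) mean(lrd[nbrs[[i]]]) / lrd[i])
}

# Toy data (made up): a 4 x 4 grid plus one isolated point in row 17
pts <- rbind(expand.grid(x = 1:4, y = 1:4), c(20, 20))
scores <- simple_lof(pts, k = 3)
which.max(scores)   # the isolated point, row 17
```

Points inside the grid get scores close to one, while the isolated point gets a score far above one because its neighbors are much denser than it is.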
Calculate Outlier Scores
> library(DMwR)
> # remove "Species", which is a categorical column
> iris2 <- iris[,1:4]
> outlier.scores <- lofactor(iris2, k=5)
> plot(density(outlier.scores))


> # pick top 5 as outliers
> outliers <- order(outlier.scores, decreasing=TRUE)[1:5]
> # which observations are outliers
> print(outliers)
[1] 42 107 23 110 63
Visualize Outliers with Plots
Next, we show outliers with a biplot of the first two principal components.
> n <- nrow(iris2)
> labels <- 1:n
> labels[-outliers] <- "."
> biplot(prcomp(iris2), cex=.8, xlabs=labels)
We can also show outliers with a pairs plot as below, where outliers are labeled with "+" in red.
> pch <- rep(".", n)
> pch[outliers] <- "+"
> col <- rep("black", n)
> col[outliers] <- "red"
> pairs(iris2, pch=pch, col=col)


Parallel Computation of LOF Scores
Package Rlof provides the function lof(), a parallel implementation of the LOF algorithm. Its usage is similar to that of lofactor() above, but lof() adds two features: it accepts multiple values of k and offers several choices of distance metric. Below is an example of lof().
> library(Rlof)
> outlier.scores <- lof(iris2, k=5)
> # try with different numbers of neighbors (k = 5, 6, 7, 8, 9 and 10)
> outlier.scores <- lof(iris2, k=5:10)
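
When several values of k are used, each observation has one score per k, and these need to be combined before ranking. The sketch below assumes the scores are arranged as a matrix with one row per observation and one column per k; the matrix score.matrix and its values are made up for illustration and are not output from lof(). Taking the maximum over k follows the heuristic suggested by Breunig et al.; other summaries such as the mean are equally possible.

```r
# Assumption: score.matrix stands in for LOF scores computed with k = 5, 6, 7.
# The values are invented for illustration only.
score.matrix <- cbind("5" = c(1.0, 1.1, 2.4, 0.9),
                      "6" = c(1.1, 1.0, 2.1, 1.0),
                      "7" = c(0.9, 1.2, 2.6, 1.0))

# combine the scores across k by taking the per-observation maximum
max.scores <- apply(score.matrix, 1, max)

# rank observations by the combined score and keep the top 2
top <- order(max.scores, decreasing = TRUE)[1:2]
print(top)
```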

Original article by xsmile. If you repost it, please credit the source: http://www.17bigdata.com/r%e8%af%ad%e8%a8%80%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%e4%b9%8boutlier-detection/
