This tutorial gives an overview of the main classification algorithms, using a multiclass classification problem as the running example. The models are built on the olive dataset from the dslabs package:
library(dslabs)
data(olive)
library(DT)
datatable(olive,options = list(scrollX = TRUE,dom = 'ltip',ordering=F,pageLength = 5))
Structure of the dataset:
str(olive)
## 'data.frame': 572 obs. of 10 variables:
## $ region : Factor w/ 3 levels "Northern Italy",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ area : Factor w/ 9 levels "Calabria","Coast-Sardinia",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ palmitic : num 10.75 10.88 9.11 9.66 10.51 ...
## $ palmitoleic: num 0.75 0.73 0.54 0.57 0.67 0.49 0.66 0.61 0.6 0.55 ...
## $ stearic : num 2.26 2.24 2.46 2.4 2.59 2.68 2.64 2.35 2.39 2.13 ...
## $ oleic : num 78.2 77.1 81.1 79.5 77.7 ...
## $ linoleic : num 6.72 7.81 5.49 6.19 6.72 6.78 6.18 7.34 7.09 6.33 ...
## $ linolenic : num 0.36 0.31 0.31 0.5 0.5 0.51 0.49 0.39 0.46 0.26 ...
## $ arachidic : num 0.6 0.61 0.63 0.78 0.8 0.7 0.56 0.64 0.83 0.52 ...
## $ eicosenoic : num 0.29 0.29 0.29 0.35 0.46 0.44 0.29 0.35 0.33 0.3 ...
Task: build a model that determines the area of origin of an olive oil (variable area) from its fatty acid content.
We remove the variable region (the first column) from the dataset. It gives a direct hint about the area and would interfere with a classification based on fatty acid content:
olive <- olive[,-1]
Statistical summary of the dataset:
summary(olive)
## area palmitic palmitoleic stearic
## South-Apulia :206 Min. : 6.10 Min. :0.1500 Min. :1.520
## Inland-Sardinia: 65 1st Qu.:10.95 1st Qu.:0.8775 1st Qu.:2.050
## Calabria : 56 Median :12.01 Median :1.1000 Median :2.230
## Umbria : 51 Mean :12.32 Mean :1.2609 Mean :2.289
## East-Liguria : 50 3rd Qu.:13.60 3rd Qu.:1.6925 3rd Qu.:2.490
## West-Liguria : 50 Max. :17.53 Max. :2.8000 Max. :3.750
## (Other) : 94
## oleic linoleic linolenic arachidic
## Min. :63.00 Min. : 4.480 Min. :0.0000 Min. :0.000
## 1st Qu.:70.00 1st Qu.: 7.707 1st Qu.:0.2600 1st Qu.:0.500
## Median :73.03 Median :10.300 Median :0.3300 Median :0.610
## Mean :73.12 Mean : 9.805 Mean :0.3189 Mean :0.581
## 3rd Qu.:76.80 3rd Qu.:11.807 3rd Qu.:0.4025 3rd Qu.:0.700
## Max. :84.10 Max. :14.700 Max. :0.7400 Max. :1.050
##
## eicosenoic
## Min. :0.0100
## 1st Qu.:0.0200
## Median :0.1700
## Mean :0.1628
## 3rd Qu.:0.2800
## Max. :0.5800
##
The target variable area has imbalanced classes. South-Apulia is by far the largest class, with about 36% of the observations:
prop.table(table(olive$area))*100
##
## Calabria Coast-Sardinia East-Liguria Inland-Sardinia North-Apulia
## 9.790210 5.769231 8.741259 11.363636 4.370629
## Sicily South-Apulia Umbria West-Liguria
## 6.293706 36.013986 8.916084 8.741259
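In this situation accuracy is not a suitable measure of model quality, because a model that always predicts South-Apulia would already be about 36% accurate. It is better to use, e.g., Cohen's kappa or the F1 score averaged over the classes.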
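A small base-R sketch makes the point concrete. It scores a trivial "classifier" that always predicts South-Apulia, using the class counts implied by the summary() and prop.table() output above (572 oils). The variable names are our own, and caret's multiClassSummary may treat details such as never-predicted classes differently.

```r
# Class counts of area in the full dataset (572 oils), from the output above
counts <- c(Calabria = 56, Coast.Sardinia = 33, East.Liguria = 50,
            Inland.Sardinia = 65, North.Apulia = 25, Sicily = 36,
            South.Apulia = 206, Umbria = 51, West.Liguria = 50)
truth <- factor(rep(names(counts), counts), levels = names(counts))
# Trivial "model": always predict the majority class
pred <- factor(rep("South.Apulia", length(truth)), levels = names(counts))

cm <- table(pred, truth)                   # rows = predicted, columns = true class
n <- sum(cm)
accuracy <- sum(diag(cm)) / n
# Cohen's kappa: observed agreement corrected for agreement expected by chance
p_chance <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - p_chance) / (1 - p_chance)
# Macro-averaged F1: F1 per class, then the unweighted mean
precision <- diag(cm) / rowSums(cm)        # NaN for classes that are never predicted
recall <- diag(cm) / colSums(cm)
f1 <- 2 * precision * recall / (precision + recall)
f1[is.nan(f1)] <- 0                        # treat undefined F1 as 0
macro_f1 <- mean(f1)
round(c(accuracy = accuracy, kappa = cohen_kappa, macro_F1 = macro_f1), 3)
```

Accuracy comes out at about 0.36. Kappa is exactly 0, since the model is no better than chance, and macro F1 is about 0.06.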
We split the data into 80% training data and 20% test data:
RNGkind(sample.kind = "Rounding")
set.seed(123)
sam <- sample(1:nrow(olive),floor(nrow(olive)*0.2))
olive.train <- olive[-sam,]
olive.test <- olive[sam,]
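The split above samples rows at random. With imbalanced classes, the small areas (e.g. North-Apulia with 25 oils) can therefore end up somewhat over- or under-represented in the test set. A stratified split samples the same fraction within each class instead. caret's createDataPartition() does this, and the base-R sketch below illustrates the idea. The function name stratified_test_idx is hypothetical, and the class counts are taken from the output above.

```r
# Hypothetical helper: indices of a stratified test sample (same fraction per class)
stratified_test_idx <- function(y, frac = 0.2) {
  by_class <- split(seq_along(y), y)   # row indices grouped by class
  picked <- lapply(by_class, function(i) i[sample.int(length(i), floor(length(i) * frac))])
  sort(unlist(picked, use.names = FALSE))
}

# Class labels with the same counts as olive$area
counts <- c(Calabria = 56, Coast.Sardinia = 33, East.Liguria = 50,
            Inland.Sardinia = 65, North.Apulia = 25, Sicily = 36,
            South.Apulia = 206, Umbria = 51, West.Liguria = 50)
y <- factor(rep(names(counts), counts))
set.seed(123)
test_idx <- stratified_test_idx(y, frac = 0.2)
table(y[test_idx])                     # about 20% of every class
```

On the real data the call would be stratified_test_idx(olive$area), with the result used in place of sam.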
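When preparing the data, the factor levels of area must be renamed so that they are valid R names (e.g. a hyphen is not allowed in a name). caret requires this because, with classProbs = TRUE, the class levels are used as column names of the predicted class probabilities: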
levels(olive.train$area) <- make.names(levels(olive.train$area))
levels(olive.train$area)
## [1] "Calabria" "Coast.Sardinia" "East.Liguria" "Inland.Sardinia"
## [5] "North.Apulia" "Sicily" "South.Apulia" "Umbria"
## [9] "West.Liguria"
levels(olive.test$area) <- make.names(levels(olive.test$area))
levels(olive.test$area)
## [1] "Calabria" "Coast.Sardinia" "East.Liguria" "Inland.Sardinia"
## [5] "North.Apulia" "Sicily" "South.Apulia" "Umbria"
## [9] "West.Liguria"
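The effect of make.names() can be checked on a toy factor. Renaming the levels with levels<- only changes the labels, and every observation keeps its class. Because both sets are renamed in the same way, the renaming could also be done once on olive$area before the split.

```r
area <- factor(c("South-Apulia", "Inland-Sardinia", "South-Apulia", "Coast-Sardinia"))
make.names(levels(area))                  # hyphens become dots
levels(area) <- make.names(levels(area))
area                                      # same observations, syntactically valid labels
```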
In R we use a weighted k-nearest-neighbour classifier (package kknn), in which the nearest objects receive a larger weight in the classification. The weights are computed with a suitable distance kernel function. Because the method is based on distances, numeric features must be scaled and non-numeric features must be converted to dummy variables. See e.g. https://datasciencebook.ca/classification.html#classification-with-k-nearest-neighbors.
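The need for scaling is easy to see with two hypothetical oils whose values lie inside the ranges shown by summary(). They differ by 1 unit of oleic acid (range 63.00 to 84.10) and by 0.4 units of eicosenoic acid (range 0.01 to 0.58), which is most of that acid's range. This sketch divides by the ranges only because they appear in the summary output. The code below uses z-scores (mean and standard deviation), which has the same balancing effect.

```r
# Two hypothetical oils (illustrative values inside the ranges from summary())
a <- c(oleic = 72, eicosenoic = 0.05)
b <- c(oleic = 73, eicosenoic = 0.45)
ranges <- c(oleic = 84.10 - 63.00, eicosenoic = 0.58 - 0.01)

# Share of the squared Euclidean distance contributed by each fatty acid
share <- function(diff) diff^2 / sum(diff^2)
raw_share <- share(a - b)                 # oleic dominates the raw distance
scaled_share <- share((a - b) / ranges)   # eicosenoic dominates after scaling
round(rbind(raw = raw_share, scaled = scaled_share), 3)
```

Without scaling, the small-valued fatty acids would barely influence which neighbours are "nearest".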
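num.vars <- names(olive.train)[-1]                  # predictors (column 1 is area)
mu <- colMeans(olive.train[, num.vars])             # training-set means
s <- sapply(olive.train[, num.vars], sd)            # training-set standard deviations
olive.train.scale <- olive.train
olive.train.scale[, num.vars] <- scale(olive.train[, num.vars], center = mu, scale = s)
# the test set is scaled with the training parameters, so no test information leaks in
olive.test.scale <- olive.test
olive.test.scale[, num.vars] <- scale(olive.test[, num.vars], center = mu, scale = s)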
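The idea behind weighted kNN can be sketched in a few lines of base R:

- compute Minkowski distances to all training rows;
- take the k nearest;
- standardise their distances by the distance of the (k+1)-th neighbour;
- turn those distances into weights with a kernel;
- let the class with the largest total weight win.

This follows the idea of the kknn package, but it is a simplified sketch (one query point at a time, no special tie handling), not the package's code. The function weighted_knn, the kernels list and the toy data are hypothetical.

```r
# Kernel functions for standardised distances d in [0, 1)
kernels <- list(
  rectangular  = function(d) rep(0.5, length(d)),
  triangular   = function(d) 1 - d,
  epanechnikov = function(d) 0.75 * (1 - d^2),
  gaussian     = function(d) dnorm(d)
)

# Hypothetical helper: classify one query point x by weighted kNN
weighted_knn <- function(train_x, train_y, x, k = 5, p = 2, kernel = "triangular") {
  train_x <- as.matrix(train_x)
  # Minkowski distance of order p from x to every training row
  d <- rowSums(abs(sweep(train_x, 2, x))^p)^(1 / p)
  ord <- order(d)
  # Standardise the k nearest distances by the (k+1)-th one, keeping them below 1
  d_max <- max(d[ord[k + 1]], 1e-6)
  d_std <- pmin(d[ord[seq_len(k)]] / d_max, 1 - 1e-6)
  w <- kernels[[kernel]](d_std)
  # Sum the weights per class; the class with the largest total wins
  votes <- tapply(w, train_y[ord[seq_len(k)]], sum)
  names(votes)[which.max(votes)]
}

# Toy data: two well-separated groups (features assumed to be already scaled)
train_x <- rbind(c(0, 0), c(0.5, 0.2), c(0.1, 0.4), c(5, 5), c(5.2, 4.8), c(4.9, 5.3))
train_y <- factor(c("A", "A", "A", "B", "B", "B"))
weighted_knn(train_x, train_y, x = c(0.2, 0.1), k = 3)
weighted_knn(train_x, train_y, x = c(5, 5), k = 3, kernel = "epanechnikov")
```

On the scaled olive data the helper could be applied row by row, e.g. weighted_knn(olive.train.scale[, -1], olive.train.scale$area, unlist(olive.test.scale[1, -1]), k = 7, p = 3). kknn does the same work far more efficiently.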
We use the caret package to tune the kNN classifier. In trainControl we set summaryFunction = multiClassSummary, and in train we set the selection metric to Kappa. For all the other metrics, e.g. AUC, to be available, the MLmetrics package must be installed.
library(rpart)
library(caret)
library(MLmetrics)
library(doParallel)
Mycluster = makeCluster(detectCores()-1)
registerDoParallel(Mycluster)
system.time({
RNGkind(sample.kind = "Rounding")
set.seed(123)
# search = "random" has no effect here, because an explicit tuneGrid is passed to train()
train.control <- trainControl(method = "cv",number = 5 ,search = "random",classProbs = TRUE,summaryFunction = multiClassSummary)
tuneGrid <- expand.grid(kmax = 3:10,distance = 1:4,kernel = c('gaussian','triangular','rectangular','epanechnikov','optimal'))
RNGkind(sample.kind = "Rounding")
set.seed(123)
kknn_fit <- train(area~.,olive.train.scale, method = 'kknn',trControl = train.control,tuneGrid = tuneGrid,metric = "Kappa")
})
## user system elapsed
## 1.92 0.09 37.90
stopCluster(Mycluster)
registerDoSEQ()
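The grid above contains 8 × 4 × 5 = 160 combinations. With 5-fold cross-validation that means 800 model fits plus a final refit on the whole training set, which is why the parallel backend pays off. The distance parameter of kknn is the order p of the Minkowski distance: p = 1 is the Manhattan distance and p = 2 the Euclidean distance. As far as we can tell from caret's model definition, method = 'kknn' calls kknn::train.kknn(). If so, kmax is only an upper bound and the actual k ≤ kmax is chosen inside train.kknn, which would explain why many rows of the results table repeat for larger kmax. The sketch below checks the grid size and compares a hand-written Minkowski distance with base R's dist(). The helper minkowski is our own.

```r
# The tuning grid used with caret above
tuneGrid <- expand.grid(kmax = 3:10, distance = 1:4,
                        kernel = c("gaussian", "triangular", "rectangular",
                                   "epanechnikov", "optimal"))
n_candidates <- nrow(tuneGrid)   # 8 * 4 * 5 = 160
n_fits <- n_candidates * 5       # one fit per candidate and CV fold

# Minkowski distance of order p (the role of kknn's `distance` argument)
minkowski <- function(x, y, p) sum(abs(x - y)^p)^(1 / p)
x <- c(0, 0)
y <- c(3, 4)
sapply(1:4, function(p) minkowski(x, y, p))  # 7, 5, then shrinking towards max(|x - y|) = 4
# Cross-check against base R
as.numeric(dist(rbind(x, y), method = "minkowski", p = 3))
```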
plot(kknn_fit)
kknn_fit
## k-Nearest Neighbors
##
## 458 samples
## 8 predictor
## 9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 367, 365, 368, 367, 365
## Resampling results across tuning parameters:
##
## kmax distance kernel logLoss AUC prAUC Accuracy
## 3 1 gaussian 0.9363879 0.9782953 0.10015755 0.9475316
## 3 1 triangular 0.6505765 0.9847859 0.11325723 0.9586183
## 3 1 rectangular 1.2189864 0.9713236 0.05731363 0.9586183
## 3 1 epanechnikov 0.6503545 0.9847859 0.11325723 0.9586183
## 3 1 optimal 1.3533639 0.9684916 0.02913030 0.9608161
## 3 2 gaussian 0.6575607 0.9838142 0.13293189 0.9586656
## 3 2 triangular 0.9403771 0.9763739 0.08740472 0.9564433
## 3 2 rectangular 0.7272751 0.9830683 0.10828702 0.9563961
## 3 2 epanechnikov 0.8693190 0.9778247 0.09941942 0.9564433
## 3 2 optimal 1.5803031 0.9621353 0.03370578 0.9542455
## 3 3 gaussian 0.5843853 0.9844076 0.14321693 0.9630383
## 3 3 triangular 0.7304993 0.9799511 0.10616317 0.9608634
## 3 3 rectangular 0.5859785 0.9841660 0.13886124 0.9630383
## 3 3 epanechnikov 0.7305021 0.9799288 0.10592478 0.9608634
## 3 3 optimal 1.3533639 0.9695384 0.02813916 0.9608161
## 3 4 gaussian 0.9423186 0.9782121 0.12160467 0.9542928
## 3 4 triangular 0.7345656 0.9798879 0.11294677 0.9587128
## 3 4 rectangular 0.9418116 0.9776201 0.10176491 0.9565150
## 3 4 epanechnikov 0.7345731 0.9798879 0.11294677 0.9587128
## 3 4 optimal 1.5019178 0.9673001 0.02922410 0.9565150
## 4 1 gaussian 0.9389410 0.9782032 0.11361445 0.9519761
## 4 1 triangular 0.5135887 0.9876184 0.18076406 0.9563961
## 4 1 rectangular 1.1558969 0.9725699 0.09919010 0.9497294
## 4 1 epanechnikov 0.5139869 0.9875846 0.18048022 0.9563961
## 4 1 optimal 1.3533639 0.9684916 0.02913030 0.9608161
## 4 2 gaussian 0.6612447 0.9838255 0.15056988 0.9565150
## 4 2 triangular 0.5081438 0.9862277 0.17343080 0.9630383
## 4 2 rectangular 0.7272751 0.9830683 0.10828702 0.9563961
## 4 2 epanechnikov 0.5082729 0.9862390 0.19106879 0.9630383
## 4 2 optimal 1.5803031 0.9621353 0.03370578 0.9542455
## 4 3 gaussian 0.5867765 0.9844188 0.15085492 0.9608878
## 4 3 triangular 0.5827790 0.9841692 0.19010161 0.9653078
## 4 3 rectangular 0.5859785 0.9841660 0.13886124 0.9630383
## 4 3 epanechnikov 0.5796553 0.9841954 0.19794123 0.9674583
## 4 3 optimal 1.3533639 0.9695384 0.02813916 0.9608161
## 4 4 gaussian 0.5948099 0.9844582 0.19170660 0.9608161
## 4 4 triangular 0.5906185 0.9840986 0.18647639 0.9631573
## 4 4 rectangular 0.9451291 0.9776051 0.11238271 0.9565150
## 4 4 epanechnikov 0.5888310 0.9842318 0.18797144 0.9653078
## 4 4 optimal 1.5019178 0.9673001 0.02922410 0.9565150
## 5 1 gaussian 0.8701471 0.9785663 0.13907515 0.9519761
## 5 1 triangular 0.4455340 0.9880904 0.19358592 0.9541983
## 5 1 rectangular 1.1558969 0.9725699 0.09919010 0.9497294
## 5 1 epanechnikov 0.4461783 0.9880565 0.19330208 0.9541983
## 5 1 optimal 1.0790528 0.9732963 0.11419861 0.9586183
## 5 2 gaussian 0.6612447 0.9838255 0.15056988 0.9565150
## 5 2 triangular 0.5069134 0.9862390 0.21563108 0.9630383
## 5 2 rectangular 0.7272751 0.9830683 0.10828702 0.9563961
## 5 2 epanechnikov 0.5098013 0.9862390 0.21765128 0.9630383
## 5 2 optimal 0.5168291 0.9859048 0.22988617 0.9563961
## 5 3 gaussian 0.5867765 0.9844188 0.15085492 0.9608878
## 5 3 triangular 0.5789736 0.9842243 0.20153820 0.9674583
## 5 3 rectangular 0.5859785 0.9841660 0.13886124 0.9630383
## 5 3 epanechnikov 0.5801737 0.9841954 0.20139241 0.9674583
## 5 3 optimal 0.5178505 0.9857641 0.22878031 0.9652361
## 5 4 gaussian 0.5948099 0.9844582 0.19170660 0.9608161
## 5 4 triangular 0.5858816 0.9839292 0.21076386 0.9631573
## 5 4 rectangular 0.9451291 0.9776051 0.11238271 0.9565150
## 5 4 epanechnikov 0.5855379 0.9840986 0.21476334 0.9631573
## 5 4 optimal 0.5917624 0.9835343 0.23746601 0.9608634
## 6 1 gaussian 0.8688396 0.9786503 0.14898271 0.9541983
## 6 1 triangular 0.3794342 0.9904515 0.20124160 0.9541983
## 6 1 rectangular 1.1558969 0.9725699 0.09919010 0.9497294
## 6 1 epanechnikov 0.3804492 0.9904176 0.20790220 0.9541983
## 6 1 optimal 1.0790528 0.9732963 0.11419861 0.9586183
## 6 2 gaussian 0.6612447 0.9838255 0.15056988 0.9565150
## 6 2 triangular 0.5100586 0.9863111 0.23336470 0.9651889
## 6 2 rectangular 0.7272751 0.9830683 0.10828702 0.9563961
## 6 2 epanechnikov 0.5107600 0.9863111 0.22296788 0.9630383
## 6 2 optimal 0.5168291 0.9859048 0.22988617 0.9563961
## 6 3 gaussian 0.5867765 0.9844188 0.15085492 0.9608878
## 6 3 triangular 0.5789736 0.9842243 0.20153820 0.9674583
## 6 3 rectangular 0.5859785 0.9841660 0.13886124 0.9630383
## 6 3 epanechnikov 0.5801737 0.9841954 0.20139241 0.9674583
## 6 3 optimal 0.5178505 0.9857641 0.22878031 0.9652361
## 6 4 gaussian 0.6012326 0.9840625 0.21635074 0.9586183
## 6 4 triangular 0.5227002 0.9853670 0.25789421 0.9631573
## 6 4 rectangular 0.9451291 0.9776051 0.11238271 0.9565150
## 6 4 epanechnikov 0.5909420 0.9837029 0.24662971 0.9610067
## 6 4 optimal 0.5924129 0.9835782 0.24503510 0.9608634
## 7 1 gaussian 0.7386400 0.9808136 0.20094877 0.9520477
## 7 1 triangular 0.3822916 0.9904627 0.23205597 0.9541983
## 7 1 rectangular 1.1558969 0.9725699 0.09919010 0.9497294
## 7 1 epanechnikov 0.3835813 0.9903962 0.24283309 0.9541983
## 7 1 optimal 0.8743831 0.9778749 0.17900575 0.9564678
## 7 2 gaussian 0.6612447 0.9838255 0.15056988 0.9565150
## 7 2 triangular 0.5112485 0.9862390 0.23261160 0.9651889
## 7 2 rectangular 0.7272751 0.9830683 0.10828702 0.9563961
## 7 2 epanechnikov 0.5107600 0.9863111 0.22296788 0.9630383
## 7 2 optimal 0.5168291 0.9859048 0.22988617 0.9563961
## 7 3 gaussian 0.5867765 0.9844188 0.15085492 0.9608878
## 7 3 triangular 0.5870414 0.9839393 0.23606049 0.9652361
## 7 3 rectangular 0.5859785 0.9841660 0.13886124 0.9630383
## 7 3 epanechnikov 0.5801737 0.9841954 0.20139241 0.9674583
## 7 3 optimal 0.5224372 0.9855781 0.24650678 0.9630383
## 7 4 gaussian 0.6012326 0.9840625 0.21635074 0.9586183
## 7 4 triangular 0.5244481 0.9853300 0.27060035 0.9631573
## 7 4 rectangular 0.9451291 0.9776051 0.11238271 0.9565150
## 7 4 epanechnikov 0.5239043 0.9853344 0.26596567 0.9610067
## 7 4 optimal 0.5963256 0.9834608 0.26368037 0.9586656
## 8 1 gaussian 0.7386400 0.9808136 0.20094877 0.9520477
## 8 1 triangular 0.3833383 0.9904627 0.23483375 0.9541983
## 8 1 rectangular 1.1558969 0.9725699 0.09919010 0.9497294
## 8 1 epanechnikov 0.3847494 0.9903962 0.24561087 0.9541983
## 8 1 optimal 0.8734958 0.9778749 0.17900575 0.9564678
## 8 2 gaussian 0.6612447 0.9838255 0.15056988 0.9565150
## 8 2 triangular 0.3758316 0.9904248 0.30089724 0.9673867
## 8 2 rectangular 0.7272751 0.9830683 0.10828702 0.9563961
## 8 2 epanechnikov 0.4447955 0.9879179 0.25954280 0.9652361
## 8 2 optimal 0.5168291 0.9859048 0.22988617 0.9563961
## 8 3 gaussian 0.5867765 0.9844188 0.15085492 0.9608878
## 8 3 triangular 0.5884754 0.9838522 0.24366034 0.9630383
## 8 3 rectangular 0.5859785 0.9841660 0.13886124 0.9630383
## 8 3 epanechnikov 0.5850546 0.9838233 0.23617328 0.9652605
## 8 3 optimal 0.5237800 0.9854909 0.25410664 0.9630383
## 8 4 gaussian 0.6012326 0.9840625 0.21635074 0.9586183
## 8 4 triangular 0.5244978 0.9853351 0.27932611 0.9631573
## 8 4 rectangular 0.9451291 0.9776051 0.11238271 0.9565150
## 8 4 epanechnikov 0.5239043 0.9853344 0.26596567 0.9610067
## 8 4 optimal 0.6004049 0.9833912 0.27925820 0.9564433
## 9 1 gaussian 0.7386400 0.9808136 0.20094877 0.9520477
## 9 1 triangular 0.3833383 0.9904627 0.23483375 0.9541983
## 9 1 rectangular 1.1558969 0.9725699 0.09919010 0.9497294
## 9 1 epanechnikov 0.3847494 0.9903962 0.24561087 0.9541983
## 9 1 optimal 0.8734958 0.9778749 0.17900575 0.9564678
## 9 2 gaussian 0.6612447 0.9838255 0.15056988 0.9565150
## 9 2 triangular 0.3758316 0.9904248 0.30089724 0.9673867
## 9 2 rectangular 0.7272751 0.9830683 0.10828702 0.9563961
## 9 2 epanechnikov 0.4447955 0.9879179 0.25954280 0.9652361
## 9 2 optimal 0.4511456 0.9875670 0.25310791 0.9563961
## 9 3 gaussian 0.5867765 0.9844188 0.15085492 0.9608878
## 9 3 triangular 0.5884754 0.9838522 0.24366034 0.9630383
## 9 3 rectangular 0.5859785 0.9841660 0.13886124 0.9630383
## 9 3 epanechnikov 0.5850546 0.9838233 0.23617328 0.9652605
## 9 3 optimal 0.4597401 0.9871078 0.25578902 0.9630383
## 9 4 gaussian 0.6012326 0.9840625 0.21635074 0.9586183
## 9 4 triangular 0.5244978 0.9853351 0.27932611 0.9631573
## 9 4 rectangular 0.9451291 0.9776051 0.11238271 0.9565150
## 9 4 epanechnikov 0.5239043 0.9853344 0.26596567 0.9610067
## 9 4 optimal 0.6004049 0.9833912 0.27925820 0.9564433
## 10 1 gaussian 0.7386400 0.9808136 0.20094877 0.9520477
## 10 1 triangular 0.3860667 0.9904301 0.24172804 0.9541983
## 10 1 rectangular 1.1558969 0.9725699 0.09919010 0.9497294
## 10 1 epanechnikov 0.3847494 0.9903962 0.24561087 0.9541983
## 10 1 optimal 0.8734958 0.9778749 0.17900575 0.9564678
## 10 2 gaussian 0.6612447 0.9838255 0.15056988 0.9565150
## 10 2 triangular 0.3780721 0.9903594 0.30319863 0.9673867
## 10 2 rectangular 0.7272751 0.9830683 0.10828702 0.9563961
## 10 2 epanechnikov 0.4447955 0.9879179 0.25954280 0.9652361
## 10 2 optimal 0.4511456 0.9875670 0.25310791 0.9563961
## 10 3 gaussian 0.5867765 0.9844188 0.15085492 0.9608878
## 10 3 triangular 0.5884754 0.9838522 0.24366034 0.9630383
## 10 3 rectangular 0.5859785 0.9841660 0.13886124 0.9630383
## 10 3 epanechnikov 0.5850546 0.9838233 0.23617328 0.9652605
## 10 3 optimal 0.4597401 0.9871078 0.25578902 0.9630383
## 10 4 gaussian 0.6012326 0.9840625 0.21635074 0.9586183
## 10 4 triangular 0.5244978 0.9853351 0.27932611 0.9631573
## 10 4 rectangular 0.9451291 0.9776051 0.11238271 0.9565150
## 10 4 epanechnikov 0.5239043 0.9853344 0.26596567 0.9610067
## 10 4 optimal 0.6004049 0.9833912 0.27925820 0.9564433
## Kappa Mean_F1 Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value
## 0.9354334 0.9279477 0.9228032 0.9925403 0.9502527
## 0.9490984 0.9448089 0.9388085 0.9940944 0.9653958
## 0.9491855 0.9448103 0.9388085 0.9942126 0.9617274
## 0.9490984 0.9448089 0.9388085 0.9940944 0.9653958
## 0.9518643 0.9488248 0.9425122 0.9944710 0.9659602
## 0.9493506 0.9436772 0.9388085 0.9943307 0.9605659
## 0.9466530 0.9402767 0.9341789 0.9941721 0.9559394
## 0.9465329 0.9404979 0.9336498 0.9940538 0.9584812
## 0.9466530 0.9402767 0.9341789 0.9941721 0.9559394
## 0.9438920 0.9370791 0.9304752 0.9937954 0.9547775
## 0.9547364 0.9506049 0.9471419 0.9949753 0.9627208
## 0.9521330 0.9500902 0.9450134 0.9947078 0.9628119
## 0.9547086 0.9514528 0.9471419 0.9948696 0.9643454
## 0.9521330 0.9500902 0.9450134 0.9947078 0.9628119
## 0.9520171 0.9498396 0.9444843 0.9945926 0.9638305
## 0.9440240 0.9404325 0.9373415 0.9938045 0.9534252
## 0.9494252 0.9474891 0.9418388 0.9943374 0.9614550
## 0.9467095 0.9441154 0.9381351 0.9939638 0.9594943
## 0.9494252 0.9474891 0.9418388 0.9943374 0.9614550
## 0.9467383 0.9457574 0.9406363 0.9939638 0.9594732
## 0.9409419 0.9350760 0.9287556 0.9931818 0.9570593
## 0.9463797 0.9417116 0.9356339 0.9938234 0.9627062
## 0.9381994 0.9319636 0.9265069 0.9929169 0.9508170
## 0.9463797 0.9417116 0.9356339 0.9938234 0.9627062
## 0.9518643 0.9488248 0.9425122 0.9944710 0.9659602
## 0.9467242 0.9409176 0.9360307 0.9940693 0.9576029
## 0.9547392 0.9531643 0.9484646 0.9949692 0.9641405
## 0.9465329 0.9404979 0.9336498 0.9940538 0.9584812
## 0.9547392 0.9531643 0.9484646 0.9949692 0.9641405
## 0.9438920 0.9370791 0.9304752 0.9937954 0.9547775
## 0.9521145 0.9471322 0.9443641 0.9947228 0.9585232
## 0.9575694 0.9575241 0.9533468 0.9952432 0.9677502
## 0.9547086 0.9514528 0.9471419 0.9948696 0.9643454
## 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 0.9520171 0.9498396 0.9444843 0.9945926 0.9638305
## 0.9520202 0.9478162 0.9443641 0.9947043 0.9602516
## 0.9548616 0.9549230 0.9501722 0.9948729 0.9663932
## 0.9467095 0.9441154 0.9381351 0.9939638 0.9594943
## 0.9574510 0.9563148 0.9508456 0.9951343 0.9680304
## 0.9467383 0.9457574 0.9406363 0.9939638 0.9594732
## 0.9409419 0.9350760 0.9287556 0.9931818 0.9570593
## 0.9437009 0.9376971 0.9319302 0.9935650 0.9584734
## 0.9381994 0.9319636 0.9265069 0.9929169 0.9508170
## 0.9437009 0.9376971 0.9319302 0.9935650 0.9584734
## 0.9491855 0.9448103 0.9388085 0.9942126 0.9617274
## 0.9467242 0.9409176 0.9360307 0.9940693 0.9576029
## 0.9547392 0.9531643 0.9484646 0.9949692 0.9641405
## 0.9465329 0.9404979 0.9336498 0.9940538 0.9584812
## 0.9547392 0.9531643 0.9484646 0.9949692 0.9641405
## 0.9465631 0.9424676 0.9384117 0.9940632 0.9553948
## 0.9521145 0.9471322 0.9443641 0.9947228 0.9585232
## 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 0.9547086 0.9514528 0.9471419 0.9948696 0.9643454
## 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 0.9574427 0.9547374 0.9508456 0.9952463 0.9649430
## 0.9520202 0.9478162 0.9443641 0.9947043 0.9602516
## 0.9547988 0.9527498 0.9476710 0.9948729 0.9654378
## 0.9467095 0.9441154 0.9381351 0.9939638 0.9594943
## 0.9547988 0.9527498 0.9476710 0.9948729 0.9654378
## 0.9520187 0.9493476 0.9448932 0.9946049 0.9611169
## 0.9437058 0.9363717 0.9319302 0.9935776 0.9561895
## 0.9437009 0.9376971 0.9319302 0.9935650 0.9584734
## 0.9381994 0.9319636 0.9265069 0.9929169 0.9508170
## 0.9437009 0.9376971 0.9319302 0.9935650 0.9584734
## 0.9491855 0.9448103 0.9388085 0.9942126 0.9617274
## 0.9467242 0.9409176 0.9360307 0.9940693 0.9576029
## 0.9573461 0.9546144 0.9487412 0.9952274 0.9664443
## 0.9465329 0.9404979 0.9336498 0.9940538 0.9584812
## 0.9547384 0.9531107 0.9480678 0.9949659 0.9644491
## 0.9465631 0.9424676 0.9384117 0.9940632 0.9553948
## 0.9521145 0.9471322 0.9443641 0.9947228 0.9585232
## 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 0.9547086 0.9514528 0.9471419 0.9948696 0.9643454
## 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 0.9574427 0.9547374 0.9508456 0.9952463 0.9649430
## 0.9493115 0.9430599 0.9406604 0.9944333 0.9584334
## 0.9547988 0.9527498 0.9476710 0.9948729 0.9654378
## 0.9467095 0.9441154 0.9381351 0.9939638 0.9594943
## 0.9521754 0.9483727 0.9444964 0.9946203 0.9617341
## 0.9520187 0.9493476 0.9448932 0.9946049 0.9611169
## 0.9410508 0.9319414 0.9266512 0.9933161 0.9534315
## 0.9436815 0.9360445 0.9294290 0.9935650 0.9584932
## 0.9381994 0.9319636 0.9265069 0.9929169 0.9508170
## 0.9436815 0.9360445 0.9294290 0.9935650 0.9584932
## 0.9465584 0.9416541 0.9356339 0.9939512 0.9597521
## 0.9467242 0.9409176 0.9360307 0.9940693 0.9576029
## 0.9573461 0.9546144 0.9487412 0.9952274 0.9664443
## 0.9465329 0.9404979 0.9336498 0.9940538 0.9584812
## 0.9547384 0.9531107 0.9480678 0.9949659 0.9644491
## 0.9465631 0.9424676 0.9384117 0.9940632 0.9553948
## 0.9521145 0.9471322 0.9443641 0.9947228 0.9585232
## 0.9574411 0.9558994 0.9508456 0.9952337 0.9669183
## 0.9547086 0.9514528 0.9471419 0.9948696 0.9643454
## 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 0.9547364 0.9506049 0.9471419 0.9949753 0.9627208
## 0.9493115 0.9430599 0.9406604 0.9944333 0.9584334
## 0.9547731 0.9498763 0.9451698 0.9948818 0.9637293
## 0.9467095 0.9441154 0.9381351 0.9939638 0.9594943
## 0.9521754 0.9483727 0.9444964 0.9946203 0.9617341
## 0.9493124 0.9452151 0.9411895 0.9943339 0.9588946
## 0.9410508 0.9319414 0.9266512 0.9933161 0.9534315
## 0.9436815 0.9360445 0.9294290 0.9935650 0.9584932
## 0.9381994 0.9319636 0.9265069 0.9929169 0.9508170
## 0.9436815 0.9360445 0.9294290 0.9935650 0.9584932
## 0.9465584 0.9416541 0.9356339 0.9939512 0.9597521
## 0.9467242 0.9409176 0.9360307 0.9940693 0.9576029
## 0.9600488 0.9574030 0.9515190 0.9954951 0.9689134
## 0.9465329 0.9404979 0.9336498 0.9940538 0.9584812
## 0.9574411 0.9558994 0.9508456 0.9952337 0.9669183
## 0.9465631 0.9424676 0.9384117 0.9940632 0.9553948
## 0.9521145 0.9471322 0.9443641 0.9947228 0.9585232
## 0.9547347 0.9517668 0.9471419 0.9949627 0.9646961
## 0.9547086 0.9514528 0.9471419 0.9948696 0.9643454
## 0.9574526 0.9547834 0.9503165 0.9952337 0.9671652
## 0.9547364 0.9506049 0.9471419 0.9949753 0.9627208
## 0.9493115 0.9430599 0.9406604 0.9944333 0.9584334
## 0.9547731 0.9498763 0.9451698 0.9948818 0.9637293
## 0.9467095 0.9441154 0.9381351 0.9939638 0.9594943
## 0.9521754 0.9483727 0.9444964 0.9946203 0.9617341
## 0.9465950 0.9419136 0.9380149 0.9940629 0.9564255
## 0.9410508 0.9319414 0.9266512 0.9933161 0.9534315
## 0.9436815 0.9360445 0.9294290 0.9935650 0.9584932
## 0.9381994 0.9319636 0.9265069 0.9929169 0.9508170
## 0.9436815 0.9360445 0.9294290 0.9935650 0.9584932
## 0.9465584 0.9416541 0.9356339 0.9939512 0.9597521
## 0.9467242 0.9409176 0.9360307 0.9940693 0.9576029
## 0.9600488 0.9574030 0.9515190 0.9954951 0.9689134
## 0.9465329 0.9404979 0.9336498 0.9940538 0.9584812
## 0.9574411 0.9558994 0.9508456 0.9952337 0.9669183
## 0.9465631 0.9424676 0.9384117 0.9940632 0.9553948
## 0.9521145 0.9471322 0.9443641 0.9947228 0.9585232
## 0.9547347 0.9517668 0.9471419 0.9949627 0.9646961
## 0.9547086 0.9514528 0.9471419 0.9948696 0.9643454
## 0.9574526 0.9547834 0.9503165 0.9952337 0.9671652
## 0.9547364 0.9506049 0.9471419 0.9949753 0.9627208
## 0.9493115 0.9430599 0.9406604 0.9944333 0.9584334
## 0.9547731 0.9498763 0.9451698 0.9948818 0.9637293
## 0.9467095 0.9441154 0.9381351 0.9939638 0.9594943
## 0.9521754 0.9483727 0.9444964 0.9946203 0.9617341
## 0.9465950 0.9419136 0.9380149 0.9940629 0.9564255
## 0.9410508 0.9319414 0.9266512 0.9933161 0.9534315
## 0.9436815 0.9360445 0.9294290 0.9935650 0.9584932
## 0.9381994 0.9319636 0.9265069 0.9929169 0.9508170
## 0.9436815 0.9360445 0.9294290 0.9935650 0.9584932
## 0.9465584 0.9416541 0.9356339 0.9939512 0.9597521
## 0.9467242 0.9409176 0.9360307 0.9940693 0.9576029
## 0.9600488 0.9574030 0.9515190 0.9954951 0.9689134
## 0.9465329 0.9404979 0.9336498 0.9940538 0.9584812
## 0.9574411 0.9558994 0.9508456 0.9952337 0.9669183
## 0.9465631 0.9424676 0.9384117 0.9940632 0.9553948
## 0.9521145 0.9471322 0.9443641 0.9947228 0.9585232
## 0.9547347 0.9517668 0.9471419 0.9949627 0.9646961
## 0.9547086 0.9514528 0.9471419 0.9948696 0.9643454
## 0.9574526 0.9547834 0.9503165 0.9952337 0.9671652
## 0.9547364 0.9506049 0.9471419 0.9949753 0.9627208
## 0.9493115 0.9430599 0.9406604 0.9944333 0.9584334
## 0.9547731 0.9498763 0.9451698 0.9948818 0.9637293
## 0.9467095 0.9441154 0.9381351 0.9939638 0.9594943
## 0.9521754 0.9483727 0.9444964 0.9946203 0.9617341
## 0.9465950 0.9419136 0.9380149 0.9940629 0.9564255
## Mean_Neg_Pred_Value Mean_Precision Mean_Recall Mean_Detection_Rate
## 0.9936021 0.9502527 0.9228032 0.1052813
## 0.9949004 0.9653958 0.9388085 0.1065131
## 0.9948801 0.9617274 0.9388085 0.1065131
## 0.9949004 0.9653958 0.9388085 0.1065131
## 0.9951416 0.9659602 0.9425122 0.1067573
## 0.9948805 0.9605659 0.9388085 0.1065184
## 0.9945979 0.9559394 0.9341789 0.1062715
## 0.9946004 0.9584812 0.9336498 0.1062662
## 0.9945979 0.9559394 0.9341789 0.1062715
## 0.9943450 0.9547775 0.9304752 0.1060273
## 0.9953961 0.9627208 0.9471419 0.1070043
## 0.9950260 0.9628119 0.9450134 0.1067626
## 0.9953961 0.9643454 0.9471419 0.1070043
## 0.9950260 0.9628119 0.9450134 0.1067626
## 0.9950289 0.9638305 0.9444843 0.1067573
## 0.9942379 0.9534252 0.9373415 0.1060325
## 0.9947792 0.9614550 0.9418388 0.1065236
## 0.9945054 0.9594943 0.9381351 0.1062794
## 0.9947792 0.9614550 0.9418388 0.1065236
## 0.9943968 0.9594732 0.9406363 0.1062794
## 0.9941190 0.9570593 0.9287556 0.1057751
## 0.9946357 0.9627062 0.9356339 0.1062662
## 0.9938433 0.9508170 0.9265069 0.1055255
## 0.9946357 0.9627062 0.9356339 0.1062662
## 0.9951416 0.9659602 0.9425122 0.1067573
## 0.9946159 0.9576029 0.9360307 0.1062794
## 0.9953751 0.9641405 0.9484646 0.1070043
## 0.9946004 0.9584812 0.9336498 0.1062662
## 0.9953751 0.9641405 0.9484646 0.1070043
## 0.9943450 0.9547775 0.9304752 0.1060273
## 0.9951315 0.9585232 0.9443641 0.1067653
## 0.9955491 0.9677502 0.9533468 0.1072564
## 0.9953961 0.9643454 0.9471419 0.1070043
## 0.9959074 0.9693874 0.9540202 0.1074954
## 0.9950289 0.9638305 0.9444843 0.1067573
## 0.9951283 0.9602516 0.9443641 0.1067573
## 0.9953024 0.9663932 0.9501722 0.1070175
## 0.9945054 0.9594943 0.9381351 0.1062794
## 0.9956606 0.9680304 0.9508456 0.1072564
## 0.9943968 0.9594732 0.9406363 0.1062794
## 0.9941190 0.9570593 0.9287556 0.1057751
## 0.9943742 0.9584734 0.9319302 0.1060220
## 0.9938433 0.9508170 0.9265069 0.1055255
## 0.9943742 0.9584734 0.9319302 0.1060220
## 0.9948801 0.9617274 0.9388085 0.1065131
## 0.9946159 0.9576029 0.9360307 0.1062794
## 0.9953751 0.9641405 0.9484646 0.1070043
## 0.9946004 0.9584812 0.9336498 0.1062662
## 0.9953751 0.9641405 0.9484646 0.1070043
## 0.9945960 0.9553948 0.9384117 0.1062662
## 0.9951315 0.9585232 0.9443641 0.1067653
## 0.9959074 0.9693874 0.9540202 0.1074954
## 0.9953961 0.9643454 0.9471419 0.1070043
## 0.9959074 0.9693874 0.9540202 0.1074954
## 0.9956428 0.9649430 0.9508456 0.1072485
## 0.9951283 0.9602516 0.9443641 0.1067573
## 0.9954131 0.9654378 0.9476710 0.1070175
## 0.9945054 0.9594943 0.9381351 0.1062794
## 0.9954131 0.9654378 0.9476710 0.1070175
## 0.9951283 0.9611169 0.9448932 0.1067626
## 0.9943776 0.9561895 0.9319302 0.1060220
## 0.9943742 0.9584734 0.9319302 0.1060220
## 0.9938433 0.9508170 0.9265069 0.1055255
## 0.9943742 0.9584734 0.9319302 0.1060220
## 0.9948801 0.9617274 0.9388085 0.1065131
## 0.9946159 0.9576029 0.9360307 0.1062794
## 0.9957486 0.9664443 0.9487412 0.1072432
## 0.9946004 0.9584812 0.9336498 0.1062662
## 0.9953783 0.9644491 0.9480678 0.1070043
## 0.9945960 0.9553948 0.9384117 0.1062662
## 0.9951315 0.9585232 0.9443641 0.1067653
## 0.9959074 0.9693874 0.9540202 0.1074954
## 0.9953961 0.9643454 0.9471419 0.1070043
## 0.9959074 0.9693874 0.9540202 0.1074954
## 0.9956428 0.9649430 0.9508456 0.1072485
## 0.9948872 0.9584334 0.9406604 0.1065131
## 0.9954131 0.9654378 0.9476710 0.1070175
## 0.9945054 0.9594943 0.9381351 0.1062794
## 0.9951691 0.9617341 0.9444964 0.1067785
## 0.9951283 0.9611169 0.9448932 0.1067626
## 0.9942369 0.9534315 0.9266512 0.1057831
## 0.9944950 0.9584932 0.9294290 0.1060220
## 0.9938433 0.9508170 0.9265069 0.1055255
## 0.9944950 0.9584932 0.9294290 0.1060220
## 0.9946305 0.9597521 0.9356339 0.1062742
## 0.9946159 0.9576029 0.9360307 0.1062794
## 0.9957486 0.9664443 0.9487412 0.1072432
## 0.9946004 0.9584812 0.9336498 0.1062662
## 0.9953783 0.9644491 0.9480678 0.1070043
## 0.9945960 0.9553948 0.9384117 0.1062662
## 0.9951315 0.9585232 0.9443641 0.1067653
## 0.9956428 0.9669183 0.9508456 0.1072485
## 0.9953961 0.9643454 0.9471419 0.1070043
## 0.9959074 0.9693874 0.9540202 0.1074954
## 0.9953961 0.9627208 0.9471419 0.1070043
## 0.9948872 0.9584334 0.9406604 0.1065131
## 0.9955394 0.9637293 0.9451698 0.1070175
## 0.9945054 0.9594943 0.9381351 0.1062794
## 0.9951691 0.9617341 0.9444964 0.1067785
## 0.9948816 0.9588946 0.9411895 0.1065184
## 0.9942369 0.9534315 0.9266512 0.1057831
## 0.9944950 0.9584932 0.9294290 0.1060220
## 0.9938433 0.9508170 0.9265069 0.1055255
## 0.9944950 0.9584932 0.9294290 0.1060220
## 0.9946305 0.9597521 0.9356339 0.1062742
## 0.9946159 0.9576029 0.9360307 0.1062794
## 0.9960132 0.9689134 0.9515190 0.1074874
## 0.9946004 0.9584812 0.9336498 0.1062662
## 0.9956428 0.9669183 0.9508456 0.1072485
## 0.9945960 0.9553948 0.9384117 0.1062662
## 0.9951315 0.9585232 0.9443641 0.1067653
## 0.9953961 0.9646961 0.9471419 0.1070043
## 0.9953961 0.9643454 0.9471419 0.1070043
## 0.9956606 0.9671652 0.9503165 0.1072512
## 0.9953961 0.9627208 0.9471419 0.1070043
## 0.9948872 0.9584334 0.9406604 0.1065131
## 0.9955394 0.9637293 0.9451698 0.1070175
## 0.9945054 0.9594943 0.9381351 0.1062794
## 0.9951691 0.9617341 0.9444964 0.1067785
## 0.9946233 0.9564255 0.9380149 0.1062715
## 0.9942369 0.9534315 0.9266512 0.1057831
## 0.9944950 0.9584932 0.9294290 0.1060220
## 0.9938433 0.9508170 0.9265069 0.1055255
## 0.9944950 0.9584932 0.9294290 0.1060220
## 0.9946305 0.9597521 0.9356339 0.1062742
## 0.9946159 0.9576029 0.9360307 0.1062794
## 0.9960132 0.9689134 0.9515190 0.1074874
## 0.9946004 0.9584812 0.9336498 0.1062662
## 0.9956428 0.9669183 0.9508456 0.1072485
## 0.9945960 0.9553948 0.9384117 0.1062662
## 0.9951315 0.9585232 0.9443641 0.1067653
## 0.9953961 0.9646961 0.9471419 0.1070043
## 0.9953961 0.9643454 0.9471419 0.1070043
## 0.9956606 0.9671652 0.9503165 0.1072512
## 0.9953961 0.9627208 0.9471419 0.1070043
## 0.9948872 0.9584334 0.9406604 0.1065131
## 0.9955394 0.9637293 0.9451698 0.1070175
## 0.9945054 0.9594943 0.9381351 0.1062794
## 0.9951691 0.9617341 0.9444964 0.1067785
## 0.9946233 0.9564255 0.9380149 0.1062715
## 0.9942369 0.9534315 0.9266512 0.1057831
## 0.9944950 0.9584932 0.9294290 0.1060220
## 0.9938433 0.9508170 0.9265069 0.1055255
## 0.9944950 0.9584932 0.9294290 0.1060220
## 0.9946305 0.9597521 0.9356339 0.1062742
## 0.9946159 0.9576029 0.9360307 0.1062794
## 0.9960132 0.9689134 0.9515190 0.1074874
## 0.9946004 0.9584812 0.9336498 0.1062662
## 0.9956428 0.9669183 0.9508456 0.1072485
## 0.9945960 0.9553948 0.9384117 0.1062662
## 0.9951315 0.9585232 0.9443641 0.1067653
## 0.9953961 0.9646961 0.9471419 0.1070043
## 0.9953961 0.9643454 0.9471419 0.1070043
## 0.9956606 0.9671652 0.9503165 0.1072512
## 0.9953961 0.9627208 0.9471419 0.1070043
## 0.9948872 0.9584334 0.9406604 0.1065131
## 0.9955394 0.9637293 0.9451698 0.1070175
## 0.9945054 0.9594943 0.9381351 0.1062794
## 0.9951691 0.9617341 0.9444964 0.1067785
## 0.9946233 0.9564255 0.9380149 0.1062715
## Mean_Balanced_Accuracy
## 0.9576718
## 0.9664514
## 0.9665106
## 0.9664514
## 0.9684916
## 0.9665696
## 0.9641755
## 0.9638518
## 0.9641755
## 0.9621353
## 0.9710586
## 0.9698606
## 0.9710057
## 0.9698606
## 0.9695384
## 0.9655730
## 0.9680881
## 0.9660495
## 0.9680881
## 0.9673001
## 0.9609687
## 0.9647286
## 0.9597119
## 0.9647286
## 0.9684916
## 0.9650500
## 0.9717169
## 0.9638518
## 0.9717169
## 0.9621353
## 0.9695434
## 0.9742950
## 0.9710057
## 0.9747624
## 0.9695384
## 0.9695342
## 0.9725225
## 0.9660495
## 0.9729899
## 0.9673001
## 0.9609687
## 0.9627476
## 0.9597119
## 0.9627476
## 0.9665106
## 0.9650500
## 0.9717169
## 0.9638518
## 0.9717169
## 0.9662374
## 0.9695434
## 0.9747624
## 0.9710057
## 0.9747624
## 0.9730459
## 0.9695342
## 0.9712719
## 0.9660495
## 0.9712719
## 0.9697490
## 0.9627539
## 0.9627476
## 0.9597119
## 0.9627476
## 0.9665106
## 0.9650500
## 0.9719843
## 0.9638518
## 0.9715169
## 0.9662374
## 0.9695434
## 0.9747624
## 0.9710057
## 0.9747624
## 0.9730459
## 0.9675468
## 0.9712719
## 0.9660495
## 0.9695583
## 0.9697490
## 0.9599837
## 0.9614970
## 0.9597119
## 0.9614970
## 0.9647925
## 0.9650500
## 0.9719843
## 0.9638518
## 0.9715169
## 0.9662374
## 0.9695434
## 0.9730396
## 0.9710057
## 0.9747624
## 0.9710586
## 0.9675468
## 0.9700258
## 0.9660495
## 0.9695583
## 0.9677617
## 0.9599837
## 0.9614970
## 0.9597119
## 0.9614970
## 0.9647925
## 0.9650500
## 0.9735070
## 0.9638518
## 0.9730396
## 0.9662374
## 0.9695434
## 0.9710523
## 0.9710057
## 0.9727751
## 0.9710586
## 0.9675468
## 0.9700258
## 0.9660495
## 0.9695583
## 0.9660389
## 0.9599837
## 0.9614970
## 0.9597119
## 0.9614970
## 0.9647925
## 0.9650500
## 0.9735070
## 0.9638518
## 0.9730396
## 0.9662374
## 0.9695434
## 0.9710523
## 0.9710057
## 0.9727751
## 0.9710586
## 0.9675468
## 0.9700258
## 0.9660495
## 0.9695583
## 0.9660389
## 0.9599837
## 0.9614970
## 0.9597119
## 0.9614970
## 0.9647925
## 0.9650500
## 0.9735070
## 0.9638518
## 0.9730396
## 0.9662374
## 0.9695434
## 0.9710523
## 0.9710057
## 0.9727751
## 0.9710586
## 0.9675468
## 0.9700258
## 0.9660495
## 0.9695583
## 0.9660389
##
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were kmax = 7, distance = 3 and kernel
## = epanechnikov.
head(kknn_fit$results[order(-kknn_fit$results$Kappa),],20)
## kmax distance kernel logLoss AUC prAUC Accuracy
## 34 4 3 epanechnikov 0.5796553 0.9841954 0.1979412 0.9674583
## 52 5 3 triangular 0.5789736 0.9842243 0.2015382 0.9674583
## 54 5 3 epanechnikov 0.5801737 0.9841954 0.2013924 0.9674583
## 72 6 3 triangular 0.5789736 0.9842243 0.2015382 0.9674583
## 74 6 3 epanechnikov 0.5801737 0.9841954 0.2013924 0.9674583
## 94 7 3 epanechnikov 0.5801737 0.9841954 0.2013924 0.9674583
## 107 8 2 triangular 0.3758316 0.9904248 0.3008972 0.9673867
## 127 9 2 triangular 0.3758316 0.9904248 0.3008972 0.9673867
## 147 10 2 triangular 0.3780721 0.9903594 0.3031986 0.9673867
## 32 4 3 triangular 0.5827790 0.9841692 0.1901016 0.9653078
## 114 8 3 epanechnikov 0.5850546 0.9838233 0.2361733 0.9652605
## 134 9 3 epanechnikov 0.5850546 0.9838233 0.2361733 0.9652605
## 154 10 3 epanechnikov 0.5850546 0.9838233 0.2361733 0.9652605
## 39 4 4 epanechnikov 0.5888310 0.9842318 0.1879714 0.9653078
## 55 5 3 optimal 0.5178505 0.9857641 0.2287803 0.9652361
## 75 6 3 optimal 0.5178505 0.9857641 0.2287803 0.9652361
## 92 7 3 triangular 0.5870414 0.9839393 0.2360605 0.9652361
## 109 8 2 epanechnikov 0.4447955 0.9879179 0.2595428 0.9652361
## 129 9 2 epanechnikov 0.4447955 0.9879179 0.2595428 0.9652361
## 149 10 2 epanechnikov 0.4447955 0.9879179 0.2595428 0.9652361
## Kappa Mean_F1 Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value
## 34 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 52 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 54 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 72 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 74 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 94 0.9601589 0.9589160 0.9540202 0.9955047 0.9693874
## 107 0.9600488 0.9574030 0.9515190 0.9954951 0.9689134
## 127 0.9600488 0.9574030 0.9515190 0.9954951 0.9689134
## 147 0.9600488 0.9574030 0.9515190 0.9954951 0.9689134
## 32 0.9575694 0.9575241 0.9533468 0.9952432 0.9677502
## 114 0.9574526 0.9547834 0.9503165 0.9952337 0.9671652
## 134 0.9574526 0.9547834 0.9503165 0.9952337 0.9671652
## 154 0.9574526 0.9547834 0.9503165 0.9952337 0.9671652
## 39 0.9574510 0.9563148 0.9508456 0.9951343 0.9680304
## 55 0.9574427 0.9547374 0.9508456 0.9952463 0.9649430
## 75 0.9574427 0.9547374 0.9508456 0.9952463 0.9649430
## 92 0.9574411 0.9558994 0.9508456 0.9952337 0.9669183
## 109 0.9574411 0.9558994 0.9508456 0.9952337 0.9669183
## 129 0.9574411 0.9558994 0.9508456 0.9952337 0.9669183
## 149 0.9574411 0.9558994 0.9508456 0.9952337 0.9669183
## Mean_Neg_Pred_Value Mean_Precision Mean_Recall Mean_Detection_Rate
## 34 0.9959074 0.9693874 0.9540202 0.1074954
## 52 0.9959074 0.9693874 0.9540202 0.1074954
## 54 0.9959074 0.9693874 0.9540202 0.1074954
## 72 0.9959074 0.9693874 0.9540202 0.1074954
## 74 0.9959074 0.9693874 0.9540202 0.1074954
## 94 0.9959074 0.9693874 0.9540202 0.1074954
## 107 0.9960132 0.9689134 0.9515190 0.1074874
## 127 0.9960132 0.9689134 0.9515190 0.1074874
## 147 0.9960132 0.9689134 0.9515190 0.1074874
## 32 0.9955491 0.9677502 0.9533468 0.1072564
## 114 0.9956606 0.9671652 0.9503165 0.1072512
## 134 0.9956606 0.9671652 0.9503165 0.1072512
## 154 0.9956606 0.9671652 0.9503165 0.1072512
## 39 0.9956606 0.9680304 0.9508456 0.1072564
## 55 0.9956428 0.9649430 0.9508456 0.1072485
## 75 0.9956428 0.9649430 0.9508456 0.1072485
## 92 0.9956428 0.9669183 0.9508456 0.1072485
## 109 0.9956428 0.9669183 0.9508456 0.1072485
## 129 0.9956428 0.9669183 0.9508456 0.1072485
## 149 0.9956428 0.9669183 0.9508456 0.1072485
## Mean_Balanced_Accuracy logLossSD AUCSD prAUCSD AccuracySD
## 34 0.9747624 0.5065049 0.01516802 0.04675295 0.02410024
## 52 0.9747624 0.5067785 0.01519625 0.04409328 0.02410024
## 54 0.9747624 0.5058397 0.01516802 0.04360954 0.02410024
## 72 0.9747624 0.5067785 0.01519625 0.04409328 0.02410024
## 74 0.9747624 0.5058397 0.01516802 0.04360954 0.02410024
## 94 0.9747624 0.5058397 0.01516802 0.04360954 0.02410024
## 107 0.9735070 0.4807288 0.01474440 0.06293014 0.02147841
## 127 0.9735070 0.4807288 0.01474440 0.06293014 0.02147841
## 147 0.9735070 0.4812466 0.01476486 0.06411596 0.02147841
## 32 0.9742950 0.5019539 0.01512556 0.05784717 0.02454426
## 114 0.9727751 0.5085178 0.01536931 0.06260142 0.02338656
## 134 0.9727751 0.5085178 0.01536931 0.06260142 0.02338656
## 154 0.9727751 0.5085178 0.01536931 0.06260142 0.02338656
## 39 0.9729899 0.5174999 0.01525469 0.05288171 0.02785411
## 55 0.9730459 0.4871126 0.01475850 0.04165115 0.02060888
## 75 0.9730459 0.4871126 0.01475850 0.04165115 0.02060888
## 92 0.9730396 0.5016083 0.01534698 0.05920592 0.02060888
## 109 0.9730396 0.4569198 0.01388791 0.05076053 0.02060888
## 129 0.9730396 0.4569198 0.01388791 0.05076053 0.02060888
## 149 0.9730396 0.4569198 0.01388791 0.05076053 0.02060888
## KappaSD Mean_F1SD Mean_SensitivitySD Mean_SpecificitySD
## 34 0.02953297 0.03356604 0.03653540 0.003068656
## 52 0.02953297 0.03356604 0.03653540 0.003068656
## 54 0.02953297 0.03356604 0.03653540 0.003068656
## 72 0.02953297 0.03356604 0.03653540 0.003068656
## 74 0.02953297 0.03356604 0.03653540 0.003068656
## 94 0.02953297 0.03356604 0.03653540 0.003068656
## 107 0.02632590 0.03017864 0.03234521 0.002697432
## 127 0.02632590 0.03017864 0.03234521 0.002697432
## 147 0.02632590 0.03017864 0.03234521 0.002697432
## 32 0.03004726 0.03335446 0.03631541 0.003120799
## 114 0.02867263 0.03243434 0.03634032 0.003047961
## 134 0.02867263 0.03243434 0.03634032 0.003047961
## 154 0.02867263 0.03243434 0.03634032 0.003047961
## 39 0.03428097 0.03836147 0.04255770 0.003700350
## 55 0.02526948 0.02801432 0.03194224 0.002616590
## 75 0.02526948 0.02801432 0.03194224 0.002616590
## 92 0.02526713 0.02936569 0.03194224 0.002595814
## 109 0.02526713 0.02936569 0.03194224 0.002595814
## 129 0.02526713 0.02936569 0.03194224 0.002595814
## 149 0.02526713 0.02936569 0.03194224 0.002595814
## Mean_Pos_Pred_ValueSD Mean_Neg_Pred_ValueSD Mean_PrecisionSD Mean_RecallSD
## 34 0.02980704 0.002974632 0.02980704 0.03653540
## 52 0.02980704 0.002974632 0.02980704 0.03653540
## 54 0.02980704 0.002974632 0.02980704 0.03653540
## 72 0.02980704 0.002974632 0.02980704 0.03653540
## 74 0.02980704 0.002974632 0.02980704 0.03653540
## 94 0.02980704 0.002974632 0.02980704 0.03653540
## 107 0.02805797 0.002675637 0.02805797 0.03234521
## 127 0.02805797 0.002675637 0.02805797 0.03234521
## 147 0.02805797 0.002675637 0.02805797 0.03234521
## 32 0.02968672 0.003121670 0.02968672 0.03631541
## 114 0.02791759 0.002864250 0.02791759 0.03634032
## 134 0.02791759 0.002864250 0.02791759 0.03634032
## 154 0.02791759 0.002864250 0.02791759 0.03634032
## 39 0.03217536 0.003376896 0.03217536 0.04255770
## 55 0.02543787 0.002547861 0.02543787 0.03194224
## 75 0.02543787 0.002547861 0.02543787 0.03194224
## 92 0.02701807 0.002547861 0.02701807 0.03194224
## 109 0.02701807 0.002547861 0.02701807 0.03194224
## 129 0.02701807 0.002547861 0.02701807 0.03194224
## 149 0.02701807 0.002547861 0.02701807 0.03194224
## Mean_Detection_RateSD Mean_Balanced_AccuracySD
## 34 0.002677804 0.01976991
## 52 0.002677804 0.01976991
## 54 0.002677804 0.01976991
## 72 0.002677804 0.01976991
## 74 0.002677804 0.01976991
## 94 0.002677804 0.01976991
## 107 0.002386489 0.01750609
## 127 0.002386489 0.01750609
## 147 0.002386489 0.01750609
## 32 0.002727140 0.01963608
## 114 0.002598506 0.01966141
## 134 0.002598506 0.01966141
## 154 0.002598506 0.01966141
## 39 0.003094901 0.02310962
## 55 0.002289875 0.01724183
## 75 0.002289875 0.01724183
## 92 0.002289875 0.01723344
## 109 0.002289875 0.01723344
## 129 0.002289875 0.01723344
## 149 0.002289875 0.01723344
Best model found:
kknn_fit$bestTune
## kmax distance kernel
## 94 7 3 epanechnikov
Parameters of the best model: kmax = 7, distance = 3, kernel = epanechnikov.
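The chosen parameters can be read as follows. In kknn, `distance` is the exponent p of the Minkowski distance, and the `epanechnikov` kernel turns neighbour distances into vote weights. The sketch below is a simplified, hypothetical re-implementation of such a weighted vote (the helpers `minkowski`, `epanechnikov` and `weighted_knn_predict` are ours). It loosely follows kknn's idea of dividing the distances by the distance to the (k+1)-th neighbour. It is not the package's code, and it skips kknn's own standardization of the predictors.

```r
# Hypothetical helpers: a simplified weighted k-NN vote (not the kknn source code)
minkowski <- function(a, b, p) sum(abs(a - b)^p)^(1 / p)

# Epanechnikov kernel: 3/4 * (1 - u^2) on [-1, 1], zero outside
epanechnikov <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

weighted_knn_predict <- function(train_x, train_y, x_new, k = 7, p = 3) {
  train_y <- factor(train_y)
  # Minkowski distance from x_new to every training row
  d <- apply(as.matrix(train_x), 1, minkowski, b = x_new, p = p)
  ord <- order(d)
  # Scale by the distance to the (k+1)-th neighbour (floored to avoid division by 0)
  d_max <- max(d[ord[k + 1]], 1e-6)
  nn <- ord[seq_len(k)]
  w <- epanechnikov(d[nn] / d_max)
  # Sum the kernel weights per class; classes without neighbours get 0
  votes <- tapply(w, factor(train_y[nn], levels = levels(train_y)), sum)
  votes[is.na(votes)] <- 0
  names(which.max(votes))
}

# Usage on a tiny made-up data set with two well-separated groups
train_x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
train_y <- c("A", "A", "A", "B", "B", "B")
weighted_knn_predict(train_x, train_y, c(0.2, 0.2), k = 3, p = 3)  # "A"
```

Neighbours closer to the new point get larger weights, so with a suitable kernel a large `kmax` does little harm: distant neighbours contribute almost nothing to the vote.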
Equivalent models (the six best parameter combinations reach the same cross-validated Kappa, 0.9601589):
datatable(head(kknn_fit$results[order(-kknn_fit$results$Kappa),],6),options = list(scrollX = TRUE,dom = 'ltip',ordering=F))
Classification results on the training data:
confusionMatrix(predict(kknn_fit),olive.train$area,mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 41 0 0 0
## Coast.Sardinia 0 24 0 0
## East.Liguria 0 0 41 0
## Inland.Sardinia 0 1 0 55
## North.Apulia 0 0 0 0
## Sicily 0 0 0 0
## South.Apulia 0 0 0 0
## Umbria 0 0 0 0
## West.Liguria 0 0 0 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 0 1 0 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 0 0
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 22 1 0 0 0
## Sicily 0 30 0 0 0
## South.Apulia 0 1 162 0 0
## Umbria 0 0 0 40 0
## West.Liguria 0 0 0 0 39
##
## Overall Statistics
##
## Accuracy : 0.9913
## 95% CI : (0.9778, 0.9976)
## No Information Rate : 0.3537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9893
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 1.00000 0.96000 1.00000
## Specificity 0.99760 1.00000 1.00000
## Pos Pred Value 0.97619 1.00000 1.00000
## Neg Pred Value 1.00000 0.99770 1.00000
## Precision 0.97619 1.00000 1.00000
## Recall 1.00000 0.96000 1.00000
## F1 0.98795 0.97959 1.00000
## Prevalence 0.08952 0.05459 0.08952
## Detection Rate 0.08952 0.05240 0.08952
## Detection Prevalence 0.09170 0.05240 0.08952
## Balanced Accuracy 0.99880 0.98000 1.00000
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.0000 1.00000 0.90909
## Specificity 0.9975 0.99771 1.00000
## Pos Pred Value 0.9821 0.95652 1.00000
## Neg Pred Value 1.0000 1.00000 0.99299
## Precision 0.9821 0.95652 1.00000
## Recall 1.0000 1.00000 0.90909
## F1 0.9910 0.97778 0.95238
## Prevalence 0.1201 0.04803 0.07205
## Detection Rate 0.1201 0.04803 0.06550
## Detection Prevalence 0.1223 0.05022 0.06550
## Balanced Accuracy 0.9988 0.99885 0.95455
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 1.0000 1.00000 1.00000
## Specificity 0.9966 1.00000 1.00000
## Pos Pred Value 0.9939 1.00000 1.00000
## Neg Pred Value 1.0000 1.00000 1.00000
## Precision 0.9939 1.00000 1.00000
## Recall 1.0000 1.00000 1.00000
## F1 0.9969 1.00000 1.00000
## Prevalence 0.3537 0.08734 0.08515
## Detection Rate 0.3537 0.08734 0.08515
## Detection Prevalence 0.3559 0.08734 0.08515
## Balanced Accuracy 0.9983 1.00000 1.00000
On the training data Kappa = 0.9893. On the test data:
confusionMatrix(predict(kknn_fit, olive.test.scale),olive.test$area,mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 15 0 0 0
## Coast.Sardinia 0 7 0 0
## East.Liguria 0 0 7 0
## Inland.Sardinia 0 1 0 10
## North.Apulia 0 0 0 0
## Sicily 0 0 0 0
## South.Apulia 0 0 0 0
## Umbria 0 0 0 0
## West.Liguria 0 0 2 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 0 1 1 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 0 1
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 3 0 0 0 0
## Sicily 0 2 2 0 0
## South.Apulia 0 0 41 0 0
## Umbria 0 0 0 11 0
## West.Liguria 0 0 0 0 10
##
## Overall Statistics
##
## Accuracy : 0.9298
## 95% CI : (0.8664, 0.9692)
## No Information Rate : 0.386
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9125
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 1.0000 0.87500 0.77778
## Specificity 0.9798 1.00000 0.99048
## Pos Pred Value 0.8824 1.00000 0.87500
## Neg Pred Value 1.0000 0.99065 0.98113
## Precision 0.8824 1.00000 0.87500
## Recall 1.0000 0.87500 0.77778
## F1 0.9375 0.93333 0.82353
## Prevalence 0.1316 0.07018 0.07895
## Detection Rate 0.1316 0.06140 0.06140
## Detection Prevalence 0.1491 0.06140 0.07018
## Balanced Accuracy 0.9899 0.93750 0.88413
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.00000 1.00000 0.66667
## Specificity 0.99038 1.00000 0.98198
## Pos Pred Value 0.90909 1.00000 0.50000
## Neg Pred Value 1.00000 1.00000 0.99091
## Precision 0.90909 1.00000 0.50000
## Recall 1.00000 1.00000 0.66667
## F1 0.95238 1.00000 0.57143
## Prevalence 0.08772 0.02632 0.02632
## Detection Rate 0.08772 0.02632 0.01754
## Detection Prevalence 0.09649 0.02632 0.03509
## Balanced Accuracy 0.99519 1.00000 0.82432
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 0.9318 1.00000 0.90909
## Specificity 1.0000 1.00000 0.98058
## Pos Pred Value 1.0000 1.00000 0.83333
## Neg Pred Value 0.9589 1.00000 0.99020
## Precision 1.0000 1.00000 0.83333
## Recall 0.9318 1.00000 0.90909
## F1 0.9647 1.00000 0.86957
## Prevalence 0.3860 0.09649 0.09649
## Detection Rate 0.3596 0.09649 0.08772
## Detection Prevalence 0.3596 0.09649 0.10526
## Balanced Accuracy 0.9659 1.00000 0.94484
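On the test data Kappa = 0.9125, lower than on the training data (0.9893), since the test observations were not used to fit the model.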
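Cohen's Kappa compares the observed agreement p_o (the accuracy) with the agreement p_e expected by chance from the row and column totals: Kappa = (p_o - p_e) / (1 - p_e). As a check, the sketch below recomputes both values from the test-set confusion matrix of the KNN model printed above (rows = Prediction, columns = Reference). `cohen_kappa` is a hypothetical helper, not a caret function.

```r
# Hypothetical helper: accuracy and Cohen's Kappa from a square confusion matrix
cohen_kappa <- function(cm) {
  n <- sum(cm)
  p_o <- sum(diag(cm)) / n                      # observed agreement (accuracy)
  p_e <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  c(accuracy = p_o, kappa = (p_o - p_e) / (1 - p_e))
}

classes <- c("Calabria", "Coast.Sardinia", "East.Liguria", "Inland.Sardinia",
             "North.Apulia", "Sicily", "South.Apulia", "Umbria", "West.Liguria")
# Test-set confusion matrix of the KNN model (rows = Prediction, columns = Reference)
cm <- matrix(c(
  15, 0, 0,  0, 0, 1,  1,  0,  0,
   0, 7, 0,  0, 0, 0,  0,  0,  0,
   0, 0, 7,  0, 0, 0,  0,  0,  1,
   0, 1, 0, 10, 0, 0,  0,  0,  0,
   0, 0, 0,  0, 3, 0,  0,  0,  0,
   0, 0, 0,  0, 0, 2,  2,  0,  0,
   0, 0, 0,  0, 0, 0, 41,  0,  0,
   0, 0, 0,  0, 0, 0,  0, 11,  0,
   0, 0, 2,  0, 0, 0,  0,  0, 10),
  nrow = 9, byrow = TRUE, dimnames = list(Prediction = classes, Reference = classes))

round(cohen_kappa(cm), 4)  # accuracy 0.9298, kappa 0.9125
```

Because South.Apulia dominates the test set (44 of 114 observations), chance agreement is not negligible, and Kappa ends up somewhat below the raw accuracy.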
Variable importance in KNN classification
plot(varImp(kknn_fit))
The importance of the features differs from one class to another.
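kknn has no model-specific importance measure, so caret's `varImp` falls back on a model-independent filter. According to the caret documentation, each predictor is evaluated by ROC curve analysis. The multi-class problem is split into pairwise class comparisons, and for each class the largest pairwise AUC is reported. This is why a predictor can matter for one area and not for another. The following is a simplified sketch of that idea: `pair_auc` and `class_importance` are hypothetical helpers, and folding AUC values below 0.5 to 1 - AUC is our assumption. By default caret also rescales the scores.

```r
# Hypothetical helpers: a simplified version of ROC-based, model-independent importance

# AUC of predictor values x_a (class a) vs x_b (class b), Mann-Whitney form
pair_auc <- function(x_a, x_b) {
  r <- rank(c(x_a, x_b))  # average ranks handle ties
  n_a <- length(x_a)
  n_b <- length(x_b)
  auc <- (sum(r[seq_len(n_a)]) - n_a * (n_a + 1) / 2) / (n_a * n_b)
  max(auc, 1 - auc)  # assumption: the direction of the effect is ignored
}

# Importance matrix: rows = classes, columns = predictors;
# for each class, the largest AUC against any other class
class_importance <- function(x, y) {
  y <- factor(y)
  lv <- levels(y)
  sapply(colnames(x), function(v) {
    sapply(lv, function(cl) {
      others <- setdiff(lv, cl)
      max(sapply(others, function(o) pair_auc(x[y == cl, v], x[y == o, v])))
    })
  })
}

# Usage on the built-in iris data
imp <- class_importance(iris[, 1:4], iris$Species)
round(imp, 3)
```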
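Naïve Bayes classification is based on Bayes' rule. An observation is assigned to the class with the highest posterior probability of class membership, and this conditional probability is computed with Bayes' formula under the "naive" assumption that the features are conditionally independent given the class.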
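In formula form, the posterior probability of class C_k for a feature vector x = (x_1, ..., x_p) is proportional to P(C_k) multiplied by the product of p(x_j | C_k) over all features j. With `usekernel = FALSE`, the class-conditional densities p(x_j | C_k) are normal distributions with a class-specific mean and standard deviation. The sketch below computes these posteriors by hand under that assumption. `gaussian_nb_fit` and `gaussian_nb_posterior` are hypothetical helpers, not klaR functions. The computation is done on the log scale and normalized with the log-sum-exp trick, because multiplying many small densities directly can underflow to 0 for every class. The "Numerical 0 probability for all classes" warnings that appear later for the NB model appear to stem from this kind of underflow.

```r
# Hypothetical helpers: Gaussian naive Bayes written out by hand (not the klaR code)
gaussian_nb_fit <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  lv <- levels(y)
  # class x feature matrix of a per-class statistic f (mean or sd)
  per_class <- function(f) {
    do.call(rbind, lapply(lv, function(l) apply(x[y == l, , drop = FALSE], 2, f)))
  }
  list(levels = lv,
       prior = as.numeric(table(y)) / length(y),  # P(C_k)
       mu = per_class(mean),                      # class means
       sigma = per_class(sd))                     # class standard deviations
}

gaussian_nb_posterior <- function(model, x_new) {
  # log P(C_k) + sum_j log p(x_j | C_k): the "naive" independence assumption
  log_post <- log(model$prior) + sapply(seq_along(model$levels), function(k)
    sum(dnorm(x_new, mean = model$mu[k, ], sd = model$sigma[k, ], log = TRUE)))
  # log-sum-exp normalisation avoids underflow to 0 for all classes
  m <- max(log_post)
  post <- exp(log_post - m) / sum(exp(log_post - m))
  setNames(post, model$levels)
}

# Usage on the built-in iris data: the first flower is a setosa
nb <- gaussian_nb_fit(iris[, 1:4], iris$Species)
round(gaussian_nb_posterior(nb, unlist(iris[1, 1:4])), 4)  # setosa close to 1
```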
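When preparing the data, numeric features are typically standardized and dummy variables are created from the non-numeric ones; see e.g. https://uc-r.github.io/naive_bayes. (For the Gaussian model, a linear rescaling of a feature does not change the posterior class probabilities. All predictors in olive are numeric, so no dummy variables are needed here.)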
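In base R, both preparation steps can be sketched as follows. The small data frame `df` and its columns are made up for illustration. `scale()` centres a numeric column and divides it by its standard deviation. `model.matrix()` with `~ g - 1` produces one 0/1 indicator column per factor level. New (test) data should be transformed with the training-set mean and standard deviation, not its own.

```r
# Made-up example data: one numeric and one categorical feature
df <- data.frame(x = c(1.0, 2.0, 3.0, 6.0),
                 g = factor(c("a", "b", "a", "c")))

# Standardise the numeric column: (x - mean) / sd
x_scaled <- scale(df$x)
# Keep the centring and scaling constants for new (test) data
center <- attr(x_scaled, "scaled:center")
spread <- attr(x_scaled, "scaled:scale")
x_new_scaled <- (c(2.5, 4.0) - center) / spread

# Dummy (indicator) variables: one 0/1 column per level of g
dummies <- model.matrix(~ g - 1, data = df)

prepared <- data.frame(x = as.numeric(x_scaled), dummies)
prepared
```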
We use the caret package to tune the NB classifier (method = 'nb', which relies on the klaR package). In trainControl we set summaryFunction = multiClassSummary, and in train we choose Kappa as the selection metric.
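library(caret)       # train(), trainControl(), multiClassSummary()
library(doParallel)  # registerDoParallel(); also attaches parallel (makeCluster, detectCores)
# Parallel backend: use all cores but one
Mycluster <- makeCluster(detectCores() - 1)
registerDoParallel(Mycluster)
system.time({
  RNGkind(sample.kind = "Rounding")
  set.seed(123)
  # 5-fold CV; class probabilities are needed for logLoss and AUC in multiClassSummary.
  # search = "random" has no effect here because an explicit tuneGrid is supplied.
  train.control <- trainControl(method = "cv", number = 5, search = "random",
                                classProbs = TRUE, summaryFunction = multiClassSummary)
  # Full grid: 2 x 6 x 6 = 72 combinations of usekernel, fL and adjust
  tuneGrid <- expand.grid(usekernel = c(FALSE, TRUE), fL = 0:5, adjust = 0:5)
  RNGkind(sample.kind = "Rounding")
  set.seed(123)
  nb_fit <- train(area ~ ., olive.train.scale, method = 'nb', trControl = train.control,
                  tuneGrid = tuneGrid, metric = "Kappa")
})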
## user system elapsed
## 1.11 0.01 28.78
stopCluster(Mycluster)
registerDoSEQ()
plot(nb_fit)
nb_fit
## Naive Bayes
##
## 458 samples
## 8 predictor
## 9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 367, 365, 368, 367, 365
## Resampling results across tuning parameters:
##
## usekernel fL adjust logLoss AUC prAUC Accuracy Kappa
## FALSE 0 0 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 0 1 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 0 2 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 0 3 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 0 4 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 0 5 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 1 0 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 1 1 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 1 2 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 1 3 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 1 4 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 1 5 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 2 0 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 2 1 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 2 2 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 2 3 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 2 4 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 2 5 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 3 0 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 3 1 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 3 2 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 3 3 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 3 4 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 3 5 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 4 0 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 4 1 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 4 2 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 4 3 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 4 4 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 4 5 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 5 0 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 5 1 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 5 2 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 5 3 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 5 4 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## FALSE 5 5 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## TRUE 0 0 NaN NaN NaN NaN NaN
## TRUE 0 1 0.3298594 0.9912368 0.8179129 0.9324550 0.9175106
## TRUE 0 2 0.2248097 0.9944869 0.8230938 0.9433511 0.9306576
## TRUE 0 3 0.2281915 0.9939013 0.8154503 0.9279649 0.9115387
## TRUE 0 4 0.2796767 0.9934008 0.8090756 0.9105699 0.8898588
## TRUE 0 5 0.3450491 0.9912329 0.7884345 0.8996510 0.8761059
## TRUE 1 0 NaN NaN NaN NaN NaN
## TRUE 1 1 0.3298594 0.9912368 0.8179129 0.9324550 0.9175106
## TRUE 1 2 0.2248097 0.9944869 0.8230938 0.9433511 0.9306576
## TRUE 1 3 0.2281915 0.9939013 0.8154503 0.9279649 0.9115387
## TRUE 1 4 0.2796767 0.9934008 0.8090756 0.9105699 0.8898588
## TRUE 1 5 0.3450491 0.9912329 0.7884345 0.8996510 0.8761059
## TRUE 2 0 NaN NaN NaN NaN NaN
## TRUE 2 1 0.3298594 0.9912368 0.8179129 0.9324550 0.9175106
## TRUE 2 2 0.2248097 0.9944869 0.8230938 0.9433511 0.9306576
## TRUE 2 3 0.2281915 0.9939013 0.8154503 0.9279649 0.9115387
## TRUE 2 4 0.2796767 0.9934008 0.8090756 0.9105699 0.8898588
## TRUE 2 5 0.3450491 0.9912329 0.7884345 0.8996510 0.8761059
## TRUE 3 0 NaN NaN NaN NaN NaN
## TRUE 3 1 0.3298594 0.9912368 0.8179129 0.9324550 0.9175106
## TRUE 3 2 0.2248097 0.9944869 0.8230938 0.9433511 0.9306576
## TRUE 3 3 0.2281915 0.9939013 0.8154503 0.9279649 0.9115387
## TRUE 3 4 0.2796767 0.9934008 0.8090756 0.9105699 0.8898588
## TRUE 3 5 0.3450491 0.9912329 0.7884345 0.8996510 0.8761059
## TRUE 4 0 NaN NaN NaN NaN NaN
## TRUE 4 1 0.3298594 0.9912368 0.8179129 0.9324550 0.9175106
## TRUE 4 2 0.2248097 0.9944869 0.8230938 0.9433511 0.9306576
## TRUE 4 3 0.2281915 0.9939013 0.8154503 0.9279649 0.9115387
## TRUE 4 4 0.2796767 0.9934008 0.8090756 0.9105699 0.8898588
## TRUE 4 5 0.3450491 0.9912329 0.7884345 0.8996510 0.8761059
## TRUE 5 0 NaN NaN NaN NaN NaN
## TRUE 5 1 0.3298594 0.9912368 0.8179129 0.9324550 0.9175106
## TRUE 5 2 0.2248097 0.9944869 0.8230938 0.9433511 0.9306576
## TRUE 5 3 0.2281915 0.9939013 0.8154503 0.9279649 0.9115387
## TRUE 5 4 0.2796767 0.9934008 0.8090756 0.9105699 0.8898588
## TRUE 5 5 0.3450491 0.9912329 0.7884345 0.8996510 0.8761059
## Mean_F1 Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## 0.9323104 0.9333995 0.9940783 0.9403178
## NaN NaN NaN NaN
## 0.9019562 0.9045924 0.9914315 0.9139538
## 0.9086546 0.9103373 0.9927390 0.9298769
## 0.8917832 0.8824691 0.9905308 0.9342747
## 0.8024536 0.8576808 0.9880977 0.8806403
## NaN 0.8437037 0.9864323 NaN
## NaN NaN NaN NaN
## 0.9019562 0.9045924 0.9914315 0.9139538
## 0.9086546 0.9103373 0.9927390 0.9298769
## 0.8917832 0.8824691 0.9905308 0.9342747
## 0.8024536 0.8576808 0.9880977 0.8806403
## NaN 0.8437037 0.9864323 NaN
## NaN NaN NaN NaN
## 0.9019562 0.9045924 0.9914315 0.9139538
## 0.9086546 0.9103373 0.9927390 0.9298769
## 0.8917832 0.8824691 0.9905308 0.9342747
## 0.8024536 0.8576808 0.9880977 0.8806403
## NaN 0.8437037 0.9864323 NaN
## NaN NaN NaN NaN
## 0.9019562 0.9045924 0.9914315 0.9139538
## 0.9086546 0.9103373 0.9927390 0.9298769
## 0.8917832 0.8824691 0.9905308 0.9342747
## 0.8024536 0.8576808 0.9880977 0.8806403
## NaN 0.8437037 0.9864323 NaN
## NaN NaN NaN NaN
## 0.9019562 0.9045924 0.9914315 0.9139538
## 0.9086546 0.9103373 0.9927390 0.9298769
## 0.8917832 0.8824691 0.9905308 0.9342747
## 0.8024536 0.8576808 0.9880977 0.8806403
## NaN 0.8437037 0.9864323 NaN
## NaN NaN NaN NaN
## 0.9019562 0.9045924 0.9914315 0.9139538
## 0.9086546 0.9103373 0.9927390 0.9298769
## 0.8917832 0.8824691 0.9905308 0.9342747
## 0.8024536 0.8576808 0.9880977 0.8806403
## NaN 0.8437037 0.9864323 NaN
## Mean_Neg_Pred_Value Mean_Precision Mean_Recall Mean_Detection_Rate
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## 0.9943610 0.9403178 0.9333995 0.10603525
## NaN NaN NaN NaN
## 0.9915533 0.9139538 0.9045924 0.10360611
## 0.9932630 0.9298769 0.9103373 0.10481678
## 0.9917206 0.9342747 0.8824691 0.10310721
## 0.9898684 0.8806403 0.8576808 0.10117444
## 0.9885898 NaN 0.8437037 0.09996123
## NaN NaN NaN NaN
## 0.9915533 0.9139538 0.9045924 0.10360611
## 0.9932630 0.9298769 0.9103373 0.10481678
## 0.9917206 0.9342747 0.8824691 0.10310721
## 0.9898684 0.8806403 0.8576808 0.10117444
## 0.9885898 NaN 0.8437037 0.09996123
## NaN NaN NaN NaN
## 0.9915533 0.9139538 0.9045924 0.10360611
## 0.9932630 0.9298769 0.9103373 0.10481678
## 0.9917206 0.9342747 0.8824691 0.10310721
## 0.9898684 0.8806403 0.8576808 0.10117444
## 0.9885898 NaN 0.8437037 0.09996123
## NaN NaN NaN NaN
## 0.9915533 0.9139538 0.9045924 0.10360611
## 0.9932630 0.9298769 0.9103373 0.10481678
## 0.9917206 0.9342747 0.8824691 0.10310721
## 0.9898684 0.8806403 0.8576808 0.10117444
## 0.9885898 NaN 0.8437037 0.09996123
## NaN NaN NaN NaN
## 0.9915533 0.9139538 0.9045924 0.10360611
## 0.9932630 0.9298769 0.9103373 0.10481678
## 0.9917206 0.9342747 0.8824691 0.10310721
## 0.9898684 0.8806403 0.8576808 0.10117444
## 0.9885898 NaN 0.8437037 0.09996123
## NaN NaN NaN NaN
## 0.9915533 0.9139538 0.9045924 0.10360611
## 0.9932630 0.9298769 0.9103373 0.10481678
## 0.9917206 0.9342747 0.8824691 0.10310721
## 0.9898684 0.8806403 0.8576808 0.10117444
## 0.9885898 NaN 0.8437037 0.09996123
## Mean_Balanced_Accuracy
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## 0.9637389
## NaN
## 0.9480119
## 0.9515382
## 0.9365000
## 0.9228893
## 0.9150680
## NaN
## 0.9480119
## 0.9515382
## 0.9365000
## 0.9228893
## 0.9150680
## NaN
## 0.9480119
## 0.9515382
## 0.9365000
## 0.9228893
## 0.9150680
## NaN
## 0.9480119
## 0.9515382
## 0.9365000
## 0.9228893
## 0.9150680
## NaN
## 0.9480119
## 0.9515382
## 0.9365000
## 0.9228893
## 0.9150680
## NaN
## 0.9480119
## 0.9515382
## 0.9365000
## 0.9228893
## 0.9150680
##
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were fL = 0, usekernel = FALSE and adjust
## = 0.
head(nb_fit$results[order(-nb_fit$results$Kappa),],10)
## usekernel fL adjust logLoss AUC prAUC Accuracy Kappa
## 1 FALSE 0 0 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 2 FALSE 0 1 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 3 FALSE 0 2 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 4 FALSE 0 3 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 5 FALSE 0 4 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 6 FALSE 0 5 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 7 FALSE 1 0 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 8 FALSE 1 1 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 9 FALSE 1 2 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 10 FALSE 1 3 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## Mean_F1 Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value
## 1 0.9323104 0.9333995 0.9940783 0.9403178
## 2 0.9323104 0.9333995 0.9940783 0.9403178
## 3 0.9323104 0.9333995 0.9940783 0.9403178
## 4 0.9323104 0.9333995 0.9940783 0.9403178
## 5 0.9323104 0.9333995 0.9940783 0.9403178
## 6 0.9323104 0.9333995 0.9940783 0.9403178
## 7 0.9323104 0.9333995 0.9940783 0.9403178
## 8 0.9323104 0.9333995 0.9940783 0.9403178
## 9 0.9323104 0.9333995 0.9940783 0.9403178
## 10 0.9323104 0.9333995 0.9940783 0.9403178
## Mean_Neg_Pred_Value Mean_Precision Mean_Recall Mean_Detection_Rate
## 1 0.994361 0.9403178 0.9333995 0.1060352
## 2 0.994361 0.9403178 0.9333995 0.1060352
## 3 0.994361 0.9403178 0.9333995 0.1060352
## 4 0.994361 0.9403178 0.9333995 0.1060352
## 5 0.994361 0.9403178 0.9333995 0.1060352
## 6 0.994361 0.9403178 0.9333995 0.1060352
## 7 0.994361 0.9403178 0.9333995 0.1060352
## 8 0.994361 0.9403178 0.9333995 0.1060352
## 9 0.994361 0.9403178 0.9333995 0.1060352
## 10 0.994361 0.9403178 0.9333995 0.1060352
## Mean_Balanced_Accuracy logLossSD AUCSD prAUCSD AccuracySD
## 1 0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 2 0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 3 0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 4 0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 5 0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 6 0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 7 0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 8 0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 9 0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 10 0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## KappaSD Mean_F1SD Mean_SensitivitySD Mean_SpecificitySD
## 1 0.02334615 0.03271749 0.03768198 0.002531668
## 2 0.02334615 0.03271749 0.03768198 0.002531668
## 3 0.02334615 0.03271749 0.03768198 0.002531668
## 4 0.02334615 0.03271749 0.03768198 0.002531668
## 5 0.02334615 0.03271749 0.03768198 0.002531668
## 6 0.02334615 0.03271749 0.03768198 0.002531668
## 7 0.02334615 0.03271749 0.03768198 0.002531668
## 8 0.02334615 0.03271749 0.03768198 0.002531668
## 9 0.02334615 0.03271749 0.03768198 0.002531668
## 10 0.02334615 0.03271749 0.03768198 0.002531668
## Mean_Pos_Pred_ValueSD Mean_Neg_Pred_ValueSD Mean_PrecisionSD Mean_RecallSD
## 1 0.03027377 0.00225509 0.03027377 0.03768198
## 2 0.03027377 0.00225509 0.03027377 0.03768198
## 3 0.03027377 0.00225509 0.03027377 0.03768198
## 4 0.03027377 0.00225509 0.03027377 0.03768198
## 5 0.03027377 0.00225509 0.03027377 0.03768198
## 6 0.03027377 0.00225509 0.03027377 0.03768198
## 7 0.03027377 0.00225509 0.03027377 0.03768198
## 8 0.03027377 0.00225509 0.03027377 0.03768198
## 9 0.03027377 0.00225509 0.03027377 0.03768198
## 10 0.03027377 0.00225509 0.03027377 0.03768198
## Mean_Detection_RateSD Mean_Balanced_AccuracySD
## 1 0.002107332 0.02003125
## 2 0.002107332 0.02003125
## 3 0.002107332 0.02003125
## 4 0.002107332 0.02003125
## 5 0.002107332 0.02003125
## 6 0.002107332 0.02003125
## 7 0.002107332 0.02003125
## 8 0.002107332 0.02003125
## 9 0.002107332 0.02003125
## 10 0.002107332 0.02003125
Best model found:
nb_fit$bestTune
## fL usekernel adjust
## 1 0 FALSE 0
Parameters of the best model: fL = 0, usekernel = FALSE, adjust = 0.
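Equivalent models (with usekernel = FALSE, all 36 combinations of fL and adjust give identical results: fL is a Laplace correction that only affects categorical predictors, and adjust only scales the kernel bandwidth):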
datatable(head(nb_fit$results[order(-nb_fit$results$Kappa),],10),options = list(scrollX = TRUE,dom = 'ltip',ordering=F))
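The NaN rows in the tuning output (usekernel = TRUE, adjust = 0) are consistent with how the kernel bandwidth is formed. With usekernel = TRUE the class-conditional densities are kernel density estimates, and adjust multiplies the automatically chosen bandwidth. Assuming the estimate is computed with R's `density()` (as klaR's NaiveBayes documentation describes for usekernel), adjust = 0 means a zero bandwidth, which `density()` rejects, so those candidates cannot be evaluated. A quick base-R check on simulated data:

```r
set.seed(1)
x <- rnorm(100)

bw_default <- density(x)$bw               # bandwidth from the default rule (bw.nrd0)
bw_double  <- density(x, adjust = 2)$bw   # adjust multiplies that bandwidth
all.equal(bw_double, 2 * bw_default)      # TRUE

# adjust = 0 would mean a zero bandwidth, which density() refuses with an error
res <- tryCatch(density(x, adjust = 0), error = function(e) conditionMessage(e))
res
```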
Classification results on the training data:
confusionMatrix(predict(nb_fit),olive.train$area,mode = "everything")
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 245
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 246
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 252
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 253
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 38 0 0 0
## Coast.Sardinia 0 25 0 0
## East.Liguria 0 0 41 0
## Inland.Sardinia 0 0 0 55
## North.Apulia 0 0 0 0
## Sicily 1 0 0 0
## South.Apulia 2 0 0 0
## Umbria 0 0 0 0
## West.Liguria 0 0 0 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 2 3 1 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 1 0
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 20 6 0 0 0
## Sicily 0 21 2 0 0
## South.Apulia 0 3 159 0 0
## Umbria 0 0 0 39 0
## West.Liguria 0 0 0 0 39
##
## Overall Statistics
##
## Accuracy : 0.9541
## 95% CI : (0.9308, 0.9714)
## No Information Rate : 0.3537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9439
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 0.92683 1.00000 1.00000
## Specificity 0.98561 1.00000 0.99760
## Pos Pred Value 0.86364 1.00000 0.97619
## Neg Pred Value 0.99275 1.00000 1.00000
## Precision 0.86364 1.00000 0.97619
## Recall 0.92683 1.00000 1.00000
## F1 0.89412 1.00000 0.98795
## Prevalence 0.08952 0.05459 0.08952
## Detection Rate 0.08297 0.05459 0.08952
## Detection Prevalence 0.09607 0.05459 0.09170
## Balanced Accuracy 0.95622 1.00000 0.99880
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.0000 0.90909 0.63636
## Specificity 1.0000 0.98624 0.99294
## Pos Pred Value 1.0000 0.76923 0.87500
## Neg Pred Value 1.0000 0.99537 0.97235
## Precision 1.0000 0.76923 0.87500
## Recall 1.0000 0.90909 0.63636
## F1 1.0000 0.83333 0.73684
## Prevalence 0.1201 0.04803 0.07205
## Detection Rate 0.1201 0.04367 0.04585
## Detection Prevalence 0.1201 0.05677 0.05240
## Balanced Accuracy 1.0000 0.94766 0.81465
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 0.9815 0.97500 1.00000
## Specificity 0.9831 1.00000 1.00000
## Pos Pred Value 0.9695 1.00000 1.00000
## Neg Pred Value 0.9898 0.99761 1.00000
## Precision 0.9695 1.00000 1.00000
## Recall 0.9815 0.97500 1.00000
## F1 0.9755 0.98734 1.00000
## Prevalence 0.3537 0.08734 0.08515
## Detection Rate 0.3472 0.08515 0.08515
## Detection Prevalence 0.3581 0.08515 0.08515
## Balanced Accuracy 0.9823 0.98750 1.00000
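The overall statistics above can be recomputed from the confusion matrix. As far as we know, caret's confusionMatrix computes the 95% CI of the accuracy with an exact binomial test (binom.test). It computes P-Value [Acc > NIR] with a one-sided binomial test against the No Information Rate, which is the share of the largest reference class. The sketch below uses the naive Bayes training results: 437 of the 458 observations lie on the diagonal, and South.Apulia (162 observations) is the largest class.

```r
# Naive Bayes, training data: correct predictions, sample size, largest class size
correct <- 437
n       <- 458
nir     <- 162 / n                      # No Information Rate (share of South.Apulia)

accuracy <- correct / n
ci       <- binom.test(correct, n)$conf.int   # exact (Clopper-Pearson) 95% interval
p_value  <- binom.test(correct, n, p = nir, alternative = "greater")$p.value

round(c(accuracy = accuracy, lower = ci[1], upper = ci[2], nir = nir), 4)
p_value
```

If the assumption about caret holds, these numbers should agree with the printed Accuracy, 95% CI and No Information Rate.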
On the training data, Kappa = 0.9439. On the test data:
confusionMatrix(predict(nb_fit, olive.test.scale),olive.test$area,mode = "everything")
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 5
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 31
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 14 0 0 0
## Coast.Sardinia 0 8 0 0
## East.Liguria 0 0 6 0
## Inland.Sardinia 0 0 0 10
## North.Apulia 0 0 0 0
## Sicily 1 0 0 0
## South.Apulia 0 0 0 0
## Umbria 0 0 1 0
## West.Liguria 0 0 2 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 0 1 3 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 0 1
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 3 0 0 0 0
## Sicily 0 2 2 0 0
## South.Apulia 0 0 39 0 0
## Umbria 0 0 0 11 0
## West.Liguria 0 0 0 0 10
##
## Overall Statistics
##
## Accuracy : 0.9035
## 95% CI : (0.8339, 0.9508)
## No Information Rate : 0.386
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8805
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 0.9333 1.00000 0.66667
## Specificity 0.9596 1.00000 0.99048
## Pos Pred Value 0.7778 1.00000 0.85714
## Neg Pred Value 0.9896 1.00000 0.97196
## Precision 0.7778 1.00000 0.85714
## Recall 0.9333 1.00000 0.66667
## F1 0.8485 1.00000 0.75000
## Prevalence 0.1316 0.07018 0.07895
## Detection Rate 0.1228 0.07018 0.05263
## Detection Prevalence 0.1579 0.07018 0.06140
## Balanced Accuracy 0.9465 1.00000 0.82857
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.00000 1.00000 0.66667
## Specificity 1.00000 1.00000 0.97297
## Pos Pred Value 1.00000 1.00000 0.40000
## Neg Pred Value 1.00000 1.00000 0.99083
## Precision 1.00000 1.00000 0.40000
## Recall 1.00000 1.00000 0.66667
## F1 1.00000 1.00000 0.50000
## Prevalence 0.08772 0.02632 0.02632
## Detection Rate 0.08772 0.02632 0.01754
## Detection Prevalence 0.08772 0.02632 0.04386
## Balanced Accuracy 1.00000 1.00000 0.81982
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 0.8864 1.00000 0.90909
## Specificity 1.0000 0.99029 0.98058
## Pos Pred Value 1.0000 0.91667 0.83333
## Neg Pred Value 0.9333 1.00000 0.99020
## Precision 1.0000 0.91667 0.83333
## Recall 0.8864 1.00000 0.90909
## F1 0.9398 0.95652 0.86957
## Prevalence 0.3860 0.09649 0.09649
## Detection Rate 0.3421 0.09649 0.08772
## Detection Prevalence 0.3421 0.10526 0.10526
## Balanced Accuracy 0.9432 0.99515 0.94484
On the test data, Kappa = 0.8805.
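Cohen's kappa, the metric used for model selection throughout, compares two quantities. The first is the observed agreement p_o (the accuracy). The second is the agreement p_e expected by chance from the row and column totals. Kappa is then (p_o - p_e) / (1 - p_e). The base R sketch below recomputes the value above from the naive Bayes test confusion matrix, with rows as predictions and columns as reference classes in the printed order. The function name kappa_from_table is hypothetical.

```r
# Hypothetical helper: Cohen's kappa from a square confusion matrix
kappa_from_table <- function(tab) {
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                      # observed agreement (accuracy)
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Naive Bayes test-set confusion matrix copied from the output above
classes <- c("Calabria", "Coast.Sardinia", "East.Liguria", "Inland.Sardinia",
             "North.Apulia", "Sicily", "South.Apulia", "Umbria", "West.Liguria")
tab <- matrix(c(
  14, 0, 0,  0, 0, 1,  3,  0,  0,
   0, 8, 0,  0, 0, 0,  0,  0,  0,
   0, 0, 6,  0, 0, 0,  0,  0,  1,
   0, 0, 0, 10, 0, 0,  0,  0,  0,
   0, 0, 0,  0, 3, 0,  0,  0,  0,
   1, 0, 0,  0, 0, 2,  2,  0,  0,
   0, 0, 0,  0, 0, 0, 39,  0,  0,
   0, 0, 1,  0, 0, 0,  0, 11,  0,
   0, 0, 2,  0, 0, 0,  0,  0, 10),
  nrow = 9, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes))

round(kappa_from_table(tab), 4)   # 0.8805
```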
The output also contains warnings of the form Numerical 0 probability for all classes with observation 5. This is not an error but a warning: for that observation, the estimated density is numerically zero for every class. According to the author, it indicates that the data have not been prepared well enough: they contain outliers and call for a Box-Cox or Yeo-Johnson transformation.
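The warning can be reproduced in miniature. Multiplying several very small normal densities underflows to exactly 0 in double precision, while the sum of the log densities stays finite. The numbers below are made up for illustration: an observation 20 standard deviations from the class mean in each of 8 predictors, and two hypothetical classes. They are not taken from the olive data.

```r
z <- rep(20, 8)              # assumed: 20 SDs from the class mean in all 8 predictors
prod(dnorm(z))               # 0: the product of densities underflows
sum(dnorm(z, log = TRUE))    # about -1607.4: the log scale stays finite

# Two hypothetical classes: normalising on the log scale still yields posteriors
logdens   <- c(classA = sum(dnorm(rep(20, 8), log = TRUE)),
               classB = sum(dnorm(rep(21, 8), log = TRUE)))
posterior <- exp(logdens - max(logdens))
posterior <- posterior / sum(posterior)
posterior                    # classA is close to 1
```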
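library(psych)
# Scatterplot matrix (histograms, scatterplots, correlations) of the standardized
# fatty acid contents; column 1 (area) is excluded
pairs.panels(scale(olive[,-1]))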
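Linolenic and arachidic acid contain zero values (see the summary at the beginning). The Box-Cox transformation requires strictly positive values, so it cannot be applied to them directly. Yeo-Johnson also handles zero and negative values. In caret the transformation can be requested with preProcess(..., method = "YeoJohnson"). The base R sketch below only illustrates the Yeo-Johnson formula for a fixed lambda. The function name yeo_johnson is hypothetical, and in practice lambda is estimated from the data, for example by maximum likelihood.

```r
# Hypothetical helper: Yeo-Johnson transformation for a fixed lambda
yeo_johnson <- function(x, lambda) {
  out <- numeric(length(x))
  pos <- x >= 0
  # branch for x >= 0
  if (abs(lambda) > 1e-12) {
    out[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log1p(x[pos])
  }
  # branch for x < 0
  if (abs(lambda - 2) > 1e-12) {
    out[!pos] <- -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[!pos] <- -log1p(-x[!pos])
  }
  out
}

yeo_johnson(c(0, 0.5, 3), lambda = 0)   # log(1 + x): zeros stay zero
yeo_johnson(c(-2, 0, 3), lambda = 1)    # lambda = 1 is the identity
```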
Variable importance in naive Bayes classification
plot(varImp(nb_fit))
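As far as we know, caret has no model-specific importance measure for naive Bayes (or for the SVM models below). In that case varImp falls back to the model-independent filter measure of filterVarImp. For a multi-class outcome, the area under the ROC curve of each predictor is computed for every pair of classes. The importance of a predictor for a class is the largest AUC among the pairs that involve that class. This would explain why a separate importance is shown for every class. The base R sketch below is a simplified reimplementation of that idea with hypothetical function names. It is not caret's code, and it does not rescale the values to 0-100 as varImp does by default.

```r
# Hypothetical helper: AUC for separating class a from class b with predictor x
# (Mann-Whitney form, ties count 1/2); the direction does not matter.
pair_auc <- function(x, y, a, b) {
  xa <- x[y == a]
  xb <- x[y == b]
  auc <- mean(outer(xa, xb, ">") + 0.5 * outer(xa, xb, "=="))
  max(auc, 1 - auc)
}

# Hypothetical helper: importance of x for each class = largest pairwise AUC
class_importance <- function(x, y) {
  y  <- factor(y)
  lv <- levels(y)
  sapply(lv, function(a)
    max(sapply(setdiff(lv, a), function(b) pair_auc(x, y, a, b))))
}

# Usage on iris: one row per predictor, one column per class
t(sapply(iris[, 1:4], class_importance, y = iris$Species))
```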
The SVM method looks for a hyperplane that splits the feature space so that its distance from the nearest data points (the support vectors) is as large as possible. The support vectors are the data points lying closest to the class boundary.
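For a hyperplane w·x + b = 0, the distance of a point x_i from it is |w·x_i + b| / ||w||. The margin is the smallest such distance, and the points that attain it are the support vectors. The sketch below uses a hand-picked hyperplane and four toy points, both assumptions for illustration. An SVM would choose w and b so as to maximize this margin.

```r
w <- c(1, 1); b <- -3                       # assumed hyperplane x1 + x2 - 3 = 0
X <- rbind(c(1, 1), c(2, 2), c(0, 0), c(3, 3))
y <- c(-1, 1, -1, 1)                        # toy class labels

f      <- drop(X %*% w + b)                 # signed decision values
dist   <- abs(f) / sqrt(sum(w^2))           # distances to the hyperplane
all(sign(f) == y)                           # TRUE: every point on its own side
margin <- min(dist)                         # 1 / sqrt(2)
which(dist == min(dist))                    # rows 1 and 2 are the support vectors
2 / sqrt(sum(w^2))                          # width of the whole band (|f| = 1 at the SVs)
```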
When preparing the data, numeric features need to be scaled, and non-numeric features need to be converted into dummy variables. See e.g. http://www.sthda.com/english/articles/36-classification-methods-essentials/144-svm-model-support-vector-machine-essentials/.
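One point is easy to get wrong: the test data must be scaled with the means and standard deviations of the training data, not with its own. The sketch below uses small hypothetical data frames. It reuses the attributes stored by scale() and builds dummy variables with model.matrix(). caret's preProcess() and dummyVars() serve the same purpose.

```r
# Hypothetical training and test data with two numeric columns and one factor
train <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40),
                    colour = factor(c("red", "green", "red", "blue")))
test  <- data.frame(a = c(2, 5), b = c(15, 45),
                    colour = factor(c("blue", "red"), levels = levels(train$colour)))

num <- c("a", "b")
train_scaled <- scale(train[, num])                 # stores centre and scale
ctr <- attr(train_scaled, "scaled:center")
sds <- attr(train_scaled, "scaled:scale")
test_scaled  <- scale(test[, num], center = ctr, scale = sds)  # training parameters

# Indicator columns, one per factor level ("- 1" drops the intercept)
dummies_train <- model.matrix(~ colour - 1, data = train)
dummies_test  <- model.matrix(~ colour - 1, data = test)
```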
We use the caret package to tune the SVM classifier.
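# Parallel workers on all cores but one (packages parallel and doParallel)
Mycluster = makeCluster(detectCores()-1)
registerDoParallel(Mycluster)
system.time({
# 5-fold cross-validation
train.control <- trainControl(method = "cv",number = 5)
# Reproduce the pre-R 3.6.0 sampling (R warns about the non-uniform sampler)
RNGkind(sample.kind = "Rounding")
set.seed(123)
# Linear-kernel SVM tuned over C = 0, ..., 20 by Kappa. C = 0 is not a valid cost
# (it must be positive), which is why the C = 0 row below shows NaN.
svm.linear <- train(area~.,data=olive.train.scale, method = "svmLinear", trControl = train.control,tuneGrid = expand.grid(C=0:20),metric = "Kappa")
})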
## user system elapsed
## 0.81 0.00 4.81
stopCluster(Mycluster)
registerDoSEQ()
plot(svm.linear)
svm.linear
## Support Vector Machines with Linear Kernel
##
## 458 samples
## 8 predictor
## 9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 367, 365, 368, 367, 365
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0 NaN NaN
## 1 0.9606987 0.9519006
## 2 0.9564922 0.9468324
## 3 0.9564678 0.9469004
## 4 0.9520477 0.9415478
## 5 0.9455016 0.9335914
## 6 0.9477238 0.9362954
## 7 0.9499216 0.9389722
## 8 0.9477238 0.9362835
## 9 0.9477238 0.9362835
## 10 0.9477238 0.9362835
## 11 0.9477711 0.9363043
## 12 0.9477711 0.9363043
## 13 0.9477711 0.9363043
## 14 0.9455733 0.9336653
## 15 0.9455733 0.9336653
## 16 0.9455733 0.9336653
## 17 0.9433511 0.9309295
## 18 0.9433511 0.9309295
## 19 0.9433511 0.9309295
## 20 0.9433511 0.9309295
##
## Kappa was used to select the optimal model using the largest value.
## The final value used for the model was C = 1.
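The line Summary of sample sizes lists the number of training observations in each of the five cross-validation iterations. The remaining observations out of 458 form the held-out fold on which Accuracy and Kappa are computed. caret's createFolds() stratifies the folds by class. The sketch below is a simplified stratified assignment in base R. The function name stratified_folds is hypothetical, and it will not reproduce caret's exact folds.

```r
train_sizes <- c(367, 365, 368, 367, 365)
458 - train_sizes              # held-out fold sizes: 91 93 90 91 93
sum(458 - train_sizes)         # 458: every observation is held out exactly once

# Hypothetical helper: within each class, fold labels 1..k are dealt out in turn
stratified_folds <- function(y, k = 5) {
  fold <- integer(length(y))
  for (cl in unique(y)) {
    idx <- which(y == cl)
    fold[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  fold
}

set.seed(1)
folds <- stratified_folds(iris$Species, k = 5)
table(iris$Species, folds)     # each class is spread evenly over the 5 folds
```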
head(svm.linear$results[order(-svm.linear$results$Kappa),],20)
## C Accuracy Kappa AccuracySD KappaSD
## 2 1 0.9606987 0.9519006 0.02632914 0.03221578
## 4 3 0.9564678 0.9469004 0.01495887 0.01832932
## 3 2 0.9564922 0.9468324 0.02406177 0.02944283
## 5 4 0.9520477 0.9415478 0.01201418 0.01475265
## 8 7 0.9499216 0.9389722 0.01941548 0.02380792
## 12 11 0.9477711 0.9363043 0.01902859 0.02331069
## 13 12 0.9477711 0.9363043 0.01902859 0.02331069
## 14 13 0.9477711 0.9363043 0.01902859 0.02331069
## 7 6 0.9477238 0.9362954 0.02068460 0.02536288
## 9 8 0.9477238 0.9362835 0.02068460 0.02537271
## 10 9 0.9477238 0.9362835 0.02068460 0.02537271
## 11 10 0.9477238 0.9362835 0.02068460 0.02537271
## 15 14 0.9455733 0.9336653 0.02002904 0.02454191
## 16 15 0.9455733 0.9336653 0.02002904 0.02454191
## 17 16 0.9455733 0.9336653 0.02002904 0.02454191
## 6 5 0.9455016 0.9335914 0.01689904 0.02078612
## 18 17 0.9433511 0.9309295 0.01571433 0.01926276
## 19 18 0.9433511 0.9309295 0.01571433 0.01926276
## 20 19 0.9433511 0.9309295 0.01571433 0.01926276
## 21 20 0.9433511 0.9309295 0.01571433 0.01926276
Best model found:
svm.linear$bestTune
## C
## 2 1
Parameters of the best model: C = 1.
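C is the cost parameter of the soft-margin SVM. The model minimizes (1/2)||w||^2 + C Σ max(0, 1 - y_i(w·x_i + b)). A larger C penalizes points inside the margin or on the wrong side more heavily and leads to a narrower margin. A smaller C tolerates more violations. The sketch below only evaluates this objective for a fixed hyperplane and toy data, all of which are assumptions for illustration. As far as we know, caret's svmLinear delegates the actual optimization to kernlab. The function name svm_objective is hypothetical.

```r
# Hypothetical helper: soft-margin objective for labels y in {-1, 1}
svm_objective <- function(w, b, X, y, C) {
  hinge <- pmax(0, 1 - y * drop(X %*% w + b))   # margin violations
  0.5 * sum(w^2) + C * sum(hinge)
}

X <- rbind(c(1, 1), c(2, 2), c(1.4, 1.4))
y <- c(-1, 1, -1)                 # the third point lies inside the margin
w <- c(1, 1); b <- -3
svm_objective(w, b, X, y, C = 1)  # 1 + 1 * 0.8 = 1.8
svm_objective(w, b, X, y, C = 10) # 1 + 10 * 0.8 = 9
```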
Classification results on the training data:
confusionMatrix(predict(svm.linear),olive.train$area,mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 38 0 0 0
## Coast.Sardinia 0 24 0 0
## East.Liguria 0 0 41 0
## Inland.Sardinia 0 1 0 55
## North.Apulia 0 0 0 0
## Sicily 2 0 0 0
## South.Apulia 1 0 0 0
## Umbria 0 0 0 0
## West.Liguria 0 0 0 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 0 2 0 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 0 0
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 22 2 0 0 0
## Sicily 0 27 2 0 0
## South.Apulia 0 2 160 0 0
## Umbria 0 0 0 40 0
## West.Liguria 0 0 0 0 39
##
## Overall Statistics
##
## Accuracy : 0.9738
## 95% CI : (0.9547, 0.9864)
## No Information Rate : 0.3537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.968
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 0.92683 0.96000 1.00000
## Specificity 0.99520 1.00000 1.00000
## Pos Pred Value 0.95000 1.00000 1.00000
## Neg Pred Value 0.99282 0.99770 1.00000
## Precision 0.95000 1.00000 1.00000
## Recall 0.92683 0.96000 1.00000
## F1 0.93827 0.97959 1.00000
## Prevalence 0.08952 0.05459 0.08952
## Detection Rate 0.08297 0.05240 0.08952
## Detection Prevalence 0.08734 0.05240 0.08952
## Balanced Accuracy 0.96102 0.98000 1.00000
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.0000 1.00000 0.81818
## Specificity 0.9975 0.99541 0.99059
## Pos Pred Value 0.9821 0.91667 0.87097
## Neg Pred Value 1.0000 1.00000 0.98595
## Precision 0.9821 0.91667 0.87097
## Recall 1.0000 1.00000 0.81818
## F1 0.9910 0.95652 0.84375
## Prevalence 0.1201 0.04803 0.07205
## Detection Rate 0.1201 0.04803 0.05895
## Detection Prevalence 0.1223 0.05240 0.06769
## Balanced Accuracy 0.9988 0.99771 0.90439
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 0.9877 1.00000 1.00000
## Specificity 0.9899 1.00000 1.00000
## Pos Pred Value 0.9816 1.00000 1.00000
## Neg Pred Value 0.9932 1.00000 1.00000
## Precision 0.9816 1.00000 1.00000
## Recall 0.9877 1.00000 1.00000
## F1 0.9846 1.00000 1.00000
## Prevalence 0.3537 0.08734 0.08515
## Detection Rate 0.3493 0.08734 0.08515
## Detection Prevalence 0.3559 0.08734 0.08515
## Balanced Accuracy 0.9888 1.00000 1.00000
On the training data, Kappa = 0.968. On the test data:
confusionMatrix(predict(svm.linear, olive.test.scale),olive.test$area,mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 14 0 0 0
## Coast.Sardinia 0 8 0 0
## East.Liguria 0 0 6 0
## Inland.Sardinia 0 0 0 10
## North.Apulia 0 0 0 0
## Sicily 1 0 0 0
## South.Apulia 0 0 0 0
## Umbria 0 0 1 0
## West.Liguria 0 0 2 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 0 0 2 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 0 0
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 3 0 0 0 0
## Sicily 0 3 2 0 0
## South.Apulia 0 0 40 0 0
## Umbria 0 0 0 11 0
## West.Liguria 0 0 0 0 11
##
## Overall Statistics
##
## Accuracy : 0.9298
## 95% CI : (0.8664, 0.9692)
## No Information Rate : 0.386
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9129
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 0.9333 1.00000 0.66667
## Specificity 0.9798 1.00000 1.00000
## Pos Pred Value 0.8750 1.00000 1.00000
## Neg Pred Value 0.9898 1.00000 0.97222
## Precision 0.8750 1.00000 1.00000
## Recall 0.9333 1.00000 0.66667
## F1 0.9032 1.00000 0.80000
## Prevalence 0.1316 0.07018 0.07895
## Detection Rate 0.1228 0.07018 0.05263
## Detection Prevalence 0.1404 0.07018 0.05263
## Balanced Accuracy 0.9566 1.00000 0.83333
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.00000 1.00000 1.00000
## Specificity 1.00000 1.00000 0.97297
## Pos Pred Value 1.00000 1.00000 0.50000
## Neg Pred Value 1.00000 1.00000 1.00000
## Precision 1.00000 1.00000 0.50000
## Recall 1.00000 1.00000 1.00000
## F1 1.00000 1.00000 0.66667
## Prevalence 0.08772 0.02632 0.02632
## Detection Rate 0.08772 0.02632 0.02632
## Detection Prevalence 0.08772 0.02632 0.05263
## Balanced Accuracy 1.00000 1.00000 0.98649
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 0.9091 1.00000 1.00000
## Specificity 1.0000 0.99029 0.98058
## Pos Pred Value 1.0000 0.91667 0.84615
## Neg Pred Value 0.9459 1.00000 1.00000
## Precision 1.0000 0.91667 0.84615
## Recall 0.9091 1.00000 1.00000
## F1 0.9524 0.95652 0.91667
## Prevalence 0.3860 0.09649 0.09649
## Detection Rate 0.3509 0.09649 0.09649
## Detection Prevalence 0.3509 0.10526 0.11404
## Balanced Accuracy 0.9545 0.99515 0.99029
On the test data, Kappa = 0.9129.
Variable importance in linear SVM classification
plot(varImp(svm.linear))
The importance of the attributes differs between classes.
Mycluster = makeCluster(detectCores()-1)
registerDoParallel(Mycluster)
system.time({
# Radial-basis SVM: random search over 20 (sigma, C) combinations, selected by Kappa
train.control <- trainControl(method = "cv",number = 5,search = "random")
RNGkind(sample.kind = "Rounding")
set.seed(123)
svm.radial <- train(area~.,data=olive.train.scale, method = "svmRadial", trControl = train.control, tuneLength = 20,metric = "Kappa")
})
## user system elapsed
## 0.95 0.02 6.32
stopCluster(Mycluster)
registerDoSEQ()
plot(svm.radial)
svm.radial
## Support Vector Machines with Radial Basis Function Kernel
##
## 458 samples
## 8 predictor
## 9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 367, 365, 368, 367, 365
## Resampling results across tuning parameters:
##
## sigma C Accuracy Kappa
## 0.005852913 0.05230441 0.3537059 0.0000000
## 0.010128835 760.90419287 0.9478412 0.9365860
## 0.012137441 47.60640178 0.9673638 0.9600616
## 0.015399114 39.83984930 0.9651889 0.9573897
## 0.020887845 39.25689606 0.9696089 0.9628821
## 0.022524452 0.11789589 0.6088810 0.4976433
## 0.028598399 0.10414713 0.6287361 0.5238913
## 0.030698472 2.70237697 0.9608405 0.9520295
## 0.071715001 3.72380596 0.9717594 0.9654090
## 0.075472517 147.95409522 0.9389783 0.9256451
## 0.079285182 951.56621613 0.9389783 0.9256451
## 0.088892668 0.76655335 0.9608405 0.9520295
## 0.180744543 508.53955572 0.9412478 0.9283146
## 0.231552023 0.43421604 0.9629911 0.9545952
## 0.284751577 0.07716565 0.7620765 0.6897584
## 0.337231496 36.08576317 0.9565623 0.9469792
## 0.492926619 46.00685001 0.9608634 0.9520715
## 0.678763927 183.03993937 0.9542211 0.9437007
## 0.726718585 0.06503385 0.4912805 0.2517811
## 1.780554660 0.29270054 0.6025483 0.4402274
##
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.071715 and C = 3.723806.
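With search = "random", caret draws tuneLength = 20 random parameter combinations instead of using a regular grid, which is why the sigma and C values above look irregular. Rows with Accuracy 0.3537 and Kappa 0 match what predicting the largest class for every observation would give. The very small C values in those rows presumably over-regularize the model. The sketch below shows random search on a log scale. The ranges (sigma between 0.005 and 2, C between 2^-5 and 2^10) are assumptions chosen to cover the values above, not necessarily caret's defaults.

```r
set.seed(123)
n_candidates <- 20
# Log-uniform draws: every order of magnitude is equally likely
candidates <- data.frame(
  sigma = exp(runif(n_candidates, log(0.005), log(2))),
  C     = 2^runif(n_candidates, -5, 10)
)
head(candidates)
```

Each candidate would then be evaluated by cross-validation, and the one with the largest Kappa kept.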
head(svm.radial$results[order(-svm.radial$results$Kappa),],5)
## sigma C Accuracy Kappa AccuracySD KappaSD
## 9 0.07171500 3.723806 0.9717594 0.9654090 0.01815230 0.02219872
## 5 0.02088784 39.256896 0.9696089 0.9628821 0.02078556 0.02535304
## 3 0.01213744 47.606402 0.9673638 0.9600616 0.02309314 0.02827360
## 4 0.01539911 39.839849 0.9651889 0.9573897 0.01934853 0.02366657
## 14 0.23155202 0.434216 0.9629911 0.9545952 0.01635930 0.02014991
Best model found:
svm.radial$bestTune
## sigma C
## 9 0.071715 3.723806
Parameters of the best model: sigma = 0.071715, C = 3.723806.
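kernlab's radial basis kernel, which caret's svmRadial uses, measures the similarity of two observations as k(x, x') = exp(-sigma ||x - x'||^2). A larger sigma makes the similarity decay faster with distance, which gives a more flexible decision boundary. The base R sketch below computes the kernel matrix for three toy points. The function name rbf_kernel is hypothetical.

```r
# Hypothetical helper: RBF kernel matrix between the rows of X and Z
rbf_kernel <- function(X, Z = X, sigma) {
  # squared Euclidean distances between all row pairs
  d2 <- outer(rowSums(X^2), rowSums(Z^2), "+") - 2 * X %*% t(Z)
  exp(-sigma * pmax(d2, 0))      # pmax guards against tiny negative rounding errors
}

X <- rbind(c(0, 0), c(1, 0), c(0, 2))
K <- rbf_kernel(X, sigma = 0.071715)   # sigma of the best model above
round(K, 4)
```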
Classification results on the training data:
confusionMatrix(predict(svm.radial),olive.train$area,mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 39 0 0 0
## Coast.Sardinia 0 24 0 0
## East.Liguria 0 0 41 0
## Inland.Sardinia 0 1 0 55
## North.Apulia 0 0 0 0
## Sicily 0 0 0 0
## South.Apulia 2 0 0 0
## Umbria 0 0 0 0
## West.Liguria 0 0 0 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 1 3 0 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 0 0
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 21 2 0 0 0
## Sicily 0 26 1 0 0
## South.Apulia 0 2 161 0 0
## Umbria 0 0 0 40 0
## West.Liguria 0 0 0 0 39
##
## Overall Statistics
##
## Accuracy : 0.9738
## 95% CI : (0.9547, 0.9864)
## No Information Rate : 0.3537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9679
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 0.95122 0.96000 1.00000
## Specificity 0.99041 1.00000 1.00000
## Pos Pred Value 0.90698 1.00000 1.00000
## Neg Pred Value 0.99518 0.99770 1.00000
## Precision 0.90698 1.00000 1.00000
## Recall 0.95122 0.96000 1.00000
## F1 0.92857 0.97959 1.00000
## Prevalence 0.08952 0.05459 0.08952
## Detection Rate 0.08515 0.05240 0.08952
## Detection Prevalence 0.09389 0.05240 0.08952
## Balanced Accuracy 0.97081 0.98000 1.00000
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.0000 0.95455 0.78788
## Specificity 0.9975 0.99541 0.99765
## Pos Pred Value 0.9821 0.91304 0.96296
## Neg Pred Value 1.0000 0.99770 0.98376
## Precision 0.9821 0.91304 0.96296
## Recall 1.0000 0.95455 0.78788
## F1 0.9910 0.93333 0.86667
## Prevalence 0.1201 0.04803 0.07205
## Detection Rate 0.1201 0.04585 0.05677
## Detection Prevalence 0.1223 0.05022 0.05895
## Balanced Accuracy 0.9988 0.97498 0.89276
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 0.9938 1.00000 1.00000
## Specificity 0.9865 1.00000 1.00000
## Pos Pred Value 0.9758 1.00000 1.00000
## Neg Pred Value 0.9966 1.00000 1.00000
## Precision 0.9758 1.00000 1.00000
## Recall 0.9938 1.00000 1.00000
## F1 0.9847 1.00000 1.00000
## Prevalence 0.3537 0.08734 0.08515
## Detection Rate 0.3515 0.08734 0.08515
## Detection Prevalence 0.3603 0.08734 0.08515
## Balanced Accuracy 0.9902 1.00000 1.00000
On the training data, Kappa = 0.9679. On the test data:
confusionMatrix(predict(svm.radial, olive.test.scale),olive.test$area,mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 15 0 0 0
## Coast.Sardinia 0 8 0 0
## East.Liguria 0 0 7 0
## Inland.Sardinia 0 0 0 10
## North.Apulia 0 0 0 0
## Sicily 0 0 0 0
## South.Apulia 0 0 0 0
## Umbria 0 0 1 0
## West.Liguria 0 0 1 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 0 1 1 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 0 2
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 3 0 0 0 0
## Sicily 0 2 2 0 0
## South.Apulia 0 0 41 0 0
## Umbria 0 0 0 11 0
## West.Liguria 0 0 0 0 9
##
## Overall Statistics
##
## Accuracy : 0.9298
## 95% CI : (0.8664, 0.9692)
## No Information Rate : 0.386
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9126
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 1.0000 1.00000 0.77778
## Specificity 0.9798 1.00000 0.98095
## Pos Pred Value 0.8824 1.00000 0.77778
## Neg Pred Value 1.0000 1.00000 0.98095
## Precision 0.8824 1.00000 0.77778
## Recall 1.0000 1.00000 0.77778
## F1 0.9375 1.00000 0.77778
## Prevalence 0.1316 0.07018 0.07895
## Detection Rate 0.1316 0.07018 0.06140
## Detection Prevalence 0.1491 0.07018 0.07895
## Balanced Accuracy 0.9899 1.00000 0.87937
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.00000 1.00000 0.66667
## Specificity 1.00000 1.00000 0.98198
## Pos Pred Value 1.00000 1.00000 0.50000
## Neg Pred Value 1.00000 1.00000 0.99091
## Precision 1.00000 1.00000 0.50000
## Recall 1.00000 1.00000 0.66667
## F1 1.00000 1.00000 0.57143
## Prevalence 0.08772 0.02632 0.02632
## Detection Rate 0.08772 0.02632 0.01754
## Detection Prevalence 0.08772 0.02632 0.03509
## Balanced Accuracy 1.00000 1.00000 0.82432
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 0.9318 1.00000 0.81818
## Specificity 1.0000 0.99029 0.99029
## Pos Pred Value 1.0000 0.91667 0.90000
## Neg Pred Value 0.9589 1.00000 0.98077
## Precision 1.0000 0.91667 0.90000
## Recall 0.9318 1.00000 0.81818
## F1 0.9647 0.95652 0.85714
## Prevalence 0.3860 0.09649 0.09649
## Detection Rate 0.3596 0.09649 0.07895
## Detection Prevalence 0.3596 0.10526 0.08772
## Balanced Accuracy 0.9659 0.99515 0.90424
On the test data, Kappa = 0.9126.
Variable importance in radial-kernel SVM classification
plot(varImp(svm.radial))
The importance of the attributes differs between classes.
Mycluster = makeCluster(detectCores()-1)
registerDoParallel(Mycluster)
system.time({
# Polynomial SVM: random search over 20 (degree, scale, C) combinations, selected by Kappa
train.control <- trainControl(method = "cv",number = 5,search = "random")
RNGkind(sample.kind = "Rounding")
set.seed(123)
svm.poly <- train(area~.,data=olive.train.scale, method = "svmPoly", trControl = train.control, tuneLength = 20,metric = "Kappa")
})
## user system elapsed
## 0.53 0.00 5.20
stopCluster(Mycluster)
registerDoSEQ()
plot(svm.poly)
svm.poly
## Support Vector Machines with Polynomial Kernel
##
## 458 samples
## 8 predictor
## 9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 367, 365, 368, 367, 365
## Resampling results across tuning parameters:
##
## degree scale C Accuracy Kappa
## 1 1.350448e-05 10.66062234 0.3537059 0.0000000
## 1 1.403411e-04 78.77238023 0.7925637 0.7413192
## 1 4.860568e-04 343.86069012 0.9542700 0.9440001
## 1 5.700872e-02 0.13231889 0.7490071 0.6843434
## 1 1.048617e-01 0.11768275 0.8056576 0.7578714
## 2 6.023569e-05 233.52217294 0.8799638 0.8516818
## 2 3.410707e-04 0.49640502 0.3537059 0.0000000
## 2 7.657911e-03 0.35245513 0.6616590 0.5683239
## 2 2.485111e-02 2.30675518 0.9543172 0.9440720
## 2 1.647391e-01 0.11098902 0.9586900 0.9493377
## 2 6.068960e-01 3.10157522 0.9261223 0.9101695
## 3 5.714618e-05 31.48794977 0.6484706 0.5515120
## 3 1.689878e-04 1.53362062 0.3537059 0.0000000
## 3 3.410443e-03 0.26756677 0.4759400 0.2733118
## 3 1.411122e-02 3.97081942 0.9565394 0.9467365
## 3 2.991687e-02 0.15247781 0.8604443 0.8266708
## 3 4.586162e-02 126.57710659 0.9412234 0.9286316
## 3 4.705130e-02 2.32655459 0.9673638 0.9600290
## 3 1.273563e+00 0.05032669 0.9347489 0.9203501
## 3 1.864893e+00 1.44661575 0.9304478 0.9151187
##
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were degree = 3, scale = 0.0470513 and C
## = 2.326555.
head(svm.poly$results[order(-svm.poly$results$Kappa),],5)
## degree scale C Accuracy Kappa AccuracySD KappaSD
## 18 3 0.0470513011 2.326555 0.9673638 0.9600290 0.02309314 0.02827340
## 10 2 0.1647390821 0.110989 0.9586900 0.9493377 0.02066948 0.02536111
## 15 3 0.0141112236 3.970819 0.9565394 0.9467365 0.02839641 0.03485378
## 9 2 0.0248511123 2.306755 0.9543172 0.9440720 0.02552114 0.03132162
## 3 1 0.0004860568 343.860690 0.9542700 0.9440001 0.02887019 0.03541881
Best model found:
svm.poly$bestTune
## degree scale C
## 18 3 0.0470513 2.326555
Parameters of the best model: degree = 3, scale = 0.0470513, C = 2.326555.
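The polynomial kernel in kernlab is k(x, x') = (scale · <x, x'> + offset)^degree. As far as we know, caret's svmPoly fixes offset = 1 and tunes degree, scale and C. The sketch below evaluates the kernel for two toy vectors, once with illustrative parameters and once with the best parameters found above. The function name poly_kernel is hypothetical.

```r
# Hypothetical helper: polynomial kernel for two numeric vectors
poly_kernel <- function(x, z, degree, scale, offset = 1) {
  (scale * sum(x * z) + offset)^degree
}

poly_kernel(c(1, 2), c(3, 4), degree = 2, scale = 0.5)        # (0.5 * 11 + 1)^2 = 42.25
poly_kernel(c(1, 2), c(3, 4), degree = 3, scale = 0.0470513)  # best model above
```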
Classification results on the training data:
confusionMatrix(predict(svm.poly),olive.train$area,mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 40 0 0 0
## Coast.Sardinia 0 24 0 0
## East.Liguria 0 0 41 0
## Inland.Sardinia 0 1 0 55
## North.Apulia 0 0 0 0
## Sicily 0 0 0 0
## South.Apulia 1 0 0 0
## Umbria 0 0 0 0
## West.Liguria 0 0 0 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 1 3 0 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 0 0
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 21 1 0 0 0
## Sicily 0 27 1 0 0
## South.Apulia 0 2 161 0 0
## Umbria 0 0 0 40 0
## West.Liguria 0 0 0 0 39
##
## Overall Statistics
##
## Accuracy : 0.9782
## 95% CI : (0.9602, 0.9895)
## No Information Rate : 0.3537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9733
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 0.97561 0.96000 1.00000
## Specificity 0.99041 1.00000 1.00000
## Pos Pred Value 0.90909 1.00000 1.00000
## Neg Pred Value 0.99758 0.99770 1.00000
## Precision 0.90909 1.00000 1.00000
## Recall 0.97561 0.96000 1.00000
## F1 0.94118 0.97959 1.00000
## Prevalence 0.08952 0.05459 0.08952
## Detection Rate 0.08734 0.05240 0.08952
## Detection Prevalence 0.09607 0.05240 0.08952
## Balanced Accuracy 0.98301 0.98000 1.00000
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.0000 0.95455 0.81818
## Specificity 0.9975 0.99771 0.99765
## Pos Pred Value 0.9821 0.95455 0.96429
## Neg Pred Value 1.0000 0.99771 0.98605
## Precision 0.9821 0.95455 0.96429
## Recall 1.0000 0.95455 0.81818
## F1 0.9910 0.95455 0.88525
## Prevalence 0.1201 0.04803 0.07205
## Detection Rate 0.1201 0.04585 0.05895
## Detection Prevalence 0.1223 0.04803 0.06114
## Balanced Accuracy 0.9988 0.97613 0.90791
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 0.9938 1.00000 1.00000
## Specificity 0.9899 1.00000 1.00000
## Pos Pred Value 0.9817 1.00000 1.00000
## Neg Pred Value 0.9966 1.00000 1.00000
## Precision 0.9817 1.00000 1.00000
## Recall 0.9938 1.00000 1.00000
## F1 0.9877 1.00000 1.00000
## Prevalence 0.3537 0.08734 0.08515
## Detection Rate 0.3515 0.08734 0.08515
## Detection Prevalence 0.3581 0.08734 0.08515
## Balanced Accuracy 0.9918 1.00000 1.00000
On the training data, Kappa = 0.9733. On the test data:
confusionMatrix(predict(svm.poly, olive.test.scale),olive.test$area,mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 15 0 0 0
## Coast.Sardinia 0 8 0 0
## East.Liguria 0 0 7 0
## Inland.Sardinia 0 0 0 10
## North.Apulia 0 0 0 0
## Sicily 0 0 0 0
## South.Apulia 0 0 0 0
## Umbria 0 0 1 0
## West.Liguria 0 0 1 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 0 1 1 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 0 0
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 3 0 0 0 0
## Sicily 0 2 2 0 0
## South.Apulia 0 0 41 0 0
## Umbria 0 0 0 11 0
## West.Liguria 0 0 0 0 11
##
## Overall Statistics
##
## Accuracy : 0.9474
## 95% CI : (0.889, 0.9804)
## No Information Rate : 0.386
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9344
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 1.0000 1.00000 0.77778
## Specificity 0.9798 1.00000 1.00000
## Pos Pred Value 0.8824 1.00000 1.00000
## Neg Pred Value 1.0000 1.00000 0.98131
## Precision 0.8824 1.00000 1.00000
## Recall 1.0000 1.00000 0.77778
## F1 0.9375 1.00000 0.87500
## Prevalence 0.1316 0.07018 0.07895
## Detection Rate 0.1316 0.07018 0.06140
## Detection Prevalence 0.1491 0.07018 0.06140
## Balanced Accuracy 0.9899 1.00000 0.88889
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.00000 1.00000 0.66667
## Specificity 1.00000 1.00000 0.98198
## Pos Pred Value 1.00000 1.00000 0.50000
## Neg Pred Value 1.00000 1.00000 0.99091
## Precision 1.00000 1.00000 0.50000
## Recall 1.00000 1.00000 0.66667
## F1 1.00000 1.00000 0.57143
## Prevalence 0.08772 0.02632 0.02632
## Detection Rate 0.08772 0.02632 0.01754
## Detection Prevalence 0.08772 0.02632 0.03509
## Balanced Accuracy 1.00000 1.00000 0.82432
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 0.9318 1.00000 1.00000
## Specificity 1.0000 0.99029 0.99029
## Pos Pred Value 1.0000 0.91667 0.91667
## Neg Pred Value 0.9589 1.00000 1.00000
## Precision 1.0000 0.91667 0.91667
## Recall 0.9318 1.00000 1.00000
## F1 0.9647 0.95652 0.95652
## Prevalence 0.3860 0.09649 0.09649
## Detection Rate 0.3596 0.09649 0.09649
## Detection Prevalence 0.3596 0.10526 0.10526
## Balanced Accuracy 0.9659 0.99515 0.99515
On the test data, Kappa = 0.9344.
Variable importance in polynomial-kernel SVM classification
plot(varImp(svm.poly))
The importance of the attributes differs between classes.
Mycluster = makeCluster(detectCores()-1)
registerDoParallel(Mycluster)
system.time({
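# Penalized multinomial logistic regression (nnet::multinom); decay = weight-decay penalty
tuneGrid_mnl <- expand.grid(decay = seq(0, 1, by = 0.1))
# classProbs + multiClassSummary add probability-based metrics (logLoss, AUC, prAUC, ...)
train.control <- trainControl(method = "cv",number = 5,search = "grid",
classProbs = TRUE,
summaryFunction = multiClassSummary)
RNGkind(sample.kind = "Rounding")
set.seed(123)
# Unlike the SVM models, this model is fitted on the unscaled data olive.train;
# trace = FALSE suppresses the optimizer output of multinom
fit.logit <- train(area~.,data=olive.train, method = "multinom", trControl = train.control,metric = "Kappa",tuneGrid = tuneGrid_mnl,trace = FALSE)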
})
## user system elapsed
## 0.53 0.01 4.64
stopCluster(Mycluster)
registerDoSEQ()
plot(fit.logit)
fit.logit
## Penalized Multinomial Regression
##
## 458 samples
## 8 predictor
## 9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 367, 365, 368, 367, 365
## Resampling results across tuning parameters:
##
## decay logLoss AUC prAUC Accuracy Kappa Mean_F1
## 0.0 0.6059458 0.9924479 0.7577432 0.9238300 0.9070667 0.8903739
## 0.1 0.2742515 0.9899711 0.7989435 0.9084927 0.8880737 0.8617584
## 0.2 0.3264671 0.9872370 0.7780617 0.8953515 0.8717804 0.8427825
## 0.3 0.3626803 0.9846453 0.7664523 0.8909315 0.8661399 0.8362198
## 0.4 0.3909264 0.9831358 0.7592052 0.8865587 0.8607683 0.8314279
## 0.5 0.4143118 0.9817613 0.7522944 0.8800599 0.8527959 0.8217217
## 0.6 0.4343926 0.9804581 0.7493196 0.8779093 0.8503018 0.8207851
## 0.7 0.4520672 0.9796006 0.7441995 0.8778849 0.8501804 0.8194847
## 0.8 0.4678802 0.9785066 0.7406718 0.8713616 0.8421864 0.8364612
## 0.9 0.4822288 0.9775978 0.7353966 0.8647438 0.8339152 0.8256718
## 1.0 0.4953887 0.9767387 0.7313853 0.8625460 0.8310568 0.8210842
## Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value Mean_Neg_Pred_Value
## 0.8911570 0.9902443 0.9030123 0.9902015
## 0.8656241 0.9883104 0.8948403 0.9887511
## 0.8472379 0.9864922 0.8804211 0.9872010
## 0.8420681 0.9858351 0.8696200 0.9868144
## 0.8365125 0.9853028 0.8638640 0.9862703
## 0.8265125 0.9845183 0.8413612 0.9854713
## 0.8258391 0.9842569 0.8404798 0.9850972
## 0.8237558 0.9842601 0.8404300 0.9852261
## 0.8153343 0.9834662 0.8223777 0.9844549
## 0.8056429 0.9825441 0.8169985 0.9836856
## 0.8019392 0.9821674 0.8135017 0.9834393
## Mean_Precision Mean_Recall Mean_Detection_Rate Mean_Balanced_Accuracy
## 0.9030123 0.8911570 0.10264778 0.9407007
## 0.8948403 0.8656241 0.10094363 0.9269672
## 0.8804211 0.8472379 0.09948350 0.9168650
## 0.8696200 0.8420681 0.09899239 0.9139516
## 0.8638640 0.8365125 0.09850653 0.9109077
## 0.8413612 0.8265125 0.09778443 0.9055154
## 0.8404798 0.8258391 0.09754548 0.9050480
## 0.8404300 0.8237558 0.09754277 0.9040080
## 0.8223777 0.8153343 0.09681796 0.8994003
## 0.8169985 0.8056429 0.09608264 0.8940935
## 0.8135017 0.8019392 0.09583844 0.8920533
##
## Kappa was used to select the optimal model using the largest value.
## The final value used for the model was decay = 0.
head(fit.logit$results[order(-fit.logit$results$Kappa),],5)
## decay logLoss AUC prAUC Accuracy Kappa Mean_F1
## 1 0.0 0.6059458 0.9924479 0.7577432 0.9238300 0.9070667 0.8903739
## 2 0.1 0.2742515 0.9899711 0.7989435 0.9084927 0.8880737 0.8617584
## 3 0.2 0.3264671 0.9872370 0.7780617 0.8953515 0.8717804 0.8427825
## 4 0.3 0.3626803 0.9846453 0.7664523 0.8909315 0.8661399 0.8362198
## 5 0.4 0.3909264 0.9831358 0.7592052 0.8865587 0.8607683 0.8314279
## Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value Mean_Neg_Pred_Value
## 1 0.8911570 0.9902443 0.9030123 0.9902015
## 2 0.8656241 0.9883104 0.8948403 0.9887511
## 3 0.8472379 0.9864922 0.8804211 0.9872010
## 4 0.8420681 0.9858351 0.8696200 0.9868144
## 5 0.8365125 0.9853028 0.8638640 0.9862703
## Mean_Precision Mean_Recall Mean_Detection_Rate Mean_Balanced_Accuracy
## 1 0.9030123 0.8911570 0.10264778 0.9407007
## 2 0.8948403 0.8656241 0.10094363 0.9269672
## 3 0.8804211 0.8472379 0.09948350 0.9168650
## 4 0.8696200 0.8420681 0.09899239 0.9139516
## 5 0.8638640 0.8365125 0.09850653 0.9109077
## logLossSD AUCSD prAUCSD AccuracySD KappaSD Mean_F1SD
## 1 0.23297584 0.003200508 0.03500452 0.02341025 0.02866900 0.04418392
## 2 0.07518731 0.006455353 0.02862192 0.03270825 0.04005961 0.05841365
## 3 0.07134740 0.006709591 0.02746782 0.02444432 0.03008097 0.04382832
## 4 0.06861522 0.007176839 0.02310044 0.02726844 0.03373790 0.04817611
## 5 0.06626223 0.007526678 0.02308407 0.02776222 0.03438352 0.05034381
## Mean_SensitivitySD Mean_SpecificitySD Mean_Pos_Pred_ValueSD
## 1 0.04213728 0.002929479 0.04611538
## 2 0.05824946 0.004077027 0.03461405
## 3 0.04630952 0.003244217 0.02272302
## 4 0.04967022 0.003726555 0.02831697
## 5 0.05116294 0.003761560 0.02931327
## Mean_Neg_Pred_ValueSD Mean_PrecisionSD Mean_RecallSD Mean_Detection_RateSD
## 1 0.002872770 0.04611538 0.04213728 0.002601138
## 2 0.004030121 0.03461405 0.05824946 0.003634250
## 3 0.003086322 0.02272302 0.04630952 0.002716036
## 4 0.003403333 0.02831697 0.04967022 0.003029827
## 5 0.003427401 0.02931327 0.05116294 0.003084691
## Mean_Balanced_AccuracySD
## 1 0.02247678
## 2 0.03113428
## 3 0.02474522
## 4 0.02663007
## 5 0.02740647
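The line "Kappa was used to select the optimal model using the largest value" describes caret's default selection rule (`selectionFunction = "best"` in `trainControl`). A minimal sketch of that rule, applied by hand to the five rows printed above, follows. Because those rows are sorted by decreasing Kappa, their maximum is also the maximum over the whole tuning grid. The data frame `results` is a hypothetical copy of those printed values, not the `fit.logit$results` object itself.

```r
# Mean resampled metrics of the five best decay values, copied from the output above
results <- data.frame(
  decay    = c(0.0, 0.1, 0.2, 0.3, 0.4),
  Accuracy = c(0.9238300, 0.9084927, 0.8953515, 0.8909315, 0.8865587),
  Kappa    = c(0.9070667, 0.8880737, 0.8717804, 0.8661399, 0.8607683),
  KappaSD  = c(0.02866900, 0.04005961, 0.03008097, 0.03373790, 0.03438352)
)

# "best" rule: keep the row with the largest mean Kappa
best <- results[which.max(results$Kappa), ]
print(best)

# Selecting by Accuracy instead would give the same decay value here
results$decay[which.max(results$Accuracy)]
```

Note that the Kappa difference between decay = 0 and decay = 0.1 (about 0.019) is smaller than the resampling standard deviation of Kappa (about 0.03 to 0.04), so the choice of decay = 0 is not a sharp one.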
Best model found:
fit.logit$bestTune
## decay
## 1 0
Parameters of the best model: decay = 0, i.e. no weight-decay regularization.
Classification results on the training data:
confusionMatrix(predict(fit.logit), olive.train$area, mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 37 0 0 0
## Coast.Sardinia 0 25 0 0
## East.Liguria 0 0 41 0
## Inland.Sardinia 0 0 0 55
## North.Apulia 0 0 0 0
## Sicily 3 0 0 0
## South.Apulia 1 0 0 0
## Umbria 0 0 0 0
## West.Liguria 0 0 0 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 0 1 0 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 0 0
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 22 0 0 0 0
## Sicily 0 30 2 0 0
## South.Apulia 0 2 160 0 0
## Umbria 0 0 0 40 0
## West.Liguria 0 0 0 0 39
##
## Overall Statistics
##
## Accuracy : 0.9803
## 95% CI : (0.963, 0.991)
## No Information Rate : 0.3537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.976
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 0.90244 1.00000 1.00000
## Specificity 0.99760 1.00000 1.00000
## Pos Pred Value 0.97368 1.00000 1.00000
## Neg Pred Value 0.99048 1.00000 1.00000
## Precision 0.97368 1.00000 1.00000
## Recall 0.90244 1.00000 1.00000
## F1 0.93671 1.00000 1.00000
## Prevalence 0.08952 0.05459 0.08952
## Detection Rate 0.08079 0.05459 0.08952
## Detection Prevalence 0.08297 0.05459 0.08952
## Balanced Accuracy 0.95002 1.00000 1.00000
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.0000 1.00000 0.90909
## Specificity 1.0000 1.00000 0.98824
## Pos Pred Value 1.0000 1.00000 0.85714
## Neg Pred Value 1.0000 1.00000 0.99291
## Precision 1.0000 1.00000 0.85714
## Recall 1.0000 1.00000 0.90909
## F1 1.0000 1.00000 0.88235
## Prevalence 0.1201 0.04803 0.07205
## Detection Rate 0.1201 0.04803 0.06550
## Detection Prevalence 0.1201 0.04803 0.07642
## Balanced Accuracy 1.0000 1.00000 0.94866
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 0.9877 1.00000 1.00000
## Specificity 0.9899 1.00000 1.00000
## Pos Pred Value 0.9816 1.00000 1.00000
## Neg Pred Value 0.9932 1.00000 1.00000
## Precision 0.9816 1.00000 1.00000
## Recall 0.9877 1.00000 1.00000
## F1 0.9846 1.00000 1.00000
## Prevalence 0.3537 0.08734 0.08515
## Detection Rate 0.3493 0.08734 0.08515
## Detection Prevalence 0.3559 0.08734 0.08515
## Balanced Accuracy 0.9888 1.00000 1.00000
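The overall statistics above can be recomputed directly from the confusion matrix, which makes their meaning concrete: Accuracy is the share of the diagonal, Kappa compares it with the agreement expected by chance from the row and column totals, and the No Information Rate is the share of the largest reference class. The sketch below assumes, as caret's documentation states, that the accuracy interval and the `Acc > NIR` p-value come from `binom.test`. The helper `overall_stats` is a hypothetical name introduced only for this illustration.

```r
# Training-set confusion matrix copied from the output above
# (rows = predicted class, columns = reference class)
classes <- c("Calabria", "Coast.Sardinia", "East.Liguria", "Inland.Sardinia",
             "North.Apulia", "Sicily", "South.Apulia", "Umbria", "West.Liguria")
cm_train <- matrix(c(
  37,  0,  0,  0,  0,  1,   0,  0,  0,
   0, 25,  0,  0,  0,  0,   0,  0,  0,
   0,  0, 41,  0,  0,  0,   0,  0,  0,
   0,  0,  0, 55,  0,  0,   0,  0,  0,
   0,  0,  0,  0, 22,  0,   0,  0,  0,
   3,  0,  0,  0,  0, 30,   2,  0,  0,
   1,  0,  0,  0,  0,  2, 160,  0,  0,
   0,  0,  0,  0,  0,  0,   0, 40,  0,
   0,  0,  0,  0,  0,  0,   0,  0, 39),
  nrow = 9, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes))

# Hypothetical helper: overall statistics of a multiclass confusion matrix
overall_stats <- function(cm) {
  n       <- sum(cm)
  correct <- sum(diag(cm))
  p_o <- correct / n                            # observed agreement = accuracy
  p_e <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  nir <- max(colSums(cm)) / n                   # share of the largest reference class
  ci  <- binom.test(correct, n)$conf.int        # exact (Clopper-Pearson) 95% CI
  p_val <- binom.test(correct, n, p = nir, alternative = "greater")$p.value
  c(Accuracy = p_o, CI_lower = ci[1], CI_upper = ci[2], NIR = nir,
    P_value = p_val, Kappa = (p_o - p_e) / (1 - p_e))
}

stats_train <- overall_stats(cm_train)
print(signif(stats_train, 4))
```

With 449 correct predictions out of 458, the chance agreement is about 0.18, which is why Kappa (about 0.976) stays close to the accuracy.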
On the training data, Kappa = 0.976. On the test data:
confusionMatrix(predict(fit.logit, olive.test), olive.test$area, mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction Calabria Coast.Sardinia East.Liguria Inland.Sardinia
## Calabria 15 0 0 0
## Coast.Sardinia 0 8 0 0
## East.Liguria 0 0 6 0
## Inland.Sardinia 0 0 0 10
## North.Apulia 0 0 0 0
## Sicily 0 0 0 0
## South.Apulia 0 0 1 0
## Umbria 0 0 2 0
## West.Liguria 0 0 0 0
## Reference
## Prediction North.Apulia Sicily South.Apulia Umbria West.Liguria
## Calabria 0 0 1 0 0
## Coast.Sardinia 0 0 0 0 0
## East.Liguria 0 0 0 0 0
## Inland.Sardinia 0 0 0 0 0
## North.Apulia 3 0 0 0 0
## Sicily 0 3 3 0 0
## South.Apulia 0 0 40 0 2
## Umbria 0 0 0 11 0
## West.Liguria 0 0 0 0 9
##
## Overall Statistics
##
## Accuracy : 0.9211
## 95% CI : (0.8554, 0.9633)
## No Information Rate : 0.386
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9011
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity 1.0000 1.00000 0.66667
## Specificity 0.9899 1.00000 1.00000
## Pos Pred Value 0.9375 1.00000 1.00000
## Neg Pred Value 1.0000 1.00000 0.97222
## Precision 0.9375 1.00000 1.00000
## Recall 1.0000 1.00000 0.66667
## F1 0.9677 1.00000 0.80000
## Prevalence 0.1316 0.07018 0.07895
## Detection Rate 0.1316 0.07018 0.05263
## Detection Prevalence 0.1404 0.07018 0.05263
## Balanced Accuracy 0.9949 1.00000 0.83333
## Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity 1.00000 1.00000 1.00000
## Specificity 1.00000 1.00000 0.97297
## Pos Pred Value 1.00000 1.00000 0.50000
## Neg Pred Value 1.00000 1.00000 1.00000
## Precision 1.00000 1.00000 0.50000
## Recall 1.00000 1.00000 1.00000
## F1 1.00000 1.00000 0.66667
## Prevalence 0.08772 0.02632 0.02632
## Detection Rate 0.08772 0.02632 0.02632
## Detection Prevalence 0.08772 0.02632 0.05263
## Balanced Accuracy 1.00000 1.00000 0.98649
## Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity 0.9091 1.00000 0.81818
## Specificity 0.9571 0.98058 1.00000
## Pos Pred Value 0.9302 0.84615 1.00000
## Neg Pred Value 0.9437 1.00000 0.98095
## Precision 0.9302 0.84615 1.00000
## Recall 0.9091 1.00000 0.81818
## F1 0.9195 0.91667 0.90000
## Prevalence 0.3860 0.09649 0.09649
## Detection Rate 0.3509 0.09649 0.07895
## Detection Prevalence 0.3772 0.11404 0.07895
## Balanced Accuracy 0.9331 0.99029 0.90909
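For more than two classes, caret computes the per-class statistics one-versus-rest: each area in turn is treated as the positive class and all other areas as negative. The sketch below reproduces that on the test matrix above. It shows, for example, why Sicily has a low F1 (0.667) despite perfect sensitivity: three South-Apulia oils were predicted as Sicily, so only half of the Sicily predictions are correct. The helper `per_class_stats` is a hypothetical name for illustration.

```r
# Test-set confusion matrix copied from the output above
# (rows = predicted class, columns = reference class)
classes <- c("Calabria", "Coast.Sardinia", "East.Liguria", "Inland.Sardinia",
             "North.Apulia", "Sicily", "South.Apulia", "Umbria", "West.Liguria")
cm_test <- matrix(c(
  15, 0, 0,  0, 0, 0,  1,  0, 0,
   0, 8, 0,  0, 0, 0,  0,  0, 0,
   0, 0, 6,  0, 0, 0,  0,  0, 0,
   0, 0, 0, 10, 0, 0,  0,  0, 0,
   0, 0, 0,  0, 3, 0,  0,  0, 0,
   0, 0, 0,  0, 0, 3,  3,  0, 0,
   0, 0, 1,  0, 0, 0, 40,  0, 2,
   0, 0, 2,  0, 0, 0,  0, 11, 0,
   0, 0, 0,  0, 0, 0,  0,  0, 9),
  nrow = 9, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes))

# Hypothetical helper: one-versus-rest statistics for every class
per_class_stats <- function(cm) {
  n  <- sum(cm)
  tp <- diag(cm)
  fp <- rowSums(cm) - tp   # predicted as this class, but belonging to another
  fn <- colSums(cm) - tp   # belonging to this class, but predicted as another
  tn <- n - tp - fp - fn
  sens <- tp / (tp + fn)   # sensitivity = recall
  spec <- tn / (tn + fp)
  prec <- tp / (tp + fp)   # positive predictive value = precision
  data.frame(Sensitivity = sens, Specificity = spec, Precision = prec,
             F1 = 2 * prec * sens / (prec + sens),
             Balanced_Accuracy = (sens + spec) / 2,
             row.names = rownames(cm))
}

res_test <- per_class_stats(cm_test)
print(round(res_test, 4))
```

This helper does not guard against a class that is never predicted (zero in the denominator); caret reports `NA` in that case.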
On the test data, Kappa = 0.9011.
Variable importance in logistic regression classification
plot(varImp(fit.logit))
The importance of the attributes differs from class to class.
As a result, the following models were obtained:
library(knitr)
library(kableExtra)
# Kappa of the best model of each type on the training and test data
mudel <- c("KNN", "NB", "SVMpoly", "LogR")
Kappa_train <- c(0.9893, 0.9439, 0.9733, 0.976)
Kappa_test <- c(0.9125, 0.8805, 0.9344, 0.9011)
tabel <- cbind("Model" = mudel, "Kappa_train" = Kappa_train, "Kappa_test" = Kappa_test)
x <- kable_styling(kable(tabel, caption = "Best models"),
                   bootstrap_options = c("striped", "bordered", "hover"),
                   full_width = FALSE, position = "left")
# Widen the model-name column and set a minimum width for the two Kappa columns
column_spec(column_spec(x, 1, width_min = "25em"), 2:3, width_min = "5em")
| Model | Kappa_train | Kappa_test |
|---|---|---|
| KNN | 0.9893 | 0.9125 |
| NB | 0.9439 | 0.8805 |
| SVMpoly | 0.9733 | 0.9344 |
| LogR | 0.976 | 0.9011 |
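The same table can also be ranked by test-set Kappa, with the drop from training to test Kappa as a rough indicator of overfitting. The sketch below does this with the values copied from the table. The column `Gap` is an illustrative addition, not part of the original comparison. In the logistic regression test set above (114 observations), a single extra misclassification shifts Kappa by roughly 0.01, so differences of that size between models should be read with caution.

```r
# Kappa values of the four best models, copied from the table above
models <- data.frame(
  Model       = c("KNN", "NB", "SVMpoly", "LogR"),
  Kappa_train = c(0.9893, 0.9439, 0.9733, 0.976),
  Kappa_test  = c(0.9125, 0.8805, 0.9344, 0.9011)
)

# Drop in Kappa from training to test data (illustrative overfitting indicator)
models$Gap <- models$Kappa_train - models$Kappa_test

# Rank the models by Kappa on the held-out test data
ranked <- models[order(models$Kappa_test, decreasing = TRUE), ]
print(ranked, row.names = FALSE)
```

Here SVMpoly has both the highest test Kappa and the smallest train-to-test drop, while KNN fits the training data best but loses the most on the test data.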