This tutorial gives an overview of the main classification algorithms, using a multiclass classification task as the running example. The models are built on the olive dataset from the dslabs package:

library(dslabs)
data(olive)
library(DT)
datatable(olive, options = list(scrollX = TRUE, dom = 'ltip', ordering = FALSE, pageLength = 5))

Structure of the dataset:

str(olive)
## 'data.frame':    572 obs. of  10 variables:
##  $ region     : Factor w/ 3 levels "Northern Italy",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ area       : Factor w/ 9 levels "Calabria","Coast-Sardinia",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ palmitic   : num  10.75 10.88 9.11 9.66 10.51 ...
##  $ palmitoleic: num  0.75 0.73 0.54 0.57 0.67 0.49 0.66 0.61 0.6 0.55 ...
##  $ stearic    : num  2.26 2.24 2.46 2.4 2.59 2.68 2.64 2.35 2.39 2.13 ...
##  $ oleic      : num  78.2 77.1 81.1 79.5 77.7 ...
##  $ linoleic   : num  6.72 7.81 5.49 6.19 6.72 6.78 6.18 7.34 7.09 6.33 ...
##  $ linolenic  : num  0.36 0.31 0.31 0.5 0.5 0.51 0.49 0.39 0.46 0.26 ...
##  $ arachidic  : num  0.6 0.61 0.63 0.78 0.8 0.7 0.56 0.64 0.83 0.52 ...
##  $ eicosenoic : num  0.29 0.29 0.29 0.35 0.46 0.44 0.29 0.35 0.33 0.3 ...

Task: build a model that predicts the area of origin of an olive oil (variable area) from its fatty acid composition.

We remove the variable region from the dataset: it gives a direct hint about the area and would interfere with classification based on fatty acid content alone:

olive <- olive[,-1]

Statistical summary of the dataset:

summary(olive)
##               area        palmitic      palmitoleic        stearic     
##  South-Apulia   :206   Min.   : 6.10   Min.   :0.1500   Min.   :1.520  
##  Inland-Sardinia: 65   1st Qu.:10.95   1st Qu.:0.8775   1st Qu.:2.050  
##  Calabria       : 56   Median :12.01   Median :1.1000   Median :2.230  
##  Umbria         : 51   Mean   :12.32   Mean   :1.2609   Mean   :2.289  
##  East-Liguria   : 50   3rd Qu.:13.60   3rd Qu.:1.6925   3rd Qu.:2.490  
##  West-Liguria   : 50   Max.   :17.53   Max.   :2.8000   Max.   :3.750  
##  (Other)        : 94                                                   
##      oleic          linoleic        linolenic        arachidic    
##  Min.   :63.00   Min.   : 4.480   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:70.00   1st Qu.: 7.707   1st Qu.:0.2600   1st Qu.:0.500  
##  Median :73.03   Median :10.300   Median :0.3300   Median :0.610  
##  Mean   :73.12   Mean   : 9.805   Mean   :0.3189   Mean   :0.581  
##  3rd Qu.:76.80   3rd Qu.:11.807   3rd Qu.:0.4025   3rd Qu.:0.700  
##  Max.   :84.10   Max.   :14.700   Max.   :0.7400   Max.   :1.050  
##                                                                   
##    eicosenoic    
##  Min.   :0.0100  
##  1st Qu.:0.0200  
##  Median :0.1700  
##  Mean   :0.1628  
##  3rd Qu.:0.2800  
##  Max.   :0.5800  
## 

We see that the classes of the target variable area are imbalanced; South-Apulia is by far the most frequent class (206 of 572 observations, about 36%).

prop.table(table(olive$area))*100
## 
##        Calabria  Coast-Sardinia    East-Liguria Inland-Sardinia    North-Apulia 
##        9.790210        5.769231        8.741259       11.363636        4.370629 
##          Sicily    South-Apulia          Umbria    West-Liguria 
##        6.293706       36.013986        8.916084        8.741259

With imbalanced classes, accuracy is not a suitable metric for judging model quality; it is better to use, for example, Cohen's kappa coefficient or the F1 score.

We split the dataset into 80% training data and 20% test data:
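To make the difference between accuracy and kappa concrete, here is a small sketch on a made-up 3-class confusion matrix (the counts are invented for illustration and have nothing to do with the olive data). Kappa compares the observed agreement with the agreement expected by chance from the row and column totals.

```r
# Toy confusion matrix (rows = predicted, columns = actual); counts are made up
cm <- matrix(c(90, 5, 5,
                4, 3, 1,
                6, 2, 4),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = c("A", "B", "C"),
                             actual    = c("A", "B", "C")))

n  <- sum(cm)
po <- sum(diag(cm)) / n                       # observed agreement (= accuracy)
pe <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
kappa_hat <- (po - pe) / (1 - pe)             # Cohen's kappa

round(c(accuracy = po, kappa = kappa_hat), 3)
```

In this toy case the dominant class A inflates accuracy, while kappa stays low because most of that agreement would also be expected by chance.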

RNGkind(sample.kind = "Rounding")
set.seed(123)
sam <- sample(1:nrow(olive),floor(nrow(olive)*0.2))
olive.train <- olive[-sam,]
olive.test <- olive[sam,]
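With imbalanced classes, a simple random split can leave small classes under-represented in the test set. As a possible alternative (not the approach used in this tutorial), the following base R sketch draws 20% of the rows within each class. The label vector y is a hypothetical stand-in for olive$area.

```r
set.seed(123)
# Hypothetical imbalanced labels standing in for olive$area
y <- factor(rep(c("a", "b", "c"), times = c(60, 30, 10)))

# Sample 20% of the row indices separately within each class
idx_by_class <- split(seq_along(y), y)
test_idx <- unlist(
  lapply(idx_by_class,
         function(i) i[sample.int(length(i), floor(length(i) * 0.2))]),
  use.names = FALSE
)

table(y[test_idx])    # class counts in the test part
table(y[-test_idx])   # class counts in the training part
```

Each class then contributes the same share of its rows to the test set, so the class proportions in both parts match the full data as closely as rounding allows.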

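As part of data preparation, the factor levels of area must be renamed so that they are syntactically valid R names (for example, a hyphen is not allowed in a name). caret needs this when classProbs = TRUE, because the class probabilities are returned in columns named after the levels: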

levels(olive.train$area) <- make.names(levels(olive.train$area))
levels(olive.train$area)
## [1] "Calabria"        "Coast.Sardinia"  "East.Liguria"    "Inland.Sardinia"
## [5] "North.Apulia"    "Sicily"          "South.Apulia"    "Umbria"         
## [9] "West.Liguria"
levels(olive.test$area) <- make.names(levels(olive.test$area))
levels(olive.test$area)
## [1] "Calabria"        "Coast.Sardinia"  "East.Liguria"    "Inland.Sardinia"
## [5] "North.Apulia"    "Sicily"          "South.Apulia"    "Umbria"         
## [9] "West.Liguria"
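As a quick illustration of what make.names does, the sketch below applies it to a few level names taken from the dataset. Characters that are not valid in an R name, such as the hyphen, are replaced with a dot, while names that are already valid are left unchanged.

```r
lv <- c("South-Apulia", "Coast-Sardinia", "Umbria")

fixed <- make.names(lv)
fixed

# TRUE where the name was already a valid R name
lv == fixed
```

Because the training and test sets share the same factor levels, applying make.names to each of them separately yields identical level names in both.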

k-nearest neighbours (KNN)

Here we use a weighted nearest-neighbour classifier (the kknn method), in which closer objects receive a larger weight in the classification vote. The weights are computed from the distances with a suitable kernel function. Because the method is distance-based, numeric features need to be scaled and non-numeric features need to be converted to dummy variables. See, for example, https://datasciencebook.ca/classification.html#classification-with-k-nearest-neighbors.
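The idea of a kernel-weighted vote can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions (Euclidean distance, triangular kernel, distances scaled by the distance to the (k+1)-th neighbour); it is not the kknn implementation, and the function name weighted_knn and the toy data are hypothetical.

```r
# Minimal weighted k-NN vote for a single query point (illustrative only)
weighted_knn <- function(train_x, train_y, query, k = 3) {
  d   <- sqrt(rowSums(sweep(train_x, 2, query)^2))  # Euclidean distances to the query
  ord <- order(d)
  nn  <- ord[seq_len(k)]                            # indices of the k nearest neighbours
  d_scaled <- d[nn] / d[ord[k + 1]]                 # scale by the (k+1)-th neighbour distance
  w <- 1 - d_scaled                                 # triangular kernel: weight falls linearly
  votes <- tapply(w, train_y[nn], sum)              # sum of weights per class
  votes[is.na(votes)] <- 0                          # classes with no neighbour get zero
  names(votes)[which.max(votes)]
}

# Toy data: two well-separated groups
train_x <- matrix(c(0, 0,  1, 0,  0, 1,
                    5, 5,  6, 5,  5, 6), ncol = 2, byrow = TRUE)
train_y <- factor(c("a", "a", "a", "b", "b", "b"))

pred_a <- weighted_knn(train_x, train_y, query = c(0.5, 0.5), k = 3)
pred_b <- weighted_knn(train_x, train_y, query = c(5.2, 5.1), k = 3)
c(pred_a, pred_b)
```

Swapping the line that computes w for another decreasing function of the scaled distance gives a different kernel; a constant weight corresponds to the ordinary unweighted vote.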

olive.train.scale=olive.train
olive.train.scale[,names(olive.train[,-1])]=lapply(olive.train.scale[,names(olive.train[,-1])],scale)
olive.test.scale=olive.test
olive.test.scale[,names(olive.test[,-1])]=lapply(olive.test.scale[,names(olive.test[,-1])],scale)
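The code above standardises the test set with its own means and standard deviations. A commonly recommended alternative is to reuse the training-set parameters for the test set, so that both are transformed in exactly the same way and no information from the test data enters preprocessing. The sketch below shows this on hypothetical data; the data frames train and test and their columns are assumptions, not the olive data, and the results reported in this tutorial were obtained with the code above.

```r
set.seed(1)
# Hypothetical numeric features standing in for the fatty acid columns
train <- data.frame(x1 = rnorm(50, 10, 2), x2 = rnorm(50, 70, 5))
test  <- data.frame(x1 = rnorm(10, 10, 2), x2 = rnorm(10, 70, 5))

# Estimate centring and scaling parameters on the training data only
mu    <- colMeans(train)
sigma <- apply(train, 2, sd)

# Apply the same parameters to both sets
train_scaled <- as.data.frame(scale(train, center = mu, scale = sigma))
test_scaled  <- as.data.frame(scale(test,  center = mu, scale = sigma))

round(colMeans(train_scaled), 10)  # zero by construction
colMeans(test_scaled)              # generally close to zero, but not exactly zero
```

In caret, a similar effect can be obtained by passing preProcess = c("center", "scale") to train, which then estimates the parameters on the training data and applies them when predicting.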

We use the caret package to tune the KNN classifier. In trainControl we set summaryFunction = multiClassSummary, and in train we set the metric to Kappa. For the other metrics, such as AUC, to be available, the MLmetrics package must be installed.

library(rpart)
library(caret)
library(MLmetrics)
library(doParallel)
Mycluster = makeCluster(detectCores()-1)
registerDoParallel(Mycluster)
system.time({
RNGkind(sample.kind = "Rounding")
set.seed(123)
train.control <- trainControl(method = "cv",number = 5 ,search = "random",classProbs = TRUE,summaryFunction = multiClassSummary)
tuneGrid <- expand.grid(kmax = 3:10,distance = 1:4,kernel = c('gaussian','triangular','rectangular','epanechnikov','optimal'))
RNGkind(sample.kind = "Rounding")
set.seed(123)
kknn_fit <- train(area~.,olive.train.scale, method = 'kknn',trControl = train.control,tuneGrid = tuneGrid,metric = "Kappa")
})
##    user  system elapsed 
##    1.92    0.09   37.90
stopCluster(Mycluster)
registerDoSEQ()
plot(kknn_fit)
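To get a feeling for the cost of the search, the following sketch rebuilds the same tuning grid and counts its rows. It only uses base R; the figure for the number of fits assumes the 5-fold cross-validation configured above.

```r
tuneGrid <- expand.grid(kmax = 3:10,
                        distance = 1:4,
                        kernel = c('gaussian', 'triangular', 'rectangular',
                                   'epanechnikov', 'optimal'))

n_candidates <- nrow(tuneGrid)   # 8 * 4 * 5 parameter combinations
n_fits <- n_candidates * 5       # one fit per combination and fold

c(candidates = n_candidates, cv_fits = n_fits)
```

Each row of the printed results table corresponds to one of these combinations. After training, the combination selected by the chosen metric is stored in the bestTune element of the fitted caret object.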

kknn_fit
## k-Nearest Neighbors 
## 
## 458 samples
##   8 predictor
##   9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 367, 365, 368, 367, 365 
## Resampling results across tuning parameters:
## 
##   kmax  distance  kernel        logLoss    AUC        prAUC       Accuracy 
##    3    1         gaussian      0.9363879  0.9782953  0.10015755  0.9475316
##    3    1         triangular    0.6505765  0.9847859  0.11325723  0.9586183
##    3    1         rectangular   1.2189864  0.9713236  0.05731363  0.9586183
##    3    1         epanechnikov  0.6503545  0.9847859  0.11325723  0.9586183
##    3    1         optimal       1.3533639  0.9684916  0.02913030  0.9608161
##    3    2         gaussian      0.6575607  0.9838142  0.13293189  0.9586656
##    3    2         triangular    0.9403771  0.9763739  0.08740472  0.9564433
##    3    2         rectangular   0.7272751  0.9830683  0.10828702  0.9563961
##    3    2         epanechnikov  0.8693190  0.9778247  0.09941942  0.9564433
##    3    2         optimal       1.5803031  0.9621353  0.03370578  0.9542455
##    3    3         gaussian      0.5843853  0.9844076  0.14321693  0.9630383
##    3    3         triangular    0.7304993  0.9799511  0.10616317  0.9608634
##    3    3         rectangular   0.5859785  0.9841660  0.13886124  0.9630383
##    3    3         epanechnikov  0.7305021  0.9799288  0.10592478  0.9608634
##    3    3         optimal       1.3533639  0.9695384  0.02813916  0.9608161
##    3    4         gaussian      0.9423186  0.9782121  0.12160467  0.9542928
##    3    4         triangular    0.7345656  0.9798879  0.11294677  0.9587128
##    3    4         rectangular   0.9418116  0.9776201  0.10176491  0.9565150
##    3    4         epanechnikov  0.7345731  0.9798879  0.11294677  0.9587128
##    3    4         optimal       1.5019178  0.9673001  0.02922410  0.9565150
##    4    1         gaussian      0.9389410  0.9782032  0.11361445  0.9519761
##    4    1         triangular    0.5135887  0.9876184  0.18076406  0.9563961
##    4    1         rectangular   1.1558969  0.9725699  0.09919010  0.9497294
##    4    1         epanechnikov  0.5139869  0.9875846  0.18048022  0.9563961
##    4    1         optimal       1.3533639  0.9684916  0.02913030  0.9608161
##    4    2         gaussian      0.6612447  0.9838255  0.15056988  0.9565150
##    4    2         triangular    0.5081438  0.9862277  0.17343080  0.9630383
##    4    2         rectangular   0.7272751  0.9830683  0.10828702  0.9563961
##    4    2         epanechnikov  0.5082729  0.9862390  0.19106879  0.9630383
##    4    2         optimal       1.5803031  0.9621353  0.03370578  0.9542455
##    4    3         gaussian      0.5867765  0.9844188  0.15085492  0.9608878
##    4    3         triangular    0.5827790  0.9841692  0.19010161  0.9653078
##    4    3         rectangular   0.5859785  0.9841660  0.13886124  0.9630383
##    4    3         epanechnikov  0.5796553  0.9841954  0.19794123  0.9674583
##    4    3         optimal       1.3533639  0.9695384  0.02813916  0.9608161
##    4    4         gaussian      0.5948099  0.9844582  0.19170660  0.9608161
##    4    4         triangular    0.5906185  0.9840986  0.18647639  0.9631573
##    4    4         rectangular   0.9451291  0.9776051  0.11238271  0.9565150
##    4    4         epanechnikov  0.5888310  0.9842318  0.18797144  0.9653078
##    4    4         optimal       1.5019178  0.9673001  0.02922410  0.9565150
##    5    1         gaussian      0.8701471  0.9785663  0.13907515  0.9519761
##    5    1         triangular    0.4455340  0.9880904  0.19358592  0.9541983
##    5    1         rectangular   1.1558969  0.9725699  0.09919010  0.9497294
##    5    1         epanechnikov  0.4461783  0.9880565  0.19330208  0.9541983
##    5    1         optimal       1.0790528  0.9732963  0.11419861  0.9586183
##    5    2         gaussian      0.6612447  0.9838255  0.15056988  0.9565150
##    5    2         triangular    0.5069134  0.9862390  0.21563108  0.9630383
##    5    2         rectangular   0.7272751  0.9830683  0.10828702  0.9563961
##    5    2         epanechnikov  0.5098013  0.9862390  0.21765128  0.9630383
##    5    2         optimal       0.5168291  0.9859048  0.22988617  0.9563961
##    5    3         gaussian      0.5867765  0.9844188  0.15085492  0.9608878
##    5    3         triangular    0.5789736  0.9842243  0.20153820  0.9674583
##    5    3         rectangular   0.5859785  0.9841660  0.13886124  0.9630383
##    5    3         epanechnikov  0.5801737  0.9841954  0.20139241  0.9674583
##    5    3         optimal       0.5178505  0.9857641  0.22878031  0.9652361
##    5    4         gaussian      0.5948099  0.9844582  0.19170660  0.9608161
##    5    4         triangular    0.5858816  0.9839292  0.21076386  0.9631573
##    5    4         rectangular   0.9451291  0.9776051  0.11238271  0.9565150
##    5    4         epanechnikov  0.5855379  0.9840986  0.21476334  0.9631573
##    5    4         optimal       0.5917624  0.9835343  0.23746601  0.9608634
##    6    1         gaussian      0.8688396  0.9786503  0.14898271  0.9541983
##    6    1         triangular    0.3794342  0.9904515  0.20124160  0.9541983
##    6    1         rectangular   1.1558969  0.9725699  0.09919010  0.9497294
##    6    1         epanechnikov  0.3804492  0.9904176  0.20790220  0.9541983
##    6    1         optimal       1.0790528  0.9732963  0.11419861  0.9586183
##    6    2         gaussian      0.6612447  0.9838255  0.15056988  0.9565150
##    6    2         triangular    0.5100586  0.9863111  0.23336470  0.9651889
##    6    2         rectangular   0.7272751  0.9830683  0.10828702  0.9563961
##    6    2         epanechnikov  0.5107600  0.9863111  0.22296788  0.9630383
##    6    2         optimal       0.5168291  0.9859048  0.22988617  0.9563961
##    6    3         gaussian      0.5867765  0.9844188  0.15085492  0.9608878
##    6    3         triangular    0.5789736  0.9842243  0.20153820  0.9674583
##    6    3         rectangular   0.5859785  0.9841660  0.13886124  0.9630383
##    6    3         epanechnikov  0.5801737  0.9841954  0.20139241  0.9674583
##    6    3         optimal       0.5178505  0.9857641  0.22878031  0.9652361
##    6    4         gaussian      0.6012326  0.9840625  0.21635074  0.9586183
##    6    4         triangular    0.5227002  0.9853670  0.25789421  0.9631573
##    6    4         rectangular   0.9451291  0.9776051  0.11238271  0.9565150
##    6    4         epanechnikov  0.5909420  0.9837029  0.24662971  0.9610067
##    6    4         optimal       0.5924129  0.9835782  0.24503510  0.9608634
##    7    1         gaussian      0.7386400  0.9808136  0.20094877  0.9520477
##    7    1         triangular    0.3822916  0.9904627  0.23205597  0.9541983
##    7    1         rectangular   1.1558969  0.9725699  0.09919010  0.9497294
##    7    1         epanechnikov  0.3835813  0.9903962  0.24283309  0.9541983
##    7    1         optimal       0.8743831  0.9778749  0.17900575  0.9564678
##    7    2         gaussian      0.6612447  0.9838255  0.15056988  0.9565150
##    7    2         triangular    0.5112485  0.9862390  0.23261160  0.9651889
##    7    2         rectangular   0.7272751  0.9830683  0.10828702  0.9563961
##    7    2         epanechnikov  0.5107600  0.9863111  0.22296788  0.9630383
##    7    2         optimal       0.5168291  0.9859048  0.22988617  0.9563961
##    7    3         gaussian      0.5867765  0.9844188  0.15085492  0.9608878
##    7    3         triangular    0.5870414  0.9839393  0.23606049  0.9652361
##    7    3         rectangular   0.5859785  0.9841660  0.13886124  0.9630383
##    7    3         epanechnikov  0.5801737  0.9841954  0.20139241  0.9674583
##    7    3         optimal       0.5224372  0.9855781  0.24650678  0.9630383
##    7    4         gaussian      0.6012326  0.9840625  0.21635074  0.9586183
##    7    4         triangular    0.5244481  0.9853300  0.27060035  0.9631573
##    7    4         rectangular   0.9451291  0.9776051  0.11238271  0.9565150
##    7    4         epanechnikov  0.5239043  0.9853344  0.26596567  0.9610067
##    7    4         optimal       0.5963256  0.9834608  0.26368037  0.9586656
##    8    1         gaussian      0.7386400  0.9808136  0.20094877  0.9520477
##    8    1         triangular    0.3833383  0.9904627  0.23483375  0.9541983
##    8    1         rectangular   1.1558969  0.9725699  0.09919010  0.9497294
##    8    1         epanechnikov  0.3847494  0.9903962  0.24561087  0.9541983
##    8    1         optimal       0.8734958  0.9778749  0.17900575  0.9564678
##    8    2         gaussian      0.6612447  0.9838255  0.15056988  0.9565150
##    8    2         triangular    0.3758316  0.9904248  0.30089724  0.9673867
##    8    2         rectangular   0.7272751  0.9830683  0.10828702  0.9563961
##    8    2         epanechnikov  0.4447955  0.9879179  0.25954280  0.9652361
##    8    2         optimal       0.5168291  0.9859048  0.22988617  0.9563961
##    8    3         gaussian      0.5867765  0.9844188  0.15085492  0.9608878
##    8    3         triangular    0.5884754  0.9838522  0.24366034  0.9630383
##    8    3         rectangular   0.5859785  0.9841660  0.13886124  0.9630383
##    8    3         epanechnikov  0.5850546  0.9838233  0.23617328  0.9652605
##    8    3         optimal       0.5237800  0.9854909  0.25410664  0.9630383
##    8    4         gaussian      0.6012326  0.9840625  0.21635074  0.9586183
##    8    4         triangular    0.5244978  0.9853351  0.27932611  0.9631573
##    8    4         rectangular   0.9451291  0.9776051  0.11238271  0.9565150
##    8    4         epanechnikov  0.5239043  0.9853344  0.26596567  0.9610067
##    8    4         optimal       0.6004049  0.9833912  0.27925820  0.9564433
##    9    1         gaussian      0.7386400  0.9808136  0.20094877  0.9520477
##    9    1         triangular    0.3833383  0.9904627  0.23483375  0.9541983
##    9    1         rectangular   1.1558969  0.9725699  0.09919010  0.9497294
##    9    1         epanechnikov  0.3847494  0.9903962  0.24561087  0.9541983
##    9    1         optimal       0.8734958  0.9778749  0.17900575  0.9564678
##    9    2         gaussian      0.6612447  0.9838255  0.15056988  0.9565150
##    9    2         triangular    0.3758316  0.9904248  0.30089724  0.9673867
##    9    2         rectangular   0.7272751  0.9830683  0.10828702  0.9563961
##    9    2         epanechnikov  0.4447955  0.9879179  0.25954280  0.9652361
##    9    2         optimal       0.4511456  0.9875670  0.25310791  0.9563961
##    9    3         gaussian      0.5867765  0.9844188  0.15085492  0.9608878
##    9    3         triangular    0.5884754  0.9838522  0.24366034  0.9630383
##    9    3         rectangular   0.5859785  0.9841660  0.13886124  0.9630383
##    9    3         epanechnikov  0.5850546  0.9838233  0.23617328  0.9652605
##    9    3         optimal       0.4597401  0.9871078  0.25578902  0.9630383
##    9    4         gaussian      0.6012326  0.9840625  0.21635074  0.9586183
##    9    4         triangular    0.5244978  0.9853351  0.27932611  0.9631573
##    9    4         rectangular   0.9451291  0.9776051  0.11238271  0.9565150
##    9    4         epanechnikov  0.5239043  0.9853344  0.26596567  0.9610067
##    9    4         optimal       0.6004049  0.9833912  0.27925820  0.9564433
##   10    1         gaussian      0.7386400  0.9808136  0.20094877  0.9520477
##   10    1         triangular    0.3860667  0.9904301  0.24172804  0.9541983
##   10    1         rectangular   1.1558969  0.9725699  0.09919010  0.9497294
##   10    1         epanechnikov  0.3847494  0.9903962  0.24561087  0.9541983
##   10    1         optimal       0.8734958  0.9778749  0.17900575  0.9564678
##   10    2         gaussian      0.6612447  0.9838255  0.15056988  0.9565150
##   10    2         triangular    0.3780721  0.9903594  0.30319863  0.9673867
##   10    2         rectangular   0.7272751  0.9830683  0.10828702  0.9563961
##   10    2         epanechnikov  0.4447955  0.9879179  0.25954280  0.9652361
##   10    2         optimal       0.4511456  0.9875670  0.25310791  0.9563961
##   10    3         gaussian      0.5867765  0.9844188  0.15085492  0.9608878
##   10    3         triangular    0.5884754  0.9838522  0.24366034  0.9630383
##   10    3         rectangular   0.5859785  0.9841660  0.13886124  0.9630383
##   10    3         epanechnikov  0.5850546  0.9838233  0.23617328  0.9652605
##   10    3         optimal       0.4597401  0.9871078  0.25578902  0.9630383
##   10    4         gaussian      0.6012326  0.9840625  0.21635074  0.9586183
##   10    4         triangular    0.5244978  0.9853351  0.27932611  0.9631573
##   10    4         rectangular   0.9451291  0.9776051  0.11238271  0.9565150
##   10    4         epanechnikov  0.5239043  0.9853344  0.26596567  0.9610067
##   10    4         optimal       0.6004049  0.9833912  0.27925820  0.9564433
##   Kappa      Mean_F1    Mean_Sensitivity  Mean_Specificity  Mean_Pos_Pred_Value
##   0.9354334  0.9279477  0.9228032         0.9925403         0.9502527          
##   0.9490984  0.9448089  0.9388085         0.9940944         0.9653958          
##   0.9491855  0.9448103  0.9388085         0.9942126         0.9617274          
##   0.9490984  0.9448089  0.9388085         0.9940944         0.9653958          
##   0.9518643  0.9488248  0.9425122         0.9944710         0.9659602          
##   0.9493506  0.9436772  0.9388085         0.9943307         0.9605659          
##   0.9466530  0.9402767  0.9341789         0.9941721         0.9559394          
##   0.9465329  0.9404979  0.9336498         0.9940538         0.9584812          
##   0.9466530  0.9402767  0.9341789         0.9941721         0.9559394          
##   0.9438920  0.9370791  0.9304752         0.9937954         0.9547775          
##   0.9547364  0.9506049  0.9471419         0.9949753         0.9627208          
##   0.9521330  0.9500902  0.9450134         0.9947078         0.9628119          
##   0.9547086  0.9514528  0.9471419         0.9948696         0.9643454          
##   0.9521330  0.9500902  0.9450134         0.9947078         0.9628119          
##   0.9520171  0.9498396  0.9444843         0.9945926         0.9638305          
##   0.9440240  0.9404325  0.9373415         0.9938045         0.9534252          
##   0.9494252  0.9474891  0.9418388         0.9943374         0.9614550          
##   0.9467095  0.9441154  0.9381351         0.9939638         0.9594943          
##   0.9494252  0.9474891  0.9418388         0.9943374         0.9614550          
##   0.9467383  0.9457574  0.9406363         0.9939638         0.9594732          
##   0.9409419  0.9350760  0.9287556         0.9931818         0.9570593          
##   0.9463797  0.9417116  0.9356339         0.9938234         0.9627062          
##   0.9381994  0.9319636  0.9265069         0.9929169         0.9508170          
##   0.9463797  0.9417116  0.9356339         0.9938234         0.9627062          
##   0.9518643  0.9488248  0.9425122         0.9944710         0.9659602          
##   0.9467242  0.9409176  0.9360307         0.9940693         0.9576029          
##   0.9547392  0.9531643  0.9484646         0.9949692         0.9641405          
##   0.9465329  0.9404979  0.9336498         0.9940538         0.9584812          
##   0.9547392  0.9531643  0.9484646         0.9949692         0.9641405          
##   0.9438920  0.9370791  0.9304752         0.9937954         0.9547775          
##   0.9521145  0.9471322  0.9443641         0.9947228         0.9585232          
##   0.9575694  0.9575241  0.9533468         0.9952432         0.9677502          
##   0.9547086  0.9514528  0.9471419         0.9948696         0.9643454          
##   0.9601589  0.9589160  0.9540202         0.9955047         0.9693874          
##   0.9520171  0.9498396  0.9444843         0.9945926         0.9638305          
##   0.9520202  0.9478162  0.9443641         0.9947043         0.9602516          
##   0.9548616  0.9549230  0.9501722         0.9948729         0.9663932          
##   0.9467095  0.9441154  0.9381351         0.9939638         0.9594943          
##   0.9574510  0.9563148  0.9508456         0.9951343         0.9680304          
##   0.9467383  0.9457574  0.9406363         0.9939638         0.9594732          
##   0.9409419  0.9350760  0.9287556         0.9931818         0.9570593          
##   0.9437009  0.9376971  0.9319302         0.9935650         0.9584734          
##   0.9381994  0.9319636  0.9265069         0.9929169         0.9508170          
##   0.9437009  0.9376971  0.9319302         0.9935650         0.9584734          
##   0.9491855  0.9448103  0.9388085         0.9942126         0.9617274          
##   0.9467242  0.9409176  0.9360307         0.9940693         0.9576029          
##   0.9547392  0.9531643  0.9484646         0.9949692         0.9641405          
##   0.9465329  0.9404979  0.9336498         0.9940538         0.9584812          
##   0.9547392  0.9531643  0.9484646         0.9949692         0.9641405          
##   0.9465631  0.9424676  0.9384117         0.9940632         0.9553948          
##   0.9521145  0.9471322  0.9443641         0.9947228         0.9585232          
##   0.9601589  0.9589160  0.9540202         0.9955047         0.9693874          
##   0.9547086  0.9514528  0.9471419         0.9948696         0.9643454          
##   0.9601589  0.9589160  0.9540202         0.9955047         0.9693874          
##   0.9574427  0.9547374  0.9508456         0.9952463         0.9649430          
##   0.9520202  0.9478162  0.9443641         0.9947043         0.9602516          
##   0.9547988  0.9527498  0.9476710         0.9948729         0.9654378          
##   0.9467095  0.9441154  0.9381351         0.9939638         0.9594943          
##   0.9547988  0.9527498  0.9476710         0.9948729         0.9654378          
##   0.9520187  0.9493476  0.9448932         0.9946049         0.9611169          
##   0.9437058  0.9363717  0.9319302         0.9935776         0.9561895          
##   0.9437009  0.9376971  0.9319302         0.9935650         0.9584734          
##   0.9381994  0.9319636  0.9265069         0.9929169         0.9508170          
##   0.9437009  0.9376971  0.9319302         0.9935650         0.9584734          
##   0.9491855  0.9448103  0.9388085         0.9942126         0.9617274          
##   0.9467242  0.9409176  0.9360307         0.9940693         0.9576029          
##   0.9573461  0.9546144  0.9487412         0.9952274         0.9664443          
##   0.9465329  0.9404979  0.9336498         0.9940538         0.9584812          
##   0.9547384  0.9531107  0.9480678         0.9949659         0.9644491          
##   0.9465631  0.9424676  0.9384117         0.9940632         0.9553948          
##   0.9521145  0.9471322  0.9443641         0.9947228         0.9585232          
##   0.9601589  0.9589160  0.9540202         0.9955047         0.9693874          
##   0.9547086  0.9514528  0.9471419         0.9948696         0.9643454          
##   0.9601589  0.9589160  0.9540202         0.9955047         0.9693874          
##   0.9574427  0.9547374  0.9508456         0.9952463         0.9649430          
##   0.9493115  0.9430599  0.9406604         0.9944333         0.9584334          
##   0.9547988  0.9527498  0.9476710         0.9948729         0.9654378          
##   0.9467095  0.9441154  0.9381351         0.9939638         0.9594943          
##   0.9521754  0.9483727  0.9444964         0.9946203         0.9617341          
##   0.9520187  0.9493476  0.9448932         0.9946049         0.9611169          
##   0.9410508  0.9319414  0.9266512         0.9933161         0.9534315          
##   0.9436815  0.9360445  0.9294290         0.9935650         0.9584932          
##   0.9381994  0.9319636  0.9265069         0.9929169         0.9508170          
##   0.9436815  0.9360445  0.9294290         0.9935650         0.9584932          
##   0.9465584  0.9416541  0.9356339         0.9939512         0.9597521          
##   0.9467242  0.9409176  0.9360307         0.9940693         0.9576029          
##   0.9573461  0.9546144  0.9487412         0.9952274         0.9664443          
##   0.9465329  0.9404979  0.9336498         0.9940538         0.9584812          
##   0.9547384  0.9531107  0.9480678         0.9949659         0.9644491          
##   0.9465631  0.9424676  0.9384117         0.9940632         0.9553948          
##   0.9521145  0.9471322  0.9443641         0.9947228         0.9585232          
##   0.9574411  0.9558994  0.9508456         0.9952337         0.9669183          
##   0.9547086  0.9514528  0.9471419         0.9948696         0.9643454          
##   0.9601589  0.9589160  0.9540202         0.9955047         0.9693874          
##   0.9547364  0.9506049  0.9471419         0.9949753         0.9627208          
##   0.9493115  0.9430599  0.9406604         0.9944333         0.9584334          
##   0.9547731  0.9498763  0.9451698         0.9948818         0.9637293          
##   0.9467095  0.9441154  0.9381351         0.9939638         0.9594943          
##   0.9521754  0.9483727  0.9444964         0.9946203         0.9617341          
##   0.9493124  0.9452151  0.9411895         0.9943339         0.9588946          
##   0.9410508  0.9319414  0.9266512         0.9933161         0.9534315          
##   0.9436815  0.9360445  0.9294290         0.9935650         0.9584932          
##   0.9381994  0.9319636  0.9265069         0.9929169         0.9508170          
##   0.9436815  0.9360445  0.9294290         0.9935650         0.9584932          
##   0.9465584  0.9416541  0.9356339         0.9939512         0.9597521          
##   0.9467242  0.9409176  0.9360307         0.9940693         0.9576029          
##   0.9600488  0.9574030  0.9515190         0.9954951         0.9689134          
##   0.9465329  0.9404979  0.9336498         0.9940538         0.9584812          
##   0.9574411  0.9558994  0.9508456         0.9952337         0.9669183          
##   0.9465631  0.9424676  0.9384117         0.9940632         0.9553948          
##   0.9521145  0.9471322  0.9443641         0.9947228         0.9585232          
##   0.9547347  0.9517668  0.9471419         0.9949627         0.9646961          
##   0.9547086  0.9514528  0.9471419         0.9948696         0.9643454          
##   0.9574526  0.9547834  0.9503165         0.9952337         0.9671652          
##   0.9547364  0.9506049  0.9471419         0.9949753         0.9627208          
##   0.9493115  0.9430599  0.9406604         0.9944333         0.9584334          
##   0.9547731  0.9498763  0.9451698         0.9948818         0.9637293          
##   0.9467095  0.9441154  0.9381351         0.9939638         0.9594943          
##   0.9521754  0.9483727  0.9444964         0.9946203         0.9617341          
##   0.9465950  0.9419136  0.9380149         0.9940629         0.9564255          
##   0.9410508  0.9319414  0.9266512         0.9933161         0.9534315          
##   0.9436815  0.9360445  0.9294290         0.9935650         0.9584932          
##   0.9381994  0.9319636  0.9265069         0.9929169         0.9508170          
##   0.9436815  0.9360445  0.9294290         0.9935650         0.9584932          
##   0.9465584  0.9416541  0.9356339         0.9939512         0.9597521          
##   0.9467242  0.9409176  0.9360307         0.9940693         0.9576029          
##   0.9600488  0.9574030  0.9515190         0.9954951         0.9689134          
##   0.9465329  0.9404979  0.9336498         0.9940538         0.9584812          
##   0.9574411  0.9558994  0.9508456         0.9952337         0.9669183          
##   0.9465631  0.9424676  0.9384117         0.9940632         0.9553948          
##   0.9521145  0.9471322  0.9443641         0.9947228         0.9585232          
##   0.9547347  0.9517668  0.9471419         0.9949627         0.9646961          
##   0.9547086  0.9514528  0.9471419         0.9948696         0.9643454          
##   0.9574526  0.9547834  0.9503165         0.9952337         0.9671652          
##   0.9547364  0.9506049  0.9471419         0.9949753         0.9627208          
##   0.9493115  0.9430599  0.9406604         0.9944333         0.9584334          
##   0.9547731  0.9498763  0.9451698         0.9948818         0.9637293          
##   0.9467095  0.9441154  0.9381351         0.9939638         0.9594943          
##   0.9521754  0.9483727  0.9444964         0.9946203         0.9617341          
##   0.9465950  0.9419136  0.9380149         0.9940629         0.9564255          
##   0.9410508  0.9319414  0.9266512         0.9933161         0.9534315          
##   0.9436815  0.9360445  0.9294290         0.9935650         0.9584932          
##   0.9381994  0.9319636  0.9265069         0.9929169         0.9508170          
##   0.9436815  0.9360445  0.9294290         0.9935650         0.9584932          
##   0.9465584  0.9416541  0.9356339         0.9939512         0.9597521          
##   0.9467242  0.9409176  0.9360307         0.9940693         0.9576029          
##   0.9600488  0.9574030  0.9515190         0.9954951         0.9689134          
##   0.9465329  0.9404979  0.9336498         0.9940538         0.9584812          
##   0.9574411  0.9558994  0.9508456         0.9952337         0.9669183          
##   0.9465631  0.9424676  0.9384117         0.9940632         0.9553948          
##   0.9521145  0.9471322  0.9443641         0.9947228         0.9585232          
##   0.9547347  0.9517668  0.9471419         0.9949627         0.9646961          
##   0.9547086  0.9514528  0.9471419         0.9948696         0.9643454          
##   0.9574526  0.9547834  0.9503165         0.9952337         0.9671652          
##   0.9547364  0.9506049  0.9471419         0.9949753         0.9627208          
##   0.9493115  0.9430599  0.9406604         0.9944333         0.9584334          
##   0.9547731  0.9498763  0.9451698         0.9948818         0.9637293          
##   0.9467095  0.9441154  0.9381351         0.9939638         0.9594943          
##   0.9521754  0.9483727  0.9444964         0.9946203         0.9617341          
##   0.9465950  0.9419136  0.9380149         0.9940629         0.9564255          
##   Mean_Neg_Pred_Value  Mean_Precision  Mean_Recall  Mean_Detection_Rate
##   0.9936021            0.9502527       0.9228032    0.1052813          
##   0.9949004            0.9653958       0.9388085    0.1065131          
##   0.9948801            0.9617274       0.9388085    0.1065131          
##   0.9949004            0.9653958       0.9388085    0.1065131          
##   0.9951416            0.9659602       0.9425122    0.1067573          
##   0.9948805            0.9605659       0.9388085    0.1065184          
##   0.9945979            0.9559394       0.9341789    0.1062715          
##   0.9946004            0.9584812       0.9336498    0.1062662          
##   0.9945979            0.9559394       0.9341789    0.1062715          
##   0.9943450            0.9547775       0.9304752    0.1060273          
##   0.9953961            0.9627208       0.9471419    0.1070043          
##   0.9950260            0.9628119       0.9450134    0.1067626          
##   0.9953961            0.9643454       0.9471419    0.1070043          
##   0.9950260            0.9628119       0.9450134    0.1067626          
##   0.9950289            0.9638305       0.9444843    0.1067573          
##   0.9942379            0.9534252       0.9373415    0.1060325          
##   0.9947792            0.9614550       0.9418388    0.1065236          
##   0.9945054            0.9594943       0.9381351    0.1062794          
##   0.9947792            0.9614550       0.9418388    0.1065236          
##   0.9943968            0.9594732       0.9406363    0.1062794          
##   0.9941190            0.9570593       0.9287556    0.1057751          
##   0.9946357            0.9627062       0.9356339    0.1062662          
##   0.9938433            0.9508170       0.9265069    0.1055255          
##   0.9946357            0.9627062       0.9356339    0.1062662          
##   0.9951416            0.9659602       0.9425122    0.1067573          
##   0.9946159            0.9576029       0.9360307    0.1062794          
##   0.9953751            0.9641405       0.9484646    0.1070043          
##   0.9946004            0.9584812       0.9336498    0.1062662          
##   0.9953751            0.9641405       0.9484646    0.1070043          
##   0.9943450            0.9547775       0.9304752    0.1060273          
##   0.9951315            0.9585232       0.9443641    0.1067653          
##   0.9955491            0.9677502       0.9533468    0.1072564          
##   0.9953961            0.9643454       0.9471419    0.1070043          
##   0.9959074            0.9693874       0.9540202    0.1074954          
##   0.9950289            0.9638305       0.9444843    0.1067573          
##   0.9951283            0.9602516       0.9443641    0.1067573          
##   0.9953024            0.9663932       0.9501722    0.1070175          
##   0.9945054            0.9594943       0.9381351    0.1062794          
##   0.9956606            0.9680304       0.9508456    0.1072564          
##   0.9943968            0.9594732       0.9406363    0.1062794          
##   0.9941190            0.9570593       0.9287556    0.1057751          
##   0.9943742            0.9584734       0.9319302    0.1060220          
##   0.9938433            0.9508170       0.9265069    0.1055255          
##   0.9943742            0.9584734       0.9319302    0.1060220          
##   0.9948801            0.9617274       0.9388085    0.1065131          
##   0.9946159            0.9576029       0.9360307    0.1062794          
##   0.9953751            0.9641405       0.9484646    0.1070043          
##   0.9946004            0.9584812       0.9336498    0.1062662          
##   0.9953751            0.9641405       0.9484646    0.1070043          
##   0.9945960            0.9553948       0.9384117    0.1062662          
##   0.9951315            0.9585232       0.9443641    0.1067653          
##   0.9959074            0.9693874       0.9540202    0.1074954          
##   0.9953961            0.9643454       0.9471419    0.1070043          
##   0.9959074            0.9693874       0.9540202    0.1074954          
##   0.9956428            0.9649430       0.9508456    0.1072485          
##   0.9951283            0.9602516       0.9443641    0.1067573          
##   0.9954131            0.9654378       0.9476710    0.1070175          
##   0.9945054            0.9594943       0.9381351    0.1062794          
##   0.9954131            0.9654378       0.9476710    0.1070175          
##   0.9951283            0.9611169       0.9448932    0.1067626          
##   0.9943776            0.9561895       0.9319302    0.1060220          
##   0.9943742            0.9584734       0.9319302    0.1060220          
##   0.9938433            0.9508170       0.9265069    0.1055255          
##   0.9943742            0.9584734       0.9319302    0.1060220          
##   0.9948801            0.9617274       0.9388085    0.1065131          
##   0.9946159            0.9576029       0.9360307    0.1062794          
##   0.9957486            0.9664443       0.9487412    0.1072432          
##   0.9946004            0.9584812       0.9336498    0.1062662          
##   0.9953783            0.9644491       0.9480678    0.1070043          
##   0.9945960            0.9553948       0.9384117    0.1062662          
##   0.9951315            0.9585232       0.9443641    0.1067653          
##   0.9959074            0.9693874       0.9540202    0.1074954          
##   0.9953961            0.9643454       0.9471419    0.1070043          
##   0.9959074            0.9693874       0.9540202    0.1074954          
##   0.9956428            0.9649430       0.9508456    0.1072485          
##   0.9948872            0.9584334       0.9406604    0.1065131          
##   0.9954131            0.9654378       0.9476710    0.1070175          
##   0.9945054            0.9594943       0.9381351    0.1062794          
##   0.9951691            0.9617341       0.9444964    0.1067785          
##   0.9951283            0.9611169       0.9448932    0.1067626          
##   0.9942369            0.9534315       0.9266512    0.1057831          
##   0.9944950            0.9584932       0.9294290    0.1060220          
##   0.9938433            0.9508170       0.9265069    0.1055255          
##   0.9944950            0.9584932       0.9294290    0.1060220          
##   0.9946305            0.9597521       0.9356339    0.1062742          
##   0.9946159            0.9576029       0.9360307    0.1062794          
##   0.9957486            0.9664443       0.9487412    0.1072432          
##   0.9946004            0.9584812       0.9336498    0.1062662          
##   0.9953783            0.9644491       0.9480678    0.1070043          
##   0.9945960            0.9553948       0.9384117    0.1062662          
##   0.9951315            0.9585232       0.9443641    0.1067653          
##   0.9956428            0.9669183       0.9508456    0.1072485          
##   0.9953961            0.9643454       0.9471419    0.1070043          
##   0.9959074            0.9693874       0.9540202    0.1074954          
##   0.9953961            0.9627208       0.9471419    0.1070043          
##   0.9948872            0.9584334       0.9406604    0.1065131          
##   0.9955394            0.9637293       0.9451698    0.1070175          
##   0.9945054            0.9594943       0.9381351    0.1062794          
##   0.9951691            0.9617341       0.9444964    0.1067785          
##   0.9948816            0.9588946       0.9411895    0.1065184          
##   0.9942369            0.9534315       0.9266512    0.1057831          
##   0.9944950            0.9584932       0.9294290    0.1060220          
##   0.9938433            0.9508170       0.9265069    0.1055255          
##   0.9944950            0.9584932       0.9294290    0.1060220          
##   0.9946305            0.9597521       0.9356339    0.1062742          
##   0.9946159            0.9576029       0.9360307    0.1062794          
##   0.9960132            0.9689134       0.9515190    0.1074874          
##   0.9946004            0.9584812       0.9336498    0.1062662          
##   0.9956428            0.9669183       0.9508456    0.1072485          
##   0.9945960            0.9553948       0.9384117    0.1062662          
##   0.9951315            0.9585232       0.9443641    0.1067653          
##   0.9953961            0.9646961       0.9471419    0.1070043          
##   0.9953961            0.9643454       0.9471419    0.1070043          
##   0.9956606            0.9671652       0.9503165    0.1072512          
##   0.9953961            0.9627208       0.9471419    0.1070043          
##   0.9948872            0.9584334       0.9406604    0.1065131          
##   0.9955394            0.9637293       0.9451698    0.1070175          
##   0.9945054            0.9594943       0.9381351    0.1062794          
##   0.9951691            0.9617341       0.9444964    0.1067785          
##   0.9946233            0.9564255       0.9380149    0.1062715          
##   0.9942369            0.9534315       0.9266512    0.1057831          
##   0.9944950            0.9584932       0.9294290    0.1060220          
##   0.9938433            0.9508170       0.9265069    0.1055255          
##   0.9944950            0.9584932       0.9294290    0.1060220          
##   0.9946305            0.9597521       0.9356339    0.1062742          
##   0.9946159            0.9576029       0.9360307    0.1062794          
##   0.9960132            0.9689134       0.9515190    0.1074874          
##   0.9946004            0.9584812       0.9336498    0.1062662          
##   0.9956428            0.9669183       0.9508456    0.1072485          
##   0.9945960            0.9553948       0.9384117    0.1062662          
##   0.9951315            0.9585232       0.9443641    0.1067653          
##   0.9953961            0.9646961       0.9471419    0.1070043          
##   0.9953961            0.9643454       0.9471419    0.1070043          
##   0.9956606            0.9671652       0.9503165    0.1072512          
##   0.9953961            0.9627208       0.9471419    0.1070043          
##   0.9948872            0.9584334       0.9406604    0.1065131          
##   0.9955394            0.9637293       0.9451698    0.1070175          
##   0.9945054            0.9594943       0.9381351    0.1062794          
##   0.9951691            0.9617341       0.9444964    0.1067785          
##   0.9946233            0.9564255       0.9380149    0.1062715          
##   0.9942369            0.9534315       0.9266512    0.1057831          
##   0.9944950            0.9584932       0.9294290    0.1060220          
##   0.9938433            0.9508170       0.9265069    0.1055255          
##   0.9944950            0.9584932       0.9294290    0.1060220          
##   0.9946305            0.9597521       0.9356339    0.1062742          
##   0.9946159            0.9576029       0.9360307    0.1062794          
##   0.9960132            0.9689134       0.9515190    0.1074874          
##   0.9946004            0.9584812       0.9336498    0.1062662          
##   0.9956428            0.9669183       0.9508456    0.1072485          
##   0.9945960            0.9553948       0.9384117    0.1062662          
##   0.9951315            0.9585232       0.9443641    0.1067653          
##   0.9953961            0.9646961       0.9471419    0.1070043          
##   0.9953961            0.9643454       0.9471419    0.1070043          
##   0.9956606            0.9671652       0.9503165    0.1072512          
##   0.9953961            0.9627208       0.9471419    0.1070043          
##   0.9948872            0.9584334       0.9406604    0.1065131          
##   0.9955394            0.9637293       0.9451698    0.1070175          
##   0.9945054            0.9594943       0.9381351    0.1062794          
##   0.9951691            0.9617341       0.9444964    0.1067785          
##   0.9946233            0.9564255       0.9380149    0.1062715          
##   Mean_Balanced_Accuracy
##   0.9576718             
##   0.9664514             
##   0.9665106             
##   0.9664514             
##   0.9684916             
##   0.9665696             
##   0.9641755             
##   0.9638518             
##   0.9641755             
##   0.9621353             
##   0.9710586             
##   0.9698606             
##   0.9710057             
##   0.9698606             
##   0.9695384             
##   0.9655730             
##   0.9680881             
##   0.9660495             
##   0.9680881             
##   0.9673001             
##   0.9609687             
##   0.9647286             
##   0.9597119             
##   0.9647286             
##   0.9684916             
##   0.9650500             
##   0.9717169             
##   0.9638518             
##   0.9717169             
##   0.9621353             
##   0.9695434             
##   0.9742950             
##   0.9710057             
##   0.9747624             
##   0.9695384             
##   0.9695342             
##   0.9725225             
##   0.9660495             
##   0.9729899             
##   0.9673001             
##   0.9609687             
##   0.9627476             
##   0.9597119             
##   0.9627476             
##   0.9665106             
##   0.9650500             
##   0.9717169             
##   0.9638518             
##   0.9717169             
##   0.9662374             
##   0.9695434             
##   0.9747624             
##   0.9710057             
##   0.9747624             
##   0.9730459             
##   0.9695342             
##   0.9712719             
##   0.9660495             
##   0.9712719             
##   0.9697490             
##   0.9627539             
##   0.9627476             
##   0.9597119             
##   0.9627476             
##   0.9665106             
##   0.9650500             
##   0.9719843             
##   0.9638518             
##   0.9715169             
##   0.9662374             
##   0.9695434             
##   0.9747624             
##   0.9710057             
##   0.9747624             
##   0.9730459             
##   0.9675468             
##   0.9712719             
##   0.9660495             
##   0.9695583             
##   0.9697490             
##   0.9599837             
##   0.9614970             
##   0.9597119             
##   0.9614970             
##   0.9647925             
##   0.9650500             
##   0.9719843             
##   0.9638518             
##   0.9715169             
##   0.9662374             
##   0.9695434             
##   0.9730396             
##   0.9710057             
##   0.9747624             
##   0.9710586             
##   0.9675468             
##   0.9700258             
##   0.9660495             
##   0.9695583             
##   0.9677617             
##   0.9599837             
##   0.9614970             
##   0.9597119             
##   0.9614970             
##   0.9647925             
##   0.9650500             
##   0.9735070             
##   0.9638518             
##   0.9730396             
##   0.9662374             
##   0.9695434             
##   0.9710523             
##   0.9710057             
##   0.9727751             
##   0.9710586             
##   0.9675468             
##   0.9700258             
##   0.9660495             
##   0.9695583             
##   0.9660389             
##   0.9599837             
##   0.9614970             
##   0.9597119             
##   0.9614970             
##   0.9647925             
##   0.9650500             
##   0.9735070             
##   0.9638518             
##   0.9730396             
##   0.9662374             
##   0.9695434             
##   0.9710523             
##   0.9710057             
##   0.9727751             
##   0.9710586             
##   0.9675468             
##   0.9700258             
##   0.9660495             
##   0.9695583             
##   0.9660389             
##   0.9599837             
##   0.9614970             
##   0.9597119             
##   0.9614970             
##   0.9647925             
##   0.9650500             
##   0.9735070             
##   0.9638518             
##   0.9730396             
##   0.9662374             
##   0.9695434             
##   0.9710523             
##   0.9710057             
##   0.9727751             
##   0.9710586             
##   0.9675468             
##   0.9700258             
##   0.9660495             
##   0.9695583             
##   0.9660389             
## 
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were kmax = 7, distance = 3 and kernel
##  = epanechnikov.
head(kknn_fit$results[order(-kknn_fit$results$Kappa),],20)
##     kmax distance       kernel   logLoss       AUC     prAUC  Accuracy
## 34     4        3 epanechnikov 0.5796553 0.9841954 0.1979412 0.9674583
## 52     5        3   triangular 0.5789736 0.9842243 0.2015382 0.9674583
## 54     5        3 epanechnikov 0.5801737 0.9841954 0.2013924 0.9674583
## 72     6        3   triangular 0.5789736 0.9842243 0.2015382 0.9674583
## 74     6        3 epanechnikov 0.5801737 0.9841954 0.2013924 0.9674583
## 94     7        3 epanechnikov 0.5801737 0.9841954 0.2013924 0.9674583
## 107    8        2   triangular 0.3758316 0.9904248 0.3008972 0.9673867
## 127    9        2   triangular 0.3758316 0.9904248 0.3008972 0.9673867
## 147   10        2   triangular 0.3780721 0.9903594 0.3031986 0.9673867
## 32     4        3   triangular 0.5827790 0.9841692 0.1901016 0.9653078
## 114    8        3 epanechnikov 0.5850546 0.9838233 0.2361733 0.9652605
## 134    9        3 epanechnikov 0.5850546 0.9838233 0.2361733 0.9652605
## 154   10        3 epanechnikov 0.5850546 0.9838233 0.2361733 0.9652605
## 39     4        4 epanechnikov 0.5888310 0.9842318 0.1879714 0.9653078
## 55     5        3      optimal 0.5178505 0.9857641 0.2287803 0.9652361
## 75     6        3      optimal 0.5178505 0.9857641 0.2287803 0.9652361
## 92     7        3   triangular 0.5870414 0.9839393 0.2360605 0.9652361
## 109    8        2 epanechnikov 0.4447955 0.9879179 0.2595428 0.9652361
## 129    9        2 epanechnikov 0.4447955 0.9879179 0.2595428 0.9652361
## 149   10        2 epanechnikov 0.4447955 0.9879179 0.2595428 0.9652361
##         Kappa   Mean_F1 Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value
## 34  0.9601589 0.9589160        0.9540202        0.9955047           0.9693874
## 52  0.9601589 0.9589160        0.9540202        0.9955047           0.9693874
## 54  0.9601589 0.9589160        0.9540202        0.9955047           0.9693874
## 72  0.9601589 0.9589160        0.9540202        0.9955047           0.9693874
## 74  0.9601589 0.9589160        0.9540202        0.9955047           0.9693874
## 94  0.9601589 0.9589160        0.9540202        0.9955047           0.9693874
## 107 0.9600488 0.9574030        0.9515190        0.9954951           0.9689134
## 127 0.9600488 0.9574030        0.9515190        0.9954951           0.9689134
## 147 0.9600488 0.9574030        0.9515190        0.9954951           0.9689134
## 32  0.9575694 0.9575241        0.9533468        0.9952432           0.9677502
## 114 0.9574526 0.9547834        0.9503165        0.9952337           0.9671652
## 134 0.9574526 0.9547834        0.9503165        0.9952337           0.9671652
## 154 0.9574526 0.9547834        0.9503165        0.9952337           0.9671652
## 39  0.9574510 0.9563148        0.9508456        0.9951343           0.9680304
## 55  0.9574427 0.9547374        0.9508456        0.9952463           0.9649430
## 75  0.9574427 0.9547374        0.9508456        0.9952463           0.9649430
## 92  0.9574411 0.9558994        0.9508456        0.9952337           0.9669183
## 109 0.9574411 0.9558994        0.9508456        0.9952337           0.9669183
## 129 0.9574411 0.9558994        0.9508456        0.9952337           0.9669183
## 149 0.9574411 0.9558994        0.9508456        0.9952337           0.9669183
##     Mean_Neg_Pred_Value Mean_Precision Mean_Recall Mean_Detection_Rate
## 34            0.9959074      0.9693874   0.9540202           0.1074954
## 52            0.9959074      0.9693874   0.9540202           0.1074954
## 54            0.9959074      0.9693874   0.9540202           0.1074954
## 72            0.9959074      0.9693874   0.9540202           0.1074954
## 74            0.9959074      0.9693874   0.9540202           0.1074954
## 94            0.9959074      0.9693874   0.9540202           0.1074954
## 107           0.9960132      0.9689134   0.9515190           0.1074874
## 127           0.9960132      0.9689134   0.9515190           0.1074874
## 147           0.9960132      0.9689134   0.9515190           0.1074874
## 32            0.9955491      0.9677502   0.9533468           0.1072564
## 114           0.9956606      0.9671652   0.9503165           0.1072512
## 134           0.9956606      0.9671652   0.9503165           0.1072512
## 154           0.9956606      0.9671652   0.9503165           0.1072512
## 39            0.9956606      0.9680304   0.9508456           0.1072564
## 55            0.9956428      0.9649430   0.9508456           0.1072485
## 75            0.9956428      0.9649430   0.9508456           0.1072485
## 92            0.9956428      0.9669183   0.9508456           0.1072485
## 109           0.9956428      0.9669183   0.9508456           0.1072485
## 129           0.9956428      0.9669183   0.9508456           0.1072485
## 149           0.9956428      0.9669183   0.9508456           0.1072485
##     Mean_Balanced_Accuracy logLossSD      AUCSD    prAUCSD AccuracySD
## 34               0.9747624 0.5065049 0.01516802 0.04675295 0.02410024
## 52               0.9747624 0.5067785 0.01519625 0.04409328 0.02410024
## 54               0.9747624 0.5058397 0.01516802 0.04360954 0.02410024
## 72               0.9747624 0.5067785 0.01519625 0.04409328 0.02410024
## 74               0.9747624 0.5058397 0.01516802 0.04360954 0.02410024
## 94               0.9747624 0.5058397 0.01516802 0.04360954 0.02410024
## 107              0.9735070 0.4807288 0.01474440 0.06293014 0.02147841
## 127              0.9735070 0.4807288 0.01474440 0.06293014 0.02147841
## 147              0.9735070 0.4812466 0.01476486 0.06411596 0.02147841
## 32               0.9742950 0.5019539 0.01512556 0.05784717 0.02454426
## 114              0.9727751 0.5085178 0.01536931 0.06260142 0.02338656
## 134              0.9727751 0.5085178 0.01536931 0.06260142 0.02338656
## 154              0.9727751 0.5085178 0.01536931 0.06260142 0.02338656
## 39               0.9729899 0.5174999 0.01525469 0.05288171 0.02785411
## 55               0.9730459 0.4871126 0.01475850 0.04165115 0.02060888
## 75               0.9730459 0.4871126 0.01475850 0.04165115 0.02060888
## 92               0.9730396 0.5016083 0.01534698 0.05920592 0.02060888
## 109              0.9730396 0.4569198 0.01388791 0.05076053 0.02060888
## 129              0.9730396 0.4569198 0.01388791 0.05076053 0.02060888
## 149              0.9730396 0.4569198 0.01388791 0.05076053 0.02060888
##        KappaSD  Mean_F1SD Mean_SensitivitySD Mean_SpecificitySD
## 34  0.02953297 0.03356604         0.03653540        0.003068656
## 52  0.02953297 0.03356604         0.03653540        0.003068656
## 54  0.02953297 0.03356604         0.03653540        0.003068656
## 72  0.02953297 0.03356604         0.03653540        0.003068656
## 74  0.02953297 0.03356604         0.03653540        0.003068656
## 94  0.02953297 0.03356604         0.03653540        0.003068656
## 107 0.02632590 0.03017864         0.03234521        0.002697432
## 127 0.02632590 0.03017864         0.03234521        0.002697432
## 147 0.02632590 0.03017864         0.03234521        0.002697432
## 32  0.03004726 0.03335446         0.03631541        0.003120799
## 114 0.02867263 0.03243434         0.03634032        0.003047961
## 134 0.02867263 0.03243434         0.03634032        0.003047961
## 154 0.02867263 0.03243434         0.03634032        0.003047961
## 39  0.03428097 0.03836147         0.04255770        0.003700350
## 55  0.02526948 0.02801432         0.03194224        0.002616590
## 75  0.02526948 0.02801432         0.03194224        0.002616590
## 92  0.02526713 0.02936569         0.03194224        0.002595814
## 109 0.02526713 0.02936569         0.03194224        0.002595814
## 129 0.02526713 0.02936569         0.03194224        0.002595814
## 149 0.02526713 0.02936569         0.03194224        0.002595814
##     Mean_Pos_Pred_ValueSD Mean_Neg_Pred_ValueSD Mean_PrecisionSD Mean_RecallSD
## 34             0.02980704           0.002974632       0.02980704    0.03653540
## 52             0.02980704           0.002974632       0.02980704    0.03653540
## 54             0.02980704           0.002974632       0.02980704    0.03653540
## 72             0.02980704           0.002974632       0.02980704    0.03653540
## 74             0.02980704           0.002974632       0.02980704    0.03653540
## 94             0.02980704           0.002974632       0.02980704    0.03653540
## 107            0.02805797           0.002675637       0.02805797    0.03234521
## 127            0.02805797           0.002675637       0.02805797    0.03234521
## 147            0.02805797           0.002675637       0.02805797    0.03234521
## 32             0.02968672           0.003121670       0.02968672    0.03631541
## 114            0.02791759           0.002864250       0.02791759    0.03634032
## 134            0.02791759           0.002864250       0.02791759    0.03634032
## 154            0.02791759           0.002864250       0.02791759    0.03634032
## 39             0.03217536           0.003376896       0.03217536    0.04255770
## 55             0.02543787           0.002547861       0.02543787    0.03194224
## 75             0.02543787           0.002547861       0.02543787    0.03194224
## 92             0.02701807           0.002547861       0.02701807    0.03194224
## 109            0.02701807           0.002547861       0.02701807    0.03194224
## 129            0.02701807           0.002547861       0.02701807    0.03194224
## 149            0.02701807           0.002547861       0.02701807    0.03194224
##     Mean_Detection_RateSD Mean_Balanced_AccuracySD
## 34            0.002677804               0.01976991
## 52            0.002677804               0.01976991
## 54            0.002677804               0.01976991
## 72            0.002677804               0.01976991
## 74            0.002677804               0.01976991
## 94            0.002677804               0.01976991
## 107           0.002386489               0.01750609
## 127           0.002386489               0.01750609
## 147           0.002386489               0.01750609
## 32            0.002727140               0.01963608
## 114           0.002598506               0.01966141
## 134           0.002598506               0.01966141
## 154           0.002598506               0.01966141
## 39            0.003094901               0.02310962
## 55            0.002289875               0.01724183
## 75            0.002289875               0.01724183
## 92            0.002289875               0.01723344
## 109           0.002289875               0.01723344
## 129           0.002289875               0.01723344
## 149           0.002289875               0.01723344

Best model found:

kknn_fit$bestTune
##    kmax distance       kernel
## 94    7        3 epanechnikov

Parameters of the best model: kmax = 7, distance = 3, kernel = epanechnikov.
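
To give a feel for what the distance and kernel parameters control, here is a minimal sketch in base R. It assumes that distance is the order p of the Minkowski distance and that the Epanechnikov kernel weights a neighbour by 0.75 * (1 - d^2) for a distance d already scaled to [0, 1]. The helper names minkowski and epanechnikov are hypothetical and are not part of caret or kknn; the package's internal scaling of distances is not reproduced here.

```r
# Minkowski distance of order p between two numeric vectors (hypothetical helper)
minkowski <- function(x, y, p) sum(abs(x - y)^p)^(1 / p)

# Epanechnikov kernel weight for a distance d assumed to be scaled to [0, 1]
epanechnikov <- function(d) 0.75 * (1 - d^2)

x <- c(0, 0)
y <- c(1, 1)
d3 <- minkowski(x, y, p = 3)  # order 3, as in the selected model
d2 <- minkowski(x, y, p = 2)  # order 2 is the ordinary Euclidean distance

# Closer neighbours get larger weights; the weight reaches 0 at d = 1
w <- epanechnikov(c(0, 0.5, 1))
```

A larger order p lets the largest coordinate difference dominate the distance, while the kernel decides how quickly a neighbour's vote fades as it gets farther away.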

Equivalent models (the six parameter combinations that share the highest Kappa, 0.9601589):

datatable(head(kknn_fit$results[order(-kknn_fit$results$Kappa),],6),options = list(scrollX = TRUE,dom = 'ltip',ordering=F))
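
The number of rows in the call above is hard-coded. A possible alternative is to keep every row whose Kappa equals the maximum, as sketched below. The data frame res is a hypothetical stand-in for kknn_fit$results, filled with a few rows copied from the printed output; the printed values are rounded, so the tolerance tol is an assumption and ties in the real results may behave differently.

```r
# Hypothetical stand-in for kknn_fit$results (values copied from the printed output)
res <- data.frame(
  kmax     = c(4, 5, 5, 6, 6, 7, 8, 4),
  distance = c(3, 3, 3, 3, 3, 3, 2, 3),
  kernel   = c("epanechnikov", "triangular", "epanechnikov", "triangular",
               "epanechnikov", "epanechnikov", "triangular", "triangular"),
  Kappa    = c(0.9601589, 0.9601589, 0.9601589, 0.9601589,
               0.9601589, 0.9601589, 0.9600488, 0.9575694)
)

# Keep all rows tied at the maximum Kappa, within a small tolerance
tol <- 1e-9
tied <- res[abs(res$Kappa - max(res$Kappa)) < tol, ]
tied
```

With the real object, the same filter could be applied to kknn_fit$results directly and the result passed to datatable().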

Classification results on the training data:

confusionMatrix(predict(kknn_fit),olive.train$area,mode = "everything")
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              41              0            0               0
##   Coast.Sardinia         0             24            0               0
##   East.Liguria           0              0           41               0
##   Inland.Sardinia        0              1            0              55
##   North.Apulia           0              0            0               0
##   Sicily                 0              0            0               0
##   South.Apulia           0              0            0               0
##   Umbria                 0              0            0               0
##   West.Liguria           0              0            0               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   0      1            0      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      0            0
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia              22      1            0      0            0
##   Sicily                     0     30            0      0            0
##   South.Apulia               0      1          162      0            0
##   Umbria                     0      0            0     40            0
##   West.Liguria               0      0            0      0           39
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9913          
##                  95% CI : (0.9778, 0.9976)
##     No Information Rate : 0.3537          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9893          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                  1.00000               0.96000             1.00000
## Specificity                  0.99760               1.00000             1.00000
## Pos Pred Value               0.97619               1.00000             1.00000
## Neg Pred Value               1.00000               0.99770             1.00000
## Precision                    0.97619               1.00000             1.00000
## Recall                       1.00000               0.96000             1.00000
## F1                           0.98795               0.97959             1.00000
## Prevalence                   0.08952               0.05459             0.08952
## Detection Rate               0.08952               0.05240             0.08952
## Detection Prevalence         0.09170               0.05240             0.08952
## Balanced Accuracy            0.99880               0.98000             1.00000
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                          1.0000             1.00000       0.90909
## Specificity                          0.9975             0.99771       1.00000
## Pos Pred Value                       0.9821             0.95652       1.00000
## Neg Pred Value                       1.0000             1.00000       0.99299
## Precision                            0.9821             0.95652       1.00000
## Recall                               1.0000             1.00000       0.90909
## F1                                   0.9910             0.97778       0.95238
## Prevalence                           0.1201             0.04803       0.07205
## Detection Rate                       0.1201             0.04803       0.06550
## Detection Prevalence                 0.1223             0.05022       0.06550
## Balanced Accuracy                    0.9988             0.99885       0.95455
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       1.0000       1.00000             1.00000
## Specificity                       0.9966       1.00000             1.00000
## Pos Pred Value                    0.9939       1.00000             1.00000
## Neg Pred Value                    1.0000       1.00000             1.00000
## Precision                         0.9939       1.00000             1.00000
## Recall                            1.0000       1.00000             1.00000
## F1                                0.9969       1.00000             1.00000
## Prevalence                        0.3537       0.08734             0.08515
## Detection Rate                    0.3537       0.08734             0.08515
## Detection Prevalence              0.3559       0.08734             0.08515
## Balanced Accuracy                 0.9983       1.00000             1.00000

On the training data, Kappa = 0.9893. On the test data:

confusionMatrix(predict(kknn_fit, olive.test.scale),olive.test$area,mode = "everything")
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              15              0            0               0
##   Coast.Sardinia         0              7            0               0
##   East.Liguria           0              0            7               0
##   Inland.Sardinia        0              1            0              10
##   North.Apulia           0              0            0               0
##   Sicily                 0              0            0               0
##   South.Apulia           0              0            0               0
##   Umbria                 0              0            0               0
##   West.Liguria           0              0            2               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   0      1            1      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      0            1
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia               3      0            0      0            0
##   Sicily                     0      2            2      0            0
##   South.Apulia               0      0           41      0            0
##   Umbria                     0      0            0     11            0
##   West.Liguria               0      0            0      0           10
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9298          
##                  95% CI : (0.8664, 0.9692)
##     No Information Rate : 0.386           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9125          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                   1.0000               0.87500             0.77778
## Specificity                   0.9798               1.00000             0.99048
## Pos Pred Value                0.8824               1.00000             0.87500
## Neg Pred Value                1.0000               0.99065             0.98113
## Precision                     0.8824               1.00000             0.87500
## Recall                        1.0000               0.87500             0.77778
## F1                            0.9375               0.93333             0.82353
## Prevalence                    0.1316               0.07018             0.07895
## Detection Rate                0.1316               0.06140             0.06140
## Detection Prevalence          0.1491               0.06140             0.07018
## Balanced Accuracy             0.9899               0.93750             0.88413
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                         1.00000             1.00000       0.66667
## Specificity                         0.99038             1.00000       0.98198
## Pos Pred Value                      0.90909             1.00000       0.50000
## Neg Pred Value                      1.00000             1.00000       0.99091
## Precision                           0.90909             1.00000       0.50000
## Recall                              1.00000             1.00000       0.66667
## F1                                  0.95238             1.00000       0.57143
## Prevalence                          0.08772             0.02632       0.02632
## Detection Rate                      0.08772             0.02632       0.01754
## Detection Prevalence                0.09649             0.02632       0.03509
## Balanced Accuracy                   0.99519             1.00000       0.82432
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       0.9318       1.00000             0.90909
## Specificity                       1.0000       1.00000             0.98058
## Pos Pred Value                    1.0000       1.00000             0.83333
## Neg Pred Value                    0.9589       1.00000             0.99020
## Precision                         1.0000       1.00000             0.83333
## Recall                            0.9318       1.00000             0.90909
## F1                                0.9647       1.00000             0.86957
## Prevalence                        0.3860       0.09649             0.09649
## Detection Rate                    0.3596       0.09649             0.08772
## Detection Prevalence              0.3596       0.09649             0.10526
## Balanced Accuracy                 0.9659       1.00000             0.94484

On the test data, Kappa = 0.9125.
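
As a sanity check, Cohen's Kappa can be recomputed by hand from the test confusion matrix above. The sketch below is a base R illustration, not caret's internal code. The counts are copied from the printed matrix (rows are predictions, columns are reference classes), and the variable names are invented for this example.

```r
# Counts copied from the test-set confusion matrix above
# (rows = predicted class, columns = reference class).
classes <- c("Calabria", "Coast.Sardinia", "East.Liguria", "Inland.Sardinia",
             "North.Apulia", "Sicily", "South.Apulia", "Umbria", "West.Liguria")
cm <- matrix(c(
  15, 0, 0,  0, 0, 1,  1,  0,  0,
   0, 7, 0,  0, 0, 0,  0,  0,  0,
   0, 0, 7,  0, 0, 0,  0,  0,  1,
   0, 1, 0, 10, 0, 0,  0,  0,  0,
   0, 0, 0,  0, 3, 0,  0,  0,  0,
   0, 0, 0,  0, 0, 2,  2,  0,  0,
   0, 0, 0,  0, 0, 0, 41,  0,  0,
   0, 0, 0,  0, 0, 0,  0, 11,  0,
   0, 0, 2,  0, 0, 0,  0,  0, 10
), nrow = 9, byrow = TRUE,
dimnames = list(Prediction = classes, Reference = classes))

n          <- sum(cm)
p_observed <- sum(diag(cm)) / n                      # plain accuracy
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
kappa_hat  <- (p_observed - p_expected) / (1 - p_expected)

round(c(Accuracy = p_observed, Kappa = kappa_hat), 4)
```

The result agrees with the Accuracy of 0.9298 and Kappa of 0.9125 reported by confusionMatrix. Kappa is lower than accuracy because it discounts the agreement that would be expected by chance given the class frequencies.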

Variable importance in KNN classification

plot(varImp(kknn_fit))

The importance of the attributes differs from class to class.

Naïve Bayes classifier (NB)

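Naïve Bayes classification is based on Bayes' rule: an observation is assigned to the class with the highest posterior probability, and that conditional class probability is computed with Bayes' formula under the "naïve" assumption that the features are conditionally independent given the class.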
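
To make the rule concrete, here is a minimal sketch of a Gaussian naïve Bayes classifier written by hand. It assumes numeric features that are conditionally independent and normally distributed within each class, which is roughly the situation with usekernel = FALSE. The built-in iris data and all variable names are chosen only for illustration, and this is not the code that caret or klaR runs internally.

```r
# Gaussian naive Bayes written out by hand on the built-in iris data.
X <- iris[, 1:4]
y <- iris$Species

# Prior P(class): share of observations in each class (named numeric vector).
prior <- sapply(levels(y), function(k) mean(y == k))

# Per-class mean and standard deviation of every feature (class x feature matrices).
mu    <- sapply(X, function(col) tapply(col, y, mean))
sigma <- sapply(X, function(col) tapply(col, y, sd))

# log P(class) + sum_j log N(x_j | mu_kj, sigma_kj):
# the log posterior up to an additive constant shared by all classes.
log_posterior <- function(x) {
  sapply(levels(y), function(k) {
    log(prior[[k]]) + sum(dnorm(as.numeric(x), mu[k, ], sigma[k, ], log = TRUE))
  })
}

scores <- t(apply(X, 1, log_posterior))   # one row per observation, one column per class
pred   <- factor(levels(y)[max.col(scores, ties.method = "first")],
                 levels = levels(y))

accuracy <- mean(pred == y)               # share of training observations classified correctly
accuracy
```

Working with log probabilities avoids numerical underflow when many small densities are multiplied together.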

When preparing the data, numeric features need to be scaled and non-numeric features need to be converted to dummy variables. See e.g. https://uc-r.github.io/naive_bayes.
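
The two preparation steps can be sketched in base R as follows. The toy data frame and its column names are invented for illustration. In the olive data every predictor is numeric, so only the scaling step applies there.

```r
# Toy data frame, invented for illustration only.
toy <- data.frame(
  acid_a = c(10.8, 9.1, 12.3, 11.0),
  acid_b = c(0.75, 0.54, 1.10, 0.88),
  colour = factor(c("green", "yellow", "green", "gold"))
)

# Centre every numeric column to mean 0 and scale it to standard deviation 1.
num_cols   <- sapply(toy, is.numeric)
toy_scaled <- toy
toy_scaled[num_cols] <- scale(toy[num_cols])

# Expand the factor into 0/1 indicator (dummy) columns; "- 1" drops the
# intercept so that every level of this single factor gets its own column.
dummies <- model.matrix(~ colour - 1, data = toy)

prepared <- cbind(toy_scaled[num_cols], dummies)
prepared
```

When a separate test set is used, it would normally be scaled with the means and standard deviations estimated on the training set rather than with its own.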

We use the caret package to tune the NB classifier. In the trainControl function we set summaryFunction = multiClassSummary, and in the train function we set the metric to Kappa.

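# Start a parallel backend on all but one core
Mycluster = makeCluster(detectCores()-1)
registerDoParallel(Mycluster)
system.time({
RNGkind(sample.kind = "Rounding")
set.seed(123)
# 5-fold cross-validation; multiClassSummary reports Kappa, logLoss, AUC etc.
# search = "random" has no effect here because an explicit tuneGrid is passed to train()
train.control <- trainControl(method = "cv",number = 5 ,search = "random",classProbs = TRUE,summaryFunction = multiClassSummary)
# Full grid: 2 x 6 x 6 = 72 parameter combinations
tuneGrid <- expand.grid(usekernel = c(FALSE,TRUE),fL = 0:5,adjust = 0:5)
RNGkind(sample.kind = "Rounding")
set.seed(123)
# Fit NB for every grid row and keep the combination with the largest Kappa
nb_fit <- train(area~.,olive.train.scale, method = 'nb',trControl = train.control,tuneGrid = tuneGrid,metric = "Kappa")
})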
##    user  system elapsed 
##    1.11    0.01   28.78
stopCluster(Mycluster)
registerDoSEQ()
plot(nb_fit)

nb_fit
## Naive Bayes 
## 
## 458 samples
##   8 predictor
##   9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 367, 365, 368, 367, 365 
## Resampling results across tuning parameters:
## 
##   usekernel  fL  adjust  logLoss    AUC        prAUC      Accuracy   Kappa    
##   FALSE      0   0       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      0   1       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      0   2       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      0   3       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      0   4       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      0   5       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      1   0       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      1   1       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      1   2       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      1   3       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      1   4       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      1   5       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      2   0       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      2   1       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      2   2       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      2   3       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      2   4       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      2   5       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      3   0       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      3   1       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      3   2       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      3   3       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      3   4       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      3   5       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      4   0       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      4   1       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      4   2       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      4   3       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      4   4       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      4   5       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      5   0       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      5   1       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      5   2       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      5   3       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      5   4       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##   FALSE      5   5       0.2279951  0.9945969  0.8278884  0.9543172  0.9441053
##    TRUE      0   0             NaN        NaN        NaN        NaN        NaN
##    TRUE      0   1       0.3298594  0.9912368  0.8179129  0.9324550  0.9175106
##    TRUE      0   2       0.2248097  0.9944869  0.8230938  0.9433511  0.9306576
##    TRUE      0   3       0.2281915  0.9939013  0.8154503  0.9279649  0.9115387
##    TRUE      0   4       0.2796767  0.9934008  0.8090756  0.9105699  0.8898588
##    TRUE      0   5       0.3450491  0.9912329  0.7884345  0.8996510  0.8761059
##    TRUE      1   0             NaN        NaN        NaN        NaN        NaN
##    TRUE      1   1       0.3298594  0.9912368  0.8179129  0.9324550  0.9175106
##    TRUE      1   2       0.2248097  0.9944869  0.8230938  0.9433511  0.9306576
##    TRUE      1   3       0.2281915  0.9939013  0.8154503  0.9279649  0.9115387
##    TRUE      1   4       0.2796767  0.9934008  0.8090756  0.9105699  0.8898588
##    TRUE      1   5       0.3450491  0.9912329  0.7884345  0.8996510  0.8761059
##    TRUE      2   0             NaN        NaN        NaN        NaN        NaN
##    TRUE      2   1       0.3298594  0.9912368  0.8179129  0.9324550  0.9175106
##    TRUE      2   2       0.2248097  0.9944869  0.8230938  0.9433511  0.9306576
##    TRUE      2   3       0.2281915  0.9939013  0.8154503  0.9279649  0.9115387
##    TRUE      2   4       0.2796767  0.9934008  0.8090756  0.9105699  0.8898588
##    TRUE      2   5       0.3450491  0.9912329  0.7884345  0.8996510  0.8761059
##    TRUE      3   0             NaN        NaN        NaN        NaN        NaN
##    TRUE      3   1       0.3298594  0.9912368  0.8179129  0.9324550  0.9175106
##    TRUE      3   2       0.2248097  0.9944869  0.8230938  0.9433511  0.9306576
##    TRUE      3   3       0.2281915  0.9939013  0.8154503  0.9279649  0.9115387
##    TRUE      3   4       0.2796767  0.9934008  0.8090756  0.9105699  0.8898588
##    TRUE      3   5       0.3450491  0.9912329  0.7884345  0.8996510  0.8761059
##    TRUE      4   0             NaN        NaN        NaN        NaN        NaN
##    TRUE      4   1       0.3298594  0.9912368  0.8179129  0.9324550  0.9175106
##    TRUE      4   2       0.2248097  0.9944869  0.8230938  0.9433511  0.9306576
##    TRUE      4   3       0.2281915  0.9939013  0.8154503  0.9279649  0.9115387
##    TRUE      4   4       0.2796767  0.9934008  0.8090756  0.9105699  0.8898588
##    TRUE      4   5       0.3450491  0.9912329  0.7884345  0.8996510  0.8761059
##    TRUE      5   0             NaN        NaN        NaN        NaN        NaN
##    TRUE      5   1       0.3298594  0.9912368  0.8179129  0.9324550  0.9175106
##    TRUE      5   2       0.2248097  0.9944869  0.8230938  0.9433511  0.9306576
##    TRUE      5   3       0.2281915  0.9939013  0.8154503  0.9279649  0.9115387
##    TRUE      5   4       0.2796767  0.9934008  0.8090756  0.9105699  0.8898588
##    TRUE      5   5       0.3450491  0.9912329  0.7884345  0.8996510  0.8761059
##   Mean_F1    Mean_Sensitivity  Mean_Specificity  Mean_Pos_Pred_Value
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##   0.9323104  0.9333995         0.9940783         0.9403178          
##         NaN        NaN               NaN               NaN          
##   0.9019562  0.9045924         0.9914315         0.9139538          
##   0.9086546  0.9103373         0.9927390         0.9298769          
##   0.8917832  0.8824691         0.9905308         0.9342747          
##   0.8024536  0.8576808         0.9880977         0.8806403          
##         NaN  0.8437037         0.9864323               NaN          
##         NaN        NaN               NaN               NaN          
##   0.9019562  0.9045924         0.9914315         0.9139538          
##   0.9086546  0.9103373         0.9927390         0.9298769          
##   0.8917832  0.8824691         0.9905308         0.9342747          
##   0.8024536  0.8576808         0.9880977         0.8806403          
##         NaN  0.8437037         0.9864323               NaN          
##         NaN        NaN               NaN               NaN          
##   0.9019562  0.9045924         0.9914315         0.9139538          
##   0.9086546  0.9103373         0.9927390         0.9298769          
##   0.8917832  0.8824691         0.9905308         0.9342747          
##   0.8024536  0.8576808         0.9880977         0.8806403          
##         NaN  0.8437037         0.9864323               NaN          
##         NaN        NaN               NaN               NaN          
##   0.9019562  0.9045924         0.9914315         0.9139538          
##   0.9086546  0.9103373         0.9927390         0.9298769          
##   0.8917832  0.8824691         0.9905308         0.9342747          
##   0.8024536  0.8576808         0.9880977         0.8806403          
##         NaN  0.8437037         0.9864323               NaN          
##         NaN        NaN               NaN               NaN          
##   0.9019562  0.9045924         0.9914315         0.9139538          
##   0.9086546  0.9103373         0.9927390         0.9298769          
##   0.8917832  0.8824691         0.9905308         0.9342747          
##   0.8024536  0.8576808         0.9880977         0.8806403          
##         NaN  0.8437037         0.9864323               NaN          
##         NaN        NaN               NaN               NaN          
##   0.9019562  0.9045924         0.9914315         0.9139538          
##   0.9086546  0.9103373         0.9927390         0.9298769          
##   0.8917832  0.8824691         0.9905308         0.9342747          
##   0.8024536  0.8576808         0.9880977         0.8806403          
##         NaN  0.8437037         0.9864323               NaN          
##   Mean_Neg_Pred_Value  Mean_Precision  Mean_Recall  Mean_Detection_Rate
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##   0.9943610            0.9403178       0.9333995    0.10603525         
##         NaN                  NaN             NaN           NaN         
##   0.9915533            0.9139538       0.9045924    0.10360611         
##   0.9932630            0.9298769       0.9103373    0.10481678         
##   0.9917206            0.9342747       0.8824691    0.10310721         
##   0.9898684            0.8806403       0.8576808    0.10117444         
##   0.9885898                  NaN       0.8437037    0.09996123         
##         NaN                  NaN             NaN           NaN         
##   0.9915533            0.9139538       0.9045924    0.10360611         
##   0.9932630            0.9298769       0.9103373    0.10481678         
##   0.9917206            0.9342747       0.8824691    0.10310721         
##   0.9898684            0.8806403       0.8576808    0.10117444         
##   0.9885898                  NaN       0.8437037    0.09996123         
##         NaN                  NaN             NaN           NaN         
##   0.9915533            0.9139538       0.9045924    0.10360611         
##   0.9932630            0.9298769       0.9103373    0.10481678         
##   0.9917206            0.9342747       0.8824691    0.10310721         
##   0.9898684            0.8806403       0.8576808    0.10117444         
##   0.9885898                  NaN       0.8437037    0.09996123         
##         NaN                  NaN             NaN           NaN         
##   0.9915533            0.9139538       0.9045924    0.10360611         
##   0.9932630            0.9298769       0.9103373    0.10481678         
##   0.9917206            0.9342747       0.8824691    0.10310721         
##   0.9898684            0.8806403       0.8576808    0.10117444         
##   0.9885898                  NaN       0.8437037    0.09996123         
##         NaN                  NaN             NaN           NaN         
##   0.9915533            0.9139538       0.9045924    0.10360611         
##   0.9932630            0.9298769       0.9103373    0.10481678         
##   0.9917206            0.9342747       0.8824691    0.10310721         
##   0.9898684            0.8806403       0.8576808    0.10117444         
##   0.9885898                  NaN       0.8437037    0.09996123         
##         NaN                  NaN             NaN           NaN         
##   0.9915533            0.9139538       0.9045924    0.10360611         
##   0.9932630            0.9298769       0.9103373    0.10481678         
##   0.9917206            0.9342747       0.8824691    0.10310721         
##   0.9898684            0.8806403       0.8576808    0.10117444         
##   0.9885898                  NaN       0.8437037    0.09996123         
##   Mean_Balanced_Accuracy
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##   0.9637389             
##         NaN             
##   0.9480119             
##   0.9515382             
##   0.9365000             
##   0.9228893             
##   0.9150680             
##         NaN             
##   0.9480119             
##   0.9515382             
##   0.9365000             
##   0.9228893             
##   0.9150680             
##         NaN             
##   0.9480119             
##   0.9515382             
##   0.9365000             
##   0.9228893             
##   0.9150680             
##         NaN             
##   0.9480119             
##   0.9515382             
##   0.9365000             
##   0.9228893             
##   0.9150680             
##         NaN             
##   0.9480119             
##   0.9515382             
##   0.9365000             
##   0.9228893             
##   0.9150680             
##         NaN             
##   0.9480119             
##   0.9515382             
##   0.9365000             
##   0.9228893             
##   0.9150680             
## 
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were fL = 0, usekernel = FALSE and adjust
##  = 0.
head(nb_fit$results[order(-nb_fit$results$Kappa),],10)
##    usekernel fL adjust   logLoss       AUC     prAUC  Accuracy     Kappa
## 1      FALSE  0      0 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 2      FALSE  0      1 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 3      FALSE  0      2 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 4      FALSE  0      3 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 5      FALSE  0      4 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 6      FALSE  0      5 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 7      FALSE  1      0 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 8      FALSE  1      1 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 9      FALSE  1      2 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
## 10     FALSE  1      3 0.2279951 0.9945969 0.8278884 0.9543172 0.9441053
##      Mean_F1 Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value
## 1  0.9323104        0.9333995        0.9940783           0.9403178
## 2  0.9323104        0.9333995        0.9940783           0.9403178
## 3  0.9323104        0.9333995        0.9940783           0.9403178
## 4  0.9323104        0.9333995        0.9940783           0.9403178
## 5  0.9323104        0.9333995        0.9940783           0.9403178
## 6  0.9323104        0.9333995        0.9940783           0.9403178
## 7  0.9323104        0.9333995        0.9940783           0.9403178
## 8  0.9323104        0.9333995        0.9940783           0.9403178
## 9  0.9323104        0.9333995        0.9940783           0.9403178
## 10 0.9323104        0.9333995        0.9940783           0.9403178
##    Mean_Neg_Pred_Value Mean_Precision Mean_Recall Mean_Detection_Rate
## 1             0.994361      0.9403178   0.9333995           0.1060352
## 2             0.994361      0.9403178   0.9333995           0.1060352
## 3             0.994361      0.9403178   0.9333995           0.1060352
## 4             0.994361      0.9403178   0.9333995           0.1060352
## 5             0.994361      0.9403178   0.9333995           0.1060352
## 6             0.994361      0.9403178   0.9333995           0.1060352
## 7             0.994361      0.9403178   0.9333995           0.1060352
## 8             0.994361      0.9403178   0.9333995           0.1060352
## 9             0.994361      0.9403178   0.9333995           0.1060352
## 10            0.994361      0.9403178   0.9333995           0.1060352
##    Mean_Balanced_Accuracy logLossSD       AUCSD    prAUCSD AccuracySD
## 1               0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 2               0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 3               0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 4               0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 5               0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 6               0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 7               0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 8               0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 9               0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
## 10              0.9637389 0.1578302 0.004083802 0.02532101 0.01896598
##       KappaSD  Mean_F1SD Mean_SensitivitySD Mean_SpecificitySD
## 1  0.02334615 0.03271749         0.03768198        0.002531668
## 2  0.02334615 0.03271749         0.03768198        0.002531668
## 3  0.02334615 0.03271749         0.03768198        0.002531668
## 4  0.02334615 0.03271749         0.03768198        0.002531668
## 5  0.02334615 0.03271749         0.03768198        0.002531668
## 6  0.02334615 0.03271749         0.03768198        0.002531668
## 7  0.02334615 0.03271749         0.03768198        0.002531668
## 8  0.02334615 0.03271749         0.03768198        0.002531668
## 9  0.02334615 0.03271749         0.03768198        0.002531668
## 10 0.02334615 0.03271749         0.03768198        0.002531668
##    Mean_Pos_Pred_ValueSD Mean_Neg_Pred_ValueSD Mean_PrecisionSD Mean_RecallSD
## 1             0.03027377            0.00225509       0.03027377    0.03768198
## 2             0.03027377            0.00225509       0.03027377    0.03768198
## 3             0.03027377            0.00225509       0.03027377    0.03768198
## 4             0.03027377            0.00225509       0.03027377    0.03768198
## 5             0.03027377            0.00225509       0.03027377    0.03768198
## 6             0.03027377            0.00225509       0.03027377    0.03768198
## 7             0.03027377            0.00225509       0.03027377    0.03768198
## 8             0.03027377            0.00225509       0.03027377    0.03768198
## 9             0.03027377            0.00225509       0.03027377    0.03768198
## 10            0.03027377            0.00225509       0.03027377    0.03768198
##    Mean_Detection_RateSD Mean_Balanced_AccuracySD
## 1            0.002107332               0.02003125
## 2            0.002107332               0.02003125
## 3            0.002107332               0.02003125
## 4            0.002107332               0.02003125
## 5            0.002107332               0.02003125
## 6            0.002107332               0.02003125
## 7            0.002107332               0.02003125
## 8            0.002107332               0.02003125
## 9            0.002107332               0.02003125
## 10           0.002107332               0.02003125

Best model found:

nb_fit$bestTune
##   fL usekernel adjust
## 1  0     FALSE      0

Parameters of the best model: fL = 0, usekernel = FALSE, adjust = 0.

Equivalent models (parameter combinations with the same resampled performance):

datatable(head(nb_fit$results[order(-nb_fit$results$Kappa),],10),options = list(scrollX = TRUE,dom = 'ltip',ordering=F))
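Rows with identical resampled metrics can also be picked out programmatically. A minimal sketch on a made-up results table (the data frame `res` and its values are hypothetical, not taken from `nb_fit$results`), assuming ties are judged with a small numerical tolerance:

```r
# Hypothetical tuning results; the values are invented for illustration only.
res <- data.frame(
  adjust = c(0, 1, 2, 3),
  Kappa  = c(0.95, 0.95, 0.91, NaN)
)

# Best Kappa, ignoring failed fits (NaN/NA).
best_kappa <- max(res$Kappa, na.rm = TRUE)

# Keep every row whose Kappa equals the best value within a tolerance.
tol  <- 1e-8
tied <- res[!is.na(res$Kappa) & abs(res$Kappa - best_kappa) < tol, ]
print(tied)
```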

Classification results on the training data:

confusionMatrix(predict(nb_fit),olive.train$area,mode = "everything")
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 245
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 246
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 252
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 253
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              38              0            0               0
##   Coast.Sardinia         0             25            0               0
##   East.Liguria           0              0           41               0
##   Inland.Sardinia        0              0            0              55
##   North.Apulia           0              0            0               0
##   Sicily                 1              0            0               0
##   South.Apulia           2              0            0               0
##   Umbria                 0              0            0               0
##   West.Liguria           0              0            0               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   2      3            1      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      1            0
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia              20      6            0      0            0
##   Sicily                     0     21            2      0            0
##   South.Apulia               0      3          159      0            0
##   Umbria                     0      0            0     39            0
##   West.Liguria               0      0            0      0           39
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9541          
##                  95% CI : (0.9308, 0.9714)
##     No Information Rate : 0.3537          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9439          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                  0.92683               1.00000             1.00000
## Specificity                  0.98561               1.00000             0.99760
## Pos Pred Value               0.86364               1.00000             0.97619
## Neg Pred Value               0.99275               1.00000             1.00000
## Precision                    0.86364               1.00000             0.97619
## Recall                       0.92683               1.00000             1.00000
## F1                           0.89412               1.00000             0.98795
## Prevalence                   0.08952               0.05459             0.08952
## Detection Rate               0.08297               0.05459             0.08952
## Detection Prevalence         0.09607               0.05459             0.09170
## Balanced Accuracy            0.95622               1.00000             0.99880
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                          1.0000             0.90909       0.63636
## Specificity                          1.0000             0.98624       0.99294
## Pos Pred Value                       1.0000             0.76923       0.87500
## Neg Pred Value                       1.0000             0.99537       0.97235
## Precision                            1.0000             0.76923       0.87500
## Recall                               1.0000             0.90909       0.63636
## F1                                   1.0000             0.83333       0.73684
## Prevalence                           0.1201             0.04803       0.07205
## Detection Rate                       0.1201             0.04367       0.04585
## Detection Prevalence                 0.1201             0.05677       0.05240
## Balanced Accuracy                    1.0000             0.94766       0.81465
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       0.9815       0.97500             1.00000
## Specificity                       0.9831       1.00000             1.00000
## Pos Pred Value                    0.9695       1.00000             1.00000
## Neg Pred Value                    0.9898       0.99761             1.00000
## Precision                         0.9695       1.00000             1.00000
## Recall                            0.9815       0.97500             1.00000
## F1                                0.9755       0.98734             1.00000
## Prevalence                        0.3537       0.08734             0.08515
## Detection Rate                    0.3472       0.08515             0.08515
## Detection Prevalence              0.3581       0.08515             0.08515
## Balanced Accuracy                 0.9823       0.98750             1.00000

On the training data Kappa = 0.9439. On the test data:

confusionMatrix(predict(nb_fit, olive.test.scale),olive.test$area,mode = "everything")
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 5
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 31
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              14              0            0               0
##   Coast.Sardinia         0              8            0               0
##   East.Liguria           0              0            6               0
##   Inland.Sardinia        0              0            0              10
##   North.Apulia           0              0            0               0
##   Sicily                 1              0            0               0
##   South.Apulia           0              0            0               0
##   Umbria                 0              0            1               0
##   West.Liguria           0              0            2               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   0      1            3      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      0            1
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia               3      0            0      0            0
##   Sicily                     0      2            2      0            0
##   South.Apulia               0      0           39      0            0
##   Umbria                     0      0            0     11            0
##   West.Liguria               0      0            0      0           10
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9035          
##                  95% CI : (0.8339, 0.9508)
##     No Information Rate : 0.386           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.8805          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                   0.9333               1.00000             0.66667
## Specificity                   0.9596               1.00000             0.99048
## Pos Pred Value                0.7778               1.00000             0.85714
## Neg Pred Value                0.9896               1.00000             0.97196
## Precision                     0.7778               1.00000             0.85714
## Recall                        0.9333               1.00000             0.66667
## F1                            0.8485               1.00000             0.75000
## Prevalence                    0.1316               0.07018             0.07895
## Detection Rate                0.1228               0.07018             0.05263
## Detection Prevalence          0.1579               0.07018             0.06140
## Balanced Accuracy             0.9465               1.00000             0.82857
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                         1.00000             1.00000       0.66667
## Specificity                         1.00000             1.00000       0.97297
## Pos Pred Value                      1.00000             1.00000       0.40000
## Neg Pred Value                      1.00000             1.00000       0.99083
## Precision                           1.00000             1.00000       0.40000
## Recall                              1.00000             1.00000       0.66667
## F1                                  1.00000             1.00000       0.50000
## Prevalence                          0.08772             0.02632       0.02632
## Detection Rate                      0.08772             0.02632       0.01754
## Detection Prevalence                0.08772             0.02632       0.04386
## Balanced Accuracy                   1.00000             1.00000       0.81982
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       0.8864       1.00000             0.90909
## Specificity                       1.0000       0.99029             0.98058
## Pos Pred Value                    1.0000       0.91667             0.83333
## Neg Pred Value                    0.9333       1.00000             0.99020
## Precision                         1.0000       0.91667             0.83333
## Recall                            0.8864       1.00000             0.90909
## F1                                0.9398       0.95652             0.86957
## Prevalence                        0.3860       0.09649             0.09649
## Detection Rate                    0.3421       0.09649             0.08772
## Detection Prevalence              0.3421       0.10526             0.10526
## Balanced Accuracy                 0.9432       0.99515             0.94484

On the test data Kappa = 0.8805.
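The Kappa value reported above can be reproduced by hand from the confusion matrix. The sketch below retypes the test-set counts from the output above into a matrix (the object names `cm`, `po`, `pe` and `kappa_hat` are illustrative) and applies Cohen's formula, kappa = (po - pe) / (1 - pe), where po is the observed agreement and pe is the agreement expected by chance from the row and column totals.

```r
classes <- c("Calabria", "Coast.Sardinia", "East.Liguria", "Inland.Sardinia",
             "North.Apulia", "Sicily", "South.Apulia", "Umbria", "West.Liguria")

# Rows are predictions, columns are reference classes (counts copied from the
# naive Bayes test-set confusion matrix printed above).
cm <- matrix(c(
  14, 0, 0,  0, 0, 1,  3,  0,  0,
   0, 8, 0,  0, 0, 0,  0,  0,  0,
   0, 0, 6,  0, 0, 0,  0,  0,  1,
   0, 0, 0, 10, 0, 0,  0,  0,  0,
   0, 0, 0,  0, 3, 0,  0,  0,  0,
   1, 0, 0,  0, 0, 2,  2,  0,  0,
   0, 0, 0,  0, 0, 0, 39,  0,  0,
   0, 0, 1,  0, 0, 0,  0, 11,  0,
   0, 0, 2,  0, 0, 0,  0,  0, 10
), nrow = 9, byrow = TRUE,
dimnames = list(Prediction = classes, Reference = classes))

n  <- sum(cm)                                  # number of test observations
po <- sum(diag(cm)) / n                        # observed agreement (accuracy)
pe <- sum(rowSums(cm) * colSums(cm)) / n^2     # agreement expected by chance
kappa_hat <- (po - pe) / (1 - pe)

print(round(c(Accuracy = po, Kappa = kappa_hat), 4))
```

The rounded values agree with the Accuracy (0.9035) and Kappa (0.8805) lines of the `confusionMatrix` output.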

The output also includes warnings of the form: Numerical 0 probability for all classes with observation 5. This is not an error. It says that, for the given observation, the computed probability is numerically zero for every class, which indicates that the data are not sufficiently prepared: they contain outliers and call for Box-Cox/Yeo-Johnson transformations.
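To illustrate the mechanism behind such a warning: naive Bayes multiplies per-feature class-conditional densities, and for an observation far from every class mean this product can underflow to exactly zero in double precision. The sketch below uses made-up Gaussian class parameters (`mu`, `sigma` and the observation `x` are hypothetical) and shows that working with log-densities keeps the classes comparable. It illustrates the numerical effect only and is not the implementation used by the package.

```r
# Hypothetical class means for two classes over eight standardized features.
mu    <- rbind(A = rep(0, 8), B = rep(1, 8))
sigma <- 1                      # common standard deviation (assumption)
x     <- rep(15, 8)             # an extreme observation, far from both classes

# Product of densities: underflows to 0 for both classes.
lik <- apply(mu, 1, function(m) prod(dnorm(x, mean = m, sd = sigma)))
print(lik)

# Sum of log-densities stays finite, so the classes can still be ranked.
loglik <- apply(mu, 1, function(m) sum(dnorm(x, mean = m, sd = sigma, log = TRUE)))
print(loglik)

# Posterior with equal priors via the log-sum-exp trick.
post <- exp(loglik - max(loglik))
post <- post / sum(post)
print(round(post, 6))
```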

library(psych)
pairs.panels(scale(olive[,-1]))

Variable importance in naive Bayes classification

plot(varImp(nb_fit))

Support Vector Machines (SVM)

The SVM method looks for a hyperplane that divides the feature space so that its distance to the nearest data points of each class (the margin) is as large as possible. These nearest points, the ones closest to the class-separating surface, are called support vectors.
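As a small numerical illustration of these notions, the sketch below takes a fixed hyperplane w·x + b = 0 in two dimensions and computes each point's distance to it; the point at the smallest distance plays the role of a support vector. The weights `w`, the offset `b` and the points in `X` are made up for the example; a real SVM would find `w` and `b` by optimization.

```r
w <- c(3, 4)      # hypothetical normal vector of the hyperplane
b <- -5           # hypothetical offset

X <- rbind(p1 = c(3, 4), p2 = c(1, 1), p3 = c(0, 0), p4 = c(-1, 0))

# Signed decision values and geometric distances to the hyperplane.
f    <- as.vector(X %*% w + b)
dist <- abs(f) / sqrt(sum(w^2))
names(dist) <- rownames(X)

side <- ifelse(f > 0, "+", "-")   # which side of the hyperplane each point is on
print(data.frame(f = f, dist = dist, side = side))

# The closest point(s) determine the margin of this hyperplane.
closest <- names(dist)[dist == min(dist)]
print(closest)
```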

When preparing the data, numeric features need to be scaled and non-numeric features need to be converted into dummy variables. See e.g. http://www.sthda.com/english/articles/36-classification-methods-essentials/144-svm-model-support-vector-machine-essentials/.
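A minimal base-R sketch of both preparation steps on a made-up data frame (the data, the column names `x1`, `x2`, `colour` and the split are hypothetical and unrelated to `olive`). It assumes that the centring and scaling constants are estimated on the training rows only and then reused for the test rows.

```r
train <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(10, 20, 30, 40),
                    colour = factor(c("red", "green", "red", "blue"),
                                    levels = c("blue", "green", "red")))
test  <- data.frame(x1 = c(2.5, 5), x2 = c(25, 15),
                    colour = factor(c("green", "blue"),
                                    levels = c("blue", "green", "red")))

num_cols <- c("x1", "x2")

# Estimate centring/scaling constants on the training data only.
ctr <- colMeans(train[, num_cols])
scl <- apply(train[, num_cols], 2, sd)

train_scaled <- scale(train[, num_cols], center = ctr, scale = scl)
test_scaled  <- scale(test[, num_cols],  center = ctr, scale = scl)

# Dummy (indicator) columns for the factor; "- 1" keeps one column per level.
train_dummy <- model.matrix(~ colour - 1, data = train)
test_dummy  <- model.matrix(~ colour - 1, data = test)

train_ready <- cbind(train_scaled, train_dummy)
test_ready  <- cbind(test_scaled, test_dummy)
print(train_ready)
```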

We use the caret package to tune the SVM classifier.

Tuning SVM parameters with the caret package, method = "svmLinear"

Mycluster = makeCluster(detectCores()-1)
registerDoParallel(Mycluster)
system.time({

train.control <- trainControl(method = "cv",number = 5)

RNGkind(sample.kind = "Rounding")
set.seed(123)
svm.linear <- train(area~.,data=olive.train.scale, method = "svmLinear", trControl = train.control,tuneGrid = expand.grid(C=0:20),metric = "Kappa")

})
##    user  system elapsed 
##    0.81    0.00    4.81
stopCluster(Mycluster)
registerDoSEQ()
plot(svm.linear)

svm.linear
## Support Vector Machines with Linear Kernel 
## 
## 458 samples
##   8 predictor
##   9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 367, 365, 368, 367, 365 
## Resampling results across tuning parameters:
## 
##   C   Accuracy   Kappa    
##    0        NaN        NaN
##    1  0.9606987  0.9519006
##    2  0.9564922  0.9468324
##    3  0.9564678  0.9469004
##    4  0.9520477  0.9415478
##    5  0.9455016  0.9335914
##    6  0.9477238  0.9362954
##    7  0.9499216  0.9389722
##    8  0.9477238  0.9362835
##    9  0.9477238  0.9362835
##   10  0.9477238  0.9362835
##   11  0.9477711  0.9363043
##   12  0.9477711  0.9363043
##   13  0.9477711  0.9363043
##   14  0.9455733  0.9336653
##   15  0.9455733  0.9336653
##   16  0.9455733  0.9336653
##   17  0.9433511  0.9309295
##   18  0.9433511  0.9309295
##   19  0.9433511  0.9309295
##   20  0.9433511  0.9309295
## 
## Kappa was used to select the optimal model using the largest value.
## The final value used for the model was C = 1.
head(svm.linear$results[order(-svm.linear$results$Kappa),],20)
##     C  Accuracy     Kappa AccuracySD    KappaSD
## 2   1 0.9606987 0.9519006 0.02632914 0.03221578
## 4   3 0.9564678 0.9469004 0.01495887 0.01832932
## 3   2 0.9564922 0.9468324 0.02406177 0.02944283
## 5   4 0.9520477 0.9415478 0.01201418 0.01475265
## 8   7 0.9499216 0.9389722 0.01941548 0.02380792
## 12 11 0.9477711 0.9363043 0.01902859 0.02331069
## 13 12 0.9477711 0.9363043 0.01902859 0.02331069
## 14 13 0.9477711 0.9363043 0.01902859 0.02331069
## 7   6 0.9477238 0.9362954 0.02068460 0.02536288
## 9   8 0.9477238 0.9362835 0.02068460 0.02537271
## 10  9 0.9477238 0.9362835 0.02068460 0.02537271
## 11 10 0.9477238 0.9362835 0.02068460 0.02537271
## 15 14 0.9455733 0.9336653 0.02002904 0.02454191
## 16 15 0.9455733 0.9336653 0.02002904 0.02454191
## 17 16 0.9455733 0.9336653 0.02002904 0.02454191
## 6   5 0.9455016 0.9335914 0.01689904 0.02078612
## 18 17 0.9433511 0.9309295 0.01571433 0.01926276
## 19 18 0.9433511 0.9309295 0.01571433 0.01926276
## 20 19 0.9433511 0.9309295 0.01571433 0.01926276
## 21 20 0.9433511 0.9309295 0.01571433 0.01926276

Best model found:

svm.linear$bestTune
##   C
## 2 1

Parameters of the best model: C = 1.
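The NaN row for C = 0 in the results above suggests that the fit fails when the cost is not strictly positive, so that grid point only contributes a failed fit. A sketch of an alternative grid, assuming a log-spaced range of positive costs is acceptable (the range 2^-5 to 2^5 is an arbitrary choice for illustration, not the setting used in this material); the resulting data frame could be passed as `tuneGrid`:

```r
# Log-spaced, strictly positive cost values (range chosen for illustration).
svm_grid <- expand.grid(C = 2^seq(-5, 5, by = 1))

# Sanity check: every candidate cost is positive.
stopifnot(all(svm_grid$C > 0))
print(svm_grid)
```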

Classification results on the training data:

confusionMatrix(predict(svm.linear),olive.train$area,mode = "everything")
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              38              0            0               0
##   Coast.Sardinia         0             24            0               0
##   East.Liguria           0              0           41               0
##   Inland.Sardinia        0              1            0              55
##   North.Apulia           0              0            0               0
##   Sicily                 2              0            0               0
##   South.Apulia           1              0            0               0
##   Umbria                 0              0            0               0
##   West.Liguria           0              0            0               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   0      2            0      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      0            0
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia              22      2            0      0            0
##   Sicily                     0     27            2      0            0
##   South.Apulia               0      2          160      0            0
##   Umbria                     0      0            0     40            0
##   West.Liguria               0      0            0      0           39
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9738          
##                  95% CI : (0.9547, 0.9864)
##     No Information Rate : 0.3537          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.968           
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                  0.92683               0.96000             1.00000
## Specificity                  0.99520               1.00000             1.00000
## Pos Pred Value               0.95000               1.00000             1.00000
## Neg Pred Value               0.99282               0.99770             1.00000
## Precision                    0.95000               1.00000             1.00000
## Recall                       0.92683               0.96000             1.00000
## F1                           0.93827               0.97959             1.00000
## Prevalence                   0.08952               0.05459             0.08952
## Detection Rate               0.08297               0.05240             0.08952
## Detection Prevalence         0.08734               0.05240             0.08952
## Balanced Accuracy            0.96102               0.98000             1.00000
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                          1.0000             1.00000       0.81818
## Specificity                          0.9975             0.99541       0.99059
## Pos Pred Value                       0.9821             0.91667       0.87097
## Neg Pred Value                       1.0000             1.00000       0.98595
## Precision                            0.9821             0.91667       0.87097
## Recall                               1.0000             1.00000       0.81818
## F1                                   0.9910             0.95652       0.84375
## Prevalence                           0.1201             0.04803       0.07205
## Detection Rate                       0.1201             0.04803       0.05895
## Detection Prevalence                 0.1223             0.05240       0.06769
## Balanced Accuracy                    0.9988             0.99771       0.90439
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       0.9877       1.00000             1.00000
## Specificity                       0.9899       1.00000             1.00000
## Pos Pred Value                    0.9816       1.00000             1.00000
## Neg Pred Value                    0.9932       1.00000             1.00000
## Precision                         0.9816       1.00000             1.00000
## Recall                            0.9877       1.00000             1.00000
## F1                                0.9846       1.00000             1.00000
## Prevalence                        0.3537       0.08734             0.08515
## Detection Rate                    0.3493       0.08734             0.08515
## Detection Prevalence              0.3559       0.08734             0.08515
## Balanced Accuracy                 0.9888       1.00000             1.00000

On the training data Kappa = 0.968. On the test data:

confusionMatrix(predict(svm.linear, olive.test.scale),olive.test$area,mode = "everything")
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              14              0            0               0
##   Coast.Sardinia         0              8            0               0
##   East.Liguria           0              0            6               0
##   Inland.Sardinia        0              0            0              10
##   North.Apulia           0              0            0               0
##   Sicily                 1              0            0               0
##   South.Apulia           0              0            0               0
##   Umbria                 0              0            1               0
##   West.Liguria           0              0            2               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   0      0            2      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      0            0
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia               3      0            0      0            0
##   Sicily                     0      3            2      0            0
##   South.Apulia               0      0           40      0            0
##   Umbria                     0      0            0     11            0
##   West.Liguria               0      0            0      0           11
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9298          
##                  95% CI : (0.8664, 0.9692)
##     No Information Rate : 0.386           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9129          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                   0.9333               1.00000             0.66667
## Specificity                   0.9798               1.00000             1.00000
## Pos Pred Value                0.8750               1.00000             1.00000
## Neg Pred Value                0.9898               1.00000             0.97222
## Precision                     0.8750               1.00000             1.00000
## Recall                        0.9333               1.00000             0.66667
## F1                            0.9032               1.00000             0.80000
## Prevalence                    0.1316               0.07018             0.07895
## Detection Rate                0.1228               0.07018             0.05263
## Detection Prevalence          0.1404               0.07018             0.05263
## Balanced Accuracy             0.9566               1.00000             0.83333
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                         1.00000             1.00000       1.00000
## Specificity                         1.00000             1.00000       0.97297
## Pos Pred Value                      1.00000             1.00000       0.50000
## Neg Pred Value                      1.00000             1.00000       1.00000
## Precision                           1.00000             1.00000       0.50000
## Recall                              1.00000             1.00000       1.00000
## F1                                  1.00000             1.00000       0.66667
## Prevalence                          0.08772             0.02632       0.02632
## Detection Rate                      0.08772             0.02632       0.02632
## Detection Prevalence                0.08772             0.02632       0.05263
## Balanced Accuracy                   1.00000             1.00000       0.98649
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       0.9091       1.00000             1.00000
## Specificity                       1.0000       0.99029             0.98058
## Pos Pred Value                    1.0000       0.91667             0.84615
## Neg Pred Value                    0.9459       1.00000             1.00000
## Precision                         1.0000       0.91667             0.84615
## Recall                            0.9091       1.00000             1.00000
## F1                                0.9524       0.95652             0.91667
## Prevalence                        0.3860       0.09649             0.09649
## Detection Rate                    0.3509       0.09649             0.09649
## Detection Prevalence              0.3509       0.10526             0.11404
## Balanced Accuracy                 0.9545       0.99515             0.99029

On the test data Kappa = 0.9129.

Variable importance in SVM classification (linear kernel)

plot(varImp(svm.linear))

The importance of the attributes differs from class to class.

Tuning SVM parameters with the caret package, method = "svmRadial"

Mycluster = makeCluster(detectCores()-1)
registerDoParallel(Mycluster)
system.time({

train.control <- trainControl(method = "cv",number = 5,search = "random")

RNGkind(sample.kind = "Rounding")
set.seed(123)
svm.radial <- train(area~.,data=olive.train.scale, method = "svmRadial", trControl = train.control, tuneLength = 20,metric = "Kappa")

})
##    user  system elapsed 
##    0.95    0.02    6.32
stopCluster(Mycluster)
registerDoSEQ()
plot(svm.radial)

svm.radial
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 458 samples
##   8 predictor
##   9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 367, 365, 368, 367, 365 
## Resampling results across tuning parameters:
## 
##   sigma        C             Accuracy   Kappa    
##   0.005852913    0.05230441  0.3537059  0.0000000
##   0.010128835  760.90419287  0.9478412  0.9365860
##   0.012137441   47.60640178  0.9673638  0.9600616
##   0.015399114   39.83984930  0.9651889  0.9573897
##   0.020887845   39.25689606  0.9696089  0.9628821
##   0.022524452    0.11789589  0.6088810  0.4976433
##   0.028598399    0.10414713  0.6287361  0.5238913
##   0.030698472    2.70237697  0.9608405  0.9520295
##   0.071715001    3.72380596  0.9717594  0.9654090
##   0.075472517  147.95409522  0.9389783  0.9256451
##   0.079285182  951.56621613  0.9389783  0.9256451
##   0.088892668    0.76655335  0.9608405  0.9520295
##   0.180744543  508.53955572  0.9412478  0.9283146
##   0.231552023    0.43421604  0.9629911  0.9545952
##   0.284751577    0.07716565  0.7620765  0.6897584
##   0.337231496   36.08576317  0.9565623  0.9469792
##   0.492926619   46.00685001  0.9608634  0.9520715
##   0.678763927  183.03993937  0.9542211  0.9437007
##   0.726718585    0.06503385  0.4912805  0.2517811
##   1.780554660    0.29270054  0.6025483  0.4402274
## 
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.071715 and C = 3.723806.
head(svm.radial$results[order(-svm.radial$results$Kappa),],5)
##         sigma         C  Accuracy     Kappa AccuracySD    KappaSD
## 9  0.07171500  3.723806 0.9717594 0.9654090 0.01815230 0.02219872
## 5  0.02088784 39.256896 0.9696089 0.9628821 0.02078556 0.02535304
## 3  0.01213744 47.606402 0.9673638 0.9600616 0.02309314 0.02827360
## 4  0.01539911 39.839849 0.9651889 0.9573897 0.01934853 0.02366657
## 14 0.23155202  0.434216 0.9629911 0.9545952 0.01635930 0.02014991

Best model found:

svm.radial$bestTune
##      sigma        C
## 9 0.071715 3.723806

Parameters of the best model: sigma = 0.071715, C = 3.723806.
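For orientation, `sigma` is the inverse width parameter of the radial basis function kernel, which in the parameterization used by kernlab's `rbfdot` is k(x, y) = exp(-sigma * ||x - y||^2). The sketch below evaluates that formula in base R for two made-up points (the function name `rbf_kernel` and the points `x`, `y` are hypothetical) using the tuned value of sigma:

```r
# RBF kernel in the exp(-sigma * squared distance) parameterization.
rbf_kernel <- function(x, y, sigma) {
  exp(-sigma * sum((x - y)^2))
}

sigma <- 0.071715          # tuned value reported above
x <- c(0, 0)               # hypothetical scaled observations
y <- c(1, 2)

k_xy <- rbf_kernel(x, y, sigma)   # similarity of two different points
k_xx <- rbf_kernel(x, x, sigma)   # a point compared with itself gives 1
print(c(k_xy = k_xy, k_xx = k_xx))
```

A larger sigma makes the kernel value fall off faster with distance, so each training point influences a smaller neighbourhood and the decision boundary becomes more local.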

Classification results on the training data:

confusionMatrix(predict(svm.radial),olive.train$area,mode = "everything")
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              39              0            0               0
##   Coast.Sardinia         0             24            0               0
##   East.Liguria           0              0           41               0
##   Inland.Sardinia        0              1            0              55
##   North.Apulia           0              0            0               0
##   Sicily                 0              0            0               0
##   South.Apulia           2              0            0               0
##   Umbria                 0              0            0               0
##   West.Liguria           0              0            0               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   1      3            0      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      0            0
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia              21      2            0      0            0
##   Sicily                     0     26            1      0            0
##   South.Apulia               0      2          161      0            0
##   Umbria                     0      0            0     40            0
##   West.Liguria               0      0            0      0           39
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9738          
##                  95% CI : (0.9547, 0.9864)
##     No Information Rate : 0.3537          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9679          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                  0.95122               0.96000             1.00000
## Specificity                  0.99041               1.00000             1.00000
## Pos Pred Value               0.90698               1.00000             1.00000
## Neg Pred Value               0.99518               0.99770             1.00000
## Precision                    0.90698               1.00000             1.00000
## Recall                       0.95122               0.96000             1.00000
## F1                           0.92857               0.97959             1.00000
## Prevalence                   0.08952               0.05459             0.08952
## Detection Rate               0.08515               0.05240             0.08952
## Detection Prevalence         0.09389               0.05240             0.08952
## Balanced Accuracy            0.97081               0.98000             1.00000
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                          1.0000             0.95455       0.78788
## Specificity                          0.9975             0.99541       0.99765
## Pos Pred Value                       0.9821             0.91304       0.96296
## Neg Pred Value                       1.0000             0.99770       0.98376
## Precision                            0.9821             0.91304       0.96296
## Recall                               1.0000             0.95455       0.78788
## F1                                   0.9910             0.93333       0.86667
## Prevalence                           0.1201             0.04803       0.07205
## Detection Rate                       0.1201             0.04585       0.05677
## Detection Prevalence                 0.1223             0.05022       0.05895
## Balanced Accuracy                    0.9988             0.97498       0.89276
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       0.9938       1.00000             1.00000
## Specificity                       0.9865       1.00000             1.00000
## Pos Pred Value                    0.9758       1.00000             1.00000
## Neg Pred Value                    0.9966       1.00000             1.00000
## Precision                         0.9758       1.00000             1.00000
## Recall                            0.9938       1.00000             1.00000
## F1                                0.9847       1.00000             1.00000
## Prevalence                        0.3537       0.08734             0.08515
## Detection Rate                    0.3515       0.08734             0.08515
## Detection Prevalence              0.3603       0.08734             0.08515
## Balanced Accuracy                 0.9902       1.00000             1.00000

On the training data, Kappa = 0.9679. On the test data:

confusionMatrix(predict(svm.radial, olive.test.scale),olive.test$area,mode = "everything")
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              15              0            0               0
##   Coast.Sardinia         0              8            0               0
##   East.Liguria           0              0            7               0
##   Inland.Sardinia        0              0            0              10
##   North.Apulia           0              0            0               0
##   Sicily                 0              0            0               0
##   South.Apulia           0              0            0               0
##   Umbria                 0              0            1               0
##   West.Liguria           0              0            1               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   0      1            1      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      0            2
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia               3      0            0      0            0
##   Sicily                     0      2            2      0            0
##   South.Apulia               0      0           41      0            0
##   Umbria                     0      0            0     11            0
##   West.Liguria               0      0            0      0            9
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9298          
##                  95% CI : (0.8664, 0.9692)
##     No Information Rate : 0.386           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9126          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                   1.0000               1.00000             0.77778
## Specificity                   0.9798               1.00000             0.98095
## Pos Pred Value                0.8824               1.00000             0.77778
## Neg Pred Value                1.0000               1.00000             0.98095
## Precision                     0.8824               1.00000             0.77778
## Recall                        1.0000               1.00000             0.77778
## F1                            0.9375               1.00000             0.77778
## Prevalence                    0.1316               0.07018             0.07895
## Detection Rate                0.1316               0.07018             0.06140
## Detection Prevalence          0.1491               0.07018             0.07895
## Balanced Accuracy             0.9899               1.00000             0.87937
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                         1.00000             1.00000       0.66667
## Specificity                         1.00000             1.00000       0.98198
## Pos Pred Value                      1.00000             1.00000       0.50000
## Neg Pred Value                      1.00000             1.00000       0.99091
## Precision                           1.00000             1.00000       0.50000
## Recall                              1.00000             1.00000       0.66667
## F1                                  1.00000             1.00000       0.57143
## Prevalence                          0.08772             0.02632       0.02632
## Detection Rate                      0.08772             0.02632       0.01754
## Detection Prevalence                0.08772             0.02632       0.03509
## Balanced Accuracy                   1.00000             1.00000       0.82432
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       0.9318       1.00000             0.81818
## Specificity                       1.0000       0.99029             0.99029
## Pos Pred Value                    1.0000       0.91667             0.90000
## Neg Pred Value                    0.9589       1.00000             0.98077
## Precision                         1.0000       0.91667             0.90000
## Recall                            0.9318       1.00000             0.81818
## F1                                0.9647       0.95652             0.85714
## Prevalence                        0.3860       0.09649             0.09649
## Detection Rate                    0.3596       0.09649             0.07895
## Detection Prevalence              0.3596       0.10526             0.08772
## Balanced Accuracy                 0.9659       0.99515             0.90424

On the test data, Kappa = 0.9126.
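
The Accuracy and Kappa values reported above can be recomputed by hand from the test-set confusion matrix. The sketch below uses base R only. The matrix is typed in from the `svm.radial` test output above, and the variable names (`cm`, `p_observed`, `p_expected`, `kappa_hat`) are chosen here for illustration. Kappa compares the observed agreement with the agreement expected by chance from the row and column totals.

```r
# Test-set confusion matrix for svm.radial, copied from the output above
# (rows = predicted class, columns = reference class).
classes <- c("Calabria", "Coast.Sardinia", "East.Liguria", "Inland.Sardinia",
             "North.Apulia", "Sicily", "South.Apulia", "Umbria", "West.Liguria")
cm <- matrix(c(
  15, 0, 0,  0, 0, 1,  1,  0, 0,
   0, 8, 0,  0, 0, 0,  0,  0, 0,
   0, 0, 7,  0, 0, 0,  0,  0, 2,
   0, 0, 0, 10, 0, 0,  0,  0, 0,
   0, 0, 0,  0, 3, 0,  0,  0, 0,
   0, 0, 0,  0, 0, 2,  2,  0, 0,
   0, 0, 0,  0, 0, 0, 41,  0, 0,
   0, 0, 1,  0, 0, 0,  0, 11, 0,
   0, 0, 1,  0, 0, 0,  0,  0, 9
), nrow = 9, byrow = TRUE,
dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)                                          # 114 test samples
p_observed <- sum(diag(cm)) / n                       # share of correct predictions
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
kappa_hat  <- (p_observed - p_expected) / (1 - p_expected)

round(c(Accuracy = p_observed, Kappa = kappa_hat), 4)
# Accuracy    Kappa
#   0.9298   0.9126
```

Because South.Apulia makes up 38.6% of the test set, chance agreement is far from zero here, which is why Kappa is somewhat lower than Accuracy.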

Variable importance in SVM classification (radial kernel)

plot(varImp(svm.radial))

The importance of the attributes differs from class to class.
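
A per-class importance table can also be obtained without any fitted model. The sketch below rests on an assumption: for models that have no importance measure of their own (which appears to be the case for caret's SVM methods), `varImp()` falls back to a model-free, ROC-based filter, available directly as `caret::filterVarImp()`. For simplicity it runs on the whole `olive` data set rather than on the scaled training split used above, so the numbers will not match the plot exactly.

```r
library(caret)
library(dslabs)

data(olive)
# Make the class names syntactically valid, as in the outputs above
# (e.g. "South-Apulia" becomes "South.Apulia").
levels(olive$area) <- make.names(levels(olive$area))

# Keep only the eight fatty-acid columns as predictors.
acids <- olive[, !(names(olive) %in% c("region", "area"))]

# One row per fatty acid, one column per area; values are ROC-based scores.
imp <- filterVarImp(x = acids, y = olive$area)
print(round(imp, 3))
```

Each column ranks the fatty acids for one area, so an acid that separates one area well can be close to uninformative for another.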

Tuning SVM parameters with the caret package, method = "svmPoly"

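# Start a parallel backend, leaving one core free
Mycluster = makeCluster(detectCores()-1)
registerDoParallel(Mycluster)
system.time({

# 5-fold cross-validation with random search over the tuning parameters
train.control <- trainControl(method = "cv",number = 5,search = "random")

# Fix the RNG (with the pre-3.6.0 sampling scheme) so that folds and random draws are reproducible
RNGkind(sample.kind = "Rounding")
set.seed(123)
# tuneLength = 20: evaluate 20 random (degree, scale, C) combinations and select by Kappa
svm.poly <- train(area~.,data=olive.train.scale, method = "svmPoly", trControl = train.control, tuneLength = 20,metric = "Kappa")

})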
##    user  system elapsed 
##    0.53    0.00    5.20
stopCluster(Mycluster)
registerDoSEQ()
plot(svm.poly)

svm.poly
## Support Vector Machines with Polynomial Kernel 
## 
## 458 samples
##   8 predictor
##   9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 367, 365, 368, 367, 365 
## Resampling results across tuning parameters:
## 
##   degree  scale         C             Accuracy   Kappa    
##   1       1.350448e-05   10.66062234  0.3537059  0.0000000
##   1       1.403411e-04   78.77238023  0.7925637  0.7413192
##   1       4.860568e-04  343.86069012  0.9542700  0.9440001
##   1       5.700872e-02    0.13231889  0.7490071  0.6843434
##   1       1.048617e-01    0.11768275  0.8056576  0.7578714
##   2       6.023569e-05  233.52217294  0.8799638  0.8516818
##   2       3.410707e-04    0.49640502  0.3537059  0.0000000
##   2       7.657911e-03    0.35245513  0.6616590  0.5683239
##   2       2.485111e-02    2.30675518  0.9543172  0.9440720
##   2       1.647391e-01    0.11098902  0.9586900  0.9493377
##   2       6.068960e-01    3.10157522  0.9261223  0.9101695
##   3       5.714618e-05   31.48794977  0.6484706  0.5515120
##   3       1.689878e-04    1.53362062  0.3537059  0.0000000
##   3       3.410443e-03    0.26756677  0.4759400  0.2733118
##   3       1.411122e-02    3.97081942  0.9565394  0.9467365
##   3       2.991687e-02    0.15247781  0.8604443  0.8266708
##   3       4.586162e-02  126.57710659  0.9412234  0.9286316
##   3       4.705130e-02    2.32655459  0.9673638  0.9600290
##   3       1.273563e+00    0.05032669  0.9347489  0.9203501
##   3       1.864893e+00    1.44661575  0.9304478  0.9151187
## 
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were degree = 3, scale = 0.0470513 and C
##  = 2.326555.
head(svm.poly$results[order(-svm.poly$results$Kappa),],5)
##    degree        scale          C  Accuracy     Kappa AccuracySD    KappaSD
## 18      3 0.0470513011   2.326555 0.9673638 0.9600290 0.02309314 0.02827340
## 10      2 0.1647390821   0.110989 0.9586900 0.9493377 0.02066948 0.02536111
## 15      3 0.0141112236   3.970819 0.9565394 0.9467365 0.02839641 0.03485378
## 9       2 0.0248511123   2.306755 0.9543172 0.9440720 0.02552114 0.03132162
## 3       1 0.0004860568 343.860690 0.9542700 0.9440001 0.02887019 0.03541881
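
The five best candidates lie close together relative to the fold-to-fold spread reported in `KappaSD`. The sketch below is a base R check on the numbers printed above, not part of the original analysis. It asks which candidates lie within one standard deviation, and within one standard error (SD divided by the square root of the 5 folds), of the best Kappa. The data frame `top5` and the column names `within_one_sd` and `within_one_se` are introduced here for illustration.

```r
# Top five tuning results, typed in from the table above.
top5 <- data.frame(
  degree  = c(3, 2, 3, 2, 1),
  scale   = c(0.0470513011, 0.1647390821, 0.0141112236, 0.0248511123, 0.0004860568),
  C       = c(2.326555, 0.110989, 3.970819, 2.306755, 343.860690),
  Kappa   = c(0.9600290, 0.9493377, 0.9467365, 0.9440720, 0.9440001),
  KappaSD = c(0.02827340, 0.02536111, 0.03485378, 0.03132162, 0.03541881)
)
folds <- 5  # number of cross-validation folds used above

best <- top5[which.max(top5$Kappa), ]

# Candidates within one SD, and within one SE, of the best mean Kappa.
top5$within_one_sd <- top5$Kappa >= best$Kappa - best$KappaSD
top5$within_one_se <- top5$Kappa >= best$Kappa - best$KappaSD / sqrt(folds)
top5
```

All five candidates fall within one standard deviation of the best, and the degree-2 model in second place is within one standard error. The ranking among them therefore rests on weak evidence. caret can apply this kind of rule itself through `trainControl(selectionFunction = "oneSE")`.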

Best model found:

svm.poly$bestTune
##    degree     scale        C
## 18      3 0.0470513 2.326555

Parameters of the best model: degree = 3, scale = 0.0470513, C = 2.326555.
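
To see what `degree` and `scale` control, the polynomial kernel can be written out directly. The following is a minimal base R sketch, assuming the usual form k(x, y) = (scale * <x, y> + offset)^degree with offset = 1. The offset value is an assumption about how caret configures the kernel. The function `poly_kernel` and the vectors `x` and `y` are hypothetical and serve only as illustration.

```r
# Polynomial kernel: k(x, y) = (scale * <x, y> + offset)^degree
poly_kernel <- function(x, y, degree, scale, offset = 1) {
  (scale * sum(x * y) + offset)^degree
}

# Tuned values reported above; C is not part of the kernel, it is the cost of
# margin violations in the SVM optimisation problem.
best <- list(degree = 3, scale = 0.0470513)

# Two hypothetical standardised observations (three features for brevity).
x <- c(1.2, -0.5,  0.3)
y <- c(0.8,  0.1, -1.0)

k_xy <- poly_kernel(x, y, best$degree, best$scale)
k_yx <- poly_kernel(y, x, best$degree, best$scale)
k_xy
```

With a small `scale`, the inner product contributes little relative to the offset, so the kernel values stay close to 1 and the decision boundary is smooth. Larger `scale` or `degree` values make the boundary more flexible.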

Classification results on the training data:

confusionMatrix(predict(svm.poly),olive.train$area,mode = "everything")
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              40              0            0               0
##   Coast.Sardinia         0             24            0               0
##   East.Liguria           0              0           41               0
##   Inland.Sardinia        0              1            0              55
##   North.Apulia           0              0            0               0
##   Sicily                 0              0            0               0
##   South.Apulia           1              0            0               0
##   Umbria                 0              0            0               0
##   West.Liguria           0              0            0               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   1      3            0      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      0            0
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia              21      1            0      0            0
##   Sicily                     0     27            1      0            0
##   South.Apulia               0      2          161      0            0
##   Umbria                     0      0            0     40            0
##   West.Liguria               0      0            0      0           39
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9782          
##                  95% CI : (0.9602, 0.9895)
##     No Information Rate : 0.3537          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9733          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                  0.97561               0.96000             1.00000
## Specificity                  0.99041               1.00000             1.00000
## Pos Pred Value               0.90909               1.00000             1.00000
## Neg Pred Value               0.99758               0.99770             1.00000
## Precision                    0.90909               1.00000             1.00000
## Recall                       0.97561               0.96000             1.00000
## F1                           0.94118               0.97959             1.00000
## Prevalence                   0.08952               0.05459             0.08952
## Detection Rate               0.08734               0.05240             0.08952
## Detection Prevalence         0.09607               0.05240             0.08952
## Balanced Accuracy            0.98301               0.98000             1.00000
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                          1.0000             0.95455       0.81818
## Specificity                          0.9975             0.99771       0.99765
## Pos Pred Value                       0.9821             0.95455       0.96429
## Neg Pred Value                       1.0000             0.99771       0.98605
## Precision                            0.9821             0.95455       0.96429
## Recall                               1.0000             0.95455       0.81818
## F1                                   0.9910             0.95455       0.88525
## Prevalence                           0.1201             0.04803       0.07205
## Detection Rate                       0.1201             0.04585       0.05895
## Detection Prevalence                 0.1223             0.04803       0.06114
## Balanced Accuracy                    0.9988             0.97613       0.90791
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       0.9938       1.00000             1.00000
## Specificity                       0.9899       1.00000             1.00000
## Pos Pred Value                    0.9817       1.00000             1.00000
## Neg Pred Value                    0.9966       1.00000             1.00000
## Precision                         0.9817       1.00000             1.00000
## Recall                            0.9938       1.00000             1.00000
## F1                                0.9877       1.00000             1.00000
## Prevalence                        0.3537       0.08734             0.08515
## Detection Rate                    0.3515       0.08734             0.08515
## Detection Prevalence              0.3581       0.08734             0.08515
## Balanced Accuracy                 0.9918       1.00000             1.00000

On the training data, Kappa = 0.9733. On the test data:

confusionMatrix(predict(svm.poly, olive.test.scale),olive.test$area,mode = "everything")
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              15              0            0               0
##   Coast.Sardinia         0              8            0               0
##   East.Liguria           0              0            7               0
##   Inland.Sardinia        0              0            0              10
##   North.Apulia           0              0            0               0
##   Sicily                 0              0            0               0
##   South.Apulia           0              0            0               0
##   Umbria                 0              0            1               0
##   West.Liguria           0              0            1               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   0      1            1      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      0            0
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia               3      0            0      0            0
##   Sicily                     0      2            2      0            0
##   South.Apulia               0      0           41      0            0
##   Umbria                     0      0            0     11            0
##   West.Liguria               0      0            0      0           11
## 
## Overall Statistics
##                                          
##                Accuracy : 0.9474         
##                  95% CI : (0.889, 0.9804)
##     No Information Rate : 0.386          
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.9344         
##                                          
##  Mcnemar's Test P-Value : NA             
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                   1.0000               1.00000             0.77778
## Specificity                   0.9798               1.00000             1.00000
## Pos Pred Value                0.8824               1.00000             1.00000
## Neg Pred Value                1.0000               1.00000             0.98131
## Precision                     0.8824               1.00000             1.00000
## Recall                        1.0000               1.00000             0.77778
## F1                            0.9375               1.00000             0.87500
## Prevalence                    0.1316               0.07018             0.07895
## Detection Rate                0.1316               0.07018             0.06140
## Detection Prevalence          0.1491               0.07018             0.06140
## Balanced Accuracy             0.9899               1.00000             0.88889
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                         1.00000             1.00000       0.66667
## Specificity                         1.00000             1.00000       0.98198
## Pos Pred Value                      1.00000             1.00000       0.50000
## Neg Pred Value                      1.00000             1.00000       0.99091
## Precision                           1.00000             1.00000       0.50000
## Recall                              1.00000             1.00000       0.66667
## F1                                  1.00000             1.00000       0.57143
## Prevalence                          0.08772             0.02632       0.02632
## Detection Rate                      0.08772             0.02632       0.01754
## Detection Prevalence                0.08772             0.02632       0.03509
## Balanced Accuracy                   1.00000             1.00000       0.82432
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       0.9318       1.00000             1.00000
## Specificity                       1.0000       0.99029             0.99029
## Pos Pred Value                    1.0000       0.91667             0.91667
## Neg Pred Value                    0.9589       1.00000             1.00000
## Precision                         1.0000       0.91667             0.91667
## Recall                            0.9318       1.00000             1.00000
## F1                                0.9647       0.95652             0.95652
## Prevalence                        0.3860       0.09649             0.09649
## Detection Rate                    0.3596       0.09649             0.09649
## Detection Prevalence              0.3596       0.10526             0.10526
## Balanced Accuracy                 0.9659       0.99515             0.99515

On the test data, Kappa = 0.9344.
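
The overall Kappa hides large differences between classes. Sicily has only three test samples and is the weakest class in the output above. The sketch below recomputes its recall, precision and F1 from the three relevant counts in the `svm.poly` test confusion matrix. The variable names are chosen here for illustration.

```r
# Counts for class Sicily, read off the svm.poly test-set confusion matrix above.
tp <- 2  # Sicily samples predicted as Sicily
fn <- 1  # Sicily sample predicted as Calabria
fp <- 2  # South.Apulia samples predicted as Sicily

recall    <- tp / (tp + fn)   # called Sensitivity in the output
precision <- tp / (tp + fp)   # called Pos Pred Value in the output
f1        <- 2 * precision * recall / (precision + recall)

round(c(Recall = recall, Precision = precision, F1 = f1), 5)
#    Recall Precision        F1
#   0.66667   0.50000   0.57143
```

With so few Sicily samples in the test set, a single additional error changes these figures substantially, so they should be read as rough indications only.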

Variable importance in SVM classification (polynomial kernel)

plot(varImp(svm.poly))

The importance of the attributes differs from class to class.

Logistic regression

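# Start a parallel backend, leaving one core free
Mycluster = makeCluster(detectCores()-1)
registerDoParallel(Mycluster)
system.time({
# Grid of weight-decay (L2 penalty) values to try: 0, 0.1, ..., 1
tuneGrid_mnl <- expand.grid(decay = seq(0, 1, by = 0.1))
# 5-fold cross-validation; class probabilities are needed for logLoss and AUC
train.control <- trainControl(method = "cv",number = 5,search = "grid",
                              classProbs = TRUE,
                              summaryFunction = multiClassSummary)

# Fix the RNG (with the pre-3.6.0 sampling scheme) so that the folds are reproducible
RNGkind(sample.kind = "Rounding")
set.seed(123)
# trace = FALSE is passed on to the model fit to suppress the optimisation log
fit.logit <- train(area~.,data=olive.train, method = "multinom", trControl = train.control,metric = "Kappa",tuneGrid = tuneGrid_mnl,trace = FALSE)

})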
##    user  system elapsed 
##    0.53    0.01    4.64
stopCluster(Mycluster)
registerDoSEQ()
plot(fit.logit)

fit.logit
## Penalized Multinomial Regression 
## 
## 458 samples
##   8 predictor
##   9 classes: 'Calabria', 'Coast.Sardinia', 'East.Liguria', 'Inland.Sardinia', 'North.Apulia', 'Sicily', 'South.Apulia', 'Umbria', 'West.Liguria' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 367, 365, 368, 367, 365 
## Resampling results across tuning parameters:
## 
##   decay  logLoss    AUC        prAUC      Accuracy   Kappa      Mean_F1  
##   0.0    0.6059458  0.9924479  0.7577432  0.9238300  0.9070667  0.8903739
##   0.1    0.2742515  0.9899711  0.7989435  0.9084927  0.8880737  0.8617584
##   0.2    0.3264671  0.9872370  0.7780617  0.8953515  0.8717804  0.8427825
##   0.3    0.3626803  0.9846453  0.7664523  0.8909315  0.8661399  0.8362198
##   0.4    0.3909264  0.9831358  0.7592052  0.8865587  0.8607683  0.8314279
##   0.5    0.4143118  0.9817613  0.7522944  0.8800599  0.8527959  0.8217217
##   0.6    0.4343926  0.9804581  0.7493196  0.8779093  0.8503018  0.8207851
##   0.7    0.4520672  0.9796006  0.7441995  0.8778849  0.8501804  0.8194847
##   0.8    0.4678802  0.9785066  0.7406718  0.8713616  0.8421864  0.8364612
##   0.9    0.4822288  0.9775978  0.7353966  0.8647438  0.8339152  0.8256718
##   1.0    0.4953887  0.9767387  0.7313853  0.8625460  0.8310568  0.8210842
##   Mean_Sensitivity  Mean_Specificity  Mean_Pos_Pred_Value  Mean_Neg_Pred_Value
##   0.8911570         0.9902443         0.9030123            0.9902015          
##   0.8656241         0.9883104         0.8948403            0.9887511          
##   0.8472379         0.9864922         0.8804211            0.9872010          
##   0.8420681         0.9858351         0.8696200            0.9868144          
##   0.8365125         0.9853028         0.8638640            0.9862703          
##   0.8265125         0.9845183         0.8413612            0.9854713          
##   0.8258391         0.9842569         0.8404798            0.9850972          
##   0.8237558         0.9842601         0.8404300            0.9852261          
##   0.8153343         0.9834662         0.8223777            0.9844549          
##   0.8056429         0.9825441         0.8169985            0.9836856          
##   0.8019392         0.9821674         0.8135017            0.9834393          
##   Mean_Precision  Mean_Recall  Mean_Detection_Rate  Mean_Balanced_Accuracy
##   0.9030123       0.8911570    0.10264778           0.9407007             
##   0.8948403       0.8656241    0.10094363           0.9269672             
##   0.8804211       0.8472379    0.09948350           0.9168650             
##   0.8696200       0.8420681    0.09899239           0.9139516             
##   0.8638640       0.8365125    0.09850653           0.9109077             
##   0.8413612       0.8265125    0.09778443           0.9055154             
##   0.8404798       0.8258391    0.09754548           0.9050480             
##   0.8404300       0.8237558    0.09754277           0.9040080             
##   0.8223777       0.8153343    0.09681796           0.8994003             
##   0.8169985       0.8056429    0.09608264           0.8940935             
##   0.8135017       0.8019392    0.09583844           0.8920533             
## 
## Kappa was used to select the optimal model using the largest value.
## The final value used for the model was decay = 0.
head(fit.logit$results[order(-fit.logit$results$Kappa),],5)
##   decay   logLoss       AUC     prAUC  Accuracy     Kappa   Mean_F1
## 1   0.0 0.6059458 0.9924479 0.7577432 0.9238300 0.9070667 0.8903739
## 2   0.1 0.2742515 0.9899711 0.7989435 0.9084927 0.8880737 0.8617584
## 3   0.2 0.3264671 0.9872370 0.7780617 0.8953515 0.8717804 0.8427825
## 4   0.3 0.3626803 0.9846453 0.7664523 0.8909315 0.8661399 0.8362198
## 5   0.4 0.3909264 0.9831358 0.7592052 0.8865587 0.8607683 0.8314279
##   Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value Mean_Neg_Pred_Value
## 1        0.8911570        0.9902443           0.9030123           0.9902015
## 2        0.8656241        0.9883104           0.8948403           0.9887511
## 3        0.8472379        0.9864922           0.8804211           0.9872010
## 4        0.8420681        0.9858351           0.8696200           0.9868144
## 5        0.8365125        0.9853028           0.8638640           0.9862703
##   Mean_Precision Mean_Recall Mean_Detection_Rate Mean_Balanced_Accuracy
## 1      0.9030123   0.8911570          0.10264778              0.9407007
## 2      0.8948403   0.8656241          0.10094363              0.9269672
## 3      0.8804211   0.8472379          0.09948350              0.9168650
## 4      0.8696200   0.8420681          0.09899239              0.9139516
## 5      0.8638640   0.8365125          0.09850653              0.9109077
##    logLossSD       AUCSD    prAUCSD AccuracySD    KappaSD  Mean_F1SD
## 1 0.23297584 0.003200508 0.03500452 0.02341025 0.02866900 0.04418392
## 2 0.07518731 0.006455353 0.02862192 0.03270825 0.04005961 0.05841365
## 3 0.07134740 0.006709591 0.02746782 0.02444432 0.03008097 0.04382832
## 4 0.06861522 0.007176839 0.02310044 0.02726844 0.03373790 0.04817611
## 5 0.06626223 0.007526678 0.02308407 0.02776222 0.03438352 0.05034381
##   Mean_SensitivitySD Mean_SpecificitySD Mean_Pos_Pred_ValueSD
## 1         0.04213728        0.002929479            0.04611538
## 2         0.05824946        0.004077027            0.03461405
## 3         0.04630952        0.003244217            0.02272302
## 4         0.04967022        0.003726555            0.02831697
## 5         0.05116294        0.003761560            0.02931327
##   Mean_Neg_Pred_ValueSD Mean_PrecisionSD Mean_RecallSD Mean_Detection_RateSD
## 1           0.002872770       0.04611538    0.04213728           0.002601138
## 2           0.004030121       0.03461405    0.05824946           0.003634250
## 3           0.003086322       0.02272302    0.04630952           0.002716036
## 4           0.003403333       0.02831697    0.04967022           0.003029827
## 5           0.003427401       0.02931327    0.05116294           0.003084691
##   Mean_Balanced_AccuracySD
## 1               0.02247678
## 2               0.03113428
## 3               0.02474522
## 4               0.02663007
## 5               0.02740647

Best model found:

fit.logit$bestTune
##   decay
## 1     0

Parameters of the best model: decay = 0.
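
The `decay` parameter is the weight-decay (L2) penalty of the underlying `nnet::multinom` fit, so decay = 0 means an unpenalised multinomial model. In the table above, decay = 0 gives the best Kappa but also the highest logLoss. This is consistent with large, unshrunk coefficients producing overconfident probabilities. The sketch below illustrates the shrinking effect on the built-in `iris` data, used here as an assumption to keep the example small. The object names are chosen for illustration.

```r
library(nnet)

set.seed(123)
# Unpenalised fit (decay = 0) and a penalised fit (decay = 0.5) of the same model.
fit_unpenalised <- multinom(Species ~ ., data = iris, decay = 0,   trace = FALSE)
fit_penalised   <- multinom(Species ~ ., data = iris, decay = 0.5, trace = FALSE)

# The sum of squared coefficients shows how strongly the penalty shrinks the weights.
size_unpenalised <- sum(coef(fit_unpenalised)^2)
size_penalised   <- sum(coef(fit_penalised)^2)
c(unpenalised = size_unpenalised, penalised = size_penalised)
```

When classes are nearly separable, the unpenalised coefficients grow very large. A small positive `decay` keeps them bounded, at the cost of some accuracy on this data set according to the table above.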

Classification results on the training data:

confusionMatrix(predict(fit.logit), olive.train$area,mode = "everything")
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              37              0            0               0
##   Coast.Sardinia         0             25            0               0
##   East.Liguria           0              0           41               0
##   Inland.Sardinia        0              0            0              55
##   North.Apulia           0              0            0               0
##   Sicily                 3              0            0               0
##   South.Apulia           1              0            0               0
##   Umbria                 0              0            0               0
##   West.Liguria           0              0            0               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   0      1            0      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      0            0
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia              22      0            0      0            0
##   Sicily                     0     30            2      0            0
##   South.Apulia               0      2          160      0            0
##   Umbria                     0      0            0     40            0
##   West.Liguria               0      0            0      0           39
## 
## Overall Statistics
##                                         
##                Accuracy : 0.9803        
##                  95% CI : (0.963, 0.991)
##     No Information Rate : 0.3537        
##     P-Value [Acc > NIR] : < 2.2e-16     
##                                         
##                   Kappa : 0.976         
##                                         
##  Mcnemar's Test P-Value : NA            
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                  0.90244               1.00000             1.00000
## Specificity                  0.99760               1.00000             1.00000
## Pos Pred Value               0.97368               1.00000             1.00000
## Neg Pred Value               0.99048               1.00000             1.00000
## Precision                    0.97368               1.00000             1.00000
## Recall                       0.90244               1.00000             1.00000
## F1                           0.93671               1.00000             1.00000
## Prevalence                   0.08952               0.05459             0.08952
## Detection Rate               0.08079               0.05459             0.08952
## Detection Prevalence         0.08297               0.05459             0.08952
## Balanced Accuracy            0.95002               1.00000             1.00000
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                          1.0000             1.00000       0.90909
## Specificity                          1.0000             1.00000       0.98824
## Pos Pred Value                       1.0000             1.00000       0.85714
## Neg Pred Value                       1.0000             1.00000       0.99291
## Precision                            1.0000             1.00000       0.85714
## Recall                               1.0000             1.00000       0.90909
## F1                                   1.0000             1.00000       0.88235
## Prevalence                           0.1201             0.04803       0.07205
## Detection Rate                       0.1201             0.04803       0.06550
## Detection Prevalence                 0.1201             0.04803       0.07642
## Balanced Accuracy                    1.0000             1.00000       0.94866
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       0.9877       1.00000             1.00000
## Specificity                       0.9899       1.00000             1.00000
## Pos Pred Value                    0.9816       1.00000             1.00000
## Neg Pred Value                    0.9932       1.00000             1.00000
## Precision                         0.9816       1.00000             1.00000
## Recall                            0.9877       1.00000             1.00000
## F1                                0.9846       1.00000             1.00000
## Prevalence                        0.3537       0.08734             0.08515
## Detection Rate                    0.3493       0.08734             0.08515
## Detection Prevalence              0.3559       0.08734             0.08515
## Balanced Accuracy                 0.9888       1.00000             1.00000

On the training data, Kappa = 0.976. On the test data:

confusionMatrix(predict(fit.logit, olive.test),olive.test$area,mode = "everything")
## Confusion Matrix and Statistics
## 
##                  Reference
## Prediction        Calabria Coast.Sardinia East.Liguria Inland.Sardinia
##   Calabria              15              0            0               0
##   Coast.Sardinia         0              8            0               0
##   East.Liguria           0              0            6               0
##   Inland.Sardinia        0              0            0              10
##   North.Apulia           0              0            0               0
##   Sicily                 0              0            0               0
##   South.Apulia           0              0            1               0
##   Umbria                 0              0            2               0
##   West.Liguria           0              0            0               0
##                  Reference
## Prediction        North.Apulia Sicily South.Apulia Umbria West.Liguria
##   Calabria                   0      0            1      0            0
##   Coast.Sardinia             0      0            0      0            0
##   East.Liguria               0      0            0      0            0
##   Inland.Sardinia            0      0            0      0            0
##   North.Apulia               3      0            0      0            0
##   Sicily                     0      3            3      0            0
##   South.Apulia               0      0           40      0            2
##   Umbria                     0      0            0     11            0
##   West.Liguria               0      0            0      0            9
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9211          
##                  95% CI : (0.8554, 0.9633)
##     No Information Rate : 0.386           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9011          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Calabria Class: Coast.Sardinia Class: East.Liguria
## Sensitivity                   1.0000               1.00000             0.66667
## Specificity                   0.9899               1.00000             1.00000
## Pos Pred Value                0.9375               1.00000             1.00000
## Neg Pred Value                1.0000               1.00000             0.97222
## Precision                     0.9375               1.00000             1.00000
## Recall                        1.0000               1.00000             0.66667
## F1                            0.9677               1.00000             0.80000
## Prevalence                    0.1316               0.07018             0.07895
## Detection Rate                0.1316               0.07018             0.05263
## Detection Prevalence          0.1404               0.07018             0.05263
## Balanced Accuracy             0.9949               1.00000             0.83333
##                      Class: Inland.Sardinia Class: North.Apulia Class: Sicily
## Sensitivity                         1.00000             1.00000       1.00000
## Specificity                         1.00000             1.00000       0.97297
## Pos Pred Value                      1.00000             1.00000       0.50000
## Neg Pred Value                      1.00000             1.00000       1.00000
## Precision                           1.00000             1.00000       0.50000
## Recall                              1.00000             1.00000       1.00000
## F1                                  1.00000             1.00000       0.66667
## Prevalence                          0.08772             0.02632       0.02632
## Detection Rate                      0.08772             0.02632       0.02632
## Detection Prevalence                0.08772             0.02632       0.05263
## Balanced Accuracy                   1.00000             1.00000       0.98649
##                      Class: South.Apulia Class: Umbria Class: West.Liguria
## Sensitivity                       0.9091       1.00000             0.81818
## Specificity                       0.9571       0.98058             1.00000
## Pos Pred Value                    0.9302       0.84615             1.00000
## Neg Pred Value                    0.9437       1.00000             0.98095
## Precision                         0.9302       0.84615             1.00000
## Recall                            0.9091       1.00000             0.81818
## F1                                0.9195       0.91667             0.90000
## Prevalence                        0.3860       0.09649             0.09649
## Detection Rate                    0.3509       0.09649             0.07895
## Detection Prevalence              0.3772       0.11404             0.07895
## Balanced Accuracy                 0.9331       0.99029             0.90909

On the test data, Kappa = 0.9011.
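To make the reported statistics less of a black box, the following sketch recomputes Accuracy, Kappa and the per-class Precision and Recall directly from the test confusion matrix printed above. The counts are copied by hand from that output; the names cm, p_obs, p_exp and kappa_manual are illustrative and do not appear in the original analysis. Kappa is calculated here as Cohen's kappa, (p_obs - p_exp) / (1 - p_exp), where p_exp is the agreement expected by chance from the row and column totals.

```r
# Test-set confusion matrix copied from the output above
# (rows = predicted class, columns = reference class).
classes <- c("Calabria", "Coast.Sardinia", "East.Liguria", "Inland.Sardinia",
             "North.Apulia", "Sicily", "South.Apulia", "Umbria", "West.Liguria")
cm <- matrix(c(
  15, 0, 0,  0, 0, 0,  1,  0, 0,
   0, 8, 0,  0, 0, 0,  0,  0, 0,
   0, 0, 6,  0, 0, 0,  0,  0, 0,
   0, 0, 0, 10, 0, 0,  0,  0, 0,
   0, 0, 0,  0, 3, 0,  0,  0, 0,
   0, 0, 0,  0, 0, 3,  3,  0, 0,
   0, 0, 1,  0, 0, 0, 40,  0, 2,
   0, 0, 2,  0, 0, 0,  0, 11, 0,
   0, 0, 0,  0, 0, 0,  0,  0, 9),
  nrow = 9, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)                                      # number of test observations
p_obs <- sum(diag(cm)) / n                        # observed agreement (accuracy)
p_exp <- sum(rowSums(cm) * colSums(cm)) / n^2     # agreement expected by chance
kappa_manual <- (p_obs - p_exp) / (1 - p_exp)     # Cohen's kappa

precision <- diag(cm) / rowSums(cm)   # correct predictions / all predictions of the class
recall    <- diag(cm) / colSums(cm)   # correct predictions / all true members of the class

round(c(Accuracy = p_obs, Kappa = kappa_manual), 4)
round(cbind(Precision = precision, Recall = recall), 4)
```

The rounded values agree with the confusionMatrix output: Accuracy 0.9211 and Kappa 0.9011. The per-class table also shows where the errors come from: half of the Sicily predictions actually belong to South.Apulia (Precision 0.5), and only 6 of the 9 East.Liguria samples are recognised (Recall 0.6667).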

Variable importance in logistic regression classification

plot(varImp(fit.logit))

The importance of the attributes differs from class to class.

Summary

The following models were obtained:

library(knitr)
library(kableExtra)
# Kappa values of the models fitted above
mudel <- c("KNN", "NB", "SVMpoly", "LogR")
Kappa_train <- c(0.9893, 0.9439, 0.9733, 0.976)
Kappa_test <- c(0.9125, 0.8805, 0.9344, 0.9011)
# cbind() builds a character matrix, so the numbers are shown without padding (0.976, not 0.9760)
tabel <- cbind("Model" = mudel, "Kappa_train" = Kappa_train, "Kappa_test" = Kappa_test)

x <- kable_styling(kable(tabel, caption = "Best models"), c("striped", "bordered", "hover"), full_width = FALSE, position = "left")
column_spec(column_spec(x, 1, width_min = "25em"), 2:3, width_min = "5em")

Best models

Model     Kappa_train  Kappa_test
KNN       0.9893       0.9125
NB        0.9439       0.8805
SVMpoly   0.9733       0.9344
LogR      0.976        0.9011
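
The table can also be read in terms of how much each model loses when moving from training to test data. The sketch below is one possible way to do this: it rebuilds the same values as a data frame, adds the difference between training and test Kappa, and sorts the models by test Kappa. The names comparison and Gap are hypothetical helpers introduced for this illustration only.

```r
# Same values as in the summary table above
mudel <- c("KNN", "NB", "SVMpoly", "LogR")
Kappa_train <- c(0.9893, 0.9439, 0.9733, 0.976)
Kappa_test <- c(0.9125, 0.8805, 0.9344, 0.9011)

# A data frame keeps the Kappa columns numeric (cbind() would convert them to text)
comparison <- data.frame(Model = mudel,
                         Kappa_train = Kappa_train,
                         Kappa_test = Kappa_test)

# Drop in Kappa between training and test data; a large drop hints at overfitting
comparison$Gap <- comparison$Kappa_train - comparison$Kappa_test

# Order the models from best to worst test Kappa
comparison <- comparison[order(comparison$Kappa_test, decreasing = TRUE), ]
print(comparison, row.names = FALSE)
```

With these numbers SVMpoly has both the highest test Kappa (0.9344) and the smallest drop (0.0389), while KNN has the highest training Kappa but the largest drop (0.0768). The test set contains only 114 observations, so differences of this size between the models should be interpreted with caution.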