Introduction to Neural Networks
A neural network is an information-processing model that can be viewed as loosely analogous to the human nervous system. Just as the human nervous system is made up of interconnected neurons, a neural network is made up of interconnected information-processing units. These units do not work in a linear manner. A neural network draws its strength from processing information in parallel, which, together with non-linear activation functions, allows it to model non-linear relationships. Neural networks are handy for inferring meaning and detecting patterns in complex data sets.
By the end of this lesson, you will be able to
A neural network is a computational system that creates predictions based on existing data. It is a model characterized by an activation function, which the interconnected information-processing units use to transform input into output. Neural networks are often compared to the human nervous system: information is passed through interconnected units in a way that is analogous to the passage of information through neurons in humans. The first layer of the neural network receives the raw input, processes it and passes the processed information to the hidden layers. The hidden layers pass the information to the last layer, which produces the output. The advantage of a neural network is that it is adaptive in nature. It learns from the information provided: it trains itself on data with a known outcome and optimizes its weights to give better predictions in situations where the outcome is unknown.
A perceptron, i.e. a single-layer neural network, is the most basic form of a neural network. A perceptron receives multidimensional input and processes it using a weighted summation and an activation function. It is trained using labeled data and a learning algorithm that optimizes the weights in the summation processor. A major limitation of the perceptron model is its inability to deal with non-linearity. A multilayered neural network overcomes this limitation and helps solve non-linear problems. The input layer connects to the hidden layer, which in turn connects to the output layer. The connections are weighted, and the weights are optimized using a learning rule.
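To make the weighted summation, the activation function and the weight updates concrete, here is a minimal base R sketch of a perceptron. The helper names (`step_activation`, `perceptron_predict`, `train_perceptron`) and the toy AND/XOR data are illustrative assumptions, not part of the lesson's own code. The sketch uses the classic perceptron learning rule with a step activation.

```r
# Step activation: output 1 when the weighted sum is positive, otherwise 0.
step_activation <- function(z) as.numeric(z > 0)

# Perceptron output for a matrix of inputs X (one row per observation).
perceptron_predict <- function(X, w, b) as.vector(step_activation(X %*% w + b))

# Hypothetical helper: perceptron learning rule, one observation at a time.
train_perceptron <- function(X, y, lr = 1, epochs = 50) {
  w <- rep(0, ncol(X))
  b <- 0
  for (epoch in seq_len(epochs)) {
    for (i in seq_len(nrow(X))) {
      pred <- step_activation(sum(X[i, ] * w) + b)
      err <- y[i] - pred            # 0 when correct, +1 or -1 when wrong
      w <- w + lr * err * X[i, ]    # move the weights towards the target
      b <- b + lr * err
    }
  }
  list(w = w, b = b)
}

X <- matrix(c(0, 0,
              0, 1,
              1, 0,
              1, 1), ncol = 2, byrow = TRUE)

# AND is linearly separable, so the perceptron can learn it.
y_and <- c(0, 0, 0, 1)
fit_and <- train_perceptron(X, y_and)
and_pred <- perceptron_predict(X, fit_and$w, fit_and$b)

# XOR is not linearly separable, so at least one point stays misclassified.
y_xor <- c(0, 1, 1, 0)
fit_xor <- train_perceptron(X, y_xor)
xor_pred <- perceptron_predict(X, fit_xor$w, fit_xor$b)
```

Comparing `and_pred` with `y_and` and `xor_pred` with `y_xor` illustrates the limitation described above: a single weighted sum can only draw one straight boundary, which is why hidden layers are needed for non-linear problems.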
Many learning rules can be used with neural networks:
The learning rules can be used in conjunction with the error backpropagation method. The learning rule is used to calculate the error at the output unit. This error is backpropagated to all the units such that the error at each unit is proportional to that unit's contribution to the total error at the output unit. The errors at each unit are then used to optimize the weight at each connection.
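The backpropagation idea can be sketched in a few lines of base R. The example below assumes a tiny network with one sigmoid hidden layer, a single sigmoid output and a squared-error loss; these choices, the function names and the toy data are assumptions made for illustration and are not how `nnet` is implemented internally. The error at the output is computed first and then distributed to the hidden units in proportion to the weights connecting them to the output. A finite-difference check confirms that the backpropagated gradient matches the numerical one.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Forward pass: inputs X (n x p) -> hidden layer H (n x h) -> output O (n x 1).
forward <- function(X, W1, W2) {
  H <- sigmoid(X %*% W1)
  O <- sigmoid(H %*% W2)
  list(H = H, O = O)
}

# Squared-error loss summed over all observations.
loss <- function(X, y, W1, W2) 0.5 * sum((forward(X, W1, W2)$O - y)^2)

# Backpropagation: output error first, then each hidden unit's share of it.
backprop <- function(X, y, W1, W2) {
  f <- forward(X, W1, W2)
  delta_out <- (f$O - y) * f$O * (1 - f$O)               # error at the output unit
  delta_hid <- (delta_out %*% t(W2)) * f$H * (1 - f$H)   # error passed back to hidden units
  list(dW1 = t(X) %*% delta_hid, dW2 = t(f$H) %*% delta_out)
}

set.seed(1)
X <- cbind(1, c(0, 0, 1, 1), c(0, 1, 0, 1))  # bias column plus two inputs
y <- c(0, 1, 1, 0)
W1 <- matrix(rnorm(3 * 3), 3, 3)             # input -> hidden weights
W2 <- matrix(rnorm(3), 3, 1)                 # hidden -> output weights

grads <- backprop(X, y, W1, W2)

# Numerical check of one gradient entry using central differences.
eps <- 1e-6
W1_plus <- W1;  W1_plus[2, 1]  <- W1_plus[2, 1] + eps
W1_minus <- W1; W1_minus[2, 1] <- W1_minus[2, 1] - eps
numeric_grad <- (loss(X, y, W1_plus, W2) - loss(X, y, W1_minus, W2)) / (2 * eps)

# One gradient-descent update of the weights (the learning rate is an arbitrary choice).
lr <- 0.5
W1_new <- W1 - lr * grads$dW1
W2_new <- W2 - lr * grads$dW2
```

Repeating the update step many times is what training amounts to in this sketch; practical implementations add refinements such as weight decay and better optimizers.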
A neural network consists of:
Input layers: Layers that take inputs based on existing data
Hidden layers: Layers that use backpropagation to optimise the weights of the input variables in order to improve the predictive power of the model
Output layers: Layers that output predictions based on the data from the input and hidden layers
library(tidyverse)
set.seed(1)
# split the data 60/40, stratified by Species
library(caret)
trainIndex <- createDataPartition(iris$Species, p = .6, list = FALSE, times = 1)
# rows in trainIndex form the training set; the remaining rows form the test set
irisTrain <- iris[ trainIndex,]
irisTest <- iris[-trainIndex,]
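For a factor outcome, `createDataPartition()` samples within each class so that the class proportions are preserved in both parts. The base R sketch below illustrates that idea only; it is not caret's implementation, and the names `rows_by_class`, `sketch_train` and `sketch_test` are made up for this example.

```r
set.seed(1)

# Row indices of iris grouped by species.
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)

# Draw 60% of the rows within each class.
train_rows <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(0.6 * length(rows)))
}))

sketch_train <- iris[train_rows, ]
sketch_test <- iris[-train_rows, ]

# Each species contributes the same share to both parts.
table(sketch_train$Species)
table(sketch_test$Species)
```

With 50 flowers per species, this leaves 30 rows per class for training and 20 per class for testing.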
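# train a single-hidden-layer network (nnet) with 10-fold cross-validation
IrisNNET <- train(
  form = factor(Species) ~ .,
  data = irisTrain,
  # classProbs = TRUE so that class probabilities can be predicted later
  trControl = trainControl(method = "cv", number = 10,
                           classProbs = TRUE),
  method = "nnet",
  # standardise each predictor before fitting
  preProcess = c("center", "scale"),
  # number of candidate values to try per tuning parameter (size and decay)
  tuneLength = 5,
  # trace = FALSE silences nnet's optimisation log
  trace = FALSE)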
#IrisNNET
knitr::kable(IrisNNET$bestTune)
row | size | decay |
---|---|---|
11 | 5 | 0 |
plot(IrisNNET)
IrisNNET_Pred<-predict(IrisNNET,irisTest,type="prob")
knitr::kable(IrisNNET_Pred)%>%
kableExtra::kable_styling("striped")%>%
kableExtra::scroll_box(width = "50%",height="300px")
row | setosa | versicolor | virginica |
---|---|---|---|
2 | 0.9999970 | 0.0000030 | 0.0000000 |
5 | 0.9999970 | 0.0000030 | 0.0000000 |
8 | 0.9999970 | 0.0000030 | 0.0000000 |
11 | 0.9999970 | 0.0000030 | 0.0000000 |
13 | 0.9999970 | 0.0000030 | 0.0000000 |
16 | 0.9999970 | 0.0000030 | 0.0000000 |
17 | 0.9999970 | 0.0000030 | 0.0000000 |
19 | 0.9999970 | 0.0000030 | 0.0000000 |
24 | 0.9999970 | 0.0000030 | 0.0000000 |
28 | 0.9999970 | 0.0000030 | 0.0000000 |
30 | 0.9999970 | 0.0000030 | 0.0000000 |
32 | 0.9999970 | 0.0000030 | 0.0000000 |
35 | 0.9999970 | 0.0000030 | 0.0000000 |
37 | 0.9999970 | 0.0000030 | 0.0000000 |
38 | 0.9999970 | 0.0000030 | 0.0000000 |
42 | 0.9999965 | 0.0000035 | 0.0000000 |
44 | 0.9999970 | 0.0000030 | 0.0000000 |
45 | 0.9999970 | 0.0000030 | 0.0000000 |
46 | 0.9999970 | 0.0000030 | 0.0000000 |
50 | 0.9999970 | 0.0000030 | 0.0000000 |
53 | 0.0000000 | 1.0000000 | 0.0000000 |
55 | 0.0000000 | 1.0000000 | 0.0000000 |
59 | 0.0000000 | 1.0000000 | 0.0000000 |
60 | 0.0000000 | 1.0000000 | 0.0000000 |
63 | 0.0000000 | 1.0000000 | 0.0000000 |
65 | 0.0000000 | 1.0000000 | 0.0000000 |
66 | 0.0000000 | 1.0000000 | 0.0000000 |
70 | 0.0000000 | 1.0000000 | 0.0000000 |
71 | 0.0000000 | 0.0000000 | 1.0000000 |
76 | 0.0000000 | 1.0000000 | 0.0000000 |
77 | 0.0000000 | 1.0000000 | 0.0000000 |
78 | 0.0000000 | 1.0000000 | 0.0000000 |
81 | 0.0000000 | 1.0000000 | 0.0000000 |
83 | 0.0000000 | 1.0000000 | 0.0000000 |
84 | 0.0000000 | 0.0000000 | 1.0000000 |
85 | 0.0000000 | 0.9999984 | 0.0000016 |
86 | 0.0000000 | 1.0000000 | 0.0000000 |
87 | 0.0000000 | 1.0000000 | 0.0000000 |
95 | 0.0000000 | 1.0000000 | 0.0000000 |
97 | 0.0000000 | 1.0000000 | 0.0000000 |
103 | 0.0000000 | 0.0000000 | 1.0000000 |
104 | 0.0000000 | 0.0000000 | 1.0000000 |
105 | 0.0000000 | 0.0000000 | 1.0000000 |
106 | 0.0000000 | 0.0000000 | 1.0000000 |
108 | 0.0000000 | 0.0000000 | 1.0000000 |
112 | 0.0000000 | 0.0000000 | 1.0000000 |
114 | 0.0000000 | 0.0000000 | 1.0000000 |
115 | 0.0000000 | 0.0000000 | 1.0000000 |
124 | 0.0000000 | 0.0000000 | 1.0000000 |
127 | 0.0000000 | 0.0000000 | 1.0000000 |
130 | 0.0000000 | 0.9992583 | 0.0007417 |
134 | 0.0000000 | 1.0000000 | 0.0000000 |
135 | 0.0000000 | 0.0000000 | 1.0000000 |
136 | 0.0000000 | 0.0000000 | 1.0000000 |
137 | 0.0000000 | 0.0000000 | 1.0000000 |
138 | 0.0000000 | 0.0000000 | 1.0000000 |
142 | 0.0000000 | 0.0000000 | 1.0000000 |
144 | 0.0000000 | 0.0000000 | 1.0000000 |
145 | 0.0000000 | 0.0000000 | 1.0000000 |
147 | 0.0000000 | 0.0000000 | 1.0000000 |
# attach the predicted probabilities to the test rows
irisNNETtestpred <- cbind(IrisNNET_Pred, irisTest)
# label each row with the most probable class; "PROBLEM" flags a tie
irisNNETtestpred <- irisNNETtestpred %>%
  mutate(prediction = if_else(setosa > versicolor & setosa > virginica, "setosa",
                      if_else(versicolor > setosa & versicolor > virginica, "versicolor",
                      if_else(virginica > setosa & virginica > versicolor, "virginica", "PROBLEM"))))
table(irisNNETtestpred$prediction)
setosa versicolor virginica
20 20 20
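The nested `if_else()` calls pick, for each row, the class with the largest predicted probability. A more compact base R sketch of the same idea uses `max.col()`. The small `probs` data frame below copies four rows from the probability table above purely for illustration.

```r
# Four rows copied from the probability table (iris rows 2, 53, 130 and 71).
probs <- data.frame(
  setosa     = c(0.9999970, 0.0000000, 0.0000000, 0.0000000),
  versicolor = c(0.0000030, 1.0000000, 0.9992583, 0.0000000),
  virginica  = c(0.0000000, 0.0000000, 0.0007417, 1.0000000),
  row.names  = c("2", "53", "130", "71")
)

# Column index of the largest probability in each row; ties go to the first column.
best_col <- max.col(as.matrix(probs), ties.method = "first")

# Translate column indices into class labels.
predicted <- colnames(probs)[best_col]
predicted
```

Unlike the `"PROBLEM"` fallback, this sketch resolves ties silently in favour of the first column, so it suits cases where ties are not expected. Calling `predict()` on the fitted caret model without `type = "prob"` also returns class labels directly.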
NNETConfusion<-confusionMatrix(factor(irisNNETtestpred$prediction),factor(irisNNETtestpred$Species))
NNETConfusion
Confusion Matrix and Statistics
Reference
Prediction setosa versicolor virginica
setosa 20 0 0
versicolor 0 18 2
virginica 0 2 18
Overall Statistics
Accuracy : 0.9333
95% CI : (0.838, 0.9815)
No Information Rate : 0.3333
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.9
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: setosa Class: versicolor Class: virginica
Sensitivity 1.0000 0.9000 0.9000
Specificity 1.0000 0.9500 0.9500
Pos Pred Value 1.0000 0.9000 0.9000
Neg Pred Value 1.0000 0.9500 0.9500
Prevalence 0.3333 0.3333 0.3333
Detection Rate 0.3333 0.3000 0.3000
Detection Prevalence 0.3333 0.3333 0.3333
Balanced Accuracy 1.0000 0.9250 0.9250
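The headline statistics in this output can be reproduced by hand from the 3 x 3 table. The base R sketch below rebuilds the confusion matrix from the counts printed above and recomputes accuracy, per-class sensitivity and specificity, and Cohen's kappa. The variable names are illustrative.

```r
classes <- c("setosa", "versicolor", "virginica")

# Rows are predictions, columns are the reference (true) classes.
cm <- matrix(c(20,  0,  0,
                0, 18,  2,
                0,  2, 18),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)

# Accuracy: share of observations on the diagonal.
accuracy <- sum(diag(cm)) / n

# Sensitivity: correctly predicted cases of a class / all true cases of that class.
sensitivity <- diag(cm) / colSums(cm)

# Specificity: true negatives / all cases that are not in the class.
true_neg <- n - rowSums(cm) - colSums(cm) + diag(cm)
specificity <- true_neg / (n - colSums(cm))

# Cohen's kappa: agreement beyond what the marginal totals alone would give.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, kappa = cohen_kappa), 4)
round(rbind(sensitivity, specificity), 4)
```

The results agree with the `confusionMatrix()` output: 56 of 60 test flowers are classified correctly, and all four errors are confusions between versicolor and virginica.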
ggplot(as.data.frame(NNETConfusion$table))+
geom_raster(aes(x=Reference, y=Prediction, fill=Freq)) +
geom_text(aes(x=Reference, y=Prediction, label=Freq)) +
scale_fill_gradient2( low = "red", high = "darkred", na.value="black", name = "Freq" )+
scale_x_discrete(name="Actual Class") +
scale_y_discrete(name="Predicted Class")+
ggtitle("Confusion is fun")+
theme(plot.title = element_text(hjust=0.5, size=10, face='bold'))
#summary(IrisNNET)
# variable importance from the fitted network, sorted from most to least important
V<-as.data.frame(caret::varImp(IrisNNET)$importance)%>%
arrange(desc(Overall))
knitr::kable(V)
variable | Overall | setosa | versicolor | virginica |
---|---|---|---|---|
Petal.Width | 100.00000 | 100.00000 | 100.00000 | 100.00000 |
Petal.Length | 64.27811 | 64.27811 | 64.27811 | 64.27811 |
Sepal.Length | 27.49959 | 27.49959 | 27.49959 | 27.49959 |
Sepal.Width | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
ggplot2::ggplot(V, aes(x=reorder(rownames(V),Overall), y=Overall)) +
geom_point( color="blue", size=4, alpha=0.6)+
geom_segment( aes(x=rownames(V), xend=rownames(V), y=0, yend=Overall),
color='skyblue') +
xlab('Variable')+
ylab('Overall Importance')+
theme_light() +
coord_flip()
#install.packages("NeuralNetTools")
NeuralNetTools::garson(IrisNNET)
NeuralNetTools::olden(IrisNNET)
NeuralNetTools::plotnet(IrisNNET)
set.seed(1)
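# same 60/40 stratified split as before
library(caret)
trainIndex <- createDataPartition(iris$Species, p = .6, list = FALSE, times = 1)
# note: naming the data frame `train` does not break the later train() call,
# because R skips non-function objects when it looks up a function name
train <- iris[ trainIndex,]
test <- iris[-trainIndex,]
# candidate hidden-layer sizes and weight-decay values to search over
mygrid <- expand.grid(size = seq(1, 9, by = 2),
                      decay = c(0.0001, 0.001, 0.01, 0.1))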
IrisNNET<- train(
form = factor(Species) ~ .,
data = train,
#here we add classProbs because we want probs
trControl = trainControl(method = "cv", number = 10,
classProbs = TRUE),
method = "nnet",
preProcess = c("center", "scale"),
tuneGrid = mygrid,
trace=FALSE)
knitr::kable(IrisNNET$bestTune)
row | size | decay |
---|---|---|
3 | 1 | 0.01 |
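It can help to see how much work this grid search involves. The base R sketch below rebuilds the same grid and counts the candidate models. The fold count of 10 is taken from the `trainControl()` call, and the variable names are illustrative.

```r
# Same grid as in the tuning call: 5 hidden-layer sizes x 4 decay values.
mygrid <- expand.grid(size = seq(1, 9, by = 2),
                      decay = c(0.0001, 0.001, 0.01, 0.1))

n_candidates <- nrow(mygrid)      # one row per (size, decay) combination
n_folds <- 10                     # from trainControl(method = "cv", number = 10)

# Each candidate is fitted once per fold during cross-validation.
n_resample_fits <- n_candidates * n_folds

# The combination reported in bestTune.
best_row <- mygrid[mygrid$size == 1 & mygrid$decay == 0.01, ]

n_candidates
n_resample_fits
best_row
```

Every extra value added to `size` or `decay` multiplies the number of fits, so a grid that is cheap on iris may become expensive on larger data sets.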
pl = seq(min(iris$Petal.Length), max(iris$Petal.Length), by=0.1)
pw = seq(min(iris$Petal.Width), max(iris$Petal.Width), by=0.1)
# generates the boundaries for your graph
lgrid <- expand.grid(Petal.Length=pl,
Petal.Width=pw,
Sepal.Length = 5.4,
Sepal.Width=3.1)
IrisNNETGrid2 <- predict(IrisNNET, newdata=lgrid)
IrisNNETGrid <- as.numeric(IrisNNETGrid2)
# get the points from the test data...
testPred <- predict(IrisNNET, newdata=test)
testPred <- as.numeric(testPred)
# this gets the points for the testPred...
test$Pred <- testPred
probs <- matrix(IrisNNETGrid, length(pl), length(pw))
ggplot(data=lgrid) + stat_contour(aes(x=Petal.Length, y=Petal.Width, z=IrisNNETGrid),bins=10) +
geom_point(aes(x=Petal.Length, y=Petal.Width, colour=IrisNNETGrid2),alpha=.2) +
geom_point(data=test, aes(x=Petal.Length, y=Petal.Width, shape=Species), size=2) +
labs(shape = "Testing Species") +
geom_point(data=train, aes(x=Petal.Length, y=Petal.Width, color=Species), size=2, alpha=0.75)+
theme_bw()+
labs(color = "Training Species")+
ggtitle("Decision Surface")
Let's get some more interesting data:
# install.packages("devtools")
# devtools::install_github("tyluRp/ucimlr")
datatoget<-ucimlr::ucidata()
knitr::kable(datatoget)%>%
kableExtra::kable_styling("striped")%>%
kableExtra::scroll_box(width = "100%",height="300px")
name | type | task | variable_types | observations | variables | year |
---|---|---|---|---|---|---|
Abalone | Multivariate | Classification | Categorical, Integer, Real | 4177 | 8 | 1995 |
Adult | Multivariate | Classification | Categorical, Integer | 48842 | 14 | 1996 |
Annealing | Multivariate | Classification | Categorical, Integer, Real | 798 | 38 | NA |
Anonymous Microsoft Web Data | NA | Recommender-Systems | Categorical | 37711 | 294 | 1998 |
Arrhythmia | Multivariate | Classification | Categorical, Integer, Real | 452 | 279 | 1998 |
Artificial Characters | Multivariate | Classification | Categorical, Integer, Real | 6000 | 7 | 1992 |
Audiology (Original) | Multivariate | Classification | Categorical | 226 | NA | 1987 |
Audiology (Standardized) | Multivariate | Classification | Categorical | 226 | 69 | 1992 |
Auto MPG | Multivariate | Regression | Categorical, Real | 398 | 8 | 1993 |
Automobile | Multivariate | Regression | Categorical, Integer, Real | 205 | 26 | 1987 |
Badges | Univariate, Text | Classification | NA | 294 | 1 | 1994 |
Balance Scale | Multivariate | Classification | Categorical | 625 | 4 | 1994 |
Balloons | Multivariate | Classification | Categorical | 16 | 4 | NA |
Breast Cancer | Multivariate | Classification | Categorical | 286 | 9 | 1988 |
Breast Cancer Wisconsin (Original) | Multivariate | Classification | Integer | 699 | 10 | 1992 |
Breast Cancer Wisconsin (Prognostic) | Multivariate | Classification, Regression | Real | 198 | 34 | 1995 |
Breast Cancer Wisconsin (Diagnostic) | Multivariate | Classification | Real | 569 | 32 | 1995 |
Pittsburgh Bridges | Multivariate | Classification | Categorical, Integer | 108 | 13 | 1990 |
Car Evaluation | Multivariate | Classification | Categorical | 1728 | 6 | 1997 |
Census Income | Multivariate | Classification | Categorical, Integer | 48842 | 14 | 1996 |
Chess (King-Rook vs. King-Knight) | Multivariate, Data-Generator | Classification | Categorical, Integer | NA | 22 | 1988 |
Chess (King-Rook vs. King-Pawn) | Multivariate | Classification | Categorical | 3196 | 36 | 1989 |
Chess (King-Rook vs. King) | Multivariate | Classification | Categorical, Integer | 28056 | 6 | 1994 |
Chess (Domain Theories) | Domain-Theory | NA | NA | NA | NA | NA |
Bach Chorales | Univariate, Time-Series | NA | Categorical, Integer | 100 | 6 | NA |
Connect-4 | Multivariate, Spatial | Classification | Categorical | 67557 | 42 | 1995 |
Credit Approval | Multivariate | Classification | Categorical, Integer, Real | 690 | 15 | NA |
Japanese Credit Screening | Multivariate, Domain-Theory | Classification | Categorical, Real, Integer | 125 | NA | 1992 |
Computer Hardware | Multivariate | Regression | Integer | 209 | 9 | 1987 |
Contraceptive Method Choice | Multivariate | Classification | Categorical, Integer | 1473 | 9 | 1997 |
Covertype | Multivariate | Classification | Categorical, Integer | 581012 | 54 | 1998 |
Cylinder Bands | Multivariate | Classification | Categorical, Integer, Real | 512 | 39 | 1995 |
Dermatology | Multivariate | Classification | Categorical, Integer | 366 | 33 | 1998 |
Diabetes | Multivariate, Time-Series | NA | Categorical, Integer | NA | 20 | NA |
DGP2 - The Second Data Generation Program | Data-Generator | NA | Real | NA | NA | NA |
Document Understanding | NA | NA | NA | NA | NA | 1994 |
EBL Domain Theories | NA | NA | NA | NA | NA | NA |
Echocardiogram | Multivariate | Classification | Categorical, Integer, Real | 132 | 12 | 1989 |
Ecoli | Multivariate | Classification | Real | 336 | 8 | 1996 |
Flags | Multivariate | Classification | Categorical, Integer | 194 | 30 | 1990 |
Function Finding | NA | Function-Learning | Real | 352 | NA | 1990 |
Glass Identification | Multivariate | Classification | Real | 214 | 10 | 1987 |
Haberman’s Survival | Multivariate | Classification | Integer | 306 | 3 | 1999 |
Hayes-Roth | Multivariate | Classification | Categorical | 160 | 5 | 1989 |
Heart Disease | Multivariate | Classification | Categorical, Integer, Real | 303 | 75 | 1988 |
Hepatitis | Multivariate | Classification | Categorical, Integer, Real | 155 | 19 | 1988 |
Horse Colic | Multivariate | Classification | Categorical, Integer, Real | 368 | 27 | 1989 |
ICU | Multivariate, Time-Series | NA | Real | NA | NA | NA |
Image Segmentation | Multivariate | Classification | Real | 2310 | 19 | 1990 |
Internet Advertisements | Multivariate | Classification | Categorical, Integer, Real | 3279 | 1558 | 1998 |
Ionosphere | Multivariate | Classification | Integer, Real | 351 | 34 | 1989 |
Iris | Multivariate | Classification | Real | 150 | 4 | 1988 |
ISOLET | Multivariate | Classification | Real | 7797 | 617 | 1994 |
Kinship | Relational | Relational-Learning | Categorical | 104 | 12 | 1990 |
Labor Relations | Multivariate | NA | Categorical, Integer, Real | 57 | 16 | 1988 |
LED Display Domain | Multivariate, Data-Generator | Classification | Categorical | NA | 7 | 1988 |
Lenses | Multivariate | Classification | Categorical | 24 | 4 | 1990 |
Letter Recognition | Multivariate | Classification | Integer | 20000 | 16 | 1991 |
Liver Disorders | Multivariate | NA | Categorical, Integer, Real | 345 | 7 | 1990 |
Logic Theorist | Domain-Theory | NA | NA | NA | NA | NA |
Lung Cancer | Multivariate | Classification | Integer | 32 | 56 | 1992 |
Lymphography | Multivariate | Classification | Categorical | 148 | 18 | 1988 |
Mechanical Analysis | Multivariate | Classification | Categorical, Integer, Real | 209 | 8 | 1990 |
Meta-data | Multivariate | Classification | Categorical, Integer, Real | 528 | 22 | 1996 |
Mobile Robots | Domain-Theory | NA | Categorical, Integer, Real | NA | NA | 1995 |
Molecular Biology (Promoter Gene Sequences) | Sequential, Domain-Theory | Classification | Categorical | 106 | 58 | 1990 |
Molecular Biology (Protein Secondary Structure) | Sequential | Classification | Categorical | 128 | NA | NA |
Molecular Biology (Splice-junction Gene Sequences) | Sequential, Domain-Theory | Classification | Categorical | 3190 | 61 | 1992 |
MONK’s Problems | Multivariate | Classification | Categorical | 432 | 7 | 1992 |
Moral Reasoner | Domain-Theory | NA | NA | 202 | NA | 1994 |
Multiple Features | Multivariate | Classification | Integer, Real | 2000 | 649 | NA |
Mushroom | Multivariate | Classification | Categorical | 8124 | 22 | 1987 |
Musk (Version 1) | Multivariate | Classification | Integer | 476 | 168 | 1994 |
Musk (Version 2) | Multivariate | Classification | Integer | 6598 | 168 | 1994 |
Nursery | Multivariate | Classification | Categorical | 12960 | 8 | 1997 |
Othello Domain Theory | Domain-Theory | NA | NA | NA | NA | 1991 |
Page Blocks Classification | Multivariate | Classification | Integer, Real | 5473 | 10 | 1995 |
Optical Recognition of Handwritten Digits | Multivariate | Classification | Integer | 5620 | 64 | 1998 |
Pen-Based Recognition of Handwritten Digits | Multivariate | Classification | Integer | 10992 | 16 | 1998 |
Post-Operative Patient | Multivariate | Classification | Categorical, Integer | 90 | 8 | 1993 |
Primary Tumor | Multivariate | Classification | Categorical | 339 | 17 | 1988 |
Prodigy | Domain-Theory | NA | NA | NA | NA | NA |
Qualitative Structure Activity Relationships | Domain-Theory | NA | NA | NA | NA | NA |
Quadruped Mammals | Multivariate, Data-Generator | Classification | Real | NA | 72 | 1992 |
Servo | Multivariate | Regression | Categorical, Integer | 167 | 4 | 1993 |
Shuttle Landing Control | Multivariate | Classification | Categorical | 15 | 6 | 1988 |
Solar Flare | Multivariate | Regression | Categorical | 1389 | 10 | 1989 |
Soybean (Large) | Multivariate | Classification | Categorical | 307 | 35 | 1988 |
Soybean (Small) | Multivariate | Classification | Categorical | 47 | 35 | 1987 |
Challenger USA Space Shuttle O-Ring | Multivariate | Regression | Integer | 23 | 4 | 1993 |
Low Resolution Spectrometer | Multivariate | Classification | Integer, Real | 531 | 102 | 1988 |
Spambase | Multivariate | Classification | Integer, Real | 4601 | 57 | 1999 |
SPECT Heart | Multivariate | Classification | Categorical | 267 | 22 | 2001 |
SPECTF Heart | Multivariate | Classification | Integer | 267 | 44 | 2001 |
Sponge | Multivariate | Clustering | Categorical, Integer | 76 | 45 | NA |
Statlog Project | NA | NA | NA | NA | NA | 1992 |
Student Loan Relational | Domain-Theory | NA | NA | 1000 | NA | 1993 |
Teaching Assistant Evaluation | Multivariate | Classification | Categorical, Integer | 151 | 5 | 1997 |
Tic-Tac-Toe Endgame | Multivariate | Classification | Categorical | 958 | 9 | 1991 |
Thyroid Disease | Multivariate, Domain-Theory | Classification | Categorical, Real | 7200 | 21 | 1987 |
Trains | Multivariate | Classification | Categorical | 10 | 32 | 1994 |
University | Multivariate | Classification | Categorical, Integer | 285 | 17 | 1988 |
Congressional Voting Records | Multivariate | Classification | Categorical | 435 | 16 | 1987 |
Water Treatment Plant | Multivariate | Clustering | Integer, Real | 527 | 38 | 1993 |
Waveform Database Generator (Version 1) | Multivariate, Data-Generator | Classification | Real | 5000 | 21 | 1988 |
Waveform Database Generator (Version 2) | Multivariate, Data-Generator | Classification | Real | 5000 | 40 | 1988 |
Wine | Multivariate | Classification | Integer, Real | 178 | 13 | 1991 |
Yeast | Multivariate | Classification | Real | 1484 | 8 | 1996 |
Zoo | Multivariate | Classification | Categorical, Integer | 101 | 17 | 1990 |
Undocumented | NA | NA | NA | NA | NA | NA |
Twenty Newsgroups | Text | NA | NA | 20000 | NA | 1999 |
Australian Sign Language signs | Multivariate, Time-Series | Classification | Categorical, Real | 6650 | 15 | 1999 |
Australian Sign Language signs (High Quality) | Multivariate, Time-Series | Classification | Real | 2565 | 22 | 2002 |
US Census Data (1990) | Multivariate | Clustering | Categorical | 2458285 | 68 | NA |
Census-Income (KDD) | Multivariate | Classification | Categorical, Integer | 299285 | 40 | 2000 |
Coil 1999 Competition Data | Multivariate | NA | Categorical, Real | 340 | 17 | 1999 |
Corel Image Features | Multivariate | NA | Real | 68040 | 89 | 1999 |
E. Coli Genes | Relational | NA | NA | NA | NA | 2001 |
EEG Database | Multivariate, Time-Series | NA | Categorical, Integer, Real | 122 | 4 | 1999 |
El Nino | Spatio-temporal | NA | Integer, Real | 178080 | 12 | 1999 |
Entree Chicago Recommendation Data | Transactional, Sequential | Recommender-Systems | Categorical | 50672 | NA | 2000 |
CMU Face Images | Image | Classification | Integer | 640 | NA | 1999 |
Insurance Company Benchmark (COIL 2000) | Multivariate | Regression, Description | Categorical, Integer | 9000 | 86 | 2000 |
Internet Usage Data | Multivariate | NA | Categorical, Integer | 10104 | 72 | 1999 |
IPUMS Census Database | Multivariate | NA | Categorical, Integer | 256932 | 61 | 1999 |
Japanese Vowels | Multivariate, Time-Series | Classification | Real | 640 | 12 | NA |
KDD Cup 1998 Data | Multivariate | Regression | Categorical, Integer | 191779 | 481 | 1998 |
KDD Cup 1999 Data | Multivariate | Classification | Categorical, Integer | 4000000 | 42 | 1999 |
M. Tuberculosis Genes | Relational | NA | NA | NA | NA | 2001 |
Movie | Multivariate, Relational | NA | NA | 10000 | NA | 1999 |
MSNBC.com Anonymous Web Data | Sequential | NA | Categorical | 989818 | NA | NA |
NSF Research Award Abstracts 1990-2003 | Text | NA | NA | 129000 | NA | 2003 |
Pioneer-1 Mobile Robot Data | Multivariate, Time-Series | NA | Categorical, Real | NA | NA | 1999 |
Pseudo Periodic Synthetic Time Series | Univariate, Time-Series | NA | NA | 100000 | NA | 1999 |
Reuters-21578 Text Categorization Collection | Text | Classification | Categorical | 21578 | 5 | 1997 |
Robot Execution Failures | Multivariate, Time-Series | Classification | Integer | 463 | 90 | 1999 |
Synthetic Control Chart Time Series | Time-Series | Classification, Clustering | Real | 600 | NA | 1999 |
Syskill and Webert Web Page Ratings | Multivariate, Text | Classification | Categorical | 332 | 5 | 1998 |
UNIX User Data | Text, Sequential | NA | NA | NA | NA | NA |
Volcanoes on Venus - JARtool experiment | Image | Classification | NA | NA | NA | NA |
Statlog (Australian Credit Approval) | Multivariate | Classification | Categorical, Integer, Real | 690 | 14 | NA |
Statlog (German Credit Data) | Multivariate | Classification | Categorical, Integer | 1000 | 20 | 1994 |
Statlog (Heart) | Multivariate | Classification | Categorical, Real | 270 | 13 | NA |
Statlog (Landsat Satellite) | Multivariate | Classification | Integer | 6435 | 36 | 1993 |
Statlog (Image Segmentation) | Multivariate | Classification | Real | 2310 | 19 | 1990 |
Statlog (Shuttle) | Multivariate | Classification | Integer | 58000 | 9 | NA |
Statlog (Vehicle Silhouettes) | Multivariate | Classification | Integer | 946 | 18 | NA |
Connectionist Bench (Nettalk Corpus) | Multivariate | NA | Categorical | 20008 | 4 | NA |
Connectionist Bench (Sonar, Mines vs. Rocks) | Multivariate | Classification | Real | 208 | 60 | NA |
Connectionist Bench (Vowel Recognition - Deterding Data) | NA | Classification | Real | 528 | 10 | NA |
Economic Sanctions | Domain-Theory | NA | NA | NA | NA | NA |
Protein Data | NA | NA | NA | NA | NA | NA |
Cloud | Multivariate | NA | Real | 1024 | 10 | 1989 |
CalIt2 Building People Counts | Multivariate, Time-Series | NA | Categorical, Integer | 10080 | 4 | 2006 |
Dodgers Loop Sensor | Multivariate, Time-Series | NA | Categorical, Integer | 50400 | 3 | 2006 |
Poker Hand | Multivariate | Classification | Categorical, Integer | 1025010 | 11 | 2007 |
MAGIC Gamma Telescope | Multivariate | Classification | Real | 19020 | 11 | 2007 |
UJI Pen Characters | Multivariate, Sequential | Classification | Integer | 1364 | NA | 2007 |
Mammographic Mass | Multivariate | Classification | Integer | 961 | 6 | 2007 |
Forest Fires | Multivariate | Regression | Real | 517 | 13 | 2008 |
Reuters Transcribed Subset | Text | Classification | NA | 200 | NA | 2008 |
Bag of Words | Text | Clustering | Integer | 8000000 | 100000 | 2008 |
Concrete Compressive Strength | Multivariate | Regression | Real | 1030 | 9 | 2007 |
Hill-Valley | Sequential | Classification | Real | 606 | 101 | 2008 |
Arcene | Multivariate | Classification | Real | 900 | 10000 | 2008 |
Dexter | Multivariate | Classification | Integer | 2600 | 20000 | 2008 |
Dorothea | Multivariate | Classification | Integer | 1950 | 100000 | 2008 |
Gisette | Multivariate | Classification | Integer | 13500 | 5000 | 2008 |
Madelon | Multivariate | Classification | Real | 4400 | 500 | 2008 |
Ozone Level Detection | Multivariate, Sequential, Time-Series | Classification | Real | 2536 | 73 | 2008 |
Abscisic Acid Signaling Network | Multivariate | Causal-Discovery | Integer | 300 | 43 | 2008 |
Parkinsons | Multivariate | Classification | Real | 197 | 23 | 2008 |
Character Trajectories | Time-Series | Classification, Clustering | Real | 2858 | 3 | 2008 |
Blood Transfusion Service Center | Multivariate | Classification | Real | 748 | 5 | 2008 |
UJI Pen Characters (Version 2) | Multivariate, Sequential | Classification | Integer | 11640 | NA | 2009 |
Semeion Handwritten Digit | Multivariate | Classification | Integer | 1593 | 256 | 2008 |
SECOM | Multivariate | Classification, Causal-Discovery | Real | 1567 | 591 | 2008 |
Plants | Multivariate | Clustering | Categorical | 22632 | 70 | 2008 |
Libras Movement | Multivariate, Sequential | Classification, Clustering | Real | 360 | 91 | 2009 |
Concrete Slump Test | Multivariate | Regression | Real | 103 | 10 | 2009 |
Communities and Crime | Multivariate | Regression | Real | 1994 | 128 | 2009 |
Acute Inflammations | Multivariate | Classification | Categorical, Integer | 120 | 6 | 2009 |
Wine Quality | Multivariate | Classification, Regression | Real | 4898 | 12 | 2009 |
URL Reputation | Multivariate, Time-Series | Classification | Integer, Real | 2396130 | 3231961 | 2009 |
p53 Mutants | Multivariate | Classification | Real | 16772 | 5409 | 2010 |
Parkinsons Telemonitoring | Multivariate | Regression | Integer, Real | 5875 | 26 | 2009 |
Demospongiae | Multivariate | Classification | Integer | 503 | NA | 2010 |
Opinosis Opinion / Review | Text | NA | NA | 51 | NA | 2010 |
Breast Tissue | Multivariate | Classification | Real | 106 | 10 | 2010 |
Cardiotocography | Multivariate | Classification | Real | 2126 | 23 | 2010 |
Wall-Following Robot Navigation Data | Multivariate, Sequential | Classification | Real | 5456 | 24 | 2010 |
Spoken Arabic Digit | Multivariate, Time-Series | Classification | Real | 8800 | 13 | 2010 |
Localization Data for Person Activity | Univariate, Sequential, Time-Series | Classification | Real | 164860 | 8 | 2010 |
AutoUniv | Multivariate | Classification | Categorical, Integer, Real | NA | NA | 2010 |
Steel Plates Faults | Multivariate | Classification | Integer, Real | 1941 | 27 | 2010 |
MiniBooNE particle identification | Multivariate | Classification | Real | 130065 | 50 | 2010 |
YearPredictionMSD | Multivariate | Regression | Real | 515345 | 90 | 2011 |
PEMS-SF | Multivariate, Time-Series | Classification | Real | 440 | 138672 | 2011 |
OpinRank Review Dataset | Text | NA | NA | NA | NA | 2011 |
Relative location of CT slices on axial axis | Domain-Theory | Regression | Real | 53500 | 386 | 2011 |
Online Handwritten Assamese Characters Dataset | Multivariate, Sequential | Classification | Integer | 8235 | NA | 2011 |
PubChem Bioassay Data | Multivariate | Classification | Integer, Real | NA | NA | 2011 |
Record Linkage Comparison Patterns | Multivariate | Classification | Real | 5749132 | 12 | 2011 |
Communities and Crime Unnormalized | Multivariate | Regression | Real | 2215 | 147 | 2011 |
Vertebral Column | Multivariate | Classification | Real | 310 | 6 | 2011 |
EMG Physical Action Data Set | Time-Series | Classification | Real | 10000 | 8 | 2011 |
Vicon Physical Action Data Set | Time-Series | Classification | Real | 3000 | 27 | 2011 |
Amazon Commerce reviews set | Multivariate, Text, Domain-Theory | Classification | Real | 1500 | 10000 | 2011 |
Amazon Access Samples | Time-Series, Domain-Theory | Regression, Clustering, Causal-Discovery | NA | 30000 | 20000 | 2011 |
Reuter_50_50 | Multivariate, Text, Domain-Theory | Classification, Clustering | Real | 2500 | 10000 | 2011 |
Farm Ads | Text | Classification | NA | 4143 | 54877 | 2011 |
DBWorld e-mails | Text | Classification | NA | 64 | 4702 | 2011 |
KEGG Metabolic Relation Network (Directed) | Multivariate, Univariate, Text | Classification, Regression, Clustering | Integer, Real | 53414 | 24 | 2011 |
KEGG Metabolic Reaction Network (Undirected) | Multivariate, Univariate, Text | Classification, Regression, Clustering | Integer, Real | 65554 | 29 | 2011 |
Bank Marketing | Multivariate | Classification | Real | 45211 | 17 | 2012 |
YouTube Comedy Slam Preference Data | Text | Classification | NA | 1138562 | 3 | 2012 |
Gas Sensor Array Drift Dataset | Multivariate | Classification | Real | 13910 | 128 | 2012 |
ILPD (Indian Liver Patient Dataset) | Multivariate | Classification | Integer, Real | 583 | 10 | 2012 |
OPPORTUNITY Activity Recognition | Multivariate, Time-Series | Classification | Real | 2551 | 242 | 2012 |
Nomao | Univariate | Classification | Real | 34465 | 120 | 2012 |
SMS Spam Collection | Multivariate, Text, Domain-Theory | Classification, Clustering | Real | 5574 | NA | 2012 |
Skin Segmentation | Univariate | Classification | Real | 245057 | 4 | 2012 |
Planning Relax | Univariate | Classification | Real | 182 | 13 | 2012 |
PAMAP2 Physical Activity Monitoring | Multivariate, Time-Series | Classification | Real | 3850505 | 52 | 2012 |
Restaurant & consumer data | Multivariate | NA | NA | 138 | 47 | 2012 |
CNAE-9 | Multivariate, Text | Classification | Integer | 1080 | 857 | 2012 |
Individual household electric power consumption | Multivariate, Time-Series | Regression, Clustering | Real | 2075259 | 9 | 2012 |
seeds | Multivariate | Classification, Clustering | Real | 210 | 7 | 2012 |
Northix | Multivariate, Univariate, Text | Classification | Integer, Real | 115 | 200 | 2012 |
QtyT40I10D100K | Sequential | NA | Integer | 3960456 | 4 | 2012 |
Legal Case Reports | Text | Classification | NA | NA | NA | 2012 |
Human Activity Recognition Using Smartphones | Multivariate, Time-Series | Classification, Clustering | NA | 10299 | 561 | 2012 |
One-hundred plant species leaves data set | NA | Classification | Real | 1600 | 64 | 2012 |
Energy efficiency | Multivariate | Classification, Regression | Integer, Real | 768 | 8 | 2012 |
Yacht Hydrodynamics | Multivariate | Regression | Real | 308 | 7 | 2013 |
Fertility | Multivariate | Classification, Regression | Real | 100 | 10 | 2013 |
Daphnet Freezing of Gait | Multivariate, Time-Series | Classification | Real | 237 | 9 | 2013 |
3D Road Network (North Jutland, Denmark) | Sequential, Text | Regression, Clustering | Real | 434874 | 4 | 2013 |
ISTANBUL STOCK EXCHANGE | Multivariate, Univariate, Time-Series | Classification, Regression | Real | 536 | 8 | 2013 |
Buzz in social media | Time-Series, Multivariate | Regression, Classification | Integer, Real | 140000 | 77 | 2013 |
First-order theorem proving | Multivariate | Classification | Real | 6118 | 51 | 2013 |
Wearable Computing: Classification of Body Postures and Movements (PUC-Rio) | Sequential | Classification | Integer, Real | 165632 | 18 | 2013 |
Gas sensor arrays in open sampling settings | Multivariate, Time-Series | Classification | Real | 18000 | 1950000 | 2013 |
Climate Model Simulation Crashes | Multivariate | Classification | Real | 540 | 18 | 2013 |
MicroMass | Multivariate | Classification | Real | 931 | 1300 | 2013 |
QSAR biodegradation | Multivariate | Classification | Integer, Real | 1055 | 41 | 2013 |
BLOGGER | Multivariate | Classification | NA | 100 | 6 | 2013 |
Daily and Sports Activities | Multivariate, Time-Series | Classification, Clustering | Real | 9120 | 5625 | 2013 |
User Knowledge Modeling | Multivariate | Classification, Clustering | Integer | 403 | 5 | 2013 |
Reuters RCV1 RCV2 Multilingual, Multiview Text Categorization Test collection | Multivariate | Classification | Real | 111740 | NA | 2013 |
NYSK | Multivariate, Sequential, Text | Clustering | NA | 10421 | 7 | 2013 |
Turkiye Student Evaluation | Multivariate | Classification, Clustering | NA | 5820 | 33 | 2013 |
User Knowledge Modeling Data (Students’ Knowledge Levels on DC Electrical Machines) | Multivariate | Classification | Real | 403 | 5 | 2013 |
EEG Eye State | Multivariate, Sequential, Time-Series | Classification | Integer, Real | 14980 | 15 | 2013 |
Physicochemical Properties of Protein Tertiary Structure | Multivariate | Regression | Real | 45730 | 9 | 2013 |
seismic-bumps | Multivariate | Classification | Real | 2584 | 19 | 2013 |
banknote authentication | Multivariate | Classification | Real | 1372 | 5 | 2013 |
USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder Problem: Pat | Domain-Theory | Classification | Integer | 306 | 5 | 2013 |
YouTube Multiview Video Games Dataset | Multivariate, Text | Classification, Clustering | Integer, Real | 120000 | 1000000 | 2013 |
Gas Sensor Array Drift Dataset at Different Concentrations | Multivariate, Time-Series | Classification, Regression, Clustering, Causa | Real | 13910 | 129 | 2013 |
Activities of Daily Living (ADLs) Recognition Using Binary Sensors | Multivariate, Sequential, Time-Series | Classification, Clustering | NA | 2747 | NA | 2013 |
SkillCraft1 Master Table Dataset | Multivariate | Regression | Integer, Real | 3395 | 20 | 2013 |
Weight Lifting Exercises monitored with Inertial Measurement Units | Multivariate | Classification | Real | 39242 | 152 | 2013 |
SML2010 | Multivariate, Sequential, Time-Series, Text | Regression | Real | 4137 | 24 | 2014 |
Bike Sharing Dataset | Univariate | Regression | Integer, Real | 17389 | 16 | 2013 |
Predict keywords activities in a online social media | Multivariate, Sequential, Time-Series | NA | Integer, Real | 51 | 35 | 2013 |
Thoracic Surgery Data | Multivariate | Classification | Integer, Real | 470 | 17 | 2013 |
EMG dataset in Lower Limb | Multivariate, Time-Series | NA | Real | 132 | 5 | 2014 |
SUSY | NA | Classification | Real | 5000000 | 18 | 2014 |
HIGGS | NA | Classification | Real | 11000000 | 28 | 2014 |
Qualitative_Bankruptcy | Multivariate | Classification | NA | 250 | 7 | 2014 |
LSVT Voice Rehabilitation | Multivariate | Classification | Real | 126 | 309 | 2014 |
Dataset for ADL Recognition with Wrist-worn Accelerometer | Multivariate, Time-Series | Classification, Clustering | NA | NA | 3 | 2014 |
Wilt | Multivariate | Classification | NA | 4889 | 6 | 2014 |
User Identification From Walking Activity | Univariate, Sequential, Time-Series | Classification, Clustering | Real | NA | NA | 2014 |
Activity Recognition from Single Chest-Mounted Accelerometer | Univariate, Sequential, Time-Series | Classification, Clustering | Real | NA | NA | 2014 |
Leaf | Multivariate | Classification | Real | 340 | 16 | 2014 |
Dresses_Attribute_Sales | Text | Classification, Clustering | NA | 501 | 13 | 2014 |
Tamilnadu Electricity Board Hourly Readings | Multivariate | Classification, Regression, Clustering | Real | 45781 | 5 | 2013 |
Airfoil Self-Noise | Multivariate | Regression | Real | 1503 | 6 | 2014 |
Wholesale customers | Multivariate | Classification, Clustering | Integer | 440 | 8 | 2014 |
Twitter Data set for Arabic Sentiment Analysis | Text | Classification | NA | 2000 | 2 | 2014 |
Combined Cycle Power Plant | Multivariate | Regression | Real | 9568 | 4 | 2014 |
Urban Land Cover | Multivariate | Classification | NA | 168 | 148 | 2014 |
Diabetes 130-US hospitals for years 1999-2008 | Multivariate | Classification, Clustering | Integer | 100000 | 55 | 2014 |
Bach Choral Harmony | Sequential | Classification | NA | 5665 | 17 | 2014 |
StoneFlakes | Multivariate | Classification, Clustering, Causal-Discovery | Real | 79 | 8 | 2014 |
Tennis Major Tournament Match Statistics | Multivariate | Classification, Regression, Clustering | Integer, Real | 127 | 42 | 2014 |
Parkinson Speech Dataset with Multiple Types of Sound Recordings | Multivariate | Classification, Regression | Integer, Real | 1040 | 26 | 2014 |
Gesture Phase Segmentation | Multivariate, Sequential, Time-Series | Classification, Clustering | Real | 9900 | 50 | 2014 |
Perfume Data | Univariate, Domain-Theory | Classification, Clustering | Integer | 560 | 2 | 2014 |
BlogFeedback | Multivariate | Regression | Integer, Real | 60021 | 281 | 2014 |
REALDISP Activity Recognition Dataset | Multivariate, Time-Series | Classification | Real | 1419 | 120 | 2014 |
Newspaper and magazine images segmentation dataset | NA | Classification | NA | 101 | NA | 2014 |
AAAI 2014 Accepted Papers | Multivariate | Clustering | NA | 399 | 6 | 2014 |
Gas sensor array under flow modulation | Multivariate, Time-Series | Classification, Regression | Real | 58 | 120432 | 2014 |
Gas sensor array exposed to turbulent gas mixtures | Multivariate, Time-Series | Classification, Regression | Real | 180 | 150000 | 2014 |
UJIIndoorLoc | Multivariate | Classification, Regression | Integer, Real | 21048 | 529 | 2014 |
Sentence Classification | Text | Classification | Integer | NA | NA | 2014 |
Dow Jones Index | Time-Series | Classification, Clustering | Integer, Real | 750 | 16 | 2014 |
sEMG for Basic Hand movements | Time-Series | Classification | Real | 3000 | 2500 | 2014 |
AAAI 2013 Accepted Papers | Multivariate | Clustering | NA | 150 | 5 | 2014 |
Geographical Original of Music | Multivariate | Classification, Regression | Real | 1059 | 68 | 2014 |
Condition Based Maintenance of Naval Propulsion Plants | Multivariate | Regression | Real | 11934 | 16 | 2014 |
Grammatical Facial Expressions | Multivariate, Sequential | Classification, Clustering | Real | 27965 | 100 | 2014 |
NoisyOffice | Multivariate | Classification, Regression | Real | 216 | 216 | 2015 |
MHEALTH Dataset | Multivariate, Time-Series | Classification | Real | 120 | 23 | 2014 |
Student Performance | Multivariate | Classification, Regression | Integer | 649 | 33 | 2014 |
ElectricityLoadDiagrams20112014 | Time-Series | Regression, Clustering | Real | 370 | 140256 | 2015 |
Gas sensor array under dynamic gas mixtures | Multivariate, Time-Series | Classification, Regression | Real | 4178504 | 19 | 2015 |
microblogPCU | Multivariate, Univariate, Sequential, Text | Classification, Causal-Discovery | Integer, Real | 221579 | 20 | 2015 |
Firm-Teacher_Clave-Direction_Classification | Multivariate | Classification | NA | 10800 | 20 | 2015 |
Dataset for Sensorless Drive Diagnosis | Multivariate | Classification | Real | 58509 | 49 | 2015 |
TV News Channel Commercial Detection Dataset | Multivariate | Classification, Clustering | Real | 129685 | 12 | 2015 |
Phishing Websites | NA | Classification | Integer | 2456 | 30 | 2015 |
Greenhouse Gas Observing Network | Multivariate, Time-Series | Regression | Real | 2921 | 5232 | 2015 |
Diabetic Retinopathy Debrecen Data Set | Multivariate | Classification | Integer, Real | 1151 | 20 | 2014 |
HIV-1 protease cleavage | Multivariate | Classification | Categorical | 6590 | 1 | 2015 |
Sentiment Labelled Sentences | Text | Classification | NA | 3000 | NA | 2015 |
Online News Popularity | Multivariate | Classification, Regression | Integer, Real | 39797 | 61 | 2015 |
Forest type mapping | Multivariate | Classification | NA | 326 | 27 | 2015 |
wiki4HE | Multivariate | Regression, Clustering, Causal-Discovery | NA | 913 | 53 | 2015 |
Online Video Characteristics and Transcoding Time Dataset | Multivariate | Regression | Integer, Real | 168286 | 11 | 2015 |
Chronic_Kidney_Disease | Multivariate | Classification | Real | 400 | 25 | 2015 |
Machine Learning based ZZAlpha Ltd. Stock Recommendations 2012-2014 | Sequential, Time-Series | Classification | Real | 314080 | 0 | 2015 |
Folio | Multivariate | Classification, Clustering | NA | 637 | 20 | 2015 |
Taxi Service Trajectory - Prediction Challenge, ECML PKDD 2015 | Multivariate, Sequential, Time-Series, Domain-Theory | Clustering, Causal-Discovery | Real | 1710671 | 9 | 2015 |
Cuff-Less Blood Pressure Estimation | Multivariate | Classification, Regression | Real | 12000 | 3 | 2015 |
Smartphone-Based Recognition of Human Activities and Postural Transitions | Multivariate, Time-Series | Classification | Real | 10929 | 561 | 2015 |
Mice Protein Expression | Multivariate | Classification, Clustering | Real | 1080 | 82 | 2015 |
UJIIndoorLoc-Mag | Multivariate, Sequential, Time-Series | Classification, Regression, Clustering | Integer, Real | 40000 | 13 | 2015 |
Heterogeneity Activity Recognition | Multivariate, Time-Series | Classification, Clustering | Real | 43930257 | 16 | 2015 |
Educational Process Mining (EPM): A Learning Analytics Data Set | Multivariate, Sequential, Time-Series | Classification, Regression, Clustering | Integer | 230318 | 13 | 2015 |
HEPMASS | Multivariate | Classification | Real | 10500000 | 28 | 2016 |
Indoor User Movement Prediction from RSS data | Multivariate, Sequential, Time-Series | Classification | Real | 13197 | 4 | 2016 |
Open University Learning Analytics dataset | Multivariate, Sequential, Time-Series | Classification, Regression, Clustering | Integer | NA | NA | 2015 |
default of credit card clients | Multivariate | Classification | Integer, Real | 30000 | 24 | 2016 |
Mesothelioma’s disease data set | Multivariate | Classification | Real | 324 | 34 | 2016 |
Online Retail | Multivariate, Sequential, Time-Series | Classification, Clustering | Integer, Real | 541909 | 8 | 2015 |
SIFT10M | Multivariate | Causal-Discovery | Integer | 11164866 | 128 | 2016 |
GPS Trajectories | Multivariate | Classification, Regression | Real | 163 | 15 | 2016 |
Detect Malacious Executable(AntiVirus) | Multivariate | Classification | Real | 373 | 513 | 2016 |
Occupancy Detection | Multivariate, Time-Series | Classification | Real | 20560 | 7 | 2016 |
Improved Spiral Test Using Digitized Graphics Tablet for Monitoring Parkinson’s Disease | Multivariate | Classification, Regression, Clustering | Real | 40 | 7 | 2016 |
News Aggregator | Multivariate | Classification, Clustering | NA | 422937 | 5 | 2016 |
Air Quality | Multivariate, Time-Series | Regression | Real | 9358 | 15 | 2016 |
Twin gas sensor arrays | Multivariate, Time-Series, Domain-Theory | Classification, Regression | Real | 640 | 480000 | 2016 |
Gas sensors for home activity monitoring | Multivariate, Time-Series | Classification | Real | 919438 | 11 | 2016 |
Facebook Comment Volume Dataset | Multivariate | Regression | Integer, Real | 40949 | 54 | 2016 |
Smartphone Dataset for Human Activity Recognition (HAR) in Ambient Assisted Living (AAL) | Time-Series | Classification | Real | 5744 | 561 | 2016 |
Polish companies bankruptcy data | Multivariate | Classification | Real | 10503 | 64 | 2016 |
Activity Recognition system based on Multisensor data fusion (AReM) | Multivariate, Sequential, Time-Series | Classification | Real | 42240 | 6 | 2016 |
Dota2 Games Results | Multivariate | Classification | NA | 102944 | 116 | 2016 |
Facebook metrics | Multivariate | Regression | Integer | 500 | 19 | 2016 |
UbiqLog (smartphone lifelogging) | Multivariate | Causal-Discovery | NA | 9782222 | NA | 2016 |
NIPS Conference Papers 1987-2015 | Text | Clustering | Integer | 11463 | 5812 | 2016 |
HTRU2 | Multivariate | Classification, Clustering | Real | 17898 | 9 | 2017 |
Drug consumption (quantified) | Multivariate | Classification | Real | 1885 | 32 | 2016 |
Appliances energy prediction | Multivariate, Time-Series | Regression | Real | 19735 | 29 | 2017 |
Miskolc IIS Hybrid IPS | Text | Classification, Clustering, Causal-Discovery | Integer | 1540 | 67 | 2016 |
KDC-4007 dataset Collection | Multivariate, Text | Classification, Regression | Integer | 4007 | NA | 2017 |
Geo-Magnetic field and WLAN dataset for indoor localisation from wristband and smartphone | Multivariate, Sequential, Time-Series | Classification, Regression, Clustering | Integer, Real | 153540 | 25 | 2017 |
DrivFace | Multivariate | Classification, Regression, Clustering | Real | 606 | 6400 | 2016 |
Website Phishing | Multivariate | Classification | Integer | 1353 | 10 | 2016 |
YouTube Spam Collection | Text | Classification | NA | 1956 | 5 | 2017 |
Beijing PM2.5 Data | Multivariate, Time-Series | Regression | Integer, Real | 43824 | 13 | 2017 |
Cargo 2000 Freight Tracking and Tracing | Multivariate, Sequential | Classification, Regression | Integer | 3942 | 98 | 2016 |
Cervical cancer (Risk Factors) | Multivariate | Classification | Integer, Real | 858 | 36 | 2017 |
Quality Assessment of Digital Colposcopies | Multivariate | Classification | Real | 287 | 69 | 2017 |
KASANDR | Multivariate | Causal-Discovery | Integer | 17764280 | 2158859 | 2017 |
FMA: A Dataset For Music Analysis | Multivariate, Time-Series | Classification, Clustering | Real | 106574 | 518 | 2017 |
Air quality | Multivariate, Time-Series | Regression | Real | 9358 | 15 | 2016 |
Epileptic Seizure Recognition | Multivariate, Time-Series | Classification, Clustering | Integer, Real | 11500 | 179 | 2017 |
Devanagari Handwritten Character Dataset | NA | Classification | Integer | 92000 | NA | 2016 |
Stock portfolio performance | Multivariate | Regression | Real | 315 | 12 | 2016 |
MoCap Hand Postures | Multivariate | Classification, Clustering | Integer, Real | 78095 | 38 | 2016 |
Early biomarkers of Parkinson’s disease based on natural connected speech | Multivariate | Classification, Regression | Integer, Real | 130 | 65 | 2017 |
Data for Software Engineering Teamwork Assessment in Education Setting | Sequential, Time-Series | Classification | Integer, Real | 74 | 102 | 2017 |
PM2.5 Data of Five Chinese Cities | Multivariate, Time-Series | Regression | Integer, Real | 52854 | 86 | 2017 |
Parkinson Disease Spiral Drawings Using Digitized Graphics Tablet | Multivariate | Classification, Regression, Clustering | Integer | 77 | 7 | 2017 |
Sales_Transactions_Dataset_Weekly | Multivariate, Time-Series | Clustering | Integer, Real | 811 | 53 | 2017 |
Las Vegas Strip | NA | Classification, Regression | Integer | 504 | 20 | 2017 |
Eco-hotel | Text | NA | NA | 401 | 1 | 2017 |
MEU-Mobile KSD | Multivariate | Classification | Integer, Real | 2856 | 71 | 2016 |
Crowdsourced Mapping | Multivariate | Classification | NA | 10546 | 29 | 2016 |
gene expression cancer RNA-Seq | Multivariate | Classification, Clustering | Real | 801 | 20531 | 2016 |
Hybrid Indoor Positioning Dataset from WiFi RSSI, Bluetooth and magnetometer | Multivariate, Sequential, Time-Series | Classification | Real | 1540 | 65 | 2016 |
chestnut – LARVIC | NA | Classification, Clustering | NA | 1451 | 3 | 2017 |
Burst Header Packet (BHP) flooding attack on Optical Burst Switching (OBS) Network | Text | Classification | Integer | 1075 | 22 | 2017 |
Motion Capture Hand Postures | Multivariate | Classification, Clustering | Real | 78095 | 38 | 2017 |
Anuran Calls (MFCCs) | Multivariate | Classification, Clustering | Real | 7195 | 22 | 2017 |
TTC-3600: Benchmark dataset for Turkish text categorization | Text | Classification, Clustering | Integer | 3600 | 4814 | 2017 |
Gastrointestinal Lesions in Regular Colonoscopy | Multivariate | Classification | Real | 76 | 698 | 2016 |
Daily Demand Forecasting Orders | Time-Series | Regression | Integer | 60 | 13 | 2017 |
Paper Reviews | Text | Classification, Regression | Integer | 405 | 10 | 2017 |
extention of Z-Alizadeh sani dataset | NA | Classification | Integer, Real | 303 | 59 | 2017 |
Z-Alizadeh Sani | NA | Classification | Integer, Real | 303 | 56 | 2017 |
Dynamic Features of VirusShare Executables | Multivariate, Time-Series | Classification, Regression | Integer | 107888 | 482 | 2017 |
IDA2016Challenge | Multivariate | Classification | Integer | 76000 | 171 | 2017 |
DSRC Vehicle Communications | Sequential, Text | Clustering | Real | 10000 | 5 | 2017 |
Mturk User-Perceived Clusters over Images | Multivariate, Text | Clustering | Integer | 180 | 500 | 2016 |
Character Font Images | Multivariate | Classification | Integer, Real | 745000 | 411 | 2016 |
DeliciousMIL: A Data Set for Multi-Label Multi-Instance Learning with Instance Labels | Text | Classification | Integer | 12234 | 8519 | 2016 |
Autistic Spectrum Disorder Screening Data for Children | Multivariate | Classification | Integer | 292 | 21 | 2017 |
Autistic Spectrum Disorder Screening Data for Adolescent | Multivariate | Classification | Integer | 104 | 21 | 2017 |
APS Failure at Scania Trucks | Multivariate | Classification | Integer, Real | 60000 | 171 | 2017 |
Wireless Indoor Localization | Multivariate | Classification | Real | 2000 | 7 | 2017 |
HCC Survival | Multivariate | Classification | Integer, Real | 165 | 49 | 2017 |
CSM (Conventional and Social Media Movies) Dataset 2014 and 2015 | Multivariate | Classification, Regression | Integer | 217 | 12 | 2017 |
University of Tehran Question Dataset 2016 (UTQD.2016) | Text | Classification | NA | 1175 | 3 | 2017 |
Autism Screening Adult | NA | Classification | Integer | 704 | 21 | 2017 |
Activity recognition with healthy older people using a batteryless wearable sensor | Sequential | Classification | Real | 75128 | 9 | 2016 |
Immunotherapy Dataset | Univariate | Classification | Integer, Real | 90 | 8 | 2018 |
Cryotherapy Dataset | Univariate | Classification | Integer, Real | 90 | 7 | 2018 |
OCT data & Color Fundus Images of Left & Right Eyes | Multivariate | Classification | Real | 50 | 2 | 2016 |
Discrete Tone Image Dataset | Multivariate | Classification | NA | 71 | 11 | 2018 |
News Popularity in Multiple Social Media Platforms | Multivariate, Time-Series, Text | Regression | Integer, Real | 93239 | 11 | 2018 |
Ultrasonic flowmeter diagnostics | Multivariate | Classification | Real | 540 | 173 | 2018 |
ICMLA 2014 Accepted Papers Data Set | Multivariate | Classification, Clustering | NA | 105 | 5 | 2018 |
BLE RSSI Dataset for Indoor localization and Navigation | Multivariate, Sequential, Time-Series | Classification, Clustering | Integer | 6611 | 15 | 2018 |
Container Crane Controller Data Set | Univariate, Domain-Theory | Classification, Regression | Real | 15 | 3 | 2018 |
Residential Building Data Set | Multivariate | Regression | Real | 372 | 105 | 2018 |
Health News in Twitter | Text | Clustering | Real | 58000 | 25000 | 2018 |
chipseq | Sequential | Classification | Integer | 4960 | NA | 2018 |
SGEMM GPU kernel performance | Multivariate | Regression | Integer | 241600 | 18 | 2018 |
Repeat Consumption Matrices | Multivariate | Clustering | Real | 130000 | 21000 | 2018 |
detection_of_IoT_botnet_attacks_N_BaIoT | Multivariate, Sequential | Classification, Clustering | Real | 7062606 | 115 | 2018 |
Absenteeism at work | Multivariate, Time-Series | Classification, Clustering | Integer, Real | 740 | 21 | 2018 |
SCADI | Multivariate | Classification, Clustering | NA | 70 | 206 | 2018 |
Condition monitoring of hydraulic systems | Multivariate, Time-Series | Classification, Regression | Real | 2205 | 43680 | 2018 |
Carbon Nanotubes | Univariate | Regression | Real | 10721 | 8 | 2018 |
Optical Interconnection Network | Multivariate | Classification, Regression | Integer, Real | 640 | 10 | 2018 |
Sports articles for objectivity analysis | Multivariate, Text | Classification | Integer | 1000 | 59 | 2018 |
Breast Cancer Coimbra | Multivariate | Classification | Integer | 116 | 10 | 2018 |
GNFUV Unmanned Surface Vehicles Sensor Data | Multivariate, Time-Series | Regression | Real | 1672 | 5 | 2018 |
Dishonest Internet users Dataset | Multivariate | Classification, Clustering | NA | 322 | 5 | 2018 |
Victorian Era Authorship Attribution | Text | Classification | NA | 93600 | 1000 | 2018 |
Simulated Falls and Daily Living Activities Data Set | Time-Series | Classification | Integer | 3060 | 138 | 2018 |
Multimodal Damage Identification for Humanitarian Computing | Multivariate, Text | Classification | Integer | 5879 | NA | 2018 |
EEG Steady-State Visual Evoked Potential Signals | Multivariate, Time-Series | Classification, Regression | Integer | 9200 | 16 | 2018 |
Roman Urdu Data Set | Text | Classification | NA | 20000 | 2 | 2018 |
Avila | Multivariate | Classification | Real | 20867 | 10 | 2018 |
PANDOR | Multivariate | Recommendation | Categorical | NA | NA | 2018 |
Drug Review Dataset (Druglib.com) | Multivariate, Text | Classification, Regression, Clustering | Integer | 4143 | 8 | 2018 |
Drug Review Dataset (Drugs.com) | Multivariate, Text | Classification, Regression, Clustering | Integer | 215063 | 6 | 2018 |
Physical Unclonable Functions | Multivariate | Classification | Integer | 6000000 | 129 | 2018 |
Superconductivty Data | Multivariate | Regression | Real | 21263 | 81 | 2018 |
WESAD (Wearable Stress and Affect Detection) | Multivariate, Time-Series | Classification, Regression | Real | 63000000 | 12 | 2018 |
GNFUV Unmanned Surface Vehicles Sensor Data Set 2 | Multivariate, Sequential, Time-Series | Regression | Real | 10190 | 6 | 2018 |
Student Academics Performance | Multivariate | Classification | NA | 300 | 22 | 2018 |
Online Shoppers Purchasing Intention Dataset | Multivariate | Classification, Clustering | Integer, Real | 12330 | 18 | 2018 |
PMU-UD | Univariate | Classification | NA | 5180 | 9 | 2018 |
Parkinson’s Disease Classification | Multivariate | Classification | Integer, Real | 756 | 754 | 2018 |
Electrical Grid Stability Simulated Data | Multivariate | Classification, Regression | Real | 10000 | 14 | 2018 |
Caesarian Section Classification Dataset | Univariate | Classification | Integer | 80 | 5 | 2018 |
BAUM-1 | Time-Series | Classification | NA | 1184 | NA | 2018 |
BAUM-2 | Time-Series | Classification | NA | 1047 | NA | 2018 |
Audit Data | Multivariate | Classification | Real | 777 | 18 | 2018 |
BuddyMove Data Set | Multivariate, Text | Classification, Clustering | Real | 249 | 7 | 2018 |
Real estate valuation data set | Multivariate | Regression | Integer, Real | 414 | 7 | 2018 |
Early biomarkers of Parkinson’s disease based on natural connected speech Data Set | Multivariate | Classification | Real | NA | NA | 2018 |
Somerville Happiness Survey | NA | Classification | Integer | 143 | 7 | 2018 |
2.4 GHZ Indoor Channel Measurements | Multivariate | Classification | Real | 7840 | 5 | 2018 |
EMG data for gestures | Time-Series | Classification | Real | 30000 | 6 | 2019 |
Parking Birmingham | Multivariate, Univariate, Sequential, Time-Series | Classification, Regression, Clustering | Real | 35717 | 4 | 2019 |
Behavior of the urban traffic of the city of Sao Paulo in Brazil | Multivariate, Time-Series | Classification, Regression | Integer, Real | 135 | 18 | 2018 |
Travel Reviews | Multivariate, Text | Classification, Clustering | Real | 980 | 11 | 2018 |
Tarvel Review Ratings | Multivariate, Text | Classification, Clustering | Real | 5456 | 25 | 2018 |
Rice Leaf Diseases | Multivariate | Classification | Integer | 120 | NA | 2019 |
Gas sensor array temperature modulation | Multivariate, Time-Series | Classification, Regression | Real | 4095000 | 20 | 2019 |
Facebook Live Sellers in Thailand | Multivariate | Clustering | Integer | 7051 | 12 | 2019 |
Parkinson Dataset with replicated acoustic features | Multivariate | Classification | NA | 240 | 46 | 2019 |
Metro Interstate Traffic Volume | Multivariate, Sequential, Time-Series | Regression | Integer, Real | 48204 | 9 | 2019 |
Query Analytics Workloads Dataset | Multivariate | Regression, Clustering | Real | 260000 | 8 | 2019 |
Wave Energy Converters | Multivariate | Regression | Real | 288000 | 49 | 2019 |
PPG-DaLiA | Multivariate, Time-Series | Regression | Real | 8300000 | 11 | 2019 |
Alcohol QCM Sensor Dataset | Multivariate | Classification, Regression, Clustering | Real | 125 | 8 | 2019 |
Divorce Predictors data set | Multivariate, Univariate | Classification | Integer | 170 | 54 | 2019 |
Incident management process enriched event log | Multivariate, Sequential | Regression, Clustering | Integer | 141712 | 36 | 2019 |
Opinion Corpus for Lebanese Arabic Reviews (OCLAR) | Text | Classification | Integer | 3916 | 3916 | 2019 |
MEx | Time-Series | Classification, Clustering | Real | 6262 | 710 | 2019 |
Beijing Multi-Site Air-Quality Data | Multivariate, Time-Series | Regression | Integer, Real | 420768 | 18 | 2019 |
Online Retail II | Multivariate, Sequential, Time-Series, Text | Classification, Regression, Clustering | Integer, Real | 1067371 | 8 | 2019 |
Hepatitis C Virus (HCV) for Egyptian patients | Multivariate | Classification | Integer, Real | 1385 | 29 | 2019 |
QSAR fish toxicity | Multivariate | Regression | Real | 908 | 7 | 2019 |
QSAR aquatic toxicity | Multivariate | Regression | Real | 546 | 9 | 2019 |
Human Activity Recognition from Continuous Ambient Sensor Data | Multivariate, Sequential, Time-Series | Classification | Integer, Real | 13956534 | 37 | 2019 |
WISDM Smartphone and Smartwatch Activity and Biometrics Dataset | Multivariate, Time-Series | Classification | Real | 15630426 | 6 | 2019 |
QSAR oral toxicity | Multivariate | Classification | NA | 8992 | 1024 | 2019 |
QSAR androgen receptor | Multivariate | Classification | NA | 1687 | 1024 | 2019 |
QSAR Bioconcentration classes dataset | Multivariate | Classification, Regression | NA | 779 | 14 | 2019 |
QSAR fish bioconcentration factor (BCF) | Multivariate | Regression | Integer, Real | 1056 | 7 | 2019 |
A study of Asian Religious and Biblical Texts | Multivariate, Text | Classification, Clustering | Integer | 590 | 8265 | 2019 |
Real-time Election Results: Portugal 2019 | Multivariate, Time-Series, Text | Regression | Integer, Real | 21643 | 29 | 2019 |
Bias correction of numerical prediction model temperature forecast | Multivariate | Regression | Real | 7750 | 25 | 2020 |
Bar Crawl: Detecting Heavy Drinking | Multivariate, Time-Series | Classification, Regression | Real | 14057567 | 3 | 2020 |
Kitsune Network Attack Dataset | Multivariate, Sequential, Time-Series | Classification, Clustering, Causal-Discovery | Real | 27170754 | 115 | 2019 |
Shoulder Implant X-Ray Manufacturer Classification | Multivariate | Classification | Real | 597 | 1 | 2020 |
Speaker Accent Recognition | Multivariate | Classification | Real | 329 | 12 | 2020 |
Heart failure clinical records | Multivariate | Classification, Regression, Clustering | Integer, Real | 299 | 13 | 2020 |
Deepfakes: Medical Image Tamper Detection | Multivariate | Classification | Real | 20000 | 200000 | 2020 |
selfBACK | Time-Series | Classification, Clustering | Real | 26136 | 6 | 2020 |
South German Credit | Multivariate | Classification, Regression, Clustering | Integer, Real | 1000 | 21 | 2019 |
Exasens | Multivariate | Classification, Clustering | Integer | 399 | 4 | 2020 |
Swarm Behaviour | Multivariate | Classification | Real | 24017 | 2400 | 2020 |
Crop mapping using fused optical-radar data set | Multivariate, Time-Series | Classification | Real | 325834 | 175 | 2020 |
BitcoinHeistRansomwareAddressDataset | Multivariate, Time-Series | Classification, Clustering | Integer, Real | 2916697 | 10 | 2020 |
Facebook Large Page-Page Network | Multivariate | Classification | NA | 22470 | 4714 | 2020 |
Amphibians | Multivariate | Classification | Integer, Real | 189 | 23 | 2020 |
Early stage diabetes risk prediction dataset. | Multivariate | Classification | NA | 520 | 17 | 2020 |
Turkish Spam V01 | Text | Classification | NA | 826 | 2 | 2019 |
Stock keeping units | Multivariate | Clustering | Integer, Real | 2279 | 9 | 2019 |
Demand Forecasting for a store | Multivariate | Regression | Integer | 28764 | 8 | 2019 |
Detect Malware Types | Multivariate, Time-Series, Text | Classification | NA | 7107 | 280 | 2019 |
Youtube cookery channels viewers comments in Hinglish | Multivariate, Text | Classification | NA | 9800 | 3 | 2019 |
Pedestrian in Traffic Dataset | Multivariate, Sequential, Time-Series | Classification, Regression, Causal-Discovery | Real | 4760 | 14 | 2019 |
Cervical Cancer Behavior Risk | Multivariate, Univariate | Classification, Clustering | Integer | 72 | 19 | 2019 |
Sattriya_Dance_Single_Hand_Gestures Dataset | Multivariate | Classification | NA | 1450 | NA | 2019 |
3W dataset | Multivariate, Time-Series | Classification, Clustering | Integer, Real | 1984 | 8 | 2019 |
Malware static and dynamic features VxHeaven and Virus Total | Multivariate | Classification | Integer, Real | 2955 | 1087 | 2019 |
Internet Firewall Data | Multivariate | Classification | NA | 65532 | 12 | 2019 |
User Profiling and Abusive Language Detection Dataset | NA | Classification | NA | 65919 | 3 | 2019 |
Estimation of obesity levels based on eating habits and physical condition | Multivariate | Classification, Regression, Clustering | Integer | 2111 | 17 | 2019 |
Rice (Cammeo and Osmancik) | Multivariate | Classification | Real | 3810 | 8 | 2019 |
Vehicle routing and scheduling problems | Multivariate | Clustering | Integer, Real | 18 | 9 | 2019 |
Algerian Forest Fires Dataset | Multivariate | Classification, Regression | Real | 244 | 12 | 2019 |
Breath Metabolomics | Multivariate, Time-Series | Classification, Clustering | Real | 104 | 1656 | 2019 |
Horton General Hospital | Multivariate, Time-Series | Causal-Discovery | Integer | 139 | 6 | 2019 |
UrbanGB, urban road accidents coordinates labelled by the urban center | Univariate | Clustering | Real | 360177 | 2 | 2019 |
Gas Turbine CO and NOx Emission Data Set | Multivariate | Regression, Clustering | Real | 36733 | 11 | 2019 |
Activity recognition using wearable physiological measurements | Multivariate | Classification | Real | 4480 | 533 | 2019 |
clickstream data for online shopping | Multivariate, Sequential | Classification, Regression, Clustering | Integer, Real | 165474 | 14 | 2019 |
CNNpred: CNN-based stock market prediction using a diverse set of variables | Sequential, Time-Series | Classification, Regression | Real | 1985 | 84 | 2019 |
Apartment for rent classified | Multivariate | Classification, Regression, Clustering | NA | 10000 | 22 | 2019 |
Simulated Data set of Iraqi tourism places | Multivariate | Classification, Clustering | NA | 232 | 16 | 2020 |
Nasarian CAD Dataset | Multivariate | Classification | NA | 150 | 52 | 2020 |
Monolithic Columns in Troad and Mysia Region | Multivariate | Classification | Real | 11 | 19 | 2020 |
Seoul Bike Sharing Demand | Multivariate | Regression | Integer, Real | 8760 | 14 | 2020 |
Person Classification Gait Data | Multivariate | Classification | Real | 48 | 321 | 2020 |
Shill Bidding Dataset | Multivariate | Classification, Clustering | NA | 6321 | 13 | 2020 |
Iranian Churn Dataset | Multivariate | Classification, Regression | Integer | 3150 | 13 | 2020 |
Unmanned Aerial Vehicle (UAV) Intrusion Detection | Multivariate | Classification | Real | 17256 | 55 | 2020 |
Bone marrow transplant: children | Multivariate | Classification, Regression | Integer, Real | 187 | 39 | 2020 |
COVID-19 Surveillance | Multivariate | Classification | NA | 14 | 7 | 2020 |
Refractive errors | Multivariate | Classification | Integer | 467 | 79 | 2020 |
Shoulder Implant X-Ray Manufacturer Classification | Multivariate | Classification | Real | 597 | 1 | 2020 |
CLINC150 | Text | Classification | NA | 23700 | NA | 2020 |
HCV data | Multivariate | Classification, Clustering | Integer, Real | 615 | 14 | 2020 |
Taiwanese Bankruptcy Prediction | Multivariate | Classification | Integer | 6819 | 96 | 2020 |
South German Credit (UPDATE) | Multivariate | Classification, Regression, Clustering | Integer, Real | 1000 | 21 | 2020 |
IIWA14-R820-Gazebo-Dataset-10Trajectories | NA | Regression | Integer | NA | NA | 2020 |
Guitar Chords finger positions | Text | Classification | NA | 2633 | 5 | 2020 |
Russian Corpus of Biographical Texts | Text | Classification | NA | 200 | 2 | 2020 |
Codon usage | Multivariate | Classification, Clustering | NA | 13028 | 69 | 2020 |
Intelligent Media Accelerometer and Gyroscope (IM-AccGyro) Dataset | Time-Series | Classification | Real | 800 | 9 | 2020 |
Myocardial infarction complications | Multivariate | Classification | Real | 1700 | 124 | 2020 |
Hungarian Chickenpox Cases | Time-Series | Regression | Real | 521 | 20 | 2021 |
Simulated data for survival modelling | Multivariate, Time-Series | Regression | Integer, Real | 120000 | 25 | 2018 |
Student Performance on an entrance examination | Multivariate | Classification | NA | 666 | 11 | 2018 |
Chemical Composition of Ceramic Samples | Multivariate | Classification, Clustering | Real | 88 | 19 | 2019 |
Labeled Text Forum Threads Dataset | Text | Classification | Integer | 200 | 9 | 2019 |
Stock keeping units | Multivariate | Clustering | Integer, Real | 2279 | 9 | 2019 |
BLE RSSI dataset for Indoor localization | Sequential, Time-Series | Classification | Integer | 23570 | 5 | 2019 |
Basketball dataset | Time-Series | Classification | Integer | 10000 | 7 | 2019 |
GitHub MUSAE | Multivariate | Classification | NA | 37700 | 4006 | 2019 |
Anticancer peptides | Sequential | Classification | NA | 1850 | 2 | 2019 |
Monolithic Columns in Troad and Mysia Region | Multivariate | Classification | Real | 11 | 19 | 2020 |
Gender by Name | Text | Classification, Clustering | NA | 147270 | 4 | 2020 |
Iranian Churn Dataset | Multivariate | Classification, Regression | Integer | 3150 | 13 | 2020 |
Unmanned Aerial Vehicle (UAV) Intrusion Detection | Multivariate | Classification | Real | 17256 | 55 | 2020 |
Shoulder Implant Manufacture Classification | Multivariate | Classification | NA | 597 | 1 | 2020 |
LastFM Asia Social Network | Multivariate | Classification | NA | 7624 | 7842 | 2020 |
Wheat kernels | Multivariate | Classification | Real | 314 | 15 | 2020 |
Productivity Prediction of Garment Employees | Multivariate, Time-Series | Classification, Regression | Integer, Real | 1197 | 15 | 2020 |
Multi-view Brain Networks | Multivariate | Classification, Clustering | Integer | 70 | 70 | 2020 |
Wisesight Sentiment Corpus | Multivariate, Text | Classification | NA | 26737 | 4 | 2020 |
AI4I 2020 Predictive Maintenance Dataset | Multivariate, Time-Series | Classification, Regression, Causal-Discovery | Real | 10000 | 14 | 2020 |
Dry Bean Dataset | Multivariate | Classification | Integer, Real | 13611 | 17 | 2020 |
in-vehicle coupon recommendation | Multivariate | Classification | NA | 12684 | 23 | 2020 |
Gait Classification | Multivariate | Classification | Real | 48 | 321 | 2020 |
Wikipedia Math Essentials | Time-Series | Regression | Real | 731 | 1068 | 2021 |
Synchronous Machine Data Set | Multivariate | Regression | Real | 557 | 5 | 2021 |
# Source: https://archive.ics.uci.edu/ml/datasets/Las+Vegas+Strip
# read.csv() accepts a URL, so the CSV file can be loaded directly from the website.
# The file is semicolon-delimited, hence sep=';'.
LV_data <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00397/LasVegasTripAdvisorReviews-Dataset.csv", sep=';')
knitr::kable(head(LV_data,100))%>%
kableExtra::kable_styling("striped")%>%
kableExtra::scroll_box(width = "100%",height="300px")
User.country | Nr..reviews | Nr..hotel.reviews | Helpful.votes | Score | Period.of.stay | Traveler.type | Pool | Gym | Tennis.court | Spa | Casino | Free.internet | Hotel.name | Hotel.stars | Nr..rooms | User.continent | Member.years | Review.month | Review.weekday |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
USA | 11 | 4 | 13 | 5 | Dec-Feb | Friends | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 9 | January | Thursday |
USA | 119 | 21 | 75 | 3 | Dec-Feb | Business | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 3 | January | Friday |
USA | 36 | 9 | 25 | 5 | Mar-May | Families | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 2 | February | Saturday |
UK | 14 | 7 | 14 | 4 | Mar-May | Friends | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | Europe | 6 | February | Friday |
Canada | 5 | 5 | 2 | 4 | Mar-May | Solo | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 7 | March | Tuesday |
Canada | 31 | 8 | 27 | 3 | Mar-May | Couples | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 2 | March | Tuesday |
UK | 45 | 12 | 46 | 4 | Mar-May | Couples | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | Europe | 4 | April | Friday |
USA | 2 | 1 | 4 | 4 | Mar-May | Families | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 0 | April | Tuesday |
India | 24 | 3 | 8 | 4 | Mar-May | Friends | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | Asia | 3 | May | Saturday |
Canada | 12 | 7 | 11 | 3 | Mar-May | Families | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 5 | May | Tuesday |
USA | 102 | 24 | 58 | 2 | Jun-Aug | Families | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 9 | June | Friday |
Australia | 20 | 9 | 24 | 3 | Jun-Aug | Friends | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | Oceania | 4 | June | Saturday |
USA | 7 | 6 | 9 | 2 | Jun-Aug | Friends | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 1 | July | Wednesday |
USA | 22 | 5 | 13 | 3 | Jun-Aug | Friends | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 1 | July | Thursday |
UK | 3 | 3 | 0 | 3 | Jun-Aug | Friends | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | Europe | 1 | August | Sunday |
New Zeland | 146 | 17 | 33 | 4 | Jun-Aug | Families | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | Oceania | 2 | August | Saturday |
Canada | 8 | 8 | 9 | 1 | Sep-Nov | Families | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 4 | September | Wednesday |
USA | 9 | 3 | 1 | 4 | Sep-Nov | Families | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 2 | September | Saturday |
Canada | 41 | 9 | 19 | 3 | Sep-Nov | Couples | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 2 | October | Tuesday |
USA | 8 | 7 | 26 | 2 | Sep-Nov | Couples | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 10 | October | Monday |
UK | 10 | 5 | 2 | 4 | Sep-Nov | Couples | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | Europe | 7 | November | Saturday |
New Zeland | 4 | 3 | 3 | 1 | Sep-Nov | Couples | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | Oceania | 3 | November | Monday |
UK | 18 | 7 | 19 | 4 | Dec-Feb | Families | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | Europe | 0 | December | Saturday |
USA | 4 | 4 | 3 | 2 | Dec-Feb | Couples | NO | YES | NO | NO | YES | YES | Circus Circus Hotel & Casino Las Vegas | 3 | 3773 | North America | 5 | December | Sunday |
Ireland | 29 | 11 | 15 | 4 | Dec-Feb | Couples | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Europe | 3 | January | Monday |
USA | 114 | 42 | 52 | 4 | Dec-Feb | Business | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | North America | 11 | January | Saturday |
Canada | 30 | 12 | 17 | 5 | Dec-Feb | Couples | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | North America | 8 | February | Friday |
UK | 87 | 18 | 36 | 3 | Dec-Feb | Business | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Europe | 3 | February | Thursday |
USA | 26 | 10 | 28 | 5 | Mar-May | Solo | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | North America | 1 | March | Wednesday |
Ireland | 8 | 7 | 9 | 5 | Mar-May | Couples | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Europe | 8 | March | Wednesday |
Canada | 11 | 8 | 13 | 5 | Mar-May | Couples | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | North America | 7 | April | Friday |
Australia | 4 | 3 | 2 | 3 | Mar-May | Couples | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Oceania | 0 | April | Friday |
Canada | 56 | 8 | 7 | 3 | Mar-May | Solo | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | North America | 1 | May | Tuesday |
Egypt | 13 | 12 | 8 | 3 | Mar-May | Friends | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Africa | 4 | May | Wednesday |
Australia | 58 | 9 | 15 | 4 | Jun-Aug | Couples | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Oceania | 0 | June | Saturday |
Finland | 20 | 7 | 4 | 3 | Jun-Aug | Couples | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Europe | 6 | June | Saturday |
USA | 70 | 27 | 24 | 4 | Jun-Aug | Friends | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | North America | 6 | July | Friday |
Kenya | 6 | 3 | 7 | 4 | Jun-Aug | Friends | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Africa | 1 | July | Thursday |
USA | 290 | 263 | 299 | 4 | Jun-Aug | Couples | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | North America | 10 | August | Monday |
USA | 24 | 6 | 9 | 2 | Jun-Aug | Families | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | North America | 2 | August | Wednesday |
Jordan | 29 | 8 | 21 | 3 | Sep-Nov | Business | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Europe | 2 | September | Wednesday |
Canada | 20 | 5 | 59 | 3 | Sep-Nov | Couples | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | North America | 8 | September | Saturday |
Netherlands | 3 | 3 | 3 | 3 | Sep-Nov | Families | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Europe | 4 | October | Saturday |
Ireland | 47 | 6 | 27 | 4 | Sep-Nov | Friends | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Europe | 5 | October | Monday |
USA | 35 | 8 | 19 | 4 | Sep-Nov | Families | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | North America | 6 | November | Sunday |
UK | 6 | 0 | 4 | 4 | Sep-Nov | Friends | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Europe | 1 | November | Sunday |
UK | 74 | 47 | 54 | 4 | Dec-Feb | Couples | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Europe | 7 | December | Wednesday |
Syria | 34 | 8 | 30 | 3 | Dec-Feb | Solo | YES | YES | NO | YES | YES | YES | Excalibur Hotel & Casino | 3 | 3981 | Asia | 4 | December | Tuesday |
UK | 576 | 43 | 340 | 3 | Dec-Feb | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Europe | 3 | January | Monday |
USA | 20 | 8 | 11 | 4 | Dec-Feb | Solo | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | North America | 5 | January | Saturday |
USA | 418 | 32 | 132 | 2 | Dec-Feb | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | North America | 3 | February | Tuesday |
USA | 73 | 13 | 22 | 4 | Dec-Feb | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | North America | 2 | February | Tuesday |
Canada | 30 | 10 | 32 | 3 | Mar-May | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | North America | 2 | March | Wednesday |
USA | 63 | 15 | 17 | 4 | Mar-May | Business | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | North America | 2 | March | Tuesday |
Scotland | 24 | 10 | 13 | 5 | Mar-May | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Europe | 5 | April | Wednesday |
South Africa | 54 | 18 | 16 | 2 | Mar-May | Business | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Africa | 5 | April | Tuesday |
Australia | 20 | 7 | 11 | 4 | Mar-May | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Oceania | 2 | May | Friday |
UK | 41 | 7 | 24 | 4 | Mar-May | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Europe | 3 | May | Monday |
Ireland | 7 | 5 | 7 | 4 | Jun-Aug | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Europe | 9 | June | Tuesday |
Canada | 13 | 9 | 15 | 4 | Jun-Aug | Friends | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | North America | 1 | June | Monday |
UK | 10 | 5 | 4 | 4 | Jun-Aug | Families | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Europe | 1 | July | Friday |
New Zeland | 9 | 6 | 19 | 3 | Jun-Aug | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Oceania | 7 | July | Wednesday |
Swiss | 36 | 19 | 36 | 2 | Jun-Aug | Solo | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Europe | 5 | August | Thursday |
UK | 33 | 12 | 19 | 3 | Jun-Aug | Families | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Europe | 5 | August | Saturday |
United Arab Emirates | 156 | 126 | 142 | 3 | Sep-Nov | Friends | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Asia | 8 | September | Friday |
Ireland | 19 | 17 | 16 | 4 | Sep-Nov | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Europe | 5 | September | Wednesday |
USA | 23 | 17 | 11 | 3 | Sep-Nov | Families | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | North America | 1 | October | Friday |
USA | 13 | 3 | 3 | 2 | Sep-Nov | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | North America | 1 | October | Sunday |
Hungary | 8 | 5 | 8 | 4 | Sep-Nov | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Europe | 8 | November | Friday |
China | 1 | 0 | 2 | 1 | Sep-Nov | Business | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Asia | 0 | November | Wednesday |
Greece | 21 | 18 | 6 | 2 | Dec-Feb | Business | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | Europe | 0 | December | Sunday |
Mexico | 56 | 14 | 36 | 5 | Dec-Feb | Couples | YES | YES | NO | YES | YES | NO | Monte Carlo Resort&Casino | 4 | 3003 | North America | 3 | December | Monday |
Croatia | 29 | 11 | 14 | 3 | Dec-Feb | Business | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Europe | 6 | January | Sunday |
Australia | 11 | 5 | 8 | 4 | Dec-Feb | Couples | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Oceania | 1 | January | Thursday |
Canada | 19 | 12 | 167 | 4 | Dec-Feb | Friends | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | North America | 9 | February | Monday |
USA | 17 | 9 | 16 | 5 | Dec-Feb | Solo | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | North America | -1806 | February | Monday |
USA | 43 | 8 | 20 | 4 | Mar-May | Couples | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | North America | 1 | March | Friday |
Canada | 12 | 8 | 3 | 4 | Mar-May | Friends | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | North America | 2 | March | Saturday |
USA | 15 | 14 | 7 | 4 | Mar-May | Friends | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | North America | 9 | April | Tuesday |
Australia | 16 | 13 | 16 | 3 | Mar-May | Families | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Oceania | 6 | April | Wednesday |
India | 12 | 4 | 25 | 3 | Mar-May | Families | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Asia | 1 | May | Friday |
Germany | 10 | 0 | 5 | 4 | Mar-May | Friends | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Europe | 3 | May | Sunday |
USA | 27 | 17 | 16 | 3 | Jun-Aug | Families | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | North America | 6 | June | Thursday |
Canada | 6 | 5 | 5 | 4 | Jun-Aug | Couples | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | North America | 2 | June | Tuesday |
Australia | 21 | 20 | 14 | 5 | Jun-Aug | Couples | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Oceania | 3 | July | Wednesday |
Malaysia | 43 | 14 | 27 | 4 | Jun-Aug | Solo | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Asia | 5 | July | Thursday |
Mexico | 97 | 31 | 37 | 4 | Jun-Aug | Couples | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | South America | 8 | August | Saturday |
UK | 7 | 3 | 4 | 4 | Jun-Aug | Couples | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Europe | 5 | August | Friday |
UK | 11 | 9 | 6 | 3 | Sep-Nov | Couples | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Europe | 0 | September | Friday |
USA | 78 | 11 | 30 | 4 | Sep-Nov | Business | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | North America | 2 | September | Tuesday |
Australia | 12 | 7 | 4 | 5 | Sep-Nov | Families | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Oceania | 5 | October | Monday |
USA | 27 | 11 | 5 | 3 | Sep-Nov | Couples | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | North America | 2 | October | Monday |
Thailand | 4 | 3 | 1 | 5 | Sep-Nov | Couples | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Asia | 7 | November | Friday |
Australia | 27 | 9 | 8 | 4 | Sep-Nov | Families | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | Oceania | 2 | November | Friday |
Canada | 12 | 3 | 7 | 4 | Dec-Feb | Friends | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | North America | 4 | December | Saturday |
Canada | 21 | 16 | 48 | 5 | Dec-Feb | Couples | YES | YES | YES | YES | YES | YES | Treasure Island- TI Hotel & Casino | 4 | 2884 | North America | 12 | December | Thursday |
UK | 34 | 17 | 30 | 4 | Dec-Feb | Families | YES | YES | YES | YES | YES | YES | Tropicana Las Vegas - A Double Tree by Hilton Hotel | 4 | 1467 | Europe | 8 | January | Sunday |
USA | 12 | 6 | 1 | 5 | Dec-Feb | Friends | YES | YES | YES | YES | YES | YES | Tropicana Las Vegas - A Double Tree by Hilton Hotel | 4 | 1467 | North America | 2 | January | Sunday |
Phillippines | 79 | 39 | 51 | 3 | Dec-Feb | Couples | YES | YES | YES | YES | YES | YES | Tropicana Las Vegas - A Double Tree by Hilton Hotel | 4 | 1467 | Asia | 5 | February | Wednesday |
Israel | 18 | 10 | 16 | 3 | Dec-Feb | Business | YES | YES | YES | YES | YES | YES | Tropicana Las Vegas - A Double Tree by Hilton Hotel | 4 | 1467 | Asia | 2 | February | Thursday |
Hotel.stars with neural networks.
library(tidyverse)
set.seed(1)
# Split the data 60/40 into training and test sets
library(caret)
trainIndex <- createDataPartition(iris$Sepal.Length, p = .6, list = FALSE, times = 1)
# Subset the rows for the training set and the test set
IrisTrain <- iris[ trainIndex,]
IrisTest <- iris[-trainIndex,]
# mygrid<-expand.grid(size = seq(1, 9, by = 2),
# decay = c(0.0001,0.001,0.01, 0.1))
# Fit a single-hidden-layer neural network (nnet) through caret
IrisNNET<- train(
form = Sepal.Length~Sepal.Width+Petal.Length+Petal.Width+factor(Species),
data = IrisTrain,
# 10-fold cross-validation; classProbs is not needed because this is a regression task
trControl = trainControl(method = "cv", number = 10),
method = "nnet",
preProcess = c("center", "scale"),
tuneLength = 5,
trace=FALSE,
linout=TRUE) # linear output unit, needed for numeric (regression) predictions
#IrisNNET
knitr::kable(IrisNNET$bestTune)
| | size | decay |
---|---|---|
3 | 1 | 0.001 |
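The `size` and `decay` columns in `bestTune` are the two tuning parameters that caret exposes for `method = "nnet"`: the number of units in the hidden layer and the weight-decay penalty. As a minimal sketch, the commented-out `mygrid` from the training code can be built with base R to see how many candidate models such a grid would define. Passing it to `train()` via `tuneGrid = mygrid` in place of `tuneLength` is an assumption about how it was meant to be used, not something the original code does.

```r
# Candidate values taken from the commented-out grid in the training code
mygrid <- expand.grid(size = seq(1, 9, by = 2),
                      decay = c(0.0001, 0.001, 0.01, 0.1))

# One row per (size, decay) combination that cross-validation would evaluate
n_candidates <- nrow(mygrid)
print(n_candidates)
head(mygrid)
```

Every row of the grid is refit once per fold, so a larger grid multiplies training time accordingly.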
plot(IrisNNET)
IrisNNET_Pred<-predict(IrisNNET,IrisTest)
knitr::kable(IrisNNET_Pred)%>%
kableExtra::kable_styling("striped")%>%
kableExtra::scroll_box(width = "50%",height="300px")
| | x |
---|---|
3 | 4.981076 |
6 | 4.982496 |
7 | 4.981306 |
9 | 4.981010 |
11 | 4.981962 |
13 | 4.981171 |
14 | 4.980761 |
15 | 4.981592 |
16 | 4.982632 |
17 | 4.981464 |
18 | 4.981395 |
23 | 4.980923 |
24 | 4.981577 |
26 | 4.981427 |
29 | 4.981412 |
33 | 4.982724 |
36 | 4.980933 |
37 | 4.981315 |
38 | 4.981737 |
40 | 4.981617 |
43 | 4.981076 |
48 | 4.981237 |
49 | 4.981962 |
53 | 6.546743 |
55 | 6.100833 |
56 | 6.113737 |
59 | 6.256769 |
60 | 5.567130 |
62 | 5.837557 |
64 | 6.299787 |
65 | 5.483928 |
66 | 6.101865 |
67 | 6.097235 |
70 | 5.613337 |
73 | 6.250757 |
74 | 6.366381 |
78 | 6.481844 |
88 | 5.819630 |
92 | 6.248034 |
93 | 5.673617 |
99 | 5.217657 |
103 | 6.933569 |
107 | 5.622503 |
110 | 7.197159 |
112 | 6.274803 |
116 | 6.287565 |
117 | 6.691651 |
118 | 7.871619 |
126 | 7.298106 |
128 | 6.083273 |
131 | 7.154853 |
132 | 7.757664 |
133 | 6.455817 |
135 | 6.834968 |
137 | 6.639575 |
138 | 6.742665 |
142 | 6.049859 |
145 | 6.634311 |
147 | 5.909177 |
IrisNNETtestpred<-cbind(IrisNNET_Pred,IrisTest)
#root mean squared error
RMSE(IrisNNETtestpred$IrisNNET_Pred,IrisNNETtestpred$Sepal.Length)
[1] 0.3699236
# R-squared, computed as the squared correlation between predicted and observed values
cor(IrisNNETtestpred$IrisNNET_Pred,IrisNNETtestpred$Sepal.Length)^2
[1] 0.8220672
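Both test-set measures above are simple enough to compute by hand, which helps show what each one does and does not capture. The sketch below uses three made-up observed and predicted values (illustrative numbers, not results from the iris model). RMSE is the square root of the mean squared error, while the squared correlation only measures how closely the predictions follow the observations along a straight line, so it can equal 1 even when every prediction is off.

```r
# Made-up values for illustration only
observed  <- c(5.0, 6.0, 7.0)
predicted <- c(5.5, 6.5, 7.5)   # every prediction is 0.5 too high

# RMSE: square root of the mean squared difference
rmse <- sqrt(mean((predicted - observed)^2))

# R-squared as the squared Pearson correlation, as in the code above
r2 <- cor(predicted, observed)^2

print(rmse)
print(r2)
```

Here the RMSE is 0.5 while the squared correlation is 1, which is why it is worth reporting the two together instead of relying on the squared correlation alone.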
# Compute partial dependence
pd <- pdp::partial(IrisNNET, pred.var = c("Sepal.Width","Petal.Length"))
# Default PDP
pdp::plotPartial(pd)
# Add contour lines and use a different color palette
rwb <- colorRampPalette(c("darkred", "white", "pink"))
pdp::plotPartial(pd, contour = TRUE, col.regions = rwb)
# 3-D surface
pdp::plotPartial(pd, levelplot = FALSE, zlab = "Sepal Length", colorkey = TRUE,
screen = list(z = -20, x = -60))
# Interpolate the partial dependence values
dens <- akima::interp(x = pd$Sepal.Width, y = pd$Petal.Length, z = pd$yhat)
# 3D partial dependence plot with a coloring scale
p3 <- plotly::plot_ly(x = dens$x,
y = dens$y,
z = dens$z,
colors = c("blue", "grey", "red"),
type = "surface")
# Add axis labels for 3D plots
p3 <- p3%>% plotly::layout(scene = list(xaxis = list(title = 'Sepal Width'),
yaxis = list(title = 'Petal Length'),
zaxis = list(title = 'Predicted Sepal Length')))
# Show the plot
p3
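To make clearer what `pdp::partial()` computes, here is a minimal base-R sketch of a one-variable partial dependence curve. It swaps the neural network for a plain `lm()` fit so that it runs without extra packages. The helper name `partial_dependence` and the grid of `Sepal.Width` values are assumptions made for illustration. For each grid value, the chosen predictor is set to that value in every row, predictions are made, and the predictions are averaged.

```r
# Stand-in model: a linear regression instead of the nnet fit (assumption)
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
          data = iris)

# Hypothetical helper: average prediction with one predictor held at each grid value
partial_dependence <- function(model, data, var, grid) {
  sapply(grid, function(v) {
    data[[var]] <- v                      # overwrite the predictor in every row
    mean(predict(model, newdata = data))  # average over the other predictors
  })
}

grid <- seq(min(iris$Sepal.Width), max(iris$Sepal.Width), length.out = 10)
pd_manual <- data.frame(Sepal.Width = grid,
                        yhat = partial_dependence(fit, iris, "Sepal.Width", grid))
print(pd_manual)
```

For a linear model the resulting curve is a straight line whose slope equals the fitted coefficient. Applied to the network, the same procedure can produce curved surfaces such as the ones plotted above.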
fin