Complete the part marked with ??? in the code below (Scala):
Below is a skeleton for a function that trains and returns a classifier function that in effect enables the machine to "see" binary digits based on prior experience (the "training"). The function 'train' takes as input training digits (an array of digits, each of which is an Array of 25600 Doubles) and the labels of the training digits (each an Int, either 0 or 1). Put otherwise, for each j we have that labels(j) tells whether digits(j) is a 0 or 1. The training function returns a classifier function. A classifier function takes as input a digit (an Array of 25600 Doubles) and returns its very best guess whether it "sees" a 0 or 1 in the input.
You need to implement the training/classifier function. It is up to you what type of function you want to implement. Do not use external libraries. An easy possibility is to proceed as follows:
Record (in the closure of the classifier) the given training data and the training labels.
When given as input a digit to classify, return the label of the nearest neighbour in the training data, where distance is measured by the Euclidean distance between the corresponding feature vectors (feature vector corresponding to given digit, and closest neighbor feature-vector-wise in training data
The code that needs to be finished (??? part):
object classifier {
val sizex = 160 // width of digits (do not edit)
val sizey = 160 // height of digits (do not edit)
val m = sizex*sizey // length of digit array (=25600) (do not edit)
def train(digits: Array[Array[Double]], labels: Array[Int]) : (Array[Double]) => Int = {
val features = digits.map(feature.get(_))
def classifyDigit(digit: Array[Double]) : Int = {
// return the very best guess as to whether 'digit' is a 0 or 1
???
}
classifyDigit // return the classifier
}
}
/*
* We give you some help in your task. Namely, the object 'feature' below
* defines a function 'get' to compute __feature vectors__ from the
* grayscaled handwritten digits. A __feature vector__ is a
* low(er)-dimensional representation of the original input.
* The function consists of three simple steps
* (indeed, rather than do anything fancy, our aim is to be able
* to visualize the feature vectors as smaller images in the data browser):
*
* 1) Down-sample by a factor of four to reduce dimension
* (the original digits are 160-by-160 pixels, the feature
* vectors have 40-by-40 entries).
*
* 2) Do biased rounding to articulate the digit
* (all pixels that are >= 80% of the average grayscale become 0.0,
* all remaining pixels become 1.0).
*
* 3) Center the 40-by-40 array at the __barycenter__ of the 1.0-pixels,
* and complement the 1.0s to 0.0s and vice versa.
* (The centering at the barycenter eases e.g. nearest neighbor search.)
*
*/
object feature {
val h = 4
val sizex = classifier.sizex/h
val sizey = classifier.sizey/h
val m = sizex*sizey
def get(digit: Array[Double]) = {
val down = new Array[Double](m)
var i = 0
while(i < m) {
down(i) = digit((((i/(sizex))*sizex*h*h)+h*(i%(sizex))))
i = i+1
}
val average = down.sum/m
val step = down.map(v => if(v >= 0.8*average) { 0.0 } else { 1.0 })
val total = step.sum
val barx = math.round((0 until m).map(j => step(j)*(j%sizex)).sum/total).toInt
val bary = math.round((0 until m).map(j => step(j)*(j/sizex)).sum/total).toInt
val vec = new Array[Double](m)
i = 0
while(i < sizey) {
var j = 0
val bi = (i+bary+sizey+sizey/2)%sizey
while(j < sizex) {
val bj = (j+barx+sizex+sizex/2)%sizex
vec(i*sizex+j) = 1.0-step(bi*sizex+bj)
j = j+1
}
i = i+1
}
vec
}
}