ivis uses several hyperparameters that can have an impact on the desired embeddings:
embedding_dims: Number of dimensions in the embedding space.
k: The number of nearest neighbours to retrieve for each point.
n_epochs_without_progress: If the loss does not improve for this many epochs, training is terminated early.
model: The Keras model that is trained using the triplet loss. If a model object is provided, an embedding layer of size embedding_dims will be appended to the end of the network. If a string is provided, a pre-defined network by that name will be used. Possible options are: ‘szubert’, ‘hinton’, ‘maaten’. By default the ‘szubert’ network will be created, which is a SELU network composed of 3 dense layers of 128 neurons each, followed by an embedding layer of size ‘embedding_dims’.
k, n_epochs_without_progress, and model are tunable parameters that should be selected on the basis of dataset size and complexity. Our findings are summarised below.
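For orientation, here is a minimal usage sketch showing where these hyperparameters are set. It assumes the sklearn-style fit_transform interface and a data matrix X of shape (n_samples, n_features); the values shown are purely illustrative, not recommendations.

```python
from ivis import Ivis

# Illustrative configuration; values are examples, not recommendations.
model = Ivis(embedding_dims=2,             # dimensionality of the embedding space
             k=15,                          # nearest neighbours retrieved per point
             n_epochs_without_progress=20,  # early-stopping patience
             model='szubert')               # pre-defined baseline architecture
embeddings = model.fit_transform(X)         # X: array of shape (n_samples, n_features)
```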
We will now look at each of these parameters in turn.
The k parameter controls the balance between local and global features of the dataset. Low k values will result in prioritisation of local dataset features, and the overall global structure may be missed. Conversely, high k values will force ivis to look at broader aspects of the data, losing desired granularity. We can visualise the effects of low and high values of k on the Levine dataset (104,184 x 32).
Box plots represent distances across pairs of points in the embeddings, binned using 50 equal-width bins over the pairwise distances in the original space, using 10,000 randomly selected points (49,995,000 pairwise distances). For each embedding, the Pearson correlation coefficient computed over the pairs of pairwise distances is reported. We can see that where k=5, smaller distances are better preserved, whilst larger distances have higher variability in the embedding space. As k increases, larger distances begin to be better preserved as well. However, for very large k, smaller distances are no longer preserved.
To establish an appropriate value of k, we evaluated a range of values across several subsamples of varying sizes, keeping the other model hyperparameters fixed. Accuracy was calculated by training a Support Vector Machine classifier on 75% of each subsample and evaluating the classifier's performance on the remaining 25%, whilst predicting manually assigned cell types in the Levine dataset. Accuracy was high and generally stable for k between 10 and 150. A decrease was observed when k was considerably large in relation to subsample size.
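A sketch of this evaluation protocol is given below, assuming X_sub and y_sub hold a subsample and its cell-type labels; the k values, classifier settings, and random seed are illustrative.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from ivis import Ivis

def svm_accuracy_for_k(X_sub, y_sub, k):
    """Embed a subsample with a given k and score an SVM on held-out points."""
    embeddings = Ivis(embedding_dims=2, k=k).fit_transform(X_sub)
    # Train the classifier on 75% of the subsample, evaluate on the remaining 25%.
    emb_train, emb_test, y_train, y_test = train_test_split(
        embeddings, y_sub, train_size=0.75, stratify=y_sub, random_state=0)
    clf = SVC().fit(emb_train, y_train)
    return accuracy_score(y_test, clf.predict(emb_test))

for k in (5, 15, 50, 150):
    print(k, svm_accuracy_for_k(X_sub, y_sub, k))
```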
ivis is fairly robust to values of k, which controls the local vs. global trade-off in the embedding space.
The n_epochs_without_progress patience hyperparameter impacts both the quality of the embeddings and the speed with which they are generated. Generally, the higher n_epochs_without_progress is, the more accurate the low-dimensional features are. However, this comes at a computational cost. Here we examine the speed vs. accuracy trade-off and recommend sensible defaults. For this experiment, the remaining ivis hyperparameters were held at fixed values.
For each dataset, we trained a Support Vector Machine classifier to assess how well ivis embeddings capture manually supplied response-variable information. For example, in the case of the MNIST dataset, the response variable is the digit label, whilst for the Levine and Melanoma datasets it is the cell type. The SVM classifier was trained on ivis embeddings representing 3%, 40%, and 95% of the data, obtained using stratified random subsampling. The classifier was then validated on the ivis embeddings of the remaining 97%, 60%, and 5% of the data. For each training-set split, an ivis model was trained by keeping the model hyperparameters constant, whilst varying n_epochs_without_progress. Finally, classification accuracies were normalised to a 0-1 range to facilitate comparisons between datasets.
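The sketch below illustrates this protocol for a single training split (the 40% split), assuming the sklearn-style fit_transform/transform interface and that X and y hold a dataset and its labels; the patience values are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from ivis import Ivis

# Stratified split: 40% to fit ivis and the SVM, 60% held out for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.40, stratify=y, random_state=0)

accuracies = []
for patience in (0, 5, 10, 20, 50):            # illustrative settings
    ivis_model = Ivis(embedding_dims=2, n_epochs_without_progress=patience)
    emb_train = ivis_model.fit_transform(X_train)
    emb_test = ivis_model.transform(X_test)    # embed the held-out points
    clf = SVC().fit(emb_train, y_train)
    accuracies.append(accuracy_score(y_test, clf.predict(emb_test)))

# Normalise accuracies to a 0-1 range to allow comparison across datasets.
accuracies = np.asarray(accuracies)
normalised = (accuracies - accuracies.min()) / (accuracies.max() - accuracies.min())
```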
Our final results indicate that the overall accuracy of the embeddings is a function of dataset size and n_epochs_without_progress. However, only marginal gains in performance are achieved when n_epochs_without_progress > 20. For large datasets, setting n_epochs_without_progress between 3 and 5 comes to within 85% of optimal classification accuracy.
The model hyperparameter is a powerful way for ivis to handle complex non-linear feature spaces. It refers to a trainable neural network that learns to minimise a triplet loss function. Structure-preserving dimensionality reduction is achieved by creating three replicates of the baseline architecture and assembling these replicates into a siamese neural network (SNN). SNNs are a class of neural network that employ a unique architecture to naturally rank similarity between inputs. The ivis SNN consists of three identical base networks; each base network is followed by a final embedding layer. The size of the embedding layer reflects the desired dimensionality of outputs.
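To make this structure concrete, the Keras sketch below applies one shared-weight base encoder to anchor, positive, and negative inputs. It only illustrates the siamese/triplet layout, not ivis's internal implementation; the layer sizes and the 32-feature input are placeholders.

```python
from tensorflow.keras import layers, Model

def make_base_network(n_features, embedding_dims=2):
    """Base encoder followed by a linear embedding layer of the desired size."""
    inputs = layers.Input(shape=(n_features,))
    x = layers.Dense(128, activation='selu', kernel_initializer='lecun_normal')(inputs)
    x = layers.Dense(128, activation='selu', kernel_initializer='lecun_normal')(x)
    embedding = layers.Dense(embedding_dims, activation='linear',
                             kernel_initializer='glorot_uniform')(x)
    return Model(inputs, embedding)

encoder = make_base_network(n_features=32)

# The same encoder (shared weights) embeds the anchor, positive and negative
# points; a triplet loss over the three outputs would drive training.
anchor, positive, negative = (layers.Input(shape=(32,)) for _ in range(3))
triplet_network = Model([anchor, positive, negative],
                        [encoder(anchor), encoder(positive), encoder(negative)])
```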
The model parameter is defined using a Keras model. This flexibility allows ivis to be trained using complex architectures and patterns, including convolutions. Out of the box, ivis supports three styles of baseline architectures - szubert, hinton, and maaten. These can be passed as string values to the model parameter. Let’s examine each architectural option in greater detail:
The szubert network has three dense layers of 128 neurons followed by a final embedding layer (128-128-128). The size of the embedding layer reflects the desired dimensionality of outputs. The layers preceding the embedding layer use the SELU activation function, which gives the network a self-normalizing property. The weights for these layers are randomly initialized with the LeCun normal distribution. The embedding layers use a linear activation and have their weights initialized using Glorot’s uniform distribution.
The hinton network has three dense layers (2000-1000-500) followed by a final embedding layer. The size of the embedding layer reflects the desired dimensionality of outputs. The layers preceding the embedding layer use the SELU activation function. The weights for these layers are randomly initialized with the LeCun normal distribution. The embedding layers use a linear activation and have their weights initialized using Glorot’s uniform distribution.
The maaten network has three dense layers (500-500-2000) followed by a final embedding layer. The size of the embedding layer reflects the desired dimensionality of outputs. The layers preceding the embedding layer use the SELU activation function. The weights for these layers are randomly initialized with the LeCun normal distribution. The embedding layers use a linear activation and have their weights initialized using Glorot’s uniform distribution.
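As a point of reference, a szubert-style base network as described above could be written in Keras roughly as follows; the 32 input features and 2 embedding dimensions are placeholder choices.

```python
from tensorflow.keras import layers, models

szubert_like = models.Sequential([
    layers.Input(shape=(32,)),   # placeholder number of input features
    layers.Dense(128, activation='selu', kernel_initializer='lecun_normal'),
    layers.Dense(128, activation='selu', kernel_initializer='lecun_normal'),
    layers.Dense(128, activation='selu', kernel_initializer='lecun_normal'),
    # Final embedding layer: linear activation, Glorot uniform initialisation.
    layers.Dense(2, activation='linear', kernel_initializer='glorot_uniform'),
])
```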
Selecting an appropriate baseline architecture is a data-driven task. The three architectures that ship with ivis perform consistently well across a wide array of tasks. A general rule of thumb from our own experiments is to use the szubert network for computationally intensive processing on large datasets (>1 million observations) and to select the maaten architecture for smaller real-world datasets.
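A short sketch of this rule of thumb, together with the option of supplying a custom Keras model object (see the model parameter description above), might look as follows; the threshold, layer sizes, and X are assumptions made for illustration.

```python
import tensorflow as tf
from ivis import Ivis

# Pick a pre-defined architecture by name, following the rule of thumb above.
arch = 'szubert' if X.shape[0] > 1_000_000 else 'maaten'
embeddings = Ivis(embedding_dims=2, model=arch).fit_transform(X)

# Alternatively, pass a custom Keras model; ivis appends the final embedding
# layer of size embedding_dims itself, so it is omitted from the base network.
custom_base = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(256, activation='selu', kernel_initializer='lecun_normal'),
    tf.keras.layers.Dense(128, activation='selu', kernel_initializer='lecun_normal'),
])
embeddings = Ivis(embedding_dims=2, model=custom_base).fit_transform(X)
```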