Optimising algorithms in Go for machine learning - Part 2: Sparse matrix formats

This is the second in a series of blog posts sharing my experiences working with algorithms and data structures for machine learning. These experiences were gained whilst building out the nlp project for LSA (Latent Semantic Analysis) of text documents. In Part 1 of this series, I explored alternative approaches for representing and applying TF-IDF transforms for weighting term frequencies across document corpora. We tested the approaches using Go’s inbuilt benchmark functionality and found that our optimisations materially improved both memory consumption and processing time, from 7 GB and 41 seconds down to 250 KB and 0.8 seconds respectively. In this blog post I shall explore other areas for optimisation, seeking to further reduce memory consumption and processing time. ...
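To give a flavour of how measurements like those are taken with Go’s inbuilt benchmark support, here is a minimal sketch of a benchmark built on the standard testing package. The package name, matrix dimensions and the transform function below are hypothetical placeholders for illustration, not the actual nlp code.

```go
// tfidf_bench_test.go: a minimal sketch of the kind of benchmark used,
// written against Go's standard testing package. The matrix dimensions
// and the transform function are hypothetical placeholders, not the
// actual nlp code.
package tfidf_test

import (
	"math/rand"
	"testing"
)

// buildMatrix creates a dense rows x cols term-document matrix filled
// with pseudo-random term frequencies for the benchmark to work on.
func buildMatrix(rows, cols int) [][]float64 {
	m := make([][]float64, rows)
	for i := range m {
		m[i] = make([]float64, cols)
		for j := range m[i] {
			m[i][j] = float64(rand.Intn(5))
		}
	}
	return m
}

// transform stands in for the TF-IDF weighting under test; it simply
// returns its input here so that the sketch compiles on its own.
func transform(m [][]float64) [][]float64 { return m }

func BenchmarkTFIDFTransform(b *testing.B) {
	m := buildMatrix(5000, 1000)
	b.ReportAllocs() // report allocations per operation alongside ns/op
	b.ResetTimer()   // exclude matrix construction from the timings

	for i := 0; i < b.N; i++ {
		transform(m)
	}
}
```

Running `go test -bench=.` then reports nanoseconds per operation together with bytes and allocations per operation, which is how the kind of memory and timing comparison quoted above is obtained.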

June 9, 2017 · 14 min · James Bowman

Optimising algorithms in Go for machine learning

In my last blog post I walked through the use of machine learning algorithms in Golang to analyse the latent semantic meaning of documents. These algorithms, like many others in data science, rely on linear algebra and vector space analysis. By their nature, they often have to deal with large data sets, so any inefficiency in the data structures or algorithms used can have a large impact on overall performance and/or memory usage. Inefficiencies that are negligible when working with small data sets can carry a huge cost when applied across extremely large ones. As memory is a constrained resource, this can end up limiting the size of data sets that may be processed (at least without resorting to persistent storage and/or alternative algorithms) or the types of algorithms used. To this end, I decided to see if I could optimise the algorithms I used to consume less memory and improve processing performance without sacrificing too much functionality or accuracy. This is the first in a series of articles sharing my experiences benchmarking and optimising the algorithms and data structures used whilst building out the nlp project. ...
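As a rough back-of-envelope illustration of why memory becomes the limiting factor, the small sketch below compares dense storage, which pays for every cell of a term-document matrix, against sparse storage, which only pays for the non-zero entries. The dimensions are assumed for the sake of example, not taken from the article.

```go
// A rough, illustrative calculation: the dimensions are assumed for
// the sake of example, not taken from the article.
package main

import "fmt"

func main() {
	const (
		terms     = 200000  // assumed vocabulary size
		docs      = 10000   // assumed number of documents
		nonZero   = 5000000 // assumed number of non-zero cells
		floatSize = 8       // bytes per float64 value
		indexSize = 8       // bytes to record a cell's (row, column) position
	)

	dense := float64(terms) * float64(docs) * floatSize // every cell stored explicitly
	sparse := float64(nonZero) * (floatSize + indexSize) // only the non-zero cells stored

	fmt.Printf("dense:  %.1f GB\n", dense/1e9)
	fmt.Printf("sparse: %.2f GB\n", sparse/1e9)
}
```

On these assumed figures the dense layout needs around 16 GB while the sparse one needs well under 0.1 GB, which is the kind of gap that motivates the sparse matrix formats covered in Part 2.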

March 31, 2017 · 8 min · James Bowman