Machine Learning for large-scale logistics platform

Location

This project was a remote software development internship with New York Office, IIT Kanpur, from May ’18 to Jul ’18, the summer of my second year.

Details

The project consisted of several tasks completed over the course of the internship. The first task was the Document Distance problem: removing semantically equivalent articles from a large collection of articles. We solved it by representing articles with word vectors and document vectors and implementing the Word Mover’s Distance algorithm, which eliminated all similar entries within acceptable error values. We also experimented with combining word vectors and tf-idf weights to generate document vectors, then finding semantically similar articles through the cosine similarity of these vectors. Word Mover’s Distance was the final method because it was more accurate, although it was slower than the cosine-similarity approach. The development was done in Python and used both pre-trained and custom-trained word embeddings to obtain a model specific to our requirements.
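As a rough illustration of the tf-idf and cosine-similarity approach described above, here is a minimal sketch in plain Python. It is not the internship code: the `EMBEDDINGS` table is a made-up two-dimensional toy, the helper names and the `threshold` value are assumptions, and the slower but more accurate Word Mover’s Distance is not shown.

```python
import math
from collections import Counter

# Hypothetical 2-D word embeddings for illustration only; a real model
# (pre-trained or custom-trained) would have hundreds of dimensions.
EMBEDDINGS = {
    "truck": (0.90, 0.10), "lorry": (0.88, 0.12),
    "delivers": (0.50, 0.50), "ships": (0.52, 0.48),
    "parcel": (0.20, 0.80), "package": (0.22, 0.78),
    "rain": (-0.70, 0.30), "falls": (-0.60, -0.40),
    "tonight": (-0.20, -0.90),
}


def idf_weights(docs):
    """Smoothed inverse document frequency for every word in the corpus."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def document_vector(doc, idf):
    """tf-idf weighted average of the word vectors of one tokenised document."""
    dim = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, count in Counter(doc).items():
        if word not in EMBEDDINGS:
            continue  # skip out-of-vocabulary words
        weight = count * idf[word]
        weight_sum += weight
        for i, x in enumerate(EMBEDDINGS[word]):
            total[i] += weight * x
    return [x / weight_sum for x in total] if weight_sum else total


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def deduplicate(docs, threshold=0.95):
    """Return indices of documents kept after dropping near-duplicates."""
    idf = idf_weights(docs)
    vectors = [document_vector(d, idf) for d in docs]
    kept = []
    for i, vec in enumerate(vectors):
        # Keep a document only if it is dissimilar to everything kept so far.
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept


docs = [
    "truck delivers parcel".split(),
    "lorry ships package".split(),   # paraphrase of the first document
    "rain falls tonight".split(),
]
print(deduplicate(docs))
```

In this toy corpus the second document shares no words with the first, yet it is dropped because its document vector points in almost the same direction. That is the property that makes embedding-based comparison useful for semantic duplicates.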
The second task was the Reverse k-Nearest Neighbour (RkNN) problem. The data contains two sets of coordinates, one for facilities and one for users. Given a query facility, the task is to return all users (bichromatic RkNN) or all facilities (monochromatic RkNN) for which the query facility is among their k nearest facilities. This required a data structure that supports quick access and insertion along with efficient spatial storage, since we work with 2D coordinates. Hence, two separate R-trees were used to store the coordinates of the users and the facilities. The RkNN queries were then answered with the SLICE algorithm, which partitions the space into equal sectors around the query and acts as an improved combination of the half-space and six-regions algorithms, making pruning and verification faster in terms of both computation time and I/O operations. The code for this problem was implemented in C++.
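To make the RkNN definitions concrete, here is a brute-force sketch in Python. It is only a reference baseline under stated assumptions: it uses neither R-trees nor SLICE, it is not the C++ implementation from the internship, and the function names and sample coordinates are made up for illustration. A point belongs to the answer when fewer than k other facilities lie strictly closer to it than the query facility does.

```python
import math


def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def bichromatic_rknn(users, facilities, query_idx, k):
    """Indices of users that have facilities[query_idx] among their k nearest facilities."""
    query = facilities[query_idx]
    result = []
    for u_idx, user in enumerate(users):
        d_query = dist(user, query)
        # Count the competing facilities strictly closer to this user than the query.
        closer = sum(
            1
            for f_idx, fac in enumerate(facilities)
            if f_idx != query_idx and dist(user, fac) < d_query
        )
        if closer < k:
            result.append(u_idx)
    return result


def monochromatic_rknn(facilities, query_idx, k):
    """Indices of facilities that have facilities[query_idx] among their k nearest facilities."""
    query = facilities[query_idx]
    result = []
    for p_idx, point in enumerate(facilities):
        if p_idx == query_idx:
            continue
        d_query = dist(point, query)
        # Neither the point itself nor the query counts as a competitor.
        closer = sum(
            1
            for o_idx, other in enumerate(facilities)
            if o_idx not in (p_idx, query_idx) and dist(point, other) < d_query
        )
        if closer < k:
            result.append(p_idx)
    return result


# Hypothetical sample data.
facilities = [(0, 0), (10, 0), (4, 5)]
users = [(1, 1), (9, 1), (4, 4), (6, 1)]

print(bichromatic_rknn(users, facilities, query_idx=0, k=1))
print(bichromatic_rknn(users, facilities, query_idx=0, k=2))
print(monochromatic_rknn(facilities, query_idx=0, k=1))
```

This baseline compares every user with every facility, so its cost grows with the product of the two set sizes. Pruning algorithms such as SLICE exist to avoid that cost on large datasets, and a brute-force version like this one is mainly useful for checking their output on small inputs.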

Amrit Singhal
MS in Machine Learning

My research interests include machine learning, reinforcement learning and artificial intelligence.
