Multi-space Logistic Markov Embedding (Multi-LME) is a software package developed by Shuo Chen (shuochen@cs.cornell.edu)
from the Dept. of Computer Science, Cornell University. It learns from
sequence data to embed the elements that constitute the sequences into
multiple spaces. We originally used it for music playlist modeling.
Please see the references for more details. This program is granted
free of charge for research and education purposes. However, you must
obtain a license from the authors to use it for commercial purposes.
Since it is free, it comes with no warranty.
The software is
implemented in C with support for Open MPI 1.6. A Makefile that uses
the gcc and mpicc compilers is also included. To build the software, a
simple "make" in the same directory as the source files will do. It
creates three binaries, MLogisticEmbed, MLogisticEmbed_MPI and
MLogisticPred, in the same directory. The build has been tested on
various systems, including GNU/Linux 2.6.9, 2.6.32, 2.6.18, 3.2.0 and
Mac OS X Mountain Lion.
MLogisticEmbed and
MLogisticEmbed_MPI take a training playlist dataset as input and
produce a multi-space embedding/model file for the songs.
MLogisticPred takes a testing playlist dataset and an embedding/model
file as input and prints the average log-likelihood on the test set
to stdout.
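For intuition, the average log-likelihood reported by MLogisticPred is the total log-probability of the observed song-to-song transitions divided by the number of transitions. The following Python sketch illustrates the metric only, not the actual Multi-LME model: transition_prob is a hypothetical stand-in for the learned transition distribution.

```python
import math

def avg_log_likelihood(playlists, transition_prob):
    """Average per-transition log-likelihood over a set of playlists.

    playlists: list of playlists, each a list of integer song IDs.
    transition_prob(prev, nxt): probability of song `nxt` following
    song `prev` -- a hypothetical stand-in for the trained model.
    """
    total, count = 0.0, 0
    for playlist in playlists:
        for prev, nxt in zip(playlist, playlist[1:]):
            total += math.log(transition_prob(prev, nxt))
            count += 1
    return total / count

# Toy uniform model over 3 songs: every transition has probability 1/3,
# so the average log-likelihood is exactly log(1/3).
ll = avg_log_likelihood([[0, 1, 2], [2, 0]], lambda p, n: 1.0 / 3.0)
```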
Format of the playlist data: The first line of the data file lists the IDs of the songs, separated by spaces. The second line lists the number of appearances of each song in the file, also separated by spaces. These two lines are not actually used by the program, so you can replace them with any integer placeholders. Each line starting from the third is a playlist, with each song represented by its integer ID in this file (from 0 to the total number of songs minus one). Note that in the playlist data file, each line ends with a space. We provide sample files, which are the datasets we used for our papers. You can download them at http://lme.joachims.org.

Training: MLogisticEmbed is used in the following format for training with a single process (sequentially solving the embedding in each space):

MLogisticEmbed [options] training_file model_file

where training_file is the input training playlist set and model_file is the model to output. Similarly, MLogisticEmbed_MPI trains the embeddings in different spaces in parallel. To run it on a multi-core machine, use

mpirun -np x MLogisticEmbed_MPI [options] training_file model_file

where x is the number of processes you want to launch. It should usually be no more than the number of cores your CPU has. When running in a distributed environment, one needs to supply a host file:

mpirun -np x --hostfile myhostfile MLogisticEmbed_MPI [options] training_file model_file

where myhostfile may look like:

machine-0 slots=2 max-slots=2
machine-1 slots=2 max-slots=2
machine-2 slots=2 max-slots=2
......

It specifies which machines can be used and how many processes each of them can host. For more details, please refer to the Open MPI manual. Available options are:
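To make the playlist data format concrete, here is a small Python sketch that writes a toy file in the described layout (song IDs on the first line, appearance counts on the second, then one playlist per line, each line ending with a space). The file name and playlists are made-up illustrative data.

```python
# Toy playlists over 4 songs (IDs 0..3); purely illustrative data.
playlists = [[0, 1, 2], [2, 3], [1, 0, 3, 2]]
num_songs = 4

# Count how many times each song ID appears across all playlists.
counts = [0] * num_songs
for playlist in playlists:
    for song in playlist:
        counts[song] += 1

with open("toy_train.txt", "w") as f:
    # Line 1: song IDs; line 2: appearance counts. The program ignores
    # their values, so integer placeholders would also work.
    f.write(" ".join(str(i) for i in range(num_songs)) + " \n")
    f.write(" ".join(str(c) for c in counts) + " \n")
    # Lines 3+: one playlist per line; note the trailing space.
    for playlist in playlists:
        f.write(" ".join(str(s) for s in playlist) + " \n")
```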
Testing: Testing only runs with a single process. It is simply

MLogisticPred testing_file model_file

where testing_file is the input testing playlist set and model_file is the model obtained from training.
We also provide a simple Python script, plot.py, to visualize the embeddings with portals in multiple spaces. The usage is:

python plot.py model_file

Note that one needs to install NumPy and Matplotlib in order to run the script.
The following three command lines
show how to launch 4 processes in MPI to train a 2-dimensional model
with 10 spaces/clusters, then compute the average log-likelihood on the
test set, and finally visualize the trained model:

mpirun -np 4 MLogisticEmbed_MPI -d 2 -K 10 train.txt model.ebd
MLogisticPred test.txt model.ebd
python plot.py model.ebd
Please contact the author if you spot any bugs in the software.
If you use the datasets, please cite the following papers:
[1] Shuo Chen, Joshua L. Moore, Douglas Turnbull, Thorsten Joachims, Playlist Prediction via Metric Embedding, ACM Conference on Knowledge Discovery and Data Mining (KDD), 2012.

[2] Joshua L. Moore, Shuo Chen, Thorsten Joachims, Douglas Turnbull, Learning to Embed Songs and Tags for Playlists Prediction, International Society for Music Information Retrieval (ISMIR), 2012.

[3] Shuo Chen, Jiexun Xu, Thorsten Joachims, Multi-space Probabilistic Sequence Modeling, ACM Conference on Knowledge Discovery and Data Mining (KDD), 2013.