Using Apple Store Computers for Machine Learning
Saturday, September 19th, 2015
Apple Stores are awesome! If you have an Apple product, you have very good support at the Genius Bar. You can touch and try everything. You can access the internet very freely. There are mostly no restrictions. What could we do with all this computing power?
Machine Learning is one of my favorite areas of computer science at the moment. I started with the Machine Learning course over at Coursera and had some courses in university. Now I am writing my bachelor thesis in this area. I am using Echo State Networks to recognize gestures in an online learning approach. To find the best network I had to try every possible combination of its configuration parameters, and this would take a long time on my computer. I had two options: a better computer or multiple computers. Fortunately, each parameter configuration can be tested independently of the others, so the search can run in parallel and multiple computers were a good option.
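To make the idea of "every possible combination" concrete, here is a minimal sketch in Python 3 (the post itself used Python 2.7). The parameter names and values in `SEARCH_SPACE` are made up for illustration and are not the options from the thesis. The static split across workers is also just one simple way to parallelize; the setup described in this post let a server hand out configurations instead.

```python
import itertools

# Hypothetical search space: names and values are illustrative only,
# not the options actually used in the thesis.
SEARCH_SPACE = {
    "reservoir_size": [50, 100, 200],
    "spectral_radius": [0.7, 0.9, 1.1],
    "leak_rate": [0.1, 0.3, 1.0],
}


def all_configurations(space):
    """Yield every combination of option values as a dict."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def configurations_for_worker(space, worker_index, worker_count):
    """Give one worker every worker_count-th configuration."""
    configs = all_configurations(space)
    return list(itertools.islice(configs, worker_index, None, worker_count))


if __name__ == "__main__":
    total = sum(1 for _ in all_configurations(SEARCH_SPACE))
    print("configurations:", total)
    print("worker 0 of 4 gets:", len(configurations_for_worker(SEARCH_SPACE, 0, 4)))
```

Because no configuration depends on the result of another, the workers never have to talk to each other. They only need to know which configurations are theirs.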
Where can I find a lot of unused computing power, with the right tools installed and with the ability to conveniently collect all the results?
Apple Store computers are turned on many hours a day. After business hours the computers are put into a mode where they forget everything that happened to them over the day. Everything will be reset. If somebody downloads files to a computer, they will be gone the next day. No trace is left.
Apple Store computers have Python 2.7 and NumPy preinstalled. Many Machine Learning applications are scripted in Python. NumPy is mostly used to work with matrices and is fast because its core operations run in optimized, compiled C code rather than in the Python interpreter. My script was written in Python and NumPy. Perfect!
Every Apple Store computer has access to the internet. Data can be received and can go out without a hassle. Nice!
Because Mac OS X is based on UNIX, many useful commands can be executed in the Terminal application. Apple does not really restrict the use of Terminal.app, which means that we can do some terminal magic! Yeah!
I packaged all the necessary data and the Python scripts into a ZIP file and uploaded it to my webspace. Two PHP scripts were uploaded there as well. One of them managed which parameter configuration to send to the script that requests one. The other script saved the result data sent by the scripts on the computers.
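The division of work between the two server-side scripts can be sketched roughly like this. This is an in-memory Python stand-in, not the PHP code I used, and the class and method names are hypothetical. One method plays the role of the script that hands out configurations, the other the role of the script that stores results.

```python
class ConfigurationDispatcher:
    """In-memory stand-in for the two server-side scripts (hypothetical)."""

    def __init__(self, configurations):
        self._configurations = list(configurations)
        self._next_index = 0
        self.results = []

    def next_configuration(self):
        """Return (index, configuration), or None once all are handed out."""
        if self._next_index >= len(self._configurations):
            return None
        index = self._next_index
        self._next_index += 1
        return index, self._configurations[index]

    def save_results(self, batch):
        """Store a batch of (index, score) pairs sent by a worker."""
        self.results.extend(batch)


if __name__ == "__main__":
    dispatcher = ConfigurationDispatcher([{"leak_rate": 0.1}, {"leak_rate": 0.3}])
    job = dispatcher.next_configuration()
    while job is not None:
        index, config = job
        # A real worker would train and evaluate a network here.
        dispatcher.save_results([(index, 0.0)])
        job = dispatcher.next_configuration()
    print(dispatcher.results)
```

A real server would also have to cope with computers that take a configuration and never report back, for example because they were restarted. The sketch ignores that case.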
Finally I wrote a shell script that I would download in the Apple Store and then start in the Terminal. It would download the data, extract it into a hidden folder and start the given Python script. The script also took care of detaching the Python process from the shell so that the Terminal could be closed.
So I went to the Apple Store and opened the Terminal on a computer. I downloaded the shell script with curl and started it. A few seconds later, the Terminal was free to close. I could go to the next computer. After repeating this procedure on enough computers I left the store and just waited for the incoming results.
Here is a graph that visualizes the number of results I received over time. Notice that each computer sent its data to the server only after it had collected the results for one hundred parameter configurations. That is why the plot is not very smooth.
You can see that beginning at around 4 p.m. there are multiple time spans where the number of results didn't change. One guess is that one or two of the computers were restarted by the staff. Another guess is that the computers just needed more time for some parameter configurations. Another possibility is that more people were using the computers at this time, so the Macs were busier and therefore slower in testing configurations. But still, at the end of the day, I received more than 7000 results to work with.
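The batching that causes those steps can be sketched as a small buffer. This is a Python 3 sketch with hypothetical names, not my original code. The `send` argument is any function that uploads a list of results.

```python
class ResultBuffer:
    """Collect results and hand them to `send` in batches (hypothetical helper)."""

    def __init__(self, send, batch_size=100):
        self._send = send
        self._batch_size = batch_size
        self._items = []

    def add(self, result):
        """Store one result and upload once a full batch has been collected."""
        self._items.append(result)
        if len(self._items) >= self._batch_size:
            self.flush()

    def flush(self):
        """Upload whatever is buffered, even if the batch is not full."""
        if self._items:
            self._send(list(self._items))
            self._items = []


if __name__ == "__main__":
    uploads = []
    buffer = ResultBuffer(uploads.append, batch_size=100)
    for index in range(250):
        buffer.add({"index": index, "score": 0.0})
    buffer.flush()  # send the remaining partial batch
    print([len(batch) for batch in uploads])
```

The trade-off is fewer requests to the server against a bigger loss if a computer is restarted: everything still sitting in the buffer at that moment is gone.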
Now I just want to list the commands I used to make things work on Apple Store computers.
To download the script and all other dependencies from my webspace:
curl -o /destination/file.path http://url.to/download.file -#
The -# option replaces curl's default progress meter with a simple progress bar. It shows less information, but it is easier to read.
To start the Python script and move it to the background:
nohup nice -n +20 python ./pythonscript.py &
nohup keeps the process running after the Terminal is closed, and the trailing & puts it into the background. nice -n +20 sets the priority of this Python process. The operating system uses a scheduling algorithm to determine which process is allowed to use processor time. A priority of +20 is the nicest setting, so every other process will be preferred by the scheduling algorithm. We use this nice option because we don't want to slow down the Apple computers. If a computer seems to be laggy and slow, Apple staff is instructed to restart the computer. But then our data is gone. The script should work in the background. We want it to be unobtrusive.
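For comparison, roughly the same effect can be had from inside Python. This is a sketch for POSIX systems in Python 3, not what I ran in the store, and `launch_low_priority` is a hypothetical helper name. It raises the niceness of the child with `os.nice` and puts it into a new session so that closing the terminal does not take it down. The value 19 is used because it is the largest niceness on Linux.

```python
import os
import subprocess
import sys


def launch_low_priority(argv, increment=19):
    """Start argv as a detached child with raised niceness (POSIX only)."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        # Runs in the child just before the program starts.
        preexec_fn=lambda: os.nice(increment),
        # New session: the child is no longer tied to this terminal.
        start_new_session=True,
    )


if __name__ == "__main__":
    child = launch_low_priority([sys.executable, "-c", "print('working')"])
    print("started pid", child.pid)
```

The shell one-liner is still the simpler choice when all you have is a Terminal window and a few seconds per computer.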
In my Python script, I used urllib2 to request data from and send data to the server.
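A minimal sketch of that exchange could look like this. It is written for Python 3, where `urllib.request` takes the place of `urllib2`. The two URLs, the `results` form field and the JSON encoding are assumptions for illustration and not my actual endpoints or format.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoints; the real URLs are not part of this post.
CONFIG_URL = "http://example.com/next_config.php"
RESULT_URL = "http://example.com/save_results.php"


def build_result_request(results, url=RESULT_URL):
    """Build a POST request carrying the results as a JSON form field."""
    body = urllib.parse.urlencode({"results": json.dumps(results)})
    return urllib.request.Request(url, data=body.encode("utf-8"))


def fetch_configuration(url=CONFIG_URL, timeout=30):
    """GET the next configuration, assuming the server answers with JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def send_results(results, url=RESULT_URL, timeout=30):
    """POST a batch of results and return the HTTP status code."""
    request = build_result_request(results, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    request = build_result_request([{"index": 1, "score": 0.5}])
    print(request.get_method(), request.full_url)
```

Passing `data` to `Request` is what turns the call into a POST. Without it, `urlopen` sends a plain GET.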
That's it. With these easy tools, I was able to let Macs in the Apple Store work for me. I hope that Apple won't consider this a violation of their store policy or something. I just wanted to show the possibility of using their Macs for something different. I mean, a rapper recorded his whole album in an Apple Store and Apple didn't react to this. I believe that Apple doesn't mind in my case either. I think Apple did a great job with their strategy, letting people try every product and having no restrictions in order to show the power of Macs and Mac OS X. With this strategy, they're convincing people around the world to buy their products. They convinced me, too.