Swipe Left to Scrape

Tue 02 May 2017 By Richard

Application Authentication

Yesterday morning, security forums reported that an AI researcher had published a dataset of 40,000 photos scraped from the dating app Tinder. The purpose was simply to extract a real-world dataset that could be used to train Convolutional Neural Networks (CNNs) to tell the difference between men and women. This seems innocent enough, although the author's choice of variable naming caused a bit of a stir. He changed the variable name "hoe" to "subject" soon after the story broke. Apparently this original naming was inherited from the Tinder Auto-Liker code.

This isn't the first time this has happened. Tinder has a long history of API scraping abuses:

  • The supposedly private Tinder API has been reverse engineered and fully documented here. This sort of knowledge makes it easy to build open source API clients. For instance, this one and this one are both written in Python. It is easy for anyone to download these and extend them for whatever purpose they see fit.

  • Back in February 2015, a software developer from Vancouver automated his Tinder experience. "The dating app, like so many popular apps, has seen its internal, private API reverse engineered and employed by third parties. Unauthorized users of Tinder's API commonly use it to create Tinderbots that interact with the service and other users, but Justin Long's Tinderbot looks to be one of the most ambitious Tinderbot creations." This bot can even start conversations and try to work out whether the sentiment is looking good.

  • There has been a whole slew of Tinderbots written and open sourced. Some great examples include "Building a Tinder Bot in Python" and "Automating Tinder with Eigenfaces".

  • Swipebuster is a paid service that lets you find out if somebody you know (and maybe love) is using Tinder (and perhaps you don't think they should be).

The Tinder privacy policy (which bizarrely says it was last updated a week in the future) states the following: "Information Shared with Other Users. When you register as a user of Tinder, your Tinder profile will be viewable by other users of the Service. Other users (and in the case of any sharing features available on Tinder, the individuals or apps with whom a Tinder user may choose to share you with) will be able to view information you have provided to us...". Fair enough: if you sign up to Tinder, you are putting your information into the public domain. But I'm sure most Tinder users would interpret this in the obvious sense: other real human users of the Tinder app will be able to see the information and react with the swipe of their choosing. They wouldn't expect it to be so easy for anyone to write a piece of software that simply copies their information en masse, to do with as they see fit. I'm sure most users haven't thought about that possibility, and they shouldn't need to. Surely it is reasonable for Tinder's users to expect a basic duty of care from the service, one that makes such mass data extraction at least a little bit difficult? This is pretty personal content, after all.

All that is needed to access the Tinder API is a single access token. That is pretty shocking. To get one, as explained here, you just need to sign up as a Tinder user, which is a very low barrier to entry and effectively anonymous. The Python code sends a user-agent string of "Tinder Android Version 3.2.0". It isn't, of course; it's a script running on a PC. User-agent strings provide no assurance of caller identity whatsoever. Not even an API key is required. As we at CriticalBlue have discussed before, an API key is not necessarily a very big barrier to securing an API, but at least it is a start, and it forces the Tinder app to be reverse engineered to extract the keys. There are many more advanced techniques that we cover extensively in our mobile API security techniques series. Beyond that, our Approov product provides full software attestation to specifically protect against this type of automated mobile API scraping.
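
The weakness of trusting a user-agent string is easy to demonstrate. The Python sketch below is hypothetical: the header names X-Auth-Token and X-Api-Key, the prefix check and the key value are illustrative assumptions, not Tinder's real API. It shows a server-side gate that accepts any request carrying a token and claiming to be the app, and a slightly stronger gate that also demands a static API key, compared in constant time with hmac.compare_digest.

```python
import hmac

# Hypothetical values for illustration only; these are NOT Tinder's real
# header names, user-agent rules or keys.
EXPECTED_UA_PREFIX = "Tinder Android"
APP_API_KEY = "key-embedded-in-the-app-binary"


def naive_gate(headers: dict[str, str]) -> bool:
    """Accept any request that has a token and *claims* to come from the app."""
    has_token = bool(headers.get("X-Auth-Token"))
    # The User-Agent is whatever string the client chose to send.
    claims_app = headers.get("User-Agent", "").startswith(EXPECTED_UA_PREFIX)
    return has_token and claims_app


def api_key_gate(headers: dict[str, str]) -> bool:
    """Also require a static API key, compared in constant time."""
    supplied = headers.get("X-Api-Key", "")
    return naive_gate(headers) and hmac.compare_digest(supplied, APP_API_KEY)


# A script on a PC can build exactly the headers the real app would send.
script_headers = {
    "X-Auth-Token": "token-from-any-freshly-registered-account",
    "User-Agent": "Tinder Android Version 3.2.0",
}
print(naive_gate(script_headers))    # True: the user agent proves nothing
print(api_key_gate(script_headers))  # False: the script lacks the key
# Once the key has been extracted by reverse engineering the app:
script_headers["X-Api-Key"] = APP_API_KEY
print(api_key_gate(script_headers))  # True again
```

As the last call shows, a static key only holds until someone pulls it out of the app binary, which is why the more advanced techniques mentioned above are needed.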

Rate limiting might be in place in the API implementation; it is difficult to tell without abusing it. However, if there is any, it is pretty ineffective. The face scraper code just seems to add some small random delays (which presumably give the interaction a more human-like character) after downloading the photos of each subject, before effectively swiping left. The point about swiping left is that there is no daily limit, and I suspect some real users swipe left at a prodigious rate. It must be hard to set a swipe-left limit that doesn't curtail the rate of disdain some users need to demonstrate to their potential matches. The posted code amply demonstrates how far this automation can be taken: it can apparently extract 40,000 images using the same user ID from the same IP address. From looking at the code, it seems a new image can be extracted every few seconds on average, so the whole job takes less than a day (at two seconds per image, 40,000 images take about 22 hours). This must beat even the greatest power dislikers on the platform. Ultimately, rate limiting can't solve the problem. All it can do is slow down and complicate the scripts. You can always create enough fake users distributed over enough IP addresses to fly under the radar of any rate limiting system. What is needed is a concerted attempt to lock down API access so that only the app or other approved software clients can use it. Sure, attempts could be made to automate those clients too, but that is considerably more difficult to achieve and easier to detect.
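
To make the rate-limiting argument concrete, here is a minimal sketch of a per-account token bucket in Python. The limit of one request every two seconds (rate=0.5, capacity=1) and the account names are assumptions chosen purely for illustration; nothing here reflects Tinder's actual server behaviour, which is unknown. Each simulated account tries one request per second, the limiter throttles it to one every two seconds, and adding accounts multiplies the total throughput linearly.

```python
from collections import defaultdict


class TokenBucket:
    """Per-key token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        # Every new key starts with a full bucket.
        self.tokens: dict[str, float] = defaultdict(lambda: capacity)
        self.last_seen: dict[str, float] = {}

    def allow(self, key: str, now: float) -> bool:
        # Refill in proportion to the time since this key's previous request.
        if key in self.last_seen:
            elapsed = now - self.last_seen[key]
            self.tokens[key] = min(self.capacity, self.tokens[key] + elapsed * self.rate)
        self.last_seen[key] = now
        if self.tokens[key] >= 1.0:
            self.tokens[key] -= 1.0
            return True
        return False


def simulate(accounts: int, seconds: int, limiter: TokenBucket) -> int:
    """Each account tries one request per simulated second; count the ones allowed."""
    allowed = 0
    for t in range(seconds):
        for a in range(accounts):
            if limiter.allow(f"account-{a}", float(t)):
                allowed += 1
    return allowed


# Hypothetical limit: one request every two seconds per account, over one hour.
one = simulate(1, 3600, TokenBucket(rate=0.5, capacity=1.0))
ten = simulate(10, 3600, TokenBucket(rate=0.5, capacity=1.0))
print(one, ten)                # prints 1800 18000 (requests per hour)
print(round(40_000 / one, 1))  # prints 22.2 (hours for 40,000 images on one account)
```

Keying the bucket on IP address rather than account changes nothing essential: both identifiers are cheap to multiply, which is why limiting requests per identity only slows a determined scraper down.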

Given the extensive history of abuse of the Tinder API, at least some of these countermeasures should be in place. Perhaps most users don't care about these things, but it seems only a matter of time before such mass profile scraping and republishing turns into a much bigger and uglier story. That could really damage the brand and make would-be customers think twice before signing up and letting their personal data be swiped.

Category: misc