Improving transparency in AI Assistants: Researching Alexa privacy and accountability to users

AI assistants have become an indispensable and useful part of everyday life for people across the world. However, a key barrier to their uptake has always been the question of data security. The SAIS research team set out to investigate the gap between what users think is happening to their data and what is actually happening to it. Through a review of current work, the team built a picture of the Alexa ecosystem and created a way to quantify the transparency of Alexa skills, work that also led to the tool SkillVet.

Since its launch in 2014, Amazon’s AI voice assistant, Alexa, has taken over the smart home market. With features ranging from playing music and checking the weather to controlling smart home devices and holding a conversation with the user, Alexa’s capabilities go far beyond what we could have expected at its start. The simplicity of its voice command control and its many useful functions have made it a popular choice worldwide.

Along with the extensive list of Alexa smart home products, the growth of the Alexa skill ecosystem in recent years has been impressive, increasing from just 135 skills in early 2016 to over 125,000 skills by late 2020. However, the growing number of skills also brings higher exposure to risk. Indeed, several security and privacy incidents have already been reported in the media, including Alexa accidentally recording sensitive and private information or serving up dangerous information from the internet, making it crucial to address user privacy and transparency concerns in order to foster public trust.

Many skills are created by Amazon, but a huge number are created and run by third-party developers, who set their own security and privacy policies and define how they use user data.

Analysing privacy and security in Alexa skills

In a two-part project, the SAIS (Secure AI assistantS) research team took on the challenge of investigating the security and privacy of Alexa skills: first reviewing the literature on security and privacy issues, and then developing a method for detecting skills with concerning privacy practices. The SAIS project is a collaboration between researchers from King’s College London and Imperial College London, working with industry partners including Microsoft, Hospify and Mycroft.

Through a large-scale analysis of the Alexa skill ecosystem, the team investigated the data collection practices of third-party skills and how transparent those practices are to the user. In particular, they analysed the traceability between the data permissions a skill requests and the data practices described in its privacy policy, and how apparent these data operations are to users.

What they uncovered were bad privacy practices in about 43% of the Alexa skills that use permissions, affecting roughly 50% of the developers whose skills request permissions.

The team contacted Amazon with the findings, and Amazon welcomed the disclosure of 675 Alexa skills with privacy issues. The result? Amazon addressed the problem, and a large portion of the skills, 356 out of 675, no longer pose a threat. Clearly, the skills marketplace can benefit from actionable research into privacy practices.

The SAIS team continued this research by analysing how the exposure of Alexa skills to risks and related attacks has changed over the last three years. Interestingly, they found that while Amazon’s skill-vetting process has improved over time, there remains room for improvement. For instance, stricter scrutiny is applied to skills that collect data via the Amazon API, but most skills that collect data through conversation (bypassing the API) do not disclose their data practices. Furthermore, many skills include a privacy policy simply to fulfil Amazon’s requirements, yet do nothing to make users aware of it. Skills can also still be over-privileged, asking the user for more permissions than necessary, even when they adequately disclose their data practices.
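To illustrate the conversational bypass, here is a minimal, hypothetical sketch (Python dicts standing in for the JSON an Alexa skill returns; the skill and its wording are invented). A skill that wants the user’s postal code can either declare the corresponding Amazon permission, which surfaces the request for vetting and user consent, or simply ask for it out loud, which the permission system never sees:

```python
# Illustrative sketch only; the skill and its dialogue are invented.

# Route 1: the API route. The skill declares a permission scope such as
# "read::alexa:device:all:address:country_and_postal_code" in its
# manifest, the user grants it in the Alexa app, and Amazon can apply
# stricter scrutiny to the request.

# Route 2: the conversational route. The skill just asks. Nothing in
# this response signals to Amazon that personal data is being collected,
# so it bypasses the vetting applied to API-based collection.
conversational_response = {
    "version": "1.0",
    "response": {
        "outputSpeech": {
            "type": "PlainText",
            "text": "Sure! Before we start, what is your zip code?",
        },
        "shouldEndSession": False,  # keep the session open for the answer
    },
}
```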

SkillVet - a tool to analyse transparency

Inspired by the positive impact of this analysis and disclosure, the SAIS team went one step further and developed ‘SkillVet’, an online privacy assessment tool for users, developers and regulators. SkillVet is an automated system that leverages natural language processing (NLP) and machine learning (ML) to systematically understand a privacy policy and automate the accountability process.

The tool identifies potential privacy issues by analysing the link between developers’ data practices and what is written in their privacy policies. In practice, it takes the privacy policy text (copied and pasted, or uploaded directly to the web application), recognises the relevant policy statements and assesses their traceability: is there a one-to-one match between what developers do and what they say they do (complete traceability), only a general description (partial), or no description at all (broken)? The tool also lists which specific data permissions (e.g. address, location, mobile number, email address) are covered by the policy.

Complete traceability: the skill provides adequate information about its data practices in its privacy policy; every data action defined in the policy can be completely mapped to the data permissions the skill accesses. For example: “We use your device's address to find your location and search for aircraft around you”. (57% of skills looked at in this study)

Partial traceability: the privacy policy maps only partially to the skill’s access permissions; either the policy states that the skill collects information without explicitly stating what that information is, or the data practices it describes are not well mapped to the skill’s data permissions. For example: “We may require you to provide us with certain personally identifiable information” or “We may also collect your zip code”. (18% of skills looked at in this study)

Broken traceability: the permissions the skill requests cannot be traced to any data practices in its privacy policy; the skill requests permissions but has no privacy policy, the link to the policy is broken, or the policy says nothing about the data involved. For example: a skill collects the device address while its privacy policy states nothing related to the device address. (25% of skills looked at in this study)

Information from https://skillvet.nms.kcl.ac.uk
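To make the three categories concrete, here is a minimal sketch of how a traceability check might be structured. This is not SkillVet’s actual implementation, which uses NLP and machine-learning models rather than the simple keyword matching below; the permission names, synonym lists and vague-phrase list are all invented for illustration.

```python
# Toy traceability classifier in the spirit of SkillVet (illustrative only).

# Hypothetical map from Alexa data permissions to phrases a privacy
# policy might use when disclosing them.
PERMISSION_SYNONYMS = {
    "device_address": ["device address", "device's address", "your location"],
    "email_address": ["email address", "e-mail"],
    "mobile_number": ["mobile number", "phone number"],
}

# Generic phrases that disclose collection without naming the data,
# which would count as partial traceability.
VAGUE_PHRASES = [
    "personally identifiable information",
    "personal data",
    "certain information",
]


def classify_traceability(requested_permissions, policy_text):
    """Return 'complete', 'partial' or 'broken' for one skill."""
    if not policy_text:  # missing policy or dead link -> broken
        return "broken"
    text = policy_text.lower()

    # Count the requested permissions that the policy explicitly discloses.
    disclosed = sum(
        any(phrase in text for phrase in PERMISSION_SYNONYMS.get(perm, []))
        for perm in requested_permissions
    )
    if disclosed == len(requested_permissions):
        return "complete"  # one-to-one match between practice and policy
    if disclosed > 0 or any(v in text for v in VAGUE_PHRASES):
        return "partial"   # only some permissions, or only generic wording
    return "broken"        # permissions never traceable to the policy


# Example from the table above: a flight-tracking skill.
policy = ("We use your device's address to find your location "
          "and search for aircraft around you.")
print(classify_traceability(["device_address"], policy))  # complete
print(classify_traceability(["email_address"], policy))   # broken
print(classify_traceability(
    ["email_address"], "We may require certain information."))  # partial
```

In this toy version, a policy that names every requested permission is complete, one that mentions only some permissions or only generic phrases like “personally identifiable information” is partial, and one that never mentions the requested data (or is missing altogether) is broken, mirroring the categories above.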

The potential benefits of the tool are significant. Regulators and end users can use it to quickly understand the data practices disclosed in a privacy policy and check whether they violate existing regulations, without having to read documents that are usually long and complicated. The tool also makes it possible to inspect a skill’s traceability both today and over a period of years. For developers, it is particularly valuable for compliance with privacy requirements: they can check whether they have adequately disclosed the personal data their skill requests and adjust their policies before going live.

With funding and support from the Information Commissioner’s Office (ICO), the SAIS team developed a web application to make SkillVet available to the public. To find out more about SkillVet and try it out, visit: skillvet.nms.kcl.ac.uk. Note that SkillVet can currently only conduct traceability analysis for English-language marketplaces.


Connect with us on LinkedIn and Twitter

Our podcast Always Listening

If you would like to share any comments or get in contact with the SAIS team, email: sais-comms@kcl.ac.uk

SAIS is a cross-disciplinary collaboration between the departments of Informatics, Digital Humanities and The Policy Institute at King’s College London, and the Department of Computing at Imperial College London, working with non-academic partners: Microsoft, Humley, Hospify, Mycroft, policy and regulation experts, and the general public, including non-technical users.