As automated driving systems reach higher levels of automation, they must be able to handle increasingly complex situations. For this reason, Artificial Intelligence (AI), especially in the form of machine learning, is increasingly being applied in vehicles. With increasing complexity of functionality comes increasing complexity of testing: how can we ensure that these safety-critical AI systems are safe enough for the public road? We aim to address the challenges of verification and validation of AI in automated vehicles at system level. We have identified three key areas of research that need to be addressed to bring AI-powered vehicles safely to the road: 1) quantification of the (dis)similarity between the operational design domain and the operational domain; 2) quantification of the correct behaviour of the system; 3) quantification of the performance of the system in the operational domain. In this presentation we will go deeper into these challenges and propose possible research directions to overcome them.
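To make the first research area concrete, one possible starting point is to compare feature distributions between the operational design domain and the operational domain. The sketch below is purely illustrative and not part of the proposed work: it histograms a single scalar feature (here, observed vehicle speed) in both domains and scores their dissimilarity with a symmetrised Kullback-Leibler divergence; all function names and parameters are hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions.

    A small epsilon avoids division by zero for empty histogram bins.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def domain_dissimilarity(odd_samples, field_samples, bins=20):
    """Histogram one scalar feature in both domains over a shared range
    and return a symmetrised KL divergence as a dissimilarity score."""
    lo = min(odd_samples.min(), field_samples.min())
    hi = max(odd_samples.max(), field_samples.max())
    p, _ = np.histogram(odd_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(field_samples, bins=bins, range=(lo, hi))
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

# Synthetic example: speeds assumed in the design domain vs. speeds
# actually encountered on the road (both distributions are made up).
rng = np.random.default_rng(0)
odd = rng.normal(50, 5, 10_000)
field = rng.normal(65, 8, 10_000)
print(domain_dissimilarity(odd, field))  # larger score = domains diverge more
print(domain_dissimilarity(odd, odd))    # near zero for identical domains
```

In practice a single scalar feature is far too coarse; a real measure would need to cover the joint distribution of many ODD attributes (weather, road type, traffic density, and so on), which is exactly the open research question raised above.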