Tell me what you see: A zero-shot action recognition method based on natural language descriptions [2112.09976]