Deception detection is critical in high-stakes settings like border control, airport security screenings, and criminal justice interviews. Traditional methods such as the polygraph (lie detector) and human intuition have notable limitations in accuracy and consistency. In the past decade, researchers and agencies in the USA and Europe have turned to artificial intelligence (AI) and machine learning (ML) to improve lie detection during interviews. These new systems analyze verbal cues (what is said) and non-verbal behaviors (how it’s said or shown) – from linguistic patterns and voice stress to facial micro-expressions, eye movements, and physiological signals.
Text-Based Deception Detection (Verbal Content Analysis)
One AI approach analyzes verbal content – the words and language structure used – to flag deceptive statements. By training ML models on examples of truthful vs. fabricated statements, researchers attempt to spot linguistic patterns of lying (for example, liars might use fewer sensory details, more negative emotion words, or more convoluted phrasing). An important example is VeriPol, an AI system developed in Spain to detect false police reports.
VeriPol (Spain): Developed by researchers at Universidad Complutense de Madrid and Universidad Carlos III de Madrid (in collaboration with a Spanish National Police officer), VeriPol was designed to automatically identify false crime reports – particularly fabricated robbery reports – by analyzing the text of written statements. The tool uses natural language processing (NLP) to extract features of “deceptive language” and a machine learning classifier to predict whether a report is truthful. In 2018, the Spanish National Police hailed VeriPol, claiming it could detect fake robbery claims with over 90% accuracy, and an academic paper detailing the system reported similarly high success on a test set of police statements. This led to some operational use: Spanish police briefly deployed VeriPol to screen incoming reports.
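To make the general pipeline concrete, the sketch below trains a toy bag-of-words classifier to separate “truthful” from “fabricated” statements. It is only an illustration of the NLP-plus-classifier approach described above; the statements, labels, features, and model are assumptions, not VeriPol’s actual design or data.

```python
# Minimal sketch of a text-based deception classifier. Illustrative only:
# the toy statements, labels, features, and model are assumptions,
# not VeriPol's published pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled corpus: 0 = truthful report, 1 = fabricated report.
statements = [
    "Two men grabbed my bag and ran towards the metro entrance; a witness shouted at them.",
    "I was walking home when someone pushed me and took the phone from my jacket pocket.",
    "My wallet was stolen somewhere, I cannot remember where, it just disappeared.",
    "They took everything, I cannot describe them, it all happened very fast.",
]
labels = [0, 0, 1, 1]

# Word/bigram TF-IDF features feeding a linear classifier: a common research
# baseline for spotting "deceptive language" patterns in text.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(statements, labels)

new_report = "Someone I cannot describe stole my phone, I do not remember the street."
print(model.predict_proba([new_report]))  # [P(truthful), P(fabricated)] on the toy model
```

A real system would need a large corpus of verified reports, careful feature engineering, and independent evaluation; the classifier only learns correlations in wording, not the truth of the underlying events.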
While the concept is appealing, the viability of this system was undermined by significant limitations. Most critically, the claimed accuracy has not been independently validated. A 2024 expert review identified serious concerns with the system, including a lack of transparency. These issues culminated in the Spanish National Police discontinuing its use, citing its lack of evidentiary value in judicial proceedings and failure to meet the scientific rigor required for courtroom admissibility.
In conclusion, while text-based AI tools like VeriPol hold promise as supplementary aids for investigative triage, their deployment in criminal justice must be approached with caution. Without rigorous independent validation, transparency, and demonstrable fairness across diverse populations and use cases, such tools are not yet viable for evidentiary or operational reliance in high-stakes settings.
Voice Stress Analysis and Speech-Based Systems
Analyzing a person’s voice for stress or other deception cues is another technique that predates modern AI, but recent systems claim to use AI to improve it. Voice Stress Analysis (VSA) programs listen to the audio of speech (either live or recorded) and measure subtle changes in pitch, tone, cadence, and micro-tremors that might indicate stress from lying. These have been marketed for security screenings, phone interviews, and law enforcement interrogations. Examples include the Computer Voice Stress Analyzer (CVSA) used by some U.S. police departments and Layered Voice Analysis (LVA) software. Newer research explores using machine learning on voice features for lie detection as well.
The underlying assumption is that speaking a lie will induce stress or cognitive load that involuntarily changes a person’s voice – for instance, a quiver in tone, a higher pitch, or altered speech rhythm. VSA tools process the audio signal to detect such anomalies. Some use proprietary algorithms said to detect “micro-tremors” in the 8–12 Hz range of speech, or changes in vocal stability. Modern AI could, in theory, identify more complex patterns across a person’s speech (for example, training a classifier on spectrograms of truthful vs. deceptive speech).
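As an illustration of what such a speech-based approach might look like, the sketch below extracts simple acoustic features (MFCC means and pitch statistics) from labeled clips and fits a classifier. Synthetic tones stand in for real recordings, the labels are invented, and nothing here validates the premise that these features track deception.

```python
# Illustrative sketch only: acoustic features plus a classifier on "truthful"
# vs. "deceptive" clips. Synthetic signals stand in for real recordings; the
# labels and feature choices are assumptions.
import numpy as np
import librosa
from sklearn.svm import SVC

SR = 16000

def acoustic_features(y: np.ndarray, sr: int = SR) -> np.ndarray:
    """Summarise a clip with MFCC means plus pitch mean/variability."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # spectral envelope summary
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)        # fundamental-frequency track
    return np.concatenate([mfcc.mean(axis=1), [f0.mean(), f0.std()]])

def fake_clip(pitch_hz: float) -> np.ndarray:
    """Stand-in for a real recording: a 2-second noisy tone at a fixed pitch."""
    t = np.linspace(0, 2.0, 2 * SR, endpoint=False)
    return np.sin(2 * np.pi * pitch_hz * t) + 0.01 * np.random.randn(t.size)

# Hypothetical dataset: 0 = "truthful" clips, 1 = "deceptive" clips (toy labels).
clips = [fake_clip(110), fake_clip(120), fake_clip(170), fake_clip(180)]
labels = [0, 0, 1, 1]

X = np.stack([acoustic_features(y) for y in clips])
clf = SVC(probability=True).fit(X, labels)
print(clf.predict_proba(X[:1]))  # per-class probabilities on one clip -- toy data only
```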
Despite widespread interest and experimental applications, the scientific foundation of VSA remains highly questionable. Independent studies, including a 2015 U.S. National Institute of Justice review, consistently show that VSA tools perform at chance levels, with accuracy hovering around 50%. Reviews by the Department of Defense and academic researchers have similarly concluded that VSA lacks empirical validity, with no credible evidence that these systems detect deception better than random guessing. This unreliability has led many agencies to quietly abandon VSA tools and contributed to their inadmissibility in court.
In summary, while VSA technologies are appealing in concept and simple to implement, their lack of scientific reliability, high error rates, and ethical concerns make them unsuitable for serious use in criminal justice, border security, or fraud investigations. At present, they should not be considered viable tools for deception detection in any context where accuracy, fairness, and legal integrity are paramount. As of 2025, no peer-reviewed study has demonstrated a voice-only AI system with high real-world accuracy.
Automated Border Screening Avatars (Multimodal Kiosks)
Perhaps the most futuristic development in deception detection is the use of automated questioning kiosks at borders and airports. These systems present a virtual avatar interviewer – often a computer-generated face on a screen – which asks travelers a series of questions, while multiple sensors covertly measure the person’s physiological and behavioral responses for signs of lying. Two prominent projects illustrate this approach: AVATAR in North America and iBorderCtrl in Europe. These systems combine multimodal non-verbal analysis (facial expressions, eye movements, voice changes, and more) to assess credibility in real time.
AVATAR – Automated Virtual Agent for Truth Assessments (U.S./Canada)
The AVATAR system was originally developed through U.S. Department of Homeland Security (DHS) research (University of Arizona, later SDSU) as a kiosk to screen travelers at border checkpoints. A person stands in front of the kiosk, which features an animated human-like avatar on a monitor. The avatar asks the traveler standard questions. When the traveler responds, several sensors monitor their behavior:
✱ A high-definition camera records facial expressions and micro-expressions.
✱ An infrared eye tracker and camera observe eye movements and pupil dilation. For example, gaze avoidance or rapid pupil changes under questioning might indicate stress.
✱ A microphone analyzes voice tone, pitch, and frequency for stress cues.
✱ Other sensors can include motion detectors or pressure mats to catch fidgeting or changes in posture.
All these inputs feed into an AI system (with machine learning and anomaly-detection algorithms) that looks for patterns consistent with deceptive behavior. AVATAR establishes a baseline by asking innocuous questions first, to gauge a traveler’s normal physiological levels. Then it looks for deviations when the person answers pointed questions. If the system’s risk assessment crosses a threshold, it flags the traveler for follow-up by human border agents. Travelers who “pass” can be waved through more quickly.
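A minimal sketch of that baseline-and-deviation logic is shown below. The feature set, z-score aggregation, and threshold are assumptions made for illustration; AVATAR’s actual algorithms are proprietary and more elaborate.

```python
# Sketch of the baseline-and-deviation logic described above (assumed mechanics,
# not AVATAR's proprietary algorithm). Features and threshold are invented.
import numpy as np

def flag_for_review(baseline: np.ndarray, relevant: np.ndarray, z_threshold: float = 2.5) -> bool:
    """
    baseline: responses to innocuous questions, shape (n_questions, n_features),
              e.g. features = [pupil_diameter, gaze_aversion_rate, voice_pitch]
    relevant: responses to pointed questions, same feature layout.
    Returns True if the mean deviation from the person's own baseline exceeds
    the (arbitrary) threshold, i.e. the traveler is routed to a human officer.
    """
    mu = baseline.mean(axis=0)
    sigma = baseline.std(axis=0) + 1e-6          # avoid division by zero
    z_scores = np.abs((relevant - mu) / sigma)   # per-question, per-feature deviation
    return float(z_scores.mean()) > z_threshold

# Toy example: three baseline questions, two pointed questions, three features.
baseline = np.array([[3.1, 0.10, 180.0], [3.0, 0.12, 178.0], [3.2, 0.11, 182.0]])
relevant = np.array([[3.9, 0.35, 205.0], [4.1, 0.40, 210.0]])
print(flag_for_review(baseline, relevant))  # True: large deviation from this person's baseline
```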
In controlled environments, AVATAR reportedly achieved 80–85% accuracy, particularly excelling in eye-tracking tasks that identified deceptive behavior such as gaze aversion. Field trials yielded more modest performance of 60–75% accuracy, which still surpassed average human deception detection (typically around 54%). However, practical limitations, including slow interview times and false positive rates, have hindered its operational adoption. At a 75% accuracy rate, one in four deceptive individuals may go undetected, and a significant number of truthful travelers risk being misclassified.
Crucially, the lack of transparency surrounding AVATAR’s proprietary algorithm, combined with its limited demographic training data, raises concerns about bias and fairness. For example, cultural variations in gaze behavior or expression could be misinterpreted as deceptive if the AI was not trained on a diverse population. The EU has already flagged such AI systems, and the EU AI Act (adopted in 2024) classifies them as “high-risk,” triggering stricter regulatory oversight.
AVATAR illustrates the potential for AI to support border screening through integrated behavioral analysis and structured, fatigue-free interviewing. Its performance appears to exceed that of human screeners in specific scenarios, particularly when used as a supplementary triage tool. However, the system’s limitations, especially regarding accuracy, speed, transparency, and fairness, currently prevent it from being a viable standalone solution in high-stakes border or immigration decisions.
iBorderCtrl – Virtual Border Agent (EU Pilot)
Inspired by projects like AVATAR, the EU funded iBorderCtrl, a prototype intelligent border control system tested in 2018–2019. iBorderCtrl included an automated deception detection module that used an avatar interviewer to spot “biomarkers of deceit” in travelers.
A traveler approaching the border (in Hungary, Latvia, or Greece, where pilots were run) would first interact with a virtual border guard on a computer. The avatar – customized to the person’s gender, ethnicity, and language for realism – asks a series of standard questions. As the traveler responds on camera, the system uses AI to analyze 38 micro-gestures on the person’s face. These micro-expressions are tiny, involuntary muscle movements (like brief eyebrow raises, lip protrusions, eye movements) that can indicate emotional states. The idea is to catch fleeting signs of stress, anxiety, or inconsistency that might betray a lie. The AI assigns each answer a credibility score.
Ultimately, anyone flagged as high risk is diverted to a human officer for a normal secondary inspection – the system did not have authority to prevent crossing on its own during the trial. In parallel, iBorderCtrl also pulled data like the traveler’s identity info, ran criminal database checks, and even examined the person’s online social media (Twitter) presence to further assess risk. All this information would be combined into a risk score for border guards.
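The sketch below illustrates the kind of score fusion this implies: weighting the interview output together with document, database, and open-source checks into a single referral score. The weights, field names, and linear form are invented for illustration and do not reflect iBorderCtrl’s actual design.

```python
# Hedged sketch of combining the avatar interview output with other checks
# into one risk score. Weights, fields, and the linear form are assumptions.
from dataclasses import dataclass

@dataclass
class TravelerChecks:
    interview_credibility: float   # 0.0 (deceptive) .. 1.0 (credible), from the avatar module
    document_check_passed: bool    # identity / document verification
    database_hit: bool             # match in a criminal or watchlist database
    open_source_flag: bool         # adverse finding in open-source / social media screening

def risk_score(c: TravelerChecks) -> float:
    """Return a 0..1 risk score; higher means refer to a human officer."""
    score = 0.0
    score += 0.4 * (1.0 - c.interview_credibility)
    score += 0.2 * (0.0 if c.document_check_passed else 1.0)
    score += 0.3 * (1.0 if c.database_hit else 0.0)
    score += 0.1 * (1.0 if c.open_source_flag else 0.0)
    return score

traveler = TravelerChecks(interview_credibility=0.55, document_check_passed=True,
                          database_hit=False, open_source_flag=False)
print(round(risk_score(traveler), 2))  # 0.18 -- would be compared against a referral threshold
```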
In a limited pilot with approximately 30 participants, iBorderCtrl achieved a reported accuracy of 76% in detecting deception. However, this figure stemmed from a highly controlled environment involving mock scenarios, where participants were instructed to lie. Researchers and experts—including affective computing professor Maja Pantic—have cautioned that such accuracy is unlikely to hold in real-world contexts, where the psychological stakes of lying are significantly higher and more complex. Even if accuracy improved to the project’s hoped-for 85%, the vast scale of border crossings (over 700 million annually in the EU) means false positives would affect large numbers of innocent travelers, potentially leading to unnecessary interrogations and rights infringements.
Following its six-month pilot, iBorderCtrl was not adopted for operational use and its funding concluded in 2020. Investigative testing, such as that conducted by The Intercept, demonstrated the system’s weaknesses – most notably, a false positive result where a truthful traveler was incorrectly flagged as deceptive. No further deployments have occurred, and the system remains a research prototype.
Most notably, the scientific foundation for iBorderCtrl was shaky. The assumption that deception can be reliably detected through facial micro-expressions lacks robust empirical support. Stress or nervousness at borders can stem from many benign factors, and the small, demographically imbalanced sample used in the pilot (with no Black participants and few from other underrepresented groups) undermines confidence in its generalizability. Experts warned that the system might reflect more about societal hopes for technology than about actual scientific feasibility.
In conclusion, iBorderCtrl illustrates both the technological ambition and the ethical hazards of deploying AI-based deception detection in border control. While the idea of streamlining screening through automation is compelling, the system’s limited scientific validity, high error rates, susceptibility to bias, and substantial legal and ethical issues render it currently unviable for real-world implementation. Rather than a breakthrough, it has become a cautionary tale in the debate over AI in public security—a reminder that innovation must not outpace evidence, rights protections, or public trust. Any future version would require rigorous validation, independent oversight, and strong safeguards to even be considered for operational use.
Eye-Tracking and Pupillometry: Converus EyeDetect
Another recent AI deception detection tool focuses on the eyes – measuring eye movement patterns and pupil responses under questioning. The leading product in this category is EyeDetect, developed by the U.S. company Converus (based in Utah). EyeDetect is often described as a “next-generation polygraph” that is fully automated and computer-based, using an optical sensor to monitor the eyes while the subject answers a series of questions on a screen.
In an EyeDetect test, the subject sits in front of a high-speed eye-tracking camera and reads true/false or yes/no questions displayed on a computer. They answer by pressing keys. The test typically lasts 30 minutes and includes a mix of relevant questions (about the matter under investigation) and control questions. The system measures a variety of ocular metrics: pupil dilation, blink rate, gaze fixation, reading time per question, and other micro-behaviors of the eyes and reading patterns. The theory is that lying causes measurable cognitive load and arousal that manifest in the eyes – for example, pupils may dilate under the stress of formulating a deception, or reading times might slow down due to increased processing. EyeDetect’s AI algorithm crunches these features to produce a credibility score at the end of the test.
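As a rough illustration of how such ocular metrics could be turned into a score, the sketch below fits a simple classifier on relevant-minus-control differences in a few eye measures. The features, data, and model are assumptions for illustration; Converus has not published EyeDetect’s actual algorithm.

```python
# Illustrative sketch of scoring ocular metrics into a credibility score.
# Features, data, and model are assumptions, not EyeDetect's algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per completed test. Columns are
# relevant-minus-control differences for each ocular metric:
# [pupil dilation delta (mm), blink rate delta (blinks/min),
#  reading time delta (s), fixation count delta]
X_train = np.array([
    [0.05, -1.0, 0.1, 0.0],   # examinees later verified as truthful
    [0.02,  0.5, 0.0, 1.0],
    [0.35,  4.0, 0.9, 5.0],   # examinees later verified as deceptive
    [0.28,  3.5, 1.2, 4.0],
])
y_train = np.array([0, 0, 1, 1])  # 0 = credible, 1 = deceptive (toy labels)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

new_test = np.array([[0.30, 3.0, 1.0, 4.0]])
credibility = 1.0 - model.predict_proba(new_test)[0, 1]  # convert to a 0..1 credibility score
print(round(credibility, 2))
```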
Importantly, EyeDetect’s mechanism has roots in the same physiological theory as the polygraph: that deception causes stress and cognitive effort which can be detected via involuntary physiological changes. But instead of measuring pulse, respiration, and skin sweat (as a polygraph does), EyeDetect measures the eyes. It is non-invasive, with no attachments, just a camera, and the subject does not even need to speak aloud, which removes vocal variation and the need for an examiner to ask questions. The test is fully automated: the computer administers the questions and the scoring.
In practice, EyeDetect has been adopted by several U.S. police departments for vetting officer candidates and conducting internal investigations. It has also been piloted in the UK for monitoring sex offenders on probation and used by private companies in Latin America for employee screening related to criminal background or fraud. Though it has not been approved for widespread use by U.S. federal authorities, a 2018 New Mexico court did allow its results as supportive evidence—a rare legal acknowledgment, albeit on behalf of a defendant. Despite this limited operational presence, EyeDetect remains controversial and far from universally accepted.
Converus claims that EyeDetect achieves 86–90% accuracy, but these figures derive solely from internal or affiliated studies. The Policing Project at NYU Law noted a lack of independent, peer-reviewed validation, and even in the company’s own research, accuracy varied significantly depending on test conditions and populations. The algorithm’s adjustable sensitivity threshold allows users to prioritize either false positives or false negatives, raising concerns about result consistency and scientific robustness. Some internal reports, including those referenced by WIRED, suggest significant variability, warranting caution when interpreting claimed performance.
In controlled settings, EyeDetect may outperform random chance and offer faster, objective assessments compared to traditional polygraphs. The system is non-invasive, quick to administer (around 30 minutes), and does not require a trained examiner, potentially reducing human bias and scaling more easily in large institutions. Furthermore, its physiological focus on involuntary eye behavior may make it harder to consciously manipulate—at least in theory.
In conclusion, EyeDetect represents a technologically advanced attempt to modernize lie detection through AI and physiological analysis. Its speed, automation, and apparent objectivity make it appealing for organizations seeking alternatives to polygraphs. However, its viability is undermined by the absence of independent scientific validation, susceptibility to error, and significant legal and ethical concerns. At present, EyeDetect is better viewed as an adjunct tool – perhaps useful for internal risk assessments – but it falls short of the evidentiary standards and reliability required for high-stakes decisions in criminal justice or immigration contexts. Without greater transparency, rigorous external validation, and legal clarity, its role should remain limited and carefully scrutinized.
Currently, no AI system achieves anywhere near 100% accuracy. Most cluster in the 70–85% range under ideal conditions (and lower in realistic conditions). Polygraphs themselves only hover around 60–70%, despite decades of use. Unaided humans are around 54–60%. So, while AI systems can claim to beat a coin flip and even beat average humans, they still make plenty of errors. A crucial point is scalability: even a 90% accurate system will flag huge numbers of innocent people if used on millions of individuals, due to the base-rate problem. This comparative view underscores why caution and rigorous testing are needed before deploying these systems in high-stakes environments.
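The base-rate arithmetic is easy to make concrete. Under illustrative assumptions (10 million screenings, 0.1% of travelers actually deceptive, 90% sensitivity and 90% specificity), the sketch below shows that fewer than 1% of flagged travelers would actually be deceptive.

```python
# Worked example of the base-rate problem. All numbers are illustrative assumptions.
travelers = 10_000_000
base_rate = 0.001            # fraction of travelers who are actually deceptive
sensitivity = 0.90           # P(flagged | deceptive)
specificity = 0.90           # P(not flagged | truthful)

deceptive = travelers * base_rate
truthful = travelers - deceptive

true_positives = sensitivity * deceptive            # 9,000 deceptive travelers caught
false_positives = (1 - specificity) * truthful      # 999,000 truthful travelers flagged

precision = true_positives / (true_positives + false_positives)
print(f"Flagged travelers: {true_positives + false_positives:,.0f}")
print(f"Share of flagged travelers who are actually deceptive: {precision:.1%}")
```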
Admissibility and Legal Standards of Evidence: As of now, most courts in the USA and Europe do not admit results from polygraphs due to questions about scientific validity. The same skepticism would apply, if not more so, to AI lie detectors. There have been isolated instances (the EyeDetect admission in one case, mentioned earlier), but those are anomalies. Generally, to be admissible, a technique must pass standards like Daubert (in U.S. federal courts), which require that it be based on sound science with known error rates. AI deception detection would likely fail this standard at present, given the lack of independent verification and non-negligible error rates. Even if not used in court, these tools can influence pre-trial and investigative stages – raising due process issues. For instance, if police decide to arrest or charge someone mainly because an AI system flagged them as deceptive, does that meet the standard of probable cause or reasonable suspicion? Defense attorneys would likely challenge such uses, and rightly so, because an algorithm’s “opinion” is not evidence of a crime. Moreover, proprietary algorithms raise disclosure issues: a defendant has the right to examine the evidence against them. If that evidence is an AI’s output, the defense might demand access to the algorithm or training data to challenge it. Companies typically refuse (citing trade secrets), which has already been contentious with tools like DNA analysis software and predictive algorithms in sentencing. This clash between trade secrecy and the right to a fair trial is a major legal obstacle. Some jurisdictions might outright ban the use of such AI outputs in criminal matters until these issues are resolved.
Misuse and Overtrust: There is a danger that security officers or investigators might over-rely on these tools. The study Lie Detection Algorithms Disrupt the Social Dynamics of Accusation Behavior (von Schenk et al., 2024) highlights how the presence of algorithmic lie detection influences human judgment, particularly our inclination to accuse others of deception. At its core, the research reveals a significant psychological shift: when people actively choose to consult a lie-detection algorithm, they overwhelmingly defer to its judgment, especially when it suggests that someone is lying. Notably, individuals who would not otherwise make accusations become substantially more likely to do so when supported by algorithmic predictions. This suggests a tendency to offload moral and cognitive responsibility onto machines, thereby reducing personal accountability. The findings indicate that humans may default to machine judgment not necessarily because they are more prone to accuse, but because the machine’s presence lowers the perceived cost or social risk of making a false accusation. This deference is especially pronounced when individuals initiate the request for the algorithm’s opinion, reflecting a possible overreliance on technological authority in assessing veracity, even in morally charged situations. The Policing Project piece mused about how an officer might misuse test results to pressure a suspect – if they treat an 80% probability as certainty, it could lead to false confessions. This reliance could also cause complacency: agents might reduce their own vigilance, thinking the machine will catch liars, which could actually diminish overall effectiveness (especially if the AI is missing some liars).
In conclusion, AI and ML systems for deception detection are a frontier of security technology that is advancing rapidly but remains fraught with uncertainty. In the USA and Europe over the past decade, we’ve seen exciting pilots and products – from lie-detecting kiosks at borders to eye-tracking truth exams and emotion-reading avatars. Yet, alongside the excitement is a chorus of caution from researchers, legal experts, and ethicists. No system is infallible, and in life-and-death or liberty-at-stake situations, an unreliable lie detector (even if AI-powered) can do more harm than good. The trend for now is to use these systems as adjuncts to human judgment rather than replacements, and to conduct further validation. Ultimately, whether it’s in an airport or a courtroom, any use of AI for lie detection must grapple with a fundamental question: do the gains in security or efficiency outweigh the risks to truth, justice, and individual rights? As of 2025, that balance is still very much under evaluation, with most authorities leaning on the side of caution given the current state of the technology.