
Why do most enterprise AI projects fail through excess complexity?

By Laurent Perello, founder of Perello Consulting -- web pioneer for over 25 years, AI operator in production since 2024. Last updated: 13 April 2026.

Seven AI projects out of ten do not reach the promised return on investment. Public reports converge: Gartner, RAND Corporation, MIT Sloan, BCG, Cour des comptes, Bpifrance Le Lab. The primary cause is almost never technical. You find it in the scoping: over-engineering of the first perimeter, mixing data and AI workstreams, heavy governance before any proof, absence of quantitative baseline, progressive scope creep. In this article, you will find five documented root causes, four KISS principles applied to AI, three dated composite cases, and a seven-point decision tree. Simplicity is not a modest retreat. It is the only strategy that delivers, with dates to prove it. We refer to the previous articles: hours costing, first process selection, risks and governance, solution classes.

The hard numbers on AI projects that fail

The figure that should open every AI committee is a failure rate between 70 and 85% according to public studies [1][2][3]. You will not find a serious report that contradicts this order of magnitude. RAND Corporation, in its analysis of root causes of AI project failure, places the failure rate above 80%, roughly twice that of traditional IT projects [2]. This is not a statistical accident; it is an industrial pattern that has repeated for five years.

Gartner has documented for several cycles that the majority of AI initiatives never pass the large-scale production stage [1]. MIT Sloan Management Review, in its work on the state of AI in business, observes a success gap of a factor of three between organisations that scope simply and those that scope ambitiously from the start [3]. BCG, in its Build for the Future programme, reports that fewer than three in ten executives report a measured ROI on their AI initiatives [4]. McKinsey converges in The State of AI, where the majority of respondents report deployments that stagnate at the pilot stage [5].

The French perspective sharpens the picture. The Cour des comptes, in its reports on digital transformation of the State, documents systematic deadline and budget overruns on modernisation programmes that include AI [6]. Bpifrance Le Lab, in its SME and digital surveys, observes that a significant share of AI POCs launched by SMEs are never industrialised at 18 months [7]. France Strategie, in its work on AI and work, reminds us that the gap between experimentation and measurable value remains the central fault line of French adoption [8].

[UNIQUE INSIGHT] You must here distinguish two types of failure that are often conflated. First type: the project that delivers nothing to production, stopped before switchover or abandoned after pilot. Second type: the project that delivers but significantly less than promised, where nobody can say whether it has paid back. The first is visible and noisy. The second, silent and frequent, is the one that erodes AI credibility for twelve to twenty-four months in the organisation. Both share the same root: excess complexity set at scoping, which the principles below allow you to avoid.

The five root causes of failure through excess complexity

Five causes recur in the engagements we have conducted between 2024 and 2026, and in the public analyses of RAND, MIT Sloan and Cour des comptes. They are not independent: they often chain in pairs. Each is described below with its observable symptoms, typical cost, anonymised composite example and the mitigation we apply in engagements.

Over-engineering from the POC

You will recognise this trap when the first project version is designed as a finished product rather than a proof. The team builds a multi-tool custom agent, a full LangGraph architecture, a vector RAG layer, a dedicated front-end, when a well-framed advanced prompt (Class 1 from article 4) or a five-step Make workflow (Class 2) would suffice to validate the business hypothesis. Your team confuses proof of value with technical demonstration. You lose three to six months before confirming that your target process deserves to be automated.

Composite example. Consulting SME, 30 people. Stated ambition: autonomous agent to produce commercial proposals. Initial choice: multi-tool LangGraph agent with vector store and in-house interface. Three months later, EUR 38k spent, the agent holds on three demo cases and fails on real proposals. Reason: commercial quality criteria had never been formalised. A versioned advanced prompt, a Notion template and a quality referent would have proven the need in two weeks.

Symptoms are easy to read. The POC exceeds six weeks without usable output. The team talks architecture before talking process. No degraded but functional version is deliverable at week two. Success criteria are not measurable. Typical cost: three to six months lost, EUR 30 to 80k consumed without production, and a credibility loss you pay for durably with your committee. Anthropic, in Building effective agents, formulates the inverse rule: start with the simplest solution, increase complexity only after proof [9]. Our mitigation: mandatory stage gate at D+14, no Class 3 or 4 until Class 1 or 2 has been proven.

Poorly scoped data ambition, double workstream

You will recognise the trap when an AI project is coupled with a data consolidation effort. The typical kickoff phrase: "we'll take the opportunity to clean up the CRM and install a data warehouse." Two workstreams merge into one, and their risks multiply instead of merely adding up. Your AI project waits for the data, the data waits for resources, your budget is consumed on consolidation. At twelve months, neither the data is consolidated nor the AI produces. You have the most expensive way to deliver nothing.

Composite example. Industrial mid-cap, 180 people. Ambition: product quality predictive scoring. Prerequisite set: complete data lake overhaul (18 months, EUR 400k) before any AI deployment. At fourteen months, data lake incomplete, no model trained, executive stops the programme. A simple version -- scoring on the three critical sensors already clean, a Mistral API and Metabase visualisation -- could have run in six weeks for EUR 15k. You would have kept the data overhaul as a separate workstream, with its own ROI, its own timeline, its own arbitrage.

Symptoms: the AI timeline depends on an unsecured data timeline. Data resources are tied up at over 60% on the consolidation workstream. No AI version can run on current data. You can no longer say which workstream should deliver first. Typical cost: twelve to twenty-four months of drift, EUR 150 to 500k consumed, zero AI production, and a double debt. The Cour des comptes documents this pattern [6], France Strategie recalls it from the macroeconomic angle [8], and BCG and McKinsey confirm it in their international studies [4][5]. Our mitigation: decouple, always. The AI POC runs on data as they are; what blocks is documented, and the AI perimeter is tightened to what works.

Heavy upfront governance

You will recognise the trap when an AI committee is formed before any POC, a fifty-page specification drafted, a legal review launched, an ethical charter validated by three bodies. Six months pass without a single line running in production. Governance requirements, legitimate in production and documented in article 3, are applied at a stage where they kill learning. Governance and experimentation do not deploy at the same pace. Governance is a virtue at maturity and a burden in the learning phase.

Composite example. Services group, 400 people. Scoping started in January. AI committee (eight people), specification (sixty-two pages), external CNIL review, ethical charter, ad hoc ethics committee. First POC launched in August, first deliverable in November. Eleven months between decision and first output. An iterative version would have delivered four use cases in the same period, with a Class 1 POC from February and governance hardened progressively.

Symptoms: the specification exceeds thirty pages without a single costed use case. The steering committee meets more often than the build team. No POC has run at three months. Business teams are not invited to committees. Typical cost: six to twelve months of ignition delay, EUR 80 to 200k in internal time and services, business team demobilisation. The CNIL has formalised a lighter path through its AI sandbox, whose spirit is governance accompanied by real experimentation [10]. ANSSI recommends proportionate risk analysis [11], AI Act articles 9 and 14 impose human supervision that does not require pre-launch documentary weight [12]. Our mitigation: one-page usage policy, one-page risk grid, hardening at POC-to-pilot transition, completeness at pilot-to-production transition.

Absence of quantitative baseline

You will recognise the trap when the project launches without timing the current state. No stopwatch on the target process, no volume, no reference error rate. At the end, you cannot prove a gain. You declare a feeling ("it saves us time"), your CFO contests, your executive committee arbitrates without data. The project, even technically successful, is classified "unverifiable subjective gain" and is not generalised. This is the most discreet and most frequent failure. You avoid it with three weeks of measurement, and you reap the benefits for years.

Composite example. Services SME, 22 people. Make workflow deployed to automate meeting notes. Technically impeccable. No prior timing on the time actually spent on notes. At six months, executive committee question: "how much did you save?" Answer: "hard to say." Generalisation stopped. The same project with a baseline -- four hours per week measured over three weeks before POC -- would have proven 2.8 hours saved per week and obtained extension to three other processes.

Symptoms: no initial-state figure collected before the POC. Success criteria are qualitative ("improve", "facilitate"). The business referent cannot answer "how long did this task take before?" No simple dashboard is in place. Typical cost: project delivered without demonstrable value, real gain not capitalised, extension blocked, AI credibility eroded for the following twelve months. France Num with the AI Self-Diagnostic [13] and Bpifrance with the Diag Data IA [14] formalise upstream measurement in their public tools. Our mitigation: mandatory baseline, three weeks of timed measurement before POC. Without baseline, no POC. The full method is in article 1 and article 2.

Progressive scope creep

You will recognise the trap when the initial perimeter is precise, then requests pile up month after month. "What if we also...", "while we're at it, we could...", "the committee asked to extend to...". No stage gate slows things down. Your budget doubles, your timeline slips, the planned architecture no longer holds, everything is rebuilt. At eighteen months, the initial project is unrecognisable and its cost has tripled. Nobody remembers the starting perimeter, nor the success criteria that went with it.

Composite example. Audit firm, 45 people. Initial project: automated qualification of incoming files, six weeks, EUR 12k. At three months, add summary generation. At six months, add multi-team routing. At nine months, add risk scoring. At twelve months, complete rebuild to absorb the whole. Final budget: EUR 78k. Final timeline: fourteen months. The initial perimeter finally works, the added modules are behind, and the team can no longer articulate the value proposition.

Symptoms: the perimeter written at kickoff is no longer recognisable at three months. Each committee adds a request without removing one. No formalised definition of done. The backlog grows faster than production. Typical cost: budget multiplied by two to five, timeline by two to three, architecture rebuilt one to two times. PMI documents this pattern as the primary cause of overruns in transformation programmes [15]. Our mitigation: systematic stage gates -- POC (2 weeks), pilot (6 weeks), generalisation (3 months) -- each with a written passage criterion. Any new request joins a v2 backlog. Never the current perimeter.

The KISS principle applied to AI

KISS stands for Keep It Simple, Stupid. The principle was formulated by Kelly Johnson, chief engineer of Lockheed Skunk Works, in the 1960s, about aircraft that needed to be repairable in the field with a standard toolbox [16]. John Gall, in Systemantics (1975), gives the theoretical version: "A complex system that works is invariably found to have evolved from a simple system that worked" [17]. Eric Ries gives the operational version in Lean Startup [18]. Anthropic applies it explicitly to AI in Building effective agents [9]. Four principles follow for enterprise AI.

Principle 1: a workflow running today beats a system that promises tomorrow

A simple version in production produces real data, user feedback, a learning curve. An ambitious version under construction produces slides. At equal expected value, you systematically prefer what runs. [PERSONAL EXPERIENCE] We put a diary synthesis workflow into production on 12 March 2026 in Class 1 (advanced prompt plus Notion template). Four weeks later, twelve editions produced, format stabilised through iteration. The initially envisioned custom agent version was estimated at six weeks of construction. You gain six weeks, and above all a corpus of real output that then enabled writing a better-specified agent.

Principle 2: measure before automating

No automation starts without a timed baseline. Three weeks minimum of measurement on the target process. Without this foundation, the automation can neither prove its value, nor be extended, nor be stopped with lucidity. [ORIGINAL DATA] Before automating our competitive monitoring on 20 February 2026, we measured manual collection for 21 days: 6.4 hours per week on average, of which 70% raw collection and 30% synthesis. After deploying a Class 2 workflow, measurement showed 1.8 hours per week, a net gain of 4.6 hours. Without a baseline, this figure would not exist, and the project could not have been replicated across three other monitoring tasks. The baseline method is detailed in article 1.
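The before/after comparison above is simple arithmetic, but writing it down once avoids "hard to say" answers later. A minimal sketch, assuming one timed figure per measured week; the function name and the exact weekly figures are illustrative, not from the article:

```python
from statistics import mean

def weekly_gain(baseline_hours: list[float], post_hours: list[float]) -> dict:
    """Compare timed weekly measurements taken before and after automation.

    Each list holds one timed figure per measured week (hours spent on the
    target process). Returns the two averages and the net weekly gain.
    """
    before = mean(baseline_hours)
    after = mean(post_hours)
    return {
        "baseline_h_per_week": round(before, 1),
        "post_h_per_week": round(after, 1),
        "net_gain_h_per_week": round(before - after, 1),
    }

# Hypothetical weekly figures consistent with the monitoring example:
# three weeks averaging 6.4 h before, three weeks averaging 1.8 h after.
result = weekly_gain([6.5, 6.3, 6.4], [1.9, 1.7, 1.8])
print(result)  # net gain: 4.6 h per week
```

The point is not the code; it is that the gain figure exists only because the baseline weeks were timed before the workflow went live.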

Principle 3: one priority use case, not three in parallel

Launching three cases in parallel divides attention, dilutes learning, multiplies dependencies. One priority case taken to generalisation produces the mental and technical infrastructure that accelerates the next ones. Sequencing is not slowing down; it is capitalising. MIT Sloan documents the correlation between AI portfolio concentration and success rate [3]. McKinsey confirms it in The State of AI [5]. [PERSONAL EXPERIENCE] Our March 2026 plan called for three internal workflows in parallel. The 8 March arbitrage retained a single case (video transcription) until generalisation. Duration: five weeks. The two subsequent cases (competitive monitoring, diary synthesis) started in week six with reused architecture, 40% faster construction.

Principle 4: iterate on simple before industrialising on complex

Each higher class from article 4 assumes the previous one has been proven. Custom agent after direct API, after no-code, after advanced prompt. Skipping a class means taking on invisible debt that will be paid within twelve months. Anthropic writes it plainly: "start with the simplest solution, increase complexity only when the need requires it" [9]. [ORIGINAL DATA] Our sub-agent brief generation agent followed the Class 1, then Class 2, then Class 3 path, spread over four months. No class skipped. Each step revealed constraints that the next integrated by design. Cumulative cost: EUR 8k. A project started directly in Class 4 would have cost EUR 35k and would still be under construction.

Three dated cases: what fails, what holds

Three composite cases illustrate the causes above in real configurations. All are anonymised composites: no client is named, figures are aggregated from several engagements to preserve confidentiality while maintaining accuracy of orders of magnitude.

Case 1, data overhaul plus AI in six months, failure

Composite anonymised case, no client named. Industrial SME, 40 people, EUR 7M revenue. Ambition stated March 2025: "become data-driven and AI-ready in one year." The approach merges four workstreams into a single programme: data warehouse overhaul, CRM consolidation, predictive maintenance model, commercial AI assistant. External integrator, initial budget EUR 180k over 12 months, biweekly steering committee.

The trajectory follows a classic curve. Months 1-3, specification, data specs, POCs envisioned but not started because data is not ready. Months 4-6, data migration underway, first predictive model delayed: sensor history quality insufficient. Months 7-9, 35% budget overrun, commercial assistant abandoned. Months 10-12, programme stopped. Data warehouse partially migrated, zero models in production.

Financial outcome: EUR 120k spent, zero AI deliverables in production, exhausted team, executive who defers AI by eighteen months. Dominant root cause is cause 2 (poorly scoped data ambition); secondary is cause 1 (over-engineering from POC). The lesson: the workstreams should have been decoupled. A twelve-month data workstream, a six-week AI POC on available data. The order is not anecdotal; it is structural.

Case 2, ambitious custom agent for lead qualification, failure

Composite anonymised case, no client named. B2B services SME, 25 people, EUR 3.2M revenue. Ambition stated September 2025: an AI agent that autonomously qualifies inbound leads, enriches, scores, routes and responds first-line. Direct choice of Class 4 (custom agent) via contractor. Budget EUR 45k, stated timeline three months. No prior POC in Class 1 or 2. No existing human qualification baseline.

Trajectory: month 1, scoping and LangGraph architecture plus a vector store of commercial sheets. Months 2-3, build; initial tests on fifty historical leads, 68% accuracy. Month 4, live tests, 62% accuracy versus 87% in human qualification by SDRs. The executive identifies in parallel a competing SaaS (HubSpot Breeze) achieving 78% turnkey at EUR 120/month. Project stopped, switch to Class 5 (vertical SaaS).

Outcome: EUR 45k spent, four months lost, a EUR 120/month SaaS ultimately replaces the function at 78% accuracy. A Class 2 (Make plus LLM) would have exceeded 80% in three weeks for EUR 4k; it would above all have confirmed that the need did not justify a custom agent. Dominant root cause: cause 1 (over-engineering from POC) combined with KISS principle 4 violation (classes skipped). You prove here that no custom agent should be built without prior proof in Class 1 or 2, and without SaaS comparison.

Case 3, simple workflow in two weeks, success

Composite anonymised case, no client named. B2B services SME, 15 people, EUR 1.8M revenue. Need identified December 2025: time-consuming production of client meeting notes. Baseline measured over three prior weeks: 6.4 hours per week spent on the activity. Approach: Class 2 (no-code). Zapier plus Claude Sonnet plus Notion template workflow. Budget EUR 1,800 build, EUR 35/month operations. Timeline: two weeks.

Trajectory: week 1, scoping, template, prompt, double-run (human notes and AI notes produced in parallel). Week 2, prompt calibration, team validation, production launch. Weeks 3-6, observation, measurement: 22 hours per month freed across four consultants, i.e. 5.5 hours per consultant. Month 3: generalisation to three other processes (audit notes, discovery call synthesis, weekly internal reports). Workflow infrastructure reused at 70%.

Outcome at six months: 140 hours per month saved, equivalent to 0.9 FTE. Total cumulative cost EUR 2,100. ROI above 30x in year one. No rebuild, no technical debt. Root cause of success: KISS 1, 2, 3 and 4 applied strictly. Baseline measured, one case at a time, Class 2 before any higher class. You have here the operational demonstration that a successful AI project looks much more like case 3 than cases 1 and 2.
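The ROI multiple in case 3 is reproducible arithmetic once you pick a loaded hourly cost. A minimal sketch; the EUR 40/h figure is a hypothetical assumption (the article does not state the firm's hourly cost), and the function name is illustrative:

```python
def first_year_roi(hours_saved_per_month: float,
                   loaded_hourly_cost_eur: float,
                   build_cost_eur: float,
                   monthly_run_cost_eur: float) -> float:
    """Rough first-year ROI multiple: value of hours freed / total cost."""
    value = hours_saved_per_month * 12 * loaded_hourly_cost_eur
    cost = build_cost_eur + 12 * monthly_run_cost_eur
    return value / cost

# Case 3 figures (140 h/month, EUR 1,800 build, EUR 35/month run),
# with a HYPOTHETICAL loaded cost of EUR 40 per consultant hour.
roi = first_year_roi(140, 40.0, 1800, 35)
print(f"ROI ~ {roi:.0f}x")
```

At that assumed hourly cost the multiple lands around 30x, consistent with the order of magnitude stated above; a CFO can contest the hourly cost, but not the formula.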

The "start simple" decision tree in seven principles

Seven scoping principles form a decision tree. You answer yes to each before launching. A single "no" requires tightening the perimeter before any budget commitment. The tree is short, documented on one page, archived with the decision. It replaces neither the full methodology nor the audit work; it constitutes their systematic entry point.

Principle 1: the POC delivers value within two weeks maximum

You require a measurable output at D+14. If the proposed architecture does not allow it, the chosen solution class is too high, or the perimeter is too broad. In either case, you tighten before signing. 80% of SME use cases fit in two weeks provided you start from Class 1 or 2.

Principle 2: no data overhaul upstream of the POC

You run the POC on data as they are. If data quality prevents any output, you document precisely what blocks, then tighten the AI perimeter to what works. The data overhaul remains a separate workstream, with its own ROI and its own timeline. You never merge the two.

Principle 3: the solution class is the simplest that does the job

You choose the lightest class that covers the need at twelve months, not three. You only move to a higher class after proving the previous one, or after excluding it with a written argument. The full class decision tree is in article 4.

Principle 4: systematic stage gates -- POC, pilot, generalisation

You formalise three steps: POC (2 weeks), pilot (6 weeks), generalisation (3 months). Each step has a written passage criterion and a binary decision: continue, stop, pivot. Without a stage gate, scope creep (cause 5) is guaranteed. A stage gate is not a constraint; it is project life insurance.

Principle 5: mandatory quantitative baseline before POC

You measure a minimum of three weeks before any launch: monthly volume, time per unit, error rate, current cost. Without baseline, no POC. Non-negotiable rule. The detailed measurement protocol is in article 1 and process target selection in article 2.

Principle 6: one process at a time

You take one case to generalisation before launching another. Three parallel cases yield three probable failures. One priority case produces reusable infrastructure that accelerates the next by 40%. In an SME under 50 people, the sustainable pace is one case every two months once the initial phase is past.

Principle 7: human supervision from day 1

You deploy the usage log, business referent and weekly review at POC production, not six months later. AI Act article 14 mandates this supervision for any system with material effect [12], and article 3 documents the operational setup. You thereby protect the project, the team and the organisation.
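The seven principles above fit on one page, and the gate logic is binary: a single "no" blocks the budget commitment. A minimal sketch of that one-page gate, assuming a yes/no answer per principle; the `gate` helper is hypothetical, not a tool from this article:

```python
# The seven questions are the article's principles; the checking logic
# is an illustrative sketch of the "one no = tighten scope" rule.
CHECKLIST = [
    "POC delivers a measurable output by D+14",
    "No data overhaul is a prerequisite of the POC",
    "Chosen solution class is the simplest that does the job",
    "Stage gates are written down (POC / pilot / generalisation)",
    "A quantitative baseline exists (3 weeks of timed measurement)",
    "Only one process is in scope",
    "Human supervision is planned from day 1",
]

def gate(answers: dict[str, bool]) -> str:
    """A single 'no' blocks the budget commitment: tighten the perimeter."""
    missing = [q for q in CHECKLIST if not answers.get(q, False)]
    return "GO" if not missing else f"TIGHTEN SCOPE: {len(missing)} unmet"

print(gate({q: True for q in CHECKLIST}))      # GO
print(gate({q: True for q in CHECKLIST[:6]}))  # one principle unmet
```

The value of writing the gate down is that the answer is archived with the decision: six months later, nobody can claim the perimeter was never agreed.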

What simplicity actually changes

Simplicity does not sell as well as complexity in committee. It does, however, produce what complexity promises without ever delivering. You gain four concrete things by applying the tree above. First gain: acceleration. A two-week POC delivers ten times faster than a six-month POC, and you learn ten times more. You iterate on reality, not on PowerPoint.

Second gain: learning. A Class 1 or 2 workflow exposes business constraints you would not see before touching the process. You discover what actually blocks, what formalises, what the team accepts. This learning becomes the project's most valuable asset: it conditions the quality of everything you build next, including a potential switch to a higher class.

Third gain: internal trust capital. A first case delivered in six weeks, measured, generalised, creates legitimacy to launch the next. A first case that drifts to nine months without a deliverable creates the opposite: a diffuse mistrust that will block your next three projects. AI credibility in the organisation is decided on the first case, not the most ambitious one.

Fourth gain: incremental accumulation. Three Class 2 workflows deployed in nine months often save more than one monumental custom agent that arrives in eighteen months (if it arrives). Complexity produces work. Simplicity produces measurable value that compounds month after month. You are not choosing between ambition and simplicity. You are choosing between talkative ambition and lucid ambition.

Frequently asked questions

Why do AI projects fail more than traditional IT projects?

The AI failure rate is approximately twice as high according to RAND Corporation [2]. Three structural reasons. Value is not proven upfront: AI seduces before being demonstrated. Results depend on data and prompt quality; they drift more than deterministic software. Teams confuse proof of value with technical demonstration. A traditional IT project automates an already-mapped process; an AI project often attempts to solve a still-fuzzy problem. The countermeasure is methodological: measured baseline, POC in two weeks, stage gates at each phase.

Is a two-week POC realistic?

Yes, for 80% of SME use cases, provided you start from the simplest class detailed in article 4. An advanced prompt (Class 1) is scoped and deployed in five days. A five-to-ten-step no-code workflow (Class 2) fits in ten business days. Cases that do not fit in two weeks are those starting in Class 3 or 4 through excess ambition, or coupling a data overhaul. You tighten the perimeter until D+14 output becomes possible. If it remains impossible, you do not have a POC; you have a programme that needs reframing.

Should I consolidate data before deploying AI?

No, you must decouple. The AI POC runs on data as they are. If quality blocks certain outputs, you document precisely what blocks, then tighten the AI perimeter to what works. Data consolidation is a separate workstream with its own ROI. Merging the two is the most expensive way to deliver nothing, as documented in cause 2. Exception: if data is unusable as-is (e.g. broken sensors), consolidation becomes a technical prerequisite; but then you no longer have an AI project, you have a data project.

How many parallel AI projects can an SME launch?

One, until generalisation. Three parallel use cases would dilute attention, multiply dependencies and prevent capitalisation. Once the first case is generalised, subsequent ones deploy 40% faster by reusing the architecture. In an SME under 50 people, the sustainable pace is one case every two months at cruising speed. In the initial phase, target one case over six weeks, then reassess.

When should I stop an AI project that is not progressing?

When a stage gate is not passed with a written passage criterion. If the POC does not deliver measurable value at D+14, you have two options: tighten the perimeter and give another two weeks, or stop. Never "keep going anyway, we're so close." Sunk cost is a cognitive bias to fight: euros already spent are not an argument to spend more. A lucid stop at D+14 costs EUR 5 to 10k. A forced stop at six months costs EUR 50 to 150k. You make the arbitrage using the PMI grid [15].

Should you always start simple, even for a large company?

Yes. Organisation size does not cancel the laws of software. Large companies simply have more budget to burn on over-engineering; their projects fail too, just more quietly. Successful AI deployments in mid-caps and large groups follow the same pattern: simple POC, measurement, stage gate, progressive industrialisation. The difference lies in the number of POCs the organisation runs in parallel, not in the complexity of each individual POC [3].

How do I convince a board not to over-engineer?

Through figures and dates. You cite public studies (Gartner [1], RAND [2], MIT Sloan [3], Cour des comptes [6]) on the failure rate of ambitious AI projects. You show an internal or composite case where a Class 1 or 2 POC produced the proof a Class 4 would have taken six months to demonstrate. You propose the "start simple" tree in seven principles. You commit the board to a D+14 stage gate: proof or stop. A board that refuses the stage gate reveals its true risk appetite, and that is valuable information for what follows.

Must a successful AI project later evolve towards more complexity?

Not by default. A simple workflow that produces value for two years does not need to become a custom agent. Complexification is justified only when operational limits appear: volume exceeding the class threshold, need for multi-tool orchestration, required unit-level adaptation. Only then do you move up a class per article 4. Many successful AI projects stay in Class 1 or 2 for their entire life. This is not a failure; it is a design. Complexity is not a virtue in itself.

Start simple at your place

You hold here the substance of the matter: simplicity is not a modest retreat; it is the only strategy that delivers within stated timelines. The five root causes are avoidable, the four KISS principles are applicable from your next scoping meeting, the seven-principle tree fits on one page. You do not need a transformation programme to begin; you need one well-chosen, well-scoped first case.

Our AI audit is precisely that initial scoping. It covers the baseline measurement, the first process selection, the adapted class choice, the stage gate definition and the proportionate governance grid. It builds on hours costing, first process selection, the risks and governance framework, and solution classes. The full methodological framework is on the methodology page. Our team delivers this audit in under three weeks, and it avoids the five traps documented above.

Request your AI audit ->


Sources and methodology

This article relies exclusively on first-rank public sources: French authorities (CNIL, ANSSI, France Strategie, France Num, Bpifrance, Cour des comptes, INSEE, ARCEP, CESE, CNPEN, Banque de France, DGE, Senate), European bodies (European Commission, EDPB, AI Act), international institutions (OECD, Stanford HAI), research publications (RAND Corporation, MIT Sloan, Harvard Business Review), consulting firms cited for public studies (BCG, McKinsey, Gartner, PMI, IBM), and official technical documentation from model providers (Anthropic). Foundational references for the KISS principle (Lockheed Skunk Works, John Gall, Agile Manifesto, Lean Startup) are mobilised to historically anchor the thesis.


About the author

Laurent Perello runs Perello Consulting, an independent AI automation firm for French SMEs. After 25 years building products for the web, he now orchestrates ten AI agents that he pilots alone, with a production log published daily at perfectaiagent.xyz. He publishes his methodologies and pricing online so that every executive can decide with full information.


Orchestrator: Alpha -- Perello Consulting | 2026-04-17

Footnotes

  1. Gartner, AI adoption and failure rate research, https://www.gartner.com/en/research.

  2. RAND Corporation, The Root Causes of Failure for AI Projects, https://www.rand.org/pubs/research_reports.

  3. MIT Sloan Management Review, State of AI in the Enterprise, https://sloanreview.mit.edu.

  4. BCG, Build for the Future, https://www.bcg.com/publications/build-for-the-future.

  5. McKinsey, The State of AI, https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai.

  6. Cour des comptes, digital transformation reports, https://www.ccomptes.fr.

  7. Bpifrance Le Lab, SME and digital studies, https://lelab.bpifrance.fr.

  8. France Strategie, "Artificial intelligence and work", https://www.strategie.gouv.fr/publications.

  9. Anthropic, "Building effective agents", https://www.anthropic.com/research/building-effective-agents.

  10. CNIL, "AI sandbox", https://www.cnil.fr/fr/bac-sable-donnees-personnelles-la-cnil-accompagne-innovation-ia.

  11. ANSSI, "Security recommendations for a generative AI system", https://cyber.gouv.fr/publications, 2024.

  12. EU Regulation 2024/1689 (AI Act), articles 9, 13 and 14, https://eur-lex.europa.eu/eli/reg/2024/1689/oj.

  13. France Num, "AI self-diagnostic", https://www.francenum.gouv.fr/guides-et-conseils/strategie-numerique/diagnostic-numerique/autodiag-ia-evaluez-la-capacite-de.

  14. Bpifrance, "Diag Data IA", https://diag.bpifrance.fr/diag-data-ia.

  15. PMI, Pulse of the Profession, https://www.pmi.org/learning/thought-leadership/pulse.

  16. Lockheed Martin, Skunk Works and Kelly Johnson's KISS principle, https://www.lockheedmartin.com/en-us/news/features/history/skunk-works.html.

  17. John Gall, Systemantics (Gall's Law), bibliographic reference 1975.

  18. Eric Ries, The Lean Startup, https://theleanstartup.com/principles.