Up to 20% of the data used for training AI is already synthetic — that is, generated rather than obtained by observing the real world — with LLMs using millions of synthesized samples. That could reach up to 80% by 2028 according to Gartner, adding that by 2030, it’ll be used for more business decision making than real data. Technically, though, any output you get from an LLM is synthetic data. AI training is where synthetic data shines, says Gartner principal researcher Vibha Chitkara. “It effectively addresses many inherent challenges associated with real-world data, such as bias, incompleteness, noise, historical limitations, and privacy and regulatory concerns, including personally identifiable information,” she says. |