Patent attributes
A system, method, and computer-readable medium for generating synthetic data are described. Improved data models for databases may be achieved by improving the quality of the synthetic data used for modeling those databases. According to some aspects, these and other benefits may be achieved by using numeric distribution information in a schema describing one or more numeric fields and, based on that schema, generating distribution-appropriate numerical data. The schema may be compared against actual data and adjusted to more closely match the actual data. In implementation, this may be effected by storing a schema with distribution information and/or one or more parameters, generating synthetic numerical data based on the schema, and, based on a comparison with actual data, modifying the schema until the synthetic data is statistically similar to the actual data. A benefit may include improved database performance and indexing based on repeatable, statistically appropriate synthetic data.
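
The store, generate, compare, and modify loop described above may be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not drawn from the described system: the dictionary layout of the schema, the field name `amount`, the choice of a normal distribution, the helper names (`make_schema`, `generate`, `is_similar`, `fit_schema`), and the similarity test, which simply compares sample mean and standard deviation within a tolerance. A fixed seed per round stands in for the repeatability mentioned above.

```python
import random
import statistics


def make_schema(mean, stddev):
    """Hypothetical schema: one numeric field plus its distribution parameters."""
    return {
        "field": "amount",
        "distribution": "normal",
        "params": {"mean": mean, "stddev": stddev},
    }


def generate(schema, n, seed):
    """Generate n synthetic values; the same seed always yields the same values."""
    if schema["distribution"] != "normal":
        raise ValueError("this sketch only handles the normal distribution")
    rng = random.Random(seed)
    params = schema["params"]
    return [rng.gauss(params["mean"], params["stddev"]) for _ in range(n)]


def summarize(values):
    """Return the sample mean and sample standard deviation."""
    return statistics.fmean(values), statistics.stdev(values)


def is_similar(synthetic, actual, tol):
    """Assumed similarity test: mean and spread agree within tol * actual spread."""
    s_mean, s_sd = summarize(synthetic)
    a_mean, a_sd = summarize(actual)
    return abs(s_mean - a_mean) <= tol * a_sd and abs(s_sd - a_sd) <= tol * a_sd


def fit_schema(schema, actual, n=5000, tol=0.05, rate=0.5, max_rounds=50):
    """Adjust the schema parameters until synthetic data resembles the actual data."""
    a_mean, a_sd = summarize(actual)
    for round_no in range(max_rounds):
        synthetic = generate(schema, n, seed=round_no)
        if is_similar(synthetic, actual, tol):
            break
        # Move each parameter part of the way toward the observed statistic.
        s_mean, s_sd = summarize(synthetic)
        params = schema["params"]
        params["mean"] += rate * (a_mean - s_mean)
        params["stddev"] = max(1e-9, params["stddev"] + rate * (a_sd - s_sd))
    return schema
```

Under these assumptions, calling `fit_schema(make_schema(0.0, 1.0), actual)` on a list of observed values returns a schema whose parameters have been moved toward the statistics of `actual`. A real implementation would likely support more distributions and a formal statistical test; the mean and standard deviation comparison is used here only to keep the sketch short.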