A food-recognition engine can be used with a mobile device to identify, in real time, foods present in a video stream. To capture the video stream, a user points a camera of the mobile device at foods they are about to consume. The video stream is displayed, in real time, on a screen of the mobile device. The food-recognition engine uses several neural networks to recognize, in the video stream, food features, text printed on packaging, bar codes, logos, and “Nutrition Facts” panels. The outputs of the neural networks are combined to identify each food with high probability. The foods may be packaged or unpackaged, branded or unbranded, and labeled or unlabeled, and several foods may appear simultaneously within the camera's view. Information about recognized foods is displayed on the screen while the video stream is being captured. The user may log identified foods with a gesture, without typing.
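
The text above does not say how the neural-network outputs are combined. As one illustration only, the following minimal sketch assumes that each recognizer reports a confidence per candidate food for the current frame, and fuses those confidences with a noisy-OR rule. The function name `fuse_detections`, the detector names, the scores, and the 0.8 threshold are all hypothetical and are not taken from the source.

```python
from collections import defaultdict


def fuse_detections(detector_outputs, threshold=0.8):
    """Fuse per-detector confidences with a noisy-OR rule (an assumed scheme).

    detector_outputs: mapping of detector name -> {food label: confidence in [0, 1]}
    Returns {food label: fused confidence} for labels at or above the threshold.
    """
    # For each label, track the probability that every detector missed it.
    miss = defaultdict(lambda: 1.0)
    for scores in detector_outputs.values():
        for label, confidence in scores.items():
            miss[label] *= 1.0 - confidence

    # Noisy-OR: the label is present unless all detectors missed it.
    fused = {label: 1.0 - m for label, m in miss.items()}
    return {label: p for label, p in fused.items() if p >= threshold}


# Hypothetical outputs for a single video frame.
frame_outputs = {
    "food_features": {"apple": 0.6, "granola bar": 0.3},
    "packaging_text": {"granola bar": 0.7},
    "barcode": {"granola bar": 0.9},
    "logo": {},
}

recognized = fuse_detections(frame_outputs)
print(recognized)
```

In this sketch the granola bar passes the threshold because three independent cues agree on it, while the apple, seen only by the food-feature recognizer at moderate confidence, does not. A real engine might instead use learned weights or a further network for the fusion step.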