Patent attributes
Devices and techniques are generally described for articulated three-dimensional pose tracking. In some examples, a plurality of frames of image data captured by one or more cameras may be received. First feature data representing the plurality of frames of image data may be determined using a backbone network. The first feature data may be projected into three-dimensional (3D) space. In some examples, 3D location data describing respective 3D locations of one or more persons represented by the first feature data projected into the 3D space may be determined. The first feature data and the 3D location data may be sent to a four-dimensional (4D) convolutional neural network (CNN). The 4D CNN may generate second feature data comprising respective 3D representations of the one or more persons. Three-dimensional pose data representing articulated 3D pose information for the one or more persons may be generated.
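
As an illustration of the step in which feature data is projected into 3D space, the following is a minimal sketch and not the described implementation. It assumes calibrated pinhole cameras given as 3x4 projection matrices, a single scalar feature per pixel, nearest-neighbour sampling, and averaging across the cameras that see each voxel. The names project_point and sample_voxel_features are hypothetical, and the abstract does not specify any of these choices.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def project_point(P: Matrix, xyz: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Project a 3D point with a 3x4 camera matrix (assumed pinhole model).

    Returns (u, v) pixel coordinates, or None if the point is not in front
    of the camera.
    """
    x, y, z = xyz
    hom = (x, y, z, 1.0)  # homogeneous coordinates
    u, v, w = (sum(P[r][c] * hom[c] for c in range(4)) for r in range(3))
    if w <= 0:
        return None
    return u / w, v / w


def sample_voxel_features(
    feature_maps: Sequence[Sequence[Sequence[float]]],
    cameras: Sequence[Matrix],
    voxel_centers: Sequence[Sequence[float]],
) -> List[float]:
    """Fill a voxel grid by sampling each camera's 2D feature map.

    For every voxel center, the feature at the nearest pixel is read from
    each camera that sees the voxel, and the samples are averaged.
    Voxels seen by no camera receive 0.0 (an assumption of this sketch).
    """
    volume = []
    for center in voxel_centers:
        total, count = 0.0, 0
        for fmap, P in zip(feature_maps, cameras):
            uv = project_point(P, center)
            if uv is None:
                continue
            col, row = round(uv[0]), round(uv[1])  # nearest-neighbour lookup
            if 0 <= row < len(fmap) and 0 <= col < len(fmap[0]):
                total += fmap[row][col]
                count += 1
        volume.append(total / count if count else 0.0)
    return volume


# Two toy cameras sharing the same (identity) projection, with 3x3 feature maps.
P = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
fmap_a = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
fmap_b = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
volume = sample_voxel_features(
    [fmap_a, fmap_b], [P, P], [(2.0, 1.0, 1.0), (0.0, 0.0, -1.0)]
)
```

In a fuller pipeline the per-voxel values would be multi-channel feature vectors, and the resulting volume would be the input from which per-person 3D locations are determined before the 4D CNN stage.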