We present a novel 4D content generation framework, Diffusion4D, that, for the first time, adapts video diffusion models for explicit synthesis of spatially and temporally consistent novel views of 4D content.

We introduce grounded 4D content generation. We identify monocular video sequences as a key component in constructing the 4D content. Our pipeline facilitates conditional 4D generation,