Genesis – a generative physics engine for general-purpose robotics

genesis-world.readthedocs.io

219 points by tomp 4 days ago

tomp 4 days ago

Twitter announcement: https://x.com/zhou_xian_/status/1869511650782658846

GitHub: https://github.com/Genesis-Embodied-AI/Genesis

academic project page: https://genesis-embodied-ai.github.io

etwigg 4 days ago

In the sizzle reel, the early waterdrop demos are beautiful but seem staged; the later robotics demos look more plausible and very impressive. But referring to all these "4D dynamical worlds" sounds overhyped / scammy - everyone else calls 3D space simulated through time a 3D world.

> Genesis's physics engine is developed in pure Python, while being 10-80x faster than existing GPU-accelerated stacks like Isaac Gym and MJX. ... Nvidia brought GPU acceleration to robotic simulation, speeding up simulation speed by more than one order of magnitude compared to CPU-based simulation. ... Genesis pushes up this speed by another order of magnitude.

I can believe that setting up some kind of compute pipeline in a high level language such as Python could be fast, but the marketing materials aren't explaining any of the "how". If it's real it must be GPU-accelerated, but they almost imply that it isn't. Looks neat, hope it works great!

erwincoumans 4 days ago

It is a nice physics engine, it uses Taichi (https://github.com/taichi-dev/taichi) to compile Python code to CUDA/GPU (similar to what Warp Sim does, https://github.com/NVIDIA/warp)
- bengarney 2 days ago
  
  Given what's there today, especially the sizzle reel, I'm pretty dubious.
  If the author drops an amazing generative text-to-sim system on top of this... THAT is impressive - but effectively orthogonal to what's there - so I'm withholding excitement for now.
  Take the time to read over the repo. It is not revolutionary. It is an integration of a bunch of third party packages (which are largely C/C++ libraries with Python wrappers, not "pure python"!). The stuff unique to Genesis is adequate implementations of well-known techniques, or integration code.
  The backflip is awesome but plausibly explained by the third party RL library, and they include an example program which... runs a third party library to do just this.
  The performance numbers are so far beyond real world numbers as to be incoherent. If you redefine what all the words mean, then the claims are not comparable to existing claims using the same words. 43 million FPS means, if my math is right, you are spending 70 clocks per frame on a 3 GHz processor. On a 4080 you would have ~500k clocks in the same period, but that implies 100% utilization with zero overhead from Amdahl's law. (Also, hi, Erwin, maybe you think these claims are 100% realistic for meaningful workloads, in which case I'll gladly eat crow since I have a huge amount of respect for Bullet!)
  I can only judge what's released now, not a theoretical future release, and what's here now is something a really good developer could bang out in a couple of months. The USP is the really good spin around the idea that it's uniquely suited for AI to produce beyond-SOTA results.
  I have slightly longer form thoughts: https://x.com/bengarney/status/1869803238389887016
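
The cycle-budget arithmetic in the comment above can be sketched as follows. The 43 million FPS figure and the 3 GHz clock come from the thread; the RTX 4080 core count and clock are approximate figures assumed here for illustration, and the function name is hypothetical.

```python
# Cycle budget per frame implied by a claimed aggregate frame rate.
CLAIMED_FPS = 43_000_000   # figure quoted in the thread
CPU_CLOCK_HZ = 3.0e9       # the 3 GHz processor used in the comment

# Assumed, approximate RTX 4080 figures (not stated in the thread).
GPU_CORES = 9_728
GPU_CLOCK_HZ = 2.5e9


def cycles_per_frame(clock_hz: float, fps: float, lanes: int = 1) -> float:
    """Clock cycles available per frame, summed over `lanes` parallel units."""
    return clock_hz * lanes / fps


cpu_budget = cycles_per_frame(CPU_CLOCK_HZ, CLAIMED_FPS)
gpu_budget = cycles_per_frame(GPU_CLOCK_HZ, CLAIMED_FPS, lanes=GPU_CORES)

print(f"single 3 GHz core: {cpu_budget:.1f} cycles per frame")
print(f"whole GPU, ideal:  {gpu_budget:,.0f} core-cycles per frame")
```

The GPU number is an upper bound that assumes every core does useful work on every cycle, which is the "100% utilization" caveat in the comment.
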
rirarobo 4 days ago

> But referring to all these "4D dynamical worlds" sounds overhyped / scammy - everyone else calls 3D space simulated through time a 3D world.
In the research community, "4D" is a commonly used term to differentiate from work on static 3D objects and environments, especially in recent years since the advent of NeRF.
The term "dynamic" has long been used similarly, but sometimes connotes a narrower scope. For example, reconstruction of cloth dynamics from an RGBD sensor, human body motion from a multi-view camera rig, or a scene from video, but assuming that the scene can be decomposed into rigid objects with their individual dynamics and an otherwise static environment. An even narrower related term in this space would be "articulated", such as reconstruction of humans, animals, or objects with moving parts. However, the representations used in prior works typically did not generalize outside their target domains.
So, "4D" has become more common recently to reflect the development of more general representations that can be used to model dynamic objects and environments.
If you'd like to find related work, I'd recommend searching in conjunction with a conference name to start, e.g. "4D CVPR" or "4D NeurIPS", and then digging into webpages of specific researchers or lab groups. Here are a couple interesting related works I found:
https://shape-of-motion.github.io/ https://generative-dynamics.github.io/ https://stereo4d.github.io/ https://make-a-video3d.github.io/
All that considered, "4D dynamical worlds" does feel like buzzword salad, even if the intended audience is the research community, for two main reasons. First, it's as if some authors with a background in physics simulation wanted to reference "dynamical systems", but none of the prior work in 4D reconstruction/generation uses "dynamical"; they use "dynamic". Second, as described above, the whole point of "4D" is that it's more general than "dynamic", so using both is redundant. So, "4D worlds" would be more appropriate IMO.
KaiserPro 4 days ago

> "4D dynamical worlds"
It's a feature of that field of science. I'm currently working in a lab that is doing a bunch of things that in papers are described as $adjective-AI. In practice it's just a slightly hyped term, or set of terms, vaguely agreed upon by consensus in weird science-paper English. (In the same way that Gaussian splats are totally just point clouds with efficient alpha blending [only slightly more complex, please don't just take my word for it].)
You probably understand what this term is meant to describe, but spelling it out gives a bit of insight into _why_ it's got such a shite name.
o "4d": because it's doing things over time. Normally that's a static scene with a camera flying through it (3D). When you have stuff other than the camera moving, you get an extra dimension, hence 4D.
o "dynamical" (god I hate this): dynamic means that objects in the video are moving around. So you can't just use the multiple camera locations to build up a single view of an object or room; you need to account for movement of things in the scene.
o "worlds": to highlight that it's not just one room being re-used over and over, it's a generator (well it's not, but that's for another post) of diverse scenes that can represent many locations around the world.
- ggerules 4 days ago
  
  They could be implying a little bit of computer graphics in the mix. Rotation, shear, and transformation matrices have a dimension of 4.
  - KaiserPro 3 days ago
    
    I mean yeah the transformation matrix is 4x3.
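
For readers unfamiliar with the graphics convention mentioned in this sub-thread, here is a minimal sketch of a homogeneous transform in plain Python. The helper names are hypothetical. The fourth coordinate is a bookkeeping device that lets rotation and translation share one matrix; it has nothing to do with time.

```python
import math


def rigid_transform(yaw: float, translation: tuple) -> list:
    """4x4 homogeneous matrix: rotate by `yaw` about z, then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],  # constant row for rigid and affine transforms
    ]


def apply(matrix: list, point: tuple) -> tuple:
    """Apply a 4x4 transform to a 3D point via homogeneous coordinates."""
    h = (*point, 1.0)  # append w = 1
    out = [sum(m * v for m, v in zip(row, h)) for row in matrix]
    return tuple(out[:3])


T = rigid_transform(math.pi / 2, (1.0, 0.0, 0.0))
moved = apply(T, (1.0, 0.0, 0.0))
print(moved)
```

Because the last row is always `[0, 0, 0, 1]` for these transforms, many engines store only the other twelve numbers, which is where 3x4 (or, transposed, 4x3) layouts come from.
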

extr 4 days ago

I saw this on twitter and actually came on HN to see if there was a thread with more details. The demo on twitter was frankly unbelievable. Show me a water droplet falling...okay...now add a live force diagram that is perfectly rendered by just asking for it? What? Doesn't seem possible/real. And yet it seems reputable, the docs/tech look legit, they just "haven't released the generative part yet".

What is going on here? Is the demo just some researchers getting carried away and overpromising, hiding some major behind the scenes work to make that video?

vagabund 4 days ago

My understanding is they built a performant suite of simulation tools from the ground up, and then they expose those tools via API to an "agent" that can compose them to accomplish the user's ask. It's probably less general than the prompt interface implies, but still seems incredibly useful.
- bilsbie 3 days ago
  
  Still doesn’t seem possible with current technology? It would have to access those APIs while it generates video.
upcoming-sesame 4 days ago

The values on the forces diagram can't be real

poslathian 4 days ago

The way this is rolled out is super weird. Ultra bold - shocking - claims by a credible researcher but zero details amidst tons of documentation and even code that would hint at how these two universes (model based simulation and model free generative models) are being unified beyond a lot of references to differentiable simulation…

Super exciting!

forrestthewoods 4 days ago

The GitHub claims:

> Genesis delivers an unprecedented simulation speed -- over 43 million FPS when simulating a Franka robotic arm with a single RTX 4090 (430,000 times faster than real-time).

That math works out to… 23.26 nanoseconds per frame. Uhh… no they don’t simulate a robot arm in 23 nanoseconds? That’s literally twice as fast as a single cache miss?

They may have an interesting platform. I’m not sure. But some of their claims scream exaggeration which makes me not trust other claims.
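
The per-frame figure above follows directly from the two numbers in the quoted claim. A quick sketch, using only those two numbers; the variable names are illustrative:

```python
CLAIMED_FPS = 43_000_000    # "over 43 million FPS"
REALTIME_FACTOR = 430_000   # "430,000 times faster than real-time"

# Wall-clock time per frame if the frames were produced one after another.
ns_per_frame = 1e9 / CLAIMED_FPS

# Simulated time each frame must advance for both quoted numbers to agree.
implied_dt_seconds = REALTIME_FACTOR / CLAIMED_FPS

print(f"{ns_per_frame:.2f} ns per frame")
print(f"implied simulation step: {implied_dt_seconds * 1000:.0f} ms")
```

Read sequentially, that is about 23 ns per frame. The two quoted figures are mutually consistent only if each frame advances the simulation by 10 ms, i.e. a 100 Hz step.
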

reitzensteinm 4 days ago

It's possible they're executing many simulations in parallel, and counting that. 16k robot arms executing at 3k FPS each is much more reasonable on a 4090. If you're effectively fuzzing for edge cases, this would have value.
- cyber_kinetist 4 days ago
  
  The reason they are using the FPS (frames-per-second) term in a different way is that this robotics simulator is primarily going to be used for reinforcement learning, where you run thousands of agents in parallel. In that context, the total "batched" throughput of how many frames you can generate per second is more crucial for training your policy network quickly than the actual latency between frames (which is more important for real-time tasks like gaming).
- forrestthewoods 4 days ago
  
  Yeah it’s gotta be something like that. The whole claim comes across as rather dishonest. If you’re simulating 16,000 arms at 3000 fps each then say that. That’s great. Be clear and concise with your claims.
  - reitzensteinm 4 days ago
    
    Agreed.
GrantMoyer 4 days ago

The fine text at the bottom of the speed comparison video on the project homepage says "With `hibernation = True`". Based on a search through the code, the hibernation setting appears to skip simulating components which reach steady state.
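
To illustrate the general idea described above (skipping bodies that have come to rest), here is a toy one-dimensional sketch. It is not Genesis's implementation: the class, thresholds, and function names are all hypothetical, and wake-up logic is omitted.

```python
from dataclasses import dataclass

SLEEP_SPEED = 1e-3   # hypothetical: speeds below this count as "at rest"
SLEEP_STEPS = 10     # hypothetical: consecutive quiet steps before sleeping


@dataclass
class Body:
    position: float
    velocity: float
    quiet_steps: int = 0
    asleep: bool = False


def step(bodies, dt, gravity=-9.81, floor=0.0):
    """Advance awake bodies by one step; return how many were simulated."""
    simulated = 0
    for b in bodies:
        if b.asleep:
            continue  # hibernating bodies cost nothing this step
        simulated += 1
        b.velocity += gravity * dt
        b.position += b.velocity * dt
        if b.position <= floor:
            b.position = floor
            b.velocity = 0.0  # perfectly inelastic contact with the floor
        b.quiet_steps = b.quiet_steps + 1 if abs(b.velocity) < SLEEP_SPEED else 0
        if b.quiet_steps >= SLEEP_STEPS:
            b.asleep = True
    return simulated


bodies = [Body(position=0.0, velocity=0.0), Body(position=1.0, velocity=0.0)]
counts = [step(bodies, dt=0.01) for _ in range(100)]
print(counts[0], counts[-1])
```

In a benchmark where most bodies settle quickly, a scheme like this reduces the work per step, which is why the setting matters when comparing headline speeds.
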
Cane_P a day ago

If you read their documentation, you see that what they are referring to is running 30k instances in parallel.
"Now, let’s turn off the viewer, and change batch size to 30000 (consider using a smaller one if your GPU has a relatively small vram): ...
Running the above script on a desktop with RTX 4090 and 14900K gives you a futuristic simulation speed – over 43 million frames per second, this is 430,000 faster than real-time. Enjoy!"
https://genesis-world.readthedocs.io/en/latest/user_guide/ge...
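
A small sketch of what the batched reading implies per environment, using the batch size and aggregate figure quoted above. The function and variable names are illustrative, and the 16k x 3k case is the guess from earlier in the thread, not a measured result.

```python
def aggregate_fps(n_envs: int, per_env_fps: float) -> float:
    """Total frames per second summed over all parallel environments."""
    return n_envs * per_env_fps


AGGREGATE_FPS = 43_000_000   # quoted from the documentation
BATCH_SIZE = 30_000          # quoted from the documentation

per_env_fps = AGGREGATE_FPS / BATCH_SIZE
batch_step_us = 1e6 * BATCH_SIZE / AGGREGATE_FPS  # wall time per batched step

print(f"per-environment rate: {per_env_fps:.0f} FPS")
print(f"one batched step:     {batch_step_us:.0f} microseconds")
print(f"16k arms at 3k FPS:   {aggregate_fps(16_000, 3_000):,.0f} FPS")
```

On this reading each arm advances at roughly 1,400 steps per second, and one step of the whole batch takes roughly 700 microseconds, which is a very different statement from 23 ns per simulated robot.
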

sakras 4 days ago

Maybe I missed it, but are there any performance numbers? It being 100% implemented in Python makes me very suspicious that this won’t scale to any kind of large robot.

dragonwriter 4 days ago

It’s implemented in Python, but it is using existing Python libraries which themselves are implemented in C, etc.
Notably it uses both Taichi and Numba, which compile code expressed in (distinct restricted subsets of) Python (much broader in Numba’s case) to native CPU/GPU code including parallelization.
mccoyb 4 days ago

Python is used here to wrap around some sort of kernel compiler (taichi). Not out of the realm of possibility that kernels which are compiled out of Python source code could be placed on device with some sort of minimal runtime (although taichi executes on CPU via LLVM, so maybe not so minimal)
v9v 4 days ago

There is enough space on large robots to add in beefier compute if needed (at the expense of power consumption). Python is run all the time on robots. Compute usually becomes more of a problem as the robot gets smaller, but it should still be possible to run the intensive parts of a program on the cloud and stream the results back.

a_t48 4 days ago

This looks neat. Single step available - as far as I can tell though, no LIDAR, no wheels? Very arm/vision focused. There’s nothing wrong with that, but robotics encompasses a huge space to simulate, which is why I haven’t yet done my own simulator. Would love a generic simulation engine to plug my framework into, but this is missing a few things I need.

andrewsiah 4 days ago

Any roboticists here? Is this impressive/what is the impact of this?

gnabgib 4 days ago

HN: https://news.ycombinator.com/item?id=42456802

ericand 5 days ago

https://x.com/zhou_xian_/status/1869511650782658846

drak0n1c 4 days ago

The demo video on this post is excellent and really illustrates the potential for robotics, simulation, and animation/games. Great for sharing with anyone non-technical.

aeon-vadu 4 days ago

So we can run AI agents with RL in molecular level simulations for replacing product design, mechanical engineering, electrical engineering, aerospace engineering and everything else, right!!? If we can combine protein folding too then we could possibly solve any disease and poverty with full automation

ChrisArchitect 4 days ago

[dupe]

Earlier project page: https://news.ycombinator.com/item?id=42456802

ubj 4 days ago

What method is Genesis using for JIT compilation? What subset of Python syntax / operations will be supported?

The automatic differentiation seems to be intended for compatibility with PyTorch. Will Genesis be able to interface with JAX as well?

The project looks interesting, but the website is somewhat light on details. In any case, all the best to the developers! It's great to hear about various efforts in the space of differentiable simulators.

dragonwriter 4 days ago

> What method is Genesis using for JIT compilation?
Taichi and Numba are both in the pyproject.toml
sroussey 4 days ago

I believe they use Taichi.

cchance 4 days ago

Begs the question: is this why Unitree has been advancing so quickly? Did they have access to this for the last year while it was being developed? I recall Unitree showing mass simulated training of their robots in a physics-based world recently.

fudged71 4 days ago

What does it mean that gs.generate() is missing in the project?

AuryGlenz 4 days ago

"Currently, we are open-sourcing the underlying physics engine and the simulation platform. Access to the generative framework will be rolled out gradually in the near future."

baq 4 days ago

I was mildly impressed with the water demo, but that robot thing is kinda crazy, really. Finally looks like a framework for AI which can do my laundry.

spacemanspiff01 4 days ago

So this seems insane? Is it really that big or more of a problem demo with lots of drawbacks?

ilaksh 4 days ago

I suspect that the actual generation and simulation/rendering takes several minutes for each step.

stonet2000 19 hours ago

This is kind of correct. The demos they show will simulate quite slowly due to the soft body / fluid simulation. Their current code does not parallelize soft body/fluid simulation like they do with rigid bodies. Running their fluid sim code now with default settings (which are tuned for speed, not accuracy) gives maybe max 10-100x real time speed (their rigid body sim is magnitudes faster but definitely not faster than existing GPU simulators like Isaac or MJX). Rendering at that quality will also take forever, but everyone in this area of research always runs the slow high quality ray tracing pipelines for the demo videos (which run in the 10-100 FPS range depending on settings). Another note is they claim realism but this is only visual realism; there don’t yet exist soft body sim solvers that transfer to the real world accurately enough for e.g. robotics. They are good enough for visual data generation, which will be pretty cool.
(full disclosure I work on ManiSkill/SAPIEN which runs on PhysX from NVIDIA, Genesis authors claim they are essentially faster than PhysX by 10-80x).
psb217 4 days ago

The simulation/rendering is actually pretty fast since it's all done by heavily optimized gpu-based physics and graphics engines. The "generative" part is that they have some LLM stuff that's finetuned for generating configurations/parameters for the physics engine conditioned on some text. Ie, the physics and graphics are classical clockworky simulations, with a generative frontend to make it easier (but less precise) to get a world up and running. The open source release currently provides the clockworky simulator stuff, with the generative frontend to be released some time in the future.
- ilaksh 3 days ago
  
  That's what I meant about the LLM.
  If you are saying it's a real time simulation and rendering then I did not realize that.
- iamfirdaus1 3 days ago
  
  man i love what u writing even as far of this post ur 2012 post about SVM, i am 2nd year CS student from indonesia.
  are you a phd on computer science?
  - psb217 3 days ago
    
    Yeah, I finished a PhD in machine learning around 2016 and have been working professionally as a researcher since then, though I'm currently between jobs. It's a fun gig, but the "publish or perish" aspect of academic-ish research roles gets old fast.

dr_kretyn 4 days ago

100% python and fast? Either it isn't 100% python, or it isn't fast.

zamadatix 4 days ago

Depends where your boundary for "100% Anything" is I suppose. It seems to use GPU accelerated kernels written in Python via the Taichi library for most of the physics calculations. At some point, sure, the OS+GPU driver+GPU firmware you need to run the GPU accelerated kernel are not written in Python (and if you run it on CPU instead it will be slow, but more because you're using the CPU than because you're not using C or something). There is a bit of numpy too, which eventually boils down to some non-Python stuff (as any Python code eventually will). I'm not sure that's a useful distinction or that the choice of language in defining the kernels makes a meaningful difference on the overall performance in this case.
- dr_kretyn 4 days ago
  
  The doc emphasizes "100% Python" and that backend is natively in Python. I'm reading this as "you don't need anything other than a Python interpreter." Given that a large number of packages aren't in Python under the hood, that's a big, unnecessary hyperbole. It's OK to acknowledge that there's a heavy reliance on non-Python code, e.g. Taichi or NumPy.
  I also think that the distinction isn't particularly useful. Just pedantic claims will get pedantic feedback.
  - dragonwriter 4 days ago
    
    It’s particularly useful if it is an open source project and you want to communicate to people who might want to hack on it (either in a fork or the main project) what languages they will need to work directly with to do so.
    It’s not important to end users, but they aren’t the only audience.
dragonwriter 4 days ago

The Genesis code itself is 100% Python. The underlying Python libraries it uses are not (just as, for that matter, the Python standard library isn’t), but this is, in particular, using Numba – which compiles fairly normal Python to CPU and optionally GPU-native code – and Taichi, which compiles very specially-crafted Python to kernels for GPU.
rohan_ 4 days ago

python is a spec - not an implementation. It could be using CPython or Mojo, and still be Python.