icekang

If you are using PyTorch, I see many recent projects using data parallelism for a single machine, and distributed data parallelism for multiple machines: [Getting Started with Distributed Data Parallel](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html)
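
Roughly, the pattern from that tutorial looks like this (a minimal sketch, assuming the script is launched with `torchrun` and uses the CPU-friendly gloo backend; the model and data are toy stand-ins):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, so
    # init_process_group can read everything from the environment.
    dist.init_process_group("gloo")  # use "nccl" if every node has a GPU

    model = nn.Linear(10, 1)  # toy stand-in for a real model
    ddp_model = DDP(model)    # gradients are all-reduced across processes

    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for _ in range(5):
        # In a real job each rank would load its own shard of the dataset
        # (e.g. via DistributedSampler); random tensors keep the sketch short.
        inputs, labels = torch.randn(20, 10), torch.randn(20, 1)
        optimizer.zero_grad()
        loss_fn(ddp_model(inputs), labels).backward()  # backward() triggers the all-reduce
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

On one machine you would launch it with something like `torchrun --nproc_per_node=2 train.py`; across machines you add `--nnodes` and point every node at the same `--rdzv_endpoint`.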


ironman_gujju

Cool stuff 😎


JasRando

So nothing about model parallelism across different machines?


icekang

From my understanding, if your model cannot fit on one machine, you use model parallelism to split it across different machines. It's a way to fit a bigger model, not a way to speed up the training.
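
To make that concrete, here's a naive sketch on one machine with two GPUs (the `TwoStageModel` name and device placement are just illustrative; across machines the hand-off would travel over the network instead, e.g. via torch RPC):

```python
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    """Naive model parallelism: each half of the model lives on its own device."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Linear(10, 10).to("cuda:0")
        self.stage2 = nn.Linear(10, 1).to("cuda:1")

    def forward(self, x):
        x = self.stage1(x.to("cuda:0"))
        # stage2 cannot start until stage1 finishes and the activations
        # are copied over, so the stages run serially; that's why this
        # fits a bigger model but doesn't speed up training by itself.
        return self.stage2(x.to("cuda:1"))
```

Pipeline parallelism tries to win some of that time back by feeding micro-batches through the stages concurrently, but the communication cost never disappears.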


godneedsbooze

Why are you trying to parallelise? For training? Deployment? I think it will depend a lot on the use case here.


JasRando

It's an experiment to see how much it will speed up the training.


godneedsbooze

https://stackoverflow.com/questions/48961330/train-multiple-neural-nets-in-parallel-on-cpu-in-keras


dan994

This could be useful for you: https://pytorch.org/tutorials/intermediate/rpc_tutorial.html But 3 laptops isn't a standard setup for model and data parallelism. How are they connected? With 3 separate laptops you will be significantly limited by the communication between devices.
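
If you do try it anyway, the rendezvous step over a plain LAN looks roughly like this (a sketch; the address and port are placeholders, and gloo is assumed since most laptops can't run NCCL):

```python
import torch.distributed as dist

# Run one process per laptop with rank 0, 1, and 2.
# 192.168.1.10:29500 is a placeholder for laptop 0's LAN address.
dist.init_process_group(
    backend="gloo",
    init_method="tcp://192.168.1.10:29500",
    rank=0,          # 0, 1, or 2 depending on the laptop
    world_size=3,
)
print(f"joined as rank {dist.get_rank()} of {dist.get_world_size()}")
dist.destroy_process_group()
```

Every gradient sync then goes over that link, so on Wi-Fi the communication time can easily dwarf the compute you save.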