You go parallel when the size of your problem does not fit on a single SMP
machine. The limit is now 32GB; single nodes that can handle more than that
are outrageously expensive. Below that, it's just a matter of money.

Note that LES is NOT an algorithm for solving PDEs; it is just a subgrid model
for turbulence, and it is computed for every element. You can use LES with
spectral methods, FVM, FEM, Lattice-Boltzmann...

OpenFOAM is FVM, so I guess it will scale nicely over gigE to maybe a hundred
nodes.

Please don't make assumptions like "if I buy twice the computing power it will
run in half the time" with distributed-memory computers. Always keep in mind
that the bottleneck is your switch: once you hit its top performance, you are
done, whatever the performance of your nodes. If you are lucky, the time spent
communicating will still be negligible compared with the computing time.

Computing time remains constant because it depends only on how many numbers
you can fit in each node's memory. Communication time depends on how much
data you must share between nodes, and it depends (not necessarily linearly)
on the number of nodes. The more nodes you add, the more stressed the switch
is.
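To make the split between computing and communication time concrete, here is
a toy weak-scaling model in Python. It assumes, as above, that every node's
memory is filled with the same amount of data, so the per-step compute time is
fixed. The log2 growth of communication time and all the constants are
invented for illustration; they are not measurements of any real network.

```python
# Toy weak-scaling model: each node holds the same amount of data, so the
# compute time per step is fixed while communication time grows with the
# node count. The growth law and every constant here are assumptions.
import math

T_COMPUTE = 1.0  # seconds of computation per step on each node (hypothetical)


def comm_time(nodes: int, per_level: float = 0.02) -> float:
    """Hypothetical communication cost: zero on one node, growing with log2(nodes)."""
    if nodes <= 1:
        return 0.0
    return per_level * math.log2(nodes)


def step_time(nodes: int) -> float:
    """Wall-clock time of one step: fixed compute plus communication."""
    return T_COMPUTE + comm_time(nodes)


for n in (1, 2, 4, 8, 16, 32, 64):
    t = step_time(n)
    print(f"{n:3d} nodes: step {t:.3f} s, communication share {comm_time(n) / t:.1%}")
```

With these made-up numbers the communication share grows from 0% on one node
to about 11% on 64 nodes; a switch under heavy load can behave much worse than
this smooth curve.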
If you hit its maximum bandwidth and the network is good, latencies will stay
constant and the communication time will rise in a predictable way; if the
network is bad, latencies will increase and you will reach an asymptote.

My advice is: if you are using FVM, buy gigE NICs and a gigE switch, but buy
good ones.

10gig Myricom CX4 NICs cost 600 EUR.
A 10gig switch with 12 ports (FUJITSU) costs 6300 EUR.
The same applies to InfiniBand.
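A rough sketch of the per-node arithmetic behind the next sentence, assuming
one NIC per node and the switch price spread evenly over a fully populated
12-port switch (cables and other extras are ignored):

```python
# Per-node cost of a fast network, using the prices quoted above.
NIC_PRICE_EUR = 600      # one 10gig Myricom CX4 NIC per node
SWITCH_PRICE_EUR = 6300  # 12-port 10gig switch
SWITCH_PORTS = 12        # assumes every port is used by one node


def network_cost_per_node(nic: float = NIC_PRICE_EUR,
                          switch: float = SWITCH_PRICE_EUR,
                          ports: int = SWITCH_PORTS) -> float:
    """NIC price plus the switch price spread evenly over its ports."""
    return nic + switch / ports


print(network_cost_per_node())  # 1125.0
```

That comes to about 1100 EUR per node, in line with the ~1200 EUR estimate.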
This means that adding a fast network increases the cost of each node by
~1200 EUR, and that's the cost of a 2-processor quad-core computing node.

Take the computing power you need, see how much money you need to buy it,
multiply it by two, and you will have the money for your computing power with
a fast network. This will ALWAYS give you the maximum performance, no matter
the workload of your network or the number of nodes.

guillem
On Wednesday 30 July 2008 12:15:10 you wrote:
> Well, I just remember that some people used Fluent on 5 nodes for LES
> over gigE. I don't really trust their competence, but it still makes me
> ask something like: "what's the typical node count worth considering for
> LES problems in practice over gigE?" (I suppose the minimum is strictly 1.)
> Note that I "play" mostly with OpenFOAM, if that makes any difference.
>
> By the way, am I right in thinking that Myrinet and InfiniBand prices
> are quite similar?
> As you know, that pushes one towards powerful nodes, and altogether prices
> jump to the next level (3 to 6 times more than with commodity hardware).
> For me the difference is some 2 times: I can get 16 cores in one machine
> for 2x more money, and 32 for 2x less!
>
> And one more thing, please explain your syntax here: "2x(Required flops)
> ($/flops)"!
>
> Thanks! Glad to be in touch, Guillem!
>
> On Wednesday 30 July 2008 08:28:31 Guillem Borrell Nogueras wrote:
> > hello
> >
> > You hit the right mailbox.
> >
> > The problem with speedups is that you can never tell. Most algorithms,
> > unmodified, scale on conventional networks up to 4 nodes and then get no
> > speedup at all, sometimes even losing performance. On the other hand, I
> > have seen FVM codes scaling linearly on gigE up to 64 nodes.
> >
> > The reason is always the same: inexpensive networks become terribly
> > slow under serious workloads. Fast networks are fast not because they
> > have more bandwidth (sometimes you only get 10x for 50x the money), but
> > because they almost never lose performance. And what matters is not the
> > bandwidth you get on the NIC but the real bandwidth you have on the
> > spine of your switch.
> >
> > You can make almost any algorithm perform decently on commodity hardware,
> > but then you have to work carefully on the topology of your
> > communications. Once you hit the 8-node limit this becomes a serious
> > issue. It is ONLY because of your network; it has nothing to do with your
> > nodes.
> >
> > I will give you another rule of thumb that *may not be true in your
> > case*. If you look at most plots of performance vs number of nodes, they
> > all look the same. Typically the number-of-nodes axis is plotted in
> > powers of two. The curve is a straight line up to a certain number of
> > nodes, call it 2^n. At 2^(n+1) you still get some speedup, and at
> > 2^(n+2) you get no speedup at all. Plot performance vs number of nodes on
> > your 16-node cluster and you will see the behaviour of your network.
> >
> > hope it helped
> >
> > guillem
> >
> > On Wednesday 30 July 2008 09:50:16 you wrote:
> > > Hello!
> > >
> > > Some time ago I posted a question about this on the gentoo.org forum.
> > > You gave the impression of being a competent person, so I would
> > > appreciate it if you could estimate the inefficiencies of these small
> > > cluster configurations:
> > >
> > > 1. 5 nodes, each 2 Xeon CPUs (4 cores)
> > > 2. 8 nodes, each 1 AMD Phenom (4 cores)
> > > 3. 16 nodes, each 1 AMD Phenom (4 cores)
> > > In all these cases Gbit LAN would be used.
> > >
> > > The thing is: I plan to build a small cluster for education /
> > > experimentation purposes, the hardest job for which could be LES
> > > computing. It is possible to use a monstrous board with 4 or even 8
> > > quad cores and so avoid expensive Myrinet or InfiniBand (overkill for
> > > me), but that is still more expensive (2 times or so).
> > > So, as I wrote, my question is basically about speedup inefficiency:
> > > there is no point wasting money on more PCs if they give nothing. What
> > > speedups could I expect for those?
> > >
> > > I hope this is the right mail address; I found it somewhere...
> > > Please spare some time!
> > >
> > > Regards,
> > > Kārlis
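
Guillem's rule of thumb in the quoted reply (speedup is linear up to some 2^n
nodes, partial at 2^(n+1), and gone at 2^(n+2)) can be checked by timing the
same case on 1, 2, 4, 8 and 16 nodes. A minimal Python sketch, assuming a
fixed problem size and a hypothetical 80% parallel-efficiency threshold for
"still linear"; the timings are invented placeholders, not measurements:

```python
# Sketch: find where measured speedup stops being (nearly) linear.
# Assumes the same case was timed on 1, 2, 4, ... nodes (fixed problem size).


def speedup_table(timings: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (nodes, speedup, efficiency) rows, sorted by node count."""
    t1 = timings[1]
    rows = []
    for nodes in sorted(timings):
        s = t1 / timings[nodes]
        rows.append((nodes, s, s / nodes))
    return rows


def last_linear_point(timings: dict[int, float], threshold: float = 0.8) -> int:
    """Largest node count reached before efficiency first drops below threshold."""
    best = 1
    for nodes, _, eff in speedup_table(timings):
        if eff < threshold:
            break
        best = nodes
    return best


# Invented example timings in seconds, NOT measurements.
example = {1: 1000.0, 2: 505.0, 4: 260.0, 8: 140.0, 16: 120.0}
for nodes, s, eff in speedup_table(example):
    print(f"{nodes:2d} nodes: speedup {s:5.2f}, efficiency {eff:.0%}")
print("roughly linear up to", last_linear_point(example), "nodes")
```

With these invented numbers the curve is roughly linear up to 8 nodes and 16
nodes still gives a little extra speedup, which is the pattern described in
the rule of thumb.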