[IPython-user] ipython1 and farm tasking

Flavio Coelho fccoelho@gmail....
Sun Mar 2 07:25:38 CST 2008


On Thu, Feb 28, 2008 at 1:59 PM, Brian Granger <ellisonbg.net@gmail.com> wrote:

>
>  Another thing to keep in mind.  Whatever database solution you use,
>  you need to make sure that it supports multiple simultaneous clients
>  reading from and writing to it.  Otherwise, your database can easily
>  get into an inconsistent state.  This is outside my area of expertise,
>  so you will have to investigate this yourself.

ZODB with ZEO does handle multiple concurrent connections, but I am
still trying to figure out how to make it work with multiple
processes. I have been successful with multi-threaded access so far.

I am trying the RelStorage back-end now to compare it with the standard
ZEO+FileStorage for concurrent writes.
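
When comparing back-ends for concurrent writes, a small multi-process
stress test can show whether updates get lost. The sketch below is only
an illustration of that kind of test: it uses sqlite3 from the standard
library as a stand-in (not ZODB, ZEO, or RelStorage), and the names
WORKER and stress_test are hypothetical. Each child process increments a
shared counter inside an explicit write transaction; if the final value
is lower than workers x increments, some writes were lost.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# Source code run in every child process. It increments one shared
# counter, taking the write lock before it reads the current value.
WORKER = """
import sqlite3, sys
path, n = sys.argv[1], int(sys.argv[2])
conn = sqlite3.connect(path, timeout=30, isolation_level=None)
for _ in range(n):
    conn.execute("BEGIN IMMEDIATE")
    (value,) = conn.execute("SELECT value FROM counter").fetchone()
    conn.execute("UPDATE counter SET value = ?", (value + 1,))
    conn.execute("COMMIT")
conn.close()
"""


def stress_test(n_workers=4, n_increments=25):
    """Return the counter after n_workers processes each add n_increments."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.db")

        # Create the database with a single counter row set to zero.
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE counter (value INTEGER)")
        conn.execute("INSERT INTO counter VALUES (0)")
        conn.commit()
        conn.close()

        # Start all writers at once so that they really compete.
        procs = [
            subprocess.Popen(
                [sys.executable, "-c", WORKER, path, str(n_increments)]
            )
            for _ in range(n_workers)
        ]
        for proc in procs:
            if proc.wait() != 0:
                raise RuntimeError("a worker process failed")

        # Read back the final value once every writer has finished.
        conn = sqlite3.connect(path)
        (total,) = conn.execute("SELECT value FROM counter").fetchone()
        conn.close()
        return total


if __name__ == "__main__":
    print(stress_test())
```

The same shape of test (N processes, a known number of writes, one
final check) should carry over to other back-ends by swapping the worker
body for that back-end's own transaction calls.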

>  As a side issue, I would love to help develop a nice standard database
>  backend that ipython engines can read/write to.  This would provide a
>  very attractive way of handling global state.  One option that might
>  scale very well is Mnesia, Erlang's distributed database.  Thoughts?

I think if I can figure out how to make ZODB work with IPython1 (or
any other multi-processing solution), I will be able to say without
any doubt that ZODB is the best solution for what you have in mind,
for a number of reasons:

1- It is an OO database, which means that "live" objects can flow in
and out of the database seamlessly, without the burden of an ORM layer
to mediate the interaction with a relational DB.
2- Objects can be persisted not only during an IPython1 session, but
also for as long as the user wants (across multiple sessions).
3- The seamless connection between memory and database that ZODB
provides works great as a memory management tool, allowing for
effortless off-loading of large objects to the DB as they go out of
scope, until they are needed again down the line.
4- It supports single-file storage, or standard RDBMS servers
(such as Postgres, MySQL, or Oracle) as back-ends.
5- It is a pure Python (and Pythonic) solution.
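
To make points 1 and 2 concrete, here is a minimal sketch of the
store-and-reload pattern. It deliberately uses the standard library's
shelve module as a stand-in rather than ZODB itself, so that it runs
without extra packages. shelve pickles whole Python objects under string
keys, but it has none of ZODB's transactions, ZEO server, or automatic
change tracking. The names save_result and load_result, and the sample
record, are hypothetical.

```python
import os
import shelve
import tempfile


def save_result(db_path, key, obj):
    """Store any picklable Python object under a string key."""
    with shelve.open(db_path) as db:
        db[key] = obj


def load_result(db_path, key):
    """Reopen the store, as a later session would, and fetch the object."""
    with shelve.open(db_path) as db:
        return db[key]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "results")
        # A nested structure goes in and comes out with no schema or
        # mapping layer in between.
        record = {"package": "3114", "scores": [0.5, 1.5], "meta": {"engine": 2}}
        save_result(path, "fold:3114", record)
        print(load_result(path, "fold:3114"))
```

Each call opens and closes the store, so the reload step behaves the
same way whether it happens in the same session or a later one.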

>
>  Why did you choose ZODB - is it already a part of your system?

I chose ZODB because of the reasons I mention above, but mainly
because it is absolutely transparent. I don't have to write a lot of
"boilerplate" code just to get the data to and from the DB.

Flávio


>  >
>  >
>  >
>  >  On Wed, Feb 27, 2008 at 7:29 PM, Brian Granger <ellisonbg.net@gmail.com> wrote:
>  >  > Alex,
>  >  >
>  >  >  First, I would suggest updating your ipython1 install from our svn
>  >  >  repository.  We are about to push out a major new version and the
>  >  >  documentation is _much_ better.  Also, there are many new features
>  >  >  that will hopefully help you.  Here is a simple example (using the
>  >  >  latest svn of ipython1):
>  >  >
>  >  >  In [1]: from ipython1.kernel import client
>  >  >
>  >  >  In [2]: mec = client.MultiEngineClient(('127.0.0.1',10105))
>  >  >
>  >  >  In [3]: tc = client.TaskClient(('127.0.0.1',10113))
>  >  >
>  >  >  In [4]: def fold_package(x):
>  >  >    ...:     return 2.0*x
>  >  >    ...:
>  >  >
>  >  >  In [5]: mec.push_function(dict(fold_package=fold_package))
>  >  >  Out[5]: [None, None, None, None]
>  >  >
>  >  >  In [6]: tasks = [client.Task("y=fold_package(x)",push={'x':x},pull=('y',))
>  >  >    ...:          for x in range(128)]
>  >  >
>  >  >  In [7]: task_ids = [tc.run(t) for t in tasks]
>  >  >
>  >  >  In [8]: tc.barrier(task_ids)
>  >  >
>  >  >  In [9]: task_results = [tc.get_task_result(tid) for tid in task_ids]
>  >  >
>  >  >  In [10]: results = [tr.ns.y for tr in task_results]
>  >  >
>  >  >  In [11]: print results
>  >  >  [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0,
>  >  >  24.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0, 38.0, 40.0, 42.0, 44.0,
>  >  >  46.0, 48.0, 50.0, 52.0, 54.0, 56.0, 58.0, 60.0, 62.0, 64.0, 66.0,
>  >  >  68.0, 70.0, 72.0, 74.0, 76.0, 78.0, 80.0, 82.0, 84.0, 86.0, 88.0,
>  >  >  90.0, 92.0, 94.0, 96.0, 98.0, 100.0, 102.0, 104.0, 106.0, 108.0,
>  >  >  110.0, 112.0, 114.0, 116.0, 118.0, 120.0, 122.0, 124.0, 126.0, 128.0,
>  >  >  130.0, 132.0, 134.0, 136.0, 138.0, 140.0, 142.0, 144.0, 146.0, 148.0,
>  >  >  150.0, 152.0, 154.0, 156.0, 158.0, 160.0, 162.0, 164.0, 166.0, 168.0,
>  >  >  170.0, 172.0, 174.0, 176.0, 178.0, 180.0, 182.0, 184.0, 186.0, 188.0,
>  >  >  190.0, 192.0, 194.0, 196.0, 198.0, 200.0, 202.0, 204.0, 206.0, 208.0,
>  >  >  210.0, 212.0, 214.0, 216.0, 218.0, 220.0, 222.0, 224.0, 226.0, 228.0,
>  >  >  230.0, 232.0, 234.0, 236.0, 238.0, 240.0, 242.0, 244.0, 246.0, 248.0,
>  >  >  250.0, 252.0, 254.0]
>  >  >
>  >  >  Or if you don't need load balancing:
>  >  >
>  >  >  # This sends the fold_package function for you!
>  >  >  results = mec.map(fold_package, range(128))
>  >  >
>  >  >  Let us know if you run into other problems.
>  >  >
>  >  >  Cheers,
>  >  >
>  >  >  Brian
>  >  >
>  >  >
>  >  >
>  >  >  On Mon, Feb 25, 2008 at 7:44 PM, Alexandre Gillet <gillet@scripps.edu> wrote:
>  >  >  > Hi,
>  >  >  >
>  >  >  >  I just started using ipython1 to distribute jobs on multiple CPUs.  I
>  >  >  >  am having some issues and I am not sure how it works.
>  >  >  >  I want to pass a function to be run by each task on each client.
>  >  >  >  In the following code, the function fold_package needs to be run on
>  >  >  >  each client.
>  >  >  >
>  >  >  >  packages_list=[ '3114', '3115','3116']
>  >  >  >  # create a  remote  controller instance
>  >  >  >  rc = kernel.RemoteController(('127.0.0.1',10105))
>  >  >  >  # create task controller instance
>  >  >  >  tc = kernel.TaskController(('127.0.0.1', 10113))
>  >  >  >  # commands won't block by default
>  >  >  >  rc.block = False
>  >  >  >  # get id of available engine
>  >  >  >  engines_id = rc.getIDs()
>  >  >  >  # process the list of packages by dispatching them to different computer
>  >  >  >  # create the task list
>  >  >  >  tasks = [kernel.Task("fold_package(%s)"%t) for t in packages_list]
>  >  >  >  # test task controller
>  >  >  >  taskIDs = [tc.run(t) for t in tasks]
>  >  >  >
>  >  >  >
>  >  >  >  When I run that code I get:
>  >  >  >  NameError: name 'fold_package' is not defined
>  >  >  >
>  >  >  >  My questions are:
>  >  >  >  How do I pass a function defined in my script to the client engine?
>  >  >  >  Or do I have to create a package that contains my function and
>  >  >  >  install it on each client?
>  >  >  >
>  >  >  >  Thanks for any advice and answers.
>  >  >  >  Alex
>  >  >  >
>  >  >  >  --
>  >  >  >   o Alexandre Gillet    Ph.D.           email: gillet@scripps.edu
>  >  >  >  /  The Scripps Research Institute,
>  >  >  >  o  Dept. Molecular Biology,  MB-5,
>  >  >  >  \  10550  North Torrey Pines Road,
>  >  >  >   o La Jolla,  CA 92037-1000,  USA.
>  >  >  >  /  tel: (858) 784-2053
>  >  >  >  o  fax: (858) 784-2860
>  >  >  >     web: http://mgl.scripps.edu/projects/tangible_models/
>  >  >  >  _______________________________________________
>  >  >  >  IPython-user mailing list
>  >  >  >  IPython-user@scipy.org
>  >  >  >  http://lists.ipython.scipy.org/mailman/listinfo/ipython-user
>  >  >  >
>  >  >
>  >
>

