Mongo EMF

If you have worked with the Eclipse Modeling Framework (EMF), then you already know how easy it is to create domain models.  If you haven’t, then I highly recommend giving it a try.  I develop all of my domain models using EMF.  EMF has built-in support for persisting models to files in XMI, XML, and binary format.  The persistence API is highly extensible, and I’ve recently been working with Ed Merks on some code that persists EMF models to MongoDB.  The project is called mongo-emf and is hosted on EclipseLabs. Version 0.3.1 is being used in a real application and is fairly stable.  This version supports all of the basic CRUD functions along with a basic query language for finding objects.  This blog post will cover the CRUD functions while Ed Merks will have a blog post soon on using the query model.

One feature that I hope you will find attractive is that there are no annotations or XML configuration files required.

If you are attending EclipseCon 2011, I will be giving a talk titled Hardware Developer’s Workbench: a Case Study in which I will discuss how we are using this technology in our application.

Quickstart

Here’s an example of how easy it is to insert an object into MongoDB using EMF.

ResourceSet resourceSet = new ResourceSetImpl();
EList<URIHandler> uriHandlers = resourceSet.getURIConverter().getURIHandlers();
uriHandlers.add(0, new MongoDBURIHandlerImpl());

Person user = ModelFactory.eINSTANCE.createPerson();
user.setName("Test User");
user.setEmail("test@example.org");

Resource resource = resourceSet.createResource(URI.createURI("mongo://localhost/app/users/"));
resource.getContents().add(user);

try
{
  resource.save(null);
}
catch(IOException e)
{
  e.printStackTrace();
}

Let’s look at the code above in detail.  Line 1 is the normal way you create an EMF ResourceSet.  Lines 2 and 3 hook the MongoDB URI handler that interfaces EMF to MongoDB to the ResourceSet. Lines 5 – 7 set up the model instance. Line 9 creates the EMF resource using a URI that has a specific pattern explained below.  Line 10 simply adds the model instance to the EMF Resource.  Line 14 saves and inserts the model instance to MongoDB.

By adding just two additional lines of code and using a mongo URI, you are able to utilize the full power of EMF with your objects persisted in MongoDB.

Mongo EMF URIs

All EMF resources are identified by a URI.  The URI you use when persisting models to MongoDB must have the form:

mongo://<host>/<database>/<collection>/<id>

  • host is the hostname of the server running MongoDB (the port may be specified if not default)
  • database is the name of the MongoDB database to use
  • collection is the name of the MongoDB collection
  • id is the unique identifier of the object in MongoDB (optional on create / insert)

The URI path must have exactly three segments. Anything else will cause an IOException on a load() or save().
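
The URI shape described above can be checked before it ever reaches EMF. The following is a minimal sketch using only java.net.URI. The class MongoUriParts and its fields are hypothetical names made up for illustration; they are not part of mongo-emf, and the real URI handler may validate differently.

```java
import java.net.URI;

// Hypothetical helper for illustration; it is NOT part of the mongo-emf API.
final class MongoUriParts {
    final String host;
    final int port;          // -1 when the URI does not name a port
    final String database;
    final String collection;
    final String id;         // empty string when the id segment is left blank

    private MongoUriParts(String host, int port, String database, String collection, String id) {
        this.host = host;
        this.port = port;
        this.database = database;
        this.collection = collection;
        this.id = id;
    }

    // Splits mongo://<host>/<database>/<collection>/<id> and rejects any other shape.
    static MongoUriParts parse(String text) {
        URI uri = URI.create(text);
        if (!"mongo".equals(uri.getScheme()) || uri.getHost() == null || uri.getPath() == null) {
            throw new IllegalArgumentException("Not a mongo://<host>/... URI: " + text);
        }
        // A limit of -1 keeps the trailing empty segment, so "/app/users/" splits
        // into ["", "app", "users", ""]: one leading blank plus three segments.
        String[] parts = uri.getPath().split("/", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected exactly three path segments: " + text);
        }
        return new MongoUriParts(uri.getHost(), uri.getPort(), parts[1], parts[2], parts[3]);
    }

    public static void main(String[] args) {
        MongoUriParts insert = parse("mongo://localhost/app/users/");
        System.out.println(insert.database + "/" + insert.collection + " id='" + insert.id + "'");
        MongoUriParts load = parse("mongo://localhost/app/users/4d6dc268b03b0db29961472c");
        System.out.println(load.id);
    }
}
```

Note that a URI without the trailing slash, such as mongo://localhost/app/users, has only two path segments and is rejected by this sketch, which mirrors the three-segment rule stated above.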

Create

When inserting a new object into MongoDB, the id segment of the URI is optional and typically the empty string.  By default, MongoDB assigns a unique ID to an object when it is inserted into the database. In this mode, the URI of the EMF resource is automatically updated to include the MongoDB generated ID value.  Going back to the example in the Quickstart, the resource URI will be similar to mongo://localhost/app/users/4d6dc268b03b0db29961472c after the call to save() on line 14.

It is also possible for the client to generate the ID and persist the object to MongoDB using that ID.  The ID can be any string value that is unique within the MongoDB collection.  The quickstart example above would be modified as follows:

Resource resource = resourceSet.createResource(URI.createURI("mongo://localhost/app/users/1"));
resource.getContents().add(user);

try
{
  resource.save(null);
}
catch(IOException e)
{
  e.printStackTrace();
}

Retrieve

To retrieve an object from MongoDB, you load an EMF Resource using the unique URI for the object.

ResourceSet resourceSet = new ResourceSetImpl();
EList<URIHandler> uriHandlers = resourceSet.getURIConverter().getURIHandlers();
uriHandlers.add(0, new MongoDBURIHandlerImpl());

Resource resource = resourceSet.getResource(URI.createURI("mongo://localhost/app/users/4d6dc268b03b0db29961472c"), true);
Person user = (Person) resource.getContents().get(0);

Update

To update an object in MongoDB, you simply save the Resource containing the object.

user.setEmail("mongo-emf@example.org");

try
{
  user.eResource().save(null);
}
catch(IOException e)
{
  e.printStackTrace();
}

Delete

To delete an object in MongoDB, you call delete() on the Resource containing the object.

try
{
  user.eResource().delete(null);
}
catch(IOException e)
{
  e.printStackTrace();
}

Collections

A collection of objects may be stored (contained) within a parent object, or stored as individual objects in a MongoDB collection. For objects that are contained as part of a parent object, loading or saving the parent object loads or saves the entire collection. When a collection is stored as individual objects in a MongoDB collection, each object is contained in its own EMF Resource and must be managed individually by the client. If you allow MongoDB to generate the object ID, you may bulk insert a collection of objects in a single save() call. After the save, the resource is modified to contain a single Result object with a proxy to each of the inserted objects. You may iterate over the proxies to load each object into its own Resource. Here is an example of bulk insert:

ResourceSet resourceSet = new ResourceSetImpl();
EList<URIHandler> uriHandlers = resourceSet.getURIConverter().getURIHandlers();
uriHandlers.add(0, new MongoDBURIHandlerImpl());

Resource resource = resourceSet.createResource(URI.createURI("mongo://localhost/app/users/"));

for(int i = 0; i < 10; i++)
{
  Person user = ModelFactory.eINSTANCE.createPerson();
  user.setName("User " + i);
  user.setEmail("user" + i + "@example.org");
  resource.getContents().add(user);
}

try
{
  resource.save(null);
}
catch(IOException e)
{
  e.printStackTrace();
}

Result result = (Result) resource.getContents().get(0);

for(EObject eObject : result.getValues())
{
  Person user = (Person) eObject;
  System.out.println("Person " + user.getName() + " has URI: " + user.eResource().getURI());
}

EMF References

The way in which EMF references are persisted in MongoDB depends on the settings you specified when you created your EMF Ecore and Generator models. Two Ecore settings that affect persistence are: Containment, and Resolve Proxies. The Generator setting that affects persistence is: Containment Proxies. There are three types of references to consider: non-containment, containment, and bi-directional cross-document containment.

Non-containment References

A non-containment reference is modeled by setting Containment = false in the Ecore model. A non-containment reference can point to any other object, whether it lives in the database, in a file, or on some other server. The target object may be contained by the referencing object, or could be in some other resource. Non-containment references are always persisted as a proxy. If the target object is in a separate resource from the referencing object, then on creation the target object must be persisted before the object holding the reference, so that the proxy URI contains the ID of the target object.

Containment References

A containment reference is modeled by setting Containment = true in the Ecore model. A containment reference is persisted as a nested object in the same MongoDB document as the referencing object if Resolve Proxies = false in the Ecore model or Containment Proxies = false in the Generator model. If Resolve Proxies = true in the Ecore model and Containment Proxies = true in the Generator model, the reference will be persisted as a proxy if the target object is contained in its own Resource (cross-document containment). If the target object is not contained in its own Resource, it will be persisted as a nested object of the referencing object.
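
The rules in the paragraph above reduce to a small decision function. This is only a sketch of the rule as described in this post; ContainmentRule, Storage, and storageFor are hypothetical names and do not exist in mongo-emf or EMF.

```java
// Hypothetical illustration of the containment rule described in this post;
// none of these names exist in mongo-emf or EMF.
final class ContainmentRule {
    enum Storage { NESTED, PROXY }

    // resolveProxies:      the Resolve Proxies setting in the Ecore model
    // containmentProxies:  the Containment Proxies setting in the Generator model
    // targetInOwnResource: true when the contained object sits in its own Resource
    static Storage storageFor(boolean resolveProxies, boolean containmentProxies, boolean targetInOwnResource) {
        // Cross-document containment is only possible when both settings are true.
        boolean crossDocumentAllowed = resolveProxies && containmentProxies;
        return (crossDocumentAllowed && targetInOwnResource) ? Storage.PROXY : Storage.NESTED;
    }

    public static void main(String[] args) {
        System.out.println(storageFor(true, true, true));
        System.out.println(storageFor(true, false, true));
    }
}
```

In other words, a proxy is written only when all three conditions hold. In every other combination the contained object is nested inside the referencing object's document.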

Bi-directional Cross-document Containment References

Bi-directional cross-document containment references need special consideration when it comes to saving the objects. For the proxies to be properly persisted, three calls to save() must be made on creation. One of the two objects must be saved twice. The other object must be saved once between the two saves to the other object. For example, consider a bi-directional reference between a Person and an Address. The code to save the two objects would be as follows:

user.setAddress(address);

try
{
  address.eResource().save(null);
  user.eResource().save(null);
  address.eResource().save(null);
}
catch(IOException e)
{
  e.printStackTrace();
}
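
To see why three saves are needed, it can help to model the ordering with a toy in-memory store. The sketch below is not mongo-emf code: ToyStore and Node are hypothetical, and the assumption that an object only receives its ID on its first save (so a proxy written before that point has no target ID) is inferred from the description of non-containment references above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model for illustration only; it does NOT use or mirror the mongo-emf API.
final class ToyStore {
    static final class Node {
        String id;       // null until the node has been saved once
        Node opposite;   // the other end of the bi-directional reference
    }

    // Maps a saved node's ID to the target ID that was written for its reference.
    private final Map<String, String> storedReference = new HashMap<>();
    private int nextId = 1;

    void save(Node node) {
        if (node.id == null) {
            node.id = String.valueOf(nextId++);   // the store assigns the ID on first save
        }
        // The reference can only record the target's ID if the target already has one.
        String targetId = (node.opposite == null) ? null : node.opposite.id;
        storedReference.put(node.id, targetId);
    }

    String storedReferenceOf(Node node) {
        return storedReference.get(node.id);
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        Node user = new Node();
        Node address = new Node();
        user.opposite = address;
        address.opposite = user;

        store.save(address);   // address gets an ID, but user has none yet
        store.save(user);      // user gets an ID and records the address ID
        System.out.println("address reference after two saves: " + store.storedReferenceOf(address));
        store.save(address);   // the second save of address can now record the user ID
        System.out.println("address reference after three saves: " + store.storedReferenceOf(address));
    }
}
```

In this model the first save of address cannot record the user's ID because the user has not been assigned one yet, which is why address has to be saved a second time after the user.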

Modeling Restrictions

There is one minor restriction that you must follow when creating your EMF Ecore model. The following keys are reserved for internal use and may not be used as attribute or reference names:

  • _id
  • _eId
  • _eClass
  • _timeStamp
  • _eProxyURI
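
A model can be checked against this list before it is generated. The following is a minimal sketch in plain Java; ReservedKeys and conflicts are hypothetical names, and the feature names are passed in as strings rather than read from an Ecore model so that the example has no EMF dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; it is NOT part of mongo-emf.
final class ReservedKeys {
    // The reserved keys listed in this post.
    static final Set<String> RESERVED = new HashSet<>(
        Arrays.asList("_id", "_eId", "_eClass", "_timeStamp", "_eProxyURI"));

    // Returns the feature names that collide with a reserved key, in input order.
    static List<String> conflicts(List<String> featureNames) {
        List<String> found = new ArrayList<>();
        for (String name : featureNames) {
            if (RESERVED.contains(name)) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(conflicts(Arrays.asList("name", "email", "_id")));
    }
}
```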

Mongo EMF Project Bundles

The project consists of several core bundles, an example project, a JUnit test bundle, and a JUnit test utility bundle.

  • org.eclipselabs.emf.query – this bundle contains the model query support
  • org.eclipselabs.mongo – this bundle contains an IMongoDB OSGi service that provides basic connectivity to a MongoDB database
  • org.eclipselabs.mongo.emf – this bundle contains the MongoDBURIHandlerImpl that provides EMF persistence to MongoDB
  • org.eclipselabs.mongo.freemarker – this bundle contains support for persisting FreeMarker templates in MongoDB
  • org.eclipselabs.mongo.emf.examples – this bundle contains a simple example using MongoDBURIHandlerImpl
  • org.eclipselabs.mongo.emf.junit – this bundle contains the JUnit tests for the core bundles
  • org.eclipselabs.mongo.junit – this bundle contains utilities for users developing their own JUnit tests

Unit Testing

The org.eclipselabs.mongo.junit bundle contains two useful classes that can make writing your unit tests easier.

MongoDatabase

This class is a JUnit @Rule that will clear your MongoDB database after each test. The rule requires the BundleContext and the name of the database to be passed as parameters to its constructor. Here is an example of how to use that rule:

	@Rule
	public MongoDatabase database = new MongoDatabase(Activator.getInstance().getContext(), "junit");

MongoUtil

This utility class contains functions for creating a ResourceSet, getting one or more objects from the database, getting the ID of an object, registering the IMongoDB service with OSGi, and comparing two EObjects. This class has extensive JavaDoc comments that explain the usage of each function.

Launch Configurations

Your launch configuration must include the following bundles and their dependencies:

  • org.eclipselabs.emf.query
  • org.eclipselabs.mongo
  • org.eclipselabs.mongo.emf
  • org.eclipse.equinox.ds (or equivalent)

The org.eclipselabs.mongo bundle uses OSGi declarative services to register the IMongoDB service. This service is used by MongoDBURIHandlerImpl to connect to MongoDB.

References

  1. Eclipse Modeling Framework
  2. MongoDB
  3. EclipseLabs mongo-emf project

45 thoughts on “Mongo EMF”

  1. Very interesting work!

    Any feedback about the r/w performance compared to XMI & binary serializations?

    Also, what about memory consumption? I suppose that you keep the proxy resolution mechanism unchanged and that memory consumption of loaded/unloaded resources does not change compared to XMI, is that right?

    Regards,

    • We do not have any performance comparison to XMI & binary serializations. We will be doing some performance analysis in a real system soon. If the results are interesting, I’ll consider another blog post.

      I expect memory consumption to be no different since it really depends on which EMF options you used when generating your model.

      • When we first committed the support in 206267, we measured that binary results in files 1/3 the size, and for in-memory byte array streams, it reads 6 times faster and writes 8 times faster than an XML serialization. One might expect the 1/3 size difference to add another factor of 3 when disk IO overhead is added.

    • Eike has his own integration with MongoDB and I’m not sure how he did the implementation. I’ll have to discuss this with him at EclipseCon.

      • Unfortunately, Eike’s implementation from what I can tell simply does a binary serialization which stores it to a single field in mongo. Extremely non-mongo friendly, and no interoperability.

        I started a project with this last night, and I am really happy to say so far so good. I love the fact that containment references appear to be serialized correctly in the document itself. I haven’t yet tested it to see what happens when you define a field in the database that doesn’t exist in the model, or vice versa. But even so, EMF on top of a non-relational store is a perfect set of technologies for emerging applications.

        Have you thought at all about how to handle map-reduce yet?

      • Glad to hear it’s working for you. If there are extra fields in MongoDB, they should just be ignored. We have not looked into map-reduce at all. If this is important for you, please file an enhancement request on the project page.

  2. Hi guys great work!

    I’m new to EMF and its persistence, so I have a question about your implementation and hope you can help me:

    Is there a particular reason why you use your own URIHandlerImpl instead of registering a Resource.Factory? (Because for me it looks strange to make the real write operations inside the close() of your own OutputStream.)

    • Creating a URIHandler is considered best practice for managing EMF persistence. I agree that the API may not be ideal, but it’s a workable solution that supports backwards compatibility.

      • Thanks for the info.

        So if backward compatibility isn’t a point in my implementation I would be able to implement it as a Resource. (BTW do you know a good book (beside EMF 2nd Edition) or link (beside your mongo-emf blog) about EMF Persistence especially for DB access which mentions such best practices?)

    • It’s not so unusual to do a bunch of processing during the close of the stream. For example, if you look at PlatformResourceURIHandlerImpl.PlatformResourceOutputStream you’ll see that it’s not until you close (or flush) that the bytes you’re saving are actually written out to Eclipse’s workspace. Of course in this case, we are storing bytes-oriented data, but in the case of Mongo DB, we need to do more than byte-oriented serialization of the data, so we effectively defer everything until the stream is closed. In fact, we can’t do any real conversion to bytes at all, so the stream-based APIs don’t work well. As Bryan mentions, supporting Saveable/Loadable allows the URI converter’s RESTful APIs to be reused anyway, with a significant added advantage that now any resource implementation is not only able to produce/consume bytes for byte-oriented storage, e.g. XML, but can save/load using alternative storage representations like MongoDB without needing a specialized resource implementation. So in my opinion, if you’re writing a persistence mechanism that’s not byte-oriented, you should do it the way you see in MongoDBURIHandlerImpl so that it simply works for any client who adds your handler to their URI converter…

      I don’t think there’s any good reference implementation for how to implement DB-based persistence for EMF. Certainly Teneo supports that well already…

  3. Hello, my name is Javier Espinazo from the University of Murcia (Spain).

    I downloaded the example project of your work a month or so ago and tested it with a big model (~70k objects). It was not capable of storing the model in the database since it exceeded the maximum document size. I looked at your code and I saw that the containment references are implemented as nested documents (which you have already explained above).

    Since all the elements in a model must be contained by another element or by a resource, this means that models are single documents, which causes this scalability issue. I also saw the example and realized that the metamodel consisted of two metaclasses that had non-containment references between them, which explains why so many objects were stored without causing problems.

    I am currently developing a MongoDB-based model repository that stores and loads big models (hundreds of millions of objects) in a scalable way with performance results close to the ones of CDO. It is implemented as a ResourceImpl subclass. It would be interesting to share thoughts and ideas on this topic. I hope I can release a prototype shortly.

    Javier

    • You may want to look at cross-document containment references. Using this feature of EMF would allow you to store huge (millions of objects) models.

      • Thank you for your response.

        My work is intended to be integrated with MDE tools such as model transformations, which generate huge models in memory. My goal is to let those tools incrementally store the generated models in the repository, freeing the client memory. I suppose that using your approach, tools must flush model partitions to MongoDB which must not exceed the maximum document size, connecting them through cross-document references.

  4. Hi Bryan,

    I would also be interested in your thoughts on using the existing EMF Query project.

    Another question: Have you thought of saving the non-containment references in the MongoDB as a DBRef object? The DBRef is a more standard way to reference an ID in a collection (and possibly even a different database).

    • I do plan on making the query engine pluggable – maybe in version 0.5.0. I believe EMF Query is no longer active, so EMF Query 2 would be worth investigating. If you know anyone who would like to contribute in this area, it would be welcome.

      The first implementation of MongoEMF used DBRef, but it became problematic in the implementation. Our implementation for EMF references should support all use cases including cross-database, and should make queries more consistent.

      • I completely agree your current implementation is self-consistent. It always works when you are using your library. Unfortunately, we have others that use the MongoDB data directly, without your library, and in those cases a DBRef object is the standard way to indicate a reference. I can’t really change all the other libraries that use the same Mongo database, you see.

  5. Another recommendation.

    If the object being persisted has an ID, as specified in the EcoreUtil.getID(EObject) method, then use that ID for persistence.

    Similarly, on reading from Mongo, if there is an ID, set it from the _id in Mongo in addition to updating the resource URI.

    Don

  6. I think the queries would have to do that automatically. Interpret whatever attribute a user modeled as the ID to map to _id. This, of course, could be an option you supply in the save/load options to maintain backward compatibility or user preference.

    The same for using a DBRef for references. This could be an option. Maybe I will work on that and see what I can come up with.

  7. > Ed Merks will have a blog post soon on using the query model.

    Did Ed ever do the blog on the query model? Or is there anywhere to look for an example?

    • Ed has not done a blog post on the query model. For examples on how to use the query, have a look at the JUnit tests.

  8. Issues installing Mongo EMF on Indigo (Mac Intel)

    Hi Bryan, I look forward to trying out Mongo EMF. I have not been able to install (0.5.x) from p2 though (error with missing dependencies). I then just tried to install from the recent 0.6.0 and get the error below. Any advice on how to get a successful install is appreciated. Thanks!

    Cannot complete the install because one or more required items could not be found.
    Software being installed: MongoEMF JUnit Support 0.6.0.201203280725 (org.eclipselabs.mongo.emf.developer.junit.feature.feature.group 0.6.0.201203280725)
    Missing requirement: Junit 0.6.0.201203280725 (org.eclipselabs.mongo.emf.developer.junit 0.6.0.201203280725) requires ‘package org.hamcrest 1.3.0’ but it could not be found
    Cannot satisfy dependency:
    From: MongoEMF JUnit Support 0.6.0.201203280725 (org.eclipselabs.mongo.emf.developer.junit.feature.feature.group 0.6.0.201203280725)
    To: org.eclipselabs.mongo.emf.developer.junit [0.6.0.201203280725]

    • In order to install the JUnit utilities, you need Hamcrest 1.3.0 in your target platform. If you want to give it a try without the JUnit utilities, expand the category and uncheck the MongoEMF JUnit Support feature.

  9. Hi Bryan,
    I don’t know if this is the right place to ask questions about mongo-emf, but I couldn’t find an appropriate blog.
    Concerning references in mongo-emf: I have a class A referencing a class B referencing C and D. They all have resolveProxies = true. Only C is contained in B, the others have containment = false. I save them as described above: first D, then B, and then A. The insertion in the database seems ok. The problem is that when I retrieve the object A, the referenced object B is created, but not the object C and its descendants. It seems that only one level of references is resolved. Is this behavior normal?
    Here is my code. Please tell me if there is a blog for mongo-emf users, I will post there. Thanks!
    Stefan

    // init mongo for insert
    ResourceSet resourceSet = new ResourceSetImpl();
    EList uriHandlers = resourceSet.getURIConverter().getURIHandlers();
    uriHandlers.add(0, new MongoURIHandlerImpl());
    resourceSet.getLoadOptions().put(MongoURIHandlerImpl.OPTION_USE_ID_ATTRIBUTE_AS_PRIMARY_KEY, Boolean.TRUE);
    Resource resource1 = resourceSet.createResource(URI.createURI("mongo://127.0.0.1:27017/db/test/"));
    Resource resource2 = resourceSet.createResource(URI.createURI("mongo://127.0.0.1:27017/db/test/"));
    Resource resource3 = resourceSet.createResource(URI.createURI("mongo://127.0.0.1:27017/db/test/"));

    // create an object A referencing B who references C and D
    C cObject = BidonPackage.eINSTANCE.getBidonFactory().createC();
    cObject.setAttrC("value c");

    D dObject = BidonPackage.eINSTANCE.getBidonFactory().createD();
    dObject.setAttrD("value for d");

    B bObject = BidonPackage.eINSTANCE.getBidonFactory().createB();
    bObject.setAttrB("pararam");
    bObject.setRefC(cObject);
    bObject.setRefD(dObject);

    A aObject = BidonPackage.eINSTANCE.getBidonFactory().createA();
    aObject.setAttrA(55);
    aObject.setRefB(bObject);

    // save the objects
    resource1.getContents().add(dObject);
    resource1.save(null);
    resource2.getContents().add(bObject);
    resource2.save(null);
    resource3.getContents().add(aObject);
    resource3.save(null);

    For retrieving objects I do:

    // init mongo for retrieve
    ResourceSet resourceSetforGet = new ResourceSetImpl();
    EList uriHandlers2 = resourceSetforGet.getURIConverter().getURIHandlers();
    uriHandlers2.add(0, new MongoURIHandlerImpl());
    resourceSetforGet.getLoadOptions().put(MongoURIHandlerImpl.OPTION_PROXY_ATTRIBUTES, Boolean.TRUE);
    resourceSetforGet.getLoadOptions().put(MongoURIHandlerImpl.OPTION_QUERY_CURSOR, Boolean.TRUE);
    Resource resourceForGet = resourceSetforGet.getResource(URI.createURI("mongo://127.0.0.1:27017/db/test/?"), true);

    // retrieve objects
    MongoCursor cursor = (MongoCursor) resourceForGet.getContents().get(0);

    while (cursor.getDbCursor().hasNext()) {
      EObject aEObject = (EObject) cursor.getObjectBuilder().buildEObject(
          cursor.getDbCollection(), cursor.getDbCursor().next(),
          cursor.eResource(), false);
      if (aEObject instanceof A) {
        A temp = (A) aEObject;
        System.out.println(temp); // it works
        System.out.println(temp.getRefB()); // it works
        System.out.println(temp.getRefB().getRefC()); // is null
        System.out.println(temp.getRefB().getRefD()); // is null
      }
    }

  10. Thank you for your answer. I’ve posted a message on the community forum, containing my example packaged as JUnit Test.
    Stefan

  11. Hi Bryan,

    Just a side question: have you ever persisted the ResourceSet in an HttpSession, and if so, how did you solve the Ecore-Not-Serializable scalability issue?

    Ken

    • I’ve never tried to persist a ResourceSet in an HttpSession. You will be best off using the standard EMF persistence mechanisms (Resource.save()) rather than trying to use Java serialization.

  12. Hi, I am trying to install MongoEMF using the P2 repository. 0.7.1 does not return anything to install, and 0.7.0 fails with the following dependency error:
    Cannot complete the install because one or more required items could not be found.
    Software being installed: MongoEMF DefaultStreams 0.7.0.201206190805 (org.eclipselabs.mongo.emf.streams.feature.feature.group 0.7.0.201206190805)
    Missing requirement: MongoEMF DefaultStreams 0.7.0.201206190805 (org.eclipselabs.mongo.emf.streams.feature.feature.group 0.7.0.201206190805) requires ‘com.mongodb 0.0.0’ but it could not be found

    I have already installed the MongoDB driver (2.9.3) from the related install page. Any advice?

  13. Hi Bryan
    I am still on the EMF learning curve for saving “custom” resources (i.e. non XMI). I am unclear as to where in an existing EMF-based application I should create the new Mongo EMF ResourceSets and Resources. Is it wise to add the creation of the Mongo EMF resources in the code generated by EMF (e.g. EditingDomain)? Do I need to create a new Save or Save As Command / Handler to persist to Mongo, or do the Save Actions/Commands generated from EMF now know to also save to Mongo in addition to XMI? I tried to find some practical examples of how to design an RCP app, generated via EMF, to save to a custom Resource, but so far have not had much luck. Any help / guidance is appreciated. I look forward to getting Mongo EMF integrated in my app.
    Thanks
    Ed

    • Hi Ed,

      You can create the ResourceSet in your application where it’s needed. There is no special place that it must be created. You can even create multiple instances of ResourceSet. Just keep in mind that a ResourceSet is basically a cache of your EMF object instances. You do not need to create a special Save command to persist your EMF object instances to MongoDB; you simply need to use the right URI when creating your Resources, and hook in the right URI converter – this is done automatically when using the OSGi declarative services I have provided in the framework. Get an instance of IResourceSetFactory and call createResourceSet().

      Please be aware that I’m currently doing a major refactoring to the code to split it up into separate projects as portions of the current project can stand alone. This will cause many API changes. I hope to have this refactoring complete in a few weeks.

      Bryan

  14. Hello,
    I’m working on a project with EMF and RCP. I’m setting up the database with CDO, but this involves a CDO / MongoDB store. Does MongoEMF not cover this part, or are there examples that I can use? I would also like to ask how reliable it is to use Mongo.

    I would appreciate any input on this question. Thank you.

    • I’m not quite sure what you are asking here. MongoEMF and CDO are separate technologies that store EMF models and I would not expect one framework to be able to read models stored from the other framework.
