My proposal for joining .NET and the Hadoop ecosystem

Hi!

I suppose that, like a lot of people, I have been living in the .NET world for the last decade, and now we want to use the Hadoop ecosystem to get the benefits Big Data gives our IoT projects: more storage and computing power for “less” money. But we also want to keep using our beloved Visual Studio. I like to keep things simple: I want as few environments as possible, and I want to be able to work both in the cloud and on premises. My goal is to use .NET over a Hortonworks or Cloudera environment, and just that.

Current Situation

Our latest project worked like a charm. It was a non-Big-Data mix of distributed microservices over a relational database. The technologies used are:

  • NServiceBus: for creating distributed business logic.
  • MSMQ: as a transport between our different components. It gives us “exactly once” messaging.
  • Relational database.
  • Akka.net: for high performance services.

Services are isolated and expose an API. They also have a command model and a read model. The problem is that they don’t leave their results in a centralized log, so orchestrating services have to poll for results or expose a known contract to receive events.

[Image: net-and-hadoop-ecosystem]

Desired Situation

Now I want to move to a Big Data environment, and for that I want to move all the persistence to the Hadoop ecosystem. Specifically:

  • NServiceBus: for creating distributed business logic. Used in “at least once” mode to get more performance.
  • Akka.net: for high performance services.
  • Phoenix: for random-access storage and to give the support teams a SQL environment.
  • Spark: data analysis, streaming and machine learning.
  • Kafka: central log between services and transport between inner components.
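
Moving from “exactly once” to “at least once” means a handler can receive the same message twice, so handlers have to be idempotent. The following is only a minimal sketch of that idea in Python, under stated assumptions: `CentralLog` is an in-memory stand-in for a Kafka topic, and `Message`, `IdempotentConsumer` and the `crash_before_commit` flag are hypothetical names invented for this illustration. None of them are NServiceBus or Kafka APIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    message_id: str  # hypothetical unique id assigned by the producer
    payload: int


@dataclass
class CentralLog:
    """In-memory stand-in for a central log: an append-only list of messages."""
    entries: list = field(default_factory=list)

    def append(self, message):
        self.entries.append(message)

    def read_from(self, offset):
        # Return (offset, message) pairs starting at the given offset.
        return list(enumerate(self.entries))[offset:]


class IdempotentConsumer:
    """Reads from the log at least once and skips messages it already applied."""

    def __init__(self, log):
        self.log = log
        self.committed_offset = 0  # position to resume from after a restart
        self.seen_ids = set()      # ids of messages already applied
        self.total = 0             # toy "business state": sum of payloads

    def handle(self, message):
        # Applying the same message twice must not change the state twice.
        if message.message_id in self.seen_ids:
            return False
        self.seen_ids.add(message.message_id)
        self.total += message.payload
        return True

    def poll(self, crash_before_commit=False):
        # Returns how many messages were actually applied in this poll.
        applied = 0
        for offset, message in self.log.read_from(self.committed_offset):
            if self.handle(message):
                applied += 1
            if crash_before_commit:
                # Simulated failure: the offset is not committed,
                # so this message is delivered again on the next poll.
                return applied
            self.committed_offset = offset + 1
        return applied


if __name__ == "__main__":
    log = CentralLog()
    log.append(Message("m1", 10))
    log.append(Message("m2", 5))

    consumer = IdempotentConsumer(log)
    consumer.poll(crash_before_commit=True)  # m1 applied, offset not committed
    consumer.poll()                          # m1 redelivered and skipped, m2 applied
    print(consumer.total)
```

In a real deployment the set of seen ids would have to live in durable storage next to the business state; keeping it in memory here is a simplification.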

[Image: net-and-hadoop-ecosystem-2]

Work to be done

As I see it, the work to be done is not really big, and I hope to have it finished in the first quarter of 2017. We just need to implement:

  1. Kafka transport for NServiceBus. 
  2. Livy for connecting .NET with Spark.
  3. Phoenix persistence for NServiceBus.
  4. Kafka persistence for Akka.net.
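
Of these four pieces, the Livy one is the easiest to picture, because Livy exposes Spark over a REST API: a client creates a session with `POST /sessions` and submits code with `POST /sessions/{id}/statements`. The sketch below only builds those two HTTP requests, in Python, without sending them. The base URL assumes a local Livy server on its default port 8998, and the helper names are hypothetical; a .NET client would issue the same calls with its own HTTP library.

```python
import json
from urllib.request import Request

# Assumption: a Livy server running locally on its default port.
LIVY_URL = "http://localhost:8998"


def _post(path, body):
    # Build (but do not send) a JSON POST request against the Livy server.
    return Request(
        url=LIVY_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(kind="pyspark"):
    # POST /sessions starts an interactive session of the given kind.
    return _post("/sessions", {"kind": kind})


def run_statement_request(session_id, code):
    # POST /sessions/{id}/statements submits a snippet of code to run.
    return _post(f"/sessions/{session_id}/statements", {"code": code})


if __name__ == "__main__":
    session = create_session_request()
    statement = run_statement_request(0, "sc.parallelize(range(10)).sum()")
    for request in (session, statement):
        print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

To actually run a statement, each request would be passed to `urllib.request.urlopen`, and the statement result would then be polled with a `GET` on the statement URL until it is available.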

 

After that we will have a simple and compact architecture in a Big Data environment.

I would love to hear comments, improvements or criticism about these ideas; any help with the components still to be built is also welcome.

Thanks.

5 comments

  1. Stefan

    Hi! This is certainly a nice proposal, but an unusual one, I am afraid. What do you think about the MS Mobius project? Have you had a chance to try it?

  2. Hi Stefan!

    I saw Mobius. The problem we have with it is that it doesn’t support Phoenix, and I suppose it is an unusual way to use Spark, with all that means in terms of finding help, training, etc.

    On the other hand, we are learning Python because we are improving our Machine Learning skills, so Python is our first option for Spark.

    I also think it is unusual; this is not for every use case. But I think it is a good way to go if you are in .NET and want to enter the Big Data world.

    1. Stefan

      Well, I only see username and URI values... I probably need to explore it for some time. I saw this and I know it, but I am afraid Mobius requires using the MS Kafka C# driver, so I have definitely ditched it.

      Time to start learning Python I guess 🙂 Thank you for your tips

  3. Stefan

    You’re right. At the moment I am working through support issues related to Kafka external libraries, but I think that if Microsoft puts some more resources into it in terms of support, it might become a viable alternative for developers who do not have time to put into learning another language.

    .NET Core has arrived on some categories of IoT devices, so maybe in the future we will see some improvements in terms of support from MS.

    I am currently implementing a pet project that sends multiple batches of simulated, slightly preprocessed data that need to be processed in real time. I have used Python on an RPi Zero and Flume to ingest them and store them in Hadoop. It seems that Mobius does not support Flume and has problems with Kafka, so I must try to connect Flume with Spark Streaming.

    Probably I will use this project in order to connect it with .net:
    https://github.com/spark-jobserver/spark-jobserver

    To be honest, being a junior .NET programmer, I do not 100% understand your article, but it is nice to know that there are some alternatives in our “ecosystem”, even though they mean some adaptation.

    I hope I will have time to dig deeper into it. Just one thing about Livy for .NET: is it meant to be used with Azure HDInsight? I have a local Hadoop and Spark installation on my laptop in a single-node configuration, and I thought I could integrate with them too.

    Thanks

    1. I suppose you will be able to use Livy to connect to your local Spark, but I don’t know exactly whether Livy is enabled by default in Spark.

      We use it with HDInsight and Hortonworks HDP (which are the same thing).

      Regarding Kafka, there is now an official Confluent driver for .NET. It will have professional support; I think it is in beta right now.
