
Friday, October 17, 2014

12 things I loathe about Hadoop

Hadoop is a wonderful creation, but it's evolving quickly and it has its rough spots. Here are my dozen pet peeves.


I love the elephant. The elephant loves me. Nothing is perfect, though, and sometimes friends fight.

Here are the things I fight with Hadoop about.

1. Pig vs. Hive

You can't use Hive UDFs in Pig. You have to use HCatalog to access Hive tables in Pig. You can't use Pig UDFs in Hive. Whether it's that one little extra bit of functionality I need while I'm in Hive but don't really feel like writing a full-blown Pig script for, or the "hmm, I could easily do this if I were just in Hive" moment while I'm writing Pig scripts, I frequently think, "Tear down this wall!" when I'm writing in either.
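To make the wall concrete, here's a minimal sketch of the same trivial uppercase function written twice, once against Hive's UDF API and once against Pig's. The class names are mine, but the base classes and method signatures are what each engine actually expects, which is why you can't just drop one into the other.

```java
// --- UpperHiveUdf.java: the Hive flavor ---
// Hive discovers an evaluate() method by reflection on a UDF subclass.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class UpperHiveUdf extends UDF {
    public Text evaluate(Text input) {
        return input == null ? null : new Text(input.toString().toUpperCase());
    }
}

// --- UpperPigUdf.java: the Pig flavor ---
// Pig calls exec() with a Tuple of the arguments instead.
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class UpperPigUdf extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return ((String) input.get(0)).toUpperCase();
    }
}
```

Same one-line idea, two incompatible contracts, so you end up maintaining both.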

2. Being forced to store all my shared libraries in HDFS

This is a recurring theme in Hadoop. If you store your Pig script on HDFS, then it automatically assumes any JAR files will be there as well (I'm working on fixing that myself). The same theme repeats in Oozie and other tools. It's usually reasonable, but sometimes having an organization-wide forced shared library version is painful. Besides, more than half the time, these are the same JAR files you installed everywhere you installed the client, so why store them twice? This is being fixed in Pig. How about everything else?
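As a rough sketch of the pattern I'm grumbling about (the paths here are made up), the MapReduce Job API will happily pull a JAR that already lives in HDFS onto the job's classpath, even though that same JAR is probably sitting on every node you installed the client on:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class SharedLibJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "shared-lib-example");

        // The JAR must already live in HDFS; the framework ships it to each
        // task's classpath via the distributed cache. Path is hypothetical.
        job.addFileToClassPath(new Path("/libs/shared/my-udfs-1.0.jar"));

        // ... set mapper, reducer, input/output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```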

3. Oozie

Debugging you is not fun, so the docs have lots of examples with the old schema. When you get an error, it usually has nothing to do with whatever you did wrong. It may be a "protocol error" for a configuration mistake, or a schema validation error for a schema that passes the schema validator but fails on the server. To a great degree, Oozie is like Ant or Maven, except distributed, with no tooling, and somewhat brittle.
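For context, this is roughly what driving Oozie from Java looks like (the server URL, HDFS paths, and host names are placeholders). Even a submission this small depends on the workflow XML in HDFS validating against the schema version the server actually enforces, which is where those validation errors bite:

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class SubmitWorkflow {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at your Oozie server.
        OozieClient client = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = client.createConfiguration();
        // The workflow.xml under this HDFS path must match a schema version
        // the server accepts, or you get a validation error back at runtime.
        conf.setProperty(OozieClient.APP_PATH, "hdfs:///user/me/app/workflow.xml");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager:8032");
        conf.setProperty("user.name", System.getProperty("user.name"));

        String jobId = client.run(conf);
        WorkflowJob job = client.getJobInfo(jobId);
        System.out.println("Submitted " + jobId + " status=" + job.getStatus());
    }
}
```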

4. Error messages

You're joking, right? Speaking of error messages, my favorite is the one where any of the Hadoop tools say "failure, no error returned," which means "something happened, good luck finding it."

5. Kerberos

'Nuff said? If you want to secure Hadoop in a way that was reasonably well thought out, you get to use Kerberos. Remember Kerberos and how much fun and outdated it is? So you go straight LDAP, except that nothing in Hadoop is integrated: no single sign-on, no SAML, no OAuth, and nothing passes the credentials around (instead, it re-authenticates and re-authorizes). Even more fun, each piece of the Hadoop ecosystem wrote its own LDAP support, so it's inconsistent.
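When you do go the Kerberos route, most of the client-side plumbing funnels through Hadoop's UserGroupInformation class. A minimal sketch, with a made-up principal and keytab path, looks like this:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell the Hadoop client that the cluster expects Kerberos.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Placeholder principal and keytab; this authenticates against the KDC.
        UserGroupInformation.loginUserFromKeytab(
                "etl-user@EXAMPLE.COM", "/etc/security/keytabs/etl-user.keytab");

        // Subsequent Hadoop calls carry the Kerberos credentials.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Home dir: " + fs.getHomeDirectory());
        fs.close();
    }
}
```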

6. Knox

Because writing a proper LDAP connector needs to be done at least 100 more times in Java before we get it right. Gosh, go look at that code. It doesn't really pool connections properly. In fact, I sort of think Knox was created out of an enthusiasm for Java or something. You could do the same with a well-written Apache config, mod_proxy, and mod_rewrite. In fact, that's basically what Knox is, except in Java. To boot, after it authenticates and authorizes, it doesn't pass the information on to Hive or WebHDFS or whatever you're accessing, which gets to do it all over again.

7. Hive won't let me have my external table and delete it too

If you let Hive manage tables, it automatically deletes the data if you drop the table. If you have an external table, it doesn't. Why can't there be a "drop external table too" or something? Why do I have to do this outside Hive if I really want to? Also, while Hive is essentially evolving into an RDBMS, why doesn't it have Update and Delete?
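In practice the workaround is something like this sketch over Hive's JDBC driver and the HDFS API (the connection string, table name, and path are all made up): drop the external table's metadata, then go delete the data yourself.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DropExternalTable {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Drop the table definition; for an EXTERNAL table Hive leaves
        // the underlying files alone. Connection string is a placeholder.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default");
             Statement stmt = conn.createStatement()) {
            stmt.execute("DROP TABLE IF EXISTS web_logs_ext");
        }

        // There is no "drop external table too," so delete the data by hand.
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            fs.delete(new Path("/data/external/web_logs"), true); // recursive
        }
    }
}
```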

8. Namenode failure

Oozie, Knox, and several other parts of Hadoop don't honor the new Namenode HA stuff. You can have HA Hadoop, so long as you don't use anything else with it.

9. Documentation

It's a cliché to complain, but check this out. Line 37 is wrong; worse, it's wrong in every post all over the Internet. That proves nobody even bothered to run the example before checking it in. The Oozie documentation is even more dreadful, and most of the examples won't pass schema validation on the version they're meant for.

10. Ambari coverage

I have trouble criticizing Ambari; given what I know about Hadoop architecture, it's amazing Ambari works at all. Still, where Ambari has gaps, they can be annoying. For instance, Ambari doesn't install, or in some cases doesn't install correctly, many things, including various HA settings, Knox, and much, much more. I'm sure it will get better, but "manually install afterward" or "we'll have to make a Puppet script for the rest" shouldn't show up in my emails or documentation any more.

11. Repository management

Speaking of Ambari, have you ever done an install while the repositories were being upgraded? I have; it does not behave well. In fact, sometimes it finds the fastest (and most out-of-date) mirror. It doesn't care whether what it pulls down is in any way compatible. You can configure your way out of that part, but it's still annoying the first time you install mismatched bits of Hadoop across a few hundred nodes.

12. Null pointer exceptions

I seem to find them. Often they are parse errors or other faults I've caused. Still, they shouldn't be exposed as NPEs in Pig, Hive, HDFS, and so on.
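The gripe isn't that my input was fine; it's how the failure surfaces. A sketch of what I'd rather see, with entirely made-up names, is nothing more than validating up front and throwing something descriptive instead of letting a null leak out:

```java
// Illustrative only: a made-up parse step showing the pattern of failing
// with a readable message rather than letting a null surface as an NPE.
public final class ScriptParser {

    public ParsedScript parse(String script) {
        if (script == null || script.trim().isEmpty()) {
            // The user sees what went wrong, not a bare NullPointerException.
            throw new IllegalArgumentException(
                    "Script is empty or null; nothing to parse.");
        }
        return doParse(script);
    }

    private ParsedScript doParse(String script) {
        // ... real parsing would happen here ...
        return new ParsedScript(script);
    }

    // Minimal placeholder result type for this sketch.
    public static final class ParsedScript {
        private final String source;
        ParsedScript(String source) { this.source = source; }
        public String getSource() { return source; }
    }
}
```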

The response to any such list of complaints will of course be "patches welcome!" or "hey, I'm working on it." Hadoop has come a long way and is definitely one of my favorite tools, but boy, those sharp edges annoy me.

What's your favorite Hadoop bug or six-legged feature? What are you doing to make it better?

