Breaking

Tuesday, December 30, 2014

12/30/2014 08:27:00 PM

Pitchers Or Catchers?

In pro baseball, the catcher signals the pitcher how to throw the ball. Similarly, IT needs to inspire the lines of business to make the right pitches.

As the saying goes, "Every company is an IT company, every person is a technology person." This adage is becoming more relevant than ever in today's digitalized world, with organizations of all sizes looking for new ways to compete and grow by changing or pivoting their business models.

Research firm Gartner says that by 2017, more than 50 percent of technology spending will happen outside of the IT organization. Lines of business have always benefited from a discretionary IT budget that was directly proportional to the power of their leader, but until the advent of SaaS this budget remained low compared to the overall IT spend. Salesforce.com's "No Software" approach and the subsequent surge in Software-as-a-Service business applications has seen the advent of a new era in non-IT-driven IT investments. Many an IT leader has had to cope with the challenging task of regaining control of applications that were selected and deployed without his involvement, and sometimes without his knowledge.

Today, SaaS is no longer limited to business applications. Even technologies that until recently remained the prerogative of IT, are shifting toward business users. Integration is a perfect example: Gartner inventories today over 30 Integration Platform-as-a-Service (iPaaS) vendors, and even the highly specialized API world is getting a step closer to business users with self-service API Platform-as-a-Service. (Disclosure: I am a former employee and a shareholder of Talend, an integration vendor which has announced it will launch an iPaaS product in 2015, and I am an advisor to Restlet, provider of an API PaaS).

In this world where IT is losing control of IT, it is easy for IT leaders to remain passive and wait for projects to be thrown at them. After all, there is never enough budget to build everything they are already asked to do, so why seek additional projects? But in the long run, once the excitement fades and business leaders get replaced, IT will inevitably be asked to assume responsibility for the choices that were made without it. The challenge for IT is thus to never lose this control: to remain involved in key technology decisions, or at a minimum to help steer them in the right direction.

I have written before about sports and how they are being transformed by digitalization. But sports concepts can also illuminate this digital transformation. Take baseball: who decides how the ball is going to be thrown, the pitcher -- or the catcher? For the casual observer, the pitcher is in control, and if he throws a curveball, the catcher will have to figure out how to catch it. And indeed this is what happens when the catcher remains passive, just squatting and waiting for the pitch. This is also what happens when IT is not involved -- until the line of business decides to throw ownership of SaaS applications or iPaaS integration jobs toward IT.

But if you observe a baseball game more carefully, you'll see that pro catchers actually signal to their pitcher, through an elaborate finger-position code, the type of pitch to throw. In IT, this translates to IT gently nudging the business toward the right choices -- right for the business, and right for IT.

Of course, some pitchers don't always obey the hand signals from the catcher, because "they know better." Similarly, not all business leaders will follow the advice from IT on their technology choices, because "they own the budget." And IT will be left to deal with an unexpected curve ball at some point.
12/30/2014 08:24:00 PM

Make sure your data is housebroken

You'd never come home with a dirty, pest-infested, untrained puppy you randomly picked up in the street.

This data you are trying to bring in looks pretty nice, but where did you pick it up? Are you certain it's safe to import into our systems? Until you tell me about the origin of the data, and can certify to its state of cleanliness, you are leaving it outside the door! I don't want the data we have gone to great lengths to thoroughly assess, cleanse and enrich, to get corrupted by this new data puppy of yours! It certainly looks cute, but has it received the proper immunizations? Is it data warehouse-broken? And when was the last time it saw a data vet?

You really don't want to bring just any data, from just anywhere, into your systems. In the not-too-distant past, the equation was fairly straightforward. Most data was produced internally by your transactions, on your systems. Some data would be provided by trading partners or purchased from data providers, but the process to acquire this data would be properly designed, and a contract with service-level agreements would be devised, guaranteeing a proper level of quality and holding you harmless from infringement, privacy violations, and other difficulties caused by improper data collection.

We now live in a digitized world. More and more, all kinds of data are available for anyone to grab. Whether data is collected through calls to public APIs or via screen scraping, it is extremely easy to harvest. But you have no control over the origin of this data, over its reliability, or over its accuracy.
All this readily available and easily harvestable data creates new governance challenges:
  • Origin: just because the origin of data is unknown doesn't mean the data is unusable. There are certainly cases where it is of better quality than your own data. However, you have to assume that it may be bad until proven otherwise. Just leave that data in the front yard, or in the mudroom, until you have confirmed that it meets your standards.
  • Reliability and accuracy: if you are going to base mission-critical business processes on this data, you need to confirm that it is fit for purpose. This can be done by checking samples, or by executing test runs of these processes and comparing the outcomes with other predictions or with actuals (a minimal sample-check sketch follows this list).
  • Liability: the press is filled with examples of data theft and privacy violations linked to improper use of data. If you are bringing harvested data into your systems, ensure that this data was collected appropriately, with the proper levels of consent.
  • Control: simply put, if it's not your data, you have no power in controlling and enhancing governance. You have to rely on a third party, or set of third parties, to properly govern the data. Or, you have to assume that the data is not governed, and use it as such.
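For the sampling approach mentioned above, the check can be as simple as running a handful of fitness rules over a slice of the harvested records before anything reaches production. The sketch below (in Java) is a minimal illustration only; the record fields, the email rule, and the 95 percent threshold are hypothetical stand-ins, not drawn from any particular system.

import java.util.List;
import java.util.regex.Pattern;

// Minimal sketch of a "front yard" check: validate a sample of harvested
// records against a few fitness rules before anything is imported.
// The fields and the 95 percent threshold are hypothetical placeholders.
public class DataPuppyCheck {

    record Rec(String id, String email, String country) {}

    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    static boolean looksClean(Rec r) {
        return r.id() != null && !r.id().isBlank()
                && r.email() != null && EMAIL.matcher(r.email()).matches()
                && r.country() != null && r.country().length() == 2;  // assumes ISO alpha-2 codes
    }

    public static void main(String[] args) {
        List<Rec> sample = List.of(
                new Rec("42", "jane@example.com", "US"),
                new Rec("", "not-an-email", "USA"));

        double passRate = (double) sample.stream().filter(DataPuppyCheck::looksClean).count()
                / sample.size();
        System.out.printf("Sample pass rate: %.0f%%%n", passRate * 100);

        if (passRate < 0.95) {
            System.out.println("Quarantine this data set; it is not housebroken yet.");
        }
    }
}

Anything that fails the sample check stays in the mudroom until its origin and cleanliness can be confirmed.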
So before you bring this cool data puppy home and let him become part of your household, make sure it's not going to cause too much trouble!

12/30/2014 08:22:00 PM

Build a Storm analytics solution

Storm lets you create real-time analytics for every conceivable need.

Two weeks ago, we examined the two most popular real-time processing frameworks, Apache Storm and Apache Spark. Now we're going to take a much deeper look at Storm and walk through a basic Storm deployment for consuming Twitter messages and performing analytics on the Twitter stream.

To this end, we'll extract important keywords from individual tweets and calculate rolling metrics related to how actively a given keyword is being discussed. Plus, we'll do some lightweight sentiment analysis to determine the tenor of the discussion on a given topic. We'll also look at how Storm and XMPP combine nicely for extracting important "moment in time" events from a stream and for sending those events out as alerts.

All about Storm

Storm is an open source, distributed, stream-processing platform, designed to make it easy to build massively scalable systems for performing real-time computations on continuous streams of data.
People sometimes refer to Storm as the Hadoop of real-time processing, but it's important to note that Storm has no particular dependency on the MapReduce programming model. You may, if your needs so dictate, code a Storm solution to use a MapReduce model, but nothing about Storm requires it. In fact, Storm bears a slight resemblance to pre-Hadoop distributed computing systems like MPI in terms of the flexibility you have in designing your application.
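To make the moving parts concrete before we dig in, here is a minimal sketch of a Storm topology in that spirit: a spout that emits tweet text and a bolt that extracts crude keywords. It is written against the Storm 0.9-era backtype.storm API; the canned TweetSpout and the naive length-based keyword filter are illustrative stand-ins, not the actual Twitter integration or keyword extraction covered in this walkthrough.

import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

// Sketch of a minimal Storm topology: a canned "tweet" spout feeding a keyword bolt.
public class KeywordTopology {

    // Stand-in spout emitting hard-coded text; a real topology would use a Twitter streaming spout.
    public static class TweetSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final String[] canned = {
                "storm makes real-time analytics approachable",
                "sentiment on this topic is trending positive"
        };
        private int i = 0;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            collector.emit(new Values(canned[i++ % canned.length]));
            Utils.sleep(500);  // throttle the demo stream
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("tweet"));
        }
    }

    // Splits each tweet and emits naive "keywords" (any word longer than three characters).
    public static class KeywordBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String tweet = input.getStringByField("tweet");
            for (String word : tweet.toLowerCase().split("\\s+")) {
                if (word.length() > 3) {
                    collector.emit(new Values(word));
                }
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("keyword"));
        }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("tweets", new TweetSpout(), 1);
        builder.setBolt("keywords", new KeywordBolt(), 4).shuffleGrouping("tweets");

        // Run in-process for the demo; a production topology would be submitted to a real cluster.
        new LocalCluster().submitTopology("keyword-demo", new Config(), builder.createTopology());
    }
}

In a real deployment you would swap the canned spout for a Twitter streaming spout, add bolts for rolling counts and sentiment, and submit the topology to a cluster rather than a LocalCluster.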
12/30/2014 08:16:00 PM

Harbinger of the Linux apocalypse

It might not be the end of the world, but the design of systemd and the attitudes of its developers are cause for concern.

Now that Red Hat has released RHEL 7 with systemd in place of the erstwhile SysVinit, it appears that the end of the world is indeed approaching. A schism and war of egos is unfolding within the Linux community right now, and it is drawing blood on both sides. Ultimately, no matter who "wins," Linux looks to lose this one.

The idea behind systemd was to replace the aged init functionality and provide a sleek, common system initialization framework that could be standardized across multiple Linux distributions. systemd promised to speed up system boot times, better handle race conditions, and, in general, improve upon an item that wasn't exactly broken, but wasn't as efficient as it could be.

As an example, you might be able to produce software that could compile and run on numerous Linux distributions, but if it had to start at boot time, you could be required to write several different Init-style boot scripts, one for each supported distribution. Clearly this is inelegant and could use improvement.

Also, there was the problem that traditional Init is slow and bulky, based on shell scripts and somewhat random text configuration files. This is a problem on systems that need to boot as fast as possible, like embedded Linux systems, but is much less of a problem on big iron that takes longer to count RAM in POST than it does to boot to a log-in prompt. However, it's hard to argue that providing accelerated boot times for Linux across the board is not a good thing.

These are all laudable goals, and systemd wasn't the first project aimed at achieving them. It is, however, the first such project to gain mainstream acceptance. This is in no small part due to the fact that the main developers of systemd are employed by Red Hat, which is still the juggernaut of Linux distributions.

Red Hat exerted its considerable force on the Linux world. Thus, we saw systemd take over Fedora, essentially become a requirement to run the GNOME desktop, then become an inextricable part of a significant number of other distributions (notably not the "old guard" distributions such as Gentoo). Now you'd be hard-pressed to find a distribution that doesn't have systemd in the latest release (Debian doesn't really use systemd, but still requires systemd-shim and CGManager).

While systemd has succeeded in its original goals, it's not stopping there. systemd is becoming the Svchost of Linux -- which I don't think most Linux folks want. You see, systemd is growing, like wildfire, well outside the bounds of enhancing the Linux boot experience. systemd wants to control most, if not all, of the fundamental functional aspects of a Linux system -- from authentication to mounting shares to network configuration to syslog to cron. It wants to do so as essentially a monolithic entity that obscures what's happening behind the scenes.

No matter which side of the argument you're on, this monolithic approach is in violation of the rules of Unix, specifically the rule stating it's best to have small tools that do one job perfectly rather than one large tool that is mediocre at performing many jobs. Prior to this, all the functions subsumed by systemd were accomplished by assembling small tools in such a way that they performed the desired function. These same tools could be used within a variety of other scripts to perform myriad tasks -- there was no singular way to do anything, which allowed for extreme freedom to address and fix problems. It also allowed for poor implementations of some functions, simply because they were poorly assembled. You can't have both, after all.

That's not the end of the story. There's more happening with systemd than many might realize. First, systemd is rather inelegantly designed. While there are many defensible aspects of systemd, other aspects boggle the mind. Not the least of these was that, as of a few months ago, trying to debug the kernel from the boot line would cause the system to crash. This was because of systemd's voracious logging and the fact that systemd responds to the "debug" flag on the kernel boot line -- a flag meant for the kernel, not anything else. That, straight up, is a bug.

However, the systemd developers didn't see it that way and actively fought with those experiencing the problem. Add the fact that one of the systemd developers was banned by Linus Torvalds for poor attitude and bad design, and another was responsible for causing significant issues with Linux audio support but blamed the problem on everything but his own software, and you have a bad situation on your hands.

There's no shortage of egos in the open source development world. There's no shortage of new ideas and veteran developers and administrators pooh-poohing something new simply because it's new. But there are also 45 years of history behind Unix and extremely good reasons it's still flourishing. Tools designed like systemd do not fit the Linux mold, to their own detriment. systemd's design has more in common with Windows than with Unix -- right down to the binary logging.

My take is that systemd is a good idea poorly implemented, developed by people with enormous egos who firmly believe they can do no wrong. As it stands now, both systemd and the developers responsible for it need to change. In the open source world, change is a constant and sometimes violent process, and upheavals around issues such as systemd aren't necessarily bad. That said, these battles cannot be drawn out forever without causing irreparable harm -- and any element as integral to the stability and functionality of Linux as systemd has even less time than most.
12/30/2014 08:11:00 PM

A draft reimbursement policy for mobile users

Many companies still struggle with reimbursement and access policies for employees.

It's one of the most frequent questions I get at conferences: how to manage all those users who want to -- or simply do -- use mobile devices and want that work usage reimbursed. This question usually comes up in the context of BYOD, but of course it can also be raised in terms of company-provided devices -- meaning, who gets those in the first place?

Although every company has its own requirements, employee-ennoblement bias, and context, there are core, equitable principles that every company can start from and modify for its own needs and culture. What follows is my proposed draft policy based on these principles. As you begin a new year, now is a good time to (re)think your own policies around mobile and remote-access reimbursements -- and, of course, permissions.

Principles

The company's business units have different patterns and mixes of employee communications and information access. Thus, a flexible approach is needed that lets each business unit optimize its use of mobile devices for communications, information access, and systems access. But so is consistency in the framework governing how money is spent on such tools, so employees whose roles are similar are treated similarly.

In applying this policy, there are two key decisions for the department head to make that determine which policy section applies to each employee:

12/30/2014 08:09:00 PM

Developers must follow security rules, too

The role of the developer has risen in importance in many organizations.

We live in a least-privilege, role-based security world where no company should have full-time admins with full rights. Instead, you should distribute responsibilities where possible and rotate admins in and out of privileged groups. This is one of the most effective ways to stop malicious hackers from getting the keys to the kingdom.

But what about software developers? Because developers need to perform administrative tasks and have full control of their environment, it’s difficult to restrain and harden their workstations. They need to install software and drivers on a fairly regular basis, as well as debug programs. To a developer, denying full control over a workstation is akin to preventing them from doing their job.
I don't buy that argument. In exchange for a little developer inconvenience, you can prevent attackers from hijacking elevated privileges that aren't truly necessary for developers to do their jobs.

Isolation from the Internet

First of all, developers should not have elevated privileges in production environments.

There are few legitimate reasons why a developer should have those privileges; the ones I can think of should be given out sparingly and temporarily. In fact, I don’t think developers should have permanent, elevated permissions in a test environment. If they need elevated permissions, they should request them for a particular task and time period.

I'm fine with developers having full-time, local admin credentials on a computer that stands alone or is hooked to a test domain, but with significant caveats.

The most important is that developers should not program using the same computer with which they access the Internet and pick up email. It’s too high-risk. I know this flies in the face of the GitHub generation, where sharing code and dev tools on the Internet has become part of the culture. Fine -- but don't do that on the same machine where you're writing code.

I even think adding popular programming websites to an "allowed list" on a programming workstation is asking for trouble. The last few years have been replete with watering-hole attacks that targeted developers who hang out at popular programming websites.

If Web browsing must be allowed, developer workstations should be reset to an original, trusted state after the end of each session. Again, I know this is a pain, but you're giving developers highly privileged, permanent accounts, so there need to be security mitigation trade-offs.

Rights and responsibilities

If developers want permanent local admin rights, give them a virtual machine (running on their local desktop or a host server) dedicated to programming. It should be thoroughly hardened and perfectly patched, with judicious monitoring and alerting. The workstations should run up-to-date antimalware and host-intrusion detection programs. Reports should be automatically generated each morning to show what changed on the workstations from the previous day.
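One simple way to generate that kind of morning report is to baseline file hashes on the developer image and diff them each day. The sketch below (standard JDK only) illustrates the idea; the monitored path and baseline file name are made-up examples, and a real deployment would lean on purpose-built host-intrusion detection tooling rather than a homegrown script.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: hash every file under a directory and report what changed since the previous run.
public class DailyChangeReport {

    static String sha256(Path p) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(p));
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    static Map<String, String> snapshot(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile).collect(Collectors.toMap(
                    Path::toString,
                    p -> { try { return sha256(p); } catch (Exception e) { return "unreadable"; } }));
        }
    }

    public static void main(String[] args) throws Exception {
        Path root = Paths.get(args.length > 0 ? args[0] : "/opt/devtools");  // illustrative path
        Path baselineFile = Paths.get("baseline.txt");                       // illustrative name

        Map<String, String> current = snapshot(root);
        Map<String, String> baseline = new HashMap<>();
        if (Files.exists(baselineFile)) {
            for (String line : Files.readAllLines(baselineFile)) {
                int idx = line.lastIndexOf('=');
                if (idx > 0) baseline.put(line.substring(0, idx), line.substring(idx + 1));
            }
        }

        current.forEach((path, hash) -> {
            String old = baseline.get(path);
            if (old == null) System.out.println("ADDED    " + path);
            else if (!old.equals(hash)) System.out.println("CHANGED  " + path);
        });
        baseline.keySet().stream().filter(p -> !current.containsKey(p))
                .forEach(p -> System.out.println("REMOVED  " + p));

        // Persist today's snapshot as tomorrow's baseline.
        Files.write(baselineFile, current.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue()).collect(Collectors.toList()));
    }
}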

Optimally, I would use a whitelisting application control program to define what programs can run on developer computers. They'll hate this and say it restricts them in doing their job. So what? That’s the price you pay for getting local admin rights. Unfortunately, developers are not so great at ensuring they don’t download Trojans that masquerade as shared code or cool programming tools.

All local accounts, including the developer’s local admin account, should be prevented from logging on over the network. Developers definitely shouldn’t have the ability to log on to high-risk computers like domain controllers or other servers on the production network. All accounts should have unique names and unique passwords. That way, if a bad guy gets a hold of the developer’s local admin credentials, lateral or vertical movement attacks will be that much more difficult.

It’s important to remember that no matter how you handle developer access, a developer’s logon credentials should absolutely be unique between the test and production environments. That means no shared passwords or digital certificates. Failing to do this defeats the purpose of having separate environments -- and hackers love companies that break this rule.

If developers balk at these risk mitigations, see if you can come up with your own offsetting controls. If you have to loosen a few, be sure at the very least to implement a specialized, hardened programming workstation where developers cannot do their normal Internet browsing and email. This is the only control I would absolutely insist on as a deal breaker.

Most developers won't like my advice. But we live in a world of high risk, where letting any group indulge in bad security behavior can only lead to more Sony-style hacks. Browsing simply should not be done on the same computer (or image) as the one being used to develop software. This advice is no different than what we have been recommending for any infrastructure administrator for years.

Monday, December 29, 2014

12/29/2014 05:43:00 PM

Confessions of an open source purist

When picking a PC, we face a three-way trade-off: high maintenance, out-of-the-box utility, and software freedom.

Writing about my successful migration from a MacBook Pro to a Chromebook -- two years after the original transition -- led to the inevitable denunciations from certain ideological absolutists. I got comments like, “So you like proprietary OSes, good for you I guess,” as if I was choosing Windows.

I still long for a fully open source solution, but one of my fundamental requirements is that I don't take on a new full-time hobby to keep it working. When I threw out Windows nearly a decade ago, it was in no small part because it required too much of my time applying updates, maintaining malware protection, and fixing drivers after my kids used it for playing games.

I have also repeatedly evaluated GNU/Linux as a platform for my daily writing and administration. Each time, I’ve found it fairly easy to install (more so every time I try) and easy to add applications. I’ve never had problems with malware, but at some point in the life of the system, a problem arises that at best causes an inconvenience (like the sleep mode failing) and at worst leaves the system impossible to boot.

My longing for open source purity found me switching to a GNU/Linux system for a while earlier in the year. Breaking my usual rule of using only preloaded operating systems, I bought a Samsung Series 5 Ultrabook at my local supermarket, wiped Windows (we haven’t allowed it or Office in the house since 2005), and easily installed Ubuntu 14.04 LTS. It worked well, apart from not resuming when I opened the lid. It was useful for tasks that are easier on a desktop computer, like remote debugging on a phone or running a local build instead of using a build server.

But most of the time I found myself using Chrome on it, effectively treating it as a Chromebook. Then an attempt to upgrade to Ubuntu 14.10 bricked the device, leaving it unable to boot and beyond my experience to recover, and I’ve never gone back.

Software freedom is important. It’s the guarantee that you are free to choose your own technology solutions, rather than surrendering that choice to a vendor or government. All the same, if you care at all about software freedom (you should), every technology choice will involve a compromise.

You can spend large amounts of time and effort to make almost anything work -- on the other hand, a solution that “just works” is a great choice. Making systems “just work” can involve compromising the degree of software freedom left to you as the user. Any technology decision seems to involve optimizing on the technology utility triangle, whose three points are time expended (to keep things working), utility of the system, and optimal software freedom:
[Figure: the technology utility triangle]
In my experience, I've been able to pick only two of these points. My preference is to optimize away from time expended and toward software freedom and utility. The more I can push toward optimal software freedom, the better. What does that mean? There are three dimensions to optimal software freedom:
  • How much of the source code for the system is available? Can I (or someone working for me) build it and replace the system I am using while retaining utility?
  • How frequently is the system improving? Are bugs being fixed and improvements added?
  • Do I have alternatives in the ecosystem that remain viable to me? If something is wrong and stays wrong, is migration to an alternative viable?
Again, choose any two.

Every software user has to pick a set of compromises that suits them. What matters is to do so knowingly and to remain aware of the compromises so that you can continually improve.

When it comes to ideology, if you want to pick a fight, all you have to do is criticize the compromises of others. You can always do it, and almost always come away feeling the other person’s compromise is worse than yours (if you even recognize you have made a compromise – many of my critics don’t). I think it’s better to be accepting and learn from the compromises others make. Sometimes they are even right!

Saturday, December 27, 2014

12/27/2014 12:06:00 PM

Learning a second programming language? Try these 5 sites

Want to switch between programming languages? 

Developers trying to jump from one language to another often hit the same wall: How do I do this? They can do it in their base language, but introduce them to a strange new world, and the going gets rough.

Programmers in this sticky position often benefit from seeing how the same concepts, designs, and algorithms can be implemented in parallel across multiple languages. Here are five sites that feature examples of how the most popular languages -- and a few you might not know -- tackle the same tasks so very differently.

Rosetta Code

Easily the largest, most robustly annotated, and consistently useful site of its kind, Rosetta Code is described as a "programming chrestomathy" -- a repository of examples for how to accomplish the same tasks in many programming languages. Most remarkable about Rosetta Code is not the sheer size of the site and the number of examples, but the granularity of the examples. Creating a window in a GUI, for instance, isn't annotated by language, but by specific toolkits within that language; take Python, with examples for Tkinter, PyGTK, Pythonwin, wxPython, and many other libraries.

Eqcode

Eqcode aims to show "equivalent codes for all languages," so it provides an index of common languages with drill-downs to specific concepts or tasks, such as removing a specific element from an array or constructing a regex to match an email address. The breadth of languages is decent, but the concepts addressed are somewhat scattershot, and it isn't updated often; the last updates were in April 2014.

CrossWise

Like the other sites here, CrossWise lets you see how multiple languages -- in this case, JavaScript, PHP, Ruby, and Python -- implement the same concepts. But the site design is undeniably ingenious: The comparisons are placed side by side in two columns, and you can choose which language examples to place in which column. CrossWise covers such details as how Boolean logic (the concept of truth or falsehood) is implemented in each, or error handling and exceptions.

AlgPedia

An ambitious project created by the Universidade Federal do Rio de Janeiro in Brazil, AlgPedia is a collaborative encyclopedia that focuses on implementations of algorithms. Sorting, checksumming, arbitrary precision, data mining, pattern matching, and many other categories of algorithms are all included. The project is still in its early stages, so the coverage of algorithms and the types of examples provided are somewhat incomplete; most of them have only one or two examples.

PLEAC (Programming Language Examples Alike Cookbook)

Perl is noted for the Perl Cookbook, which documents common programming problems and their solutions for the language. PLEAC is an attempt to take the problems posed in the Cookbook and produce solutions for them in nearly every other language in use. Perl, Groovy, Python, OCaml, and Ruby have the best coverage of solutions so far, but stubs and partial entries for lots of other languages are also included. Interestingly, JavaScript is not among them, but a stub entry for CoffeeScript is. As with many of the others here, you're welcome to contribute if your favorite language is underrepresented.
12/27/2014 12:04:00 PM

13 insider tips for acing your job interview

Esoteric puzzles, landmine questions, ‘cultural fit’ -- these 13 tips help you navigate the IT interview.

If you’re an IT pro, chances are the job interview is at -- or very near -- the top of your list of personal hells. And why not? Tech job interviews can be grueling experiences, rife with esoteric puzzles, uncomfortable pauses, landmine questions, and the aching underlying feeling that maybe you don’t belong.

Throughout the process, you will be talking with strangers via phone and video, taking tests, answering challenging and often uncomfortable questions, traveling on sleepless red-eye flights, and enduring multiple anxious periods of not knowing what is going on, how you’ve been received, or what will happen next.
To really ace the interview and minimize your anxiety going in, you must be prepared to an almost otherworldly extent, on many fronts at once. This means deep research on the position and company to decode what to expect during the interview process and to understand the company dynamics at as near the level of a current employee as you can.
Still sure you want to pursue that new job?

IT interview tip No. 1: Don’t be afraid to reach out early

Interview preparation begins the moment you find yourself intrigued by a posting. The questions you anticipate fielding, how you’ll present your skills and experience persuasively, what you’ll ask to get a better grasp of the position -- the interview should always be a point of reflection as you dig deeper in researching the job.

One note to keep in mind early: Sometimes the company can be more important than the advertised position. Just because the job ad says “junior programmer” does not mean the company doesn’t also need a senior developer. The key to acing an IT interview is finding the perfect fit. Don’t be afraid to call or email; many jobs are not advertised. In some cases, early contact on a not-quite-right job can lead to an informal conversation with the recruiter, who can tell you whether other, better-suited jobs are on offer or if the organization is open to expanding the role to fit your extraordinary qualifications.

IT interview tip No. 2: Don't believe everything you read on Glassdoor

Resources such as Glassdoor provide a wealth of information about the hiring and interviewing process at most major companies. In fact, for many IT pros, Glassdoor’s community reviews of a company’s culture, salary information, and so on is the first stop in researching a position. The information can be extensive, detailed, and very valuable, providing instant insights from fellow professionals of their experiences with your target company.

But as with every other aspect of the Internet, be aware that people who don’t like the company are much more likely to review it than people who like the company. Take all the negativity -- you’re likely to encounter a lot -- with a block of salt and make up your own mind. Don’t be dissuaded from pursuing what could be your dream job, or get derailed by interview advice from someone who might not have been the right fit or was not as well prepared for the interview as you will be.

IT interview tip No. 3: Find employee blogs and read them in depth

Once you know what the company thinks is important about the position you are targeting and how its interview process works, it’s time to gather all the information you can about the company. For most major companies, there are blogs, books, and websites devoted to their inner workings, technical focus, and business culture. Immersing yourself in these is not a waste of time; it helps you feel prepared, making you more relaxed when it comes time to interview.

Here, employee blogs can be a goldmine. When preparing for an interview with a major international consulting and development firm, I came across the blog of its chief scientist and read it -- all of it. This took about three days, several hours per day.

I also watched several of his presentations on YouTube, as well as presentations by the company’s CTO and other technical employees. I read everything on the company website, researched the founder’s background and vision, and skimmed hundreds of tweets and blog posts by current employees.

At no point in the interview process was I asked anything about this information, but it gave me a very good idea of what the company was focused on technically and socially, thus informing the conversations I had -- not just the technical ones. Doing so brings you a lot closer to the company, making it much easier to convince those who interview you that you belong and are ready to make a difference right away.

IT interview tip No. 4: Research social culture -- it's as critical as technical focus

Employee blogs, social media, and social networks are a great source of insider information about the company’s social culture. Does everyone complain that there’s no work/life balance? Do employees frequently get together after-hours to drink shots and sing karaoke? Do project teams volunteer for social-good projects together? Do employees often appear as speakers at conferences?

None of these are particularly important, but together they paint an overall picture of the social culture at the company. Do you find this picture attractive? Could you find yourself fitting into the company’s social dynamic?
For some, a company with an all-work-and-no-play focus is fine; for others, working somewhere with people you actually want to be around after-hours can be a great benefit. Many people take jobs they think they will like only to discover they don’t enjoy the dynamics of the workplace once they are hired, or simply enter the interview process blind to the social tenor interviews will take. Getting a sense of the day-to-day social interactions of a potential employer goes a long way toward giving you a sense of what to expect about the interview process.

IT interview tip No. 5: Understanding the underlying principles of interview puzzles is the key to crushing them

One of the more controversial -- and anxiety-inducing -- hiring practices these days is the use of puzzles during interviews. Perhaps because of this, the puzzles themselves often find their way onto the Net.
The last thing you should do is memorize the answers to the questions you find when researching a particular company. Read them to understand what kind of questions you might be asked, what kind of answers they might be looking for, and what the underlying purpose of the questions might be. Memorizing the answers can easily backfire; it takes only a small change to the question to render published answers incorrect -- not to mention the fact that rattling off answers to complex questions without taking time to think is highly suspicious. People who memorize answers without understanding underlying principles are easily exposed, and interviewers know it.

Instead, use your research as a guide to uncovering the underlying purpose of the puzzles. Silly questions about cannibals and canoes may be intended to see how you think through a logic puzzle, or it may be to see if you push back against stupid questions, or it may be to see if you think out of the box. Each company has its own agenda for asking these kinds of questions; study published puzzles with a mind to unlock them.

IT interview tip No. 6: Connect with current employees

As you prepare for your interview, be sure to make use of social networking outlets like LinkedIn. Send an invitation to connect with a few people in the company, accompanied by a short note explaining that you have an interview soon and would appreciate any tips that can confirm, deny, or expand on the research you’ve already done. You can ask about what to wear, what to expect, and so on, but the key to not coming off like a creep is to limit your correspondence to one question per contact; make it easy for them to connect and respond. Humor certainly helps in reaching out, but above all, be yourself.

IT interview tip No. 7: Don't tilt at windmills

If you find that the kinds of things a company asks in its interviews are ridiculous, irrelevant, or offensive, reconsider whether you really want to work there. It takes a lot of work to prepare for an interview, and if you find that a company’s process or culture makes your skin crawl, it may be in your best interests to walk away.

True, practice makes perfect. And if you’ve been out of the job market for a while, going through a couple not-quite-right or even uncomfortable interviews can help you get back in the saddle. But the strain of chasing a bad fit can ultimately be very unrewarding and potentially demoralizing, not to mention a distraction from finding exactly the right position for you.

IT interview tip No. 8: Dress as if you already work there

It’s simple, almost hackneyed, but it’s true: What you wear, how you present yourself, your body language, and so on, all of these influence the interviewers and thus the outcome. If the company is “business casual,” don’t show up in a suit. Dress as if you already work there. If you’re not sure from the description, ask your recruiter or inside contact, look for office pictures, or if possible drop by the office and peek through the window.

Before you walk into the room, remind yourself of your good qualities. Stand up straight, make eye contact, and enter the space as if you belong there -- not as if you own it or rule it, but as if it is a comfortable, familiar place where several of your friends reside. You want to project an open, friendly level of confidence, to set both you and the interviewers at ease.

IT interview tip No. 9: Let your personality out

Interviewers -- the good ones -- aren’t merely looking for boxes to check. They’re looking for someone who fits into the company culture, adds value to the business, and is able to grow with the job. It's important to let your personality out, though not all at once. This may be difficult for those with a more introverted personality to ease into, and for the more extroverted among us it may be hard to hold back on the thousand things you find fascinating.

Pick two or three areas of personal interest that you feel are relevant to the job, the company, or the interviewer. You may not be able to identify these in advance, so stay alert for clues during the course of the conversation. When opportunity presents, add a little bit of your personality into the dialog, and see if that increases interest or engagement. If it does, continue the thread for a little bit; if it doesn’t, don’t press it.
In the end, you want the company to be excited about you, not about filling a round hole with an uninteresting peg.

IT interview tip No. 10: Beware the "interviewing the interviewer" trap

Some publications recommend you take control of the interview by “interviewing the interviewer.” This can be misinterpreted as adversarial in some circumstances. It’s fine to turn the interview process into a conversation instead of an interrogation and ask questions. For example, if asked about your experience with a particular programming language, it’s not a bad idea to answer with pertinent facts briefly, then follow up with a leading question such as, “How much of my work would be with this language, and in what domains?”
This is easier than it sounds, and it is definitely a good idea to prepare questions for the interviewer in advance, especially if you have concerns about the position or company. But by no means do you want to come off as challenging the interviewer by turning the process on its head. You want to be remembered as someone who is well-prepared, well-informed, and easy to talk to -- not someone who is going to undermine colleagues by questioning them for the sake of showing what you know.

IT interview tip No. 11: Help the interviewer imagine you in the position

Open-ended questions are an excellent opportunity to help the interviewer imagine you in the position. For example, “What would my day-to-day duties be?” pretty much forces the interviewer to imagine that you already have the position and are going about your daily routine before answering, or at least to draw a parallel between you and their ideal candidate, or between you and the last person to fill the position.

IT interview tip No. 12: Always speak favorably about former employers

It doesn’t matter if your prior employer was a complete control freak who made your life a living hell; always speak favorably about them. Don’t say, “I got tired of text messages at midnight asking me for help with the manager’s moonlighting project, so I’m looking for a new gig.” Instead, make it positive: “I enjoyed the variety of challenges presented but am looking for new ones.” Going on and on about how badly your prior employer treated you is a serious red flag, no matter how justified you feel you are in doing so.

You are very likely to be asked point-blank why you have left former positions, so be positive -- “to expand my opportunities to use my abilities to help more people,” “to take my career in a more modern direction,” “to relocate to [city name].” Whatever you say, you must mean it, and be ready to back it up with concrete examples. Moving toward a positive goal is attractive; running away from a negative space is not.

IT interview tip No. 13: Ask for the job

Most interviewers ask if you have any questions for them at the end of the interview. If you are excited about the opportunity, it’s OK to ask, “When do I start?” If the interviewer has hiring authority, he or she may make you an offer on the spot. This question shows enthusiasm and initiative, and at the very least it elicits a laugh and ends the discussion on a positive note. If the interviewer reacts negatively to this question -- seems taken aback or flustered -- that may be a sign the interview did not go well from their perspective, and it’s time to hunt anew or to seek out ways to address their reservations about you.

Read More:- http://www.infoworld.com/article/2851128/it-careers/13-tips-ace-your-it-job-interview.html 
12/27/2014 11:57:00 AM

Ultimate cloud speed tests: Amazon vs. Google vs. Windows Azure

A diverse set of real-world Java benchmarks shows Google is fastest, Azure is slowest, and Amazon is priciest.

If the cartoonists are right, heaven is located in a cloud where everyone wears white robes, every machine is lightning quick, everything you do works perfectly, and every action is accompanied by angels playing lyres. The current sales pitch for the enterprise cloud isn't much different, except for the robes and the music. The cloud providers have an infinite number of machines, and they're just waiting to run your code perfectly.

The sales pitch is seductive because the cloud offers many advantages. There are no utility bills to pay, no server room staff who want the night off, and no crazy tax issues for amortizing the cost of the machines over N years. You give them your credit card, and you get root on a machine, often within minutes.

To test out the options available to anyone looking for a server, I rented some machines on Amazon EC2, Google Compute Engine, and Microsoft Windows Azure and took them out for a spin. The good news is that many of the promises have been fulfilled. If you click the right buttons and fill out the right Web forms, you can have root on a machine in a few minutes, sometimes even faster. All of them make it dead simple to get the basic goods: a Linux distro running what you need.

At first glance, the options seem close to identical. You can choose from many of the same distributions, and from a wide range of machine configuration options. But if you start poking around, you'll find differences -- including differences in performance and cost. The machines may seem like commodities, but they're not. This became more and more evident once the machines started churning through my benchmarks.

Fast cloud, slow cloud

I tested small, medium, and large machine instances on Amazon EC2, Google Compute Engine, and Microsoft Windows Azure using the open source DaCapo benchmarks, a collection of 14 common Java programs bundled into one easy-to-start JAR. It's a diverse set of real-world applications that will exercise a machine in a variety of different ways. Some of the tests will stress the CPU, others will stress RAM, and still others will stress both. Some of the tests will take advantage of multiple threads. No machine configuration will be ideal for all of them.

Some of the benchmarks in the collection will be very familiar to server users. The Tomcat test, for instance, starts up the popular Web server and asks it to assemble some Web pages. The Luindex and Lusearch tests will put Lucene, the common indexing and search tool, through its paces. Another test, Avrora, will simulate some microcontrollers. Although this task may be useful only for chip designers, it still tests the raw CPU capacity of the machine.

I ran the 14 DaCapo tests on three different Linux machine configurations on each cloud, using the default JVM. The instances aren't perfect "apples to apples" matches, but they are roughly comparable in terms of size and price. The configurations and cost per hour are broken out in the table below.

Cloud machines under test

                               Virtual CPUs or cores   RAM       Cost per hour
Amazon m1.medium               1                       3.75GB    12 cents
Amazon c3.large                2                       3.75GB    15 cents
Amazon m3.2xlarge              8                       30.00GB   90 cents
Google n1-standard-1           1                       3.75GB    10.4 cents
Google n1-highcpu-2            2                       1.80GB    13.1 cents
Google n1-standard-8           8                       30.00GB   82.9 cents
Windows Azure Small VM         1                       1.75GB    6 cents
Windows Azure Medium VM        2                       3.50GB    12 cents
Windows Azure Extra Large VM   8                       14.00GB   48 cents
I gathered two sets of numbers for each machine. The first set shows the amount of time the instance took to run the benchmark from a dead stop. It fired up the JVM, loaded the code, and started to work. This isn't a bad simulation because many servers start up Java code from command lines in scripts.

To add another dimension, the second set reports the times using the "converge" option. This runs the benchmark repeatedly until consistent results appear. This sometimes happens after just a few runs, but in a few cases, the results failed to converge after 20 iterations. This option often resulted in dramatically faster times, but sometimes it only produced marginally faster times.
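Conceptually, the "standing start" measurement is just wall-clock timing of a freshly launched JVM running the benchmark JAR, while the converged numbers come from the harness's repeat-until-stable mode. The sketch below illustrates the idea; the JAR file name and the --converge flag spelling are assumptions based on the DaCapo distribution of that era (check the JAR's help output), not a record of the exact commands behind these results.

import java.util.List;

// Sketch: time a DaCapo benchmark from a cold JVM start, then in converge mode.
// The JAR name and the "--converge" flag are assumptions; consult the JAR's help output.
public class BenchmarkTimer {

    static long runSeconds(List<String> command) throws Exception {
        long start = System.nanoTime();
        Process p = new ProcessBuilder(command).inheritIO().start();
        p.waitFor();
        return (System.nanoTime() - start) / 1_000_000_000L;
    }

    public static void main(String[] args) throws Exception {
        String jar = "dacapo-9.12-bach.jar";                  // assumed file name
        String bench = args.length > 0 ? args[0] : "tomcat";  // e.g. tomcat, luindex, avrora

        long cold = runSeconds(List.of("java", "-jar", jar, bench));
        // In converge mode the harness also prints its own stable per-iteration time.
        long converged = runSeconds(List.of("java", "-jar", jar, "--converge", bench));

        System.out.printf("%s: cold start %ds, converge run %ds%n", bench, cold, converged);
    }
}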

The results (see charts and tables below) will look like a mind-numbing sea of numbers to anyone, but a few patterns stood out:
  • Google was the fastest overall. The three Google instances completed the benchmarks in a total of 575 seconds, compared with 719 seconds for Amazon and 834 seconds for Windows Azure. A Google machine had the fastest time in 13 of the 14 tests. A Windows Azure machine had the fastest time in only one of the benchmarks. Amazon was never the fastest.
  • Google was also the cheapest overall, though Windows Azure was close behind. Executing the DaCapo suite on the trio of machines cost 3.78 cents on Google, 3.8 cents on Windows Azure, and 5 cents on Amazon. A Google machine was the cheapest option in eight of the 14 tests. A Windows Azure instance was cheapest in five tests. An Amazon machine was the cheapest in only one of the tests.
  • The best option for misers was Windows Azure's Small VM (one CPU, 6 cents per hour), which completed the benchmarks at a cost of 0.67 cents. However, this was also one of the slowest options, taking 404 seconds to complete the suite. The next cheapest option, Google's n1-highcpu-2 instance (two CPUs, 13.1 cents per hour), completed the benchmarks in half the time (193 seconds) at a cost of 0.70 cents.
  • If you cared more about speed than money, Google's n1-standard-8 machine (eight CPUs, 82.9 cents per hour) was the best option. It turned in the fastest time in 11 of the 14 benchmarks, completing the entire DaCapo suite in 101 seconds at a cost of 2.32 cents. The closest rival, Amazon's m3.2xlarge instance (eight CPUs, $0.90 per hour), completed the suite in 118 seconds at a cost of 2.96 cents.
  • Amazon was rarely a bargain. Amazon's m1.medium (one CPU, 12 cents per hour) was both the slowest and the most expensive of the one-CPU instances. Amazon's m3.2xlarge (eight CPUs, 90 cents per hour) was the second fastest instance overall, but also the most expensive. However, Amazon's c3.large (two CPUs, 15 cents per hour) was truly competitive -- nearly as fast overall as Google's two-CPU instance, and faster and cheaper than Windows Azure's two-CPU machine.
These general observations, which I drew from the "standing start" tests, are also borne out by the results of the "converged" runs. But a close look at the individual numbers will leave you wondering about consistency.
Some of this may be due to the randomness hidden in the cloud. While the companies make it seem like you're renting a real machine that sits in a box in some secret, undisclosed bunker, the reality is that you're probably getting assigned a thin slice of a box. You're sharing the machine, and that means the other users may or may not affect you. Or maybe it's the hypervisor that's behaving differently. It's hard to know. Your speed can change from minute to minute and from machine to machine, something that usually doesn't happen with the server boxes rolling off the assembly line.
[Charts: Cloud benchmark results - time; Cloud benchmark results - cost]
So while there seem to be clear performance differences among the cloud machines, your results could vary. These patterns also emerged:
  • Bigger, more expensive machines can be slower. You can pay more and get worse performance. The three Windows Azure machines started with one, two, and eight CPUs and cost 6, 12, and 48 cents per hour, but the more expensive they were, the slower they ran the Avrora test. The same pattern appeared with Google's one CPU and two CPU machines.
  • Sometimes bigger pays off. The same Windows Azure machines that ran the Avrora jobs slower sped through the Eclipse benchmark. On the first runs, the eight-CPU machine was more than twice as fast as the one-CPU machine.
  • Comparisons can be troublesome. The results table has some holes produced when a particular test failed, some of which are easy to explain. The Windows Azure machines, for example, lacked the codec needed for the Batik test; it doesn't come installed with the default version of Java. I probably could have fixed it with a bit of work, but the machines from Amazon and Google didn't need it. (Note: Because Azure balked at the Batik test, the comparative times and costs cited above omit the Batik results for Amazon and Google.)
  • Other failures seemed odd. The Tradesoap routine would generate an exception occasionally. This was probably caused by some network failure deep in the OS layer. Or maybe it was something else. The same test would run successfully in different circumstances.
  • Adding more CPUs often isn't worth the cost. While Windows Azure's eight-CPU machine was often dramatically faster than its one-CPU machine, it was rarely ever eight times faster -- disappointing given that it costs eight times as much. This was even true on the tests that are able to recognize the multiple CPUs and set up multiple threads. In most of the tests the eight CPU machine was just two to four times faster. The one test that stood out was the Sunflow raytracing test, which was able to use all of the compute power given to it.
  • The CPU numbers don't always tell the story. While the companies usually double the price when you get a machine with two CPUs and multiply by eight when you get eight CPUs, you can often save money if you don't increase the RAM too. But if you do, don't expect performance to still double. The Google two-CPU machine in these tests was a so-called "highcpu" machine with less RAM than the standard machine. It was often slower than the one-CPU machine. When it was faster, it was often only about 30 percent faster.
  • Thread count can also be misleading. While the performance of the Windows Azure machines on the Sunflow benchmark track the number of threads, the same can't be said for the Amazon and Google machines. Amazon's two-CPU instance often went more than twice as fast as the one-CPU machine. On one test, it was almost three times faster. Google's two-CPU machine, on the other hand, went only 20 to 25 percent faster on Sunflow.
  • The pricing table can be a good indicator of performance. Google's n1-highcpu-2 machine is about 30 percent more expensive than the n1-standard-1 machine even though it offers twice as much theoretical CPU power. Google probably used performance benchmarks to come up with the prices.
  • Burst effects can distort behavior. Some of the cloud machines will speed up for short "bursts." This is sort of a free gift of the extra cycles lying around. If the cloud providers can offer you a temporary speed up, they often do. But beware that the gift will appear and disappear in odd ways. Thus, some of these results may be faster because the machine was bursting.
  • The bursting behavior varies. On the Amazon and Google machines, the Eclipse benchmark would speed up by a factor of more than three when using the "converge" option of the benchmark. Windows Azure's eight-CPU machine, on the other hand, wouldn't even double.
[Chart: Cloud benchmark results - Amazon EC2]

If all of these factors leave you confused, you're not alone. I tested only a small fraction of the configurations available from each cloud and found that performance was only partially related to the amount of compute power I was renting. The big differences in performance on the different benchmarks mean that the different platforms could run your code at radically different speeds. In the past, my tests have shown that cloud performance can vary at different times or days of the week.

This test matrix may be large, but it doesn't even come close to exploring the different variations that the different platforms can offer. All of the companies are offering multiple combinations of CPUs and RAM and storage. These can have subtle and not-so-subtle effects on performance. At best, these tests can only expose some of the ways that performance varies.

This means that if you're interested in getting the best performance for the lowest price, your only solution is to create your own benchmarks and test out the platforms. You'll need to decide which options are delivering the computation you need at the best price.

Calculating cloud costs

Working with the matrix of prices for the cloud machines is surprisingly complex given that one of the selling points of the clouds is the ease of purchase. You're not buying machines, real estate, air conditioners, and whatnot. You're just renting a machine by the hour. But even when you look at the price lists, you can't simply choose the cheapest machine and feel secure in your decision.
The tricky issue for the bean counters is that the performance observed in the benchmarks rarely increased with the price. If you're intent upon getting the most computation cycles for your dollar, you'll need to do the math yourself.
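The math itself is simple -- the cost of a run is the elapsed time multiplied by the hourly rate -- but it's worth scripting so you can re-rank instances as prices and timings change. A minimal sketch, plugging in a few of the per-suite timings and prices quoted above:

// Sketch: rank instances by cost per benchmark run (seconds x hourly rate).
// The prices and timings below are the ones quoted in this article, late 2014.
public class CostPerRun {

    record Instance(String name, double centsPerHour, int suiteSeconds) {}

    static double centsPerRun(Instance i) {
        return i.centsPerHour() * i.suiteSeconds() / 3600.0;
    }

    public static void main(String[] args) {
        Instance[] instances = {
                new Instance("Azure Small VM (1 CPU)", 6.0, 404),
                new Instance("Google n1-highcpu-2 (2 CPUs)", 13.1, 193),
                new Instance("Google n1-standard-8 (8 CPUs)", 82.9, 101),
        };
        for (Instance i : instances) {
            System.out.printf("%-32s %.2f cents per suite run%n", i.name(), centsPerRun(i));
        }
    }
}

Run against your own benchmark timings, the same few lines of arithmetic tell you which instance actually delivers the cheapest cycles for your workload.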

The simplest option is Windows Azure, which sells machines in sizes that range from extra small to extra large. The amount of CPU power and RAM generally increase in lockstep, roughly doubling at each step up the size chart. Microsoft also offers a few loaded machines with an extra large amount of RAM included. The smallest machines with 768MB of RAM start at 2 cents per hour, and the biggest machines with 56GB of RAM can top off at $1.60 per hour. The Windows Azure pricing calculator makes it straightforward.

One of the interesting details is that Microsoft charges more for a machine running Microsoft's operating system. While Windows Azure sometimes sold Linux instances for the same price, at this writing, it's charging exactly 50 percent more if the machine runs Windows. The marketing department probably went back and forth trying to decide whether to price Windows as if it's an equal or a premium product before deciding that, duh, of course Windows is a premium.

Google also follows the same basic mechanism of doubling the size of the machine and then doubling the price. The standard machines start at 10.4 cents per hour for one CPU and 3.75GB of RAM and then double in capacity and price until they reach $1.66 per hour for 16 CPUs and 60GB of RAM. Google also offers options with higher and lower amounts of RAM per CPU, and the prices move along a different scale.

The most interesting options come from Amazon, which has an even larger number of machines and a larger set of complex pricing options. Amazon charges roughly double for twice as much RAM and CPU capacity, but it also varies the price based upon the amount of disk storage. The newest machines include SSD options, but the older instances without flash storage are still available.

Amazon also offers the chance to create "reserved instances" by pre-purchasing some of the CPU capacity for one or three years. If you do this, the machines sport lower per-hour prices. You're locking in some of the capacity but maintaining the freedom to turn the machines on and off as you need them. All of this means it pays to estimate how much you intend to use Amazon's cloud over the next few years, because committing to reserved capacity up front is what unlocks the savings.
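A few lines of Python make that trade-off concrete. The on-demand rate, reserved rate, and upfront fee below are placeholders, not Amazon's actual prices, but the break-even arithmetic is the same whatever the real numbers turn out to be.

```python
# Placeholder figures -- check Amazon's current pricing before deciding.
on_demand_rate = 0.28      # $/hour, pay-as-you-go
reserved_rate  = 0.16      # $/hour after paying the upfront fee
upfront_fee    = 1000.00   # one-time payment for a multi-year reserved instance

# Hours of usage at which the reserved instance starts paying for itself
break_even_hours = upfront_fee / (on_demand_rate - reserved_rate)
print(f"Reserved wins after about {break_even_hours:,.0f} hours of use")
# With these made-up numbers: ~8,333 hours, i.e. roughly a year of 24/7 use.
```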

In an effort to simplify things, Google created the GCEU (Google Compute Engine Unit) to measure CPU power and "chose 2.75 GCEUs to represent the minimum power of one logical core (a hardware hyper-thread) on our Sandy Bridge platform." Similarly, Amazon measures its machines with Elastic Compute Units, or ECUs. Its big fat eight-CPU machine, known as the m3.2xlarge, is rated at 26 ECUs while the basic one-core version, the m3.medium, is rated at three ECUs. That's a difference of more than a factor of eight.

This is a laudable effort to bring some light to the subject, but the benchmark performance doesn't track the GCEUs or ECUs too closely. RAM is often a big part of the equation that's overlooked, and the algorithms can't always use all of the CPU cores they're given. Amazon's m3.2xlarge machine, for instance, was often only two to four times faster than the m3.medium, although it did get close to being eight times faster on a few of the benchmarks. 

Caveat cloudster

The good news is that the cloud computing business is competitive and efficient. You put in your credit card number, and a server pops out. If you're just looking for a machine and don't have hard and fast performance numbers in mind, you can't go wrong with any of these providers.

Is one cheaper or faster? The accompanying tables show the fastest and cheapest results in green and the slowest and priciest results in red. There's plenty of green in Google's table and plenty of red in Amazon's. Depending on how much you emphasize cost, the winners shift. Microsoft's Windows Azure machines start running green when you take the cost into account.

The freaky thing is that these results are far from consistent, even across the same architecture. Some of Microsoft's machines have green numbers and red numbers for the same machine. Google's one-CPU machine is full of green but runs red with the Tradesoap test. Is this a problem with the test or Google's handling of it? Who knows? Google's two-CPU machine is slowest on the Fop test -- and Google's one-CPU machine is fastest. Go figure.

All of these results mean that doing your own testing is crucial. If you're intent on squeezing the most performance out of your nickel, you'll have to do some comparison testing and be ready to crunch some numbers. The performance varies, and the price is only roughly correlated with usable power. For a number of tasks, it would just be a waste of money to buy a fancier machine with extra cores because your algorithm can't use them. If you don't test these things, you could be wasting your budget.
Cloud benchmark results - Windows Azure

It's also important to recognize that there can be quite a bit of markup hidden in these prices. For comparison, I also ran the benchmarks on a basic eight-core (AMD FX-8350) machine with 16GB of RAM on my desk. It was generally faster than Windows Azure's eight-core machine, just a bit slower than Google's eight-core machine, and about the same speed as Amazon's eight-core box. Yet the price was markedly different. The desktop machine cost about $600, and you should be able to put together a server in the same ballpark. The Google machine costs 82 cents per hour or about $610 for a 31-day month. You could start saving money after the first month if you build the machine yourself.
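Using the figures above, the build-versus-rent break-even point is easy to check. This sketch deliberately leaves out electricity, hosting, and administration costs, which is where the comparison gets murkier.

```python
desktop_cost = 600.00   # the 8-core AMD FX-8350 box described above
cloud_rate   = 0.82     # $/hour for the comparable Google machine

break_even_hours = desktop_cost / cloud_rate
print(f"Break-even after about {break_even_hours:.0f} hours")         # ~732 hours
print(f"That's roughly {break_even_hours / 24:.0f} days of 24/7 use")  # ~30 days
```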

The price of the machine, though, is just part of the equation. Hosting the computer costs money, or more to the point, hosting lots of computers costs lots of money. The cloud services will be most attractive to companies that need big blocks of compute power for short sessions. If they pay by the hour and run the machines for only a short block of time, they can cut the costs dramatically. If your workload appears in short bursts, the markup isn't a problem because any machine you own will just sit there most of the day waiting, wasting cycles and driving up the air conditioning bills.

All of these facts make choosing a cloud service dramatically more complicated and difficult than it might appear. The marketing is glossy and the imagery makes it all look comfy, but hidden underneath is plenty of complexity. The only way you can tell if you're getting what you're paying for is to test and test some more. Only then can you make a decision about whether the light, airy simplicity of a cloud machine is for you.
12/27/2014 11:44:00 AM

Mobile's gift to the world: In poor countries, it's life-changing

A third of the world's population has at least basic access to the mobile Internet.

It's an amazing stat: There are 2.3 billion mobile Internet subscribers on the planet -- one-third of the world's population. When you add desktop and laptop computers to the mix, the number of people with Internet access rises to 3 billion. If you had any illusion that mobile computing is the future, these figures should show you it is actually the present.

The data comes from a survey of 5,500 people across the globe commissioned by Juniper Networks, which of course wants to power much of those connections. But Juniper's self-interest doesn't affect what the survey shows: Those 2.3 billion people get tremendous benefit from being connected to the Internet from devices they have at hand.
The effects of that connectivity are profound:
  • 97 percent of people in developing countries say mobile Internet access has been transformative in their lives, versus 78 percent in the richest countries, including the United States.
  • 52 percent of people in developing countries say mobile Internet access has been a key change agent for how they work, versus 26 percent in the richest countries. Also, 40 percent of people in developing countries report that connectivity has improved their earnings power, compared with 17 percent in rich countries.
  • 24 percent of people in developing countries use the mobile Internet for educational purposes, versus 12 percent in the richest countries.
  • People in rich countries use the mobile Internet more for tasks rich people can do: shop (41 percent), bank (51 percent), and (increasingly) control gadgets in our homes. In poor nations, 33 percent shop via mobile and 40 percent bank via mobile. Home automation is relatively unknown in developing countries; instead, the mobile Internet focus is more on communication, research (such as on foodstuff prices, weather, and traffic), and education.
The Juniper report divides the two groups thus: Poor people use the mobile Internet for personal advancement, whereas rich people use it for personal convenience.

The bottom line: Americans, Canadians, Europeans, Koreans, Japanese, Australians, New Zealanders, and so on have had email, the Web, e-commerce, in-car navigation, and other connected technologies for a couple of decades now, so we take them for granted -- the rest of the world has not had that luxury. As a result, the rest of the world more strongly feels the opportunities and advantages that mobile connectivity brings.

That's all the more remarkable when you consider the state of mobile infrastructure in developing countries. Cellular networks are often 2G, less often 3G, and rarely LTE. They contend with bandwidth limitations we moved past seven or eight years ago. They have mobile connections in far fewer places, given the greater proportion of their populations in harder-to-connect rural areas and slums.

People in developing countries also use much less capable devices. An iPhone or Galaxy costs months of income -- or more -- in many developing countries and is well beyond the reach of the typical farmer, merchant, educator, or worker. These people instead rely on what we would consider crippled Android devices, old Nokia S60s, aging BlackBerrys, and homegrown OSes we've never heard of.

Thus, the significant impact of mobile connectivity on these people is even more remarkable. It's a gift that they clearly appreciate. We should, too.

Tuesday, December 23, 2014

12/23/2014 08:49:00 PM

Google hitches cloud data analysis to Java

The company is looking to extend its Google Cloud Dataflow platform to other languages and environments.

Google is offering a Java SDK to integrate with the Google Cloud Dataflow managed service for analyzing live streaming data as part of its effort to broaden support for the platform.

By sharing via open source, the SDK provides a basis for adapting Dataflow to other languages and execution environments, said Sam McVeety, Google software engineer, in a recent bulletin. "We've learned a lot about how to turn data into intelligence as the original FlumeJava programming models (basis for Cloud Dataflow) have continued to evolve internally at Google."

Google hopes to expand the Dataflow service as well as spur innovation in combining stream and batch processing models. "As the proliferation of data grows, so do programming languages and patterns," said McVeety. "We are currently building a Python 3 version of the SDK to give developers even more choice and to make dataflow accessible to more applications. Reusable programming patterns are a key enabler of developer efficiency. The Cloud Dataflow SDK introduces a unified model for batch and stream data processing."
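To make the "unified model" idea concrete without leaning on Dataflow's actual SDK, here is a toy Python sketch: the same transform is written once and applied whether the input is a bounded batch (a list) or an unbounded stream (a generator). The function names are illustrative only and have nothing to do with the real Cloud Dataflow API.

```python
import itertools

def word_count(records):
    """One transform, written once, applied to batch or stream alike."""
    counts = {}
    for line in records:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
        yield dict(counts)   # emit a running result after each element

# Batch: a bounded, in-memory collection.
batch = ["the cloud is big", "the cloud is elastic"]
print(list(word_count(batch))[-1])           # final counts for the whole batch

# Stream: an unbounded source (an endless generator), consumed lazily.
def endless_log():
    while True:
        yield "the cloud is big"

stream = word_count(endless_log())
print(next(itertools.islice(stream, 2, 3)))  # running counts after the third element
```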

For other environments, McVeety said modern development, particularly in the cloud, is about heterogeneous service and composition. "As Storm, Spark, and the greater Hadoop family continue to mature, developers are challenged with bifurcated programming models. We hope to relieve developer fatigue and enable choice in deployment platforms by supporting execution and service portability."

Google Cloud Dataflow was introduced in June as a step toward providing a managed service model for data processing. Still in an alpha stage of release and restricted to "whitelisted" users (newcomers must apply for access to the service), Cloud Dataflow is intended to make it easier to focus on analysis without having to fret over maintenance of underlying data piping and processing infrastructure. An InfoWorld analysis of Cloud Dataflow concluded it is probably not a Hadoop killer, but a way for Google Cloud users to enrich applications.
12/23/2014 08:47:00 PM

Spark 1.2 challenges MapReduce's dominance

The newest version of the Spark in-memory framework for Hadoop improves performance.

Apache Spark, the in-memory and real-time data processing framework for Hadoop, turned heads and opened eyes after version 1.0 debuted. The feature changes in 1.2 show Spark working not only to improve, but to become the go-to framework for large-scale data processing in Hadoop.

Among the changes in Spark 1.2, the biggest items broaden Spark's usefulness in multiple ways. A new elastic scaling system allows Spark to better use cluster nodes during long-running jobs, which has apparently been requested often for multitenant environments. Spark's streaming functionality, a major reason why it's on the map in the first place, now has a Python API and a write-ahead log to support high-availability scenarios.
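The new Python streaming API, for instance, lets a PySpark job consume a live feed in a handful of lines. A minimal sketch, assuming Spark 1.2 or later with a text source on a local socket (the host, port, and batch interval are arbitrary examples):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="StreamingWordCount")
ssc = StreamingContext(sc, batchDuration=5)   # 5-second micro-batches

# Listen on an arbitrary local socket (e.g., fed by `nc -lk 9999`).
lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()                               # print each batch's counts

ssc.start()
ssc.awaitTermination()
```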

The new version also includes Spark SQL, which allows Spark jobs to perform Apache Hive-like queries against data, and it can now work with external data sources via a new API. Machine learning, all the rage outside of Hadoop as well, gets a boost in Spark thanks to a new package of APIs and algorithms, with better support for Python as a bonus. Finally, Spark's graph-computing API GraphX is out of alpha and stable.

Spark's efforts to ramp up and expand speak to two ongoing efforts within the Hadoop world at large. The first is to shed the straitjacket created by legacy dependencies on the MapReduce framework and move processing to YARN, Tez, and Spark. Gary Nakamura, CEO of data-application infrastructure outfit Concurrent, believes the "proven and reliable" MapReduce will continue to dominate production over Spark (and Tez) in the coming year. However, MapReduce's limitations are hard to ignore, and they put real constraints on the work that can be done with it.

Another development worth noting is Python's expanding support for Spark -- and Hadoop. Python remains popular with number-crunchers and is a natural fit for Hadoop and Spark, but most of its support there has been confined to MapReduce jobs. Bolstering Spark's support for Python broadens its appeal beyond the typical enterprise Java crowd -- and does the same for Hadoop in general.

Much of Spark's continued development has come through contributions from Hadoop shop Hortonworks. The company has deeply integrated Spark with YARN, is adding security and governance by way of the Apache Argus project, and is improving debugging.

This last issue has been the focus of criticism in the past, as programmer Alex Rubinsteyn has cited Spark for being difficult to debug: "Spark's lazy evaluation," he wrote, "makes it hard to know which parts of your program are the bottleneck and, even if you can identify a particularly slow expression, it's not always obvious why it's slow or how to make it faster."
12/23/2014 08:43:00 PM

Busted! The campaign against counterfeit reviews

Fake reviews are poisoning the Internet. Here's how machine learning is attempting to nail the counterfeiters.

Customer sentiment is a type of soft currency. Good reviews are monetizable data, especially when they come from influential, reputable sources and are broadcast far and wide.

In other words, it's best when your fans and their fond feelings are earned.

This marketing principle works in reverse, of course -- negative sentiments and nasty reviews can be showstoppers. Reputations lost cannot easily be reclaimed. And when bad raps persist in public forums -- such as social media, e-commerce sites, or review sites -- you can't count on people forgetting the mud that was slung at you last year or the year before. It will stain your brand in perpetuity, even if the charges were baseless and you've effectively addressed those that weren't.

What's shocking about online reviews is how easy this "currency" is to counterfeit. Cyberspace is rife with fake reviews, both positive and negative. We can interpret "fake" in several ways:
  • The reviewer may use their own name but conceal the fact that they've been "put up to it" (they may have been paid, have a vested interest, or anticipate other material benefits to flow if they say nice things -- or nasty things to trash the competition).
  • The reviewer may use a pseudonym or otherwise post anonymously, in order to shield himself or herself from being fingered as the perpetrator.
  • The reviewer may be an automated program that posts legitimate-looking reviews in bulk, thereby overwhelming whatever authentic reviews have been posted manually.
Due to the levels of deception that may be involved, detecting fake online reviews requires that we confirm the following:
  • The authenticity of the source
  • The source's impartiality on the matters being reviewed
  • The nonspam originality of the actual reviews posted by the source
These are tough nuts to crack, especially in an automated fashion that can weed out the bogus reviews before they're posted and do their damage. In that regard, I recently came across an interesting article about an effort at the University of Kansas to develop machine learning algorithms to detect fake reviews. The researchers cite the need for a "more trustworthy social media experience" as driving their initiative.

What the article describes is one part semantic analysis of the posts (to look for verbal signatures of fake reviews), one part graph analysis (to assess the status of each reviewer's relationship with the site on which they post), one part outlier analysis (to determine whether the posts are far outside the average in terms of the sentiments expressed and the frequency of posting), and one part behavioral analysis (to determine whether bogus reviewers are changing their strategy over time and across sites to avoid detection). Underlying the researchers' efforts is an attempt to model fake-review attacks as a graph of "interactions between sociological, psychological, and technological factors."
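Here's a minimal sketch of just the outlier-analysis piece, using nothing but numpy: flag reviewers whose posting rate or sentiment extremity sits far from the norm. The toy data and the cutoff are invented for illustration; the researchers' actual models are far more elaborate.

```python
import numpy as np

# Toy data: one row per reviewer -> [reviews per week, mean sentiment in [-1, 1]].
# The first twelve look like ordinary customers; the last two flood the site.
reviewers = np.array([
    [2,  0.4], [1, -0.2], [3, 0.1], [2, 0.6], [1, 0.3], [2, 0.2],
    [3,  0.5], [1,  0.0], [2, 0.3], [2, 0.4], [1, 0.1], [3, 0.2],
    [45, 0.98],    # suspicious: a torrent of glowing reviews
    [30, -0.95],   # suspicious: a torrent of trashing reviews
])

# Z-score each column and flag anyone far from the crowd on either axis.
z = (reviewers - reviewers.mean(axis=0)) / reviewers.std(axis=0)
suspicious = np.abs(z).max(axis=1) > 2.0     # arbitrary cutoff for this sketch

for idx in np.where(suspicious)[0]:
    rate, sentiment = reviewers[idx]
    print(f"Reviewer {idx}: {rate:.0f} posts/week, sentiment {sentiment:+.2f} -- flag for a closer look")
```

In practice this signal would be only one input, combined with the semantic, graph, and behavioral analyses described above.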

People might trust online reviews more if they have some confidence that bogus postings are being detected promptly and accurately. Like any content-filtering technology, anti-fake-review algorithms will need to minimize both false positives (genuine reviews wrongly flagged as fake) and false negatives (fake reviews that slip through as genuine).

The stakeholders are obviously the businesses and other online entities whose reputations are at stake, as well as the public at large, which uses these opinions in determining whether this or that site, community, or business is worth associating with. If the researchers succeed in bringing machine learning algorithms to bear on the problem, their work could aid online sites in their efforts to self-police fake reviews. It could also help flag possible abusers so that they can be investigated further, blocked from accessing sites, and even referred to the relevant authorities for punitive actions.

If the researchers want to produce an algorithm of practical value, they will need to make it fast, efficient, parallelizable, and automatable to the max. It will need the cloud scalability of today's state-of-the-art antispam, antiphish, and antimalware technologies. Just as no sane human wants to manually filter an ocean of Nigerian scams, no one in his or her right mind will want to adjudicate whether the next "this restaurant's food stinks" review is the authentic voice of a real customer -- or the malicious posting of its archrival across the street.

It all comes down to detecting the fine line between sincerity and its opposite. It's the same fine line that anti-sarcasm algorithms also attempt to identify, with mixed results.

 

12/23/2014 08:41:00 PM

Forget commodity hardware: now the cloud wants custom chips

It used to be that computing hardware was kept general-purpose to support any workload.

Intel will produce its own custom chips for cloud providers next year, reports the New York Times.
The reason is simple: Intel can easily change chip designs, leaving out unwanted core features or altering other properties to optimize compute and power consumption for public cloud servers. It's a matter of a few commands to the production systems to get custom chips.

The public cloud providers are willing to pay for custom chips if they can save power during operations or get better performance when processing common workloads. Indeed, since 2012, Intel has had an internal “Just Say Yes” program dedicated to looking for workloads that require custom chips.

In fact, most public cloud providers already contract with Intel or other chipmakers for custom chips, again for cost and workload optimization purposes. At first glance, there does not seem to be much of a downside to doing so, and I suspect the practice will continue.

But there is a downside: Workload characteristics change over time, so the underlying software that supports a public cloud system needs to change as well. If the chips are customized for specific system requirements, changing the system means a mismatch in what the workloads need to do -- and the services the custom chips provide.

However, general-purpose chips won't provide the same optimization and cost efficiencies, so it may still be more economical to swap out the chips (or the servers using them) as the workloads change. I suspect such chip-planning meetings are happening in cloud providers' offices all over the world right now: public cloud computing keeps expanding while margins remain thin to nonexistent, which only increases the pressure to improve efficiency.

The use of custom chips for public clouds seems to be the right thing to do, but it's also a bit strange. Software was always the part of the system you customized, and the hardware supported the software. But these days, software and hardware need to work more closely together, so hardware may have to be more malleable than before, especially at the scale and fast pace of the cloud.
12/23/2014 08:39:00 PM

Review: Salesforce has the right stuff for mobile development

Salesforce1 Platform gives Salesforce developers at all skill levels good options for building mobile apps.

Salesforce.com started as a cloud service for sales force automation. The company added a cloud-based development platform, Force.com, with a Java-like language (Apex), and went on to acquire the Heroku PaaS and a number of other services.

Today, Salesforce.com is the acknowledged 800-pound gorilla of SaaS for a number of business application areas: not only sales force automation, but also marketing, customer service, community building, business intelligence, B2B prospecting, and collaboration. Over the last few years, Salesforce has been building out its mobile strategy.

As we’ll see, the Salesforce1 toolkit includes a Web-based drag-and-drop designer that even a Neanderthal – make that a business analyst – can use. (I’m now in trouble with both Neanderthals and business analysts.) At the next level of complexity, a Web developer who knows some HTML5, CSS3, and JavaScript can build pages for Salesforce1 in Visualforce with Mobile Packs or using Lightning components. At the highest level of complexity, a mobile developer can build native or hybrid applications against Salesforce data for iOS and Android with the Salesforce Mobile SDKs. Meanwhile, any of these can utilize the MBaaSes provided by the Salesforce1 Platform.
12/23/2014 02:45:00 PM

Add yours now! 20 hot user ideas for Windows 10

Vox populi: Windows Feedback means you can prod Microsoft to change Windows 10.





Top 20 Windows 10 Feedback suggestions

The Windows 10 Feedback mechanism offers a unique way for you to change the course of Windows development. While Microsoft's made only a few, very minor changes to the Win10 Technical Preview in response to feedback, the time to get your vote in is right now, while the dev team goes into an intense six-to-eight-week full-court press.

The following list aggregates feedback items that many Windows "Insiders" feel are most important, with a bit of cheering from the mouse-and-keyboard peanut gallery, and vetting from yours truly. If you want to see these changes in the shipping version of Windows 10, speak now or forever hold the pieces.
Let’s not let the tragedy of Windows 8 be repeated on our watch!

If you aren't already running Windows 10, it's easy to use Microsoft's ISO file to install the latest version, Windows 10 Technical Preview build 9879. Get with the system and get your opinion heard!

Windows 10 Feedback: How to be heard

When you are officially signed up for the Insider Program and have the latest version of Windows 10 Technical Preview installed, go into the Windows Feedback application by clicking Start, then choosing the Windows Feedback tile on the right side of the Start menu.
Take a few minutes to orient yourself in the Windows Feedback app. In particular, note how you can add a Me Too to any of the existing feedback items. That's the key. If you find a suggestion that rings your chimes, give it a Me Too. If you have a suggestion that's slightly different from one you see, write up the details as a New Feedback -- but don't forget to add your Me Too vote to the original item. Microsoft counts Me Toos.
Section: Apps/OneDrive

1. Tester's Feedback: Not at all pleased with the changes to how OneDrive interacts with this latest build (9879). In the OneDrive folder ONLY the synced stuff shows up. Things that are online only have to be gotten to by going to the website. Not good imo.
You can see the problem in this screenshot. On the left, in File Explorer, my OneDrive Pictures folder has a folder called Camera Roll and a handful of individual files. On the right, if I log in to OneDrive, there are two folders -- including an extra one called Photos. The Photos folder isn't synced, so it doesn't show up in File Explorer. In fact, if you looked at your OneDrive folder with File Explorer in build 9879, you'd have no way of knowing the Photos folder existed -- applications can't get to it, Windows searches won't find anything in it, you can't save to it.
It's a controversial move that, in my opinion, makes Windows 10 and OneDrive considerably less useful than they should be. Computerworld's Gregg Keizer has an excellent synopsis of the controversy. Peter Bright at Ars Technica posted a more conciliatory analysis, and Microsoft has responded. I, for one, think that CITEworld contributor Mary Branscombe hit the nail on the head with her original post, which garnered 7,100 votes, then was pulled by Microsoft.
You can't vote for Mary Branscombe's original. But you can, if so inclined, vote for its proxy, listed above.
Section: Apps/All Other Apps

2. Tester's Feedback: We need a new Windows Media player. Most people install other players like mpc and vlc because of the lack of codec support and features. WMP should have a playlist that can be detached as a separate window. The playlist should support auto save in case you've added new files to it but the player was closed unexpectedly…
3. Tester's Feedback: Don't preload so many junk apps: Travel, Games, News etc. If people want an app for it, they'll go to the Windows Store to get it. Not including them will also make the OS be smaller and configure faster.
I get that Microsoft wants to sell Xboxes, games, music, videos, and the like. I don't get how that translates into poorly behaved Windows apps that I'm forced to install but will never use.
Looks like Windows 10 will be able to play FLAC and MKV files -- only 10 years late and 10 cents short.
I give a Me Too to both suggestions -- with a twist. Some inside scuttlebutt says Windows Media Player may be dropped in Win10, or at the very least, it won't be improved. Good riddance, sez I. Hey, Microsoft, why not include a copy of VLC with Win10, kill WMP, and make Xbox Music and Xbox Video optional?
Section: Apps/App behavior on multi-monitor

4. Tester's Feedback: Add the ability to set other desktops to another monitor. This will provide users with Multi-display setups to have more multitasking functionality when combining the power of Multiple desktops and Other displays.
I'm surprised this wasn't in the design spec from day one.
Although the user interface for assigning desktops to monitors might be challenging (context menu on each of the thumbnails?), the ability to set up a desktop, then send it to a different monitor would be a godsend for many multi-mon-munchkins.
Section: File Explorer/File Association

5. Tester's Feedback: Give us an option to unassociate file types! If someone, for example, associates by mistake a system file type to a 3rd party program, then all files of the same type will appear to open with that program.
6. Tester's Feedback: Don't check "Use this app for all .xyz files" by default. This drives me nuts. I use Open With to open a file with a different program in a one-off scenario. I constantly have to uncheck it.
The two problems go hand in hand. Advanced Windows users frequently open a file with a one-off program and forget to uncheck the box. That leads to the situation where you want to get rid of the association.
I've seen people royally mess up their machines by assigning an unusual program (say, Notepad) to a critical filename extension (for example, .dll). Try diagnosing that one.
Section: File Explorer/File Picker

7. Tester's Feedback: Tabbed browsing! We use it daily with our Web browser. Our file browser needs it built in, so I don't have to keep using third-party programs.
(There's a similar item in the section File Explorer/Ribbon and context menus)
Meet Clover, the best-known third-party program to add tabs to the Windows 8 (File) Explorer. Clover has a very simple user interface: Exactly as you would drag and drop websites to create browser tabs, you can drag and drop locations inside Windows Explorer up to the top, to turn them into tabs.
Click on the tab, and Windows Explorer navigates to the location: easy, intuitive, effective.
This screenshot shows Clover working perfectly well with the Windows 10 Tech Preview build 9879 File Explorer. The tabs across the top are the ones I chose to speed up navigating. Why can't Windows 10's built-in File Explorer do the same thing?
Section: File Explorer/Libraries

8. Tester's Feedback: I would like the option to open file explorer to "This PC" instead of "Home" or the ability to add drives to the Frequent folders under "Home."
9. Tester's Feedback: Windows + E should take you to This PC, not Home. Or give the option to include This PC to my Favorites.
10. Tester's Feedback: The home folder should be customizable.
Windows 7 opened Windows Explorer to your Libraries. Windows 8 doesn't play well with Libraries, so Microsoft changed Windows Explorer to open in a made-up place called "This PC," which includes the primary folders (Desktop, Documents, Downloads, Music, Pictures, Videos), Devices and Drives, and Network Locations. In Windows 10 build 9879, File Explorer (different name, same app) opens to a new made-up place called "Home," which, as of this build, lists Frequent folders and Recent files.
There doesn't appear to be any way to modify the contents of the "Home" location, so you're stuck with Frequent folders and Recent files. Clearly, Microsoft hasn't thought this through very well.
Section: Windows Installation and Setup/First sign-in Start screen layout or app registration

11. Tester's Feedback: Please add the ability to register a local user account without logging into the Microsoft online account.
12. Tester's Feedback: You can sign in with a standard account, but the need to select "create a new account" under "sign in with Microsoft account" is misleading, having the "Make a local account" under it would be better.
13. Tester's Feedback: I don't like being forced to use Microsoft accounts as my Windows account, so I had to go through hustle of providing fake Outlook account to force install process option where I can create local account.
Similar Feedback under Windows Installation and Setup/Out-of-box-experience and under Windows Installation and Setup/Windows installation, Personalization and Ease of Access/User accounts, and several others.
Microsoft is still stacking the deck, trying hard to get you to use your Microsoft account as a Windows logon. If you don't want to use a Microsoft account (or convert a current email account into a Microsoft account) to log onto Windows, when you create the local Windows account (see screenshot) you have to click "Sign in without a Microsoft account (not recommended)," then at the bottom of the next screen click "Local Account."
Microsoft's playing Google's game. When you use a Microsoft Account to log on to Windows, Microsoft can keep track of where and when you're logging in, correlate your user ID with your IP address (and thus your Bing searches and visited URLs in IE), and track all of your local searches.
If you think Local Accounts are for power users only, ask yourself this: If a typical Windows customer understood that using a Microsoft account let Microsoft track them and their searches, what would their reaction be? I don't know about you, but my Aunt Mergatroid would be aghast.
Section: Network/Network and sharing center

14. Tester's Feedback: There should be a more intuitive way to change the type (private vs public) of a network.
As it stands in build 9879, the only way I've found to change a network from public to private, or vice versa, requires you to "Forget" the network. The only way to do that, as best I can tell, is to bring up the Network pane on the right -- click Start, PC Settings, Networks, Manage, and at the bottom click Open Network Flyout (see screenshot). Then right-click on the connection you want to change, and choose “Forget this network.”
Then you have to reconnect to the network, this time specifying either private or public.
Yeah, I think that needs to be more intuitive.
Section: Search/Windows Search

15. Tester's Feedback: The Windows Search button on the taskbar is weird. Is it supposed to be its own app? If so, why whenever you click a link does it go to IE? This is very strange to me. If this is the case, why even have it? I could just fire up IE and do the same thing. Seems redundant.
16. Tester's Feedback: Windows Search result brings search results from the Web which personally is a terrible idea. If I want to search the Web, I will start IE or any other browser and search against Google or Bing or Yahoo search engine.
Microsoft uses Windows Search's extended Web searching -- "Smart Search" -- to sell ads and to add to its Bing hit count. I talked about it a year ago, when Microsoft first introduced the "feature" into what would become Windows 8.1 Update 1.
In a nutshell, unless you turn it off, Microsoft can track every search you make on your machine -- and feed you ads based on your search terms. I'm not talking about Web searches. I'm talking about simple searches for documents, or photos, or music. If you use a Microsoft account and leave Windows Search enabled, Microsoft can amass an enormous amount of information, solely from the way you use your machine on your data.
In Windows 8.1 you can turn it off (or set it off during installation), although it's enabled by default. In Windows 10, I don't see any way to turn it off.
Scroogled? Bah! Microsoft snoops around, too -- and it brags to its advertisers about how effective Windows snooping can be.
Section: Windows Update and Recovery/Backup and restore

17. Tester's Feedback: You should return Windows 7 style backup.
18. Tester's Feedback: Bring back scheduled increment Windows image backups. File history is great, but it's not enough.
19. Tester's Feedback: Make System Restore easier to find! Ever since Windows 8 you've hidden the classic system restore (where you have automatic restore points) inside "Recovery" so you can push Refresh and Reset PC to the front... Make it accessible again for non-computer-savvy users!
Many more Feedback items, in multiple Feedback sections, are in the same vein.
Microsoft buried the Windows Backup tools in Windows 8, tore some of them out of Windows 8.1, and they sure as shootin' aren't coming back in Windows 10. The goal, I'm told, is to present a Chromebook-style backup capability: You don't need to back up anything on your computer because it's in the cloud. To that I say, balderdash. If I want Chromebook backup, I'll use a Chromebook. (In fact, I do, but that's another story.)
Windows 10 doesn't create daily restore points (see screenshot). If there's a way to bring back Windows 7's full-system backup, I can't find it (the Win 8.1 trick of searching for "Windows 7 File Recovery" doesn't work). With OneDrive hitting the skids (see previous slide), backing up to OneDrive is harder than ever.
I doubt that Microsoft will bring back Windows 7 backup/recovery, but I can always hope.
Section: Windows Update and Recovery/Restore, refresh and reset

20. Tester's Feedback: Allow holding down Shift or F8 or possibly even the Windows key at boot to get the Advanced Boot Options screen. This will make getting into safe mode and other recovery options easier to access before Windows boots.
Booting into Safe Mode in Windows 10 (like Win8 before it) is a convoluted process with a chicken-and-egg element. If you want to get into Safe Mode (or System Restore, Image Recovery, Startup Repair), you have to go into PC Settings, Update and Recovery, Recovery, then click Restart Now at the bottom (see screenshot).
For the 1.5 billion people who've been exposed to F8 on boot, perhaps by proxy or vicariously, that's a big change. It also means Windows has to be working before you can boot into Safe Mode.
Windows 10 Feedback: Speak up now

That's my take on the best of the Windows 10 Feedback items.
I didn't include items that have been suggested a million times and ignored by Microsoft (for example, creating a clipboard manager that stores multiple items, or showing filename extensions by default). I also didn't include feedback that's surely already on the dev list (such as dragging an open app from one desktop to another, different themes on different desktops, putting an icon marker on shared folders, bringing back the network icon in the notification area), nor did I include obvious bugs. I skipped over UI suggestions, many of which are fine, but all of which fall into the de gustibus bucket (except for bringing back Aero Glass which, to my uneducated eye, is more like a Holy Grail).
Did I miss one of your favorites? Sound off in the comments! Let's get these design screwups fixed by the time Win10 ships.
Fer heaven's sake, quitcherbitchin, sign up, and tell Microsoft how to make Windows 10 better!