First impressions are everything. Making a good first impression is key for interviews, and Big Data or Hadoop interviews are no exception to this rule. For example, let's say you come across an open position for a Hadoop Developer at Apparel Inc. You are super excited about the position; you feel that the roles and responsibilities in the job requirement match exactly what you do on a day-to-day basis. Feeling confident, you apply for the job with a resume in which your Hadoop project looks something like this.
Client: Apparel Inc., Dec 2013 – Present
Hadoop Developer
Responsibilities:
- Developed multiple MapReduce jobs in java for data cleaning and preprocessing.
- Moving data from HDFS to RDBMS and vice-versa using SQOOP.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
- Installed and configured Hadoop cluster in Test and Production environments
- Performed both major and minor upgrades to the existing CDH cluster
- Implemented Commissioning and Decommissioning of new nodes to existing cluster.
- Analyzing/Transforming data with Hive and Pig.
Now you anxiously wait for a call from Apparel Inc., or at least from a vendor. One day, two days, then two weeks pass by, and still no calls. Already feeling rejected, you post your Hadoop resume on major job sites. Still no interviews, and you wonder why.
A resume with the above project description lacks quality and looks sloppy and, honestly, boring. Understand that the vendor or client doesn't know anything about you; their decision to consider you for the Hadoop position, or even to call you, is based solely on your resume and nothing else. Over the years we have seen many resumes, and believe it or not, about 8 out of 10 are of poor quality and look something like the one above.
Hadoop is hot in the market, and resume preparation is a one-time process; done right, your Hadoop resume will get noticed and picked up, and you will get calls non-stop. I don't want to leave you hanging any longer. The rest of this post lists 5 actionable tips your Hadoop resume should follow that will help you land the dream job, with the dream company, in the dream career you have always wanted.
1. ABOUT CLIENT & USE CASE
An interviewer would like to know who you work for and what you do, and more importantly, why you do what you do. Explain who your employer is. You may think everyone knows about your company, client, or employer, but that may not be the case; this is true of any resume, not just a Hadoop resume. In a couple of lines, describe your employer.
People who look at your resume need to know how and why you are using Hadoop in your project. Describe the project in detail and explain how Hadoop helps you solve the use case or problem at hand, so that anyone who reads your resume understands exactly why you are using Hadoop. This is very important, and we cannot stress it enough. If you do this correctly, you are halfway to getting a phone call from a recruiter.
2. YOUR HADOOP ENVIRONMENT
A Hadoop environment is one in which hundreds of nodes work together in harmony; it is like a well-organized orchestra if you think about it. It is very important to give details about your cluster: the number of nodes, data volume, distribution used, tools and versions, node configuration, special configurations like High Availability (HA), and details about cloud services like AWS.
Why is this important? It shows the interviewer or hiring manager the expertise you have, and based on the size of the cluster and the data volume you dealt with, they can gauge your experience in debugging and troubleshooting issues. It also shows that you were more involved in the project than just working with a single tool like Pig or Hive in a Hadoop cluster. It's all about how well you can communicate what you know.
3. ROLE YOU PLAYED
If you were a hiring manager, would you hire a Hadoop tester for a Hadoop developer position? You wouldn't. Look at this post to get an idea of what managers want to know about the role you played in your Hadoop project. Now imagine that you answer all those questions (at least briefly) in your Hadoop resume: every interviewer, recruiter, hiring manager, and vendor would want to discuss the Hadoop openings they have, because you have demonstrated very clearly in your resume what you know.
Keeping your day-to-day activities in mind, you may think of yourself simply as a developer. But you might have been involved in designing your cluster, or in installing and configuring it. You might have done a Namenode restore once. Don't be shy or too modest: list everything you did and improve your chances of getting picked up by a recruiter. Many times we see resumes from referrals that are lousy and boring, but when we speak to the candidates we see their full potential, what they know about Hadoop, and what they accomplish day to day. Their resumes clearly don't do justice to what they do and what they know. If they had not come through referrals, there is no way we would have called them for an interview, because of the lack of information in their resumes.
4. TOOLS YOU USED
If you are working in a Hadoop environment, you are most likely working with, or at least familiar with, more than one tool from the Hadoop ecosystem. List all the tools you are familiar and knowledgeable with. We are not suggesting that you add every tool in and around the Hadoop ecosystem to make your resume a powerhouse. Keep in mind that you will get questions about the tools you mention in your resume.
Tip: when you list the tools, order them in the natural order of their function and usage.
- Start with the data ingestion tools like Flume, Sqoop, etc.
- Then list the data transformation and analysis tools like Pig or Hive.
- Be sure to mention file formats like SequenceFile, RCFile, etc. at this point.
- Then come the coordination or orchestration tools like Oozie.
- Mention the tools used for troubleshooting and debugging.
- Then list the tools used for cluster management, like Cloudera Manager or Apache Ambari.
- List the tools used for security, e.g. Kerberos or Apache Sentry.
- BI tools like Tableau or Kibana come next to wrap up the list of tools you are familiar with.
- Don't forget to mention NoSQL databases like HBase, Cassandra, or MongoDB.
- Also mention your experience with cloud services like EC2, EMR, etc.
We are not suggesting you mention every single tool listed above. No one is, or is expected to be, an expert in every tool in the Hadoop ecosystem. Specify only the tools you work with on a day-to-day basis and are very familiar with.
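To make the data ingestion entries from the list above concrete, here is a minimal sketch of the kind of setup such a bullet implies: a Flume 1.x agent tailing an application log into HDFS. The agent, source, channel, and sink names, the log path, and the HDFS URL are all hypothetical placeholders, not details from any real cluster.

```
# Minimal Flume 1.x agent sketch: tail an application log into HDFS.
# All names and paths (agent1, src1, /var/log/app/app.log, namenode) are made up.
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Exec source: stream new lines from the application log
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

# In-memory channel buffers events between source and sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# HDFS sink: write events into date-partitioned staging directories
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/staging/logs/%Y-%m-%d
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.sink1.channel = ch1
```

If you have built a pipeline like this, being able to talk through choices such as channel type (memory vs. file) is exactly the kind of depth this section of your resume should set up.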
5. DEBUGGING, TROUBLESHOOTING & OTHER IMPORTANT STUFF
As a Hadoop developer, administrator, or architect, you know that managing a Hadoop cluster is no easy task. More nodes and processes mean more chances for failure. When we interview candidates for key positions, we are most interested in the issues they have run into in the past and how they addressed them. The answer to this question speaks volumes about their expertise; quite honestly, the more issues you have seen, the more of an expert you become in a specific area. So make sure to point out the troubleshooting and debugging you have done on your Hadoop cluster.
Finally, if you have done one-off work that is key, make sure to call it out in your resume. Here are some examples. You might have –
- Configured High Availability (HA) in your cluster
- Configured security with Kerberos
- Restored a failed Namenode
- Migrated your cluster from one datacenter to another
- Been involved in a version upgrade
- Implemented and configured the FAIR scheduler
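A bullet like "Implemented and configured the FAIR scheduler" is much stronger when you can discuss the specifics behind it. As a hedged sketch (the property and element names follow the Hadoop 2.x YARN Fair Scheduler configuration; the queue names are made up for illustration), the setup has two parts:

```xml
<!-- yarn-site.xml: switch the ResourceManager to the Fair Scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<!-- fair-scheduler.xml: allocation file with hypothetical queues -->
<allocations>
  <queue name="etl">
    <weight>2.0</weight>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
  </queue>
</allocations>
```

Under contention, jobs in the etl queue would receive roughly twice the cluster share of the adhoc queue. Being able to explain a trade-off like that is exactly what interviewers probe for when they see such a bullet.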
At the beginning of the post we showed a resume section that we said was of poor quality. This post would not be complete without showing a good one. Again, this is just to give you an idea; there is no "one size fits all" approach to creating a Hadoop resume.
Client: Apparel Inc., Dec 2013 – Present
Hadoop Consultant
Apparel Inc. is a high-end fashion online retailer with a global online presence, currently holding about 20% of the market share in the high-end fashion sector. Apparel Inc.'s website and mobile application see close to a million hits every day. One of the key things we focus on at Apparel Inc. is providing a unique, personalized customer experience when a user shops on our website or mobile application, which means understanding each customer's likes, dislikes, and shopping patterns is key. At Apparel Inc., we collect and analyze large amounts of customer data 24×7 from several data points: websites, mobile apps, the credit card program, the loyalty program, social media, and coupon redemptions. Data from these sources can be structured, semi-structured, or, in a few cases, unstructured. All of this data is collected, aggregated, and analyzed in the Hadoop cluster to find shopping patterns and customer preferences, identify cross-sell and upsell opportunities, and devise targeted marketing strategies, thereby improving the overall user experience on the website.
Responsibilities:
- Worked on a live 60-node Hadoop cluster running CDH4.4
- Worked with highly unstructured and semi-structured data, 90 TB in size (270 TB with a replication factor of 3)
- Extracted the data from Teradata into HDFS using Sqoop.
- Created and worked with Sqoop (version 1.4.3) jobs with incremental load to populate Hive external tables.
- Extensive experience in writing Pig (version 0.11) scripts to transform raw data from several data sources into forming baseline data.
- Developed Hive (version 0.10) scripts for end user / analyst requirements to perform ad hoc analysis
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
- Developed UDFs in Java as and when necessary to use in PIG and HIVE queries
- Experience in using SequenceFile, RCFile, Avro, and HAR file formats.
- Developed Oozie workflow for scheduling and orchestrating the ETL process
- Implemented authentication using Kerberos and authorization using Apache Sentry.
- Worked with the admin team in planning and performing the upgrade from CDH 3 to CDH 4
- Good working knowledge of Amazon Web Service components like EC2, EMR, S3 etc
- Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups
- Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
- Good working knowledge of Cassandra
- Good Working knowledge of Tableau
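To show how a couple of the Sqoop and Hive bullets above might be backed up in an interview, here is a hedged sketch of an incremental Sqoop load feeding a Hive external table. The JDBC URL, table, columns, and paths are invented for illustration; the flags follow Sqoop 1.4.x command-line syntax, and the DDL is plain HiveQL.

```shell
# Hypothetical Sqoop 1.4.x saved job: incremental-append import from Teradata.
# Connection string, table, and check column are made-up examples.
sqoop job --create orders_incremental -- import \
  --connect jdbc:teradata://td-host/DATABASE=sales \
  --driver com.teradata.jdbc.TeraDriver \
  --table ORDERS \
  --target-dir /staging/orders \
  --incremental append \
  --check-column ORDER_ID \
  --last-value 0

# Each execution imports only rows whose ORDER_ID exceeds the stored last-value.
sqoop job --exec orders_incremental

# Hive external table over the staged directory (hypothetical schema):
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS orders (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/staging/orders';"
```

A candidate who can sketch something like this on a whiteboard, and explain why the table is external (so dropping it does not delete the Sqoop-staged data), instantly validates those resume bullets.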
Think of preparing a good Hadoop resume as a one-time investment. The time and effort you put into creating a quality resume will definitely pay off in landing the dream career in the Big Data field that you have always wanted.