As a Data Scientist, you will work closely with clients, data stewards, project/program managers, and other IT teams to turn data into critical information and knowledge that can be used to make sound organizational decisions. Other responsibilities include providing data that is consistent and reliable. You will need to be a creative thinker and propose innovative ways to look at problems by applying data mining approaches (the process of discovering new patterns in large datasets) to the information available. You will need to validate your findings using an experimental and iterative approach. You will also need to present your findings to the business, exposing your assumptions and validation work in a way that your business counterparts can easily understand.
You will need a combination of business focus, strong analytical and problem-solving skills, and programming knowledge to quickly cycle hypotheses through the discovery phase of the project. Excellent writing and communication skills are required to report findings in a clear, structured manner.
- Typically requires 2+ years’ experience manipulating large datasets and using databases, and 2+ years’ experience with a general-purpose programming language (such as Java) or a big data framework (such as Hadoop MapReduce).
- Designs experiments, tests hypotheses, and builds models.
- Conducts data analysis and designs moderately complex algorithms.
- Works with stakeholders to identify the business requirements and the expected outcome.
- Works alongside business analysts to suggest other products of interest to the client.
- Models and frames business scenarios that are meaningful and that affect critical business processes and/or decisions.
- Collaborates with subject matter experts to select the relevant sources of information.
- Works with team leaders and members to solve client analytics problems and documents results and methodologies.
- Works in iterative processes within IT and validates findings.
- Applies experimental design approaches to validate findings or test hypotheses.
- Validates analysis by comparing appropriate samples.
- Employs the appropriate algorithm to discover patterns.
- Uses the expected qualification and assurance of the information to quantify the accuracy metrics of the analysis.
- Qualifies where information can be stored or what information, external to the organization, may be used in support of the use case.
- Assesses the volume of data supporting the initiative, the type of data (e.g., images, text, clickstream or metering data) and the speed or sudden variations in data collection.
- Collaborates with the data steward to ensure that the information used follows the compliance, access management, and control policies and that it meets the qualification and assurance requirements.
- Recommends ongoing improvements to methods and algorithms that lead to findings, including new information.
- Presents the rationale behind findings in easy-to-understand terms for the business.
- Presents results that contradict common belief, if needed.
- Communicates and works with business subject matter experts.
- May educate the organization, from both the IT and the business perspectives, on new approaches such as testing hypotheses and statistical validation of results.
- Helps the organization understand the principles and the math behind the process to drive organizational buy-in.
- Provides business metrics for the overall project to show improvements (contribution to the improvement should be monitored initially and over multiple iterations).
- Demonstrates the following scientific qualities: clarity, accuracy, precision, relevance, depth, breadth, logic, significance, and fairness.
- Provides ongoing tracking and monitoring of the performance of decision systems and statistical models.
- Implements enhancements and fixes to systems as needed.
- Bachelor’s degree in mathematics, statistics, computer science, or a related field.
- Experience in the use of statistical packages.
- Familiarity with basic principles of distributed computing and/or distributed databases.
- Demonstrable ability to quickly understand new concepts, all the way down to the theorems, and to come up with original solutions to mathematical problems.
- Good communication and interpersonal skills.
- Knowledge of one or more business/functional areas.