By Padma Priya Chitturi
- Use Apache Spark for data processing with these hands-on recipes
- Implement end-to-end, large-scale data analysis better than ever before
- Work with powerful libraries such as MLlib, SciPy, NumPy, and Pandas to gain insights from your data
Spark has emerged as the most promising big data analytics engine for data science professionals. The true power and value of Apache Spark lie in its ability to execute data science tasks with speed and accuracy. Spark's selling point is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. It lets you tackle the complexities that come with raw, unstructured data sets with ease.
This guide will get you comfortable and confident performing data science tasks with Spark. You will learn about implementations including distributed deep learning, numerical computing, and scalable machine learning. You will be shown effective solutions to problematic concepts in data science using libraries that work with Spark, such as MLlib, Pandas, NumPy, SciPy, and more. These simple and efficient recipes will show you how to implement algorithms and optimize your work.
What you are going to learn
- Explore the topics of data mining, text mining, Natural Language Processing, information retrieval, and machine learning.
- Solve real-world analytical problems with large data sets.
- Address data science challenges with analytical tools on a distributed system like Spark (apt for iterative algorithms), which offers in-memory processing and more flexibility for data analysis at scale.
- Get hands-on experience with algorithms such as classification, regression, and recommendation on real datasets using the Spark MLlib package.
- Learn about numerical and scientific computing using NumPy and SciPy on Spark.
- Use Predictive Model Markup Language (PMML) in Spark for statistical data mining models.
About the Author
Padma Priya Chitturi is Analytics Lead at Fractal Analytics Pvt Ltd and has over five years of experience in big data processing. Currently, she is part of capability development at Fractal and is responsible for solution development for analytical problems across multiple business domains at large scale. Prior to this, she worked on an airline product at Amadeus Software Labs, building a real-time processing platform that served a million user requests per second. She has worked on realizing large-scale deep networks (Jeffrey Dean's work at Google Brain) for image classification on the big data platform Spark. She works closely with big data technologies such as Spark, Storm, Cassandra, and Hadoop. She was an open source contributor to Apache Storm.
Table of Contents
- Big Data Analytics with Spark
- Tricky Statistics with Spark
- Data Analysis with Spark
- Clustering, Classification, and Regression
- Working with Spark MLlib
- NLP with Spark
- Working with Sparkling Water - H2O
- Data Visualization with Spark
- Deep Learning on Spark
- Working with SparkR
Apache Spark for Data Science Cookbook by Padma Priya Chitturi