
Big Data PowerPoint Presentation

About This Presentation

This Big Data presentation was uploaded by programmingsguru in the Education & Training ppt presentation category and is available for free download.

Big Data Presentation Transcript

Slide 1 - Big Data Up to 2003, humans had created 5 exabytes (10^18 bytes) of data. Today that amount of information is created in two days. In 2012, the digital world of data had expanded to 2.72 zettabytes (10^21 bytes). It is predicted to double every two years, reaching about 8 zettabytes of data by 2015. IBM indicates that 2.5 exabytes of data are created every day, and that 90% of the world's data was produced in the last two years. A typical personal computer holds about 500 gigabytes (a gigabyte is 10^9 bytes), so it would require about 20 billion PCs to store all of the world’s data. In the past, decoding the human genome took approximately 10 years; now it takes no more than a week.
Slide 2 - Big Data Multimedia data carries a big weight in internet backbone traffic and was expected to increase by 70% by 2013. Google alone has more than one million servers around the world. There are 6 billion mobile subscriptions in the world, and 10 billion text messages are sent every day. By the year 2020, 50 billion devices will be connected to networks and the internet.
Slide 3 - Big Data In 2012, The Human Face of Big Data was carried out as a global project centered on collecting, visualizing, and analyzing large amounts of data in real time. Many statistics were derived from this media project. Facebook has 955 million monthly active accounts in 70 languages, 140 billion uploaded photos, and 125 billion friend connections; every day 30 billion pieces of content and 2.7 billion likes and comments are posted. On YouTube, 48 hours of video are uploaded every minute and 4 billion views are performed every day. Google supports many services: it monitors 7.2 billion pages per day, processes 20 petabytes (10^15 bytes) of data daily, and translates into 66 languages. Twitter's more than 140 million active users post 1 billion tweets every 72 hours. 571 new websites are created every minute of the day [23]. Within the next decade, the amount of information will increase by 50 times, while the number of information technology specialists who keep up with all that data will increase by only 1.5 times [5].
Slide 4 - Important Issues Big data requires a revolutionary step forward from traditional data analysis and is characterized by three main components: variety, velocity, and volume, as shown in Figure 1 [3,8,13,17].
Slide 5 - Figure 1. The three Vs of big data
Slide 6 - (slide content not available)
Slide 7 - Variety Variety makes big data really big. Big data comes from a great variety of sources and generally falls into three types: structured, semi-structured, and unstructured. Structured data enters a data warehouse already tagged and easily sorted, but unstructured data is random and difficult to analyze. Semi-structured data does not conform to fixed fields but contains tags to separate data elements [4,17].
Slide 8 - Volume, Velocity The volume, or size, of data is now measured beyond terabytes and petabytes. The sheer scale and growth of data outstrip traditional storage and analysis techniques [4,16]. Velocity is required not only for big data but for all processes. For time-limited processes, big data should be used as it streams into the organization in order to maximize its value [4,16].
Slide 9 - Analyzing Big Data Analyzing big data can require hundreds of servers running massively parallel software. What actually distinguishes big data, aside from its variety, volume, and velocity, is the potential to analyze it to reveal new insights that optimize decision making.
Slide 10 - Big Data Samples Examples in the literature are available in astronomy, atmospheric science, genomics, biogeochemistry, biological science and research, life sciences, medical records, scientific research, government, natural disaster and resource management, the private sector,
Slide 11 - Big Data Samples military surveillance, financial services, retail, social networks, web logs, text, documents, photography, audio, video,
Slide 12 - Big Data Samples click streams, search indexing, call detail records, POS information, RFID, mobile phones, sensor networks and telecommunications [20].
Slide 13 - Big Data The McKinsey Global Institute specified the potential of big data in five main areas [9]: Healthcare: clinical decision support systems, individual analytics applied to patient profiles, personalized medicine, performance-based pricing for personnel, analyzing disease patterns, improving public health
Slide 14 - Big Data Public sector: creating transparency through accessible related data, discovering needs, improving performance, customizing actions for suitable products and services, decision making with automated systems to decrease risk, innovating new products and services. Retail: in-store behavior analysis, variety and price optimization, product placement design, performance improvement, labor input optimization, distribution and logistics optimization, web-based markets
Slide 15 - Big Data Manufacturing: improved demand forecasting, supply chain planning, sales support, improved production operations, web-search-based applications. Personal location data: smart routing, geo-targeted advertising or emergency response, urban planning, new business models
Slide 16 - Big Data Analysis The web provides many kinds of opportunities for big data too, for example social network analysis: understanding user intelligence for more targeted advertising, marketing campaigns, and capacity planning; customer behavior and buying patterns; and sentiment analytics. Based on these inferences, firms optimize their content and recommendation engines [1].
Slide 17 - Big Data Analysis Some companies, such as Google and Amazon, publish articles related to their work. Inspired by these publications, developers build similar technologies as open source software, such as Lucene, Solr, Hadoop, and HBase. Facebook, Twitter, and LinkedIn go a step further by publishing open source big data projects such as Cassandra, Hive, Pig, Voldemort, Storm, and IndexTank.
Slide 18 - Methods Most enterprises face large amounts of new data, which arrive in many different forms. Big data has the potential to provide insights that can transform every business, and it has generated a whole new industry of supporting architectures such as MapReduce. MapReduce is a programming framework for distributed computing, created by Google, that uses the divide and conquer method to break complex big data problems into small units of work and process them in parallel [13].
Slide 19 - MapReduce MapReduce can be divided into two stages [10]: Map step: the input data on the master node is chopped up into many smaller subproblems. A worker node processes some subset of the smaller problems under the control of the JobTracker node and stores the result in the local file system, where a reducer is able to access it. Reduce step: this step analyzes and merges intermediate data from the map steps. There can be multiple reduce tasks to parallelize the aggregation, and these tasks are executed on the worker nodes under the control of the JobTracker.
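To make the map and reduce steps concrete, a minimal word-count job against the standard Hadoop MapReduce Java API is sketched below; the class name WordCount and the input/output paths are illustrative placeholders, not part of the original presentation.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map step: each worker turns a chunk of input lines into (word, 1) pairs.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce step: all counts for the same word arrive together and are summed.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory on HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Such a job would typically be packaged into a jar and submitted to the cluster (for example with hadoop jar wordcount.jar WordCount input output); the framework then splits the input, schedules map and reduce tasks on the worker nodes, and merges the intermediate results.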
Slide 20 - Hadoop Hadoop was created with inspiration from BigTable, Google’s data storage system, the Google File System, and MapReduce [6]. Hadoop is a Java-based framework and a heterogeneous open source platform. It is not a replacement for a database, a data warehouse, or an ETL (Extract, Transform, Load) strategy. Hadoop includes a distributed file system, analytics and data storage platforms, and a layer that manages parallel computation, workflow, and configuration administration [8,22].
Slide 21 - Hadoop It is not designed for real-time, complex event processing such as streams. HDFS (Hadoop Distributed File System) runs across the nodes in a Hadoop cluster and connects together the file systems on many input and output data nodes to make them into one big file system [4,13,19].
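As a rough illustration of how HDFS exposes many data nodes as one big file system, the following sketch writes and reads a small file through the Hadoop FileSystem Java API; the path /user/demo/hello.txt and the class name are hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml (e.g. an hdfs://namenode address).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a small file; HDFS splits it into blocks replicated across data nodes.
        Path file = new Path("/user/demo/hello.txt");  // hypothetical path
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello big data\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back as if it were one local file, wherever the blocks actually live.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```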
Slide 22 - Hadoop offers: HDFS: A highly fault-tolerant distributed file system that is responsible for storing data on the clusters. MapReduce: A powerful parallel programming technique for distributed processing on clusters. HBase: A scalable, distributed database for random read/write access. Pig: A high-level data processing system for analyzing data sets using a high-level language.
Slide 23 - Hadoop offers: Hive: A data warehousing application that provides a SQL-like interface and relational model. Sqoop: A project for transferring data between relational databases and Hadoop. Avro: A data serialization system. Oozie: A workflow scheduler for dependent Hadoop jobs. Chukwa: A Hadoop subproject for data collection to monitor distributed systems.
Slide 24 - Hadoop offers: Flume: A reliable, distributed service for streaming log collection. ZooKeeper: A centralized service providing distributed synchronization and group services.
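For example, HBase's random read/write access mentioned above can be used from its Java client API roughly as sketched below; the table name users, column family profile, and row key user42 are hypothetical examples.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {  // hypothetical table

            // Random write: store one cell under row key "user42".
            Put put = new Put(Bytes.toBytes("user42"));
            put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
            table.put(put);

            // Random read: fetch the same cell back by row key.
            Result result = table.get(new Get(Bytes.toBytes("user42")));
            byte[] name = result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}
```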
Slide 25 - HPCC (High Performance Computing Cluster) HPCC Systems is a distributed, data-intensive, open source computing platform that provides big data workflow management services. Unlike Hadoop, HPCC's data model is defined by the user. The key to complex problems can be stated easily with high-level ECL constructs. HPCC ensures that ECL is executed in the minimum elapsed time and that nodes are processed in parallel. Furthermore, the HPCC platform does not require third-party tools such as GreenPlum, Cassandra, an RDBMS, Oozie, etc. [22].
Slide 26 - The three main HPCC components are: HPCC Data Refinery (Thor): a massively parallel ETL engine that enables data integration at scale and provides batch-oriented data manipulation. HPCC Data Delivery Engine (Roxie): a massively parallel, high-throughput, ultra-fast, low-latency structured query and response engine that allows efficient multi-user retrieval of data. Enterprise Control Language (ECL): a simple programming language optimized for big data operations and query transactions; it automatically distributes the workload between nodes, automatically synchronizes algorithms, and offers an extensible machine learning library.
Slide 27 - Figure 2. Comparison between HPCC Systems Platform and Hadoop architecture
Slide 28 - Comparisons between HPCC Systems Platform and Hadoop in terms of architecture and stacks According to the reference, some differences are summarized below: HPCC clusters can be run as Thor and Roxie; Hadoop clusters perform MapReduce processing. In HPCC environments, ECL is the primary programming language, whereas Hadoop MapReduce processes are based on the Java language. The HPCC platform builds multi-key and multivariate indexes on its distributed file system, while Hadoop HBase provides a column-oriented database.
Slide 29 - Comparisons between HPCC Systems Platform and Hadoop in terms of architecture and stacks In HPCC, Roxie provides data warehouse capabilities for structured queries and analytical applications; in Hadoop, Hive provides data warehouse capabilities and allows data to be loaded into HDFS. On the same hardware configuration, a 400-node system, HPCC finished in 6 minutes 27 seconds while Hadoop took 25 minutes 28 seconds. This result showed that HPCC was faster than Hadoop in this comparison.
Slide 30 - Knowledge Discovery from Big Data Knowledge Discovery from Data (KDD) refers to a set of operations designed to extract information from complicated data sets [6]. Reference [18] outlines KDD in nine steps:
Slide 31 - Knowledge Discovery from Big Data Understanding the application domain, prior knowledge, and the purpose of the process from the customer's perspective. Generating a subset of data points for knowledge discovery. Removing noise, handling missing data fields, and collecting the information required to model and account for time information and known changes. Finding useful features to represent the data, depending on the purpose of the job. Mapping the purposes to particular data mining methods.
Slide 32 - Knowledge Discovery from Big Data Choosing the data mining algorithm and method for searching for data patterns. Searching for patterns in an expressible form. Returning to any of steps 1 through 7 for iteration; this step can also include visualization of the patterns. Using the information directly, combining it into another system, or simply documenting and reporting it.
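As a toy, self-contained illustration of how a few of these steps (selection, preprocessing, pattern search, and reporting) might compose in code, the sketch below counts word frequencies in a small in-memory dataset after removing noisy records; all class and variable names are hypothetical and not taken from the presentation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MiniKdd {
    public static void main(String[] args) {
        // Step 2 (selection): a tiny subset of "raw" records stands in for the data source.
        List<String> raw = Arrays.asList("Big data", "   ", null, "big DATA analytics", "data mining");

        // Step 3 (preprocessing): drop noisy or missing records, normalize case and whitespace.
        List<String> clean = raw.stream()
                .filter(r -> r != null && !r.trim().isEmpty())
                .map(r -> r.trim().toLowerCase())
                .collect(Collectors.toList());

        // Steps 6-7 (mining / pattern search): the trivial "pattern" here is word frequency.
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String record : clean) {
            for (String word : record.split("\\s+")) {
                counts.merge(word, 1L, Long::sum);
            }
        }

        // Step 9 (reporting): document the discovered frequencies.
        counts.forEach((word, n) -> System.out.println(word + " -> " + n));
    }
}
```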
Slide 33 - Reference [6] analyzes knowledge discovery from big data using Hadoop according to three principles. These are: KDD includes a variety of analysis methods such as distributed programming, pattern recognition, data mining, natural language processing, sentiment analysis, statistical and visual analysis, and human-computer interaction. Therefore the architecture must support various methods and analysis techniques.
Slide 34 - Reference [6] analyzes knowledge discovery from big data using Hadoop according to three principles. These are: Statistical analysis is concerned with summarizing massive datasets, understanding data, and defining models for prediction. Data mining is concerned with discovering useful models in massive data sets on its own, while machine learning combines data mining and statistical methods to enable machines to understand datasets. Visual analysis is a developing area in which large datasets are presented to users in ways that let them understand relationships.
Slide 35 - Reference [6] analyzes knowledge discovery from big data using Hadoop according to three principles. These are: A comprehensive KDD architecture must be able to maintain and operate the processing pipeline. Data preparation and batch analytics are performed to properly handle errors, missing values, and unusable formats, and to process structured and semi-structured data.
Slide 36 - Reference [6] analyzes knowledge discovery from big data using Hadoop according to three principles. These are: It is essential to make results accessible and easy to use. For this reason, the following approaches are used: using open source and popular standards, using web-based architectures, and making results publicly available.
Slide 37 - (slide content not available)