New VCE and PDF Exam Dumps from PassLeader
Vendor: Microsoft
Exam Code: 70-775
Exam Name: Perform Data Engineering on Microsoft Azure HDInsight
New Questions (Sep/2018)
Visit PassLeader and Download Full Version 70-775 Exam Dumps

NEW QUESTION 41
You have an Apache Pig table named Sales in Apache HCatalog. You need to make the data in the table accessible from Apache Pig.
Solution: You use the following script:
A = STORE 'Sales' USING org.apache.hive.hcatalog.pig.HCatLoader();
Does this meet the goal?
A. Yes
B. No
Answer: B
Explanation:
https://hortonworks.com/hadoop-tutorial/how-to-use-hcatalog-basic-pig-hive-commands/

NEW QUESTION 42
You have an Apache Pig table named Sales in Apache HCatalog. You need to make the data in the table accessible from Apache Pig.
Solution: You use the following script:
A = LOAD 'Sales' USING org.apache.hive.hcatalog.pig.HCatLoader();
Does this meet the goal?
A. Yes
B. No
Answer: A
Explanation:
https://hortonworks.com/hadoop-tutorial/how-to-use-hcatalog-basic-pig-hive-commands/

NEW QUESTION 43
You are implementing a batch processing solution by using Azure HDInsight. You have a workflow that retrieves data by using a U-SQL query. You need to provide the ability to query and combine data from multiple data sources. What should you do?
A. Use a shuffle join in an Apache Hive query that stores the data in a JSON format.
B. Use a broadcast join in an Apache Hive query that stores the data in an ORC format.
C. Increase the number of spark.executor.cores in an Apache Spark job that stores the data in a text format.
D. Increase the number of spark.executor.instances in an Apache Spark job that stores the data in a text format.
70-775 Exam Dumps
70-775 Exam Questions
70-775 PDF Dumps
https://www.passleader.com/70-775.html
70-775 VCE Dumps
E. Decrease the level of parallelism in an Apache Spark job that stores the data in a text format.
F. Use an action in an Apache Oozie workflow that stores the data in a text format.
G. Use an Azure Data Factory linked service that stores the data in Azure Data Lake.
H. Use an Azure Data Factory linked service that stores the data in an Azure DocumentDB database.
Answer: G
Explanation:
https://www.sqlchick.com/entries/2017/10/29/two-ways-to-approach-federated-queries-with-u-sql-and-azure-data-lake-analytics

NEW QUESTION 44
You are implementing a batch processing solution by using Azure HDInsight. You have two tables. Each table is larger than 250 TB. Both tables have approximately the same number of rows and columns. You need to match the tables based on a key column. You must minimize the size of the data table that is produced. What should you do?
A. Use a shuffle join in an Apache Hive query that stores the data in a JSON format.
B. Use a broadcast join in an Apache Hive query that stores the data in an ORC format.
C. Increase the number of spark.executor.cores in an Apache Spark job that stores the data in a text format.
D. Increase the number of spark.executor.instances in an Apache Spark job that stores the data in a text format.
E. Decrease the level of parallelism in an Apache Spark job that stores the data in a text format.
F. Use an action in an Apache Oozie workflow that stores the data in a text format.
G. Use an Azure Data Factory linked service that stores the data in Azure Data Lake.
H. Use an Azure Data Factory linked service that stores the data in an Azure DocumentDB database.
Answer: A
Explanation:
http://www.openkb.info/2014/11/understanding-hive-joins-in-explain.html

NEW QUESTION 45
You deploy Apache Kafka to an Azure HDInsight cluster. You plan to load data into a topic that has a specific schema. You need to load the data while maintaining the existing schema. Which file format should you use to receive the data?
A. JSON
B. Kudu
C. Apache Sequence
D. CSV
Answer: A
Explanation:
https://docs.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-auto-create-topics

NEW QUESTION 46
You have an Apache Interactive Hive cluster in Azure HDInsight. The cluster has 12 processors and 96 GB of RAM. The YARN container size is set to 2 GB and the Tez container size is 3 GB. You configure one Tez container per processor. You are performing map joins between a 2 GB dimension table and a 96 GB fact table. You experience slow performance due to an inadequate utilization of the available resources. You need to ensure that the map joins are used. Which two settings should you configure? (Each correct answer presents part of the solution. Choose two.)
A. SET hive.tez.container.size=98304MB
B. SET hive.auto.convert.join.noconditionaltask.size=2048MB
C. SET yarn.scheduler.minimum-allocation-mb=6144MB
D. SET hive.auto.convert.join.noconditionaltask.size=3072MB
E. SET hive.tez.container.size=6144MB
Answer: AC
Explanation:
https://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
https://www.justanalytics.com/blog/apache-hive-memory-management-tuning

NEW QUESTION 47
You have an array of integers in Apache Spark. You need to save the data to an Apache Parquet file. Which methods should you use?
A. take and .toDF
B. makeRDD and sqlContext.createDataSet
C. sqlContext.load and makeRDD
D. makeRDD and sqlContext.createDataFrame
Answer: D
Explanation:
https://spark.apache.org/docs/1.5.2/sql-programming-guide.html#data-types

NEW QUESTION 48
You have an Apache Spark cluster in Azure HDInsight. Users report that Spark jobs take longer than expected to complete. You need to reduce the amount of time it takes for the Spark jobs to complete. What should you do?
A. From HDFS, modify the maximum thread setting.
B. From Spark, modify the spark_thrift_cmd_opts parameter.
C. From YARN, modify the container size setting.
D. From Spark, modify the spark.executor.cores parameter.
Answer: D
Explanation:
https://rea.tech/how-we-optimize-apache-spark-apps/

NEW QUESTION 49
You are configuring an Apache Phoenix operation on top of an Apache HBase server. The operation executes a statement that joins an Apache Hive table and a Phoenix table. You need to ensure that when the table is dropped, the table files are retained, but the table metadata is removed from the Apache HCatalog. Which type of table should you use?
A. internal
B. external
C. temp
D. Azure Table Storage
Answer: B
Explanation:
https://phoenix.apache.org/hive_storage_handler.html
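
Note on Question 49: the distinction being tested is what a drop removes. The following is a toy Python model of that behavior, not Hive or Phoenix code. The ToyMetastore class, its methods, and the table names are hypothetical and exist only for illustration; the sketch assumes the usual semantics in which dropping a managed (internal) table deletes both metadata and data files, while dropping an external table deletes only the metadata.

```python
import os
import shutil
import tempfile


class ToyMetastore:
    """Hypothetical stand-in for a table catalog; not a real Hive/Phoenix API."""

    def __init__(self):
        self.tables = {}  # table name -> (data directory, is_external)

    def create_table(self, name, data_dir, external):
        self.tables[name] = (data_dir, external)

    def drop_table(self, name):
        # The catalog entry (metadata) is removed for both table types.
        data_dir, external = self.tables.pop(name)
        if not external:
            # Managed (internal) table: the data files are deleted as well.
            shutil.rmtree(data_dir)


def demo():
    """Create one managed and one external table, drop both, report what is left."""
    root = tempfile.mkdtemp()
    store = ToyMetastore()
    for name, external in (("managed_sales", False), ("external_sales", True)):
        data_dir = os.path.join(root, name)
        os.makedirs(data_dir)
        with open(os.path.join(data_dir, "part-0000"), "w") as f:
            f.write("1,widget\n")
        store.create_table(name, data_dir, external)
        store.drop_table(name)
    result = {
        "managed_files_exist": os.path.exists(os.path.join(root, "managed_sales")),
        "external_files_exist": os.path.exists(os.path.join(root, "external_sales")),
        "metadata_left": sorted(store.tables),
    }
    shutil.rmtree(root)  # clean up the temporary directory
    return result
```

Calling demo() reports that only the external table's files survive the drop and that no catalog entries remain, which is the behavior the question asks for.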

NEW QUESTION 50
You have an Apache Hive cluster in Azure HDInsight. You plan to ingest on-premises data into Azure Storage. You need to automate the copying of the data to Azure Storage. Which tool should you use?
A. Microsoft Azure Storage Explorer
B. Azure Import/Export Service
C. Azure Backup
D. AzCopy
Answer: D
Explanation:
https://docs.microsoft.com/en-us/azure/data-factory/tutorial-hybrid-copy-data-tool

NEW QUESTION 51
You have an Apache HBase cluster in Azure HDInsight. You plan to use Apache Pig, Apache Hive, and HBase to access the cluster simultaneously and to process data stored in a single platform. You need to deliver consistent operations, security, and data governance. What should you use?
A. Apache Ambari
B. MapReduce
C. Apache Oozie
D. YARN
Answer: D
Explanation:
https://hortonworks.com/blog/hbase-hive-better-together/

NEW QUESTION 52
You have several Linux-based and Windows-based Azure HDInsight clusters. The clusters are in different Active Directory domains. You need to consolidate system logging for all of the clusters into a single location. The solution must provide near real-time analytics of the log data. What should you use?
A. Apache Ambari
B. YARN
C. Microsoft System Center Operations Manager
D. Microsoft Operations Management Suite (OMS)
Answer: A
Explanation:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-log-management

NEW QUESTION 53
You have an Apache Spark job. The performance of the job deteriorates over time. You plan to debug the job. You need to gather information that you can use to debug the job. Which tool should you use?
A. YARN
B. Spark History Server
C. HDInsight Cluster Dashboard
D. Jupyter Notebook
Answer: A
Explanation:
https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-job-debugging

NEW QUESTION 54
......