Common Big Data Errors and Their Solutions

1. Starting Spark with ./bin/spark-shell fails with: java.net.BindException: Can't assign requested address: Service 'sparkDriver' failed after 16 retries!

Solution: add export SPARK_LOCAL_IP="127.0.0.1" to spark-env.sh

2. Java Kafka producer error: ERROR kafka.utils.Utils$ - fetching topic metadata for topics [Set(words_topic)] from broker [ArrayBuffer(id:0,host: xxxxxx,port:9092)] failed

Solution: set 'advertised.host.name' in the Kafka broker's server.properties to the server's real IP (the same address used in the producer's 'metadata.broker.list' property).

3. java.net.NoRouteToHostException: No route to host

Solution: make sure the ZooKeeper IP addresses are configured correctly.

4. Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) java.net.UnknownHostException: linux-pic4.site:

Solution: add your hostname to /etc/hosts: 127.0.0.1 localhost linux-pic4.site

5. org.apache.spark.SparkException: A master URL must be set in your configuration

Solution: SparkConf sparkConf = new SparkConf().setAppName("JavaDirectKafkaWordCount").setMaster("local");

6. Failed to locate the winutils binary in the hadoop binary path

Solution: install Hadoop first.

7. When starting Spark: Failed to get database default, returning NoSuchObjectException

Solution: 1) Copy winutils.exe from (https://github.com/steveloughran ... er/hadoop-2.6.0/bin) to a folder such as C:\Hadoop\bin and set HADOOP_HOME to C:\Hadoop. 2) Open an administrator command prompt and run C:\Hadoop\bin\winutils.exe chmod 777 /tmp/hive

8. org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.

Solution: use the constructor JavaStreamingContext(sparkContext: JavaSparkContext, batchDuration: Duration) instead of new JavaStreamingContext(sparkConf, Durations.seconds(5)), so that the streaming context reuses the existing SparkContext.

9. Reconnect due to socket error: java.nio.channels.ClosedChannelException

Solution: make sure the Kafka broker IP is written correctly.

10. java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute

Solution: the stream produced by the last transformation must have an output operation (action) applied to it, for example messages.print().

11. Tip: in Spark, writes to Elasticsearch must be performed inside an action, one RDD at a time.

12. Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use;

Solution: caused by the master and the slave being configured with the same IP; give them different IPs.

13. CALL TO LOCALHOST/127.0.0.1:9000

Solution: configure the host correctly in /etc/sysconfig/network, /etc/hosts and /etc/sysconfig/network-scripts/ifcfg-eth0.

13. The namenode:50070 page shows only one node under Datanode Information

Solution: caused by an SSH misconfiguration. Hostnames must match exactly; reconfigure passwordless SSH login.

14. Tip: when building a cluster, configure the hostnames first and reboot the machines so that the hostnames take effect.

15. INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host

Solution: if the master and slave nodes can ping each other, stop the firewall: service iptables stop

16. Tip: do not format HDFS casually. It causes problems such as mismatched data versions. Empty the data directories before formatting.

17. namenode1: ssh: connect to host namenode1 port 22: Connection refused

Solution: sshd is stopped or not installed. Check with which sshd; if it is installed, restart sshd, then run ssh against the local hostname to check that the connection succeeds.

18. Log aggregation has not completed or is not enabled.

Solution: add the relevant settings to yarn-site.xml to enable log aggregation.

19. failed to launch org.apache.spark.deploy.history.HistoryServer full log in

Solution: configure spark-defaults.conf and the SPARK_HISTORY_OPTS property in spark-env.sh correctly.

20. Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

Solution: this exception occurs in yarn-client mode; no fix is known yet.

21. Files on Hadoop cannot be downloaded, and the Tracking UI in YARN cannot open history logs

Solution: the Windows machine cannot resolve the cluster hostnames. Copy the hostname entries from the cluster's hosts file into the Windows hosts file.

22. Tip: an HDFS file path is written as hdfs://master:9000/<file path>, where master is the NameNode's hostname and 9000 is the HDFS port.
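The parts of such a path can be checked with the JDK's own URI parser. This is only a small sketch: the host name master and port 9000 come from the tip above, while the file path /data/input.txt is a made-up example.

```java
import java.net.URI;

final class HdfsPathParts {

    /** Splits an hdfs:// URI into scheme, NameNode host, port and file path. */
    static String describe(String hdfsPath) {
        URI uri = URI.create(hdfsPath);
        return uri.getScheme() + " host=" + uri.getHost()
                + " port=" + uri.getPort() + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // /data/input.txt is a hypothetical file path used for illustration.
        System.out.println(describe("hdfs://master:9000/data/input.txt"));
    }
}
```

If the port printed here differs from the one configured for the NameNode, clients will fail to connect even though the host resolves.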

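23. Yarn JobHistory Error: Failed redirect for container

Solution: configure http://:19888/jobhistory/logs in yarn-site.xml, then restart YARN and the JobHistoryServer.

24. Accessing an HDFS folder through the Hadoop UI shows Permission denied: user=dr.who

Solution: in a terminal on the NameNode, run: hdfs dfs -chmod -R 755 /

25. Tip: the Spark Driver only receives results when an action runs.

26. Tip: when Spark needs a globally aggregated variable, use an Accumulator.

27. Tip: Kafka relates messages through topics and consumer groups. Every consumer group subscribed to a topic receives all of that topic's messages. If one consumer should see every message of a topic, put only that consumer in its group. The number of consumers in a group should not exceed the topic's total partition count; otherwise the extra consumers receive nothing.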
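The partition limit can be illustrated with a simplified model. The sketch below hands out partitions round-robin inside one group; it is an assumption-laden toy, not Kafka's actual assignor, and the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ConsumerAssignmentSketch {

    /** Assigns partition p to consumer p % consumers (simplified round-robin model). */
    static Map<Integer, List<Integer>> assign(int partitions, int consumers) {
        Map<Integer, List<Integer>> assignment = new TreeMap<>();
        for (int c = 0; c < consumers; c++) {
            assignment.put(c, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            assignment.get(p % consumers).add(p);
        }
        return assignment;
    }

    /** Counts the consumers that ended up with no partition at all. */
    static long idleConsumers(Map<Integer, List<Integer>> assignment) {
        return assignment.values().stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        // 3 partitions, 5 consumers in one group: two consumers stay idle.
        Map<Integer, List<Integer>> assignment = assign(3, 5);
        System.out.println(assignment + " idle=" + idleConsumers(assignment));
    }
}
```

With a single consumer in the group, that consumer owns every partition, which matches the advice above for a consumer that must see all messages.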

28. java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;

Solution: unify the ES versions, and avoid creating an ES client directly inside Spark where possible.

29. returned Bad Request(400) - failed to parse;Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes; Bailing out..

Solution: correct the format of the data written to ES.

30. java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds

Solution: make sure all nodes can log in to each other without a password.

31. In cluster mode, Spark cannot write data to Elasticsearch

Solution: write the data this way, passing the Map of ES settings: results.foreachRDD(javaRDD -> {JavaEsSpark.saveToEs(javaRDD, esSchema, cfg);return null;});

32. Tip: all custom classes must implement the Serializable interface; otherwise they do not work on the cluster.

33. Tip: read files under resources on the Spark Driver and pass the contents to closures as local variables.

34. Reading a resource file through NIO fails with java.nio.file.FileSystemNotFoundException at com.sun.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:171)

Solution: once the project is packaged as a jar, the resource URI changes to a form such as jar:file:/C:/path/to/my/project.jar!/my-folder, so it has to be resolved as follows:

// uri is the resource URI, e.g. jar:file:/C:/path/to/my/project.jar!/my-folder
final Map<String, String> env = new HashMap<>();
// array[0] is the jar itself, array[1] is the path inside the jar
final String[] array = uri.toString().split("!");
final FileSystem fs = FileSystems.newFileSystem(URI.create(array[0]), env);
final Path path = fs.getPath(array[1]);
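The fragment above can be expanded into a self-contained sketch. To stay runnable without a real project jar it first builds a tiny archive in a temporary directory; the file names demo.jar and hello.txt are invented for the example, and the helper names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

final class JarResourceReader {

    /** Reads a text entry addressed by a URI such as jar:file:/path/app.jar!/dir/file.txt. */
    static String readEntry(URI jarUri) {
        // Split the jar location from the entry path that follows the "!".
        String[] parts = jarUri.toString().split("!");
        Map<String, String> env = new HashMap<>();
        // try-with-resources closes the zip file system when reading is done.
        try (FileSystem fs = FileSystems.newFileSystem(URI.create(parts[0]), env)) {
            Path entry = fs.getPath(parts[1]);
            return new String(Files.readAllBytes(entry), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small sample archive and returns the jar: URI of its single entry. */
    static URI createSampleJar() {
        try {
            Path jar = Files.createTempDirectory("jar-demo").resolve("demo.jar");
            Map<String, String> env = new HashMap<>();
            env.put("create", "true"); // ask the zip provider to create the archive
            URI jarFsUri = URI.create("jar:" + jar.toUri());
            try (FileSystem fs = FileSystems.newFileSystem(jarFsUri, env)) {
                Path dir = fs.getPath("/my-folder");
                Files.createDirectories(dir);
                Files.write(dir.resolve("hello.txt"), "hello".getBytes(StandardCharsets.UTF_8));
            }
            return URI.create(jarFsUri + "!/my-folder/hello.txt");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readEntry(createSampleJar()));
    }
}
```

Closing the file system after use matters: opening the same jar URI a second time while it is still open raises FileSystemAlreadyExistsException.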

35. Tip: a DStream transformation only produces a temporary stream object. To keep using it, hold a reference to that object.

36. Tip: jobs submitted in yarn-cluster mode cannot print directly to the console; use log4j to write to a log file.

37. java.io.NotSerializableException: org.apache.log4j.Logger

Solution: a serializable class must not contain non-serializable objects. Keep the logger out of default serialization by making it either transient or static. Declaring it static final is preferred: a transient logger is null after deserialization, because neither the constructor nor any instance initializer block runs during deserialization, so any later logger.debug() call throws NullPointerException. A static final logger is thread-safe and is shared by all instances of the class, which is one of the reasons loggers are conventionally declared static final in Java.
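The difference can be reproduced with plain Java serialization. The sketch below uses java.util.logging.Logger instead of log4j so that it runs with the JDK alone (that swap is an assumption of the example; both logger classes are non-serializable). The class names are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.logging.Logger;

final class LoggerSerializationDemo {

    /** Fails to serialize: the instance field pulls the Logger into the object graph. */
    static final class BadCustomer implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Logger log = Logger.getLogger("BadCustomer");
        final String name;
        BadCustomer(String name) { this.name = name; }
    }

    /** Serializes fine: static fields are not part of the serialized state. */
    static final class GoodCustomer implements Serializable {
        private static final long serialVersionUID = 1L;
        private static final Logger LOG = Logger.getLogger("GoodCustomer");
        final String name;
        GoodCustomer(String name) { this.name = name; }
        String greet() { LOG.fine("greeting " + name); return "hello " + name; }
    }

    /** Returns true when the object survives a serialization round trip. */
    static boolean roundTrips(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false; // NotSerializableException is an IOException
        }
    }

    public static void main(String[] args) {
        System.out.println("instance logger: " + roundTrips(new BadCustomer("a")));
        System.out.println("static logger:   " + roundTrips(new GoodCustomer("a")));
    }
}
```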

38. log4j:WARN Unsupported encoding

Solution: 1. change UTF to lowercase utf-8; 2. check the line that sets the encoding for a stray space.

39. MapperParsingException[Malformed content, must start with an object

Solution: use JavaEsSpark.saveJsonToEs, because saveToEs handles objects, not strings.

40. ERROR ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application

Solution: do not request too many resources, or remove .setMaster("local") from the code.

41. WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)

Solution: use the correct broker id in the configuration file and the real IP in commands.

42. User class threw exception: org.apache.spark.SparkException: org.apache.spark.SparkException: Couldn't find leaders for Set([mywaf,7], [mywaf,1])

Solution: configure Kafka correctly and recreate the topic.

43. The ES UI shows a node whose shards are not displayed

Solution: the node is short on disk space; clean up the disk or add capacity.

44. The method updateStateByKey(Function2,Optional,Optional>, int) in the type JavaPairDStream is not applicable for the arguments (Function2,Optional,Optional>, int)

Solution: Spark uses com.google.common.base.Optional, not the JDK's java.util.Optional.

45. NativeCrc32.nativeComputeChunkedSumsByteArray

Solution: configure hadoop-home in Eclipse, and add the 64-bit hadoop.dll for version 2.6 to both the bin folder and the system32 folder.

46. Tip: Spark Streaming has three computation modes: stateless, stateful and window.

47. Single point of failure in the YARN ResourceManager

Solution: set up YARN HA with a three-node ZooKeeper cluster and the yarn-site.xml configuration file.

48. Tip: Kafka can be configured to use its bundled ZooKeeper cluster.

49. Tip: every Spark operation ultimately comes down to an operation on RDDs.

50. How to guarantee strict ordering in a Kafka message queue

Solution: give the topic that needs strict ordering a single partition.

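51. Setting up mutual SSH trust across many Linux machines

Solution: use the same public key on all machines.

52. org.apache.spark.SparkException: Failed to get broadcast_790_piece0 of broadcast_790

Solution: remove the spark.cleaner.ttl setting from spark-defaults.conf.

53. In a YARN HA environment, opening history logs through the web UI redirects to port 8088 and shows nothing

Solution: restore YARN's default HTTP port, 8088.

54. but got no response. Marking as slave lost

Solution: this occurs when submitting jobs in yarn-client mode; no fix is known yet.

55. Using config: /work/poa/zookeeper-3.4.6/bin/../conf/zoo.cfg Error contacting service. It is probably not running.

Solution: the configuration file is wrong, for example a hostname does not match.

56. Tip: when deploying a Spark job there is no need to copy the whole package. Copy only the modified files, then compile and package on the target server.

57. Spark setAppName doesn't appear in Hadoop running applications UI

Solution: set it on the spark-submit command line with "--name BetterName".

58. How to monitor whether a Spark Streaming job has died

Solution: monitor the Driver port, or write a scheduled Linux script based on yarn commands.

59. Kafka internal and external network access

Solution: on a Kafka machine with two network cards, set advertised.host.name in server.properties to a domain name rather than an IP, so that external producers and internal consumers each resolve it to the IP they need.

60. Tip: do not put Kafka's log.dirs under /tmp; the tmp directory appears to have limits on file count and disk capacity.

61. After moving Kafka to new machines, topics are created automatically in the new cluster and only one broker carries the load

Solution: add delete.topic.enable=true and auto.create.topics.enable=false to server.properties, delete the old topic, recreate it, and restart Kafka.

62. After installing sbt, the sbt command hangs at Getting org.scala-sbt sbt 0.13.6 ...

Solution: sbt takes some time to download its jars the first time it runs. Do not quit; wait until sbt finishes.

63. Tip: an ES shard is similar to a Kafka partition.

64. Kafka throws an OOM exception

Solution: in the Kafka broker start script, raise the JVM heap settings in export KAFKA_HEAP_OPTS="-Xmx24G -Xms1G".

65. A Linux server's disk is full; find the files larger than a given size

Solution: find / -type f -size +10G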
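Where a shell is not available, roughly the same search can be sketched in Java. The class and method names are invented for this example, and unlike find it stops with an exception if a directory cannot be entered during the walk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LargeFileFinder {

    /** Returns the regular files under root whose size exceeds minBytes, sorted by path. */
    static List<Path> findLargerThan(Path root, long minBytes) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> sizeOf(p) > minBytes)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            return -1L; // files whose size cannot be read are skipped
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long tenGiB = 10L * 1024 * 1024 * 1024;
        findLargerThan(root, tenGiB).forEach(System.out::println);
    }
}
```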

66. Rate limiting for Spark direct Kafka streaming

Solution: set spark.streaming.kafka.maxRatePerPartition, which caps the read rate per second for each Kafka partition.

67. org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error returned Not Found(404) - [EngineClosedException CurrentState[CLOSED]

Solution: in the kopf plugin, close the index and then open it again. The likely cause is a shard that was damaged when the index was created.

68. Job aborted due to stage failure: Task not serializable:

Solution: make the class Serializable; or declare the instance only inside the lambda function passed to map; or make the non-serializable object static so it is created once per machine; or call rdd.foreachPartition and create the non-serializable object inside it.
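The second remedy, creating the object inside the function instead of capturing it, can be shown without Spark. This sketch serializes Java lambdas directly, which is only an approximation of how Spark ships closures; Formatter and SerFunction are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

final class ClosureSerializationDemo {

    /** Stand-in for a non-serializable helper such as a client or parser. */
    static final class Formatter {
        String format(String s) { return "[" + s + "]"; }
    }

    /** A function that must be serializable, like a closure sent to executors. */
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    /** The lambda captures a Formatter created outside it, so it cannot be serialized. */
    static SerFunction<String, String> capturing() {
        Formatter formatter = new Formatter();
        return s -> formatter.format(s);
    }

    /** The Formatter is created inside the lambda, so nothing is captured. */
    static SerFunction<String, String> selfContained() {
        return s -> new Formatter().format(s);
    }

    /** Returns true when Java serialization accepts the object. */
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (IOException e) {
            return false; // includes NotSerializableException
        }
    }

    public static void main(String[] args) {
        System.out.println("capturing:      " + canSerialize(capturing()));
        System.out.println("self-contained: " + canSerialize(selfContained()));
    }
}
```

Creating the helper per record is wasteful for expensive objects, which is why the foreachPartition variant creates it once per partition instead.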

69. Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable

Solution: this cannot be done as of Spark 1.6; upgrade Spark.

70. After importing a Scala project from git into IDEA, every variable is flagged as never used

Solution: mark the src folder as the sources root (Mark Directory as > Sources Root).

71. Run configuration in IntelliJ results in "Cannot start compilation: the output path is not specified for module "xxx". Specify the output path in Configure Project."

Solution: in the default IntelliJ options, "Make" was checked under "Before Launch". Unchecking it fixed the issue.

72. UDFRegistration$$anonfun$register$26$$anonfun$apply$2 cannot be cast to scala.Function1

Solution: an aggregate function cannot be written as a UDF; define a UDAF instead.

73. Spark SQL replacement for the MySQL GROUP_CONCAT aggregate function

Solution: write a custom UDAF.

74. In a Maven project in IntelliJ IDEA, a new Scala file cannot be created

Solution: add the scala-tools plugin configuration to pom.xml, then download and refresh.

75. Error:scala: Error: org.jetbrains.jps.incremental.scala.remote.ServerException

Solution: edit pom.xml and switch Scala to the latest version.

76. Rebalancing Hadoop nodes when disks fill up

Solution: run hdfs balancer -threshold 3, or run the start-balancer.sh script in the form $HADOOP_HOME/bin/start-balancer.sh -threshold 3. The argument 3 is a percentage: the balancer evens out the DataNodes until their disk usage differs by no more than 3%.

77. Tip: in a Spark SQL UDAF, the second parameter of the update function, input: Row, is not a row of the DataFrame but the row as projected by inputSchema.

78. Error: No TypeTag available for String sqlContext.udf.register()

Solution: the Scala versions do not match; unify all Scala versions.

79. How to add a constant column in a Spark DataFrame?

Solution: the second argument of DataFrame.withColumn must be a Column, so use a literal: df.withColumn('new_column', lit(10))

80. Error:scalac:Error:object VolatileDoubleRef does not have a member create

Solution: the Scala versions do not match; align the Scala version of the development environment with that of the system.

81. java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet

Solution: align your Scala version with the Scala version Spark uses.

82. Excluding unneeded dependencies when packaging a Maven project, to keep the target jar small

Solution: add <scope>provided</scope> to the dependency to mark it as excluded from the target jar, and package with the Maven shade plugin.

83. Packaging a mixed Scala and Java project with Maven

Solution: run mvn clean scala:compile compile package

84. Spark SQL's udf cannot register a UDAF aggregate function

Solution: declare the custom UDAF with the class keyword instead of object.

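85. Tip: deleting the Hadoop data directory at runtime makes jobs that depend on HDFS fail.

86. [IllegalArgumentException[Document contains at least one immense term in field=XXX

Solution: when creating the index in ES, have long text fields analyzed (tokenized).

87. Resource files are missing from the jar built with maven shade

Solution: put the resources folder under src/main/, next to the scala or java folder.

88. Tip: Spark Graph builds a graph from the edge collection; the vertex collection only specifies which vertices in the graph are valid.

89. An ES query that uses regex matching fails with: Determinizing automaton would result in more than 10000 states.

Solution: the regular expression string is too long and too complex. Keep the regex concise and do not match by enumerating alternatives.

90. java.lang.StackOverflowError at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)

Solution: the WHERE clause of the SQL statement is too long, which overflows the stack.

91. org.apache.spark.shuffle.MetadataFetchFailedException:Missing an output location for shuffle 0

Solution: increase executor memory, reduce the number of executors, and increase executor parallelism.

92. ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 61.0 GB of 61 GB physical memory used

Solution: remove RDD caching, increase the job's spark.storage.memoryFraction, and increase the job's spark.yarn.executor.memoryOverhead.

93. EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction

Solution: reduce Spark's parallelism to lower the number of concurrent reads against ES.

94. Tip: do not give a single Spark job too many executor cores; it delays other jobs.

95. Tip: data skew only arises during a shuffle. Operators that may trigger a shuffle include distinct, groupByKey, reduceByKey, aggregateByKey, join, cogroup and repartition.

96. How to locate data skew in Spark

Solution: in the Spark Web UI, look at the amount of data and the run time of each task in the current stage, then use the way stages are split to find the shuffle operator in the code.

97. How to resolve data skew in Spark

Solution:
1) Filter out the few keys that cause the skew (only when dropping those keys has very little effect on the job).
2) Increase the parallelism of the shuffle (the improvement is limited).
3) Two-stage aggregation (local aggregation plus global aggregation): add a prefix to identical keys so they become several keys, run a local shuffle, strip the prefix, then run a global shuffle (works well for aggregation shuffles only; it does not help join shuffles).
4) Turn the reduce join into a map join: broadcast the small table, map over the large table, and look up the small table's data in the map (only applies when one table or RDD is small and the other large).
5) Join using random prefixes and an expanded RDD: tag each record of one RDD with a random prefix below n, use flatMap to expand the other RDD n times and tag the copies with the prefixes 0 to n-1 in turn, then join the two re-keyed RDDs (greatly relieves skew in joins, but consumes a very large amount of memory).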
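The two-stage aggregation in option 3 can be sketched on plain Java collections. The code below only models the key handling (salt, partial count, strip, final count); it does not use Spark, and the method names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class SaltedAggregation {

    /** Stage 1: prefix each key with a random salt in [0, salts) and count per salted key. */
    static Map<String, Long> localAggregate(List<String> keys, int salts, Random random) {
        Map<String, Long> partial = new HashMap<>();
        for (String key : keys) {
            String salted = random.nextInt(salts) + "_" + key;
            partial.merge(salted, 1L, Long::sum);
        }
        return partial;
    }

    /** Stage 2: strip the salt and sum the partial counts per original key. */
    static Map<String, Long> globalAggregate(Map<String, Long> partial) {
        Map<String, Long> total = new HashMap<>();
        for (Map.Entry<String, Long> entry : partial.entrySet()) {
            String salted = entry.getKey();
            String original = salted.substring(salted.indexOf('_') + 1);
            total.merge(original, entry.getValue(), Long::sum);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) keys.add("hot"); // the skewed key
        keys.add("cold");
        Map<String, Long> partial = localAggregate(keys, 4, new Random(42));
        System.out.println(partial.size() + " salted keys -> " + globalAggregate(partial));
    }
}
```

The hot key is spread over up to four salted keys in stage 1, so no single task has to process all of its records; stage 2 then only combines a handful of partial results per key.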

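98. Tip: shuffle write happens when a stage finishes computing. So that the next stage can run its shuffle operator, the data processed by each task is grouped by key, and records with the same key are written to the same disk file; each disk file belongs to exactly one task of the downstream stage. Data is buffered in memory before it is written to disk. Each task of the current stage creates as many disk files as the next stage has tasks.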
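The file-count rule can be made concrete with a small model. It assumes plain hash partitioning, as in the description above, and ignores optimizations such as file consolidation; the names are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ShuffleWriteSketch {

    /** Index of the downstream task, and therefore of the file, that receives this key. */
    static int targetFile(String key, int downstreamTasks) {
        return Math.floorMod(key.hashCode(), downstreamTasks);
    }

    /** One upstream task: sorts its records into one bucket ("file") per downstream task. */
    static List<List<String>> writeBuckets(List<String> keys, int downstreamTasks) {
        List<List<String>> files = new ArrayList<>();
        for (int i = 0; i < downstreamTasks; i++) {
            files.add(new ArrayList<>());
        }
        for (String key : keys) {
            files.get(targetFile(key, downstreamTasks)).add(key);
        }
        return files;
    }

    /** Total files when every upstream task writes one file per downstream task. */
    static int totalFiles(int upstreamTasks, int downstreamTasks) {
        return upstreamTasks * downstreamTasks;
    }

    public static void main(String[] args) {
        System.out.println(writeBuckets(Arrays.asList("a", "b", "a", "c"), 3));
        System.out.println("files for 4 x 3 tasks: " + totalFiles(4, 3));
    }
}
```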

99. java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 0

Solution: remember to escape metacharacters.
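A short sketch of the failure and two ways to escape, using only java.util.regex. The helper names are made up for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class RegexEscapeDemo {

    /** Returns true when the string is a valid regular expression. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /** Splits on a literal delimiter by quoting it, so metacharacters lose their meaning. */
    static String[] splitOnLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(compiles("?"));    // a bare ? has nothing to quantify
        System.out.println(compiles("\\?"));  // backslash-escaped ? is a literal
        System.out.println(Arrays.toString(splitOnLiteral("a?b?c", "?")));
    }
}
```

Pattern.quote is the safer choice when the delimiter comes from data or configuration, because it escapes every metacharacter at once.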

100. Elastic resource allocation in Spark

Solution: configure the Spark shuffle service and enable spark.dynamicAllocation.enabled.

101. Tip: Kafka's consumer group ID has no effect for Spark direct streaming.

102. After starting Hadoop YARN, only the ResourceManager is running, not the NodeManager

Solution: yarn-site.xml is misconfigured; check and correct each setting.

103. How to view Hadoop system logs

Solution: in Hadoop 2.x, the YARN service logs consist of the ResourceManager log and the log of each NodeManager. The ResourceManager log is logs/yarn-*-resourcemanager-*.log under the Hadoop installation directory. Each NodeManager log is logs/yarn-*-nodemanager-*.log under the Hadoop installation directory on that NodeManager node.

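104. Tip: every file smaller than the 128 MB block size still takes up a block of its own (it uses only its actual size on disk, but each block costs NameNode metadata); merge or delete small files to save resources.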

105. How to remove Non DFS Used

Solution: 1) clear the user cache files in the Hadoop data directory: cd /data/hadoop/storage/tmp/nm-local-dir/usercache; du -h; rm -rf `find . -type f -size +10M`; 2) clean up junk data in the Linux file system.

106. Tip: Non DFS Used refers to all files that are not HDFS data.

107. Isolating Linux profile configuration files

Solution: cd /etc/profile.d; create the relevant configuration script there.

108. The reference to entity "autoReconnect" must end with the ';' delimiter

Solution: replace & with &amp;

109. Service hiveserver not found

Solution: for this version of Apache Hive, run bin/hive --service hiveserver2 instead of hive --service hiveserver.

110. Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'

Solution: do not use a prebuilt Spark; rebuild Spark and make sure its version matches the one in Hive's pom.

111. java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS at org.apache.hive.spark.client.rpc.RpcConfiguration.(RpcConfiguration.java:45)

Solution: the Hive and Spark versions must match, and Spark must be built without the -Phive flag.

112. javax.jdo.JDOFatalInternalException: Error creating transactional connection factory

Solution: add the MySQL connector to Hive's lib directory.

113. org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

Solution: there are several possible causes; check hive.log to narrow the problem down.

114. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Solution: Spark was built with the hadoop-provided flag, so the Hadoop packages are missing.

115. On Linux, pressing the delete key after a mistyped command shows ^H

Solution: run stty erase ^H

116. Tip: check pom.xml in the Hive source for the Spark version it targets. Matching the major version is enough; for example, Spark 1.6.0 and 1.6.2 both work.

117. Tip: open the Hive command-line client and check whether the log prints "SLF4J: Found binding in [jar:file:/work/poa/hive-2.1.0-bin/lib/spark-assembly-1.6.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]" to tell whether Hive is bound to Spark.

118. After starting YARN, only some of the NodeManagers are running

Solution: the nodes that did not start lack YARN-related packages; keep the jars identical on all nodes.

119. Error: Could not find or load main class org.apache.hive.beeline.BeeLine

Solution: rebuild Hive with the -Phive-thriftserver flag.

120. Tip: when building Spark for Hive on Spark, do not add the -Phive flag; add -Phive only if Spark SQL needs to support Hive syntax.

121. User class threw exception: org.apache.spark.sql.AnalysisException: path hdfs://XXXXXX already exists.;

Solution: df.write.format("parquet").mode("append").save("path.parquet")

122. check the manual that corresponds to your MySQL server version for the right syntax to use near 'OPTION SQL_SELECT_LIMIT=DEFAULT' at line 1

Solution: use a newer mysql-connector.

123. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate

Solution: edit core-site.xml, set hadoop.proxyuser.root.hosts to * and hadoop.proxyuser.root.groups to *, then restart YARN.

124. java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$MessageTypeBuilder.addFields([Lorg/apache/parquet/schema/Type;)Lorg/apache/parquet/schema/Types$BaseGroupBuilder;

Solution: caused by a version conflict; unify the parquet component versions in Hive and Spark.

125. Tip: Hive on Spark performance can be tuned through hive-site.xml settings such as spark.executor.instances, spark.executor.cores and spark.executor.memory, although dynamic resource allocation is the better choice.

126. WARN SparkContext: Dynamic Allocation and num executors both set, thus dynamic allocation disabled.

Solution: to use dynamic resource allocation, do not set the number of executors.

127. Invalid configuration property node.environment: is malformed (for class io.airlift.node.NodeConfig.environment)

Solution: the node.environment property (in the node.properties file) is set but fails to match the regular expression [a-z0-9][_a-z0-9]*. Rename it so that it conforms.

128. com.facebook.presto.server.PrestoServer No factory for connector hive-XXXXXX

Solution: connector.name in hive.properties is wrong. It must name a specific version so that Presto uses the matching adapter; change it to connector.name=hive-hadoop2

129. org.apache.spark.SparkException: Task failed while writing rows Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null

Solution: ES is overloaded; repair ES.

130. Tip: if Maven downloads are very slow from mainland China, access to the central repository is probably being blocked by the Great Firewall. Add a domestic mirror under the mirrors tag of the settings.xml file in the Maven installation directory, for example:

<mirror>
  <id>nexus-aliyun</id>
  <name>Nexus aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>

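131. ERROR ApplicationMaster: Uncaught exception: java.lang.SecurityException: Invalid signature file digest for Manifest main attributes

Solution: in pom.xml, add the following under the excludes tag of the shade plugin's filter:

<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>

132. scala.MatchError: Buffer(10.113.80.29, None) (of class scala.collection.convert.Wrappers$JListWrapper)

Solution: remove the dirty data in ES that is incompatible with Scala data types.

133. How to recover files deleted from HDFS by mistake

Solution: add the following to core-site.xml:

<property>
  <name>fs.trash.interval</name>
  <value>2880</value>
  <description>HDFS trash setting; allows accidental deletes to be recovered. The value is in minutes; 0 disables the trash.</description>
</property>

To restore a file, run hdfs dfs -mv /user/root/.Trash/Current/<deleted file> /<original path>

134. Reordering some tasks in a Linux scheduled script caused some tasks not to run and others to run twice

Solution: changes to a Linux script take effect immediately, even while it is running. Only edit the script after it has finished running, to avoid side effects.

135. Tip: Spark has two repartitioning methods, coalesce and repartition. coalesce is a narrow dependency and leaves the data unevenly distributed; repartition is a wide dependency that triggers a shuffle and leaves the data evenly distributed.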
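The contrast can be modelled on plain lists. This is a deliberately simplified sketch, not Spark's implementation: coalesce is modelled as merging whole neighbouring partitions, and repartition as dealing every record out round-robin. All names are invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PartitioningSketch {

    /** coalesce-style: each parent partition moves as a whole, so no record is shuffled. */
    static List<List<Integer>> coalesce(List<List<Integer>> parts, int target) {
        List<List<Integer>> out = emptyPartitions(target);
        for (int i = 0; i < parts.size(); i++) {
            out.get(i * target / parts.size()).addAll(parts.get(i));
        }
        return out;
    }

    /** repartition-style: every record is redistributed round-robin, i.e. a shuffle. */
    static List<List<Integer>> repartition(List<List<Integer>> parts, int target) {
        List<List<Integer>> out = emptyPartitions(target);
        int next = 0;
        for (List<Integer> part : parts) {
            for (Integer record : part) {
                out.get(next++ % target).add(record);
            }
        }
        return out;
    }

    /** Sizes of the partitions, in order. */
    static List<Integer> sizes(List<List<Integer>> parts) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> part : parts) result.add(part.size());
        return result;
    }

    private static List<List<Integer>> emptyPartitions(int count) {
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < count; i++) out.add(new ArrayList<>());
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(Collections.nCopies(97, 0))); // one oversized partition
        for (int i = 0; i < 3; i++) parts.add(new ArrayList<>(Collections.singletonList(i)));
        System.out.println("coalesce:    " + sizes(coalesce(parts, 2)));
        System.out.println("repartition: " + sizes(repartition(parts, 2)));
    }
}
```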

136. org.apache.spark.SparkException: Task failed while writing rows scala.MatchError: Buffer(10.113.80.29, None) (of class scala.collection.convert.Wrappers$JListWrapper)

Solution: the ES data is incompatible during Spark SQL type conversion. Read the ES data as strings with EsSpark.esJsonRDD, then convert the RDD to a DataFrame.

137. Container exited with a non-zero exit code 143 Killed by external signal

Solution: not enough resources were allocated. Increase memory, or adjust the code so that large objects such as JsonObject do not consume excessive memory. Alternatively, include the properties below in yarn-site.xml and restart the VM:

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>

138. Manually installing an existing jar as a Maven dependency

Solution: mvn install:install-file -Dfile=spark-assembly-1.6.2-hadoop2.6.0.jar -DgroupId=org.apache.repack -DartifactId=spark-assembly-1.6.2-hadoop2.6.0 -Dversion=2.6 -Dpackaging=jar

139. FAILED: SemanticException [Error 10006]: Line 1:122 Partition not found ''2016-08-01''

Solution: the Hive version is too new and has a bug; downgrade Hive from 2.1.0 to 1.2.1.

140. ParseException line 1:17 mismatched input 'hdfs' expecting StringLiteral near 'inpath' in load statement

Solution: drop the prefix that starts with hdfs and contains the IP and port, write the absolute HDFS path directly, and enclose it in single quotes.

141. [ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected

Solution: export HADOOP_USER_CLASSPATH_FIRST=true

142. A shell script started from crontab does not run properly, although running it by hand works

Solution: put source /etc/profile on the first line of the script, because the cron process does not automatically load the .profile file in the user's home directory.

143. SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted

Solution: the cluster is short on resources; make sure the memory actually free exceeds the memory the Spark job requests.

144. PrestoException: ROW comparison not supported for fields with null elements

Solution: replace != null with is not null.

145. When starting the Presto servers, some nodes fail to start

Solution: the memory allocated to the JVM must be less than the memory actually free.

146. Tip: once the Presto process has started, the JVM server keeps holding its memory.

147. Error injecting constructor, java.lang.IllegalArgumentException: query.max-memory-per-node set to 20GB, but only 10213706957B of useable heap available

Solution: Presto will claim 0.40 * max heap size for the system pool, so your query.max-memory-per-node must not exceed what remains. You can increase the heap or decrease query.max-memory-per-node.

148. failed: Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. failed java.util.concurrent.CancellationException: Task was cancelled

Solution: such exceptions are caused by timeout limits. Extend the wait by setting exchange.http-client.request-timeout=50s in the config of the worker nodes.

149. What are the mainstream options for big data ETL visualization

Solution: technology stacks worth considering are ELK (elasticsearch + logstash + kibana) and HPA (hive + presto + airpal).

150. Tip: a Presto cluster does not need to run on YARN. Hadoop depends on HDFS, so it struggles when some machines have small disks, whereas Presto computes purely in memory and does not depend on disk. Installed standalone it can span several clusters; wherever there is memory, Presto can run.
