Question
Which of the following statements about using Hive for Word Count (word-frequency counting) is correct? _____
A. select word count ( * ) from ( select explode ( split ( line , ) ) as word from docs ) w group by line ;
B. None of the above
C. select word count ( * ) from ( select explode ( split ( line , ) ) as word from docs ) w group by word ;
D. select word line count ( * ) from ( select explode ( split ( line , ) ) as word from docs ) w group by word ;
Solution
Answer
The answer is C. To run a Word Count in Hive, use the following statement:
select word, count(*) as count from (select explode(split(line, ' ')) as word from docs) w group by word;
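Explanation:
split breaks each line into an array of words using the given delimiter (a single space here);
explode expands that array into rows, one word per row;
because a file can contain many lines, the subquery applies this to every row of docs;
finally, the outer query groups by "word" and counts how many times each word occurs.
Option C is therefore correct, while A and D are both invalid [2]: A groups by line and D selects line, but the subquery w exposes only the word column.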
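To make the roles of split and explode concrete, the sketch below walks through the query one stage at a time. The table name docs and the column line come from the question; the two sample rows and the alias cnt are assumptions made for illustration, the script assumes docs does not already exist, and INSERT ... VALUES assumes Hive 0.14 or later.

```sql
-- Hypothetical sample data (not part of the original question).
CREATE TABLE docs (line STRING);
INSERT INTO TABLE docs VALUES ('hello world'), ('hello hive');

-- Stage 1: split turns each line into an array of words.
SELECT split(line, ' ') AS words FROM docs;
-- two rows: ["hello","world"] and ["hello","hive"]

-- Stage 2: explode turns every array element into its own row.
SELECT explode(split(line, ' ')) AS word FROM docs;
-- four rows: hello, world, hello, hive

-- Stage 3: group the exploded rows by word and count each group.
SELECT word, count(*) AS cnt
FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w
GROUP BY word;
-- hello 2, world 1, hive 1 (row order is not guaranteed without ORDER BY)
```

Because the subquery w returns only the word column, the outer query cannot reference anything else, which is why grouping by or selecting line fails.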
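Analysis
Step 1: Understand the problem
The task is to run a Word Count in Hive, that is, to count how many times each word occurs in the documents. We need to pick the correct Hive query from the given options.
Step 2: Review the options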
A. select word count ( * ) from ( select explode ( split ( line , ) ) as word from docs ) w group by line ;
B. None of the above
C. select word count ( * ) from ( select explode ( split ( line , ) ) as word from docs ) w group by word ;
D. select word line count ( * ) from ( select explode ( split ( line , ) ) as word from docs ) w group by word ;
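Step 3: Check syntax and logic
Option A: invalid; it groups by line instead of word, and line is not a column of the subquery w.
Option B: holds only if none of the other options is correct; since C is correct, B is wrong.
Option C: correct in both syntax and logic; it groups by word.
Option D: invalid; its select list, select word line count ( * ), references line, which w does not provide.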