对无标题的文本做行列转置
【问题】 I have a tab delimited file that looks as follows:
cat my file.txt
gives:
1 299 1 150 1 50 1 57 2 -45 2 62 3 515 3 215 3 -315 3 -35 3 3 3 6789 3 34 5 66 5 1334 5 123
I’d like to use Unix commands to get a tab-delimited file that based on values in column#1, each column of the output file will hold all relevant values of column#2 (I’m using here for the example the separator “|” instead of tab only to illustrate my desired output file):
299 | -45| 515| 66 150| 62| 215| 1334 50 | | -315 | 57 | | -35 | | | 3 |
The corresponding Headers (1,2,3,5; based on column#1 values) could be a nice addition to the code (as shown below), but the main request is to split the information of the first file into separated columns. Thanks!
1| 2| 3| 5 299 | -45| 515| 66 150| 62| 215| 1334 50 | | -315 | 57 | | -35 | | | 3 |
别人解答:
awk BEGIN{max=0;} { d[$1][length(d[$1])+1] = $2; if(length(d[$1])>max) max = length(d[$1]); } END{ PROCINFO["sorted\_in"] = "@ind\_num_asc"; line = ""; flag = 0; for(j in d){ line = line (flag?"\t|\t":"") j; flag = 1; } print line; for(i=1; i<=max; ++i){ line = ""; flag = 0; for(j in d){ line = line (flag?"\t|\t":"") d[j][i]; flag = 1; } print line; } } file.txt
you get
1 | 2 | 3 | 5 299 | -45 | 515 | 66 150 | 62 | 215 | 1334 50 | | -315 | 57 | | -35 | | | 3 |
Or, you can use python …. for example, in split2Columns.py
import sys records = [line.split() for line in open(sys.argv[1])] import collections records_dict = collections.defaultdict(list) for key, val in records: records_dict[key].append(val) from itertools import izip_longest print "\t|\t".join(records_dict.keys()) print "\n".join(("\t|\t".join(map(str,l)) for l in izip\_longest(*records\_dict.values(), fillvalue="")))
【回答】 先将数据按第 1 字段分组,生成期望结果集,然后将分组成员竖着填入结果集。
上述算法涉及分组后运算及行转列,如无特殊要求可用 SPL 实现,代码简单易懂: