Extract one column from many files and merge them into a new file
【Question】
I have a large number of files, all in the same tab-delimited format:
Column A    Column B
Data_A1     Data_B1
Data_A2     Data_B2
Data_A3     Data_B3
These files all have the same number of lines.
I want to combine the Column B data from every file into a single tab-delimited file. Right now, my best plan is to write a Perl script along these lines:
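```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = shift @ARGV;   # file with the format described above
my $ref  = shift @ARGV;   # tab-delimited compilation of every file's Column B so far

# Collect Column B; split on tabs because the header "Column A" itself contains a space
my @B;
open(my $in, '<', $file) or die "Cannot open $file: $!";
while (<$in>) {
    chomp;
    my @a = split /\t/;
    push @B, $a[1];
}
close $in;

# First run: no compilation exists yet, so just print this file's Column B
if (!-e $ref) {
    print "$_\n" for @B;
    exit;
}

# Append this file's Column B to the matching line of the compilation
my $counter = 0;
open(my $cmp, '<', $ref) or die "Cannot open $ref: $!";
while (<$cmp>) {
    chomp;
    print "$_\t$B[$counter]\n";
    $counter++;
}
close $cmp;
```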
Then I would write a Bash script that loops through all the files, feeding the Perl script's output back in as its input on the next iteration of the loop:
```bash
#!/bin/bash
for file in *.txt
do
    perl Script.pl "$file" Infile > Temp
    mv Temp Infile
done
```
But this feels like a huge amount of work for something so simple. Is there a simple Unix command that can do the same thing?
Expected Output:
File1_Column_B    File2_Column_B    File3_Column_B    ...
Data_B1           Data_B1           Data_B1           ...
Data_B2           Data_B2           Data_B2           ...
Data_B3           Data_B3           Data_B3           ...
...
【Answer】
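This task involves a series of order-dependent operations, especially assembling the field names into a specific format. You could consider solving it with SPL. The code consists of the following cells: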
A1: Read the list of files.
A2: Get the Column_B data from each file.
A3: Combine the Column B data side by side.
A4: Write the result of A3 to a file and rename the fields.
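For comparison, since the question asks about plain Unix tools, the same four steps can be sketched in Bash with `tail`, `cut` and `paste` instead of SPL. This is a swapped-in alternative, not the SPL solution. It assumes every input file is tab-delimited, starts with one header line, and has the same number of lines; the function name `combine_column_b`, the output name `combined.tsv`, and the `<file name>_Column_B` header convention (borrowed from the expected output) are hypothetical.

```bash
#!/bin/bash
# Sketch only: steps A1-A4 with coreutils instead of SPL.
# Assumed (hypothetical) input: tab-delimited files, one header line each,
# all with the same number of lines.
# Usage (A1: the glob supplies the file list): combine_column_b combined.tsv *.txt

# combine_column_b OUT FILE...   (hypothetical helper name)
combine_column_b() {
    if [ "$#" -lt 2 ]; then
        echo "usage: combine_column_b OUT FILE..." >&2
        return 1
    fi
    local out=$1
    shift

    local tmpdir f name col
    tmpdir=$(mktemp -d) || return 1
    local cols=()
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read $f" >&2
            rm -rf "$tmpdir"
            return 1
        fi
        col="$tmpdir/${#cols[@]}.col"
        name=$(basename "$f" .txt)
        # A2 + A4: keep field 2 (Column B) and replace its header
        # with "<file name>_Column_B"
        {
            printf '%s_Column_B\n' "$name"
            tail -n +2 "$f" | cut -f2
        } > "$col"
        cols+=("$col")
    done

    # A3: put the extracted columns side by side; paste separates them with tabs
    paste "${cols[@]}" > "$out"
    local status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

The shell expands `*.txt` in locale sort order, so `File10.txt` comes before `File2.txt` unless the names are zero-padded or listed explicitly. Because `paste` reads all the columns in one pass, there is no intermediate `Infile` to rewrite on every iteration.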