Merging a large number of files into one

Question:

I have around 30K files and want to merge them into one. I used cat, but I get this error:

cat *.n3 > merged.n3

-bash: /usr/bin/xargs: Argument list too long

How can I increase the limit for the "cat" command? Alternatively, is there an iterative way to merge a large number of files?
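
Before changing anything, you can check how close the expanded glob comes to the limit. The sketch below assumes bash and a system that provides getconf. The byte count is only approximate, because the kernel also counts the environment and per-argument overhead against the same limit.

```shell
#!/usr/bin/env bash
# Rough diagnostic: compare the system's argument-size limit (ARG_MAX)
# with the number of bytes the expanded *.n3 glob would occupy.
# printf is a builtin, so this pipeline does not itself hit the limit.
shopt -s nullglob
limit=$(getconf ARG_MAX)
# $(( ... )) strips the leading spaces some wc implementations print.
needed=$(( $(printf '%s\0' *.n3 | wc -c) ))
echo "ARG_MAX: $limit bytes"
echo "*.n3 expands to roughly $needed bytes of arguments"
```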

Here's a safe way to do it, without the need for find:

 printf '%s\0' *.n3 | xargs -0 cat > merged.txt

(I've also chosen merged.txt as the output file, as @MichaelDautermann soundly advises, so that the output file does not match the *.n3 glob and end up as one of its own inputs; rename it to merged.n3 afterward.)
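
Here is a minimal end-to-end sketch of that workflow, assuming bash. The scratch directory and the three sample files are hypothetical stand-ins for the real .n3 files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory with a few sample .n3 files.
workdir=$(mktemp -d)
cd "$workdir"
printf 'a\n' > 1.n3
printf 'b\n' > 2.n3
printf 'c\n' > 'with space.n3'

# Merge into a name that does not match the *.n3 glob,
# so the output can never be read back as one of its own inputs.
printf '%s\0' *.n3 | xargs -0 cat > merged.txt

# Rename only after the merge has finished.
mv merged.txt merged.n3
cat merged.n3
```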

Note: The reason this works is:

  • printf is a bash shell builtin, so its command line is not subject to the length limit (ARG_MAX) that the kernel imposes on command lines passed to external executables.
  • xargs is smart about partitioning the input arguments (passed via a pipe and thus also not subject to the command-line length limit) into multiple invocations so as to avoid the length limit; in other words: xargs makes as few calls as possible without running into the limit.
  • Using \0 as the delimiter, paired with xargs' -0 option, ensures that all filenames, even those containing spaces or newlines, are passed through as-is.
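
The following sketch shows these points in action, assuming bash and hypothetical test data: 5,000 long-named files plus one file whose name contains a space and a newline. It counts how many times xargs invokes its command, then checks that every file ends up in the merged output. The batch count depends on the platform's limits, so it may be 1 on systems with a large ARG_MAX.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical test data: many long names to inflate the argument list.
for ((i = 1; i <= 5000; i++)); do
  printf '%s\n' "$i" > "file_with_a_rather_long_name_to_inflate_the_argument_list_$i.n3"
done
# One file whose name contains a space and a newline.
printf 'odd\n' > $'odd name\nwith newline.n3'

# Each sh invocation prints one line, so counting lines counts batches.
batches=$(( $(printf '%s\0' *.n3 | xargs -0 sh -c 'echo batch' sh | wc -l) ))
echo "xargs ran the command in $batches batch(es)"

# The actual merge, exactly as in the answer.
printf '%s\0' *.n3 | xargs -0 cat > merged.txt
```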