The UNIX "head" and "tail" utilities are two of the most useful tools. Head outputs the first part of a file, while tail outputs the last part. But sometimes you want to output the middle part of a file. This is where the "body" script comes in. Body outputs the middle part of a file, from the starting line number to the ending line number specified on the command line. Body reads from stdin and writes to stdout, so you can pipe a file into it.
body is distributed as executable source code under the GNU General Public License. Please see the license agreement elsewhere on this site.
Usage: body "start line number" "end line number"
cat readme.txt | body 20 40
Pipes readme.txt into body, which outputs the text from line 20 to line 40.
Attached File: body (411 B)
As much as I hate non-structured programming, I finally gave in to the demon when I decided to make the body shell script more efficient. By introducing a single "break" statement, the script now stops as soon as it has output the last specified line, rather than reading through the rest of the text file.
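For readers without the attachment, here is a rough sketch of what a read loop with an early "break" might look like. It is not the attached script. The function name body_loop and the variable names are made up for illustration.

```shell
#!/bin/sh
# Illustrative sketch only, not the attached "body" script.
# body_loop START END: print lines START..END of stdin, then stop reading.
body_loop() {
    start=$1
    end=$2
    n=0
    # Note: plain "read" still trims leading and trailing white space
    # because of field splitting on IFS.
    while read -r line; do
        n=$((n + 1))
        if [ "$n" -ge "$start" ]; then
            printf '%s\n' "$line"
        fi
        if [ "$n" -ge "$end" ]; then
            break   # last requested line is out; skip the rest of the input
        fi
    done
}
```

Without the "break", the loop would keep reading to the end of the input even though nothing more gets printed.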
I've also changed the command-line argument check. Although the previous implementation worked, the new one is more semantically correct.
The only problem left? The UNIX "head" and "tail" tools preserve the white space at the start of each line, but the "read" statement in "body" strips it. How could we re-implement body to keep the white space?
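One possible answer, offered as a sketch rather than as the script's actual fix, is to clear IFS for the duration of each "read" so that no field splitting takes place. The name body_keep_ws is hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: a read loop that keeps white space intact.
# body_keep_ws START END: print lines START..END of stdin unchanged.
body_keep_ws() {
    start=$1
    end=$2
    n=0
    # "IFS=" applies only to this read and turns off field splitting, so
    # leading and trailing blanks survive. "-r" keeps backslashes literal.
    while IFS= read -r line; do
        n=$((n + 1))
        if [ "$n" -ge "$start" ]; then
            printf '%s\n' "$line"
        fi
        if [ "$n" -ge "$end" ]; then
            break
        fi
    done
}
```

This still reads the input one line at a time in the shell, so it addresses the white space problem but not the speed.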
A while back, I thought of a way to greatly improve the speed of this "body" utility: take advantage of the efficiency of "head" and "tail". But I wasn't motivated to fix the problem until today.
Today I found that "body" is hard to use in shell scripts because of its inherent tendency to strip extra white space. That made shell programming unpredictable and made things fail in unexpected ways. So I decided to re-implement body with my "head" and "tail" idea.
Attached below is the latest version of the "body" utility, built on the efficient UNIX "head" and "tail" utilities. This version fixes the white space stripping problem, and I suspect it is faster than the original implementation. I am testing the performance and will post the results as soon as I have them.
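The "head" and "tail" idea can be sketched in a few lines. This is a minimal illustration under my own assumptions, not necessarily how the attached file is written. The name body_fast is hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: "body" built from head and tail.
# body_fast START END: print lines START..END of stdin.
body_fast() {
    start=$1
    end=$2
    # head keeps lines 1..END and stops reading after that;
    # tail -n +START then drops everything before line START.
    head -n "$end" | tail -n "+$start"
}
```

Because neither tool goes through the shell's "read", every line comes out byte for byte, white space included. Using "tail -n +START" rather than a computed line count also gives sensible output when the input has fewer than END lines.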
I now have the performance data. As I suspected, the "head" and "tail" algorithm is far faster than the original implementation. For the test, I wrote a simple script that retrieves ten lines from the "body" of a large file and timestamps the process. The large file is a spam IP list that you can get from "Fight Comment Spam, Ban IP's". The following is the sample script.
ls -aogF BannedList.txt
The second implementation of "body" took almost four minutes (running on a 300 MHz notebook computer), as shown below.
Using the latest release of "body" with the "head" and "tail" algorithm, the process took less than a second (see below)!
-rw-r--r-- 1 5296239 2007-08-20 17:59 BannedList.txt
Improved the script to handle errors. This version of body detects a first line number that is greater than the last line number and writes an error message to stderr.
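A sketch of how such a check might sit in front of the head/tail pipeline is below. The wording of the messages, the exit codes, and the name body_checked are my own assumptions and may differ from the attached script.

```shell
#!/bin/sh
# Illustrative sketch only: argument and range checks before the pipeline.
# body_checked START END: print lines START..END of stdin, or complain.
body_checked() {
    if [ "$#" -ne 2 ]; then
        echo 'Usage: body "start line number" "end line number"' >&2
        return 2
    fi
    if [ "$1" -gt "$2" ]; then
        # Error text goes to stderr so it never mixes with piped output.
        echo "body: start line $1 is greater than end line $2" >&2
        return 1
    fi
    head -n "$2" | tail -n "+$1"
}
```

With this in place, something like "cat readme.txt | body_checked 40 20" prints nothing to stdout and returns a non-zero status that a calling script can test.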
Messages, files, and images copyright by respective owners.
Copyright © 1996 - 2017. All Rights Reserved.