Discussion:
Performance of File::Copy?
Henry Law
2017-07-05 14:50:13 UTC
I searched for this because if I discern a problem then surely someone
else must have too, but found nothing; which makes me wonder whether I'm
barking up the wrong tree.

I wrote a simple test program which copies the contents of a directory
(I use images because they're 3 or 4MB each) into a temporary directory;
a parameter allows me to choose File::Copy operations or using "system"
with the "cp" command. (The source is at the bottom in case anyone
wants it). Here are some results:

***@eris ~/tools $ ./testcopy p '/d/u/Photos/2017_06_07'
157 files copied using perl calls in 9.43216 seconds

***@eris ~/tools $ ./testcopy s '/d/u/Photos/2017_06_07'
157 files copied using system calls in 3.445381 seconds

This suggests that, despite shelling out to a system command each time,
system(cp ....) is near enough three times as fast, which I wouldn't
have expected.

Does this tie in with other people's experience? Is there any reason
why I shouldn't re-code my programs to use 'cp' rather than 'File::Copy'?

-------------------------------------------------
#! /usr/bin/perl

# testcopy method dir
#
# Where method: p or s, meaning "via perl" or "via system"
# dir: a directory full of test files
#
# files are copied to a temporary directory, which is then deleted

use strict;
use warnings;
use 5.010;

use File::Temp qw( tempdir );
use File::Copy;
use Time::HiRes qw( gettimeofday tv_interval );

my $method = shift or die usage();
$method = lc $method;
die "Invalid method '$method'\n" unless $method eq 'p' || $method eq 's';

my $source = shift or die usage();
die "'$source' is not a directory\n" unless -d $source;
$source =~ s|/$||g;

my $target = tempdir( CLEANUP => 1 );

opendir my $DIR, $source or die "Couldn't open '$source':$!";
my $count = 0;
my $start = [ gettimeofday() ];
while( my $file = readdir $DIR ){
    my $fullname = "$source/$file";
    next if $file eq '.' || $file eq '..' || -d $fullname;
    if ( $method eq 'p' ){
        copy $fullname, $target;
    }
    else {
        system( 'cp', $fullname, $target );
    }
    $count++;
}
my $elapsed = tv_interval( $start );
print "$count files copied using " . ($method eq 'p'? 'perl':'system') .
    " calls in $elapsed seconds\n";
closedir $DIR;

sub usage{
    my $usage = <<ENDTEXT;
Usage:
$0 { p | s } directory
ENDTEXT
    return $usage;
}
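
Note that neither branch of the loop above checks whether the copy actually succeeded, so a failing method could look fast. A minimal sketch of a checked version follows; the helper name checked_copy and the scratch file names are hypothetical, and it assumes a Unix-like system with 'cp' on the PATH.

```perl
use strict;
use warnings;
use File::Copy qw( copy );
use File::Temp qw( tempdir );

# Hypothetical helper: copy one file by either method and die on failure.
sub checked_copy {
    my ( $method, $from, $to ) = @_;
    if ( $method eq 'p' ) {
        # File::Copy::copy returns 1 on success, 0 on failure (reason in $!)
        copy( $from, $to ) or die "copy '$from' -> '$to' failed: $!\n";
    }
    else {
        # list-form system() bypasses the shell; 0 means cp exited cleanly
        system( 'cp', $from, $to ) == 0
            or die "cp '$from' -> '$to' failed, wait status $?\n";
    }
    return 1;
}

# Small self-check on a scratch file (names are illustrative only)
my $dir = tempdir( CLEANUP => 1 );
my $src = "$dir/src.dat";
open my $fh, '>', $src or die "Cannot create '$src': $!";
print {$fh} 'x' x 4096;
close $fh or die "Cannot close '$src': $!";

checked_copy( 'p', $src, "$dir/perl.dat" );
checked_copy( 's', $src, "$dir/system.dat" );
printf "%s: %d bytes\n", $_, -s "$dir/$_" for qw( perl.dat system.dat );
```

The extra checks cost next to nothing compared with copying 3 or 4MB, so they should not distort the timings.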
--
Henry Law n e w s @ l a w s h o u s e . o r g
Manchester, England
Eric Pozharski
2017-07-06 07:55:07 UTC
with <***@giganews.com> Henry Law wrote:
*SKIP*
Post by Henry Law
157 files copied using perl calls in 9.43216 seconds
157 files copied using system calls in 3.445381 seconds
This suggests that, despite shelling out to a system command each
time, system(cp ....) is near enough three times as fast, which I
wouldn't have expected.
Does this tie in with other people's experience? Is there any reason
why I shouldn't re-code my programs to use 'cp' rather than
'File::Copy'?
I just did a couple of runs (data: 69K..23M, 142M total, 7.89M average;
flipping 'CLEANUP' and 'DIR'; three runs for each combination). One of
the results (the 'target is ...' diagnostic is for my own curiosity):

target is /tmp/T6d_MvVWnY
18 files copied using perl calls in 0.329459 seconds
target is /tmp/emp4Xp2Hq0
18 files copied using system calls in 0.390571 seconds

Perl was never three times as slow as system. So that raises a
question: versions please?

*CUT*
--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom
Henry Law
2017-07-06 15:13:18 UTC
Post by Eric Pozharski
Perl was never three times as slow as system. So that raises a
question: versions please?
Hmm; interesting. That begins to explain why, when I searched, I found
no hits of the kind "File::Copy performance sucks ...".

$ perl -v

This is perl 5, version 22, subversion 1 (v5.22.1) built for
x86_64-linux-gnu-thread-multi
(with 58 registered patches, see perl -V for more detail)

OS is Linux Mint 18 (Sarah), patched up to date.
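
Since "perl -v" does not show module versions, the File::Copy version can also be read directly rather than inferred from the perl release. A small sketch, where the helper name version_report is hypothetical:

```perl
use strict;
use warnings;
use File::Copy ();
use Config;

# Hypothetical helper: gather the versions relevant to a File::Copy
# timing comparison into one line.
sub version_report {
    return sprintf "perl %vd, File::Copy %s, osname %s, archname %s",
        $^V, File::Copy->VERSION, $Config{osname}, $Config{archname};
}

print version_report(), "\n";
```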
--
Henry Law n e w s @ l a w s h o u s e . o r g
Manchester, England
Eric Pozharski
2017-07-08 16:39:50 UTC
Well, since no-one has jumped in, I feel entitled to start speculating,
just like always :)
Post by Henry Law
Post by Eric Pozharski
Perl was never three times as slow as system. So that raises a
question: versions please?
Hmm; interesting. That begins to explain why, when I searched, I
found no hits of the kind "File::Copy performance sucks ...".
The other explanation is that no-one cares. F::C might be used in a
one-liner, or deep inside spaghetti code, or people have just got used
to moving files around with a mouse.
Post by Henry Law
$ perl -v
This is perl 5, version 22, subversion 1 (v5.22.1) built for
x86_64-linux-gnu-thread-multi (with 58 registered patches, see perl -V
for more detail)
OS is Linux Mint 18 (Sarah), patched up to date.
Anyway, this strongly suggests your F::C is 2.30. diff tells me:

[1] you have "no warnings 'newline'" -- hard to imagine that being the
culprit (hm, quickly consulting with dict -- 'culprit' is actually a
good guy);

[2] in your version VMS support has been greatly reworked (mostly
dropped, or maybe cleaned up);

[3] in your version 'mpeix' support has been dropped (what the heck is
'mpeix' anyway?).

Well, that strongly suggests the answer lies elsewhere. What else
could possibly go wrong?

[1] I have a vague impression (darn, 'impression' is already vague) that
some time ago these sys-and-friends became less sys* and more like
print-and-friends. IOW -- layers. Those must be removed manually.
But my impression might be wrong.

[2] It's hard to imagine that your BUFSIZ is 1024 bytes (or characters?
IOW -- layers). It's possible there is some weird interaction among
kernel, libc, and perl. Like GC maybe? Interestingly, with
BUFFERSIZE (of 'copy') set to 16384 perl is always faster, and by
this much (caching on the kernel side in action?) (yesterday's test
data is gone):

target is /tmp/trp5bM3lYT
23 files copied using perl calls in 2.740705 seconds
target is /tmp/E5NdcuVWew
23 files copied using system calls in 6.381695 seconds
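
For anyone wanting to repeat the BUFFERSIZE experiment: 'copy' accepts an optional third argument giving the buffer size. Below is a rough sketch that times one copy per buffer size. The helper name timed_copy, the 4MB scratch file and the list of sizes are all assumptions for illustration. The scratch file will almost certainly sit in the kernel cache, so the numbers are only meaningful relative to each other, not to the runs above.

```perl
use strict;
use warnings;
use File::Copy qw( copy );
use File::Temp qw( tempdir );
use Time::HiRes qw( gettimeofday tv_interval );

# Hypothetical helper: time a single copy, optionally with an explicit
# buffer size passed as the third argument of File::Copy::copy.
sub timed_copy {
    my ( $from, $to, $bufsize ) = @_;
    my $start = [ gettimeofday() ];
    my $ok = defined $bufsize
        ? copy( $from, $to, $bufsize )
        : copy( $from, $to );
    die "copy '$from' -> '$to' failed: $!\n" unless $ok;
    return tv_interval( $start );
}

# Build a 4MB scratch file, roughly the size of the images in the thread
my $dir = tempdir( CLEANUP => 1 );
my $src = "$dir/src.dat";
open my $fh, '>:raw', $src or die "Cannot create '$src': $!";
print {$fh} "\0" x ( 4 * 1024 * 1024 );
close $fh or die "Cannot close '$src': $!";

# undef means "let File::Copy pick its own buffer size"
for my $bufsize ( undef, 1024, 16384, 1024 * 1024 ) {
    my $label = defined $bufsize ? $bufsize : 'default';
    my $secs  = timed_copy( $src, "$dir/copy.$label", $bufsize );
    printf "buffer %-8s %.6f seconds\n", $label, $secs;
}
```

Running it several times, and on the real photo directory instead of a scratch file, would give a fairer picture than a single pass.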

In essence: I have no fscking idea what's going on.

p.s. However, I still believe in porters. They may know better. How
strange is this?
--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom