Digital Library of India: Download all that you can...
February 23, 2007 - 00:20 — hpn
[:http://dli.iiit.ac.in/|Digital Library of India] has been unveiled, but with a shocker of an interface. Then again, not much can be expected of a "Government of India" project, as they always manage to find just the right technologies (or people?) for the job. (Why, e-governance in India is all set to go Microsoft's way. When M$ boasts of its riches, we can all show our kids its logo and say that our government of poor people is one of its key customers.)
They use the TIFF format for image scans of thousands of books, probably from libraries all over India. There are two petty interfaces, both of which require you to download viewer software before you can read anything. The software, in turn, has to be registered before it can be used.
Last month it was the [:http://ildc.gov.in/Kannada/kdownload.htm|disappointing set of tools] released by TDIL, and this month, the DLI. Needless to say, the projects are obviously worth a lot, but they are shabbily done. The idea seems right, but the implementation has been terrible.
A Quick Script
To settle the tussle between my interest in the books on DLI (which are otherwise not easily available) and the irritation of the shabby interface, I wrote a (shabby) script that batch-downloads the TIFFs.
It is a quickly written script, with pieces from here and there, that has many stupid parts (which undoubtedly would be mine). But mainly it works, just like the projects I've been mentioning here. Serves them right, in a way.
For each book, it needs the URL of the TIFF for the starting page, with the filename removed. Pretty clumsy, yes. But that was convenient for me, since I removed the frames from their web page while viewing, browsed through the list, and clicked on each book to check the quality. It saves the irritation for the many pages that follow.
Ah, and you'll need to paste the URLs into a file, one per line. For use on just a single book, it is easy to modify, anyway.
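Stripping the filename by hand gets tedious for more than a few books. As a minimal sketch, assuming you have the full link to a book's first page (something ending in a name like 00000001.tif), POSIX parameter expansion can drop the filename for you. The function name base_url and the example.org link are placeholders of mine, not anything DLI or the script below defines.

```shell
#!/bin/sh
# Sketch: turn a full page link into the base URL the download script expects.
# base_url is a hypothetical helper name chosen for this example.

base_url() {
    # ${1%/*} removes the shortest trailing "/..." match, i.e. the filename;
    # the slash is then put back so page names can be appended directly.
    printf '%s/\n' "${1%/*}"
}

# Example with a made-up link; use the real Image Link of the book instead.
base_url "http://example.org/books/some-book/PTIFF/00000001.tif"
```

To build the list, append the output to your file, for instance with `base_url "$link" >> urls.txt` (urls.txt being whatever name you give the list).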
Try it, modify it and let me know if you improve it - 'cos I'm still downloading books from there. :)
#!/bin/sh
# Get your favourite books from DLI: specify the start page and end page,
# and this script takes care of the rest.
# Caveat: you'll need to supply the base URLs, one per line, in a text file.
PATH=/bin:/usr/bin:/usr/local/bin
progname=$(basename "$0")

# Both page numbers are required.
if [ $# -ne 2 ]
then
    echo "$progname: usage: $progname start end" 1>&2
    exit 1
fi
start=$1
end=$2

echo "Enter path for the file to read:"
read -r file
if [ ! -r "$file" ]
then
    echo "$progname: cannot read $file" 1>&2
    exit 1
fi

x=1   # book counter; also the name of the directory created for each book
# Read one base URL per line (the last line counts even without a trailing newline).
while IFS= read -r url || [ -n "$url" ]
do
    # Page files sit directly under the base URL, so make sure it ends with a slash.
    case $url in
        */) ;;
        *) url="$url/" ;;
    esac
    mkdir -p "$x" && cd "$x" || exit 1
    index=$start
    while [ "$index" -le "$end" ]
    do
        # Page files are named with eight zero-padded digits, e.g. 00000001.tif
        page=$(printf '%08d' "$index")
        WGET_OUTPUT=$(wget --timestamping --progress=dot:mega "$url$page.tif" 2>&1)
        if [ $? -ne 0 ]
        then
            # wget had problems.
            echo "$progname: $WGET_OUTPUT" 1>&2
            # A missing page means we have run past the end of the book.
            if echo "$WGET_OUTPUT" | grep -F 'Not Found' > /dev/null
            then
                break
            fi
        else
            echo "~~~~ Page found. Downloaded. ~~~~"
        fi
        index=$((index + 1))
    done
    cd ..
    x=$((x + 1))
done < "$file"
Note: Make sure each URL is of the form
http://dli.iiit.ac.in//server12/disk3a/TO%20SUBMISSION/KANNADA/Bharatiya%20Tatva%20Shastra%20Samgraha//PTIFF
and that each URL is placed on its own line in the file (the script doesn't detect empty lines).
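Since empty lines are not detected, it may be worth checking the list before a long run. A minimal sketch, assuming the list is a plain text file with one URL per line; the function name check_urls and the example.org entries are placeholders of mine, not part of the download script.

```shell
#!/bin/sh
# Sketch: report empty (or whitespace-only) lines in a URL list.
# check_urls is a hypothetical helper, not part of the download script.

check_urls() {
    # grep -n prefixes each matching line with its line number;
    # its exit status is 0 when at least one blank line is found.
    if grep -n '^[[:space:]]*$' "$1"
    then
        echo "$1: blank lines found at the line numbers above" 1>&2
        return 1
    fi
    return 0
}

# Demonstration on a throwaway list with a blank second line.
list=$(mktemp)
printf '%s\n' 'http://example.org/a/PTIFF/' '' 'http://example.org/b/PTIFF/' > "$list"
check_urls "$list" || echo "fix the list before downloading"
rm -f "$list"
```

On your own list, `check_urls urls.txt` (with your file name in place of urls.txt) returns a non-zero status if anything needs fixing.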
See also:
- [:http://hpnadig.net/notes/converting-and-merging-tiff-to-pdf|Converting Tiffs to PDF].
- [:http://sampada.net/Kannada-ebooks-torrent-1-and-2-Index|Torrents to several books] I prepared using this script.
Enjaaay!
Comments
March 3, 2007 - 21:00 — Sathish Nayak B (not verified)
Re: Digital Library of India: Download all that you can...
Namaskara sir.
I recently came to know about the Digital Library of India.
I copied your shell script and ran it, but I didn't get how to give the URL of the file.
Is it necessary to download the .tif file?
Can I give the URL link of the tif file?
I downloaded the file and gave its path to the shell script, and when I ran the script it created nearly 500 empty folders.
Did I go about it the wrong way?
I want to download some books of Shivarama Karantha. Please help me.
Regards,
Sathish Nayak
March 3, 2007 - 23:44 — hpn
Re: Digital Library of India: Download all that you can...
You'll need to put the URLs in a text file and specify the path to that file when the script asks for it.
The URLs should be of the form http://.../PTIFF/ (Paste it from the Image Link for each book)
Besides, "cut & paste" use of this script is not recommended. It doesn't do anything fancy; it just does what it is intended to do when the input is right. Feel free to improve it.
March 27, 2007 - 20:41 — Anonymous (not verified)
I don't know how to run this script and download the books.
Namaskara Nadig,
I.. don't know.. how to run scripts..
Please.. could you tell me, step by step, how to run this and download the books..
(Perhaps I could have said it all in English.. because.. 80% of what I wrote is in English anyway, isn't it.. never mind :))
Waiting for your reply..
Vinayaka
March 19, 2008 - 18:26 — mala (not verified)
Dear Sir,
Pls help me also, pls teach me how to download books from the Digital Library of India.
I want to download some Tamil books.
Pls help me
Mala

