Community @ The Turning Gate

Support community for TTG plugins and products.

NOTICE

The Turning Gate's Community has moved to a new home, at https://discourse.theturninggate.net.

This forum is now closed, and exists here as a read-only archive.

  • New user registrations are disabled.
  • Users cannot create new topics.
  • Users cannot reply to existing topics.

#2 Re: Backlight Support » Double encoding of accented characters in file name » 2018-02-02 08:44:48

Thanks Ben for clarifying. I know that things sometimes make different sense from the developer's position than from the user's position (even if I feel obliged to say again that I am running Apache on Windows, not IIS ;) )
So in the end my encoding problem was only linked to Lightroom (still not sure why it has to encode the names when asked to export images by the Publisher, but removing the captions solved the problem, so I don't mind it any more), and it's all good now. I'm hoping that at some point the site is decent enough that I can post it to the Showcase gallery.

Thanks again for all your time and explanations!

Francois

#4 Re: Backlight Support » Double encoding of accented characters in file name » 2018-01-30 21:19:22

I've wasted enough of your time and this is starting to feel like a religious war, so I'll stop after this...

Only supporting Linux is fair enough and I didn't comment on it.

But just to be clear, the 5% of Windows servers I mentioned are those running something other than IIS. In total, 32% of all web servers in the world were running IIS in January 2018. That's more than Apache, which represented only 27%...

Source: https://news.netcraft.com/archives/cate … er-survey/

#5 Re: Backlight Support » Double encoding of accented characters in file name » 2018-01-29 08:06:59

Hi Ben,

In fact the future "live" site will be running on Linux at my ISP. But as I have a rather slow connection at home, I needed a local development environment to speed up my tests. So I followed the advice I found somewhere on this very forum, which is to run a Backlight site on one's local system. Since I am on Windows (don't tell me I'm your only Windows user?) I have this dev environment running on WAMP (=> no IIS involved), which I publish to directly from Lightroom, and where I first noticed the problem. And then I have the live environment running on Linux where I verified that the problem was occurring too, obviously.

Just one last thing, when you say "Apache should be fine", I assume you mean "Apache on Windows should be fine"? I wouldn't go so far as to say that Apache on Windows is exactly the same as it is on Linux, but given its presence on the web (about 5% of all web servers are Windows servers running something other than IIS = mostly Apache and NGINX) you can be fairly confident that it runs PHP applications properly :)

Francois

#6 Re: Backlight Support » Double encoding of accented characters in file name » 2018-01-28 23:09:51

Rod, Matthew, Ben - thank you all for your prompt replies. This is what I needed to know, whether adding the caption was the Publisher's doing or something else's. So since you all adamantly pointed out it wasn't the Publisher's, I investigated further and realized that there is one thing that one does only once and that I had not checked: the configuration of the publish service in Lightroom. This is where I had, myself, configured Lightroom to include the caption in the filename when I started playing with Backlight a few months ago.
I should have seen this right away and I am sorry for wasting your time, so please accept my red-faced apologies!!

@Matthew: I am in fact a systems admin and I manage web servers among other things, so I am familiar with what you explained about filename sanitizing. But it wasn't so much the fact that Apache couldn't find the files that worried me, it was more the belief that the Publisher did something that could cause harm, and the fact that different filenames had been set in different locations (filesystem, database)... Especially since I am actually running Lightroom and the Apache development server on the same Windows machine ;)

So anyway, I'm in the process of updating the site now and it looks like it's all fine without the caption.
Thanks very much again!

#7 Re: Backlight Support » Double encoding of accented characters in file name » 2018-01-28 11:49:20

And since I realize that I didn't completely comment on your reply:
- the problem is not coming from the browser, all the browser does is display what the server returns (and AFAIK, unless you use a very old browser or try to display a very specific character set, it will usually understand the server's response and display all characters properly). In this case, it is the server that doesn't know where to find the files, because the page source code is inconsistent with the actual filenames.
- even if it isn't good practice to use anything outside of the ASCII charset in a URL, I'm pretty sure that you can get your browser to understand extended characters, as long as it knows how to read them. RFC 1738 which defines the format of URLs states that:

   (...) an octet may be represented by
   the chararacter which has that octet as its code within the US-ASCII
   [20] coded character set.

   In addition, octets may be encoded by a character triplet consisting
   of the character "%" followed by the two hexadecimal digits (from
   "0123456789ABCDEF") which forming the hexadecimal value of the octet.
   (The characters "abcdef" may also be used in hexadecimal encodings.)

   Octets must be encoded if they have no corresponding graphic
   character within the US-ASCII coded character set, if the use of the
   corresponding character is unsafe, or if the corresponding character
   is reserved for some other interpretation within the particular URL
   scheme.

RFC 3986 updates RFC 1738 without contradicting the above but I find the formulation in 1738 clearer.
So... my understanding is that as long as the browser knows how to read the server's response, all should be fine (hence my Apache trying to serve those files with URL-encoded names). But I'm not particularly asking for support of fancy characters in URLs. I'm only asking for all of the Backlight components to act consistently - or at least for the ability not to include captions in published filenames :)
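
A minimal sketch of the percent-encoding rule quoted above, using Python's standard urllib.parse module. The shortened file name is only a sample, and keeping the apostrophe unescaped through the `safe` argument is an assumption made so the result resembles the requests seen in the Apache log.

```python
from urllib.parse import quote, unquote

# Shortened sample name (hypothetical, modelled on the captions in this thread).
name = "Rüppell's-Wrasse.jpg"

# "ü" has no US-ASCII code, so each of its UTF-8 octets (0xC3, 0xBC)
# is written as "%" followed by two hexadecimal digits.
encoded = quote(name, safe="'")
print(encoded)

# Decoding accepts upper- and lower-case hexadecimal digits alike.
print(unquote("R%c3%bcppell's-Wrasse.jpg") == name)
```

The round trip only works when both sides agree that the octets are UTF-8; the URL syntax itself says nothing about the character set behind the octets.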

#8 Re: Backlight Support » Double encoding of accented characters in file name » 2018-01-28 10:56:16

Hi Rod, thanks for your prompt reply.

The thing is, my file names do not contain any of these characters initially, and if I just export my pictures using only Lightroom (e.g. after I developed them), I have no problem as they retain their original name, or they are renamed based on a preset I specified, which won't include any special characters.

It is the files whose captions contain such characters AND that are exported by Lightroom after I placed them in a Backlight collection and published them that are created with names containing encoded characters. I have now run the following extra tests:
1 - I renamed one of my pictures to "Rüppell's Wrasse.NEF" and exported it as a JPEG using Lightroom alone, and it was correctly exported as "Rüppell's Wrasse.jpg". So Lightroom is able to encode characters as it should if need be; it is the Publisher that causes the bad encoding to occur.
2 - I changed the caption in one of my pictures from "Thalassoma rueppellii - Rüppell's Wrasse, juvenile" to "Thalassoma rueppellii - Rüppell's Wrasse, juvenile - test ú ù û ü" to see if the problem occurred with all characters from extended character sets. The files created on disk upon publication were
- thumbnails\Thalassoma-rueppellii---RÃ¼ppell's-Wrasse,-juvenile---test-Ãº-Ã¹-Ã»-Ã¼-Egypt-6271.jpg
- photos\Thalassoma-rueppellii---RÃ¼ppell's-Wrasse,-juvenile---test-Ãº-Ã¹-Ã»-Ã¼-Egypt-6271.jpg
But the value of the "filename" field in the database for the corresponding entry is "Thalassoma-rueppellii---Rüppell's-Wrasse,-juvenile---test-ú-ù-û-ü-Egypt-6271.jpg". Thus publishing the photo created a discrepancy between the database and the filesystem. Of course in this case the web server then tries to find "Thalassoma-rueppellii---R%C3%BCppell's-Wrasse,-juvenile---test-%C3%BA-%C3%B9-%C3%BB-%C3%BC-Egypt-6271.jpg" and fails to do so...
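
A minimal Python sketch of how such a discrepancy can arise. It rests on an assumption that the thread does not confirm: that the file name is written as UTF-8 bytes and those bytes are then read back one by one as Windows-1252 characters. The shortened name is only a sample.

```python
# Name as stored in the "filename" field of the database (shortened sample).
db_name = "Rüppell's-Wrasse"

# Assumption: the name is written as UTF-8 bytes, and each byte is then
# interpreted as one Windows-1252 character.
disk_name = db_name.encode("utf-8").decode("cp1252")
print(disk_name)             # two characters now stand in for "ü"
print(disk_name == db_name)  # the database and the filesystem disagree

# Reversing the two steps recovers the intended name.
repaired = disk_name.encode("cp1252").decode("utf-8")
print(repaired == db_name)
```

Under that assumption, a request for the percent-encoded name decodes to the database spelling, which matches no file on disk.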

My analysis is that the Publisher causes Lightroom to include the caption in the file name, perhaps without passing it the proper character set info: maybe it instructs it to use the ASCII charset (since most other charsets know about the accented characters). This usually goes unnoticed, as captions usually don't include characters from extended character sets. Or maybe other users don't include captions in their filenames (how?), but in my case the captions are included and this poses a problem.

And as I explained, the problem is that the Publisher (I suppose) stores file names in the Backlight database without applying the same encoding that it instructs Lightroom to apply for file export. So on the one hand it causes Lightroom to encode the characters, and on the other hand it doesn't encode them in the same way itself...

As for the problem with Apache URL-encoding the filename, it might be that this has nothing to do with Backlight, that it is just Apache trying to cope with characters that would otherwise not be acceptable for a URL. I don't know about this, but again the problem wouldn't exist in the first place if the Publisher didn't cause Lightroom to generate filenames which include the caption.

#9 Backlight Support » Double encoding of accented characters in file name » 2018-01-28 06:53:24

FrancoisL
Replies: 17

Hello,

I am facing a problem with double encoding of accented characters between Lightroom and the Backlight database, and between the filesystem and Apache.
I would like to add that I'm new to Backlight, and that although I'm comfortable with most IT concepts, I'm not a developer, so my understanding and ability to code (including web pages) is limited, and I might very well be missing something obvious... so please be nice ;)

I'm creating (very slowly due to limited time and aforementioned skills) a fish pictures gallery. At this early stage, I chose to show the Lightroom caption field under each image by configuring the Thumbnail grid to display the {Caption} token. The problem is that my captions include both Latin and English names, and that some of those English names contain accented characters.

The first such character that I've come across is the umlaut (ü), which Lightroom encodes in UTF-8 upon export. For example, an image with caption "Thalassoma rueppellii - Rüppell's Wrasse, terminal phase" is exported as "Thalassoma-rueppellii---RÃ¼ppell's-Wrasse,-terminal-phase-Egypt-1736.jpg".
I checked the Backlight database in backlight\data\publisher\master.sq3 and found a first issue: in the "photo" table, the filename is not encoded the same way. Its encoding matches the caption, not the actual filename, as the record contains "Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg". So the generated HTML contains the same encoding, which already differs from the file name:

<figure id="fig-Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736" itemscope="" itemtype="http://schema.org/ImageObject">
		<div class="thumbnail">
			<div class="thumbnail-background" style="background-image: url('http://localhost:8080/backlight/galleries/osteichtyes/perciformes/labridae/thumbnails/Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg');"></div>

	    <a class="photo-hyperlink" href="http://localhost:8080/backlight/galleries/osteichtyes/perciformes/labridae/single.php?id=Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736" data-fancybox="gallery" data-src="http://localhost:8080/backlight/galleries/osteichtyes/perciformes/labridae/photos/Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg">
	    <img src="http://localhost:8080/backlight/galleries/osteichtyes/perciformes/labridae/thumbnails/Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg" id="photo-Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736" class="landscape" height="166" width="250" style="height: 166px; width: 250px;" alt="Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg" title="Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg"></a>

Then for some reason, Apache URL-encodes the actual filename (so Ã=>%C3 and ¼=>%BC), and searches for "Thalassoma-rueppellii---R%C3%BCppell's-Wrasse,-terminal-phase-Egypt-1736.jpg", which it can't find on the filesystem:

::1 - - [27/Jan/2018:18:06:57 +0100] "GET /backlight/galleries/osteichtyes/perciformes/labridae/thumbnails/Thalassoma-rueppellii---R%C3%BCppell's-Wrasse,-terminal-phase-Egypt-1736.jpg HTTP/1.1" 404 6697

This holds true for both thumbnails and photos, so in the end neither displays correctly. So... aside from fixing the multiple encoding issue, maybe a quick workaround would be not to include the caption in the filenames? Is there an easy way to do so? I searched for the corresponding option in the Publisher/Albums settings, but couldn't find one.
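
The mismatch between the "photo" table and the files on disk described above could be checked with a short script. This is a minimal Python sketch only: the one-column schema, the helper name find_mismatches and the shortened sample names are hypothetical, and an in-memory database stands in for master.sq3.

```python
import sqlite3

def find_mismatches(conn, disk_names):
    """Return database filenames that have no identically named file."""
    rows = conn.execute("SELECT filename FROM photo")
    return [name for (name,) in rows if name not in disk_names]

# Stand-in for master.sq3; the real schema certainly has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo (filename TEXT)")
conn.executemany(
    "INSERT INTO photo VALUES (?)",
    [("Rüppell's-Wrasse-Egypt-1736.jpg",), ("Plain-name-Egypt-0001.jpg",)],
)

# Stand-in for os.listdir() on the thumbnails or photos folder.
disk_names = {"RÃ¼ppell's-Wrasse-Egypt-1736.jpg", "Plain-name-Egypt-0001.jpg"}

missing = find_mismatches(conn, disk_names)
print(missing)
```

Only names that are not plain ASCII should show up in the result, which would be consistent with the problem appearing only for captions with accented characters.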

Thanks in advance for your help!

Francois
