Community @ The Turning Gate

Support community for TTG plugins and products.

#1 2018-01-28 06:53:24

FrancoisL
Member
Registered: 2018-01-28
Posts: 9

Double encoding of accented characters in file name

Hello,

I am facing a problem with double encoding of accented characters between Lightroom and the Backlight database, and between the filesystem and Apache.
I would like to add that I'm new to Backlight, and that although I'm comfortable with most IT concepts, I'm not a developer, so my understanding of and ability to write code (including web pages) are limited, and I might very well be missing something obvious... so please be nice ;)

I'm creating (very slowly due to limited time and aforementioned skills) a fish pictures gallery. At this early stage, I chose to show the Lightroom caption field under each image by configuring the Thumbnail grid to display the {Caption} token. The problem is that my captions include both Latin and English names, and that some of those English names contain accented characters.

The first such character that I've come across is the umlaut (ü), which Lightroom encodes in UTF-8 upon export. For example, an image with caption "Thalassoma rueppellii - Rüppell's Wrasse, terminal phase" is exported as "Thalassoma-rueppellii---RÃ¼ppell's-Wrasse,-terminal-phase-Egypt-1736.jpg".
I checked the Backlight database in backlight\data\publisher\master.sq3, and found a first issue: the filename stored in the "photo" table is not encoded the same way. Its encoding matches the caption, not the actual filename on disk; the record contains "Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg". So the generated HTML contains the same encoding, which already differs from the file name:

<figure id="fig-Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736" itemscope="" itemtype="http://schema.org/ImageObject">
		<div class="thumbnail">
			<div class="thumbnail-background" style="background-image: url('http://localhost:8080/backlight/galleries/osteichtyes/perciformes/labridae/thumbnails/Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg');"></div>

	    <a class="photo-hyperlink" href="http://localhost:8080/backlight/galleries/osteichtyes/perciformes/labridae/single.php?id=Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736" data-fancybox="gallery" data-src="http://localhost:8080/backlight/galleries/osteichtyes/perciformes/labridae/photos/Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg">
	    <img src="http://localhost:8080/backlight/galleries/osteichtyes/perciformes/labridae/thumbnails/Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg" id="photo-Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736" class="landscape" height="166" width="250" style="height: 166px; width: 250px;" alt="Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg" title="Thalassoma-rueppellii---Rüppell's-Wrasse,-terminal-phase-Egypt-1736.jpg"></a>

Then for some reason, Apache URL-encodes the actual filename (so Ã=>%C3 and ¼=>%BC), and searches for "Thalassoma-rueppellii---R%C3%BCppell's-Wrasse,-terminal-phase-Egypt-1736.jpg", which it can't find on the filesystem:

::1 - - [27/Jan/2018:18:06:57 +0100] "GET /backlight/galleries/osteichtyes/perciformes/labridae/thumbnails/Thalassoma-rueppellii---R%C3%BCppell's-Wrasse,-terminal-phase-Egypt-1736.jpg HTTP/1.1" 404 6697
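
To illustrate where the "Ã¼" and the "%C3%BC" may come from, here is a minimal Python sketch. It assumes (this is a guess, not something confirmed about Lightroom or Backlight) that the on-disk name is the UTF-8 bytes of the caption displayed through the Windows-1252 code page. The variable names are made up for the example.

```python
from urllib.parse import quote

original = "Rüppell"

# UTF-8 stores "ü" as the two bytes C3 BC.
utf8_bytes = original.encode("utf-8")

# Assumption: those bytes get displayed through Windows-1252,
# where C3 is "Ã" and BC is "¼".
misread = utf8_bytes.decode("cp1252")

print(misread)          # RÃ¼ppell
print(quote(original))  # R%C3%BCppell
print(quote(misread))   # R%C3%83%C2%BCppell
```

Under that assumption, the request in the access log (R%C3%BCppell) corresponds to the correctly encoded "ü" from the database and HTML, while the file on disk carries the two-character "Ã¼" form, which would explain the 404.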

This holds true for both thumbnails and photos, so in the end neither displays correctly. So... aside from fixing the multiple encoding issue, maybe a quick workaround would be not to include the caption in the filenames? Is there an easy way to do so? I searched for the corresponding option in the Publisher/Albums settings, but couldn't find one.

Thanks in advance for your help!

Francois

#2 2018-01-28 09:54:20

rod barbee
Moderator
From: Port Ludlow, WA USA
Registered: 2012-09-24
Posts: 13,976

Re: Double encoding of accented characters in file name

If I read this correctly, your filenames that contain special characters are being reformatted?

Filenames on the web should contain only standard letters, numbers, underscores, and dashes. They should not contain any special characters like umlauts. Nor should they contain commas, apostrophes, parentheses, spaces, etc.

I don't think Lightroom is making the changes, rather the browser is interpreting and changing the characters.
For instance, if a filename has a space in it, for example, my file.jpg, it will show up in the url to that file as my%20file.jpg.
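
As a rough illustration of the rule above, the following Python sketch checks whether a filename sticks to letters, numbers, underscores and dashes (plus one extension), and shows the percent-encoding of the space example. The pattern and the function name `is_web_safe` are assumptions made for this example, not anything Backlight or Lightroom provides.

```python
import re
from urllib.parse import quote

# Hypothetical rule: letters, digits, "_" and "-", then an optional extension.
WEB_SAFE = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def is_web_safe(filename: str) -> bool:
    """Return True if the whole filename matches the conservative pattern."""
    return WEB_SAFE.fullmatch(filename) is not None


print(is_web_safe("my-file.jpg"))           # True
print(is_web_safe("my file.jpg"))           # False
print(is_web_safe("Rüppell's-Wrasse.jpg"))  # False
print(quote("my file.jpg"))                 # my%20file.jpg
```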


Rod 
Just a user with way too much time on his hands.
www.rodbarbee.com
ttg-tips.com, Pangolin test site, Backlight 1.1.1 test site

#3 2018-01-28 10:56:16

FrancoisL
Member
Registered: 2018-01-28
Posts: 9

Re: Double encoding of accented characters in file name

Hi Rod, thanks for your prompt reply.

The thing is, my file names do not contain any of these characters initially, and if I just export my pictures using only Lightroom (e.g. after I developed them), I have no problem as they retain their original name, or they are renamed based on a preset I specified, which won't include any special characters.

It is the files whose captions contain such characters AND that are exported by Lightroom after I placed them in a Backlight collection and published them that are created with names containing encoded characters. I have now run the following extra tests:
1 - I renamed one of my pictures to "Rüppell's Wrasse.NEF" and exported it as a JPEG using Lightroom alone, and it was correctly exported as "Rüppell's Wrasse.jpg". So Lightroom is able to encode characters as it should if need be, it is the Publisher that causes the bad encoding to occur.
2 - I changed the caption in one of my pictures from "Thalassoma rueppellii - Rüppell's Wrasse, juvenile" to "Thalassoma rueppellii - Rüppell's Wrasse, juvenile - test ú ù û ü" to see if the problem occurred with all characters from extended character sets. The files created on disk upon publication were
- thumbnails\Thalassoma-rueppellii---RÃ¼ppell's-Wrasse,-juvenile---test-Ãº-Ã¹-Ã»-Ã¼-Egypt-6271.jpg
- photos\Thalassoma-rueppellii---RÃ¼ppell's-Wrasse,-juvenile---test-Ãº-Ã¹-Ã»-Ã¼-Egypt-6271.jpg
But the value of the "filename" field in the database for the corresponding entry is "Thalassoma-rueppellii---Rüppell's-Wrasse,-juvenile---test-ú-ù-û-ü-Egypt-6271.jpg". Thus publishing the photo created a discrepancy between the database and the filesystem. Of course in this case the web server then tries to find "Thalassoma-rueppellii---R%C3%BCppell's-Wrasse,-juvenile---test-%C3%BA-%C3%B9-%C3%BB-%C3%BC-Egypt-6271.jpg" and fails to do so...
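
The comparison made in these tests could be sketched in Python as below. This is only an illustration: it takes two plain lists of names (how they are read from the "photo" table or from the photos folder is left out), and it assumes the on-disk variant is UTF-8 read as Windows-1252. The function names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def mojibake(name: str) -> Optional[str]:
    """UTF-8 bytes of `name` re-read as Windows-1252, or None if not decodable."""
    try:
        return name.encode("utf-8").decode("cp1252")
    except UnicodeDecodeError:
        # Some UTF-8 continuation bytes have no Windows-1252 character.
        return None


def diagnose(db_names: Iterable[str], disk_names: Iterable[str]) -> Dict[str, str]:
    """Classify each database filename against the names found on disk."""
    on_disk = set(disk_names)
    report = {}
    for name in db_names:
        if name in on_disk:
            report[name] = "ok"
        elif mojibake(name) in on_disk:
            report[name] = "mojibake on disk"
        else:
            report[name] = "missing"
    return report


print(diagnose(["Rüppell.jpg", "plain.jpg"], ["RÃ¼ppell.jpg", "plain.jpg"]))
```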

My analysis is that the Publisher causes Lightroom to include the caption in the file name, perhaps without passing it the proper character set info: maybe it instructs it to use the ASCII charset (since most other charsets know about the accented characters). This usually goes unnoticed, as captions rarely include characters from extended character sets. Or maybe other users don't include captions in their filenames (how?), but in my case the captions are included and this poses a problem.

And as I explained, the problem is that the Publisher (I suppose) stores file names in the Backlight database without applying the same encoding that it instructs Lightroom to apply for file export. So on the one hand it causes Lightroom to encode the characters, and on the other hand it doesn't encode them in the same way itself...

As for the problem with Apache URL-encoding the filename, it might be that this has nothing to do with Backlight, that it is just Apache trying to cope with characters that would otherwise not be acceptable for a URL. I don't know about this, but again the problem wouldn't exist in the first place if the Publisher didn't cause Lightroom to generate filenames which include the caption.

Last edited by FrancoisL (2018-01-28 12:19:18)

#4 2018-01-28 11:49:20

FrancoisL
Member
Registered: 2018-01-28
Posts: 9

Re: Double encoding of accented characters in file name

And since I realize that I didn't completely comment on your reply:
- the problem is not coming from the browser, all the browser does is display what the server returns (and AFAIK, unless you use a very old browser or try to display a very specific character set, it will usually understand the server's response and display all characters properly). In this case, it is the server that doesn't know where to find the files, because the page source code is inconsistent with the actual filenames.
- even if it isn't good practice to use anything outside of the ASCII charset in a URL, I'm pretty sure that you can get your browser to understand extended characters, as long as it knows how to read them. RFC 1738 which defines the format of URLs states that:

(...) an octet may be represented by
the chararacter which has that octet as its code within the US-ASCII
   [20] coded character set.

   In addition, octets may be encoded by a character triplet consisting
   of the character "%" followed by the two hexadecimal digits (from
   "0123456789ABCDEF") which forming the hexadecimal value of the octet.
   (The characters "abcdef" may also be used in hexadecimal encodings.)

   Octets must be encoded if they have no corresponding graphic
   character within the US-ASCII coded character set, if the use of the
   corresponding character is unsafe, or if the corresponding character
   is reserved for some other interpretation within the particular URL
   scheme.

RFC 3986 updates RFC 1738 without contradicting the above but I find the formulation in 1738 clearer.
So... my understanding is that as long as the browser knows how to read the server's response, all should be fine (hence my Apache trying to serve those files with URL-encoded names). But I'm not particularly asking for support of fancy characters in URLs. I'm only asking for all of the Backlight components to act consistently - or at least for the ability not to include captions in published filenames :)
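
The octet rule quoted above can be sketched by hand in a few lines of Python. Two assumptions are made here: the text is turned into octets using UTF-8, and the set of characters left untouched is the "unreserved" set from RFC 3986 (letters, digits, "-", ".", "_", "~"), which is stricter than what browsers leave unencoded in practice. The function name is hypothetical.

```python
import string

# Assumption: RFC 3986 "unreserved" characters are left as-is.
UNRESERVED = string.ascii_letters + string.digits + "-._~"


def percent_encode(text: str) -> str:
    """Write each UTF-8 octet either as its ASCII character or as %XX."""
    pieces = []
    for octet in text.encode("utf-8"):
        char = chr(octet)
        if char in UNRESERVED:
            pieces.append(char)
        else:
            pieces.append(f"%{octet:02X}")
    return "".join(pieces)


print(percent_encode("test-ú-ù-û-ü"))  # test-%C3%BA-%C3%B9-%C3%BB-%C3%BC
```

The output matches the sequence seen in the failing request from my second test.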

#5 2018-01-28 14:54:00

rod barbee
Moderator
From: Port Ludlow, WA USA
Registered: 2012-09-24
Posts: 13,976

Re: Double encoding of accented characters in file name

This is the first I've ever heard of Lightroom’s Publish service placing caption text into a filename, unless it was set up to change the filenames and a filename preset that placed the caption in the filename was being used upon publication.

As far as I know, TTG Publisher doesn’t change the way Publish Services renames files. But Ben will be the one who knows about that. It will be interesting to see what he thinks. It’s an odd problem.


Rod 
Just a user with way too much time on his hands.
www.rodbarbee.com
ttg-tips.com, Pangolin test site, Backlight 1.1.1 test site

#6 2018-01-28 17:12:46

Matthew
Administrator
From: Seoul, South Korea
Registered: 2012-09-24
Posts: 5,190

Re: Double encoding of accented characters in file name

You're running Lightroom on Mac or Windows, and hosting your website on Linux/Apache. That's a minimum of two different systems handling your files. And how your host has configured their instance of Linux/Apache, we don't know. And who knows what your files are being wrung through in between.

In addition, Lightroom does whatever it does. We attempt to sanitize titles, captions and file names where we can, where we think we need to. But best practice is generally to minimize danger.

Rod's example is a good one, with "my file" being turned into "my%20file" in the URL. Similar things happen to umlauted and accented characters, various punctuation characters, and so on.

For file names, you should play it safe, keeping to "standard" letters and numbers, and using only hyphens or underscores as needed. No other punctuation, no spaces, and no special characters or character variants. Just because your Mac can handle umlauts in a file name doesn't mean your web host is configured to do so.
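
As one possible way to follow this advice before export, here is a small Python sketch that reduces a caption to plain letters, numbers, hyphens and underscores. It is an illustration only, not what Backlight or Lightroom does internally; the function name `safe_filename` and the choice to drop accents rather than transliterate (ü becomes u, not ue) are assumptions.

```python
import re
import unicodedata


def safe_filename(stem: str) -> str:
    """Reduce a caption-like string to ASCII letters, digits, "-" and "_"."""
    # Split accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", stem)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Drop apostrophes so "Ruppell's" becomes "Ruppells" rather than "Ruppell-s".
    ascii_only = ascii_only.replace("'", "")
    # Collapse every run of other characters (spaces, commas, hyphens) into one hyphen.
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "-", ascii_only)
    return cleaned.strip("-")


print(safe_filename("Rüppell's Wrasse, terminal phase"))
# Ruppells-Wrasse-terminal-phase
```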


Campagna Pictures, http://campagnapictures.com
The Turning Gate, http://theturninggate.net

#7 2018-01-28 19:50:57

Ben
Moderator
From: Melbourne, Australia
Registered: 2012-09-29
Posts: 3,456

Re: Double encoding of accented characters in file name

The crux of it seems to be captions appearing in your filenames.  Publisher should not do that, and as far as I know, we have not encountered that happening before.  This sounds like a LR userland issue to me, whereby something has been configured to add captions to your filenames upon export or publishing. 
Does this happen with any other export or publishing plugins you might happen to use?

#8 2018-01-28 23:09:51

FrancoisL
Member
Registered: 2018-01-28
Posts: 9

Re: Double encoding of accented characters in file name

Rod, Matthew, Ben - thank you all for your prompt replies. This is what I needed to know, whether adding the caption was the Publisher's doing or something else's. So since you all adamantly pointed out it wasn't the Publisher's, I investigated further and realized that there is one thing that one does only once and that I had not checked: the configuration of the publish service in Lightroom. This is where I had, myself, configured Lightroom to include the caption in the filename when I started playing with Backlight a few months ago.
I should have seen this right away and I am sorry for wasting your time, so please accept my red-faced apologies!!

@Matthew: I am in fact a systems admin and I manage web servers among other things, so I am familiar with what you explained about filename sanitizing. But it wasn't so much the fact that Apache couldn't find the files that worried me, it was more the belief that the Publisher did something that could cause harm, and the fact that different filenames had been set in different locations (filesystem, database)... Especially since I am actually running Lightroom and the Apache development server on the same Windows machine ;)

So anyway, I'm in the process of updating the site now and it looks like it's all fine without the caption.
Thanks very much again!

#9 2018-01-29 06:19:06

Ben
Moderator
From: Melbourne, Australia
Registered: 2012-09-29
Posts: 3,456

Re: Double encoding of accented characters in file name

Hi Francois, there's no need to apologise.  We're more than happy to work through things like this.

About Windows: in case you weren't aware, we don't provide support for Windows servers.  You may find that Backlight works perfectly well for you, but if it doesn't, our support is limited to 'best effort'.  Apache should be fine, although we can't guarantee it.  Where things become difficult is PHP on IIS, as we definitely don't have the capability to test on that environment (or environments, with different versions of IIS).

#10 2018-01-29 08:06:59

FrancoisL
Member
Registered: 2018-01-28
Posts: 9

Re: Double encoding of accented characters in file name

Hi Ben,

In fact the future "live" site will be running on Linux at my ISP. But as I have a rather slow connection at home, I needed a local development environment to speed up my tests. So I followed the advice I found somewhere on this very forum, which is to run a Backlight site on one's local system. Since I am on Windows (don't tell me I'm your only Windows user?) I have this dev environment running on WAMP (=> no IIS involved), which I publish to directly from Lightroom, and where I first noticed the problem. And then I have the live environment running on Linux, where I verified that the problem was occurring too, obviously.

Just one last thing: when you say "Apache should be fine", I assume you mean "Apache on Windows should be fine"? I wouldn't go so far as to say that Apache on Windows is exactly the same as it is on Linux, but given its presence on the web (about 5% of all web servers are Windows servers running something other than IIS = mostly Apache and NGINX) you can be fairly confident that it runs PHP applications properly :)

Francois

#11 2018-01-29 08:32:18

Ben
Moderator
From: Melbourne, Australia
Registered: 2012-09-29
Posts: 3,456

Re: Double encoding of accented characters in file name

Yes, Apache for Windows.  Of course many of our users run Windows locally.  We do not provide support beyond 'best effort' for local web server installations, and expect users who manage their own web servers to know what they are doing, whether remote or local.  As you said, perhaps 5% of servers are Windows-based.  We do not support them as we do not have the resources to test against Windows servers.  We make it known that Linux is our only supported server platform in our product requirements.

#12 2018-01-30 21:19:22

FrancoisL
Member
Registered: 2018-01-28
Posts: 9

Re: Double encoding of accented characters in file name

I've wasted enough of your time and this is starting to feel like a religious war, so I'll stop after this...

Only supporting Linux is fair enough and I didn't comment on it.

But just to be clear, the 5% of Windows servers I mentioned are those running something other than IIS. In total, 32% of all web servers in the world were running IIS in January 2018. That's more than Apache, which represented only 27%...

Source: https://news.netcraft.com/archives/cate … er-survey/

#13 2018-01-31 05:15:50

Ben
Moderator
From: Melbourne, Australia
Registered: 2012-09-29
Posts: 3,456

Re: Double encoding of accented characters in file name

Mostly corporate sites. Our market is for independently hosted sites, of which the vast majority are Linux-based. Individuals and small businesses also have greater flexibility to choose a Linux host should they choose to use our product.

TTG would not be a viable business if we chose to support IIS.

#14 2018-01-31 06:47:10

FrancoisL
Member
Registered: 2018-01-28
Posts: 9

Re: Double encoding of accented characters in file name

Touché! :)

#15 2018-01-31 08:29:38

Ben
Moderator
From: Melbourne, Australia
Registered: 2012-09-29
Posts: 3,456

Re: Double encoding of accented characters in file name

I should elaborate on the support we do or have provided for non-Linux installations.  I have in the past fired up an IIS instance, and made changes to the code to make sure it works.  I've also made various changes over the years for easy fixes that enabled code to work on various environments.  You'll note we have a basic document for Nginx, that may be of help to some users.  Many of the fixes made in such cases are in our codebase.  However, our support ends when we don't have the means to readily replicate or fix code under certain environments (e.g. IIS version X on Windows version Y).

The difference in support is that if you're on Linux, the sky is the limit for how far we'll go to provide support.  I've spent 10-20 hours supporting individual customers, going so far as signing up with a host so that I could try and resolve an issue from a customer's point-of-view.  We have the philosophy that we'd like 100% of customer sites to work if their hosting meets our stated requirements. We'll only recommend that customers shift hosts if the environment has impediments out of our control, such as firewall or mod_security rules that prevent Backlight from running and which can't or won't be changed by the host.

#16 2018-02-02 08:44:48

FrancoisL
Member
Registered: 2018-01-28
Posts: 9

Re: Double encoding of accented characters in file name

Thanks Ben for clarifying. I know things sometimes make different sense from the developer's position than from the user's position (even if I feel obliged to say again that I am running Apache on Windows, not IIS ;) )
So in the end my encoding problem was only linked to Lightroom (still not sure why it has to encode the names when asked to export images by the Publisher, but removing the captions solved the problem, so I don't mind it any more), and it's all good now. I'm hoping that at some point the site is decent enough that I can post it to the Showcase gallery.

Thanks again for all your time and explanations!

Francois

#17 2018-02-02 09:02:21

rod barbee
Moderator
From: Port Ludlow, WA USA
Registered: 2012-09-24
Posts: 13,976

Re: Double encoding of accented characters in file name

There's a really good Lightroom forum at https://www.lightroomforums.net/ that might be able to tackle that file name encoding issue.


Rod 
Just a user with way too much time on his hands.
www.rodbarbee.com
ttg-tips.com, Pangolin test site, Backlight 1.1.1 test site

#18 2018-02-06 05:19:31

FrancoisL
Member
Registered: 2018-01-28
Posts: 9

Re: Double encoding of accented characters in file name

Thanks a lot Rod! :)
